Suppose we generate a vector of counts for K categories with N draws from a multinomial distribution. Let’s assume each category has equal probability 1/K.
Now, when I see the counts (N1, …, NK), is there a known distribution for the largest of these counts? Each count has expected value N/K, but because of sampling variability the counts for some categories will be greater than N/K and the counts for others will be less. I'd like to determine the expected value of the largest, second-largest, etc. of these counts.
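A simple baseline for these expected values is direct Monte Carlo simulation. The sketch below assumes NumPy is available. It draws many multinomial vectors with equal probabilities 1/K, sorts each vector in descending order, and averages each sorted position. The function name `expected_sorted_counts` and its parameters are hypothetical, chosen only for illustration.

```python
import numpy as np

def expected_sorted_counts(n, k, n_sims=100_000, seed=0):
    """Monte Carlo estimate of E[largest count], E[2nd largest], ... for a
    multinomial with n draws and k equally likely categories (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Each row is one multinomial vector (N1, ..., NK).
    counts = rng.multinomial(n, [1.0 / k] * k, size=n_sims)
    # Sort each row in descending order so column 0 holds the largest count.
    sorted_counts = -np.sort(-counts, axis=1)
    # Average over simulations: column j estimates E[(j+1)-th largest count].
    return sorted_counts.mean(axis=0)

print(expected_sorted_counts(n=100, k=3))
```

Because every simulated vector sums to n, the estimated means also sum to n, which is a quick sanity check on the output.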
For example, if there are 3 categories with counts N1, N2, and N3, these numbers can be sorted from least to greatest. The middle value will probably have expected value N/3. But what would the distributions of the largest and smallest values be?
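For K = 3 and moderate N, the expected sorted counts can be computed exactly by summing over every possible count vector (N1, N2, N3), weighted by its multinomial probability N!/(N1! N2! N3!) · (1/3)^N. This is a minimal sketch; the function name is hypothetical. It also gives a way to check the guess about the middle value for any particular N.

```python
from math import comb

def exact_order_stat_means_k3(n):
    """Exact [E[min], E[middle], E[max]] for (N1, N2, N3) ~ Multinomial(n; 1/3, 1/3, 1/3),
    computed by enumerating every count vector (hypothetical helper)."""
    e = [0.0, 0.0, 0.0]
    for n1 in range(n + 1):
        for n2 in range(n - n1 + 1):
            n3 = n - n1 - n2
            # comb(n, n1) * comb(n - n1, n2) equals n! / (n1! n2! n3!)
            p = comb(n, n1) * comb(n - n1, n2) / 3 ** n
            # Accumulate each sorted value weighted by its probability.
            for i, v in enumerate(sorted((n1, n2, n3))):
                e[i] += p * v
    return e

print([round(v, 3) for v in exact_order_stat_means_k3(30)])
```

The loop visits O(N²) vectors, so this is only practical for small K and N; for larger cases the Monte Carlo approach is the easier route. Replacing the accumulation of `p * v` with a tally of `p` per sorted value would give the full distribution of the largest and smallest counts rather than just their means.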
I found one idea, and I'd like to get thoughts on it. I could sample K uniform random variables (U1, U2, …, UK), sort them from least to greatest, and normalize them so they sum to N. Then the distribution of the largest (or 2nd-largest) normalized value would correspond to the distribution of the largest (or 2nd-largest) count.
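Here is a literal sketch of the proposed scheme as described (K independent Uniform(0, 1) values, sorted, rescaled to sum to N), averaging each sorted position so it can be compared against a direct multinomial simulation. NumPy and the function name are assumptions for illustration.

```python
import numpy as np

def proposed_sorted_values(n, k, n_sims=100_000, seed=0):
    """Literal sketch of the proposal: draw k Uniform(0,1) values, sort them,
    and rescale so they sum to n. Returns the mean of each sorted position
    (index 0 = smallest, index k-1 = largest). Hypothetical helper."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(0.0, 1.0, size=(n_sims, k))
    u.sort(axis=1)  # least to greatest within each draw
    scaled = n * u / u.sum(axis=1, keepdims=True)  # each row now sums to n
    return scaled.mean(axis=0)

print(proposed_sorted_values(n=100, k=3))
```

Every value this construction produces is proportional to n, while multinomial counts fluctuate around N/K on the order of sqrt(N), so when comparing the two it is worth checking more than one value of n.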
Is this idea on the right track? It is derived from a post I found on sampling from a simplex at https://cs.stackexchange.com/questions/3227/uniform-sampling-from-a-simplex
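For comparison, one standard way to sample uniformly from the simplex {x ≥ 0, x1 + … + xK = N} is the spacings construction: draw K−1 Uniform(0, 1) cut points, sort them, pad with 0 and 1, and take consecutive differences. (Whether this is exactly the method in the linked post is an assumption here.) Note that normalizing K independent uniforms, as in the idea above, does not in general produce the same distribution as this construction. The sketch assumes NumPy, and the function name is hypothetical.

```python
import numpy as np

def uniform_simplex_sorted(n, k, n_sims=100_000, seed=0):
    """Sample points uniformly from {x >= 0, sum(x) = n} via sorted-uniform
    spacings, then return the mean of each sorted coordinate
    (index 0 = smallest). Hypothetical helper."""
    rng = np.random.default_rng(seed)
    cuts = np.sort(rng.uniform(0.0, 1.0, size=(n_sims, k - 1)), axis=1)
    # Pad each row with 0 and 1; consecutive differences give k spacings summing to 1.
    padded = np.hstack([np.zeros((n_sims, 1)), cuts, np.ones((n_sims, 1))])
    spacings = np.diff(padded, axis=1)
    spacings.sort(axis=1)
    return (n * spacings).mean(axis=0)

print(uniform_simplex_sorted(n=1, k=3))
```

For uniform spacings of the unit interval, the expected smallest spacing is 1/K² and the expected largest is (1 + 1/2 + … + 1/K)/K, which gives a closed-form check on the simulation (1/9 and 11/18 for K = 3).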