As an initial disclaimer, I have very little background in statistics.

I was helping an engineering student with his homework the other day, and he was being asked to compute the p-value of an observation about average cholesterol. The context was testing to see if the average cholesterol of a certain group was higher than average.

My understanding of p-values is: you fix a null hypothesis H, make an observation X, then:

p = P(X | H)

When using a two-tailed test on a Gaussian-distributed population with mean mu and standard deviation sigma, the p-value would then be:

p = P(|X-mu|/sigma> c | X ~ N(mu, sigma) )

Where c is your normalized observed value.

So when these students are then asked to do a one-tailed test, they just take the two-tailed p-value and halve it, i.e. they remove the absolute value bars and compute:

p = P((X-mu)/sigma> c | X ~ N(mu, sigma) )
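To make the two formulas concrete, here is a minimal sketch of both computations using only the Python standard library. The numbers (mu = 200, sigma = 20, observation 236) are hypothetical and not from the homework problem, and the helper names are mine.

```python
import math

def normal_sf(z):
    """Upper-tail probability P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def two_tailed_p(c):
    """P(|X - mu| / sigma > |c|) under the null X ~ N(mu, sigma)."""
    return 2 * normal_sf(abs(c))

def one_tailed_p(c):
    """P((X - mu) / sigma > c) under the same null: only the upper tail counts."""
    return normal_sf(c)

# Hypothetical values, chosen only for illustration.
mu, sigma, x = 200.0, 20.0, 236.0
c = (x - mu) / sigma  # normalized observed value

print("two-tailed:", two_tailed_p(c))
print("one-tailed:", one_tailed_p(c))
```

For a positive c, the one-tailed value is exactly half of the two-tailed one, since the Gaussian is symmetric about its mean.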

But this seems very wrong to me, because in the one-tailed case, shouldn’t the null hypothesis change from:

X ~ N(mu, sigma)

To

X ~ N(lambda, sigma), where lambda <= mu

Because now your null hypothesis is that the true mean of your population is <= mu, so that rejecting it would mean you can conclude that it is bigger.

This second null hypothesis seems like it is the correct one in this case, but maybe slightly harder for students to compute, and it also seems like it should result in stronger p-values. Am I missing something, or is the philosophy that the simpler approach results in weaker conclusions, and is therefore harmless to adopt?
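One way to probe this numerically is to evaluate the upper-tail probability of the same observation under several candidate means lambda <= mu. This is only a sketch: the numbers are hypothetical, the helper names are mine, and it assumes sigma is the same for every lambda.

```python
import math

def normal_sf(z):
    """Upper-tail probability P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def upper_tail_prob(x, lam, sigma):
    """P(X > x) when X ~ N(lam, sigma)."""
    return normal_sf((x - lam) / sigma)

# Hypothetical values, chosen only for illustration.
mu, sigma, x = 200.0, 20.0, 236.0

# Candidate null means, all satisfying lambda <= mu.
for lam in (170.0, 180.0, 190.0, mu):
    print(f"lambda = {lam:5.1f}  P(X > x) = {upper_tail_prob(x, lam, sigma):.6f}")
```

In this sketch the tail probability grows as lambda approaches mu, so the largest value over all lambda <= mu is the one at lambda = mu, which is the same number the simpler null gives.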

submitted by /u/rtlnbntng
