# Linear regression where my dependent variable is a probability value (bounded 0 to 1 inclusive)

Let’s say I ran an experiment where I recorded subjects’ blood pressure and wanted to test whether it predicts their ability to win games of whack-a-mole. I got the blood pressure value for each subject, had every subject play 20 games of whack-a-mole, and now want to measure the relationship between blood pressure and whack-a-mole win probability. I have been told that because win probability is bounded between 0 and 1, I can’t just run a regression like I would for any other problem. Instead, I was told that I should first transform my probability values using the NORMINV function in Excel (which converts a probability into a z-score).

However, what if some of my subjects won every game of whack-a-mole or lost every game? Then the probability would be exactly one or zero, and I would get infinity and negative infinity for the z-scores.
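To make the problem concrete, here is a minimal sketch in Python rather than Excel. The `probit` helper and the win counts are made up for illustration. It uses the standard library's `NormalDist().inv_cdf` as a stand-in for norminv; that function raises an error at exactly 0 or 1, so the helper returns the mathematical limits there instead.

```python
import math
from statistics import NormalDist

def probit(p):
    """Inverse standard-normal CDF (probability -> z-score).

    NormalDist().inv_cdf raises StatisticsError for p <= 0 or p >= 1,
    so the endpoints are mapped to their mathematical limits by hand.
    """
    if p <= 0.0:
        return -math.inf
    if p >= 1.0:
        return math.inf
    return NormalDist().inv_cdf(p)

games = 20
win_counts = [0, 5, 10, 19, 20]  # hypothetical subjects, not real data

z_scores = [probit(w / games) for w in win_counts]
for w, z in zip(win_counts, z_scores):
    print(f"{w}/{games} wins -> z = {z:.3f}")
```

The subjects at 0/20 and 20/20 come out as negative and positive infinity, which no ordinary least-squares fit can use.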

What do I do in this case? One idea I have is to do some Bayesian bullshit: average over the possible true probabilities, weighting each one by its likelihood (i.e., if a person’s theoretical true win rate is 93%, then they have roughly a 23% chance of going 20/20, roughly a 35% chance of going 19/20, etc.). I can use this principle along with the beta distribution to get the mean of the possible true probabilities weighted by their likelihood. I would then take that estimated mean win rate and do norminv on that. I would elaborate more on this procedure, but I think it’s vaguely clear what I can do (and I need to run). Would this procedure be a good approach, or what is the blatant correct answer?
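Here is a rough sketch of what I mean, again in Python. It assumes a flat prior over the true win rate (a Beta(1, 1) prior, which is my assumption and not something I was told to use). The function names are made up. With that prior, the likelihood-weighted mean has the closed form (wins + 1) / (games + 2), so the brute-force grid average is only there to show the idea.

```python
from math import comb
from statistics import NormalDist

def binom_likelihood(p, wins, games):
    """Chance of observing `wins` out of `games` if the true win rate is p."""
    return comb(games, wins) * p**wins * (1 - p) ** (games - wins)

def posterior_mean(wins, games, grid=10_000):
    """Likelihood-weighted average of candidate win rates.

    Assumes a flat prior, so each candidate p on the midpoint grid
    is weighted only by its binomial likelihood.
    """
    ps = [(i + 0.5) / grid for i in range(grid)]
    weights = [binom_likelihood(p, wins, games) for p in ps]
    return sum(p * w for p, w in zip(ps, weights)) / sum(weights)

games = 20
for wins in (0, 10, 20):  # hypothetical subjects, not real data
    est = posterior_mean(wins, games)
    closed_form = (wins + 1) / (games + 2)  # mean of Beta(wins+1, games-wins+1)
    z = NormalDist().inv_cdf(est)           # finite, since 0 < est < 1
    print(f"{wins}/{games}: grid={est:.4f} closed={closed_form:.4f} z={z:.3f}")
```

Under this assumption a 20/20 subject gets an estimated win rate of 21/22 instead of 1, so the z-score stays finite. A different prior would give a different shrunken estimate.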

Thanks

submitted by /u/Fatlark 