confidence interval definitions

hi /r/statistics

i need some clarification here. i have the following in my notes:

1) A confidence interval is a type of interval estimate that gives a range of values in which a population parameter might lie

2) 95% confidence interval does NOT mean that the probability of our population mean lying in the interval is 95%

3) 95% confidence interval means that if we calculated the 95% confidence interval for 100 samples, about 95 of these would contain the true population mean

I’m having a hard time distinguishing #2 and #3. For #3, doesn’t that mean that if we took 100 samples and calculated a confidence interval for each, and I randomly chose one of those intervals, there would be a 95% chance that the population mean lies in that interval?

Or in other words, for #3: if I took an increasing number of samples, say 100,000, and calculated each confidence interval, then roughly 95,000 of those CIs would contain the population mean. That sounds as if, when I have a random sample and calculate its confidence interval, there is a 95% chance that it contains the population mean.
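The repeated-sampling reading in #3 can be checked with a quick simulation; the true mean and SD below are made-up numbers, and the interval assumes a known sigma:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, trials = 50.0, 10.0, 30, 100_000
z = 1.96  # approximate 97.5% quantile of the standard normal

covered = 0
for _ in range(trials):
    sample_mean = rng.normal(mu, sigma, n).mean()
    half_width = z * sigma / np.sqrt(n)  # known-sigma z-interval
    if sample_mean - half_width <= mu <= sample_mean + half_width:
        covered += 1

print(covered / trials)  # close to 0.95
```

The fraction of intervals that cover the true mean comes out near 0.95, which is exactly the long-run frequency that definition #3 describes.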

Are any of my definitions/notes off? I’m sure my logic is faulty here. Thanks!

submitted by /u/vatom14

Studying for a test in my stats class & have some questions

So for the test we will be given 3 datasets. With these datasets we have to find the confidence intervals for the slope and the intercept, and for an individual/group/population of any of the values in the datasets.

We are allowed to use Excel to perform a simple regression with the Data Analysis ToolPak during the test. However, I don’t know what the difference between the confidence intervals for the slope and the intercept is, or how to find the confidence intervals for an individual, a group, and the population of the dataset.

Any help is greatly appreciated!:)

submitted by /u/Woopsie_Goldberg

Confused about calculating confidence interval via SciPy

Let us consider a toy example. We measure the heights of 40 randomly chosen men and find a mean height of 175 cm. We also know the standard deviation of men’s heights is 20 cm.

When I calculate the 95% confidence interval for the above example using the usual formula, I get the following range:

(168.8, 181.2) 

To get the same results, I used the following SciPy function:

from scipy import stats
import numpy as np

stats.norm.interval(0.95, loc=175, scale=20/np.sqrt(40))

What I am not getting here is why scale is set to 20/np.sqrt(40) rather than 20.

This relevant SO link isn’t helping me either.

What am I missing here?
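The short answer is that the interval is for the sample mean, not for individual heights, and the standard deviation of the sample mean (the standard error) is sigma/sqrt(n). A sketch that reproduces the numbers above and checks the standard error empirically:

```python
import numpy as np
from scipy import stats

sigma, n = 20.0, 40
se = sigma / np.sqrt(n)  # standard error of the mean, about 3.16

lo, hi = stats.norm.interval(0.95, loc=175, scale=se)
print(round(lo, 1), round(hi, 1))  # 168.8 181.2

# The SD of many simulated sample means approaches sigma/sqrt(n), not sigma
rng = np.random.default_rng(0)
sample_means = rng.normal(175, sigma, size=(100_000, n)).mean(axis=1)
print(round(sample_means.std(), 2))  # about 3.16
```

Using scale=20 would give an interval for a single man’s height, not for the mean of 40 men.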

submitted by /u/mayankkaizen

Questions regarding the standard deviation and standard error

Hey there!

I hope this doesn't count as homework, but we talked about a paper at uni and I really had difficulty understanding one of its tables. We only took a short look at it, but I didn't understand it and am trying to figure it out so I won't go crazy.

So I basically have two questions regarding this table.

If you look at the “Energy from Carbohydrate %” row, the numbers are the percentage that carbohydrates make up of the daily energy intake.

Now when it comes to the numbers in the brackets behind these percentages, I'm not sure what to think. Generally I'm pretty sure that these bracket-numbers are the standard deviation, except for the one in the “overall” column.

Why is the bracket-number (11.6) in that column higher than every other bracket-number in this row? I'd expect it to be the average of all the standard deviations in that row. That obviously can't be the case, since an average can't be higher than all the values it averages. Looking at the next row, the overall bracket-number suddenly isn't the highest value anymore.

I have absolutely no idea what the overall bracket-number could be or how they calculated it. I also thought of the standard error, but I guess it can't be the SE, since (as far as I know) it always has to be smaller than the standard deviation, yet here it would be higher than all of them.
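One candidate explanation (an assumption on my part, illustrated with made-up numbers): if the overall bracket-number is the SD over all participants pooled together, the differences between country means add variance on top of the within-country spread, so the overall SD can exceed every per-country SD:

```python
import numpy as np

rng = np.random.default_rng(2)
# Made-up data: three countries with different means but the same within-country SD
groups = [rng.normal(mu, 5.0, 1000) for mu in (40.0, 50.0, 60.0)]

print([round(g.std(ddof=1), 1) for g in groups])  # each around 5
pooled = np.concatenate(groups)
print(round(pooled.std(ddof=1), 1))  # larger: within-SD plus between-country spread
```

The pooled variance is roughly the within-country variance plus the variance of the country means, which is why pooling can push the overall SD above every group's own SD.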

And since I tried to calculate the standard error to see if it would match the overall bracket-number, I realized that I'm not sure how to calculate the SE in this case.

First you'd calculate the mean of the carbohydrate percentages, then calculate the variance and take its square root (do you say it like that?). This way you end up with the standard deviation. To get the standard error, you'd divide the SD by the square root of "n".

But what is "n" in this case? Obviously the total "n" = 135335, but since the standard error describes the deviation of the averages from the "average of the averages", shouldn't "n" be the number of averages?

So in this case shouldn't "n" = 7, since there are 7 averages from 7 countries? Or am I overthinking this and "n" is simply 135335?
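For what it's worth, the mechanics of the n = 7 version look like this (hypothetical country averages; which n the paper actually used depends on what the authors treated as a unit):

```python
import numpy as np

# Hypothetical per-country mean carbohydrate percentages for 7 countries
country_means = np.array([52.3, 48.7, 55.1, 50.2, 47.9, 53.6, 49.4])

sd = country_means.std(ddof=1)         # SD of the 7 country averages
se = sd / np.sqrt(len(country_means))  # SE of their mean, with n = 7
print(round(sd, 2), round(se, 2))
# The SE is always the relevant SD divided by the square root of the number of
# units it was computed from; with participant-level data, n would be 135335.
```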

submitted by /u/rockglf

Doubling effect size for fewer items?

Hello! I’m really stuck, as my question is pretty specific. I’ve searched high and low and have asked my advisor, to no avail (because the question is specific). I am running an experiment testing the anxiolytic effects of exercise and will be using a within-subjects design to compare participants using exercise interventions vs an alternative. We will also have a control group, with no intervention, to compare the exercise group to.

I was to use G*Power to calculate the sample size needed for this experiment, and to get effect sizes from a similar study that uses the same anxiety indexes. However, the similar research I have found uses 40 items to test trait and state anxiety, while we will only be using the first 20 items, as we are only attempting to capture state anxiety. No other study we have found does this, and I even struggled to find published research that displayed effect sizes, power, etc. This particular study uses an effect size of f = 0.14, an alpha level of 0.05, and power (1 − β) of 0.80.

I assumed, for no real reason, that since we are using half the items we should double the effect size to 0.28. This led us to a manageable sample size of 28, whereas with the original figures we got to something like 102 (not including the control, mind you 🙃). The whole project needs to be completed by March or so, including the dissertation write-up, so 102 participants will not be realistic on top of exams etc. Am I right in believing that with fewer items we would need a greater effect size? Am I also right to assume that of these 28, 14 should be in each intervention, and then another 14 in the control to make it all balanced?
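For illustration only, the way effect size drives the required sample size can be reproduced in Python with statsmodels. The sketch below uses a one-way ANOVA power calculation (not G*Power's repeated-measures routine), and it does not settle whether halving the items justifies doubling the effect size — that is a substantive assumption needing its own justification:

```python
from statsmodels.stats.power import FTestAnovaPower

solver = FTestAnovaPower()
for f in (0.14, 0.28):
    # Solve for total N at alpha = 0.05, power = 0.80, two groups
    n_total = solver.solve_power(effect_size=f, alpha=0.05, power=0.80, k_groups=2)
    print(f, round(n_total))
```

Because the noncentrality parameter scales with f² times n, doubling the assumed effect size cuts the required sample size to roughly a quarter, which is why the two assumptions lead to such different recruitment targets.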

My other question is that I’m very confused by the terminology in G*Power. I was using the “ANOVA: repeated measures, within factors” option to calculate my sample size. Is this the same as a between-subjects design, or does it mean something else? Am I also right to believe I will be using two ANOVAs in my analysis: one within-subjects for the experimental group (comparing interventions) and one between-subjects comparing the exercise intervention with the control?

Any help at all would be great. I wouldn’t come to Reddit unless I had really, really tried, because people tend to complain about helping with projects. I’ve done everything I could and have worked so hard. I just really struggle with statistics and would like to get better. At this point I don’t know what else to do or where to turn. I have also asked friends, but G*Power is completely new to us all.

Much much much appreciated thank you!

submitted by /u/branstarksbitch