Non-normal three-way ANOVA

So I am working on my thesis, and I am using a three-factor ANOVA model to analyze my data. I ran the code in SAS as usual and got a three-way interaction. Homogeneity of variance was satisfied; however, I just found that normality is not, based on the Shapiro-Wilk test. I am unsure how to proceed. I thought about using the nonparametric Kruskal-Wallis test, but I only know how to do it for a one-way ANOVA. Please let me know what you think!
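For context on why the one-way case is all most tutorials cover: the Kruskal-Wallis test is inherently one-way (for factorial designs, the aligned rank transform is one commonly suggested alternative). As a reference point, here is a minimal hand-rolled one-way Kruskal-Wallis H statistic in Python, using hypothetical data and no tie correction:

```python
from itertools import chain

# Hypothetical one-way example with three groups (no tie correction).
groups = [
    [6.1, 5.8, 7.2, 6.9],
    [5.1, 4.8, 5.5, 5.0],
    [6.0, 6.5, 5.9, 6.3],
]

pooled = sorted(chain.from_iterable(groups))
# Average rank for each value (ties get the average of their positions).
ranks = {}
for v in set(pooled):
    positions = [i + 1 for i, x in enumerate(pooled) if x == v]
    ranks[v] = sum(positions) / len(positions)

N = len(pooled)
# H = 12 / (N (N+1)) * sum_i n_i * rbar_i^2  -  3 (N+1)
H = 12 / (N * (N + 1)) * sum(
    len(g) * (sum(ranks[v] for v in g) / len(g)) ** 2 for g in groups
) - 3 * (N + 1)
print(H)
```

H is then compared against a chi-squared distribution with k-1 degrees of freedom, where k is the number of groups.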

submitted by /u/someplace__else

2019 NCAA men’s basketball Shiny app


If there is any interest, I can try to get a write-up of the model. I wrote a web scraper and then fit the model using Stan.

I only have 25 free hours for the month and have already used 8, so let me know if it isn’t working. The plots take a second to show up, since the app simulates many games for each matchup and ggplot is somewhat slow.

Let me know what you think.

submitted by /u/tempdsguy

Smoothing probabilities (or rather, proportions)

Hi /r/statistics, I need a bit of help.

I have a matrix that summarizes population movement between different “classes” from one year to the next. Something like this:

             To Class 1   To Class 2   To Class 3
From Class 1       2300         1323          234
From Class 2        232         1423          687
From Class 3        142            0          345

So the “probability” of moving from Class 1 to Class 2 is 1323/(2300 + 1323 + 234) ≈ 0.34.

My issue is that I’m missing observations for certain transition cases (transitions that are very much possible in theory; the historical data I’m working with isn’t very reliable). Any ideas as to how I could go about smoothing this matrix?
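One simple option is additive (Laplace) smoothing: add a small pseudo-count to every cell before row-normalizing, so theoretically possible transitions never get probability zero. A minimal Python sketch using the counts above (the `alpha` value is an arbitrary choice):

```python
# Transition counts: rows = "from" class, columns = "to" class.
counts = [
    [2300, 1323, 234],
    [232, 1423, 687],
    [142, 0, 345],
]

alpha = 1.0  # pseudo-count added to every cell; tune to taste
probs = []
for row in counts:
    total = sum(row) + alpha * len(row)
    probs.append([(c + alpha) / total for c in row])

# Every row now sums to 1, and the Class 3 -> Class 2 entry is no longer zero.
print(probs[2][1])
```

This is equivalent to putting a symmetric Dirichlet prior on each row; if you want more structure, fitting a parametric model of the transitions is another route.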


submitted by /u/K4ntum

2nd generation p-values

Because of that other post about p-values, I thought I would share this.

What do you all think about this concept? There is also an FAQ attached at the end of the paper.

I don’t really see how this is going to solve the problems with p-values. How am I supposed to determine the interval for the true mean as a frequentist? I understand the blood pressure example he gave, and how you can determine the indifference zone from the instrument’s precision. But more generally, if the expected value is unknown, how could you even determine what counts as an acceptable range close to the mean?
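For what it’s worth, the core quantity in the proposal is just interval overlap: the fraction of the interval estimate that falls inside a pre-specified indifference zone. A minimal Python sketch of that overlap fraction, with hypothetical intervals (this omits the paper’s correction factor for very wide estimates):

```python
def overlap_fraction(est_lo, est_hi, null_lo, null_hi):
    """Fraction of the interval estimate [est_lo, est_hi] that lies
    inside the indifference zone [null_lo, null_hi]."""
    overlap = max(0.0, min(est_hi, null_hi) - max(est_lo, null_lo))
    return overlap / (est_hi - est_lo)

print(overlap_fraction(0.0, 2.0, 1.0, 3.0))  # half the estimate overlaps the zone
print(overlap_fraction(1.2, 1.8, 1.0, 2.0))  # estimate entirely inside the zone
print(overlap_fraction(3.0, 4.0, 1.0, 2.0))  # no overlap at all
```

The catch the post raises still stands: this only works once you have committed to a concrete indifference zone, which is easy for an instrument with known precision and much harder in general.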

submitted by /u/ice_shadow

Binomial vs Gaussian GLM for binomial data.

Dear esteemed colleagues. I’m trying to compare, given binomial data, the p-values obtained from a Gaussian-family identity-link GLM vs a binomial-family logit-link fit. I’ve generated 1000 data sets under H0 and counted how many times I reject H0 at an alpha of 0.05. It seems that the Gaussian version rejects H0 in around 5% of the data sets, while the binomial version rejects in around 1%. This confuses me profusely: shouldn’t an alpha of 0.05 reject 5% of H0 data sets when using the correct binomial model, rather than the incorrect Gaussian one? R code follows.

base_p = 0.2  # probability in the baseline
drug_p = 0.2  # probability in the drug
N = 20        # number of observations
n = 50        # number of bernoulli experiments
p_gauss = rep(NaN, 1000)
p_binom = rep(NaN, length(p_gauss))
for (i in 1:length(p_gauss)) {
  df = data.frame(y = c(rbinom(N/2, n, base_p), rbinom(N/2, n, drug_p)),
                  a = c(rep(0, N/2), rep(1, N/2)),
                  n = n)
  p_gauss[i] = coef(summary(glm(y ~ a, family = gaussian, data = df)))[, 'Pr(>|t|)'][2]
  p_binom[i] = coef(summary(glm(cbind(y, n) ~ a, family = binomial, data = df)))[, 'Pr(>|z|)'][2]
}
cat(sprintf('gauss reject H0: %.1f%%\n', 100 * mean(p_gauss < 0.05)))
cat(sprintf('binom reject H0: %.1f%%\n', 100 * mean(p_binom < 0.05)))
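One thing worth double-checking in the R code: `glm` with a binomial family and a two-column response expects `cbind(successes, failures)`, i.e. `cbind(y, n - y)`, not `cbind(y, n)`. As a separate sanity check, here is a stdlib-Python Monte Carlo sketch (hypothetical, not a port of the R simulation) showing that a correctly calibrated test of two binomial proportions does reject at roughly the nominal 5% rate under H0:

```python
import math
import random

random.seed(0)

def two_prop_z_pvalue(x1, n1, x2, n2):
    """Two-sided p-value for a pooled two-proportion z-test."""
    p = (x1 + x2) / (n1 + n2)  # pooled proportion under H0
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = (x1 / n1 - x2 / n2) / se
    # two-sided p-value: 2 * (1 - Phi(|z|)), Phi via the error function
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

n_sims, n, p_true, rejections = 2000, 500, 0.2, 0
for _ in range(n_sims):
    # Both arms simulated under H0: same success probability.
    x1 = sum(random.random() < p_true for _ in range(n))
    x2 = sum(random.random() < p_true for _ in range(n))
    if two_prop_z_pvalue(x1, n, x2, n) < 0.05:
        rejections += 1

rate = rejections / n_sims
print(rate)
```

If the rejection rate of a test sits well below alpha under H0, that usually points to a mis-specified model or response rather than a property of the binomial family itself.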

submitted by /u/Z01C

Question about one way ANOVA on SPSS

I’m trying to check whether perception of digital wallets (interval scale, Likert-type) varies with age. Do I need to create age groups for this purpose, or can I just use the ages as they are? And if I do need to create age groups, do they need to be equal-width, or can they be disproportionate (e.g. group 1: 23–25, group 2: 26–30, group 3: 31–40)? My sample size is 71.
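On the mechanics: one-way ANOVA does not require equal group sizes, so disproportionate age groups are fine in principle. A minimal hand-rolled F statistic in Python with made-up Likert scores for three unequal groups (hypothetical data, just to show the computation):

```python
# Hypothetical Likert scores for three unequal age groups.
groups = [
    [4, 5, 3, 4, 4, 5],  # e.g. ages 23-25
    [3, 4, 3, 5, 4],     # e.g. ages 26-30
    [2, 3, 3, 4],        # e.g. ages 31-40
]

k = len(groups)
N = sum(len(g) for g in groups)
grand = sum(sum(g) for g in groups) / N

# Between-group and within-group sums of squares.
ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

F = (ss_between / (k - 1)) / (ss_within / (N - k))
print(F)
```

SPSS compares this F against an F distribution with (k-1, N-k) degrees of freedom; nothing in the formula asks the group sizes to match.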

submitted by /u/dejavu619

Data Analysis Ideas (Solar Irradiation Data)

I’m learning how to use NumPy and Pandas in Python. I downloaded twenty years of solar irradiance data and have been slicing it up. Right now I’ve got it in a daily format (20 years, 365 days).

I want to do some statistical analysis of this data with NumPy, but frankly I don’t know enough stats to figure out what to do. How can I calculate something like inter-year variation/standard deviation (P50/P90 curves, for anyone who knows these from solar resource work)? Other things that might be interesting: the frequency of cloudy days (off-trend spikes)?
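One concrete starting point, sketched with a stand-in random array in place of the real data (the 20×365 shape comes from the post; the values are invented): sum each year to get annual totals, then take their standard deviation and percentiles. In solar resource assessment, P90 is conventionally the value exceeded in 90% of years, i.e. the 10th percentile of the annual totals.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the real data: 20 years x 365 days of daily irradiation.
# Replace this with the downloaded array.
daily = rng.normal(5.0, 1.5, size=(20, 365)).clip(min=0.0)

annual = daily.sum(axis=1)         # one total per year
interyear_sd = annual.std(ddof=1)  # inter-year standard deviation

p50 = np.percentile(annual, 50)    # median year
p90 = np.percentile(annual, 10)    # exceeded in ~90% of years
print(interyear_sd, p50, p90)
```

For cloudy-day frequency, one crude approach on the same array would be counting days that fall some threshold below a smoothed seasonal trend (e.g. a rolling mean across the day-of-year axis).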

Here you can see three years of data plotted together. Perhaps this will help germinate some ideas.

submitted by /u/BringBackTheOldFrog

I need HELP with the central limit theorem

I have to perform an experiment that demonstrates the central limit theorem, keep track of the data, and then graph the results. I barely understand the concept, but I’m thinking I’d like it to have something to do with rolling dice, since that’s what I have readily available. If anyone can give me any advice, it would be greatly appreciated.
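A dice version of the experiment, simulated here in Python (hypothetical setup: roll 10 dice, record the average, and repeat many times; the same protocol works with physical dice and a tally sheet):

```python
import random
from statistics import mean, stdev

random.seed(42)

def average_of_rolls(n_dice):
    """Roll n_dice fair dice and return the average face value."""
    return mean(random.randint(1, 6) for _ in range(n_dice))

# Single rolls are uniform on 1..6; averages of 10 rolls are not.
single_rolls = [random.randint(1, 6) for _ in range(2000)]
averages = [average_of_rolls(10) for _ in range(2000)]

print(mean(averages), stdev(averages), stdev(single_rolls))
```

A histogram of `averages` should look bell-shaped and centered near 3.5, while a histogram of `single_rolls` stays roughly flat; that contrast is the demonstration.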

submitted by /u/wumbo-supreme

Biostatistics: is it acceptable to do more analyses than you put in your protocol or SAP (unlike vice versa)?

I know that in an ideal world, you lay out exactly what analyses you’re going to do, and then you do all of those analyses, and not a single one more or less.

If you don’t do an analysis that you said you would do, that should definitely raise a red flag. But as for doing more tests than you originally said you would, I doubt this is uncommon, punishable, or even frowned upon – am I wrong?

In other words, if you keep your analysis plan to the bare minimum (just the most important metric), but then after you get the data you run a few additional analyses that add context or reinforce your primary metric, will you get in trouble in any way? I ask because I’m preparing my first protocol and SAP, and I figure I should keep them to a bare minimum even though I will likely do a couple of extra things once we get all the data.

submitted by /u/Jmzwck