Formulas 3.18 & 3.19 in Fleiss JL et al. Statistical Methods for Rates and Proportions, second edition. Wiley. 1981.

Dear Colleagues,

I noticed that a page for determining sample sizes (http://web1.sph.emory.edu/cdckms/sample%20size%202%20grps%20cohort.htm) cites formulas from the second edition of Fleiss’s Statistical Methods for Rates and Proportions. Which formulas do these correspond to in the third edition of the book? I don’t have access to a copy of the second edition.

Many thanks.
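
For anyone trying to match the formulas across editions, it may help to have the calculation itself written out. The sketch below implements the unequal-allocation sample size formula with continuity correction that sample-size calculators commonly attribute to Fleiss. That it corresponds exactly to formulas 3.18 and 3.19 is an assumption to check against the book, and the function name and arguments are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def fleiss_sample_size(p1, p2, alpha=0.05, power=0.80, r=1.0):
    """Return (n1_uncorrected, n1_corrected) for comparing two proportions.

    r is the allocation ratio n2 / n1. This is an assumed form of the
    formulas; verify it against the book before relying on the numbers.
    """
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + r * p2) / (r + 1)            # weighted average proportion
    q_bar = 1 - p_bar
    diff = abs(p1 - p2)

    numerator = (z_a * sqrt((r + 1) * p_bar * q_bar)
                 + z_b * sqrt(r * p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    n_unc = numerator / (r * diff ** 2)

    # Continuity correction applied to the uncorrected size.
    n_cc = (n_unc / 4) * (1 + sqrt(1 + 2 * (r + 1) / (n_unc * r * diff))) ** 2
    return n_unc, n_cc


n_unc, n_cc = fleiss_sample_size(0.6, 0.4)
print(round(n_unc, 1), round(n_cc, 1))
```

Both values are sizes for group 1 and would be rounded up in practice; group 2 then needs r times as many subjects.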

submitted by /u/akar79

Need help properly analyzing and configuring study data

I’m working on a project comparing disease rates. I have little to no statistical background, apart from a few intermediate classes and YouTube. Do you know of a company I can pay to help me analyze the data properly?

submitted by /u/jonkent3713

Who’s Familiar with Dedoose Software?

I’m using Dedoose for qualitative research. I have a complete codebook, and 720 conversations to code. What I want in the end is the complete count of each code within each conversation, and I’ll use SPSS for the actual analysis.

I’m having a hard time figuring out how to do two things:
1. Measure interrater reliability.
2. Collapse the codes applied by multiple raters to a single conversation so the codes aren’t double-counted.

I might have more questions as I go along. I’d love it if anyone who has experience with Dedoose can respond to this thread. Thank you!
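
Outside of Dedoose, both steps can be sketched on exported data. The example below is a minimal illustration in Python, assuming the code applications can be exported as (conversation, rater, code) rows; the data, function names, and the collapsing rule are hypothetical and not part of Dedoose. Cohen's kappa is used as one common reliability measure for two raters.

```python
from collections import Counter, defaultdict


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance from each rater's marginal frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)


def collapse_counts(applications):
    """Collapse (conversation, rater, code) rows to one count per code.

    Takes the maximum count any single rater gave, so a code applied by
    two raters is not counted twice. This rule is only one possible choice.
    """
    per_rater = Counter(applications)  # (conv, rater, code) -> count
    collapsed = defaultdict(int)
    for (conv, _rater, code), count in per_rater.items():
        key = (conv, code)
        collapsed[key] = max(collapsed[key], count)
    return dict(collapsed)


# Hypothetical data: the code each rater gave to the same four excerpts.
kappa = cohen_kappa(["A", "A", "B", "B"], ["A", "B", "B", "B"])

# Hypothetical export rows: (conversation id, rater, code applied).
rows = [
    (1, "r1", "anger"), (1, "r2", "anger"), (1, "r1", "anger"),
    (1, "r2", "trust"), (2, "r1", "trust"),
]
counts = collapse_counts(rows)
print(kappa, counts)
```

The resulting per-conversation counts could then be written to a CSV file for SPSS. Taking the maximum across raters is a judgment call; averaging or using only consensus codes would be equally defensible.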

submitted by /u/sara-34

Cross-validation failed when regression model contains many multiple-level categorical variables. What should I do?

Hi everyone, I am doing a project on linear regression using the Automobile dataset from the UCI Machine Learning Repository. So far, I have fit multiple linear regression, ridge regression, and the lasso. The dataset has only a little over 200 records, yet it contains 4 binary variables and 7 multi-level categorical variables. The variable with the most levels has 22 of them. For multiple linear regression, R complains that some factors have new levels. For ridge regression and the lasso, R reports that X and Y have mismatched dimensions.

I think I understand why the problem arises: the test set is just a small part of the original data, so it can easily fail to cover all the factor levels. I am quite stuck at this point. I hope those of you who have experience with this problem can help me. Thank you so much in advance!
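
The dimension mismatch can be illustrated with a small sketch. It is written in Python rather than R for brevity, and the data and the `one_hot` helper are made up for illustration. The idea is to fix the list of levels on the full data set before splitting, so every fold is encoded with the same columns; in R the analogous step would presumably be to set the factor levels before creating the folds.

```python
def one_hot(values, levels):
    """Encode values as indicator rows using a fixed, ordered level list.

    The first level is the reference category and gets no column, so every
    fold produces a matrix with len(levels) - 1 columns.
    """
    columns = levels[1:]
    return [[1 if v == level else 0 for level in columns] for v in values]


# Hypothetical categorical column; "audi" occurs only once.
make = ["bmw", "audi", "bmw", "volvo", "volvo", "bmw"]
levels = sorted(set(make))          # fix the levels on the FULL data set

train, test = make[2:], make[:2]    # the training fold never sees "audi"
X_train = one_hot(train, levels)
X_test = one_hot(test, levels)

# Same width in both folds, so ridge/lasso get matching dimensions.
print(len(X_train[0]), len(X_test[0]))

# Levels absent from the training fold still cannot be estimated; one
# option is to pool rare levels into "other" before splitting.
unseen = set(test) - set(train)
print(unseen)
```

Consistent encoding fixes the mismatched dimensions, but it does not give the model any information about a level it never saw in training, which is why pooling rare levels or stratifying the folds by the sparse factor may still be needed.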

submitted by /u/fantasticsky_hng

Gotta make a grad school decision….

Got into 2 MS in Statistics programs: George Washington U and Villanova. I want to go into industry right out of the program. To me they both feel like awesome programs, but cost aside, what should I base my decision on? GWU is super pricey, which is why I’m shying away from it. Anyone know anything about either of these programs?

submitted by /u/Amishjohnthomas

comparing distributions – bayesian decision analysis (X-cross-validated)

I have posted a question on stackexchange regarding comparison of distributions here.

Rather than repeat the post, I’ll just give you the gist…

I am trying to compare the distributions of two measurements. The two distributions correspond to the “positive” and “negative” values of an unobserved variable, and I want to test whether the observed variable differs between the two groups. I have sampled from the posterior and computed p(diff_means > 0) = 0.3.

However, since one distribution has a longer tail, is it possible to set a threshold to select or deselect a population with higher values?

There is a figure in the link to help explain!
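
To make the two quantities concrete, here is a minimal sketch in Python. All draws are simulated stand-ins, not my actual posterior samples, and the distributions, parameters, and thresholds are arbitrary assumptions. It computes p(diff_means > 0) from paired posterior draws and then the share of each group's predictive draws that falls above a candidate threshold.

```python
import random

rng = random.Random(0)

# Stand-in posterior draws for the two group means (NOT real results).
mu_pos = [rng.gauss(1.0, 0.5) for _ in range(4000)]
mu_neg = [rng.gauss(1.2, 0.5) for _ in range(4000)]

# Posterior probability that the difference in means is positive.
p_diff = sum(a - b > 0 for a, b in zip(mu_pos, mu_neg)) / len(mu_pos)

# Stand-in predictive draws of the measurement; the "positive" group is
# given a longer right tail via a lognormal.
y_pos = [rng.lognormvariate(0.0, 0.8) for _ in range(4000)]
y_neg = [rng.gauss(1.2, 0.4) for _ in range(4000)]


def tail_fraction(draws, threshold):
    """Share of draws above the threshold."""
    return sum(d > threshold for d in draws) / len(draws)


print(p_diff)
for t in (1.5, 2.0, 3.0):
    print(t, tail_fraction(y_pos, t), tail_fraction(y_neg, t))
```

Comparing the two tail fractions at each threshold shows how well a cutoff would separate the groups even when the means barely differ, which is the situation I am asking about.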

submitted by /u/joefromlondon

What sort of test should I be doing to analyze this data?

I just finished collecting data on a project I’ve been working on for a while, and I’m having a brain fart on how to process it. I’ve been working to see if empathetic imitation extends to complex motor movements. I had participants watch videos in which on-screen models were writing at various points. The videos were ranked ordinally based on the amount of the target motion visible over the duration of the video. The DV is the amount of writing, if any. Participants held a pencil to paper while watching, and the distance from the central starting point to the farthest point reached was measured to determine how much movement occurred. I believe this would be ratio data, since it is possible that no movement occurred. How, then, should I analyze this data? It’s probably something simple, but I can’t for the life of me remember what my original plan was.

submitted by /u/TheSuavestOrange

assumptions of linear regression.

Hi! Have any of you come across a textbook which states that the dependent variable (y) must be normally distributed as an assumption of the linear regression model? (I know it’s not necessary; normality matters only for the residuals. But finding this not-so-correct statement in any textbook will help me win a bet ;) )
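
For context, the point about residuals can be checked with a quick simulation. This is a minimal sketch with made-up data: a binary predictor with a large effect makes y clearly bimodal, while the residuals from the fitted line stay close to normal. Excess kurtosis is used here only as a rough, hand-rolled indicator of non-normality.

```python
import random
from statistics import mean

rng = random.Random(1)

# Simulated data: a binary predictor with a large effect, normal errors.
x = [rng.choice([0, 1]) for _ in range(5000)]
y = [5 * xi + rng.gauss(0, 1) for xi in x]

# Ordinary least squares for one predictor, closed form.
x_bar, y_bar = mean(x), mean(y)
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def excess_kurtosis(values):
    """Sample excess kurtosis; roughly 0 for normally distributed data."""
    m = mean(values)
    m2 = mean((v - m) ** 2 for v in values)
    m4 = mean((v - m) ** 4 for v in values)
    return m4 / m2 ** 2 - 3


print(excess_kurtosis(y), excess_kurtosis(residuals))
```

The first number should come out strongly negative (y is far from normal) and the second close to zero, even though the model's error assumption holds exactly by construction.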

submitted by /u/chalwanna