I have computed intraclass correlations (ICCs) for two datasets I'm working on. Both have an ICC of 1. In RStudio, the output has a column labelled 'F', and its value differs between the two datasets. I'm really unclear what this value means. Any help? I have to decide between the two datasets, but as they both have an ICC of 1, I'm stuck. Thanks in advance.

submitted by /u/clickily
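A minimal sketch of where an ICC's F value comes from, assuming the one-way random-effects form ICC(1). R packages such as psych report several ICC forms, each with its own F, so the poster's output may be for a different form. F is the ratio of between-subject to within-subject mean squares and tests whether the ICC is zero. For this form ICC(1) = (F - 1) / (F + k - 1), where k is the number of raters, so two ICCs that both display as 1.00 can have very different F values. The helper name `icc1_and_f` and both datasets are hypothetical.

```python
import numpy as np


def icc1_and_f(ratings):
    """One-way random-effects ICC(1) and its F statistic (hypothetical helper).

    ratings: array of shape (n_subjects, k_raters), no missing values.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    row_means = x.mean(axis=1)
    grand_mean = x.mean()
    # Between-subject and within-subject mean squares from a one-way ANOVA
    ms_between = k * np.sum((row_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((x - row_means[:, None]) ** 2) / (n * (k - 1))
    # F tests H0: ICC = 0; undefined (division by zero) if raters agree exactly
    f_stat = ms_between / ms_within
    icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    return icc, f_stat


# Two hypothetical datasets: 4 subjects rated by 2 raters each
data_a = [[1, 1.1], [5, 5], [9, 9.1], [13, 13]]
data_b = [[1, 1.3], [5, 5], [9, 9.3], [13, 13]]
for name, data in [("A", data_a), ("B", data_b)]:
    icc, f_stat = icc1_and_f(data)
    print(f"{name}: ICC = {icc:.2f}, F = {f_stat:.0f}")
```

Both datasets print ICC = 1.00, but F for dataset A is roughly nine times larger, because its raters disagree less. A larger F means stronger evidence of agreement beyond chance. An ICC reported as exactly 1 is usually rounded.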

# Category: statistics


## Preliminary results of the M4 forecast competition: hybrid approaches and combinations of forecasting methods produce greater accuracy

Source: https://www.m4.unic.ac.cy/the-m-competitions-and-their-far-reaching-contributions-to-the-theory-and-practice-of-forecasting/

The combination of methods was the king of the M4. Of the 17 most accurate methods, 12 were "combinations" of mostly statistical approaches. The biggest surprise was a "hybrid" approach that utilised both statistical and ML features. This method produced both the most accurate forecasts and the most precise prediction intervals (PIs), and was submitted by Slawek Smyl, a Data Scientist at Uber Technologies. According to sMAPE, it was close to 10% more accurate than the combination benchmark (Comb).

The second most accurate method was a combination of seven statistical methods and an ML one, with the weights for the averaging calculated by an ML algorithm that was trained to minimise the forecasting error through holdout tests. This method was submitted jointly by Spain's University of A Coruña and Australia's Monash University.

The most accurate and second most accurate methods also achieved an amazing success in specifying the 95% PIs correctly. These are the first methods we are aware of that have done so, rather than considerably underestimating the uncertainty.

The six pure ML methods that were submitted to the M4 all performed poorly, with none of them being more accurate than Comb and only one being more accurate than Naïve2. This supports the findings of the latest PLOS ONE paper by Makridakis, Spiliotis and Assimakopoulos.

submitted by /u/true_unbeliever
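The accuracy comparisons quoted above are in terms of sMAPE. Here is a minimal sketch of that measure in its usual M4 form, (2/h) times the sum of |Y - F| / (|Y| + |F|) over the horizon, expressed in percent. It is paired with the simplest possible combination: an equal-weight average of forecasts. The helper names, the series, and both "methods" are hypothetical. The numbers are chosen so the two methods err in opposite directions, which is the situation where combining helps most. A combination does not always beat every one of its components.

```python
import numpy as np


def smape(actual, forecast):
    """Symmetric MAPE in percent: (2/h) * sum(|Y - F| / (|Y| + |F|)) * 100."""
    y = np.asarray(actual, dtype=float)
    f = np.asarray(forecast, dtype=float)
    return 200.0 * np.mean(np.abs(y - f) / (np.abs(y) + np.abs(f)))


def combine(*forecasts):
    """Equal-weight combination: element-wise mean of several forecasts."""
    return np.mean(np.vstack(forecasts), axis=0)


# Hypothetical 3-step horizon; method 1 under-forecasts, method 2 over-forecasts
actual = [100, 102, 104]
method_1 = [98, 100, 101]
method_2 = [103, 105, 108]
combo = combine(method_1, method_2)
for name, fc in [("method 1", method_1), ("method 2", method_2), ("combo", combo)]:
    print(f"{name}: sMAPE = {smape(actual, fc):.2f}%")
```

The winning M4 combinations used learned rather than equal weights. This sketch only shows the basic averaging idea.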

## Help (a dumb clinician) with sample size calculation for clinical field study

Hi, medical doctor working on my PhD here. I'm familiar with basic statistics but consider myself, in general, very average at statistics (at least the calculations!). Grateful for any help 🙂

In our research group we are planning to launch a field trial to validate a novel technique for Pap smear analysis (screening for cervical cancer, a big problem especially in many low-resource areas!). This technique could significantly improve cancer screening in areas lacking adequate screening.

The research question/hypothesis is that our technique is comparable to traditional diagnosis, i.e. microscopic analysis of samples, for the detection of high-grade precancerous lesions. The problem is that I am trying to calculate the number of patients/samples the study needs to say confidently that our technique is not significantly worse than the gold standard, i.e. traditional microscopy (reject the null hypothesis).

For the data, we can assume that the prevalence of the precancerous lesions we want to detect is about 5% in the study population. Light microscopy, to which we are comparing our method, has a sensitivity of about 60% and a specificity of about 90% for the detection of these lesions. For the alpha parameter, the traditional 0.05 value is fine, and for statistical power 80% would probably be enough (beta = 0.2).

I apologise if the question is too simple, but as a more "clinically" oriented person, I'm having a hard time figuring out the best way to estimate the required sample size, perform power calculations, etc. 🙂 Would it make sense, for example, to compare the methods with kappa statistics, say by assuming that the agreement is better than moderate (κ > 0.4)?

Thank you so much if you can help explain the most sensible way to solve this! Any help appreciated 🙂 I have worked mainly with Stata, but apparently power calculators etc. are also available online?

submitted by /u/kattenfreja
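A rough sketch of one common way to frame this: a non-inferiority test on sensitivity, using the normal-approximation sample size for a single proportion tested against a fixed value. Several assumptions here are not in the post and are labelled as such:

- The new test's expected sensitivity equals microscopy's 60%.
- The non-inferiority margin of 10 percentage points is hypothetical; choosing it is a clinical decision.
- α = 0.025 one-sided, the usual non-inferiority counterpart of a two-sided 0.05.
- Microscopy's 60% is treated as a fixed benchmark.

If both methods are applied to every smear (a paired design), a McNemar-type calculation would be more appropriate. That calculation needs an assumed rate of discordant results. Specificity would need its own calculation. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_for_sensitivity_noninferiority(expected_sens, null_sens, prevalence,
                                     alpha=0.025, power=0.80):
    """Sample size for a one-sided test that sensitivity exceeds null_sens.

    Normal-approximation formula for a single proportion (hypothetical helper);
    returns the number of diseased subjects needed and the total to screen.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    p1, p0 = expected_sens, null_sens
    numerator = (z_alpha * math.sqrt(p0 * (1 - p0))
                 + z_beta * math.sqrt(p1 * (1 - p1)))
    n_diseased = math.ceil((numerator / (p1 - p0)) ** 2)
    # Only diseased subjects inform sensitivity, so divide by prevalence
    n_total = math.ceil(n_diseased / prevalence)
    return n_diseased, n_total


# ASSUMPTIONS: expected sensitivity 0.60, hypothetical margin of 0.10
# (null sensitivity 0.50), prevalence 5%
n_cases, n_screened = n_for_sensitivity_noninferiority(
    expected_sens=0.60, null_sens=0.50, prevalence=0.05)
print(n_cases, n_screened)
```

Under these assumptions, about 194 women with lesions are needed. At 5% prevalence, that means screening roughly 3,900. The low prevalence, not the test itself, drives the total.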

## Because I’ve had to reference my linear algebra recommendations post several times now, here are my updated recommendations.

Here's the old post. Here are my updated recommendations.

Introductory linear algebra (i.e., starting from square one; you should cover everything in these books):

- Linear Algebra and Its Applications by Lay
- Introduction to Linear Algebra by Strang

Linear algebra, with a focus on what you need for statistics:

- Linear Algebra Done Wrong by Treil. I would recommend focusing on all of Ch. 1, all of Ch. 2 (skip 2.8), Sections 3.1 through 3.5, all of Ch. 4, and Sections 5.1 through 5.4 (5.4 is extremely important). The only disadvantage of this book is that it isn't specifically geared toward statistics.
- Matrix Algebra by Gentle. It does not cover proofs, but it is a nice catalog of methods and ideas you should know for a stats program. Chapters 1 through 3 are essential material. Depending on the math prerequisites demanded, Chapter 4 is nice to know. I would also recommend Sections 5.8, 5.9, 6.7, 6.8, and 7.7. Sections 8.2 through 8.5 are essential material, along with 9.1 and 9.2. These also include the linear model material that you will find in an M.S. program. All of the other material is optional or only minimally covered in a stats program, as far as I know.

Good reference material (I wouldn't recommend trying to learn from this unless you have a lot of time, but it's been extremely useful as a reference):

- Matrix Algebra From a Statistician's Perspective by Harville. This does not cover any of the linear model material itself, but rather the matrix algebra behind it. It is the most complete book I have found so far on linear algebra for statistics. For the most part, you should know Chapters 1 through 14, 16 through 18, 20, and 21.
- I haven't read Searle's text yet, but I've heard good things about the first edition.

Linear models (pursue after you have mastered the linear algebra):

- Plane Answers to Complex Questions by Christensen. START ON THE APPENDICES FIRST, and THEN proceed to Chapter 1.
- Foundations of Linear and Generalized Linear Models by Agresti
- A Primer on Linear Models by Monahan
- I haven't read Searle's text on this yet, but I've heard good things about the first edition.

submitted by /u/clarinetist001

## What is currently seeing the most growth? Statistical learning or computational statistics?

I'm wondering because I think I'll only be able to take one of them next semester. For computational statistics, we're using the book Statistical Computing with R (Chapman & Hall/CRC The R Series). For statistical learning, we're using An Introduction to Statistical Learning with Applications in R by Robert Tibshirani et al.

I found this on Wikipedia under computational statistics: "Computational statistics, or statistical computing, is the interface between statistics and computer science. It is the area of computational science (or scientific computing) specific to the mathematical science of statistics. This area is also developing rapidly, leading to calls that a broader concept of computing should be taught as part of general statistical education." So it seems to be growing quite a bit at the moment. And this under statistical learning: "Statistical learning theory is a framework for machine learning drawing from the fields of statistics and functional analysis.[1][2] Statistical learning theory deals with the problem of finding a predictive function based on data. Statistical learning theory has led to successful applications in fields such as computer vision, speech recognition, bioinformatics and baseball."

Which one would make me more well-rounded? I know that, generally speaking, a single course won't make much of a difference, but I'm curious. Also, would your recommendation be the same if I were a CS guy instead of a stats guy? (I'm not, just curious.)

submitted by /u/mathstudent137

## Free Probability Distribution Calculator App for Students

Hey guys, I created an Android application that has probability distribution calculators and flash cards for several common distributions. The app is useful for students enrolled in probability courses, or for anyone looking for a quick way to calculate certain metrics from probability distributions: https://play.google.com/store/apps/details?id=com.bignerdranch.android.exampapp Hope this helps a few people out. If you find it useful, it would be great to hear some feedback!

submitted by /u/ac2uary

## eli5: Bias and Variance

I've seen a lot of explanations, but they all seemed a bit complicated. If anyone can explain it to me in layman's terms, it would be very helpful for my classes.

submitted by /u/jo698
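A small simulation can make the two ideas concrete. Draw many training sets from the same process, fit the same kind of model to each, and look at its predictions at fixed points.

- Bias is how far the average prediction (across training sets) lands from the truth.
- Variance is how much the predictions wobble from one training set to the next.

The sketch below is illustrative, not a standard recipe. It assumes a true curve of sin(2πx) plus Gaussian noise and uses polynomials of increasing degree as the models. The function name `bias_variance` is hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial


def bias_variance(degree, n_datasets=500, n_points=30, noise_sd=0.3, seed=0):
    """Estimate squared bias and variance of a polynomial fit by simulation.

    ASSUMPTION: true function sin(2*pi*x) on [0, 1] plus Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    x_test = np.linspace(0.1, 0.9, 50)
    f_true = np.sin(2 * np.pi * x_test)
    preds = np.empty((n_datasets, x_test.size))
    for i in range(n_datasets):
        # Draw a fresh training set, refit, and predict at the same test points
        x = rng.uniform(0, 1, n_points)
        y = np.sin(2 * np.pi * x) + rng.normal(0, noise_sd, n_points)
        preds[i] = Polynomial.fit(x, y, degree)(x_test)
    mean_pred = preds.mean(axis=0)
    bias_sq = np.mean((mean_pred - f_true) ** 2)  # how far the average fit is off
    variance = np.mean(preds.var(axis=0))         # how much fits change between datasets
    return bias_sq, variance


for degree in (1, 3, 9):
    b2, var = bias_variance(degree)
    print(f"degree {degree}: bias^2 = {b2:.3f}, variance = {var:.3f}")
```

A straight line (degree 1) is too rigid to follow the curve. It is consistently wrong in the same way: high bias, low variance. A degree-9 polynomial can follow the curve but also chases the noise in each particular sample: low bias, higher variance. Expected prediction error is bias² + variance + noise, which is why a middle ground often predicts best.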

## Idiot Doing Monte Carlo Simulation

So this is a cry for help because I'm a dumb-dumb trying to do grown-up math. I'm a physician assistant in a doctoral program with a cockamamie idea and no background in statistics whatsoever.

What I am trying to accomplish is this: I'm trying to run a Monte Carlo simulation that generates simulated subjects receiving a medical intervention. I'm using 3 independent variables with sample sizes and ranges that I've obtained from the medical literature. I've either obtained means and standard deviations directly or calculated them from reported medians and ranges (method per Hozo et al., 'Estimating the mean and variance from the median, range, and the size of a sample', BMC Medical Research Methodology 5:13, April 2005). The independent variables are continuous. I was thinking of pooling them from the literature to obtain a sort of "meta-distribution", so I can determine whether or not it's normally distributed. From cursory reading, I have a strong feeling that they are not. From these variables, I want to simulate my dependent variable, survival to discharge (dichotomous: yes/no). Survival to discharge is generally about 40 to 60% in the literature. There are no known mathematically defined relationships between the independent variables and the dependent variable.
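For reference, here are the simplified median/range approximations as they are commonly quoted from Hozo et al. Check them against the paper before relying on them, since it also gives a more exact variant that uses the sample size. With minimum a, median m and maximum b, the mean is estimated as (a + 2m + b) / 4. The variance is estimated as [(a - 2m + b)² / 4 + (b - a)²] / 12. As we read the paper's summary, this variance formula is recommended mainly for very small samples. For larger samples the paper discusses simpler rules (the median itself as the mean estimate, or range/4 and range/6 for the SD). The function name and example numbers are hypothetical.

```python
import math


def hozo_mean_sd(a, m, b):
    """Estimate mean and SD from minimum a, median m and maximum b.

    Simplified approximations as commonly quoted from Hozo et al. (2005);
    hypothetical helper name.
    """
    mean = (a + 2 * m + b) / 4
    variance = ((a - 2 * m + b) ** 2 / 4 + (b - a) ** 2) / 12
    return mean, math.sqrt(variance)


# Hypothetical right-skewed summary: min 2, median 4, max 20
mean, sd = hozo_mean_sd(2, 4, 20)
print(f"mean ~ {mean:.2f}, SD ~ {sd:.2f}")
```

In this example the estimated mean (7.5) sits well above the median (4). When that happens with real summaries, it hints at the skewness the post suspects, so a normal distribution may be a poor fit for that variable.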

Evidently I don’t know what I’m doing, but I have this deep visceral feeling that I’m doing something outrageously wrong. Your expert opinion is requested. My questions are as follows:

1) Is it legitimate to calculate the mean and variance (and hence the standard deviation) using the method that I found?

2) Does it make sense to “pool” the data from many studies to create the range & distribution for a Monte Carlo simulation? Assuming my independent variables are a, b, and c AND that some studies report some of the variables and not all, is it still fair to pool the data where available?

3) The studies are inconsistent about whether or not they report individual patient data, even though their sample sizes tend to be small (n < 20). Can I determine the distribution from the mean and standard deviation alone?

4) If I can accomplish questions 1-3 without raising most eyebrows, how do I perform a Monte Carlo simulation with multiple non-normally distributed independent variables and a dichotomous dependent variable?
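One way to structure question 4, sketched under loud assumptions:

1. Draw each predictor from a right-skewed distribution matched to a literature mean and SD. A lognormal is used here, which is an assumption; other skewed families could fit better.
2. Turn the predictors into a survival probability with a logistic model.
3. Draw yes/no survival for each simulated subject.

The key limitation follows from the post itself: with no known relationship between the predictors and survival, the logistic coefficients are hypothetical. The simulated outcome can only reflect what you assume. You would need to take them from published effect sizes (e.g., odds ratios) or explore several scenarios. The predictors are also drawn independently, which ignores any correlation between them. All names and numbers are illustrative.

```python
import numpy as np


def lognormal_params(mean, sd):
    """(mu, sigma) of a lognormal with the given mean and SD (moment matching)."""
    sigma2 = np.log(1 + (sd / mean) ** 2)
    return np.log(mean) - sigma2 / 2, np.sqrt(sigma2)


def simulate_cohort(n, var_specs, coefs, intercept, seed=0):
    """Simulate n subjects with skewed predictors and a yes/no outcome.

    var_specs: list of (mean, sd) per predictor, e.g. pooled from the literature.
    coefs, intercept: HYPOTHETICAL log-odds effects per 1 SD of each predictor.
    Predictors are drawn independently (no correlation between them).
    """
    rng = np.random.default_rng(seed)
    X = np.column_stack([
        rng.lognormal(*lognormal_params(m, s), size=n) for m, s in var_specs
    ])
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each predictor
    p_survive = 1 / (1 + np.exp(-(intercept + Z @ np.asarray(coefs))))
    survived = rng.random(n) < p_survive      # one Bernoulli draw per subject
    return X, survived


# Hypothetical (mean, SD) for predictors a, b, c and assumed effects
specs = [(12.0, 6.0), (3.5, 2.0), (40.0, 25.0)]
X, survived = simulate_cohort(10_000, specs, coefs=[0.5, -0.3, 0.2], intercept=0.0)
print(f"simulated survival to discharge: {survived.mean():.1%}")
```

An intercept of 0 puts overall survival near 50%, inside the 40 to 60% range the post cites. Shifting the intercept moves the overall rate, while the coefficients control how strongly each predictor separates survivors from non-survivors.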

submitted by /u/AeIiusGalenus


## Central Limit Theorem: Adjusting simulation size vs. sample size

When we say that the binomial distribution converges in distribution to the normal distribution as n gets large, are we referring to the number of trials in the binomial distribution? E.g. if I flip a biased coin with p = 0.6 and n = 10, does increasing n from 10 to infinity imply convergence to normality? Or are we referring to the number of samples, e.g. X_1, X_2, …, X_n, so that as this n goes to infinity, Bin → N(μ, σ²)?
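A quick numerical check of the first reading, where n is the number of trials inside one binomial variable. Bin(n, p) is a sum of n independent Bernoulli(p) flips, which is exactly the sum the CLT is about. The sketch computes the largest gap between the exact Bin(n, 0.6) CDF and its normal approximation N(np, np(1 - p)), with a continuity correction, for increasing n. By contrast, drawing more and more samples from a fixed Bin(10, 0.6) only reproduces Bin(10, 0.6) more faithfully. It is the sample mean of those draws that becomes approximately normal. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def max_normal_approx_error(n, p):
    """Largest gap between the Bin(n, p) CDF and its normal approximation.

    Uses N(np, np(1-p)) with a continuity correction at each integer k
    (hypothetical helper).
    """
    q = 1 - p
    approx = NormalDist(mu=n * p, sigma=math.sqrt(n * p * q))
    cdf, worst = 0.0, 0.0
    for k in range(n + 1):
        # log of C(n, k) p^k q^(n-k), computed via lgamma to avoid overflow
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log(q))
        cdf += math.exp(log_pmf)
        worst = max(worst, abs(cdf - approx.cdf(k + 0.5)))
    return worst


# n is the number of coin flips (trials) inside ONE binomial variable
for n in (10, 100, 1000):
    print(n, round(max_normal_approx_error(n, 0.6), 4))
```

The maximum error shrinks as the number of trials grows, roughly in proportion to 1/√n.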

submitted by /u/Darth_Marrr


## Multiple linear regression gives me strange results

ANOVA says the model is significant.

It tells me that one of my independent variables (age) is significant; all of the others are not. However, the difference that this variable causes is suuuuper small: the regression coefficient is very small. It just shouldn't be significant. I don't understand.

Here is a link to the results if that helps (sorry, it's in German). The coefficient of age is 0.008, but to have any impact it should be at least 0.2.
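Two facts may help here. First, significance is about the coefficient relative to its standard error (the t statistic), not about its absolute size. Second, a coefficient's size depends on the predictor's units: 0.008 per year of age adds up to 0.4 across a 50-year age span. The sketch below uses simulated, hypothetical data (OLS via NumPy, with a hypothetical helper `ols_with_t`). A per-year effect of 0.008 on a low-noise outcome comes out highly significant. Re-expressing age in decades multiplies the coefficient by 10 but leaves the t statistic unchanged.

```python
import numpy as np


def ols_with_t(X, y):
    """OLS fit returning coefficients and t statistics (intercept added first)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof                          # residual variance
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, beta / se


rng = np.random.default_rng(1)
n = 2000
age = rng.uniform(20, 70, n)                             # hypothetical ages in years
y = 3.0 + 0.008 * age + rng.normal(0, 0.05, n)           # small effect, little noise

beta_years, t_years = ols_with_t(age, y)
beta_decades, t_decades = ols_with_t(age / 10, y)        # same predictor, new unit
print(f"per year:   b = {beta_years[1]:.4f}, t = {t_years[1]:.1f}")
print(f"per decade: b = {beta_decades[1]:.4f}, t = {t_decades[1]:.1f}")
```

Whether 0.008 per year matters in practice is a subject-matter judgement. A significant p-value only says the effect is unlikely to be exactly zero.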

I’m using SPSS.

submitted by /u/FabianDR
