Redesign Styling Update #2: Updated Data Sources!

Hi all,

In the redesign theming, we now have two sets of data sources in the menu bar up top. I’d love additional suggestions of repositories to include as well! If we get enough, consolidating everything into a maintainable wiki page here on /r/statistics might be the move.

As always, please reply with questions, concerns, or suggestions.

–Toasty

submitted by /u/mmm_toasty

Why do people always assume math majors want to become teachers?

Whenever someone hears that I am majoring in math, I always get the response “What are you going to do with your degree?” or “So you want to be a teacher?” I believe that math is applicable to almost anything, but I’m always met with these responses. Does anyone else get annoyed by this?

submitted by /u/stressedout4

How do you structure your work?

Hi everyone,

I recently started an internship in Talent Innovation and Diversity and Inclusion as a data scientist. My background is in psychology, with a master’s in statistics and psychometrics. My first big report was due today, and what I noticed is that my workflow needs to be structured differently.

We were taught to start projects by exploring the data and playing around with it to get a good understanding of the data itself before trying to answer the assignment or question. Up until now, I never really structured this “playing around”: I never systematically reported my findings at this stage, nor did I have a list of things that I definitely wanted to look at. Writing this report, I realised that a more systematic approach would save me a lot of time and would also create a “standard” for my work.

As I am the first person with a statistical background in this department, and because I want to develop a solid statistical working style, I am very curious to hear how you do it. How do you structure your work, reporting, and analyses? Do you have a system or strategy? Or maybe a vision of how a workflow should look? Any links to further documentation, discussions, or other posts are also highly appreciated.

It’s a broad question, but I’m open to hearing very different experiences or opinions. It doesn’t require a one-size-fits-all answer, and I’m not looking for a magical solution. Looking forward to reading your responses!

submitted by /u/leendersh

Analyzing disruption/moderating third variables with ordinal scales…

Hi everyone,

My knowledge of statistics is limited to what I learned at university studying media and communication science (the basics of everything, or so I thought). My gf is currently writing her thesis and ended up with less-than-ideal data: all of her scales are basically ordinal (including a few Likert-scaled questions).

Her H1 had to be analyzed via Spearman’s rank correlation because of the ordinal scaling, and the result was r_s = .137, p = .282 (at alpha = .05). That’s fine per se, but H2 and H3 are supposed to test the same relationship between IV and DV, each with a moderating/disrupting variable introduced (a different variable for H2 and H3, but the same IV and DV). For metric DVs I’d just run a multifactorial ANOVA, but how do we do that with ordinally scaled variables? I seem to be stuck.

Thank you in advance for your help!

submitted by /u/strikedamic

Looking for advice on how to model some data.

We have observations of individuals at various points in their lives. We are curious whether Variable A predicts later values of Variable B (i.e., whether being high on A early on means you will be high on B later on). The trick is that everyone was observed at different time points. Here is some example data to help imagine what I mean.

Person  Age  A  B
1       12   2  4
1       15   3  7
1       17   4  8
2       14   1  2
2       20   1  8

I know it will require a mixed effects model, but I’m unsure how to model the relationship between A and B. My initial thought was to have each value of A predict the next observation of B. The problem is that a different amount of time passes between observations for each person.

Another thought would be to dichotomize the data: perhaps take the median age across all individuals and then compute the averages of A and B for each person below and above that value.

Any advice?

submitted by /u/UnderwaterDialect
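One way to set up the "each A predicts the next B" idea without hiding the uneven spacing is to build one row per consecutive pair of observations and carry the elapsed time along as its own column. Below is a minimal Python sketch using the example data from the post. The names `lagged_pairs`, `a_now`, `b_next`, and `age_gap` are made up for illustration, and the code only reshapes the data; it does not fit a mixed effects model.

```python
from collections import defaultdict

# Example rows from the post: (person, age, A, B)
rows = [
    (1, 12, 2, 4),
    (1, 15, 3, 7),
    (1, 17, 4, 8),
    (2, 14, 1, 2),
    (2, 20, 1, 8),
]


def lagged_pairs(rows):
    """Pair each observation's A with the same person's next observed B.

    Returns one dict per consecutive pair of visits, including the age
    difference between the two visits (hypothetical column names).
    """
    by_person = defaultdict(list)
    for person, age, a, b in rows:
        by_person[person].append((age, a, b))

    pairs = []
    for person in sorted(by_person):
        visits = sorted(by_person[person])  # order each person's visits by age
        for (age0, a0, _), (age1, _, b1) in zip(visits, visits[1:]):
            pairs.append(
                {
                    "person": person,
                    "a_now": a0,
                    "b_next": b1,
                    "age_gap": age1 - age0,
                }
            )
    return pairs


for pair in lagged_pairs(rows):
    print(pair)
```

With the gap stored explicitly, it could enter a later model as a covariate or as an interaction with `a_now`, which may be preferable to dichotomizing at the median age. That modeling choice is a suggestion under these assumptions, not something the post settles.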

Isolating the Gaussian distribution with the largest mode in a multi-modal distribution

I have a multi-modal distribution and would like to filter out data outside of the Gaussian with the largest mode.

Any recommendations from the Reddit brain trust would be greatly appreciated!

R examples would be a bonus!
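Not R, but here is a rough Python sketch of one possible approach: fit a one-dimensional Gaussian mixture by expectation-maximization, then keep only the points assigned to the component with the tallest peak. Several things here are assumptions rather than part of the question: the number of components (`k=2`), the reading of "largest mode" as the component with the highest peak density (weight divided by sigma times sqrt(2*pi)), the hypothetical helper names `fit_mixture` and `keep_dominant`, and the synthetic demo data.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def fit_mixture(data, k=2, iterations=200):
    """Fit a k-component 1-D Gaussian mixture with plain EM."""
    xs = sorted(data)
    n = len(xs)
    # Initialise means at evenly spaced quantiles, with a shared spread.
    mus = [xs[int((i + 0.5) * n / k)] for i in range(k)]
    mean = sum(xs) / n
    spread = math.sqrt(sum((x - mean) ** 2 for x in xs) / n) or 1.0
    sigmas = [spread] * k
    weights = [1.0 / k] * k

    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, s)
                    for w, m, s in zip(weights, mus, sigmas)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean, and std dev per component.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sigmas[j] = max(math.sqrt(var), 1e-6)  # floor avoids collapse
    return weights, mus, sigmas


def keep_dominant(data, weights, mus, sigmas):
    """Keep points whose most likely component has the tallest peak."""
    peaks = [w / (s * math.sqrt(2 * math.pi)) for w, s in zip(weights, sigmas)]
    top = peaks.index(max(peaks))
    kept = []
    for x in data:
        dens = [w * normal_pdf(x, m, s)
                for w, m, s in zip(weights, mus, sigmas)]
        if dens.index(max(dens)) == top:
            kept.append(x)
    return top, kept


# Synthetic demo data (an assumption): a large cluster near 0, a small one near 10.
rng = random.Random(0)
data = [rng.gauss(0, 1) for _ in range(300)] + [rng.gauss(10, 1) for _ in range(100)]

weights, mus, sigmas = fit_mixture(data, k=2)
top, kept = keep_dominant(data, weights, mus, sigmas)
print(f"dominant component: mean={mus[top]:.2f}, sd={sigmas[top]:.2f}")
print(f"kept {len(kept)} of {len(data)} points")
```

On real data the number of components would need to be chosen (for example by comparing fits for several values of `k`), and overlapping modes make the hard assignment in `keep_dominant` less clear-cut than in this well-separated demo.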

submitted by /u/Shrimpio

Testing a gambler’s edge

I posted this question on r/askmath and while I received some helpful replies, I’m still not sure if I am taking the correct approach.

Here is the question:

A gambler believes he has an edge on a game with a known probability of 1:4.21

That is, the overall chance of winning is 1 in 4.21.

Each game is either a winner or a loser.

He observes 13 plays of the game. He guesses 4 as winners ahead of time, and is correct 1 time. The other 3 are losers.

In the 13 plays, 3 are winners. The gambler identified 1 of them in advance; the other 2 he did not.

How many guesses/games must the gambler observe before he can determine whether or not he has an edge?

Here is my attempt at a solution:

My intuition here is that this is not a large enough sample to be statistically valid, although first impressions indicate the gambler may have no edge.

Using the BINOM.DIST function in Excel, I’m seeing the chance of this many successes as 42.1%.

The Excel formula I am using is =BINOM.DIST(1,4,1/4.21,FALSE)

If the ratio of correct guesses (1) to total guesses (4) to total plays of the game (13) holds constant…

70 correct guesses, 280 total guesses, 910 plays of the game. Plug this into the Excel function above and you get 4.9%.

Is this a 90% confidence interval? I believe the Excel function above is one-sided.
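For reference, the two Excel figures can be reproduced with the binomial probability mass function. The Python sketch below also computes an upper-tail probability for comparison, since BINOM.DIST with the last argument set to FALSE returns the probability of exactly that many successes rather than a cumulative or tail probability. The helper names `binom_pmf` and `binom_upper_tail` are hypothetical, and the code assumes the four guesses are independent trials that each win with probability 1/4.21.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p); matches BINOM.DIST(k, n, p, FALSE)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_upper_tail(k, n, p):
    """P(X >= k): chance of at least k successes in n trials."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


p_win = 1 / 4.21  # baseline chance of a win with no edge

# The observed sample: 1 correct out of 4 guesses.
print(f"P(exactly 1 of 4)     = {binom_pmf(1, 4, p_win):.3f}")
print(f"P(at least 1 of 4)    = {binom_upper_tail(1, 4, p_win):.3f}")

# The scaled-up scenario: 70 correct out of 280 guesses.
print(f"P(exactly 70 of 280)  = {binom_pmf(70, 280, p_win):.3f}")
print(f"P(at least 70 of 280) = {binom_upper_tail(70, 280, p_win):.3f}")
```

The first line reproduces the 42.1% figure. A test of whether the gambler beats the baseline would usually look at the "at least" probability, which is much larger than the point probability in the 280-guess scenario.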

Am I thinking about this correctly?

Edits: grammar

submitted by /u/412champyinz

Recommended Learning and Path for a Career in the Pharmaceutical Industry

I just graduated with a Master’s degree in Statistics after a BA in Mathematics and have been looking for all sorts of jobs in the LA/Orange County area. Ultimately, I would love to work in the clinical trials process as a biostatistician or programmer.

Obviously, finding a job as what is basically an entry-level hire is tough. My experience in the bio/drug industry is limited to a summer research program with the FDA. I am wondering if anyone has suggestions on paths I might explore to grow toward my target role. Is training for and getting a SAS certification worth it? Are there certifications I might not be aware of? Would a job as a biostatistician at USC, UCLA, or a hospital be unlikely? What job titles should I look for that might teach me the skills I need?

submitted by /u/sporkredfox