# Category: r/statistics

## [Education] What’s the difference between rankings? Can I afford to be picky?

I decided to go back to school recently to give myself more options, and I now have the semblance of a post-grad alternative transcript. Taking two intro stats classes, one at the undergrad level (#1) and one at the grad level (#2), through two different online extension programs has me questioning what kind of learning experience I want going forward. At least in terms of textbook choice and the quality of the supplementary material, I found one much better than the other, i.e. the higher-ranked one was better. I’m not sure how that would translate to in-person classes.

I’m open to either an online or a brick-and-mortar setting once I have an offer of admission, but I can’t move beforehand. The problem is my undergrad GPA: I earned it at a top-ten school, but it is well below 3.0 because of some personal issues. I looked around and decided to do what other people in my situation were doing, which is to *take a few classes as a non-matriculated student at the school you want to apply to.*

At the end of #2’s degree, I will have a diploma that doesn’t distinguish online from in-person study, but the experience will be self-taught and solitary. Mainly, I will be leveraging the degree to get internships in the fields I’m interested in. More highly ranked schools like #1 only offer in-person degrees, which may mean more competitive admissions.

In terms of education quality, given the US News rankings (I know they’re based on peer assessment scores), how much of a difference is there within the top ten, within the 10-20s, within the 20-40s, and beyond, or compared with the top ten? Should I just stick with the school I’ve chosen, which is presumably less selective for its online admits? How realistic is applying to a different school with several grad-level stats classes that may or may not transfer? I’m just worried that I’m not doing everything I can. “Everything” would mean finding a job near the school of choice and using the above strategy on the off-chance of admission. I’m not sure where the cutoff is below which placements at great employers become less common. I’m aware that state schools or lower-ranked schools can be a minus, or at least not helpful for getting in the door in some fields, though that could be mitigated if their programs are really strong.

## [Q] How were the use cases for the probability distributions developed?

So each probability distribution has its own use case, for example, Poisson for the number of events in a fixed time interval, binomial for the number of successes in a fixed number of trials with replacement, etc.

How were the use cases determined? Was it done by simulating the use case (and assuming their associated probabilities) and running it infinitely many times which led to the specific probability distribution function? TIA!
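As a sketch of the usual direction of reasoning for one case (the binomial), assuming the textbook setup of a fixed number n of independent trials, each succeeding with the same probability p: the formula comes from counting (choose which k trials succeed, times the probability of any one such sequence), and a simulation only confirms it. The values n = 10, p = 0.3, the repetition count, and the seed are arbitrary choices for illustration.

```python
import random
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(K = k) derived by counting: C(n, k) ways to place the k successes,
    each such sequence having probability p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def simulate_successes(n: int, p: float, reps: int, seed: int = 0) -> list[float]:
    """Empirical frequency of each success count over `reps` simulated experiments."""
    rng = random.Random(seed)
    counts = [0] * (n + 1)
    for _ in range(reps):
        # One experiment: n independent Bernoulli(p) trials, count the successes
        k = sum(rng.random() < p for _ in range(n))
        counts[k] += 1
    return [c / reps for c in counts]


n, p = 10, 0.3
exact = [binomial_pmf(k, n, p) for k in range(n + 1)]
empirical = simulate_successes(n, p, reps=100_000)
for k in range(n + 1):
    print(f"k={k:2d}  formula={exact[k]:.4f}  simulated={empirical[k]:.4f}")
```

The simulated frequencies approach the counted formula as the number of repetitions grows, but the formula itself needs no simulation once the assumptions are stated.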

## [Q] Cross-validation: How is bootstrapping different from repeated random holdouts?

As far as I can tell, bootstrapping is just repeated random holdout with roughly a 63.2/36.8 train/test split. I think the question may come down to how the model would treat duplicates in the training set. Is there some other reason for using bootstrapping? I read somewhere that bootstrapping is preferred for smaller datasets, but I don’t understand why that would be the case.
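The 63.2% figure is the probability that a given observation is drawn at least once in n draws with replacement, 1 - (1 - 1/n)^n, which tends to 1 - 1/e ≈ 0.632 as n grows. A minimal sketch that draws bootstrap training sets, takes the never-drawn (out-of-bag) points as the test set, and checks that fraction; n = 1000, the 200 repetitions, and the seed are arbitrary.

```python
import random


def bootstrap_split(n: int, rng: random.Random) -> tuple[list[int], list[int]]:
    """Draw n indices with replacement for training; the indices that were
    never drawn (out-of-bag) form the test set."""
    train = [rng.randrange(n) for _ in range(n)]
    drawn = set(train)
    oob = [i for i in range(n) if i not in drawn]
    return train, oob


rng = random.Random(42)
n = 1000
fractions = []
for _ in range(200):
    train, oob = bootstrap_split(n, rng)
    fractions.append(len(set(train)) / n)  # share of distinct observations used in training

avg_unique = sum(fractions) / len(fractions)
theory = 1 - (1 - 1 / n) ** n  # P(a given observation is drawn at least once)
print(f"average distinct fraction: {avg_unique:.4f}, formula: {theory:.4f}")
```

Note that each bootstrap training set still has n rows; about 36.8% of them are repeats. In a 63.2/36.8 holdout, by contrast, the model sees only 63.2% of the rows, each exactly once.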

## [Question] Google Trends data by state/year: downloading results for individual years vs. longitudinal series

Greetings,

My ultimate goal is to assess whether state-level search interest in a particular topic correlates with contemporaneous survey responses. I’ve assembled state-level survey data for 2007 and 2010-2017. My question is how I should pull the Google Trends data. Specifically, should I download the results for each individual year per state (i.e. Alabama/2007, Alabama/2010, Alabama/2011, etc.), or should I simply enter the entire time range (2007-2017) into the date field? Your input is greatly appreciated. Thanks in advance!
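One consideration, to my understanding of Google's documentation: Trends reports search interest on a 0-100 scale relative to the highest point within the requested time range and region. The sketch below uses made-up numbers (not real data) and a hypothetical `trends_scale` helper that mimics that rescaling, to show why values pulled one year at a time may not share a common scale.

```python
def trends_scale(raw: list[float]) -> list[int]:
    """Rescale a series so its peak is 100 (rounded to integers), mimicking,
    as an assumption, how Google Trends reports interest within one request."""
    peak = max(raw)
    return [round(100 * v / peak) for v in raw]


# Toy quarterly search volumes for one state (hypothetical numbers, not real data);
# year 2 has exactly twice the interest of year 1.
year1 = [50, 60, 55, 70]
year2 = [100, 120, 110, 140]

separate = trends_scale(year1) + trends_scale(year2)  # one request per year
joint = trends_scale(year1 + year2)                   # one request for the whole range

print("separate:", separate)  # separate: [71, 86, 79, 100, 71, 86, 79, 100]
print("joint:   ", joint)     # joint:    [36, 43, 39, 50, 71, 86, 79, 100]
```

With one request per year, the doubling between the two years disappears; a single request over the full range keeps it. Whether values are comparable across states is a separate question, since regional breakdowns are also scaled within each request.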

## [Q] Regression Coefficient Vs. Pearson Coefficient in a paper?

Hey Reddit, I received the following remarks on a recent paper submission:

“In the regression analysis, the authors should report estimated coefficient values rather than Pearson correlation coefficients.”

I am curious why I should report regression coefficients rather than Pearson coefficients. As I understand it, the Pearson coefficient conveys to the reader how linear the relationship with a given variable is, which I think is more important as motivation for someone doing future work based on my results.

I can provide more detail if necessary, but I was curious which coefficient makes sense to list. Thanks!
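One plausible reason reviewers ask for estimated coefficients: Pearson's r is unitless and measures the strength of a linear association, while a regression slope estimates how much the outcome changes per unit of the predictor, in the data's own units. For simple regression the two are linked by slope = r · s_y / s_x, so r alone does not tell a reader the size of the effect. A minimal sketch with toy numbers (hypothetical data):

```python
from statistics import mean, stdev


def pearson_r(x, y):
    """Sample Pearson correlation: unitless, always between -1 and 1."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ols_slope(x, y):
    """Least-squares slope of y on x: estimated change in y per one-unit change in x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Toy data (hypothetical); y_big is the same outcome in units 1000 times smaller
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
y_big = [1000 * v for v in y]

print(pearson_r(x, y), pearson_r(x, y_big))  # same correlation (up to rounding)
print(ols_slope(x, y), ols_slope(x, y_big))  # slopes differ by a factor of 1000
# The two are linked: slope = r * sd(y) / sd(x)
print(ols_slope(x, y), pearson_r(x, y) * stdev(y) / stdev(x))
```

A change of units leaves r untouched but rescales the slope, which is the quantity a reader would need to predict the outcome or compare effect sizes.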

## [Q] Finding the difference between two machines!

I’ll be beginning a research project at college soon, and as someone relatively new to the statistics world, I have a question.

We will be using two different machines that test the same characteristic, and we want to know whether they differ from each other in the results they give. Is there a specific statistical test we should run to determine that? I have access to various statistical programs (R, Minitab, etc.) if need be.

Sorry if this is a dumb question. Thanks!
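Assuming each specimen can be measured on both machines, one common starting point is a paired comparison of the per-specimen differences; Bland-Altman limits of agreement are also widely used in method-comparison studies. The sketch below uses made-up readings, a hypothetical `paired_comparison` helper, and a 1.96 multiplier that assumes roughly normal differences.

```python
from math import sqrt
from statistics import mean, stdev


def paired_comparison(a, b):
    """Compare two machines that measured the same specimens (a[i] and b[i]
    come from the same specimen). Returns the mean difference, the paired
    t statistic, and Bland-Altman 95% limits of agreement."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    d_bar = mean(diffs)
    s_d = stdev(diffs)
    t_stat = d_bar / (s_d / sqrt(n))  # compare against a t distribution with n - 1 df
    limits = (d_bar - 1.96 * s_d, d_bar + 1.96 * s_d)
    return d_bar, t_stat, limits


# Toy readings (hypothetical) for six specimens, each measured on both machines
machine_a = [10.1, 9.8, 10.4, 10.0, 9.9, 10.3]
machine_b = [10.0, 9.9, 10.2, 10.1, 9.7, 10.2]

d_bar, t_stat, (lo, hi) = paired_comparison(machine_a, machine_b)
print(f"mean difference {d_bar:.3f}, t = {t_stat:.2f}, "
      f"limits of agreement ({lo:.3f}, {hi:.3f})")
```

In R, `t.test(machine_a, machine_b, paired = TRUE)` runs the same paired t-test and also reports a p-value. A non-significant test alone does not show that the machines agree; the limits of agreement indicate how far apart individual readings can be.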

## [C][Q] When to apply to jobs after graduating with MS?

Hi all, this is my first time coming to this community and I’m in need of some insight. I’ll be graduating with my MS in May 2020 and I would like to apply to jobs to start in summer 2020. I heard today that application cycles happen in December, but I’ve also heard that they happen in Feb/Mar. What’s the deal? When did you apply and interview? Thank you in advance.

## [Q] Probability in Normal distribution

I got this question that looks really simple:

A random variable follows a normal distribution with mean 98 and standard deviation 12. Find the probability that the variable assumes a value of at least 120.20.

Using Excel, I got 0.0322, but the answer is 0.0313. Please, this is killing me. Could anyone explain this question?
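For reference, the upper tail is P(X ≥ 120.20) = 1 − Φ((120.20 − 98)/12) = 1 − Φ(1.85). A small check using the standard library's complementary error function (the helper name `normal_upper_tail` is just illustrative):

```python
from math import erfc, sqrt


def normal_upper_tail(x: float, mu: float, sigma: float) -> float:
    """P(X >= x) for X ~ Normal(mu, sigma), via 1 - Phi(z) = erfc(z / sqrt(2)) / 2."""
    z = (x - mu) / sigma
    return 0.5 * erfc(z / sqrt(2))


z = (120.20 - 98) / 12
p = normal_upper_tail(120.20, 98, 12)
print(f"z = {z:.2f}, P(X >= 120.20) = {p:.4f}")  # z = 1.85, P(X >= 120.20) = 0.0322
```

With the numbers as stated, this gives about 0.0322, matching the Excel result (e.g. `=1-NORM.DIST(120.20, 98, 12, TRUE)`); it does not reproduce the 0.0313 in the answer key.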

## [Q] I need help understanding this equation

At work, this equation is used to compare a set of values against known minimums and maximums to benchmark the product.

Score = 10 * (Max - value) / (Max - min)

I took some classes in statistics, and I don’t understand this equation. To be honest, I think the scoring should be based on z-scores.

Does this equation make any sense?
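The formula reads as an inverted min-max rescaling: it maps the known range [min, max] linearly onto [10, 0], so a value at the minimum scores 10, a value at the maximum scores 0, and lower values score better (presumably suited to a "lower is better" metric; that is an assumption). Unlike a z-score, it uses fixed bounds rather than the mean and spread of a reference set, and values outside [min, max] score above 10 or below 0. A sketch comparing the two, with hypothetical bounds and reference values:

```python
from statistics import mean, pstdev


def inverted_minmax_score(value: float, lo: float, hi: float) -> float:
    """Score = 10 * (Max - value) / (Max - min): 10 at the minimum, 0 at the maximum."""
    return 10 * (hi - value) / (hi - lo)


def z_score(value: float, values: list[float]) -> float:
    """Standard score relative to the mean and (population) SD of a reference set."""
    return (value - mean(values)) / pstdev(values)


lo, hi = 20.0, 80.0                          # hypothetical known min and max
reference = [20.0, 35.0, 50.0, 65.0, 80.0]   # hypothetical reference values
for v in (20.0, 50.0, 80.0):
    print(v, inverted_minmax_score(v, lo, hi), round(z_score(v, reference), 3))
```

The min-max score depends only on the two bounds, so a single extreme min or max shifts every score; a z-score instead depends on how the whole reference set is spread.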

