FOR EMPLOYED STATISTICIANS: What are the most important skills for beginners?

I don’t know how to make up for not having a strictly statistical degree in a job interview. So what should I highlight? What can I do?

Thank you! I’ll get my bachelor’s degree in sociology, which includes statistics. I did a 2-month apprenticeship working with databanks, and a 4-month apprenticeship at a big company where I worked on a 40-page report (including a presentation). I can do a little programming (JavaScript) and know the basics of R. I speak English and German, and a little Russian and Mandarin Chinese.

[Kaggle] YouTube Trending Videos – Detailed Analysis (EDA)

As part of a Udacity project, I completed an EDA (exploratory data analysis) of YouTube trending videos.

At least 84% of trending videos use at least one of their tags in the video title.
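A minimal sketch of how a flag like tag_appeared_in_title might be computed. It assumes each row has a `title` string and a `tags` string of pipe-separated, double-quoted tags, with `[none]` for untagged videos (the layout of the Kaggle trending CSVs as I understand it). The helper names and toy rows are hypothetical, and "appears in the title" is taken to mean a case-insensitive substring match.

```python
def parse_tags(raw_tags: str) -> list[str]:
    """Split a pipe-separated tag string such as '"cats"|"funny cats"' into clean tags."""
    if raw_tags.strip() == "[none]":  # assumed placeholder for videos without tags
        return []
    return [t.strip().strip('"') for t in raw_tags.split("|") if t.strip().strip('"')]


def tag_appeared_in_title(title: str, raw_tags: str) -> bool:
    """True if at least one tag occurs (case-insensitively) as a substring of the title."""
    title_lower = title.lower()
    return any(tag.lower() in title_lower for tag in parse_tags(raw_tags))


def share_with_tag_in_title(rows: list[dict]) -> float:
    """Fraction of rows whose title contains at least one of their own tags."""
    if not rows:
        return 0.0
    hits = sum(tag_appeared_in_title(r["title"], r["tags"]) for r in rows)
    return hits / len(rows)


# Hypothetical toy rows, not taken from the real dataset
rows = [
    {"title": "Funny Cats Compilation", "tags": '"funny cats"|"pets"'},
    {"title": "Morning News", "tags": '"politics"|"economy"'},
    {"title": "Unboxing the new phone", "tags": "[none]"},
    {"title": "Pasta recipe in 10 minutes", "tags": '"Pasta"|"cooking"'},
]
print(share_with_tag_in_title(rows))  # 0.5
```

Because the trending data has one row per video per trending day, you may want to deduplicate by video first, depending on whether a share like 84% is meant over rows or over distinct videos.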

All but 604 trending videos appeared in the trending list more than once.

Most YouTube videos are listed as trending within 0 to 14 days of their publish date.

Users engaged in conversation more when they disliked a trending video than when they liked one.

If the difference between the first trending date and the publish date is less than 4 days, there is a high chance that the video will not re-trend more than 3 times.
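Checking a finding like this means collapsing the data to one record per video. The sketch below assumes the rows have already been parsed into hypothetical `(video_id, publish_date, trending_date)` tuples of `datetime.date` values, one per trending-list appearance; the function names and toy data are illustrative only, and "not re-trended more than 3 times" is interpreted here as "at most 3 trending days in total".

```python
from collections import defaultdict
from datetime import date


def summarize_trending(rows):
    """Per video: days from publish to first trending day, and number of trending days.

    rows: iterable of (video_id, publish_date, trending_date) tuples, one per
    trending-list appearance. The publish date is assumed constant per video.
    """
    publish = {}
    trend_days = defaultdict(set)
    for video_id, published, trended in rows:
        publish[video_id] = published
        trend_days[video_id].add(trended)  # a set, so duplicate rows count once
    summary = {}
    for video_id, days in trend_days.items():
        first = min(days)
        summary[video_id] = {
            "days_to_first_trend": (first - publish[video_id]).days,
            "times_trended": len(days),
        }
    return summary


def share_trending_at_most(summary, max_days_to_trend, max_times):
    """Among videos whose first trend came fewer than max_days_to_trend days after
    publishing, the fraction that trended on at most max_times days."""
    group = [s for s in summary.values() if s["days_to_first_trend"] < max_days_to_trend]
    if not group:
        return float("nan")
    return sum(s["times_trended"] <= max_times for s in group) / len(group)


# Hypothetical toy data: "a" trends twice, "b" four times, "c" trends late
rows = [
    ("a", date(2017, 11, 10), date(2017, 11, 12)),
    ("a", date(2017, 11, 10), date(2017, 11, 13)),
    ("b", date(2017, 11, 1), date(2017, 11, 2)),
    ("b", date(2017, 11, 1), date(2017, 11, 3)),
    ("b", date(2017, 11, 1), date(2017, 11, 4)),
    ("b", date(2017, 11, 1), date(2017, 11, 5)),
    ("c", date(2017, 11, 1), date(2017, 11, 20)),
]
summary = summarize_trending(rows)
print(share_trending_at_most(summary, max_days_to_trend=4, max_times=3))  # 0.5
```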

Whether or not a tag appears in the title (tag_appeared_in_title) has an impact on the view count of YouTube trending videos.

Trending videos that were listed more than 5 times got the highest number of views.

Videos in the categories with the most subscribers use at least one of their tags in the trending video title.

Many YouTube trending videos are listed on the trending list more than once (i.e., on more than one day), but they do not get more traffic.

As I already discussed, many trending videos have a low number of subscribers (some have 0), yet they managed to get more viewers than the top-subscribed channels on YouTube.

I also saw that many trending videos managed to get high view counts but have very few likes (many have 0).

Sample Size for Customer Survey

My boss is setting up a survey covering our 300 locations and around 50,000 of our subscribers. The survey will contain 6 questions about customer satisfaction.

Right now, he wants to survey 30 locations (10% of the network) and 20 customers per location. Intuitively this seems like a decent spread to me, but the decision entirely lacked any methodology; they just went with their gut.

First, I’d like to ask: is this a decent/reasonable spread? Also, can anyone explain a bit about sampling methodology or point me toward where to find this information?
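Picking locations first and then customers within them is a two-stage cluster sample. Customers at the same location tend to answer alike, so 600 clustered responses carry less information than 600 independent ones; the standard adjustment is the Kish design effect, DEFF = 1 + (m − 1)ρ, where m is the number of customers per location and ρ is the intra-class correlation (ICC). The sketch below compares the 95% margin of error for a yes/no question (worst case p = 0.5) with and without clustering. The ICC of 0.05 is a hypothetical placeholder, not an estimate for your network; a pilot or past survey data would be needed to pin it down.

```python
import math
from statistics import NormalDist


def design_effect(cluster_size: int, icc: float) -> float:
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * icc


def margin_of_error(n_effective: float, p: float = 0.5, confidence: float = 0.95) -> float:
    """Half-width of a normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return z * math.sqrt(p * (1 - p) / n_effective)


n_locations, per_location = 30, 20      # the plan described in the post
n_total = n_locations * per_location    # 600 responses
assumed_icc = 0.05                      # hypothetical; estimate from a pilot or past surveys

deff = design_effect(per_location, assumed_icc)
n_eff = n_total / deff
print(f"simple random sample of {n_total}: ±{margin_of_error(n_total):.1%}")
print(f"clustered, ICC={assumed_icc}: DEFF={deff:.2f}, effective n≈{n_eff:.0f}, "
      f"±{margin_of_error(n_eff):.1%}")
```

Under these assumptions the plan gives about ±4.0 points if the 600 responses were a simple random sample, but about ±5.6 points once clustering is accounted for. A larger ICC favors more locations with fewer customers each, and how the 30 locations are chosen (for example at random, or stratified by region or size) also affects whether the results generalize to the whole network. The finite population correction is ignored here because 600 is small relative to 50,000.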

How to impute missing data using a predictive model?

I recently attended a talk about imputing missing data using machine learning, and during the Q&A an audience member commented that when using machine learning (or any predictive model) for imputation, we should impute by sampling from the predicted probability distribution rather than taking the value with the highest predicted probability.

From what I understand, the commenter was saying that if the model gives a 60% chance of “A” versus 40% for “B”, then rather than assigning the missing value “A”, you should flip a weighted coin (60%/40%) and impute the outcome.

Can anyone explain the reasoning here? Wouldn’t it make sense to impute with the best prediction?
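The usual reasoning: if the model is well calibrated, about 40% of the rows it scores at 60% “A” are actually “B”. Always imputing “A” erases them, which distorts the class proportions and understates variability. Drawing from the predicted distribution keeps the proportions right on average, and repeating the draw several times (as in multiple imputation, pooled with Rubin's rules) lets the imputation uncertainty carry into the final estimates. The sketch below assumes hypothetical model output in which every missing row gets the same 60/40 prediction.

```python
import random


def impute_mode(probs):
    """Deterministic: always take the most probable class."""
    return [max(p, key=p.get) for p in probs]


def impute_sample(probs, rng):
    """Stochastic: draw each class with its predicted probability (the weighted coin)."""
    return [rng.choices(list(p), weights=list(p.values()))[0] for p in probs]


rng = random.Random(42)
# Hypothetical model output: 1,000 missing rows, each predicted 60% "A" / 40% "B"
probs = [{"A": 0.6, "B": 0.4} for _ in range(1000)]

mode_imputed = impute_mode(probs)
sampled = impute_sample(probs, rng)
print("share of A, mode imputation:   ", mode_imputed.count("A") / len(mode_imputed))
print("share of A, sampled imputation:", sampled.count("A") / len(sampled))
```

Mode imputation reports 100% “A”, while the sampled version lands near 60%. A single random draw adds noise to individual rows, which is why multiple imputation repeats the draw and combines the resulting analyses instead of relying on one completed data set.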

Statistical test for comparing population means based on a big sample and a small one

I have some sets of data and I would like to compare their means.

For now I have just calculated their means and compared them, but I think it would be more appropriate to view each set as a sample from a bigger population and use a statistical test to compare their means.

I would like to hear some opinions regarding this approach.

Besides that, I am not sure which statistical test to use. I can’t say that these data sets follow a normal distribution. The data is continuous, and some sets have a few hundred items while others have fewer than 10.

Could you please recommend a statistical test for comparing the means of two samples where one is sufficiently large (more than 30 items) but the other has fewer than 10?

I was thinking about using a t-test, but since I can’t assume the populations are normally distributed and the samples aren’t big enough in all cases, I’m not sure that’s appropriate.
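One distribution-free option is a permutation test on the difference in means: under H0 the group labels are assumed exchangeable (both samples come from the same distribution), so you reshuffle the pooled data many times and count how often a random split gives a mean difference at least as large as the observed one. A minimal stdlib sketch follows; the data are hypothetical. If the two populations may have very different spreads, the exchangeability assumption is strained, and Welch's t-test (`scipy.stats.ttest_ind(a, b, equal_var=False)`) is a common alternative, though with fewer than 10 items it still leans on approximate normality of the small sample.

```python
import math
import random


def _mean(values):
    # math.fsum is exactly rounded, so the result does not depend on element order
    return math.fsum(values) / len(values)


def permutation_test_mean_diff(x, y, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means; returns a p-value."""
    rng = random.Random(seed)
    observed = abs(_mean(x) - _mean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # random relabelling of the pooled observations
        diff = abs(_mean(pooled[:n_x]) - _mean(pooled[n_x:]))
        if diff >= observed:
            extreme += 1
    # +1 in numerator and denominator counts the observed split and keeps p > 0
    return (extreme + 1) / (n_resamples + 1)


# Hypothetical data: a large sample (100 items) and a small one (7 items)
large = [3.0 + 0.01 * i for i in range(100)]
small = [5.1, 4.8, 6.0, 5.5, 4.9, 5.7, 5.2]
print(permutation_test_mean_diff(large, small))
```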

Is statistics a worthwhile career choice?

Hello! I am currently a college student pursuing a degree in Data Analytics (it’s a mix of computer science and statistics classes). I like statistics because it allows me to help people and solve problems using math. However, I have seen plenty of people say that they dislike their stats jobs, mainly because they find them boring, they require too much education, and they feel pointless. Should I stay in stats, or are they onto something? Thank you!

Please help me develop a better intuition for the basics of hypothesis testing

I’m currently doing an introductory course on statistics, and specifically a module on hypothesis testing.

I can follow along with the examples just fine, but what I struggle with is intuitively understanding why H0 is rejected when the test statistic falls within the rejection region.

My current best understanding is as follows: if the test statistic (a standardised measure of how far the sample mean is from the population mean) falls within the rejection region (which is determined by how much confidence you want in the inference, i.e. the significance level alpha), then, since the distribution is normal, the sample mean differs from the population mean because of something more than luck (this is as far as my intuition goes 😐).

Any ideas for how I can better understand what’s going on here? Maybe (likely) I’m missing some basics that I need to go back to.
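One way to build the intuition is to simulate it. Assume H0 is true and repeat the experiment many times: the rejection region is chosen so that the test statistic lands in it only about alpha of the time. So when your single observed statistic lands there, either a rare (probability alpha) event happened or H0 is wrong, and the test bets on the latter. The sketch below uses a two-sided z-test with a known population SD; all numbers (`MU_0`, `SIGMA`, `N`, and the alternative mean 106) are hypothetical, and `MU_0` plays the role of the mean claimed by H0.

```python
import random
from statistics import NormalDist, mean

ALPHA = 0.05
MU_0, SIGMA, N = 100.0, 15.0, 25              # hypothetical H0 mean, known SD, sample size
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-sided cutoff, about 1.96


def z_statistic(sample):
    """How many standard errors the sample mean lies from the H0 mean."""
    return (mean(sample) - MU_0) / (SIGMA / len(sample) ** 0.5)


def rejection_rate(true_mu, n_experiments=10_000, seed=1):
    """Fraction of simulated experiments whose z falls in the rejection region."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_experiments):
        sample = [rng.gauss(true_mu, SIGMA) for _ in range(N)]
        if abs(z_statistic(sample)) > z_crit:
            rejections += 1
    return rejections / n_experiments


print("H0 true  (mu = 100):", rejection_rate(100.0))  # close to ALPHA
print("H0 false (mu = 106):", rejection_rate(106.0))  # much larger
```

When the true mean equals `MU_0`, roughly 5% of experiments are (wrongly) rejected, which is exactly what alpha = 0.05 promises. When the true mean is 106, the statistic lands in the rejection region about half the time; that second number is the test's power.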

Good book to read over the summer to prepare

I’m pursuing an ML degree, and aside from the normal CS courses, there are a ton of stats courses as well. I can handle the CS courses, but I took Intro to Probability and Intro to Statistics over the last year and found them pretty difficult. I still don’t think I properly learned anything.

Going forward, I have to take ~10 more stats courses, and looking at the course overviews, this is what I need to know:

  • Stochastic Processes – Topics covered include finite dimensional distributions and the existence theorem, discrete time Markov chains, discrete time martingales, the multivariate normal distribution, Gaussian processes and Brownian motion.

  • Regression Analysis – Orthogonal projections. Univariate normal distribution theory. The linear model and its statistical analysis, residual analysis, influence analysis, collinearity analysis, model selection procedures. Analysis of designs. Random effects. Models for categorical data. Nonlinear models. Instruction in the use of SAS.

  • Statistical Inference – Principles of statistical reasoning and theories of statistical analysis. Topics include: statistical models, likelihood theory, repeated sampling theories of inference, prior elicitation, Bayesian theories of inference, decision theory, asymptotic theory, model checking, and checking for prior-data conflict. Advantages and disadvantages of the different theories.
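As a small taste of the "orthogonal projections" topic in Regression Analysis: the least-squares fitted values are the orthogonal projection of y onto the space spanned by the columns of the design matrix (here a column of ones and x), so the residuals are perpendicular to every column. This stdlib sketch with hypothetical data checks that numerically for a simple regression.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y ≈ b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


# Hypothetical data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_ols(x, y)
fitted = [b0 + b1 * xi for xi in x]  # projection of y onto span{1, x}
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Residuals are orthogonal to both columns of the design matrix
print(dot(residuals, [1.0] * len(x)))  # ~0 up to floating-point error
print(dot(residuals, x))               # ~0 up to floating-point error
```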

So it’s pretty safe to say I have a very basic knowledge/understanding of statistics. I remembered and understood a bit of probability, but the stats course was a complete blur.

It’s currently the summer semester, so I’m hoping to read at least an hour or two every day to strengthen my knowledge.

Thanks everyone!

Graduate Degree Opportunity


I’m a forty-year-old programmer who works primarily on database applications. I have the opportunity to enroll in an MS program in statistics and have it mostly paid for. I’m interested in analyzing public programs to see whether they’re achieving their goals, but I’d be happy doing any kind of work that has a positive social impact.

Is what I’d like to do practical? Who employs people who do what I want to do?


