[Statistics Question] Help interpreting this LSAT data?

Hey! I’m considering doing some LSAT tutoring, but this data from LSAC has been bugging me for a while.

I’m a total statistical layman, but it seems, on its face, to suggest that studying for the LSAT, by any method, has (basically) no effect on LSAT score.

A TL;DR of the study: LSAC asks test takers to fill out a survey at the end of every exam, with 9 different options for how they prepared for the exam. You can select more than one option; option 9 is “No preparation.” Students who selected only option 9 scored only 1-4 points lower than those who selected any other combination of choices.

I believe the relevant data is in the tables on pages 22-23. LSAT scores are the numbers between ~148 and 152. Here’s the pdf: https://www.lsac.org/docs/default-source/research-(lsac-resources)/tr-11-03.pdf/tr-11-03.pdf

Again, any insight is appreciated! I have my reservations about charging someone for tutoring services if it seems like it will largely be ineffective, but I’d love to understand the data better!

Also, there are obviously alternative explanations for the data that seem to exonerate test prep. For example, maybe students who score far lower at the start, say around 135, are much more likely to seek out tutoring or prep courses, and they are bringing their scores up by ~15 points. Plausible?
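The self-selection idea can be checked with a toy simulation. This is a minimal sketch under entirely hypothetical numbers (baseline scores ~ N(150, 10), a true prep effect of 10 points, and weaker starters being four times as likely to prep); it is not based on the LSAC data. It shows how a real effect can look small, or even vanish, when the people who prep started lower.

```python
import random
import statistics


def simulate(n=20_000, true_effect=10.0, seed=0):
    """Return (observed prep minus no-prep mean gap, true effect) under self-selection.

    All parameters are hypothetical, chosen only to illustrate the mechanism.
    """
    rng = random.Random(seed)
    prep_scores, no_prep_scores = [], []
    for _ in range(n):
        baseline = rng.gauss(150, 10)  # hypothetical score with no preparation
        # Hypothetical selection rule: weaker starters are far more likely to prep.
        p_prep = 0.8 if baseline < 145 else 0.2
        if rng.random() < p_prep:
            prep_scores.append(baseline + true_effect)
        else:
            no_prep_scores.append(baseline)
    gap = statistics.mean(prep_scores) - statistics.mean(no_prep_scores)
    return gap, true_effect


gap, effect = simulate()
print(f"true effect of prep: {effect:.1f} points; observed gap: {gap:.1f} points")
```

Under these assumptions the observed gap comes out far below the true 10-point effect, because the comparison mixes the effect of prep with the lower starting point of the students who chose it. The survey data alone cannot separate the two; that would need before/after scores or some other handle on baseline ability.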


submitted by /u/Elsatonmyface

How do you present a one-to-one statistical comparison without someone saying “correlation is not causation”?

Let’s say I made an annual chart comparing daily average temperatures in Palo Alto with Google’s daily closing stock price, and saw a correlation in which hotter days ended with higher-valued Google shares. This would not be a credible insight. How do you intentionally avoid something like this?
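One common source of this kind of spurious correlation is that both series drift over time (temperature follows the seasons, stock prices wander), and two unrelated trending series often correlate strongly by chance. A minimal sketch, using simulated random walks as hypothetical stand-ins for the two real series: correlating the levels gives large |r| values, while correlating day-to-day changes (differencing) mostly removes the illusion.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def random_walk(rng, n):
    """Cumulative sum of independent N(0, 1) steps: a series with no real link to anything."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0, 1)
        path.append(level)
    return path


def diffs(series):
    """Day-over-day changes."""
    return [b - a for a, b in zip(series, series[1:])]


rng = random.Random(1)
trials = 200
levels, changes = [], []
for _ in range(trials):
    a, b = random_walk(rng, 250), random_walk(rng, 250)  # two independent "trending" series
    levels.append(abs(pearson(a, b)))
    changes.append(abs(pearson(diffs(a), diffs(b))))

print("mean |r| on levels: ", sum(levels) / trials)
print("mean |r| on changes:", sum(changes) / trials)
```

Differencing is only one safeguard; others include stating the hypothesis before looking at the data and being wary of testing many pairs of series until one correlates.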

submitted by /u/citizenofacceptance2

Test of distributions for interval data

Hi all!

I’m looking for something similar to a chi-squared test, but one that takes into account how far values have shifted. For example, using these three distributions, I’m looking for a test that would give a more extreme result when comparing distribution 3 with 1 than when comparing 2 with 1.

The context is comparing two different graders’ grade distributions to get some insight into whether they are likely to be grading similarly.
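Pearson’s chi-squared test treats the grade categories as unordered, so it cannot tell a one-step shift from a two-step shift. One distance that does account for ordering is the 1-D Earth Mover’s (Wasserstein-1) distance: the total grade mass that must be moved, times how far it moves. A minimal sketch, assuming equally spaced ordered grade bins and hypothetical counts; a p-value could then come from a permutation test on this distance. SciPy’s `scipy.stats.wasserstein_distance` computes the same quantity from values plus weights, if SciPy is available.

```python
def earth_movers_distance(counts_a, counts_b):
    """1-D Earth Mover's (Wasserstein-1) distance between two histograms
    over the same ordered, equally spaced categories, in units of bins."""
    total_a, total_b = sum(counts_a), sum(counts_b)
    cdf_a = cdf_b = 0.0
    distance = 0.0
    for ca, cb in zip(counts_a, counts_b):
        cdf_a += ca / total_a
        cdf_b += cb / total_b
        distance += abs(cdf_a - cdf_b)  # mass that must cross this bin boundary
    return distance


# Hypothetical grade counts over 7 ordered grade bins.
d1 = [10, 20, 40, 20, 10, 0, 0]
d2 = [0, 10, 20, 40, 20, 10, 0]  # d1 shifted up one bin
d3 = [0, 0, 10, 20, 40, 20, 10]  # d1 shifted up two bins
print(earth_movers_distance(d1, d2))  # about 1.0 (floating point)
print(earth_movers_distance(d1, d3))  # about 2.0 (floating point)
```

Because the distance grows with how far the grades move, comparing 3 with 1 gives a larger value than comparing 2 with 1, which is the behaviour described above.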

Any help is much appreciated!

submitted by /u/artifaxiom

Where does the computational load occur when running a logistic regression?

I’m trying to figure out where the computational intensity lies for a logistic regression (I’ll be doing an elastic net version later). For a linear regression fit by OLS, the key matrix is X’X: with 50 variables and 100 observations, X itself is 100×50, but the matrix you have to invert, X’X, is only 50×50, and the most cumbersome calculation is finding that inverse. I’m looking at MLE for logistic regression, and I can’t quite tell what the largest matrix is or what the most cumbersome calculation is.


Scrolling down to equation 11, would it be this? Or would it be equation 16? It looks like the statistical software doesn’t even need to compute derivatives or second derivatives, assuming it’s just using these formulas.
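A common way to fit logistic regression by maximum likelihood is Newton-Raphson, also known as iteratively reweighted least squares (IRLS); whether a given package uses exactly this is an assumption. The derivatives are known in closed form (gradient X’(y - μ), Hessian -X’WX with W = diag(μ(1 - μ))), so software evaluates those formulas rather than differentiating numerically. A minimal pure-Python sketch showing where the cost sits: each iteration builds the p×p matrix X’WX (about n·p² work, usually dominant when n is much larger than p) and solves one p×p system (about p³).

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting (O(p^3))."""
    p = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, p):
            f = m[r][col] / m[col][col]
            for c in range(col, p + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * p
    for r in range(p - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, p))
        x[r] = (m[r][p] - s) / m[r][r]
    return x


def fit_logistic(X, y, iters=25, tol=1e-10):
    """Newton-Raphson / IRLS for logistic regression. Rows of X include a leading 1."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p                      # X'(y - mu): O(n p)
        hess = [[0.0] * p for _ in range(p)]  # X' W X: O(n p^2)
        for xi, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, xi))
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    hess[j][k] += w * xi[j] * xi[k]
        step = solve(hess, grad)              # one p x p solve: O(p^3)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Hypothetical, non-separable toy data.
xs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.0, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1, 0, 1]
X = [[1.0, x] for x in xs]
beta = fit_logistic(X, ys)
print(beta)
```

So the largest matrix is still p×p (X’WX, analogous to X’X in OLS), but it has to be rebuilt and solved once per iteration. An elastic-net penalty changes the algorithm (coordinate descent is common), so this sketch covers only the unpenalized case.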

submitted by /u/problydroppingout

My first AB test… is this a valid approach?

EDIT: Can someone explain why I am being downvoted?

My experience with stats so far is limited, but I am trying to improve. I work at an online publisher, and we would like to A/B test content recommendation changes. My question is whether there is anything wrong or improvable in my setup. Here is an overview:

We currently pay for a content rec widget that has 5 panels in the sidebar and links to our own content. I don’t know the specifics, but it links to the current top content across the site. The hypothesis within the company is that choosing recommended content based on WHERE the user came from may yield better results: for instance, giving different results to users from FBK (Facebook), AOL, Direct, etc.

The plan is to take every existing and newly created article and send 50% of its traffic to the existing rec widget and 50% to a custom widget. The custom widget would recommend different content depending on where the user came from. We will have 1) one set of recommendations for paid FBK users, 2) another for AOL users, 3) another for organic FBK users, and 4) another for everyone else.
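One common way to implement a 50/50 split is to hash a stable user identifier, so a returning visitor always sees the same widget. This is a minimal sketch that assumes each visitor has a stable ID (for example a cookie value); the experiment name `"rec-widget-test"` is hypothetical, and randomizing by user rather than by pageview is a design choice, not something the plan above specifies.

```python
import hashlib


def assign_arm(user_id: str, experiment: str = "rec-widget-test") -> str:
    """Deterministically map a user to 'existing' or 'custom' (roughly 50/50).

    Hashing the experiment name together with the user ID keeps each user in
    the same arm across visits, and gives different experiments independent splits.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 100  # roughly uniform over 0..99
    return "custom" if bucket < 50 else "existing"


print(assign_arm("user-123"))
```

Keeping users in one arm avoids the same person seeing both widgets, which would blur the comparison between them.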

Every time a user clicks on the widget, we will track: 1) user source, 2) the content they clicked from, 3) the content they clicked to, 4) whether they clicked on the existing or the custom widget, and 5) the datetime of the click.

This is already not a traditional A/B test. We would basically be using each piece of content as a host for a set of A/B tests differentiated by user source. We would only compare click-through rate (CTR) between the existing and custom widgets within a source bucket, so the CTR of FBK users on the existing widget would NOT be mixed with the CTR of AOL users on the custom widget.
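A within-bucket CTR comparison could use a standard two-proportion z-test. A minimal sketch, assuming widget impressions are also logged for each arm (the tracking list above records clicks only, and CTR needs impressions as its denominator); the bucket names and counts are hypothetical.

```python
import math


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Two-sided pooled z-test for a difference in click-through rates (b minus a)."""
    p_a, p_b = clicks_a / views_a, clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical counts per source bucket: (clicks, impressions) for existing, then custom.
buckets = {
    "paid_fbk": ((120, 10_000), (150, 10_000)),
    "aol": ((300, 25_000), (310, 25_000)),
}
for source, ((c_old, v_old), (c_new, v_new)) in buckets.items():
    z, p = two_proportion_z_test(c_old, v_old, c_new, v_new)
    print(f"{source}: z = {z:.2f}, p = {p:.3f}")
```

With four source buckets tested separately, some adjustment for multiple comparisons (for example Bonferroni, comparing each p-value with 0.05 / 4) may be worth considering.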

One parameter we could tune is the type of content we recommend: we can recommend content based on views, comments, and how recently it was “hot”. My thought is to adjust these parameters only after the existing test has run for a while, and to treat each adjustment as a third or fourth test, and so on.

Although it varies by source, across all traffic sources we receive millions of visitors a week. We can run this as long as it takes to get enough data.
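“Enough data” can be fixed in advance with a power calculation; repeatedly checking results and stopping at the first significant one inflates the false-positive rate. A minimal sketch using the usual normal-approximation formula for two proportions; the 1.0% baseline CTR and 1.2% target are hypothetical.

```python
import math
from statistics import NormalDist


def visitors_per_arm(p_base, p_new, alpha=0.05, power=0.8):
    """Approximate sample size per arm for a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_base + p_new) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p_base * (1 - p_base) + p_new * (1 - p_new))
    ) ** 2
    return math.ceil(numerator / (p_new - p_base) ** 2)


# Hypothetical: 1.0% baseline CTR, hoping to detect a lift to 1.2%.
print(visitors_per_arm(0.010, 0.012))
```

The result applies per source bucket and per arm, so smaller buckets (for example paid FBK) will take longer to reach it than the overall traffic numbers suggest.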

As stated originally, what are your thoughts on the setup itself? Is there anything invalid happening here? Thanks

submitted by /u/joshman108

Testing how high 2 different groups score on 4 categories, which test?

I am doing some basic research with 2 different groups, A and B. Using a survey, I measured how each group scored on 4 aspects, each of which is a score calculated from multiple questions.

I want to see which group scored highest on each aspect. I also want to see whether these highest scores are statistically different. Or is it better to compare all 4 scores of group A with all 4 scores of group B, and if so, how?

What test do I need if the data are normally distributed, and what if they are not?
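For comparing the two groups on one aspect at a time, common choices are Welch’s t-test when the scores look roughly normal (in SciPy, `scipy.stats.ttest_ind(a, b, equal_var=False)`) and the Mann-Whitney U test when they do not (`scipy.stats.mannwhitneyu`). A permutation test on the difference in means works in either case and needs only the standard library. A minimal sketch with hypothetical scores, applying a Bonferroni adjustment because four aspects are tested.

```python
import random


def permutation_test(group_a, group_b, n_perm=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    Makes no normality assumption: it estimates how often randomly relabelling
    respondents gives a mean difference at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        a, b = pooled[:n_a], pooled[n_a:]
        if abs(sum(a) / len(a) - sum(b) / len(b)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


# Hypothetical aspect scores for a few respondents in each group (A, B).
scores = {
    "aspect_1": ([3.2, 4.1, 3.8, 4.4, 3.9], [2.9, 3.1, 3.5, 3.0, 3.3]),
    "aspect_2": ([2.5, 2.8, 3.0, 2.7, 2.9], [2.6, 3.1, 2.8, 2.9, 2.7]),
}
n_tests = 4  # four aspects compared in total
for aspect, (a, b) in scores.items():
    p = permutation_test(a, b)
    print(aspect, "p =", round(p, 4), "Bonferroni-adjusted p =", round(min(1.0, p * n_tests), 4))
```

Testing each aspect separately answers “on which aspects do the groups differ?”; comparing all four scores jointly would call for a multivariate method such as Hotelling’s T² or MANOVA instead.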

Thank you so much for helping this confused student.

submitted by /u/NightOwlAnna


Hi, as you know, one of the assumptions of linear regression is that there must be no correlation.

Sorry for my ignorance (I am at the beginning), but which kind of correlation do they refer to?

Let’s suppose I want to predict weight (Y) based on height (X)…

Let’s assume that before running the model I don’t know whether the relationship is linear or not. I just want to check whether there is autocorrelation, because if there is, I can’t fit a linear regression model.

Here the question:

Well, what is the autocorrelation supposed to be? Is it the correlation between the values of the predictor (X) and the values of Y?

Or is it the correlation between the values of X only? (The correlation between the values of the heights, so 1.75 m correlated with 1.80 m, and so on.)

Basically, as an assumption of the linear model, there must be no correlation, but correlation between what? X and Y? This is what I don’t understand.
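In the standard linear regression assumptions, “no autocorrelation” refers to the errors, estimated by the residuals: the residual for one observation should not predict the residual for the next. It is not about X versus Y (that relationship is what the regression estimates) or about the X values themselves, and it is checked after fitting, not before. It mainly matters when the rows have a natural order, such as time. A minimal sketch computing the Durbin-Watson statistic on residuals from a simple fit; the height/weight values are hypothetical, and `statsmodels.stats.stattools.durbin_watson` offers the same statistic if statsmodels is installed.

```python
def ols_fit(xs, ys):
    """Simple linear regression y = a + b*x by least squares; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def durbin_watson(residuals):
    """Durbin-Watson statistic on residuals in observation order.

    Near 2: no lag-1 autocorrelation; toward 0: positive; toward 4: negative.
    """
    num = sum((e1 - e0) ** 2 for e0, e1 in zip(residuals, residuals[1:]))
    return num / sum(e ** 2 for e in residuals)


# Hypothetical heights (m) and weights (kg), listed in the order they were collected.
heights = [1.60, 1.65, 1.70, 1.75, 1.80, 1.85]
weights = [55, 61, 64, 70, 74, 80]
a, b = ols_fit(heights, weights)
residuals = [w - (a + b * h) for h, w in zip(heights, weights)]
print("intercept:", a, "slope:", b, "Durbin-Watson:", durbin_watson(residuals))
```

For unrelated people measured once each in arbitrary order, residual autocorrelation is usually not the main concern; linearity, constant variance, and independence of observations matter more.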

submitted by /u/luchins

Evaluating monthly return outliers – Poisson?

Apologies for the basic question but I’m just not sure I’m using the right method:

Every month we check the proportion of sales returned by each team as a quality check. Returns are generally around 1%, but one month a team had 1.8% returned. Should I compute Poisson probabilities, using lambda as that team’s expected returns based on sales (i.e., sales × 1%), and compare them with the team’s actual returns? Or is there a more appropriate method?
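The Poisson idea amounts to asking how likely at least the observed number of returns would be if the true rate were 1%. Since each sale is either returned or not, the exact model is binomial; Poisson with λ = sales × 1% is a close approximation when the rate is small. A minimal sketch with hypothetical numbers (1,000 sales, 18 returns):

```python
import math


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return 1.0 - sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    return 1.0 - sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))


# Hypothetical month: a team made 1,000 sales and 18 were returned (1.8%).
sales, returns, base_rate = 1000, 18, 0.01
print("binomial:", binomial_upper_tail(returns, sales, base_rate))
print("poisson: ", poisson_upper_tail(returns, sales * base_rate))
```

When many teams are checked every month, a few will exceed the usual rate by chance alone; a p-chart (control chart for proportions) is a standard way to set flagging limits that account for each team’s sales volume.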

submitted by /u/Peter_Weller

Does anyone have any resource recommendations for self teaching statistics?

Hi! Apologies if there’s some really obvious resource linked here I’m overlooking. I also wasn’t sure whether to post this here or r/AskStatistics, so I understand if this has to be deleted!

My only stats background is a class I had to take as part of an undergraduate biology degree, and I’m now attempting to learn statistics for fun. I tried Khan Academy, but felt a little bored with the pace it’s taught at. I’m currently taking a course on Coursera and really enjoying it. But I still feel lost when it comes to resources, as I’m not sure which are reliable and which aren’t, or where to look. So I wondered whether anyone had any recommendations (textbooks, websites, etc.; anything that helps is gratefully taken on board!).

submitted by /u/adarium