[Q] Guys please help me with a simple data analysis (not homework or anything, just a personal research project)

Hello everyone. Here’s my problem. I have two data sets of time measurements. For each set, I would like to determine the likelihood that the measurements were taken of an event which takes x minutes and the likelihood if it’s an event of 2x minutes (that is, I want to know what is the more likely REAL duration of the measured event, given the data set, or for which of the two events the data fits better). I hope that my question is clear enough!

Here are the data sets: (all numbers are in minutes)

Set #1: (was the event 42 or 84 minutes?) 60, 60, 50, 45, 45, 41, 50, 33

Set #2: (was the event 60 or 120 minutes?) 65, 80, 75, 80, 64, 45, 67.5, 61.5

Keep in mind that I’m mathematically illiterate, so please give the most simple and straightforward answers that you possibly can. Thanks for your help!
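One way to make the comparison concrete is sketched below. It rests on an assumption that is not stated in the question: each measurement is the true duration plus symmetric, normally distributed error, with the spread estimated from the data itself. The function names are made up for illustration.

```python
import math

def gaussian_log_likelihood(data, mu):
    """Log-likelihood of the data under Normal(mu, sigma).

    Assumption: measurements scatter symmetrically around the true
    duration mu. sigma is not known, so it is set to its
    maximum-likelihood value given mu (the RMS deviation from mu).
    """
    n = len(data)
    sigma_sq = sum((x - mu) ** 2 for x in data) / n
    # With sigma fitted this way the log-likelihood simplifies to:
    return -0.5 * n * (math.log(2 * math.pi * sigma_sq) + 1)

def compare(data, short, long):
    """Return the better-fitting candidate and both log-likelihoods."""
    ll_short = gaussian_log_likelihood(data, short)
    ll_long = gaussian_log_likelihood(data, long)
    better = short if ll_short > ll_long else long
    return better, ll_short, ll_long

set_1 = [60, 60, 50, 45, 45, 41, 50, 33]
set_2 = [65, 80, 75, 80, 64, 45, 67.5, 61.5]

for name, data, short, long in [("Set #1", set_1, 42, 84),
                                ("Set #2", set_2, 60, 120)]:
    better, ll_short, ll_long = compare(data, short, long)
    print(f"{name}: log-likelihood {ll_short:.1f} for {short} min, "
          f"{ll_long:.1f} for {long} min -> better fit: {better} min")
```

A higher (less negative) log-likelihood means a better fit. Under these assumptions the shorter candidate wins for both sets, because every single measurement lies closer to it than to the doubled duration. If the measurements could be systematically too short (for example, timing only part of the event), this simple model would not capture that.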

submitted by /u/Fafner_88

Bought 98 Civic Sedan for a project car, any advice?

Hi, I’ve been watching car YouTube videos for a couple of years and I’ve always been interested in learning how a car works and gaining some knowledge. I have done a few basic things on my new Accord, such as brakes, oil, tranny fluid change etc.

Anyways, my friend and I bought this 1998 Civic EX with 279,000 km. We paid $125 each. The car runs great for a few miles, but eventually the temperature gauge reaches Hot and I have to let the car cool down. While it cools, I notice fluid bubbling near the edge of the plastic portion on top of the radiator. A couple of other things: the heat does not work, and neither does the power steering. For the overheating, what are some areas I could start checking first? (The coolant reservoir looked dirty, so I was thinking of flushing it.)

Thank You!

submitted by /u/orange-pk

[Q]: Why is the Chi-squared distribution used for testing for a goodness of fit?

I get that it works, but why?

For the probability distributions I’ve studied up to this point, I never had to ask this question because it was always obvious.

Take the normal distribution. Say my battery dies after a short period of just x months. The manufacturer says the battery life of all batteries in this range is normally distributed with a mean of μ months and a standard deviation of σ months. I know the normal distribution and I know how to use that knowledge to check how likely my battery’s lifespan is by consulting a z table. The connection is intrinsic and it makes sense.

Similarly, I know the binomial distribution and I can easily see that it’s my go to distribution for working out the probability of a specific outcome of multiple coin tosses. It’s also an intrinsic connection.

Now, back to chi-square. My first issue is with the abstract nature of the chi-square distribution itself. I struggle to see how it can be tangibly related to real world data.

In trying to understand this distribution, I manually built a chi-square distributed dataset in R to test if I understood the mechanics of the distribution. I understood the mechanics, but that didn’t help me understand how it related to the real world in the same way that the normal or binomial distributions do.

Also, I can clearly see that a chi-square goodness of fit test resolves a problem between expected and observed distributions, but I don’t get why the chi-square distribution reveals this. They seem like two separate concepts and I’m struggling to see the relationship.

Here’s what I understand to be a comparison of the chi-square dataset I built and a goodness of fit test.

In the chi-square dataset I built, I squared k number of randomly sampled values from a normal distribution and added them together. My degrees of freedom was equal to k.
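The construction described above can be sketched in a few lines. This is a Python version (the original was built in R), and it assumes the sampled values come from a standard normal distribution (mean 0, standard deviation 1), which is what the chi-square definition requires. The function name, seed and sample sizes are arbitrary choices for illustration.

```python
import random
from statistics import mean

def chi_square_draw(k, rng):
    """One chi-square(k) value: the sum of k squared standard-normal draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

rng = random.Random(0)  # fixed seed so the run is repeatable
k = 5                   # degrees of freedom
draws = [chi_square_draw(k, rng) for _ in range(20000)]

# The mean of a chi-square(k) distribution is k, so this should land near 5.
print(f"k = {k}, simulated mean = {mean(draws):.2f}")
```

A quick sanity check on such a dataset is that its average sits close to k and that no value is negative, since every term is a square.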

The goodness of fit test involves finding the error between the observations in each of k bins (e.g. days) and the expected value for each bin. To do this, for each bin (1 through k), I subtract the bin’s expected value from its observed value, square that and divide by the bin’s expected value. I then add all these together. My degrees of freedom are k-1.
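As a small worked example of that calculation, here is a sketch in Python. The counts are hypothetical (five weekdays with 100 events in total, expected to be spread evenly), and the function name is made up for illustration.

```python
def chi_square_statistic(observed, expected):
    """Sum over bins of (observed - expected)^2 / expected."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one value per bin")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts per weekday; both lists sum to 100.
observed = [18, 22, 25, 15, 20]
expected = [20, 20, 20, 20, 20]

statistic = chi_square_statistic(observed, expected)
degrees_of_freedom = len(observed) - 1  # number of bins minus one

print(f"statistic = {statistic:.2f}, degrees of freedom = {degrees_of_freedom}")
```

The resulting statistic would then be compared against a chi-square distribution with that many degrees of freedom.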

Here are the similarities I do see. We have a value for k in both scenarios. We also do some squaring for each value of k.

Here are the differences I understand. We’re subtracting expected from observed because we’re interested in the distribution of the error.

Here are the differences I don’t understand. Why are we squaring the difference between observed and expected and why are we dividing that by the expected? I get that squaring values is a reason for the connection, but I don’t understand why we’re doing it here. And why are our degrees of freedom K-1 in the test?

I feel like if I could answer these questions, I’d be able to see why the chi-square distribution works for me here.
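One way to probe these questions is a simulation, sketched below under an assumed setup that is not from the post: a fair six-sided die rolled 120 times, repeated many times. When the expected counts are correct, each bin's count varies with a variance of roughly its expected value, so (observed - expected) divided by the square root of expected behaves roughly like a standard normal draw, and squaring it gives the squared normals that make up a chi-square value. Because the counts must add up to the total number of rolls, only K-1 of them are free to vary, which is where the K-1 comes from.

```python
import random
from statistics import mean

def simulated_gof_statistic(n_rolls, k, rng):
    """Roll a fair k-sided die n_rolls times and return the
    goodness-of-fit statistic against the (true) uniform expectation."""
    expected = n_rolls / k
    counts = [0] * k
    for _ in range(n_rolls):
        counts[rng.randrange(k)] += 1
    return sum((c - expected) ** 2 / expected for c in counts)

rng = random.Random(1)  # fixed seed so the run is repeatable
k = 6
stats = [simulated_gof_statistic(120, k, rng) for _ in range(5000)]

# A chi-square distribution with k-1 degrees of freedom has mean k-1,
# so the average statistic should land near 5 rather than 6.
print(f"bins = {k}, average statistic = {mean(stats):.2f}")
```

If the degrees of freedom were K instead of K-1, the average would sit near 6 here. Seeing it settle near 5 is an empirical hint, not a proof, that one degree of freedom is lost to the fixed total.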


submitted by /u/temujin64

[Question] Which test is best to use for comparison of two groups of variables?

Hey r/statistics!

I know this sub is not about people asking for help on their homework but I hope my problem is “advanced” enough for you to be able to help. I’m not a native English speaker so some terms I use might not be technically correct. I have some basic understanding of statistics from a university course I attended. I’m expected to use the “SPSS” software for my analysis.

I’m working on my Bachelor’s thesis and looking to compare two sets of variables. Very basically, I’m working with two groups of patients (different patients!), each of which has been administered a certain amount of radiation to specific areas. Each area is administered a certain radiation dose (unit: Gy), expressed by several numerical values (e.g. 95% of the organ gets 50Gy, 50% of the organ gets 45Gy, 5% of the organ gets 35Gy). In addition, I’ll evaluate which percentage of a given organ’s volume is administered a certain threshold dose. For example, 50% of the organ’s volume might be above that threshold.

After acquiring that data for both groups, I would like to compare said groups in hopes of finding significant differences between them. However, I’m unsure about which kind of test would be most appropriate. I have looked at research papers that cover this type of problem and have found some using the “t-test” (the one I was thinking about using) and others using a two-sided “Wilcoxon Signed-Rank Test”.

Unfortunately I’m not experienced enough to know if one, both, or neither of the tests would work for what I intend to do. I’m also not sure of the exact differences between the two tests. Lastly, I don’t know the significance of having a one-sided vs. a two-sided test. Since I’m looking to evaluate significant differences between the two groups I’m assuming the test would have to be two-sided(?). Reading up on both tests has not been very helpful due to my lack of understanding and the language barrier, but I will continue trying.

If anyone could help point me in the right direction I would be grateful; I’m happy to provide further details.

edit: I am trying to see if there are significant differences in exposure between the two groups. To go into some further detail, each patient group has completed radiotherapy and this therapy was planned using two different planning systems (Software). I’m trying to evaluate if there is a difference in how much exposure the systems can administer to the target area and the organs at risk (which should get as little exposure as possible).
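To illustrate what the two families of tests actually compute, here is a minimal Python sketch (not SPSS) with hypothetical dose values. One caveat it builds on: the Wilcoxon signed-rank test is meant for paired measurements, so for two groups of different patients the rank-based counterpart is the Wilcoxon rank-sum test, also known as the Mann-Whitney U test. The sketch only computes the test statistics. P-values are left to the statistics software, and the function and variable names are made up for illustration.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two independent samples.

    Compares the group means, scaled by their combined standard error.
    Does not assume the two groups have equal variance.
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def mann_whitney_u(a, b):
    """U statistic for sample a: the number of (a, b) pairs in which
    the value from a is larger, with ties counted as one half.
    Only the ordering of the values matters, not their size."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in a for y in b)

# Hypothetical mean organ doses in Gy, one value per patient.
system_a = [44.1, 45.3, 43.8, 46.0, 44.7]
system_b = [42.9, 43.5, 44.0, 42.2, 43.1]

print(f"Welch t = {welch_t(system_a, system_b):.2f}")
print(f"Mann-Whitney U = {mann_whitney_u(system_a, system_b):.1f} "
      f"out of {len(system_a) * len(system_b)} pairs")
```

The t statistic works on the means and is sensitive to outliers and strong skew, while U depends only on ranks. A two-sided test asks whether the groups differ in either direction, which matches a question of the form "is there a difference between the two planning systems".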

submitted by /u/the_simonius