Statistical simulation and inference in the browser

Hi! I’m developing a free, open-source app, StatSim, for statistical simulations, modeling, and Bayesian inference. It includes more than 20 probability distributions, several inference algorithms (Markov chain Monte Carlo, Hamiltonian MC, rejection sampling, etc.), and various charts for analyzing results. It’s 100% free, runs entirely in the browser, and doesn’t require registration. You can experiment with probabilistic models just to learn, or use your data to find unknown parameters of your models. Comment here if you need help or find a bug. Feel free to share your models. Feedback is very welcome!

Some examples:
– Accumulator model
– Pi estimation
– Autoregressive model

You can also download Windows and Linux binaries for offline usage: (~50MB)

News and announcements:


submitted by /u/zemlyansky

Here in Europe, the statistics discipline remains rather unpopular compared to the traditional fields and CS. Why do you think this is the case?

I’m a bit amazed to see how low a statistics undergrad degree ranks in students’ preferences, compared to how excellent the job prospects are and how versatile a stats degree can be. Every European country I’ve been to is no exception. The most popular STEM choices continue to be engineering, CS, biology, business studies, and of course med school. Physics and math also rank higher in preference.

Why do you think that this is happening?

submitted by /u/Sorokose

T-tests between 2 different distributions

I am trying to test for differences in nanoparticle size. I have a couple questions ranging from simpler to harder.

First of all, nanoparticle size distributions are “log-normal”. This is a fact from theory.

Between 2 nanoparticle solutions if I get the sampled distribution (which is log-normal) how do I test for differences in means?

As far as I know, the t-test does not apply to log-normal data. Do I do a 2-sample t-test in “log space,” using mean(log size) and SD(log size) rather than mean(size) and SD(size)?

Also, what if I am comparing a unimodal normal distribution to a bimodal or even trimodal distribution? How do I test for differences in means between these?

Sometimes the distributions are also so funky that I don’t know how to compare their means.

The t-test, as far as I know, assumes normality, so how can I compare means when the data are not normal?

Do I need to do my own “manual” hypothesis test and use characteristic functions or something to come up with the distribution of X-Y and see if this distribution contains 0 in a significant area?
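For the log-normal case specifically, the log-space t-test described in the question is standard: if sizes are log-normal, their logs are exactly normal, and the test then compares geometric means. A sketch with simulated data (the size parameters here are made up for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated log-normal particle sizes (nm); parameters are invented for illustration
a = rng.lognormal(mean=np.log(50), sigma=0.3, size=200)
b = rng.lognormal(mean=np.log(55), sigma=0.3, size=200)

# Welch two-sample t-test on log sizes: normality holds exactly for log-normal data,
# and the null hypothesis is equality of the geometric means
t_stat, p_value = stats.ttest_ind(np.log(a), np.log(b), equal_var=False)
```

For the multimodal cases, a rank-based test such as Mann–Whitney (`scipy.stats.mannwhitneyu`) drops the normality assumption, though it compares distributions rather than means specifically.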

submitted by /u/ice_shadow

Is there some metric (kinda like variance) that satisfies my desires?

Let’s say you have two different sets of four points. Both sets have a mean of (0,0) and the same total variance:

Set A: (1,0), (1,0), (-1,0), (-1,0)

Set B: (1,0), (0,1), (-1,0), (0, -1)

Is there some metric, kind of like variance, that gives a higher value for the second set? A metric that measures how spread out the values are across multiple dimensions?
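One candidate with exactly this property (my suggestion, not from the post) is the generalized variance: the determinant of the covariance matrix. It is zero when all the spread lies on a single line (Set A) and positive when the spread fills multiple dimensions (Set B), even though both sets have the same trace (total variance). A quick check:

```python
import numpy as np

A = np.array([[1, 0], [1, 0], [-1, 0], [-1, 0]], dtype=float)
B = np.array([[1, 0], [0, 1], [-1, 0], [0, -1]], dtype=float)

# Generalized variance: determinant of the covariance matrix.
# Zero when the points are collinear, positive when spread covers multiple dimensions.
gv_A = np.linalg.det(np.cov(A, rowvar=False))
gv_B = np.linalg.det(np.cov(B, rowvar=False))
```

The trace of the covariance matrix (ordinary total variance) is identical for the two sets, which is why variance alone cannot distinguish them; the determinant can.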


submitted by /u/FireBoop

Looking for additional ways to analyze my data

For example, I am currently running a One-Way ANOVA on a very large data set, and I’m looking for some other ways to analyze it. Below is some necessary background information.

The data come from a test station that gathers 32 different parameters from each of the 16 instruments I am testing. 8 of the instruments will have a baseline configuration; for the other 8, I will change one parameter and gather data on them. So in the end I’ll have 8 instruments that are identical to each other (baseline) and another 8 that have one piece of their configuration changed and are likewise identical to each other.

So, for example, I run a One-Way ANOVA on the data, and the two groups I am testing against each other are made up of:

  1. Baseline instruments
  2. Configuration-changed instruments

Are there additional methods for analyzing the data?
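Worth noting: with exactly two groups, a one-way ANOVA is mathematically equivalent to a pooled two-sample t-test (F = t²), so the ANOVA adds nothing beyond the t-test here. A quick check with simulated data (the numbers are invented for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical values of one parameter for 8 baseline and 8 modified instruments
baseline = rng.normal(10.0, 0.5, size=8)
modified = rng.normal(10.4, 0.5, size=8)

# With two groups, one-way ANOVA reduces to the pooled two-sample t-test: F = t^2
F, p_anova = stats.f_oneway(baseline, modified)
t, p_ttest = stats.ttest_ind(baseline, modified, equal_var=True)
```

Since 32 parameters are recorded per instrument, running 32 separate tests also raises multiple-comparison issues; a multivariate method (e.g. Hotelling’s T² or MANOVA) or a p-value correction may be worth looking into.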

submitted by /u/SchadenfreudeIstGut

Fortnite statistics.

I consider myself a recreational statistician. I recently bought a gaming PC, so I downloaded Fortnite. For those of you who don’t know, Fortnite is a battle royale game in which 100 players are dropped on a map. They all kill each other, and the last man standing wins. While the game was not for me (I couldn’t kill even one person), it gave rise to an extremely interesting statistical question. Given that the players are all equally good, AND you do not factor in that if someone kills the people around him he will have to travel far to find another person to kill, what are the odds that the last player (the winner) killed any number N of people? Give it a try; it is much harder than it looks.

Edit: I can ask it in a simpler way. What’s the chance of flipping 100 coins and getting n heads, assuming the chance of tails is 99/100 on the first flip, 98/100 on the second, 97/100 on the third, and so on? Let’s start by solving it as if the chances were exactly even. You would then have to figure out the number of possibilities that arrive at any given result. For example, if we were playing with 4 coins, the number of ways to get exactly 1 heads is 4 (HTTT, THTT, TTHT, TTTH) out of 16 total possibilities (2^4). The problem is that if we make the chances successively different, each possibility contributes differently to the end result.
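The coin version with unequal flip probabilities is known as a Poisson binomial distribution, and its exact pmf can be built with a short dynamic program rather than enumerating all 2^100 outcomes. A sketch using the poster’s probabilities (heads probability k/100 on flip k, i.e. tails 99/100, 98/100, …):

```python
def poisson_binomial_pmf(ps):
    # pmf[n] = probability of exactly n heads, given per-flip head probabilities ps
    pmf = [1.0]
    for p in ps:
        new = [0.0] * (len(pmf) + 1)
        for n, prob in enumerate(pmf):
            new[n] += prob * (1 - p)   # this flip lands tails
            new[n + 1] += prob * p     # this flip lands heads
        pmf = new
    return pmf

# Heads probability k/100 on the k-th flip, as in the edit above
pmf = poisson_binomial_pmf([k / 100 for k in range(1, 101)])
```

Whether this coin model exactly matches the kill-count question is the poster’s framing; the DP itself just computes the distribution of the number of heads.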

submitted by /u/superloser318

Would love to create a calibration plot from my nomogram

I used MedCalc to generate a multivariate logistic regression model of something and ROC analysis of bootstrapped samples. I would love to be able to generate a calibration plot of the nomogram but MedCalc doesn’t have that option. I wonder if there is a reference people can provide or an approach to generate the plot based on the data I have.
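MedCalc may not plot it, but if the predicted probabilities and observed outcomes can be exported, a calibration plot is just binned mean predictions against observed event rates. A minimal sketch (the function name and binning scheme are my own, not a MedCalc feature):

```python
import numpy as np

def calibration_points(y_true, y_prob, n_bins=10):
    # Bin predicted probabilities; compare mean prediction to observed event rate per bin
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(y_prob, bins) - 1, 0, n_bins - 1)
    mean_pred, obs_rate = [], []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            mean_pred.append(y_prob[mask].mean())
            obs_rate.append(y_true[mask].mean())
    return np.array(mean_pred), np.array(obs_rate)

# Plotting mean_pred vs. obs_rate against the 45-degree diagonal gives the calibration plot
```

A well-calibrated model’s points lie close to the diagonal; systematic deviation above or below it indicates under- or over-prediction in that probability range.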

submitted by /u/jlphillipsmd

Modelling Strategies for Longitudinal data with events causing “seasonal” change


So I am trying to model longitudinal data for individuals. Over time I measure a biomarker concentration. This biomarker concentration is rather stable and does not seem to change for each individual. However, I also recorded when these people were vaccinated: the biomarker spikes and then decreases back to normal over time. Each individual can have multiple vaccines over time, but the biomarker’s reaction appears to be quite similar across time and individuals. What models would be suitable for such data? I really have a hard time finding appropriate modeling strategies.
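One way to make the problem concrete (my suggestion, not from the post) is a mean structure with a per-individual baseline plus a shared spike-and-decay response after each vaccination, which could then sit inside a nonlinear mixed-effects model. A sketch with an exponential decay and invented parameter values:

```python
import numpy as np

def biomarker_mean(t, baseline, vax_times, amplitude, tau):
    # Individual baseline plus a spike after each vaccination that decays
    # exponentially back toward baseline with time constant tau.
    y = np.full_like(t, baseline, dtype=float)
    for tv in vax_times:
        after = t >= tv
        y[after] += amplitude * np.exp(-(t[after] - tv) / tau)
    return y

t = np.linspace(0, 100, 101)
# Hypothetical individual: baseline 5.0, vaccinated at t = 20 and t = 60
y = biomarker_mean(t, baseline=5.0, vax_times=[20, 60], amplitude=3.0, tau=10.0)
```

With baseline as a random effect per individual and amplitude/tau shared (the post says the response looks similar across individuals), this is the kind of structure nonlinear mixed-effects tools (e.g. `nlme` in R) are built for.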

submitted by /u/Metatronx

Sample size when adding means of two groups

If I take two populations, both with known sample sizes, means, and standard deviations, I can add the means together and propagate the standard deviation forward, but what is the sample size of this new mean? Statistics-wise, would it be fair to use the smaller sample size of the two groups (to be conservative when doing a summary t-test), or is there some other way to determine that sample size?
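One standard answer (my suggestion, not from the post) is the Welch–Satterthwaite approximation, which assigns an effective degrees of freedom to the combined standard error rather than picking one of the two sample sizes:

```python
def welch_satterthwaite_df(s1, n1, s2, n2):
    # Effective degrees of freedom for the combined standard error
    # sqrt(s1^2/n1 + s2^2/n2) of a sum (or difference) of two means.
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Hypothetical summary statistics, purely for illustration
df = welch_satterthwaite_df(s1=2.0, n1=10, s2=3.0, n2=25)
```

The result always lies between min(n1, n2) - 1 and n1 + n2 - 2, which confirms that using the smaller sample size is indeed a conservative choice.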

Edit: worded my question better

submitted by /u/HeataFajita

Statistics noob asks for help solving a problem with calculating a confidence interval. I got this task from a group leader and don’t know what to do now. Not even sure if I understand his question…

Let me say first that I have already done quite a lot of research, and I think I basically know what a confidence interval is about. I just don’t get how to apply it to my problem.

The question is: “You should state by the width of the confidence interval, why the sample size leads to a satisfactorily accurate estimate of prevalence.”

Some background information: a screening will be done for a disease whose prevalence is currently unknown. After identifying a certain number of cases (= sample size), there will be further tests.

Problem 1: Can I choose the sample size on my own? Let’s say it’s 10 or 20; what could I do with that information to calculate the confidence interval? What would be a good sample size?

Problem 2: The prevalence is unknown. It could be 10%, 20%, or even 50%. How does this affect the confidence interval?
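For planning purposes, a common trick (my sketch; this uses the Wald normal approximation, which is rough for small n, where Wilson or exact intervals are better) is to note that the CI half-width for an estimated prevalence p is largest at p = 0.5, so that value gives a conservative width for each candidate sample size:

```python
import math

def wald_ci_halfwidth(p, n, z=1.96):
    # Approximate 95% CI half-width for an estimated prevalence p from n subjects
    return z * math.sqrt(p * (1 - p) / n)

# p = 0.5 maximizes p*(1-p), so this tabulates a worst-case width per sample size
for n in (10, 20, 100, 400):
    print(n, round(wald_ci_halfwidth(0.5, n), 3))
```

This addresses both problems at once: pick the n whose worst-case width you consider “satisfactorily accurate,” knowing that any true prevalence other than 50% only makes the interval narrower.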

I would be so thankful if anybody could help me out. Any kind of information is much appreciated 🙂

submitted by /u/S_leGrand