[Q] Power analysis question for percent reduction?

How would I find the required sample size for alpha = 0.05, power = 0.80 for the following situation? We are interested in the percent reduction of arterial volume before and after treatment in the same individual. There will be a placebo group and a treatment group. From previous literature, the placebo group should have 0% change and the treatment group is expected to see a 2% change with a group standard deviation of 0.5%. We are only interested in the percent reduction because different individuals will have different artery sizes so absolute differences wouldn’t make sense to compare.

When I put those numbers into Stata’s power analysis, it tells me only 3 subjects are needed. This both makes sense and makes me wary: a 2% change is essentially 4 SDs away from 0%, so it’s quite a large effect, but 3 subjects still seems very low. Can anyone shed some light on this?
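For what it’s worth, the small n can be reproduced outside Stata. The sketch below (Python/SciPy, not Stata’s internal code) does the standard two-sided, two-sample t-test power calculation via the noncentral t distribution; with a standardized effect size of d = (2 − 0)/0.5 = 4, the required n per group really does come out tiny:

```python
from scipy import stats

def power_two_sample_t(n, d, alpha=0.05):
    """Power of a two-sided, two-sample t-test with n subjects per group
    and standardized effect size d (mean difference / common SD)."""
    df = 2 * n - 2
    nc = d * (n / 2) ** 0.5                 # noncentrality parameter
    crit = stats.t.ppf(1 - alpha / 2, df)   # two-sided critical value
    return 1 - stats.nct.cdf(crit, df, nc) + stats.nct.cdf(-crit, df, nc)

d = (2.0 - 0.0) / 0.5   # 2% change vs 0% change, SD 0.5% -> d = 4
n = 2                   # need at least 2 per group for the test to exist
while power_two_sample_t(n, d) < 0.80:
    n += 1
# n is now the smallest per-group sample size reaching 80% power
```

So the answer of 3 is a direct consequence of assuming an effect four standard deviations wide; the thing to be skeptical of is that assumed SD of 0.5%, not the software.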

submitted by /u/ButtholePlungerz

In an infinite amount of time, can a die be rolled to show the number 6 one thousand times in a row?

We are having a debate at work.

One member of the office says that because rolling 1000 sixes in a row is so unlikely, its probability can be treated as 0. Therefore, even in an infinite timeframe, 1000 sixes will never be thrown in a row. He argues this follows from the law of large numbers etc…

Our argument is that because the timeframe is infinite, no matter how small the probability, everything with positive probability will happen: not only will 1000 sixes be thrown in a row, they will be thrown in a row an infinite number of times.
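The second side is correct, and the standard tool is the second Borel–Cantelli lemma: chop the infinite roll sequence into disjoint blocks of 1000 rolls; each block is all sixes with probability p = (1/6)^1000 > 0, the blocks are independent, so with probability 1 infinitely many blocks succeed. The probability of never seeing a success in the first N blocks is (1 − p)^N, which tends to 0. A tiny sketch with run length 3 instead of 1000 (so the numbers are visible) shows the decay:

```python
# P(no block of k consecutive rolls is all sixes, after N independent
# blocks) = (1 - p)^N with p = (1/6)**k. It tends to 0 as N grows, so a
# run of k sixes eventually occurs with probability 1 (Borel-Cantelli).
k = 3                      # run length 3 here; the argument is identical for k = 1000
p = (1.0 / 6.0) ** k       # chance one disjoint block is all sixes
never = [(1 - p) ** N for N in (10**2, 10**3, 10**4)]   # shrinks toward 0
```

For k = 1000 the per-block probability p is astronomically small, but it is still strictly positive, so (1 − p)^N still goes to 0; "so unlikely it can be treated as 0" is exactly the step the law of large numbers does not license.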

Please can someone show me a mathematical concept or proof that settles the answer in either direction.

Thank you

submitted by /u/Cubicks

Fun Topological Puzzles?

I’m designing a Dungeons and Dragons campaign which is going to take place in a surrealist fantasy world. Being a math major, I’d really like to incorporate some mathematical puzzles and riddles into the campaign. In particular, I feel like topology would be a really good area to borrow from, as one could quite easily design a room or an area with odd topological properties to add to the surrealist feel of the world, without requiring explicit knowledge of topology to understand. (Important, as my players are not mathematicians or math majors.) Does anyone know of any interesting topological puzzles with semi-intuitive solutions that I might be able to adapt? Additionally, if you happen to have an idea from another area of mathematics that might be applicable or fun, feel free to share it as well.

submitted by /u/VFB1210

[Q] What is the difference between Generalised Cross Validation and K-Fold Cross Validation?

Hey folks,

I just implemented 5-fold cross-validation to determine the optimal penalty value for a ridge regression (code below). I am using the lm.ridge function from the MASS library.

I double-checked the results of my own 5-fold cross-validation function against the generalised cross-validation built into lm.ridge. To my surprise, the optimal penalty values are quite far from each other (a difference of about 4.6).

It got me curious: why are the results so far from each other? Can the difference in the optimal lambda value be explained by the difference between the two methods?

# Ridge regression
set.seed(3)
library(MASS)

grid = 10^seq(10, -2, length = 100)   # grid of lambda/penalty values
ridge_res = rep(NA, 100)

# adapt lm cross-validation for the ridge grid;
# lambda is now passed in explicitly instead of being read from the global j
cross_val_ridge = function(data, k, lambda) {
  set.seed(1)   # student number as seed
  cv_index = sample(rep(1:k, length = nrow(data)), nrow(data))
  cv_test_e = rep(NA, k)   # vector to store the k test-fold errors
  for (i in 1:k) {
    cv_train = data[cv_index != i, ]
    cv_test  = data[cv_index == i, ]
    cv_lm = lm.ridge(MEDV ~ . , data = cv_train, lambda = lambda)
    # compute predictions by hand: intercept + X %*% slope coefficients
    pred.ridge = coef(cv_lm)[1] + as.matrix(cv_test[, 1:13]) %*% coef(cv_lm)[-1]
    cv_test_e[i] = mean((cv_test$MEDV - pred.ridge)^2)
  }
  mean(cv_test_e)
}

for (j in 1:100) {
  ridge_res[j] = cross_val_ridge(train, k = 5, lambda = grid[j])
}
which.min(ridge_res)
grid[76]   # optimal lambda value as per own 5-fold CV = 8.111308

# double check using generalised cross-validation
ridge = lm.ridge(MEDV ~ . , data = train, lambda = grid)
which.min(ridge$GCV)
grid[79]   # optimal lambda value as per GCV = 3.511192
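On the conceptual question: GCV is not k-fold CV for some k. It is a rotation-invariant approximation to leave-one-out CV that replaces each observation's leverage H_ii with the average leverage trace(H)/n, so a gap of a few grid steps between GCV and 5-fold CV is normal (5-fold also depends on the random fold assignment). A minimal NumPy sketch on synthetic data (hypothetical, not the Boston data, and assuming a no-intercept ridge fit for simplicity) shows the two criteria side by side:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 50, 5, 1.0
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + rng.standard_normal(n)

# ridge hat matrix: y_hat = H @ y
H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
resid = y - H @ y
h = np.diag(H)                                         # per-point leverages

loocv = np.mean((resid / (1 - h)) ** 2)                # exact leave-one-out CV
gcv = np.mean((resid / (1 - np.trace(H) / n)) ** 2)    # GCV: average leverage
```

The two numbers agree closely when the leverages are roughly uniform and drift apart when some points are high-leverage; with only 100 lambda values on a log grid, even small criterion differences can move the argmin by a few grid positions.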

submitted by /u/deniz_sen

[Q] What sample size is appropriate to detect a small failure rate?

I have a failure rate of 0.6% on a process month to month, with a month-to-month standard deviation of 0.2%. If we make a change to the process that we expect will reduce the failure rate to 0.3%, how many parts need to be processed to have 95% confidence that we succeeded?
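One standard way to frame this is a one-sided two-sample test of proportions (old rate 0.6% vs. new rate 0.3%) at alpha = 0.05. Note the question fixes only the confidence level; a power target is also needed, and 80% is assumed below. This is a sketch of the classic normal-approximation formula, not a claim about the asker's exact process:

```python
import math
from scipy import stats

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a one-sided
    two-sample test of proportions (p1 = current rate, p2 = improved rate)."""
    za = stats.norm.ppf(1 - alpha)   # one-sided: we only care about a reduction
    zb = stats.norm.ppf(power)
    pbar = (p1 + p2) / 2             # pooled proportion under the null
    num = (za * math.sqrt(2 * pbar * (1 - pbar))
           + zb * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p1 - p2) ** 2)

n = n_per_group(0.006, 0.003)   # parts needed in each condition
```

On these numbers it lands a little above six thousand parts per condition; halving an already-rare failure rate takes a lot of parts because almost every observation is a non-failure and carries little information.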

submitted by /u/I_ate_it_all