So I know the best way to gain experience is through internships, but most places only take graduate students, so my next step was to look for research opportunities. I'm an incoming second year, and I looked up the faculty at my school and their research interests. I wanted to email them asking if they needed help with their research, but I haven't heard of, let alone understood, any of their research topics. Because of this, I'm assuming I won't be able to get a research opportunity. What else can I do to gain experience?
I have two questions.
First, I'm looking for ways to engage more deeply with data science over the next couple of years. I'm taking undergraduate-level math in preparation for grad-level work. I have some coding skills but have never worked in tech. I'm wondering if there are any 6-8 week data science boot camps that would let me get flexible, short-term jobs in or related to the field – averaging perhaps 8-12 hours a week, something I could fit around my school schedule and main job. My hope is to earn some money while gaining experience and contacts in the industry before starting grad school, since I have to work anyway to support myself (I'm a post-bacc).
Even better would be a program that would make me a suitable candidate to gain research experience in preparation for a PhD, and ideas for where to seek out such opportunities.
Second, I've heard that math video games like DragonBox are great ways for younger kids to learn math, up through algebra. Is there anything like that for teaching statistics? I'd love it if there were some kind of stats/data-science educational video game I could play on my phone when I have 10 minutes to spare. Even math-tutoring software built specifically to teach stats via a smartphone would be a great option. I use my phone constantly anyway and always have it with me, so harnessing that time to learn rather than browse would be useful.
Thanks for any suggestions!
I have very little background in statistics (mainly intro courses and some basic statistics-for-life-sciences courses), and I'm working on a project right now where we have implemented a program in a health organization and are looking at counts of hospitalizations before and after implementation of this initiative. We are measuring hospitalizations only, through a state-run database that health organizations all around our state are required to report to any time a patient is hospitalized, so this data should be about as accurate as we can get, and there are no specific time points. I have parsed this data going back about two years, but our initiative is only about a year old, so we are hoping to look back one year from each patient's enrollment date and forward one year. The goal is to examine the effect the initiative has had on reducing hospitalizations over the past couple of years, and I am looking for some advice. I wanted to consult people more knowledgeable in this area before moving forward with our analysis, to make sure I am using the appropriate statistical tests for our purposes.
From what I understand, this is a pre-test/post-test study design, and since there are paired observations before and after for every patient enrolled in the program, would I be correct in assuming I could use a paired-sample t-test, or would an ANOVA be a better analysis of this data? My confusion is that, from what I remember from undergrad, you couldn't use paired t-tests on discrete variables, since continuity was one of the assumptions that had to be met. Would this group also, in this case, function as its own control? From some independent reading it seems the t-test is robust against assumption violations if the sample size is large enough (we have an n of about 300), but again I want some outside input before I move forward.
Thank you in advance for your help!
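In case it helps to see the mechanics, here is a minimal Python sketch of both a paired t-test and a nonparametric alternative on made-up pre/post counts (all numbers are hypothetical, not the project's actual data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical data: hospitalization counts for ~300 patients,
# one year before and one year after enrollment (illustrative only).
n = 300
pre = rng.poisson(2.0, size=n)
post = rng.poisson(1.6, size=n)

# Paired t-test on the per-patient differences; with n around 300,
# the CLT makes this reasonable even though the raw counts are discrete.
t_stat, p_t = stats.ttest_rel(pre, post)

# A nonparametric alternative that avoids the normality assumption:
w_stat, p_w = stats.wilcoxon(pre, post)

print(f"paired t-test: t={t_stat:.2f}, p={p_t:.4f}")
print(f"Wilcoxon:      W={w_stat:.0f}, p={p_w:.4f}")
```

The Wilcoxon signed-rank test is a common fallback when the per-patient differences look badly skewed; for count outcomes, a Poisson or negative-binomial regression with a pre/post indicator is another route worth reading about.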
So I've been working on some distribution-of-the-sample-mean problems with some students and was curious about solving for the standard error of the mean. Normally my students are given the formula standard deviation/sqrt(n) (using either the population or sample standard deviation). But could you also compute it by taking, let's say, 100 samples of size 50, computing each sample's mean, and then computing the standard deviation of the new data set made up of all those means? Yes, it's a longer method and maybe unrealistic, but I'm wondering if it's theoretically sound. Thanks!
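For what it's worth, that idea is theoretically sound: the standard deviation of many sample means directly estimates σ/√n. A quick simulation sketch (made-up population, arbitrary seed) shows the two routes agreeing:

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up population with a known spread; sigma/sqrt(n) gives the
# textbook standard error of the mean for samples of size n.
population = rng.normal(loc=100, scale=15, size=100_000)
n = 50
formula_sem = population.std() / np.sqrt(n)

# Empirical route: draw many samples of size n, record each sample
# mean, then take the standard deviation of that set of means.
means = [rng.choice(population, size=n).mean() for _ in range(2000)]
empirical_sem = np.std(means)

print(formula_sem, empirical_sem)  # the two estimates should be close
```

With only 100 repetitions (as in the question) the empirical estimate is noisier but the logic is the same; more repetitions tighten it.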
Do you know of any statisticians with an MS in CS? I'm starting an MS in CS program in the spring. I feel like I've always had very dynamic interests, but my main goal has always been a career where I can use equal parts CS and stats. For the last few years I've wanted to become a data scientist, but I know positions are competitive (for real ML-type positions), so I picked an MS in CS because all of my professors say it will be much more versatile for me in the future if things go south with CS, DS, or stats. I have a BS in Math and a minor in CS, and after my MS I will have an MS in CS with a bunch of stats classes as my electives (Bayesian data analysis, data mining, machine learning, theory of statistics 1 & 2, design of experiments, and stochastic processes), and hopefully my thesis will be about ML, NLP, or neural networks with a bunch of stats. Will this be enough to have a career as a biostatistician, statistician, data scientist, or something else with heavy CS and math applications?
I also want to note, in case I hadn't made it clear, that I like stats and CS very much equally; whenever I tried to decide between stats, biostats, and CS, I always wanted to take classes in the others, so I feel like CS will give me the most freedom for an ever-changing career spanning both stats and CS. Am I taking the wrong path?
I'm rather new* to the field of statistics and data and all that it entails. I came across something called survivorship bias a couple of days ago, and I was wondering if someone could give an ELI5 example of how it can creep in when analyzing data. What I mean is, an example where the surviving subset of the data showed better or clearer results than the full data actually collected (I'm guessing the survivorship-biased data would be a subset extracted from the already-collected data). Any study or explanation would be appreciated.
If I’m posting in the wrong sub, please let me know.
*: I had never studied or worked on anything data-related until I started this Monitoring and Impact Evaluation job as a technical assistant. Even after a year of doing what is mostly regarded as data collection and data-accuracy work, I'm only now moving into data analysis and impact evaluation, which I'm also learning on my own through online courses, so excuse me if what I say doesn't make much sense.
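One classic, easy-to-simulate illustration (all numbers hypothetical) is fund returns: if failed funds silently drop out of the dataset, the average return of the survivors looks better than the true average. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

# 1000 hypothetical funds each earn one random annual return.
returns = rng.normal(loc=0.02, scale=0.10, size=1000)

# Funds that lose more than 10% "fail" and vanish from the records,
# so an analyst who only sees surviving funds works with this subset:
survivors = returns[returns > -0.10]

print(f"true average return:    {returns.mean():.3f}")
print(f"survivors-only average: {survivors.mean():.3f}")
# The survivors-only average is biased upward: the worst performers
# were removed from the data before the analysis ever started.
```

The famous WWII example is the same idea in reverse: analysts only saw bullet holes on bombers that made it home, so the unscathed areas on returning planes were exactly where the lost planes had been hit.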
I have data from a website where a specific advertising campaign happened a couple of years ago. What I want to do is estimate what the signups on that website would have looked like without that big campaign.
Specifically, I have the signups for every single day over the last 10 years, and there is one event that happened 2 years ago. The time series is not linear and doesn't have any fixed seasonality; the general trend, however, is downward. I don't have any control groups – the campaign was applied to everyone at once.
Things I tried to do:
- Ran a forecasting model (Prophet) fit to the data up to the campaign event, then used the model to predict the next 2 years to see what the number of signups would have been today. The problem is that the time series is also affected by smaller events (with far, far smaller impact than the big one) that happened after the big campaign, and the Prophet model doesn't take those into account.
- Tried CausalImpact, but since I didn't have any control group, I used other time series as the predictors, such as the number of visitors, the number of logins, etc. I got decent results with this, but I would like to experiment more and evaluate the CausalImpact prediction against another model.
Is there any intervention analysis I could do without control groups, but also taking into account the impact of events after the one we study?
I want to run the equivalent of a nested design, except my factors are all crossed – so it is like a repeated-measures ANOVA, but I have more than one factor repeated within another.
So factor B is repeated within factor A
factor C is repeated within factor B
I see this as a multilevel repeated-measures ANOVA. I have read about split-plot designs, but I'm not sure that's the right design, as I'm having trouble understanding it.
Can someone confirm or guide me to a better design?
Thank you very much
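One common way to handle fully crossed repeated-measures factors is a mixed-effects model with a random intercept per subject, rather than a classical split-plot ANOVA. A minimal sketch on made-up data (factor names, effect sizes, and counts are all hypothetical), using statsmodels:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)

# Long-format data: every subject measured under every combination of
# A, B, and C (fully crossed, repeated measures; numbers are made up).
rows = []
for i in range(20):
    subj_effect = rng.normal(0, 1)   # per-subject baseline shift
    for a in ("a1", "a2"):
        for b in ("b1", "b2"):
            for c in ("c1", "c2"):
                y = subj_effect + 2.0 * (a == "a2") + rng.normal(0, 0.5)
                rows.append({"subject": f"s{i}", "A": a, "B": b,
                             "C": c, "y": y})
df = pd.DataFrame(rows)

# Mixed-effects model: fixed effects for A, B, C and their interactions,
# plus a random intercept per subject for the repeated measurements.
model = smf.mixedlm("y ~ A * B * C", df, groups=df["subject"]).fit()
print(model.summary())
```

Mixed models handle crossed within-subject factors without the sphericity assumptions of classical repeated-measures ANOVA, and they extend naturally if you later add random slopes.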
Hi all – I plan on living and going to school in the NYC-metro area (New Jersey, specifically), and I'm looking to get an idea of the quality of the programs in the area and how they are viewed by employers. I also want to get an idea of the job market in general and how employers might view someone in my situation.
Specifically, I’m looking to get an MS in Statistics or Biostatistics. My goal is to get a PhD, so I’d like to know how job prospects are both with and without (in case the PhD plans fall through).
One big issue is that I'm a non-traditional student – my undergrad degree is about as far from a math or statistics degree as you could possibly get, so I'm also trying to gauge how much a non-traditional background will hurt me in the market. I do have the prereqs to get into a master's (Calc 1-3, Linear Algebra, Stats 1) and have taken a few additional courses for fun – physics, chem, differential equations (no STEM courses during my undergrad, so I dabbled afterwards). I have a 4.0 in these courses, but not from a great school. I will be taking real analysis in the fall and can afford another two courses before entering an MS program (any advice on what I should take?). These classes will be from a better school.
My programming skills are entirely self-taught, and I have some research experience that involved cleaning and processing data and running machine learning algorithms; however, I have not taken a "formal" CS class. I taught myself using CS50 on edx.org (which I'd recommend to anyone self-teaching – it's a great resource). I know a small bit of C++ and Matlab, but most of my experience is with R. I feel pretty comfortable with the concepts taught in a first-semester CS class.
I also have no biology background (with the exception of the research I was involved in, which involved genomic data).
That's where I'm at now, but I'd like to know where I'm going and how to get there. My love is with math and theory first and foremost, and I want to do research, so I know for certain I want a thesis-based MS, not a course-based one.
I’d like to know what you guys think of Rutgers New Brunswick and NJIT (New Jersey Institute of Technology). Also, what are other schools in the NJ/NYC area that I should look at?
Also, I noticed that Columbia graduates a huge number of students from their master’s programs in statistics. What’s up with that? Is it saturating the market at all? The school’s out of my price range regardless.
Now, for the job market – my target is pharmaceuticals, but I'm pretty sure I'd be happy anywhere I'd be involved in research. I'm looking for anyone familiar with the market for research-based jobs in the NJ/NYC area to give input on how tough it will be to find employment once I graduate with an MS.
Also, do you see any stigma against career changers in the field?