[Q] Controlling for seasonality in hypothesis testing?

I have a dataset with the following columns: month_of_year, is_cloudy, temperature.

I’m looking to see whether there’s a significant difference in temperature between cloudy and non-cloudy days. However, my data points aren’t evenly distributed, and I want to make sure that each group has the same proportions of month_of_year values. Because it’s rarely cloudy where I am, I have many more data points for non-cloudy days.
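One quick way to see the imbalance described here is to tabulate, for each group, what share of its rows falls in each month. This is a minimal sketch, assuming pandas and NumPy; it builds a small synthetic DataFrame (hypothetical numbers, not real weather data) with the three columns from the question, so swap in the real data to use it.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data with the question's columns:
# month_of_year (1-12), is_cloudy (bool), temperature (float).
rng = np.random.default_rng(0)
n = 2000
month = rng.integers(1, 13, size=n)
# Assumption for illustration only: clouds are rare overall and more common in winter.
p_cloudy = np.where(np.isin(month, [11, 12, 1, 2]), 0.30, 0.05)
df = pd.DataFrame({
    "month_of_year": month,
    "is_cloudy": rng.random(n) < p_cloudy,
    "temperature": 20 + 8 * np.sin((month - 4) / 12 * 2 * np.pi) + rng.normal(0, 3, n),
})

# Share of each group's rows that falls in each month (every row sums to 1).
month_share = pd.crosstab(df["is_cloudy"], df["month_of_year"], normalize="index")
print(month_share.round(3))
print(df["is_cloudy"].value_counts())
```

If the True and False rows differ noticeably, a raw cloudy vs non-cloudy comparison partly measures seasonal differences rather than the effect of clouds.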

How would I go about preparing the data for this test? (I’m using Python.)

My plan was to do the following:

  1. Create dummy variables for the months of the year.
  2. Compute the proportion of cloudy days that fall in each month (the mean of each month dummy in the cloudy dataset).
  3. Sample the non-cloudy dataset so that it has the same month proportions.
  4. Run a z-test on the two datasets to see whether the difference in temperature is significant.
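The four-step plan can be sketched roughly as follows. This is a minimal sketch, assuming pandas and NumPy and a synthetic DataFrame with hypothetical values in place of the real data. Steps 1 and 2 are folded into a value_counts call, since the mean of a month dummy is simply that month's share of rows. For step 3 the sketch assumes sampling without replacement at a whole-number ratio k of non-cloudy to cloudy days within each month, which keeps the month proportions identical. The z-test is hand-written (two-sided, unpooled variances) rather than taken from a library, and the helper name two_sample_z_test is hypothetical.

```python
import math

import numpy as np
import pandas as pd

# Hypothetical synthetic data with the question's three columns; replace with the real DataFrame.
# Assumed here: clouds are more common in winter and lower the temperature by 1.5 degrees.
rng = np.random.default_rng(42)
n = 3000
month = rng.integers(1, 13, size=n)
is_cloudy = rng.random(n) < np.where(np.isin(month, [11, 12, 1, 2]), 0.30, 0.05)
seasonal = 20 + 8 * np.sin((month - 4) / 12 * 2 * np.pi)
temperature = seasonal - 1.5 * is_cloudy + rng.normal(0, 3, n)
df = pd.DataFrame({"month_of_year": month, "is_cloudy": is_cloudy, "temperature": temperature})

cloudy = df[df["is_cloudy"]]
clear = df[~df["is_cloudy"]]

# Steps 1-2: number of cloudy days per month (equivalent to the month dummies' means).
cloudy_counts = cloudy["month_of_year"].value_counts()
clear_counts = clear["month_of_year"].value_counts().reindex(cloudy_counts.index, fill_value=0)

# Step 3: draw k non-cloudy days per cloudy day within every month, so the month
# proportions match exactly. k is the largest whole number every month can supply.
k = int((clear_counts // cloudy_counts).min())
if k < 1:
    raise ValueError("A month has fewer non-cloudy than cloudy days; sample with replacement instead.")
matched_clear = pd.concat(
    [
        group.sample(n=int(k * cloudy_counts.loc[m]), random_state=0)
        for m, group in clear.groupby("month_of_year")
        if m in cloudy_counts.index
    ]
)


def two_sample_z_test(x, y):
    """Two-sided z-test for a difference in means, using unpooled (Welch-style) variances."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    se = math.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))
    z = (x.mean() - y.mean()) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # same as 2 * (1 - Phi(|z|))
    return z, p


# Step 4: compare cloudy-day temperatures with the month-matched non-cloudy sample.
z, p = two_sample_z_test(cloudy["temperature"], matched_clear["temperature"])
print(f"k={k}, cloudy n={len(cloudy)}, matched non-cloudy n={len(matched_clear)}")
print(f"z={z:.2f}, p={p:.3g}")
```

Subsampling this way discards most of the non-cloudy days. Keeping all rows and adjusting for month instead, for example by regressing temperature on is_cloudy with month_of_year as a categorical covariate, is a common alternative. If the cloudy group is small, a Welch t-test (scipy.stats.ttest_ind with equal_var=False) is safer than the large-sample z approximation.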

submitted by /u/amusinghawk

Published by

Nevin Manimala