Testing means for my data set against the Pareto principle?

I am looking at an e-marketing data set and am interested in how my revenue is distributed among customers who do make a purchase. As you may know, the Pareto principle would suggest that 80% of revenue comes from 20% of customers. For my data set (again, only among those who do buy), the top 20% of revenue-producers generate 60.1% of revenue ($297,119). One of my website channels (Google/Organic) is at 67.1%.

Now I would like to test both 60.1% (the overall figure) and 67.1% against the 80% figure (and perhaps a lower one if that doesn't work out) using a 90% confidence interval. I was also thinking of running a paired t-test between the mean revenues of each of the major Source / Mediums within the top 20%, to see whether revenue is distributed differently enough across sources to warrant further exploration.

My BIGGEST concern here is violating assumptions/conditions. My data set is not really normal, much of the distribution is clustered around zero, etc. Can I run these tests? Do I have to convert the percentages into counts?

For reference, here is my data set: https://docs.google.com/spreadsheets/d/1pIapZXgaScU44SFhwOaBcM1BBURvTB4p6wjSre5OP2Q/edit?usp=sharing

submitted by /u/T00Human
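To make the quantity in question concrete, here is a minimal sketch of how the "top 20% share" could be computed and given a 90% interval. It swaps in a percentile bootstrap (resampling customers with replacement) rather than the t-based interval mentioned above, since a bootstrap does not assume normally distributed revenue; whether that is the right choice for this data is a judgment call. The function names `top_share` and `bootstrap_ci` are hypothetical, and the revenues are synthetic values generated for illustration, not the spreadsheet linked in the post.

```python
import random


def top_share(revenues, top_frac=0.20):
    """Fraction of total revenue produced by the top `top_frac` of customers."""
    ordered = sorted(revenues, reverse=True)
    k = max(1, round(top_frac * len(ordered)))  # number of customers in the top group
    return sum(ordered[:k]) / sum(ordered)


def bootstrap_ci(revenues, stat, level=0.90, n_boot=2000, seed=0):
    """Percentile bootstrap interval for `stat`, resampling customers with replacement."""
    rng = random.Random(seed)
    n = len(revenues)
    stats = sorted(stat(rng.choices(revenues, k=n)) for _ in range(n_boot))
    alpha = (1 - level) / 2
    lo = stats[int(alpha * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha) * n_boot))]
    return lo, hi


# Synthetic, made-up revenues (skewed, lognormal). NOT the poster's data.
data_rng = random.Random(42)
revenues = [data_rng.lognormvariate(3.0, 1.0) for _ in range(500)]

share = top_share(revenues)
low, high = bootstrap_ci(revenues, top_share)
print(f"top-20% share: {share:.3f}, 90% bootstrap CI: ({low:.3f}, {high:.3f})")
print("80% inside interval:", low <= 0.80 <= high)
```

With real data, `revenues` would be the per-customer revenue column for purchasers only. The same two functions could be run on one channel's customers (for example Google/Organic) to get that channel's share and interval.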

Published by

Nevin Manimala
