Gini index (Gini coefficient) formula based on average objects ranking

Hi All,

Recently I found a very nice formula for calculating the Gini index, particularly for a predictive model (in my opinion it can be generalized easily). The formula is really simple: it uses only expected values. The article is in Polish, but I will translate and summarize the most important points.

Article link: https://mathspace.pl/matematyka/wskaznik-giniego-na-bazie-wartosci-oczekiwanej-tips-tricks-na-krzywych-czyli-ocena-jakosci-klasyfikacji-czesc-19/

I have not seen this formula before, so I assume it was proposed by Mariusz Gromada, the author of the blog post.

The formula for Gini index

1. The object and the class

Let’s assume we have a variable y indicating the class of the object x, where y takes two values, 0 and 1.

[; y(x)\in\{0,1\} ;]

[; x\in X ;]

y = 1 means x is in the positive class, y = 0 means x is in the negative class, and X is the object space (X is finite).

2. The model estimating class probability for the object x

Let’s assume we additionally have a model

[; p:X\to[0,1] ;]

The model p maps X into the continuous interval [0,1]. The interpretation of p is as follows: for an object

[; x\in X ;]

the value p(x) estimates the probability that y(x) = 1.

Another way of thinking about p is to treat the object and its class as random variables; then

[; p(x)=p(y(x)=1|X=x)=p(y=1|x)=p(1|x) ;]

In the rest of the text I will use the p(1|x) notation:

[; p(1|x) ;]

[; p(0|x)=1-p(1|x) ;]

Additionally, let’s define the a priori probabilities

[; N=\#\{x\in X\} ;]

[; \pi_1=p(1)=\frac{\#\{x\in X~:~y(x) = 1\}}{N} ;]

[; \pi_0=p(0)=\frac{\#\{x\in X~:~y(x) = 0\}}{N}=1-\pi_1 ;]

3. The model strength

The Gini index is a good measure of model strength because it measures statistical dispersion, here how well the model's scores separate the two classes. For the model defined above, the higher p(1|x) is, the higher the share of objects with y(x)=1 should be; the lower p(1|x) is, the higher the share of objects with y(x)=0.

4. Sorting the X and getting the position

Let’s sort the set X by p(1|x) in descending order

[; r:X\to\{1,2,\ldots,N\} ;]

r(x) is the position of object x after sorting in descending order by p(1|x), so the object with the highest p(1|x) gets r(x) = 1. The ranking is normalized by the function R(x), defined as follows.

[; R(x)=\frac{r(x)}{N} ;]

[; R(x)\in[0,1] ;]

R(x) is a normalized position in the set and can even be interpreted as a random variable R.
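The ranking step can be sketched in a few lines of Python. This is only an illustration: the function name `normalized_ranks` and the sample scores are made up, and the post does not say how ties in p(1|x) should be handled, so this sketch simply keeps tied objects in their input order.

```python
def normalized_ranks(scores):
    """Return R(x) = r(x) / N for each score, ranking in descending order."""
    n = len(scores)
    # Indices of the objects ordered from the highest to the lowest score.
    # Ties keep their input order (an assumption; the post does not cover ties).
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    ranks = [0] * n
    for position, index in enumerate(order, start=1):
        ranks[index] = position  # r(x): 1 for the highest score, N for the lowest
    return [r / n for r in ranks]


scores = [0.9, 0.2, 0.6, 0.4]  # hypothetical values of p(1|x)
print(normalized_ranks(scores))  # [0.25, 1.0, 0.5, 0.75]
```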

5. Final Gini for p (two formulas!)

[; Gini(p)=\frac{2\times E(R|y=0)-1}{\pi_1} ;]

[; Gini(p)=\frac{1-2\times E(R|y=1)}{\pi_0} ;]

where E(R|y=0) is the average normalized position of the objects from class 0 and E(R|y=1) is the average normalized position of the objects from class 1.
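Here is a minimal Python sketch of both formulas, checked against the usual definition Gini = 2·AUC − 1 computed by counting positive/negative pairs. The function names, the `offset` parameter and the sample data are illustrative assumptions, not taken from the article, and distinct scores are assumed. For distinct scores and R(x) = r(x)/N, working through the rank sums shows that the first formula comes out 1/(N·π_1) above 2·AUC − 1 and the second 1/(N·π_0) below it, so the two agree only as N grows. Using the midpoint position (r(x) − 0.5)/N, which is `offset=0.5` here and my own variant, makes both match exactly.

```python
def gini_from_ranks(labels, scores, offset=0.0):
    """Both rank-based Gini formulas; offset=0.0 gives R(x) = r(x) / N."""
    n = len(labels)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    big_r = [0.0] * n
    for position, index in enumerate(order, start=1):
        big_r[index] = (position - offset) / n  # normalized position R(x)

    r_pos = [r for r, y in zip(big_r, labels) if y == 1]
    r_neg = [r for r, y in zip(big_r, labels) if y == 0]
    pi_1 = len(r_pos) / n  # a priori probability of class 1
    pi_0 = len(r_neg) / n  # a priori probability of class 0
    mean_r_pos = sum(r_pos) / len(r_pos)  # E(R | y = 1)
    mean_r_neg = sum(r_neg) / len(r_neg)  # E(R | y = 0)

    gini_via_neg = (2 * mean_r_neg - 1) / pi_1
    gini_via_pos = (1 - 2 * mean_r_pos) / pi_0
    return gini_via_neg, gini_via_pos


def gini_pairwise(labels, scores):
    """Reference value 2 * AUC - 1, counting positive/negative pairs directly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    concordant = sum(1 for p in pos for q in neg if p > q)
    return 2 * concordant / (len(pos) * len(neg)) - 1


# Hypothetical data: 8 objects with distinct scores, 4 per class.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.1]

print(gini_pairwise(labels, scores))                # 0.375
print(gini_from_ranks(labels, scores))              # (0.625, 0.125)
print(gini_from_ranks(labels, scores, offset=0.5))  # (0.375, 0.375)
```

With only 8 objects the 1/N terms are large (0.25 each here), which is why the two unadjusted values differ so much; on a data set with thousands of objects the gap becomes negligible.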

Very nice and very simple to calculate!

I hope you will like it.

Please let me know what other nice formulas for Gini index you know.

Best regards

submitted by /u/leroykegan

Published by

Nevin Manimala
