[Q] Deriving the gradient of the softmax function?

This post might be better suited to r/learnmachinelearning, but I thought I'd get higher-quality answers here.

In class we’re discussing building a linear classifier from scratch in Python using softmax (link below). The resource is pretty informative; however, it skips the differentiation step entirely and cites only the final result. Gradient descent is then applied, producing an approximately correct solution after several iterations.

Reportedly, the gradient of the loss with respect to the score for class k is:

P(class == k) - indicator(y == k)

I’d like to know how to derive this myself, specifically the gradient of the softmax loss with respect to the weight matrix W and the bias values b.
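For what it's worth, here's a minimal numerical sanity check of the reported formula (my own sketch, assuming the standard softmax cross-entropy loss L = -log p_y on a single example with scores z = x @ W + b); it's the analytic derivation itself that I'm still missing:

    import numpy as np

    rng = np.random.default_rng(0)
    D, K = 4, 3                      # feature dim, number of classes (made up for the check)
    x = rng.normal(size=D)
    y = 1                            # true class index
    W = rng.normal(size=(D, K))
    b = rng.normal(size=K)

    def loss(W, b):
        z = x @ W + b
        z = z - z.max()              # stabilise the exponentials
        p = np.exp(z) / np.exp(z).sum()
        return -np.log(p[y])

    # Analytic gradient from the reported formula: dL/dz_k = p_k - indicator(y == k)
    z = x @ W + b
    p = np.exp(z - z.max()) / np.exp(z - z.max()).sum()
    dz = p.copy()
    dz[y] -= 1.0                     # P(class == k) - indicator(y == k)
    dW = np.outer(x, dz)             # chain rule through z = x @ W + b
    db = dz

    # Finite-difference check on one entry of W and of b
    eps = 1e-6
    W_plus = W.copy(); W_plus[0, 0] += eps
    num_dW00 = (loss(W_plus, b) - loss(W, b)) / eps
    b_plus = b.copy(); b_plus[0] += eps
    num_db0 = (loss(W, b_plus) - loss(W, b)) / eps

    print(dW[0, 0], num_dW00)        # should agree to roughly 1e-5
    print(db[0], num_db0)

The numbers do match, so the stated result looks right; I just can't reproduce the calculus that gets there.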

(Multi-class) linear classifier in Python

submitted by /u/jbuddy_13
