So I’m optimizing a non-convex objective by gradient descent, and I’m trying to figure out a good way to adaptively choose a step size.
Right now I’m using a fixed step size; this ‘dumb’ approach at least has the advantage of not being as prone to getting caught in local minima. I tried backtracking line search, and while backtracking guarantees monotonic descent for smooth functions, it leaves me trapped in local minima even more often.
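For concreteness, this is roughly the plain backtracking (Armijo) rule I tried; `f`, `grad_f`, and the constants are just placeholders for my objective, its gradient, and typical parameter choices:

```python
import numpy as np

def backtracking_step(f, grad_f, x, t0=1.0, beta=0.5, c=1e-4, max_tries=50):
    """Standard Armijo backtracking: shrink t until sufficient decrease holds."""
    g = grad_f(x)
    fx = f(x)
    t = t0
    for _ in range(max_tries):
        x_new = x - t * g
        # Armijo sufficient-decrease condition: never allow the value to go up
        if f(x_new) <= fx - c * t * np.dot(g, g):
            return x_new, t
        t *= beta  # shrink the step and try again
    return x_new, t  # give up and return the last trial point
```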
I was thinking I could try backtracking but still accept a step as long as the new objective value is at most 150% of the current value, or something like that (see the sketch below), but I was curious if you guys had suggestions.
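Something like this is what I have in mind; `accept_factor = 1.5` encodes the 150% threshold, and the idea only really makes sense if the objective is positive:

```python
def relaxed_backtracking_step(f, grad_f, x, t0=1.0, beta=0.5,
                              accept_factor=1.5, max_tries=50):
    """Backtrack, but accept any step whose value is at most
    accept_factor * f(x), so the iterate is allowed to move a little
    'uphill' and hopefully hop out of shallow local minima.
    (Assumes f(x) > 0; the 150% threshold needs rethinking otherwise.)"""
    g = grad_f(x)
    fx = f(x)
    t = t0
    for _ in range(max_tries):
        x_new = x - t * g
        if f(x_new) <= accept_factor * fx:
            return x_new, t
        t *= beta  # shrink the step and try again
    return x_new, t
```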
I assumed this was a common, obvious problem people have, but my Googling / Google Scholar searching hasn’t been fruitful.