Step through Adam and watch the first- and second-moment estimates adapt the effective learning rate of each parameter. Adam combines momentum with RMSProp.
Adaptive Moment Estimation
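Adam keeps an exponential moving average of the gradients (m, the first moment, as in momentum) and of the squared gradients (v, the second moment, as in RMSProp), controlled by the decay rates β₁ and β₂. Each parameter's step is scaled by m / √v, so parameters with consistently large gradients take smaller steps and parameters with small gradients take relatively larger ones. Because m and v are initialized to zero, their early values are biased toward zero. Bias correction compensates for this cold start by dividing them by 1 − β₁ᵗ and 1 − β₂ᵗ at step t.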
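The update described above can be sketched in plain Python. This is a minimal illustration, not a library implementation: `adam_step` is a hypothetical helper that works on lists of floats, and the defaults (lr=0.001, β₁=0.9, β₂=0.999, eps=1e-8) are the commonly used values. Real libraries may place `eps` slightly differently.

```python
import math

def adam_step(params, grads, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update over lists of floats (hypothetical helper).

    t is the 1-based step count; m and v start as lists of zeros.
    Returns the updated (params, m, v).
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        # First moment: exponential moving average of the gradient (momentum).
        m_i = beta1 * m_i + (1 - beta1) * g
        # Second moment: exponential moving average of the squared gradient (RMSProp).
        v_i = beta2 * v_i + (1 - beta2) * g * g
        # Bias correction: m and v start at 0, so early averages are pulled toward 0.
        m_hat = m_i / (1 - beta1 ** t)
        v_hat = v_i / (1 - beta2 ** t)
        # Per-parameter step: a large v_hat (big recent gradients) shrinks the step.
        p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
        new_params.append(p)
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v

# Usage: one step on two parameters whose gradients differ by a factor of 10.
params, m, v = adam_step([1.0, 1.0], [2.0, 20.0], [0.0, 0.0], [0.0, 0.0], t=1, lr=0.1)
print(params)  # both entries are approximately 0.9
```

On the first step, bias correction makes m_hat equal to g and v_hat equal to g², so each parameter moves by roughly lr in the direction opposite its gradient's sign, regardless of the gradient's magnitude. This per-parameter normalization is what lets Adam handle parameters with very different gradient scales.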