Simple steps to find a good learning rate:
1. Identify a lower bound rate just before the loss stops decreasing.
2. Identify an upper bound rate just before training becomes unstable.
4. Generate an exponentially increasing list of learning rates from the lower to the upper bound.
4. Train one batch with each rate, starting from the lowest, and measure the loss after each rate increase.
5. Plot the exponent of the rate against the loss to find the optimal learning rate with the lowest loss.
As an example, if you found `0.001 = 1e-3` and `1.0 = 1e0` as the lower and upper bounds, you could search for the optimal learning rate over 1000 batch steps:
```python
import torch
import matplotlib.pyplot as plt

n_batches = 1000
lr_exp = torch.linspace(-3, 0, n_batches)  # exponents spanning 1e-3 .. 1e0
lr_s = 10 ** lr_exp                        # one learning rate per batch
losses = train_batched(X, Y, n_batches, lr_s)
plt.plot(lr_exp, losses)                   # x: exponent, y: loss
```
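The `train_batched` helper is assumed above; a minimal sketch of what it might do, using a hypothetical linear model and MSE loss (one optimizer step per batch, each at the next learning rate in the schedule), could look like:

```python
import torch

def train_batched(X, Y, n_batches, lr_s):
    """Run one training step per learning rate; return the loss after each step.

    This is an illustrative sketch: the model, loss, and data handling here
    are placeholders, not the document's actual training setup.
    """
    model = torch.nn.Linear(X.shape[1], Y.shape[1])
    losses = []
    for i in range(n_batches):
        # A fresh SGD optimizer per step makes it easy to change the rate
        opt = torch.optim.SGD(model.parameters(), lr=lr_s[i].item())
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(X), Y)
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return losses
```

In practice each step would draw a different mini-batch rather than reusing the full `X`, `Y`.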
Example plot (x is the exponent of the learning rate; y is the loss):

Here, the optimal learning rate would be around `10**-0.75 ≈ 0.178`.
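Instead of reading the minimum off the plot, the best exponent can be picked programmatically by smoothing the loss curve (so a single noisy batch doesn't win) and taking the argmin. This sketch uses a synthetic loss curve with a dip near exponent -0.75 for illustration:

```python
import torch

n_batches = 1000
lr_exp = torch.linspace(-3, 0, n_batches)

# Synthetic stand-in for the measured losses: a dip near exponent -0.75
losses = (lr_exp + 0.75) ** 2 + 0.05 * torch.rand(n_batches)

# Moving-average smoothing via a 1D convolution (window of 25 steps)
kernel = torch.ones(1, 1, 25) / 25
smoothed = torch.nn.functional.conv1d(
    losses.view(1, 1, -1), kernel, padding=12
).view(-1)

best = smoothed.argmin()
best_lr = 10 ** lr_exp[best].item()
```

With real measurements, `losses` would come from the range-test run above rather than the synthetic curve.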