
**Bigger models and longer training _don’t necessarily overfit the way classical theory predicts_.**
Once a model passes the "**interpolation threshold**" (the capacity at which it can fit the training data perfectly), increasing its size further can actually _reduce_ test error again, yielding better generalization instead of runaway overfitting.
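The effect is easy to reproduce on a toy problem. The sketch below is not the paper's experiment (Belkin et al. use random Fourier features on MNIST, among other setups); the 1-D target, noise level, trial count, and list of widths here are illustrative assumptions. It fits random-ReLU-feature regressions of increasing width with the minimum-norm least-squares solution; the printed test error typically spikes near the interpolation threshold (width ≈ number of training points) and then falls again as width keeps growing.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, noise=0.3):
    # Noisy samples of a simple 1-D target; target, noise, and sizes are illustrative.
    x = rng.uniform(-1.0, 1.0, size=(n, 1))
    y = np.sin(2 * np.pi * x[:, 0]) + noise * rng.standard_normal(n)
    return x, y

def relu_features(x, W, b):
    # Fixed random first layer; only the linear readout below is fitted.
    return np.maximum(x @ W + b, 0.0)

n_train = 40
x_tr, y_tr = make_data(n_train)
x_te, y_te = make_data(2000)

for width in [5, 10, 20, 40, 80, 320, 1280, 5120]:
    mses = []
    for _ in range(10):  # average over random feature draws to smooth the curve
        W = rng.standard_normal((1, width))
        b = rng.standard_normal(width)
        F_tr, F_te = relu_features(x_tr, W, b), relu_features(x_te, W, b)
        # lstsq returns the minimum-norm fit; once width >= n_train the model
        # can interpolate the training set exactly (the interpolation threshold).
        coef, *_ = np.linalg.lstsq(F_tr, y_tr, rcond=None)
        mses.append(np.mean((F_te @ coef - y_te) ** 2))
    print(f"width={width:5d}  mean test MSE={np.mean(mses):.3f}")
```

Using `np.linalg.lstsq` matters here: for widths above `n_train` it returns the minimum-norm interpolator, which is the implicitly regularized kind of solution the second descent relies on.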
### In practice
1. **You often shouldn’t stop at the point of minimal classical bias–variance risk.**
Modern deep networks frequently generalize _better_ when you make them larger than what would normally be considered the “right” capacity.
2. **Overparameterization is not only safe, it is often beneficial.**
Very large models, despite having more parameters than training samples, tend to land on smoother, more stable solutions that generalize better (the linear sketch after this list illustrates why the low-norm interpolators are the benign ones).
3. **Regularization and optimization matter more than model size alone.**
SGD's implicit bias, weight decay, early stopping, and similar mechanisms help steer training toward the well-generalizing, low-norm solutions on the benign side of the double-descent curve.
4. **Scaling laws replace classical capacity control.**
Rather than tuning capacity to a classical sweet spot, practitioners keep scaling model size and data, because performance tends to keep improving.
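As a complement to items 2 and 3, the minimal linear sketch below (the dimensions, noise level, and ridge penalty `lam = 1.0` are arbitrary illustrative choices, not from the paper) compares the minimum-norm interpolator, an arbitrary interpolator obtained by adding a large null-space component, and ridge regression, the closed-form analogue of weight decay. The first two fit the training data equally perfectly, yet their test errors differ sharply.

```python
import numpy as np

rng = np.random.default_rng(1)

# Overparameterized linear regression: more features (p) than noisy samples (n).
n, p = 50, 200
X = rng.standard_normal((n, p))
w_true = np.zeros(p)
w_true[:5] = 1.0                      # low-norm ground truth
y = X @ w_true + 0.1 * rng.standard_normal(n)
X_te = rng.standard_normal((2000, p))
y_te = X_te @ w_true

# 1) Minimum-norm interpolator (what the pseudoinverse / lstsq returns).
w_min = np.linalg.pinv(X) @ y

# 2) Another exact interpolator: add a large component from the null space of X.
#    It fits the training data just as perfectly but has a much bigger norm.
null_proj = np.eye(p) - np.linalg.pinv(X) @ X
w_bad = w_min + null_proj @ (3.0 * rng.standard_normal(p))

# 3) Ridge regression, the closed-form analogue of weight decay (lam is arbitrary).
lam = 1.0
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

for name, w in [("min-norm interpolator", w_min),
                ("arbitrary interpolator", w_bad),
                ("ridge / weight decay", w_ridge)]:
    train_mse = np.mean((X @ w - y) ** 2)
    test_mse = np.mean((X_te @ w - y_te) ** 2)
    print(f"{name:22s} train MSE={train_mse:8.4f}  test MSE={test_mse:8.4f}  ||w||={np.linalg.norm(w):6.2f}")
```

The two interpolators have identical training error, so training fit alone cannot distinguish them; the low-norm one, and the ridge solution that weight decay pushes toward, is what generalizes.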
Source: [Reconciling modern machine learning practice and the bias-variance trade-off](https://arxiv.org/abs/1812.11118)