## Preamble
- This stage applies to statistical and machine learning projects only.
- You will want to use as much data as possible for this step, especially as you move toward the end of fine-tuning.
- As always, automate what you can.
- Don’t tweak your model after measuring the generalization error: you would just start overfitting the test set. This is also known as the [repeat-testing anti-pattern](Repeat-testing.md).
## Steps
1. Fine-tune the hyperparameters using cross-validation:
- Treat your data transformation choices as hyperparameters, especially when you are not sure about them (e.g., if you’re not sure whether to replace missing values with zeros or with the median value, or to just drop the rows).
- Unless there are very few hyperparameter values to explore, prefer random search over grid search (a minimal sketch follows this list). If training is very long, you may prefer a Bayesian optimization approach (e.g., using Gaussian process priors, as described by [Jasper Snoek et al.](https://homl.info/134)).
- [Log all tuning efforts in detail](No%20model%20tuning%20logs.md) so that they can be inspected and reflected on later.
2. Try ensemble methods. Combining your best models will often perform better than any of them individually (see the second sketch after this list).
3. Once you are confident about your final model, measure its performance on the test set to estimate the generalization error.
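
A minimal sketch of step 1, assuming scikit-learn; the synthetic dataset, the random forest, and the parameter ranges are placeholders for your own project. It treats a preprocessing choice (how to fill missing values) as just another hyperparameter in the random search:

```python
# Minimal sketch (step 1): random search over model *and* preprocessing
# hyperparameters. Assumes scikit-learn; the dataset, model, and parameter
# ranges are illustrative placeholders, not prescriptions.
import numpy as np
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in data, with some values knocked out so imputation matters.
X, y = make_regression(n_samples=1000, n_features=8, noise=10.0, random_state=42)
rng = np.random.default_rng(42)
X[rng.random(X.shape) < 0.05] = np.nan
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

pipeline = Pipeline([
    ("impute", SimpleImputer()),
    ("model", RandomForestRegressor(random_state=42)),
])

# The imputation strategy is searched alongside the model's hyperparameters.
param_distributions = {
    "impute__strategy": ["mean", "median", "constant"],  # "constant" fills numeric data with 0
    "model__n_estimators": randint(50, 300),
    "model__max_depth": randint(3, 20),
}

# Random search samples 20 combinations instead of exhaustively gridding them.
search = RandomizedSearchCV(
    pipeline, param_distributions, n_iter=20, cv=5,
    scoring="neg_root_mean_squared_error", random_state=42,
)
search.fit(X_train, y_train)
print("Best CV RMSE:", -search.best_score_)
print("Best hyperparameters:", search.best_params_)
```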
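
And a sketch of steps 2 and 3, continuing from the variables above (`X_train`, `X_test`, `search`); the gradient-boosting model is a placeholder for whichever other strong models your own tuning produced:

```python
# Minimal sketch (steps 2-3): ensemble your best models, then estimate the
# generalization error on the test set -- once. Continues from the sketch above.
from sklearn.ensemble import HistGradientBoostingRegressor, VotingRegressor
from sklearn.metrics import mean_squared_error

ensemble = VotingRegressor([
    ("tuned_forest", search.best_estimator_),  # best pipeline from the random search
    ("boosting", HistGradientBoostingRegressor(random_state=42)),  # handles NaNs natively
])
ensemble.fit(X_train, y_train)

# Final, one-time measurement on the held-out test set.
test_rmse = mean_squared_error(y_test, ensemble.predict(X_test)) ** 0.5
print("Estimated generalization error (RMSE):", test_rmse)
```

Whatever the number says, resist going back to tweak the model afterwards: that is the repeat-testing anti-pattern from the Preamble.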
## Pitfalls
- Inefficient setups, tooling, or infrastructure that prevent quick iteration.
- Lack of model versioning.
- No documentation of the model exploration.
Next up is [9. Present your solution](9.%20Present%20your%20solution.md).