## When deploying models to production, what should you take care of and think about?

Ensure the candidate has experience taking models to production. They should mention things like making deployments easy (one-click), monitoring and alerting, tracking model usage (incl. A/B testing) and errors/misses, providing evidence for the model's decisions to the user, allowing users to provide feedback, building automated feature/data pipelines, integration into human workflows, etc.

## How do you compare two objects for value equality in Python, Java, C#, C++, or JavaScript?

In **Java** or **Python**, they should be able to identify the need to jointly override the hashing method (`hashCode` in Java, `__hash__` in Python) when overriding the equality check method. (Probe: What do you need to worry about when you override the equality check methods that Python (`__eq__`) or Java (`equals`) provide?) For **Python**, they could also suggest using a `dataclass` instead. (Java has open "value class" proposals.)

In **C#**, you should implement your objects as `record` types ([ref](https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/statements-expressions-operators/how-to-define-value-equality-for-a-type)). Otherwise, a value class can be implemented by implementing the `IEquatable<T>` interface and overriding `Equals(Object)`. As in Python or Java, they should remember to also override `GetHashCode()`.

In **C++**, classes don't have default hash code methods. Therefore, you can simply override the operators you want to handle, such as `bool operator==(const T& other) const`, to make values of your class [equality comparable](https://en.cppreference.com/w/cpp/named_req/EqualityComparable).

In **JavaScript**, how would you go about comparing objects and what matters? You might explore whether they know the difference between coercive equality (double equals) and strict equality (triple equals). Objects could then be compared by implementing a deep equality check. They might suggest a library for this task, like Lodash or jsclass.

Candidates who cannot answer this question with ease for at least one language are a red flag for tasks involving programming. Ensure the candidate has a basic understanding of at least one of the programming languages. Use the question as a springboard to probe whether the candidate understands more complex programming techniques, such as polymorphism, dependency injection, or structural subtyping. A minimal Python sketch of the expected answer follows.
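As a concrete reference for the Python case, here is a minimal sketch; the `Point` and `FrozenPoint` classes are illustrative examples, not part of the question:

```python
from dataclasses import dataclass


class Point:
    """Hand-written value equality: __eq__ and __hash__ must agree."""

    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self) -> int:
        # Required: defining __eq__ alone sets __hash__ to None,
        # so instances could no longer be used in sets or as dict keys.
        return hash((self.x, self.y))


@dataclass(frozen=True)
class FrozenPoint:
    """The dataclass alternative: __eq__ and __hash__ are generated."""
    x: int
    y: int


assert Point(1, 2) == Point(1, 2)           # value equality, not identity
assert Point(1, 2) in {Point(1, 2)}         # hashing is consistent with ==
assert FrozenPoint(1, 2) == FrozenPoint(1, 2)
```

A strong candidate should be able to explain why the hash must be derived from the same fields as the equality check: objects that compare equal but hash differently break sets and dictionaries.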
## What are the most significant data science-specific challenges you must worry about when running ML projects?

A proper DS should be able to name at least a few things, like data quality & quantity, overfitting & underfitting, data drift & sparsity, the need for normalizing data, or the no-free-lunch (NFL) theorem.

## Can you discuss any two ML models and compare them against each other? Why would you use one over the other, and in which contexts?

Look for the ability to describe more than one model (rather than, say, knowing only about neural nets) and to understand them in a meaningful, comparative way (RF vs. Gradient Boosting, Neural Nets vs. SVMs, etc.).

## What would you do to investigate lacking model performance and pinpoint the most effective way of improving it?

Scenario: You have built some kind of model. You have verified that the data quality and quantity are sufficient, even if you would love to have much more data of even better quality but cannot afford to keep annotating blindly. However, the model is under-performing. What next?

Look for the ability to identify things like:

- stratifying the inputs to understand particular areas or sources of the data
- ensuring the input data has been properly preprocessed and normalized
- reviewing FPs and FNs in confidence-prioritized order to understand the error sources (see the sketch after this list)
- ensuring that the training process is stable and reproducible
- figuring out whether the hyper-parameters have been optimally set
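As one concrete illustration of the confidence-prioritized error review and input stratification mentioned above, a minimal sketch in Python; the column names, example data, and decision threshold are assumptions for the example, not part of the question:

```python
import pandas as pd

# Assumed inputs for the sketch: validation examples with true binary labels,
# predicted probabilities, and a column identifying the data source.
df = pd.DataFrame({
    "source":      ["web", "web", "app", "app", "app", "web"],
    "label":       [0, 1, 0, 1, 1, 0],
    "probability": [0.92, 0.15, 0.55, 0.48, 0.81, 0.40],
})
THRESHOLD = 0.5  # assumed decision threshold
df["prediction"] = (df["probability"] >= THRESHOLD).astype(int)

# Confidence-prioritized review queues: the most "confidently wrong" examples
# first, since they tend to reveal systematic error sources (label noise,
# preprocessing bugs, missing features) the fastest.
false_positives = (
    df[(df["prediction"] == 1) & (df["label"] == 0)]
    .sort_values("probability", ascending=False)
)
false_negatives = (
    df[(df["prediction"] == 0) & (df["label"] == 1)]
    .sort_values("probability", ascending=True)
)

# Stratified error rates to spot weak areas or sources of the data.
error_rate_by_source = (
    (df["prediction"] != df["label"]).groupby(df["source"]).mean()
)

print(false_positives.head(20))
print(false_negatives.head(20))
print(error_rate_by_source)
```

A candidate does not need to produce code like this in the interview; the point is whether they can describe an investigation loop of this shape rather than jumping straight to "collect more data" or "try a bigger model".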