### Algorithmic Complexity Table
| Algorithm Name | Acronym | Training Complexity | Inference Complexity |
|-------------------------------|---------|---------------------------------------------|----------------------------------------------------|
| Convolutional Neural Network | CNN | $O(n \cdot k^2 \cdot d)$ | $O(n)$ |
| Decision Tree | | $O(nm\log n)$ | $O(\log n)$ |
| Facebook AI Similarity Search | FAISS | $O(n \cdot k \cdot d_{model})$ | $O(k \cdot d_{model} + \tau \cdot (n/k) \cdot d_{model})$ |
| Feedforward Neural Network | FFNet | $O(n \cdot W \cdot n_{epochs})$ | $O(W)$ |
| Gradient Boosting | | $O(x \cdot nm\log n)$ | $O(td)$ |
| k-Nearest Neighbor | kNN | No training (lazy learner) | $O(nm + n\log k)$ |
| Long Short-Term Memory | LSTM | $O(T \cdot W \cdot n_{epochs})$ | $O(T \cdot W)$ |
| Locality-Sensitive Hashing | LSH | $O(nd_{model})$ | $O(\log n + k)$ |
| Multi-Head Attention Layer | MHA | $O(N^2 \cdot d_{model} + N \cdot d_{model}^2)$ | $O(N^2 \cdot d_{model} + N \cdot d_{model}^2)$ |
| Naive Bayes | | $O(nm)$ | $O(mc)$ |
| Recurrent Neural Network | RNN | $O(T \cdot W \cdot n_{epochs})$ | $O(T \cdot W)$ |
| Support Vector Machine | SVM | $O(n^2m)$ to $O(n^3m)$ | $O(vm)$ |
| t-Distributed Stochastic Neighbor Embedding | t-SNE | $O(n^2)$ | Not typically used for inference |
### Explanation of Variables
- **c**: Number of classes (for classification tasks).
- **d**: Network depth (number of convolutional layers) in CNNs, or tree depth in tree-based models.
- **I, H, K**: Input size, hidden-layer size, and output size of a neural network, respectively (used in the weight formulas for $W$ below).
- **k**: Kernel/filter size in CNNs, number of neighbors in kNN, or number of clusters (inverted lists) in Faiss.
- **m**: Number of features per sample.
- **n**: Number of data points or samples in the dataset.
- **N**: Sequence length for Multi-Head Attention.
- **τ (tau)**: Number of inverted lists (clusters) probed per query in Faiss (its `nprobe` parameter).
- **v**: Number of support vectors (for SVMs).
- **W**: Total number of weights in a neural network.
- For RNNs: $W = I \cdot H + H^2 + H \cdot K$
- For LSTMs: Each of the four gates (input, forget, output, cell-state update) has its own input and recurrent weights, so $W = 4(I \cdot H + H^2) + H \cdot K$ (see the parameter-count sketch after this list).
- **t**: Total number of trees in the ensemble (e.g., in Gradient Boosting).
- **d_model**: Embedding dimension in Multi-Head Attention or dimensionality of dense vectors in LSH/Faiss.
- **x**: Number of boosting rounds in Gradient Boosting algorithms like XGBoost.
- **T**: Sequence length (for RNNs/LSTMs).
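
As a quick sanity check on the two weight formulas above, here is a minimal sketch that computes $W$ in plain Python and, assuming PyTorch is installed, compares the result against the framework's own parameter count. Biases are excluded, since the formulas count weights only; the sizes `I, H, K` and the helper names are purely illustrative.

```python
import torch.nn as nn

def rnn_weights(I: int, H: int, K: int) -> int:
    """W for a vanilla RNN: input-to-hidden, hidden-to-hidden, hidden-to-output."""
    return I * H + H * H + H * K

def lstm_weights(I: int, H: int, K: int) -> int:
    """W for an LSTM: four gates of (I*H + H*H) weights each, plus the output layer."""
    return 4 * (I * H + H * H) + H * K

def weight_count(*modules: nn.Module) -> int:
    """Count weight-matrix entries only; p.dim() > 1 skips the bias vectors."""
    return sum(p.numel() for m in modules for p in m.parameters() if p.dim() > 1)

I, H, K = 32, 64, 10  # illustrative layer sizes

assert rnn_weights(I, H, K) == weight_count(nn.RNN(I, H), nn.Linear(H, K))
assert lstm_weights(I, H, K) == weight_count(nn.LSTM(I, H), nn.Linear(H, K))
print("RNN W:", rnn_weights(I, H, K), "LSTM W:", lstm_weights(I, H, K))
```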
### Model Notes
1. **Convolutional Neural Network (CNN):** Extracts spatial features using convolutional layers with kernels/filters. Training complexity depends on input size, kernel size, and depth; inference involves applying learned filters to input data.
2. **Decision Tree:** A non-parametric model that recursively splits data based on feature thresholds. Training scales with data size and features; inference is a tree traversal.
3. **Facebook AI Similarity Search (Faiss):** Combines clustering (IVF) and quantization (PQ) for scalable similarity search in large datasets. Training involves clustering vectors into partitions and compressing them; inference searches within relevant partitions using approximate distance computations (see the Faiss sketch after these notes).
4. **Feedforward Neural Network (FFNet):** A fully connected neural network trained via backpropagation. Training complexity depends on architecture and the number of epochs; inference involves only forward passes through the layers.
5. **Gradient Boosting (e.g., XGBoost):** Sequentially builds decision trees to minimize residual errors. Training involves gradient descent over boosting rounds; inference sums predictions from all boosted trees.
6. **k-Nearest Neighbors (kNN):** A lazy learning algorithm that predicts based on the majority class among nearest neighbors. There is no explicit training phase; inference requires searching for nearest neighbors in the dataset (see the brute-force query sketch after these notes).
7. **Long Short-Term Memory (LSTM):** An extension of RNNs designed to handle long-term dependencies using gating mechanisms. Each gate introduces additional parameters, making training and inference roughly four times more expensive than a vanilla RNN per time step.
8. **Locality-Sensitive Hashing (LSH):** An approximate nearest neighbor method that hashes similar items into the same bucket for efficient search. Indexing ("training") hashes every vector into a bucket; at query time only the vectors in the matching bucket are compared exactly (see the hashing sketch after these notes).
9. **Multi-Head Attention Layer:** A key component of Transformer models like BERT and GPT. Complexity arises from scaled dot-product attention ($N^2 \cdot d_{model}$ for pairwise comparisons between tokens) and the linear projections ($N \cdot d_{model}^2$); see the attention sketch after these notes.
10. **Naive Bayes:** A generative model based on Bayes' theorem that assumes feature independence. Training tallies per-class feature statistics in $O(nm)$; inference evaluates the $m$ feature likelihoods for each of the $c$ classes, giving $O(mc)$.
11. **Recurrent Neural Network (RNN):** Processes sequential data by maintaining hidden states across time steps. Training involves backpropagation through time (BPTT), scaling with sequence length ($T$), hidden size ($H$), and input/output sizes ($I/K$).
12. **Support Vector Machines (SVM):** A discriminative model that finds a hyperplane to separate classes. Training uses quadratic programming; inference depends on the number of support vectors. With a linear kernel, training sits at the lower (quadratic) end of this range, and inference reduces to a single dot product over the $m$ features.
13. **t-Distributed Stochastic Neighbor Embedding (t-SNE):** A dimensionality reduction technique for visualization by preserving local structure in high-dimensional data. It is computationally expensive and not used for inference.
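
The Faiss note (3) can be made concrete with a short IVF example. This is a minimal sketch, assuming the `faiss` (faiss-cpu) package is installed; it uses the plain IVF-Flat index rather than the PQ-compressed variant mentioned above, and the dataset size, `nlist`, and `nprobe` values are arbitrary illustrative choices.

```python
import numpy as np
import faiss

d = 64        # vector dimensionality (d_model in the table)
n = 20_000    # database size
nlist = 128   # number of inverted lists (k in the table), roughly sqrt(n)

rng = np.random.default_rng(0)
xb = rng.standard_normal((n, d)).astype("float32")   # database vectors
xq = rng.standard_normal((5, d)).astype("float32")   # query vectors

quantizer = faiss.IndexFlatL2(d)                 # coarse quantizer over the centroids
index = faiss.IndexIVFFlat(quantizer, d, nlist)  # IVF index without compression

index.train(xb)   # "training": k-means clustering of the vectors into nlist partitions
index.add(xb)     # assign every vector to its nearest partition

index.nprobe = 8                       # tau in the table: partitions scanned per query
distances, ids = index.search(xq, 10)  # approximate 10 nearest neighbors per query
print(ids.shape)                       # (5, 10)
```

Raising `nprobe` trades speed for recall: more partitions are scanned, moving the query cost toward the exhaustive $O(n \cdot d_{model})$ search.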
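
For kNN (note 6), the inference cost in the table comes from computing $n$ distances over $m$ features and keeping the $k$ best with a heap. A minimal brute-force sketch, with an illustrative function name and random data:

```python
import heapq
import numpy as np

def knn_predict(X: np.ndarray, y: np.ndarray, query: np.ndarray, k: int):
    """Brute-force kNN query: O(n*m) for distances plus O(n log k) to keep the k best."""
    dists = np.sum((X - query) ** 2, axis=1)                   # n squared distances
    neighbors = heapq.nsmallest(k, range(len(X)), key=dists.__getitem__)
    labels, counts = np.unique(y[neighbors], return_counts=True)
    return labels[np.argmax(counts)]                           # majority vote

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5))     # n = 1000 points, m = 5 features
y = rng.integers(0, 3, size=1000)      # 3 classes
print(knn_predict(X, y, rng.standard_normal(5), k=7))
```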
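
The bucketing idea in note 8 can be sketched with random-hyperplane (sign) hashing. Real LSH schemes use several hash tables and tuned bit widths; this single-table version with arbitrary sizes is only meant to show why a query touches a small fraction of the data.

```python
from collections import defaultdict
import numpy as np

rng = np.random.default_rng(0)
n, d, n_bits = 10_000, 64, 8
X = rng.standard_normal((n, d))
planes = rng.standard_normal((n_bits, d))   # random hyperplanes define the hash

def lsh_key(v: np.ndarray) -> int:
    """Sign pattern of v against each hyperplane, packed into an integer bucket key."""
    return int("".join(str(b) for b in (planes @ v > 0).astype(int)), 2)

# Indexing ("training"): hash every vector once and group the ids into buckets.
buckets = defaultdict(list)
for i, v in enumerate(X):
    buckets[lsh_key(v)].append(i)

# Query: only vectors in the query's bucket are candidates for exact comparison.
q = rng.standard_normal(d)
candidates = buckets[lsh_key(q)]
best = min(candidates, key=lambda i: np.linalg.norm(X[i] - q), default=None)
print(f"{len(candidates)} candidates out of {n}; closest id in bucket: {best}")
```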
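
Finally, the two terms in the Multi-Head Attention row fall directly out of the computation in note 9. The sketch below uses a single head, no masking, and arbitrary sizes: the projections cost $O(N \cdot d_{model}^2)$ and the $N \times N$ score matrix costs $O(N^2 \cdot d_{model})$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d_model = 128, 64                       # sequence length and embedding dimension
X = rng.standard_normal((N, d_model))      # token embeddings

# Linear projections: three (d_model x d_model) matrices -> O(N * d_model^2)
Wq, Wk, Wv = (rng.standard_normal((d_model, d_model)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv

# Scaled dot-product attention: the (N x N) score matrix -> O(N^2 * d_model)
scores = Q @ K.T / np.sqrt(d_model)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
output = weights @ V                               # shape (N, d_model)
print(output.shape)
```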