### Algorithmic Complexity Table
| Algorithm Name | Acronym | Training Complexity | Inference Complexity |
|-------------------------------|---------|---------------------------------------------|----------------------------------------------------|
| Convolutional Neural Network | CNN | $O(n \cdot k^2 \cdot d)$ | $O(n)$ |
| Decision Tree | | $O(nm\log n)$ | $O(\log n)$ |
| Facebook AI Similarity Search | FAISS | $O(n \cdot k \cdot d_{model})$ | $O(k \cdot d_{model} + \tau \cdot (n/k) \cdot d_{model})$ |
| Feedforward Neural Network | FFNet | $O(n \cdot W \cdot n_{epochs})$ | $O(W)$ |
| Gradient Boosting | | $O(x \cdot nm\log n)$ | $O(td)$ |
| k-Nearest Neighbor | kNN | No training (lazy learner) | $O(nm + n\log k)$ |
| Long Short-Term Memory | LSTM | $O(T \cdot W \cdot n_{epochs})$ | $O(T \cdot W)$ |
| Locality-Sensitive Hashing | LSH | $O(nd_{model})$ | $O(\log n + k)$ |
| Multi-Head Attention Layer | MHA | $O(N^2 \cdot d_{model} + N \cdot d_{model}^2)$ | $O(N^2 \cdot d_{model} + N \cdot d_{model}^2)$ |
| Naive Bayes | | $O(nm)$ | $O(mc)$ |
| Recurrent Neural Network | RNN | $O(T \cdot W \cdot n_{epochs})$ | $O(T \cdot W)$ |
| Support Vector Machine | SVM | $O(n^2m)$ to $O(n^3m)$ | $O(vm)$ |
| t-Distributed Stochastic Neighbor Embedding | t-SNE | $O(n^2)$ | Not typically used for inference |
### Explanation of Variables
- **c**: Number of classes (for classification tasks).
- **d**: Network depth (number of convolutional layers) in CNNs, or tree depth in tree-based models.
- **I, H, K**: Input size, hidden-layer size, and output size of a neural network, respectively (used in the weight formulas for $W$ below).
- **k**: Kernel/filter size in CNNs, number of neighbors in kNN, or number of clusters (inverted lists) in Faiss.
- **m**: Number of features per sample.
- **n**: Number of data points or samples in the dataset.
- **N**: Sequence length for Multi-Head Attention.
- **τ (tau)**: Number of inverted lists (clusters) probed per query in Faiss (its `nprobe` parameter).
- **v**: Number of support vectors (for SVMs).
- **W**: Total number of weights in a neural network.
- For RNNs: $W = I \cdot H + H^2 + H \cdot K$
- For LSTMs: Each of the four gates (input, forget, output, cell-state update) has its own input and recurrent weights, so $W = 4(I \cdot H + H^2) + H \cdot K$ (see the parameter-count sketch after this list).
- **t**: Total number of trees in the ensemble (e.g., in Gradient Boosting).
- **d_model**: Embedding dimension in Multi-Head Attention or dimensionality of dense vectors in LSH/Faiss.
- **x**: Number of boosting rounds in Gradient Boosting algorithms like XGBoost.
- **T**: Sequence length (for RNNs/LSTMs).
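
As a quick sanity check on the two weight formulas above, here is a minimal sketch that computes $W$ in plain Python and, assuming PyTorch is installed, compares the result against the framework's own parameter count. Biases are excluded, since the formulas count weights only; the sizes `I, H, K` and the helper names are purely illustrative.

```python
import torch.nn as nn

def rnn_weights(I: int, H: int, K: int) -> int:
    """W for a vanilla RNN: input-to-hidden, hidden-to-hidden, hidden-to-output."""
    return I * H + H * H + H * K

def lstm_weights(I: int, H: int, K: int) -> int:
    """W for an LSTM: four gates of (I*H + H*H) weights each, plus the output layer."""
    return 4 * (I * H + H * H) + H * K

def weight_count(*modules: nn.Module) -> int:
    """Count weight-matrix entries only; p.dim() > 1 skips the bias vectors."""
    return sum(p.numel() for m in modules for p in m.parameters() if p.dim() > 1)

I, H, K = 32, 64, 10  # illustrative layer sizes

assert rnn_weights(I, H, K) == weight_count(nn.RNN(I, H), nn.Linear(H, K))
assert lstm_weights(I, H, K) == weight_count(nn.LSTM(I, H), nn.Linear(H, K))
print("RNN W:", rnn_weights(I, H, K), "LSTM W:", lstm_weights(I, H, K))
```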
### Model Notes
1. **Convolutional Neural Network (CNN):** Extracts spatial features using convolutional layers with kernels/filters. Training complexity depends on input size, kernel size, and depth; inference involves applying learned filters to input data.
2. **Decision Tree:** A non-parametric model that recursively splits data based on feature thresholds. Training scales with data size and features; inference is a tree traversal.
3. **Facebook AI Similarity Search (Faiss):** Combines clustering (IVF) and quantization (PQ) for scalable similarity search in large datasets. Training involves clustering vectors into partitions and compressing them; inference searches within relevant partitions using approximate distance computations (see the Faiss sketch after these notes).
4. **Feedforward Neural Network (FFNet):** A fully connected neural network trained via backpropagation. Training complexity depends on architecture and the number of epochs; inference involves only forward passes through the layers.
5. **Gradient Boosting (e.g., XGBoost):** Sequentially builds decision trees to minimize residual errors. Training involves gradient descent over boosting rounds; inference sums predictions from all boosted trees.
6. **k-Nearest Neighbors (kNN):** A lazy learning algorithm that predicts based on the majority class among nearest neighbors. There is no explicit training phase; inference requires searching for nearest neighbors in the dataset (see the brute-force query sketch after these notes).
7. **Long Short-Term Memory (LSTM):** An extension of RNNs designed to handle long-term dependencies using gating mechanisms. Each gate introduces additional parameters, making training and inference roughly four times more expensive than a vanilla RNN per time step.
8. **Locality-Sensitive Hashing (LSH):** An approximate nearest neighbor method that hashes similar items into the same bucket for efficient search. Indexing ("training") hashes every vector into a bucket; at query time only the vectors in the matching bucket are compared exactly (see the hashing sketch after these notes).
9. **Multi-Head Attention Layer:** A key component of Transformer models like BERT and GPT. Complexity arises from scaled dot-product attention ($N^2 \cdot d_{model}$ for pairwise comparisons between tokens) and the linear projections ($N \cdot d_{model}^2$); see the attention sketch after these notes.
10. **Naive Bayes:** A generative model based on Bayes' theorem that assumes feature independence. Training tallies per-class feature statistics in $O(nm)$; inference evaluates the $m$ feature likelihoods for each of the $c$ classes, giving $O(mc)$.
11. **Recurrent Neural Network (RNN):** Processes sequential data by maintaining hidden states across time steps. Training involves backpropagation through time (BPTT), scaling with sequence length ($T$), hidden size ($H$), and input/output sizes ($I/K$).
12. **Support Vector Machines (SVM):** A discriminative model that finds a hyperplane to separate classes. Training uses quadratic programming; inference depends on the number of support vectors. With a linear kernel, training sits at the lower (quadratic) end of this range, and inference reduces to a single dot product over the $m$ features.
13. **t-Distributed Stochastic Neighbor Embedding (t-SNE):** A dimensionality reduction technique for visualization by preserving local structure in high-dimensional data. It is computationally expensive and not used for inference.
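
The Faiss note (3) can be made concrete with a short IVF example. This is a minimal sketch, assuming the `faiss` (faiss-cpu) package is installed; it uses the plain IVF-Flat index rather than the PQ-compressed variant mentioned above, and the dataset size, `nlist`, and `nprobe` values are arbitrary illustrative choices.

```python
import numpy as np
import faiss

d = 64        # vector dimensionality (d_model in the table)
n = 20_000    # database size
nlist = 128   # number of inverted lists (k in the table), roughly sqrt(n)

rng = np.random.default_rng(0)
xb = rng.standard_normal((n, d)).astype("float32")   # database vectors
xq = rng.standard_normal((5, d)).astype("float32")   # query vectors

quantizer = faiss.IndexFlatL2(d)                 # coarse quantizer over the centroids
index = faiss.IndexIVFFlat(quantizer, d, nlist)  # IVF index without compression

index.train(xb)   # "training": k-means clustering of the vectors into nlist partitions
index.add(xb)     # assign every vector to its nearest partition

index.nprobe = 8                       # tau in the table: partitions scanned per query
distances, ids = index.search(xq, 10)  # approximate 10 nearest neighbors per query
print(ids.shape)                       # (5, 10)
```

Raising `nprobe` trades speed for recall: more partitions are scanned, moving the query cost toward the exhaustive $O(n \cdot d_{model})$ search.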
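
For kNN (note 6), the inference cost in the table comes from computing $n$ distances over $m$ features and keeping the $k$ best with a heap. A minimal brute-force sketch, with an illustrative function name and random data:

```python
import heapq
import numpy as np

def knn_predict(X: np.ndarray, y: np.ndarray, query: np.ndarray, k: int):
    """Brute-force kNN query: O(n*m) for distances plus O(n log k) to keep the k best."""
    dists = np.sum((X - query) ** 2, axis=1)                   # n squared distances
    neighbors = heapq.nsmallest(k, range(len(X)), key=dists.__getitem__)
    labels, counts = np.unique(y[neighbors], return_counts=True)
    return labels[np.argmax(counts)]                           # majority vote

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5))     # n = 1000 points, m = 5 features
y = rng.integers(0, 3, size=1000)      # 3 classes
print(knn_predict(X, y, rng.standard_normal(5), k=7))
```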
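
The bucketing idea in note 8 can be sketched with random-hyperplane (sign) hashing. Real LSH schemes use several hash tables and tuned bit widths; this single-table version with arbitrary sizes is only meant to show why a query touches a small fraction of the data.

```python
from collections import defaultdict
import numpy as np

rng = np.random.default_rng(0)
n, d, n_bits = 10_000, 64, 8
X = rng.standard_normal((n, d))
planes = rng.standard_normal((n_bits, d))   # random hyperplanes define the hash

def lsh_key(v: np.ndarray) -> int:
    """Sign pattern of v against each hyperplane, packed into an integer bucket key."""
    return int("".join(str(b) for b in (planes @ v > 0).astype(int)), 2)

# Indexing ("training"): hash every vector once and group the ids into buckets.
buckets = defaultdict(list)
for i, v in enumerate(X):
    buckets[lsh_key(v)].append(i)

# Query: only vectors in the query's bucket are candidates for exact comparison.
q = rng.standard_normal(d)
candidates = buckets[lsh_key(q)]
best = min(candidates, key=lambda i: np.linalg.norm(X[i] - q), default=None)
print(f"{len(candidates)} candidates out of {n}; closest id in bucket: {best}")
```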
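
Finally, the two terms in the Multi-Head Attention row fall directly out of the computation in note 9. The sketch below uses a single head, no masking, and arbitrary sizes: the projections cost $O(N \cdot d_{model}^2)$ and the $N \times N$ score matrix costs $O(N^2 \cdot d_{model})$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d_model = 128, 64                       # sequence length and embedding dimension
X = rng.standard_normal((N, d_model))      # token embeddings

# Linear projections: three (d_model x d_model) matrices -> O(N * d_model^2)
Wq, Wk, Wv = (rng.standard_normal((d_model, d_model)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv

# Scaled dot-product attention: the (N x N) score matrix -> O(N^2 * d_model)
scores = Q @ K.T / np.sqrt(d_model)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
output = weights @ V                               # shape (N, d_model)
print(output.shape)
```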