| Method | Strengths | Limitations | Example Use Cases | Implementation |
|---|---|---|---|---|
| [Simple Fill](https://en.wikipedia.org/wiki/Imputation_(statistics)#Mean_and_median_imputation) | - Simple and fast - Works well with small datasets | - May not handle complex data relationships - Sensitive to outliers | - Basic data analysis - Quick data cleaning | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html#sklearn.impute.SimpleImputer) |
| [KNN Imputation](https://machinelearningmastery.com/knn-imputation-for-missing-values-in-machine-learning/) | - Can capture the relationships between features - Works well with moderately missing data | - Computationally intensive for large datasets - Sensitive to the choice of k | - Medical data analysis - Market research | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.impute.KNNImputer.html) |
| [Soft Impute](https://en.wikipedia.org/wiki/Matrix_completion) | - Effective for matrix completion in large datasets - Works well with low-rank data | - Assumes low-rank data structure - Can be sensitive to hyperparameters | - Recommender systems - Large-scale data projects | [Python](https://github.com/iskandr/fancyimpute) |
| [Iterative Imputer](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3074241/) | - Can model complex relationships - Suitable for multiple imputation | - Computationally expensive - Depends on the choice of model | - Complex datasets with multiple types of missing data | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.impute.IterativeImputer.html) |
| [Iterative SVD](https://pubmed.ncbi.nlm.nih.gov/11395428/) | - Good for matrix completion with low-rank assumption - Handles larger datasets | - Sensitive to rank selection - Computationally demanding | - Image and video data processing - Large datasets with structure | [Python](https://github.com/iskandr/fancyimpute) |
| [Matrix Factorization](https://en.wikipedia.org/wiki/Matrix_decomposition) | - Useful for recommendation systems - Can handle large-scale problems | - Requires careful tuning - Not suitable for all types of data | - Recommendation engines - User preference analysis | [Python](https://github.com/iskandr/fancyimpute) |
| [Nuclear Norm Minimization](https://arxiv.org/abs/0805.4471) | - Theoretically strong for matrix completion - Finds the lowest rank solution | - Very computationally intensive - Impractical for very large datasets | - Research in theoretical data completion - Small to medium datasets | [Python](https://github.com/iskandr/fancyimpute) |
| [BiScaler](https://arxiv.org/abs/1410.2596) | - Normalizes data effectively - Often used as a preprocessing step | - Not an imputation method itself - Doesn't always converge | - Preprocessing for other imputation methods - Data normalization | [Python](https://github.com/iskandr/fancyimpute) |
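To make the first two rows concrete, here is a minimal sketch using scikit-learn's `SimpleImputer` and `KNNImputer` from the implementation links above; the toy matrix, the mean strategy, and `n_neighbors=2` are illustrative choices, not recommendations.

```python
import numpy as np
from sklearn.impute import SimpleImputer, KNNImputer

# A small matrix with missing values (np.nan marks the gaps)
X = np.array([[1.0, 2.0, np.nan],
              [3.0, np.nan, 6.0],
              [7.0, 8.0, 9.0],
              [np.nan, 5.0, 4.0]])

# Simple Fill: replace each missing entry with its column mean
mean_imputer = SimpleImputer(strategy="mean")
X_mean = mean_imputer.fit_transform(X)

# KNN Imputation: fill each gap using the 2 most similar rows
knn_imputer = KNNImputer(n_neighbors=2)
X_knn = knn_imputer.fit_transform(X)

print(X_mean)
print(X_knn)
```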
Summary table of models + methods
Introduction
Throughout the course, we will go over several supervised and unsupervised machine learning models, along with related methods for data preparation and evaluation. This page summarizes them.
| Model Type | Strengths | Limitations | Example Use Cases | Implementation |
|---|---|---|---|---|
| [Logistic Regression](https://en.wikipedia.org/wiki/Logistic_regression) | - Simple and interpretable - Fast to train | - Assumes linear boundaries - Not suitable for complex relationships | - Credit approval - Medical diagnosis | [Python](https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression) |
| [Decision Trees](https://en.wikipedia.org/wiki/Decision_tree_learning) | - Intuitive - Can model non-linear relationships | - Prone to overfitting - Sensitive to small changes in data | - Customer segmentation - Loan default prediction | [Python](https://scikit-learn.org/stable/modules/tree.html) |
| [Random Forest](https://en.wikipedia.org/wiki/Random_forest) | - Handles overfitting - Can model complex relationships | - Slower to train and predict - Black box model | - Fraud detection - Stock price movement prediction | [Python](https://scikit-learn.org/stable/modules/ensemble.html#forest) |
| [Support Vector Machines (SVM)](https://en.wikipedia.org/wiki/Support_vector_machine) | - Effective in high dimensional spaces - Works well with clear margin of separation | - Sensitive to kernel choice - Slow on large datasets | - Image classification - Handwriting recognition | [Python](https://scikit-learn.org/stable/modules/svm.html) |
| [K-Nearest Neighbors (KNN)](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm) | - Simple and intuitive - No training phase | - Slow during query phase - Sensitive to irrelevant features and scale | - Product recommendation - Document classification | [Python](https://scikit-learn.org/stable/modules/neighbors.html) |
| [Neural Networks](https://en.wikipedia.org/wiki/Artificial_neural_network) | - Capable of approximating complex functions - Flexible architecture - Trainable with backpropagation | - Can require a large number of parameters - Prone to overfitting on small data - Training can be slow | - Pattern recognition - Basic image classification - Function approximation | [Python](https://pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html) |
| [Deep Learning](https://en.wikipedia.org/wiki/Deep_learning) | - Can model highly complex relationships - Excels with vast amounts of data - State-of-the-art results in many domains | - Requires a lot of data - Computationally intensive - Interpretability challenges | - Advanced image and speech recognition - Machine translation - Game playing (like AlphaGo) | [Python](https://pytorch.org/tutorials/beginner/nlp/deep_learning_tutorial.html) |
| [Naive Bayes](https://en.wikipedia.org/wiki/Naive_Bayes_classifier) | - Fast - Works well with large feature sets | - Assumes feature independence - Performs poorly when features are strongly correlated | - Spam detection - Sentiment analysis | [Python](https://scikit-learn.org/stable/modules/naive_bayes.html) |
| [Gradient Boosting Machines (GBM)](https://en.wikipedia.org/wiki/Gradient_boosting) | - High performance - Handles non-linear relationships | - Prone to overfitting if not tuned - Slow to train | - Web search ranking - Ecology predictions | [Python](https://scikit-learn.org/stable/modules/ensemble.html#gradient-tree-boosting) |
| [Rule-Based Classification](https://en.wikipedia.org/wiki/Rule-based_machine_learning) | - Transparent and explainable - Easily updated and modified | - Manual rule creation can be tedious - May not capture complex relationships | - Expert systems - Business rule enforcement | [Python](https://www.geeksforgeeks.org/rule-based-classifier-machine-learning/) |
| [Bagging](https://en.wikipedia.org/wiki/Bootstrap_aggregating) | - Reduces variance - Parallelizable | - May not handle bias well | - Random Forest is a popular example | [Python](https://scikit-learn.org/stable/modules/ensemble.html#bagging-meta-estimator) |
| [Boosting](https://en.wikipedia.org/wiki/Boosting_(machine_learning)) | - Reduces bias - Combines weak learners | - Sensitive to noisy data and outliers | - AdaBoost - Gradient Boosting | [Python](https://scikit-learn.org/stable/modules/ensemble.html#boosting) |
| [XGBoost](https://en.wikipedia.org/wiki/Xgboost) | - Scalable and efficient - Regularization | - Requires careful tuning - Can overfit if not used correctly | - Competitions on Kaggle - Retail prediction | [Python](https://xgboost.readthedocs.io/en/latest/) |
| [Linear Discriminant Analysis (LDA)](https://en.wikipedia.org/wiki/Linear_discriminant_analysis) | - Dimensionality reduction - Simple and interpretable | - Assumes Gaussian distributed data and equal class covariances | - Face recognition - Marketing segmentation | [Python](https://scikit-learn.org/stable/modules/lda_qda.html) |
| [Regularized Models (Shrinkage)](https://en.wikipedia.org/wiki/Regularization_(mathematics)) | - Prevents overfitting - Handles collinearity | - Requires parameter tuning - May result in loss of interpretability | - Ridge and Lasso regression | [Python](https://scikit-learn.org/stable/modules/linear_model.html#ridge-regression-and-classification) |
| [Stacking](https://en.wikipedia.org/wiki/Ensemble_learning#Stacking) | - Combines multiple models - Can improve accuracy | - Increases model complexity - Risk of overfitting if base models are correlated | - Meta-modeling - Kaggle competitions | [Python](https://scikit-learn.org/stable/modules/ensemble.html#stacked-generalization) |
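As a quick illustration of how two of the classifiers in the table above are used in practice, here is a minimal scikit-learn sketch fitting logistic regression and a random forest; the built-in breast-cancer dataset and the hyperparameters (`max_iter`, `n_estimators`) are arbitrary example choices, not recommendations.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Load a small built-in dataset and hold out a test split
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A simple, interpretable linear baseline
log_reg = LogisticRegression(max_iter=5000).fit(X_train, y_train)

# An ensemble that can model non-linear relationships
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

print("Logistic regression accuracy:", log_reg.score(X_test, y_test))
print("Random forest accuracy:      ", forest.score(X_test, y_test))
```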
| Model Type | Strengths | Limitations | Example Use Cases | Implementation |
|---|---|---|---|---|
| [Linear Regression](https://en.wikipedia.org/wiki/Linear_regression) | - Simple and interpretable | - Assumes linear relationship - Sensitive to outliers | - Sales forecasting - Risk assessment | [Python](https://scikit-learn.org/stable/modules/linear_model.html#ordinary-least-squares) |
| [Polynomial Regression](https://en.wikipedia.org/wiki/Polynomial_regression) | - Can model non-linear relationships | - Can overfit with high degrees | - Growth prediction - Non-linear trend modeling | [Python](https://scikit-learn.org/stable/modules/linear_model.html#polynomial-regression) |
| [Ridge Regression](https://en.wikipedia.org/wiki/Tikhonov_regularization) | - Prevents overfitting - Regularizes the model | - Does not perform feature selection | - High-dimensional data - Preventing overfitting | [Python](https://scikit-learn.org/stable/modules/linear_model.html#ridge-regression-and-classification) |
| [Lasso Regression](https://en.wikipedia.org/wiki/Lasso_(statistics)) | - Feature selection - Regularizes the model | - May exclude useful variables | - Feature selection - High-dimensional datasets | [Python](https://scikit-learn.org/stable/modules/linear_model.html#lasso) |
| [Elastic Net Regression](https://en.wikipedia.org/wiki/Elastic_net_regularization) | - Balance between Ridge and Lasso | - Requires tuning for mixing parameter | - High-dimensional datasets with correlated features | [Python](https://scikit-learn.org/stable/modules/linear_model.html#elastic-net) |
| [Quantile Regression](https://en.wikipedia.org/wiki/Quantile_regression) | - Models the median or other quantiles | - Less interpretable than ordinary regression | - Median house price prediction - Financial quantiles modeling | [Python](https://www.statsmodels.org/stable/quantile_regression.html) |
| [Support Vector Regression (SVR)](https://en.wikipedia.org/wiki/Support_vector_machine#Regression) | - Flexible - Can handle non-linear relationships | - Sensitive to kernel and hyperparameters | - Stock price prediction - Non-linear trend modeling | [Python](https://scikit-learn.org/stable/modules/svm.html#regression) |
| [Decision Tree Regression](https://en.wikipedia.org/wiki/Decision_tree_learning) | - Handles non-linear data - Interpretable | - Can overfit on noisy data | - Price prediction - Quality assessment | [Python](https://scikit-learn.org/stable/modules/tree.html#regression) |
| [Random Forest Regression](https://en.wikipedia.org/wiki/Random_forest) | - Handles large datasets - Reduces overfitting | - Requires more computational resources | - Large datasets - Environmental modeling | [Python](https://scikit-learn.org/stable/modules/ensemble.html#forest) |
| [Gradient Boosting Regression](https://en.wikipedia.org/wiki/Gradient_boosting) | - High performance - Can handle non-linear relationships | - Prone to overfitting if not tuned | - Web search ranking - Price prediction | [Python](https://scikit-learn.org/stable/modules/ensemble.html#gradient-boosting) |
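The short sketch below contrasts ordinary least squares with the Ridge and Lasso penalties from the regression table above; the synthetic data and the `alpha` values are made up purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

# Toy data: y depends linearly on two of the three features, plus noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 3))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=100)

# Ordinary least squares
ols = LinearRegression().fit(X, y)

# Ridge and Lasso add L2 / L1 penalties; alpha controls their strength
ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
print("Lasso coefficients:", lasso.coef_)
```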
| Model Type | Strengths | Limitations | Example Use Cases | Implementation |
|---|---|---|---|---|
| [K-Means Clustering](https://en.wikipedia.org/wiki/K-means_clustering) | - Simple and widely used - Fast for large datasets | - Sensitive to initial conditions - Requires specifying the number of clusters | - Market segmentation - Image compression | [Python](https://scikit-learn.org/stable/modules/clustering.html#k-means) |
| [Hierarchical Clustering](https://en.wikipedia.org/wiki/Hierarchical_clustering) | - Doesn't require specifying the number of clusters - Produces a dendrogram | - May be computationally expensive for large datasets | - Taxonomies - Determining evolutionary relationships | [Python](https://scikit-learn.org/stable/modules/clustering.html#hierarchical-clustering) |
| [DBSCAN (Density-Based Clustering)](https://en.wikipedia.org/wiki/DBSCAN) | - Can find arbitrarily shaped clusters - Doesn't require specifying the number of clusters | - Sensitive to scale - Requires density parameters to be set | - Noise detection and anomaly detection | [Python](https://scikit-learn.org/stable/modules/clustering.html#dbscan) |
| [Agglomerative Clustering](https://en.wikipedia.org/wiki/Hierarchical_clustering#Agglomerative_clustering_example) | - Variety of linkage criteria - Produces a hierarchy of clusters | - Not scalable for very large datasets | - Sociological hierarchies - Taxonomies | [Python](https://scikit-learn.org/stable/modules/clustering.html#hierarchical-clustering) |
| [Mean Shift Clustering](https://en.wikipedia.org/wiki/Mean_shift) | - No need to specify number of clusters - Can find arbitrarily shaped clusters | - Computationally expensive - Bandwidth parameter selection is crucial | - Image analysis - Computer vision tasks | [Python](https://scikit-learn.org/stable/modules/clustering.html#mean-shift) |
| [Affinity Propagation](https://en.wikipedia.org/wiki/Affinity_propagation) | - Automatically determines the number of clusters - Good for data with lots of exemplars | - High computational complexity - Preference parameter can be difficult to choose | - Image recognition - Data with many similar exemplars | [Python](https://scikit-learn.org/stable/modules/clustering.html#affinity-propagation) |
| [Spectral Clustering](https://en.wikipedia.org/wiki/Spectral_clustering) | - Can capture complex cluster structures - Can be used with various affinity matrices | - Choice of affinity matrix is crucial - Can be computationally expensive | - Image and speech processing - Graph-based clustering | [Python](https://scikit-learn.org/stable/modules/clustering.html#spectral-clustering) |
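For a concrete feel of the difference between a centroid-based and a density-based method from the table above, here is a minimal scikit-learn sketch of K-Means and DBSCAN; the synthetic blob data and the `eps` / `min_samples` values are illustrative assumptions only.

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, DBSCAN

# Synthetic 2-D data with three well-separated groups
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# K-Means needs the number of clusters up front
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# DBSCAN infers the number of clusters from density; eps is data-dependent
dbscan = DBSCAN(eps=0.8, min_samples=5).fit(X)

print("K-Means labels:", kmeans.labels_[:10])
print("DBSCAN labels: ", dbscan.labels_[:10])
```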
| Method | Strengths | Limitations | Example Use Cases | Implementation |
|---|---|---|---|---|
| [PCA](https://en.wikipedia.org/wiki/Principal_component_analysis) | - Dimensionality reduction - Preserves variance | - Linear method - Not for categorical data | - Feature extraction - Data compression | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html) |
| [t-SNE](https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding) | - Captures non-linear structures - Good for visualization | - Computationally expensive - Does not preserve global structure well | - Data visualization - Exploratory analysis | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html) |
| [Autoencoders](https://en.wikipedia.org/wiki/Autoencoder) | - Dimensionality reduction - Captures non-linear relationships | - Requires neural network expertise - Computationally intensive | - Feature learning - Noise reduction | [Python](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense) |
| [Isolation Forest](https://en.wikipedia.org/wiki/Isolation_forest) | - Effective for high-dimensional data - Fast and scalable | - Randomized - May miss some anomalies | - Fraud detection - Network security | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html) |
| [SVD](https://en.wikipedia.org/wiki/Singular_value_decomposition) | - Matrix factorization - Efficient for large datasets | - Assumes linear relationships - Sensitive to scaling | - Recommender systems - Latent semantic analysis | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD.html) |
| [ICA](https://en.wikipedia.org/wiki/Independent_component_analysis) | - Identifies independent components - Signal separation | - Requires components to be non-Gaussian - Sensitive to noise | - Blind signal separation - Feature extraction | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.FastICA.html) |
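Here is a minimal sketch of two of the reduction methods above, assuming scikit-learn: PCA for a linear projection and t-SNE for a non-linear 2-D embedding of the built-in digits data; `n_components=2` is just an example choice for visualization.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# 64-dimensional digit images
X, _ = load_digits(return_X_y=True)

# PCA: linear projection that keeps the directions of largest variance
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# t-SNE: non-linear embedding, mainly used for visualization
tsne = TSNE(n_components=2, random_state=0)
X_tsne = tsne.fit_transform(X)

print("Explained variance ratio:", pca.explained_variance_ratio_)
print("t-SNE embedding shape:   ", X_tsne.shape)
```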
| Method | Strengths | Limitations | Example Use Cases | Implementation |
|---|---|---|---|---|
| [Apriori Algorithm](https://en.wikipedia.org/wiki/Apriori_algorithm) | - Well-known and widely used - Easy to understand and implement | - Can be slow on large datasets - Generates a large number of candidate sets | - Market basket analysis - Cross-marketing strategies | [Python](http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/apriori/) |
| [FP-Growth Algorithm](https://en.wikipedia.org/wiki/Association_rule_learning#FP-growth_algorithm) | - Faster than Apriori - Efficient for large datasets | - Memory intensive - Can be complex to implement | - Frequent itemset mining in large databases - Customer purchase patterns | [Python](http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/fpgrowth/) |
| [Eclat Algorithm](https://en.wikipedia.org/wiki/Eclat_algorithm) | - Faster than Apriori - Scalable and easy to parallelize | - Limited to binary attributes - Generates many candidate itemsets | - Market basket analysis - Frequent itemset mining on transaction data | [Python](https://github.com/tommyod/Efficient-Apriori) |
| [GSP (Generalized Sequential Pattern)](https://en.wikipedia.org/wiki/Sequential_Pattern_Mining#GSP_(Generalized_Sequential_Pattern)_Algorithm) | - Identifies sequential patterns - Flexible for various datasets | - Can be computationally expensive - Not as efficient for very large databases | - Customer purchase sequence analysis - Event sequence analysis | [Python](https://github.com/jacksonpradolima/gsp-py) |
| [RuleGrowth Algorithm](https://www.philippe-fournier-viger.com/spmf/rulegrowth.pdf) | - Efficient for mining sequential rules - Works well with sparse datasets | - Requires careful parameter setting - Less known and used than Apriori or FP-Growth | - Analyzing customer shopping sequences - Detecting patterns in web browsing data | [Python](https://pypi.org/project/spmf/) |
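The sketch below shows frequent-itemset mining and rule generation with the `mlxtend` package linked in the Apriori and FP-Growth rows above; the toy baskets and the `min_support` / `min_threshold` values are illustrative assumptions, and the code assumes `mlxtend` and `pandas` are installed.

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Toy transactions: one list of items per basket
transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]

# One-hot encode the baskets into a boolean DataFrame
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

# Frequent itemsets with the Apriori algorithm, then derive association rules
frequent = apriori(onehot, min_support=0.4, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```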
| Technique | Strengths | Limitations | Example Use Cases | Implementation |
|---|---|---|---|---|
| [Accuracy](https://en.wikipedia.org/wiki/Accuracy_and_precision#In_binary_classification) | - Simple and intuitive - Effective for balanced datasets | - Misleading for imbalanced datasets - Doesn't reflect true positives/negatives | - General classification problems - Comparing baseline models | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html) |
| [AUC-ROC](https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve) | - Effective for binary classification - Good for imbalanced datasets | - Can be overly optimistic in imbalanced data - Not threshold-specific | - Medical diagnosis classification - Fraud detection models | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html) |
| [Precision](https://en.wikipedia.org/wiki/Precision_and_recall#Precision) | - Focuses on positive class - Reduces false positives | - Ignores false negatives - Not useful alone in imbalanced datasets | - Spam detection - Content moderation systems | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_score.html) |
| [Recall](https://en.wikipedia.org/wiki/Precision_and_recall#Recall) | - Identifies actual positives well - Minimizes false negatives | - Ignores false positives - Can be misleading if positives are rare | - Disease outbreak detection - Recall-focused tasks | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.recall_score.html) |
| [F1-Score](https://en.wikipedia.org/wiki/F-score) | - Balances precision and recall - Useful for imbalanced datasets | - May not reflect true model performance - Depends on balance of precision and recall | - Customer churn prediction - Sentiment analysis | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html) |
| [Cross-Validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics)) | - Reduces overfitting - Provides robust model evaluation | - Computationally expensive - May not be ideal for very large datasets | - General model evaluation - Comparing multiple models | [Python](https://scikit-learn.org/stable/modules/cross_validation.html) |
| [The Validation Set Approach](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets) | - Simple and easy to implement - Good for initial model assessment | - Can lead to overfitting - Dependent on the split | - Quick model prototyping - Small datasets | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) |
| [Leave-One-Out Cross-Validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics)#Leave-one-out_cross-validation) | - Very detailed - Each observation used for validation exactly once | - Computationally intensive - Not suitable for large datasets | - Small but rich datasets - Highly sensitive models | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.LeaveOneOut.html) |
| [k-Fold Cross-Validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics)#k-fold_cross-validation) | - Balances computational cost and validation accuracy - Suitable for various data sizes | - Variability in results depending on how data is divided - Choice of 'k' can impact results | - Medium-sized datasets - Model selection | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html) |
| [The Bootstrap Method](https://en.wikipedia.org/wiki/Bootstrapping_(statistics)) | - Good for estimating model accuracy - Effective for small datasets | - Results can be sensitive to outliers - May overestimate accuracy for small datasets | - Small or medium-sized datasets - Uncertainty estimation | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.utils.resample.html) |
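To tie the metrics and validation strategies above together, here is a minimal scikit-learn sketch computing accuracy, precision, recall, and F1 on a held-out split, plus 5-fold cross-validation; the dataset and model choice are arbitrary example assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit a model and evaluate on the held-out split (validation set approach)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
y_pred = model.predict(X_test)

print("Accuracy: ", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:   ", recall_score(y_test, y_pred))
print("F1-score: ", f1_score(y_test, y_pred))

# 5-fold cross-validation on the full dataset for a more robust estimate
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5)
print("Cross-validated accuracy:", scores.mean())
```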