| Method | Strengths | Limitations | Example Use Cases | Implementation |
|---|---|---|---|---|
| [Simple Fill](https://en.wikipedia.org/wiki/Imputation_(statistics)#Mean_and_median_imputation) | - Simple and fast - Works well with small datasets | - May not handle complex data relationships - Sensitive to outliers | - Basic data analysis - Quick data cleaning | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html#sklearn.impute.SimpleImputer) |
| [KNN Imputation](https://machinelearningmastery.com/knn-imputation-for-missing-values-in-machine-learning/) | - Can capture the relationships between features - Works well with moderately missing data | - Computationally intensive for large datasets - Sensitive to the choice of k | - Medical data analysis - Market research | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.impute.KNNImputer.html) |
| [Soft Impute](https://en.wikipedia.org/wiki/Matrix_completion) | - Effective for matrix completion in large datasets - Works well with low-rank data | - Assumes low-rank data structure - Can be sensitive to hyperparameters | - Recommender systems - Large-scale data projects | [Python](https://github.com/iskandr/fancyimpute) |
| [Iterative Imputer](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3074241/) | - Can model complex relationships - Suitable for multiple imputation | - Computationally expensive - Depends on the choice of model | - Complex datasets with multiple types of missing data | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.impute.IterativeImputer.html) |
| [Iterative SVD](https://pubmed.ncbi.nlm.nih.gov/11395428/) | - Good for matrix completion with low-rank assumption - Handles larger datasets | - Sensitive to rank selection - Computationally demanding | - Image and video data processing - Large datasets with structure | [Python](https://github.com/iskandr/fancyimpute) |
| [Matrix Factorization](https://en.wikipedia.org/wiki/Matrix_decomposition) | - Useful for recommendation systems - Can handle large-scale problems | - Requires careful tuning - Not suitable for all types of data | - Recommendation engines - User preference analysis | [Python](https://github.com/iskandr/fancyimpute) |
| [Nuclear Norm Minimization](https://arxiv.org/abs/0805.4471) | - Theoretically strong for matrix completion - Finds the lowest rank solution | - Very computationally intensive - Impractical for very large datasets | - Research in theoretical data completion - Small to medium datasets | [Python](https://github.com/iskandr/fancyimpute) |
| [BiScaler](https://arxiv.org/abs/1410.2596) | - Normalizes data effectively - Often used as a preprocessing step | - Not an imputation method itself - Doesn't always converge | - Preprocessing for other imputation methods - Data normalization | [Python](https://github.com/iskandr/fancyimpute) |
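To make the first two rows concrete, here is a minimal sketch using scikit-learn's `SimpleImputer` and `KNNImputer` from the implementation links above; the toy matrix, the mean strategy, and `n_neighbors=2` are illustrative choices, not recommendations.

```python
import numpy as np
from sklearn.impute import SimpleImputer, KNNImputer

# A small matrix with missing values (np.nan marks the gaps)
X = np.array([[1.0, 2.0, np.nan],
              [3.0, np.nan, 6.0],
              [7.0, 8.0, 9.0],
              [np.nan, 5.0, 4.0]])

# Simple Fill: replace each missing entry with its column mean
mean_imputer = SimpleImputer(strategy="mean")
X_mean = mean_imputer.fit_transform(X)

# KNN Imputation: fill each gap using the 2 most similar rows
knn_imputer = KNNImputer(n_neighbors=2)
X_knn = knn_imputer.fit_transform(X)

print(X_mean)
print(X_knn)
```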
Summary table of models + methods
Introduction
Throughout the course, we will go over several supervised and unsupervised machine learning models, along with related methods for data preparation and evaluation. This page summarizes them.
| Model Type | Strengths | Limitations | Example Use Cases | Implementation |
|---|---|---|---|---|
| [Logistic Regression](https://en.wikipedia.org/wiki/Logistic_regression) | - Simple and interpretable - Fast to train | - Assumes linear boundaries - Not suitable for complex relationships | - Credit approval - Medical diagnosis | [Python](https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression) |
| [Decision Trees](https://en.wikipedia.org/wiki/Decision_tree_learning) | - Intuitive - Can model non-linear relationships | - Prone to overfitting - Sensitive to small changes in data | - Customer segmentation - Loan default prediction | [Python](https://scikit-learn.org/stable/modules/tree.html) |
| [Random Forest](https://en.wikipedia.org/wiki/Random_forest) | - Handles overfitting - Can model complex relationships | - Slower to train and predict - Black box model | - Fraud detection - Stock price movement prediction | [Python](https://scikit-learn.org/stable/modules/ensemble.html#forest) |
| [Support Vector Machines (SVM)](https://en.wikipedia.org/wiki/Support_vector_machine) | - Effective in high dimensional spaces - Works well with clear margin of separation | - Sensitive to kernel choice - Slow on large datasets | - Image classification - Handwriting recognition | [Python](https://scikit-learn.org/stable/modules/svm.html) |
| [K-Nearest Neighbors (KNN)](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm) | - Simple and intuitive - No training phase | - Slow during query phase - Sensitive to irrelevant features and scale | - Product recommendation - Document classification | [Python](https://scikit-learn.org/stable/modules/neighbors.html) |
| [Neural Networks](https://en.wikipedia.org/wiki/Artificial_neural_network) | - Capable of approximating complex functions - Flexible architecture - Trainable with backpropagation | - Can require a large number of parameters - Prone to overfitting on small data - Training can be slow | - Pattern recognition - Basic image classification - Function approximation | [Python](https://pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html) |
| [Deep Learning](https://en.wikipedia.org/wiki/Deep_learning) | - Can model highly complex relationships - Excels with vast amounts of data - State-of-the-art results in many domains | - Requires a lot of data - Computationally intensive - Interpretability challenges | - Advanced image and speech recognition - Machine translation - Game playing (like AlphaGo) | [Python](https://pytorch.org/tutorials/beginner/nlp/deep_learning_tutorial.html) |
| [Naive Bayes](https://en.wikipedia.org/wiki/Naive_Bayes_classifier) | - Fast - Works well with large feature sets | - Assumes feature independence - Performs poorly when features are strongly correlated | - Spam detection - Sentiment analysis | [Python](https://scikit-learn.org/stable/modules/naive_bayes.html) |
| [Gradient Boosting Machines (GBM)](https://en.wikipedia.org/wiki/Gradient_boosting) | - High performance - Handles non-linear relationships | - Prone to overfitting if not tuned - Slow to train | - Web search ranking - Ecology predictions | [Python](https://scikit-learn.org/stable/modules/ensemble.html#gradient-tree-boosting) |
| [Rule-Based Classification](https://en.wikipedia.org/wiki/Rule-based_machine_learning) | - Transparent and explainable - Easily updated and modified | - Manual rule creation can be tedious - May not capture complex relationships | - Expert systems - Business rule enforcement | [Python](https://www.geeksforgeeks.org/rule-based-classifier-machine-learning/) |
| [Bagging](https://en.wikipedia.org/wiki/Bootstrap_aggregating) | - Reduces variance - Parallelizable | - May not handle bias well | - Random Forest is a popular example | [Python](https://scikit-learn.org/stable/modules/ensemble.html#bagging-meta-estimator) |
| [Boosting](https://en.wikipedia.org/wiki/Boosting_(machine_learning)) | - Reduces bias - Combines weak learners | - Sensitive to noisy data and outliers | - AdaBoost - Gradient Boosting | [Python](https://scikit-learn.org/stable/modules/ensemble.html#boosting) |
| [XGBoost](https://en.wikipedia.org/wiki/Xgboost) | - Scalable and efficient - Regularization | - Requires careful tuning - Can overfit if not used correctly | - Competitions on Kaggle - Retail prediction | [Python](https://xgboost.readthedocs.io/en/latest/) |
| [Linear Discriminant Analysis (LDA)](https://en.wikipedia.org/wiki/Linear_discriminant_analysis) | - Dimensionality reduction - Simple and interpretable | - Assumes Gaussian distributed data and equal class covariances | - Face recognition - Marketing segmentation | [Python](https://scikit-learn.org/stable/modules/lda_qda.html) |
| [Regularized Models (Shrinkage)](https://en.wikipedia.org/wiki/Regularization_(mathematics)) | - Prevents overfitting - Handles collinearity | - Requires parameter tuning - May result in loss of interpretability | - Ridge and Lasso regression | [Python](https://scikit-learn.org/stable/modules/linear_model.html#ridge-regression-and-classification) |
| [Stacking](https://en.wikipedia.org/wiki/Ensemble_learning#Stacking) | - Combines multiple models - Can improve accuracy | - Increases model complexity - Risk of overfitting if base models are correlated | - Meta-modeling - Kaggle competitions | [Python](https://scikit-learn.org/stable/modules/ensemble.html#stacked-generalization) |
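As a quick illustration of how two of the classifiers in the table above are used in practice, here is a minimal scikit-learn sketch fitting logistic regression and a random forest; the built-in breast-cancer dataset and the hyperparameters (`max_iter`, `n_estimators`) are arbitrary example choices, not recommendations.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Load a small built-in dataset and hold out a test split
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A simple, interpretable linear baseline
log_reg = LogisticRegression(max_iter=5000).fit(X_train, y_train)

# An ensemble that can model non-linear relationships
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

print("Logistic regression accuracy:", log_reg.score(X_test, y_test))
print("Random forest accuracy:      ", forest.score(X_test, y_test))
```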
| Model Type | Strengths | Limitations | Example Use Cases | Implementation |
|---|---|---|---|---|
| [Linear Regression](https://en.wikipedia.org/wiki/Linear_regression) | - Simple and interpretable | - Assumes linear relationship - Sensitive to outliers | - Sales forecasting - Risk assessment | [Python](https://scikit-learn.org/stable/modules/linear_model.html#ordinary-least-squares) |
| [Polynomial Regression](https://en.wikipedia.org/wiki/Polynomial_regression) | - Can model non-linear relationships | - Can overfit with high degrees | - Growth prediction - Non-linear trend modeling | [Python](https://scikit-learn.org/stable/modules/linear_model.html#polynomial-regression) |
| [Ridge Regression](https://en.wikipedia.org/wiki/Tikhonov_regularization) | - Prevents overfitting - Regularizes the model | - Does not perform feature selection | - High-dimensional data - Preventing overfitting | [Python](https://scikit-learn.org/stable/modules/linear_model.html#ridge-regression-and-classification) |
| [Lasso Regression](https://en.wikipedia.org/wiki/Lasso_(statistics)) | - Feature selection - Regularizes the model | - May exclude useful variables | - Feature selection - High-dimensional datasets | [Python](https://scikit-learn.org/stable/modules/linear_model.html#lasso) |
| [Elastic Net Regression](https://en.wikipedia.org/wiki/Elastic_net_regularization) | - Balance between Ridge and Lasso | - Requires tuning for mixing parameter | - High-dimensional datasets with correlated features | [Python](https://scikit-learn.org/stable/modules/linear_model.html#elastic-net) |
| [Quantile Regression](https://en.wikipedia.org/wiki/Quantile_regression) | - Models the median or other quantiles | - Less interpretable than ordinary regression | - Median house price prediction - Financial quantiles modeling | [Python](https://www.statsmodels.org/stable/quantile_regression.html) |
| [Support Vector Regression (SVR)](https://en.wikipedia.org/wiki/Support_vector_machine#Regression) | - Flexible - Can handle non-linear relationships | - Sensitive to kernel and hyperparameters | - Stock price prediction - Non-linear trend modeling | [Python](https://scikit-learn.org/stable/modules/svm.html#regression) |
| [Decision Tree Regression](https://en.wikipedia.org/wiki/Decision_tree_learning) | - Handles non-linear data - Interpretable | - Can overfit on noisy data | - Price prediction - Quality assessment | [Python](https://scikit-learn.org/stable/modules/tree.html#regression) |
| [Random Forest Regression](https://en.wikipedia.org/wiki/Random_forest) | - Handles large datasets - Reduces overfitting | - Requires more computational resources | - Large datasets - Environmental modeling | [Python](https://scikit-learn.org/stable/modules/ensemble.html#forest) |
| [Gradient Boosting Regression](https://en.wikipedia.org/wiki/Gradient_boosting) | - High performance - Can handle non-linear relationships | - Prone to overfitting if not tuned | - Web search ranking - Price prediction | [Python](https://scikit-learn.org/stable/modules/ensemble.html#gradient-boosting) |
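The short sketch below contrasts ordinary least squares with the Ridge and Lasso penalties from the regression table above; the synthetic data and the `alpha` values are made up purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

# Toy data: y depends linearly on two of the three features, plus noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 3))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=100)

# Ordinary least squares
ols = LinearRegression().fit(X, y)

# Ridge and Lasso add L2 / L1 penalties; alpha controls their strength
ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
print("Lasso coefficients:", lasso.coef_)
```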
| Model Type | Strengths | Limitations | Example Use Cases | Implementation |
|---|---|---|---|---|
| [K-Means Clustering](https://en.wikipedia.org/wiki/K-means_clustering) | - Simple and widely used - Fast for large datasets | - Sensitive to initial conditions - Requires specifying the number of clusters | - Market segmentation - Image compression | [Python](https://scikit-learn.org/stable/modules/clustering.html#k-means) |
| [Hierarchical Clustering](https://en.wikipedia.org/wiki/Hierarchical_clustering) | - Doesn't require specifying the number of clusters - Produces a dendrogram | - May be computationally expensive for large datasets | - Taxonomies - Determining evolutionary relationships | [Python](https://scikit-learn.org/stable/modules/clustering.html#hierarchical-clustering) |
| [DBSCAN (Density-Based Clustering)](https://en.wikipedia.org/wiki/DBSCAN) | - Can find arbitrarily shaped clusters - Doesn't require specifying the number of clusters | - Sensitive to scale - Requires density parameters to be set | - Noise detection and anomaly detection | [Python](https://scikit-learn.org/stable/modules/clustering.html#dbscan) |
| [Agglomerative Clustering](https://en.wikipedia.org/wiki/Hierarchical_clustering#Agglomerative_clustering_example) | - Variety of linkage criteria - Produces a hierarchy of clusters | - Not scalable for very large datasets | - Sociological hierarchies - Taxonomies | [Python](https://scikit-learn.org/stable/modules/clustering.html#hierarchical-clustering) |
| [Mean Shift Clustering](https://en.wikipedia.org/wiki/Mean_shift) | - No need to specify number of clusters - Can find arbitrarily shaped clusters | - Computationally expensive - Bandwidth parameter selection is crucial | - Image analysis - Computer vision tasks | [Python](https://scikit-learn.org/stable/modules/clustering.html#mean-shift) |
| [Affinity Propagation](https://en.wikipedia.org/wiki/Affinity_propagation) | - Automatically determines the number of clusters - Good for data with lots of exemplars | - High computational complexity - Preference parameter can be difficult to choose | - Image recognition - Data with many similar exemplars | [Python](https://scikit-learn.org/stable/modules/clustering.html#affinity-propagation) |
| [Spectral Clustering](https://en.wikipedia.org/wiki/Spectral_clustering) | - Can capture complex cluster structures - Can be used with various affinity matrices | - Choice of affinity matrix is crucial - Can be computationally expensive | - Image and speech processing - Graph-based clustering | [Python](https://scikit-learn.org/stable/modules/clustering.html#spectral-clustering) |
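For a concrete feel of the difference between a centroid-based and a density-based method from the table above, here is a minimal scikit-learn sketch of K-Means and DBSCAN; the synthetic blob data and the `eps` / `min_samples` values are illustrative assumptions only.

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, DBSCAN

# Synthetic 2-D data with three well-separated groups
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# K-Means needs the number of clusters up front
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# DBSCAN infers the number of clusters from density; eps is data-dependent
dbscan = DBSCAN(eps=0.8, min_samples=5).fit(X)

print("K-Means labels:", kmeans.labels_[:10])
print("DBSCAN labels: ", dbscan.labels_[:10])
```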
| Method | Strengths | Limitations | Example Use Cases | Implementation |
|---|---|---|---|---|
| [PCA](https://en.wikipedia.org/wiki/Principal_component_analysis) | - Dimensionality reduction - Preserves variance | - Linear method - Not for categorical data | - Feature extraction - Data compression | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html) |
| [t-SNE](https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding) | - Captures non-linear structures - Good for visualization | - Computationally expensive - Does not preserve global structure well | - Data visualization - Exploratory analysis | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html) |
| [Autoencoders](https://en.wikipedia.org/wiki/Autoencoder) | - Dimensionality reduction - Captures non-linear relationships | - Requires neural network expertise - Computationally intensive | - Feature learning - Noise reduction | [Python](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense) |
| [Isolation Forest](https://en.wikipedia.org/wiki/Isolation_forest) | - Effective for high-dimensional data - Fast and scalable | - Randomized - May miss some anomalies | - Fraud detection - Network security | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html) |
| [SVD](https://en.wikipedia.org/wiki/Singular_value_decomposition) | - Matrix factorization - Efficient for large datasets | - Assumes linear relationships - Sensitive to scaling | - Recommender systems - Latent semantic analysis | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD.html) |
| [ICA](https://en.wikipedia.org/wiki/Independent_component_analysis) | - Identifies independent components - Signal separation | - Requires components to be non-Gaussian - Sensitive to noise | - Blind signal separation - Feature extraction | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.FastICA.html) |
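Here is a minimal sketch of two of the reduction methods above, assuming scikit-learn: PCA for a linear projection and t-SNE for a non-linear 2-D embedding of the built-in digits data; `n_components=2` is just an example choice for visualization.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# 64-dimensional digit images
X, _ = load_digits(return_X_y=True)

# PCA: linear projection that keeps the directions of largest variance
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# t-SNE: non-linear embedding, mainly used for visualization
tsne = TSNE(n_components=2, random_state=0)
X_tsne = tsne.fit_transform(X)

print("Explained variance ratio:", pca.explained_variance_ratio_)
print("t-SNE embedding shape:   ", X_tsne.shape)
```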
| Method | Strengths | Limitations | Example Use Cases | Implementation |
|---|---|---|---|---|
| [Apriori Algorithm](https://en.wikipedia.org/wiki/Apriori_algorithm) | - Well-known and widely used - Easy to understand and implement | - Can be slow on large datasets - Generates a large number of candidate sets | - Market basket analysis - Cross-marketing strategies | [Python](http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/apriori/) |
| [FP-Growth Algorithm](https://en.wikipedia.org/wiki/Association_rule_learning#FP-growth_algorithm) | - Faster than Apriori - Efficient for large datasets | - Memory intensive - Can be complex to implement | - Frequent itemset mining in large databases - Customer purchase patterns | [Python](http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/fpgrowth/) |
| [Eclat Algorithm](https://en.wikipedia.org/wiki/Eclat_algorithm) | - Faster than Apriori - Scalable and easy to parallelize | - Limited to binary attributes - Generates many candidate itemsets | - Market basket analysis - Frequent itemset mining on transaction data | [Python](https://github.com/tommyod/Efficient-Apriori) |
| [GSP (Generalized Sequential Pattern)](https://en.wikipedia.org/wiki/Sequential_Pattern_Mining#GSP_(Generalized_Sequential_Pattern)_Algorithm) | - Identifies sequential patterns - Flexible for various datasets | - Can be computationally expensive - Not as efficient for very large databases | - Customer purchase sequence analysis - Event sequence analysis | [Python](https://github.com/jacksonpradolima/gsp-py) |
| [RuleGrowth Algorithm](https://www.philippe-fournier-viger.com/spmf/rulegrowth.pdf) | - Efficient for mining sequential rules - Works well with sparse datasets | - Requires careful parameter setting - Less known and used than Apriori or FP-Growth | - Analyzing customer shopping sequences - Detecting patterns in web browsing data | [Python](https://pypi.org/project/spmf/) |
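The sketch below shows frequent-itemset mining and rule generation with the `mlxtend` package linked in the Apriori and FP-Growth rows above; the toy baskets and the `min_support` / `min_threshold` values are illustrative assumptions, and the code assumes `mlxtend` and `pandas` are installed.

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Toy transactions: one list of items per basket
transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]

# One-hot encode the baskets into a boolean DataFrame
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

# Frequent itemsets with the Apriori algorithm, then derive association rules
frequent = apriori(onehot, min_support=0.4, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```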
| Technique | Strengths | Limitations | Example Use Cases | Implementation |
|---|---|---|---|---|
| [Accuracy](https://en.wikipedia.org/wiki/Accuracy_and_precision#In_binary_classification) | - Simple and intuitive - Effective for balanced datasets | - Misleading for imbalanced datasets - Doesn't reflect true positives/negatives | - General classification problems - Comparing baseline models | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html) |
| [AUC-ROC](https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve) | - Effective for binary classification - Good for imbalanced datasets | - Can be overly optimistic in imbalanced data - Not threshold-specific | - Medical diagnosis classification - Fraud detection models | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html) |
| [Precision](https://en.wikipedia.org/wiki/Precision_and_recall#Precision) | - Focuses on positive class - Reduces false positives | - Ignores false negatives - Not useful alone in imbalanced datasets | - Spam detection - Content moderation systems | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_score.html) |
| [Recall](https://en.wikipedia.org/wiki/Precision_and_recall#Recall) | - Identifies actual positives well - Minimizes false negatives | - Ignores false positives - Can be misleading if positives are rare | - Disease outbreak detection - Recall-focused tasks | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.recall_score.html) |
| [F1-Score](https://en.wikipedia.org/wiki/F-score) | - Balances precision and recall - Useful for imbalanced datasets | - May not reflect true model performance - Depends on balance of precision and recall | - Customer churn prediction - Sentiment analysis | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html) |
| [Cross-Validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics)) | - Reduces overfitting - Provides robust model evaluation | - Computationally expensive - May not be ideal for very large datasets | - General model evaluation - Comparing multiple models | [Python](https://scikit-learn.org/stable/modules/cross_validation.html) |
| [The Validation Set Approach](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets) | - Simple and easy to implement - Good for initial model assessment | - Can lead to overfitting - Dependent on the split | - Quick model prototyping - Small datasets | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) |
| [Leave-One-Out Cross-Validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics)#Leave-one-out_cross-validation) | - Very detailed - Each observation used for validation exactly once | - Computationally intensive - Not suitable for large datasets | - Small but rich datasets - Highly sensitive models | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.LeaveOneOut.html) |
| [k-Fold Cross-Validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics)#k-fold_cross-validation) | - Balances computational cost and validation accuracy - Suitable for various data sizes | - Variability in results depending on how data is divided - Choice of 'k' can impact results | - Medium-sized datasets - Model selection | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html) |
| [The Bootstrap Method](https://en.wikipedia.org/wiki/Bootstrapping_(statistics)) | - Good for estimating model accuracy - Effective for small datasets | - Results can be sensitive to outliers - May overestimate accuracy for small datasets | - Small or medium-sized datasets - Uncertainty estimation | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.utils.resample.html) |
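To tie the metrics and validation strategies above together, here is a minimal scikit-learn sketch computing accuracy, precision, recall, and F1 on a held-out split, plus 5-fold cross-validation; the dataset and model choice are arbitrary example assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit a model and evaluate on the held-out split (validation set approach)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
y_pred = model.predict(X_test)

print("Accuracy: ", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:   ", recall_score(y_test, y_pred))
print("F1-score: ", f1_score(y_test, y_pred))

# 5-fold cross-validation on the full dataset for a more robust estimate
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5)
print("Cross-validated accuracy:", scores.mean())
```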