Summary table of models + methods

Introduction

Throughout the course, we will go over several supervised and unsupervised machine learning methods. This page summarizes them in a series of tables covering imputation, classification, regression, clustering, dimensionality reduction and anomaly detection, association rule mining, and model evaluation. Each table lists strengths, limitations, example use cases, and a link to a Python implementation, and a short illustrative code sketch follows each table.

| Method | Strengths | Limitations | Example Use Cases | Implementation |
|---|---|---|---|---|
| [Simple Fill](https://en.wikipedia.org/wiki/Imputation_(statistics)#Mean_and_median_imputation) | - Simple and fast<br>- Works well with small datasets | - May not handle complex data relationships<br>- Sensitive to outliers | - Basic data analysis<br>- Quick data cleaning | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html#sklearn.impute.SimpleImputer) |
| [KNN Imputation](https://machinelearningmastery.com/knn-imputation-for-missing-values-in-machine-learning/) | - Can capture the relationships between features<br>- Works well with moderately missing data | - Computationally intensive for large datasets<br>- Sensitive to the choice of k | - Medical data analysis<br>- Market research | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.impute.KNNImputer.html) |
| [Soft Impute](https://en.wikipedia.org/wiki/Matrix_completion) | - Effective for matrix completion in large datasets<br>- Works well with low-rank data | - Assumes low-rank data structure<br>- Can be sensitive to hyperparameters | - Recommender systems<br>- Large-scale data projects | [Python](https://github.com/iskandr/fancyimpute) |
| [Iterative Imputer](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3074241/) | - Can model complex relationships<br>- Suitable for multiple imputation | - Computationally expensive<br>- Depends on the choice of model | - Complex datasets with multiple types of missing data | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.impute.IterativeImputer.html) |
| [Iterative SVD](https://pubmed.ncbi.nlm.nih.gov/11395428/) | - Good for matrix completion with low-rank assumption<br>- Handles larger datasets | - Sensitive to rank selection<br>- Computationally demanding | - Image and video data processing<br>- Large datasets with structure | [Python](https://github.com/iskandr/fancyimpute) |
| [Matrix Factorization](https://en.wikipedia.org/wiki/Matrix_decomposition) | - Useful for recommendation systems<br>- Can handle large-scale problems | - Requires careful tuning<br>- Not suitable for all types of data | - Recommendation engines<br>- User preference analysis | [Python](https://github.com/iskandr/fancyimpute) |
| [Nuclear Norm Minimization](https://arxiv.org/abs/0805.4471) | - Theoretically strong for matrix completion<br>- Finds the lowest rank solution | - Very computationally intensive<br>- Impractical for very large datasets | - Research in theoretical data completion<br>- Small to medium datasets | [Python](https://github.com/iskandr/fancyimpute) |
| [BiScaler](https://arxiv.org/abs/1410.2596) | - Normalizes data effectively<br>- Often used as a preprocessing step | - Not an imputation method itself<br>- Doesn't always converge | - Preprocessing for other imputation methods<br>- Data normalization | [Python](https://github.com/iskandr/fancyimpute) |
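
A minimal sketch of how two of the imputation methods above are typically called in scikit-learn. The toy matrix and the choice of `n_neighbors=2` are illustrative, not settings recommended by the course:

```python
import numpy as np
from sklearn.impute import SimpleImputer, KNNImputer

# Toy matrix with missing entries (np.nan marks the gaps).
X = np.array([[1.0, 2.0, np.nan],
              [3.0, np.nan, 6.0],
              [7.0, 8.0, 9.0],
              [np.nan, 5.0, 4.0]])

# Simple fill: replace each missing value with its column mean.
X_mean = SimpleImputer(strategy="mean").fit_transform(X)

# KNN imputation: fill each gap using the k most similar rows.
X_knn = KNNImputer(n_neighbors=2).fit_transform(X)

print(X_mean)
print(X_knn)
```
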
| Model Type | Strengths | Limitations | Example Use Cases | Implementation |
|---|---|---|---|---|
| [Logistic Regression](https://en.wikipedia.org/wiki/Logistic_regression) | - Simple and interpretable<br>- Fast to train | - Assumes linear boundaries<br>- Not suitable for complex relationships | - Credit approval<br>- Medical diagnosis | [Python](https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression) |
| [Decision Trees](https://en.wikipedia.org/wiki/Decision_tree_learning) | - Intuitive<br>- Can model non-linear relationships | - Prone to overfitting<br>- Sensitive to small changes in data | - Customer segmentation<br>- Loan default prediction | [Python](https://scikit-learn.org/stable/modules/tree.html) |
| [Random Forest](https://en.wikipedia.org/wiki/Random_forest) | - Handles overfitting<br>- Can model complex relationships | - Slower to train and predict<br>- Black-box model | - Fraud detection<br>- Stock price movement prediction | [Python](https://scikit-learn.org/stable/modules/ensemble.html#forest) |
| [Support Vector Machines (SVM)](https://en.wikipedia.org/wiki/Support_vector_machine) | - Effective in high-dimensional spaces<br>- Works well with a clear margin of separation | - Sensitive to kernel choice<br>- Slow on large datasets | - Image classification<br>- Handwriting recognition | [Python](https://scikit-learn.org/stable/modules/svm.html) |
| [K-Nearest Neighbors (KNN)](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm) | - Simple and intuitive<br>- No training phase | - Slow during query phase<br>- Sensitive to irrelevant features and scale | - Product recommendation<br>- Document classification | [Python](https://scikit-learn.org/stable/modules/neighbors.html) |
| [Neural Networks](https://en.wikipedia.org/wiki/Artificial_neural_network) | - Capable of approximating complex functions<br>- Flexible architecture<br>- Trainable with backpropagation | - Can require a large number of parameters<br>- Prone to overfitting on small data<br>- Training can be slow | - Pattern recognition<br>- Basic image classification<br>- Function approximation | [Python](https://pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html) |
| [Deep Learning](https://en.wikipedia.org/wiki/Deep_learning) | - Can model highly complex relationships<br>- Excels with vast amounts of data<br>- State-of-the-art results in many domains | - Requires a lot of data<br>- Computationally intensive<br>- Interpretability challenges | - Advanced image and speech recognition<br>- Machine translation<br>- Game playing (like AlphaGo) | [Python](https://pytorch.org/tutorials/beginner/nlp/deep_learning_tutorial.html) |
| [Naive Bayes](https://en.wikipedia.org/wiki/Naive_Bayes_classifier) | - Fast<br>- Works well with large feature sets | - Assumes feature independence<br>- Continuous features require a distributional assumption (e.g., Gaussian) | - Spam detection<br>- Sentiment analysis | [Python](https://scikit-learn.org/stable/modules/naive_bayes.html) |
| [Gradient Boosting Machines (GBM)](https://en.wikipedia.org/wiki/Gradient_boosting) | - High performance<br>- Handles non-linear relationships | - Prone to overfitting if not tuned<br>- Slow to train | - Web search ranking<br>- Ecology predictions | [Python](https://scikit-learn.org/stable/modules/ensemble.html#gradient-tree-boosting) |
| [Rule-Based Classification](https://en.wikipedia.org/wiki/Rule-based_machine_learning) | - Transparent and explainable<br>- Easily updated and modified | - Manual rule creation can be tedious<br>- May not capture complex relationships | - Expert systems<br>- Business rule enforcement | [Python](https://www.geeksforgeeks.org/rule-based-classifier-machine-learning/) |
| [Bagging](https://en.wikipedia.org/wiki/Bootstrap_aggregating) | - Reduces variance<br>- Parallelizable | - May not handle bias well | - Random Forest is a popular example | [Python](https://scikit-learn.org/stable/modules/ensemble.html#bagging-meta-estimator) |
| [Boosting](https://en.wikipedia.org/wiki/Boosting_(machine_learning)) | - Reduces bias<br>- Combines weak learners | - Sensitive to noisy data and outliers | - AdaBoost<br>- Gradient Boosting | [Python](https://scikit-learn.org/stable/modules/ensemble.html#boosting) |
| [XGBoost](https://en.wikipedia.org/wiki/Xgboost) | - Scalable and efficient<br>- Built-in regularization | - Requires careful tuning<br>- Can overfit if not used correctly | - Competitions on Kaggle<br>- Retail prediction | [Python](https://xgboost.readthedocs.io/en/latest/) |
| [Linear Discriminant Analysis (LDA)](https://en.wikipedia.org/wiki/Linear_discriminant_analysis) | - Dimensionality reduction<br>- Simple and interpretable | - Assumes Gaussian-distributed data and equal class covariances | - Face recognition<br>- Marketing segmentation | [Python](https://scikit-learn.org/stable/modules/lda_qda.html) |
| [Regularized Models (Shrinking)](https://en.wikipedia.org/wiki/Regularization_(mathematics)) | - Prevents overfitting<br>- Handles collinearity | - Requires parameter tuning<br>- May result in loss of interpretability | - Ridge and Lasso regression | [Python](https://scikit-learn.org/stable/modules/linear_model.html#ridge-regression-and-classification) |
| [Stacking](https://en.wikipedia.org/wiki/Ensemble_learning#Stacking) | - Combines multiple models<br>- Can improve accuracy | - Increases model complexity<br>- Risk of overfitting if base models are correlated | - Meta-modeling<br>- Kaggle competitions | [Python](https://scikit-learn.org/stable/modules/ensemble.html#stacked-generalization) |
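
A minimal sketch of fitting two of the classifiers above with scikit-learn. The breast cancer dataset and the hyperparameters are arbitrary choices made for illustration:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Built-in binary classification dataset, split into train and test sets.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Logistic regression: a simple, interpretable linear baseline.
logreg = LogisticRegression(max_iter=5000).fit(X_train, y_train)

# Random forest: an ensemble of trees that can model non-linear relationships.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

print("logistic regression accuracy:", logreg.score(X_test, y_test))
print("random forest accuracy:      ", forest.score(X_test, y_test))
```
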
| Model Type | Strengths | Limitations | Example Use Cases | Implementation |
|---|---|---|---|---|
| [Linear Regression](https://en.wikipedia.org/wiki/Linear_regression) | - Simple and interpretable | - Assumes a linear relationship<br>- Sensitive to outliers | - Sales forecasting<br>- Risk assessment | [Python](https://scikit-learn.org/stable/modules/linear_model.html#ordinary-least-squares) |
| [Polynomial Regression](https://en.wikipedia.org/wiki/Polynomial_regression) | - Can model non-linear relationships | - Can overfit with high degrees | - Growth prediction<br>- Non-linear trend modeling | [Python](https://scikit-learn.org/stable/modules/linear_model.html#polynomial-regression) |
| [Ridge Regression](https://en.wikipedia.org/wiki/Tikhonov_regularization) | - Prevents overfitting<br>- Regularizes the model | - Does not perform feature selection | - High-dimensional data<br>- Preventing overfitting | [Python](https://scikit-learn.org/stable/modules/linear_model.html#ridge-regression-and-classification) |
| [Lasso Regression](https://en.wikipedia.org/wiki/Lasso_(statistics)) | - Performs feature selection<br>- Regularizes the model | - May exclude useful variables | - Feature selection<br>- High-dimensional datasets | [Python](https://scikit-learn.org/stable/modules/linear_model.html#lasso) |
| [Elastic Net Regression](https://en.wikipedia.org/wiki/Elastic_net_regularization) | - Balance between Ridge and Lasso | - Requires tuning of the mixing parameter | - High-dimensional datasets with correlated features | [Python](https://scikit-learn.org/stable/modules/linear_model.html#elastic-net) |
| [Quantile Regression](https://en.wikipedia.org/wiki/Quantile_regression) | - Models the median or other quantiles | - Less interpretable than ordinary regression | - Median house price prediction<br>- Financial quantile modeling | [Python](https://www.statsmodels.org/stable/quantile_regression.html) |
| [Support Vector Regression (SVR)](https://en.wikipedia.org/wiki/Support_vector_machine#Regression) | - Flexible<br>- Can handle non-linear relationships | - Sensitive to kernel and hyperparameters | - Stock price prediction<br>- Non-linear trend modeling | [Python](https://scikit-learn.org/stable/modules/svm.html#regression) |
| [Decision Tree Regression](https://en.wikipedia.org/wiki/Decision_tree_learning) | - Handles non-linear data<br>- Interpretable | - Can overfit on noisy data | - Price prediction<br>- Quality assessment | [Python](https://scikit-learn.org/stable/modules/tree.html#regression) |
| [Random Forest Regression](https://en.wikipedia.org/wiki/Random_forest) | - Handles large datasets<br>- Reduces overfitting | - Requires more computational resources | - Large datasets<br>- Environmental modeling | [Python](https://scikit-learn.org/stable/modules/ensemble.html#forest) |
| [Gradient Boosting Regression](https://en.wikipedia.org/wiki/Gradient_boosting) | - High performance<br>- Can handle non-linear relationships | - Prone to overfitting if not tuned | - Web search ranking<br>- Price prediction | [Python](https://scikit-learn.org/stable/modules/ensemble.html#gradient-boosting) |
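
A minimal sketch contrasting ordinary least squares with the Ridge and Lasso variants above on synthetic data. The coefficients, noise level, and penalty strengths (`alpha`) are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

# Synthetic data: y depends linearly on the first two features plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

ols = LinearRegression().fit(X, y)    # no penalty
ridge = Ridge(alpha=1.0).fit(X, y)    # L2 penalty shrinks coefficients
lasso = Lasso(alpha=0.1).fit(X, y)    # L1 penalty can zero some out entirely

print("OLS:  ", ols.coef_)
print("Ridge:", ridge.coef_)
print("Lasso:", lasso.coef_)
```
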
| Model Type | Strengths | Limitations | Example Use Cases | Implementation |
|---|---|---|---|---|
| [K-Means Clustering](https://en.wikipedia.org/wiki/K-means_clustering) | - Simple and widely used<br>- Fast for large datasets | - Sensitive to initial conditions<br>- Requires specifying the number of clusters | - Market segmentation<br>- Image compression | [Python](https://scikit-learn.org/stable/modules/clustering.html#k-means) |
| [Hierarchical Clustering](https://en.wikipedia.org/wiki/Hierarchical_clustering) | - Doesn't require specifying the number of clusters<br>- Produces a dendrogram | - May be computationally expensive for large datasets | - Taxonomies<br>- Determining evolutionary relationships | [Python](https://scikit-learn.org/stable/modules/clustering.html#hierarchical-clustering) |
| [DBSCAN (Density-Based Clustering)](https://en.wikipedia.org/wiki/DBSCAN) | - Can find arbitrarily shaped clusters<br>- Doesn't require specifying the number of clusters | - Sensitive to scale<br>- Requires density parameters to be set | - Noise detection and anomaly detection | [Python](https://scikit-learn.org/stable/modules/clustering.html#dbscan) |
| [Agglomerative Clustering](https://en.wikipedia.org/wiki/Hierarchical_clustering#Agglomerative_clustering_example) | - Variety of linkage criteria<br>- Produces a hierarchy of clusters | - Not scalable for very large datasets | - Sociological hierarchies<br>- Taxonomies | [Python](https://scikit-learn.org/stable/modules/clustering.html#hierarchical-clustering) |
| [Mean Shift Clustering](https://en.wikipedia.org/wiki/Mean_shift) | - No need to specify the number of clusters<br>- Can find arbitrarily shaped clusters | - Computationally expensive<br>- Bandwidth parameter selection is crucial | - Image analysis<br>- Computer vision tasks | [Python](https://scikit-learn.org/stable/modules/clustering.html#mean-shift) |
| [Affinity Propagation](https://en.wikipedia.org/wiki/Affinity_propagation) | - Automatically determines the number of clusters<br>- Good for data with lots of exemplars | - High computational complexity<br>- Preference parameter can be difficult to choose | - Image recognition<br>- Data with many similar exemplars | [Python](https://scikit-learn.org/stable/modules/clustering.html#affinity-propagation) |
| [Spectral Clustering](https://en.wikipedia.org/wiki/Spectral_clustering) | - Can capture complex cluster structures<br>- Can be used with various affinity matrices | - Choice of affinity matrix is crucial<br>- Can be computationally expensive | - Image and speech processing<br>- Graph-based clustering | [Python](https://scikit-learn.org/stable/modules/clustering.html#spectral-clustering) |
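
A minimal sketch of K-Means and DBSCAN from the table above on synthetic blobs. The `eps` and `min_samples` values are illustrative guesses rather than tuned settings:

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs

# Three synthetic, well-separated clusters of points.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# K-Means needs the number of clusters up front.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# DBSCAN infers the clusters from density parameters instead.
dbscan = DBSCAN(eps=0.8, min_samples=5).fit(X)

print("K-Means labels:", kmeans.labels_[:10])
print("DBSCAN labels: ", dbscan.labels_[:10])  # -1 marks points treated as noise
```
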
| Method | Strengths | Limitations | Example Use Cases | Implementation |
|---|---|---|---|---|
| [PCA](https://en.wikipedia.org/wiki/Principal_component_analysis) | - Dimensionality reduction<br>- Preserves variance | - Linear method<br>- Not for categorical data | - Feature extraction<br>- Data compression | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html) |
| [t-SNE](https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding) | - Captures non-linear structures<br>- Good for visualization | - Computationally expensive<br>- Struggles with very high-dimensional input (PCA preprocessing is often applied first) | - Data visualization<br>- Exploratory analysis | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html) |
| [Autoencoders](https://en.wikipedia.org/wiki/Autoencoder) | - Dimensionality reduction<br>- Captures non-linear relationships | - Requires neural network expertise<br>- Computationally intensive | - Feature learning<br>- Noise reduction | [Python](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense) |
| [Isolation Forest](https://en.wikipedia.org/wiki/Isolation_forest) | - Effective for high-dimensional data<br>- Fast and scalable | - Randomized, so results vary between runs<br>- May miss some anomalies | - Fraud detection<br>- Network security | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html) |
| [SVD](https://en.wikipedia.org/wiki/Singular_value_decomposition) | - Matrix factorization<br>- Efficient for large datasets | - Assumes linear relationships<br>- Sensitive to scaling | - Recommender systems<br>- Latent semantic analysis | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD.html) |
| [ICA](https://en.wikipedia.org/wiki/Independent_component_analysis) | - Identifies independent components<br>- Signal separation | - Assumes non-Gaussian source components<br>- Sensitive to noise | - Blind signal separation<br>- Feature extraction | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.FastICA.html) |
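
A minimal sketch of PCA and t-SNE from the table above, using scikit-learn's digits dataset purely as an example of high-dimensional input (the subset size and `random_state` are arbitrary):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# 64-dimensional digit images serve as the example input.
X, _ = load_digits(return_X_y=True)

# PCA: linear projection onto the two directions of largest variance.
X_pca = PCA(n_components=2).fit_transform(X)

# t-SNE: non-linear embedding for visualization (slower, so use a subset).
X_tsne = TSNE(n_components=2, random_state=0).fit_transform(X[:500])

print(X_pca.shape, X_tsne.shape)
```
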
| Method | Strengths | Limitations | Example Use Cases | Implementation |
|---|---|---|---|---|
| [Apriori Algorithm](https://en.wikipedia.org/wiki/Apriori_algorithm) | - Well-known and widely used<br>- Easy to understand and implement | - Can be slow on large datasets<br>- Generates a large number of candidate sets | - Market basket analysis<br>- Cross-marketing strategies | [Python](http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/apriori/) |
| [FP-Growth Algorithm](https://en.wikipedia.org/wiki/Association_rule_learning#FP-growth_algorithm) | - Faster than Apriori<br>- Efficient for large datasets | - Memory intensive<br>- Can be complex to implement | - Frequent itemset mining in large databases<br>- Customer purchase patterns | [Python](http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/fpgrowth/) |
| [Eclat Algorithm](https://en.wikipedia.org/wiki/Eclat_algorithm) | - Faster than Apriori<br>- Scalable and easy to parallelize | - Limited to binary attributes<br>- Generates many candidate itemsets | - Market basket analysis<br>- Binary classification tasks | [Python](https://github.com/tommyod/Efficient-Apriori) |
| [GSP (Generalized Sequential Pattern)](https://en.wikipedia.org/wiki/Sequential_Pattern_Mining#GSP_(Generalized_Sequential_Pattern)_Algorithm) | - Identifies sequential patterns<br>- Flexible for various datasets | - Can be computationally expensive<br>- Not as efficient for very large databases | - Customer purchase sequence analysis<br>- Event sequence analysis | [Python](https://github.com/jacksonpradolima/gsp-py) |
| [RuleGrowth Algorithm](https://www.philippe-fournier-viger.com/spmf/rulegrowth.pdf) | - Efficient for mining sequential rules<br>- Works well with sparse datasets | - Requires careful parameter setting<br>- Less known and used than Apriori or FP-Growth | - Analyzing customer shopping sequences<br>- Detecting patterns in web browsing data | [Python](https://pypi.org/project/spmf/) |
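
A minimal sketch of the Apriori workflow using mlxtend (the library linked in the table). The toy baskets and the support and confidence thresholds are invented for illustration:

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder

# Toy transactions: each inner list is one shopping basket.
baskets = [["milk", "bread", "butter"],
           ["bread", "butter"],
           ["milk", "bread"],
           ["milk", "butter"],
           ["bread", "butter", "jam"]]

# One-hot encode the baskets into a boolean DataFrame.
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(baskets).transform(baskets), columns=te.columns_)

# Frequent itemsets with support >= 0.4, then rules with confidence >= 0.7.
itemsets = apriori(onehot, min_support=0.4, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.7)

print(rules[["antecedents", "consequents", "support", "confidence"]])
```
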
| Technique | Strengths | Limitations | Example Use Cases | Implementation |
|---|---|---|---|---|
| [Accuracy](https://en.wikipedia.org/wiki/Accuracy_and_precision#In_binary_classification) | - Simple and intuitive<br>- Effective for balanced datasets | - Misleading for imbalanced datasets<br>- Doesn't reflect true positives/negatives | - General classification problems<br>- Comparing baseline models | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html) |
| [AUC-ROC](https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve) | - Effective for binary classification<br>- Good for imbalanced datasets | - Can be overly optimistic in imbalanced data<br>- Not threshold-specific | - Medical diagnosis classification<br>- Fraud detection models | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html) |
| [Precision](https://en.wikipedia.org/wiki/Precision_and_recall#Precision) | - Focuses on the positive class<br>- Reduces false positives | - Ignores false negatives<br>- Not useful alone in imbalanced datasets | - Spam detection<br>- Content moderation systems | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_score.html) |
| [Recall](https://en.wikipedia.org/wiki/Precision_and_recall#Recall) | - Identifies actual positives well<br>- Minimizes false negatives | - Ignores false positives<br>- Can be misleading if positives are rare | - Disease outbreak detection<br>- Recall-focused tasks | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.recall_score.html) |
| [F1-Score](https://en.wikipedia.org/wiki/F-score) | - Balances precision and recall<br>- Useful for imbalanced datasets | - May not reflect true model performance<br>- Depends on the balance of precision and recall | - Customer churn prediction<br>- Sentiment analysis | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html) |
| [Cross-Validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics)) | - Reduces overfitting<br>- Provides robust model evaluation | - Computationally expensive<br>- May not be ideal for very large datasets | - General model evaluation<br>- Comparing multiple models | [Python](https://scikit-learn.org/stable/modules/cross_validation.html) |
| [The Validation Set Approach](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets) | - Simple and easy to implement<br>- Good for initial model assessment | - Can lead to overfitting<br>- Dependent on the split | - Quick model prototyping<br>- Small datasets | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) |
| [Leave-One-Out Cross-Validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics)#Leave-one-out_cross-validation) | - Very detailed<br>- Each observation used for validation exactly once | - Computationally intensive<br>- Not suitable for large datasets | - Small but rich datasets<br>- Highly sensitive models | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.LeaveOneOut.html) |
| [k-Fold Cross-Validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics)#k-fold_cross-validation) | - Balances computational cost and validation accuracy<br>- Suitable for various data sizes | - Variability in results depending on how data is divided<br>- Choice of 'k' can impact results | - Medium-sized datasets<br>- Model selection | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html) |
| [The Bootstrap Method](https://en.wikipedia.org/wiki/Bootstrapping_(statistics)) | - Good for estimating model accuracy<br>- Effective for small datasets | - Results can be sensitive to outliers<br>- May overestimate accuracy for small datasets | - Small or medium-sized datasets<br>- Uncertainty estimation | [Python](https://scikit-learn.org/stable/modules/generated/sklearn.utils.resample.html) |
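
A minimal sketch computing several of the metrics above and a 5-fold cross-validation score with scikit-learn. The dataset and model are placeholders; any fitted classifier would do:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
pred = model.predict(X_test)               # hard class labels
proba = model.predict_proba(X_test)[:, 1]  # scores needed for AUC-ROC

# Single-split (validation set approach) metrics.
print("accuracy :", accuracy_score(y_test, pred))
print("precision:", precision_score(y_test, pred))
print("recall   :", recall_score(y_test, pred))
print("f1       :", f1_score(y_test, pred))
print("auc-roc  :", roc_auc_score(y_test, proba))

# 5-fold cross-validation gives a more robust estimate than a single split.
scores = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=5)
print("5-fold mean accuracy:", scores.mean())
```
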