Related courses:
- Embedded IA for IoT - S9 - Applying AI on embedded devices
- Stochastic Processes - S8 - Probabilistic foundations of ML
Machine Learning
PART A: GENERALITIES
Presentation
The "Machine Learning" course introduces the fundamental concepts of machine learning, enabling machines to learn from data without being explicitly programmed. This course covers classic supervised and unsupervised learning algorithms, as well as their practical implementation with Python and scikit-learn.
Academic Year: 2023-2024
Semester: 8
Category: Artificial Intelligence / Data Science
PART B: DESCRIPTIVE PART
Experience Details
Environment and Context
The course combined mathematical theory (statistics, linear algebra, optimization) with practical implementation in Python. We worked on real datasets (Iris, MNIST, etc.) and used standard libraries (NumPy, pandas, scikit-learn, matplotlib) to develop predictive models.
My Function
In this course, I was responsible for:
- Understanding the theoretical foundations of Machine Learning
- Preprocessing and exploring data (cleaning, visualization, feature engineering)
- Implementing supervised learning algorithms (regression, classification)
- Applying unsupervised learning techniques (clustering, dimensionality reduction)
- Evaluating and optimizing models (cross-validation, hyperparameters)
- Interpreting results and identifying biases
- Developing complete end-to-end ML pipelines
PART C: TECHNICAL PART
This section explores the technical aspects of Machine Learning.
Technical Concepts Learned
1. Types of Learning
Supervised Learning:
Learning from labeled data (X, y).
- Regression: predict continuous value
- Classification: predict discrete class
Unsupervised Learning:
Finding hidden structure in unlabeled data.
- Clustering: grouping similar data
- Dimensionality Reduction: PCA, t-SNE
Reinforcement Learning:
Agent learns through interaction (rewards/penalties).
(Not covered in detail in this course)
2. Linear Regression
Model:
y = b0 + b1x1 + b2x2 + ... + bnxn + e
or, for a single example x: y = b^T x + e; stacked over all m examples with design matrix X: y = Xb + e
Cost Function (MSE):
J(b) = (1/2m) Σ_i (h(x^(i)) - y^(i))^2
Analytical Solution (Normal Equation):
b = (X^T X)^(-1) X^T y
Gradient Descent:
b := b - α * grad(J(b))   (α: learning rate)
Regularization:
- Ridge (L2): penalizes ||b||^2
- Lasso (L1): penalizes ||b||_1 (feature selection)
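As a rough sketch (not the course's own code), the snippet below fits the same linear model two ways: the normal equation written with NumPy, and scikit-learn's Ridge for the L2-regularized variant. The synthetic data and the alpha value are illustrative choices.

    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))                              # 200 samples, 3 features
    y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

    # Normal equation b = (X^T X)^(-1) X^T y, with a bias column of ones added to X
    Xb = np.c_[np.ones(len(X)), X]
    b_hat = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

    # Ridge (L2) regularization; alpha controls the strength of the penalty on ||b||^2
    ridge = Ridge(alpha=1.0).fit(X, y)
    print(b_hat, ridge.intercept_, ridge.coef_)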
3. Logistic Regression
For binary classification.
Sigmoid function:
sigma(z) = 1 / (1 + e^(-z))
Model:
P(y=1|x) = sigma(b^T x)
Cost function (cross-entropy):
J(b) = -(1/m) Σ_i [y^(i) log(h(x^(i))) + (1 - y^(i)) log(1 - h(x^(i)))]
Optimization: Gradient Descent
Multiclass extension: Softmax Regression
Figure: Multi-layer perceptron architecture with forward propagation
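A minimal example of binary logistic regression, assuming scikit-learn and its bundled breast-cancer dataset purely for illustration (the course datasets may differ):

    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import log_loss

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # max_iter is raised because the raw features are not scaled here
    clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
    proba = clf.predict_proba(X_test)[:, 1]        # estimates P(y=1|x) = sigma(b^T x)
    print("cross-entropy on the test set:", log_loss(y_test, proba))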
4. Decision Trees
Principle:
Partitioning the feature space through successive tests.
Construction:
- Choose feature and threshold that maximize information gain
- Repeat recursively on subsets
Split criteria:
- Gini Impurity: 1 - Σ_i p_i^2
- Entropy: -Σ_i p_i log(p_i)
Advantages:
- Interpretable
- Handles non-linearities
- No normalization required
Disadvantages:
- Easy overfitting
- Unstable (small data variation = different tree)
Regularization:
- Max depth
- Min number of samples per leaf
- Pruning
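To make the regularization knobs concrete, here is a hedged sketch on the Iris dataset (the max_depth and min_samples_leaf values are arbitrary examples):

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = load_iris(return_X_y=True)

    # Gini criterion; max_depth and min_samples_leaf limit overfitting
    tree = DecisionTreeClassifier(criterion="gini", max_depth=3, min_samples_leaf=5, random_state=0)
    tree.fit(X, y)

    # Interpretability: the learned splits print as nested if/else rules
    print(export_text(tree, feature_names=load_iris().feature_names))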
5. Ensemble Methods
Bagging (Bootstrap Aggregating):
Train multiple models on random subsets, average predictions.
Random Forest:
Bagging of trees + random feature selection at each split.
- Reduces variance
- Very performant
- Less interpretable
Boosting:
Train models sequentially, each correcting errors of the previous one.
AdaBoost:
Weight misclassified examples more heavily.
Gradient Boosting:
Fit model on residuals of the previous model.
XGBoost:
Optimized implementation of Gradient Boosting.
- Very performant (Kaggle competitions)
- Built-in regularization
- Missing value handling
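A small comparison sketch: bagging (Random Forest) next to boosting, using scikit-learn's GradientBoostingClassifier as a stand-in for XGBoost (which has its own library and API). The dataset and hyperparameters are illustrative only.

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
    from sklearn.model_selection import cross_val_score

    X, y = load_breast_cancer(return_X_y=True)

    # Bagging of randomized trees (variance reduction)
    rf = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=0)
    # Sequential boosting: each tree is fit on the residual errors of the previous ones
    gb = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1, random_state=0)

    for name, model in [("random forest", rf), ("gradient boosting", gb)]:
        print(name, cross_val_score(model, X, y, cv=5).mean())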
6. Support Vector Machines (SVM)
Principle:
Find hyperplane that maximizes the margin between classes.
Margin: distance to the closest point of each class.
Hard Margin: linearly separable data
Soft Margin: tolerate errors (parameter C)
Kernel Trick:
Project data into a high-dimensional space where linearly separable.
Common kernels:
- Linear: K(x,x') = x^T x'
- Polynomial: K(x,x') = (x^T x' + c)^d
- RBF (Gaussian): K(x,x') = exp(-gamma||x-x'||^2)
SVM for regression (SVR):
Minimize error outside the epsilon margin.
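A short illustrative SVM classifier with an RBF kernel; scaling is included in the pipeline because kernel values depend on distances between points (C and gamma below are just example settings):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    # RBF kernel; C controls the soft margin, gamma the kernel width
    svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
    print(cross_val_score(svm, X, y, cv=5).mean())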
7. K-Nearest Neighbors (KNN)
Principle:
Classify according to the majority of the K nearest neighbors.
Distance: Euclidean, Manhattan, Minkowski
Choice of K:
- Small K: sensitive to noise
- Large K: smooth, may ignore local patterns
Advantages:
- Simple, intuitive
- No training (lazy learning)
- Handles non-linearities
Disadvantages:
- High prediction cost (computing distances)
- Sensitive to dimensionality (curse of dimensionality)
- Requires feature normalization
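The effect of K can be checked empirically; a hedged sketch on Iris, with scaling in the pipeline since KNN relies on raw distances (the K values tried are arbitrary):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_iris(return_X_y=True)

    # Scaling keeps any single feature from dominating the Euclidean distance
    for k in (1, 5, 15):        # small K: noise-sensitive; large K: smoother decision boundary
        knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
        print(k, cross_val_score(knn, X, y, cv=5).mean())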
8. Clustering - K-Means
Objective:
Partition data into K clusters.
Algorithm:
- Initialize K centroids randomly
- Assign each point to the nearest centroid
- Recompute centroids (mean of points)
- Repeat 2-3 until convergence
Inertia: Σ_x ||x - centroid(x)||^2, summed over all points, where centroid(x) is the centroid of x's cluster
Choice of K: elbow method
Limitations:
- K must be set a priori
- Sensitive to initialization
- Assumes spherical clusters
Variants: K-Means++, Mini-Batch K-Means
Other clustering algorithms:
- DBSCAN: density-based, discovers arbitrary shapes
- Hierarchical Clustering: dendrogram
- Gaussian Mixture Models: probabilistic
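To illustrate K-Means and the elbow method described above, a minimal sketch on synthetic blobs (the data and the range of K are invented for the example):

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    # Synthetic data with 4 underlying clusters, purely for illustration
    X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

    # Elbow method: inspect inertia (sum of squared distances to centroids) as K grows
    for k in range(1, 8):
        km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0).fit(X)
        print(k, km.inertia_)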
9. Dimensionality Reduction - PCA
PCA (Principal Component Analysis):
Project data onto axes of maximum variance.
Algorithm:
- Center data (mean = 0)
- Compute covariance matrix
- Eigenvalue/eigenvector decomposition
- Project onto the first k eigenvectors
Explained variance:
Proportion of total variance retained.
Uses:
- Visualization (2D/3D projection)
- Compression
- Noise reduction
- Speed up algorithms
t-SNE:
Non-linear projection for visualization.
Preserves local structure (neighborhoods).
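A compact PCA example, assuming scikit-learn's small digits dataset as a stand-in for MNIST; keeping 2 components is only for visualization purposes:

    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    X, y = load_digits(return_X_y=True)            # 8x8 images flattened to 64 features
    X = StandardScaler().fit_transform(X)          # PCA expects centered (here also scaled) data

    pca = PCA(n_components=2)                      # keep the 2 axes of maximal variance
    X2 = pca.fit_transform(X)
    print(X2.shape, pca.explained_variance_ratio_.sum())   # share of variance retained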
10. Model Evaluation
Classification:
Confusion matrix:
              Predicted +   Predicted -
Actual +      TP            FN
Actual -      FP            TN
Metrics:
- Accuracy: (TP+TN)/(TP+TN+FP+FN)
- Precision: TP/(TP+FP)
- Recall (Sensitivity): TP/(TP+FN)
- F1-Score: 2 * (Precision * Recall) / (Precision + Recall)
ROC Curve (Receiver Operating Characteristic):
TPR vs FPR at different thresholds.
AUC (Area Under the Curve): area under the ROC curve; 0.5 corresponds to a random classifier, 1 to a perfect one.
Regression:
- MSE (Mean Squared Error): Average of (y - y_hat)^2
- RMSE: sqrt(MSE)
- MAE (Mean Absolute Error): Average of |y - y_hat|
- R^2: proportion of explained variance
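The metrics above map directly onto scikit-learn helpers; a hedged sketch follows (note that confusion_matrix orders rows and columns by label value, i.e. [[TN, FP], [FN, TP]] for a 0/1 problem, which differs from the table layout above):

    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
    y_pred = clf.predict(X_test)

    print(confusion_matrix(y_test, y_pred))                          # [[TN, FP], [FN, TP]]
    print(classification_report(y_test, y_pred))                     # precision, recall, F1 per class
    print(roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))    # area under the ROC curve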
11. Validation and Optimization
Train/Test Split:
Split data (typically 80/20 or 70/30).
Cross-Validation:
K-Fold: divide into K subsets, train K times using K-1 for training, 1 for validation.
Overfitting vs Underfitting:
- Overfitting: model too complex, memorizes training data
- Underfitting: model too simple, fails to capture patterns
Learning curves:
Train and validation error vs dataset size or complexity.
Hyperparameters:
Parameters not learned (to be set before training).
Grid Search:
Test all hyperparameter combinations.
Random Search:
Randomly sample combinations.
Regularization:
Penalize complexity (L1, L2, dropout, early stopping).
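Grid Search combined with K-Fold cross-validation, sketched on an SVM pipeline; the hyperparameter grid below is an arbitrary example, not the one used in the course:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    pipe = Pipeline([("scale", StandardScaler()), ("svm", SVC())])

    # Every combination of the grid is evaluated with 5-fold cross-validation
    grid = GridSearchCV(pipe, {"svm__C": [0.1, 1, 10], "svm__gamma": ["scale", 0.1, 1.0]}, cv=5)
    grid.fit(X, y)
    print(grid.best_params_, grid.best_score_)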
12. Feature Engineering
Importance:
"Data > Algorithms". Good features are crucial.
Techniques:
- Scaling: MinMaxScaler, StandardScaler
- Encoding: One-Hot for categorical variables
- Polynomial Features: create interactions
- Binning: discretize continuous variables
- Log Transform: for skewed distributions
- Feature Selection: eliminate redundant/useless features
Missing value handling:
- Deletion (if few)
- Imputation (mean, median, mode, KNN)
Outlier detection:
Z-score, IQR, isolation forest
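Several of these steps (imputation, scaling, one-hot encoding) are typically chained in a preprocessing pipeline; a minimal sketch on an invented two-column table (column names and values are made up for the example):

    import numpy as np
    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.impute import SimpleImputer
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    # Tiny made-up table with one missing numeric value
    df = pd.DataFrame({"age": [25, 32, np.nan, 41],
                       "city": ["Paris", "Lyon", "Paris", "Nice"]})

    pre = ColumnTransformer([
        # numeric column: median imputation, then standardization
        ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                          ("scale", StandardScaler())]), ["age"]),
        # categorical column: one-hot encoding
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    ])
    print(pre.fit_transform(df))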
PART D: ANALYTICAL PART
Knowledge and Skills Mobilized
- Understanding Machine Learning algorithms
- Mathematics (linear algebra, probability, optimization)
- Python programming (NumPy, pandas, scikit-learn)
- Data preprocessing and exploration
- Model training, evaluation and optimization
- Interpreting results and diagnostics (overfitting, bias)
- Critical thinking about model limitations and biases
- Data visualization (matplotlib, seaborn)
Self Evaluation
This course was a fascinating introduction to machine learning. ML is transforming many fields and understanding its mechanisms has become essential for any engineer.
The mathematical theory can be intimidating at first (gradient descent, matrices, optimization), but with practice it becomes intuitive. Linear regression, although simple, introduces fundamental concepts reused across all algorithms.
Data preprocessing is often underestimated but crucial. "Garbage in, garbage out": a model cannot compensate for poorly prepared data. Cleaning, normalizing, handling missing values are essential steps.
The diversity of algorithms is impressive. Each has its strengths and weaknesses. There is no universal "best" algorithm (No Free Lunch Theorem). The art of ML is choosing and adapting the algorithm to the problem.
Random Forests and XGBoost are remarkably performant on many problems. Their popularity in Kaggle competitions attests to this. However, they are less interpretable than simpler models.
Rigorous evaluation (cross-validation, appropriate metrics) is critical. Accuracy alone can be misleading (imbalanced classes). The metric must be chosen according to context (precision vs recall depending on error cost).
Overfitting is a constant trap. Cross-validation and regularization are essential. Seeing a model perform well on training data but poorly on test data is an important lesson.
Scikit-learn is an excellent library: consistent API, clear documentation, optimized implementations. It allows focusing on ML logic rather than implementation details.
Feature engineering remains largely manual and creative. This is where domain expertise comes in. Creating the right features can make more difference than choosing the right algorithm.
My Opinion
This course is essential in the age of AI. Machine Learning is applied everywhere: search engines, recommendations, medical diagnostics, autonomous vehicles, finance, etc.
Strengths:
- Broad coverage of classic algorithms
- Theory/practice balance
- Hands-on projects with real data
- Use of standard libraries (scikit-learn)
Areas for improvement:
- More on Deep Learning (neural networks)
- Production aspects (MLOps, deployment)
- Ethics and model biases
- Big Data and scalability
Personal reflections:
ML is powerful but not magic. It requires:
- Sufficient and quality data
- Well-formulated problem
- Relevant evaluation metrics
- Rigorous validation
- Critical interpretation of results
The limitations of ML must be understood:
- Bias: models reflect biases in training data
- Generalizability: performance can degrade on new data
- Explainability: complex models (deep learning) are "black boxes"
- Causality: ML finds correlations, not causality
Ethics is crucial:
- Fairness (equity between groups)
- Transparency and explainability
- Privacy (sensitive data)
- Accountability (who is responsible for errors?)
Professional applications:
ML skills applicable in many fields:
- Data Science: predictive analysis, business insights
- Product Engineering: recommendations, personalization
- Healthcare: assisted diagnosis, drug discovery
- Finance: fraud detection, algorithmic trading
- Industry: predictive maintenance, process optimization
- Marketing: customer segmentation, churn prediction
- Cybersecurity: anomaly detection
The ML market is growing rapidly. In-demand skills:
- Data Scientist
- ML Engineer
- Research Scientist (PhD often required)
The boundary with Deep Learning:
This course covers "classic" ML. Deep Learning (deep neural networks) has revolutionized certain fields (vision, NLP) but requires more data and resources. ML fundamentals remain essential for understanding DL.
The future:
- AutoML: automation of the ML pipeline
- Transfer Learning: reuse pre-trained models
- Federated Learning: train without centralizing data
- Explainable AI: make models interpretable
- Quantum ML: leverage quantum computers
These Machine Learning fundamentals enable us to design "smarter" systems, capable of learning and adapting, a skill that has become essential in nearly every field of modern engineering.
Course Documents
Full Machine Learning Course
Complete course: supervised/unsupervised learning, neural networks, decision trees, SVM and metrics.
Perceptron & Neural Networks
Slides on the perceptron: linear model, activation function, learning rule and limitations.
Deep Neural Networks
Multi-layer architectures, backpropagation, advanced activation functions and optimization techniques.
Course taken in 2023-2024 at INSA Toulouse, Department of Electrical and Computer Engineering.