Machine Learning

PART A: GENERALITIES

Presentation

The "Machine Learning" course introduces the fundamental concepts of machine learning, enabling machines to learn from data without being explicitly programmed. This course covers classic supervised and unsupervised learning algorithms, as well as their practical implementation with Python and scikit-learn.

Academic Year: 2023-2024
Semester: 8
Category: Artificial Intelligence / Data Science


PART B: DESCRIPTIVE PART

Experience Details

Environment and Context

The course combined mathematical theory (statistics, linear algebra, optimization) with practical implementation in Python. We worked on real datasets (Iris, MNIST, etc.) and used standard libraries (NumPy, pandas, scikit-learn, matplotlib) to develop predictive models.

My Function

In this course, I was responsible for:

  • Understanding the theoretical foundations of Machine Learning
  • Preprocessing and exploring data (cleaning, visualization, feature engineering)
  • Implementing supervised learning algorithms (regression, classification)
  • Applying unsupervised learning techniques (clustering, dimensionality reduction)
  • Evaluating and optimizing models (cross-validation, hyperparameters)
  • Interpreting results and identifying biases
  • Developing complete end-to-end ML pipelines

PART C: TECHNICAL PART

This section explores the technical aspects of Machine Learning.

Technical Concepts Learned

1. Types of Learning

Supervised Learning:
Learning from labeled data (X, y).

  • Regression: predict continuous value
  • Classification: predict discrete class

Unsupervised Learning:
Finding hidden structure in unlabeled data.

  • Clustering: grouping similar data
  • Dimensionality Reduction: PCA, t-SNE

Reinforcement Learning:
Agent learns through interaction (rewards/penalties).
(Not covered in detail in this course)

2. Linear Regression

Model:

y = b0 + b1x1 + b2x2 + ... + bnxn + e
or in vector form: y = b^T x + e per sample, i.e. y = Xb + e with X the design matrix

Cost Function (MSE):

J(b) = (1/2m) Σ_i (h(x^(i)) - y^(i))^2

Analytical Solution (Normal Equation):

b = (X^T X)^(-1) X^T y

Gradient Descent:

b := b - α · ∇J(b)   (α: learning rate)
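
To make this concrete, a minimal NumPy sketch (synthetic data; variable names are illustrative, not from the course material) fitting the same small model with the normal equation and with batch gradient descent on the MSE cost:

    import numpy as np

    # Synthetic data: y = 3 + 2*x + noise (illustrative example)
    rng = np.random.default_rng(0)
    X = rng.uniform(0, 5, size=(100, 1))
    y = 3 + 2 * X[:, 0] + rng.normal(0, 0.5, size=100)

    # Add a column of ones so b0 plays the role of the intercept
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])

    # Closed-form solution: b = (X^T X)^(-1) X^T y
    b_closed = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

    # Batch gradient descent on J(b) = (1/2m) * sum((Xb @ b - y)^2)
    m = len(y)
    b = np.zeros(2)
    alpha = 0.05                          # learning rate
    for _ in range(5000):
        grad = (Xb.T @ (Xb @ b - y)) / m  # gradient of the MSE cost
        b -= alpha * grad

    print("normal equation :", b_closed)  # both should be close to [3, 2]
    print("gradient descent:", b)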

Regularization:

  • Ridge (L2): penalizes ||b||^2
  • Lasso (L1): penalizes ||b||_1 (feature selection)
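
A short scikit-learn sketch (the alpha values are arbitrary) contrasting ordinary least squares with Ridge and Lasso; it illustrates the feature-selection effect of the L1 penalty:

    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression, Ridge, Lasso

    # Small synthetic regression problem with some uninformative features
    X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                           noise=10.0, random_state=0)

    for model in (LinearRegression(), Ridge(alpha=1.0), Lasso(alpha=1.0)):
        model.fit(X, y)
        n_zero = sum(abs(c) < 1e-6 for c in model.coef_)
        print(type(model).__name__, "coefficients shrunk to ~0:", n_zero)
    # Lasso (L1) typically zeroes out the uninformative coefficients,
    # while Ridge (L2) only shrinks them.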

3. Logistic Regression

For binary classification.

Sigmoid function:

sigma(z) = 1 / (1 + e^(-z))

Model:

P(y=1|x) = sigma(b^T x)

Cost function (cross-entropy):

J(b) = -(1/m) Σ_i [y^(i) log(h(x^(i))) + (1 - y^(i)) log(1 - h(x^(i)))]

Optimization: Gradient Descent

Multiclass extension: Softmax Regression
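
As an illustration, a minimal scikit-learn sketch on the Iris dataset used in the labs (split and solver settings are arbitrary); the multiclass case is handled internally with a softmax-style formulation:

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y)

    # max_iter raised so the solver converges on the raw (unscaled) features
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train, y_train)

    print("test accuracy:", clf.score(X_test, y_test))
    print("P(y|x) for one flower:", clf.predict_proba(X_test[:1]).round(3))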

Figure: Multi-layer perceptron architecture with forward propagation

4. Decision Trees

Principle:
Partitioning the feature space through successive tests.

Construction:

  • Choose feature and threshold that maximize information gain
  • Repeat recursively on subsets

Split criteria:

  • Gini Impurity: 1 - Σ p_i^2
  • Entropy: -Σ p_i log(p_i)

Advantages:

  • Interpretable
  • Handles non-linearities
  • No normalization required

Disadvantages:

  • Easy overfitting
  • Unstable (small data variation = different tree)

Regularization:

  • Max depth
  • Min number of samples per leaf
  • Pruning
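
A hedged sketch (hyperparameter values chosen for illustration) showing how depth and leaf-size constraints act as regularization for a decision tree:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    # An unconstrained tree vs. one regularized by depth and leaf size
    full_tree = DecisionTreeClassifier(random_state=0)
    pruned_tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5,
                                         random_state=0)

    for name, tree in [("unconstrained", full_tree), ("regularized", pruned_tree)]:
        scores = cross_val_score(tree, X, y, cv=5)
        print(f"{name:13s} mean CV accuracy: {scores.mean():.3f}")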

5. Ensemble Methods

Bagging (Bootstrap Aggregating):
Train multiple models on random subsets, average predictions.

Random Forest:
Bagging of trees + random feature selection at each split.

  • Reduces variance
  • Strong out-of-the-box performance
  • Less interpretable

Boosting:
Train models sequentially, each correcting errors of the previous one.

AdaBoost:
Weight misclassified examples more heavily.

Gradient Boosting:
Fit model on residuals of the previous model.

XGBoost:
Optimized implementation of Gradient Boosting.

  • Very strong performance (a staple of Kaggle competitions)
  • Built-in regularization
  • Missing value handling
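
To make the bagging/boosting contrast concrete, a sketch comparing a single tree, a random forest and gradient boosting under cross-validation (XGBoost is a separate third-party package and would be used the same way; the dataset is a bundled scikit-learn example chosen for illustration):

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)

    models = {
        "single tree": DecisionTreeClassifier(random_state=0),
        "random forest (bagging)": RandomForestClassifier(n_estimators=200,
                                                          random_state=0),
        "gradient boosting": GradientBoostingClassifier(random_state=0),
    }
    for name, model in models.items():
        acc = cross_val_score(model, X, y, cv=5).mean()
        print(f"{name:25s} accuracy ~ {acc:.3f}")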

6. Support Vector Machines (SVM)

Principle:
Find the hyperplane that maximizes the margin between classes.

Margin: distance from the separating hyperplane to the closest points of each class (the support vectors).

Hard Margin: linearly separable data
Soft Margin: tolerate errors (parameter C)

Kernel Trick:
Implicitly project the data into a higher-dimensional space where they become linearly separable, without ever computing the mapping explicitly (only kernel values K(x, x') are needed).

Common kernels:

  • Linear: K(x,x') = x^T x'
  • Polynomial: K(x,x') = (x^T x' + c)^d
  • RBF (Gaussian): K(x,x') = exp(-gamma||x-x'||^2)

SVM for regression (SVR):
Minimize error outside the epsilon margin.
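
A minimal sketch of a soft-margin SVM with an RBF kernel (the C values are illustrative); features are standardized first, which matters for distance-based kernels:

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = load_breast_cancer(return_X_y=True)

    # Soft margin: larger C tolerates fewer training errors (narrower margin)
    for C in (0.1, 1.0, 10.0):
        clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=C, gamma="scale"))
        print(f"C={C:<5} accuracy ~ {cross_val_score(clf, X, y, cv=5).mean():.3f}")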

7. K-Nearest Neighbors (KNN)

Principle:
Classify according to the majority of the K nearest neighbors.

Distance: Euclidean, Manhattan, Minkowski

Choice of K:

  • Small K: sensitive to noise
  • Large K: smooth, may ignore local patterns

Advantages:

  • Simple, intuitive
  • No training (lazy learning)
  • Handles non-linearities

Disadvantages:

  • High prediction cost (computing distances)
  • Sensitive to dimensionality (curse of dimensionality)
  • Requires feature normalization
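
Because KNN is distance-based, the sketch below scales the features before fitting (the K values are arbitrary):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_iris(return_X_y=True)

    # Small K -> sensitive to noise; large K -> smoother decision boundary
    for k in (1, 5, 15, 51):
        knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
        print(f"K={k:<3} accuracy ~ {cross_val_score(knn, X, y, cv=5).mean():.3f}")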

8. Clustering - K-Means

Objective:
Partition data into K clusters.

Algorithm:

  1. Initialize K centroids randomly
  2. Assign each point to the nearest centroid
  3. Recompute centroids (mean of points)
  4. Repeat 2-3 until convergence

Inertia: Σ_x ||x - centroid(x)||^2

Choice of K: elbow method

Limitations:

  • K must be set a priori
  • Sensitive to initialization
  • Assumes spherical clusters

Variants: K-Means++, Mini-Batch K-Means
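
A sketch of the elbow method on synthetic blobs (the number of clusters and sample size are illustrative): inertia drops sharply until K reaches the true number of clusters, then flattens:

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    # Synthetic data with 4 well-separated clusters
    X, _ = make_blobs(n_samples=500, centers=4, cluster_std=1.0, random_state=0)

    for k in range(1, 8):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        print(f"K={k}  inertia={km.inertia_:.1f}")
    # The "elbow" in the inertia curve suggests K = 4 here.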

Other clustering algorithms:

  • DBSCAN: density-based, discovers arbitrary shapes
  • Hierarchical Clustering: dendrogram
  • Gaussian Mixture Models: probabilistic

9. Dimensionality Reduction - PCA

PCA (Principal Component Analysis):
Project data onto axes of maximum variance.

Algorithm:

  1. Center data (mean = 0)
  2. Compute covariance matrix
  3. Eigenvalue/eigenvector decomposition
  4. Project onto the first k eigenvectors

Explained variance:
Proportion of total variance retained.

Uses:

  • Visualization (2D/3D projection)
  • Compression
  • Noise reduction
  • Speed up algorithms
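
A minimal PCA sketch on Iris showing the explained variance ratio and a 2D projection (features are standardized first, since PCA is variance-based):

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    X, y = load_iris(return_X_y=True)
    X_std = StandardScaler().fit_transform(X)   # center and scale each feature

    pca = PCA(n_components=2)
    X_2d = pca.fit_transform(X_std)

    print("explained variance ratio:", pca.explained_variance_ratio_.round(3))
    print("total variance kept     :", pca.explained_variance_ratio_.sum().round(3))
    print("projected shape         :", X_2d.shape)  # (150, 2), usable for plotting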

t-SNE:
Non-linear projection for visualization.
Preserves local structure (neighborhoods).

10. Model Evaluation

Classification:

Confusion matrix:

                Predicted +    Predicted -
Actual +         TP             FN
Actual -         FP             TN

Metrics:

  • Accuracy: (TP+TN)/(TP+TN+FP+FN)
  • Precision: TP/(TP+FP)
  • Recall (Sensitivity): TP/(TP+FN)
  • F1-Score: 2 · (Precision · Recall) / (Precision + Recall)

ROC Curve (Receiver Operating Characteristic):
TPR vs FPR at different thresholds.

AUC (Area Under the Curve): area under the ROC curve (0.5 for a random classifier, 1.0 for a perfect one).

Regression:

  • MSE (Mean Squared Error): Average of (y - y_hat)^2
  • RMSE: sqrt(MSE)
  • MAE (Mean Absolute Error): Average of |y - y_hat|
  • R^2: proportion of explained variance
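
These metrics map directly onto scikit-learn functions; a sketch (model, dataset and split chosen for illustration) computing the main classification scores on one held-out test set:

    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                                 precision_score, recall_score, roc_auc_score)
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    clf.fit(X_tr, y_tr)
    y_pred = clf.predict(X_te)
    y_score = clf.predict_proba(X_te)[:, 1]  # probability of the positive class

    # Note: scikit-learn orders rows/columns by label value, so the matrix
    # reads [[TN, FP], [FN, TP]] rather than the layout of the table above.
    print("confusion matrix:\n", confusion_matrix(y_te, y_pred))
    print("accuracy :", round(accuracy_score(y_te, y_pred), 3))
    print("precision:", round(precision_score(y_te, y_pred), 3))
    print("recall   :", round(recall_score(y_te, y_pred), 3))
    print("F1       :", round(f1_score(y_te, y_pred), 3))
    print("ROC AUC  :", round(roc_auc_score(y_te, y_score), 3))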

11. Validation and Optimization

Train/Test Split:
Split data (typically 80/20 or 70/30).

Cross-Validation:
K-Fold: split the data into K subsets (folds) and train K times, each time using K-1 folds for training and the remaining fold for validation; the K scores are then averaged.

Overfitting vs Underfitting:

  • Overfitting: model too complex, memorizes training data
  • Underfitting: model too simple, fails to capture patterns

Learning curves:
Train and validation error vs dataset size or complexity.

Hyperparameters:
Parameters that are not learned from the data and must be set before training (e.g. tree depth, regularization strength, K in KNN).

Grid Search:
Test all hyperparameter combinations.

Random Search:
Randomly sample combinations.

Regularization:
Penalize complexity (L1, L2, dropout, early stopping).
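
A sketch tying these ideas together: train/test split, K-fold cross-validation and a grid search over SVM hyperparameters (the grid itself is illustrative):

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

    pipe = Pipeline([("scale", StandardScaler()), ("svm", SVC())])
    param_grid = {"svm__C": [0.1, 1, 10], "svm__gamma": ["scale", 0.01, 0.001]}

    # 5-fold cross-validation on the training set for every combination
    search = GridSearchCV(pipe, param_grid, cv=5)
    search.fit(X_tr, y_tr)

    print("best hyperparameters:", search.best_params_)
    print("best CV accuracy    :", round(search.best_score_, 3))
    print("held-out test score :", round(search.score(X_te, y_te), 3))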

12. Feature Engineering

Importance:
"Data > Algorithms". Good features are crucial.

Techniques:

  • Scaling: MinMaxScaler, StandardScaler
  • Encoding: One-Hot for categorical variables
  • Polynomial Features: create interactions
  • Binning: discretize continuous variables
  • Log Transform: for skewed distributions
  • Feature Selection: eliminate redundant/useless features

Missing value handling:

  • Deletion (if few)
  • Imputation (mean, median, mode, KNN)

Outlier detection:
Z-score, IQR (interquartile range), Isolation Forest
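
A hedged feature-engineering sketch on a tiny made-up DataFrame (column names and values are hypothetical) combining imputation, scaling and one-hot encoding in a single ColumnTransformer:

    import numpy as np
    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.impute import SimpleImputer
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    # Hypothetical raw data with a missing value and a categorical column
    df = pd.DataFrame({
        "age":    [25, 32, np.nan, 51, 46],
        "income": [30_000, 45_000, 52_000, 80_000, 61_000],
        "city":   ["Toulouse", "Paris", "Toulouse", "Lyon", "Paris"],
    })

    numeric = ["age", "income"]
    categorical = ["city"]

    preprocess = ColumnTransformer([
        ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                          ("scale", StandardScaler())]), numeric),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ])

    X = preprocess.fit_transform(df)
    print(X.shape)   # (5, 5): 2 scaled numeric columns + 3 one-hot columns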

PART D: ANALYTICAL PART

Knowledge and Skills Mobilized

  • Understanding Machine Learning algorithms
  • Mathematics (linear algebra, probability, optimization)
  • Python programming (NumPy, pandas, scikit-learn)
  • Data preprocessing and exploration
  • Model training, evaluation and optimization
  • Interpreting results and diagnostics (overfitting, bias)
  • Critical thinking about model limitations and biases
  • Data visualization (matplotlib, seaborn)

Self Evaluation

This course was a fascinating introduction to machine learning. ML is transforming many fields and understanding its mechanisms has become essential for any engineer.

The mathematical theory can be intimidating at first (gradient descent, matrices, optimization), but with practice it becomes intuitive. Linear regression, although simple, introduces fundamental concepts reused across all algorithms.

Data preprocessing is often underestimated but crucial. "Garbage in, garbage out": a model cannot compensate for poorly prepared data. Cleaning, normalizing, handling missing values are essential steps.

The diversity of algorithms is impressive. Each has its strengths and weaknesses. There is no universal "best" algorithm (No Free Lunch Theorem). The art of ML is choosing and adapting the algorithm to the problem.

Random Forests and XGBoost perform remarkably well on many problems, as their popularity in Kaggle competitions attests. However, they are less interpretable than simpler models.

Rigorous evaluation (cross-validation, appropriate metrics) is critical. Accuracy alone can be misleading (imbalanced classes). The metric must be chosen according to context (precision vs recall depending on error cost).

Overfitting is a constant trap. Cross-validation and regularization are essential. Seeing a model perform well on training data but poorly on test data is an important lesson.

Scikit-learn is an excellent library: consistent API, clear documentation, optimized implementations. It allows focusing on ML logic rather than implementation details.

Feature engineering remains largely manual and creative. This is where domain expertise comes in. Creating the right features can make more difference than choosing the right algorithm.

My Opinion

This course is essential in the age of AI. Machine Learning is applied everywhere: search engines, recommendations, medical diagnostics, autonomous vehicles, finance, etc.

Strengths:

  • Broad coverage of classic algorithms
  • Theory/practice balance
  • Hands-on projects with real data
  • Use of standard libraries (scikit-learn)

Areas for improvement:

  • More on Deep Learning (neural networks)
  • Production aspects (MLOps, deployment)
  • Ethics and model biases
  • Big Data and scalability

Personal reflections:

ML is powerful but not magic. It requires:

  • Sufficient and quality data
  • Well-formulated problem
  • Relevant evaluation metrics
  • Rigorous validation
  • Critical interpretation of results

The limitations of ML must be understood:

  • Bias: models reflect biases in training data
  • Generalizability: performance can degrade on new data
  • Explainability: complex models (deep learning) are "black boxes"
  • Causality: ML finds correlations, not causality

Ethics is crucial:

  • Fairness (equity between groups)
  • Transparency and explainability
  • Privacy (sensitive data)
  • Accountability (who is responsible for errors?)

Professional applications:

ML skills are applicable in many fields:

  • Data Science: predictive analysis, business insights
  • Product Engineering: recommendations, personalization
  • Healthcare: assisted diagnosis, drug discovery
  • Finance: fraud detection, algorithmic trading
  • Industry: predictive maintenance, process optimization
  • Marketing: customer segmentation, churn prediction
  • Cybersecurity: anomaly detection

The ML market is growing rapidly. In-demand roles include:

  • Data Scientist
  • ML Engineer
  • Research Scientist (PhD often required)

The boundary with Deep Learning:
This course covers "classic" ML. Deep Learning (deep neural networks) has revolutionized certain fields (vision, NLP) but requires more data and resources. ML fundamentals remain essential for understanding DL.

The future:

  • AutoML: automation of the ML pipeline
  • Transfer Learning: reuse pre-trained models
  • Federated Learning: train without centralizing data
  • Explainable AI: make models interpretable
  • Quantum ML: leverage quantum computers

These Machine Learning fundamentals enable us to design "smarter" systems, capable of learning and adapting, a skill that has become essential in nearly every field of modern engineering.


Course Documents

Full Machine Learning Course
Complete course: supervised/unsupervised learning, neural networks, decision trees, SVM and metrics.

Perceptron & Neural Networks
Slides on the perceptron: linear model, activation function, learning rule and limitations.

Deep Neural Networks
Multi-layer architectures, backpropagation, advanced activation functions and optimization techniques.


Course taken in 2023-2024 at INSA Toulouse, Department of Electrical and Computer Engineering.