Machine Learning

Hybrid learning approaches for industrial applications

Authors: Vincent Laurent

In an industrial context, the applicability of machine learning methods is a central question. In many cases, the small quantity or poor quality of the data, and of the labels in particular, limits their use in real-world conditions, for example when a model is intended for industrialization.

The themes of this thesis emerged from research projects with the French railway network operator and the French electricity transmission system operator. The first theme is active learning, which can address a shortage of labels and/or design-of-experiments issues. We study it through a bipartite ranking problem, i.e., one where the target variable is binary and the goal is to order instances by their probability of a positive outcome. This setting matters for industrial asset management, where equipment must be ordered by its potential for failure. For this problem, we design an algorithm based on an extension of K-armed bandit problems. We prove that the algorithm is Probably Approximately Correct (PAC) and derive upper and lower bounds for it; it is the first algorithm to tackle this type of problem. A series of empirical experiments confirms the theoretical results. In particular, they show that the global nature of the ranking problem completely changes the sampling procedure.

The second topic is hybrid modeling. By its nature, this research has always sat at the interface with physics, particularly on issues of equipment fatigue. To model material fatigue over a large number of cycles (high-cycle fatigue), we adopt a Markovian approach. It estimates the service life of equipment without labels, using only empirical failure laws: the S-N curves that relate stress amplitude to the number of cycles to failure. To this end, we combine estimates of survival laws and Markov chain parameters with physical models coupled with surrogate models.
These physical models link a vibration state to a stress state.

Another frequent characteristic of industrial data is poor label quality, which also affects fatigue defects: advanced fatigue, which manifests as microcracks, is not always measurable. We therefore propose a Bayesian approach to estimate the probabilities of label inversion caused by measurement methods that are necessarily imprecise.

The final part of this thesis addresses automated machine learning (AutoML). The methods described in the literature often focus on hyperparameter search, whereas the real challenges of learning methods lie more in integrating these solutions. Industry experience shows that deploying machine learning tools very often runs into methodological problems, usually related to data: generalization issues, distribution drift, model fairness, data leakage, and so on. On this topic, we propose a few avenues for reflection. We also provide a Python library implementing a version of automated learning better suited to these issues, with the aim of offering a toolbox to a wider audience.
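The abstract does not detail the bandit algorithm for active bipartite ranking. The following is therefore only a minimal sketch of the general idea under stated assumptions, not the thesis's method.

- **Model:** each piece of equipment is treated as an arm of a K-armed bandit whose draws are Bernoulli failure observations.
- **Sampling:** an item keeps being sampled while its Hoeffding confidence interval still overlaps another item's.
- **Stopping:** sampling stops once every pair is either separated or tied within a tolerance `epsilon`.

The oracle `sample`, the functions `active_rank` and `confidence_radius`, and this stopping rule are hypothetical choices. The stopping rule is deliberately global: an item stays in play as long as it is ambiguous with respect to *any* other item. This echoes the observation that ranking changes the sampling procedure.

```python
import math
import random
from typing import Callable, List, Tuple


def confidence_radius(n: int, k: int, delta: float) -> float:
    # Anytime Hoeffding radius for rewards in [0, 1]: a union bound over the
    # k arms and over all sample counts n >= 1 keeps every interval valid
    # simultaneously with probability at least 1 - delta.
    return math.sqrt(math.log(4 * k * n * n / delta) / (2 * n))


def active_rank(
    sample: Callable[[int], int],
    k: int,
    epsilon: float = 0.1,
    delta: float = 0.05,
    budget: int = 100_000,
) -> Tuple[List[int], List[int]]:
    """Order k items by decreasing failure probability, sampling adaptively.

    `sample(i)` is a hypothetical oracle returning 1 if a failure is observed
    on item i and 0 otherwise. Returns (ranking, samples spent per item).
    """
    counts = [1] * k
    sums = [float(sample(i)) for i in range(k)]  # one initial draw per item
    while sum(counts) < budget:
        means = [sums[i] / counts[i] for i in range(k)]
        radii = [confidence_radius(counts[i], k, delta) for i in range(k)]
        to_sample = set()
        for i in range(k):
            for j in range(i + 1, k):
                if abs(means[i] - means[j]) > radii[i] + radii[j]:
                    continue  # disjoint intervals: this pair is ordered
                # Overlapping pair: keep sampling whichever item is still
                # imprecise. Once both radii are <= epsilon / 4, the true
                # gap is at most epsilon, so either order is acceptable.
                for a in (i, j):
                    if radii[a] > epsilon / 4:
                        to_sample.add(a)
        if not to_sample:
            break  # every pair is separated or epsilon-tied
        for a in to_sample:
            sums[a] += sample(a)
            counts[a] += 1
    means = [sums[i] / counts[i] for i in range(k)]
    ranking = sorted(range(k), key=lambda i: means[i], reverse=True)
    return ranking, counts


if __name__ == "__main__":
    rng = random.Random(0)
    true_p = [0.1, 0.3, 0.5, 0.7, 0.9]  # illustrative failure probabilities
    ranking, spent = active_rank(lambda i: int(rng.random() < true_p[i]), k=5)
    print(ranking, spent)
```

The guarantee holds on the event where all intervals are valid, which has probability at least `1 - delta`. On that event, any pair returned in the wrong order differs in true failure probability by at most `epsilon`.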
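The Markovian fatigue model can be illustrated, in a heavily simplified form, by a Markov chain over discretized stress levels combined with an S-N curve.

This sketch assumes a Basquin-type S-N law N_f(S) = C S^(-m) and the Palmgren-Miner linear damage rule. Both are standard in fatigue analysis, but the abstract does not say the thesis uses them. The constants `c` and `m` and all function names are hypothetical. In practice, the stress levels would come from the physical and surrogate models that map vibration to stress; here they are simply given.

```python
import numpy as np


def estimate_transition_matrix(states, n_states):
    # Maximum-likelihood estimate: normalized counts of observed transitions
    # between discretized stress levels (assumes every state has at least
    # one observed outgoing transition).
    counts = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)


def stationary_distribution(p):
    # Left eigenvector of P for eigenvalue 1 (assumes an irreducible chain).
    eigvals, eigvecs = np.linalg.eig(p.T)
    pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    return pi / pi.sum()


def cycles_to_failure(stress, c=1e12, m=3.0):
    # Hypothetical Basquin-type S-N curve: N_f(S) = C * S**(-m).
    return c * np.asarray(stress, dtype=float) ** (-m)


def expected_life_cycles(p, stress_levels, c=1e12, m=3.0):
    # Palmgren-Miner rule: each cycle at level s adds 1 / N_f(s) damage and
    # failure occurs when the damage reaches 1. In the long run, the chain
    # spends a fraction pi[s] of its cycles at level s.
    pi = stationary_distribution(np.asarray(p, dtype=float))
    damage_per_cycle = np.sum(pi / cycles_to_failure(stress_levels, c, m))
    return 1.0 / damage_per_cycle


if __name__ == "__main__":
    observed_levels = [0, 1, 0, 1, 1, 0, 0, 1]  # illustrative state sequence
    P = estimate_transition_matrix(observed_levels, n_states=2)
    print(expected_life_cycles(P, stress_levels=[100.0, 200.0]))
```

Under these assumptions, the expected life is the reciprocal of the long-run damage per cycle. It therefore requires no failure labels, only the S-N curve and the chain's transition probabilities.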
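One simple way to estimate label-inversion probabilities in a Bayesian way is a Beta-Binomial model per true class.

This sketch assumes that some instances have been re-inspected with a more reliable reference method, which the abstract does not state. `flip_posteriors`, `BetaPosterior` and the uniform Beta(1, 1) prior are illustrative choices, not the thesis's model.

For each reference class c, the posterior of P(observed label != c | true label = c) is Beta(alpha_0 + flips, beta_0 + agreements).

```python
import math
from dataclasses import dataclass


@dataclass
class BetaPosterior:
    alpha: float
    beta: float

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def std(self) -> float:
        a, b = self.alpha, self.beta
        return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


def flip_posteriors(pairs, prior=(1.0, 1.0)):
    """Posterior of the flip probability for each reference class.

    `pairs` holds hypothetical (reference_label, observed_label) tuples with
    labels in {0, 1}; `prior` is the (alpha_0, beta_0) Beta prior.
    """
    alpha0, beta0 = prior
    counts = {0: [0, 0], 1: [0, 0]}  # class -> [flips, agreements]
    for reference, observed in pairs:
        counts[reference][0 if observed != reference else 1] += 1
    # Conjugate update: flips add to alpha, agreements add to beta.
    return {
        label: BetaPosterior(alpha0 + flips, beta0 + agreements)
        for label, (flips, agreements) in counts.items()
    }


if __name__ == "__main__":
    data = [(1, 0)] * 3 + [(1, 1)] * 7 + [(0, 1)] + [(0, 0)] * 19
    for label, post in flip_posteriors(data).items():
        print(label, round(post.mean(), 3), round(post.std(), 3))
```

In a fatigue setting, the posterior for reference class 1 would describe how often an imprecise measurement misses a real defect, such as undetected microcracks.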