Research Data Scientist (postdoc)

The researcher will take part in an exciting research project in collaboration with Predictive Layer. Predictive Layer specializes in time series forecasting, using artificial intelligence and machine learning to solve business problems. It combines its clients’ data with exogenous data to make near-term forecasts. The systems to be developed in this research project are based on forecasting time series data. Fluctuations and trends in financial target signals result from strong dynamic interactions with the external environment, whose underlying factors constantly change in their relative contribution. As trading strategies must adapt to changing market conditions and scale with increasing allocation of computing resources under strict execution-time constraints, new research and algorithms are needed to dynamically model the final structure of the predictive engines based on deep neural networks (DNNs).
Currently, Predictive Layer delivers solutions based mainly on the optimization of ensemble trees and static neural network architectures that process prior information from available time series datasets. The core algorithm needs to work on large sets of financial data under strict time constraints. In contrast to traditional methods, the novel DNN architecture will select information and adapt its shape dynamically in order to achieve high performance and stability over time and to ensure sufficient model diversity in the final decision process. The efficient design of an adaptive neural network architecture is extremely challenging from both scientific and engineering points of view, owing to the inherent complexity and ill-conditioning of the problem, the combination of multiple models and the execution-time constraints. These new methods will be implemented and tested in collaboration with Predictive Layer.
  • PhD in data science, computer science, mathematics, physics or a related field (preferably with a strong background in data science and machine learning algorithms) with an excellent academic and publication record
  • Very good knowledge of Python
  • Excellent written and verbal communication skills, as well as presentation skills. Besides proficiency in English, creativity and innovative, independent thinking are a must. The candidate is motivated to collaborate in an interdisciplinary, international team and to participate in training programs, and is willing to travel to present their work at international conferences
Activity: 100%
Starting date: when available
Contract: 18 months, with a possible extension at Predictive Layer after the end of the contract
Applications, including a resume, a list of publications, and the names of at least three references (postal and email addresses, phone numbers), should be sent as soon as possible to  (.pdf, .ps, MS Word or plain text). Applications will be handled confidentially.

Internships and Master Projects

I am happy to offer (paid) internships and Master projects (min. 6 months, preferably to mathematicians with very good CS skills) to motivated non-HEIG-Vd students. If you would like to do an internship or a Master project with me, please send me a resume with your academic records and work or project experience, and explain your motivation.

Data mining for emergencies

Within the framework of a new European Interreg project, our research team has been entrusted with developing new artificial intelligence software for medical decision support that will allow emergency regulation centers (144, for example) to adapt "very quickly" to each situation. The trainee will join a project that consists of finding patterns in our dataset (emergencies recorded at the hospital). We will try to answer the following questions: Which parameters have an impact on the number of emergencies? Which parameters have an impact on the number of P1/P2/P3 emergencies? Which parameters have an impact on the emergence of risk states (which still need to be rigorously defined)? More specifically, we would like to find and quantify the influence of certain parameters. Answering these questions is valuable for several reasons:
  1. It is obvious, for example, that heavy traffic conditions have an influence on the number of incidents. But to what extent? Moreover, an unexpected pattern could be very useful to the Emergency Medical Services (EMS), because such parameters must be taken into account in the development of the software.
  2. This information could also be very useful for feature selection (our model might perform better with a well-chosen subset of features). To do so, however, we need to choose the most relevant model.
  3. To make our predictions, we make extensive use of deep learning. However, the black-box nature of this approach raises the question of the reliability of our predictions. If we are able to detect certain factors that affect the number of incidents, we could test whether changing such a parameter in the input data changes the output as expected. Such behavior is desirable, as it would suggest that our model has learned a relevant pattern.
More than one person could work on this problem. For example, one person could work on finding the parameters that impact the number of incidents and another could work on the emergence of risk states.
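
The consistency test mentioned in point 3 (change one input parameter and check that the prediction moves as expected) could look roughly like the sketch below. It is only an illustration: the function `sensitivity_check`, the feature names `traffic` and `rain`, and the toy linear model are all hypothetical and stand in for the project's actual deep learning model and data.

```python
from typing import Callable, Dict, List

Sample = Dict[str, float]


def sensitivity_check(
    predict: Callable[[Sample], float],
    samples: List[Sample],
    feature: str,
    delta: float,
) -> float:
    """Return the mean change in prediction when `feature` is increased by `delta`."""
    changes = []
    for sample in samples:
        perturbed = dict(sample)  # copy, so the original sample is untouched
        perturbed[feature] = sample[feature] + delta
        changes.append(predict(perturbed) - predict(sample))
    return sum(changes) / len(changes)


# Hypothetical stand-in for a trained model: incidents grow with traffic and rain.
def toy_model(x: Sample) -> float:
    return 2.0 * x["traffic"] + 0.5 * x["rain"]


samples = [{"traffic": 3.0, "rain": 1.0}, {"traffic": 5.0, "rain": 0.0}]
mean_change = sensitivity_check(toy_model, samples, "traffic", delta=1.0)
```

A positive `mean_change` for a feature believed to increase the number of incidents would be consistent with the model having learned the expected pattern; a change of the wrong sign or close to zero would be a reason to investigate further.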

Management of real data

Within the framework of a new European Interreg project, our research team has been entrusted with developing new artificial intelligence software for medical decision support that will allow emergency regulation centers (144, for example) to adapt "very quickly" to each situation. The trainee will join a project in which we work with real data. Although this project is really exciting, it also comes with some challenges. In particular, how do we deal with missing data? For example, in the data provided by MeteoSuisse, a sensor sometimes stops working for a while. We could replace the missing values with the latest measurement, but since a sensor can be down for a long time, this does not seem to be a good approach. So how should we handle this missing data? Could we generate synthetic data to replace it? What is the best strategy? The project will consist of building a clean dataset, which will be very useful for us.
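
As a possible starting point, the sketch below carries the latest measurement forward only across short outages and leaves long outages explicitly missing, so that they can be handled by another strategy (interpolation, synthetic data, exclusion). The function name `fill_short_gaps`, the `max_gap` threshold and the sample readings are assumptions made for illustration; they do not come from the MeteoSuisse data.

```python
from typing import List, Optional


def fill_short_gaps(
    values: List[Optional[float]], max_gap: int
) -> List[Optional[float]]:
    """Carry the last measurement forward over gaps of at most `max_gap` points.

    Longer gaps, and gaps at the very start of the series, stay as None.
    """
    filled = list(values)
    n = len(values)
    i = 0
    while i < n:
        if values[i] is not None:
            i += 1
            continue
        # Find the end of this run of missing values.
        j = i
        while j < n and values[j] is None:
            j += 1
        gap_length = j - i
        if i > 0 and gap_length <= max_gap:
            for k in range(i, j):
                filled[k] = values[i - 1]  # last value measured before the gap
        i = j
    return filled


# Hypothetical temperature readings with one short and one long outage.
readings = [1.0, None, 3.0, None, None, None, 7.0]
cleaned = fill_short_gaps(readings, max_gap=2)
```

Here the single missing point is replaced by the previous reading, while the three-point outage is kept as missing because it exceeds `max_gap`.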

Safe RL

The intern will join a reinforcement learning (RL) project in which we tackle the real-time allocation problem with an offline risk-averse RL approach, a special case of safe RL. Other reinforcement learning techniques could also be used to solve our problem. In particular, we found a very interesting paper (Garrett Thomas, Yuping Luo, and Tengyu Ma. Safe reinforcement learning by imagining the near future. Advances in Neural Information Processing Systems, 34, 2021). In this paper, the authors present SMBPO, an algorithm that penalizes unsafe trajectories. With the proposed approach, they are able to train a policy that avoids risky states. We could try to apply this algorithm to our real-time resource allocation problem. However, there will be some challenges: how do we define a risky state? How do we detect such states in our context? There is also a recurring problem: SMBPO works well with a small representation of the environment, but how can we extend it to handle high-dimensional states? The aim of the project is to answer these questions.
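
To make the idea of penalizing unsafe behavior concrete, here is a simplified sketch of a penalized discounted return. It is not the SMBPO algorithm itself: it only assumes that an unsafe state is treated as absorbing and costs a constant penalty at every subsequent step, and the exact formulation should be checked against the paper. The function name and all numeric values are hypothetical.

```python
from typing import Sequence


def penalized_return(
    rewards: Sequence[float],
    unsafe_flags: Sequence[bool],
    penalty: float,
    gamma: float,
) -> float:
    """Discounted return of a trajectory, cut off at the first unsafe state.

    Assumption: an unsafe state is absorbing and yields -penalty at every
    step from then on, i.e. a discounted value of -penalty / (1 - gamma).
    """
    total = 0.0
    discount = 1.0
    for reward, unsafe in zip(rewards, unsafe_flags):
        if unsafe:
            total += discount * (-penalty) / (1.0 - gamma)
            return total
        total += discount * reward
        discount *= gamma
    return total


safe_value = penalized_return([1.0, 1.0, 1.0], [False, False, False], 10.0, 0.5)
unsafe_value = penalized_return([1.0, 1.0, 1.0], [False, True, False], 10.0, 0.5)
```

With a sufficiently large penalty, any trajectory that reaches an unsafe state scores lower than a safe one, which is what pushes the learned policy away from risky states.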

Safe imitation learning 

Within the framework of a new European Interreg project, our research team has been entrusted with developing new artificial intelligence software for medical decision support that will allow emergency regulation centers (144, for example) to adapt "very quickly" to each situation. If we have a dataset of expert decisions, a natural approach is to train a model to imitate the experts’ behavior. This approach is called imitation learning or behavior cloning. In general, however, imitation learning does not take safety into account, and for our problem safety is really important. We therefore need to train our imitation agent with safety constraints in mind. This approach is called safe imitation learning.
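
A very small tabular sketch of this idea is given below: the agent copies the expert's most frequent action in each state, but skips actions that a safety rule forbids. Everything here is hypothetical (the function `clone_policy`, the states, the actions and the safety rule); a real system would use a learned model and a carefully defined notion of safety.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Hashable, Iterable, Tuple

State = Hashable
Action = Hashable


def clone_policy(
    demonstrations: Iterable[Tuple[State, Action]],
    is_unsafe: Callable[[State, Action], bool],
) -> Dict[State, Action]:
    """Tabular behavior cloning with a safety filter.

    For each state, pick the expert's most frequent action that is not
    flagged unsafe. States where every observed action is unsafe get no entry.
    """
    counts: Dict[State, Counter] = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1

    policy: Dict[State, Action] = {}
    for state, action_counts in counts.items():
        # most_common() lists actions from most to least frequent.
        for action, _ in action_counts.most_common():
            if not is_unsafe(state, action):
                policy[state] = action
                break
    return policy


# Hypothetical expert log: (situation, dispatched vector).
demos = [
    ("storm", "helicopter"),
    ("storm", "helicopter"),
    ("storm", "ambulance"),
    ("clear", "ambulance"),
]


# Hypothetical safety rule: no helicopter flights during a storm.
def storm_rule(state: State, action: Action) -> bool:
    return state == "storm" and action == "helicopter"


policy = clone_policy(demos, storm_rule)
```

Plain behavior cloning would send the helicopter in a storm because the experts did so most often; the filter makes the agent fall back to the next most frequent safe action.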

Convergence of Markov Chains to self-similar Processes

Self-similar processes are stochastic processes that are invariant in distribution under suitable scaling of time and space. These processes can be used to model many random phenomena with space-time scaling observed in physics, biology and other fields, such as stellar fragments, growth and genealogy of populations, option pricing in finance, various areas of image processing, climatology and environmental science. Self-similar processes appear in various parts of probability theory, including Lévy processes, branching processes, statistical physics, fragmentation theory and random fields. Well-known examples include stable Lévy processes and fractional Brownian motion, but it has also been shown that relatively simple Markov models can produce self-similarity. Even though the cardinality of the state space increases to infinity, it has also been shown that its growth rate is quite low. The aim of the study is to prove that these Markov models converge to self-similar limit processes and to determine under which conditions.
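
For reference, the scaling invariance described above is usually written as follows. The notation (the index $H$ and the scaling factor $c$) is the standard textbook one and is not taken from the project description; the display uses `\overset` from the amsmath package.

```latex
A process $(X_t)_{t \ge 0}$ is self-similar with index $H > 0$ if, for every $c > 0$,
\[
  (X_{ct})_{t \ge 0} \overset{d}{=} (c^{H} X_t)_{t \ge 0},
\]
where $\overset{d}{=}$ denotes equality of finite-dimensional distributions.
Brownian motion satisfies this with $H = 1/2$, fractional Brownian motion with
its Hurst index $H \in (0,1)$, and a strictly $\alpha$-stable L\'evy process
with $H = 1/\alpha$.
```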

Small Datasets Reinforcement Learning for Emergencies

The intern will join a project that aims to find optimal solutions for guiding and dispatching the most appropriate vector (helicopter, ambulance) in response to an emergency, accident or disaster. For moderately severe cases, it is often preferable to use not the closest vector but the one that minimizes the redistribution of vectors across the territory. We believe that machine learning, and more precisely reinforcement learning, could provide an interesting framework for solving this problem. Unfortunately, reinforcement learning needs millions of samples to achieve good performance, and we only have access to a few hundred thousand. The intern will explore methods that aim to use reinforcement learning with a (reasonably) small dataset. The precise project will be defined together with the intern according to their background and ideas.