PhD Research Seminar: Error correction for non-native speakers, overestimation in continuous control, storage system simulators and change-point detection
First talk: Methods for Automated Error Detection and Correction in Russian Texts by Non-native Speakers
Speaker: Nikita Remnev, second-year PhD student, Faculty of Computer Science
With tens or even hundreds of new words appearing every day, spellchecking and autocorrection are becoming increasingly important for both end users and developers of NLP systems. However, spelling correction for morphologically rich languages such as Russian is still poorly covered in the academic literature. A further challenge is posed by errors specific to learners of Russian and to so-called heritage speakers. For learners, Russian is not a native language; heritage speakers began acquiring Russian as a first language in childhood but, for various reasons, use another language for everyday communication. The errors typical of both groups are usually not handled by modern spellcheckers.
The first part of the talk will briefly review current results. We will discuss several methods that can be combined for error detection and correction: blacklists and pre-compiled dictionaries, word2vec models, N-gram language models, and tripartite error models.
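As a rough illustration of how a dictionary and an N-gram language model can work together, here is a minimal noisy-channel corrector. It is a sketch under stated assumptions, not the speaker's system: the toy corpus, the names `edits1` and `correct`, and the flat single-edit error probability are hypothetical, and that crude constant stands in for the much richer tripartite error model mentioned above. Candidates are dictionary words within one edit of the input, ranked by channel log-probability plus an add-one-smoothed bigram log-probability given the previous word.

```python
import math
from collections import Counter

# Hypothetical toy corpus; a real system would train on a large Russian corpus.
CORPUS = [
    "я люблю читать книги",
    "он читает книгу",
    "мы читаем книги каждый день",
]
ALPHABET = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"


def train_bigrams(sentences):
    """Count unigrams and bigrams, with <s> marking sentence starts."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


UNIGRAMS, BIGRAMS = train_bigrams(CORPUS)
DICTIONARY = {w for w in UNIGRAMS if w != "<s>"}


def edits1(word):
    """All strings one deletion, transposition, replacement or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def bigram_logprob(prev, word):
    """Add-one (Laplace) smoothed log P(word | prev)."""
    return math.log((BIGRAMS[(prev, word)] + 1) / (UNIGRAMS[prev] + len(UNIGRAMS)))


def channel_logprob(observed, candidate):
    """Crude error model (assumption): every single edit is equally likely."""
    return math.log(0.95) if observed == candidate else math.log(0.05)


def correct(word, prev="<s>"):
    """Return the dictionary candidate with the highest noisy-channel score."""
    candidates = ({word} | edits1(word)) & DICTIONARY
    if not candidates:
        return word  # no dictionary word within one edit: leave unchanged
    return max(candidates,
               key=lambda c: channel_logprob(word, c) + bigram_logprob(prev, c))
```

For example, `correct("читат", prev="люблю")` returns `"читать"`: both `"читать"` and `"читает"` are one edit away, but only the bigram "люблю читать" occurs in the toy corpus, so the language model breaks the tie.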
In the second part of the talk, we will address several important issues: the Russian Learner Corpus as a source of errors made by non-native speakers, the Native Language Identification problem for Russian, and the difficulties of developing a spellchecker for Russian.
Second talk: The Problem of Overestimation and Frequency-Domain Approach to Continuous Control
Speaker: Pavel Shvechikov, third-year PhD student, Faculty of Computer Science
Continuous control is an important field with applications in robotic control, network scheduling and caching, marketing, and auctions. However, planning-based continuous control systems usually rely on known dynamics and a known cost function. In practice, this knowledge is unrealistic even for highly structured environments such as robotic hand manipulators: grasping small and fragile objects, as well as generalizing to unseen objects, remain major challenges. In the talk, we will discuss one of the impediments to reliably optimizing functions learned from data: the overestimation bias. In the second part of the talk, I will present an ongoing project on a frequency-domain view of continuous control with reinforcement learning.
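The overestimation bias can be reproduced in a toy setting: if each action value is estimated with zero-mean noise, the maximum over the noisy estimates is biased upward even though each individual estimate is unbiased. The sketch below, with hypothetical parameter values, compares a single estimator, the double estimator (choose the action with one noisy estimate, evaluate it with an independent one), and a clipped minimum of two estimates, which is the idea behind TD3-style clipped double Q-learning. It uses a finite action set for simplicity; in continuous control the maximization is performed by an actor network rather than over a list of actions.

```python
import numpy as np


def overestimation_demo(n_actions=10, noise_std=1.0, n_trials=10_000, seed=0):
    """Compare estimates of max_a Q(a) when Q is only known up to noise."""
    rng = np.random.default_rng(seed)
    true_q = np.zeros(n_actions)  # all actions equally good, so max_a Q(a) = 0
    # Two independent noisy estimates of Q for every trial.
    q1 = true_q + rng.normal(0.0, noise_std, size=(n_trials, n_actions))
    q2 = true_q + rng.normal(0.0, noise_std, size=(n_trials, n_actions))

    # Single estimator: maximize and evaluate with the same noisy estimate.
    single = q1.max(axis=1).mean()
    # Double estimator: select with q1, evaluate with the independent q2.
    chosen = q1.argmax(axis=1)
    double = q2[np.arange(n_trials), chosen].mean()
    # Clipped estimator: maximize the element-wise minimum of both estimates.
    clipped = np.minimum(q1, q2).max(axis=1).mean()
    return single, double, clipped


single, double, clipped = overestimation_demo()
print(f"single={single:.3f} double={double:.3f} clipped={clipped:.3f}")
```

Since every true value is zero, any positive result is pure bias: the single estimator lands near the expected maximum of ten standard normals (roughly 1.5), the double estimator stays close to zero, and the clipped estimator is always at most the single one because it maximizes an element-wise smaller array.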
Third talk: Storage System Simulators and Change-Point Detection
Speaker: Kenenbek Arzymatov, third-year PhD student, Faculty of Computer Science
The first part of the talk will be devoted to the experience of developing a Go-based package that simulates the behavior of modern storage infrastructure and supports reinforcement learning. The software is based on the discrete-event modeling paradigm and captures the structure and dynamics of high-level storage system building blocks.
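To illustrate the discrete-event paradigm itself, here is a minimal Python sketch; it does not reproduce the Go package or its API. A single storage device serves requests in FIFO order, and the simulation clock jumps from one event (an arrival or a completion) to the next, taken from a priority queue. The function `simulate_disk`, the constant service time, and the single-device topology are simplifying assumptions, and there is no reinforcement learning interface.

```python
import heapq
from collections import deque
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    time: float
    seq: int  # tie-breaker: events at equal times are processed in push order
    kind: str = field(compare=False)  # "arrival" or "completion"
    req_id: int = field(compare=False)


def simulate_disk(arrivals, service_time):
    """Simulate one FIFO storage device and return each request's latency."""
    events, seq = [], 0
    for req_id, t in enumerate(arrivals):
        heapq.heappush(events, Event(t, seq, "arrival", req_id))
        seq += 1

    queue, busy, latency = deque(), False, {}
    while events:
        ev = heapq.heappop(events)  # advance the clock to the next event
        now = ev.time
        if ev.kind == "arrival":
            queue.append(ev.req_id)
        else:  # completion: record latency and free the device
            latency[ev.req_id] = now - arrivals[ev.req_id]
            busy = False
        if not busy and queue:  # start serving the oldest waiting request
            req_id = queue.popleft()
            busy = True
            heapq.heappush(events, Event(now + service_time, seq, "completion", req_id))
            seq += 1
    return [latency[i] for i in range(len(arrivals))]
```

For instance, `simulate_disk([0.0, 1.0, 2.0], service_time=1.5)` returns `[1.5, 2.0, 2.5]`: each request must wait for its predecessor to finish, so queueing delay accumulates. A fuller simulator would add more component types (controllers, caches, multiple devices) as further event kinds.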
The second part will be about an online generalization of change-point detection in time series data. The goal of change-point detection is to discover changes in the distribution of a time series. One of the state-of-the-art approaches to change-point detection is based on direct density ratio estimation. In this work, I will show how existing algorithms can be adapted to online streaming data.
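As a sketch of what direct density ratio estimation means here, the code below uses uLSIF (unconstrained least-squares importance fitting), one known member of this family, chosen for illustration and not necessarily the algorithm used in the talk. It fits the ratio $r(x) = p_{\text{test}}(x)/p_{\text{ref}}(x)$ as a Gaussian-kernel model in closed form and turns it into a Pearson divergence score. The change score is recomputed from scratch over two sliding windows at every step, so this is a naive baseline rather than a true online algorithm; the kernel width `sigma`, the regularizer `lam`, and the window size are hypothetical choices.

```python
import numpy as np


def gaussian_kernel(x, centers, sigma):
    """Design matrix of shape (len(x), len(centers)) for scalar samples."""
    return np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * sigma ** 2))


def ulsif_pearson_divergence(x_ref, x_test, sigma=1.0, lam=0.1):
    """Estimate PE(p_test || p_ref) by fitting r = p_test / p_ref with uLSIF."""
    centers = x_test  # kernel centres placed on the test samples
    phi_ref = gaussian_kernel(x_ref, centers, sigma)
    phi_test = gaussian_kernel(x_test, centers, sigma)
    H = phi_ref.T @ phi_ref / len(x_ref)  # E_ref[phi phi^T]
    h = phi_test.mean(axis=0)  # E_test[phi]
    # Closed-form ridge-regularized least-squares fit of the ratio model.
    theta = np.linalg.solve(H + lam * np.eye(len(centers)), h)
    g_ref, g_test = phi_ref @ theta, phi_test @ theta
    return -0.5 * np.mean(g_ref ** 2) + np.mean(g_test) - 0.5


def sliding_change_scores(series, window=50, **kwargs):
    """Score time t by comparing series[t-2w:t-w] (reference) with series[t-w:t] (test)."""
    times = np.arange(2 * window, len(series) + 1)
    scores = np.array([
        ulsif_pearson_divergence(series[t - 2 * window:t - window],
                                 series[t - window:t], **kwargs)
        for t in times
    ])
    return times, scores
```

On a synthetic series whose mean jumps from 0 to 3 at index 200, the score is near zero while both windows lie on the same side of the jump and should peak when the test window has moved past it:

```python
rng = np.random.default_rng(0)
series = np.concatenate([rng.normal(0, 1, 200), rng.normal(3, 1, 200)])
times, scores = sliding_change_scores(series, window=50)
print(times[np.argmax(scores)])
```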
Арзыматов Кененбек
Laboratory of Methods for Big Data Analysis: Research Assistant
Ремнев Никита Валерьевич
Швечиков Павел Дмитриевич
PhD Student, Faculty of Computer Science