PhD Research Seminar: Machine learning for sequence mining, error correction for non-native speakers

Event has ended
When: May 24, 18:10–19:30 

First talk: Interpretability and Effectiveness of Machine Learning Methods for Sequence Mining in Various Domains      
Speaker: Anna Muratova, third-year PhD student, Faculty of Computer Science

Results of comparing machine learning methods for analyzing demographic data will be presented. Methods such as decision trees, SVMs, recurrent and convolutional neural networks are considered. The comparison takes into account both the accuracy of predicting events and the interpretability of the results.

Results of applying a neural network model to predict ratings in the MovieLens 1M dataset will also be presented, along with an experimental assessment of how various features, in particular movie titles, affect prediction accuracy.

Second talk: Language Model for Error Correction in Russian Texts by Non-native Speakers  
Speaker: Nikita Remnev, third-year PhD student, Faculty of Computer Science

With dozens or even hundreds of new words appearing every day, spellchecking and autocorrection are becoming increasingly important for both end users and developers of NLP systems. However, spelling correction techniques for morphologically rich languages such as Russian are still poorly covered in the academic literature. There are also errors specific to people learning the language and to so-called heritage speakers. For the first group, Russian is not a native language; those in the second group began learning it as a first language in childhood but, for various reasons, use another language for everyday communication. Such errors are usually not handled by modern spellcheckers.

The first part of the talk will briefly overview current results. We will discuss our approach, which is based on a language model and also makes use of SymSpell, RuBERT, and other existing techniques and algorithms. We will then present our results on the RULEC-GEC corpus and compare them with those achieved by other approaches.

In the second part of the talk, we will speak about orthographic and contextual error correction approaches in detail and discuss problems in grammatical error correction of Russian texts written by non-native speakers.