Multi-word expressions (MWEs) are sequences of two or more words that are frequently used together, e.g., vaikų darželis (kindergarten), laikas bėga (literally *time is running, i.e., time is passing by), daryti spaudimą (literally *make pressure, i.e., put pressure on). Often the meaning of such a sequence is not predictable from the meanings of its component words, e.g., pakišti koją (literally to trip someone up, figuratively to prevent someone from succeeding in something), pralaužti ledus (to break the ice, in the figurative sense).
The analysis of MWEs is an important topic in theoretical, comparative, and applied linguistics, as well as in natural language processing and language technologies. MWEs are estimated to make up more than 40% of text in almost any natural language, so their analysis is relevant to every language. Although the number of studies on MWEs has increased recently, it is generally acknowledged that MWEs are not sufficiently described in lexicographic resources, and the methodology for identifying MWEs automatically still raises many questions.
The goal of this project is to develop a methodology for analysing Lithuanian MWEs by creating or adapting the necessary tools and resources. Methods from corpus linguistics, computational linguistics, and machine learning will be applied.
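As an illustration of the kind of corpus-based method such a methodology might include, the sketch below ranks adjacent word pairs by pointwise mutual information (PMI), a common association measure for finding collocation candidates. This is an assumption made for illustration only: the project description does not name PMI or any specific measure, and the function name pmi_bigrams, the min_count threshold, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Hypothetical helper for illustration; not the project's actual tool.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        # Skip rare pairs: PMI is unreliable for low-frequency events.
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))

    # Highest-scoring pairs first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy corpus (assumed, pre-tokenised) reusing two MWE examples from the text.
tokens = (
    "vaikų darželis yra čia . vaikų darželis yra ten . "
    "laikas bėga greitai . laikas bėga lėtai ."
).split()

ranked = pmi_bigrams(tokens)
for pair, score in ranked:
    print(pair, round(score, 2))
```

On a corpus this small the scores mean little. With real data, a frequency threshold and lemmatisation would matter considerably, especially for a highly inflected language such as Lithuanian, and PMI alone would not distinguish idiomatic MWEs from merely frequent word pairs.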
The main results of the project are the following: a methodology for identifying MWEs, tools for MWE extraction, a database of Lithuanian MWEs with multifunctional search options, a corpus-based dictionary of Lithuanian collocations, scientific and popular publications based on the project data, and papers presented at national and international conferences. The project runs from 2016 to 2018.