Software and resources


A generic tool for the automatic discovery of multiword expressions (MWEs) in corpora, developed by Carlos Ramisch, Silvio Cordeiro, Manon Scholivet, Vitor de Araújo, and Sandra Castellanos in collaboration with Aline Villavicencio.


SLICE is a model to build Supersense-based Lightweight Interpretable Contextual Embeddings. It was developed by Cindy Aloui under the supervision of Alexis Nasr, Lucie Barque and Carlos Ramisch.


A neural network tagger for MWE identification developed by Nicolas Zampieri, based on an initial prototype developed by Manon Scholivet. Both authors were advised by Carlos Ramisch during their internships.


A system for the identification of MWE variants developed by Caroline Pasquer, under the supervision of Agata Savary, Jean-Yves Antoine and Carlos Ramisch.

PARSEME Shared task corpora

A collection of multilingual corpora annotated with verbal MWEs. These corpora were prepared, annotated and released by multiple authors of the PARSEME community. The coordination of edition 1.1 of the shared task was carried out by Agata Savary, Silvio Cordeiro, Veronika Vincze and Carlos Ramisch.

Dataset - Cross-lingual UD parsing using the WALS

Corpora and features for cross-lingual UD parsing using the WALS, created by Manon Scholivet. This dataset was used for the NAACL 2019 paper "Typological Features for Multilingual Delexicalised Dependency Parsing".

Dataset - compositionality of nominal compounds

Lists of nominal compounds in English, French and Portuguese annotated for compositionality on a numerical scale. The datasets were prepared, annotated and released by Silvio Cordeiro, Aline Villavicencio, Marco Idiart, Carlos Ramisch and numerous anonymous annotators.

Dataset - MORPH: French complex function words

Sentences containing ambiguous de+DET and ADV+que French constructions, annotated for their MWE status by André Valli and José Deulofeu in collaboration with Alexis Nasr and Carlos Ramisch.

Dataset - comparison of MWE acquisition

Datasets used to compare MWE discovery tools. Prepared by Vitor de Araújo and Carlos Ramisch.


A MINImalist multi-threaded tool in C for building standard distributional seMANTICS models, developed by Carlos Ramisch.

Grammar Editor

A Java program to manipulate context-free grammars, developed for an undergraduate course on formal languages.