SLICE: Supersense-based Lightweight Interpretable Contextual Embeddings

Cindy Aloui and Alexis Nasr and Lucie Barque and Carlos Ramisch

This package contains the datasets used in the experiments of the submitted paper "SLICE: Supersense-based Lightweight Interpretable Contextual Embeddings". SLICE is a hybrid model that combines supersense labels with contextual embeddings. We introduce a weakly supervised method to learn interpretable embeddings from raw corpora and small lists of seed words. Our model represents both a word and its context as embeddings in the same compact space, whose dimensions correspond to interpretable supersenses.

The data and code can be downloaded here: slice-data-scripts-20201101.zip

This package contains:

If you use SLICE, please cite the following paper:

@InProceedings{aloui-etAl-2020:coling,
  author = "Cindy Aloui and Alexis Nasr and Lucie Barque and Carlos Ramisch",
  title = "SLICE: Supersense-based Lightweight Interpretable Contextual Embeddings",
  booktitle = "28th International Conference on Computational Linguistics (COLING 2020)",
  year = "2020",
  publisher = "ICCL",
}

Seed lists

Evaluation corpus for WSD

Lexical and context signatures

  1. sentence ID
  2. noun lemma
  3. reference class (1=ANI, 2=NAT, 3=MAN, 4=INF, 5=DYN, 6=STA)
  4. 6 contextual scores, one per class, in the class order given above
  5. 6 lexical scores (from lexsignatures), one per class, in the class order given above
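
The column layout above can be read with a few lines of Python. This is only a minimal sketch: it assumes the fields are separated by whitespace, which this description does not confirm, and the names SignatureRow, parse_row and CLASSES are hypothetical, not part of the distributed scripts. The sample line used in the usage note is invented for illustration.

```python
from typing import List, NamedTuple

# Class labels in the order given in the column description (codes 1..6 in the file).
CLASSES = ["ANI", "NAT", "MAN", "INF", "DYN", "STA"]


class SignatureRow(NamedTuple):
    sentence_id: str
    lemma: str
    reference_class: str            # label decoded from the 1..6 code
    contextual_scores: List[float]  # one score per class, in CLASSES order
    lexical_scores: List[float]     # one score per class, in CLASSES order


def parse_row(line: str) -> SignatureRow:
    """Parse one line. ASSUMPTION: fields are whitespace-separated."""
    fields = line.split()
    expected = 3 + 2 * len(CLASSES)  # 3 leading columns + 6 + 6 scores
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, got {len(fields)}")
    code = int(fields[2])
    if not 1 <= code <= len(CLASSES):
        raise ValueError(f"reference class out of range: {code}")
    scores = [float(x) for x in fields[3:]]
    n = len(CLASSES)
    return SignatureRow(fields[0], fields[1], CLASSES[code - 1],
                        scores[:n], scores[n:])
```

For example, parse_row("s1 chat 1 0.7 0.1 0.05 0.05 0.05 0.05 0.6 0.1 0.1 0.1 0.05 0.05") yields a row whose reference_class is "ANI", with six contextual and six lexical scores. If the actual files use another separator (for instance tabs with multi-word lemmas), the split call would need to be adapted.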

Code

Requires Python 3 with TensorFlow, Keras, HuggingFace's transformers, conllu and PyTorch, all installable via pip3. Run the scripts without any argument for help.
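
Before running the scripts, it can help to check that the dependencies listed above are importable. The snippet below is a small sketch, not part of the package: missing_modules is a hypothetical helper, and the module names are the usual import names of these libraries (PyTorch is imported as torch).

```python
from importlib.util import find_spec

# Usual import names for the dependencies listed above.
REQUIRED_MODULES = ["tensorflow", "keras", "transformers", "conllu", "torch"]


def missing_modules(names):
    """Return the top-level modules from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```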