ARRAU

Anaphora resolution and underspecification

Corpus

The ARRAU Corpus is annotated for anaphoric information, with a particular focus on the ‘difficult’ cases of anaphora: plural anaphora, anaphora to abstract objects, and ambiguous anaphoric expressions.

Papers

See the Publications page.

Guidelines

Two coding manuals were written: one for the spoken dialogue data and one for the text data.

Availability

The ARRAU Corpus is available as follows:

  • from the LDC, except for the GNOME domain data, which is distributed directly by the authors (contact: poesio@gmail.com);
  • from the authors, to any requester who can show that they have purchased the Penn Treebank and TRAINS-93 from the LDC (contact: poesio@gmail.com).