Anaphora resolution and underspecification


The ARRAU Corpus is annotated for anaphoric information, with a particular focus on the ‘difficult’ cases of anaphora: plural anaphora, anaphora to abstract objects, and ambiguous anaphoric expressions.


See the Publications page.


Two coding manuals were written, one for the spoken dialogue data, one for the text data:


The ARRAU Corpus is available as follows:

  • from the LDC (here), except for the GNOME domain data, which are distributed directly by the authors (contact:;
  • from the authors, to any requester who can show that they have purchased the Penn Treebank and TRAINS-93 from the LDC (contact:;