SummerFest '09 Summer School: Anaphora Resolution

Course Description

Overview

Identifying which nominal phrases in a discourse (mentions) refer to the same discourse entity is a key ingredient of human language understanding. It is also an increasingly important area of research in Computational Linguistics (CL), a field that is moving towards high-level linguistic tasks requiring natural language understanding (NLU) capabilities. This tutorial offers the CL / NLP community a gentle introduction to the task of anaphora resolution from both a theoretical and an application-oriented perspective. Its main purposes are: (1) to introduce a general audience of NLP researchers to the core ideas underlying state-of-the-art computational models of anaphora resolution; and (2) to provide that same audience with an overview of NLP applications that can benefit from anaphora information.

Outline

The tutorial is divided into four main parts:

1. Introduction

We start by introducing the task of anaphora resolution, the resources available to study it, and the techniques for evaluating anaphora resolution systems.

2. Machine learning approaches to anaphora resolution

We begin with the work of Soon et al. (2001) and Ng & Cardie (2002). We then analyze the main limitation of these approaches, namely that they cluster mentions on the basis of local pairwise classifications of nominal phrases in the text. Finally, we present more complex models that attempt to treat coreference as a global discourse phenomenon (Yang et al., 2003; Luo et al., 2004; Daumé III & Marcu, 2005; inter alia).
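As a rough illustration of what clustering from local pairwise decisions means, the following minimal sketch links each mention to the closest preceding mention that a pairwise classifier accepts. It is not the implementation of any of the cited systems: the function names are hypothetical, and the head-word test is a toy stand-in for a learned classifier.

```python
from typing import Callable, List


def toy_pair_classifier(antecedent: str, anaphor: str) -> bool:
    """Hypothetical stand-in for a learned pairwise classifier.

    Assumption: two mentions corefer if their last tokens match,
    ignoring case. A real system would use a trained model here.
    """
    return antecedent.lower().split()[-1] == anaphor.lower().split()[-1]


def closest_first_clustering(
    mentions: List[str],
    is_coreferent: Callable[[str, str], bool],
) -> List[List[str]]:
    """Build entity clusters from purely local pairwise decisions.

    Each mention is linked to the nearest preceding mention that the
    classifier accepts; otherwise it starts a new cluster.
    """
    cluster_of: List[int] = []  # cluster id of each mention, in text order
    next_id = 0
    for j, anaphor in enumerate(mentions):
        assigned = None
        # Scan candidate antecedents from right to left (closest first).
        for i in range(j - 1, -1, -1):
            if is_coreferent(mentions[i], anaphor):
                assigned = cluster_of[i]
                break
        if assigned is None:
            assigned = next_id
            next_id += 1
        cluster_of.append(assigned)

    clusters: List[List[str]] = [[] for _ in range(next_id)]
    for mention, cluster_id in zip(mentions, cluster_of):
        clusters[cluster_id].append(mention)
    return clusters


if __name__ == "__main__":
    example = ["Mr. Smith", "the company", "Smith", "a large company", "it"]
    for cluster in closest_first_clustering(example, toy_pair_classifier):
        print(cluster)
```

Each decision in this sketch looks at a single pair of mentions, so nothing checks whether the resulting cluster is consistent as a whole. This is the kind of limitation that global models aim to address.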

3. Lexical and encyclopedic knowledge for coreference resolution

In many cases, resolving anaphors to their correct antecedents requires lexical and encyclopedic knowledge. We therefore introduce approaches that incorporate semantic information into coreference models from a variety of knowledge sources, e.g. WordNet (Harabagiu et al., 2001), Wikipedia (Ponzetto & Strube, 2006) and automatically harvested patterns (Poesio et al., 2002; Markert & Nissim, 2005; Yang & Su, 2007).

4. Applications and future directions

We present an overview of NLP applications that have been shown to profit from coreference information, e.g. question answering and summarization. We conclude with remarks on directions for future work. These include: a) bringing together approaches to coreference that use semantic information with global discourse modeling techniques; b) exploring novel application scenarios that could benefit from coreference resolution, e.g. relation extraction and the extraction of events and event chains from text.

Target audience

This tutorial is designed for students and researchers in Computer Science and Computational Linguistics. No prior knowledge of coreference topics is assumed.

Presenter

Professor Massimo Poesio

Presenter Biography

Massimo Poesio is Chair in Humanities Computing at the University of Trento and Director of the Language Interaction and Computation Lab at the Center for Mind / Brain Sciences. He has been involved in all aspects of research in anaphora resolution, including the study of agreement on anaphoric judgments; the creation of anaphorically annotated resources such as the GNOME and ARRAU corpora; and the development of state-of-the-art algorithms such as the Vieira / Poesio algorithm (2000), and of anaphora resolution systems such as GUITAR, which have been applied to tasks such as summarization and information extraction. He coordinated the 2007 Johns Hopkins Workshop on Using Lexical and Encyclopedic Knowledge for Entity Disambiguation, which led to the development of the BART system.