Enabling Robust Grammatical Error Correction in New Domains: Data Sets, Metrics, and Analyses

Bibliographic Details
Main Authors:	Napoles, Courtney; Nădejde, Maria; Tetreault, Joel
Format:	Article
Language:	English
Published:	The MIT Press 2019-11-01
Series:	Transactions of the Association for Computational Linguistics
Online Access:	https://www.mitpressjournals.org/doi/abs/10.1162/tacl_a_00282

Description
Summary:	Until now, grammatical error correction (GEC) has been primarily evaluated on text written by non-native English speakers, with a focus on student essays. This paper enables GEC development on text written by native speakers by providing a new data set and metric. We present a multiple-reference test corpus for GEC that includes 4,000 sentences in two new domains (formal and informal writing by native English speakers) and 2,000 sentences from a diverse set of non-native student writing. We also collect human judgments of several GEC systems on this new test set and perform a meta-evaluation, assessing how reliable automatic metrics are across these domains. We find that commonly used GEC metrics have inconsistent performance across domains, and therefore we propose a new ensemble metric that is robust on all three domains of text.
ISSN:	2307-387X

Similar Items