Enabling Robust Grammatical Error Correction in New Domains: Data Sets, Metrics, and Analyses

Until now, grammatical error correction (GEC) has been primarily evaluated on text written by non-native English speakers, with a focus on student essays. This paper enables GEC development on text written by native speakers by providing a new data set and metric. We present a...

Bibliographic Details
Main Authors: Napoles, Courtney; Nădejde, Maria; Tetreault, Joel
Format: Article
Language: English
Published: The MIT Press 2019-11-01
Series: Transactions of the Association for Computational Linguistics
Online Access: https://www.mitpressjournals.org/doi/abs/10.1162/tacl_a_00282
id doaj-e7dd22a14a4b43da9beba155d8c79e7b
record_format Article
spelling doaj-e7dd22a14a4b43da9beba155d8c79e7b; 2020-11-25T03:25:18Z; eng; The MIT Press; Transactions of the Association for Computational Linguistics; 2307-387X; 2019-11-01; 7; 551; 566; 10.1162/tacl_a_00282; Enabling Robust Grammatical Error Correction in New Domains: Data Sets, Metrics, and Analyses; Napoles, Courtney; Nădejde, Maria; Tetreault, Joel; Until now, grammatical error correction (GEC) has been primarily evaluated on text written by non-native English speakers, with a focus on student essays. This paper enables GEC development on text written by native speakers by providing a new data set and metric. We present a multiple-reference test corpus for GEC that includes 4,000 sentences in two new domains (formal and informal writing by native English speakers) and 2,000 sentences from a diverse set of non-native student writing. We also collect human judgments of several GEC systems on this new test set and perform a meta-evaluation, assessing how reliable automatic metrics are across these domains. We find that commonly used GEC metrics have inconsistent performance across domains, and therefore we propose a new ensemble metric that is robust on all three domains of text.; https://www.mitpressjournals.org/doi/abs/10.1162/tacl_a_00282
collection DOAJ
language English
format Article
sources DOAJ
author Napoles, Courtney
Nădejde, Maria
Tetreault, Joel
author_sort Napoles, Courtney
title Enabling Robust Grammatical Error Correction in New Domains: Data Sets, Metrics, and Analyses
title_sort enabling robust grammatical error correction in new domains: data sets, metrics, and analyses
publisher The MIT Press
series Transactions of the Association for Computational Linguistics
issn 2307-387X
publishDate 2019-11-01
description Until now, grammatical error correction (GEC) has been primarily evaluated on text written by non-native English speakers, with a focus on student essays. This paper enables GEC development on text written by native speakers by providing a new data set and metric. We present a multiple-reference test corpus for GEC that includes 4,000 sentences in two new domains (formal and informal writing by native English speakers) and 2,000 sentences from a diverse set of non-native student writing. We also collect human judgments of several GEC systems on this new test set and perform a meta-evaluation, assessing how reliable automatic metrics are across these domains. We find that commonly used GEC metrics have inconsistent performance across domains, and therefore we propose a new ensemble metric that is robust on all three domains of text.
url https://www.mitpressjournals.org/doi/abs/10.1162/tacl_a_00282
work_keys_str_mv AT napolescourtney enablingrobustgrammaticalerrorcorrectioninnewdomainsdatasetsmetricsandanalyses
AT nadejdemaria enablingrobustgrammaticalerrorcorrectioninnewdomainsdatasetsmetricsandanalyses
AT tetreaultjoel enablingrobustgrammaticalerrorcorrectioninnewdomainsdatasetsmetricsandanalyses
_version_ 1724597705712336896