Information extraction to facilitate translation of natural language legislation

Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 65-66). There is a large body of existing legislation and policies that govern how government or...

Bibliographic Details
Main Author: Wang, Samuel (Samuel Siyue)
Other Authors: Hal Abelson.
Format: Others
Language: English
Published: Massachusetts Institute of Technology 2011
Subjects: Electrical Engineering and Computer Science.
Online Access: http://hdl.handle.net/1721.1/64598
id ndltd-MIT-oai-dspace.mit.edu-1721.1-64598
record_format oai_dc
spelling ndltd-MIT-oai-dspace.mit.edu-1721.1-64598 2019-05-02T16:26:36Z Information extraction to facilitate translation of natural language legislation Wang, Samuel (Samuel Siyue) Hal Abelson. Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 65-66). There is a large body of existing legislation and policies that govern how government organizations and corporations can share information. Since these rules are generally expressed in natural language, it is difficult and labor intensive to verify whether or not data sharing events are compliant with the relevant policies. This work aims to develop a natural language processing framework that automates significant portions of this translation process, so legal policies are more accessible to existing automated reasoning systems. Even though these laws are expressed in natural language, for this very specific domain, only a handful of sentence structures are actually used to convey logic. This structure can be exploited so that the program can automatically detect who the actor, action, object, and conditions are for each rule. In addition, once the structure of a rule is identified, similar rules can be presented to the user. If integrated into an authoring environment, this will allow the user to reuse previously translated rules as templates to translate novel rules more easily, independent of the target language for translation. A body of 315 real-world rules from 12 legal sources was collected and annotated for this project. Cross-validation experiments were conducted on this annotated data set, and the developed system was successful in identifying the underlying rule structure 43% of the time, and annotating the underlying tokens with recall of .66 and precision of .66. In addition, for 70% of the rules in each test set, the underlying rule structure had been seen in the training set. This suggests that the hypothesis that rules can only be expressed in a limited number of ways is probable. by Samuel Wang. S.M. 2011-06-20T15:57:58Z 2011-06-20T15:57:58Z 2011 2011 Thesis http://hdl.handle.net/1721.1/64598 727067697 eng M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 66 p. application/pdf Massachusetts Institute of Technology
collection NDLTD
language English
format Others
sources NDLTD
topic Electrical Engineering and Computer Science.
spellingShingle Electrical Engineering and Computer Science.
Wang, Samuel (Samuel Siyue)
Information extraction to facilitate translation of natural language legislation
description Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. === Cataloged from PDF version of thesis. === Includes bibliographical references (p. 65-66). === There is a large body of existing legislation and policies that govern how government organizations and corporations can share information. Since these rules are generally expressed in natural language, it is difficult and labor intensive to verify whether or not data sharing events are compliant with the relevant policies. This work aims to develop a natural language processing framework that automates significant portions of this translation process, so legal policies are more accessible to existing automated reasoning systems. Even though these laws are expressed in natural language, for this very specific domain, only a handful of sentence structures are actually used to convey logic. This structure can be exploited so that the program can automatically detect who the actor, action, object, and conditions are for each rule. In addition, once the structure of a rule is identified, similar rules can be presented to the user. If integrated into an authoring environment, this will allow the user to reuse previously translated rules as templates to translate novel rules more easily, independent of the target language for translation. A body of 315 real-world rules from 12 legal sources was collected and annotated for this project. Cross-validation experiments were conducted on this annotated data set, and the developed system was successful in identifying the underlying rule structure 43% of the time, and annotating the underlying tokens with recall of .66 and precision of .66. In addition, for 70% of the rules in each test set, the underlying rule structure had been seen in the training set. This suggests that the hypothesis that rules can only be expressed in a limited number of ways is probable. === by Samuel Wang. === S.M.
author2 Hal Abelson.
author_facet Hal Abelson.
Wang, Samuel (Samuel Siyue)
author Wang, Samuel (Samuel Siyue)
author_sort Wang, Samuel (Samuel Siyue)
title Information extraction to facilitate translation of natural language legislation
title_short Information extraction to facilitate translation of natural language legislation
title_full Information extraction to facilitate translation of natural language legislation
title_fullStr Information extraction to facilitate translation of natural language legislation
title_full_unstemmed Information extraction to facilitate translation of natural language legislation
title_sort information extraction to facilitate translation of natural language legislation
publisher Massachusetts Institute of Technology
publishDate 2011
url http://hdl.handle.net/1721.1/64598
work_keys_str_mv AT wangsamuelsamuelsiyue informationextractiontofacilitatetranslationofnaturallanguagelegislation
_version_ 1719040964581916672