Starved neural learning : Morpheme segmentation using low amounts of data

Bibliographic Details
Main Author: Persson, Peter
Format: Others
Language: English
Published: Stockholms universitet, Avdelningen för datorlingvistik 2018
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-160953
Description
Summary: Automatic morpheme segmentation as a field has been dominated by unsupervised methods since its inception, partly because of theoretical motivations and partly because of resource constraints. Given the success neural network methods have shown in a wide variety of fields in recent years, it seems compelling to apply these methods to morpheme segmentation. This study explores the efficacy of modern neural networks, specifically convolutional neural networks and bidirectional LSTM (BLSTM) networks, on the morpheme segmentation task in a low-resource setting, to determine whether they are viable contenders against previous unsupervised, minimally supervised, and semi-supervised systems in the field. One architecture of each type is implemented and trained on a new gold standard data set, and the results are compared to previously established methods. A qualitative error analysis of the architectures’ segmentations is also performed. The study demonstrates that a BLSTM system can be trained with minimal effort to produce a proof-of-concept solution with small amounts of training data, and it suggests that BLSTM methods may be a fruitful direction for further research in this field.