Unsupervised Dependency Parsing (Neřízená závislostní analýza)

Unsupervised dependency parsing is an alternative approach to identifying relations between words in a sentence. It requires no annotated treebank, is independent of linguistic theory, and is universal across languages. Its main disadvantage, however, is that its parsing quality has so far been quite low.

Bibliographic Details
Main Author: Mareček, David
Other Authors: Žabokrtský, Zdeněk
Format: Doctoral Thesis
Language: English
Published: 2012
Online Access: http://www.nusl.cz/ntk/nusl-306310
id ndltd-nusl.cz-oai-invenio.nusl.cz-306310
record_format oai_dc
spelling ndltd-nusl.cz-oai-invenio.nusl.cz-306310 2021-03-29T05:12:13Z Neřízená závislostní analýza Unsupervised Dependency Parsing Mareček, David Žabokrtský, Zdeněk Jurčíček, Filip Sogaard, Anders 2012 info:eu-repo/semantics/doctoralThesis http://www.nusl.cz/ntk/nusl-306310 eng info:eu-repo/semantics/restrictedAccess
collection NDLTD
language English
format Doctoral Thesis
sources NDLTD
description Unsupervised dependency parsing is an alternative approach to identifying relations between words in a sentence. It requires no annotated treebank, is independent of linguistic theory, and is universal across languages. Its main disadvantage, however, is that its parsing quality has so far been quite low. This thesis discusses previous work and introduces a novel approach to unsupervised parsing. Our dependency model consists of four submodels: (i) an edge model, which controls the distribution of governor-dependent pairs; (ii) a fertility model, which controls the number of a node's dependents; (iii) a distance model, which controls the length of the dependency edges; and (iv) a reducibility model. The reducibility model is based on the hypothesis that words that can be removed from a sentence without violating its grammaticality are leaves in the dependency tree. The dependency structures are induced using Gibbs sampling. We introduce a sampling algorithm that keeps the dependency trees projective, which is a very valuable constraint. In our experiments across 30 languages, we discuss the results of various settings of our models. Our method outperforms previously reported results on a majority of the test languages.
author2 Žabokrtský, Zdeněk
author_facet Žabokrtský, Zdeněk
Mareček, David
author Mareček, David
spellingShingle Mareček, David
Neřízená závislostní analýza
author_sort Mareček, David
title Neřízená závislostní analýza
title_sort neřízená závislostní analýza
publishDate 2012
url http://www.nusl.cz/ntk/nusl-306310