A system for document analysis, translation, and automatic hypertext linking
A digital library database is a heterogeneous collection of documents. Documents may become available in different formats (e.g., ASCII, SGML, typesetter languages) and they may have to be translated to a standard document representation scheme used by the digital library. This work focuses on the...
Main Author: | Averboch, Guillermo Andres |
---|---|
Other Authors: | Computer Science and Applications |
Format: | Others |
Language: | en |
Published: | Virginia Tech, 2014 |
Subjects: | |
Online Access: | http://hdl.handle.net/10919/43809 http://scholar.lib.vt.edu/theses/available/etd-07212009-040529/ |
id |
ndltd-VTETD-oai-vtechworks.lib.vt.edu-10919-43809 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-VTETD-oai-vtechworks.lib.vt.edu-10919-43809 2021-05-26T05:48:42Z A system for document analysis, translation, and automatic hypertext linking Averboch, Guillermo Andres Computer Science and Applications Heath, Lenwood S. Fox, Edward A. Arthur, James D. formats computer language Database LD5655.V855 1995.A992 A digital library database is a heterogeneous collection of documents. Documents may become available in different formats (e.g., ASCII, SGML, typesetter languages) and they may have to be translated to a standard document representation scheme used by the digital library. This work focuses on the design of a framework that can be used to convert text documents in any format to equivalent documents in different formats and, in particular, to SGML (Standard Generalized Markup Language). In addition, the framework must be able to extract information about the analyzed documents, store that information in a permanent database, and construct hypertext links between documents and the information contained in that database and between the documents themselves. For example, information about the author of a document could be extracted and stored in the database. A link can then be established between the document and the information about its author and from there to other documents by the same author. These tasks must be performed without any human intervention, even at the risk of making a small number of mistakes. To accomplish these goals we developed a language called DELTO (Description Language for Textual Objects) that can be used to describe a document format. Given a description for a particular format, our system is able to extract information from documents in that format, to store part of that information in a permanent database, and to use that information in constructing an abstract representation of those documents that can be used to generate equivalent documents in different formats. 
The system that originated from this work is used for constructing the database of Envision, a Virginia Tech digital library research project. Master of Science 2014-03-14T21:40:48Z 2014-03-14T21:40:48Z 1995-06-05 2009-07-21 2009-07-21 2009-07-21 Thesis Text etd-07212009-040529 http://hdl.handle.net/10919/43809 http://scholar.lib.vt.edu/theses/available/etd-07212009-040529/ en OCLC# 34376883 LD5655.V855_1995.A992.pdf In Copyright http://rightsstatements.org/vocab/InC/1.0/ xii, 226 leaves BTD application/pdf application/pdf Virginia Tech |
collection |
NDLTD |
language |
en |
format |
Others |
sources |
NDLTD |
topic |
formats computer language Database LD5655.V855 1995.A992 |
spellingShingle |
formats computer language Database LD5655.V855 1995.A992 Averboch, Guillermo Andres A system for document analysis, translation, and automatic hypertext linking |
description |
A digital library database is a heterogeneous collection of documents. Documents may become available in different formats (e.g., ASCII, SGML, typesetter languages) and they may have to be translated to a standard document representation scheme used by the digital library.
This work focuses on the design of a framework that can be used to convert text documents in any format to equivalent documents in different formats and, in particular, to SGML (Standard Generalized Markup Language). In addition, the framework must be able to extract information about the analyzed documents, store that information in a permanent database, and construct hypertext links between documents and the information contained in that database and between the documents themselves. For example, information about the author of a document could be extracted and stored in the database. A link can then be established between the document and the information about its author and from there to other documents by the same author. These tasks must be performed without any human intervention, even at the risk of making a small number of mistakes.
To accomplish these goals we developed a language called DELTO (Description Language for Textual Objects) that can be used to describe a document format. Given a description for a particular format, our system is able to extract information from documents in that format, to store part of that information in a permanent database, and to use that information in constructing an abstract representation of those documents that can be used to generate equivalent documents in different formats.
The system that originated from this work is used for constructing the database of Envision, a Virginia Tech digital library research project. Master of Science |
author2 |
Computer Science and Applications |
author_facet |
Computer Science and Applications Averboch, Guillermo Andres |
author |
Averboch, Guillermo Andres |
author_sort |
Averboch, Guillermo Andres |
title |
A system for document analysis, translation, and automatic hypertext linking |
title_short |
A system for document analysis, translation, and automatic hypertext linking |
title_full |
A system for document analysis, translation, and automatic hypertext linking |
title_fullStr |
A system for document analysis, translation, and automatic hypertext linking |
title_full_unstemmed |
A system for document analysis, translation, and automatic hypertext linking |
title_sort |
system for document analysis, translation, and automatic hypertext linking |
publisher |
Virginia Tech |
publishDate |
2014 |
url |
http://hdl.handle.net/10919/43809 http://scholar.lib.vt.edu/theses/available/etd-07212009-040529/ |
work_keys_str_mv |
AT averbochguillermoandres asystemfordocumentanalysistranslationandautomatichypertextlinking AT averbochguillermoandres systemfordocumentanalysistranslationandautomatichypertextlinking |
_version_ |
1719406971273084928 |