A system for document analysis, translation, and automatic hypertext linking

A digital library database is a heterogeneous collection of documents. Documents may become available in different formats (e.g., ASCII, SGML, typesetter languages) and they may have to be translated to a standard document representation scheme used by the digital library. This work focuses on the...


Bibliographic Details
Main Author: Averboch, Guillermo Andres
Other Authors: Computer Science and Applications
Format: Others
Language: en
Published: Virginia Tech 2014
Subjects: formats; computer language; Database; LD5655.V855 1995.A992
Online Access: http://hdl.handle.net/10919/43809
http://scholar.lib.vt.edu/theses/available/etd-07212009-040529/
id ndltd-VTETD-oai-vtechworks.lib.vt.edu-10919-43809
record_format oai_dc
spelling ndltd-VTETD-oai-vtechworks.lib.vt.edu-10919-43809; 2021-05-26T05:48:42Z; A system for document analysis, translation, and automatic hypertext linking; Averboch, Guillermo Andres; Computer Science and Applications; Heath, Lenwood S.; Fox, Edward A.; Arthur, James D.; formats; computer language; Database; LD5655.V855 1995.A992; Master of Science; 2014-03-14T21:40:48Z; 2014-03-14T21:40:48Z; 1995-06-05; 2009-07-21; 2009-07-21; 2009-07-21; Thesis Text; etd-07212009-040529; http://hdl.handle.net/10919/43809; http://scholar.lib.vt.edu/theses/available/etd-07212009-040529/; en; OCLC# 34376883; LD5655.V855_1995.A992.pdf; In Copyright; http://rightsstatements.org/vocab/InC/1.0/; xii, 226 leaves; BTD; application/pdf; application/pdf; Virginia Tech
collection NDLTD
language en
format Others
sources NDLTD
topic formats
computer language
Database
LD5655.V855 1995.A992
description A digital library database is a heterogeneous collection of documents. Documents may become available in different formats (e.g., ASCII, SGML, typesetter languages), and they may have to be translated to a standard document representation scheme used by the digital library. This work focuses on the design of a framework that can be used to convert text documents in any format to equivalent documents in different formats and, in particular, to SGML (Standard Generalized Markup Language). In addition, the framework must be able to extract information about the analyzed documents, store that information in a permanent database, and construct hypertext links both between documents and the information contained in that database and among the documents themselves. For example, information about the author of a document could be extracted and stored in the database. A link can then be established between the document and the information about its author, and from there to other documents by the same author. These tasks must be performed without any human intervention, even at the risk of making a small number of mistakes. To accomplish these goals, we developed a language called DELTO (Description Language for Textual Objects) that can be used to describe a document format. Given a description of a particular format, our system is able to extract information from documents in that format, to store part of that information in a permanent database, and to use that information in constructing an abstract representation of those documents that can be used to generate equivalent documents in different formats. The system that originated from this work is used for constructing the database of Envision, a Virginia Tech digital library research project. Master of Science
author2 Computer Science and Applications
author_facet Computer Science and Applications
Averboch, Guillermo Andres
author Averboch, Guillermo Andres
author_sort Averboch, Guillermo Andres
title A system for document analysis, translation, and automatic hypertext linking
title_sort system for document analysis, translation, and automatic hypertext linking
publisher Virginia Tech
publishDate 2014
url http://hdl.handle.net/10919/43809
http://scholar.lib.vt.edu/theses/available/etd-07212009-040529/