Finding hidden semantics of text tables

Combining data from different sources for further automatic processing is often hindered by differences in the underlying semantics and representation. Therefore when linking information presented in documents in tabular form with data held in databases, it is important to determine as much informat...

Full description

Bibliographic Details
Main Author: Alrashed, Saleh A.
Published: Cardiff University 2004
Subjects:
Online Access:http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.583489
id ndltd-bl.uk-oai-ethos.bl.uk-583489
record_format oai_dc
spelling ndltd-bl.uk-oai-ethos.bl.uk-5834892015-12-31T03:25:20ZFinding hidden semantics of text tablesAlrashed, Saleh A.2004Combining data from different sources for further automatic processing is often hindered by differences in the underlying semantics and representation. Therefore when linking information presented in documents in tabular form with data held in databases, it is important to determine as much information about the table and its content. Important information about the table data is often given in the text surrounding the table in that document. The table's creators cannot clarify all the semantics in the table itself therefore they use the table context or the text around it to give further information. These semantics are very useful when integrating and using this data, but are often difficult to detect automatically. We propose a solution to part of this problem based on a domain ontology. The input to our system is a document that contains tabular data and the system aims to find semantics in the document that are related to the tabular data. The output of our system is a set of detected semantics linked to the corresponding table. The system uses elements of semantic detection, semantic representation, and data integration. Semantic detection uses a domain ontology, in which we store concepts of that domain. This allows us to analyse the content of the document (text) and detect context information about the tables present in a document containing tabular data. Our approach consists of two components: (1) extract, from the domain ontology, concepts, synonyms, and relations that correspond to the table data. (2) Build a tree for the paragraphs and use this tree to detect the hidden semantics by searching for words matching the extracted concepts. Semantic representation techniques then allow representation of the detected semantics of the table data. Our system represents the detected semantics, as either 'semantic units' or 'enhanced metadata'. Semantic units are a flexible set of meta-attributes that describe the meaning of the data item along with the detected semantics. In addition, each semantic unit has a concept label associated with it that specifies the relationship between the unit and the real world aspects it describes. In the enhanced metadata, table metadata is enhanced with the semantics and representation context found in the text. Integrating data in our proposed system takes place in two steps. First, the semantic units are converted to a common context, reflecting the application. This is achieved by using appropriate conversion functions. Secondly, the semantically identical semantic units, will be identified and integrated into a common representation. This latter is the subject of future work. Thus the research has shown that semantics about a table are in the text and how it is possible to locate and use these semantics by transforming them into an appropriate form to enhance the basic table metadata.006.3Cardiff Universityhttp://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.583489http://orca.cf.ac.uk/55953/Electronic Thesis or Dissertation
collection NDLTD
sources NDLTD
topic 006.3
spellingShingle 006.3
Alrashed, Saleh A.
Finding hidden semantics of text tables
description Combining data from different sources for further automatic processing is often hindered by differences in the underlying semantics and representation. Therefore when linking information presented in documents in tabular form with data held in databases, it is important to determine as much information about the table and its content. Important information about the table data is often given in the text surrounding the table in that document. The table's creators cannot clarify all the semantics in the table itself therefore they use the table context or the text around it to give further information. These semantics are very useful when integrating and using this data, but are often difficult to detect automatically. We propose a solution to part of this problem based on a domain ontology. The input to our system is a document that contains tabular data and the system aims to find semantics in the document that are related to the tabular data. The output of our system is a set of detected semantics linked to the corresponding table. The system uses elements of semantic detection, semantic representation, and data integration. Semantic detection uses a domain ontology, in which we store concepts of that domain. This allows us to analyse the content of the document (text) and detect context information about the tables present in a document containing tabular data. Our approach consists of two components: (1) extract, from the domain ontology, concepts, synonyms, and relations that correspond to the table data. (2) Build a tree for the paragraphs and use this tree to detect the hidden semantics by searching for words matching the extracted concepts. Semantic representation techniques then allow representation of the detected semantics of the table data. Our system represents the detected semantics, as either 'semantic units' or 'enhanced metadata'. Semantic units are a flexible set of meta-attributes that describe the meaning of the data item along with the detected semantics. In addition, each semantic unit has a concept label associated with it that specifies the relationship between the unit and the real world aspects it describes. In the enhanced metadata, table metadata is enhanced with the semantics and representation context found in the text. Integrating data in our proposed system takes place in two steps. First, the semantic units are converted to a common context, reflecting the application. This is achieved by using appropriate conversion functions. Secondly, the semantically identical semantic units, will be identified and integrated into a common representation. This latter is the subject of future work. Thus the research has shown that semantics about a table are in the text and how it is possible to locate and use these semantics by transforming them into an appropriate form to enhance the basic table metadata.
author Alrashed, Saleh A.
author_facet Alrashed, Saleh A.
author_sort Alrashed, Saleh A.
title Finding hidden semantics of text tables
title_short Finding hidden semantics of text tables
title_full Finding hidden semantics of text tables
title_fullStr Finding hidden semantics of text tables
title_full_unstemmed Finding hidden semantics of text tables
title_sort finding hidden semantics of text tables
publisher Cardiff University
publishDate 2004
url http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.583489
work_keys_str_mv AT alrashedsaleha findinghiddensemanticsoftexttables
_version_ 1718157759634997248