Platform for Collection from Heterogeneous Web Sources and its Application to a Semantic Repository Organization at SeDiCI: Preliminaries

Presentation of a web collection platform designed to relate and unify information available on different standard web sources with a view to creating a user-browseable thematic repository. The platform will be used at the Intellectual Creation Diffusion Service [1] combined with ontologies and thesa...


Bibliographic Details
Main Authors: Marisa Raquel De Giusti, Ariel Sobrado, Agustin Vosou, Gonzalo Luján Villarreal
Format: Article
Language: English
Published: Postgraduate Office, School of Computer Science, Universidad Nacional de La Plata, 2009-10-01
Series: Journal of Computer Science and Technology
Subjects: sedici, semantic repository, ontology, thesaurus
Online Access: https://journal.info.unlp.edu.ar/JCST/article/view/770
id doaj-65ab3b28715f4f77b8cd374b08efdd10
record_format Article
spelling doaj-65ab3b28715f4f77b8cd374b08efdd10
2021-05-05T13:56:15Z
eng
Postgraduate Office, School of Computer Science, Universidad Nacional de La Plata
Journal of Computer Science and Technology
1666-6046
1666-6038
2009-10-01
9028992464
Platform for Collection from Heterogeneous Web Sources and its Application to a Semantic Repository Organization at SeDiCI: Preliminaries
Marisa Raquel De Giusti (0)
Ariel Sobrado (1)
Agustin Vosou (2)
Gonzalo Luján Villarreal (3)
Comisión de Investigaciones Científicas (CIC) de la Provincia de Buenos Aires and Proyecto de Enlace de Bibliotecas UNLP
Proyecto de Enlace de Bibliotecas UNLP
Proyecto de Enlace de Bibliotecas UNLP
Consejo Nacional de Investigaciones Técnicas y Científicas (CONICET) and Proyecto de Enlace de Bibliotecas UNLP
Presentation of a web collection platform designed to relate and unify information available on different standard web sources with a view to creating a user-browseable thematic repository. The platform will be used at the Intellectual Creation Diffusion Service [1], combined with ontologies and thesauri, to provide improved data sorting. Data is currently spread across web resources, and traditional search engines return ranked lists with no semantic relation among documents. Users have to spend a great deal of time relating documents and trying to figure out which ones fully address the domain at issue. Only after similarities and differences have been located are information fragments applied to the user's work, enabling knowledge creation. The proposed platform separates the thematic domain from the functional modules so that they can be used in various knowledge areas. Development includes two agents that process URLs stored in a database: one identifies bookmarked pages, interprets labels, and provides rules for extracting information and storing it in an RDF data file; the other retrieves URLs related to a given one. After this stage, homogenization is applied and the transformed information is sorted according to domain ontologies. The platform allows for more efficient automatic extraction processes and information search among heterogeneous sources that represent the same concepts using different standards.
https://journal.info.unlp.edu.ar/JCST/article/view/770
sedici
semantic repository
ontology
thesaurus
collection DOAJ
language English
format Article
sources DOAJ
author Marisa Raquel De Giusti
Ariel Sobrado
Agustin Vosou
Gonzalo Luján Villarreal
title Platform for Collection from Heterogeneous Web Sources and its Application to a Semantic Repository Organization at SeDiCI: Preliminaries
publisher Postgraduate Office, School of Computer Science, Universidad Nacional de La Plata
series Journal of Computer Science and Technology
issn 1666-6046
1666-6038
publishDate 2009-10-01
description Presentation of a web collection platform designed to relate and unify information available on different standard web sources with a view to creating a user-browseable thematic repository. The platform will be used at the Intellectual Creation Diffusion Service [1], combined with ontologies and thesauri, to provide improved data sorting. Data is currently spread across web resources, and traditional search engines return ranked lists with no semantic relation among documents. Users have to spend a great deal of time relating documents and trying to figure out which ones fully address the domain at issue. Only after similarities and differences have been located are information fragments applied to the user's work, enabling knowledge creation. The proposed platform separates the thematic domain from the functional modules so that they can be used in various knowledge areas. Development includes two agents that process URLs stored in a database: one identifies bookmarked pages, interprets labels, and provides rules for extracting information and storing it in an RDF data file; the other retrieves URLs related to a given one. After this stage, homogenization is applied and the transformed information is sorted according to domain ontologies. The platform allows for more efficient automatic extraction processes and information search among heterogeneous sources that represent the same concepts using different standards.
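The two agents named in the description can be illustrated with a minimal sketch. It is not the authors' implementation, and every name in it is hypothetical: `PageScanner`, `RULES`, `extract_triples`, and `related_urls` are invented for illustration. It also assumes that page labels are exposed as Dublin Core `<meta>` tags and that the RDF data file is written as N-Triples lines. The record confirms neither assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageScanner(HTMLParser):
    """Collects <meta name=... content=...> labels and <a href=...> links."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.labels = []  # (name, content) pairs found in <meta> tags
        self.links = []   # absolute URLs found in <a> tags

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.labels.append((attrs["name"], attrs["content"]))
        elif tag == "a" and attrs.get("href"):
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))


# Hypothetical extraction rules: source label -> RDF predicate IRI.
RULES = {
    "DC.title": "http://purl.org/dc/elements/1.1/title",
    "DC.creator": "http://purl.org/dc/elements/1.1/creator",
}


def escape_literal(text):
    """Escape a string for use inside an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def extract_triples(url, html):
    """First agent (assumed behavior): labels covered by RULES become triples."""
    scanner = PageScanner(url)
    scanner.feed(html)
    return [
        f'<{url}> <{RULES[name]}> "{escape_literal(content)}" .'
        for name, content in scanner.labels
        if name in RULES
    ]


def related_urls(url, html):
    """Second agent (assumed behavior): distinct URLs linked from the page."""
    scanner = PageScanner(url)
    scanner.feed(html)
    return sorted(set(scanner.links))
```

Keeping the rules in a plain mapping from label to predicate means a source that uses a different metadata standard needs only another table, which is one simple way to read the homogenization step that follows extraction.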
topic sedici
semantic repository
ontology
thesaurus
url https://journal.info.unlp.edu.ar/JCST/article/view/770