Romanian Linguistic Resources On Very Large Scale

Bibliographic Details
Main Author: Dan Cristea
Format: Article
Language: English
Published: Institute of Mathematics and Computer Science of the Academy of Sciences of Moldova 2011-10-01
Series: Computer Science Journal of Moldova
Online Access: http://www.math.md/files/csjm/v19-n2/v19-n2-(pp130-145).pdf
Description
Summary: This paper suggests a methodology for building a technological environment for linguistic processing, intended to conserve, update and exploit strategic linguistic resources of the Romanian language for research, public and commercial purposes. These resources are rooted in textual data contributed daily, and over the long run, by important editorial houses and mass-media institutions. In essence, the paper describes a technology able to receive, store and continuously process large amounts of textual data received from voluntary contributors on a daily basis. Besides storing linguistic data à la longue for the benefit of preserving the language, the environment will return the results of the processing to three categories of users: researchers working on the Romanian language and computational linguistics, the contributors of the resources, and the public at large.
ISSN: 1561-4042