A Unique Indexing Technique for Discourse Structures

Bibliographic Details
Main Authors: Subalalitha Chinnaudayar Navaneethakrishnan, Ranjani Parthasarathi
Format: Article
Language: English
Published: De Gruyter 2014-09-01
Series: Journal of Intelligent Systems
Online Access: https://doi.org/10.1515/jisys-2013-0034
Description
Summary: Sutra is a form of text representation that has been used in both Tamil and Sanskrit literature to convey information in a short and crisp manner. Nanool, an ancient Tamil grammar masterpiece, uses sutras to define grammar rules. Similarly, in Sanskrit literature, many of the Shāstrās use sutras for a concise representation of their content. Sutras are defined as short aphorisms, formula-like structures that convey the complete essence of the text. They act as indices to the elaborate content they refer to. Inspired by these characteristics, this article proposes a sutra-based indexing mechanism for discourse structures built using rhetorical structure theory (RST) and Sangati, a concept proposed in Sanskrit literature. The indices identified by the indexer are ideal for question answering (QA), summary generation, and information retrieval (IR) systems. The indexer has been tested on an IR system using 1000 Tamil-language text documents. A performance comparison has also been made with an existing RST-based indexing technique.
ISSN: 0334-1860, 2191-026X