A Unique Indexing Technique for Discourse Structures
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: | De Gruyter 2014-09-01 |
Series: | Journal of Intelligent Systems |
Subjects: | |
Online Access: | https://doi.org/10.1515/jisys-2013-0034 |
Summary: | Sutra is a form of text representation that has been used in both Tamil and Sanskrit literature to convey information in a short and crisp manner. Nanool, an ancient Tamil grammar masterpiece, uses sutras to define grammar rules. Similarly, in Sanskrit literature, many of the Shāstrās use sutras for a concise representation of their content. Sutras are defined as short aphorisms, formula-like structures that convey the complete essence of the text. They act as indices to the elaborate content they refer to. Inspired by these characteristics, this article proposes a sutra-based indexing mechanism for discourse structures built using rhetorical structure theory (RST) and Sangati, a concept proposed in Sanskrit literature. The indices identified by the indexer are well suited to question answering (QA), summary generation, and information retrieval (IR) systems. The indexer has been tested on an IR system using 1000 Tamil-language text documents. Its performance has also been compared with that of an existing RST-based indexing technique. |
---|---|
ISSN: | 0334-1860 2191-026X |