Developing JSequitur to Study the Hierarchical Structure of Biological Sequences in a Grammatical Inference Framework of String Compression Algorithms
Grammatical inference methods are expected to find grammatical structures hidden in biological sequences. One hopes that studies of grammar serve as an appropriate tool for theory formation. Thus, we have developed JSequitur for automatically generating the grammatical structure of biological sequen...
Main Authors: | Bulgan Galbadrakh, Kyung-Eun Lee, Hyun-Seok Park |
---|---|
Format: | Article |
Language: | English |
Published: | Korea Genome Organization, 2012-12-01 |
Series: | Genomics & Informatics |
Subjects: | context-free grammar, formal language theory, natural language processing, stochastic modeling |
Online Access: | http://genominfo.org/upload/pdf/gni-10-266.pdf |
id |
doaj-aeddfdd398e04160b274dca627edc525 |
---|---|
record_format |
Article |
spelling |
doaj-aeddfdd398e04160b274dca627edc525; 2020-11-24T23:41:02Z; eng; Korea Genome Organization; Genomics & Informatics; 1598-866X; 2234-0742; 2012-12-01; 10; 4; 266; 270; 10.5808/GI.2012.10.4.266; 30; Developing JSequitur to Study the Hierarchical Structure of Biological Sequences in a Grammatical Inference Framework of String Compression Algorithms; Bulgan Galbadrakh [0], Kyung-Eun Lee [1], Hyun-Seok Park [2]; [0] Department of Computer Science, Ewha Womans University, Seoul 120-750, Korea. [1] Department of Computer Science, Ewha Womans University, Seoul 120-750, Korea. [2] Department of Computer Science, Ewha Womans University, Seoul 120-750, Korea.; Grammatical inference methods are expected to find grammatical structures hidden in biological sequences. One hopes that studies of grammar serve as an appropriate tool for theory formation. Thus, we have developed JSequitur for automatically generating the grammatical structure of biological sequences in an inference framework of string compression algorithms. Our original motivation was to find grammatical traits of several cancer genes that can be detected by string compression algorithms. In this research, we have not yet found any meaningful traits unique to the cancer genes, but we observed some interesting traits regarding the relationship among gene length, sequence similarity, the patterns of the generated grammar, and compression rate.; http://genominfo.org/upload/pdf/gni-10-266.pdf; context-free grammar; formal language theory; natural language processing; stochastic modeling |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Bulgan Galbadrakh Kyung-Eun Lee Hyun-Seok Park |
title |
Developing JSequitur to Study the Hierarchical Structure of Biological Sequences in a Grammatical Inference Framework of String Compression Algorithms |
publisher |
Korea Genome Organization |
series |
Genomics & Informatics |
issn |
1598-866X 2234-0742 |
publishDate |
2012-12-01 |
description |
Grammatical inference methods are expected to find grammatical structures hidden in biological sequences. One hopes that studies of grammar serve as an appropriate tool for theory formation. Thus, we have developed JSequitur for automatically generating the grammatical structure of biological sequences in an inference framework of string compression algorithms. Our original motivation was to find grammatical traits of several cancer genes that can be detected by string compression algorithms. In this research, we have not yet found any meaningful traits unique to the cancer genes, but we observed some interesting traits regarding the relationship among gene length, sequence similarity, the patterns of the generated grammar, and compression rate. |
topic |
context-free grammar formal language theory natural language processing stochastic modeling |
url |
http://genominfo.org/upload/pdf/gni-10-266.pdf |