Construction and analysis of the word network based on the Random Reading Frame (RRF) method
In the present study, a method was developed to construct and analyze a word network. The core of the method is the Random Reading Frame (RRF) method. First, word files (in various formats, e.g., pdf, txt, doc, docx, rtf, html, etc.) are downloaded or collected from the internet or a local machine in terms of the conce...
Main Author: | WenJun Zhang |
---|---|
Format: | Article |
Language: | English |
Published: | International Academy of Ecology and Environmental Sciences, 2021-09-01 |
Series: | Network Biology |
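Subjects: | word association; association rules; correlation measures; random reading frame; network construction; network analysis; algorithm; text mining |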
Online Access: | http://www.iaees.org/publications/journals/nb/articles/2021-11(3)/construction-and-analysis-of-word-network-from-Random-Reading-Frame.pdf |
id |
doaj-f32a44b353964605a8e57fb42ec56be9 |
---|---|
record_format |
Article |
spelling |
doaj-f32a44b353964605a8e57fb42ec56be9; 2021-08-26T07:10:10Z; eng; International Academy of Ecology and Environmental Sciences; Network Biology; 2220-8879; 2021-09-01; volume 11, issue 3, pages 154-193; Construction and analysis of the word network based on the Random Reading Frame (RRF) method; WenJun Zhang; School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, China |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
WenJun Zhang |
title |
Construction and analysis of the word network based on the Random Reading Frame (RRF) method |
publisher |
International Academy of Ecology and Environmental Sciences |
series |
Network Biology |
issn |
2220-8879 |
publishDate |
2021-09-01 |
description |
In the present study, a method was developed to construct and analyze a word network. The core of the method is the Random Reading Frame (RRF) method. First, word files (in various formats, e.g., pdf, txt, doc, docx, rtf, html, etc.) are downloaded or collected from the internet or a local machine according to the topics of concern. All files are then combined into a single text file. Except for splitting words and stop words, all words are arranged in a word vector following their order in the combined text file. In the RRF method, for a given pair of unique words (x, y), x, y ∈ {u1, u2, ..., um}, a reading frame of randomly varying width is placed at a random position on the vector, and the number of occurrences of each of the two words within the frame is counted. Repeating the procedure p times yields the paired counts (x1, y1), (x2, y2), ..., (xp, yp). In this way, paired counts are obtained for all pairs of unique words. Thereafter, for a given pair of unique words (x, y), Pearson correlation, Pearson partial correlation, Spearman rank correlation, or point correlation is used to calculate their correlation value from the paired counts (x1, y1), (x2, y2), ..., (xp, yp), and statistical significance is determined by a t-test (Pearson correlation, Pearson partial correlation, Spearman rank correlation) or a chi2-test (point correlation). In this way, all statistically significant word pairs are obtained for the correlation measure chosen by the user. Finally, the word network for the chosen correlation measure is constructed from these word pairs, with no links between statistically insignificant word pairs. Network analysis is conducted on the word network constructed from the significant positive between-word correlations among all unique words. Word centrality measures, word trees, word chains, word modules, etc., can be calculated with the method. The Matlab software for the method, wordNetwork, is also provided. |
topic |
word association; association rules; correlation measures; random reading frame; network construction; network analysis; algorithm; text mining |
url |
http://www.iaees.org/publications/journals/nb/articles/2021-11(3)/construction-and-analysis-of-word-network-from-Random-Reading-Frame.pdf |
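
The RRF procedure summarized in the description can be illustrated with a minimal Python sketch. It is not the author's Matlab software wordNetwork. The function names (`rrf_counts`, `pearson_r`, `p_value`, `word_network`) and the parameters `min_width`, `max_width`, `alpha`, and `seed` are hypothetical and chosen only for illustration. The sketch covers the Pearson correlation case only, and it approximates the t-test with a standard normal distribution, which is reasonable only when the number of frames p is large.

```python
import math
import random
from itertools import combinations
from statistics import NormalDist


def rrf_counts(words, x, y, p, rng, min_width=2, max_width=None):
    """Place p random frames on the word vector and count x and y in each."""
    n = len(words)
    max_width = max_width or n
    pairs = []
    for _ in range(p):
        width = rng.randint(min_width, max_width)  # random frame width
        start = rng.randint(0, n - width)          # random frame position
        frame = words[start:start + width]
        pairs.append((frame.count(x), frame.count(y)))
    return pairs


def pearson_r(pairs):
    """Pearson correlation of the paired counts; 0.0 if either side is constant."""
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def p_value(r, p):
    """Two-sided p-value for r from p frames (assumption: normal approximation
    to the t distribution with p - 2 degrees of freedom)."""
    if abs(r) >= 1.0:
        return 0.0
    t = r * math.sqrt((p - 2) / (1 - r * r))
    return 2 * (1 - NormalDist().cdf(abs(t)))


def word_network(words, p=200, alpha=0.05, seed=0):
    """Return {(x, y): r} for significantly positively correlated word pairs."""
    rng = random.Random(seed)
    edges = {}
    for x, y in combinations(sorted(set(words)), 2):
        r = pearson_r(rrf_counts(words, x, y, p, rng))
        if r > 0 and p_value(r, p) < alpha:
            edges[(x, y)] = r  # link only significant positive pairs
    return edges
```

Here `words` stands for the word vector after splitting words and stop words have been removed. The returned dictionary plays the role of an edge list: word pairs that are missing from it have no link in the network.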