Construction and analysis of the word network based on the Random Reading Frame (RRF) method

In the present study, a method was developed to construct and analyze a word network. The core of the method is the Random Reading Frame (RRF) method. First, word files (in various formats, e.g., pdf, txt, doc, docx, rtf, html) are downloaded or collected from the internet or a local machine in terms of the conce...

Bibliographic Details
Main Author: WenJun Zhang
Format: Article
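Language: English
Published: International Academy of Ecology and Environmental Sciences 2021-09-01
Series: Network Biology
Subjects: word association; association rules; correlation measures; random reading frame; network construction; network analysis; algorithm; text mining
Online Access: http://www.iaees.org/publications/journals/nb/articles/2021-11(3)/construction-and-analysis-of-word-network-from-Random-Reading-Frame.pdf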
id doaj-f32a44b353964605a8e57fb42ec56be9
record_format Article
spelling doaj-f32a44b353964605a8e57fb42ec56be9
2021-08-26T07:10:10Z
eng
International Academy of Ecology and Environmental Sciences
Network Biology
2220-8879
2021-09-01
vol. 11, no. 3, pp. 154-193
Construction and analysis of the word network based on the Random Reading Frame (RRF) method
WenJun Zhang, School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, China
collection DOAJ
language English
format Article
sources DOAJ
author WenJun Zhang
title Construction and analysis of the word network based on the Random Reading Frame (RRF) method
publisher International Academy of Ecology and Environmental Sciences
series Network Biology
issn 2220-8879
publishDate 2021-09-01
description In the present study, a method was developed to construct and analyze a word network. The core of the method is the Random Reading Frame (RRF) method. First, word files (in various formats, e.g., pdf, txt, doc, docx, rtf, html) are downloaded or collected from the internet or a local machine according to the topics of interest. All files are then combined into a single text file. Except for splitting words and stop words, all words are arranged in a word vector in the order in which they appear in the combined text file. In the RRF method, for a given pair of unique words (x, y), with x, y ∈ {u1, u2, ..., um}, a reading frame of randomly varying width is placed at a random position on the vector, and the number of occurrences of each of the two words within the frame is counted. Repeating this procedure p times yields the paired counts (x1, y1), (x2, y2), ..., (xp, yp). Paired counts are obtained in the same way for all pairs of unique words. Then, for a given pair of unique words (x, y), Pearson correlation, Pearson partial correlation, Spearman rank correlation, or point correlation is used to calculate a correlation value from the paired counts (x1, y1), (x2, y2), ..., (xp, yp), and statistical significance is determined by a t-test (Pearson correlation, Pearson partial correlation, Spearman rank correlation) or a chi-square test (point correlation). All statistically significant word pairs are thus obtained for the correlation measure chosen by the user. Finally, the word network for the chosen correlation measure is constructed from these word pairs, with no links between statistically insignificant word pairs. Network analysis is conducted on the word network constructed from the significant positive between-word correlations among all unique words. Word centrality measures, word trees, word chains, word modules, etc., can be calculated with the method. The Matlab software for the method, wordNetwork, is also provided.
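The RRF procedure summarized in the description can be illustrated with a minimal Python sketch. This is not the author's wordNetwork Matlab code. All function names, parameters, and defaults below (rrf_counts, pearson, p_value, word_network, min_width, max_width, alpha, seed) are hypothetical. The sketch makes three simplifying assumptions: the same p frames are reused for every word pair instead of being redrawn per pair, only Pearson correlation is shown, and the t statistic is compared against a normal distribution, which is reasonable only when p is large.

```python
import math
import random
from itertools import combinations
from statistics import NormalDist


def rrf_counts(words, vocab, p, min_width, max_width, rng):
    """Place p random reading frames on the word vector.

    Returns {word: [count in frame 1, ..., count in frame p]}.
    Assumes len(words) >= min_width.
    """
    counts = {u: [0] * p for u in vocab}
    n = len(words)
    for k in range(p):
        width = rng.randint(min_width, min(max_width, n))  # random frame width
        start = rng.randint(0, n - width)                   # random frame position
        for w in words[start:start + width]:
            if w in counts:
                counts[w][k] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def p_value(r, n):
    """Two-sided p-value for H0: r = 0, from t = r * sqrt((n - 2) / (1 - r^2)).

    Assumption: the t distribution is approximated by a standard normal.
    """
    if abs(r) >= 1.0:
        return 0.0
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


def word_network(words, p=500, min_width=2, max_width=10, alpha=0.05, seed=0):
    """Return {(x, y): r} for significantly positively correlated word pairs."""
    rng = random.Random(seed)
    vocab = sorted(set(words))
    counts = rrf_counts(words, vocab, p, min_width, max_width, rng)
    edges = {}
    for x, y in combinations(vocab, 2):
        r = pearson(counts[x], counts[y])
        if r > 0 and p_value(r, p) < alpha:
            edges[(x, y)] = r  # link only significant positive pairs
    return edges


if __name__ == "__main__":
    # Toy word vector; stop words are assumed to have been removed already.
    text = "random reading frame counts word pairs in random reading frame".split()
    for pair, r in sorted(word_network(text, p=300).items()):
        print(pair, round(r, 3))
```

The resulting dictionary of edges can be passed to any graph library for the centrality or module analysis mentioned in the description. Redrawing frames separately for each pair, as the description states, would only require moving the sampling step inside the pair loop.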
topic word association
association rules
correlation measures
random reading frame
network construction
network analysis
algorithm
text mining
url http://www.iaees.org/publications/journals/nb/articles/2021-11(3)/construction-and-analysis-of-word-network-from-Random-Reading-Frame.pdf