A Novel Statistic-Based Corpus Machine Processing Approach to Refine a Big Textual Data: An ESP Case of COVID-19 News Reports

With developments of modern and advanced information and communication technologies (ICTs), Industry 4.0 has launched big data analysis, natural language processing (NLP), and artificial intelligence (AI). Corpus analysis is also a part of big data analysis. For many cases of statistic-based corpus...

Bibliographic Details
Main Authors: Liang-Ching Chen, Kuei-Hu Chang, Hsiang-Yu Chung
Format: Article
Language: English
Published: MDPI AG 2020-08-01
Series: Applied Sciences
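Subjects: information and communication technologies (ICT); big data analysis; corpus analysis; English for specific purposes (ESP); COVID-19; machine optimizing process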
Online Access: https://www.mdpi.com/2076-3417/10/16/5505
id doaj-eae2ca005fc041bfa0d4bd1f0ae3620f
record_format Article
spelling doaj-eae2ca005fc041bfa0d4bd1f0ae3620f 2020-11-25T02:48:17Z eng MDPI AG Applied Sciences 2076-3417 2020-08-01 10 5505 5505 10.3390/app10165505
Liang-Ching Chen (0), Kuei-Hu Chang (1), Hsiang-Yu Chung (2)
(0) Department of Foreign Languages, R.O.C. Military Academy, Kaohsiung 830, Taiwan
(1) Department of Management Sciences, R.O.C. Military Academy, Kaohsiung 830, Taiwan
(2) Department of Management Sciences, R.O.C. Military Academy, Kaohsiung 830, Taiwan
collection DOAJ
language English
format Article
sources DOAJ
author Liang-Ching Chen
Kuei-Hu Chang
Hsiang-Yu Chung
title A Novel Statistic-Based Corpus Machine Processing Approach to Refine a Big Textual Data: An ESP Case of COVID-19 News Reports
publisher MDPI AG
series Applied Sciences
issn 2076-3417
publishDate 2020-08-01
description With developments of modern and advanced information and communication technologies (ICTs), Industry 4.0 has launched big data analysis, natural language processing (NLP), and artificial intelligence (AI). Corpus analysis is also a part of big data analysis. For many cases of statistic-based corpus techniques adopted to analyze English for specific purposes (ESP), researchers extracted critical information by retrieving domain-oriented lexical units. However, even if corpus software embraces algorithms such as log-likelihood tests, log ratios, BIC scores, etc., the machine still cannot understand linguistic meanings. In many ESP cases, function words reduce the efficiency of corpus analysis. However, many studies still use manual approaches to eliminate function words. Manual annotation is inefficient and time-wasting, and can easily cause information distortion. To enhance the efficiency of big textual data analysis, this paper proposes a novel statistic-based corpus machine processing approach to refine big textual data. Furthermore, this paper uses COVID-19 news reports as a simulation example of big textual data and applies it to verify the efficacy of the machine optimizing process. The refined resulting data shows that the proposed approach is able to rapidly remove function and meaningless words by machine processing and provide decision-makers with domain-specific corpus data for further purposes.
topic information and communication technologies (ICT)
big data analysis
corpus analysis
English for specific purposes (ESP)
COVID-19
machine optimizing process
url https://www.mdpi.com/2076-3417/10/16/5505