A Novel Statistic-Based Corpus Machine Processing Approach to Refine a Big Textual Data: An ESP Case of COVID-19 News Reports
With developments of modern and advanced information and communication technologies (ICTs), Industry 4.0 has launched big data analysis, natural language processing (NLP), and artificial intelligence (AI). Corpus analysis is also a part of big data analysis. For many cases of statistic-based corpus...
Main Authors: | Liang-Ching Chen, Kuei-Hu Chang, Hsiang-Yu Chung |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2020-08-01 |
Series: | Applied Sciences |
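Subjects: | information and communication technologies (ICT); big data analysis; corpus analysis; English for specific purposes (ESP); COVID-19; machine optimizing process |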
Online Access: | https://www.mdpi.com/2076-3417/10/16/5505 |
id |
doaj-eae2ca005fc041bfa0d4bd1f0ae3620f |
---|---|
record_format |
Article |
spelling |
doaj-eae2ca005fc041bfa0d4bd1f0ae3620f; 2020-11-25T02:48:17Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2020-08-01; 10; 5505; 5505; 10.3390/app10165505; A Novel Statistic-Based Corpus Machine Processing Approach to Refine a Big Textual Data: An ESP Case of COVID-19 News Reports; Liang-Ching Chen (0); Kuei-Hu Chang (1); Hsiang-Yu Chung (2); Department of Foreign Languages, R.O.C. Military Academy, Kaohsiung 830, Taiwan; Department of Management Sciences, R.O.C. Military Academy, Kaohsiung 830, Taiwan; Department of Management Sciences, R.O.C. Military Academy, Kaohsiung 830, Taiwan; With developments of modern and advanced information and communication technologies (ICTs), Industry 4.0 has launched big data analysis, natural language processing (NLP), and artificial intelligence (AI). Corpus analysis is also a part of big data analysis. For many cases of statistic-based corpus techniques adopted to analyze English for specific purposes (ESP), researchers extracted critical information by retrieving domain-oriented lexical units. However, even if corpus software embraces algorithms such as log-likelihood tests, log ratios, BIC scores, etc., the machine still cannot understand linguistic meanings. In many ESP cases, function words reduce the efficiency of corpus analysis. However, many studies still use manual approaches to eliminate function words. Manual annotation is inefficient and time-wasting, and can easily cause information distortion. To enhance the efficiency of big textual data analysis, this paper proposes a novel statistic-based corpus machine processing approach to refine big textual data. Furthermore, this paper uses COVID-19 news reports as a simulation example of big textual data and applies it to verify the efficacy of the machine optimizing process. The refined resulting data shows that the proposed approach is able to rapidly remove function and meaningless words by machine processing and provide decision-makers with domain-specific corpus data for further purposes.; https://www.mdpi.com/2076-3417/10/16/5505; information and communication technologies (ICT); big data analysis; corpus analysis; English for specific purposes (ESP); COVID-19; machine optimizing process |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Liang-Ching Chen, Kuei-Hu Chang, Hsiang-Yu Chung |
spellingShingle |
Liang-Ching Chen; Kuei-Hu Chang; Hsiang-Yu Chung; A Novel Statistic-Based Corpus Machine Processing Approach to Refine a Big Textual Data: An ESP Case of COVID-19 News Reports; Applied Sciences; information and communication technologies (ICT); big data analysis; corpus analysis; English for specific purposes (ESP); COVID-19; machine optimizing process |
author_facet |
Liang-Ching Chen, Kuei-Hu Chang, Hsiang-Yu Chung |
author_sort |
Liang-Ching Chen |
title |
A Novel Statistic-Based Corpus Machine Processing Approach to Refine a Big Textual Data: An ESP Case of COVID-19 News Reports |
title_sort |
novel statistic-based corpus machine processing approach to refine a big textual data: an esp case of covid-19 news reports |
publisher |
MDPI AG |
series |
Applied Sciences |
issn |
2076-3417 |
publishDate |
2020-08-01 |
description |
With developments of modern and advanced information and communication technologies (ICTs), Industry 4.0 has launched big data analysis, natural language processing (NLP), and artificial intelligence (AI). Corpus analysis is also a part of big data analysis. For many cases of statistic-based corpus techniques adopted to analyze English for specific purposes (ESP), researchers extracted critical information by retrieving domain-oriented lexical units. However, even if corpus software embraces algorithms such as log-likelihood tests, log ratios, BIC scores, etc., the machine still cannot understand linguistic meanings. In many ESP cases, function words reduce the efficiency of corpus analysis. However, many studies still use manual approaches to eliminate function words. Manual annotation is inefficient and time-wasting, and can easily cause information distortion. To enhance the efficiency of big textual data analysis, this paper proposes a novel statistic-based corpus machine processing approach to refine big textual data. Furthermore, this paper uses COVID-19 news reports as a simulation example of big textual data and applies it to verify the efficacy of the machine optimizing process. The refined resulting data shows that the proposed approach is able to rapidly remove function and meaningless words by machine processing and provide decision-makers with domain-specific corpus data for further purposes. |
topic |
information and communication technologies (ICT); big data analysis; corpus analysis; English for specific purposes (ESP); COVID-19; machine optimizing process |
url |
https://www.mdpi.com/2076-3417/10/16/5505 |
work_keys_str_mv |
AT liangchingchen anovelstatisticbasedcorpusmachineprocessingapproachtorefineabigtextualdataanespcaseofcovid19newsreports AT kueihuchang anovelstatisticbasedcorpusmachineprocessingapproachtorefineabigtextualdataanespcaseofcovid19newsreports AT hsiangyuchung anovelstatisticbasedcorpusmachineprocessingapproachtorefineabigtextualdataanespcaseofcovid19newsreports AT liangchingchen novelstatisticbasedcorpusmachineprocessingapproachtorefineabigtextualdataanespcaseofcovid19newsreports AT kueihuchang novelstatisticbasedcorpusmachineprocessingapproachtorefineabigtextualdataanespcaseofcovid19newsreports AT hsiangyuchung novelstatisticbasedcorpusmachineprocessingapproachtorefineabigtextualdataanespcaseofcovid19newsreports |
_version_ |
1724748760870813696 |
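
The description in this record mentions two ideas that a short example may make more concrete: removing function words before analysis, and scoring words with a log-likelihood test. The Python sketch below is illustrative only and is not the authors' procedure, which this record does not detail. The tiny `FUNCTION_WORDS` list, the function names, and the choice to measure corpus sizes after filtering are all assumptions made for the example.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list (an assumption);
# a realistic list would be much longer.
FUNCTION_WORDS = {
    "the", "a", "an", "of", "to", "in", "and", "is", "are",
    "for", "on", "with", "by", "that", "as", "at",
}


def tokenize(text):
    """Lowercase the text and return word tokens, keeping internal hyphens."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def content_word_counts(text, stop_words=FUNCTION_WORDS):
    """Count tokens after dropping function words."""
    return Counter(t for t in tokenize(text) if t not in stop_words)


def log_likelihood(a, b, c, d):
    """Log-likelihood (G2) for a word seen `a` times in a target corpus
    of `c` tokens and `b` times in a reference corpus of `d` tokens."""
    e1 = c * (a + b) / (c + d)  # expected count in the target corpus
    e2 = d * (a + b) / (c + d)  # expected count in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(target_text, reference_text, top_n=5):
    """Rank content words that are relatively more frequent in the target."""
    target = content_word_counts(target_text)
    reference = content_word_counts(reference_text)
    # Assumption: corpus sizes are measured after function words are removed.
    c, d = sum(target.values()), sum(reference.values())
    scored = []
    for word, a in target.items():
        b = reference.get(word, 0)
        if a * d > b * c:  # relative frequency is higher in the target
            scored.append((word, log_likelihood(a, b, c, d)))
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))[:top_n]
```

Calling `keywords(news_text, general_text)` with a domain text and a general reference text (both hypothetical inputs) returns `(word, score)` pairs. A fixed stop list like this one is the simplest possible filter and only approximates the kind of machine-based refinement the description refers to.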