Evidence-Based Statistical Evaluation of Japanese L2-Learners’ Proficiency using Principal Component Analysis
This paper aims to evaluate second language (L2) learners’ proficiency automatically. It analyzes English conversation data comprising 94 statistics and a Common European Framework of Reference (CEFR) Global Scale score for each participant. The CEFR defines Range, Accuracy, Flu...
Main Authors: | Arai Masafumi, Tsubaki Hajime, Sagisaka Yoshinori |
---|---|
Format: | Article |
Language: | English |
Published: | EDP Sciences 2021-01-01 |
Series: | SHS Web of Conferences |
Subjects: | principal component analysis, multiple regression analysis, cefr, l2, evaluation |
Online Access: | https://www.shs-conferences.org/articles/shsconf/pdf/2021/13/shsconf_etltc2021_01005.pdf |
id |
doaj-125e4ce2f03c425192bd4b4fbcce759e |
---|---|
record_format |
Article |
spelling |
doaj-125e4ce2f03c425192bd4b4fbcce759e; 2021-05-04T12:25:00Z; eng; EDP Sciences; SHS Web of Conferences; 2261-2424; 2021-01-01; 102; 01005; 10.1051/shsconf/202110201005; shsconf_etltc2021_01005; Evidence-Based Statistical Evaluation of Japanese L2-Learners’ Proficiency using Principal Component Analysis; Arai Masafumi (0); Tsubaki Hajime (1); Sagisaka Yoshinori (2); (0) Department of Pure and Applied Mathematics, Waseda University; (1) Global Information and Telecommunication Institute, Waseda University; (2) Department of Pure and Applied Mathematics, Waseda University; This paper aims to evaluate second language (L2) learners’ proficiency automatically. It analyzes English conversation data comprising 94 statistics and a Common European Framework of Reference (CEFR) Global Scale score for each participant. The CEFR defines five subcategories (Range, Accuracy, Fluency, Interaction, and Coherence), which together constitute the CEFR Global Scale score. The statistics were classified into these five subcategories. We applied Principal Component Analysis (PCA), an unsupervised machine learning method, to each subcategory and used the participants’ principal component scores (PC scores) for the five subcategories as estimation parameters. We then predicted the participants’ CEFR Global Scale scores using Multiple Regression Analysis (MRA). The proposed prediction method, based on the PC scores, was compared with conventional methods that use the 94 statistics. In terms of the coefficient of determination (R²), the proposed method’s value (0.82) was nearly equivalent to one of the values obtained by the conventional methods. In terms of standard deviation, the proposed method showed the smallest value in the comparison. The results indicate that PCA and the PC scores calculated from the CEFR subcategory data are usable for objective evaluation of L2 learners’ English proficiency.; https://www.shs-conferences.org/articles/shsconf/pdf/2021/13/shsconf_etltc2021_01005.pdf; principal component analysis; multiple regression analysis; cefr; l2; evaluation |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Arai Masafumi, Tsubaki Hajime, Sagisaka Yoshinori |
author_sort |
Arai Masafumi |
title |
Evidence-Based Statistical Evaluation of Japanese L2-Learners’ Proficiency using Principal Component Analysis |
publisher |
EDP Sciences |
series |
SHS Web of Conferences |
issn |
2261-2424 |
publishDate |
2021-01-01 |
description |
This paper aims to evaluate second language (L2) learners’ proficiency automatically. It analyzes English conversation data comprising 94 statistics and a Common European Framework of Reference (CEFR) Global Scale score for each participant. The CEFR defines five subcategories (Range, Accuracy, Fluency, Interaction, and Coherence), which together constitute the CEFR Global Scale score. The statistics were classified into these five subcategories. We applied Principal Component Analysis (PCA), an unsupervised machine learning method, to each subcategory and used the participants’ principal component scores (PC scores) for the five subcategories as estimation parameters. We then predicted the participants’ CEFR Global Scale scores using Multiple Regression Analysis (MRA). The proposed prediction method, based on the PC scores, was compared with conventional methods that use the 94 statistics. In terms of the coefficient of determination (R²), the proposed method’s value (0.82) was nearly equivalent to one of the values obtained by the conventional methods. In terms of standard deviation, the proposed method showed the smallest value in the comparison. The results indicate that PCA and the PC scores calculated from the CEFR subcategory data are usable for objective evaluation of L2 learners’ English proficiency. |
topic |
principal component analysis, multiple regression analysis, cefr, l2, evaluation |
url |
https://www.shs-conferences.org/articles/shsconf/pdf/2021/13/shsconf_etltc2021_01005.pdf |
_version_ |
1721478902942007296 |
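
The pipeline summarized in the description (PCA on each CEFR subcategory, then multiple regression on the resulting PC scores) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, and the sample size, the number of statistics per subcategory, the z-score standardization, the use of only the first principal component, and all function names (`standardize`, `first_pc_scores`, `solve`, `fit_mra`) are assumptions made for illustration. It uses only the Python standard library.

```python
import math
import random


def standardize(rows):
    """Return column-wise z-scores for a list of equal-length rows."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1))
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def first_pc_scores(rows, iters=200):
    """Scores on the first principal component, found by power iteration."""
    z = standardize(rows)
    n, p = len(z), len(z[0])
    cov = [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
           for i in range(p)]
    w = [1.0] * p  # starting vector; converges to the leading eigenvector
    for _ in range(iters):
        w = [sum(cov[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return [sum(a * b for a, b in zip(r, w)) for r in z]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [v - f * u for v, u in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_mra(x, y):
    """Least-squares multiple regression with intercept: (coefficients, R^2)."""
    xa = [[1.0] + list(r) for r in x]
    k = len(xa[0])
    xtx = [[sum(r[i] * r[j] for r in xa) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(xa, y)) for i in range(k)]
    beta = solve(xtx, xty)
    pred = [sum(b * v for b, v in zip(beta, r)) for r in xa]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


random.seed(0)
SUBCATEGORIES = ["Range", "Accuracy", "Fluency", "Interaction", "Coherence"]
N_PARTICIPANTS, STATS_PER_SUBCATEGORY = 60, 4  # hypothetical sizes

# Synthetic data: one latent ability per participant drives every statistic.
ability = [random.gauss(0, 1) for _ in range(N_PARTICIPANTS)]
pc_scores = []  # one list of first-PC scores per subcategory
for _ in SUBCATEGORIES:
    block = [[a + random.gauss(0, 0.5) for _ in range(STATS_PER_SUBCATEGORY)]
             for a in ability]
    pc_scores.append(first_pc_scores(block))

x = [list(row) for row in zip(*pc_scores)]  # participants x 5 PC scores
y = [3.0 + a + random.gauss(0, 0.3) for a in ability]  # made-up global scores
beta, r2 = fit_mra(x, y)
print(f"R^2 on synthetic data: {r2:.2f}")
```

Reducing each subcategory to a single score leaves the regression with five predictors instead of 94, which is the kind of dimensionality reduction the description contrasts with the conventional methods. The R² printed here reflects only the synthetic data and is unrelated to the 0.82 reported in the record.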