Evidence-Based Statistical Evaluation of Japanese L2-Learners’ Proficiency using Principal Component Analysis

This paper aims at automatic evaluation of second language (L2) learners’ proficiency and analyzes English conversation data comprising 94 statistics and the Common European Framework of Reference (CEFR) Global Scale score given to each participant. The CEFR defines Range, Accuracy, Flu...


Bibliographic Details
Main Authors: Arai Masafumi, Tsubaki Hajime, Sagisaka Yoshinori
Format: Article
Language: English
Published: EDP Sciences 2021-01-01
Series: SHS Web of Conferences
Subjects:
l2
Online Access: https://www.shs-conferences.org/articles/shsconf/pdf/2021/13/shsconf_etltc2021_01005.pdf
id doaj-125e4ce2f03c425192bd4b4fbcce759e
record_format Article
spelling doaj-125e4ce2f03c425192bd4b4fbcce759e
2021-05-04T12:25:00Z
eng
EDP Sciences
SHS Web of Conferences
2261-2424
2021-01-01
102
01005
10.1051/shsconf/202110201005
shsconf_etltc2021_01005
Evidence-Based Statistical Evaluation of Japanese L2-Learners’ Proficiency using Principal Component Analysis
Arai Masafumi (Department of Pure and Applied Mathematics, Waseda University)
Tsubaki Hajime (Global Information and Telecommunication Institute, Waseda University)
Sagisaka Yoshinori (Department of Pure and Applied Mathematics, Waseda University)
This paper aims at automatic evaluation of second language (L2) learners’ proficiency and analyzes English conversation data comprising 94 statistics and the Common European Framework of Reference (CEFR) Global Scale score given to each participant. The CEFR defines Range, Accuracy, Fluency, Interaction and Coherence as 5 subcategories, which constitute the CEFR Global Scale score. The statistics were classified into the CEFR’s 5 subcategories. We applied Principal Component Analysis (PCA), an unsupervised machine learning method, to each subcategory and obtained the participants’ principal component scores (PC scores) for the 5 subcategories as estimation parameters. We predicted the participants’ CEFR Global Scale scores using Multiple Regression Analysis (MRA). The proposed prediction method using the PC scores was compared with conventional methods using the 94 statistics. In terms of the coefficient of determination (R²), the value for the proposed method (0.82) was nearly equivalent to one of the values obtained by the conventional methods. Meanwhile, the proposed method showed the smallest standard deviation in the comparison. The results indicate the usability of PCA and of PC scores calculated from the CEFR subcategory data for objective evaluation of L2 learners’ English proficiency.
https://www.shs-conferences.org/articles/shsconf/pdf/2021/13/shsconf_etltc2021_01005.pdf
principal component analysis
multiple regression analysis
cefr
l2
evaluation
collection DOAJ
language English
format Article
sources DOAJ
author Arai Masafumi
Tsubaki Hajime
Sagisaka Yoshinori
spellingShingle Arai Masafumi
Tsubaki Hajime
Sagisaka Yoshinori
Evidence-Based Statistical Evaluation of Japanese L2-Learners’ Proficiency using Principal Component Analysis
SHS Web of Conferences
principal component analysis
multiple regression analysis
cefr
l2
evaluation
author_facet Arai Masafumi
Tsubaki Hajime
Sagisaka Yoshinori
author_sort Arai Masafumi
title Evidence-Based Statistical Evaluation of Japanese L2-Learners’ Proficiency using Principal Component Analysis
title_sort evidence-based statistical evaluation of japanese l2-learners’ proficiency using principal component analysis
publisher EDP Sciences
series SHS Web of Conferences
issn 2261-2424
publishDate 2021-01-01
description This paper aims at automatic evaluation of second language (L2) learners’ proficiency and analyzes English conversation data comprising 94 statistics and the Common European Framework of Reference (CEFR) Global Scale score given to each participant. The CEFR defines Range, Accuracy, Fluency, Interaction and Coherence as 5 subcategories, which constitute the CEFR Global Scale score. The statistics were classified into the CEFR’s 5 subcategories. We applied Principal Component Analysis (PCA), an unsupervised machine learning method, to each subcategory and obtained the participants’ principal component scores (PC scores) for the 5 subcategories as estimation parameters. We predicted the participants’ CEFR Global Scale scores using Multiple Regression Analysis (MRA). The proposed prediction method using the PC scores was compared with conventional methods using the 94 statistics. In terms of the coefficient of determination (R²), the value for the proposed method (0.82) was nearly equivalent to one of the values obtained by the conventional methods. Meanwhile, the proposed method showed the smallest standard deviation in the comparison. The results indicate the usability of PCA and of PC scores calculated from the CEFR subcategory data for objective evaluation of L2 learners’ English proficiency.
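The pipeline outlined in the description (PCA within each CEFR subcategory, then multiple regression on the resulting PC scores) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the data are synthetic, the participant count and the split of the 94 statistics across subcategories are invented, keeping only the first principal component per subcategory is an assumption, and the helper names first_pc_scores and fit_mra are hypothetical.

```python
import numpy as np

def first_pc_scores(X):
    """Return each row's score on the first principal component of X."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize every statistic
    # Rows of Vt are the principal axes of the standardized data.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[0]

def fit_mra(P, y):
    """Least-squares multiple regression with intercept; returns (beta, R^2)."""
    A = np.column_stack([np.ones(len(P)), P])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, r2

rng = np.random.default_rng(0)
n = 60  # hypothetical number of participants

# Invented split of the 94 statistics over the 5 CEFR subcategories.
group_sizes = {"Range": 20, "Accuracy": 20, "Fluency": 20,
               "Interaction": 17, "Coherence": 17}

# Synthetic data: one latent trait per subcategory drives its statistics.
latent = rng.normal(size=(n, len(group_sizes)))
blocks = {}
for j, (name, k) in enumerate(group_sizes.items()):
    loadings = rng.uniform(0.5, 1.5, size=k)
    noise = rng.normal(scale=0.5, size=(n, k))
    blocks[name] = np.outer(latent[:, j], loadings) + noise

# Synthetic stand-in for the CEFR Global Scale score.
y = latent.sum(axis=1) + rng.normal(scale=0.3, size=n)

# One PC score per subcategory, then regress the score on the 5 PC scores.
pc = np.column_stack([first_pc_scores(X) for X in blocks.values()])
beta, r2 = fit_mra(pc, y)
print(pc.shape, round(float(r2), 2))
```

Reducing each subcategory to a single score leaves the regression with 5 predictors instead of 94, which is the contrast with the conventional methods that the description draws. The R² printed here describes the synthetic data only and says nothing about the reported value of 0.82.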
topic principal component analysis
multiple regression analysis
cefr
l2
evaluation
url https://www.shs-conferences.org/articles/shsconf/pdf/2021/13/shsconf_etltc2021_01005.pdf