Regular Expression Based Medical Text Classification Using Constructive Heuristic Approach
Medical text classification assigns medical-related text to categories such as topics or disease types. Machine-learning-based techniques have been widely used for such tasks despite the obvious drawback of such “black box” approaches, which leave no easy way to fi...
Main Authors: | Menglin Cui, Ruibin Bai, Zheng Lu, Xiang Li, Uwe Aickelin, Peiming Ge |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2019-01-01 |
Series: | IEEE Access |
Subjects: | Regular expressions; text classification; constructive heuristic method |
Online Access: | https://ieeexplore.ieee.org/document/8864974/ |
id |
doaj-9536bd8b483248648213088508e44bd1 |
---|---|
record_format |
Article |
spelling |
doaj-9536bd8b483248648213088508e44bd1; 2021-03-29T23:41:33Z; eng; IEEE; IEEE Access; 2169-3536; 2019-01-01; 7; 147892; 147904; 10.1109/ACCESS.2019.2946622; 8864974; Regular Expression Based Medical Text Classification Using Constructive Heuristic Approach; Menglin Cui (https://orcid.org/0000-0002-7296-7718; School of Computer Science, University of Nottingham, Ningbo, China); Ruibin Bai (School of Computer Science, University of Nottingham, Ningbo, China); Zheng Lu (School of Computer Science, University of Nottingham, Ningbo, China); Xiang Li (School of Computer Science, University of Nottingham, Ningbo, China); Uwe Aickelin (School of Computing and Information Systems, The University of Melbourne, Melbourne, VIC, Australia); Peiming Ge (Technology Department, Ping An Health Cloud, Shanghai, China); Medical text classification assigns medical-related text to categories such as topics or disease types. Machine-learning-based techniques have been widely used for such tasks despite the obvious drawback of such “black box” approaches, which leave no easy way to fine-tune the resultant model for better performance. We propose a novel constructive heuristic approach that generates a set of regular expressions that can be used as effective text classifiers. The main innovation of our approach is a regular-expression-based text classifier with both satisfactory classification performance and excellent interpretability. We evaluate our framework on real-world medical data provided by our collaborator, one of the largest online healthcare providers in the market, and observe high and consistent performance. Experimental results show that the machine-generated regular expressions can be used effectively in conjunction with machine learning techniques to perform medical text classification tasks. The proposed methodology improves the performance of the baseline methods (Naive Bayes and Support Vector Machines) by 9% in precision and 4.5% in recall. We also evaluate the performance of regular expressions modified by human experts and demonstrate the potential of the proposed method for practical applications.; https://ieeexplore.ieee.org/document/8864974/; Regular expressions; text classification; constructive heuristic method |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Menglin Cui, Ruibin Bai, Zheng Lu, Xiang Li, Uwe Aickelin, Peiming Ge |
author_sort |
Menglin Cui |
title |
Regular Expression Based Medical Text Classification Using Constructive Heuristic Approach |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2019-01-01 |
description |
Medical text classification assigns medical-related text to categories such as topics or disease types. Machine-learning-based techniques have been widely used for such tasks despite the obvious drawback of such “black box” approaches, which leave no easy way to fine-tune the resultant model for better performance. We propose a novel constructive heuristic approach that generates a set of regular expressions that can be used as effective text classifiers. The main innovation of our approach is a regular-expression-based text classifier with both satisfactory classification performance and excellent interpretability. We evaluate our framework on real-world medical data provided by our collaborator, one of the largest online healthcare providers in the market, and observe high and consistent performance. Experimental results show that the machine-generated regular expressions can be used effectively in conjunction with machine learning techniques to perform medical text classification tasks. The proposed methodology improves the performance of the baseline methods (Naive Bayes and Support Vector Machines) by 9% in precision and 4.5% in recall. We also evaluate the performance of regular expressions modified by human experts and demonstrate the potential of the proposed method for practical applications. |
topic |
Regular expressions; text classification; constructive heuristic method |
url |
https://ieeexplore.ieee.org/document/8864974/ |
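The description above refers to regular expressions used as interpretable text classifiers. The following is a minimal sketch of that general idea only, not the authors' method: the category names, the patterns, and the `classify` function are all hypothetical and hand-written for illustration, whereas the article generates its expressions automatically with a constructive heuristic.

```python
import re
from typing import Optional

# Hypothetical, hand-written rules for illustration only. The article
# generates its regular expressions automatically; these are assumptions.
RULES = {
    "cardiology": [re.compile(r"chest pain|palpitation", re.IGNORECASE)],
    "dermatology": [re.compile(r"rash|itch(y|ing)?", re.IGNORECASE)],
}


def classify(text: str) -> Optional[str]:
    """Return the first category with a matching pattern, or None if no rule fires."""
    for category, patterns in RULES.items():
        if any(pattern.search(text) for pattern in patterns):
            return category
    return None


if __name__ == "__main__":
    print(classify("I have chest pain after running"))
    print(classify("Itchy red rash on my arm"))
    print(classify("Broken wrist"))
```

Because each rule is a readable pattern, a human expert can inspect or edit it directly, which is the interpretability benefit the description mentions. A `None` result marks text that no rule covers; one plausible design, in line with the description's remark about using the expressions in conjunction with machine learning techniques, is to pass such text to a statistical classifier.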