Regular Expression Based Medical Text Classification Using Constructive Heuristic Approach

Medical text classification assigns medical related text into different categories such as topics or disease types. Machine learning based techniques have been widely used to perform such tasks despite the obvious drawback in such “black box” approach, leaving no easy way to fi...

Bibliographic Details
Main Authors: Menglin Cui, Ruibin Bai, Zheng Lu, Xiang Li, Uwe Aickelin, Peiming Ge
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Subjects: Regular expressions; text classification; constructive heuristic method
Online Access: https://ieeexplore.ieee.org/document/8864974/
id doaj-9536bd8b483248648213088508e44bd1
record_format Article
spelling doaj-9536bd8b483248648213088508e44bd1 2021-03-29T23:41:33Z eng
IEEE, IEEE Access, ISSN 2169-3536, 2019-01-01, vol. 7, pp. 147892-147904, DOI 10.1109/ACCESS.2019.2946622, document 8864974
Regular Expression Based Medical Text Classification Using Constructive Heuristic Approach
Menglin Cui (https://orcid.org/0000-0002-7296-7718), School of Computer Science, University of Nottingham, Ningbo, China
Ruibin Bai, School of Computer Science, University of Nottingham, Ningbo, China
Zheng Lu, School of Computer Science, University of Nottingham, Ningbo, China
Xiang Li, School of Computer Science, University of Nottingham, Ningbo, China
Uwe Aickelin, School of Computing and Information Systems, The University of Melbourne, Melbourne, VIC, Australia
Peiming Ge, Technology Department, Ping An Health Cloud, Shanghai, China
Medical text classification assigns medical related text into different categories such as topics or disease types. Machine learning based techniques have been widely used to perform such tasks despite the obvious drawback in such “black box” approach, leaving no easy way to fine-tune the resultant model for better performance. We propose a novel constructive heuristic approach to generate a set of regular expressions that can be used as effective text classifiers. The main innovation of our approach is that we develop a novel regular expression based text classifier with both satisfactory classification performance and excellent interpretability. We evaluate our framework on real-world medical data provided by our collaborator, one of the largest online healthcare providers in the market, and observe the high performance and consistency of this approach. Experimental results show that the machine-generated regular expressions can be effectively used in conjunction with machine learning techniques to perform medical text classification tasks. The proposed methodology improves the performance of baseline methods (Naive Bayes and Support Vector Machines) by 9% in precision and 4.5% in recall. We also evaluate the performance of modified regular expressions by human experts and demonstrate the potential of practical applications using the proposed method.
https://ieeexplore.ieee.org/document/8864974/
Regular expressions
text classification
constructive heuristic method
collection DOAJ
language English
format Article
sources DOAJ
author Menglin Cui
Ruibin Bai
Zheng Lu
Xiang Li
Uwe Aickelin
Peiming Ge
title Regular Expression Based Medical Text Classification Using Constructive Heuristic Approach
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2019-01-01
description Medical text classification assigns medical related text into different categories such as topics or disease types. Machine learning based techniques have been widely used to perform such tasks despite the obvious drawback in such “black box” approach, leaving no easy way to fine-tune the resultant model for better performance. We propose a novel constructive heuristic approach to generate a set of regular expressions that can be used as effective text classifiers. The main innovation of our approach is that we develop a novel regular expression based text classifier with both satisfactory classification performance and excellent interpretability. We evaluate our framework on real-world medical data provided by our collaborator, one of the largest online healthcare providers in the market, and observe the high performance and consistency of this approach. Experimental results show that the machine-generated regular expressions can be effectively used in conjunction with machine learning techniques to perform medical text classification tasks. The proposed methodology improves the performance of baseline methods (Naive Bayes and Support Vector Machines) by 9% in precision and 4.5% in recall. We also evaluate the performance of modified regular expressions by human experts and demonstrate the potential of practical applications using the proposed method.
topic Regular expressions
text classification
constructive heuristic method
url https://ieeexplore.ieee.org/document/8864974/