Summary: | In recent years, advances in technologies such as machine learning, natural language processing, and automated data processing have opened up potential biomedical and public health applications that draw on massive data sources such as social media. However, current methods underutilize features such as consumer health terminology in social media texts. In this paper, we propose a medical social media text classification (MSMTC) algorithm that integrates consumer health terminology. Classification of medical text from social media is divided into two sub-tasks: consumer health terminology extraction and text classification. First, text characteristics based on the double-channel structure are used for training, and consumer health terminology is then extracted using an adversarial network. Next, text classification is implemented based on the extracted consumer health terminology and the double-channel subtraction method. Datasets containing patient descriptions from social media are used as an example. The experimental results show that the algorithm outperforms single-channel methods and baseline models, including Convolutional Neural Networks, Long Short-Term Memory Networks, Bi-directional Long Short-Term Memory Networks, the Naive Bayesian Model, and Extreme Gradient Boosting.
|