What Disease Does This Patient Have? A Large-Scale Open Domain Question Answering Dataset from Medical Exams
Open domain question answering (OpenQA) tasks have been recently attracting more and more attention from the natural language processing (NLP) community. In this work, we present the first free-form multiple-choice OpenQA dataset for solving medical problems, MedQA...
Main Authors: | Di Jin, Eileen Pan, Nassim Oufattole, Wei-Hung Weng, Hanyi Fang, Peter Szolovits |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-07-01 |
Series: | Applied Sciences |
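Subjects: | natural language processing; open-domain question answering; multi-choice question answering; clinical question answering |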
Online Access: | https://www.mdpi.com/2076-3417/11/14/6421 |
id |
doaj-11ebc944223d456caec3066e6152ebd1 |
---|---|
record_format |
Article |
spelling |
doaj-11ebc944223d456caec3066e6152ebd1; 2021-07-23T13:29:34Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2021-07-01; 11; 6421; 6421; 10.3390/app11146421
What Disease Does This Patient Have? A Large-Scale Open Domain Question Answering Dataset from Medical Exams
Di Jin (0); Eileen Pan (1); Nassim Oufattole (2); Wei-Hung Weng (3); Hanyi Fang (4); Peter Szolovits (5)
(0) Computer Science and Artificial Intelligence, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
(1) Computer Science and Artificial Intelligence, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
(2) Computer Science and Artificial Intelligence, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
(3) Computer Science and Artificial Intelligence, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
(4) Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430074, China
(5) Computer Science and Artificial Intelligence, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Open domain question answering (OpenQA) tasks have been recently attracting more and more attention from the natural language processing (NLP) community. In this work, we present the first free-form multiple-choice OpenQA dataset for solving medical problems, MedQA, collected from the professional medical board exams. It covers three languages: English, simplified Chinese, and traditional Chinese, and contains 12,723, 34,251, and 14,123 questions for the three languages, respectively. We implement both rule-based and popular neural methods by sequentially combining a document retriever and a machine comprehension model. Through experiments, we find that even the current best method can only achieve 36.7%, 42.0%, and 70.1% of test accuracy on the English, traditional Chinese, and simplified Chinese questions, respectively. We expect MedQA to present great challenges to existing OpenQA systems and hope that it can serve as a platform to promote much stronger OpenQA models from the NLP community in the future.
https://www.mdpi.com/2076-3417/11/14/6421
natural language processing; open-domain question answering; multi-choice question answering; clinical question answering |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Di Jin, Eileen Pan, Nassim Oufattole, Wei-Hung Weng, Hanyi Fang, Peter Szolovits |
spellingShingle |
Di Jin Eileen Pan Nassim Oufattole Wei-Hung Weng Hanyi Fang Peter Szolovits What Disease Does This Patient Have? A Large-Scale Open Domain Question Answering Dataset from Medical Exams Applied Sciences natural language processing open-domain question answering multi-choice question answering clinical question answering |
author_facet |
Di Jin, Eileen Pan, Nassim Oufattole, Wei-Hung Weng, Hanyi Fang, Peter Szolovits |
author_sort |
Di Jin |
title |
What Disease Does This Patient Have? A Large-Scale Open Domain Question Answering Dataset from Medical Exams |
title_short |
What Disease Does This Patient Have? A Large-Scale Open Domain Question Answering Dataset from Medical Exams |
title_full |
What Disease Does This Patient Have? A Large-Scale Open Domain Question Answering Dataset from Medical Exams |
title_fullStr |
What Disease Does This Patient Have? A Large-Scale Open Domain Question Answering Dataset from Medical Exams |
title_full_unstemmed |
What Disease Does This Patient Have? A Large-Scale Open Domain Question Answering Dataset from Medical Exams |
title_sort |
what disease does this patient have? a large-scale open domain question answering dataset from medical exams |
publisher |
MDPI AG |
series |
Applied Sciences |
issn |
2076-3417 |
publishDate |
2021-07-01 |
description |
Open domain question answering (OpenQA) tasks have been recently attracting more and more attention from the natural language processing (NLP) community. In this work, we present the first free-form multiple-choice OpenQA dataset for solving medical problems, MedQA, collected from the professional medical board exams. It covers three languages: English, simplified Chinese, and traditional Chinese, and contains 12,723, 34,251, and 14,123 questions for the three languages, respectively. We implement both rule-based and popular neural methods by sequentially combining a document retriever and a machine comprehension model. Through experiments, we find that even the current best method can only achieve 36.7%, 42.0%, and 70.1% of test accuracy on the English, traditional Chinese, and simplified Chinese questions, respectively. We expect MedQA to present great challenges to existing OpenQA systems and hope that it can serve as a platform to promote much stronger OpenQA models from the NLP community in the future. |
topic |
natural language processing; open-domain question answering; multi-choice question answering; clinical question answering |
url |
https://www.mdpi.com/2076-3417/11/14/6421 |
work_keys_str_mv |
AT dijin whatdiseasedoesthispatienthavealargescaleopendomainquestionansweringdatasetfrommedicalexams AT eileenpan whatdiseasedoesthispatienthavealargescaleopendomainquestionansweringdatasetfrommedicalexams AT nassimoufattole whatdiseasedoesthispatienthavealargescaleopendomainquestionansweringdatasetfrommedicalexams AT weihungweng whatdiseasedoesthispatienthavealargescaleopendomainquestionansweringdatasetfrommedicalexams AT hanyifang whatdiseasedoesthispatienthavealargescaleopendomainquestionansweringdatasetfrommedicalexams AT peterszolovits whatdiseasedoesthispatienthavealargescaleopendomainquestionansweringdatasetfrommedicalexams |
_version_ |
1721289495070900224 |