Predicting Semantic Similarity Between Clinical Sentence Pairs Using Transformer Models: Evaluation and Representational Analysis


Bibliographic Details
Main Authors: Ormerod, Mark, Martínez del Rincón, Jesús, Devereux, Barry
Format: Article
Language: English
Published: JMIR Publications 2021-05-01
Series: JMIR Medical Informatics
Online Access: https://medinform.jmir.org/2021/5/e23099
DOI: 10.2196/23099
ISSN: 2291-9694
Description:
Background: Semantic textual similarity (STS) is a natural language processing (NLP) task that involves assigning a similarity score to 2 snippets of text based on their meaning. This task is particularly difficult in the domain of clinical text, which often features specialized language and the frequent use of abbreviations.
Objective: We created an NLP system to predict similarity scores for sentence pairs as part of the Clinical Semantic Textual Similarity track in the 2019 n2c2/OHNLP Shared Task on Challenges in Natural Language Processing for Clinical Data. We subsequently sought to analyze the intermediary token vectors extracted from our models while processing a pair of clinical sentences, to identify where and how representations of semantic similarity are built in transformer models.
Methods: Given a clinical sentence pair, we take the average predicted similarity score across several independently fine-tuned transformers. In our model analysis, we investigated the relationship between the final model’s loss and surface features of the sentence pairs, and assessed the decodability and representational similarity of the token vectors generated by each model.
Results: Our model achieved a correlation of 0.87 with the ground-truth similarity scores, reaching 6th place out of 33 teams (the first-place score was 0.90). In detailed qualitative and quantitative analyses of the model’s loss, we identified the system’s failure to correctly model semantic similarity when both sentences in a pair contain details of medical prescriptions, as well as its general tendency to overpredict semantic similarity given significant token overlap. The token vector analysis revealed divergent representational strategies for predicting textual similarity between bidirectional encoder representations from transformers (BERT)–style models and XLNet.
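The ensemble scheme and the evaluation metric mentioned in the Methods and Results can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the per-model scores and gold labels below are invented, and the real system averages predictions from several fine-tuned transformers rather than the toy numbers shown here.

```python
# Sketch (assumed detail): average per-model similarity predictions for
# each sentence pair, then score the system by Pearson correlation with
# the gold similarity labels, as in the shared-task evaluation.
from statistics import mean

def ensemble_score(model_scores):
    """Average the per-model similarity predictions for one sentence pair."""
    return mean(model_scores)

def pearson(xs, ys):
    """Pearson correlation between predicted and gold similarity scores."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Toy usage: three models scoring two sentence pairs on a 0-5 scale.
per_pair_model_scores = [[4.0, 4.4, 4.2], [1.0, 1.2, 0.8]]
preds = [ensemble_score(s) for s in per_pair_model_scores]
gold = [4.5, 1.0]
r = pearson(preds, gold)
```

With only two pairs the correlation is trivially 1.0; on the real test set the same computation yields the 0.87 reported above.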
We also found that a large amount of information relevant to predicting STS can be captured using a combination of a classification token and the cosine distance between sentence-pair representations in the first layer of a transformer model that did not produce the best predictions on the test set.
Conclusions: We designed and trained a system that uses state-of-the-art NLP models to achieve very competitive results on a new clinical STS data set. As our approach uses no hand-crafted rules, it serves as a strong deep learning baseline for this task. Our key contribution is a detailed analysis of the model’s outputs and an investigation of the heuristic biases learned by transformer models. We suggest future improvements based on these findings. In our representational analysis, we explore how different transformer models converge or diverge in their representation of semantic signals as the tokens of the sentences are augmented by successive layers. This analysis sheds light on how these “black box” models integrate semantic similarity information in intermediate layers, and points to new research directions in model distillation and sentence embedding extraction for applications in clinical NLP.
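The cosine-distance component of the probing result above can be sketched as follows. This is a hedged illustration under assumed details: the abstract does not specify how sentence vectors are pooled from first-layer token vectors, so mean pooling is used here for concreteness, and the classification-token feature that the paper combines with this distance is not shown.

```python
# Sketch (assumed pooling): build one vector per sentence by averaging
# its early-layer token vectors, then use the cosine distance between
# the two sentence vectors as a feature for predicting STS.

def mean_pool(token_vectors):
    """Average a sentence's token vectors into a single sentence vector."""
    dim = len(token_vectors[0])
    n = len(token_vectors)
    return [sum(v[i] for v in token_vectors) / n for i in range(dim)]

def cosine_distance(u, v):
    """1 minus the cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return 1.0 - dot / (nu * nv)

# Toy token vectors for two sentences (a BERT-base-style model's layer
# outputs would be 768-dimensional; 3-dimensional here for clarity).
sent_a = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
sent_b = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
dist = cosine_distance(mean_pool(sent_a), mean_pool(sent_b))
# identical sentences -> distance of approximately 0.0
```

A lower distance corresponds to more similar first-layer representations, which is why this single scalar carries usable signal for the STS prediction.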