Creation of a Next-Generation Standardized Drug Grouping for QT Prolonging Reactions using Machine Learning Techniques

This project aims to support pharmacovigilance, the science and activities relating to drug safety and the prevention of adverse drug reactions (ADRs). We focus on a specific ADR called QT prolongation, a serious reaction affecting the heartbeat. Our main goal is to group medicinal ingredients that migh...

Bibliographic Details
Main Authors: Tiensuu, Jacob; Rådahl, Elsa
Format: Others
Language: English
Published: Uppsala universitet, Avdelningen för systemteknik 2021
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-447980
id ndltd-UPSALLA1-oai-DiVA.org-uu-447980
record_format oai_dc
spelling ndltd-UPSALLA1-oai-DiVA.org-uu-447980
2021-07-02T05:24:03Z
Creation of a Next-Generation Standardized Drug Grouping for QT Prolonging Reactions using Machine Learning Techniques
eng
Tiensuu, Jacob
Rådahl, Elsa
Uppsala universitet, Avdelningen för systemteknik
2021
Pharmacovigilance
Adverse Drug Reactions
MedDRA
VigiBase
WHODrug Global
QT prolongation
Torsades de Pointes
Individual Case Safety Reports
Text Recognition
Standardised Drug Grouping
Multinomial Logistic Regression
BERT
Other Engineering and Technologies
Annan teknik
Student thesis
info:eu-repo/semantics/bachelorThesis
text
http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-447980
UPTEC F, 1401-5757 ; 21028
application/pdf
info:eu-repo/semantics/openAccess
collection NDLTD
language English
format Others
sources NDLTD
topic Pharmacovigilance
Adverse Drug Reactions
MedDRA
VigiBase
WHODrug Global
QT prolongation
Torsades de Pointes
Individual Case Safety Reports
Text Recognition
Standardised Drug Grouping
Multinomial Logistic Regression
BERT
Other Engineering and Technologies
Annan teknik
description This project aims to support pharmacovigilance, the science and activities relating to drug safety and the prevention of adverse drug reactions (ADRs). We focus on a specific ADR called QT prolongation, a serious reaction affecting the heartbeat. Our main goal is to group medicinal ingredients that might cause QT prolongation. This grouping can be used in safety analysis and for exclusion lists in clinical studies, and it should preferably be ranked by level of suspected correlation. We aimed to create an automated and standardised process. Drug safety reports describing the ADRs that patients experienced and the medicinal products they took are collected in a database called VigiBase, which we used as the source for ingredient extraction. The ADRs are described in free text and coded using an international standardised terminology. The coding lets us process the data and filter for the ingredients included in reports that describe QT prolongation. To broaden the project scope to uncoded data, we extended the process to accept free-text verbatims describing the ADR as input. By processing and filtering the free-text data and training a classification model for natural language processing, released by Google, on VigiBase data, we were able to predict whether a free-text verbatim describes QT prolongation. The classification resulted in an F1-score of 98%. For the ingredients extracted from VigiBase, we wanted to validate whether there is a known connection to QT prolongation. The number of VigiBase occurrences is one parameter to consider, but it can be misleading: a report can include several drugs, and a drug can include several ingredients, which makes it hard to attribute the cause. For validation, we used the product labels connected to each ingredient of interest. We used a tool to download, scan and code product labels in order to see which ones mention QT prolongation.
To rank our final list of ingredients by level of suspected QT prolongation correlation, we used a multinomial logistic regression model. As training data, we used a data subset manually labeled by pharmacists. Used on unlabeled validation data, the model accuracy was 68%. Analyzing the training data showed that it was not easily linearly separable, which explains the limited classification performance. The final ranked list of ingredients suspected to cause QT prolongation consists of 1086 ingredients.
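The description reports two standard classification metrics: an F1-score for the free-text classifier and an accuracy for the ranking model. As a minimal sketch of how such figures are computed, the Python below uses made-up labels for illustration only. The function names and data are hypothetical and are not taken from the thesis.

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    if not y_true:
        raise ValueError("accuracy is undefined for an empty label list")
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical labels: 1 = verbatim describes QT prolongation, 0 = it does not.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

print(f1_score(y_true, y_pred))
print(accuracy(y_true, y_pred))
```

F1 depends only on how the positive class is handled, whereas accuracy counts every class equally, so the two numbers are not directly comparable across the binary and the multinomial task.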
author Tiensuu, Jacob
Rådahl, Elsa
title Creation of a Next-Generation Standardized Drug Grouping for QT Prolonging Reactions using Machine Learning Techniques
publisher Uppsala universitet, Avdelningen för systemteknik
publishDate 2021
url http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-447980