Temporal Event Tracing on Big Medical Data Analytics

Doctoral dissertation === National Taiwan University === Graduate Institute of Information Management === 103 === Background – The global aging trend, combined with societal changes, is creating population health problems and increasing health care spending. As a precaution, local policy makers have been promoting electronic medical data to help achieve five major goals of healt...

Bibliographic Details
Main Authors: Chin-Ho Lin, 林慶和
Other Authors: 曹承礎
Format: Others
Language: zh-TW
Published: 2015
Online Access: http://ndltd.ncl.edu.tw/handle/16404955909914962330
id ndltd-TW-103NTU05396014
record_format oai_dc
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Doctoral dissertation === National Taiwan University === Graduate Institute of Information Management === 103 === Background – The global aging trend, combined with societal changes, is creating population health problems and increasing health care spending. As a precaution, local policy makers have been promoting electronic medical data to help achieve five major goals of the health care system: 1) improving health care quality, safety, and performance, 2) committing to patient health needs, 3) improving health care coordination, 4) improving the health of the population, and 5) ensuring privacy and security. However, putting these medical data to "Meaningful Use", expanding data usage, and creating more profits requires overcoming many research difficulties, and it will not be an easy task. Currently, medical data are scattered across different industries, data collection is difficult, and cross-analysis is rare. Furthermore, after many years of accumulation, medical records have grown into big data. This not only significantly affects original plans and research, but also creates additional innovative applications and opportunities. Objectives – Because big data analysis infrastructure in the biomedical field still lags seriously behind current trends, researchers have to spend considerable time constructing and organizing their data, interpreting its meaning, and identifying issues with it. To revolutionize biomedical big data analysis, this study proposes a set of methods ranging from data storage to data analysis. Based on this set of methods, two novel big data applications were verified: 1) prompt testing of reported medical incidents, such as reported adverse drug reactions, and 2) timely monitoring and tracking of temporal medical events, such as monitoring of newly marketed drugs.
To achieve the objectives, this set of methods must have: 1) timeliness, to return processing results quickly, 2) effectiveness, to achieve results at low cost, 3) scalability, to allow horizontal expansion of computing power and storage capacity, 4) ease of calculation, to make testing and calculating tracking indicators convenient, and 5) applicability. Methods – Unlike in epidemiological research methods, the problems to be studied when tracking and analyzing temporal medical events cannot be specified in advance. This study proposes a new model that provides an operating mechanism for timely tracking and monitoring of medical events and for uncovering relevant information. The model contains four parts: 1) source of data, namely current electronic medical data, 2) data management, including the big data storage model PDMdoc, the temporal medical events model TMEdoc, and the tactics and management of the sharded cluster, 3) processing and computing, including sharded cluster operating procedures, cloud computing MapReduce big data processing methods, and an integrated temporal event tracking analysis, and 4) tracking indicators, mainly comprising a number of indicators and a record of each patient's index value for every occurrence. Among these, the indicators belong to the practical application level; whether the model can achieve timely monitoring and tracking therefore depends essentially on data management and on the efficiency of the processing and calculation methods.
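The processing part of the model described above can be illustrated with a minimal, in-memory map/reduce sketch. It is not the thesis implementation: the abstract does not give the TMEdoc schema, so the event fields (`patient`, `kind`, `code`, `day`), the codes `DRUG_X` and `DX_Y`, the 90-day window, and the function names are all hypothetical, and plain Python stands in for the MongoDB sharded cluster. It only shows how a tracking indicator (patients with an outcome after first exposure) could be derived by mapping events to per-patient keys and reducing each group.

```python
from collections import defaultdict
from datetime import date

# Hypothetical flattened events; the real TMEdoc schema is not given in the abstract.
EVENTS = [
    {"patient": "p1", "kind": "drug", "code": "DRUG_X", "day": date(2009, 1, 10)},
    {"patient": "p1", "kind": "diagnosis", "code": "DX_Y", "day": date(2009, 2, 1)},
    {"patient": "p2", "kind": "drug", "code": "DRUG_X", "day": date(2009, 3, 5)},
    {"patient": "p3", "kind": "diagnosis", "code": "DX_Y", "day": date(2009, 1, 2)},
    {"patient": "p3", "kind": "drug", "code": "DRUG_X", "day": date(2009, 4, 1)},
]


def map_event(event, drug, outcome):
    """Map step: emit (patient, (role, day)) for events relevant to the tracked pair."""
    if event["kind"] == "drug" and event["code"] == drug:
        yield event["patient"], ("exposure", event["day"])
    elif event["kind"] == "diagnosis" and event["code"] == outcome:
        yield event["patient"], ("outcome", event["day"])


def reduce_patient(values, window_days):
    """Reduce step: True if an outcome falls within window_days after the first exposure."""
    exposures = [day for role, day in values if role == "exposure"]
    if not exposures:
        return False
    first = min(exposures)
    return any(
        role == "outcome" and 0 <= (day - first).days <= window_days
        for role, day in values
    )


def track(events, drug, outcome, window_days=90):
    """Group mapped values by patient, then reduce each group to an indicator."""
    grouped = defaultdict(list)  # shuffle step: collect values per patient key
    for event in events:
        for key, value in map_event(event, drug, outcome):
            grouped[key].append(value)
    flagged = sorted(
        patient for patient, values in grouped.items()
        if reduce_patient(values, window_days)
    )
    exposed = sum(
        1 for values in grouped.values()
        if any(role == "exposure" for role, _ in values)
    )
    return {"exposed": exposed, "flagged": flagged}


result = track(EVENTS, "DRUG_X", "DX_Y")
```

Because each patient is reduced independently, the same grouping key could in principle serve as a shard key, which is one plausible reason a per-patient document layout suits horizontal scaling; the abstract itself does not state the shard key used.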
Results – Complexity of the research methods in this study: 1) horizontal scaling and degree of parallelism of the sharded cluster is 1 unit; specifically, every time a shard is added to the cluster, computing power and storage capacity both increase by 1 unit, regardless of the number of cluster nodes, 2) network I/O depends only on the amount of data in the search results, not on the number of cluster nodes, 3) search and disk I/O: the average seek times for PDMdoc and TMEdoc are O(1) and O(log_d(S_TMEdoc/B)), respectively, and the average disk I/O costs for seek time, rotational delay, and transmission time are "O(1), O(1), O(E_PDMdoc)" and "O(log_d(S_TMEdoc/B)), O(1), O(E_TMEdoc × L_TMEdoc)", respectively. Experiments: 1) data were drawn from the Taiwan NHIRD LHID2010 dataset, which contains health care data for a total of one million people for the period 1996 to 2010, 2) the test system was a sharded cluster containing 3 shard nodes, built on MongoDB and five PCs, 3) experimental results: a) benchmarks: the times required to search for patients in 8 disease groups ranged from 0.607 to 63.248 seconds on the single-server system and from 0.336 to 29.484 seconds on the sharded cluster, a performance ratio of 1:2.024 between the two systems, b) reported adverse drug reactions: taking the Januvia drug safety information published by the FDA in September 2009 as an example, the tested odds ratio is 1.626, showing that this type of incident also occurred significantly in Taiwan, c) monitoring of newly marketed drugs: the system can process more than 140,000 TMEs per second, and the number of drugs that can be monitored daily is estimated to be in the tens of thousands or more.
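For readers unfamiliar with the odds ratio test mentioned in the results, the following sketch shows the standard calculation from a 2x2 table, with a Woolf (log-based) confidence interval. The abstract reports only the resulting odds ratio, not the underlying counts or the exact test used, so the counts below are invented purely for illustration and the function name `odds_ratio` is an assumption.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio and Woolf confidence interval for a 2x2 table.

    a: exposed with event      b: exposed without event
    c: unexposed with event    d: unexposed without event
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this estimate")
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(estimate) - z * se)
    high = math.exp(math.log(estimate) + z * se)
    return estimate, low, high


# Illustrative counts only; they are NOT taken from the thesis.
or_value, ci_low, ci_high = odds_ratio(a=30, b=970, c=20, d=980)
```

An association is usually called significant at the chosen level when the interval excludes 1; with the illustrative counts above the interval includes 1, so that made-up example would not qualify.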
spelling ndltd-TW-103NTU05396014 2016-11-19T04:09:46Z http://ndltd.ncl.edu.tw/handle/16404955909914962330 Temporal Event Tracing on Big Medical Data Analytics 巨量醫療資料之時序事件追蹤與分析 Chin-Ho Lin 林慶和 Doctoral dissertation, National Taiwan University, Graduate Institute of Information Management, 103 曹承礎 2015 學位論文 ; thesis 75 zh-TW