SynSys: A Synthetic Data Generation System for Healthcare Applications

Creation of realistic synthetic behavior-based sensor data is an important aspect of testing machine learning techniques for healthcare applications. Many existing approaches for generating synthetic data are limited in terms of complexity and realism. We introduce SynSys, a machine lea...

Bibliographic Details
Main Authors: Jessamyn Dahmen, Diane Cook
Format: Article
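Language: English
Published: MDPI AG 2019-03-01
Series: Sensors
Subjects: Synthetic data; hidden Markov models; regression; smart homes; healthcare data; activity recognition
Online Access: http://www.mdpi.com/1424-8220/19/5/1181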
id doaj-da636e3a7d5f4334bf8e9a2773e2daf0
record_format Article
spelling doaj-da636e3a7d5f4334bf8e9a2773e2daf0 2020-11-25T02:17:23Z eng
MDPI AG, Sensors (ISSN 1424-8220), 2019-03-01, vol. 19, no. 5, article 1181
DOI: 10.3390/s19051181 (s19051181)
SynSys: A Synthetic Data Generation System for Healthcare Applications
Jessamyn Dahmen, Diane Cook (both: School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA 99164, USA)
collection DOAJ
language English
format Article
sources DOAJ
author Jessamyn Dahmen
Diane Cook
title SynSys: A Synthetic Data Generation System for Healthcare Applications
publisher MDPI AG
series Sensors
issn 1424-8220
publishDate 2019-03-01
description Creation of realistic synthetic behavior-based sensor data is an important aspect of testing machine learning techniques for healthcare applications. Many existing approaches for generating synthetic data are limited in terms of complexity and realism. We introduce SynSys, a machine learning-based synthetic data generation method, to improve upon these limitations. We use this method to generate synthetic time series data composed of nested sequences, using hidden Markov models and regression models that are initially trained on real datasets. We test our synthetic data generation technique on a real annotated smart home dataset. We use time series distance measures as a baseline to determine how realistic the generated data is compared to real data, and demonstrate that SynSys produces more realistic data in terms of distance than random data generation, data from another home, and data from another time period. Finally, we apply our synthetic data generation technique to the problem of generating data when only a small amount of ground truth data is available. Using semi-supervised learning, we demonstrate that SynSys is able to improve activity recognition accuracy compared to using the small amount of real data alone.
topic Synthetic data
hidden Markov models
regression
smart homes
healthcare data
activity recognition
url http://www.mdpi.com/1424-8220/19/5/1181