J48SS: A Novel Decision Tree Approach for the Handling of Sequential and Time Series Data

Temporal information plays a very important role in many analysis tasks, and can be encoded in at least two different ways. It can be modeled by discrete sequences of events as, for example, in the business intelligence domain, with the aim of tracking the evolution of customer behaviors over time....

Bibliographic Details
Main Authors: Andrea Brunello, Enrico Marzano, Angelo Montanari, Guido Sciavicco
Format: Article
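Language: English
Published: MDPI AG 2019-03-01
Series: Computers
Subjects: machine learning, decision trees, sequential data, pattern mining, time series classification, evolutionary algorithms
Online Access: http://www.mdpi.com/2073-431X/8/1/21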
id doaj-4d375ad03601424392a23492975b4cc8
record_format Article
spelling doaj-4d375ad03601424392a23492975b4cc8
Record timestamp: 2020-11-25T00:25:24Z
Language: eng
Publisher: MDPI AG
Journal: Computers (ISSN 2073-431X)
Published: 2019-03-01, Volume 8, Issue 1, Article 21
DOI: 10.3390/computers8010021
Author affiliations:
Andrea Brunello: Department of Mathematics, Computer Science and Physics, University of Udine, Via delle Scienze, 206, 33100 Udine, Italy
Enrico Marzano: R&D Department, Gap S.r.l.u., Via Tricesimo, 246, 33100 Udine, Italy
Angelo Montanari: Department of Mathematics, Computer Science and Physics, University of Udine, Via delle Scienze, 206, 33100 Udine, Italy
Guido Sciavicco: Department of Mathematics and Computer Science, University of Ferrara, Via Giuseppe Saragat, 1, 44122 Ferrara, Italy
collection DOAJ
language English
format Article
sources DOAJ
author Andrea Brunello
Enrico Marzano
Angelo Montanari
Guido Sciavicco
title J48SS: A Novel Decision Tree Approach for the Handling of Sequential and Time Series Data
publisher MDPI AG
series Computers
issn 2073-431X
publishDate 2019-03-01
description Temporal information plays a very important role in many analysis tasks, and can be encoded in at least two different ways. It can be modeled by discrete sequences of events as, for example, in the business intelligence domain, with the aim of tracking the evolution of customer behaviors over time. Alternatively, it can be represented by time series, as in the stock market to characterize price histories. In some analysis tasks, temporal information is complemented by other kinds of data, which may be represented by static attributes, e.g., categorical or numerical ones. This paper presents J48SS, a novel decision tree inducer capable of natively mixing static (i.e., numerical and categorical), sequential, and time series data for classification purposes. The novel algorithm is based on the popular C4.5 decision tree learner, and it relies on the concepts of frequent pattern extraction and time series shapelet generation. The algorithm is evaluated on a text classification task in a real business setting, as well as on a selection of public UCR time series datasets. Results show that it is capable of providing competitive classification performances, while generating highly interpretable models and effectively reducing the data preparation effort.
topic machine learning
decision trees
sequential data
pattern mining
time series classification
evolutionary algorithms
url http://www.mdpi.com/2073-431X/8/1/21