Summary: | Working with machine learning models has become an everyday task not only for software engineers but also for a much wider spectrum of researchers and professionals. Training such models involves finding the best learning methods and the best hyper-parameters for a specific task, keeping track of the achieved performance measures, comparing the results visually, and so on. When feature extraction methods are added to the mix, the task cries out for supporting tools: these methods precede the learning phase and depend on many hyper-parameters themselves, as with source code embedding, which is common in software analysis. We propose a framework called Deep-Water that works similarly to a configuration management tool in software engineering. It supports defining arbitrary feature extraction and learning methods for an input dataset and helps execute all the training tasks with different hyper-parameters in a distributed manner. The framework stores the circumstances, parameters, and results of every training run, which can be filtered and visualized later. We have successfully used the tool in several prediction tasks based on software analysis, such as vulnerability and bug prediction, but it is general enough to be applicable in other areas as well, e.g., NLP, image processing, or even non-IT fields.
|