Domain similarity metrics for predicting transfer learning performance

Bibliographic Details
Main Author: Bäck, Jesper
Format: Others
Language: English
Published: Linköpings universitet, Interaktiva och kognitiva system, 2019
Subjects:
Online Access:http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-153747
Description
Summary: The lack of training data is a common problem in machine learning. One solution to this problem is to use transfer learning to remove or reduce the requirement of training data. Selecting datasets for transfer learning can be difficult, however. As a possible solution, this study proposes the domain similarity metrics document vector distance (DVD) and term frequency-inverse document frequency (TF-IDF) distance. DVD and TF-IDF could aid in selecting datasets for good transfer learning when there is no data from the target domain. The simple metric, shared vocabulary, is used as a baseline to check whether DVD or TF-IDF can indicate a better choice for a fine-tuning dataset. SQuAD is a popular question answering dataset which has been proven useful for pre-training models for transfer learning. The results were therefore measured by pre-training a model on the SQuAD dataset and fine-tuning on a selection of different datasets. The proposed metrics were used to measure the similarity between the datasets to see whether there was a correlation between transfer learning effect and similarity. The results show a clear relation between a small distance according to the DVD metric and good transfer learning. This could prove useful for a target domain without training data: a model could be trained on a big dataset and fine-tuned on a small dataset that is very similar to the target domain. It was also found that even a small amount of training data from the target domain can be used to fine-tune a model pre-trained on another domain of data, achieving better performance compared to only training on data from the target domain.
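The abstract does not spell out how the TF-IDF distance between two datasets is computed. As a rough illustration only, the sketch below shows one plausible variant: each dataset is treated as a single concatenated document and the metric is the cosine distance between the two TF-IDF vectors. The function name tfidf_distance and the use of scikit-learn are assumptions for illustration, not the author's implementation.

```python
# Minimal sketch of a TF-IDF distance between two datasets (assumed variant,
# not the thesis implementation): each dataset is collapsed into one document
# and the distance is 1 minus the cosine similarity of the TF-IDF vectors.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def tfidf_distance(source_texts, target_texts):
    """Cosine distance between the TF-IDF vectors of two corpora."""
    source_doc = " ".join(source_texts)
    target_doc = " ".join(target_texts)
    vectors = TfidfVectorizer().fit_transform([source_doc, target_doc])
    similarity = cosine_similarity(vectors[0], vectors[1])[0, 0]
    return 1.0 - similarity


# A smaller distance would suggest the candidate fine-tuning dataset is
# closer to the pre-training (or target) domain.
print(tfidf_distance(["what is the capital of france"],
                     ["name the capital city of germany"]))
```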