An algorithm for detecting leaks of insider information of financial markets in investment consulting

Bibliographic Details
Main Authors: Alisa A. Vorobeva, Vladislav V. Gerasimov, Yulia V. Li
Format: Article
Language: English
Published: Saint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University) 2021-06-01
Series: Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki
Online Access: https://ntv.ifmo.ru/file/article/20507.pdf
Description
Summary: The paper addresses the detection of leaks of financial market insider information during investment consulting. The authors created an original dataset of recorded conversations between consultants and clients, stored as text dialogs, and studied whether machine learning methods can automate the detection of leaks that arise in such conversations. They examined the following supervised methods for constructing and training a classifier: probabilistic (naïve Bayes classifier), metric (k-nearest neighbors algorithm), logical (random forest), linear (support vector machine), and methods based on artificial neural networks. The paper also considers several approaches to building a model of natural language text, namely tokenization (bag of words and word n-grams: bigrams and trigrams) and vectorization (one-hot encoding). The proposed algorithm for detecting leaks of financial market insider information combines a support vector machine (SVM) with bigram tokenization; according to the reported results, this combination gives the highest leak detection accuracy among the options examined. The results can be used in the development of cybersecurity tools and in further work on natural language processing methods for information security problems.
ISSN: 2226-1494, 2500-0373
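
The combination named in the summary, word-bigram tokenization feeding a linear SVM, can be illustrated with a minimal sketch. This record does not give the authors' implementation, dataset, or hyperparameters, so everything below is an assumption made for illustration: the four toy dialogs and their labels are invented, the function names are hypothetical, the binary presence vector stands in for the one-hot style encoding, and the SVM is a plain sub-gradient descent on the regularised hinge loss with arbitrarily chosen learning rate and regularization constant.

```python
def word_ngrams(text, n):
    """Split text into lowercase word n-grams (n=1 gives a bag of words)."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def build_vocabulary(texts, n):
    """Map each distinct n-gram in the corpus to a column index."""
    vocab = {}
    for text in texts:
        for gram in word_ngrams(text, n):
            vocab.setdefault(gram, len(vocab))
    return vocab


def encode(text, vocab, n):
    """Binary presence vector: 1.0 where a known n-gram occurs in the text."""
    vec = [0.0] * len(vocab)
    for gram in word_ngrams(text, n):
        if gram in vocab:
            vec[vocab[gram]] = 1.0
    return vec


def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01):
    """Sub-gradient descent on the L2-regularised hinge loss (labels +1/-1)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            violated = margin < 1  # sample is inside the margin or misclassified
            for j in range(len(w)):
                grad = lam * w[j] - (yi * xi[j] if violated else 0.0)
                w[j] -= lr * grad
            if violated:
                b += lr * yi
    return w, b


def predict(x, w, b):
    """Return +1 (leak) or -1 (no leak) from the sign of the decision value."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Invented toy dialogs: +1 marks a (hypothetical) insider leak, -1 no leak.
dialogs = [
    "the merger will be announced tomorrow buy now",
    "earnings are not public yet but they beat forecasts",
    "a diversified fund suits your risk profile",
    "public reports show stable revenue growth",
]
labels = [1, 1, -1, -1]

N = 2  # bigram tokenization, the variant the summary reports as most accurate
vocab = build_vocabulary(dialogs, N)
X = [encode(d, vocab, N) for d in dialogs]
w, b = train_linear_svm(X, labels)

print([predict(x, w, b) for x in X])
```

Setting `N = 1` or `N = 3` switches the same sketch to bag-of-words or trigram tokenization, which is one way to compare the text models the summary lists. A real experiment would need a held-out test set and a much larger labeled corpus than this toy example.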