Predicting financial markets using text on the Web

Bibliographic Details
Main Author: Bitvai, Zsolt
Other Authors: Cohn, Trevor
Published: University of Sheffield 2016
Subjects:
004
Online Access: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.736520
id ndltd-bl.uk-oai-ethos.bl.uk-736520
record_format oai_dc
collection NDLTD
sources NDLTD
topic 004
description It remains an open question whether consistent inefficiencies can be discovered in financial markets in order to realise risk-free excess profits above the market rate of return over long periods of time. Previous research in finance claims that markets are fundamentally unpredictable. Therefore, theories such as the Efficient Market Hypothesis, the Capital Asset Pricing Model and Modern Portfolio Theory aim to match the market return through optimal asset allocation. However, Behavioural Economics provides competing psychological explanations for the seemingly irrational behaviour of market participants. This suggests that risk-free excess profits can be achieved by exploiting these phenomena. A further limiting factor in this field is that few previous studies are publicly available: once new inefficiencies are discovered, they are often arbitraged away, and findings may be withheld from publication to preserve competitive advantage. Therefore, there is a clear need for a comprehensive study that highlights alleged inefficiencies in markets and determines their sources. In this thesis, we aim to empirically document market inefficiencies by proposing several novel Machine Learning and Natural Language Processing algorithms to forecast the evolution of markets. This data-driven approach contrasts with the more theoretical frameworks found in traditional finance. Each of our models captures different characteristics of markets, which are uncovered by constructing new modelling techniques suited to the task. These tasks range from measuring the influence of related securities across time periods, to quantifying uncertainty in prices and underlying market dynamics, and even to simulating a real-life human trader who aims to maximise their balance by taking positions based on the economic and news information they have read. 
Our models aim to preserve generalisability across a multitude of markets by carefully trading off modelling complexity against simplicity and computational capacity. They treat market dynamics as arbitrarily complex systems and are able to explain the key factors influencing market evolution. Our multimodal data sources include time, structured market and economic data, and unstructured text from news articles and social media posts. Our models are able to execute trades while adapting to the particular aspects of each market domain in which they operate. We also explicitly model temporal dynamics via non-stationary processes, which is challenging due to the high levels of noise observed in markets. We further propose multiple ways to capture trading profit directly as the model objective, instead of employing simplified surrogate objectives. In the first part of the thesis, we detail a novel linear model in which we directly optimise for trading profit in a realistic stock market trading scenario, and model different companies and trading periods with multi-task learning regularisers. In addition, we empirically validate technical analysis, which relies on human-constructed heuristics for trading. Next, we outline several non-linear probabilistic models within the state-of-the-art Gaussian Process framework. First, we incorporate the above profit objective with a Gaussian approximation. Second, we show how multiple types of input data can be modelled with a novel kernel combination technique. Finally, we propose a way to exploit the posterior uncertainty of the model output in order to capture significant arbitrage opportunities. In the final part of the thesis, we describe several new deep learning model architectures inspired by recent advances in representation learning. 
We demonstrate that obtaining progressively higher-level representations of the concepts in a text document considerably boosts predictive power compared to shallow architectures. Then, we show a method to identify influential phrases within a text document, which considerably aids the interpretability of otherwise black-box neural networks. In the final experiment, we propose an end-to-end, fully differentiable learning framework to simulate a real-life human trader who reads raw market, economic, and news data every day and places limit order trades directly on the market, while optimising for the net worth of their own portfolio. The results of the experiments are demonstrated on a variety of developed markets in the United Kingdom and the United States over several decades of test data. These include three major stock markets as well as an emerging peer-to-peer lending market. Last, we demonstrate a way to forecast box office revenues from critic reviews for prediction markets. The contribution of this thesis is to show that several financial markets are predictable and that risk-free excess profits can be achieved via automated trading over long stretches of time. We also show that temporal, structured and textual characteristics of the data have a strong influence on predictability. This work thereby provides evidence against the Efficient Market Hypothesis.
spelling ndltd-bl.uk-oai-ethos.bl.uk-736520; 2019-03-05T15:40:39Z; Predicting financial markets using text on the Web; Bitvai, Zsolt; Cohn, Trevor; 2016; 004; University of Sheffield; https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.736520; http://etheses.whiterose.ac.uk/19505/; Electronic Thesis or Dissertation