Summary: | Recent advances in science and technology have led to the generation of huge amounts of data from various sources, including scientific experiments, social surveys and practical observations. The availability of powerful computer hardware and software makes it easier to store datasets. However, more efficient and accurate methodologies are required to analyse datasets and extract useful information from them. This work applies mathematical programming and optimisation methodologies to analyse different forms of datasets. The research focuses on three areas: data classification, community structure identification in complex networks, and DNA motif discovery. Firstly, a general data classification problem is investigated. A mixed integer optimisation-based approach is proposed to reveal the patterns hidden in training data samples using a hyper-box representation. An efficient solution methodology is then developed to extend the applicability of hyper-box classifiers to datasets with many training samples and complex structures. Secondly, the network community structure identification problem is addressed. The proposed mathematical model finds optimal modular structures of complex networks by maximising the network modularity metric. Communities of medium and large networks are identified through a two-stage solution algorithm developed in this thesis. Finally, the third part presents an optimisation-based framework to extract DNA motifs and consensus sequences. The problem is formulated as a mixed integer linear programming model, and an iterative solution procedure is developed to identify multiple motifs in each DNA sequence. The flexibility of the proposed motif-finding approach is then demonstrated by incorporating other biological features.
|