Data-Driven Combinatorial Optimization and Efficient Machine Learning Frameworks

Contemporary research in building optimization models and designing algorithms has become more data-centric and application-specific. While addressing three problems in the fields of combinatorial optimization and machine learning (ML), this work highlights the value of making data an important driv...

Full description

Bibliographic Details
Main Author:	Sakr, Nourhan
Language:	English
Published:	2019
Subjects:	Operations research Computer science Machine learning Combinatorial optimization Algorithms > Design
Online Access:	https://doi.org/10.7916/d8-z27f-xv84

id	ndltd-columbia.edu-oai-academiccommons.columbia.edu-10.7916-d8-z27f-xv84
record_format	oai_dc
spelling	ndltd-columbia.edu-oai-academiccommons.columbia.edu-10.7916-d8-z27f-xv842019-10-29T03:28:49ZData-Driven Combinatorial Optimization and Efficient Machine Learning FrameworksSakr, Nourhan2019ThesesOperations researchComputer scienceMachine learningCombinatorial optimizationAlgorithms--DesignContemporary research in building optimization models and designing algorithms has become more data-centric and application-specific. While addressing three problems in the fields of combinatorial optimization and machine learning (ML), this work highlights the value of making data an important driver for building frameworks and designing solutions. In Chapter 2, we look into building frameworks for data-intensive applications, such as ML algorithms. We introduce a family of matrices, Structured Spinners, and use these to perform input data projections. This operation is commonly needed for a plethora of ML algorithms such as dimension reduction, cross-polytope LSH techniques or kernel approximation, yet comprises a major bottleneck in terms of time and space complexity. We design a generic framework that speeds up ML algorithms which perform such projections, with no or minor loss in accuracy. Our method applies for projections in both randomized and learned settings. We confirm our results via theoretical guarantees and numerical experiments. Moreover, we are the first to provide theoretical results for that type of work under adaptive settings. In Chapter 3, we rely on empirical analysis to design algorithms for an online packet scheduling problem with weighted packets and agreeable deadlines. We adopt the model presented in Jez et al. and use their Modified Greedy MG algorithm, which was the best-performing one in the literature, as our benchmark. The technique used for proving the competitive ratio for this benchmark is worst case analysis. Via extensive simulations, we shed light on practical bottlenecks and observe that these are not captured by the competitive analysis. We design three new algorithms, two of which make deterministic online decisions, while the third learns an adaptive one. When contrasted against the MG benchmark, our algorithms have significantly worse competitive ratios, yet noticeably better empirical performance. Our methodology is particularly useful for online algorithms and underlines the significance of leveraging data or simulations to guide algorithm design and testing. In Chapter 4, we study two different applications in cybersecurity: an adaptive ML problem and a game-theoretic model called PLADD. The common objective between both problems is to protect cybersystems against attacks by intelligent, adaptable and well-resourced adversaries while maintaining a cost budget. We introduce a novel combinatorial scheduling formulation to design useful defense strategies that meet this goal. Our work separates the formulation from the data-driven analysis and solution. As such, the scheduling formulation, which does not resemble any previously studied formulations from the scheduling literature, may be used as a new model by other researchers for different motivations. We keep the model generic enough for others to use, but design the algorithms that work best for our context. The formulation is inspired by stochastic programming and cast as a mixed integer program (MIP). We provide theoretical analysis, e.g. explore integrality gaps, exploit the combinatorial structure, prove NP-hardness, develop dynamic programming solutions for two-machine cases, then work towards data-driven heuristics and approximation algorithms using distribution assumptions and real data.Englishhttps://doi.org/10.7916/d8-z27f-xv84
collection	NDLTD
language	English
sources	NDLTD
topic	Operations research Computer science Machine learning Combinatorial optimization Algorithms--Design
spellingShingle	Operations research Computer science Machine learning Combinatorial optimization Algorithms--Design Sakr, Nourhan Data-Driven Combinatorial Optimization and Efficient Machine Learning Frameworks
description	Contemporary research in building optimization models and designing algorithms has become more data-centric and application-specific. While addressing three problems in the fields of combinatorial optimization and machine learning (ML), this work highlights the value of making data an important driver for building frameworks and designing solutions. In Chapter 2, we look into building frameworks for data-intensive applications, such as ML algorithms. We introduce a family of matrices, Structured Spinners, and use these to perform input data projections. This operation is commonly needed for a plethora of ML algorithms such as dimension reduction, cross-polytope LSH techniques or kernel approximation, yet comprises a major bottleneck in terms of time and space complexity. We design a generic framework that speeds up ML algorithms which perform such projections, with no or minor loss in accuracy. Our method applies for projections in both randomized and learned settings. We confirm our results via theoretical guarantees and numerical experiments. Moreover, we are the first to provide theoretical results for that type of work under adaptive settings. In Chapter 3, we rely on empirical analysis to design algorithms for an online packet scheduling problem with weighted packets and agreeable deadlines. We adopt the model presented in Jez et al. and use their Modified Greedy MG algorithm, which was the best-performing one in the literature, as our benchmark. The technique used for proving the competitive ratio for this benchmark is worst case analysis. Via extensive simulations, we shed light on practical bottlenecks and observe that these are not captured by the competitive analysis. We design three new algorithms, two of which make deterministic online decisions, while the third learns an adaptive one. When contrasted against the MG benchmark, our algorithms have significantly worse competitive ratios, yet noticeably better empirical performance. Our methodology is particularly useful for online algorithms and underlines the significance of leveraging data or simulations to guide algorithm design and testing. In Chapter 4, we study two different applications in cybersecurity: an adaptive ML problem and a game-theoretic model called PLADD. The common objective between both problems is to protect cybersystems against attacks by intelligent, adaptable and well-resourced adversaries while maintaining a cost budget. We introduce a novel combinatorial scheduling formulation to design useful defense strategies that meet this goal. Our work separates the formulation from the data-driven analysis and solution. As such, the scheduling formulation, which does not resemble any previously studied formulations from the scheduling literature, may be used as a new model by other researchers for different motivations. We keep the model generic enough for others to use, but design the algorithms that work best for our context. The formulation is inspired by stochastic programming and cast as a mixed integer program (MIP). We provide theoretical analysis, e.g. explore integrality gaps, exploit the combinatorial structure, prove NP-hardness, develop dynamic programming solutions for two-machine cases, then work towards data-driven heuristics and approximation algorithms using distribution assumptions and real data.
author	Sakr, Nourhan
author_facet	Sakr, Nourhan
author_sort	Sakr, Nourhan
title	Data-Driven Combinatorial Optimization and Efficient Machine Learning Frameworks
title_short	Data-Driven Combinatorial Optimization and Efficient Machine Learning Frameworks
title_full	Data-Driven Combinatorial Optimization and Efficient Machine Learning Frameworks
title_fullStr	Data-Driven Combinatorial Optimization and Efficient Machine Learning Frameworks
title_full_unstemmed	Data-Driven Combinatorial Optimization and Efficient Machine Learning Frameworks
title_sort	data-driven combinatorial optimization and efficient machine learning frameworks
publishDate	2019
url	https://doi.org/10.7916/d8-z27f-xv84
work_keys_str_mv	AT sakrnourhan datadrivencombinatorialoptimizationandefficientmachinelearningframeworks
_version_	1719280218238091264

Data-Driven Combinatorial Optimization and Efficient Machine Learning Frameworks

Similar Items