Data-Driven Combinatorial Optimization and Efficient Machine Learning Frameworks

Contemporary research in building optimization models and designing algorithms has become more data-centric and application-specific. While addressing three problems in the fields of combinatorial optimization and machine learning (ML), this work highlights the value of making data an important driv...

Full description

Bibliographic Details
Main Author: Sakr, Nourhan
Language:English
Published: 2019
Subjects:
Online Access:https://doi.org/10.7916/d8-z27f-xv84
id ndltd-columbia.edu-oai-academiccommons.columbia.edu-10.7916-d8-z27f-xv84
record_format oai_dc
spelling ndltd-columbia.edu-oai-academiccommons.columbia.edu-10.7916-d8-z27f-xv842019-10-29T03:28:49ZData-Driven Combinatorial Optimization and Efficient Machine Learning FrameworksSakr, Nourhan2019ThesesOperations researchComputer scienceMachine learningCombinatorial optimizationAlgorithms--DesignContemporary research in building optimization models and designing algorithms has become more data-centric and application-specific. While addressing three problems in the fields of combinatorial optimization and machine learning (ML), this work highlights the value of making data an important driver for building frameworks and designing solutions. In Chapter 2, we look into building frameworks for data-intensive applications, such as ML algorithms. We introduce a family of matrices, Structured Spinners, and use these to perform input data projections. This operation is commonly needed for a plethora of ML algorithms such as dimension reduction, cross-polytope LSH techniques or kernel approximation, yet comprises a major bottleneck in terms of time and space complexity. We design a generic framework that speeds up ML algorithms which perform such projections, with no or minor loss in accuracy. Our method applies for projections in both randomized and learned settings. We confirm our results via theoretical guarantees and numerical experiments. Moreover, we are the first to provide theoretical results for that type of work under adaptive settings. In Chapter 3, we rely on empirical analysis to design algorithms for an online packet scheduling problem with weighted packets and agreeable deadlines. We adopt the model presented in Jez et al. and use their Modified Greedy MG algorithm, which was the best-performing one in the literature, as our benchmark. The technique used for proving the competitive ratio for this benchmark is worst case analysis. Via extensive simulations, we shed light on practical bottlenecks and observe that these are not captured by the competitive analysis. We design three new algorithms, two of which make deterministic online decisions, while the third learns an adaptive one. When contrasted against the MG benchmark, our algorithms have significantly worse competitive ratios, yet noticeably better empirical performance. Our methodology is particularly useful for online algorithms and underlines the significance of leveraging data or simulations to guide algorithm design and testing. In Chapter 4, we study two different applications in cybersecurity: an adaptive ML problem and a game-theoretic model called PLADD. The common objective between both problems is to protect cybersystems against attacks by intelligent, adaptable and well-resourced adversaries while maintaining a cost budget. We introduce a novel combinatorial scheduling formulation to design useful defense strategies that meet this goal. Our work separates the formulation from the data-driven analysis and solution. As such, the scheduling formulation, which does not resemble any previously studied formulations from the scheduling literature, may be used as a new model by other researchers for different motivations. We keep the model generic enough for others to use, but design the algorithms that work best for our context. The formulation is inspired by stochastic programming and cast as a mixed integer program (MIP). We provide theoretical analysis, e.g. explore integrality gaps, exploit the combinatorial structure, prove NP-hardness, develop dynamic programming solutions for two-machine cases, then work towards data-driven heuristics and approximation algorithms using distribution assumptions and real data.Englishhttps://doi.org/10.7916/d8-z27f-xv84
collection NDLTD
language English
sources NDLTD
topic Operations research
Computer science
Machine learning
Combinatorial optimization
Algorithms--Design
spellingShingle Operations research
Computer science
Machine learning
Combinatorial optimization
Algorithms--Design
Sakr, Nourhan
Data-Driven Combinatorial Optimization and Efficient Machine Learning Frameworks
description Contemporary research in building optimization models and designing algorithms has become more data-centric and application-specific. While addressing three problems in the fields of combinatorial optimization and machine learning (ML), this work highlights the value of making data an important driver for building frameworks and designing solutions. In Chapter 2, we look into building frameworks for data-intensive applications, such as ML algorithms. We introduce a family of matrices, Structured Spinners, and use these to perform input data projections. This operation is commonly needed for a plethora of ML algorithms such as dimension reduction, cross-polytope LSH techniques or kernel approximation, yet comprises a major bottleneck in terms of time and space complexity. We design a generic framework that speeds up ML algorithms which perform such projections, with no or minor loss in accuracy. Our method applies for projections in both randomized and learned settings. We confirm our results via theoretical guarantees and numerical experiments. Moreover, we are the first to provide theoretical results for that type of work under adaptive settings. In Chapter 3, we rely on empirical analysis to design algorithms for an online packet scheduling problem with weighted packets and agreeable deadlines. We adopt the model presented in Jez et al. and use their Modified Greedy MG algorithm, which was the best-performing one in the literature, as our benchmark. The technique used for proving the competitive ratio for this benchmark is worst case analysis. Via extensive simulations, we shed light on practical bottlenecks and observe that these are not captured by the competitive analysis. We design three new algorithms, two of which make deterministic online decisions, while the third learns an adaptive one. When contrasted against the MG benchmark, our algorithms have significantly worse competitive ratios, yet noticeably better empirical performance. Our methodology is particularly useful for online algorithms and underlines the significance of leveraging data or simulations to guide algorithm design and testing. In Chapter 4, we study two different applications in cybersecurity: an adaptive ML problem and a game-theoretic model called PLADD. The common objective between both problems is to protect cybersystems against attacks by intelligent, adaptable and well-resourced adversaries while maintaining a cost budget. We introduce a novel combinatorial scheduling formulation to design useful defense strategies that meet this goal. Our work separates the formulation from the data-driven analysis and solution. As such, the scheduling formulation, which does not resemble any previously studied formulations from the scheduling literature, may be used as a new model by other researchers for different motivations. We keep the model generic enough for others to use, but design the algorithms that work best for our context. The formulation is inspired by stochastic programming and cast as a mixed integer program (MIP). We provide theoretical analysis, e.g. explore integrality gaps, exploit the combinatorial structure, prove NP-hardness, develop dynamic programming solutions for two-machine cases, then work towards data-driven heuristics and approximation algorithms using distribution assumptions and real data.
author Sakr, Nourhan
author_facet Sakr, Nourhan
author_sort Sakr, Nourhan
title Data-Driven Combinatorial Optimization and Efficient Machine Learning Frameworks
title_short Data-Driven Combinatorial Optimization and Efficient Machine Learning Frameworks
title_full Data-Driven Combinatorial Optimization and Efficient Machine Learning Frameworks
title_fullStr Data-Driven Combinatorial Optimization and Efficient Machine Learning Frameworks
title_full_unstemmed Data-Driven Combinatorial Optimization and Efficient Machine Learning Frameworks
title_sort data-driven combinatorial optimization and efficient machine learning frameworks
publishDate 2019
url https://doi.org/10.7916/d8-z27f-xv84
work_keys_str_mv AT sakrnourhan datadrivencombinatorialoptimizationandefficientmachinelearningframeworks
_version_ 1719280218238091264