Learning Hierarchical Interactions at Scale: A Convex Optimization Approach

In many learning settings, it is beneficial to augment the main features with pairwise interactions. Such interaction models can often be enhanced by performing variable selection under the so-called strong hierarchy constraint: an interaction is non-zero only if its associated main features are non-z...
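The strong hierarchy constraint described in the abstract can be illustrated with a small sketch. The code below is a hypothetical toy check, not the authors' hierScale implementation: it verifies that an interaction coefficient between features i and j is non-zero only when both main-effect coefficients are non-zero.

```python
# Hypothetical sketch (not the authors' hierScale toolkit): verify the
# strong hierarchy constraint on a toy set of fitted coefficients.
# Rule: theta[(i, j)] != 0  =>  beta[i] != 0 and beta[j] != 0.

def satisfies_strong_hierarchy(beta, theta, tol=1e-8):
    """Return True if every non-zero interaction has both main effects non-zero.

    beta  : list of main-effect coefficients, indexed by feature.
    theta : dict mapping feature pairs (i, j) to interaction coefficients.
    """
    for (i, j), t in theta.items():
        if abs(t) > tol and (abs(beta[i]) <= tol or abs(beta[j]) <= tol):
            return False  # active interaction with an inactive main effect
    return True

beta = [1.2, 0.0, -0.7]     # main effects: features 0 and 2 active, 1 inactive
theta_ok = {(0, 2): 0.3}    # interaction between two active mains: allowed
theta_bad = {(0, 1): 0.5}   # main effect 1 is zero: violates strong hierarchy

print(satisfies_strong_hierarchy(beta, theta_ok))   # True
print(satisfies_strong_hierarchy(beta, theta_bad))  # False
```

In the paper's setting this constraint is enforced through a convex relaxation rather than checked post hoc, but the sketch shows what the constraint requires of a solution.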

Full description

Bibliographic Details
Main Authors: Hazimeh, Hussein (Author), Mazumder, Rahul (Author)
Other Authors: Sloan School of Management (Contributor), Massachusetts Institute of Technology. Operations Research Center (Contributor)
Format: Article
Language: English
Published: International Machine Learning Society, 2021-04-06T13:49:15Z.
Subjects:
Online Access: Get fulltext
LEADER 02016 am a22002053u 4500
001 130384
042 |a dc 
100 1 0 |a Hazimeh, Hussein  |e author 
100 1 0 |a Sloan School of Management  |e contributor 
100 1 0 |a Massachusetts Institute of Technology. Operations Research Center  |e contributor 
700 1 0 |a Mazumder, Rahul  |e author 
245 0 0 |a Learning Hierarchical Interactions at Scale: A Convex Optimization Approach 
260 |b International Machine Learning Society,   |c 2021-04-06T13:49:15Z. 
856 |z Get fulltext  |u https://hdl.handle.net/1721.1/130384 
520 |a In many learning settings, it is beneficial to augment the main features with pairwise interactions. Such interaction models can often be enhanced by performing variable selection under the so-called strong hierarchy constraint: an interaction is non-zero only if its associated main features are non-zero. Existing convex optimization-based algorithms face difficulties in handling problems where the number of main features p ~ 10^3 (with total number of features ~ p^2). In this paper, we study a convex relaxation which enforces strong hierarchy and develop a highly scalable algorithm based on proximal gradient descent. We introduce novel screening rules that allow for solving the complicated proximal problem in parallel. In addition, we introduce a specialized active-set strategy with gradient screening for avoiding costly gradient computations. The framework can handle problems having dense design matrices, with p = 50,000 (~10^9 interactions), instances that are much larger than the state of the art. Experiments on real and synthetic data suggest that our toolkit hierScale outperforms the state of the art in terms of prediction and variable selection and can achieve over a 4900x speed-up. 
520 |a United States. Office of Naval Research (Grants ONR-N000141512342, ONR-N000141812298) 
520 |a National Science Foundation (U.S.) (Grant NSF-IIS1718258) 
546 |a en 
655 7 |a Article 
773 |t Proceedings of Machine Learning Research