Rank One Convexification

import%20marimo%0A%0A__generated_with%20%3D%20%220.15.2%22%0Aapp%20%3D%20marimo.App(width%3D%22medium%22)%0A%0A%0A%40app.cell(hide_code%3DTrue)%0Adef%20_()%3A%0A%20%20%20%20import%20marimo%20as%20mo%0A%20%20%20%20import%20numpy%20as%20np%0A%20%20%20%20from%20mosek.fusion%20import%20Model%2C%20Domain%2C%20Expr%2C%20ObjectiveSense%2C%20Matrix%2C%20Var%0A%20%20%20%20import%20mosek.fusion.pythonic%0A%20%20%20%20import%20sys%0A%20%20%20%20import%20requests%2C%20zipfile%2C%20io%2C%20os%2C%20glob%0A%20%20%20%20from%20sklearn.preprocessing%20import%20StandardScaler%0A%20%20%20%20import%20pandas%20as%20pd%0A%20%20%20%20import%20time%0A%20%20%20%20return%20(%0A%20%20%20%20%20%20%20%20Domain%2C%0A%20%20%20%20%20%20%20%20Expr%2C%0A%20%20%20%20%20%20%20%20Model%2C%0A%20%20%20%20%20%20%20%20ObjectiveSense%2C%0A%20%20%20%20%20%20%20%20StandardScaler%2C%0A%20%20%20%20%20%20%20%20glob%2C%0A%20%20%20%20%20%20%20%20io%2C%0A%20%20%20%20%20%20%20%20mo%2C%0A%20%20%20%20%20%20%20%20np%2C%0A%20%20%20%20%20%20%20%20os%2C%0A%20%20%20%20%20%20%20%20pd%2C%0A%20%20%20%20%20%20%20%20requests%2C%0A%20%20%20%20%20%20%20%20time%2C%0A%20%20%20%20%20%20%20%20zipfile%2C%0A%20%20%20%20)%0A%0A%0A%40app.cell(hide_code%3DTrue)%0Adef%20_(mo)%3A%0A%20%20%20%20mo.md(%0A%20%20%20%20%20%20%20%20r%22%22%22%0A%20%20%20%20%23%20Rank-One%20Sparse%20Regression%20Model%0A%0A%20%20%20%20This%20notebook%20is%20based%20on%20%5BRank-one%20Convexification%20for%20Sparse%20Regression%5D(https%3A%2F%2Farxiv.org%2Fabs%2F1901.10334)%20by%20Atamt%C3%BCrk%20and%20G%C3%B3mez.%20%0A%0A%20%20%20%20In%20the%20paper%20a%20model%20is%20designed%20to%20solve%20regression%20problems%20when%20the%20number%20of%20predictors%20is%20large%20and%20we%20expect%20only%20a%20small%20subset%20of%20them%20to%20be%20truly%20relevant.%20This%20model%20combines%20**quadratic%20loss%20minimization**%20with%20a%20**sparsity-inducing%20penalty**%20and%20a%20**conic%20reformulation**%20that%20makes%20the%20optimization%20problem%20tractable.%20Some%20properties%20this%20model%20offers%20to%20users%3A%0A%0A%20%20%20%20-%20**Feature%20selection**%3A%20It%20automatically%20selects%20only%20the%20most%20relevant%20predictors.%20%20%0A%20%20%20%20-%20**Interpretability**%3A%20Produces%20simpler%2C%20more%20interpretable%20models.%20%20%0A%20%20%20%20-%20**Regularization**%3A%20Prevents%20overfitting%20by%20discouraging%20unnecessarily%20large%20coefficients.%20%20%0A%20%20%20%20-%20**High-dimensional%20settings**%3A%20Particularly%20useful%20when%20the%20number%20of%20predictors%20%5C(p%5C)%20is%20large%20compared%20to%20the%20number%20of%20samples%20%5C(n%5C).%20%20%0A%0A%20%20%20%20In%20short%2C%20this%20model%20balances%20**accuracy**%20and%20**sparsity**%2C%20giving%20us%20predictive%20models%20that%20are%20both%20powerful%20and%20easy%20to%20interpret.%0A%20%20%20%20%22%22%22%0A%20%20%20%20)%0A%20%20%20%20return%0A%0A%0A%40app.cell(hide_code%3DTrue)%0Adef%20_(mo)%3A%0A%20%20%20%20mo.md(%0A%20%20%20%20%20%20%20%20r%22%22%22%0A%20%20%20%20We%20consider%20the%20following%20optimization%20problem%3A%0A%0A%20%20%20%20%5C%5B%0A%20%20%20%20%5Cmin_%7B%5Cbeta%20%5Cin%20%5Cmathbb%7BR%7D%5Ep%7D%20%5C%3B%5C%3B%20%0A%20%20%20%20%5C%7Cy%20-%20X%5Cbeta%5C%7C_2%5E2%20%0A%20%20%20%20%5C%3B%2B%5C%3B%20%5Clambda%20%5C%7C%5Cbeta%5C%7C_2%5E2%20%0A%20%20%20%20%5C%3B%2B%5C%3B%20%5Cmu%20%5C%7C%5Cbeta%5C%7C_1%20%0A%20%20%20%20%5C%5D%0A%0A%20%20%20%20%5C%5B%0A%20%20%20%20%5Cquad%20%5Ctext%7Bs.t.%7D%20%5Cquad%20%5C%7C%5Cbeta%5C%7C_0%20%5Cleq%20k%0A%20%20%20%20%5C%5D%0A%0A%20%20%20%20where%3A%20%0A%0A%20%20%20%20-%20%5C(y%20%5Cin%20%5Cmathbb%7BR%7D%5En%5C)%20%3A%20response%20vector%20(observed%20outcomes).%20%20%0A%20%20%20%20-%20%5C(X%20%5Cin%20%5Cmathbb%7BR%7D%5E%7Bn%20%5Ctimes%20p%7D%5C)%20%3A%20design%20(feature)%20matrix%20with%20%5C(n%5C)%20samples%20and%20%5C(p%5C)%20predictors.%20%20%20%0A%20%20%20%20-%20%5C(%5Clambda%20%5Cgeq%200%5C)%20%3A%20regularization%20parameter%20controlling%20the%20ridge%20penalty.%20%20%0A%20%20%20%20-%20%5C(%5Cmu%20%5Cgeq%200%5C)%20%3A%20regularization%20parameter%20controlling%20the%20lasso%20penalty.%20%20%0A%20%20%20%20-%20%5C(k%5C)%20%3A%20sparsity%20level%2C%20i.e.%2C%20maximum%20number%20of%20predictors%20allowed%20to%20be%20nonzero.%0A%20%20%20%20%22%22%22%0A%20%20%20%20)%0A%20%20%20%20return%0A%0A%0A%40app.cell(hide_code%3DTrue)%0Adef%20_(mo)%3A%0A%20%20%20%20mo.md(%0A%20%20%20%20%20%20%20%20r%22%22%22%0A%20%20%20%20This%20optimization%20problem%20extends%20least%20squares%20regression%20by%20adding%20two%20regularization%20terms%20and%20a%20sparsity%20constraint%3A%0A%0A%20%20%20%20-%20Squared%20error%20term%20%24%5C%7Cy%20-%20X%5Cbeta%5C%7C_2%5E2%24%3A%20Penalizes%20discrepancies%20between%20predictions%20%24X%5Cbeta%24%20and%20the%20observed%20responses%20%24y%24.%0A%0A%20%20%20%20-%20Ridge%20penalty%20%24%5Clambda%20%5C%7C%5Cbeta%5C%7C_2%5E2%24%3A%20Shrinks%20coefficient%20values%20to%20improve%20numerical%20stability%20and%20reduce%20overfitting%20(especially%20when%20predictors%20are%20correlated%20or%20%24p%24%20is%20large).%0A%0A%20%20%20%20-%20Lasso%20penalty%20%24%5Cmu%20%5C%7C%5Cbeta%5C%7C_1%24%3A%20Encourages%20sparsity%20by%20shrinking%20some%20coefficients%20to%20zero.%0A%0A%20%20%20%20-%20Cardinality%20constraint%20%24%5C%7C%5Cbeta%5C%7C_0%20%5Cleq%20k%24%3A%20Limits%20the%20number%20of%20nonzero%20entries%20in%20%24%5Cbeta%24%20to%20at%20most%20%24k%24%2C%20directly%20enforcing%20sparsity%20and%20interpretability.%0A%0A%0A%20%20%20%20In%20the%20model%20the%20cardinality%20constraint%20is%20usually%20problematic%20in%20application.%20This%20is%20a%20hard%20sparsity%20constraint%20and%20problems%20involving%20this%20count%20constraint%20are%20usually%20*NP-Hard*.%20Therefore%20instead%20of%20solving%20this%20model%2C%20the%20following%20transformation%20is%20applied.%20%0A%0A%20%20%20%20In%20the%20model%20the%20cardinality%20constraint%20is%20usually%20problematic%20in%20practice.%20This%20is%20a%20hard%20sparsity%20constraint%2C%20which%20forces%20the%20solution%20to%20include%20at%20most%20%F0%9D%91%98%20nonzero%20coefficients.%20However%2C%20optimization%20problems%20with%20such%20a%20count%20constraint%20are%20generally%20NP-hard%2C%20since%20solving%20them%20requires%20searching%20through%20all%20possible%20subsets%20of%20predictors.%20Therefore%2C%20instead%20of%20solving%20this%20formulation%20directly%2C%20the%20model%20is%20often%20transformed%20or%20relaxed%20into%20an%20alternative%20form%20that%20can%20be%20solved%20efficiently.%0A%20%20%20%20%22%22%22%0A%20%20%20%20)%0A%20%20%20%20return%0A%0A%0A%40app.cell(hide_code%3DTrue)%0Adef%20_(mo)%3A%0A%20%20%20%20mo.md(r%22%22%22%23%23%20Alternative%20Reformulation%22%22%22)%0A%20%20%20%20return%0A%0A%0A%40app.cell(hide_code%3DTrue)%0Adef%20_(mo)%3A%0A%20%20%20%20mo.md(%0A%20%20%20%20%20%20%20%20r%22%22%22%0A%20%20%20%20%24%24%0A%20%20%20%20%5Cmin_%7B%5Cbeta%20%5Cin%20%5Cmathbb%7BR%7D%5Ep%7D%5C%3B%0A%20%20%20%20%5C%7Cy%20-%20X%5Cbeta%5C%7C_2%5E2%20%5C%3B%2B%5C%3B%20%5Clambda%20%5C%7C%5Cbeta%5C%7C_2%5E2%20%5C%3B%2B%5C%3B%20%5Cmu%20%5C%7C%5Cbeta%5C%7C_1%0A%20%20%20%20%5Cquad%20%5Ctext%7Bs.t.%7D%5Cquad%20%5C%7C%5Cbeta%5C%7C_0%20%5Cle%20k%20.%0A%20%20%20%20%24%24%0A%20%20%20%20%22%22%22%0A%20%20%20%20)%0A%20%20%20%20return%0A%0A%0A%40app.cell(hide_code%3DTrue)%0Adef%20_(mo)%3A%0A%20%20%20%20mo.md(%0A%20%20%20%20%20%20%20%20r%22%22%22%0A%20%20%20%20As%20the%20first%20step%2C%20expand%20the%20first%20two%20terms%20in%20the%20objective%20function.%20Through%20expansion%20one%20can%20see%20that%20we%20easily%20reach%20the%20general%20quadratic%20form.%20%0A%0A%20%20%20%20%24%24%0A%20%20%20%20%5C%7Cy%20-%20X%5Cbeta%5C%7C_2%5E2%0A%20%20%20%20%3D%20y%5E%5Ctop%20y%20-%202%5C%2Cy%5E%5Ctop%20X%5Cbeta%20%2B%20%5Cbeta%5E%5Ctop%20X%5E%5Ctop%20X%20%5Cbeta%20.%0A%20%20%20%20%24%24%0A%0A%20%20%20%20Then%20the%20partial%20objective%20becomes%3A%20%0A%0A%20%20%20%20%24%24%0A%20%20%20%20%5C%7Cy%20-%20X%5Cbeta%5C%7C_2%5E2%20%2B%20%5Clambda%5C%7C%5Cbeta%5C%7C_2%5E2%0A%20%20%20%20%3D%20y%5E%5Ctop%20y%20-%202%5C%2Cy%5E%5Ctop%20X%5Cbeta%20%2B%20%5Cbeta%5E%5Ctop%20(X%5E%5Ctop%20X%20%2B%20%5Clambda%20I)%5Cbeta%20.%0A%20%20%20%20%24%24%0A%20%20%20%20%22%22%22%0A%20%20%20%20)%0A%20%20%20%20return%0A%0A%0A%40app.cell(hide_code%3DTrue)%0Adef%20_(mo)%3A%0A%20%20%20%20mo.md(%0A%20%20%20%20%20%20%20%20r%22%22%22%0A%20%20%20%20The%20%24%5Cell_1%24-norm%20%24%5C%7C%5Cbeta%5C%7C_1%20%3D%20%5Csum_%7Bi%3D1%7D%5Ep%20%7C%5Cbeta_i%7C%24%20is%20replaced%20by%20auxiliary%20variables%20%24u_i%20%5Cge%200%24%20with%20constraints%3A%20%0A%0A%20%20%20%20%5C%5B%0A%20%20%20%20-u_i%20%5Cle%20%5Cbeta_i%20%5Cle%20u_i%20.%0A%20%20%20%20%5C%5D%0A%0A%20%20%20%20These%20linear%20inequalities%20ensure%20%24u_i%20%5Cge%20%7C%5Cbeta_i%7C%24.%20Since%20the%20objective%20%0A%20%20%20%20minimizes%20%24%5Csum_%7Bi%3D1%7D%5Ep%20u_i%24%2C%20at%20optimality%20we%20have%20%24u_i%20%3D%20%7C%5Cbeta_i%7C%24.%20%0A%20%20%20%20Thus%2C%20%0A%0A%20%20%20%20%5C%5B%0A%20%20%20%20%5C%7C%5Cbeta%5C%7C_1%20%3D%20%5Csum_%7Bi%3D1%7D%5Ep%20u_i%0A%20%20%20%20%5C%5D%0A%0A%20%20%20%20is%20exactly%20represented%20in%20linear%20form.%0A%20%20%20%20%22%22%22%0A%20%20%20%20)%0A%20%20%20%20return%0A%0A%0A%40app.cell(hide_code%3DTrue)%0Adef%20_(mo)%3A%0A%20%20%20%20mo.md(%0A%20%20%20%20%20%20%20%20r%22%22%22%0A%20%20%20%20To%20model%20the%20cardinality%20constraint%20%24%5C%7C%5Cbeta%5C%7C_0%20%5Cle%20k%24%2C%20we%20introduce%20binary%20%0A%20%20%20%20indicator%20variables%20%24z_i%20%5Cin%20%5C%7B0%2C1%5C%7D%24%20for%20each%20coefficient.%20Formally%2C%20%0A%20%20%20%20%24z_i%20%3D%201%24%20if%20%24%5Cbeta_i%24%20is%20allowed%20to%20be%20nonzero%2C%20and%20%24z_i%20%3D%200%24%20if%20%0A%20%20%20%20%24%5Cbeta_i%24%20must%20be%20zero.%20Thus%2C%20every%20nonzero%20%24%5Cbeta_i%24%20is%20*counted*%20%0A%20%20%20%20through%20its%20corresponding%20indicator%20%24z_i%24.%0A%0A%20%20%20%20The%20constraint%20below%20ensures%20that%20at%20most%20%24k%24%20coefficients%20can%20be%20nonzero.%20%0A%0A%20%20%20%20%5C%5B%0A%20%20%20%20%5Csum_%7Bi%3D1%7D%5Ep%20z_i%20%5Cle%20k%0A%20%20%20%20%5C%5D%0A%0A%0A%20%20%20%20To%20enforce%20the%20connection%20between%20%24%5Cbeta_i%24%20and%20its%20indicator%20%24z_i%24%2C%20one%20can%20impose%3A%20%0A%0A%20%20%20%20%5C%5B%5Cbeta_i%20(1%20-%20z_i)%20%3D%200%20%5Cquad%20%5Ctext%7Bfor%20%7D%20i%3D1%2C%5Cdots%2Cp%5C%5D%0A%0A%20%20%20%20This%20guarantees%20the%20*either%E2%80%93or*%20relationship%3A%20if%20%24z_i%3D0%24%2C%20then%20%24%5Cbeta_i%20%3D%200%24%2C%20while%20if%20%24z_i%3D1%24%2C%20then%20%24%5Cbeta_i%24%20is%20free%20to%20take%20a%20%0A%20%20%20%20nonzero%20value.%20In%20this%20way%2C%20the%20challenge%20of%20the%20count%20operation%20can%20be%20worked%20around.%0A%20%20%20%20%22%22%22%0A%20%20%20%20)%0A%20%20%20%20return%0A%0A%0A%40app.cell(hide_code%3DTrue)%0Adef%20_(mo)%3A%0A%20%20%20%20mo.md(r%22%22%22%20%22%22%22)%0A%20%20%20%20return%0A%0A%0A%40app.cell(hide_code%3DTrue)%0Adef%20_(mo)%3A%0A%20%20%20%20mo.md(%0A%20%20%20%20%20%20%20%20r%22%22%22%0A%20%20%20%20At%20this%20point%2C%20to%20model%20the%20constraint%20%20%0A%0A%20%20%20%20%24%24%0A%20%20%20%20%5Cbeta_i%20(1%20-%20z_i)%20%3D%200%2C%20%5Cquad%20i%3D1%2C%5Cdots%2Cp%2C%0A%20%20%20%20%24%24%20%20%0A%0A%20%20%20%20the%20paper%20suggests%20introducing%20a%20*big-%24M%24*%20formulation.%20In%20that%20approach%2C%20large%20constants%20are%20used%20to%20relax%20the%20constraint%20and%20enforce%20the%20link%20between%20%24%5Cbeta_i%24%20and%20%24z_i%24.%20%20%0A%0A%20%20%20%20After%20the%20big-%24M%24%20value%20is%20fixed%2C%20the%20calculation%20becomes%20relatively%20easy%20to%20implement.%20However%2C%20**determining%20a%20suitable%20big-%24M%24%20value%20is%20itself%20a%20challenge**.%20If%20%24M%24%20is%20chosen%20too%20small%2C%20the%20model%20may%20cut%20off%20feasible%20solutions.%20If%20%24M%24%20is%20chosen%20too%20large%2C%20the%20relaxation%20becomes%20weak%20and%20the%20solver%20may%20perform%20poorly.%20Therefore%2C%20based%20on%20the%20specific%20problem%20and%20the%20data%20at%20hand%2C%20an%20**appropriate%20big-%24M%24%20value%20must%20be%20determined%20each%20time**.%20%20%0A%0A%20%20%20%20There%20is%20no%20universal%20choice%3A%20%20%0A%20%20%20%20-%20The%20bound%20should%20be%20large%20enough%20so%20that%20all%20feasible%20%24%5Cbeta_i%24%20can%20be%20represented.%20%20%0A%20%20%20%20-%20But%20it%20should%20not%20be%20excessively%20large%2C%20since%20that%20would%20weaken%20the%20relaxation%20and%20slow%20down%20the%20solver.%20%20%0A%0A%0A%20%20%20%20By%20contrast%2C%20**MOSEK**%20has%20native%20support%20for%20*disjunctive%20constraints*.%20This%20means%20we%20do%20not%20need%20to%20approximate%20with%20big-%24M%24.%20Instead%2C%20we%20can%20directly%20model%20the%20logical%20condition%3A%20%20%0A%0A%20%20%20%20-%20Either%20%24z_i%20%3D%201%24%20(the%20variable%20%24%5Cbeta_i%24%20is%20allowed%20to%20be%20nonzero)%2C%20%20%0A%20%20%20%20-%20Or%20%24%5Cbeta_i%20%3D%200%24%20(the%20variable%20is%20forced%20to%20zero%20when%20%24z_i%20%3D%200%24).%20%20%0A%0A%20%20%20%20This%20way%2C%20the%20constraint%20is%20expressed%20exactly%20as%20an%20**either%E2%80%93or**%20condition%2C%20and%20MOSEK%20will%20internally%20handle%20the%20disjunction%3A%0A%0A%20%20%20%20%5C%5B%0A%20%20%20%20%5Cbeta_i%20%3D%200%20%5C%3B%5C%3B%5C%3B%5C%3B%20%5Clor%20%5C%3B%5C%3B%5C%3B%5C%3B%20z_i%20%3D%201%2C%20%0A%20%20%20%20%5Cquad%20i%20%3D%201%2C%5Cdots%2Cp%20.%0A%20%20%20%20%5C%5D%0A%20%20%20%20%22%22%22%0A%20%20%20%20)%0A%20%20%20%20return%0A%0A%0A%40app.cell(hide_code%3DTrue)%0Adef%20_(mo)%3A%0A%20%20%20%20mo.md(%0A%20%20%20%20%20%20%20%20r%22%22%22%0A%20%20%20%20Then%20the%20final%20model%20becomes%3A%0A%0A%20%20%20%20%24%24%0A%20%20%20%20%5Cbegin%7Baligned%7D%0A%20%20%20%20%5Cmin_%7B%5Cbeta%2Cu%2Cz%7D%5Cquad%0A%20%20%20%20%26%20-%5C%2C2%5C%2Cy%5E%5Ctop%20X%5Cbeta%20%5C%3B%2B%5C%3B%20%5Cbeta%5E%5Ctop%20(X%5E%5Ctop%20X%20%2B%20%5Clambda%20I)%5Cbeta%20%5C%3B%2B%5C%3B%20%5Cmu%20%5Csum_%7Bi%3D1%7D%5Ep%20u_i%20%5C%5C%0A%20%20%20%20%5Ctext%7Bs.t.%7D%5Cquad%0A%20%20%20%20%26%20-u_i%20%5Cle%20%5Cbeta_i%20%5Cle%20u_i%2C%5C%3B%5C%3B%20u_i%20%5Cge%200%2C%20%5Cquad%20i%3D1%2C%5Cdots%2Cp%2C%20%5C%5C%0A%20%20%20%20%26%20%5Cbeta_i%20%3D%200%20%5C%3B%5C%3B%5C%3B%5C%3B%20%5Clor%20%5C%3B%5C%3B%5C%3B%5C%3B%20z_i%20%3D%201%2C%20%0A%20%20%20%20%5Cquad%20i%20%3D%201%2C%5Cdots%2Cp%20%5C%5C%0A%20%20%20%20%26%20%5Csum_%7Bi%3D1%7D%5Ep%20z_i%20%5Cle%20k%2C%20%5Cquad%20z_i%20%5Cin%20%5C%7B0%2C1%5C%7D%2C%20%5C%3B%5C%3B%20i%3D1%2C%5Cdots%2Cp%2C%20%5C%5C%0A%20%20%20%20%26%20%5Cbeta%20%5Cin%20%5Cmathbb%7BR%7D%5Ep%2C%5C%3B%5C%3B%20z%20%5Cin%20%5C%7B0%2C1%5C%7D%5Ep%2C%5C%3B%5C%3B%20u%20%5Cin%20%5Cmathbb%7BR%7D_%2B%5Ep.%0A%20%20%20%20%5Cend%7Baligned%7D%0A%20%20%20%20%24%24%0A%20%20%20%20%22%22%22%0A%20%20%20%20)%0A%20%20%20%20return%0A%0A%0A%40app.cell(hide_code%3DTrue)%0Adef%20_(mo)%3A%0A%20%20%20%20mo.md(r%22%22%22%23%23%20Model%20Implementation%22%22%22)%0A%20%20%20%20return%0A%0A%0A%40app.cell(hide_code%3DTrue)%0Adef%20_(mo)%3A%0A%20%20%20%20mo.md(%0A%20%20%20%20%20%20%20%20r%22%22%22%0A%20%20%20%20The%20objective%20we%20are%20minimizing%20has%20three%20types%20of%20terms%3A%0A%0A%20%20%20%20%5C%5B%0A%20%20%20%20-2y%5E%5Ctop%20X%20%5Cbeta%20%5C%3B%2B%5C%3B%20%5Cbeta%5E%5Ctop%20(X%5E%5Ctop%20X%20%2B%20%5Clambda%20I)%5Cbeta%20%5C%3B%2B%5C%3B%20%5Cmu%20%5Csum_i%20u_i%20%5C%3B%2B%5C%3B%20%5Ctext%7Bconst.%7D%0A%20%20%20%20%5C%5D%0A%0A%20%20%20%20---%0A%0A%20%20%20%20%23%23%201.%20Linear%20terms%3A%0A%0A%20%20%20%20-%20The%20scalar%20term%20%5C(-2y%5E%5Ctop%20X%5Cbeta%5C)%20is%20a%20simple%20inner%20product%2C%20i.e.%20linear%20in%20the%20decision%20variables%20%5C(%5Cbeta%5C).%0A%20%20%20%20-%20The%20penalty%20%5C(%5Cmu%20%5Csum_i%20u_i%5C)%20is%20also%20linear%20in%20the%20auxiliary%20variables%20%5C(u_i%5C).%20%20%0A%20%20%20%20These%20can%20be%20added%20directly%20to%20the%20optimization%20model%20without%20special%20reformulation.%0A%0A%20%20%20%20---%0A%0A%20%20%20%20%23%23%202.%20Quadratic%20Term%0A%0A%20%20%20%20The%20expression%0A%0A%20%20%20%20%5C%5B%0A%20%20%20%20%5Cbeta%5E%5Ctop%20(X%5E%5Ctop%20X%20%2B%20%5Clambda%20I)%5Cbeta%0A%20%20%20%20%5C%5D%0A%0A%20%20%20%20is%20**quadratic**.%20%0A%20%20%20%20There%20are%20**two%20approaches**%20we%20can%20apply%20to%20handle%20it%20as%20a%20rotated%20second-order%20cone%20(RSOC)%20constraint.%0A%0A%0A%20%20%20%20---%0A%0A%20%20%20%20%23%23%23%20Option%201%3A%20Cholesky-based%20approach%0A%0A%20%20%20%20To%20encode%20it%20in%20a%20conic%20optimization%20model%2C%20we%20introduce%20an%20auxiliary%20variable%20%5C(t%5C)%20representing%20the%20quadratic%20form%3A%0A%0A%20%20%20%20%5C%5B%0A%20%20%20%20t%20%5C%3B%5Cge%5C%3B%20%5Cbeta%5E%5Ctop%20Q%20%5Cbeta%2C%20%5Cquad%20Q%20%3D%20X%5E%5Ctop%20X%20%2B%20%5Clambda%20I%0A%20%20%20%20%5C%5D%0A%0A%0A%0A%20%20%20%201.%20Since%20%5C(Q%5C)%20is%20positive%20semidefinite%20we%20can%20perform%20a%20**Cholesky%20decomposition**%3A%0A%0A%20%20%20%20%5C%5B%0A%20%20%20%20Q%20%3D%20L%20L%5E%5Ctop%0A%20%20%20%20%5C%5D%0A%0A%20%20%20%202.%20Express%20the%20quadratic%20form%20as%20a%20norm%3A%20%20%0A%0A%20%20%20%20%5C%5B%0A%20%20%20%20%5Cbeta%5E%5Ctop%20Q%20%5Cbeta%20%3D%20%5C%7C%20L%5E%5Ctop%20%5Cbeta%20%5C%7C_2%5E2%0A%20%20%20%20%5C%5D%0A%0A%20%20%20%203.%20Encode%20it%20as%20a%20**rotated%20SOC%20constraint**%3A%0A%0A%20%20%20%20%5C%5B%0A%20%20%20%20(1%2C%5C%3B%200.5%5C%2Ct%2C%5C%3B%20L%5E%5Ctop%20%5Cbeta)%20%5C%3B%5Cin%5C%3B%20%5Cmathcal%7BQ%7D_r%0A%20%20%20%20%5C%5D%0A%0A%20%20%20%20This%20enforces%20%5C(t%20%5Cge%20%5C%7C%20L%5E%5Ctop%20%5Cbeta%20%5C%7C_2%5E2%5C).%0A%0A%20%20%20%20---%0A%0A%20%20%20%20%23%23%23%20Option%202%3A%20Stacked-matrix%20approach%20(no%20Cholesky)%0A%0A%20%20%20%20Firstly%2C%20let's%20introduce%20an%20auxiliary%20variable%20%5C(t%5C)%20representing%20the%20quadratic%20form%20again%3A%0A%0A%20%20%20%20%5C%5B%0A%20%20%20%20t%20%5C%3B%5Cge%5C%3B%20%5Cbeta%5E%5Ctop%20(X%5E%5Ctop%20X%20%2B%20%5Clambda%20I)%5Cbeta%0A%20%20%20%20%5C%5D%0A%0A%20%20%20%201.%20Stack%20%5C(X%5C)%20and%20%5C(%5Csqrt%7B%5Clambda%7D%20I%5C)%20vertically%3A%0A%0A%20%20%20%20%5C%5B%0A%20%20%20%20%5Cbegin%7Bbmatrix%7D%20X%20%5C%5C%20%5Csqrt%7B%5Clambda%7D%20I%20%5Cend%7Bbmatrix%7D%0A%20%20%20%20%5C%5D%0A%0A%20%20%20%202.%20Express%20the%20quadratic%20form%20as%20a%20norm%3A%0A%0A%20%20%20%20%5C%5B%0A%20%20%20%20%5Cbeta%5E%5Ctop%20(X%5E%5Ctop%20X%20%2B%20%5Clambda%20I)%20%5Cbeta%20%3D%20%5Cleft%5C%7C%20%5Cbegin%7Bbmatrix%7D%20X%20%5C%5C%20%5Csqrt%7B%5Clambda%7D%20I%20%5Cend%7Bbmatrix%7D%20%5Cbeta%20%5Cright%5C%7C_2%5E2%0A%20%20%20%20%5C%5D%0A%0A%20%20%20%203.%20Introduce%20a%20scalar%20variable%20%5C(t%20%5Cge%20%5C%7C%20%5BX%3B%20%5Csqrt%7B%5Clambda%7D%20I%5D%20%5Cbeta%20%5C%7C_2%5C)%20and%20write%20as%20a%20**rotated%20SOC%20constraint**%3A%0A%0A%20%20%20%20%5C%5B%0A%20%20%20%20%5Cleft(1%2C%20%5Cfrac%7Bt%7D%7B2%7D%2C%20%5Cbegin%7Bbmatrix%7D%20X%20%5C%5C%20%5Csqrt%7B%5Clambda%7D%20I%20%5Cend%7Bbmatrix%7D%20%5Cbeta%20%5Cright)%20%5C%3B%5Cin%5C%3B%20%5Cmathcal%7BQ%7D_r%0A%20%20%20%20%5C%5D%0A%0A%20%20%20%20This%20approach%20avoids%20Cholesky%20factorization%20and%20works%20well%20when%20%5C(X%5C)%20is%20sparse.%0A%0A%20%20%20%20---%0A%0A%20%20%20%20**Note%20on%20selection%3A**%20%20%0A%0A%20%20%20%20-%20If%20%5C(X%5C)%20is%20**dense**%2C%20the%20Cholesky-based%20approach%20is%20usually%20faster%20and%20more%20efficient.%20%20%0A%20%20%20%20-%20If%20%5C(X%5C)%20is%20**sparse**%2C%20the%20stacked-matrix%20approach%20is%20preferred.%20%20%0A%0A%20%20%20%20The%20choice%20depends%20on%20the%20**structure%20of%20%5C(X%5C)**%20and%20the%20**solver%2Fdata%20requirements**.%0A%0A%0A%20%20%20%20%23%23%203.%20Putting%20it%20together%3A%0A%0A%20%20%20%20-%20The%20final%20objective%20is%20linear%20in%20%5C(%5Cbeta%2C%20u%2C%20t%5C).%0A%20%20%20%20-%20The%20only%20nonlinear%20part%20%5C(%5Cbeta%5E%5Ctop%20Q%20%5Cbeta%5C)%20has%20been%20reformulated%20into%20a%20**cone%20constraint**%2C%20so%20the%20solver%20can%20handle%20it%20efficiently.%0A%20%20%20%20%22%22%22%0A%20%20%20%20)%0A%20%20%20%20return%0A%0A%0A%40app.cell(hide_code%3DTrue)%0Adef%20_(mo)%3A%0A%20%20%20%20mo.md(r%22%22%22%23%20Implementing%20the%20model%20in%20Python%22%22%22)%0A%20%20%20%20return%0A%0A%0A%40app.cell(hide_code%3DTrue)%0Adef%20_(mo)%3A%0A%20%20%20%20mo.md(%0A%20%20%20%20%20%20%20%20r%22%22%22%0A%20%20%20%20The%20following%20function%20implements%20the%20model%20described%20earlier.%20%20%0A%0A%20%20%20%20The%20quadratic%20term%20can%20be%20handled%20using%20**either%20of%20the%20two%20approaches**%20depending%20on%20the%20user%E2%80%99s%20preference%20and%20the%20input%20data%3A%0A%0A%20%20%20%20-%20If%20the%20user%20sets%20%60cholesky%20%3D%20True%60%20(the%20**default**)%2C%20the%20function%20uses%20the%20**Cholesky-based%20transformation**%20(Option%201).%20%20%0A%20%20%20%20-%20If%20the%20user%20sets%20%60cholesky%20%3D%20False%60%2C%20the%20function%20uses%20the%20**stacked-matrix%20transformation**%20(Option%202).%20%20%0A%0A%20%20%20%20This%20design%20allows%20the%20model%20to%20adapt%20to%20the%20structure%20of%20the%20input%20data%20and%20the%20user%E2%80%99s%20requirements%20while%20keeping%20the%20implementation%20flexible.%0A%0A%20%20%20%20%22%22%22%0A%20%20%20%20)%0A%20%20%20%20return%0A%0A%0A%40app.cell%0Adef%20_(Domain%2C%20Expr%2C%20Model%2C%20ObjectiveSense%2C%20np%2C%20time)%3A%0A%20%20%20%20def%20RankOne(n%2C%20p%2C%20lmb%2C%20mu%2C%20y%20%2C%20X%2C%20k%2Ccholesky%20%3D%20True)%3A%0A%20%20%20%20%20%20%20%20M%20%3D%20Model(%22RankOne%22)%0A%0A%20%20%20%20%20%20%20%20%23%20Decision%20variables%0A%20%20%20%20%20%20%20%20B%20%3D%20M.variable(%22B%22%2C%20p%2C%20Domain.unbounded())%20%20%20%20%20%20%20%20%23%20regression%20coefficients%20%CE%B2%0A%20%20%20%20%20%20%20%20u%20%3D%20M.variable(%22u%22%2C%20p%2C%20Domain.greaterThan(0.0))%20%20%20%23%20auxiliary%20vars%20for%20%7C%CE%B2%7C%0A%20%20%20%20%20%20%20%20t%20%3D%20M.variable(%22t%22%2C%201)%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%23%20scalar%20var%20for%20quadratic%20form%0A%20%20%20%20%20%20%20%20z%20%3D%20M.variable(%22z%22%2C%20p%2C%20Domain.binary())%20%20%20%20%20%20%20%20%20%20%20%23%20binary%20indicators%20for%20support%0A%0A%20%20%20%20%20%20%20%20%23%20Objective%20part%3A%20y%E1%B5%80y%20(constant%2C%20included%20for%20completeness)%0A%20%20%20%20%20%20%20%20term1%20%3D%20y.T%20%40%20y%0A%0A%20%20%20%20%20%20%20%20%23%20Objective%20part%3A%20-2%20y%E1%B5%80X%CE%B2%0A%20%20%20%20%20%20%20%20term2%20%3D%20-2%20*%20Expr.mul(y.reshape(1%2C%20-1)%2C%20Expr.mul(X%2C%20B))%0A%0A%20%20%20%20%20%20%20%20if%20cholesky%3A%0A%20%20%20%20%20%20%20%20%20%20%20%20%23%20Quadratic%20term%20%CE%B2%E1%B5%80(X%E1%B5%80X%20%2B%20%CE%BBI)%CE%B2%20handled%20with%20rotated%20quadratic%20cone%0A%20%20%20%20%20%20%20%20%20%20%20%20Q%20%3D%20(X.T%20%40%20X%20%2B%20lmb%20*%20np.identity(p))%0A%20%20%20%20%20%20%20%20%20%20%20%20w%2C%20V%20%3D%20np.linalg.eigh(Q)%20%20%20%20%20%20%20%20%23%20eigen-decomposition%0A%20%20%20%20%20%20%20%20%20%20%20%20L%20%3D%20V%20%40%20np.diag(np.sqrt(w))%20%20%20%20%20%23%20factor%20such%20that%20Q%20%3D%20L%20L%E1%B5%80%0A%0A%20%20%20%20%20%20%20%20%20%20%20%20%23%20Cone%20constraint%3A%20(1%2C%200.5t%2C%20L%E1%B5%80B)%20%E2%88%88%20Rotated%20Quadratic%20Cone%0A%20%20%20%20%20%20%20%20%20%20%20%20M.constraint(%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20Expr.vstack(1.0%2C%200.5%20*%20t%2C%20Expr.mul(L.T%2C%20B))%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20Domain.inRotatedQCone()%0A%20%20%20%20%20%20%20%20%20%20%20%20)%0A%20%20%20%20%20%20%20%20else%3A%0A%20%20%20%20%20%20%20%20%20%20%20%20%23%20This%20corresponds%20to%20the%20stacked%20matrix%20%5BX%3B%20sqrt(lambda)%20*%20I_p%5D%20used%20for%20the%20quadratic%20form%0A%20%20%20%20%20%20%20%20%20%20%20%20Xp%20%3D%20np.vstack(%5BX%2C%20np.sqrt(lmb)%20*%20np.identity(p)%5D)%0A%0A%20%20%20%20%20%20%20%20%20%20%20%20%23%20Rotated%20Quadratic%20Cone%20constraint%3A%0A%20%20%20%20%20%20%20%20%20%20%20%20%23%20(1%2C%200.5*t%2C%20%5BX%3B%20sqrt(lambda)%20I_p%5D%20*%20%CE%B2)%20%E2%88%88%20Rotated%20Quadratic%20Cone%0A%20%20%20%20%20%20%20%20%20%20%20%20%23%20This%20encodes%20the%20quadratic%20term%20%CE%B2%E1%B5%80(X%E1%B5%80X%20%2B%20%CE%BBI)%CE%B2%20as%20a%20rotated%20cone%20constraint%0A%20%20%20%20%20%20%20%20%20%20%20%20M.constraint(%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20Expr.vstack(1.0%2C%200.5%20*%20t%2C%20Expr.mul(Xp%2C%20B))%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20Domain.inRotatedQCone()%0A%20%20%20%20%20%20%20%20%20%20%20%20)%0A%0A%0A%20%20%20%20%20%20%20%20%23%20L1%20penalty%20term%20%CE%BC%E2%88%91%20u_i%0A%20%20%20%20%20%20%20%20term4%20%3D%20mu%20*%20Expr.sum(u)%0A%0A%20%20%20%20%20%20%20%20%23%20Cardinality%20constraint%3A%20at%20most%20k%20nonzeros%0A%20%20%20%20%20%20%20%20M.constraint(Expr.sum(z)%20%3C%3D%20k)%0A%0A%20%20%20%20%20%20%20%20%23%20Relating%20u%20and%20B%3A%20%7C%CE%B2_i%7C%20%E2%89%A4%20u_i%0A%20%20%20%20%20%20%20%20M.constraint(B%20%3C%3D%20u)%0A%20%20%20%20%20%20%20%20M.constraint(-B%20%3C%3D%20u)%0A%0A%20%20%20%20%20%20%20%20%23%20Either-or%20condition%3A%20z_i%20%3D%201%20OR%20%CE%B2_i%20%3D%200%0A%20%20%20%20%20%20%20%20M.disjunction((z%20%3D%3D%201)%20%7C%20(B%20%3D%3D%200))%0A%0A%20%20%20%20%20%20%20%20%23%20Final%20objective%3A%20min%20(y%E1%B5%80y%20-%202%20y%E1%B5%80X%CE%B2%20%2B%20%CE%B2%E1%B5%80Q%CE%B2%20%2B%20%CE%BC%E2%80%96%CE%B2%E2%80%96%E2%82%81)%0A%20%20%20%20%20%20%20%20M.objective(ObjectiveSense.Minimize%2C%20term2%20%2B%20t%20%2B%20term4%20%2B%20term1)%0A%0A%20%20%20%20%20%20%20%20%23%20Export%20model%20and%20solve%0A%20%20%20%20%20%20%20%20M.writeTask(%22RankOne.ptf%22)%0A%20%20%20%20%20%20%20%20s%20%3D%20time.time()%0A%20%20%20%20%20%20%20%20M.solve()%0A%20%20%20%20%20%20%20%20elapsed%20%3D%20time.time()%20-%20s%0A%20%20%20%20%20%20%20%20print(f%22Elapsed%20Time%3A%20%7Belapsed%3A.4f%7D%20s%22)%0A%20%20%20%20%20%20%20%20print(f%22Objective%20value%3A%20%7BM.primalObjValue()%3A.4f%7D%22)%0A%20%20%20%20%20%20%20%20%23%20Print%20solution%20status%0A%20%20%20%20%20%20%20%20ModelStatus%20%3D%20M.getPrimalSolutionStatus()%0A%20%20%20%20%20%20%20%20print(%22Model%20Status%3A%20%22%2C%20ModelStatus)%0A%0A%20%20%20%20%20%20%20%20return%20M%0A%20%20%20%20return%20(RankOne%2C)%0A%0A%0A%40app.cell(hide_code%3DTrue)%0Adef%20_(mo)%3A%0A%20%20%20%20mo.md(%0A%20%20%20%20%20%20%20%20r%22%22%22%0A%20%20%20%20The%20described%20model%20is%20implemented%20using%20Mosek%20above.%0A%0A%20%20%20%20%60%60%60python%0A%20%20%20%20%20%20%20%20M.disjunction((z%20%3D%3D%201)%20%7C%20(B%20%3D%3D%200))%0A%20%20%20%20%60%60%60%0A%0A%20%20%20%20The%20line%20above%20illustrates%20how%20**Mosek%20can%20implement%20disjunction%20constraints**%20without%20requiring%20any%20additional%20transformations%2C%20such%20as%20using%20Big-M%20values.%20%0A%0A%20%20%20%20To%20run%20the%20model%2C%20we%20use%20the%20*diabetes%20dataset*%20shared%20by%20the%20referenced%20paper.%20We%20download%2C%20and%20load%20the%20dataset%20online%20in%20the%20code%20block%20below.%0A%20%20%20%20%22%22%22%0A%20%20%20%20)%0A%20%20%20%20return%0A%0A%0A%40app.cell(hide_code%3DTrue)%0Adef%20_(mo)%3A%0A%20%20%20%20mo.md(r%22%22%22%23%23%20Loading%20the%20Data%22%22%22)%0A%20%20%20%20return%0A%0A%0A%40app.cell%0Adef%20_(StandardScaler%2C%20glob%2C%20io%2C%20os%2C%20pd%2C%20requests%2C%20zipfile)%3A%0A%20%20%20%20%23%20URL%20of%20the%20data%0A%20%20%20%20url%20%3D%20%22https%3A%2F%2Fatamturk.ieor.berkeley.edu%2Fdata%2Fsparse.regression%2Fdata.sparse.regression.zip%22%0A%0A%20%20%20%20%23%20Download%20and%20unzip%0A%20%20%20%20r%20%3D%20requests.get(url)%0A%20%20%20%20z%20%3D%20zipfile.ZipFile(io.BytesIO(r.content))%0A%20%20%20%20extract_dir%20%3D%20%22data_sparse_regression%22%0A%20%20%20%20z.extractall(extract_dir)%20%20%23%20extracts%20to%20a%20local%20folder%0A%0A%20%20%20%20%23%20Find%20a%20file%20that%20looks%20like%20the%20diabetes%20dataset%20(e.g.%2C%20diabetes*.csv%20%2F%20.txt)%0A%20%20%20%20candidates%20%3D%20glob.glob(os.path.join(extract_dir%2C%20%22**%22%2C%20%22diabetes*.*%22)%2C%20recursive%3DTrue)%0A%20%20%20%20if%20not%20candidates%3A%0A%20%20%20%20%20%20%20%20%23%20If%20you%20know%20the%20exact%20name%2C%20set%20it%20here%3A%0A%20%20%20%20%20%20%20%20%23%20file_path%20%3D%20os.path.join(extract_dir%2C%20%22data.sparse.regression%22%2C%20%22diabetesQ.csv%22)%0A%20%20%20%20%20%20%20%20raise%20FileNotFoundError(%22No%20file%20matching%20'diabetes*'%20was%20found%20in%20the%20extracted%20contents.%22)%0A%0A%20%20%20%20file_path%20%3D%20candidates%5B0%5D%0A%20%20%20%20print(%22Using%20file%3A%22%2C%20file_path)%0A%0A%20%20%20%20%23%20Read%20the%20dataset%20(let%20pandas%20sniff%20the%20delimiter%20if%20not%20comma)%0A%20%20%20%20df%20%3D%20pd.read_csv(file_path%2C%20sep%3DNone%2C%20engine%3D%22python%22)%0A%20%20%20%20print(%22Columns%20in%20dataset%3A%22%2C%20df.columns.tolist())%0A%0A%20%20%20%20%23%20Assuming%20the%20last%20column%20is%20the%20target%20variable%0A%20%20%20%20X_nonSTD%20%3D%20df.iloc%5B%3A%2C%20%3A-1%5D.values%0A%20%20%20%20y%20%3D%20df.iloc%5B%3A%2C%20-1%5D.values%0A%0A%20%20%20%20X_nonSTD%20%3D%20X_nonSTD.astype(float)%0A%0A%20%20%20%20%23%20Standardize%20features%3A%20zero%20mean%2C%20unit%20variance%0A%20%20%20%20x_scaler%20%3D%20StandardScaler()%0A%20%20%20%20X%20%3D%20x_scaler.fit_transform(X_nonSTD)%0A%0A%20%20%20%20print(%22Shape%20of%20X%20(features)%3A%22%2C%20X.shape)%0A%20%20%20%20print(%22Shape%20of%20y%20(target)%3A%22%2C%20y.shape)%0A%20%20%20%20return%20X%2C%20y%0A%0A%0A%40app.cell(hide_code%3DTrue)%0Adef%20_(mo)%3A%0A%20%20%20%20mo.md(r%22%22%22Then%20using%20the%20loaded%20data%2C%20we%20can%20run%20the%20problem.%20As%20it%20can%20be%20seen%20from%20the%20printed%20out%20message%20below%2C%20the%20model%20is%20solved%20to%20feasibility.%22%22%22)%0A%20%20%20%20return%0A%0A%0A%40app.cell%0Adef%20_(RankOne%2C%20X%2C%20y)%3A%0A%20%20%20%20n%2Cp%20%3D%20X.shape%5B0%5D%2CX.shape%5B1%5D%0A%20%20%20%20lmb%20%3D%205%0A%20%20%20%20mu%20%3D%201%0A%20%20%20%20k%20%3D%20p*0.8%0A%0A%20%20%20%20RankOne(n%2C%20p%2C%20lmb%2C%20mu%2C%20y%20%2C%20X%2C%20k%2CTrue)%0A%20%20%20%20return%0A%0A%0Aif%20__name__%20%3D%3D%20%22__main__%22%3A%0A%20%20%20%20app.run()%0A