Learn how Moment-Sum-of-Squares relaxation improves optimization for machine learning models when standard SDP methods fail to find global optima.

Improving Global Optimization in HSVM and SDP Problems

2026/01/14 23:30
3 min read

Abstract and 1. Introduction

  2. Related Works

  3. Convex Relaxation Techniques for Hyperbolic SVMs

    3.1 Preliminaries

    3.2 Original Formulation of the HSVM

    3.3 Semidefinite Formulation

    3.4 Moment-Sum-of-Squares Relaxation

  4. Experiments

    4.1 Synthetic Dataset

    4.2 Real Dataset

  5. Discussions, Acknowledgements, and References


A. Proofs

B. Solution Extraction in Relaxed Formulation

C. On Moment Sum-of-Squares Relaxation Hierarchy

D. Platt Scaling [31]

E. Detailed Experimental Results

F. Robust Hyperbolic Support Vector Machine

3.4 Moment-Sum-of-Squares Relaxation

The SDP relaxation in Equation (8) may not be tight, particularly when the resulting W has rank much larger than 1. Indeed, we often find W to be full-rank empirically. In such cases, moment-sum-of-squares relaxation may be beneficial: it can certifiably find the global optimum, provided the solution exhibits a special structure known as the flat-extension property [30, 32].
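For context, the flat-extension property admits a simple rank test. The condition below is the standard one from the flat-extension literature cited above, stated in generic notation that is ours rather than the paper's: M_t(y) denotes the order-t truncated moment matrix of the pseudo-moment sequence y, and v is the largest half-degree ⌈deg 𝑔ᵢ / 2⌉ among the constraints (so v = 1 for the degree-1 constraints here):

```latex
\operatorname{rank} M_{\kappa}(y) \;=\; \operatorname{rank} M_{\kappa - v}(y)
```

When this rank condition holds at the optimum, the relaxation is exact, and rank M_κ(y) global minimizers can be extracted from y.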


With all these definitions established, we can present the moment-sum-of-squares relaxation [9] of the HSVM problem, outlined in Equation (7), as

[Equation (13): order-𝜅 moment-sum-of-squares relaxation of the HSVM problem]
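Equation (13) itself is not reproduced in this version of the text. For orientation, the order-𝜅 moment relaxation of a generic problem min_q f(q) subject to gᵢ(q) ⩾ 0, i = 1, …, m, reads as follows in standard Lasserre notation; this is a sketch of the template only, with L_y the Riesz functional induced by the pseudo-moment sequence y, and with the HSVM instance substituting the objective and constraints of Equation (7):

```latex
\begin{aligned}
\rho_{\kappa} \;=\; \min_{y}\;\; & L_{y}(f) \\
\text{s.t.}\;\; & M_{\kappa}(y) \succeq 0, \\
& M_{\kappa-1}(g_{i}\, y) \succeq 0, \qquad i = 1, \dots, m, \\
& y_{0} = 1,
\end{aligned}
```

where the M_{𝜅−1}(gᵢ y) are localizing matrices; because each gᵢ here has degree 1, they involve monomials only up to degree s = 𝜅 − 1, consistent with the remark below.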

Note that 𝑔(q) ⩾ 0, as previously defined, serves as the constraints in the original formulation. Additionally, when forming the moment matrix, the degree of the generated monomials is 𝑠 = 𝜅 − 1, since all constraints in Equation (7) have maximum degree 1. Consequently, Equation (13) is a convex program and can be implemented as a standard SDP using mainstream solvers. We further emphasize that by progressively increasing the relaxation order 𝜅, we can theoretically find increasingly better solutions, as suggested by Lasserre [33].
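To make the "standard SDP" claim concrete, here is a minimal, self-contained toy in cvxpy. It is not the HSVM instance, just the order-2 moment relaxation of the unconstrained scalar problem min_x x⁴ − x², whose true global minimum −1/4 this relaxation recovers exactly:

```python
import cvxpy as cp

# Pseudo-moments y[k] stand in for E[x^k], k = 0..4.
y = cp.Variable(5)

# Order-2 moment matrix M2(y), indexed by the monomials (1, x, x^2).
M = cp.bmat([[y[0], y[1], y[2]],
             [y[1], y[2], y[3]],
             [y[2], y[3], y[4]]])

# Minimize L_y(x^4 - x^2) subject to M2(y) >= 0 and y_0 = 1.
prob = cp.Problem(cp.Minimize(y[4] - y[2]), [M >> 0, y[0] == 1])
prob.solve()
print(prob.value)  # approximately -0.25, the true global minimum
```

The HSVM relaxation in Equation (13) follows the same template, only with vector variables and with the constraints 𝑔(q) ⩾ 0 entering through localizing matrices.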

[Sparse form of Equation (13), with clique-wise moment matrices coupled through their shared w entries]

where 𝐵 is the index set mapping the moment matrix to entries generated by w alone, ensuring that moment matrices with overlapping regions share the same values, as required. We refer to the last constraint as the sparse-binding constraint.
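To illustrate the binding in a hypothetical two-clique case (the superscript notation below is ours, not the paper's): if the moment matrices of two cliques overlap only in w, the sparse-binding constraint forces their w-indexed entries to agree,

```latex
\big[ M_{\kappa}(y^{(i)}) \big]_{\alpha\beta} \;=\; \big[ M_{\kappa}(y^{(j)}) \big]_{\alpha\beta}
\qquad \text{for all } (\alpha, \beta) \in B, \; i \neq j,
```

so the cliques behave as if they shared a single copy of w.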

Unfortunately, our solution empirically does not satisfy the flat-extension property, so we cannot certify global optimality. Nonetheless, in practice it achieves significant performance improvements on selected datasets over both projected gradient descent and the SDP-relaxed formulation. As before, this formulation does not directly yield decision boundaries, and we defer discussion of the extraction methods to Appendix B.2.
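The paper's actual extraction procedure lives in Appendix B.2; the snippet below is only a minimal sketch of the generic rank-1 reading of a moment matrix, with illustrative names (`extract_candidate`, `M1`) that do not come from the paper:

```python
import numpy as np

def extract_candidate(M1: np.ndarray) -> np.ndarray:
    """Rank-1 extraction heuristic for an order-1 moment matrix block.

    M1 is assumed to be [[1, m^T], [m, W]]: the top-left entry is the
    moment of the constant monomial, m holds the first-order
    pseudo-moments of w, and W the second-order ones. If the relaxation
    is tight, M1 = [1, w]^T [1, w] exactly and the leading eigenpair
    recovers w; otherwise the output is only a rank-1 approximation.
    """
    eigvals, eigvecs = np.linalg.eigh(M1)  # eigenvalues in ascending order
    v = eigvecs[:, -1] * np.sqrt(max(eigvals[-1], 0.0))  # leading rank-1 factor
    if abs(v[0]) < 1e-8:
        raise ValueError("constant-monomial entry near 0; extraction unreliable")
    v = v / v[0]   # rescale so the constant-monomial entry equals 1
    return v[1:]   # remaining entries are the candidate w
```

A candidate obtained this way is exact only when the relaxation is tight; otherwise it still needs to be projected back onto the feasible set of Equation (7) and scored against the original objective.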

Figure 2: Star-shaped sparsity pattern in Equation (13), visualized with 𝑛 = 4


:::info Authors:

(1) Sheng Yang, John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA (shengyang@g.harvard.edu);

(2) Peihan Liu, John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA (peihanliu@fas.harvard.edu);

(3) Cengiz Pehlevan, John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, Center for Brain Science, Harvard University, Cambridge, MA, and Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Cambridge, MA (cpehlevan@seas.harvard.edu).

:::


:::info This paper is available on arxiv under the CC BY-SA 4.0 Deed (Attribution-ShareAlike 4.0 International) license.

:::

