Massive Data Discrimination via Linear Support Vector Machines

Mangasarian, O.L.; Bradley, P.S.

Files

98-05.pdf (180.55 KB)

Date

1999-03-31

Authors

Mangasarian, O.L.

Bradley, P.S.

Type

Technical Report

Abstract

A linear support vector machine formulation is used to generate a fast, finitely-terminating linear-programming algorithm for discriminating between two massive sets in n-dimensional space, where the number of points can be orders of magnitude larger than n. The algorithm creates a succession of sufficiently small linear programs that separate chunks of the data at a time. The key idea is that a small number of support vectors, corresponding to linear programming constraints with positive dual variables, are carried over between the successive small linear programs, each of which contains a chunk of the data. We prove that this procedure is monotonic and terminates in a finite number of steps at an exact solution that leads to an optimal separating plane for the entire data set. Numerical results on fully dense publicly available datasets, numbering 20,000 to 1 million points in 32-dimensional space, confirm the theoretical results and demonstrate the ability to handle very large problems.
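
The chunking idea described in the abstract can be illustrated with a minimal sketch. This is not the report's implementation. It assumes a common 1-norm linear SVM written as a linear program, with a hypothetical trade-off parameter `nu`, and it relies on SciPy's `linprog` with the HiGHS solver (SciPy 1.7 or later) to obtain dual variables. The function names `solve_chunk_lp` and `chunked_svm`, the tolerance, and the stopping rule are illustrative choices, not taken from the report.

```python
import numpy as np
from scipy.optimize import linprog


def solve_chunk_lp(X, d, nu=1.0):
    """Solve an assumed 1-norm linear SVM as an LP; labels d are in {-1, +1}.

    Variables are stacked as [w (n), gamma (1), s (n), y (m)]:
      minimize   nu * sum(y) + sum(s)
      subject to d_i * (x_i . w - gamma) + y_i >= 1,  -s <= w <= s,  y >= 0.
    """
    m, n = X.shape
    cost = np.concatenate([np.zeros(n + 1), np.ones(n), nu * np.ones(m)])
    # Margin rows rewritten as  -d_i * (x_i . w - gamma) - y_i <= -1.
    margin = np.hstack([-d[:, None] * X, d[:, None], np.zeros((m, n)), -np.eye(m)])
    # Rows  w - s <= 0  and  -w - s <= 0  bound |w| by s.
    upper = np.hstack([np.eye(n), np.zeros((n, 1)), -np.eye(n), np.zeros((n, m))])
    lower = np.hstack([-np.eye(n), np.zeros((n, 1)), -np.eye(n), np.zeros((n, m))])
    A_ub = np.vstack([margin, upper, lower])
    b_ub = np.concatenate([-np.ones(m), np.zeros(2 * n)])
    bounds = [(None, None)] * (n + 1) + [(0, None)] * (n + m)
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    # SciPy reports non-positive marginals for <= rows; negate to get multipliers.
    duals = -res.ineqlin.marginals[:m]
    return res.x[:n], res.x[n], res.fun, duals


def chunked_svm(X, d, chunk_size=1000, nu=1.0, tol=1e-8, max_passes=20):
    """Sweep over chunks, carrying over only points whose margin row has a positive dual."""
    keep = np.empty(0, dtype=int)  # indices of carried-over support vectors
    history = []                   # optimal value of every small LP, in order
    last_pass_value = -np.inf
    for _ in range(max_passes):
        for lo in range(0, len(X), chunk_size):
            chunk = np.arange(lo, min(lo + chunk_size, len(X)))
            idx = np.union1d(keep, chunk)  # union avoids duplicating a point
            w, gamma, value, duals = solve_chunk_lp(X[idx], d[idx], nu)
            keep = idx[duals > tol]
            history.append(value)
        # Illustrative stopping rule: a full pass that does not raise the objective.
        if history[-1] - last_pass_value <= tol:
            break
        last_pass_value = history[-1]
    return w, gamma, history
```

Calling `chunked_svm(X, d, chunk_size=...)` on an array of points `X` and a label vector `d` returns a candidate plane `x . w = gamma` together with the sequence of small-LP objective values. In this sketch each small LP keeps the previous LP's positive-dual constraints and only adds points, so the recorded values should not decrease, which mirrors the monotonicity the abstract refers to. The dense constraint matrix is kept for readability and would need a sparse representation at the data sizes the report targets.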

Keywords

linear programming chunking, support vector machines

Citation

98-05

URI

http://digital.library.wisc.edu/1793/66093

Collections

Math Prog Technical Reports
