Massive Data Discrimination via Linear Suppot Vector Machines
Loading...
Files
Date
Authors
Mangasarian, O.L.
Bradley, P.S.
Advisors
License
DOI
Type
Technical Report
Journal Title
Journal ISSN
Volume Title
Publisher
Grantor
Abstract
A linear support vector machine formulation is used to generate a fast, finitely-terminating linear-programming algorithm for discriminating between two massive sets in n-dimensional space, where the number of points can be orders of magnitude larger than n. The algorithm creates a succession of sufficiently small linear programs that separate chunks of the data at a time. The key idea is that a small number of support vectors, corresponding to linear programming constrains with positive dual variables, are carried over between the successive small linear programs, each of which containing a chunk of the data. We prove that this procedure is monotonic and terminates in a finite number of steps at an exact solution leads to an optimal separating plane for the entire data set. Numerical results on full dense publicly available datasets, number 20,000 to 1 million points in 32-dimensional space, confirm the theoretical results and demonstrate the ability to handle very large problems.
Description
Related Material and Data
Citation
98-05