Fast Support Vector Classifier for large-scale classification problems
The Support Vector Machine (SVM) is a state-of-the-art classifier, but on large datasets it is very slow and requires much memory. To address this deficiency, we propose the Fast Support Vector Classifier (FSVC), which includes: 1) an efficient closed-form training without numerical procedures; 2) a small collection of class prototypes instead of support vectors; and 3) a fast method that selects the spread of the radial basis function kernel directly from the data. Its storage requirements are very low and can be adjusted to the available memory, so it can classify datasets of arbitrarily large size (31 million patterns, 30,000 inputs and 131 classes in less than 1.5 hours). FSVC uses 12 times less memory than Liblinear, which fails on the 4 largest datasets for lack of memory, and it is one and two orders of magnitude faster than Liblinear and Libsvm, respectively. In terms of performance, FSVC is 4.1 points above Liblinear and only 6.7 points below Libsvm. The time spent by FSVC depends only on the dataset size (6×10^-7 sec. per pattern, input and class) and can be accurately estimated for new datasets, while for Libsvm and Liblinear it depends on the dataset difficulty. Code is provided.
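The run-time claim can be illustrated with a back-of-the-envelope estimate. The sketch below is not the authors' code. It assumes that the per-unit cost quoted in the abstract reads as 6×10^-7 seconds and that the total time is simply that constant multiplied by the number of patterns, inputs and classes. The function name `estimate_fsvc_seconds` and the example dataset dimensions are hypothetical.

```python
# Rough FSVC run-time estimate.
# Assumption: time = cost_per_unit * patterns * inputs * classes,
# with cost_per_unit read from the abstract as 6e-7 seconds.
SECONDS_PER_UNIT = 6e-7  # assumed reading of the constant in the abstract


def estimate_fsvc_seconds(n_patterns: int, n_inputs: int, n_classes: int,
                          seconds_per_unit: float = SECONDS_PER_UNIT) -> float:
    """Return an estimated run time in seconds for a dataset of the given size."""
    if min(n_patterns, n_inputs, n_classes) <= 0:
        raise ValueError("dataset dimensions must be positive")
    return seconds_per_unit * n_patterns * n_inputs * n_classes


if __name__ == "__main__":
    # Hypothetical dataset: one million patterns, 100 inputs, 10 classes.
    seconds = estimate_fsvc_seconds(1_000_000, 100, 10)
    print(f"estimated time: {seconds:.0f} s ({seconds / 60:.1f} min)")
```

Under these assumptions the estimate scales linearly in each of the three dimensions, so doubling the number of patterns doubles the predicted time. The hardware on which the constant was measured is not stated here, so the result is best treated as an order-of-magnitude guide.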
keywords: Classification, large-scale datasets, SVM, closed-form training, model selection
Publication: Article
July 6, 2021
Ziad Akram, Ali Hammouri, Manuel Fernández Delgado, Eva Cernadas, Senen Barro - 10.1109/TPAMI.2021.3085969