SVM Performance Benchmarks: Yottamine Versus the Leading Traditional SVM

1. Large Data Set – Binary Covtype (http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/)

For both programs we used the full data set of 581,012 examples, with 70% for training and 30% for testing. The data set has two classes, and each data point has 54 features.
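As a rough sketch of this setup, the 70/30 split might look like the following in Python with scikit-learn. The array shapes mirror the covtype data, but a small synthetic stand-in is used so the snippet runs without the real file; in practice the LIBSVM-format file would be loaded with `sklearn.datasets.load_svmlight_file`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the binary covtype data (real set: 54 features,
# 2 classes; the actual file would come from the LIBSVM data set page above).
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 54))
y = rng.integers(1, 3, size=10_000)  # classes labeled 1 and 2

# 70% training / 30% testing, as in the benchmark.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)
print(len(X_train), len(X_test))  # 7000 3000
```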

LIBSVM

Data Type: To get the best result from LIBSVM, we used the sparse data set version because the program is more efficient with that type.

Parameters: We selected the Gaussian kernel and used the program's default C and Sigma parameters.
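For reference, a default-parameter Gaussian (RBF) run can be exercised through scikit-learn's `SVC`, which wraps LIBSVM. This is a minimal sketch on toy data, not the benchmark run itself; in LIBSVM's own `svm-train` tool, the RBF kernel is selected with `-t 2`, and C and gamma default to 1 and 1/num_features.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Small toy problem standing in for the covtype training set.
X, y = make_classification(n_samples=500, n_features=54, random_state=0)

# SVC wraps LIBSVM; kernel="rbf" is the Gaussian kernel. Leaving C and
# gamma at their defaults mirrors the "default parameters" run above.
clf = SVC(kernel="rbf")
clf.fit(X, y)
print(round(clf.score(X, y), 2))
```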

Compute Resources: An Intel Xeon server with 4 cores and 16GB of RAM running Ubuntu 10.04.

Result:
Evaluating 1 parameter combination: 6.3 hours
Accuracy: 77.1082%

Yottamine KNN SVM

Data Type: We used the dense data set version because Yottamine is faster with dense data.

Parameters: Yottamine’s parametric automation selected the optimal parameters.

Compute Resources: A 10-node Hadoop cluster, each node with 8 cores, 7GB of memory, and a 1-gigabit interconnect, running Ubuntu 10.04 Server.

Result:
Evaluating 210 parameter combinations: 1.5 hours
Accuracy: 95.75%
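Yottamine's parameter-selection procedure is not public; as a hedged illustration of what evaluating many (C, Sigma) combinations involves, here is the equivalent grid search over LIBSVM's RBF parameters using scikit-learn's `GridSearchCV`. The grid values below are placeholders, not the 210 combinations Yottamine actually tested.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=54, random_state=0)

# A tiny placeholder grid; the real search would cover many more
# (C, gamma) pairs, e.g. the 210 combinations mentioned above.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```

Cross-validated grid search retrains one model per fold per combination, which is why evaluating hundreds of combinations is prohibitive for a single-node exact SVM but tractable when each fit is cheap or parallelized.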

2. Ultra Large Data Set – MNIST8M (http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/)

For both programs we used the 8.1 million data points for training and 10,000 for testing. The data set has 10 classes and is known to be nonlinear; each data point has 784 features.

LIBSVM

Data Type: Once again, we used the sparse version of the data set, which is about 18GB.

Parameters: We selected the Gaussian kernel and used the default C and Sigma. We set the kernel cache size to 8GB.
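The kernel cache setting can be reproduced through either interface: LIBSVM's `svm-train` takes `-m <megabytes>`, and scikit-learn's LIBSVM wrapper exposes the same knob as `cache_size` (also in MB). A minimal sketch:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# cache_size is in megabytes; 8000 MB corresponds to the ~8GB cache used
# in the benchmark. (Equivalent LIBSVM flag: svm-train -t 2 -m 8000 ...)
clf = SVC(kernel="rbf", cache_size=8000)
clf.fit(X, y)
print(clf.cache_size)  # 8000
```

A larger cache reduces recomputation of kernel rows, but as the estimate below shows, it cannot overcome the fundamental scaling of an exact kernel SVM on 8.1 million points.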

Compute Resources: An Intel Xeon server with 4 cores and 16GB of RAM running Ubuntu 10.04.

Result:

Evaluating 1 parameter combination: approximately 40 days (estimated)
We terminated the program after 3 days. During that period, LIBSVM completed only 3 models. We estimated that each model took about 20 hours to finish, and that evaluating a single pair of Sigma and C parameters would take approximately 40 days.
Accuracy: Unknown
Because LIBSVM could not finish computing the model, the accuracy with the default parameters could not be determined.

Yottamine Approximated Gaussian:

Data Type: For Yottamine, we used the dense version of the data set, 43GB in size. The data set is not particularly sparse, and Yottamine is faster operating on dense data despite its larger size.

Parameters: The Yottamine software automatically selected the optimal Sigma and C parameter values after testing 12 different combinations.
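The report does not say how Yottamine approximates the Gaussian kernel. One standard technique is random Fourier features (Rahimi and Recht), available in scikit-learn as `RBFSampler`: it maps the data into an explicit feature space approximating the RBF kernel, where a fast linear SVM can be trained. The sketch below illustrates that general idea only; it is an assumption, not Yottamine's actual method.

```python
from sklearn.datasets import make_classification
from sklearn.kernel_approximation import RBFSampler
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, n_features=54, random_state=0)

# Map inputs into a randomized feature space approximating the Gaussian
# (RBF) kernel; gamma plays the role of the Sigma parameter.
rbf = RBFSampler(gamma=0.1, n_components=300, random_state=0)
X_feat = rbf.fit_transform(X)

# A linear SVM on the approximated features stands in for the exact
# (and far slower) kernelized SVM.
clf = LinearSVC(C=1.0)
clf.fit(X_feat, y)
print(X_feat.shape)  # (500, 300)
```

Because the linear solver scales roughly linearly in the number of examples, this kind of approximation is what makes kernel-quality accuracy feasible on data sets the size of MNIST8M.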

Compute Resources: A 20-node Hadoop cluster on AWS, each node with 16 physical cores and 60.5GB of memory, connected by a 10-gigabit network.

Result:
Evaluating 12 parameter combinations: 6 hours
Accuracy: 97.82%
Speedup: approximately 1,800 times
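As a sanity check on the reported speedup, the per-parameter-pair times can be compared directly using only the numbers reported above: LIBSVM's estimated 40 days for one (Sigma, C) pair versus Yottamine's 6 hours for 12 pairs.

```python
# Numbers taken from the benchmark text above.
libsvm_hours_per_pair = 40 * 24      # estimated 40 days for one (Sigma, C) pair
yottamine_hours_per_pair = 6 / 12    # 12 pairs evaluated in 6 hours

speedup = libsvm_hours_per_pair / yottamine_hours_per_pair
print(round(speedup))  # 1920, roughly consistent with the reported ~1,800x
```

The exact figure depends on how the 40-day runtime estimate is rounded, which is why the report states the speedup approximately.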