Last month, wise.io co-founder Joey wrote a blog post on the principles of doing proper benchmarks of machine learning frameworks. Here, I start to put those principles into practice, presenting the first in a series of blog posts on ML benchmarking. If you are short on time, you can jump straight to the conclusions.
For this benchmark, I focus on comparing the accuracy and speed of four random forest® [1] classifier implementations, including the high-performance WiseRF™. In follow-up posts we will cover random forest regression and benchmark against other machine learning algorithms. We will also benchmark memory usage across implementations, another important but often overlooked aspect of benchmarking, especially when it comes to “big data.”
We created a standardized benchmarking platform to compare the accuracy and speed of the following random forest implementations:
All of the tools, with the exception of the randomForest R package, are multi-threaded and were parallelized across all available cores. H2O can also run as a distributed (multi-machine) implementation, but for this benchmarking exercise I considered only the more common case of a single-node workstation.
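To make "accuracy and speed" concrete, here is a minimal sketch of what one measurement in such a harness might look like. It is not the wise.io benchmarking platform: `benchmark_classifier`, `BenchmarkResult`, and the toy `MajorityClassifier` are hypothetical names introduced only for illustration, and the sketch assumes each implementation can be wrapped in an object with `fit` and `predict` methods.

```python
import time
from collections import Counter
from dataclasses import dataclass


@dataclass
class BenchmarkResult:
    name: str
    train_seconds: float
    predict_seconds: float
    accuracy: float


class MajorityClassifier:
    """Toy stand-in for a real model: always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def benchmark_classifier(name, model, X_train, y_train, X_test, y_test):
    """Time fit and predict separately, then score accuracy on held-out data."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    train_seconds = time.perf_counter() - start

    start = time.perf_counter()
    predictions = model.predict(X_test)
    predict_seconds = time.perf_counter() - start

    # Accuracy is the fraction of held-out labels predicted correctly.
    correct = sum(1 for p, t in zip(predictions, y_test) if p == t)
    return BenchmarkResult(name, train_seconds, predict_seconds, correct / len(y_test))


if __name__ == "__main__":
    # Tiny made-up dataset, used only to show the call pattern.
    X_train, y_train = [[0], [1], [2], [3]], ["a", "a", "a", "b"]
    X_test, y_test = [[4], [5]], ["a", "b"]
    print(benchmark_classifier("majority", MajorityClassifier(),
                               X_train, y_train, X_test, y_test))
```

In a real comparison, each implementation would be wrapped behind the same `fit`/`predict` interface and given the identical train/test split, so that differences in the reported numbers reflect the implementations rather than the data.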