Use Hadoop and BigR to solve a Kaggle Competition

I previously explained how to use DashDB on Bluemix as infrastructure for a Kaggle competition. In this post I’m going to explain how to use Hadoop instead of DashDB.  Why should I use it? When the volume of data is HUGE, using Hadoop is always a good option. Using the service ‘IBM Analytics for Hadoop‘ (powered by BigInsights) is in my opinion the best option…why?

– First because IBM offers 20 GB of data storage for free!!! And this is VERY COOL!

– Second, you can use BigR. IBM InfoSphere BigInsights BigR is a library of functions that provide end-to-end integration with the R language and InfoSphere BigInsights. This is new in the Bluemix service, available since the last update.

All the environment is READY to run R, with R installed in all the nodes of the Hadoop Cluster.

In this post I’m going to show how to set up the environment to solve the challenge ‘Click-Through Rate Prediction‘ with an award of $15,000. And I repeat…all this cloud powerful infrastructure for FREE!


Continue reading