SPSS is now on Bluemix!

Great news for developers: you can now deploy predictive models created with IBM SPSS Modeler to your applications.

It is very simple: log in to Bluemix and upload your SPSS stream in the Administration Dashboard. Once the stream is deployed, your applications can send requests to your models to score input data. You can also refresh your predictive models without stopping your applications.

You can access the IBM Predictive Modeling service through its REST API from any programming language. I will publish a tutorial and a video showing how easy it is to get started.
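As a rough idea of what "scoring via REST" looks like from an application, here is a minimal Python sketch that builds a scoring request without sending it. The base URL, the `/score` path, the `accesskey` query parameter and the `header`/`data` body layout are all placeholders I made up for illustration; take the real values from the credentials and documentation of your own service instance.

```python
import json
import urllib.parse
import urllib.request


def build_scoring_request(base_url, access_key, header, rows):
    """Build (but do not send) a POST request carrying rows to be scored."""
    # Hypothetical body layout: column names plus one list per input row.
    payload = {"header": header, "data": rows}
    body = json.dumps(payload).encode("utf-8")
    # Hypothetical path and query parameter name.
    query = urllib.parse.urlencode({"accesskey": access_key})
    return urllib.request.Request(
        url=f"{base_url}/score?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_scoring_request(
    "https://example.invalid", "MY_KEY", ["age", "income"], [[42, 55000]]
)
print(request.get_method(), request.full_url)
```

Sending it would be a call to `urllib.request.urlopen(request)` once the placeholders are replaced with real service details.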

Analysis of #CharlieHebdo sentiment with SPSS

With the horrible attack on the Charlie Hebdo office in Paris, we are once again experiencing a new way of following the latest news, this time powered by Twitter. It is amazing how fast people share thoughts, photos, links, and absolutely everything. Twitter thus becomes a real-time data set of what the world’s population is thinking.

In this post I am going to show how to query tweets and run some simple analysis using IBM SPSS Modeler and the new SPSS Predictive Extensions based on R. All this analysis…without any coding at all! We are going to do 3 things:

– Create a Word Cloud with a new WordCloud node based on the R wordcloud package.

– Integration of rCharts with IBM SPSS Modeler. rCharts (developed by Ramnath Vaidyanathan) was born as an initiative to bring powerful JavaScript visualizations to R users, who can now create these interactive charts with R alone, without any JavaScript skills. With this integration in the SPSS workbench, you don’t even need to know R in order to use them. Simply drag and drop the node and start getting powerful results that are easy to share. These are the libraries available now in IBM SPSS Modeler:

– Integration with the new R package htmlwidgets. This package enables you to add new types of HTML output to R Markdown documents. There are different types of widgets, such as maps, charts, 3D scatterplots and more.
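To give an idea of what a word cloud does under the hood, here is a minimal Python sketch of the counting step: it tallies how often each term appears, which is what decides the size of each word. This is not the WordCloud node or the R wordcloud package itself; the stop-word list and the function name are my own assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not taken from any package).
STOPWORDS = {"the", "a", "is", "in", "of", "and", "to", "rt"}


def term_frequencies(tweets, top_n=10):
    """Return the top_n most frequent terms across a list of tweet texts."""
    counts = Counter()
    for text in tweets:
        # Lowercase the text and drop links before splitting into terms.
        text = re.sub(r"https?://\S+", " ", text.lower())
        # Keep hashtags and mentions as single terms.
        words = re.findall(r"[#@]?\w+", text)
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(top_n)
```

A word cloud then draws each term with a size proportional to its count.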

R Shiny Application to Predict Survival on the Titanic

One of the most popular exercises for getting started with data mining is predicting survival on the Titanic. On the Kaggle website this is one of the main challenges, and you can find detailed documentation and tutorials on how to solve it using Excel, Python, R…

In the IBM Extreme Blue team that I led last summer, the 4 students got started with data mining through this challenge, and we ended up creating a Shiny R application. Shiny is a web application framework for R that turns your analysis into interactive web applications without having to write HTML, CSS or JavaScript. We not only solved the problem with R and with SPSS, getting very good results, but also created this application so that everybody can go online and see the results. The model is created in the cloud, and the results are calculated in the cloud as well.

 http://titanicanalysis.mybluemix.net/

The application is hosted on Bluemix. To do so, we are using the Cloud Foundry R Custom Buildpack I modified to install R 3.1: https://github.com/aruizga7/cf-buildpack-r

The instructions for running R Shiny on Bluemix are in this article: https://bluemixanalytics.wordpress.com/2014/09/11/new-developerworks-article-about-running-r-shiny-on-bluemix/

You can find the source code of this Shiny Application here: https://github.com/aruizga7/TitanicShinyApp.git

Use Hadoop and BigR to solve a Kaggle Competition

I previously explained how to use dashDB on Bluemix as the infrastructure for a Kaggle competition. In this post I’m going to explain how to use Hadoop instead of dashDB. Why should I use it? When the volume of data is HUGE, Hadoop is always a good option. The ‘IBM Analytics for Hadoop’ service (powered by BigInsights) is in my opinion the best option…why?

– First because IBM offers 20 GB of data storage for free!!! And this is VERY COOL!

– Second, you can use BigR. IBM InfoSphere BigInsights BigR is a library of functions that provides end-to-end integration between the R language and InfoSphere BigInsights. It is new in the Bluemix service, available since the last update.

The whole environment is READY to run R, with R installed on every node of the Hadoop cluster.

In this post I’m going to show how to set up the environment to solve the ‘Click-Through Rate Prediction’ challenge, which has a prize of $15,000. And I repeat…all this powerful cloud infrastructure for FREE!

Use #Bluemix DashDB and R to solve a Kaggle Competition

Kaggle is a platform for predictive modelling and analytics competitions on which companies and researchers post their data, and statisticians and data miners from all over the world compete to produce the best model.

There are very tempting prizes for the winners.

Nowadays anyone can start solving these challenges, and there are powerful tools like SPSS Modeler that can place you in the top 5% of the ranking within a few clicks.

One of the first problems you face in this kind of competition is that the datasets are TOO big to handle with a regular computer. Not enough memory…not enough CPU power, and the process is too slow…So then…what? I cannot participate? Or do I have to pay a lot of money for a cloud environment? No! Use IBM Bluemix and dashDB!

The IBM Bluemix dashDB service is one of the best solutions for this. Why?

– DashDB is powered by IBM BLU Acceleration and Netezza in-Database Analytics. It uses dynamic in-memory columnar technology and innovations such as actionable compression to rapidly scan and return relevant data. In-database analytic algorithms integrated from Netezza bring simplicity and performance to advanced analytics.

– You can get started for free. You can get a free account now on Bluemix.net, and the first 1 GB of stored data is free of charge. After that, you pay as you grow. The 1 GB to 10 GB of available compressed database storage can hold from 5 GB to 50 GB of uncompressed data, respectively, based on typical compression ratios. The compression ratio for your data varies with the characteristics and values in your data set.

– Full integration with R: you can not only run R scripts but also open an instance of RStudio, the best R development environment, completely embedded in the web browser. You don’t need to install any software on your computer; you can do it all in the cloud!
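The storage figures in the list above imply a typical compression ratio of about 5:1 (1 GB compressed for 5 GB raw, 10 GB for 50 GB). Here is a small Python sketch of that arithmetic. The function names are mine, and the 5:1 default is only what those numbers suggest, so your own data may compress differently.

```python
def uncompressed_capacity_gb(compressed_gb, ratio=5.0):
    """Raw data volume (GB) that fits in the given compressed storage."""
    return compressed_gb * ratio


def compressed_storage_needed_gb(raw_gb, ratio=5.0):
    """Compressed storage (GB) needed to hold the given raw data volume."""
    return raw_gb / ratio


print(uncompressed_capacity_gb(1))       # free tier: 1 GB compressed
print(uncompressed_capacity_gb(10))      # 10 GB compressed
print(compressed_storage_needed_gb(20))  # a 20 GB raw competition dataset
```

With the assumed ratio, a 20 GB raw dataset would need about 4 GB of compressed storage, which is beyond the free 1 GB.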

Rave Viz compatible with Bluemix

For those who want to create shiny visualizations, IBM has an engine called Rave that powers the Watson products. Rave stands for Rapidly Adaptive Visualization Engine; it is fully integrated with IBM Cognos and compatible with the newest web browsers and mobile devices. You can learn more about it here:

http://www-01.ibm.com/software/analytics/many-eyes/

I am very excited to show for the first time a sample IBM Bluemix app using the Rave engine. Now your apps can integrate it easily…it is not only about doing powerful analytics…it is also about showing the results in a clear manner!

Here are some samples:

rave

You will also find learning material:

Analytics Using R and R Studio in Bluemix with dashDB

I’ve published some articles explaining different ways to integrate R in Bluemix. I found a nice hands-on lab that explains step by step how to use R and RStudio (embedded in the web browser, no installation required…pretty cool) directly on Bluemix.

The lab works with R analytics scripts and RStudio in a scenario that analyzes telecom customer churn based on existing customer data. Once you have debugged and tested the script in RStudio, you can deploy the scenario to a production Bluemix analytics warehouse environment in dashDB, with the customer data in data warehouse tables and the deployed R script modified to read its data from the warehouse.

Enjoy!

Download PDF

Twitter and IBM Partnership – IBM Insight 2014 Big Announcement

The conference is over and it is time to draw conclusions. It has been an exciting week, and I am proud to have witnessed the moment when IBM and Twitter announced their partnership…in my opinion the most important announcement of Insight 2014.
How did it happen? It was Wednesday, 29 October, and I was feeling a bit stressed because at 10:00 I had my first ever talk as a speaker at a conference like IBM Insight. Suddenly, Twitter’s Vice President of Data Strategy and former CEO of Gnip (a cool company acquired by Twitter earlier this year) came on stage. He explained the importance of analysing data and talked about concepts like the Internet of Things…and then he said…imagine instrumenting the brains of all the people, and analyzing all this data in an efficient way. And…BOOOM!!! They announced the partnership with IBM, and the whole crowd was really excited. The best and most valuable data source and the most powerful analytics engine and software partner, together, bringing this new source of business insight.

Geospatial Analytics with SPSS Modeler #IBMInsight

I was glad to attend the Geospatial Analytics with SPSS Modeler session and see a lot of familiar stuff…basically I saw a demo of space-time boxes. Exactly the same demo, with the same dataset, is in the article I wrote for developerWorks some months ago, which you can enjoy here:

-Mine spatial data with space-time-boxes in IBM SPSS Modeler: http://www.ibm.com/developerworks/library/ba-mine-spatial-data-spss-r
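For readers new to the idea, a space-time box groups events that fall in the same area during the same time window. The Python sketch below shows the concept with a plain fixed grid; it is a simplification of my own and not the algorithm SPSS Modeler uses, and the cell size and window length are arbitrary assumptions.

```python
import math
from datetime import datetime, timezone


def space_time_box(lat, lon, when, cell_deg=0.01, window_s=3600):
    """Return a hashable key for the grid cell and time window of one event."""
    return (
        math.floor(lat / cell_deg),         # latitude cell index
        math.floor(lon / cell_deg),         # longitude cell index
        int(when.timestamp()) // window_s,  # time window index since the epoch
    )


# Two made-up events close together in the same hour, and one an hour later.
a = space_time_box(48.8566, 2.3522, datetime(2014, 10, 29, 10, 5, tzinfo=timezone.utc))
b = space_time_box(48.8569, 2.3527, datetime(2014, 10, 29, 10, 40, tzinfo=timezone.utc))
c = space_time_box(48.8566, 2.3522, datetime(2014, 10, 29, 11, 5, tzinfo=timezone.utc))
print(a == b, a == c)
```

Events that share a key were in roughly the same place at roughly the same time, so counting keys is a quick way to find where entities hang out or meet.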

Then I also saw some interesting new geospatial nodes based on R…and these nodes were developed by my team last summer :-) It is a great pleasure to come up with an idea, work on it, and see it presented in front of so many people and industry leaders at the main Big Data and Analytics conference.

First day at IBM Insight 2014: DATA, DATA and DATA!

The thing I had wanted to do ever since I joined IBM was to attend the Insight conference (previously called Information On Demand). And here I am: the dream is no longer a dream but a reality, and I am participating actively, meeting product managers and speaking in one of the keynotes next Wednesday.

The first day has been very interesting…lots of new stuff. The conference started with 3 words: DATA, DATA and…DATA! Since the blog is called Bluemix Analytics…I think the first things to highlight are the new Bluemix services for cloud analytics.

– IBM DataWorks:

– IBM dashDB: The dashDB service is a data warehousing and analytics solution. You can quickly move your data into a next-generation columnar in-memory database, start running complex analytical queries with in-database algorithms, and integrate with the R language and other analytics and business intelligence tools. Powered by IBM BLU Acceleration.

The first two are already available, and Watson Curator is another one that will be available next month. We also saw a lot of Watson Analytics, and you can subscribe to the beta here: http://www.ibm.com/analytics/watson-analytics/

Regarding SPSS, it is exciting to talk about the new IBM SPSS Gold offering on SoftLayer, available in the cloud, and the interesting new features for IBM Cognos BI. I have also seen the SPSS service for Bluemix in action. Using SPSS as a service is the coolest way to consume your predictive models!
