Udacity Full Stack Web Development

Today is my birthday, and my gift to myself is the Full Stack Web Development course on Udacity.

What is Udacity? It is a for-profit educational organization that provides massive open online courses (MOOCs). There are many interesting courses out there from companies like Google, Cloudera, Salesforce, AT&T, Autodesk…


If you follow my blog, you might be asking why I am doing web development now when I am mostly focused on Data Mining and Analytics… well, it turns out that I have many ideas and I would like to start building some prototypes by myself, and my lack of web development skills is stopping me… so I decided to fix it!

I consider myself a self-learner, and I love the flexibility that MOOCs provide. I’m improving my R skills by doing the Data Science Specialization on Coursera from Johns Hopkins University. I am already on course 7 (this month’s topic is Regression Models).

Never stop learning!!!!!

FCBarcelona vs Real Madrid Players Twitter Popularity

I created a simple visualization showing the number of Twitter followers of each FC Barcelona and Real Madrid player.

In this case the result is a simple bar chart. Barcelona fans, don’t get mad: Messi doesn’t have a Twitter account… he is more active on Facebook, and I will generate the same kind of report based on Facebook data.
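As a rough illustration of the idea only (this is not the code behind the chart), here is a minimal Python sketch that ranks players by follower count and draws a text bar chart. The player names and numbers are placeholders invented for the example, not real Twitter data, and the function name text_bar_chart is hypothetical.

```python
# Placeholder follower counts: names and numbers are made up for illustration,
# they are NOT real Twitter data.
followers = {
    "player_a": 1200,
    "player_b": 300,
    "player_c": 600,
}


def text_bar_chart(counts, width=20):
    """Return one line per entry, largest first, with bars scaled to the maximum."""
    largest = max(counts.values())
    lines = []
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    for name, count in ranked:
        # Scale each bar relative to the most-followed player.
        bar = "#" * max(1, round(count / largest * width))
        lines.append(f"{name:<10} {bar} {count}")
    return lines


chart = text_bar_chart(followers)
for line in chart:
    print(line)
```

With real data, the dictionary would be filled from whatever source provides the follower counts; the ranking and scaling steps stay the same.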

Below you will find the source code to do all this and in the coming days I will illustrate how to get exactly the same results but using only SPSS, some new R nodes and without coding at all. So…stay tuned!

Link to Full Screen Chart


SPSS Node to plot interactive maps

IBM SPSS Modeler already includes mapping capabilities, but they are far from perfect. Now we can create beautiful maps in a matter of seconds, all within the same SPSS Modeler workbench, thanks to the integration of SPSS Modeler with the R programming language. The extensions are free and available in the SPSS Modeler Marketplace that we launched last week.

In the maps you can use the same color for all points or use a legend column to specify a color code. This legend may be categorical or continuous. Several color palettes are available (sequential, divergent, qualitative or monochrome) to cover the different uses of the node.

More precisely, this node generates an HTML file which can be saved to a specific directory and/or opened in the default browser on execution. This HTML page is an interactive map, that is to say you can pan, zoom in and out, etc. The R package used is called plotGoogleMaps.
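To make the difference between a categorical and a continuous legend concrete, here is a minimal Python sketch of how a legend column could be turned into colors. It is only an illustration under my own assumptions, not how the node or the underlying R package does it; the function names and the default colors are hypothetical.

```python
def categorical_colors(values, palette):
    """Give each distinct category one palette color, cycling if the palette is short."""
    categories = sorted(set(values))
    lookup = {cat: palette[i % len(palette)] for i, cat in enumerate(categories)}
    return [lookup[v] for v in values]


def continuous_colors(values, low=(255, 255, 204), high=(128, 0, 38)):
    """Linearly interpolate between two RGB colors and return hex strings.

    The default low/high colors are arbitrary choices for this example.
    """
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero when all values are equal
    colors = []
    for v in values:
        t = (v - lo) / span  # 0.0 at the minimum, 1.0 at the maximum
        rgb = tuple(round(a + t * (b - a)) for a, b in zip(low, high))
        colors.append("#{:02x}{:02x}{:02x}".format(*rgb))
    return colors


# Categorical legend: one color per distinct label.
print(categorical_colors(["a", "b", "a"], ["#111111", "#222222"]))
# Continuous legend: colors graded from the lowest to the highest value.
print(continuous_colors([0, 10]))
```

A qualitative palette suits the categorical case, while a sequential palette corresponds to the two-color interpolation used for the continuous case.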

Download extension: Plot Spatial Data


Watson Analytics giving insights on the Titanic Survival

Yesterday I published a post about how to solve the Titanic Survival challenge that you can find on the Kaggle website. Today I want to show IBM Watson Analytics for the first time, and how rapidly you can get some insight into this same use case.

IBM Watson Analytics is available as a service and you can use it under a freemium license. It combines search, content analytics, and cognitive computing. Just upload your dataset, in this case the train.csv, select the target (survival) and Watson Explorer will start analyzing the dataset and give you a 360-degree view of it.

Get started for free here: http://watsonanalytics.com

Here you have some nice screenshots:

[Screenshot: WatsonExplorerMain]

Connect Google BigQuery to IBM SPSS Modeler using JDBC with R

If you want to mine your data using IBM SPSS Modeler and your data is stored in the Google Cloud, you can do it and I will show you how in this post.

I am not going to explain how valuable it is to use the cloud, or how cool it is to set up a Hadoop cluster on IBM’s, Amazon’s, Google’s or anyone else’s cloud. In seconds you can have your infrastructure ready to use. So if you are dealing with large amounts of data you might need to mine it… and for this… the best tool to use is IBM SPSS Modeler!

There are (in my opinion) four different ways to connect Google BigQuery and IBM SPSS Modeler:

1. R –> There is a package called bigrquery: https://github.com/hadley/bigrquery

You cannot use it yet in Modeler 16, because Modeler 16 uses R 2.15 and you need R 3.1 to install this package. With the next release we will be able to install it, and then connecting to BigQuery and creating an extension node will be very easy.

2. JDBC –> There is an open source JDBC driver for this, and you can find it here: https://code.google.com/p/starschema-bigquery-jdbc/

I created an extension for SPSS Modeler using R to connect to BigQuery through JDBC. It is less direct than using the bigrquery package from the previous point, but still quite easy to do. Here you can see what it looks like: I created the user interface with the Custom Dialog Builder, and using it is as easy as selecting your projectID, UserID and KeyID and then writing the query.


3. If you are willing to pay, there are some companies that have developed ODBC drivers to connect to BigQuery: http://www.simba.com/connectors/google-bigquery-odbc

4. The best way might be to use IBM SPSS Analytic Server, but BigQuery is not yet supported (although it should be possible to implement).
