Post Page Advertisement [Top]

Howdy!

I've been working lately in developing an application that consumes lots of data from several Open Data portals and before doing something with it, stores this data in Google Fusion Tables. That said, this post depicts the details you might need when trying to consume Google services from a server application.


Land Ho!

Google Services

As you may know already, there are APIs that allow consumption of Google services programmatically, for instance, from a Web application that wants to get the Google+ posts from a user. Every user that has a Google Account can access a web site to have a grasp on the available services. This console allows the user to activate every of these services, but keep in mind that some of them are not free.

Google APIs  dashboard



There are a ton of services, you can check them in the Services tab, and you'll need to activate them in order to be able to access them by using the API. Besides, it is recommended to create a developer account instead of using your normal Google Account.


Authentication

Mostly all Google Services require some kind of authentication, but it all comes down to this:

  • To access public resources, such as public Google+ posts, public Drive documents or, in my case, public Fusion Tables, an API key is enough.
  • Any other type of access requires Oauth authentication, meaning that you'll have to ask for permission to the owner of that data.
That said, let's have a look at both approaches.

API Key

An API Key is a hash that identifies you application. To generate it, you only need to go to the API dashboard and select API access menu:

API access menu

Then, select the Generate New Key option, insert required data and you're good to go. I'll show an example on how to use this key in just a minute.

API Key generation

Oauth authentication

For those of you that don't know what Oauth is, I recommend this primer, but to put it simple, its a type of authentication that does not involve asking permanently for user credentials, which is, by nature, dangerous. Instead, Oauth promotes asking for a temporary permission to the user to access whatever data that will expire shortly and that only allows access to some data, not full "administrative" privileges. This temporary permission comes in the form of a Token.

So if your app wants to access users data from apps like Google+, Facebook, Twitter, etc, and you are using Oauth (which is, by the way, mandatory nowadays in most cases) you'll have to ask the user if he/she allows your app to access (temporarily) his/her data for whatever your app does. If the user approves, your app will be given a Token that you must keep for as long as your app wants to access users data. Normally this token has an expiration date, thus granting the access to users data is not permanent.

This is also known as the "Oauth dance", and can come in the form of several flavours (2-legged, 3-legged) but I won't go into details, is out of the scope of this post. While Oauth works great if there is a real user it can be problematic if what you want to do is app to app authentication, which is my case here, I want my app to access Google Fusion Tables services. To solve this, Google provides Service Account, a type of Oauth account that allows you to authenticate your app using Oauth without physical user intervention.

That said, to create a Service Account you need to access the API dashboard again and click on create "Create an Oauth 2.0 Client ID" button.

Oauth ID creation
Then, introduce data regarding your app: name, logo, etc.

Application data

Next, select Service Account as the client ID and click on Create:

Service Account type

Next window will ask you to download the private key. This is important, without this file you won't be able to authenticate. Keep it safe.

Download Key
Once generated you can check the Client ID data, the most important field being the account email, since it is required for accessing Google Services.

Service Accout email

All set to start working and gather data from Google services, Fusion Tables in this case.

Using Fusion Tables


Fusion Tables is a Google Service that basically, allows you to create data tables in "the cloud". Pretty much as you would do with a typical database engine (Oracle, MySQL, etc), but without having to deal with the typical server side annoyances. Besides, Fusion Tables allows you quickly integrate cool widgets like Google Maps, Google Charts, Google Earth, etc., with the data stored in tables. Also, you can easily merge tables using shared columns.

That said, Fusion Tables has a very easy to use UI that gives access to all these functionalities, but also provides a nice API that you can use to access its functionalities by coding.

Depending on the nature of the data your application tries to access, you might require simple or Oauth authentication. More precisely, access to tables that are marked as "exportable" or public, do not require Oauth authentication, even if you did not create those Tables on the first place. Obviously, you're not able to write those Tables, only read permissions are granted using API key authentication.

On the other hand, access to Tables that are private, require Oauth authentication yes or yes. If the Tables to be read/written are from a Google physical user, your application will have to wait for user intervention to access this data. Otherwise, (like my case) you will use a Service Account to authenticate. Let's see how to authenticate to use Fusion Tables services with both approaches.

API Key authentication

API Key authentication is the simplest way to authenticate. It only requires you to append your API key at the end of URL request in the following form:

https://www.googleapis.com/fusiontables/v1/query?sql=[SQL]&key=[your API key here]

The following snippet shows how to use this approach, using Jersey for making the Http Request. NOTE: Keep in mind that I'm using Java in all this post.

Client client = Client.create();
String g = null;
try {
 g = URIUtil.encodeQuery("https://www.googleapis.com/fusiontables/v1/query?sql=SELECT * FROM [TableID here ]&key=[API Key here]");
} 
catch (URIException e) { 
 e.printStackTrace();
}
  
WebResource webResource = client.resource(g);
String s = webResource.get(String.class);
System.out.println(s);  

That's it, all you need to read public Fusion Tables is that bit of code...simple, huh?

Oauth 2.0 authentication

Oauth is a bit more complex, but all the complexity comes in the form of misleading documentation and specially, not easy to find working samples. Thankfully enough, I found some samples that will make this process quite easier. The problem with Fusion Tables (I think) is that it is a quite new Google Service, thus, whatever you ask, you probably won't get many responses, as you can see here, here and...here.

Know the feeling when you want to nuke half the planet? Yeah, exactly

If you are curious, the working examples available from Google can be found here. I think there are equivalents for other languages (PHP, Python) but I cannot guarantee that. This example is quite useful, altough I had to modify it because I could not authenticate with it directly. You'll need Mercurial to get these samples, or you can just download my example, in any case, they both require you to have Maven installed in Eclipse (which is a good idea by the way :D).

That said, let's have a look at the sample code from Chris Junk. First, we need some mandatory fields:

  /**
   * Be sure to specify the name of your application. If the application name is {@code null} or
   * blank, the application will log a warning. Suggested format is "MyCompany-ProductName/1.0".
   */
  private static final String APPLICATION_NAME = "your.app.name";
  
  /** E-mail address of the service account. Something in the form xxxxx@developer.gserviceaccount.com*/
  private static final String SERVICE_ACCOUNT_EMAIL = "Enter your email here...";
  
  /** Global instance of the HTTP transport. */
  private static HttpTransport HTTP_TRANSPORT;

  /** Global instance of the JSON factory. */
  private static final JsonFactory JSON_FACTORY = new JacksonFactory();

  /**
   * Used to interact with Fusiontables API
   */
  private static Fusiontables fusiontables;
  
  /**
   * Used to store tables id for deletion once completed
   */
  private static ArrayList tablesId;

The most important fields are the app Name, that is suggested to be the same one you specified when creating the Service Account, altough I used different one and still works. Also the associated email is mandatory. Lastly, the Fusiontables object is the key element that will help us interact with Fusion Tables service (both for authentication and accessing data). Now let's see how to authenticate:

  HTTP_TRANSPORT = GoogleNetHttpTransport.newTrustedTransport();      
  // check for valid setup
  if (SERVICE_ACCOUNT_EMAIL.startsWith("Enter ")) {
      System.err.println(SERVICE_ACCOUNT_EMAIL);
      System.exit(1);
  }
  String p12Content = Files.readFirstLine(new File("key.p12"), Charset.defaultCharset());
  if (p12Content.startsWith("Please")) {
      System.err.println(p12Content);
      System.exit(1);
  }
  // service account credential (uncomment setServiceAccountUser for domain-wide delegation)
  GoogleCredential credential = new GoogleCredential.Builder().setTransport(HTTP_TRANSPORT)
                                                              .setJsonFactory(JSON_FACTORY)
                                                              .setServiceAccountId(SERVICE_ACCOUNT_EMAIL)
                                                              .setServiceAccountScopes(FusiontablesScopes.FUSIONTABLES)
                                                              .setServiceAccountPrivateKeyFromP12File(new File("key.p12"))
                                                           // .setServiceAccountUser("user@example.com")
                                                              .build();

First we check that a valid email is available. Second, remember that private key file you downloaded when creating the Service Account? Is that file, drop it in the root folder of this project, name it key.p12 and you should be good to go. Lastly, we create the GoogleCredential object, specifying (among other things), the Service Account email, the Google Service we want to use and the private key file. The GoogleCredential object stores the auth Token and has some cool functionalities like token expiration auto-detection.

Next step is creation of the FusionTables object, specifying the GoogleCredential we created earlier.


  // set up global FusionTables instance
  fusiontables = new Fusiontables.Builder(HTTP_TRANSPORT, JSON_FACTORY, credential).setApplicationName(APPLICATION_NAME).build();
  // run commands
  listTables();
  String tableId = createTable();
  insertData(tableId);
  showRows(tableId);
  deleteTable();
  View.header("All operations completed successfully!!!!");


Simple enough, lets take a closer look at the other methods:


  
  /** List tables for the authenticated user. */
  private static void listTables() throws IOException {
    View.header("Listing My Tables");

    // Fetch the table list
    Fusiontables.Table.List listTables = fusiontables.table().list();
    TableList tablelist = listTables.execute();

    if (tablelist.getItems() == null || tablelist.getItems().isEmpty()) {
      System.out.println("No tables found!");
      return;
    }

    tablesId = new ArrayList();
    for (Table table : tablelist.getItems()) {
      tablesId.add(table.getTableId());
      View.show(table);
      View.separator();
    }
  }


As you can see, the Table object encapsulates the required objects to interact with FusionTables. In listTables, we are obtaining a List object, which in the end represents an HttpRequest, that we must launch by calling execute(). First run of this method should return an empty list.
  
  /** Create a table for the authenticated user. */
  private static String createTable() throws IOException {
      View.header("Create Sample Table");
      // Create a new table
      Table table = new Table();
      table.setName(UUID.randomUUID().toString());
      table.setIsExportable(false);
      table.setDescription("Sample Table");

      // Set columns for new table
      table.setColumns(Arrays.asList(new Column().setName("Text").setType("STRING"),
                                     new Column().setName("Number").setType("NUMBER"),
                                     new Column().setName("Location").setType("LOCATION"),
                                     new Column().setName("Date").setType("DATETIME")));

      // Adds a new column to the table.
      Fusiontables.Table.Insert t = fusiontables.table().insert(table);
      Table r = t.execute();
      View.show(r);
      tablesId.add(table.getTableId());
      return r.getTableId();
  }


Create a table with four columns, each one having the four available data types: STRING, NUMBER, LOCATION and DATETIME.

  
  /** Inserts a row in the newly created table for the authenticated user. */
  private static void insertData(String tableId) throws IOException {
    Sql sql = fusiontables.query().sql("INSERT INTO " + tableId + " (Text,Number,Location,Date) "
        + "VALUES (" + "'Google Inc', " + "1, " + "'1600 Amphitheatre Parkway Mountain View, "
        + "CA 94043, USA','" + new DateTime(new Date()) + "')");

    try { 
      Sqlresponse response = sql.execute();
      View.header("Executed query to insert data");
      System.out.println("Executed query is " + sql + "\n");
      System.out.println("Results from query are:" + "\n");
      System.out.println(response.toPrettyString());
      
    } 
    catch (IllegalArgumentException e) {
      System.out.println(e);
    }
  }


Insert a row using a SQL statement. Response comes in the form of a SqlResponse object. This insertion method can cause some problems that I will explain later, but works fine for small data sets.

  
  /**
   * Shows the rows from specified table
   * @param tableId The table ID
   * @throws IOException
   */
  private static void showRows(String tableId) throws IOException {
      View.header("Showing Rows From Table");
      Sql sql = fusiontables.query().sql("SELECT * FROM " + tableId);
      try {
          Sqlresponse response = sql.execute();
          View.header("Showing rows from last query");
          List<list bject="">&gt; rows = response.getRows();
          Iterator<list bject="">&gt; rowsIterator = rows.iterator();
      
          while (rowsIterator.hasNext()){
              System.out.println("Next Row elements");
              System.out.println("*****************");
              List<object> currentRow = rowsIterator.next();
              Iterator<object> rowIterator = currentRow.iterator();
              StringBuilder buffer = new StringBuilder();
              while (rowIterator.hasNext()){
                  Object currentRowElement = rowIterator.next();
                  buffer.append(currentRowElement.toString());
                  buffer.append("|");
              }
              System.out.println(buffer.toString());
          }
      }  
      catch (IllegalArgumentException e) {
          System.out.println(e);
      }
  }


This method executes a query to the previously created table and iterates the resultSet to display the data rows, nothing really complex.

  
  private static void deleteTables() throws IOException {
      View.header("Delete All Tables");
      // Deletes all tables
      for (String id : tablesId) {
          Delete delete = fusiontables.table().delete(id);
          System.out.println("Deleting table with ID " + id);
          delete.execute();
          System.out.println("Latest status code from deletion is " + delete.getLastStatusCode() );
          System.out.println("Latest status message from deletion is " + delete.getLastStatusMessage() );
      }
  }


Simply iterate the collection of stored tables ID and delete each table (if any). If successful, you should get Status Code 204. You can test this project downloading it here.

Performance and usability considerations


While the previous example is good for small datasets, you must take into account several limitations from Google Fusion Tables before you consider using it in a serious, large-scale project. Those limitations, as far as I know, are:
  • 25,000 requests/day
  • 250MB space per table
  • A cell of data in Fusion Tables supports a maximum of 1 million characters
  • Maximum data size for a single HTTP request is 1MB
  • Total number of table cells being added cannot exceed 10,000 cells (not rows!)
  • A single request may contain up to a maximum of 500 INSERT statements

This may suffice for most cases, but in my case, I encountered a problem with the 25.000 requests/day limitation. Using the insertion method from the previous example creates a request for every single data row you want to insert. Thus, if you plan in creating tables with huge number of files, you'll end up reaching this limit real soon.

The are two ways to solve this basically, first would be in "packing" hundreds of INSERTS in every request you make. In my case, I had to insert more or less 65K rows in a Table, so 65000/500 = 130, I can make 130 "packs" of INSERT statments resulting in only 130 HTTP request, way below the threshold.

To illustrate this, I created a helper class to provide a higher level of abstraction when using Google's Oauth and Tables API, the following snippet shows the insertRows method:


 /**
  * Inserts a set of rows into the specified table, taking into account the Fusion Table Limits:
  * @param  tableId   The table ID in which data will be inserted
  * @param sqlStatements The SQL statemenst to insert data, this has to be an INSERT
  * @return TRUE if insertion was correct 
  * @throws IOException If an error occurs trying to insert data
  */
  public boolean insertRows(String tableId, ArrayList<String> sqlStatements) throws IOException {   
      if (authenticated){
          //check fields are okay
   if (tableId != null){
       if (!tableId.isEmpty()){
           if (sqlStatements != null){
        if (!sqlStatements.isEmpty()){
            //check that the provided tableId exists in hashMap
            if (tables.containsKey(tableId)){
         Logger.debug(DEBUG_TAG, "Number of rows to insert is: " + sqlStatements.size());
         //check if the INSERT statemens are more than 500
         if (sqlStatements.size() > 500){
             //use guava to partition the list in smaller lists of 500 elements max (last one probably will be smaller)
      List<List<String>> bulkList = Lists.partition(sqlStatements, 500);
      //Now iterate the meta-list and for each list
      Iterator<List<String>> iterator = bulkList.iterator();
      String sqlStatement = new String();
      while (iterator.hasNext()){
          StringBuffer insertBuffer = new StringBuffer();
          //Pack all the strings into ; separeted INSERTS
          List<String> insertList = iterator.next();
          Iterator<String> insertIterator = insertList.iterator();
          while (insertIterator.hasNext()){
              String currentInsert = insertIterator.next();
       insertBuffer.append(currentInsert);
       insertBuffer.append(";");
          }
          //Store the pack of INSERTs
          sqlStatement = insertBuffer.toString();
          //Execute the query
          HttpContent content = ByteArrayContent.fromString(null, "sql=" + sqlStatement);
          GenericUrl url = new GenericUrl(BASE_URL);
          HttpRequest httpRequest = fusionTables.getRequestFactory().buildPostRequest(url, content);
          try { 
              HttpResponse response = httpRequest.execute();         
       Logger.debug(DEBUG_TAG, response.toString());
                
          } 
          catch (IllegalArgumentException e) {
              Logger.error(DEBUG_TAG, "There was an error executing SQL statement to insert row. Aborting");
       return false;
          }
    }
    //All rows inserted correctly
    Logger.info(DEBUG_TAG, "All rows have been inserted correctly!");
    return true;
       }       
          }
          else{
              Logger.error(DEBUG_TAG, "There is no table with ID " + tableId + ".Create the table first.");
       return false;
          }
      } 
      else{
          Logger.error(DEBUG_TAG, "Provided SQL statements are empty. Cannot insert data.");
          return false;
      }
         }
         else{
             Logger.error(DEBUG_TAG, "Provided SQL statements are invalid. Cannot insert data.");
      return false;
         }
     }
     else{
         Logger.error(DEBUG_TAG, "Provided table ID is empty. Cannot insert data.");
         return false;
     }
         }   
         else{
            Logger.error(DEBUG_TAG, "Provided table ID is invalid. Cannot insert data.");
     return false;
         }
      }
  }


The main difference with this approach is that we are constructing the HTTP request manually, instead of using the Sql Class from the previous example. This allows us to insert the SQL statements in the BODY of the request, not in the URL, which won't work and that's what the Sql class does.

  //Execute the query
  HttpContent content = ByteArrayContent.fromString(null, "sql=" + sqlStatement);
  GenericUrl url = new GenericUrl(BASE_URL);
  HttpRequest httpRequest = fusionTables.getRequestFactory().buildPostRequest(url, content);

You can see the full topic about this here. The second and more elegant and probably fastest way to do it is to use the ImportRows class. This class is used to import big data sets, directly providing the InputStreams, normally in the form of CSV files. Having a look at the Javadocs, provides a constructor:


  protected Fusiontables.Table.ImportRows(String tableId,
                                        AbstractInputStreamContent mediaContent)


That allows passing an InputStream (the CSV files normally). I haven't still used this approach, but I will definitely try it. If you can reduce the number of HTTP request to just one, this probably means an increase in performance. And that's it, hope you find it useful and reduces your time to get something working with FusionTables and Oauth Google APIs (provided you use Java).

No comments:

Post a Comment

Bottom Ad [Post Page]