Channel Daily News

Google introduces a service to boost big data pipelines

At the Google I/O conference in San Francisco, the search engine powerhouse told attendees, mostly cloud developers, about Google Cloud Dataflow. The service helps developers build data pipelines for mobile apps.

Google also announced new Cloud Platform tools for debugging, tracing and monitoring cloud applications.

In a blog post, Greg DeMichillie, a member of Google’s cloud platform team, said that the company invented MapReduce about a decade ago to process massive datasets using distributed computing.

Ten years later, Google has replaced MapReduce with Cloud Dataflow. DeMichillie says that more devices and more information require more capable analytics pipelines. The problem is that such pipelines become difficult to create and maintain.

Cloud Dataflow is a managed service that consumes, transforms and analyzes data in both batch and streaming modes.

According to DeMichillie, Cloud Dataflow lets users gain actionable insights from their data and lower operational costs, without the complexity of deploying, maintaining or scaling infrastructure.

“You can use Cloud Dataflow for use cases like ETL, batch data processing and streaming analytics, and it will automatically optimize, deploy and manage the code and resources required,” DeMichillie said in his blog post.

DeMichillie has spent his entire career working on developer platforms for Web, mobile and the cloud.

The other major announcement at Google I/O was the set of new Cloud Platform tools for debugging, tracing and monitoring cloud apps. The tools are aimed at developers in the channel who need to diagnose systems in production.

For example, the Google Cloud Monitoring tool helps developers find and fix unusual behaviour across the application stack. The Cloud Platform tools were built using technology from Stackdriver, a recent Google acquisition.

Cloud Monitoring provides rich metrics, dashboards and alerting for Cloud Platform, as well as more than a dozen popular open source apps, including Apache, Nginx, MongoDB, MySQL, Tomcat, IIS, Redis, and Elasticsearch, DeMichillie said.

With minimal configuration, Cloud Monitoring can also identify and troubleshoot problems such as users experiencing increased error rates when connecting from an App Engine module, or slow query times from a Cassandra database.

The Google Cloud Trace tool visualizes and logs the time an application spends processing requests, and it compares performance between releases using latency distributions.

The Cloud Debugger tool lets developers debug apps in production, providing a full stack trace and snapshots of all local variables for any watchpoint.

DeMichillie claims that this “brings modern debugging to cloud-based applications.”

The Google Cloud Save tool gives users an API for saving, retrieving and synchronizing user data to the cloud and across devices, without having to write backend code. Data is stored in Google Cloud Datastore, so it is accessible from Google App Engine or Google Compute Engine through the existing Datastore API. Google Cloud Save is currently in private beta and will be available for general use soon.