Christian Schalk – Google
Google’s New Cloud Technologies
- google storage for developers
- api compatible with amazon s3
- prediction api (machine learning)
- bigquery
Google Storage
- store your data in google’s cloud
- any format, any amount, any time
- you control access to your data
- private, shared, public
- access via google apis or third party tools/libraries
- sample use cases
- static content hosting, e.g. static html, images, music, video
- backup and recovery
- sharing
- data storage for applications
- e.g. used as storage backend for android, appengine, cloud based apps
- storage for computation
- bigquery, prediction api
Google Storage Benefits
- high performance and scalability
- backed by google infrastructure
- strong security and privacy
- control access to your data
- easy to use
- get started fast with google and third party tools
Google Storage Technical Details
- restful api
- get, put, post, head, delete
- resources identified by uri
- compatible with s3
- buckets — flat containers
- objects
- any type
- size: up to 100 GB per object
- access control for google accounts
- for individuals and groups
- two ways to authenticate requests
- sign request using access keys
- ???
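To make the "resources identified by URI" point concrete, here is a minimal sketch of how bucket and object URIs might be built. The endpoint host and the path-style layout (bucket, then object name) are assumptions for illustration; the notes give neither. The verb descriptions are the usual REST meanings, not a statement of what this particular API does with each one.

```python
from urllib.parse import quote

# Hypothetical endpoint: the notes do not give the real host name.
ENDPOINT = "https://storage.example.com"

# The HTTP verbs from the list above, with their usual REST meanings.
VERBS = {
    "GET": "read a resource",
    "PUT": "create or replace a resource",
    "POST": "submit data to a resource",
    "HEAD": "read metadata only",
    "DELETE": "remove a resource",
}


def object_uri(bucket, object_name=None):
    """Return the URI of a bucket, or of an object inside it.

    Assumes a path-style layout: ENDPOINT/bucket/object.
    """
    uri = f"{ENDPOINT}/{quote(bucket, safe='')}"
    if object_name is not None:
        # Buckets are flat, so "/" in an object name is just part of the name.
        uri += "/" + quote(object_name, safe="/")
    return uri


if __name__ == "__main__":
    print(object_uri("mybucket"))
    print(object_uri("mybucket", "photos/2010/cat.jpg"))
```

Because buckets are flat containers, a name such as photos/2010/cat.jpg is a single object name rather than a nested folder path.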
Performance and Scalability
- objects of any type, up to 100 GB per object
- unlimited number of objects, thousands of buckets
- all data replicated to multiple US data centers
- leveraging google’s worldwide network for data delivery
- only you can use bucket names with your domain names
- read-your-writes data consistency
- range get
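A range GET fetches only part of an object, which matters when objects can be very large. The sketch below builds the standard HTTP Range header and splits an object into chunk-sized byte ranges. The helper names are made up for illustration, and no request is actually sent.

```python
def range_header(start, end=None):
    """Build a standard HTTP Range header for bytes start..end (inclusive).

    With end=None the range runs to the end of the object.
    """
    if start < 0 or (end is not None and end < start):
        raise ValueError("invalid byte range")
    last = "" if end is None else str(end)
    return {"Range": f"bytes={start}-{last}"}


def chunk_ranges(total_size, chunk_size):
    """Return inclusive (start, end) byte ranges that cover total_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    ranges = []
    start = 0
    while start < total_size:
        end = min(start + chunk_size, total_size) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


if __name__ == "__main__":
    for start, end in chunk_ranges(10, 4):
        print(range_header(start, end))
```

Each header would be attached to a separate GET request, so a large download can be resumed or parallelized chunk by chunk.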
Security and Privacy Features
- key-based authentication
- authenticated downloads from a browser
Getting Started with Google Storage
- go to http://code.google.com for basic info
- http://code.google.com/apis/storage (currently in preview mode)
- getting started guide, docs, etc.
- can sign up for an account
- command line tool available — gsutil — low-level access from the command line, scripting
- google storage manager — web-based tool for managing google storage
Google Storage Usage Within Google & Early Adopters
- google bigquery
- google prediction api
- google.org — imagery
- google patents
- panoramio
- picnik
- vmware
- US Navy
- theguardian
- socialwok
- xylabs
- etc.
Pricing
- storage: $0.17/GB/month
- also costs for up/downloads
- similar pricing to amazon s3
- preview in US
- 100GB free storage and network from google per account
- sign up for waitlist at http://code.google.com/apis/storage
- non-US preview available on case-by-case basis
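A quick worked example of the storage rate above. It assumes the 0.17 figure is in US dollars and, as a further assumption, that the free preview allowance is simply subtracted from the stored amount. Upload and download charges are left out because the notes give no figures for them.

```python
STORAGE_PRICE_PER_GB_MONTH = 0.17  # from the notes; currency assumed to be USD
FREE_PREVIEW_GB = 100              # preview allowance from the notes


def monthly_storage_cost(stored_gb, free_gb=0):
    """Estimate the monthly storage charge, excluding transfer costs.

    free_gb is subtracted before pricing; how the preview allowance is
    really applied is an assumption here.
    """
    billable_gb = max(0.0, stored_gb - free_gb)
    return round(billable_gb * STORAGE_PRICE_PER_GB_MONTH, 2)


if __name__ == "__main__":
    print(monthly_storage_cost(500))                   # 500 GB, no allowance
    print(monthly_storage_cost(500, FREE_PREVIEW_GB))  # 400 GB billable
```

So 500 GB comes to 85.00 per month at the listed rate, or 68.00 if the first 100 GB are free.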
Google Prediction API
- google’s sophisticated machine learning technology
- available as an on-demand restful http web service
- provide a bit of text and “train” the algorithm in the service to predict outcomes based on patterns
- simple example: language detection
- provide a series of example texts in english, spanish, french, etc. and train the prediction api to recognize the language
- endless number of applications
- customer sentiment
- transaction risk
- etc
Prediction API Examples
- predict and respond to emails in an automated way
Using the Prediction API
- three step process
- upload training data to google storage
- build a model from your data
- make new predictions
Training
- POST prediction/v1.1/training?data=mybucket…
- the service reports when the prediction engine is ready and gives an estimate of accuracy
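Since training runs asynchronously, a client has to check back until the model is ready. A minimal polling sketch follows. Here check_status is a hypothetical callable standing in for whatever status request the service offers, and the (done, accuracy) return shape is an assumption, not the API's actual response.

```python
import time


def wait_until_trained(check_status, poll_seconds=5.0, max_polls=60,
                       sleep=time.sleep):
    """Poll until training finishes, then return the estimated accuracy.

    check_status is a hypothetical callable returning (done, accuracy).
    """
    for _ in range(max_polls):
        done, accuracy = check_status()
        if done:
            return accuracy
        sleep(poll_seconds)
    raise TimeoutError("training did not finish within the polling budget")


if __name__ == "__main__":
    # Fake status source for demonstration: ready on the third check.
    fake_statuses = iter([(False, None), (False, None), (True, 0.93)])
    accuracy = wait_until_trained(lambda: next(fake_statuses),
                                  sleep=lambda seconds: None)
    print(f"model ready, estimated accuracy {accuracy}")
```

Passing sleep in as a parameter keeps the loop easy to exercise without real delays.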
Predict
- apply the trained model to make predictions on new data
- returns json data
- includes scores indicating confidence of prediction
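As an illustration of using the confidence scores, the sketch below picks the highest-scoring category from a JSON response and rejects it if the score is too low. The response shape (a "scores" object mapping category to score) and the 0.5 threshold are made up for this example; the notes do not show the real payload.

```python
import json

# Hypothetical response body; the real field names are not given in the notes.
SAMPLE_RESPONSE = '{"scores": {"english": 0.12, "spanish": 0.07, "french": 0.81}}'


def best_prediction(response_text, min_confidence=0.5):
    """Return (label, score) for the top category.

    The label is None when the top score is below min_confidence.
    """
    scores = json.loads(response_text)["scores"]
    label, score = max(scores.items(), key=lambda item: item[1])
    if score < min_confidence:
        return None, score
    return label, score


if __name__ == "__main__":
    print(best_prediction(SAMPLE_RESPONSE))
```

Treating low-confidence answers as "no prediction" is one way to avoid acting on a guess, for example when replying to email automatically.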
Prediction API Capabilities
- data
- input features: numeric or unstructured text
- output: up to hundreds of discrete categories
- training
- many machine learning techniques
Prediction Demo
- cuisine predictor
- spreadsheet of type of food (e.g. mexican, italian, french) and food description as training data
- upload spreadsheet to google storage
- kick off training process, then can check to see if it’s done
- pretty accurate predictions even on a limited training dataset
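The training spreadsheet in the demo pairs a cuisine label with a food description. A sketch of producing such a file as CSV is below. The column order (label first, description second) and the example rows are assumptions for illustration, not the demo's actual data.

```python
import csv
import io

# Made-up example rows: (cuisine label, food description).
EXAMPLES = [
    ("mexican", "corn tortillas with black beans, salsa and avocado"),
    ("italian", "fresh pasta with tomato, basil and parmesan"),
    ("french", "duck confit with potatoes cooked in butter"),
]


def to_training_csv(examples):
    """Serialize (label, description) pairs as CSV text, one example per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for label, description in examples:
        writer.writerow([label, description])
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_training_csv(EXAMPLES), end="")
```

Quoting every field keeps commas inside a description from being read as column separators.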
Google BigQuery
- also resides on top of google storage
- analyze large amounts of data quickly using a sql-like language
- fast, simple to use
Use Cases
- interactive tools
- spam
- trends detection
- web dashboards
- network optimization
Key Capabilities
- scalable to billions of rows
- fast: responses in seconds
- simple: queries in sql
- web service based: rest, json
Using BigQuery
- upload to google storage
- call bigquery service to import raw data into bigquery table
- perform sql queries on table
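The import-then-query flow can be tried locally with a stand-in. The sketch below uses Python's built-in sqlite3 module, not BigQuery, to load raw CSV into a table and run an aggregate SQL query over it. The table name, columns, and data are invented for illustration, and BigQuery's own SQL dialect and import call may differ.

```python
import csv
import io
import sqlite3

# Made-up raw data standing in for a file uploaded to storage.
RAW_CSV = "country,visits\nUS,120\nFR,45\nUS,80\nDE,30\n"


def import_and_query(raw_csv):
    """Load CSV rows into an in-memory table, then total visits per country."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE visits (country TEXT, visits INTEGER)")
        rows = csv.DictReader(io.StringIO(raw_csv))
        conn.executemany(
            "INSERT INTO visits VALUES (?, ?)",
            [(row["country"], int(row["visits"])) for row in rows],
        )
        cursor = conn.execute(
            "SELECT country, SUM(visits) AS total "
            "FROM visits GROUP BY country ORDER BY total DESC"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for country, total in import_and_query(RAW_CSV):
        print(country, total)
```

The shape of the workflow is the same as in the steps above: raw data in, table created, SQL out. Only the engine and the scale differ.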
Security and Privacy
- google accounts
- oauth
- https
Tools
- bigquery shell utility available — just type sql commands and get responses back
- can tie in a google spreadsheet and point it to a bigquery table