Grails + CouchDB #s2gx

Scott Davis – thirstyhead.com

NoSQL Databases in General

  • given the number of big companies using them, clearly they're ready to use today
  • time to re-examine our unnatural obsession for relational databases
  • rdbms has been around for 50 years now–well understood, great tooling, lots of information
  • rdbmses are silos
    • still good at what they do, but aren't necessarily well-suited to all data
  • as developers we're being forced to use sql to express something that's crucial to the success of your application
    • not our native language, kind of foreign when it comes down to it
  • we use orm to insulate ourselves from sql
    • express yourself in the native language of your choice instead of in sql

Is ORM State of the Art?

  • really just a bridge
  • why aren't there pure java or groovy datastores?
  • persistence is pretty uninteresting to developers
  • orm is a reasonable bridge, but a rather leaky abstraction as well
  • ted neward: orm is the vietnam of computer science
    • "[ORM] represents a quagmire which starts well, gets more complicated as time passes, and before long entraps its users in a commitment that has no clear demarcation point, no clear win conditions, and no clear exit strategy."

What Drew Me to CouchDB

  • what if i didn't have to bridge technologies anymore?
  • what if i could save my objects in their native format?
    • couchdb is actually a json datastore, but grails makes it trivial to transfer pogo <-> json
  • just need a thin translation layer

NoSQL Solutions

  • Google BigTable
  • mongoDB
  • CouchDB
  • Cassandra
    • "this is the future, but no one believes us"
  • each one of these are a bit different and each has their strengths and weaknesses
  • NoSQL = "not only SQL"
  • don't think of nosql solutions as just another database; truly different way to think about persistence
  • if you think of it as just another database, it'll be the worst database you've ever used
  • need to get out of the mindset of "spreadsheet" type format for data
  • start thinking more about the right tool for the job

CouchDB History

  • starting point was Lotus Notes
    • largely ahead of its time
    • document database
    • not brand-new stuff–ideas and foundation has been around for a very long time
  • Apache project

RDBMS vs. CouchDB

  • rdbms
    • row/column oriented
    • language: sql
    • insert, select, update, delete
  • CouchDB
    • if your data has a more vertical orientation as opposed to horizontal, starts to look more like attachments
    • email is a good example: to, from, body, attachment
    • language: javascript (map/reduce functions)
    • put, get, post, delete (REST)
    • "Django may be built for the Web, but CouchDB is built of the Web." — Jacob Kaplan-Moss, Django Developer
    • can build entire apps in CouchDB
  • Couch = acronym for "cluster of unreliable commodity hardware"
  • clustering is much more difficult to do clustering–couch was built from the ground up to be massively distributed, clusters out of the box
  • O'Reilly book available — free online

Using CouchDB With Grails

  • grails has native json support out of the box

import grails.converters.* class AlbumController { def scaffold = true def listAsJson = { render Album.list() as JSON } def listAsXml = { render Album.list() as XML } } CouchDB 101

  • json up and down
  • restful interface
  • no drivers since it's just http
  • written in erlang
    • incredibly fast
    • designed for scalability and parallel processing

Installing CouchDB

  • sudo apt-get install couchdb
  • windows installer available

Kicking the Tires

  • ping
    • curl http://localhost:5984
      {"couchdb":"Welcome","version":"1.0.1"}
    • can also hit this in a browser, but of course can't do a POST from a URL in a browser
  • get databases
  • create a database
  • delete a database
  • uses standard HTTP response codes, e.g. a 201 response code for a database create
  • web UI available – "Futon"
  • create a document
  • create a document from a file
  • URIs for documents are essentially your primary key–unique way of representing the document
  • don't have to create schemas — just start throwing documents at the database
  • documents get etags so they're very cache friendly
  • documents also get revisions–keeps tracks of multiple versions of the document
    • have to provide version number when updating
    • versioning numbers are revision number (integer), then -, then md5 hash of the document itself
    • can explicitly compress the database to get rid of old versions to reduce size of database
  • couch prefers uuids for the ids, but you can use anything you want
  • get UUID(s) from couch
  • to update a document, you'll get the latest version of the document, then do the update, then pass your changes back to couchdb which includes the revision number
  • one of the major things couchdb gives you since it's document based is that the data is accurate at that point in time
    • if the data changes in the future, in an rdbms the old document would get the new data

CouchDB With Grails

  • domain class–id and _rev as properties
  • can add couchdb stuff to Config.groovy to do stuff like create-drop for couchdb databases
  • add stuff to BootStrap.groovy
  • showing CouchDBService that has convenience methods around a lot of the URL calls to couch

Map/Reduce

  • in sql you say select firstname, lastname from foo (this is map) where state = 'NE' (this is reduce)
  • map and reduce are stored in 2 separate javascript functions

Polyglot Web Development With the Grails Framework #s2gx

Jeff Brown – SpringSource

Polyglot?

  • "many languages"
  • writing software in multiple languages
  • some people would say if you do any web development, you're doing polyglot
    • javascript, css, html, java, etc.
  • in the context of this talk, we'll be talking about implementing the actual business logic with multiple languages

Languages on the JVM

  • 200+ languages available on the JVM
  • many of these aren't exactly practical, but many are
  • at least 10-12 programming languages available on the JVM that could be used for serious development
  • big players are java, groovy, clojure, scala, jruby, jython
  • which of these is the best? no answer of course
    • personal preference, best tool for the job, etc.
  • many of these languages solve specific problems really well
  • all these languages are turing complete, so anything you can do in one you can do in another
  • but depending on the problem you're trying to solve, you may find one language or another is ideally suited to the task at hand
  • reached a point with CPU development where the speed of light is a factor in terms of increasing the speed
    • can't really make processors any faster with the current method of developing processors
    • instead of making faster processors, we're using more processors and multiple cores
  • concurrency is becoming more and more important
  • OO languages don't lend themselves to managing concurrency very well
    • allocate objects on the heap, objects in a shared mutable state
    • best we can do in OO languages is use locks so multiple threads can't access things at the same time
    • problem with locking is it's opt-in
  • functional languages make the concurrency problem almost disappear
    • no such thing as destructive assignment in a pure functional language
    • clojure and scala *do* allow destructive assignment
    • but in clojure, for example, you have to do this in a transaction
      • get a snapshot of the heap, and nothing can change on the heap while you're making changes
  • because of the advantages in terms of concurrency, it will be more common to use polyglot programming moving forward
    • e.g. write much of the application in groovy, but build parts of the application using a functional language
  • ultimately all this will run on the jvm, but we can take advantage of the best things each language has to offer
  • pretty different from the past–would have been rather unusual to write a C++ program that had other languages mixed in

Grails?

  • full stack MVC platform for the JVM
    • build system down to ORM, etc.
  • leverages proven staples
    • spring, hibernate, groovy, quartz, java, sitemesh
  • extensible plugin system
    • e.g. can pull out hibernate and use a different persistence mechanism, or can write your own plugins
  • since grails is built on the jvm you can take advantage of any language that will run on the jvm

Demo

  • showing how to write a grails app that uses a groovy controller, but a "math helper" class can be written in groovy, or java, or clojure
  • as long as there's a bean in the spring context, regardless of language, it can be injected into grails controllers
  • the grails controller doesn't care what language the classes it uses are written in

Clojure

  • core grails doesn't have clojure support, but there's a clojure plugin
  • plugin creates an src/clj file for clojure source
  • code in the grails controller doesn't have to change to take advantage of the math helper written in the different languages


(ns grails)
(defn addNumbers [x y]
  (+ x y ))

Using the Clojure Plugin

  • call via clj.mathHelper.addNumbers(x, y)
  • clj is the same as calling getClj()
  • the cloure plugin adds the getCjl method to all of the grails classes
    • classes*.metaClass*."getClj" = { return proxy } — this is done in the withDynamicProperties method in the plugin config
    • the proxy is an instance of the clojure proxy class (grails.clojure.ClojureProxy)
  • no addNumbers method in the proxy class — uses methodMissing
  • plugin looks for clojure methods in the grails namespace by default
    • if you need another namespace, clj['mynamespace'].methodName() will handle this
  • clojure plugin declares that it wants to watch all the files in src/clj
    • gets notified when files change and compiles the files if they change
  • if your plugin is adding things to the metaclasses on application startup, then you need to make sure your plugin also modifies controllers, services, etc. as they're changed while the application is running
  • can also observe only specific plugins for changes, e.g. only notify me when something involved with hibernate changes
  • can swap out other view technologies in grails
  • for taglibs, they have to be written in groovy, but inside the taglib you could be making calls to code in other languages

Who gets the credit?

  • grails? groovy? clojure? java? the jvm?
  • really it's the combination of all of them
  • don't have to walk away from grails to take advantage of any of the languages that run on the jvm

How to Analyze Your Data and Take Advantage of Machine Learning in YourApplication #s2gx

Christian Schalk – Google
Google’s New Cloud Technologies

  • google storage for developers 
    • api compatible with amazon s3
  • prediction api (machine learning)
  • bigquery

Google Storage

  • store your data in google’s cloud 
    • any format, any amount, any time
  • you control access to your data 
    • private, shared, public
  • access via google apis or third party tools/libraries
  • sample use cases 
    • static content hosting, e.g. static html, images, music, video
    • backup and recovery
    • sharing
    • data storage for applications 
      • e.g. used as storage backend for android, appengine, cloud based apps
    • storage for computation 
      • bigquery, prediction api

Google Storage Benefits

  • high performance and scalability 
    • backed by google infrastructure
  • strong security and privacy 
    • control access to your data
  • easy to use 
    • get started fast with google and third party tools

Google Storage Technical Details

  • restful api 
    • get, put, post, head, delete
    • resources identified by uri
    • compatible with s3
  • buckets — flat containers
  • objects 
    • any type
    • size: 100 gb / object
  • access control for google accounts 
    • for individuals and groups
  • two ways to authenticate requests 
    • sign request using access keys
    • ???

Performance and Scalability

  • objects of any type and 100GB/object
  • unlimited numbers of objects, 1000s of buckets
  • all data replicated to multiple US data centers
  • leveraging google’s worldwide network for data delivery
  • only you can use bucket names with your domain names
  • read-your-writes data consistency
  • range get

Security and Privacy Features

  • key-based authentication
  • authenticated downloads from a browser

Getting Started with Google Storage

  • go to http://code.google.com for basic info
  • http://code.google.com/apis/storage (currently in preview mode) 
    • getting started guide, docs, etc.
    • can sign up for an account
  • command line tool available — gsutil — low-level access from the command line, scripting
  • google storage manager — web-based tool for managing google storage

Google Storage Usage Within Google & Early Adopters

  • google bigquery
  • google prediction api
  • google.org — imagery
  • google patents
  • panoramio
  • picnik
  • vmware
  • US Navy
  • theguardian
  • socialwok
  • xylabs
  • etc.

Pricing

  • storage: 0.17/gb/month
  • also costs for up/downloads
  • similar pricing to amazon s3
  • preview in US 
  • non-US preview available on case-by-case basis

Google Prediction API

  • google’s sophisticated machine learning technology
  • available as an on-demand restful http web service
  • provide a bit of text and “train” the algorithm in the service to predict outcomes based on patterns 
  • simple example: language detection 
    • provide series of examples of english, spanish, french, etc. and train the prediction api to recognize the language
  • endless number of applications 
    • customer sentiment
    • transaction risk
    • etc

Prediction API Examples

  • predict and respond to emails in an automated way

Using the Prediction API

  • three step process 
    • upload training data to google storage
    • build a model from your data
    • make new predictions

Training

  • POST prediciton/v1.1/training?data=mybucket…
  • can respond when the prediction engine is ready and gives an estimate of accuracy

Predict

  • apply the trained model to make predictions on new data
  • returns json data
  • includes scores indicating confidence of prediction

Prediction API Capabilities

  • data 
    • input features: numeric or unstructured text
    • output: up to hundreds of discrete categories
  • Training 
    • many machine learning techniques

Prediction Demo

  • cuisine predictor
  • spreadsheet of type of food (e.g. mexican, italian, french) and food description as training data
  • upload spreadsheet to google data storage
  • kick off training process, then can check to see if it’s done
  • pretty accurate predictions even on a limited training dataset

Google BigQuery

  • also resides on top of google storage
  • can have large amounts of data that you can quickly analyze using sql-like language
  • fast, simple to use

Use Cases

  • interative tools
  • spam
  • trends detection
  • web dashboards
  • network optimization

Key Capabilities

  • scalable to billions of rows
  • fast–response in seconds
  • simple–queries in sql
  • webservice based–rest, json

Using BigQuery

  • upload to google storage
  • call bigquery service to import raw data into bigquery table
  • perform sql queries on table

Security and Privacy

  • google accounts
  • oauth
  • https

Tools

  • bigquery shell utility available — just type sql commands and get responses back
  • can tie in a google spreadsheet and point it to a bigquery table

Google App Engine for Business 101 #s2gx

How to Build, Manage & Run Your Business Applications on Google’s Infrastructure
Christian Schalk – Developer Advocate, Google

  • not really an advocacy position
  • still in engineering, but work a lot more with users directly
  • go out to companies to help them be successful

What is cloud computing?

  • lots of different definitions
  • pyramid of (bottom up): 
    • infrastructure as a service 
      • joyent, rackspace, vmware, amazon web services
      • provides cooling, power, networking
    • application platform as a service 
      • GAE falls in this category
      • tools to build apps
    • software as a service 
      • google docs, etc.

GAE

  • easy to build
  • easy to maintain
  • easy to scale 
    • appengine resides in google’s overall infrastructure so will scale up as needed
  • started with only python
  • with java support, opened the doors for java enterprise developers

By the Numbers

  • launched in 2008
  • 250,000 developers
  • 100,000+ apps
  • 500M+ daily pageviews 
    • 19,000 queries per second — has almost doubled since January

Some Partners

  • best buy
  • socialwok
  • xylabs
  • ebay
  • android developer challenge
  • forbes
  • buddypoke 
    • 62 million users
  • gigya 
    • do social integration for large media events (movie launches, sports events) — huge spikes in traffic so GAE just handles it
  • ubisoft
  • google lab
  • ilike
  • walk score
  • gigapan
  • others
  • point here is it’s very easy to drop specific apps on GAE without running litearlly everything on GAE
  • very popular among social networking apps because of easy scalability

Why App Engine?

  • managing everything is hard
  • diy hosting means hidden costs 
    • idle capacity
    • software patches & upgrades
    • license fees
  • “cloud development in a box”

App Engine Details

  • collection of services 
    • memcache, datastore, url fetch, mail, xmpp, task queue, images, blobstore, user service
  • ensuring portability — follows java standards 
    • servlets -> webapp container
    • jdo/jpa -> datasource api
    • java.net.URL -> URL fetch
    • javax.mail -> Mail API
    • javax.cache -> memcache
  • extended language support through jvm 
    • java, scala, jruby, groovy, quercus (php), javascript (rhino)
  • always free to get started
  • liberal quotas for free applications 
    • 5M pageviews/month
    • 6.5 CPU hours/day

Application Platform Management

  • download and install SDK 
    • Eclipse plugin also available
  • build app and then deploy to the public GAE servers
  • app engine dashboard
  • app engine health history 
    • shows status of each service individually across GAE as a whole

Tools

  • google app engine launcher for python
  • sdk console 
    • local version of the app engine dashboard
  • google plugin for eclipse 
    • wizard for building new app engine apps
    • can run the entire gae environment locally within eclipse
    • easy deployment to app engine servers
    • in process of building a new version of this with more features

Continuously Evolving

  • aggressive schedule for providing new features
  • may 2010 — app engine for business announced

What’s New?

  • multi-tenant apps with namespace API
  • high performance image serving
  • openid/oauth integration
  • custom error pages
  • increased quotas
  • app.yaml now usable in java apps
  • can pause task queues
  • dashboard graphs now show 30 days
  • more — see http://googleappengine.blogpost.com

Getting Started

Creating and Deploying an App

  • demoing eclipse plugin
  • can create a new Google Web Application, optionally with GWT
  • projects follow the typical java webapp structure
  • before deployment, can test/debug locally just like any Java project in eclipse
  • even the datastore is available locally for development/testing
  • new features tend to be introduced in python first, then java gets them later
  • to deploy, right click the project, choose “google,” then deploy 
    • this brings up a window where you put in your application ID and version, then uploads to the GAE servers
  • can log into GAE dashboard and configure billing with maximum charges if your app will exceed the free quotas
  • can use your own custom domains, this ties into google apps
  • can assign additional developers to GAE applications by email address
  • can deploy new versions of applications and keep the old ones as well, can toggle between versions and choose one as default

What about business applications?

  • GAE for Business
  • same scalable cloud hosting platform, but designed for the enterprise
  • not production quite yet
  • enterprise application management 
    • centralized domain console (preview available today)
  • enterprise reliability and support 
    • 99.9% SLA
    • direct support 
      • tickets tracked, phone support, etc.
  • hosted SQL (preview available today) 
    • managed relational sql database in the cloud
    • doesn’t replace the datastore–available in addition to the datastore
  • ssl on your domain 
    • current core product doesn’t offer this
  • secure by default 
    • integrated single signon
  • pricing that makes sense 
    • apps cost $8/user, up to a max of $1000 per month

Enterprise App Development With Google

  • GAE for Business
  • Google Apps for Business
  • Google Apps Marketplace
  • Firewall tunneling technology available (Secure Data Connector)

App Engine for Business Roadmap

  • enterprise admin console (preview)
  • direct support (preview)
  • hosted sql (limited release q4 2010)
  • sla (q4 2010)
  • enterprise billing (q4 2010)
  • custom domain ssl (2010 – 2011)

SQL Support

  • can run this all locally in eclipse
  • demo of spring mvc travel app running on GAE with the SQL database 
    • have to explicitly enable sessions
    • had to disable flow-managed persistence

Become an App Engine for Business Trusted Tester!

Developing Social-Ready Web Applications #s2gx

Craig Walls – SpringSource

  • working on Spring Social, which is the brains behind Greenhouse (web/mobile conference app for SpringOne)

Socializing Your Applications

  • why would you want to do this?
  • this is where your customers are–lots of people spend a LOT of time on Facebook
    • if they're there, you want to be there with them
  • Facebook–over 500 million active users
    • third largest country in the world
    • 50% log on to Facebook on any given day
    • there's even a movie about it–that says something
  • Twitter — over 100 million users
    • more than 190 million unique visitors monthly
    • more than 65 million tweets per day
  • Others: LinkedIn (80 million members), TripIt (230,000 trips planned per month)
  • More: FourSquare, YouTube (2 billion videos viewed per day), MySpace, Gowalla, Google, Flickr
  • how do you use this to better your application?
    • really depends on the customers and applications
    • don't want to make people come to you, better to interact with people where they already are
    • you can have your customers tell you things about themselves and this data would be hard to get otherwise

Types of Social Integration

  • widgets
    • facebook xfbml/js; the "like" button
      • xfbml — tag library that's interpreted on the client by javascript
    • twitter @anywhere
    • linkedin widgets / linkedin jsapi
      • jaspi resembles xfbml
  • embedded
    • facebook applications
    • igoogle gadgets
    • myspace applications
  • rest api
    • provided by virtually all social networks
    • consumed by external and embedded applications

Widgets

  • facebook connect
    • xfbml tag on page adds the login button to any page (<fb:login-button …>Connect to Facebook</fb:login>
    • demoing "find my facebook friends" functionality (<fb:multi-friend-selector …> — fbml tags that run on the server)
  • twitter @anywhere offers some javascript-based widgets, e.g. follow, connect with twitter
    • can also linkify and hovercard text–does this with a class to add the links and javascript handles adding links (hovercard is the thing that shows the little twitter profile boxes for users)
    • twitter anywhere has great examples in their documentation

Facebook Embedded Applications

  • hosted on your own servers, but look seamless when you're on facebook (look like they're part of facebook)
  • can leverage widgets, REST APIs, javascript apis, etc.
  • most often used for games, quizzes, surveys, etc.

Accessing Social Data with REST Social APIs

  • common operations
    • get user profile
    • get/update status
    • get list of friends
  • specialized operations
    • facebook: create photo album, create a note, etc.
    • twitter: create/follow a list, view trends
    • tripit: retrieve upcoming trips, view friends nearby
  • all done with restful apis
    • most support both json and xml representations

Searching Twitter RestTemplate rest = new RestTemplate(); String query = "#s2gx"; String results = rest.getForObject("http://search.twitter.com/search.json?q={query}", String.class);

  • if you want to get friends on twitter, you get the user IDs back, so you have to make another call back to get info about the user based on the user id

Facebook Graph API

  • interesting form of REST API
  • two basic url patterns
  • if you don't have an authorization key you only get very basic info back (name, gender, country)

Securing Social Data: OAuth is the key to social data

  • most social data is secured behind oauth
  • authentication takes place on social provider
  • consumer application given an access token to access user's profile
    • this gets around having to give another application your login credentials
    • also lets you revoke access for specific applications
  • consumer never knows the user's social network credentials
  • demo of trying to post a tweet without being authorized–throws a 401 error
  • when you sign in via oauth you're signing into the originating application (e.g. facebook) and then facebook tells the application "yes, the provided the correct authentication and have given you permission to do what you told them you were going to do"
    • click "connect with facebook" button from an application
    • box pops up from facebook where the user logs in and grants permissions
    • facebook then makes the connection and gives the application an access key

Comparing OAuth and OpenID

  • openid
    • primary concern is single sign-on
    • shared credentials for multiple sites
    • authentication takes place on your chosen openid server
  • oauth
    • concern is shared data
    • sign into the host application
    • host application then gives some other application access
  • if you sign on via oauth the underlying mechanism could be openid

Versions of OAuth in Play

  • OAuth 1.0: tripit
  • OAuth 1.0a: twitter, linkedin, foursquare, most others
  • OAuth 2: still in draft; early adoption by facebook (not quite full oauth 2) salesforce, gowalla, github, 37signals
    • on target to go final by the end of the year

Signing a request: OAuth 1.0a

  • construct a base string that includes …
    • the http method
    • the request url
    • any parameters (including post/put body parameters if the content type is "application/x-www-form-urlencoded")
  • encrypt the base string to create signature
    • commonly hmac-sha1, signed with api secret
    • could be plaintext or rsa-sha1 (if supported)
  • add authorization header to request

The OAuth 2 Dance — much simpler than oauth 1

  • request authorization from user
  • return to consumer with the authorization code in the request
  • exchange auth code and client secret for access token
  • return access token to consumer for use in REST API calls

Easy Facebook OAuth

  • <fb:login-button perms="email.publish_stream,offline_access">Connect to Facebook</fb:login-button>
  • offline access = the application can access your facebook account at any time
  • oauth 2 gives you the option to create an access token that will expire after a period of time
  • oauth 2 also has a renewal token so you can renew expired tokens, but facebook doesn't support renewal tokens yet
  • if you give the application the "give this app access at any time" it's really just a way to not have the access token expire
    • currently access tokens expire after about an hour
  • once you authorize with FB, you get a cookie back called fbs_appKey (where appKey is your application's key)
    • cookie also includes the access token and user id
  • if you store access tokens in your application's local database, you should store them encrypted
  • once you have the access token, you make the same call to facebook but pass the access token, and then you get a lot more of the profile info from facebook

Social REST API Challenges

  • signing a request for oauth 1.0(a) is difficult when using Spring's RestTemplate
  • each social provider's api varies wildly
  • getting a facebook access token requires parsing the cookie string
  • how should various http response codes be handled?

Spring Social

  • supports social integration in Spring
  • born out of Greenhouse development

TwitterTemplate

  • simplifies signing of OAuth 1 requests through RestTemplate
  • Offers consistent API template-based API across social providers
  • extends spring MVC to offer Facebook access token and user ID as controller parameters
  • maps social responses to a hierarchy of social exceptions
  • Spring Social can get at the actual response to a 4XX error code which you can't get if you're using RestTemplate directly
  • similar to using JdbcTemplate which gives you more detail than the raw sql exceptions
  • Spring Social includes TwitterTemplate to make interacting with twitter much easier

FacebookTemplate

  • a bit simpler since all that's needed is the access token
  • FacebookTemplate facebook = new FacebookTemplate(ACCESS_TOKEN);
  • String profileId = facebook.getProfileId();
  • also linkedin template and tripittemplate

Spring Social Next Steps

  • expanding available operations in social templates
  • more social templates for other providers

Introduction to Tomcat 7 #s2gx

Mark Thomas, SpringSource

  • Tomcat 7 Supports …
    • Servlet 3.0
    • JSP 2.2
    • EL 2.2
    • Java 1.6
  • New major release of Tomcat every time the spec has a major change
  • Servlet 3.0
    • asynchronous processing
    • pluggability
    • annotations
    • session management
    • miscellaneous
  • Asynchronous processing
    • request processing is synchronous, but the response processing can now be asynchronous
    • outline
      • start asynch processing
      • request/response passed to a new thread
      • container thread returns to the pool
      • new thread does its work
    • allows container threads to be used more efficiently
      • when waiting for external resources
      • when rationing to a resource
      • or any other time when the container thread would be blocking
    • allows separation of request and response
      • chat applications
      • stock tickers
    • all filters, servlets, and valves in the processing chain must support asynchronous processing
    • not as asynchronous as COMET
  • pluggability
    • purpose was to improve developer productivity–worry less about application configuration
    • annotations
    • web fragments
    • static resources in JARs
    • programmatic configuration options
    • pros
      • development can be faster
      • apps can be more modular
    • cons
      • fault diagnostics are significantly hampered
      • might end up enabling things you don't want or need
    • overall, I don't recommend using it for production
    • instead:
      • get tomcat to generate the equivalent web.xml
      • use the equivalent web.xml instead
    • can be frustrating to figure out what's going on when the application is doing things that aren't in web.xml
    • JARs can contain their own web.xml
    • allows JARs to be self-contained
    • JARs can also contain static resources
      • always used, cannot be excluded by fragment ordering
      • non-deterministic if there are duplicate reosurces in multiple JARs
  • annotations
    • servlets, filters, listeners
      • can be placed on any class
      • tomcat has to scan every class on application start
    • JARs scanned if included in fragment ordering
      • can exclude JARs from the scanning process; controlled in catalina.properties
    • security, file upload
      • placed on servlets
      • processed when class is loaded
    • file upload has almost–but not quite–the same API as Commons File Upload
      • don't have to ship commons file upload with your apps anymore
    • with annotations the configuration can become a lot more opaque
    • can turn all of this off in your main web.xml–turn off metadata complete
      • this is all or nothing–can't pick and choose what bits you want on or off
  • programmatic configuration
    • allows a subset of things you can do in we.xml
      • add servlets, filters, and listeners
      • change session tracking
      • configure session cookies
      • configure security
      • set initialization parameters
    • allows greater control / optional configuration
    • some environment-specific settings
    • can make troubleshooting difficult–no xml to refer to in order to see what's going on
    • main advantage is doing things like if/thens in your configuration which you can't do in web.xml
  • servlet 3.0 – session tracking
    • adds tracking via ssl session id
      • must be used on its own
    • allows selecting of supported tracking methods
      • url, cookie, ssl
    • url based tracking is viewed as a security risk
      • can't turn this off in servlet 2.2, but can turn it off in servlet 3.0
      • another release of tomcat 6 will likely allow this to be turned off
    • session id is cryptographically secure — can't be spoofed
  • servlet 3.0 – session cookies
    • can control default parameters for session cookies
      • name – may be overridden by tomcat
      • domain – may be overridden by tomcat
      • path – may be overridden by tomcat
      • maxage
      • comment
      • secure – may be overridden by tomcat
      • httponly – may be overridden by tomcat
  • servlet 3.0 – misc
    • httpOnly
      • not in any of the specs
      • however, widely supported
      • prevents scripts accessing the cookie content
      • provide a degree of xss protection
    • programmatic Login
      • useful when creating a new user account
      • can log the user in without redirecting them to the login page
      • allows the application to trigger a login
  • jsp 2.2
    • propery group changes
    • can specify default content type in jsp-config
    • can specify the buffer size for a page
    • new feature – error-on-undeclared-namespace
      • e.g. if you have a typo when using a tag library it fails silently
      • with error-on-undeclared-namespace turned on, error is thrown at compile time
    • jsp:attribute adds support for the omit attribute
  • ESL 2.2
    • now possible to invoke methods on a bean
    • correctly identifying the intended method is tricky
    • likely to be some differences between containers–spec if unclear on behavior
    • tomcat tries to do what the java compiler does
  • other tomcat 7 changes: management
    • add the ability to fix the remote jmx ports
      • previously jmx picked a port at random
    • single line log formatter
    • manager app can distinguish between primary, backup, and proxy sessions (for clusters)
    • aligned mbeans with reality (GSoC 2010)
    • general improvements to JMX support
      • can now have a server.xml with just a <Server …/> element and create a fully working Tomcat instance (Hosts, Contexts, etc. all via JMX)
        • can't save this config out but that's being worked on
  • performance
    • unlikely to see a big change
    • can limit the number of JSPs loaded at any one time
      • useful for development
    • not many areas where tomcat needs a big performance boost
  • security
    • generic CSRF protection
      • if you go to a site with malicious code, might trigger your browser to make a call to the tomcat manager to deploy an app that gives access to your machine
      • now the manager looks for a token that was passed from the previous response to the manager app and if the token doesn't exist, the request will fail
    • separate roles for manager and host manager apps
    • session fixation protection
      • changes session ID on authentication
    • enable the LockOutRealm by default (e.g. lock out user for 10 minutes after 5 failed login attempts)
    • enable an access log by default
    • added ability to disable exec command for SSI
  • code cleanup
    • use of generics throughout
    • removed deprecated and unused code
    • reduced duplication, particularly in the connectors
    • better definition of the lifecycle interface
    • added checkstyle to the build process
    • if you've written your own custom tomcat components, you might need to change them for tomcat 7
  • extensibility
    • added hooks for rfc66 – used by virgo
    • refectored to simplify geronimo integration
    • significantly simpler embedding
  • stability
    • builds on tomcat 6
    • tomcat 6 is already very stable
    • significant reductions in the open bug count
      • 6 open bugs without patches when i wrote this slide
      • for tomcat 5.5.x, 6.0.x, and 7.0.x combined
    • added unit tests
      • CI using BIO, NIO, and APR/native on every commit
    • memory leak detection and prevention
      • back-ported to tomcat 6
  • flexibility
    • copying of /META-INF/context.xml is now configurable — can control whether or not the expansion/copying of this file happens
    • alias support for contexts
      • map external content into a web application
      • keeps tomcat from deleting things in a symlink when the app is undeployed
    • shutdown address is now configurable
      • deliberately limited to localhost by default
    • tomcat equivalent of some httpd modules
      • mod_expires
      • mod_remoteIP
  • tomcat 7 status
    • passes servlet 3.0 TCK with every combination of connectors
    • passes jsp 2.2 TCK
    • passes EL 2.2 TCK
    • all with the security manager enabled
    • note that just because it passes the TCK doesn't necessarily mean it's fully compliant
    • 7.0.4 just released today
  • when will tomcat 7 be stable?
    • when three +1 votes come from committers
    • in practice the committers each have their own criteria
    • i'm looking for 2-3 releases with …
      • no major code changes that might cause regressions
      • tcks all pass (already have this)
      • no major bugs reported
      • good levels of adoption (already have this)
  • tomcat 7 plans
    • one release every month
      • bug 49884 put a spanner in the works
    • stable by the end of the year?
    • keep on top of the open bugs
    • work on bringing the open enhancement requests down
    • if all goes well, 7.0.6 will be the stable release
    • jsr 196 implementation?
      • authentication SPI for containers
      • geronimo has most (all?) of this already
    • windows authentication
      • looking unlikely — too much baggage
        • needs some native libraries for it to work well
      • waffle project already does this
    • simpler jndi configuration for shared resources
      • no more <ResourceLink … />
    • more jmx improvements
    • further improvements to memory leak protection
    • continue migration from valves to filters
    • java ee 6 web profile
      • no interest so far from user community
      • had more questions from journalists than users
      • no plans at present
      • adds a lot of baggage that isn't that useful
      • if you want a web profile implementation, there's geronimo
  • useful resources
  • new feature — rolling update/side-by-side deployment
    • can deploy a new version while the app is running and when a user's session expires, they hit the new version of the app
    • came out of a tc server requirement but made more sense to implement it in Tomcat
    • springsource providing patch to ASF and will be part of a future tomcat release
    • deploy a new WAR with the same name as an existing app, but add ##N at the end of the war file name where N is the version (e.g. myapp##1.war will be a new version of myapp.war)
      • context path is retained, meaning context path is the same for both versions of the app
    • feature that will be added is when no more sessions are active on the old version it will be automatically undeployed

Advanced GORM: Performance, Customization, and Monitoring #s2gx

Speaker: Burt Beckwith, SpringSource

Overview

  • demo of potential performance issues with mapped collections in GORM
  • using the hibernate 2nd-level cache
  • monitoring and managing 2nd-level caches
  • app info plugin

Standard Grails One-to-Many

  • library has many visits
  • visit class has person name and date, with backreference to library

What's the problem?

  • hasMany = [visits:Visit] creates a set
  • sets guarantee uniqueness
  • adding to the set required loading all instances from the database to guarantee uniqueness, even if you know the item is unique
  • likewise for a mapped list–lists don't guarantee uniqueness, but they do guarantee order, so they still have to pull all records from the db to get the order right
  • you get a false sense of security since it's lazy-loaded; only partially helpful
  • works fine in development when you only have a few visits, but imagine when you deploy to production and you have 1,000,000 visits and want to add one more
  • risk of artificial optimistic locking exceptions; altering a mapped collection bumps the version, so simultaneous visit creations can break but shouldn't

What's the Solution?

  • don't use collections
  • instead of visit belonging to a library, visit HAS a library
  • different syntax for persisting a visit
  • no cascading; to delete a library you need to delete its visits first in a transactional service method
  • you also lose your collection so you can't do library.visits.size(), etc. but you can still use dynamic finders,which is better anyway since you're only pulling what you need

Standard Grails Many-to-Many

  • user has many roles, role has many users
  • problem is that if all new users are granted a particular role, you get into scaling issues quickly
  • with many to many you have an intermediate table with pointers to both tables and can map the join table
  • the belongsTo in a many to many can go in either class since it's bidirectional
    • but this is the problem since the same amount of data will get loaded either way
  • more efficient to treat kind of like one-to-many and create the user, then grant the role
    • this way you're adding/deleting single records in a single table due to existence of a domain class describing the relationship
  • important to have well-defined equals() and hashCode() methods in your domain classes, as well as implement serializable so you can use second level caching
  • wind up with user.addToRole() or role.addToUsers()
  • no cascading like before–have to manage this yourself

So never used mapped collections?

  • no, you need to examine each case
  • standard approach is fine if the collections are reasonably small–for both sides in the case of many to many
  • the collections will contain proxies, so they're smaller than real isntances until initialized, but still a memory concern
  • great example of something that's convenient and easy out of the box, but when it becomes a problem, you just do it a different way

Using Hibernate 2nd-level Caching

  • great, but have to  be careful because it can burn you in the same way that a query cache in a database can bite you
  • great candidates for caching–anything that's read only and doesn't change often
  • can overuse cache to the point where you're spending more cycles flushing and aren't saving yourself any db traffic–can actually make things worse than just hitting the db

Caching Usage Notes

  • 1st level cache is the hibernate session itself
  • get is always cached
  • can significantly reduce db load by keeping instances in memory
  • can be distributed between multiple servers to let one instance load from the db and share updated instances, avoiding extra db trips
  • "cache true" creates a read/write cache, best for read-mostly object since frequently updated objects will result in excessive cache invalidation (and network traffic when distributed)
  • "cache usage: 'read only'" creates a read-only cache, best for lookup data (countries, zip codes, etc.) that never change

Query Cache

  • can set cacheable true on all the query options; this caches the query and you can grab class instances from this

Hibernate query cache considered harmful?

  • most queries are not good candidates for caching; must be same query with same parameters
  • updates to domain classes will pessimistically flush all potentially affected cache results
  • DomainClass.list() is a decent candidate if there aren't any (or many) updates and the total number isn't huge
  • great blog post by alex miller at http://tech.puredanger.com/2009/07/10/hibernate-query-cache

2nd Level Cache API

  • evict one instance: sessionFactory.evict(DomainClass, id)
  • can get stats (hits/misses, etc.)
  • can look at the stats and get a good sense of whether or not you're caching effectively
  • e.g. if miss count is high then your cache strategy isn't effective
  • appinfo plugin gives you tons of information about what's going on in your app, what's going on in hibernate, etc.

Groovy Web Services, Part I: REST #s2gx

Speaker: Ken Kousen – Kousen IT, Inc.

  • currently working on book for Manning: "Java and Groovy: The Sweet Spots"
  • "Java is really good for tools and infrastructure. Groovy is good for pretty much everything else."

Two Flavors of Web Services

  • SOAP based
    • SOAP wrapper for payload
    • much like an XML API on a system
    • makes header elements available
    • lots of automatic code generation–tools are very mature
      • stubs, skeletons, proxies, etc. are all written for you
      • sprinkle annotations into your codebase to get a web service out of it
  • REST based
    • everything is an addressable resource (uri)
    • leverage http transport
    • uniform interface (get, post, put, delete)
      • a lot of rest web services available today aren't truly restful–e.g. amazon web services where you have to specify the operation name
      • just invoking methods over the web using URLs, but the request type doesn't matter
  • REST -> Representational State Transfer
    • term coined by Roy Fielding in PhD thesis (2000), "Architectural Styles and Design of Network-Based Software Architectures" (available in HTML online; very readable and thought-provoking)
    • one of the founders of the Apache Software Foundation
    • on the team that came up with the spec for HTTP
  • Idempotent
    • repeated requests give the same result
    • get, put, and delete requests are meant to be idempotent
    • POST is not meant to be idempotent
    • e.g. on put–if you're supposed to get the same result every time, the assumption is that you'd know the uri you're applying this to every time
      • the ID will be in the request somehow
      • assumption here is that you know what the ID is going to be before you do the insert
      • having the ID ahead of time probably isn't going to happen in the real world
    • most people wind up doing POST for inserts and PUT for updates
    • "safe" is another term that comes into play–requests do not change the state of the server
      • GET requests should never change the state of the server
  • Content negotiation
    • each resource can have multiple representaitons
      • e.g. xml or json?
    • request will state which representation it wants
  • RESTful client
    1. asemble url with query string
    2. select the http request verb
    3. select the content type
    4. transmit request
    5. process results
  • Steps for rest above are more cumbersome than SOAP since SOAP does a lot of this for you. With REST you have to deal with all of this yourself.
    • where groovy comes into play, is that it's augmented a lot of these classes to make this easier
    • groovy also shines when you get the results back

EXAMPLE: Google Geocoder

  • The Google Geocoding API — converts address info for latitude, longitude
  • Base URL is http://maps.googleapis.com/maps/api/geocode/output?parameters
    • output is XML or JSON
    • parameters include
      • url encoded physical address
      • sensor = true | false (indicates whether or not the request is coming from a GPS-enabled device)
  • Groovy makes it easy to assemble a query string
    • map of parameters, then collect and join, then deal with the xml


def url = base + [sensor:false, address:[loc.street, loc.city, loc.state].collect { v -> URLEncoder.encode(v, 'UTF-8') }.join(',+')].collect {k,v -> "$k=$v"}.join('&')

def response = new XmlSlurper().parse(url)

  • simple read-only web services are a natural fit for REST, even if a lot of other web services in your organization are SOAP
  • get, post, put, and delete also map very naturally to sql verbs, so simple data management apps are natural for rest
  • content negotiation based on URL, e.g. http://…/xml for xml, http://…/json for json

Twitter API

  • closer to true REST
  • in the middle of changing their API again
  • have been supporting basic authentication with base64 encoding for years, which isn't encrypted
  • twitter API is http://dev.twitter.com/doc
  • twitter is oauth only now

HTTP Clients

  • simple http client in groovy — java.net.URL class
  • get is easy, just access the url
  • def xml = new URL(base).text — returns the text of the entire page
  • post, put, and delete in groovy are similar to java

Building a RESTful Service

  • design your URL strategy
    • what are URLs going to mean, what are http verbs going to mean
  • sample url strategy
    • http://…/songs
      • get returns all songs
      • post adds a new song
      • put, delete not used on this url
    • http://…/songs/id
      • get — return a specific song
      • post — not used
      • put — update song with given id
      • delete — remove song with given id

Groovlet

  • groovy servlet — groovy class that responds to an http request
  • easy to configure
  • provides access to the http methods
  • markup builder called html
  • set up the groovy servlet in web.xml
  • showing code for the various operations in the groovlet
  • main tools we have in groovy for restful web services are the xml slurper and markup builder–makes all this very simple

What does Java bring to the table?

  • one of groovy's principles is to not reinvent stuff that's available in java
  • inevitable that there's a performance penalty for using groovy
  • nice thing is you can mix and match groovy with java and use what's appropriate for individual portions of the application
  • JSR311 – JAX-RS
  • currently JAX-WS is part of Java EE 5 but also part of Java SE 1.6
  • specification for JAX-RS — part of Java EE 6

Libraries for HTTP

  • apache http client
  • groovy http builder
  • rest client available in the groovy http builder
  • typical for groovy: take a java library and make it easier/build on it

Grails

  • grails apps make REST easy
  • can set up mappings to map urls to controllers/methods — this is built in
  • xml marhsalling is dead simple in grails — just use "render foo as XML"
    • works great for xml that doesn't go too deep
    • can always use the markup builder

Downside to REST

  • No WSDL, therefore no proxy generation tools
    • all hand-written code for the plumbing
  • WADL: Web Application Description Language
    • attempt to create WSDL for REST
    • slowly gaining acceptance but not common yet

JAX-RS

  • attempt to do for REST what JAX-WS does for SOAP
  • Jersey — reference implementation for JAX-RS
  • annotation based

Conclusions

  • groovy makes it very easy to build URLs with query strings, invoke urls and get responses, and parse xml response
  • groovy works with JAX-RS reference implementation

Clustering and Load Balancing With tc Server and ERS httpd #s2gx

Mark Thomas – SpringSource

  • Tomcat committer
  • tc Server developer
  • responsible for keeping tc Server and Tomcat in sync
    • memory leak detection in tomcat manager app
    • recent logging improvements
    • simplifying jmx access
    • all of the above started in tc Server, but have been contributed back and implemented these features in tomcat
    • don't want to get into having a significant fork of tomcat

Typical Architectures

  • load balancer (round robin) -> httpd (sticky sessions) -> tc Server (clustered)
    • don't go anywhere near tc Server clustering unless you absolutely have to–adds complexity and overhead
    • only thing tc Server clustering gives you is the ability for users not to lose sessions if an instance of tomcat goes down
    • ask yourself how big of a deal it is if your users lose their sessions when an outage occurs–if it's a big deal then you may need clustering

Starting Point

  • ubuntu 8.04.4 64-bit VM
  • vmware tools installed
  • 64-bit sun jdk 1.6.0_21
  • will be installing tc Server, Hyperic, etc. on this clean image

tc Server Installation

  • don't run tc Server as root
  • create a tcserver user
    • owns the tc Server files
    • runs the tc Server processes
  • install to /usr/local/tcserver

Instance Naming and Port Numbering

  • think about this in advance–may wind up with 100s of instances
  • tc01, tc02, etc. as the instance name, then follow this for ports
  • example scheme for ports
    • 1NN80 – http
    • 1NN43 – https
    • 1NN09 – ajp
    • 1NN05 – shutdown (if used)
    • 1NN69 – jmx
  • server and jvmRoute naming–consider linking server name to IP address, e.g. srvXXX-tcYY where XXX is the end of the IP address, YY is the tomcat instance number
    • 1NN20 – cluster communication

DEMO: Installing tc Server

  • tc Server version names are e.g. apache-tomcat-6.0.29.A.RELEASE where the first part is the version of Tomcat, the "A" means it's the first release of tc Server based on that tomcat release
  • if shutdown port is disabled, doing a kill -15 does a graceful shutdown. kill -9 works too and tomcat won't care, though your application might, so only do -9 if you have to
  • created two instances of tc Server using the tc Server create instance script
  • tc Server comes with templates for startup scripts–copy these over to /etc/init.d and edit as needed
  • paramterize cluster addresses and ports in a catalina properties file
  • can use ${…} notation in server.xml to hit the properties in catalina.properties

Creating a Cluster

  • switching to static node membership
    • cumbersome for large clusters
    • remove the <Membership …/> element
    • need to add a bunch of config stuff after the <Interceptor …/> elements
  • easier to use dynamic node discovery
  • backup strategies — tomcat gives you DeltaManager and BackupManager
    • delta manager is simplest–replicates every session to every node in the cluster
    • if your sessions use a lot of memory, delta manager doesn't give you much scalability
    • if your limitation is CPU, delta manager gives you some scalability
    • amount of network traffic on delta manager increases with the square of the number of nodes–not terribly scalable
  • backup manager
    • replicates session data to one other node in the cluster
    • send options: synchronous vs. asynchronous
      • in synchronous, writes session changes to other nodes, waits for acknowledgement, and then sends response to the user. can mean a lag for the user.
      • asynchronous — changes to sessions are put on a queue and the user gets the response immediately. means there's a chance that the cluster will be in an inconsistent state. use of sticky sessions means the consistency of the cluster doesn't really matter.
      • because java thread running isn't deterministic, in asynchronous mode the session updates may not be processed in the same order in which they were placed on the queue, so if your application depends on these being processed in the same order this is a risk
    • no need for the WAR farm deployer — hyperic does this better
      • WAR farm deployer has been removed from tc Server
    • backup manager DOES know where the primary and backup nodes ARE for every session
      • i.e. it doesn't actually store all the sessions from all nodes, but it knows where to get the session it lost
    • backup manager scales much better than delta manager in both memory and network traffic
      • network traffic scales linearly with number of nodes
  • for availability on a small cluster, use the delta manager
  • if you're worried about scalability, go with the backup manager

Hyperic HQ Installation

  • create an hqs user
  • hqs user owns the hyperic hq agent files
  • the agent itself runs as the tcserver user
  • os security considerations
    • agent doesn't need root privileges to access OS mechanics, start/stop processes, etc.
    • tc Server needs to be able to read WAR files uploaded via the agent
    • don't want tc Server runtime running as root
  • hyperic security considerations
    • don't want agent connecting as hqadmin super user
    • create a dedicated agent user
    • requires create, modify, and delete privileges for platform and platform services only

ERS httpd

  • ERS = Enterprise Ready Server
  • SpringSource's distribution of Apache httpd
  • install ERS as root
    • httpd processes run as nobody:nobody so this is fine
  • remove the test instance
  • create a new instance
  • module configuration
    • enable mod_proxy_balancer
    • enable mod_proxy_ajp
    • mod_proxy_ajp isn't quite as stable vs. mod_jk and mod_proxy_http
    • mentioned something about mod_http now having remote IP addresses available–need to ask about this
  • configure balancer in ers


<Proxy balancer://tc>
  BalancerMember http://ip.address.here:port route=tc01-uniqueID
  BalancerMember http://ip.address.here:port route=tc02-uniqueID
</Proxy>

ProxyPass /cluster-test balancer://tc/cluster-test stickysession=JSESSIONID:jessionid
ProxyPassReverse /cluster-test balancer://tc/cluster-test

Debugging Clusters

  • need something in your apps that tells you which cluster node you're on
  • also need something to spit out the session ID so you can test that the sticky sessions are working
  • if your context path differs from your host name in tc Server, this may cause your cookies not to work since the hosts are different
    • can use cookiepath in proxypassreverse directive
    • easier: just have your context path match your host name
  • anything you want replicated in sessions has to be serializable
    • if your application can't support having everything in the session be serializable, terracotta will support non-serializable data in session replication

SpringOne 2GX Kicks Off with Spring 3.0 release | Javalobby

The release of the Spring 3.0 and a free developer edition of the SpringSource Tomcat Server will be announced today at the SpringOne 2GX conference
The new version of Spring’s Java framework will be fully Java 5 based
and have early support for Java EE 6.  The SpringSource tc Server
developer edition will feature a dashboard with real-time Spring
application metrics.

Heading out to the conference tomorrow–can’t wait!