Once upon a time, I was talking to SmugMug about working with Boston.com on a photo project. It didn’t end up happening, but I was very impressed at how much they had their act together and how they were actually making money. In 2004, this seemed a foreign concept to the other companies I contacted.
I’ve been following them ever since, especially as they’ve become a poster child for Amazon Web Services (AWS), the cloud-computing system I’m using for Serendeputy. Don MacAskill, the CEO, just wrote an outstanding piece explaining how SmugMug is using AWS to handle all their photo processing.
SkyNet [SmugMug’s main controller] is completely autonomous - it operates with zero human interaction, either watching or providing interactive guidance. No-one at SmugMug even pays attention to it anymore (and we haven’t for many months) since it operates so efficiently. (Yes, I realize that means it’s probably well on its way to world domination. Sorry in advance to everyone killed in the forthcoming man-machine war.)
Roughly once per minute, SkyNet makes an EC2 decision: launch instance(s), terminate instance(s), or sleep. It has a lot of inputs - it checks anywhere from 30-50 pieces of data to make an informed decision. One of the reasons for that is we have a variety of different jobs coming in, some of which (uploads) are semi-predictable. We know that lots of uploads come in every Sunday evening, for example, so we can begin our prediction model there. Other jobs, though, such as watermarking an entire gallery of 10,000 photos with a single click, aren’t predictable in a useful way, and we can only respond once the load hits the queue.
I’m architecting my systems in a similar way, trying to build everything out so that it’s as decoupled and asynchronous as possible. If I can fire up only the machines I want and only when I need them, then I can bootstrap the organization far longer than I could if I had to buy the equivalent physical machines. The experiments and prototypes I’m working on would be prohibitively expensive without AWS.
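To make the launch/terminate/sleep idea concrete, here is a minimal sketch in Python. It is not SmugMug's code, and it is far simpler than a controller that weighs 30-50 inputs: the `Snapshot` fields, the `decide` function, and the per-instance capacity figure are all assumptions invented for illustration. It only shows the shape of the decision: combine what is already queued with what a forecast expects, work out how many instances that implies, and compare with what is running.

```python
import math
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical inputs for one decision; a real controller would check many more."""
    queued_jobs: int        # jobs waiting in the queue right now
    predicted_jobs: int     # jobs a forecast expects soon (e.g. a Sunday-evening upload spike)
    running_instances: int  # instances currently up
    jobs_per_instance: int  # assumed number of jobs one instance can absorb per cycle


def decide(snapshot: Snapshot) -> tuple[str, int]:
    """Return ("launch", n), ("terminate", n), or ("sleep", 0)."""
    demand = snapshot.queued_jobs + snapshot.predicted_jobs
    needed = math.ceil(demand / snapshot.jobs_per_instance)
    delta = needed - snapshot.running_instances
    if delta > 0:
        return ("launch", delta)
    if delta < 0:
        return ("terminate", -delta)
    return ("sleep", 0)
```

With these made-up numbers, a sudden 10,000-photo watermarking job arriving while 4 instances are running, at an assumed 500 jobs per instance, would yield `("launch", 16)`. A production controller would presumably also account for instance startup time and billing granularity, which this sketch ignores.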
Although SmugMug isn’t using it, I’m using SQS to manage all the communication between these instances. Keeping it all in one system reduces my headaches. Now I just need Amazon to incorporate CouchDB and I’ll really be able to roll. SimpleDB is a start, but it doesn’t really meet my needs.
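The decoupling a queue buys can be sketched in a few lines of Python. This uses the standard library's in-memory `queue.Queue` as a stand-in for SQS so that it runs without AWS credentials; the function names and the message shape are hypothetical, and a real setup would swap the `put` and `get_nowait` calls for the SQS send and receive operations.

```python
import queue

# In-memory stand-in for an SQS queue (assumption: one queue shared by all instances).
jobs = queue.Queue()


def enqueue_job(kind: str, payload: str) -> None:
    """Producer side: drop a message on the queue and return immediately."""
    jobs.put({"kind": kind, "payload": payload})


def drain(handler) -> int:
    """Worker side: process whatever is queued, then stop. Returns the count handled."""
    handled = 0
    while True:
        try:
            message = jobs.get_nowait()
        except queue.Empty:
            return handled
        handler(message)
        handled += 1
```

The producer never waits on the worker and neither needs to know where the other runs, which is what lets worker machines be started only when there is something in the queue.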
I look forward to hearing more from Mr. MacAskill and SmugMug as they continue to innovate with AWS — especially if I can pick up more architecture hints…