Archive for the 'Serendeputy' Category

Whatcha doin’ this weekend? Sharding the cabinet.

Friday, January 29th, 2010

Sharding the cabinet sounds like the best euphemism ever.

Sadly, my life is not that exciting. I’m taking down Serendeputy’s main set of data stores (a series of Tokyo Cabinets) and re-sharding them to make them more memory-efficient and a bit faster. Customer growth is getting a little ahead of the architecture, so I’m doing some tweaking. (A nice problem to have, but still a bit of a problem.)
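For the curious, here is a minimal sketch of what routing a key to a shard can look like. It is not the actual Serendeputy code: the `ShardRouter` class, the CRC32 hash, the shard count and the file naming are all assumptions made for illustration.

```ruby
require "zlib"

# Hypothetical helper: maps a record key to one of N shard files.
# CRC32 is used only because it gives the same answer in every process;
# it is an assumption, not Serendeputy's actual scheme.
class ShardRouter
  def initialize(shard_count, prefix: "profiles")
    raise ArgumentError, "shard_count must be positive" unless shard_count > 0

    @shard_count = shard_count
    @prefix = prefix
  end

  # Index of the shard that owns this key.
  def shard_index(key)
    Zlib.crc32(key.to_s) % @shard_count
  end

  # File name of the shard that owns this key (hypothetical naming scheme).
  def shard_file(key)
    format("%s-%02d.tch", @prefix, shard_index(key))
  end
end

router = ShardRouter.new(8)
puts router.shard_file("user:42")
```

With plain modulo routing like this, changing the shard count moves most keys to a different shard, which is one reason a re-shard tends to mean a maintenance window.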

Things might get a little wonky this weekend as I’m making this transition, but I’ll try to keep the outage short.

Make everything as simple as possible but no simpler

Tuesday, January 26th, 2010

After living with Serendeputy for the past year and a half, I’ve been able to reduce all the massive complexity of the application into just three core concepts: the gesture, the profile and the list. By making all the interactions on the site and through the API go through these three primitives, I’ve been able to solve the scalability issues and most of the performance issues inherent in mass personalization. Now, it’s like Legos. Shiny, geeky Legos.
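As a rough illustration of how three primitives like these might fit together, here is a small sketch. Everything in it is hypothetical: the `Gesture` and `Profile` types, the `build_list` method, the additive weighting and the sample data are stand-ins, not the real implementation.

```ruby
# All three types are hypothetical stand-ins for the primitives named above.
Gesture = Struct.new(:user_id, :topic, :weight)

class Profile
  attr_reader :weights

  def initialize
    @weights = Hash.new(0.0)
  end

  # Fold one gesture into the running per-topic weights.
  def record(gesture)
    @weights[gesture.topic] += gesture.weight
    self
  end
end

# A "list" here is just articles ranked by how well their topics match a profile.
def build_list(profile, articles, limit: 10)
  articles
    .map { |article| [article, article[:topics].sum { |topic| profile.weights[topic] }] }
    .select { |_, score| score > 0 }
    .sort_by { |_, score| -score }
    .first(limit)
    .map(&:first)
end

profile = Profile.new
profile.record(Gesture.new(1, "Ruby", 1.0)).record(Gesture.new(1, "Red Sox", 2.0))

articles = [
  { title: "A", topics: ["Ruby"] },
  { title: "B", topics: ["Red Sox"] },
  { title: "C", topics: ["Fashion"] }
]
puts build_list(profile, articles).map { |article| article[:title] }.join(", ")
```

The point of the sketch is only that every interaction reduces to the same three moves: record a gesture, update a profile, produce a list.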

Thank goodness Ruby and Tokyo Cabinet are so flexible. If I’d been doing this using the tools of five years ago, I would not have been able to pivot this smoothly.

Look for a couple of fun announcements in the next [real soon now] weeks.

(And yes, Tokyo Cabinet is probably one of the geekiest topics I have up on the site, right up there with Machine Learning.)

Serendeputy state of the union

Thursday, November 19th, 2009

The state of the union is strong.

I’m pretty excited. I’ve been heads-down the past few weeks working towards the official 1.0 release of the Serendeputy application. Probably a few more weeks to go.

Serendeputy is really four applications. Each of them is in pretty good shape.

The librarian understands the world. Its role is to read all the news sites, parse them and figure out what they’re talking about and what’s relevant right now. This has been running stably for eighteen months now, but I’m most excited about something I put in a couple of months ago — the sidecar.

The sidecar allows me to attach snippets (or pages) of code to any element in the librarian system, be it a source, an author, a feed, a tree or a topic. This lets me do little things like fixing the author tags on the New York Times articles, and bigger things like programmatically disambiguating the different meanings of “windows.” This sidecar functionality is really letting me fine-tune the librarian, which makes the overall results appear much more on-target for all the users. The machine learning pieces of the librarian are tremendously helpful, but nothing’s better than being able to hand-tune when I need to perfect one particular thing.

Hooray for Ruby Metaprogramming.
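One way to picture the sidecar idea is a registry of Ruby blocks keyed by element. This is a sketch under assumptions, not the real sidecar: the `Sidecar` class, its `attach` and `apply` methods, and the author-cleanup rule are all invented for illustration.

```ruby
# Hypothetical registry: attaches blocks of code to a (kind, name) pair
# and runs them against a record.
class Sidecar
  def initialize
    @hooks = Hash.new { |hash, key| hash[key] = [] }
  end

  # Attach a snippet to an element such as [:source, "nytimes"].
  def attach(kind, name, &snippet)
    raise ArgumentError, "a block is required" unless snippet

    @hooks[[kind, name]] << snippet
    self
  end

  # Run every snippet attached to the element, in order,
  # passing each one the result of the previous one.
  def apply(kind, name, record)
    @hooks.fetch([kind, name], []).reduce(record) do |current, snippet|
      snippet.call(current)
    end
  end
end

sidecar = Sidecar.new
sidecar.attach(:source, "nytimes") do |article|
  # Made-up cleanup rule: strip a leading "By " from the author field.
  article.merge(author: article[:author].sub(/\ABy\s+/i, ""))
end

fixed = sidecar.apply(:source, "nytimes", { author: "By Jane Doe", title: "Example" })
puts fixed[:author] # prints "Jane Doe"
```

Elements with nothing attached pass through unchanged, so hand-tuning one source does not touch any other.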

The deputy understands you. The deputy is the real personalization engine that takes all the gestures you give the system and builds them into a profile. It then reads all the librarian’s indexes to find just the right articles for you. The core deputy technology has been in place for more than a year, so I’ve mostly been fine-tuning it.

I’ve been mostly focusing on timeliness boosts and sinks. If you read something on Bill Belichick, the deputy will interpret that as an interest for you and build it into your master profile. The key knob I’m twiddling now is how much should I boost it based on the fact that you’re reading it now? How long should this boost last? Should this end up swamping some of your longer-term but not recent interests? I’m having a fun time modeling all the different ways of doing this, but I’m still working on finding a reasonably optimal configuration.
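To make those questions concrete, here is one possible model among many: a recency boost that decays exponentially on top of a long-term weight. The exponential shape, the `InterestScore` class and the default values for `boost` and `half_life_hours` are assumptions for illustration, not the configuration the deputy uses.

```ruby
# Hypothetical scoring: long-term weight plus a boost that halves every
# `half_life_hours`. Both knobs are illustrative values.
class InterestScore
  def initialize(boost: 2.0, half_life_hours: 48.0)
    @boost = boost
    @half_life_hours = half_life_hours
  end

  # How much extra weight a topic gets, given hours since it was last read.
  def recency_boost(hours_since_read)
    @boost * (0.5**(hours_since_read / @half_life_hours))
  end

  # Combined score: stable long-term interest plus the fading boost.
  def score(long_term_weight, hours_since_read)
    long_term_weight + recency_boost(hours_since_read)
  end
end

scorer = InterestScore.new
[0, 48, 96, 240].each do |hours|
  puts format("%3d hours: %.3f", hours, scorer.score(1.0, hours))
end
```

In this model, `boost` sets how much a fresh read matters, and `half_life_hours` sets how long it lingers before long-term interests take over again.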

The web application (at Serendeputy.com) is getting tweaked substantially over the next few weeks. I have a few interface things I’ve wanted to get at, and I’m bundling them into a big release. (Firefox + jQuery + way too many DOM nodes = sluggishness). I’m also building out the site-wide meta lists (most popular for everyone, etc.) and building out the interface for the tree navigation of the sources and topics. Ideally, these improvements will make it easier for people to find what they’re looking for.

The API application is almost done. This is the functionality that helps publishers take advantage of Serendeputy’s personalization engine on their own site. I’m working with several alpha/beta customers right now to get the right balance of functionality. It’s a lot of fun talking to folks back in the industry, and I’m glad that I have a decent amount of cred from my time at the Boston Globe, Abuzz and Amazon.

This will be one of the core revenue drivers of the site, so I’m excited to see it coming together. Revenue is helpful.

So, lots to do, but I think I’m making good (and fast) progress. Never underestimate the ability to understand the entire system. And make decisions quickly. Not a lot of bureaucracy here at Serendeputy world headquarters.

Mostly, I’m very excited that this thing I’ve built is starting to really resemble what I had in my head.

If you have any comments or suggestions, please drop me a line!

Still Alive!

Monday, November 9th, 2009

I’m polishing up the 0.9 release, so I’ve been a little remiss on the blog posting. Alas.

I’ll have a state of the union post up this week, and I have a couple of longer think pieces in the queue. We’ll see if I can get them up by the end of the month.

I’m better at building than writing, but I’ll try to be better at both.

Serendeputy public beta launch

Wednesday, July 15th, 2009

I pulled all the private-beta hacks out of the system and launched the public beta of Serendeputy today. Hooray. It has shipped. No longer vaporware. A bit of relief.

Six months from idea to prototype. Six months from prototype to product. Now, hopefully six months from product to business. Or else, six months from product to get a damn job, already. I can’t wait to see how it all works out.

Now, it’s on to making it better every single day. I have a whiteboard full of things to improve and build. I can’t wait to get to them.

Quote of the day:

“If you review your first site version and don’t feel embarrassment, you spent too much time on it.” – Reid Hoffman

Serendeputy should be pretty solid, but please make sure to give me your feedback. I need to make it much, much better than it is now.

Built out the vocabulary engine

Wednesday, June 24th, 2009

Today has been one of my more exciting days building. I finally finished up my vocabulary engine.

The vocabulary engine lets me fine-tune my classification engine topic-by-topic and source-by-source. This will allow me to do some pretty sophisticated disambiguation, and I hope that it will make document classification all the more effective.

This is still a hand-defined engine, but I also wrote in the hooks for the machine-learning piece. That’s still a ways away, though.

I’ve been an IA geek for a little over fifteen years at this point. Having the system of my dreams is pretty cool. It’s pretty rewarding to have the system you’ve had in your head for a long time actually exist in the real world. I’m not quite a sculptor, but I have to imagine the feeling is the same.

Getting closer

Tuesday, June 16th, 2009

We’re getting ever closer. The placeholder page is up. Drop me a note if you want to play with the private beta. The public launch is imminent-ish.

Part of me wants to hold off on inviting people. Serendeputy’s getting better every day (literally, as I’m spending most of the day in Emacs tweaking things), so the longer people hold off on trying it, the better it will be. But, I need to put it out there at some point. Might as well make it soon.

My favorite quote of the day is from Reid Hoffman, the founder of LinkedIn, among other companies.

“If you review your first site version and don’t feel embarrassment, you spent too much time on it.”

Reid Hoffman, as quoted in Mark Goldenson’s 10 lessons from a failed startup, a post-mortem of what PlayCafe’s founders did right and wrong.

My current favorite topics list

Tuesday, May 26th, 2009

Why not. I just did a list of my popular sources. Why not the analogous list of the top topics I’m reading about…

  • Boston Red Sox
  • Personalization
  • Parenting
  • Gardening
  • New England Patriots
  • Advertising
  • SEO
  • Boston Globe
  • Newspaper Industry
  • Joss Whedon
  • Supreme Court
  • Terminator
  • Revenue
  • Lisp
  • Lost
  • Ruby
  • Real Estate
  • New York Times
  • Weight Loss
  • Journalism
  • Awesome
  • Fashion
  • Artificial Intelligence
  • Scams
  • Bankruptcy

Not sure if there are any patterns there, but it’s reasonably reflective of what interests me at the moment. (Bankruptcy is strictly about the automobile industry; Serendeputy the company is doing just fine.)

My current favorite sources list

Tuesday, May 26th, 2009

At the bottom of every page of Serendeputy is a list of your most popular sources and topics. It only puts out ones with articles you haven’t read yet, so it’s a little thinner than it would be if I’d been away from the site for a couple of days, but here’s my current most popular sources list

I’m into the double-digits with betatesters right now. I’m looking forward to opening it up further in the next few days. See you then!

Where am I?

Thursday, April 9th, 2009

Ahh, spring is here. The Red Sox have opened up, the tulips are coming out of the garden, and I’m making good progress on Serendeputy. We (believe it or not) are getting close.

I’ve built the servers out, and they’re up and running now, burning in and breaking in fun new ways. I knew nothing about systems administration, so it’s been a bit of a journey from bare Linux installs to fully-functioning (and even reasonably snappy) servers. My librarian application has been running for a couple of months, the deputy and memcached servers for a couple of weeks. I’m working on the Rails application now, and futzing around with jQuery. I’m on version 0.5 of the application, up to check-in 913 in Subversion, and up to Bug 171 in FogBugz.

I’m going to launch a private-invite beta starting with version 0.7. 0.8 will be the public beta, and I hope to be at 1.0 within a few months. After that, the world.

Thanks for keeping in touch. I’ll write more soon. (Especially about the imploding newspaper industry. I’m not an insider anymore, but I still know how things work. It’s as if the NAA has turned into a giant suicide pact.)

Getting through the Dip

Monday, February 23rd, 2009

So, right now I’m in the middle of a serious re-write of my librarian application (the piece that talks to the rest of the world). It’s moving in the right direction, and it will ensure that the whole building won’t fall over on the first day, but it’s been a horrible slog.

I’ve decided to look at it this way, though: I’m building distance between me and potential competitors. I’ve been deeply involved in this enough to know that it’s something that a YCombinator kid can’t clone in a weekend. (Or so I hope.)

1000.times do
  puts "It's important to keep going through the dip."
end

I also need to re-read my review of The Dip.

Moving closer to alpha

Tuesday, January 27th, 2009

We’re getting closer!

I’ve learned a ton from the private alpha I’ve been running for the past month or so. Now, it’s time to take those learnings and incorporate them in the product. I also need to go through and make it a bit more robust. There’s a lot of baling wire and duct tape keeping it together right now.

I should be able to get to the public alpha in the next few weeks. Woo Hoo!

Just need to make sure the baby doesn’t get sick any more…

Serendeputy on Twitter

Monday, January 5th, 2009

I am, as you might predict, @serendeputy on twitter. If you’re interested in this little project, please give it a follow.

As I’m going into alpha, I want to make sure everyone can get ahold of me. If it works best to hold conversations in public, I’m more than happy to.

Not a good sign

Wednesday, November 19th, 2008

For Serendeputy, I’ve built out a series of automated checks that tell me if something seems amiss with the data. One of these monitors alerts me to sites that haven’t been updated in a while so that I can check them out, and either update them or remove them from the catalog. I caught my own site (this particular blog) this morning. That’s not a good sign. I should probably be writing more here.
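A staleness check of this kind can be quite small. The sketch below assumes a hypothetical `Site` record with a `last_updated_at` timestamp and a 30-day threshold; the real monitors and their thresholds are not described here.

```ruby
# Hypothetical record for a catalogued site.
Site = Struct.new(:name, :last_updated_at)

# Returns the sites whose latest update is older than max_age_days.
def stale_sites(sites, now:, max_age_days: 30)
  cutoff = now - (max_age_days * 24 * 60 * 60)
  sites.select { |site| site.last_updated_at < cutoff }
end

now = Time.utc(2008, 11, 19)
sites = [
  Site.new("fresh-example", Time.utc(2008, 11, 18)),
  Site.new("quiet-example", Time.utc(2008, 9, 15))
]

stale_sites(sites, now: now).each { |site| puts "Stale: #{site.name}" }
# prints "Stale: quiet-example"
```

Passing `now` in as an argument keeps the check easy to test against fixed dates.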

I’ve been heads-down on Serendeputy for the past few weeks. I have a couple of major demos in the first week of December, so I’m probably going to be pretty quiet for the next couple of weeks, too. Lots to talk about after that, though…

Digg coming to my turf

Monday, October 20th, 2008

Greg Linden talks about how Digg may be working on personalized news. This is entirely expected. Every infomediary, aggregator and publisher needs to be working on technology like this if they want to survive. By 2011, every web user will demand that all websites be intensely personalized. Like Serendeputy.

Hubris alert: I realize I haven’t launched anything yet. Working on it…

Progress update – 9/15/08

Monday, September 15th, 2008

It’s been a productive couple of weeks here at Serendeputy world headquarters. I have been heads-down trying to build the private alpha version of the site — i.e., get it running end to end on my own machine.

I hope to be able to have the private alpha done by the end of September, with an incredibly limited wider alpha by October 15th or so. I hope that it doesn’t take me two weeks to get the deployment and hosting issues squared away, but this is the weakest part of my game, so I’m trying to give myself plenty of time to get that part right.

The Rails application is largely built, and the catalog application is coming along quickly. The hard part is maintaining the discipline to keep writing the tests and automating the admin pieces of the application. I keep wanting to jump ahead and build out some of the sexier features. It’s sometimes not a lot of fun being a grown-up.

If you’re interested in taking an early look at the application, drop me a line, and I’ll make sure I send you the information when it’s somewhere publicly accessible.

The blog will probably be dark for a couple more weeks as I finish this sprint. The world grieves.