Sinatra ftw (Updated)

March 31st, 2011  |  Published in ruby  |  3 Comments

So, I have this task to accomplish:

I’ve got 1,180 articles that were all placed in a website directory I no longer want to keep. The friendly DBAs gave me an Excel spreadsheet showing me each article’s title, URL, pub date, author and current directory. I’m to go down the list and reassign each article.

I made it through 15 entries before realizing it’s hard to tell what an article’s about from just the headline. Some things look like tutorials, but they’re reviews; some things look like reviews, but they’re analysis. I can either guess, or I can copy the URL, flip to my browser, load the article, and then decide how to classify it. This process sucks:

  • Copying and pasting URLs is a drag
  • Waiting for articles to load if an element on the page hangs is a drag
  • Entering stuff in an Excel spreadsheet is a drag

So I gathered together a few tools:

  • Sinatra, which is a Ruby micro-framework for Web apps
  • Blueprint CSS, which is a CSS framework for quick/easy pretty (or at least not awful) HTML
  • Summarize, a gem that can take a bunch of text and return the relevant parts for a given percentage of the original

Then I set up a SQLite database, fed the Excel spreadsheet into it, and sat down with Sinatra.
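For anyone wanting to try the same import step, here is a minimal sketch. It assumes the spreadsheet was first saved as CSV, and that the headers are named title, url, pub_date, author and directory; those names, the articles table and the new_category column are all made up for illustration, not taken from the real app. The SQLite part needs the sqlite3 gem, so it is only required when that method is actually called.

```ruby
require "csv"

# Hypothetical column names; the real spreadsheet's headers may differ.
COLUMNS = %w[title url pub_date author directory].freeze

# Parse CSV text (the spreadsheet saved as CSV) into an array of hashes,
# trimming stray whitespace around each value.
def parse_articles(csv_text)
  CSV.parse(csv_text, headers: true).map do |row|
    COLUMNS.to_h { |col| [col, row[col].to_s.strip] }
  end
end

# Create the (hypothetical) articles table if needed and insert every row.
# Requires the sqlite3 gem, loaded lazily so the parsing above runs without it.
def load_into_sqlite(rows, db_path)
  require "sqlite3"
  db = SQLite3::Database.new(db_path)
  db.execute(<<~SQL)
    CREATE TABLE IF NOT EXISTS articles (
      id INTEGER PRIMARY KEY,
      title TEXT, url TEXT, pub_date TEXT, author TEXT,
      directory TEXT, new_category TEXT
    )
  SQL
  db.transaction do
    rows.each do |r|
      db.execute(
        "INSERT INTO articles (title, url, pub_date, author, directory) " \
        "VALUES (?, ?, ?, ?, ?)",
        COLUMNS.map { |col| r[col] }
      )
    end
  end
  db
end
```

Something like load_into_sqlite(parse_articles(File.read("articles.csv")), "articles.db") would then do the whole import in one go; both file names are placeholders.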

Here’s the finished app (minus the one template page I use):

It’s missing some affordances, but the long and short is demonstrated here:

Reclasser

You start with a given article. It shows you the headline, pub date and author. Using the Summarize gem, it pulls in a summary representing 25 percent of the total text. You pick a new category for it from the radio buttons, click “Reclassify,” and it takes you to the next record (unless you’re done, in which case it just reloads the current record).
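The "save, then move to the next record" behavior can be sketched in plain Ruby, with no Sinatra or database involved. This is only an illustration of the flow described above, not the app's actual code: the Article struct, the Reclasser class and the category names are all hypothetical.

```ruby
# Hypothetical categories; the real radio buttons may offer different ones.
CATEGORIES = %w[tutorial review analysis].freeze

# In-memory stand-in for a database row.
Article = Struct.new(:id, :title, :category)

class Reclasser
  def initialize(articles)
    @articles = articles
  end

  # Store the new category, then return the id of the record to show next.
  def reclassify(id, category)
    unless CATEGORIES.include?(category)
      raise ArgumentError, "unknown category: #{category}"
    end
    article = @articles.find { |a| a.id == id }
    raise ArgumentError, "no article with id #{id}" unless article
    article.category = category
    next_id_after(id)
  end

  # The next-higher id, or the same id when this was the last record
  # (which is what makes the page "just reload" at the end).
  def next_id_after(id)
    @articles.map(&:id).sort.find { |i| i > id } || id
  end
end
```

In a Sinatra app, a post handler could call something like reclassify with the submitted form values and then redirect to the id it returns.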

The time gain? I dunno. If I worked like a machine, I’d probably be done in four hours. Now it takes less time to update a record and I don’t have to do as much to update it: just click a radio button, click a submit button, move on. Even if it turns out the time I save is offset by the coding time (two hours, with a break for coffee, chatting, answering mail), I guess I figured out a way to turn time that would have been wasted screwing with Excel into time spent playing with Sinatra. And since I’m going to have to do the same task in a few weeks for another site, I’ll have time-saving code ready to go.

Update: Even better with a bit of jQuery: Rather than loading the article summary every time (even though I don’t always need it to tell what kind of article I’m dealing with), I wrote an additional route that handles providing just the summary and used jQuery’s load function to pull in the summary for review when I click on a “load summary” button. That makes things a lot faster when I can tell what an article is just from the headline.
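The idea of a separate endpoint that returns only the summary can be sketched as follows. To keep it runnable without any gems, this uses a bare Rack-style callable instead of a Sinatra route, and a naive "first few sentences" function instead of the Summarize gem. The /summary/<id> path, the sample data and every name here are assumptions for illustration.

```ruby
# Stand-in data; in the real app the text would come from the article itself.
ARTICLE_BODIES = {
  1 => "First sentence. Second sentence. Third sentence. Fourth sentence."
}.freeze

# Naive stand-in for the Summarize gem: keep the leading sentences that make
# up roughly `percent` of the sentence count (always at least one).
def naive_summary(text, percent = 25)
  sentences = text.split(/(?<=[.!?])\s+/)
  keep = [(sentences.length * percent / 100.0).ceil, 1].max
  sentences.first(keep).join(" ")
end

# Rack-style callable: answers GET /summary/<id> with a small HTML fragment
# and everything else with a 404. No HTML escaping is done in this sketch.
SUMMARY_APP = lambda do |env|
  match = env["PATH_INFO"].to_s.match(%r{\A/summary/(\d+)\z})
  body = match && ARTICLE_BODIES[match[1].to_i]
  if env["REQUEST_METHOD"] == "GET" && body
    [200, { "content-type" => "text/html" }, ["<p>#{naive_summary(body)}</p>"]]
  else
    [404, { "content-type" => "text/plain" }, ["not found"]]
  end
end
```

On the page side, a click handler on the “load summary” button could use jQuery’s load function to pull that fragment into a placeholder element, so the summary is only fetched when it’s actually wanted.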

Update 2: Sinatra uses Mongrel by default, which is fine, except I noticed that every now and then a page load would go from taking an average of a fifth of a second or so to five seconds. Meh. So I installed the thin gem (sudo gem install thin) and got much more consistent, snappy performance.

Responses

  1. David says:

    March 31st, 2011 at 7:41 pm

    Happy to read some of your thoughts on Sinatra. My feelings about Ruby on Rails have always been mixed, I wonder if a less overwhelming framework would suit me better. From looking at your example, I’m guessing that it would. Thanks!

  2. mph says:

    April 2nd, 2011 at 4:24 pm

    Glad it was helpful, David.

    Sinatra seems like it could be pretty useful in a lot of contexts, alright. I kind of look at it like a way to build one-off database apps the way I used to do with Microsoft Access.

    I used to think of Rails that way, but Sinatra feels like it’s easier to go into without a more firm plan, and with less hopping from file to file.

  3. Pow! etc. :: dot unplanned says:

    April 15th, 2011 at 3:53 pm

    […] seems as if a few colleagues might need to use that little Sinatra-based reclassifier ditty, which is always a nice thing: The first iteration of these things usually seems to come close to […]

© Michael Hall, licensed under a Creative Commons Attribution-ShareAlike 3.0 United States license.