Two Fixes

August 25th, 2010  |  Published in ruby

Boy did I get sick of having to set this every time I ran Panopticon:

    look_back = 3.days

Some days it was “2.days,” some days it was “1.day,” some days it was “3.days.” So:

    tsf  = "/Users/mike/.panopticon_stamp"

    timestamp = File.stat(tsf).mtime

    look_back = Time.now - timestamp

Then, when all is done:

    `touch #{tsf}`
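
For anyone who wants the same trick without shelling out, here is a minimal sketch of the stamp-file idea in plain Ruby. The helper names (`look_back_seconds`, `mark_run`) and the three-day fallback are my assumptions for illustration, not what Panopticon actually does; the fallback just covers the first run, when the stamp file doesn't exist yet and `File.stat` would raise.

```ruby
require "fileutils"

# Hypothetical helper: seconds elapsed since the stamp file was last touched.
# Falls back to `default` (an assumed three days, in seconds) when the stamp
# file does not exist yet, e.g. on the very first run.
def look_back_seconds(stamp_path, default: 3 * 24 * 60 * 60, now: Time.now)
  return default unless File.exist?(stamp_path)

  now - File.mtime(stamp_path)
end

# Hypothetical helper: record a finished run by updating the stamp's mtime.
# FileUtils.touch creates the file if it is missing.
def mark_run(stamp_path)
  FileUtils.touch(stamp_path)
end
```

Call `look_back_seconds(tsf)` at the top of the script and `mark_run(tsf)` only after everything has finished, so a run that dies halfway doesn't shorten the next look-back.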

It wouldn’t even have been a problem if I’d just marked items as “done.” But when I accidentally create an item, or decide an item doesn’t need to be read or processed or whatever, I just want it gone; I don’t want it sitting in a log claiming to be a thing I did or read, because I didn’t. When you delete an item, though, it’s gone, Panopticon can’t find it anywhere, and it recreates the item on the next run if the source still falls inside the look-back window. That meant I was being haunted by zombie tasks that I most pointedly did not want to deal with anymore. Now I won’t be.

But all that put the next thing in mind: It’d be pretty nice to have things I mark as “done” in Panopticon get unmarked/unflagged/unstarred or whatever in their native app. So a starred item in Google Reader stops being starred, a flagged message gets unflagged, a bookmark in delicious gets tagged as “read,” etc.

Fixing Sampled Reporting

Probably no time for that this week. I spent a lot of time fiddling around with some problems introduced by sampling in Google Analytics.

The brief version: If you make queries against the Analytics API that involve more than 500k events, you start getting sampled data. The article specifically mentions pulling reports for long periods of time. I’ve been pulling reports for a number of sites that do well north of 500k visits per month, so when I started running queries for periods 60 or 90 days long I was most certainly getting back sampled data.

When I started trying to do really simple reporting about how page views changed month-over-month, I started seeing articles that somehow had fewer total page views when they were 60 days old than when they were 30 days old. Changing my approach to gathering page views from “single long period” to “consecutive shorter periods” cleared the problem up. Rather than pulling queries for a period like “from the date of publication to 60 days after the date of publication,” you’re a lot better off pulling a pair of queries: “from the date of publication to 30 days after the date of publication,” and “from 31 days after the date of publication to 60 days after it,” then adding them up. Unless the site is doing more than 500k visits a month, sampling is less likely to get you.
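
The “consecutive shorter periods” idea can be sketched roughly like this. It is only an illustration: `query_windows` and `total_page_views` are names I made up, and the block passed to `total_page_views` stands in for whatever call actually fetches page views for a date range, so nothing here reflects the real Analytics API.

```ruby
require "date"

# Hypothetical helper: split the period from published_on through
# published_on + total_days into consecutive, non-overlapping [from, to]
# pairs (both ends inclusive). Each window ends at most window_days after
# its own start.
def query_windows(published_on, total_days, window_days)
  last_day = published_on + total_days
  windows = []
  from = published_on
  while from <= last_day
    to = [from + window_days, last_day].min
    windows << [from, to]
    from = to + 1 # the next window starts the day after this one ends
  end
  windows
end

# Hypothetical helper: run one query per window and add up the results.
# The block receives (from, to) and must return a page-view count; it is a
# placeholder for the real reporting call.
def total_page_views(published_on, total_days, window_days)
  query_windows(published_on, total_days, window_days).sum do |from, to|
    yield(from, to)
  end
end
```

With a 60-day period and 30-day windows this produces exactly the pair described above: publication day through day 30, then day 31 through day 60. Whether 30 days is short enough still depends on the site's traffic.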

© Michael Hall, licensed under a Creative Commons Attribution-ShareAlike 3.0 United States license.