Docs Decomposer #4

March 11th, 2015  |  Published in ruby

Tags ended up going in pretty easily with acts-as-taggable-on. It allows for multiple kinds of tags, which means with judicious use of forms you can pretty much create any kind of taxonomy you want. I’m just using keywords for now, in a classic free-tagging setup. Later, with forms that control the input, I’ll be able to add risk and priority tags.


Which means, I guess, that the thing is pretty much “done” in terms of the basic features I’d like it to have:

  1. Individual user accounts.
  2. The ability to quickly reduce a given page to just the steps and CLI instructions.
  3. The ability to flag a page with a click.
  4. The ability to comment on a page.
  5. The ability to tag a page.

Over lunch, I broke my “don’t do this in the office” rule briefly to add some markup to our Jekyll templates to make it easier to grab the rendered content for import. The importer became a little simpler as a result, and the pages lose a little bit of extra nav cruft from our templates.

So, as it stands, I could pretty much run an inventory session for the team from this thing running on my laptop. The one thing that’s still vexing me about it is the notion of unique and trackable page elements (ordered lists and code blocks, mostly).

In my Padrino prototype, each page showed the rendered content, plus a tab that showed only ordered lists and pre-blocks. Each <ol> and <pre> was checksummed, and the “elements” tab showed which other pages in the docs corpus had the exact same content.
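
A minimal sketch of that kind of checksum index, assuming the <ol> and <pre> blocks have already been pulled out of each page as strings. The method names, the page names, and the choice of SHA-256 are illustrative assumptions, not the prototype’s actual code:

```ruby
require 'digest'

# Checksum of the element's exact markup (SHA-256 is an assumption here).
def element_checksum(markup)
  Digest::SHA256.hexdigest(markup)
end

# pages: { "page name" => ["<pre>...</pre>", "<ol>...</ol>", ...] }
# Returns { checksum => [names of pages that contain that element] }.
def build_element_index(pages)
  index = Hash.new { |hash, key| hash[key] = [] }
  pages.each do |page, elements|
    elements.each do |markup|
      sum = element_checksum(markup)
      index[sum] << page unless index[sum].include?(page)
    end
  end
  index
end

# Other pages in the corpus that carry exactly the same element.
def also_appears_on(index, markup, current_page)
  index.fetch(element_checksum(markup), []) - [current_page]
end

# Hypothetical three-page corpus.
pages = {
  'install.html' => ['<pre>sudo apt-get install foo</pre>', '<ol><li>Log in</li></ol>'],
  'upgrade.html' => ['<pre>sudo apt-get install foo</pre>'],
  'faq.html'     => ['<pre>foo --version</pre>']
}

index = build_element_index(pages)
puts also_appears_on(index, '<pre>sudo apt-get install foo</pre>', 'install.html').inspect
```

Because the checksum covers the exact markup, even a whitespace change makes two blocks count as different elements.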

What I really, really want is something like this:

  • You look at the page and see a <pre> block that concerns you, either because it looks very perishable or could be harmful if the information has aged out.
  • You hover over the element to expose a flag or comment button, plus a little stats box that tells you where else that element appears.
  • Once you flag that element, it’s flagged everywhere it appears in the corpus, receiving a special visual treatment.

How’s that work?

Everything is parsed by Nokogiri, so I can just write a little checksumming service in my Elements controller that will be seeing the same HTML whether it’s getting it during the content import phase, or the presentation/review phase. So:

  1. User loads a page.
  2. JavaScript finds each element of interest on the page (each <pre> and <ol>) and sends it over to the checksumming service.
  3. The checksumming service returns a unique i.d. for the element.
  4. The element gets wrapped in a div with that i.d. (a wrapper, rather than an i.d. on the element itself, in case the element already has an i.d. of its own).
  5. The comment/flag widget for that element is just AJAX calls to the controller against the i.d. of the wrapper, which gets a class to reflect its flagged status.
  6. Each flagged element gets either a modal/lightbox or a page with the comment history.
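
Steps 3 through 5 could be sketched server-side roughly like this. It is a sketch under assumptions: the `el-` prefix, the truncated SHA-1, and the class names are all invented for illustration, and the wrapping is done here with plain strings rather than in JavaScript.

```ruby
require 'digest'
require 'set'

# Hypothetical i.d. scheme: a short prefix plus part of the SHA-1 of the markup.
def element_id(markup)
  "el-#{Digest::SHA1.hexdigest(markup)[0, 12]}"
end

# Wrap the element in a div carrying the checksum i.d., leaving any i.d. the
# element already has untouched. Flagged elements get an extra class.
def wrap_element(markup, flagged_ids)
  id = element_id(markup)
  classes = ['tracked-element']
  classes << 'flagged' if flagged_ids.include?(id)
  %(<div id="#{id}" class="#{classes.join(' ')}">#{markup}</div>)
end

flagged = Set.new
snippet = '<pre id="example-1">rm -rf /tmp/cache</pre>'

puts wrap_element(snippet, flagged)  # not flagged yet
flagged << element_id(snippet)       # what the flag AJAX call would record
puts wrap_element(snippet, flagged)  # now carries the "flagged" class
```

Since the i.d. is derived from the content, the same block gets the same i.d. on every page, which is what lets one flag show up everywhere the element appears. If a page contains the same block twice, the wrapper i.d. repeats on that page, so a class or data attribute might be the safer hook there.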

Now that I write it all out, though, it seems pretty doable. I should probably write more stuff out.

Flaws in the plan? Mostly that it’s going to add some load time as each page is pulled in, dissected, and checksummed. Fortunately, there are advanced GIF technologies to make that seem almost pleasant.


It’s something I could do at import time, too, I guess. There’s nothing sacred about the underlying markup.

Oh, the other problem is “what about when that element changes, even if it’s for the better?” At that point, the checksum changes and the flag/comment history disappears because it becomes a “new” element. There goes the inventory. Which means the inventory might become something less about the literal content and more about what/where it is, e.g. “ordered list on page #{foo} under the heading #{bar}.” So when a flag gets thrown on an element, the existence and “coordinates” of the thing are logged somewhere, along with a snapshot of what it looked like at the time. That’s probably going to be enough to find it again.
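
One way that log could look, as a rough sketch: each flag stores the page, the heading, the kind of element, and a snapshot, and a later lookup tries the checksum first and falls back to the “coordinates.” Every field and method name here is hypothetical.

```ruby
require 'digest'

# Build a log entry for a flagged element. All field names are illustrative.
def flag_entry(page:, heading:, kind:, markup:)
  {
    page: page,
    heading: heading,
    kind: kind,
    snapshot: markup,
    checksum: Digest::SHA256.hexdigest(markup)
  }
end

# Look for the flagged element among the page's current elements.
# current: array of { heading:, kind:, markup: } hashes for the same page.
# Returns [status, element], where status is :unchanged, :changed or :missing.
def relocate(entry, current)
  exact = current.find { |el| Digest::SHA256.hexdigest(el[:markup]) == entry[:checksum] }
  return [:unchanged, exact] if exact

  moved = current.find { |el| el[:heading] == entry[:heading] && el[:kind] == entry[:kind] }
  moved ? [:changed, moved] : [:missing, nil]
end

entry = flag_entry(
  page: 'install.html',
  heading: 'Install the agent',
  kind: 'pre',
  markup: '<pre>curl http://example.com/install.sh | sh</pre>'
)

# The same page after someone edits the block (http became https).
after_edit = [
  { heading: 'Install the agent', kind: 'pre',
    markup: '<pre>curl https://example.com/install.sh | sh</pre>' }
]

status, _element = relocate(entry, after_edit)
puts status  # the checksum no longer matches, but the coordinates still do
```

The snapshot stays in the log entry, so a reviewer can compare what was flagged against what is on the page now.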


So we could go either way here. I think writing the checksumming service and coding up the process sounds interesting and fun. I think treating flags and comments on elements as a one-time process, with no expectation of permanence, might be okay. I think automating a log based on “there’s a fishy set of instructions on the page called #{foo} under the heading #{bar}” sounds like a useful middle ground. I’ve also already figured out the XPath needed to do just that: “Show me a thing plus the most recent heading that occurred before it,” which has proven great for explaining what some of the lists and pre blocks are trying to tell you at a glance.

Well … something to think about.


© Michael Hall, licensed under a Creative Commons Attribution-ShareAlike 3.0 United States license.