Sanitize HTML With Sanitize

March 17th, 2009  |  Published in ruby

Handy gem:

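    #!/usr/bin/env ruby

    require "rubygems"
    require "sanitize"

    # Assumption: the right-hand side of this assignment is missing from the post
    # as published; reading STDIN fits a TextMate command that pipes in the text.
    markup = STDIN.read

    # Clean the string in place, keeping only what the BASIC config allows.
    Sanitize.clean!(markup, Sanitize::Config::BASIC)

    puts markup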

Sanitize is built on Hpricot and provides quick-n-easy HTML cleanup. I plonked the above into a TextMate command and it automatically makes Word HTML 90 percent more tolerable by discarding style and class attributes and leaving just the most basic markup.

You can also configure it to leave some tags alone, permit some tags to have certain attributes but not others, etc. There are also ‘Restricted’ and ‘Relaxed’ levels of sanitization.
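
As a rough illustration of that kind of configuration, here is a minimal sketch of a custom config. It assumes the config is a plain hash shaped like the built-in ones, with `:elements`, `:attributes` and `:protocols` keys. The tag and attribute lists, and the names `WORD_CLEANUP` and `tidy`, are hypothetical picks for this example and are not part of the gem. Later releases of Sanitize renamed `Sanitize.clean` to `Sanitize.fragment`, so the helper checks which one is present.

```ruby
require "rubygems"

# Hypothetical whitelist for tidying Word output. The name and the lists
# are illustrative assumptions, not something Sanitize ships.
WORD_CLEANUP = {
  :elements   => %w[a b blockquote em i li ol p strong ul],
  # Only these attributes survive; style and class are not listed, so they go.
  :attributes => { "a" => %w[href title] },
  # Only allow links that use these URL schemes.
  :protocols  => { "a" => { "href" => %w[http https mailto] } }
}

# Hypothetical helper. The gem is loaded lazily so the config above can be
# inspected even on a machine where sanitize is not installed.
def tidy(html, config = WORD_CLEANUP)
  require "sanitize"
  if Sanitize.respond_to?(:fragment)
    Sanitize.fragment(html, config)  # Sanitize 3.0 and later
  else
    Sanitize.clean(html, config)     # older releases, as used in this post
  end
end
```

Ending the script with `puts tidy(STDIN.read)` would make it usable as the same sort of TextMate command. Passing `Sanitize::Config::RESTRICTED` or `Sanitize::Config::RELAXED` as the second argument should swap in the stock levels instead of the custom hash.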

Insert some gushing about Ruby making my life better here. I don’t care if its syntax is too forgiving, loose or whatever: When I can eliminate a source of irritation in six lines of code, I’ll capitalize every vowel if required.

© Michael Hall, licensed under a Creative Commons Attribution-ShareAlike 3.0 United States license.