Updated: TextSoap with appscript

July 1st, 2009  |  Published in ruby  |  3 Comments

(There’s an update at the bottom of this entry)

I used to be a faithful user of TextSoap, a Mac app that’s wildly, wildly useful when you’re dealing with content produced by a number of unpleasant tools (like Microsoft Word).

With TextSoap, you get a suite of text filters that do all sorts of stuff to text: straighten smartquotes, convert things like smartquotes to text or numeric HTML entities, increase or decrease the quote level in a mail message, smarten quotes, perform conversions to typographical conventions (like “--” to an em dash), and on and on. If TextSoap doesn’t have a pre-built filter to perform an operation, you can write your own. It even supports regular expressions.

You can use TextSoap like a steroidal text editing program, similar to TextEdit or Notepad or whatever, but it also integrates into the Mac as a service and as contextual menu items, so it’s always easy to use without having to launch a different app. It also provides an AppleScript scripting addition.

I got away from using TextSoap while I was carrying around an eeePC and replaced it instead with some very simple Ruby that ran through a hash of typical conversion tasks on STDIN. It was o.k., but it’s one of those things where you always turn up a new edge case every few months and that makes it unsatisfactory as a “just works” solution.
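For a sense of what that sort of script looks like, here is a minimal sketch. It is not the original script: the `CONVERSIONS` table and the `flatten` name are made up for illustration, and the entries are just a few typical cases.

```ruby
# Hypothetical table of typographic characters and their ASCII stand-ins.
# These entries are illustrative, not the list from the original script.
CONVERSIONS = {
  "\u201C" => '"',   # left double quote
  "\u201D" => '"',   # right double quote
  "\u2018" => "'",   # left single quote
  "\u2019" => "'",   # right single quote / apostrophe
  "\u2014" => "--",  # em dash
  "\u2013" => "-",   # en dash
  "\u2026" => "..."  # ellipsis
}.freeze

# Replace every character that appears as a key with its mapped value.
# gsub accepts a hash as the replacement and looks up each match in it.
def flatten(text)
  text.gsub(Regexp.union(CONVERSIONS.keys), CONVERSIONS)
end
```

To use it as a filter you would end the script with something like `print flatten(STDIN.read)`. The edge-case problem is visible right away: anything not in the table passes through untouched.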

Now I don’t have the eeePC, and I’ve been through the unpleasant ordeal of seeing what it would be like to be a Windows dude, so I don’t mind augmenting plain old Ruby with some appscript now and then.

Since you have to use TextSoap as a scripting addition to automate it, the syntax is a little different when using it with rb-appscript: (See the update below)

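#!/usr/bin/env ruby

require "rubygems"
require "appscript"
require "osax"

include OSAX

# Load the TextSoap scripting addition by name.
ts = OSAX::ScriptingAddition.new("TextSoapSA")

# Read the text to clean from standard input.
text = STDIN.read

# Run the text through the "InternetFriendly" cleaner and print the result.
puts ts.tsCleanText(text, :with => "InternetFriendly")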

You can chain TextSoap filters by adding extra “:with => 'foo'” parameters. It processes them first to last when it’s filtering the text.
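One caveat: a Ruby hash literal keeps only the last value for a repeated key, so if repeating `:with` doesn’t do what you expect, an alternative is to call the cleaner once per filter and feed each result into the next call. Here is a rough sketch of that; `apply_filters` is a hypothetical helper, and the block does the actual cleaning so the helper assumes nothing about TextSoap itself.

```ruby
# Run text through a list of filter names in order, passing the output of
# each step to the next. The caller's block performs the actual cleaning.
def apply_filters(text, filter_names)
  filter_names.reduce(text) { |current, name| yield(current, name) }
end

# Hypothetical usage with the `ts` object from the script above
# (requires TextSoap and rb-appscript):
#   filters = ["InternetFriendly", "Remove Extra Spaces"]
#   cleaned = apply_filters(STDIN.read, filters) do |t, name|
#     ts.tsCleanText(t, :with => name)
#   end
```

This keeps the first-to-last ordering explicit, at the cost of one call per filter.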

I dropped that into TextMate as a command snippet and tied it to “cmd-shift-s” (for “scrub”) and it flattens text out. HTML entities and smartquotes become their flattest ASCII equivalents, so you can think of the “InternetFriendly” filter in TextSoap like a ginormous endumbening ray.

Which brings me to why I’m using the endumbening ray at all when I’d prefer to just convert smartquotes and the like to legal HTML entities: The CMS at work has taken to escaping entities when it renders a story, so all my carefully deployed right and left double quotes, em dashes and others render as literal text: rdquo, ldquo, emdash, etc.
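To make the failure concrete, here is a small sketch of what happens when text that already contains entities gets HTML-escaped a second time. The `cms_render` method is a stand-in I made up; I’m only assuming the CMS does something roughly like `CGI.escapeHTML` on the way out.

```ruby
require "cgi"

# Assumption: on render, the CMS HTML-escapes the story text, much as
# CGI.escapeHTML does. This is a stand-in, not the CMS's real code.
def cms_render(story)
  CGI.escapeHTML(story)
end

# The ampersand that starts each entity becomes "&amp;", so the browser
# shows the entity name as literal text instead of the character.
puts cms_render("&ldquo;Hello&rdquo; &mdash; world")
# prints: &amp;ldquo;Hello&amp;rdquo; &amp;mdash; world
```

Plain ASCII has nothing to escape, which is why flattening everything first sidesteps the problem.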

It all has something to do with the chaos induced by a WYSIWYG editor in use. I wouldn’t know anything about that because it won’t run on the Mac version of any browser I’ve got: I experience the CMS as a collection of plain text areas and don’t consider that a loss.

I’d be more crabby about the whole thing, but I’ve been living with stuff like this for the nine years I’ve been in Web publishing and it seems I might as well take a cue from the medium I work in: route around the damage. (Quick aside on that link: at least look at the screenshot I provided at the time. How did we not claw our eyes out?)

One other note: Putting a script like that in TextMate instead of deploying it from the Services menu, as Apple intended for things like TextSoap, is a little awkward. The Services menu, however, is a minor usability nightmare. Since I’ve got the “Edit in TextMate” plugin installed for all my Cocoa apps, and “It’s All Text” installed in Firefox, it’s easier to just work from within TextMate. As a side benefit, all my command snippets are easily exported and used in another text editor, should I ever switch.

Update: Thanks to Patrick and Mark in the comments, who point out that TextSoap now ships with a scripting agent (textsoapAgent). It’s similar to the scripting addition but the syntax is a hair cleaner and you don’t have to include appscript’s OSAX support. Here’s the example above using the scripting agent instead of the scripting addition:

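#!/usr/bin/env ruby

require "rubygems"
require "appscript"

# Mix in Appscript so the app() helper is available at the top level.
include Appscript

# Talk to the TextSoap scripting agent directly.
ts = app("textsoapAgent")

# Read the text to clean from standard input.
text = STDIN.read

# Run the text through the "InternetFriendly" cleaner and print the result.
puts ts.cleanText(text, :with => "InternetFriendly")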

If you’re not sure which cleaners are available or what to call them, the agent includes a few methods to make figuring that out easier:

app("textsoapAgent").groupNames will return the cleaner groups. In my case:

["Library", ":Standard", ":Email", ":Typographical", ":Case Conversions", ":Text Quoting", ":Markdown", ":HTML", ":Plist", ":Custom", "MyList", "Mail", "BBEdit", "Jupitermedia"]


app("textsoapAgent").groupItems(:from => "group name") will return the cleaners within a particular group. In this instance, “MyList,” which is a user-defined selection of filters:

["Markdown Text", "InternetFriendly", "Remove Forwarding (>) Characters", "Convert pseudo-heads", "Capitalize Title", "double to single quotes", "Remove All Tabs", "Remove Extra Spaces", "Remove Extra Returns", "Blog Preparation", "Markdown", "Remove Extra Spaces"]
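If you want the whole picture at once, the two calls combine easily. This is a sketch rather than anything from the TextSoap docs: `cleaners_by_group` is a hypothetical helper, and it assumes only that the object you pass responds to `groupNames` and `groupItems(:from => name)` the way the agent does above.

```ruby
# Build a { group name => [cleaner names] } map from any object that
# responds to groupNames and groupItems(:from => name).
def cleaners_by_group(agent)
  agent.groupNames.each_with_object({}) do |group, map|
    map[group] = agent.groupItems(:from => group)
  end
end

# Hypothetical usage (requires TextSoap and rb-appscript):
#   require "appscript"
#   agent = Appscript.app("textsoapAgent")
#   cleaners_by_group(agent).each do |group, items|
#     puts "#{group}: #{items.join(', ')}"
#   end
```

Taking the agent as an argument means you can try the helper against a stub object without TextSoap running.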


  1. Patrick says:

    July 2nd, 2009 at 12:00 am


    I just wanted to let you know that since version 6.1 of TextSoap you don’t have to use the scripting addition, and as far as I know it’s also discouraged to use it in the newer versions, as the textsoapAgent is directly scriptable.


  2. Mark Munz says:

    July 2nd, 2009 at 1:04 am

    The scripting addition will continue to work with TextSoap 6 in a 32-bit environment, but as Mac OS X progresses, the preferred approach is to script the textsoapAgent directly.

    The textsoapAgent scripting has all of the same functionality as the scripting addition and then some. The commands were based on the scripting addition, so it should be a straightforward switch.

    There is a section in the online help that talks about scripting the textsoapAgent.

  3. Mark Munz says:

    July 2nd, 2009 at 1:09 am

    Btw.. Great article. I always enjoy seeing people find ways to make TextSoap work almost invisibly in their specific workflow. Taking advantage of scripting TextSoap is one way to do just that.


© Michael Hall, licensed under a Creative Commons Attribution-ShareAlike 3.0 United States license.