drupal

folksonomy

Clearing out some old notes from the last job and figured this would be a good one to preserve. It shows how many times tags were used on a collection of posts generated and tagged almost exclusively by the user community for a site I used to work on.

For instance, row 1 shows that tags were used only once in 12,602 cases; row 2 shows that tags were used 2-5 times in 2,735 cases, etc. So out of a total of 16,169 tags, we can see that tags were used only once about 78 percent of the time, and tags were used no more than five times about 95 percent of the time.

Times Used     Number of Tags
1              12,602
2-5            2,735
6-10           416
11-20          200
21-50          118
51-100         54
101-500        27
501-1,000      7
1,000-2,000    5
2,000-5,000    5
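
The percentages quoted above are easy to re-derive from the table. Here is a quick Python check, with the counts copied straight from the rows above; the variable names are just mine for illustration.

```python
# Usage buckets copied from the table above: (times used, number of tags).
buckets = [
    ("1", 12602),
    ("2-5", 2735),
    ("6-10", 416),
    ("11-20", 200),
    ("21-50", 118),
    ("51-100", 54),
    ("101-500", 27),
    ("501-1,000", 7),
    ("1,000-2,000", 5),
    ("2,000-5,000", 5),
]

total = sum(count for _, count in buckets)

# Share of tags used exactly once, and share used five times or fewer.
used_once = buckets[0][1] / total
used_up_to_five = (buckets[0][1] + buckets[1][1]) / total

print(f"total tags: {total}")
print(f"used once: {used_once:.1%}")
print(f"used five times or fewer: {used_up_to_five:.1%}")
```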

I wasn’t prepared to worry too much about the whole thing until the site (running on Drupal) started to crawl and the slow query log showed that taxonomy-related queries were killing us. I even took to Ask Metafilter to see what everyone else had to say, and got an answer from the guy who coined the term “folksonomy.”

The shame of it was that a lot of those 12,602 single-use tags were variations on each other:

  • social networking

  • social_networking

  • socialnetworking

  • SocialNetworking

  • Social Networking

  • Social_Networking

  • Social_networking

  • Social networking

  • Socialnetworking
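
For what it's worth, all nine of those variants collapse to a single key if you lowercase them and drop the spaces and underscores. This is only a minimal Python sketch of that idea, not something we ran on the site. The `tag_key` function is a hypothetical helper, and a rule this blunt would also merge tags whose spacing was meaningful.

```python
import re


def tag_key(tag: str) -> str:
    """Collapse case, spaces and underscores so spelling variants share one key."""
    return re.sub(r"[\s_]+", "", tag).casefold()


# The nine variants from the list above.
variants = [
    "social networking",
    "social_networking",
    "socialnetworking",
    "SocialNetworking",
    "Social Networking",
    "Social_Networking",
    "Social_networking",
    "Social networking",
    "Socialnetworking",
]

keys = {tag_key(v) for v in variants}
print(keys)
```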

With no discipline at all and no overarching style guide for tagging, no real patterns emerged to make the tags useful. Search engine indexation sucked because we had 12,000+ tag index pages with only a single post each. Those thousands of tag pages netted well under 0.5 percent of site traffic, and crawl times were ridiculous. You really should not have almost as many tag indexes as you have actual posts.

It wasn’t deemed a wise use of time to try to automate normalization. In the end, I built something with Views Bulk Operations (VBO) that allowed us to delete the 12,602 tags that had been used only once (provided they were more than a month old, so we didn’t arbitrarily blow up a trend before it blossomed). We also locked users at large out of tagging altogether, leaving it to the curators on staff. Yes, it helped performance.
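
The pruning rule itself is simple enough to sketch. This Python version is only an illustration of the selection logic, not the VBO setup we used. The `prunable` function, the dictionary fields, the sample tags and the 30-day reading of "a month" are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def prunable(tags, now, min_age=timedelta(days=30)):
    """Return names of tags used exactly once whose only use is older than min_age."""
    return [
        t["name"]
        for t in tags
        if t["uses"] == 1 and now - t["last_used"] > min_age
    ]


# Made-up sample data; the real site stored this in Drupal's taxonomy tables.
now = datetime(2011, 6, 1)
tags = [
    {"name": "socialnetworking", "uses": 1, "last_used": datetime(2010, 11, 3)},
    {"name": "Social Networking", "uses": 1, "last_used": datetime(2011, 5, 25)},
    {"name": "drupal", "uses": 240, "last_used": datetime(2011, 5, 30)},
]

print(prunable(tags, now))
```

The single-use tag from the previous week survives, which is the "don't blow up a trend" exemption at work.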

Dark side of tag normalization: At the job I held before this one, they just gave an editor a spreadsheet with the thousands of non-normalized tags and invited her to correct them by hand. I do believe I would have gone mad.

Epochal Automator

When Drupal thinks of time, it thinks in Unix timestamps. That is good for computers, not so good for me when I’m staring at a quick database query and just need to know when something happened, was saved, or will be updated next.
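
For anyone who hasn't stared at one of these: a Unix timestamp is just the number of seconds since midnight UTC on January 1, 1970. Here is a minimal Python sketch of the conversion; `human_date` is a name I made up, and it formats in UTC rather than local time to keep the output predictable.

```python
from datetime import datetime, timezone


def human_date(timestamp: int) -> str:
    """Format a Unix timestamp (seconds since 1970-01-01 UTC) as a UTC date string."""
    moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S UTC")


print(human_date(1300000000))  # 2011-03-13 07:06:40 UTC
```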

Several months back my friend Sam asked for a way to quickly convert a Unix timestamp to a human-readable date. He had a snippet of Ruby he was using, so we worked out an Automator workflow to take the mouse selection and copy the converted date to the clipboard. I usually just want to see the date, not do anything with it, so here’s a version that uses Growl to quickly display the converted date. You need to have Growl installed. I should also note that Growl provides Automator actions, but I couldn’t get them to take input from the preceding Automator action.

1. Install the Ruby growl gem. This will give you the Growl command line app, too:

sudo gem install ruby-growl

2. Test that the command line app is working:

echo "Hello world" | growl -H localhost

3. Fire up Automator and tell it you’re making a service:

[Screenshot: Automator service]

4. Tell Automator your new service is receiving the selected text from any application:

[Screenshot: Automator input]

5. Drag “run shell script” from the Utilities list into your workflow.

6. Paste this into the shell script action you just dragged in:

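# Assumes the action's "Pass input" option is set to "as arguments".
# date -r takes seconds since the epoch on macOS (BSD date).
for f in "$@"
do
     date -r "$f" | growl -H localhost -t "Date from UNIX timestamp:"
done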

7. Save it and remember its name. 

8. Assign your new service a sensible keyboard shortcut:

8a. Open System Preferences -> Keyboard and click the “Keyboard Shortcuts” tab

8b. Look for your service under “Text”

8c. Click the grayed-out word “none” next to your service, then click “add shortcut”

8d. Enter your shortcut

That’s pretty much it. 

There are a few more things you can do to customize it. In the Growl System Preferences, you can pick the style of alert for “ruby-growl,” including whether or not it makes any sound and whether or not it’s sticky. 

Here’s a sample:

[Screenshot: Growl timestamp]

Migrating content from Drupal 6 to Drupal 6 (part 1 of ?)

[Image: drupal_migrate_progress]

I’ve got to move some user profiles, feeds and posts from one Drupal 6 site to another. That came with some frustrations. This should not be read as a lot of crowing, because the work isn’t done. A few things worth noting, though, that I’m putting here to help anyone who’s found some excellent tutorials from ca. 2010 and wonders why on Earth they are not working.

Migrate

The Migrate module provides a very complete API for handling this sort of thing. Migrate v1 provided a number of GUI tools for managing field mappings, rolling back migrations, etc. Migrate v2 is much more complex, offers little in the way of GUI setup, and has proved overly complex for what I’m trying to do, which is move nodes between two sites on the same MySQL server.

Migrate v1 uses Table Wizard as a data source for incoming nodes. There’s a great tutorial by Angie “webchick” Byron on Lullabot’s site, but it may prove frustrating until you realize it’s for people using Migrate v1, which you can download here. To support CCK fields and a few other things, you’ll also want migrate_extras for the v1 module.

Once you have those things, the tutorial will make a lot more sense and I have just a few more things to add:

MySQL Views and Table Wizard

For Drupal-to-Drupal migrations, you’ll also need to set up your databases with a little more care. In my case, each site has a database sitting on the same MySQL server. To bring content from the incoming database into the receiving site, you’ll have to let Drupal on the receiving site know about it. Adding a line like this in settings.php for the receiving site will do it (if $db_url is currently a single string, it has to become an array first, with the existing connection under the 'default' key):

$db_url['db_for_source_site'] = 'mysqli://user:pass@database_host/database_name'; 

Since you’re using a pair of Drupal databases that probably have the same table names, this will probably bother Table Wizard if you haven’t prefixed your table names. The quick way around that is to use MySQL views to effectively rename the tables on the incoming site. Here, for instance, is a query to run against the incoming site’s database that makes a MySQL view exposing all of its node data to the receiving site. It will appear to Table Wizard as if you have a table called “foo_node.”

CREATE VIEW foo_node AS SELECT * FROM node;

Table Wizard will then allow you to create relationships between your incoming tables using these views. If you don’t take this step, the views generated by Table Wizard will be disagreeable. You might be tempted to bypass the creation of views and simply make sure your Table Wizard default views are named with unique names that won’t collide with existing tables. That didn’t work for me. Take the five seconds each to make views for the tables you need.
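
If you need more than a couple of views, it can be less tedious to generate the statements than to type them. This is a rough Python sketch that prints one CREATE VIEW statement per table, following the foo_node pattern above. The `create_view_sql` helper and the list of tables are my assumptions for illustration, so swap in whichever tables your migration actually needs.

```python
def create_view_sql(table: str, prefix: str = "foo_") -> str:
    """Build a CREATE VIEW statement that exposes `table` under a prefixed name."""
    return f"CREATE VIEW {prefix}{table} AS SELECT * FROM {table};"


# Illustrative list of Drupal 6 tables; adjust to the content you are moving.
tables = ["node", "node_revisions", "users", "term_data", "term_node"]

for table in tables:
    print(create_view_sql(table))
```

Paste the output into a MySQL session on the incoming site's database, the same place you would run the single statement by hand.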

© Michael Hall, licensed under a Creative Commons Attribution-ShareAlike 3.0 United States license.