Tag Metadata

January 15th, 2006, By Duncan Gough

Similar (well, almost similar) to my thoughts about tagging ‘how did i get here?’, Tom has posted an interesting idea about auto-tagging articles posted on Metafilter:

…get rid of tags from the subdomains and instead put in tags that represent languages. So you create a form of tags which operates as a key:value pair with a code something like lang:english or lang:francais and then present a default English homepage to Metafilter with links to english.metafilter.com and francais.metafilter.com on it. You then encourage people to post links in French on the latter one, and automatically tag each of their posts with lang:francais as you do so.

This is something that the alpha version of Millions of Games had in place about this time last year. Since we were opening up what was originally going to be a basic links site to create the first and (still) only folksonomy of casual games, I was thinking a lot about using the almost perfectly-formed HTML scraper Beautiful Soup to extract as much metadata as I could from each game submission and turn that data into tags.

The most obvious example of this is the difference between a Flash game and a Shockwave one. Flash games end in .swf, Shockwave games in .dcr. It would be ‘trivial’ for a screen-scraper to chew through the HTML, find the relevant ‘object’ tag and pull out the file extension for the game. Once I had that, I could tag the game as ‘shockwave’ or ‘flash’ accordingly. From there, I could even do the same for Java games (.jar) too.
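As a rough illustration of that extension-to-tag idea, here is a minimal sketch. It is not the code Millions of Games ran: it uses Python's standard-library html.parser as a stand-in for Beautiful Soup so that it has no third-party dependencies, and the names EXTENSION_TAGS, GameFileFinder and auto_tags are made up for this example. It also assumes the game file URL appears in one of the usual places (an object's data attribute, a param named movie or src, an embed's src, or an applet's archive), which real submissions may not always honour.

```python
from html.parser import HTMLParser
from os.path import splitext
from urllib.parse import urlparse

# Assumed mapping from file extension to tag, based on the examples above.
EXTENSION_TAGS = {".swf": "flash", ".dcr": "shockwave", ".jar": "java"}


class GameFileFinder(HTMLParser):
    """Collects candidate game file URLs from object, param, embed and applet tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "object" and attrs.get("data"):
            self.urls.append(attrs["data"])
        elif tag == "embed" and attrs.get("src"):
            self.urls.append(attrs["src"])
        elif tag == "param" and (attrs.get("name") or "").lower() in ("movie", "src"):
            if attrs.get("value"):
                self.urls.append(attrs["value"])
        elif tag == "applet" and attrs.get("archive"):
            self.urls.append(attrs["archive"])


def auto_tags(html):
    """Return the set of technology tags implied by the file extensions found."""
    finder = GameFileFinder()
    finder.feed(html)
    tags = set()
    for url in finder.urls:
        # Use only the path part so query strings do not hide the extension.
        extension = splitext(urlparse(url).path)[1].lower()
        if extension in EXTENSION_TAGS:
            tags.add(EXTENSION_TAGS[extension])
    return tags


if __name__ == "__main__":
    sample = (
        '<object width="550" height="400">'
        '<param name="movie" value="http://example.com/games/pong.swf?level=1">'
        '<embed src="/games/pong.swf">'
        "</object>"
    )
    print(auto_tags(sample))
```

Returning a set means a game embedded twice (once via param, once via embed) still only picks up a single ‘flash’ tag.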

In practice, though, it became too confusing for users to see these extra tags that came out of nowhere. I’ve noticed that the founder of del.icio.us, Joshua Schachter, has often responded to questions along the lines of ‘why don’t you do x?’ by saying that it’s extremely hard to display that extra information without it looking messy. In theory, I could screen-scrape submitted games for metadata and auto-generate a list of new attributes for each item in our database, but displaying that in a user-friendly way becomes very hard.

Ultimately, I ruled out auto-generating tags because they reduce the value of the neighbour tags and, more importantly, they negate the act of personal expression that makes tagging such a valuable task for the end user.