DH2012: CATMA/CLÉA workshop (pre-conference)

CLÉA is version 4 of CATMA (which stands for Computer Aided Textual Markup & Analysis); the team is transitioning from a desktop application to a web-based application (CLÉA), which I believe is still in beta.  It wasn't quite what I was expecting based on the workshop description, or even the acronym, because to me it does not seem to do actual text markup; it's a system for tagging and highlighting text that lets you search the text and the tags.  I wonder if I am less impressed with it because I'm approaching it more as a programmer than as a scholar; now that I know what it is and what it does, I don't have any particular project it is suited to.  Several of the participants in the room had interesting ideas for how it might be applicable to their research, but many of them also had suggestions for additional features that would be helpful (or maybe even necessary) for their work.

The best part of the workshop was actually hearing about the kinds of work other people are doing, and why they are interested in a tool like this.  In particular, two graduate students associated with the CATMA team presented on their dissertation projects and explained how they are using CLÉA to tag and then query various types of narratological structures or tropes against other features (in one case, whether narratives contain unresolved conflict, resolved conflict, or no conflict; in the other case, musical features and structures).  But after seeing examples of their work in CLÉA, I can't help feeling that they are still doing an awful lot of manual work (searching and highlighting and clicking buttons to tag things; reading through the narratives and identifying various narrative structures; even coming up with a 500-term hierarchical tag set for narratological terms).  The team also claims the system is designed for usability, but I did not find it particularly intuitive, and the developer then explained that the interface is complicated because of all the features it supports.

Here are a few of my quibbles and concerns: there is no support (optional or otherwise) for stop words, stemming, or other text features that I (apparently) have started to think of as standard building blocks for working with or searching text, much less support for more advanced features like collocation, part-of-speech tagging, or named entity tagging.  I'm also very wary of the fact that it seems to be a closed system: there's currently no way to export existing tags or the tagged text (although it sounds like they are looking into adding these features), and even the query format seems to be custom and non-standard.  It seems to me that a lot of what CLÉA is trying to do could be done better with other tools (for XML content in particular).  I think I could set up something simple to tag or mark up XML text with macros or regular expressions in any competent editor (dealing with nested or overlapping tags could be problematic, but CLÉA has yet to tackle that export issue).  Searching, querying, and reporting could then be done in any of a number of existing systems or tools.
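
To make the regular-expression idea a little more concrete, here is a minimal sketch in Python of the kind of thing I have in mind.  The function name tag_terms, the default element name "term", and the sample sentences are all placeholders I made up for illustration; none of this reflects how CLÉA or CATMA works internally.  It only handles flat, non-overlapping matches, so the nested and overlapping cases mentioned above are left unsolved.

```python
import re
from xml.sax.saxutils import escape


def tag_terms(text, terms, tag="term"):
    """Wrap each whole-word occurrence of any term in <tag>...</tag>.

    Plain text goes in, an XML-safe string comes out. Matching is
    case-insensitive; longer terms are tried first so that a
    multi-word term wins over a shorter term it contains.
    """
    if not terms:
        return escape(text)

    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(t) for t in ordered)
    pattern = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)

    pieces = []
    pos = 0
    for match in pattern.finditer(text):
        # Escape the untouched text between matches, then the match itself.
        pieces.append(escape(text[pos:match.start()]))
        pieces.append("<{0}>{1}</{0}>".format(tag, escape(match.group(0))))
        pos = match.end()
    pieces.append(escape(text[pos:]))
    return "".join(pieces)


if __name__ == "__main__":
    sample = "The conflict was resolved & forgotten."
    print(tag_terms(sample, ["conflict"]))
```

The output is ordinary XML-escaped text with inline elements, so it could be wrapped in a root element and handed to whatever XML-aware search or query tool you already use.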

In the end, I think that the usefulness of CLÉA probably depends on your goals: what questions you are asking and what you are trying to accomplish.  However, this is certainly not a tool for improving or marking up existing text content, which is what I went in thinking it might be.