
How to prepare for an EM/CA data session

Participants in the first EMCA DN Meeting

At the inaugural EMCA Doctoral Network meeting (my write-up here), where there was a mix of researchers with different levels of familiarity with EM/CA, I realised that the process of preparing for a data session – one of the most productive and essential tools of interaction analysis – is really poorly documented. There are some guidelines provided in textbooks and on websites that usually include issues of how to do the analysis/transcription itself, but nowhere is there a simple guide for how to actually get ready to contribute your data to a data session. This short primer is intended to fulfil that function, and to invite others to contribute their own tips and best practices.

I was more or less in this situation (having crammed my head full of CA literature without having had a great deal of hands-on data session practice) last year when I went to the Centre for Language Interaction and Culture at UCLA. There I had the chance to witness and participate in four separate weekly data sessions run by Steven Clayman, Chuck Goodwin, John Heritage, Gene Lerner, Tanya Stivers and Sandy Thompson and their students. It was a bit of a baptism of fire, but I learned a lot from it.

Each of the pros had interestingly different approaches to preparing data for the sessions, all useful in slightly different ways, so I decided to write up my synthesis of best practice for preparing your data for a data session. Feedback and comments are very welcome!

What/who this guide is for:

This guide is intended for researchers interested in participating in data sessions in the tradition of Ethnomethodology and Conversation Analysis (EM/CA), who already have data and want to figure out how to present it.

This is not about gathering data, specific analytic approaches or about actually doing detailed analysis or any of the meat and potatoes of EM/CA work which is amply covered in many books and articles including:

This guide is intended to help researchers who may not have had much experience of data sessions to prepare their data in such a way that the session will be fun and analytically useful for them and everyone else who attends.

This is also not intended to be a primer in the use of specific bits of audio/video/text editing or transcription software – there are so many out there. I will recommend some that are freely available, but pretty much any will do. I do plan to write that kind of guide, but that’s not what this article is for.

Selecting from your data for the session

Doing a data session obviously requires that some kind of data selection has to be made, so it helps to have a focal phenomenon of some sort. Since the data session is exploratory rather than about findings, it doesn’t really matter what the phenomenon is.

That’s the great thing about naturally occurring data – you might not find what you’re looking for, but you will find something analytically interesting. Negative findings about your focal phenomenon are also useful – i.e. you may discover that you selected your clips on the basis of assumptions about how they are related, and that those assumptions are not borne out by the interaction analysis. That is still a useful finding and will make for a fun and interesting data session.

Example phenomena for a rough data-session-like collection of extracts might focus on any one or on a combination of lexical, gestural, sequential, pragmatic, contextual, topical etc. features. E.g.:

  • Different sequential or pragmatic uses of specific objects such as ‘Oh’, ‘wow’ or ‘maybe’.
  • Body orientation shifts or specific patterns of these shifts during face-to-face interaction.
  • Word repeats by speaker and/or recipient at different sequential locations in talk.
  • Extracts from interactions in a particular physical location or within a specific institutional context.
  • Extracts of talk-in-interaction where speakers topicalize something specific (e.g.: doctors/teapots/religion/traffic).

At this stage your data doesn’t have to be organised into a principled ‘collection’ as such. Having cases that are ostensibly the same or similar, and then finding out how they are different is a tried and tested way of finding out what phenomenon you are actually dealing with in EM/CA terms.

There are wonderful accounts of this data-selection / phenomenon discovery process with notes and caveats about some of the practical and theoretical consequences of data selection in these two papers:

Pre-data session selection: how to focus your selection on a specific phenomenon.

You can bring any natural data at all to a session and it will be useful as long as it’s prepared reasonably well. However if you want the session to focus on something relevant to your overall project, it is helpful to think about what kind of analysis will be taking place in the session in relation to your candidate phenomenon and select clips accordingly.

There are proper descriptions of how to actually do this detailed interaction analysis in the references linked above. However, here is a paraphrase of some of the simple tips on data analysis that Gene Lerner and Sandy Thompson give when introducing an interdisciplinary data session where many people are doing it for the first time in their wonderful Language and the Body course:

  1. Describe the occasion and current situation being observed (where/when/sequence/location etc.).
  2. Limit your observations to those things you can actually point to on the screen/transcript.
  3. Then, pick out for data analysis features/occasions that are demonstrably oriented to by the participants themselves.
  4. That is your ‘target’, then zoom in to line-by-line, action-by-action sequences and describe each.
  5. Select a few targets where you can specify what is being done as the sequence of action unfolds.

Then in the data session itself, you and other researchers can look at how all interactional resources (bodily movements / prosody / speech / environmental factors) etc. are involved in these processes and make observations about how these things are being done.

Providing a transcript

I find it very hard to focus on analysis without having a printed transcript, but there are a few different approaches, each with different advantages and disadvantages. Chuck Goodwin, for example, recommends putting Jeffersonian transcription subtitles directly onto the video/audio clips so you don’t have to split focus between screen and page. However, most researchers produce a transcript using Jeffersonian transcription and play their clips separately.

Advantages of printed transcriptions

  • You and other participants have something convenient to write notes on.
  • You can capture errors or issues in the transcription easily.
  • Participants can refer to line numbers that are off-screen when they make their observations.

Advantages of subtitles on-screen

  • You don’t miss the action looking up and down between page and screen.
  • Generally easier to understand immediately than a multi-line transcript when presenting data in a language your session participants might not understand.
  • You can present this data in environments where you don’t have the opportunity to print out and distribute paper transcripts.

In either case you will need to take the time to do a Jeffersonian transcription, so why not do both?

Jeffersonian transcription

There are lots of resources for learning Jeffersonian transcription, here are some especially useful ones:

Visual/graphical transcripts

Chuck and Candy Goodwin often also present carefully designed illustrations alongside their final analyses. Some people also present their data, usually at a later stage of research with detailed multi-modal transcripts incorporating drawings, animations, film-strip-like representations etc. (see Eric Laurier’s paper for a great recent overview):

  • Laurier, E. (2014). The Graphic Transcript: Poaching Comic Book Grammar for Inscribing the Visual, Spatial and Temporal Aspects of Action. Geography Compass, 8(4), 235–248.

How much work you want to do on your transcript before a data session is up to you but it is probably premature to work on illustrations etc. until you have some analytic findings to illustrate.

Transcription issues vs. errors

It’s inevitable that other people will hear things differently, so the data session is a legitimate environment for improving a transcript-in-progress. In fact, often analytic findings may hinge on how something is heard, and then how it is transcribed – this is a useful thing to discuss in a data session and will be instructive for everyone. However, it is important to capture as much as possible of the obvious stuff as accurately as possible to provide people with a basic resource for doing analysis together without getting hung up on simple transcription errors rather than the interesting transcription questions.

Introducing your data

It is useful to give people a background to your data before you present it. This does not have to be a full background to all your research and the study you are undertaking. In fact, it’s useful to omit most of this kind of information, because the key resource you have access to in the data session is fresh eyes and ears that aren’t yet contaminated by assumptions about what is going on.

In terms of introducing your study as a whole, it’s useful to have a mini presentation (5 mins max for a 1.5h data session) prepared about your study with two or three key points that can give people an insight into where/what you are studying. Once you’ve made one of these for each study you can re-use it in multiple data sessions.

In terms of introducing each clip, have a look at Schegloff’s (2007) descriptions of his data extracts. They have a brilliantly pithy clarity that provides just enough information to understand what is going on without giving away any spoilers or showing any bias.

Preparing your audio/video data

Assuming you already have naturalistic audio/video data of some kind, make some short clips using your favourite piece of audio/video editing software. The shorter the better (under 30s ideally) – longer clips, especially complex/busy ones may need to be broken down for analysis into smaller chunks.

It can be time-consuming searching through longer clips for specific sections, so I recommend making clips that correspond precisely to your transcript, but noting down where in the larger video/audio file this clip is located, in case someone wants to see what happens next or previously.
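As a sketch of what this can look like on the command line (ffmpeg is assumed to be installed; the filenames, clip number and timings here are hypothetical), you can cut a clip and record its offset in the source recording directly in the clip’s filename:

```shell
# Cut a 25-second clip starting at 12:30 in the source recording, and
# keep the source offset in the clip's filename so you can always find
# what happens just before or after. Filenames are hypothetical.
SRC="full_session.mp4"
START="00:12:30"
LENGTH=25

# e.g. clip03_at_00-12-30.mp4
OUT="clip03_at_$(printf '%s' "$START" | tr ':' '-').mp4"

# -c copy avoids re-encoding; guarded so this is a no-op if the source
# file doesn't exist
if [ -f "$SRC" ]; then
    ffmpeg -ss "$START" -t "$LENGTH" -i "$SRC" -c copy "$OUT"
fi
```

Any video editor will do the same job; the point is just that the clip’s name (or a note alongside it) should tell you where it came from in the longer recording.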

Copy these clips into a separate file or folder on your computer that is specifically for this data session – finding them if they’re buried in your file system can waste time.

If possible, test the audio and video projection/display equipment in the room where you’ll be running the data session to make sure that your clips are audible and visible without headphones and on other screens. If in doubt, use audio editing software (such as Audacity) to make sure the audio in your files is as loud as possible without clipping. You can always turn a loud sound system down – but if your data and the playback system are too quiet, you’re stuck in the data session without being able to hear anything.
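If you’d rather normalise loudness from the command line than in Audacity, one option (a sketch – ffmpeg assumed installed, folder and file names hypothetical) is ffmpeg’s EBU R128 loudnorm filter:

```shell
# Normalise the loudness of every clip in a folder, copying the video
# stream untouched. A no-op if the folder is empty or missing.
for clip in clips/*.mp4; do
    [ -f "$clip" ] || continue
    ffmpeg -i "$clip" -af loudnorm -c:v copy "loud_$(basename "$clip")"
done
```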

There are many more useful tips about sound and lighting etc. in data collection in Heath, Hindmarsh & Luff (2010).

The mechanics of showing your data

I find it useful to think of this as a kind of presentation – just like at a conference or workshop – so I recommend using presentation software to cue up and organise your clips for display rather than rummaging through files and folders with different names.

Make sure each clip/slide is clearly named and/or numbered to correspond with sections of a transcript, so that people can follow it easily, and make sure you can get the clip to play – and pause/rewind/control it – with a minimum of fuss.

The data session is probably the most useful analytic resource you have after the data itself, so make sure you use every second of it.

Feedback / comments / comparisons very welcome

I hope this blog-post-grade guide is useful for those just getting into EMCA, and while I know that data session conventions vary widely, I hope this represents a sufficiently widely applicable set of recommendations to make sense in most contexts.

In general I am very interested in different data session conventions and would very much welcome tips, advice, recommendations and descriptions of specialized data session practices from other researchers/groups.

More very useful tips (thanks!):

Dr Jo Meredith adds: “because data sessions can be a bit terrifying the temptation’s to take some data you can talk about in an intelligent way, best data sessions i’ve been to have been with new pieces of data, and I’ve got inspired by other people’s observations”

S.P.A.M.D.: a turn-analytical mnemonic

While trawling through the piles of conversational data being thrown at me as a happily ensconced CLIC visiting graduate student at UCLA, I’ve made a little mnemonic device for myself to help me tackle a transcript turn by turn:

Turn # (lines)
– seq:
– pos:
– act:
– mrk:
– des:

For each turn of talk, I’m asking myself:

  1. Which turn number is it?
  2. Which lines does it occupy?
  3. What is it in its local sequence? (an FPP, an SPP etc.)
  4. Which position in the sequence does it occupy?
  5. What action does it implement? (If any.)
  6. What is it marked by? (If at all.)
  7. How is it designed/shaped?

Although there’s always lots more to ask of any turn, especially in terms of its use of conventional formulations, detailed lexical/syntactic/prosodic features and how the turns in or across sequences interrelate, this seems like a reasonable set of initial questions to ask when looking at a turn for the first time.

I’m OK with the acronym S.P.A.M.D. but would welcome any suggestions for other turn-analytic question labels beginning with M or even better – with E so I can complete the set.

Use archive.org to park old websites without link rot

I’ve built dozens of websites over the years – both professionally and as part of artistic, research, teaching, or freelance projects. I’m still very proud of some of them and would like to show them to people and link to them. Other people have also linked to them extensively over the years, and those inbound links are useful.

The problem is that keeping all this stuff online takes maintenance and often causes headaches. I think I’ve found a ten-second technical solution that I hope doesn’t annoy the good people at archive.org.

Quick how-to:

  1. Find the best possible instance of your website on archive.org’s wayback machine.
  2. Create two redirect rules on your web server: one to block archive.org’s archiving script, and one to redirect all other traffic to archive.org.

Breakdown:

Finding the best instance of your website

Archive.org indexes your website with varying regularity. You might want their latest version – but it doesn’t just depend on your website. Archive.org archives the environment of your site too.

There may be other websites that went offline before your site – if you use the latest version archive.org has, links out of your site may be dead or may point to domain squatters who moved in when the people you linked to moved out.

The solution to this is to use archive.org’s time navigation bar to find the ‘optimum’ time at which your site and its significant neighbours were in their heyday. Use this as the basis of the Apache redirect rule.
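The timestamp of the snapshot you choose appears directly in the Wayback Machine URL, and that full URL is what the redirect rule points at:

```
http://web.archive.org/web/20061205014515/http://twenteenthcentury.com/
                           ^^^^^^^^^^^^^^
                           YYYYMMDDhhmmss of the chosen snapshot
```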

Redirect rules

Most web servers allow you to define redirect rules. I use Apache, which provides several ways of doing this: you can either create redirect rules in your Apache configuration files under sites-available, or add them to an .htaccess file in the root directory of your domain.

My redirect rule for my old art collective’s website looks like this:

<IfModule mod_rewrite.c>
        Options +FollowSymLinks
        RewriteEngine on

        # if it's archive.org trying to archive itself
        RewriteCond %{HTTP_USER_AGENT} ^ia_archiver
        RewriteRule ^.* - [F,L]

        # otherwise redirect to archive.org
        RewriteRule (.*) http://web.archive.org/web/20061205014515/http://twenteenthcentury.com/$1 [R=301,L]
</IfModule>

Thoughts?

I’d be interested to know what archive.org thinks of this use of their service. It seems such an obvious solution to a very widespread problem.

So many of my artist friends – particularly those who developed their own web skills for artistic purposes – now spend inordinate amounts of time keeping their websites alive across server and database incompatibilities, changes in the programming languages they used to create their services, and various other headaches.

This isn’t ideal – obviously dynamic services can’t run on archive.org, and sadly some of the more interesting bits of work I did were not very friendly to web crawlers, so they didn’t give archive.org much to go on. But that’s a good pointer for future work: make sure your web projects are easily crawled by archive.org so you don’t have to sysadmin them for the rest of time.

Install Dropbox On Your Server

Start Dropbox Automatically On Boot

Dropbox provides a handy little service management script that makes it easy to start, stop and check the status of the Dropbox client.

Create a new file for the service management script

sudo vi /etc/init.d/dropbox


Paste the following script into the new file

#!/bin/sh
# dropbox service
# Replace with linux users you want to run Dropbox clients for
DROPBOX_USERS="user1 user2"

DAEMON=.dropbox-dist/dropbox

start() {
    echo "Starting dropbox..."
    for dbuser in $DROPBOX_USERS; do
        HOMEDIR=`getent passwd $dbuser | cut -d: -f6`
        if [ -x "$HOMEDIR/$DAEMON" ]; then
            HOME="$HOMEDIR" start-stop-daemon -b -o -c "$dbuser" -S -u "$dbuser" -x "$HOMEDIR/$DAEMON"
        fi
    done
}

stop() {
    echo "Stopping dropbox..."
    for dbuser in $DROPBOX_USERS; do
        HOMEDIR=`getent passwd $dbuser | cut -d: -f6`
        if [ -x "$HOMEDIR/$DAEMON" ]; then
            start-stop-daemon -o -c "$dbuser" -K -u "$dbuser" -x "$HOMEDIR/$DAEMON"
        fi
    done
}

status() {
    for dbuser in $DROPBOX_USERS; do
        dbpid=`pgrep -u "$dbuser" dropbox`
        if [ -z "$dbpid" ] ; then
            echo "dropboxd for USER $dbuser: not running."
        else
            echo "dropboxd for USER $dbuser: running (pid $dbpid)"
        fi
    done
}

case "$1" in

    start)
        start
        ;;

    stop)
        stop
        ;;

    restart|reload|force-reload)
        stop
        start
        ;;

    status)
        status
        ;;

    *)
        echo "Usage: /etc/init.d/dropbox {start|stop|reload|force-reload|restart|status}"
        exit 1

esac

exit 0


Make sure you replace the value of DROPBOX_USERS with a space-separated list of the Linux users on your machine that you want the Dropbox client to run for. Each user in the list should have a copy of the Dropbox files and folders that you extracted from the archive available under their home directory.
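As a quick sanity check before enabling the service (a sketch – the usernames are placeholders), you can verify that each listed user actually has the daemon in place:

```shell
# For each user the service will manage, check that the Dropbox daemon
# is present and executable in their home directory.
DROPBOX_USERS="user1 user2"
for dbuser in $DROPBOX_USERS; do
    HOMEDIR=$(getent passwd "$dbuser" | cut -d: -f6)
    if [ -x "$HOMEDIR/.dropbox-dist/dropbox" ]; then
        echo "$dbuser: daemon found"
    else
        echo "$dbuser: no daemon at $HOMEDIR/.dropbox-dist/dropbox"
    fi
done
```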

Make sure the script is executable and add it to default system startup run levels

sudo chmod +x /etc/init.d/dropbox
sudo update-rc.d dropbox defaults


Control the Dropbox client like any other Ubuntu service

sudo service dropbox start|stop|reload|force-reload|restart|status


Dropbox Delorean By Dropbox Artwork Team


Depending upon the number of files you have on Dropbox and the speed of your internet connection it may take some time for the Dropbox client to synchronize everything.

Check Status with Dropbox CLI

Dropbox has a command-line Python script, available separately, that provides more functionality and details on the status of the Dropbox client.

Download the dropbox.py script and adjust the file permissions

wget -O ~/.dropbox/dropbox.py "http://www.dropbox.com/download?dl=packages/dropbox.py"
chmod 755 ~/.dropbox/dropbox.py


You can download the script anywhere you like; I’ve included it along with the rest of the Dropbox files.

Now you can easily check the status of the Dropbox client

~/.dropbox/dropbox.py status
Downloading 125 files (303.9 KB/sec, 1 hr left)


Get a full list of CLI commands

~/.dropbox/dropbox.py help

Note: use dropbox help <command> to view usage for a specific command.

 status       get current status of the dropboxd
 help         provide help
 puburl       get public url of a file in your dropbox
 stop         stop dropboxd
 running      return whether dropbox is running
 start        start dropboxd
 filestatus   get current sync status of one or more files
 ls           list directory contents with current sync status
 autostart    automatically start dropbox at login
 exclude      ignores/excludes a directory from syncing


Use the exclude command to keep specific files or folders from syncing to your server

~/.dropbox/dropbox.py help exclude

dropbox exclude [list]
dropbox exclude add [DIRECTORY] [DIRECTORY] ...
dropbox exclude remove [DIRECTORY] [DIRECTORY] ...

"list" prints a list of directories currently excluded from syncing.  
"add" adds one or more directories to the exclusion list, then resynchronizes Dropbox. 
"remove" removes one or more directories from the exclusion list, then resynchronizes Dropbox.
With no arguments, executes "list". 
Any specified path must be within Dropbox.


Once the Dropbox service is running and fully synchronized, you can access all your Dropbox files and easily share files on your server with all your other Dropbox-connected gadgets!

For more resources and troubleshooting tips visit the Text Based Linux Install page on the Dropbox wiki and the Dropbox forums. Happy syncing!

via Install Dropbox On Your Ubuntu Server (10.04, 10.10 & 11.04) | Ubuntu Server GUI.

Building & Installing 4store on Debian Lenny

It took a good few attempts to get 4store installed on my Debian Lenny box, even after reading a very useful guide by Richard Reynolds.

For anyone following that guide, here are the modifications I had to make:

Firstly, I had to install Raptor (the build complains that there’s no Rasqal otherwise). That was fairly straightforward; I was able to follow Richard Reynolds’s guide:


wget http://download.librdf.org/source/raptor2-2.0.2.tar.gz
tar -xzvf raptor2-2.0.2.tar.gz
cd raptor2-2.0.2
./configure
make
sudo make install

Then I was able to build Rasqal:


wget http://download.librdf.org/source/rasqal-0.9.25.tar.gz
tar -xzvf rasqal-0.9.25.tar.gz
cd rasqal-0.9.25
./configure
make
sudo make install

When it came to building 4store, I couldn’t get the sources from github. This line:

git clone https://github.com/garlik/4store.git

Got me:


Initialized empty Git repository in /home/blah/4store-v1.1.4/4store/.git/
warning: remote HEAD refers to nonexistent ref, unable to checkout.

Which wasn’t very useful, and created an empty 4store directory that I had to delete. A bit of googling indicated that the maintainers need to issue a few commands to push the default branch to the server. I couldn’t do anything about that, so I tried other methods of getting hold of the sources.

Then I tried several times to download auto-zipped up sources from github, unzipped them, and struggled with building the Makefile using the included automake.sh script, which I never got to work.

So finally I downloaded the sources from the 4store website here, unzipped them, found a nice Makefile and followed the INSTALL instructions from there.

It was a bit of a mission getting 4store to compile; I had to apt-get install:

  • libglib2.0-dev (configure complained about not having glib-2.0)
  • libxml++-dev
  • libreadline-dev

But I finally got it configured, made and installed. Next: configuration!

How to insert a special character in Vim

In insert mode, press Ctrl-K, then type a letter followed by one of these characters:

Character	Meaning
--------------------------------
!		Grave
'		Acute accent
>		Circumflex accent
?		Tilde
-		Macron
(		Breve
.		Dot above
:		Diaeresis
,		Cedilla
_		Underline
/		Stroke
"		Double acute (Hungarumlaut)
;		Ogonek
<		Caron
0		Ring above
2		Hook
9		Horn
=		Cyrillic
*		Greek
%		Greek/Cyrillic special
+		Smalls: Arabic, caps: Hebrew
3		Some Latin/Greek/Cyrillic
4		Bopomofo
5		Hiragana
6		Katakana
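For example (these are standard Vim digraphs):

```
Ctrl-K e '   inserts é   (e + acute accent)
Ctrl-K a :   inserts ä   (a + diaeresis)
Ctrl-K n ?   inserts ñ   (n + tilde)
```

You can list every available combination with the :digraphs command.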

from http://vim.runpaint.org/typing/inserting-accented-characters/