How to prepare for an EM/CA data session

At the inaugural EMCA Doctoral Network meeting (my write-up here), where there was a mix of researchers with different levels of familiarity with EM/CA, I realised that the process of preparing for a data session, one of the most productive and essential tools of interaction analysis, is really poorly documented. Textbooks and websites offer some guidelines, but these usually cover how to do the analysis or transcription itself; nowhere is there a simple guide for how to actually get ready to contribute your data to a data session. This short primer is intended to fulfil that function, and to invite others to contribute their own tips and best practices.

I was more or less in this situation (having crammed my head full of CA literature without having had a great deal of hands-on data session practice) last year when I went to the Center for Language, Interaction and Culture at UCLA. There I had the chance to witness and participate in four separate weekly data sessions run by Steven Clayman, Chuck Goodwin, John Heritage, Gene Lerner, Tanya Stivers and Sandy Thompson and their students. It was a bit of a baptism of fire, but I learned a lot from it.

Each of the pros had an interestingly different approach to preparing data for the sessions, all useful in slightly different ways, so I decided to write up my synthesis of best practice for preparing your data for a data session. Feedback and comments are very welcome!

What/who this guide is for:

This guide is intended for researchers interested in participating in data sessions in the tradition of Ethnomethodology and Conversation Analysis (EM/CA), who already have data and want to figure out how to present it.

This is not about gathering data, specific analytic approaches, or actually doing detailed analysis. The meat and potatoes of EM/CA work is amply covered in many books and articles, including:

Goodwin, C. (1993). Recording human interaction in natural settings. Pragmatics, 3(2), 181–209.
Schegloff, E. A. (2007). Sequence organization in interaction: A primer in conversation analysis (Vol. 1). Cambridge: Cambridge University Press.
Heath, C., Hindmarsh, J., & Luff, P. (2010). Video in qualitative research. London: Sage Publications.
Sidnell, J. (2011). Conversation analysis: an introduction. London: Wiley-Blackwell.

This guide is intended to help researchers who may not have had much experience of data sessions to prepare their data in such a way that the session will be fun and analytically useful for them and everyone else who attends.

This is also not intended to be a primer in the use of specific audio/video/text editing or transcription software. There are so many packages out there; I will recommend some that are freely available, but pretty much any will do. I do plan to write that kind of guide, but that’s not what this article is for.

Selecting from your data for the session

Doing a data session obviously requires that some kind of data selection has to be made, so it helps to have a focal phenomenon of some sort. Since the data session is exploratory rather than about findings, it doesn’t really matter what the phenomenon is.

That’s the great thing about naturally occurring data: you might not find what you’re looking for, but you will find something analytically interesting. Negative findings about your focal phenomenon are also useful. For example, you might have selected your clips with some assumptions about how they are related, and then find that the interaction analysis does not bear these out. That is still a useful finding and will make for a fun and interesting data session.

Example phenomena for a rough data-session-like collection of extracts might focus on any one or on a combination of lexical, gestural, sequential, pragmatic, contextual, topical etc. features. E.g.:

Different sequential or pragmatic uses of specific objects such as ‘Oh’, ‘wow’ or ‘maybe’.
Body orientation shifts or specific patterns of these shifts during face-to-face interaction.
Word repeats by speaker and/or recipient at different sequential locations in talk.
Extracts from interactions in a particular physical location or within a specific institutional context.
Extracts of talk-in-interaction where speakers topicalize something specific (e.g.: doctors/teapots/religion/traffic).

At this stage your data doesn’t have to be organised into a principled ‘collection’ as such. Having cases that are ostensibly the same or similar, and then finding out how they are different is a tried and tested way of finding out what phenomenon you are actually dealing with in EM/CA terms.

There are wonderful accounts of this data-selection / phenomenon discovery process with notes and caveats about some of the practical and theoretical consequences of data selection in these two papers:

Jefferson, G. (1988). Preliminary notes on a possible metric which provides for a ‘standard maximum’ silence of approximately one second in conversation. In D. Roger & P. Bull (Eds.), Conversation: An interdisciplinary perspective. Clevedon, UK: Multilingual Matters.
Schegloff, E. A. (1996). Confirming allusions: Toward an empirical account of action. American Journal of Sociology, 102(1), 161–216.

Pre-data session selection: how to focus your selection on a specific phenomenon

You can bring any natural data at all to a session and it will be useful as long as it’s prepared reasonably well. However if you want the session to focus on something relevant to your overall project, it is helpful to think about what kind of analysis will be taking place in the session in relation to your candidate phenomenon and select clips accordingly.

There are proper descriptions of how to actually do this detailed interaction analysis in the references listed above. However, here is a paraphrase of some simple tips on data analysis from Gene Lerner and Sandy Thompson’s wonderful Language and the Body course, which they give when introducing an interdisciplinary data session where many people are doing it for the first time:

Describe the occasion and current situation being observed (where/when/sequence/location etc.).
Limit your observations to those things you can actually point to on the screen/transcript.
Then, pick out for data analysis features/occasions that are demonstrably oriented to by the participants themselves.
That is your ‘target’; then zoom in to line-by-line, action-by-action sequences and describe each.
Select a few targets where you can specify what is being done as the sequence of action unfolds.

Then, in the data session itself, you and other researchers can look at how all interactional resources (bodily movements / prosody / speech / environmental factors etc.) are involved in these processes and make observations about how these things are being done.

Providing a transcript

I find it very hard to focus on analysis without having a printed transcript, but there are a few different approaches, each with different advantages and disadvantages. Chuck Goodwin, for example, recommends putting Jeffersonian transcription subtitles directly onto the video/audio clips so you don’t have to split focus between screen and page. However, most researchers produce a transcript using Jeffersonian transcription and play their clips separately.

Advantages of printed transcriptions

You and other participants have something convenient to write notes on.
You can capture errors or issues in the transcription easily.
Participants can refer to line numbers that are off-screen when they make their observations.

Advantages of subtitles on-screen

You don’t miss the action looking up and down between page and screen.
Generally easier to understand immediately than a multi-line transcript when presenting data in a language your session participants might not understand.
You can present this data in environments where you don’t have the opportunity to print out and distribute paper transcripts.

In either case you will need to take the time to do a Jeffersonian transcription, so why not do both?
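If you do want both, one transcript can feed both outputs. The following is only a minimal sketch, assuming you have noted a start and end time (in seconds) for each transcript line. It writes those lines out in the common SRT subtitle format, which many video players and editors can display over a clip. The `Line` structure, the function names and the example utterances are all invented for illustration and are not taken from any particular transcription tool.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """One transcript line with (hypothetical) start/end times in seconds."""
    number: int
    start: float
    end: float
    speaker: str
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in SRT files."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(lines: list[Line]) -> str:
    """Render transcript lines as SRT cues, keeping transcript line numbers visible."""
    cues = []
    for index, line in enumerate(lines, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(line.start)} --> {srt_timestamp(line.end)}\n"
            f"{line.number:02d} {line.speaker}: {line.text}\n"
        )
    # Each cue already ends in a newline, so joining with "\n"
    # leaves the blank line that separates SRT cues.
    return "\n".join(cues)


# Invented example lines, purely to show the shape of the output.
example = [
    Line(1, 0.0, 1.2, "A", "oh (.) wow"),
    Line(2, 1.2, 3.5, "B", "[maybe:::"),
]
print(to_srt(example))
```

Keeping the transcript line number at the start of each subtitle means people can still refer to lines by number, whether they are looking at the page or the screen.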

Jeffersonian transcription

There are lots of resources for learning Jeffersonian transcription. Here are some especially useful ones:

Schegloff’s classic Transcription Module.
Ochs, E. (1979). Transcription as theory. In E. Ochs & B. B. Schieffelin (Eds.), Developmental Pragmatics (pp. 43–72). New York: Academic Press.
Hepburn, A., & Bolden, G. B. (2013). The Conversation Analytic Approach to Transcription. In J. Sidnell & T. Stivers (Eds.), The Handbook of Conversation Analysis (pp. 57–76). London: Wiley-Blackwell.
A larger collection of transcription resources on Paul ten Have’s website.

Visual/graphical transcripts

Chuck and Candy Goodwin often also present carefully designed illustrations alongside their final analyses. Some people also present their data, usually at a later stage of research, with detailed multi-modal transcripts incorporating drawings, animations, film-strip-like representations etc. (see Eric Laurier’s paper for a great recent overview):

Laurier, E. (2014). The Graphic Transcript: Poaching Comic Book Grammar for Inscribing the Visual, Spatial and Temporal Aspects of Action. Geography Compass, 8(4), 235–248.

How much work you want to do on your transcript before a data session is up to you but it is probably premature to work on illustrations etc. until you have some analytic findings to illustrate.

Transcription issues vs. errors

It’s inevitable that other people will hear things differently, so the data session is a legitimate environment for improving a transcript-in-progress. In fact, analytic findings may often hinge on how something is heard, and then on how it is transcribed. This is a useful thing to discuss in a data session and will be instructive for everyone. However, it is important to capture the obvious stuff as accurately as possible, so that people have a basic resource for doing analysis together and can spend their time on the interesting transcription questions rather than getting hung up on simple transcription errors.

Introducing your data

It is useful to give people a background to your data before you present it. This does not have to be a full background to all your research and the study you are undertaking. In fact, it’s useful to omit most of this kind of information, because the key resource you have access to in the data session is fresh eyes and ears that aren’t yet contaminated by assumptions about what is going on.

In terms of introducing your study as a whole, it’s useful to have a mini presentation (5 mins max for a 1.5h data session) prepared about your study with two or three key points that can give people an insight into where/what you are studying. Once you’ve made one of these for each study you can re-use it in multiple data sessions.

In terms of introducing each clip, have a look at Schegloff’s (2007) descriptions of his data extracts. They have a brilliantly pithy clarity that provides just enough information to understand what is going on without giving away any spoilers or showing any bias.

Preparing your audio/video data

Assuming you already have naturalistic audio/video data of some kind, make some short clips using your favourite piece of audio/video editing software. The shorter the better (under 30s ideally): longer clips, especially complex/busy ones, may need to be broken down into smaller chunks for analysis.

It can be time-consuming searching through longer clips for specific sections, so I recommend making clips that correspond precisely to your transcript, but noting down where in the larger video/audio file each clip is located, in case someone wants to see what happens just before or after it.
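One low-tech way to keep these notes is a small clip list that you can print alongside the transcript. The sketch below is just one possible layout, not an established convention. The `Clip` fields, the file name `dinner.mp4` and the 30-second threshold (taken from the rule of thumb above) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    """A short clip plus a note of where it sits in the longer recording."""
    name: str
    source: str                       # the longer recording the clip was cut from
    start: float                      # seconds into the source recording
    end: float                        # seconds into the source recording
    transcript_lines: tuple[int, int]  # first and last transcript line covered

    @property
    def duration(self) -> float:
        return self.end - self.start


def hms(seconds: float) -> str:
    """Format a position in the source recording as HH:MM:SS."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def describe(clip: Clip, max_seconds: float = 30.0) -> str:
    """One printable line per clip, flagging clips over the suggested length."""
    first, last = clip.transcript_lines
    note = "" if clip.duration <= max_seconds else " (long: consider splitting)"
    return (
        f"{clip.name}: lines {first}-{last}, "
        f"{clip.source} {hms(clip.start)}-{hms(clip.end)}, "
        f"{clip.duration:.0f}s{note}"
    )


# Hypothetical clip, purely for illustration.
print(describe(Clip("clip01", "dinner.mp4", 754.0, 776.0, (1, 14))))
```

With a list like this to hand, jumping back into the full recording when someone asks to see the surrounding talk takes seconds rather than minutes.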

Copy these clips into a separate file or folder on your computer that is specifically for this data session – finding them if they’re buried in your file system can waste time.

If possible, test the audio and video projection/display equipment in the room where you’re running the data session to make sure that your clips are audible and visible without headphones and on other screens. If in doubt, use audio editing software (such as Audacity) to make sure the audio in your files is as loud as possible without clipping. You can always turn a loud sound system down, but if both your data and the playback system are too quiet, you’re stuck in the data session unable to hear anything.
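For anyone curious what "as loud as possible without clipping" amounts to, here is a rough sketch of the arithmetic behind peak normalisation. It assumes the audio has already been read into a list of 16-bit integer samples. It is not a description of how Audacity or any other editor implements its effects, and the function names and the 5% headroom default are my own assumptions.

```python
def peak_gain(samples: list[int], full_scale: int = 32767, headroom: float = 0.95) -> float:
    """Gain that brings the loudest sample to `headroom` of full scale."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return 1.0  # silence (or no samples): nothing to scale
    return (full_scale * headroom) / peak


def normalise(samples: list[int], full_scale: int = 32767, headroom: float = 0.95) -> list[int]:
    """Scale every sample by the same gain, clamping as a safeguard against clipping."""
    gain = peak_gain(samples, full_scale, headroom)
    return [max(-full_scale, min(full_scale, round(s * gain))) for s in samples]


# A quiet, made-up snippet: the loudest sample is only about 6% of full scale.
quiet = [100, -2000, 1500]
loud = normalise(quiet)
print(max(abs(s) for s in loud))
```

Because every sample is multiplied by the same factor, the relative loudness of talk within the clip is preserved; only the overall level changes.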

There are many more useful tips about sound and lighting etc. in data collection in Heath, Hindmarsh & Luff (2010).

The mechanics of showing your data

I find it useful to think of this as a kind of presentation, just like at a conference or workshop, so I recommend using presentation software to cue up and organise your clips for display rather than struggling to find them among files and folders with different names.

Make sure each clip/slide is clearly named and/or numbered to correspond with sections of a transcript, so that people can follow it easily, and make sure you can get the clip to play – and pause/rewind/control it – with a minimum of fuss.

The data session is probably the most useful analytic resource you have after the data itself, so make sure you use every second of it.

Feedback / comments / comparisons very welcome

I hope this blog-post-grade guide is useful for those just getting into EM/CA. I know that data session conventions vary widely, but I hope this represents a sufficiently widely applicable set of recommendations to make sense in most contexts.

In general I am very interested in different data session conventions and would very much welcome tips, advice, recommendations and descriptions of specialized data session practices from other researchers/groups.

More very useful tips (thanks!):

@saul @elliotthoey that's so useful!Piece of advice I'd give is don't take data you've already worked on in detail(other than transcribing)

— Dr Jo Meredith (is on leave) (@JoMeredith82) June 10, 2014

Dr Jo Meredith adds: “because data sessions can be a bit terrifying the temptation’s to take some data you can talk about in an intelligent way, best data sessions i’ve been to have been with new pieces of data, and I’ve got inspired by other people’s observations”