I gave this keynote at the first European Conference on Conversation Analysis (ECCA 2020), which, due to COVID-19, had to be delivered as a video rather than a stand-up talk.
I aimed for a mix of film essay and research presentation of work in progress, so it didn't always make sense to put references on every slide. I've added them below, with links to the data used where available.
Sacks’ (1963) first published paper on ‘sociological description’ uses the metaphor of a mysterious ‘talking-and-doing’ machine, where researchers from different disciplines come up with incompatible, contradictory descriptions of its functionality. We may soon find ourselves in a similar situation to the one Sacks describes as AI continues to permeate the social sciences, and CA begins to encounter AI either as a research object, as a research tool, or more likely as a pervasive feature of both.
There is now a thriving industry in ‘Conversational AI’ and AI-based tools that claim to emulate or analyse talk, but both the study and use of AI within CA are still unusual. While a growing literature is using CA to study social robotics, voice interfaces, and conversational user experience design (Pelikan & Broth, 2016; Porcheron et al., 2018), few conversation analysts even use digital tools, let alone the statistical and computational methods that underpin conversational AI. Similarly, researchers and developers of conversational AI rarely cite CA research and have only recently become interested in CA as a possible solution to hard problems in natural language processing (NLP). This situation presents an opportunity for mutual engagement between conversational AI and CA (Housley et al., 2019). To prompt a debate on this issue, I will present three projects that combine AI and CA very differently and discuss the implications and possibilities for combined research programmes.
The first project uses a series of single case analyses to explore recordings in which an advanced conversational AI successfully makes appointments over the phone with a human call-taker. The second revisits debates on using automated speech recognition for CA transcription (Moore, 2015) in light of significant recent advances in AI-based speech-to-text, and includes a live demo of ‘Gailbot’, a Jeffersonian automated transcription system. The third project both uses and studies AI in an applied CA context. Using video analysis, it asks how a disabled man and his care worker interact while using AI-based voice interfaces and a co-designed ‘home automation’ system as part of a domestic routine of waking, eating, and personal care. Data are drawn from a corpus of around 500 hours of video recorded by the participants using a voice-controlled, AI-based ‘smart security camera’ system.
These three examples of CA’s potential interpretations and uses of AI’s ‘talking-and-doing’ machines provide material for a debate about how CA research programmes might conceptualize AI and use or combine it with CA in a mutually informative way.
Videos (in order of appearance)
The Senster. (2007, March 29). https://www.youtube.com/watch?v=wY85GrYGnyw
MIT AI Lab. (2011, September 25). https://www.youtube.com/watch?v=hp9NHNKTV-M
Keynote (Google I/O ’18). (2018, May 9). https://www.youtube.com/watch?v=ogfYd705cRs
Linguistic Data Consortium. (2013). CABank CallHome English Corpus [Data set]. TalkBank. https://ca.talkbank.org/access/CallHome/eng.html
Jefferson, G. (2007). CABank English Jefferson NB Corpus [Data set]. TalkBank. https://doi.org/10.21415/T58P4Z
References
Agre, P. (1997). Toward a critical technical practice: Lessons learned in trying to reform AI. In Social Science, Technical Systems and Cooperative Work: Beyond the Great Divide. Erlbaum.
Alač, M., Gluzman, Y., Aflatoun, T., Bari, A., Jing, B., & Mozqueda, G. (2020). How Everyday Interactions with Digital Voice Assistants Resist a Return to the Individual. Evental Aesthetics, 9(1), 51.
Berger, I., Viney, R., & Rae, J. P. (2016). Do continuing states of incipient talk exist? Journal of Pragmatics, 91, 29–44. https://doi.org/10.1016/j.pragma.2015.10.009
Bolden, G. B. (2015). Transcribing as Research: “Manual” Transcription and Conversation Analysis. Research on Language and Social Interaction, 48(3), 276–280. https://doi.org/10.1080/08351813.2015.1058603
Brooker, P., Dutton, W., & Mair, M. (2019). The new ghosts in the machine: “Pragmatist” AI and the conceptual perils of anthropomorphic description. Ethnographic Studies, 16, 272–298. https://doi.org/10.5281/zenodo.3459327
Button, G. (1990). Going Up a Blind Alley: Conflating Conversation Analysis and Computational Modelling. In P. Luff, N. Gilbert, & D. Frohlich (Eds.), Computers and Conversation (pp. 67–90). Academic Press. https://doi.org/10.1016/B978-0-08-050264-9.50009-9
Button, G., & Dourish, P. (1996). Technomethodology: Paradoxes and possibilities. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. http://dl.acm.org/citation.cfm?id=238394
Button, G., & Sharrock, W. (1996). Project work: The organisation of collaborative design and development in software engineering. Computer Supported Cooperative Work (CSCW), 5(4), 369–386. https://doi.org/10.1007/BF00136711
Casino, T., & Freenor, M. (2018). An introduction to Google Duplex and natural conversations. WillowTree. https://willowtreeapps.com/ideas/an-introduction-to-google-duplex-and-natural-conversations
Duca, D. (2019). Who’s disrupting transcription in academia? SAGE Ocean. https://ocean.sagepub.com/blog/whos-disrupting-transcription-in-academia
Fischer, J. E., Reeves, S., Porcheron, M., & Sikveland, R. O. (2019). Progressivity for voice interface design. Proceedings of the 1st International Conference on Conversational User Interfaces – CUI ’19, 1–8. https://doi.org/10.1145/3342775.3342788
Garfinkel, H. (1967). Studies in ethnomethodology. Prentice-Hall.
Goodwin, C. (1996). Transparent vision. In E. A. Schegloff & S. A. Thompson (Eds.), Interaction and Grammar (pp. 370–404). Cambridge University Press.
Heath, C., & Luff, P. (1992). Collaboration and control: Crisis management and multimedia technology in London Underground Line Control Rooms. Computer Supported Cooperative Work (CSCW), 1(1–2), 69–94.
Heritage, J. (1984). Garfinkel and ethnomethodology. Polity Press.
Heritage, J. (1988). Explanations as accounts: A conversation analytic perspective. In C. Antaki (Ed.), Analysing Everyday Explanation: A Casebook of Methods (pp. 127–144). Sage Publications.
Hoey, E. M. (2017). Lapse organization in interaction [PhD Thesis, Max Planck Institute for Psycholinguistics, Radboud University, Nijmegen]. http://bit.ly/hoey2017
Housley, W., Albert, S., & Stokoe, E. (2019). Natural Action Processing. In J. E. Fischer, S. Martindale, M. Porcheron, S. Reeves, & J. Spence (Eds.), Proceedings of the Halfway to the Future Symposium 2019 (pp. 1–4). Association for Computing Machinery. https://doi.org/10.1145/3363384.3363478
Kendrick, K. H. (2017). Using Conversation Analysis in the Lab. Research on Language and Social Interaction, 50(1), 1–11. https://doi.org/10.1080/08351813.2017.1267911
Lee, S.-H. (2006). Second summonings in Korean telephone conversation openings. Language in Society, 35(2). https://doi.org/10.1017/S0047404506060118
Leviathan, Y., & Matias, Y. (2018). Google Duplex: An AI System for Accomplishing Real-World Tasks Over the Phone [Blog]. Google AI Blog. http://ai.googleblog.com/2018/05/duplex-ai-system-for-natural-conversation.html
Local, J., & Walker, G. (2005). Methodological Imperatives for Investigating the Phonetic Organization and Phonological Structures of Spontaneous Speech. Phonetica, 62(2–4), 120–130. https://doi.org/10.1159/000090093
Luff, P., Gilbert, N., & Frohlich, D. (Eds.). (1990). Computers and Conversation. Academic Press.
Moore, R. J. (2015). Automated Transcription and Conversation Analysis. Research on Language and Social Interaction, 48(3), 253–270. https://doi.org/10.1080/08351813.2015.1058600
Ogden, R. (2015). Data Always Invite Us to Listen Again: Arguments for Mixing Our Methods. Research on Language and Social Interaction, 48(3), 271–275. https://doi.org/10.1080/08351813.2015.1058601
O’Leary, D. E. (2019). Google’s Duplex: Pretending to be human. Intelligent Systems in Accounting, Finance and Management, 26(1), 46–53. https://doi.org/10.1002/isaf.1443
Pelikan, H. R. M., & Broth, M. (2016). Why That Nao? Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems – CHI ’16. https://doi.org/10.1145/2858036.2858478
Pelikan, H. R. M., Broth, M., & Keevallik, L. (2020). “Are You Sad, Cozmo?”: How Humans Make Sense of a Home Robot’s Emotion Displays. Proceedings of the 2020 ACM/IEEE International Conference on Human-Robot Interaction, 461–470. https://doi.org/10.1145/3319502.3374814
Porcheron, M., Fischer, J. E., Reeves, S., & Sharples, S. (2018). Voice Interfaces in Everyday Life. Proceedings of the 2018 ACM Conference on Human Factors in Computing Systems (CHI’18).
Reeves, S. (2017). Some conversational challenges of talking with machines. Talking with Conversational Agents in Collaborative Action, Workshop at the 20th ACM Conference on Computer-Supported Cooperative Work and Social Computing. http://eprints.nottingham.ac.uk/40510/
Relieu, M., Sahin, M., & Francillon, A. (2019). Lenny the bot as a resource for sequential analysis: Exploring the treatment of Next Turn Repair Initiation in the beginnings of unsolicited calls. https://doi.org/10.18420/muc2019-ws-645
Robles, J. S., DiDomenico, S., & Raclaw, J. (2018). Doing being an ordinary technology and social media user. Language & Communication, 60, 150–167. https://doi.org/10.1016/j.langcom.2018.03.002
Sacks, H. (1963). Sociological description. Berkeley Journal of Sociology, 8, 1–16.
Sacks, H. (1984). On doing “being ordinary.” In J. M. Atkinson & J. Heritage (Eds.), Structures of social action: Studies in conversation analysis (pp. 413–429). Cambridge University Press.
Sacks, H. (1987). On the preferences for agreement and contiguity in sequences in conversation. In G. Button & J. R. Lee (Eds.), Talk and social organization (pp. 54–69). Multilingual Matters.
Sacks, H. (1995). Lectures on conversation: Vol. II (G. Jefferson, Ed.). Wiley-Blackwell.
Sacks, H., Schegloff, E. A., & Jefferson, G. (1974). A simplest systematics for the organization of turn-taking for conversation. Language, 50(4), 696–735. https://doi.org/10.2307/412243
Sahin, M., Relieu, M., & Francillon, A. (2017). Using chatbots against voice spam: Analyzing Lenny’s effectiveness. Proceedings of the Thirteenth Symposium on Usable Privacy and Security, 319–337.
Schegloff, E. A. (1988). On an Actual Virtual Servo-Mechanism for Guessing Bad News: A Single Case Conjecture. Social Problems, 35(4), 442–457. https://doi.org/10.2307/800596
Schegloff, E. A. (1993). Reflections on Quantification in the Study of Conversation. Research on Language & Social Interaction, 26(1), 99–128. https://doi.org/10.1207/s15327973rlsi2601_5
Schegloff, E. A. (2004). Answering the Phone. In G. H. Lerner (Ed.), Conversation Analysis: Studies from the First Generation (pp. 63–109). John Benjamins Publishing Company.
Schegloff, E. A. (2010). Some Other “Uh(m)s.” Discourse Processes, 47(2), 130–174. https://doi.org/10.1080/01638530903223380
Soltau, H., Saon, G., & Kingsbury, B. (2010). The IBM Attila speech recognition toolkit. 2010 IEEE Spoken Language Technology Workshop, 97–102. https://doi.org/10.1109/SLT.2010.5700829
Stivers, T. (2015). Coding Social Interaction: A Heretical Approach in Conversation Analysis? Research on Language and Social Interaction, 48(1), 1–19. https://doi.org/10.1080/08351813.2015.993837
Stokoe, E. (2011). Simulated Interaction and Communication Skills Training: The ‘Conversation-Analytic Role-Play Method’. In Applied Conversation Analysis (pp. 119–139). Palgrave Macmillan UK. https://doi.org/10.1057/9780230316874_7
Stokoe, E. (2013). The (In)Authenticity of Simulated Talk: Comparing Role-Played and Actual Interaction and the Implications for Communication Training. Research on Language & Social Interaction, 46(2), 165–185. https://doi.org/10.1080/08351813.2013.780341
Stokoe, E. (2014). The Conversation Analytic Role-play Method (CARM): A Method for Training Communication Skills as an Alternative to Simulated Role-play. Research on Language and Social Interaction, 47(3), 255–265. https://doi.org/10.1080/08351813.2014.925663
Stokoe, E., Sikveland, R. O., Albert, S., Hamann, M., & Housley, W. (2020). Can humans simulate talking like other humans? Comparing simulated clients to real customers in service inquiries. Discourse Studies, 22(1), 87–109. https://doi.org/10.1177/1461445619887537
Turing, A. (1950). Computing machinery and intelligence. Mind, 59(236), 433–460.
Walker, G. (2017). Pitch and the Projection of More Talk. Research on Language and Social Interaction, 50(2), 206–225. https://doi.org/10.1080/08351813.2017.1301310
Wong, J. C. (2019, May 29). “A white-collar sweatshop”: Google Assistant contractors allege wage theft. The Guardian. https://www.theguardian.com/technology/2019/may/28/a-white-collar-sweatshop-google-assistant-contractors-allege-wage-theft