Methodology

Getting a backchannel in wordwise: using “big data” with CA

Here’s the abstract to an ICCA 2018 paper I’m working on with J.P. de Ruiter at the Human Interaction Lab at Tufts. The goal is to use computational linguistic methods (that often use the term ‘backchannel’) to see if all these responsive particles really belong in one big undifferentiated ‘bucket’.

Many studies of dialogue use the catch-all term ‘backchannel’ (Yngve, 1970) to refer to a wide range of utterances and behaviors as forms of listener-feedback in interaction. The use of this wide category ignores nearly half a century of research into the highly differentiated interactional functions of ‘continuers’ such as ‘uh huh’ or ‘wow’ (Schegloff, 1982; Goodwin, 1986), acknowledgement tokens such as ‘yeah’, ‘right’ or ‘okay’ (Jefferson, 1984; Beach, 1993) and change-of-state markers such as ‘oh’ or ‘nå’ (Heritage, 1984; Heinemann, 2017). These studies show how participants use responsive particles as fully-fledged, individuated, and distinctive words that do not belong in an undifferentiated functional class of ‘backchannels’ (Sorjonen, 2001). For this paper we use the Conversation Analytic British National Corpus (CABNC) (Albert, L. de Ruiter & J. P. de Ruiter, 2015) – a 4.2M word corpus featuring audio recordings of interaction from a wide variety of everyday settings that facilitates ‘crowdsourced’ incremental improvements and multi-annotator coding. We use Bayesian model comparison to evaluate the relative predictive performance of two competing models. In the first of these, all ‘backchannels’ imply the same amount of floor-yielding, while the second, CA-informed model assumes that different response tokens are more or less effective in ushering extended turns or sequences to a close. We argue that using large corpora together with statistical models can also identify candidate ‘deviant cases’, providing new angles and opportunities for ongoing detailed, inductive conversation analysis. We discuss the methodological implications of using “big data” with CA, and suggest key guidelines and common pitfalls for researchers using large corpora and statistical methods at the interface between CA and cognitive psychology (De Ruiter & Albert, 2017).

References (including references for the final talk – which has many more references than this abstract).

  • Albert, S., De Ruiter, L., & De Ruiter, J. P. (2015). The CABNC. Retrieved 9/09/2017 from https://saulalbert.github.io/CABNC/
  • Albert, S., & De Ruiter, J. P. (2018, in press). Ecological grounding in interaction research. Collabra: Psychology.
  • Beach, W. A. (1990). Searching for universal features of conversation. Research on Language & Social Interaction, 24(1–4), 351–368.
  • Bolden, G. B. (2015). Transcribing as Research: ‘Manual’ Transcription and Conversation Analysis. Research on Language and Social Interaction, 48(3), 276–280. https://doi.org/10.1080/08351813.2015.1058603
  • de Ruiter, J. P., & Albert, S. (2017). An Appeal for a Methodological Fusion of Conversation Analysis and Experimental Psychology. Research on Language and Social Interaction, 50(1), 90–107. https://doi.org/10.1080/08351813.2017.1262050
  • Goodwin, C. (1986). Between and within: Alternative sequential treatments of continuers and assessments. Human Studies, 9(2), 205–217. https://doi.org/10.1007/BF00148127
  • Greiffenhagen, C., Mair, M., & Sharrock, W. (2011). From Methodology to Methodography: A Study of Qualitative and Quantitative Reasoning in Practice. Methodological Innovations Online, 6(3), 93–107. https://doi.org/10.4256/mio.2011.009
  • Hayashi, M., & Yoon, K. (2009). Negotiating boundaries in talk. Conversation Analysis: Comparative Perspectives, 27, 250.
  • Hepburn, A., & Bolden, G. B. (2017). Transcribing for social research. London: Sage.
  • Heritage, J. (1984). A change-of-state token and aspects of its sequential placement. In M. Atkinson & J. Heritage (Eds.), Structures of social action: Studies in conversation analysis (pp. 299–345). Cambridge: Cambridge University Press.
  • Heritage, J. (1998). Oh-prefaced responses to inquiry. Language in Society, 27(3), 291–334. https://doi.org/10.1017/S0047404500019990
  • Heritage, J. (2002). Oh-prefaced responses to assessments: A method of modifying agreement/disagreement. In C. E. Ford, B. A. Fox, & S. A. Thompson (Eds.), The Language of Turn and Sequence (pp. 1–28). New York: Oxford University Press.
  • Hoey, E. M., & Kendrick, K. H. (2017). Conversation Analysis. In A. M. B. de Groot & P. Hagoort (Eds.), Research Methods in Psycholinguistics: A Practical Guide (pp. 151–173). Hoboken, NJ: Wiley-Blackwell.
  • Housley, W., Procter, R., Edwards, A., Burnap, P., Williams, M., Sloan, L., … Greenhill, A. (2014). Big and broad social data and the sociological imagination: A collaborative response. Big Data & Society, 1(2). https://doi.org/10.1177/2053951714545135
  • Jefferson, G. (1981). On the Articulation of Topic in Conversation. Final Report. London: Social Science Research Council.
  • Jefferson, G. (1984). Notes on a systematic Deployment of the Acknowledgement tokens ’Yeah’ and ’Mmhm’. Papers in Linguistics, 17(2), 197–216. https://doi.org/10.1080/08351818409389201
  • Kendrick, K. H. (2017). Using Conversation Analysis in the Lab. Research on Language and Social Interaction, 1–11. https://doi.org/10.1080/08351813.2017.1267911
  • MacWhinney, B. (1992). The CHILDES project: Tools for analyzing talk. Child Language Teaching and Therapy, (2000).
  • Nishizaka, A. (2015). Facts and Normative Connections: Two Different Worldviews. Research on Language and Social Interaction, 48(1), 26–31. https://doi.org/10.1080/08351813.2015.993840
  • Nosek, B. A., Ebersole, C. R., DeHaven, A. C., & Mellor, D. T. (2018). The preregistration revolution. Proceedings of the National Academy of Sciences, 115(11), 2600–2606. https://doi.org/10.1073/pnas.1708274114
  • Ochs, E. (1979). Transcription as theory. In E. Ochs & B. B. Schieffelin (Eds.), Developmental pragmatics (pp. 43–72). New York: Academic Press.
  • Potter, J., & te Molder, H. (2005). Talking cognition: Mapping and making the terrain. In J. Potter & D. Edwards (Eds.), Conversation and cognition (pp. 1–54).
  • Sacks, H. (1963). Sociological description. Berkeley Journal of Sociology, 1–16.
  • Schegloff, E. A. (1982). Discourse as an interactional achievement: Some uses of ‘uh huh’ and other things that come between sentences. In D. Tannen (Ed.), Analyzing discourse: Text and talk (pp. 71–93). Georgetown University Press.
  • Schegloff, E. A. (2007). Sequence organization in interaction: Volume 1: A primer in conversation analysis. Cambridge: Cambridge University Press.
  • Steensig, J., & Heinemann, T. (2015). Opening Up Codings? Research on Language and Social Interaction, 48(1), 20–25. https://doi.org/10.1080/08351813.2015.993838
  • Stivers, T. (2015). Coding Social Interaction: A Heretical Approach in Conversation Analysis? Research on Language and Social Interaction, 48(1), 1–19. https://doi.org/10.1080/08351813.2015.993837
  • Rühlemann, C. (2017). Integrating Corpus-Linguistic and Conversation-Analytic Transcription in XML: The Case of Backchannels and Overlap in Storytelling Interaction. Corpus Pragmatics, 1(3), 201–232.
  • Rühlemann, C., & Gee, M. (2018). Conversation Analysis and the XML method. Gesprächsforschung–Online-Zeitschrift Zur Verbalen Interaktion, 18.
  • Wittenburg, P., Brugman, H., Russel, A., Klassmann, A., & Sloetjes, H. (2006). ELAN: a professional framework for multimodality research. In 5th International Conference on Language Resources and Evaluation (LREC 2006) (pp. 1556–1559).
  • Yngve, V. (1970). On getting a word in edgewise. Chicago Linguistics Society, 6th Meeting, 566–579. Retrieved from http://ci.nii.ac.jp/naid/10009705656/

Sustainable Research Literature Management with Docear part II

docear.png

This is part II of a two-part post in which I will walk you through some key parts of a technically-savvy user’s long-term literature review and maintenance strategy.

In part I you learned to

  • take notes while reading that won’t get lost or damaged by your software,
  • organise notes and annotations so you won’t forget why you took them.

In this section, you will learn how to:

  • maintain associated bibliographical records,
  • use Docear in a way that will keep your literature reviewing current for years to come.


First, Import your annotations into Docear to manage them

So far, this guide has given you some pretty generic advice about note taking that you could use in any piece of software. The Docear-specific pay-off for this process comes when you import your PDFs into Docear: you can use Docear’s internal scripting language (well, Freeplane’s version of Groovy) to format, re-organise and label your new annotations automatically. I have some complicated scripts that I won’t cover here, but here’s a very simple one I use to automatically apply visual labels to my annotations.

Docear offers a number of visual labels you can use to decorate the nodes in your maps, to make them visually appealing and easily distinguishable:

icons.png

I have written a script that looks for all annotations beginning with ‘idea’, or ‘ref’ or ‘term’, and allocates them one of a number of pre-set visual labels provided by Docear.

// @ExecutionModes({ON_SELECTED_NODE, ON_SELECTED_NODE_RECURSIVELY})
// Map each annotation prefix to the name of one of the built-in icons.
def iconsByPrefix = [
      "todo"    : "checked",
      "idea"    : "idea",
      "ref"     : "attach",
      "question": "help",
      "q:"      : "help",
      "quote"   : "bookmark",
      "note"    : "edit",
      "term"    : "desktop_new",
      "crit"    : "pencil"
]
// Compare in lower case so that 'Todo' and 'TODO' are labelled too.
def text = node.text.toLowerCase()
// Use the first matching prefix only, exactly as an if/else chain would.
def match = iconsByPrefix.find { prefix, icon -> text.startsWith(prefix) }
if (match) {
      node.getIcons().addIcon(match.value)
}

To install this script, I wrote this code to a file called addiconNodes.groovy, which I then put in my /home/saul/.docear/scripts directory (NB: the location of this directory may vary on Mac/Win). Docear also has a built-in script editor you can use to write Groovy scripts. The script then becomes available as a contextual menu item.

Here are some illustrations showing the script being run on a newly imported paper:
before.png

Selecting the script option in Docear:
during.png

What it looks like when the script has finished running:
after.png

And how you might then choose to organise your annotations:
organised.png

You’ll find lots of Freeplane scripts you can modify and play with here – there are some amazing possibilities – the icons script above doesn’t even begin to scratch the surface of what this method could do for your literature reviewing process.

Using Docear to manage thousands of papers across multiple projects and multiple years

Docear’s demo shows someone writing a paper with about 20 or 30 references. This is fine for one project, but I have over 3000 PDF books and papers in my literature repository. Over the years, I suspect this will continue to grow. I want to feel secure that my library of research papers, annotations and references is in one safe location on my hard drive. I also don’t want to have to duplicate those PDFs each time I start a new project. Here are some of my solutions to these issues geared towards a long-term research strategy.

First: a geeky caveat

Having recommended Docear for the approach outlined so far, there are some problems with Docear that I think you will have to address if you are really going to use it for a long-term research and literature management strategy.

If you think you might just use it for a masters-level one year project, the rest of this guide probably isn’t necessary. If you want to read your way into and stay up to date with the vast literature of one or more academic fields long-term, read on, but be warned: it gets even more geeky from here on in.

Docear’s default per-project folder structure and its problems

At the moment Docear encourages you to store your PDFs on a per-project basis, as if you were starting from literature year 0 each time you write something. Also (by default, at least) it puts them in a rather obscure folder structure. I don’t really trust myself to reliably back up obscure folder structures.

Here’s the default file structure Docear sets up for a demo project I just created:

/Home
    /Docear
            /projects
                /Docear demo
                    /_data
                        /!!!info.txt
                        /1493C9745013F2UNMIV1XCU93LBPZ4Y0KU14
                            /default_files
                                /Docear demo.bib
                                /literature_and_annotations.mm
                                /temp.mm
                                /trash.mm
                            /My Drafts
                                /My New Paper.mm
                            /settings.xml
                    /literature_repository
                        /where_I_am_expected_to_put_my_pdfs.pdf
                        /Example PDFs
                            /Docears_sample_PDFs.pdf
                    /Project Data.mm

The idea from Docear’s developers here is that you get a few files by default when you start a new project, including a dummy ‘My New Paper.mm’ (.mm stands for Mind Map), a project-name.bib file, a literature_and_annotations.mm file, and a folder to hold all the PDFs you’ve associated with this project.

The literature_and_annotations.mm file contains a script that – when you open it – will scan through this project-specific literature_repository and check for updated files or new annotations.

This creates several problems:

  1. Docear’s structure works fine for per-project use, but I have 9GB of PDFs, and I do not want to wait for Docear to scan through those and check for updates every time I start it up.
  2. I’d rather not store all my precious PDFs, with thousands of hours’ worth of annotations, 5 levels down an application-specific folder hierarchy that I may or may not remember to transfer to a new machine. Similarly, I don’t want my references, which may have taken a long time to assemble, stored in a folder handily called ‘1493C9745013F2UNMIV1XCU93LBPZ4Y0KU14’.
  3. I want to be able to use PDFs and bibliographic data from all my previous projects in new projects easily.

Work-around 1: consolidating your literature archive

Use a ‘main_literature_repository’ for your key files

  • I create a ‘default’ project into which I first import all my PDFs and references.
  • I set up this project to store PDFs in a folder in my Dropbox called main_literature_repository.
  • I put the main BibTeX file for this project in the same folder.

This means I have one canonical BibTeX file with all the books and papers I will ever import in my main_literature_repository folder – this makes it easy to back up. I use Dropbox to keep rough versioning for me in case I do something silly or lose my machine/s – use your backup strategy and folder location of choice.

Use a per-paper mind map for long-term annotation and re-annotation

I delete the literature_and_annotations.mm file from my default project. I do not want to wait 3 hours while Docear scans through all 9GB of my papers when it starts up. Instead, in the same main_literature_repository folder, I create a per-paper mind map.

I do this because I may use a paper six times in six different paper/research contexts. Ideally, I want to be able to read and re-read it, and keep track of what interested me about it and when… not have to delve into each project I used it for.

So, once I’ve done my annotations, I create a new map in my default project, I import the PDF, copy and paste the title of the PDF and use it to name the new mind map identically to the paper, so that in my main_literature_repository folder I have:

/home
    /saul
        /main_literature_repository
            /my_new_favourite_paper.pdf
            /my_new_favourite_paper.mm

Apart from anything else, I can glance through the folder listed alphabetically and see which papers I have actually read! Now every time I update my annotations in that paper, I can import them into this paper-specific map, and organise them.
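The ‘glance through the folder’ check can also be automated. Here is a minimal Python sketch, assuming the naming convention above (a .mm file with the same base name as its PDF, in the same folder). The function name reading_status is made up for illustration and is not part of Docear.

```python
from pathlib import Path


def reading_status(repository):
    """Return (read, unread) lists of PDF base names in a repository folder."""
    repo = Path(repository)
    read, unread = [], []
    for pdf in sorted(repo.glob("*.pdf")):
        # Assumption: a paper counts as "read" when a mind map with the
        # same base name sits next to it.
        if pdf.with_suffix(".mm").exists():
            read.append(pdf.stem)
        else:
            unread.append(pdf.stem)
    return read, unread


if __name__ == "__main__":
    # Hypothetical location; point this at your own repository folder.
    read, unread = reading_status(Path.home() / "main_literature_repository")
    print(f"{len(read)} papers with maps, {len(unread)} without")
```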

I might want to do this in a number of ways (one big list, thematically etc.) but I usually organise them to show how they relate to the project I’m currently working on. If I read that paper three or four times, and each time I organise the new annotations in this way, after six or seven readings/uses of that paper it’s going to be interesting to be able to see how my use of this paper has changed over time.

A quick example of importing a paper:

I download a paper helpfully entitled ‘12312131231512312313.pdf’ from a publisher. I re-name it something useful (e.g. AuthornameYYYY-title.pdf; see note 1) and put it into my main_literature_repository folder, using Docear or JabRef to add or automatically import its bibliographical record into my default project BibTeX file. Once I’ve read it and taken notes in annotations, I manually create a new map just for this paper in the same main_literature_repository folder. This is now the mother lode folder with all my really important work in it.
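If you would rather script the renaming step, here is a minimal Python sketch of the AuthornameYYYY-title.pdf convention. The function names, the four-word title limit and the use of hyphens between title words are assumptions for illustration. This is not how JabRef or Docear build file names.

```python
import re
from pathlib import Path


def tidy_filename(author, year, title, max_words=4):
    """Build a name of the form AuthornameYYYY-first-few-title-words.pdf."""
    # Keep only letters and digits so the name is safe on any file system.
    surname = "".join(ch for ch in author if ch.isalnum())
    words = re.findall(r"[^\W_]+", title.lower())[:max_words]
    return f"{surname}{year}-{'-'.join(words)}.pdf"


def rename_download(path, author, year, title):
    """Rename a downloaded PDF in place and return its new path."""
    source = Path(path)
    target = source.with_name(tidy_filename(author, year, title))
    if target.exists():
        # Refuse to overwrite a paper that is already in the folder.
        raise FileExistsError(target)
    source.rename(target)
    return target
```

For example, tidy_filename("Hepburn", 2012, "Repairing self- and recipient reference") gives "Hepburn2012-repairing-self-and-recipient.pdf".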

Work-around 2: using your papers across multiple projects

This bit is kind of tricky and involves many trade-offs. I think Docear will fix it some day, but for now this is how I am doing it.

I create a new project, and treat it as a ‘sub-project’ of my default project

When I start a new Docear project, I let Docear create a default folder structure something like the one above. To get literature from my main_literature_repository folder into this new project, I create symbolic file system links to the relevant PDF files in my main_literature_repository in the new sub-project-specific literature_repository folder.

So the original PDFs live here:

/home
    /saul
        /main_literature_repository
            /my_new_favourite_paper1.pdf
            /my_new_favourite_paper2.pdf
            /my_new_favourite_paper3.pdf
            /my_new_favourite_paper4.pdf

I create a new Docear project and create symlinks (only for the relevant PDFs) here:

/Home
    /Docear
            /projects
                /Docear demo
                    /_data
                        /!!!info.txt
                        /1493C9745013F2UNMIV1XCU93LBPZ4Y0KU14
                            /default_files
                                /Docear demo.bib
                                /literature_and_annotations.mm
                                /temp.mm
                                /trash.mm
                            /My Drafts
                                /My New Paper.mm
                            /settings.xml
                    /literature_repository
                        /link_to_my_new_favourite_paper1.pdf
                        /link_to_my_new_favourite_paper2.pdf
                        /link_to_my_new_favourite_paper3.pdf
                        /link_to_my_new_favourite_paper4.pdf
                        /Example PDFs
                            /Docears_default_sample.pdf
                    /Project Data.mm

Now if I open my literature_and_annotations.mm file in the new sub-project, it will import these new PDFs and their associated annotations and I can start working with them. Of course any changes to annotations I make in these maps will also change annotations in the original PDFs.
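Creating the links by hand gets tedious for more than a few papers. Here is a minimal Python sketch of the linking step, assuming a Linux or Mac file system (symlinks on Windows may need extra privileges). The function name link_papers is hypothetical, and each link keeps the file name of the original PDF.

```python
import os
from pathlib import Path


def link_papers(main_repository, project_repository, filenames):
    """Symlink the named PDFs from the main repository into a project folder."""
    main_repo = Path(main_repository).resolve()
    project_repo = Path(project_repository)
    project_repo.mkdir(parents=True, exist_ok=True)
    links = []
    for name in filenames:
        source = main_repo / name
        link = project_repo / name
        if not source.is_file():
            raise FileNotFoundError(source)
        # Skip links that already exist so the function can be re-run safely.
        if not link.is_symlink() and not link.exists():
            os.symlink(source, link)
        links.append(link)
    return links
```

Because the project folder only holds links, annotating a PDF through the link still changes the single original file in the main repository.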

Maintaining bibliographical references across Docear projects (and other software)

The only issue with this approach so far is that your new project-specific BibTeX file will not automatically import metadata from your ‘default’ project. This means that when you import your PDFs into this new project’s literature_and_annotations.mm map, they will have no bibliographical reference data attached.

To understand why – and how to solve it – you need to know a little bit more about how Docear works:

Docear allows you to re-organise your annotations while maintaining their associations with the bibliographical reference of the paper they’re drawn from by linking nodes in your Mind Map (.mm) files to PDFs referenced in your BibTeX file. Docear does this by adding a ‘file’ BibTeX field entry for each paper. Here’s an example of a BibTeX entry from my database:

@ARTICLE{Hepburn2012,
  author = {Alexa Hepburn and Sue Wilkinson and Rebecca Shaw},
  title = {Repairing self- and recipient reference},
  journal = {Research on Language and Social Interaction},
  year = {2012},
  volume = {45},
  pages = {175-190},
  number = {2},
  file = {:/home/saul/main_literature_repository/hepburn_repairingselfand_2012.pdf:PDF},
  keywords = {EMCA, Self-reference, Reference ; Repair},
}

So when Docear scans this PDF, it extracts its annotations, places them in the map, and creates a hyperlink to the PDF listed in the file field. This means if I click on the node, it opens the file. Docear also extracts the bibliographical information from this BibTeX reference, and then adds it as attributes of the associated annotation node on my map.
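To see which PDFs a BibTeX file points at, you can pull the paths out of the ‘file’ fields yourself. This is a minimal Python sketch, assuming the ‘:path:TYPE’ layout shown in the entry above and no braces inside the field value. It is a plain regular-expression scan rather than a full BibTeX parser, and the function name linked_files is made up for illustration.

```python
import re

# Matches: file = {...}  (assumes the value itself contains no braces)
FILE_FIELD = re.compile(r"\bfile\s*=\s*\{([^}]*)\}", re.IGNORECASE)


def linked_files(bibtex_text):
    """Return the file paths named in the 'file' fields of a BibTeX string."""
    paths = []
    for value in FILE_FIELD.findall(bibtex_text):
        # A field may list several files separated by ';',
        # each written as 'description:path:type'.
        for item in value.split(";"):
            parts = item.split(":")
            if len(parts) >= 3:
                paths.append(":".join(parts[1:-1]))
    return paths
```

Checking each returned path with os.path.exists is then a quick way to spot entries whose PDF has been moved or renamed.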

So, when I create a symbolic link to this file in my new sub-project, Docear sees it as a new PDF, namely:

/home/saul/Docear/projects/sub-project-title/literature_repository/hepburn_repairingselfand_2012.pdf

But it doesn’t have any BibTeX data in this new project, so it won’t recognise this PDF and paste in associated bibliographical data.

To solve this issue, there are several possible solutions:

  • Symlink your ‘default project’ BibTeX file into each new sub-project, just like you do with your PDF files.
  • Duplicate your ‘default project’ BibTeX file into each new sub-project, search/replacing the ‘file’ field of each entry to point to your new sub-project’s literature_repository folder.
  • Or (and this is what I do), open your main BibTeX file in a recent, stand-alone version of JabRef and use the ‘write XMP data’ option to make sure that the PDFs themselves contain their own reference data. When you import these PDFs, you can then use the reference embedded in the PDF itself to create a new and separate project-specific BibTeX file.
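The second option (duplicating the BibTeX file and search/replacing the ‘file’ fields) can be scripted. Here is a minimal Python sketch, assuming every ‘file’ field uses the same absolute folder in the ‘:path:TYPE’ layout. The function name retarget_bibtex and all paths are hypothetical.

```python
from pathlib import Path


def retarget_bibtex(source_bib, target_bib, old_folder, new_folder):
    """Copy a BibTeX file, pointing its 'file' fields at a different folder.

    Returns the number of paths that were rewritten.
    """
    # The leading ':' ties the match to the start of a path in a 'file' field.
    old = ":" + str(old_folder).rstrip("/") + "/"
    new = ":" + str(new_folder).rstrip("/") + "/"
    text = Path(source_bib).read_text(encoding="utf-8")
    Path(target_bib).write_text(text.replace(old, new), encoding="utf-8")
    return text.count(old)
```

The original file is left untouched, so the main BibTeX file stays the canonical copy.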

JabRef’s XMP writing option:

jabref_xmpp.png

This third option is preferable to me for several reasons:

  1. I don’t want to see all my references in every new project – it’s distracting.
  2. XMP data can be read by lots of other bits of software so it makes my reference library somewhat more portable. Also, if I lose my BibTeX file in some catastrophic data loss episode, as long as I have my PDFs with XMP bibliographic data I can pretty much reconstruct my literature, annotation and reference archive from just those files.
  3. I may want to update my bibliographical records for a new project, but keep the references of older projects intact. Although I’m aware that I improve my bibliographies continually and incrementally, I really want to control how I change them. For example, if I continually use my default project BibTeX file, symlinked in to each new sub-project as in option 1, I may not be able to re-generate a paper I wrote three years ago before I made those changes and improvements. I really want that paper to be re-created exactly as it was when I wrote it, including all the reference details and errors. I can always update an old BibTeX file from an old project easily – because the PDF file itself now contains the latest up-to-date XMP data.

I see this feature of the latest, stand-alone version of JabRef (not available in Docear’s embedded version of JabRef) as a significant plus in terms of the sustainability of this approach to literature management.

Things I didn’t cover but may post about in the future

There are lots of other things you can do using this approach to Docear, and with Docear in general. A few I can think of that I didn’t cover are:

  • Using the command line to search/filter your annotations.
  • Using recoll, spotlight or similar configurable full text search systems on your repository.
  • Importing folder structures containing other research materials into your map.
  • Using Docear (or Freeplane) to take detailed and well structured notes during lectures.
  • Using Docear to manage and search Jeffersonian transcripts of conversational data.

If you have any questions or would like to hear about these, drop me an email or get me on @saul

Notes

  1. Because I like small tools for simple jobs, I actually do this using JabRef in stand-alone mode, with JabRef’s rename files plugin handling the renaming automatically and configurably. NB: Docear will do this automatically in upcoming versions – the feature is already in there, just not quite ready yet.

Sustainable Research Literature Management with Docear part I

docear.png

This is part I of a two-part post in which I will walk you through some key parts of a technically-savvy user’s long-term literature review and maintenance strategy. In part I you will learn how to use Docear to:

  • take notes while reading that won’t get lost or damaged by your software,
  • organise notes and annotations so you won’t forget why you took them.

In part II, should you choose to get geeky and read that bit too, you will learn to:

  • maintain associated bibliographical records,
  • use Docear in a way that will keep your literature reviewing current for years to come.


Introduction

What this guide is for

There are many software systems that purport to be helpful in managing academic literature, and everyone swears by their own. My belief about software is that it’s usually a nightmare, and your choice should be driven by considerations of damage limitation. With that in mind, I am using Docear to limit the damage that software can do to my literature reviewing and thesis preparation process.

This guide will outline some ways to use this software with long-term sustainability in mind. If you don’t know what Docear is, you could spend 6 minutes watching this video.

If you’re starting a PhD or a research process, and thinking about how to keep up with the literature long-term, you might want to think about using Docear in the ways described here. To get started with that, first download and install Docear, read Docear’s own very good user guide to understand the basics, then come back and read this [1].

Why Docear works for a long-term research strategy

There are lots of good reasons listed on the Docear website that compare Docear’s features to Zotero, Mendeley or other reference management systems.

My choices are driven by issues of long-term software sustainability, and focus on cross-compatibility, reliability and stability. Docear fits my criteria because:

  • It’s Open Source software using well adopted, documented and supported file formats.
  • Docear’s plain text-based file formats are searchable and editable.
  • Text-based files enable version control and collaboration (including with your future self).
  • Docear, JabRef and Freeplane all work together or separately on most platforms.

In general, Docear conforms to the tenets of the Unix philosophy, i.e. Docear is designed to be modular, clear, simple, transparent, robust, and extensible for users and developers.

What all this means for academics is that:

  • You are probably always going to be able to edit and view these files on any platform.
  • If you just want to change a bibliographic reference, you can just use the bibliography manager (or a text editor) to do it on any computing platform without even firing up Docear.
  • If you just want to view your Docear file on Android, iOS, or using any mind-map viewer, you can open it (albeit with limited features) in Freeplane, FreeMind, Xmind or the many associated pieces of software that can read these files.
  • If you want to search your entire archive of papers, you can do it using grep on a command line or with any text-search and indexing system that can read your file system (I use Recoll).
  • It doesn’t mess with your files or do complex or potentially destructive things, use fancy databases etc. You can move away from Docear at any time – you’ll still have your annotations, your PDFs, your BibTeX reference files.
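Because the maps are plain text, a few lines of code can search them too. Here is a minimal Python sketch, assuming that .mm files are XML documents whose nodes keep their text in a TEXT attribute. Nodes that store formatted text in child elements are not searched, and the function name search_maps is made up for illustration.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def search_maps(repository, term):
    """Yield (map file name, node text) for every node containing the term."""
    needle = term.lower()
    for map_file in sorted(Path(repository).rglob("*.mm")):
        try:
            root = ET.parse(map_file).getroot()
        except ET.ParseError:
            continue  # Skip maps that are not well-formed XML.
        for node in root.iter("node"):
            text = node.get("TEXT", "")
            if needle in text.lower():
                yield map_file.name, text


if __name__ == "__main__":
    # Hypothetical folder and search term.
    for name, text in search_maps("main_literature_repository", "todo"):
        print(f"{name}: {text}")
```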

No vendor lock-in, no dodgy or dangerous games with your data. That’s a lot of damage-limitation right there, and this isn’t even mentioning a compelling and unusual combination of features that Docear itself documents very well – so I won’t go over those, but nonetheless, here is my list of:

Killer features of Docear

  • Import annotations from PDFs, and cross-sync them (change the annotation in your PDF – it gets changed in Docear, change it in Docear, it gets synced in your PDF).
  • Organise your annotations in multiple ways
    • Organise your annotations visually by research theme / category / heading
    • Organise your annotations visually by paper / book / author
    • Mix these up, copy and paste annotations multiple times, make further notes on annotations etc.
  • Import file/folder structures from your hard disk, so you can get an overview of your data, files and research materials alongside your literature, and make notes and connections between them.
  • Maintain the bibliographical associations of your annotations and notes, even after copy/pasting/reorganising them.

Just to re-state this: I’m not going to go through these basics in this how-to, so if you want to learn to use Docear from scratch you really should read the manual. What follows are some adaptations I’ve made to the Docear workflow that I think make it even more useful as a secure and long-term bet for research literature management.

How to take notes that won’t get lost or corrupted

PDFs, however flawed as a document format, are a de facto standard in academia and aren’t going away soon. You can read, edit and share them relatively easily on all devices and platforms, so that’s probably how you should store your annotations and bibliographical data.

General annotation strategy

Many pieces of literature review / bibliography management / annotation software keep notes and bibliographical records scattered about in proprietary databases or separate annotation files. So, following Docear’s excellent advice on the issue, I use ezPDF Reader on Android and PDF-XChange Viewer on Linux (via Wine) to make my annotations in the PDFs themselves.

Docear allows you to manage these annotations effectively without sacrificing the simplicity and security of having it all in one, cross-platform, easily accessible file.

Synchronisation and backup across clients/computers

The benefits of this are clear: you can easily back up your PDFs.

I use Dropsync to synchronise my main_literature_repository folder with a folder on my Android tablet, so when I’m on the go I can take notes and have them appear automatically in my literature review mind map when I start up Docear.

I tried using Dropbox’s own Android client, but I found that it would sync too frequently and sometimes randomly delete its temporary files. For this reason I recommend syncing your entire PDF repository to your mobile devices, editing the PDF locally (on the Android device’s file system), then synchronising with Dropbox or whatever local/cloud/repo/backup service you prefer.

How to remember why you took your notes in the first place

Use action-related tags for each annotation

I have most of my research ideas while reading, but they’re not all just ‘notes’: they differ according to what I plan to do with each idea. So I find it useful to distinguish between the kinds of notes I take on documents. When I make an annotation, I track that difference by starting the annotation with one of 10 or so labels:

  • todo: The most important label – this reminds me to do something (look up a paper, change something in my manuscript etc.)
  • idea: I’m inspired with a new idea, somehow based on this paper, but it’s my own thing.
  • ref: This is a reference, or contains a reference that I want to use for something.
  • question: or just q: I have a question about this, maybe to ask the author or myself in relation to my data / research.
  • quote: I want to quote this, or it contains a useful quote.
  • note: Not a specific use in mind for this, but it’s worth remembering next time I pick up this paper.
  • term: A new term or word I’m not familiar with: I look it up or define it in the annotation.
  • crit: I have a criticism of this bit of the paper.

There are a few others I use occasionally, but these are the most common. You probably can think of your own based on how you would categorise the kinds of thoughts that come to you while reading research papers.
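
If you ever want to process these labels outside your PDF reader, here is a minimal Python sketch of how an annotation's leading label could be recognised and used for grouping. The label names come from the list above; the function names and the handling of `q` as a short form of `question` are my own illustrative assumptions, not part of Docear.

```python
# Action labels taken from the list above. The helper names below are
# hypothetical and only illustrate the convention.
ACTION_LABELS = {"todo", "idea", "ref", "question", "quote", "note", "term", "crit"}
ALIASES = {"q": "question"}  # assumption: "q:" is shorthand for "question:"


def split_action_label(annotation):
    """Return (label, rest) if the annotation starts with a known 'label:'.

    If no known label is found, return (None, annotation) unchanged
    apart from surrounding whitespace.
    """
    head, sep, rest = annotation.partition(":")
    label = head.strip().lower()
    label = ALIASES.get(label, label)
    if sep and label in ACTION_LABELS:
        return label, rest.strip()
    return None, annotation.strip()


def collect_by_label(annotations):
    """Group annotation texts by their action label."""
    grouped = {}
    for text in annotations:
        label, rest = split_action_label(text)
        grouped.setdefault(label, []).append(rest)
    return grouped
```

Calling `collect_by_label(my_annotations)["todo"]` would then give you every outstanding todo in one list, which is roughly what I want from these labels in the first place.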

Use keywords for each research project/idea

I have 3 or 4 projects constantly on the go, and lots of ideas for new projects and papers. I want to capture my responses to what I’m reading in relation to those projects in a reliable way.

So, I have short, unique keywords for each of my projects:

  • camedia: a CA project about how people talk about the recording devices they’re using
  • cadance: a CA project about partner dance
  • thesis: my thesis
  • thesis_noticings: my chapter on noticings
  • thesis_introduction: get it?

So if I’m reading a paper and it says something like:

“Something I really disagree with and want to comment on or respond to in my next article on dance”

I’ll highlight, copy and paste that into a new annotation, and add a few keywords on the top:

    quote: cadance: "Something I really disagree with and want to comment on or respond to in my next article on dance"

This means that when I search for all my annotations to do with the ‘cadance’ project, I’ll find this one, and I’ll know I wanted to use it as a quote.

Similarly, an annotation may be relevant to multiple projects, in which case I add a keyword for each:

    quote: cadance: thesis_noticings: "Something I really disagree with and want to comment on or respond to in my next article on dance"

If I want to quote something, but also want to write a note about it, I’ll make two separate annotations on the PDF, one that says:

    quote: cadance: thesis_noticings: "Something I really disagree with and want to comment on or respond to in my next article on dance"

The other that says:

    note: cadance: thesis_noticings: "Something I really disagree with and want to comment on or respond to in my next article on dance": I really disagree with this for reason, reason and reason.

These will show up in my literature review map as two separate annotations, with different actions attached to them.
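
To make the convention concrete, here is a rough Python sketch that splits an annotation into its action label, its project keywords and the remaining text, and then filters by project. It assumes the project keywords are known in advance (I use the ones listed above); the `Annotation` class and the function names are hypothetical, and this is not how Docear itself handles annotations.

```python
from dataclasses import dataclass
from typing import List, Optional

# Labels and project keywords as used in this post; extend to taste.
ACTION_LABELS = {"todo", "idea", "ref", "question", "quote", "note", "term", "crit"}
PROJECT_KEYWORDS = {"camedia", "cadance", "thesis",
                    "thesis_noticings", "thesis_introduction"}


@dataclass
class Annotation:
    action: Optional[str]   # e.g. "quote", or None if no label was found
    projects: List[str]     # e.g. ["cadance", "thesis_noticings"]
    body: str               # whatever follows the tags


def parse_annotation(text):
    """Peel 'label:' and 'project:' prefixes off the front of an annotation."""
    action = None
    projects = []
    rest = text.strip()
    while True:
        head, sep, tail = rest.partition(":")
        token = head.strip()
        if not sep:
            break  # no more colons, so the remainder is the body
        if action is None and not projects and token in ACTION_LABELS:
            action = token
        elif token in PROJECT_KEYWORDS:
            projects.append(token)
        else:
            break  # first segment that is not a tag starts the body
        rest = tail.strip()
    return Annotation(action, projects, rest)


def find_by_project(texts, project):
    """Return the parsed annotations tagged with the given project keyword."""
    parsed = [parse_annotation(t) for t in texts]
    return [a for a in parsed if project in a.projects]
```

Because parsing stops at the first segment that is not a known tag, a colon inside the quoted text or in my own comment after it is left alone as part of the body.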

Use auto-completion software to make this less painful

I use SwiftKey for all my annotation on my Android tablet (where I do most of it). This greatly reduces the time required to type in the repetitive tags and keywords described above, which I use all the time to enhance my annotations. It also offers auto-complete suggestions so I can remember more complex project keywords / tags easily.

That’s the general advice bit. Geeky advice follows in part II.

So far, this guide has given you some pretty generic advice about note taking that you could use in any piece of software. The Docear-specific pay-off for this process comes when you import your PDFs into Docear. However, that bit gets pretty geeky. You’ll need to be comfortable with scripting, modifying workflows of existing software packages, and generally be unperturbed by geeky terminology.

If this isn’t your thing, you can just use Docear with the above strategies – or use them more generally in your literature reviewing.

If you are geekily inclined, or just curious, check out part II of this post.

Notes

  1. One little gotcha: if you’re using a Mac (esp. Yosemite (10.10.X or newer)), you’ll have to do some terminal diddling to make sure you’ve enabled software from unsigned sources to run on your machine or you’ll get an unhelpful error message. Thanks Apple!