Sustainable Research Literature Management with Docear part II

13 Comments / blog / April 30, 2015 / howto, Methodology, research

This is part II of a two-part post in which I will walk you through some key parts of a technically-savvy user’s long-term literature review and maintenance strategy.

In part I you learned how to:

take notes while reading that won’t get lost or damaged by your software;
organise notes and annotations so you won’t forget why you took them.

In this part, you will learn how to:

maintain associated bibliographical records;
use Docear in a way that will keep your literature review current for years to come.

Warning: this is a long post.

First, import your annotations into Docear to manage them

So far, this guide has given you some fairly generic advice about note taking that you could apply in any piece of software. The Docear-specific pay-off comes when you import your PDFs into Docear: you can use Docear’s internal scripting language (strictly speaking, Groovy scripts run through the Freeplane scripting API that Docear inherits) to format, re-organise and label your new annotations automatically. I have some complicated scripts that I won’t cover here, but here’s a very simple one I use to apply visual labels to my annotations automatically.

Docear offers a number of visual labels (icons) you can use to decorate the nodes in your maps, making them visually appealing and easy to tell apart.

I have written a script that looks for annotations beginning with keywords such as ‘idea’, ‘ref’ or ‘term’, and assigns each one a matching pre-set visual label provided by Docear:

// @ExecutionModes({ON_SELECTED_NODE, ON_SELECTED_NODE_RECURSIVELY})
// Add an icon to the node according to the keyword its text starts with.
def text = node.text.toLowerCase()   // lower-case once, so matching is case-insensitive

if (text.startsWith("todo")) {
    node.getIcons().addIcon("checked")
} else if (text.startsWith("idea")) {
    node.getIcons().addIcon("idea")
} else if (text.startsWith("ref")) {
    node.getIcons().addIcon("attach")
} else if (text.startsWith("question") || text.startsWith("q:")) {
    // both the long form and the "q:" shorthand get the same icon
    node.getIcons().addIcon("help")
} else if (text.startsWith("quote")) {
    node.getIcons().addIcon("bookmark")
} else if (text.startsWith("note")) {
    node.getIcons().addIcon("edit")
} else if (text.startsWith("term")) {
    node.getIcons().addIcon("desktop_new")
} else if (text.startsWith("crit")) {
    node.getIcons().addIcon("pencil")
}

To install this script, I saved the code to a file called addiconNodes.groovy and put it in my /home/saul/.docear/scripts directory (NB: the location of this directory may vary on Mac/Windows). The script then becomes available as a contextual menu item. Docear also has a built-in script editor you can use to write Groovy scripts.
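If you prefer to do the installation from a terminal, the copy step can be wrapped in a small shell function. This is a minimal sketch: the function name install_docear_script is hypothetical, and it assumes the Linux location ~/.docear/scripts mentioned above unless you pass another folder as the second argument.

```shell
# Sketch: copy a Groovy script into Docear's user scripts folder.
# Assumption: the default folder is ~/.docear/scripts (the Linux location);
# pass a different folder as the second argument on Mac/Windows.
install_docear_script() (
    script="$1"
    dest="${2:-$HOME/.docear/scripts}"
    mkdir -p "$dest" || exit 1          # create the folder if it is missing
    cp "$script" "$dest/" || exit 1     # copy, keeping the original file name
    echo "$dest/$(basename "$script")"  # report where the script now lives
)
```

For example, install_docear_script addiconNodes.groovy prints the installed path.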

Here are some illustrations showing the script being run on a newly imported paper:

[Figure: selecting the script option in Docear]

[Figure: the map after the script has finished running]

[Figure: one way you might then choose to organise your annotations]

You’ll find lots of Freeplane scripts online that you can modify and play with. There are some amazing possibilities: the icons script above barely scratches the surface of what this method could do for your literature reviewing process.

Using Docear to manage thousands of papers across multiple projects and multiple years

Docear’s demo shows someone writing a paper with about 20 or 30 references. This is fine for one project, but I have over 3000 PDF books and papers in my literature repository. Over the years, I suspect this will continue to grow. I want to feel secure that my library of research papers, annotations and references is in one safe location on my hard drive. I also don’t want to have to duplicate those PDFs each time I start a new project. Here are some of my solutions to these issues geared towards a long-term research strategy.

First: a geeky caveat

Although I’ve recommended Docear for the approach outlined so far, it has some problems that I think you will have to address if you really want to use it for a long-term research and literature management strategy.

If you only plan to use it for a one-year master’s project, you probably don’t need the rest of this guide. If you want to read your way into, and stay up to date with, the vast literature of one or more academic fields over the long term, read on, but be warned: it gets even geekier from here on in.

Docear’s default per-project folder structure and its problems

At the moment Docear encourages you to store your PDFs on a per-project basis, as if you were starting from literature year zero each time you write something. Also (by default, at least) it puts them in a rather obscure folder structure, and I don’t really trust myself to back up obscure folder structures reliably.

Here’s the default folder structure Docear created for a demo project I just set up:

/Home
    /Docear
            /projects
                /Docear demo
                    /_data
                        /!!!info.txt
                        /1493C9745013F2UNMIV1XCU93LBPZ4Y0KU14
                            /default_files
                                /Docear demo.bib
                                /literature_and_annotations.mm
                                /temp.mm
                                /trash.mm
                            /My Drafts
                                /My New Paper.mm
                            /settings.xml
                    /literature_repository
                        /where_I_am_expected_to_put_my_pdfs.pdf
                        /Example PDFs
                            /Docears_sample_PDFs.pdf
                    /Project Data.mm

The idea from Docear’s developers is that each new project gets a few default files, including a dummy ‘My New Paper.mm’ (.mm stands for Mind Map), a project-name.bib file and a literature_and_annotations.mm file, plus a literature_repository folder to hold all the PDFs you’ve associated with this project.

The literature_and_annotations.mm file contains a script that – when you open it – will scan through this project-specific literature_repository and check for updated files or new annotations.

This creates several problems:

Docear’s structure works fine on a per-project basis, but I have 9GB of PDFs, and I do not want to wait for Docear to scan through them and check for updates every time I start it up.
I’d rather not store all my precious PDFs, with thousands of hours’ worth of annotations, five levels down an application-specific folder hierarchy that I may or may not remember to transfer to a new machine. Similarly, I don’t want my references, which may have taken a long time to assemble, stored in a folder handily called ‘1493C9745013F2UNMIV1XCU93LBPZ4Y0KU14’.
I want to be able to use PDFs and bibliographic data from all my previous projects in new projects easily.

Work-around 1: consolidating your literature archive

Use a ‘main_literature_repository’ for your key files

I create a ‘default’ project into which I first import all my PDFs and references.
I set up this project to store PDFs in a folder in my Dropbox called main_literature_repository.
I put the main BibTeX file for this project in the same folder.

This means I have one canonical BibTeX file, containing all the books and papers I will ever import, in my main_literature_repository folder, which makes it easy to back up. I use Dropbox to keep rough versioning in case I do something silly or lose my machine(s); use your own backup strategy and folder location of choice.
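Because everything important lives in one folder, an occasional dated snapshot is easy to add on top of whatever sync service you use. A minimal sketch, assuming a Unix-like system with tar; the function name snapshot_repository and both paths are hypothetical placeholders.

```shell
# Sketch: write a dated, compressed snapshot of the repository folder.
# Paths are placeholders; this complements, rather than replaces, Dropbox
# or whatever backup strategy you already use.
snapshot_repository() (
    repo="$1"                    # e.g. "$HOME/Dropbox/main_literature_repository"
    backup_dir="$2"              # where snapshots should be written
    mkdir -p "$backup_dir" || exit 1
    archive="$backup_dir/$(basename "$repo")-$(date +%Y-%m-%d).tar.gz"
    # -C keeps paths inside the archive relative to the repository's parent folder
    tar -czf "$archive" -C "$(dirname "$repo")" "$(basename "$repo")" || exit 1
    echo "$archive"
)
```

Keep the backup folder outside the repository itself, otherwise each snapshot ends up inside the next one.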

Use a per-paper mind map for long-term annotation and re-annotation

I delete the literature_and_annotations.mm file from my default project. I do not want to wait 3 hours while Docear scans through all 9GB of my papers when it starts up. Instead, in the same main_literature_repository folder, I create a per-paper mind map.

I do this because I may use a paper six times in six different papers or research contexts. Ideally, I want to be able to read and re-read it and keep track of what interested me about it and when, rather than having to delve into each project I used it for.

So, once I’ve done my annotations, I create a new map in my default project and import the PDF. I then copy the PDF’s name and give the new mind map exactly the same name as the paper, so that in my main_literature_repository folder I have:

/home
    /saul
        /main_literature_repository
                /my_new_favourite_paper.pdf
                /my_new_favourite_paper.mm

Apart from anything else, I can glance through the folder listed alphabetically and see which papers I have actually read! Now every time I update my annotations in that paper, I can import them into this paper-specific map, and organise them.
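The same naming convention makes it easy to ask the opposite question from the command line: which PDFs don’t have a map yet? A minimal sketch, assuming each map sits next to its PDF with the same base name and a lower-case .pdf extension; the function name list_unmapped_pdfs is hypothetical.

```shell
# Sketch: list PDFs in the repository that have no same-named .mm map yet,
# i.e. papers without a per-paper mind map. Assumes paper.pdf and paper.mm
# sit side by side in one folder; files ending in .PDF are not matched.
list_unmapped_pdfs() (
    repo="${1:-.}"
    for pdf in "$repo"/*.pdf; do
        [ -e "$pdf" ] || continue            # the folder contains no PDFs at all
        map="${pdf%.pdf}.mm"                 # same name, .mm extension
        [ -e "$map" ] || basename "$pdf"     # print PDFs with no matching map
    done
)
```

Running list_unmapped_pdfs ~/main_literature_repository (path illustrative) prints the unmapped PDFs in alphabetical order.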

I might want to do this in a number of ways (one big list, thematically, etc.), but I usually organise them to show how they relate to the project I’m currently working on. If I organise the new annotations in this way each time I read or use that paper, then after six or seven readings it’s going to be interesting to see how my use of the paper has changed over time.

A quick example of importing a paper:

I download a paper from a publisher, helpfully entitled ‘12312131231512312313.pdf’. I rename it to something useful (e.g. AuthornameYYYY-title.pdf)² and put it into my main_literature_repository folder, using Docear or JabRef to add or automatically import its bibliographical record into my default project’s BibTeX file. Once I’ve read it and taken notes as annotations, I manually create a new map just for this paper in the same main_literature_repository folder. This is now the mother lode folder with all my really important work in it.
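The rename-and-file step can also be done by hand in a shell. This is a minimal sketch, not the JabRef rename plugin mentioned in the note: the function name file_paper is hypothetical, the author, year and title are typed in manually, and the lower-case, hyphenated title is just one possible reading of AuthornameYYYY-title.pdf.

```shell
# Sketch: file a downloaded PDF under an AuthornameYYYY-title.pdf name.
# Author, year and title are supplied by hand (an assumption of this sketch).
file_paper() (
    src="$1"; author="$2"; year="$3"; title="$4"; repo="$5"
    # Lower-case the title and turn runs of non-alphanumerics into single hyphens
    slug=$(printf '%s' "$title" | tr '[:upper:]' '[:lower:]' \
        | tr -cs '[:alnum:]' '-' | sed 's/^-//; s/-$//')
    dest="$repo/${author}${year}-${slug}.pdf"
    # Never overwrite a paper that is already in the repository
    [ -e "$dest" ] && { echo "refusing to overwrite $dest" >&2; exit 1; }
    mv "$src" "$dest" || exit 1
    echo "$dest"
)
```

For instance, filing 12312131231512312313.pdf as Hepburn, 2012, "Repairing self- and recipient reference" yields Hepburn2012-repairing-self-and-recipient-reference.pdf in the repository folder.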

Work-around 2: using your papers across multiple projects

This bit is rather tricky and involves many trade-offs. I think Docear will fix it some day; for now, this is how I am doing it.

I create a new project, and treat it as a ‘sub-project’ of my default project

When I start a new Docear project, I let Docear create a default folder structure like the one above. To get literature from my main_literature_repository folder into this new project, I create symbolic links (symlinks) in the new sub-project’s literature_repository folder that point to the relevant PDF files in my main_literature_repository.

So the original PDFs live here:

/home
    /saul
         /main_literature_repository
                    /my_new_favourite_paper1.pdf
                    /my_new_favourite_paper2.pdf
                    /my_new_favourite_paper3.pdf
                    /my_new_favourite_paper4.pdf

I create a new Docear project and create symlinks (only for the relevant PDFs) here:

/Home
    /Docear
            /projects
                /Docear demo
                    /_data
                        /!!!info.txt
                        /1493C9745013F2UNMIV1XCU93LBPZ4Y0KU14
                            /default_files
                                /Docear demo.bib
                                /literature_and_annotations.mm
                                /temp.mm
                                /trash.mm
                            /My Drafts
                                /My New Paper.mm
                            /settings.xml
                    /literature_repository
                        /link_to_my_new_favourite_paper1.pdf
                        /link_to_my_new_favourite_paper2.pdf
                        /link_to_my_new_favourite_paper3.pdf
                        /link_to_my_new_favourite_paper4.pdf
                        /Example PDFs
                            /Docears_default_sample.pdf
                    /Project Data.mm
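On Linux or macOS, creating those links is a job for ln -s. A minimal sketch, assuming the repository path you pass is absolute (so the links resolve wherever Docear opens them); the function name link_papers is hypothetical, and unlike the tree above it keeps each PDF’s original file name rather than adding a link_to_ prefix.

```shell
# Sketch: link selected PDFs from the main repository into a sub-project's
# literature_repository instead of copying them. $repo should be absolute.
link_papers() (
    repo="$1"; project_lit="$2"; shift 2     # remaining args: PDF file names
    mkdir -p "$project_lit" || exit 1
    for name in "$@"; do
        # Each link points back at the single canonical copy of the PDF
        ln -s "$repo/$name" "$project_lit/$name" || exit 1
    done
)
```

A call such as link_papers "$HOME/main_literature_repository" "$HOME/Docear/projects/sub-project-title/literature_repository" my_new_favourite_paper1.pdf my_new_favourite_paper2.pdf (paths illustrative) links just the papers you name.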

Now if I open the literature_and_annotations.mm file in the new sub-project, it will import these linked PDFs and their associated annotations, and I can start working with them. Because the links point to the original files rather than to copies, any changes to annotations I make in these maps will also change the annotations in the original PDFs.

Maintaining bibliographical references across Docear projects (and other software)

The only issue with this approach so far is that your new project-specific BibTeX file will not automatically import metadata from your ‘default’ project. This means that when you import your PDFs into this new project’s literature_and_annotations.mm map, they will have no bibliographical reference data attached.

To understand why – and how to solve it – you need to know a little bit more about how Docear works:

Docear allows you to re-organise your annotations while maintaining their associations with the bibliographical reference of the paper they’re drawn from, by linking nodes in your Mind Map (.mm) files to PDFs referenced in your BibTeX file. Docear does this by adding a ‘file’ field to each paper’s BibTeX entry. Here’s an example of a BibTeX entry from my database:

@ARTICLE{Hepburn2012,
  author = {Alexa Hepburn and Sue Wilkinson and Rebecca Shaw},
  title = {Repairing self- and recipient reference},
  journal = {Research on Language and Social Interaction},
  year = {2012},
  volume = {45},
  pages = {175-190},
  number = {2},
  file = {:/home/saul/main_literature_repository/hepburn_repairingselfand_2012.pdf:PDF},
  keywords = {EMCA, Self-reference, Reference ; Repair},
}

So when Docear scans this PDF, it extracts its annotations, places them in the map, and creates a hyperlink to the PDF listed in the file field, so clicking on a node opens the file. Docear also extracts the bibliographical information from this BibTeX entry and adds it as attributes of the associated annotation node on my map.
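Since every node’s link depends on that file field, it can be worth checking now and then that the paths in your BibTeX file still point at real PDFs. A minimal sketch, assuming JabRef’s description:path:type layout, one file per single-line file field, and Unix paths without colons; multi-file fields separated by ‘;’ are not handled, and the function name check_bib_files is hypothetical.

```shell
# Sketch: print the PDF path from each single-line "file = {...}" field in a
# BibTeX file and flag paths that no longer exist on disk.
check_bib_files() (
    bib="$1"
    # Keep only the middle (path) part of "{description:path:type}"
    sed -n 's/^[[:space:]]*file[[:space:]]*=[[:space:]]*{[^:]*:\(.*\):[^:]*}.*$/\1/p' "$bib" |
    while IFS= read -r path; do
        if [ -e "$path" ]; then
            printf 'ok\t%s\n' "$path"
        else
            printf 'missing\t%s\n' "$path"
        fi
    done
)
```

Piping the output through grep '^missing' shows only the broken links.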

So, when I create a symbolic link to this file in my new sub-project, Docear sees it as a new PDF at a different path, namely:

/home/saul/Docear/projects/sub-project-title/literature_repository/hepburn_repairingselfand_2012.pdf

But the new project has no BibTeX entry for that path, so Docear won’t recognise the PDF or attach its bibliographical data.

There are several possible solutions:

Symlink your ‘default project’ BibTeX file into each new sub-project, just as you do with your PDF files.
Duplicate your ‘default project’ BibTeX file into each new sub-project, using search and replace on the ‘file’ field of each entry so that it points to your new sub-project’s literature_repository folder.
Or (and this is what I do), open your main BibTeX file in a recent, stand-alone version of JabRef and use the ‘write XMP data’ option so that the PDFs themselves contain their own reference data. When you import these PDFs, you can then use the reference embedded in each PDF to create a new and separate project-specific BibTeX file.
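The search-and-replace in the second option is a one-line sed job. A minimal sketch, assuming every file field sits on one line and both folder paths are plain paths without ‘|’, ‘&’ or ‘\’ characters (the sed delimiter and replacement specials); the function name retarget_bib and the example paths are hypothetical.

```shell
# Sketch of the second option: copy the main BibTeX file into a sub-project
# and point every file field at the sub-project's literature_repository.
retarget_bib() (
    src_bib="$1"; dest_bib="$2"; old_dir="$3"; new_dir="$4"
    # Only touch file-field lines, and only the ":<old_dir>/" path prefix
    sed "/^[[:space:]]*file[[:space:]]*=/s|:$old_dir/|:$new_dir/|g" \
        "$src_bib" > "$dest_bib"
)
```

For example, retarget_bib main.bib sub.bib /home/saul/main_literature_repository /home/saul/Docear/projects/sub-project-title/literature_repository rewrites the Hepburn2012 file field to the sub-project path and leaves every other field untouched.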

[Figure: JabRef’s XMP writing option]

This third option is preferable to me for several reasons:

I don’t want to see all my references in every new project – it’s distracting.
XMP data can be read by lots of other bits of software so it makes my reference library somewhat more portable. Also, if I lose my BibTeX file in some catastrophic data loss episode, as long as I have my PDFs with XMP bibliographic data I can pretty much reconstruct my literature, annotation and reference archive from just those files.
I may want to update my bibliographical records for a new project but keep the references of older projects intact. I improve my bibliographies continually and incrementally, but I want to control how I change them. For example, if I always used my default project’s BibTeX file, symlinked into each new sub-project as in option 1, I might not be able to re-generate a paper I wrote three years ago, before I made those changes and improvements. I want that paper to be re-created exactly as it was when I wrote it, including all the reference details and errors. And I can still easily update an old project’s BibTeX file, because the PDF file itself now contains the latest XMP data.

I see this feature of the latest, stand-alone version of JabRef (not available in Docear’s embedded version of JabRef) as a significant plus in terms of the sustainability of this approach to literature management.
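If you rely on embedded XMP as your fallback, it helps to spot PDFs that never got it. This is only a heuristic sketch: it searches each PDF’s raw bytes for the x:xmpmeta element, which assumes the metadata stream is stored uncompressed, and a packet written by the publisher counts just as much as one written by JabRef. A dedicated tool such as exiftool is more reliable. The function name list_pdfs_without_xmp is hypothetical.

```shell
# Heuristic sketch: list PDFs whose raw bytes contain no "x:xmpmeta" element,
# i.e. PDFs that probably have no (uncompressed) XMP metadata packet.
list_pdfs_without_xmp() (
    repo="${1:-.}"
    for pdf in "$repo"/*.pdf; do
        [ -e "$pdf" ] || continue
        # -a: search binary files as text; -q: report via exit status only
        LC_ALL=C grep -qa 'x:xmpmeta' "$pdf" || basename "$pdf"
    done
)
```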

Things I didn’t cover but may post about in the future

There are lots of other things you can do with this approach to Docear, and with Docear’s approach in general. A few I didn’t cover here are:

Using the command line to search/filter your annotations.
Using recoll, spotlight or similar configurable full text search systems on your repository.
Importing folder structures containing other research materials into your map.
Using Docear (or Freeplane) to take detailed and well-structured notes during lectures.
Using Docear to manage and search Jeffersonian transcripts of conversational data.

If you have any questions or would like to hear about these, drop me an email or get me on @saul

Notes

² Because I like small tools for simple jobs, I actually do the renaming with JabRef in stand-alone mode, using JabRef’s rename files plugin to do it automatically and configurably. NB: Docear will do this automatically in upcoming versions; the feature is already in there, just not quite ready yet.

13 thoughts on “Sustainable Research Literature Management with Docear part II”

Anna
October 19, 2015 at 12:38 pm

Thanks for this very useful information! Although I am not into scripting, I just copy-pasted your script and it works great. Hopefully you will post some more things about Docear in the future.
saul
October 19, 2015 at 12:44 pm

Glad it was helpful Anna!
Guido
November 12, 2015 at 10:59 am

Thank you very much for this wonderful post.
I tried Docear a year ago. I love the concept and approach, but I quickly ran into some of the problems you highlighted, so I decided to wait for Docear to become more “mature”.
The solutions you propose do seem to be excellent ways to improve Docear; they should think about incorporating some of them in the next releases.
Do you have any reasoned and sustainable workaround for the LibreOffice integration?
(even a geeky one, maybe via the already existing standalone Jabref plugin).
Thank you
saul
November 12, 2015 at 11:09 am

Hey Guido,

I’m glad you enjoyed this. My LibreOffice workaround mostly involves… not using LibreOffice, but I do have a reliable way to get my references rendered into a LibreOffice document, using Markdown and Pandoc to create my .odt files.

I really recommend authoring in Markdown and using Pandoc for all the good reasons listed here: http://programminghistorian.org/lessons/sustainable-authorship-in-plain-text-using-pandoc-and-markdown

All I need to do to get my references rendered is a simple shortcut in my text editor, and I have a very decent and consistent references list based on my BibTeX file.

After trying many different approaches, there simply seems to be no adequate way to address these issues within an entirely WYSIWYG environment. The Pandoc/Markdown approach allows you the benefits of WYSIWYG rendering/tweaking (at the last stage, for example before publication) without the pain of having to go back and tweak every reference by hand when things go awry.

Cheers!

Saul.
Julie Wilkes
November 18, 2015 at 12:58 pm

I’m a Zotero user, but it’s not too late to switch! I found this:

http://www.docear.org/2014/01/15/comprehensive-comparison-of-reference-managers-mendeley-vs-zotero-vs-docear/

tbh, I’m a bad Zotero user, but it mostly works for me, except I don’t understand why it can retrieve metadata on some occasions and not others. They score the same on this point, so I presume I would be no better off switching?
saul
November 18, 2015 at 1:16 pm

Hi Julie,

In my experience Zotero is pretty good, but I’ve not successfully used it with Docear. I suspect this is possible, as I know Zotero can produce BibTeX files, but I don’t know whether its compatibility with Docear’s PDF file management approach has been fixed.

Personally I use a combination of tools, but these days mostly JabRef as a stand-alone BibTeX manager, along with the version bundled into Docear when I’m mapping things out.

Cheers!

Saul
Frédéric Vachon
December 7, 2015 at 5:32 pm

Thanks for these two articles! I just started to use Docear a week ago, and your ideas for organising, reading and annotating your PDFs are a really nice addition to the Docear manual. I hope you’ll write more on the subject; I would be interested to read more about your way of adding other research materials to your mind maps, as I’m working with annotated pictures used in a qualitative analysis for my master’s degree.
saul
December 7, 2015 at 6:02 pm

Hi Frédéric,

I’m really glad you’re finding this useful.

If you let me know what issues you’re facing and what your research workflow is I might be able to offer some insight based on my experience of Docear. I do have some mindmaps with images but I haven’t experienced many problems integrating them into my maps.

Cheers,

Saul.
Matthew
August 12, 2016 at 9:12 pm

Wow, that was very helpful. It looks like what I want and more, but it was a bit confusing. This cleared many things up, but a mind map for each paper seems a bit much. I hope option 3 helps with transferring references, as I have papers from photocopy days to now. Lots of books constantly moving around and I can write legible notes.
saul
August 29, 2016 at 9:25 am

Hi Matthew,

I know it may seem like overkill to have one map per paper, but to be honest this has been one of the most useful parts of my system. It shows me, very clearly, which papers I’ve read and taken notes on. If I want to return to a paper saved as, say, AuthornameYYYY.pdf, I look in my massive literature review folder and see that there is also a file called AuthornameYYYY.mm, the corresponding mind map file.

If I re-read that paper, or want to refresh my memory or add to it, I simply open that mindmap file and add to it.

This isn’t the ideal way to use Docear, as I think their per-project folder structure has a rationale that it’s a pain to battle against, but it works well enough for me for the moment, and seems relatively future-proof (one folder, easy to sync, contains all my notes, and I can open them in Freeplane if Docear ever tanks).

Anyway – thanks for the feedback, and good luck with it!
Jonas
February 17, 2017 at 10:01 pm

Thank you for sharing your approach. I am considering whether the one-map-per-paper idea would be helpful for me. Right now I’m collecting all my annotations in a single map in Docear. But after 4 years of adding papers and notes it has grown rather big, runs slowly and crashes too often. The big advantage for me is, though, that I can rather quickly gather all relevant notes for a project I’m working on by applying tags to the annotations, quite similar to yours, and then using either a script or a filter.

I just wondered how you would collect the relevant quotes across all of your annotated papers if they are spread over so many mindmaps. I mean, for example, how do you retrieve all potential quotes that you marked with the “thesis” tag? Do you use a Docear script for that, or do you simply search your literature folder with Recoll for the relevant tags? Does it search within the mm files?

Looking forward to your future posts on this.
saul
February 20, 2017 at 9:49 pm

Hi Jonas,

I keep all my maps in the same folder as my PDF files, then when I want to search for a particular tag, I just use grep on the command line within my one, large literature_repository folder to search through all the .mm files:

grep "#thesis" *.mm

If that throws up more than one screen’s worth of results, I pipe it to less:

grep "#thesis" *.mm | less

The nice thing about this approach is that there are loads of cool text-searchy things you can do with grep that are very quick.

For example, if I want to filter the results of my first #thesis search to only include things about #assessments, I could do:

grep "#thesis" *.mm | grep "#assessments"

Then I’d only get results tagged with #thesis that are also tagged with #assessments
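A related trick is to see which maps mention a tag, and how often, rather than every matching line. A minimal sketch with a hypothetical function name, tag_counts; it assumes, as is usual for Freeplane/Docear .mm files, that each plain-text node’s text sits on a single XML line, so the count roughly equals the number of tagged nodes (rich-text nodes spread over several lines may be counted differently).

```shell
# Sketch: for each mind map in a folder that mentions a tag, print the number
# of matching lines and the map's file name, separated by a tab.
tag_counts() (
    tag="$1"; dir="${2:-.}"
    for map in "$dir"/*.mm; do
        [ -e "$map" ] || continue
        n=$(grep -cF -- "$tag" "$map")   # -F: treat the tag as a literal string
        if [ "$n" -gt 0 ]; then
            printf '%s\t%s\n' "$n" "$(basename "$map")"
        fi
    done
)
```

Something like tag_counts "#thesis" | sort -rn would then put the most heavily tagged papers first.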

I hope that’s helpful!

Cheers,

Saul.
Al
November 15, 2017 at 3:17 am

Thank you for a truly great solution! I just started my Doctorate and I already have over 200 articles to review. I’ve been exploring Mendeley for a while, however, this is the best solution that works for me …I’ve incorporated the symlink, Jabref, similar file structure, and I’ve worked in Zotero on the front end. I’m looking forward to research!