Transcribing with Microsoft Word Online and CAQDAS packages

This blog post outlines the steps for working with Microsoft Word Online (part of Office 365) to generate automatic transcripts and import them into your CAQDAS package.

By bringing audio and synchronised transcripts into a CAQDAS package you gain the opportunity to engage with the data as you correct the transcript. This brings the immersion that is often touted as the key benefit of manual transcription, links it to the tools you’ll use for analysis (annotation, memoing, coding), and keeps the efficiencies of automated transcription (typically cutting 5-7 hours of manual transcription per hour of audio down to 1-2 hours of correction and engagement per hour of auto-transcribed audio).

CONTENTS:

Why use Word?

Working with Word – video

Preparing and importing a transcript into ATLAS.ti

Preparing and importing a transcript into MAXQDA

Preparing and importing a transcript into NVivo

Why use Word?

Microsoft Word transcription has a LOAD of advantages:

  1. First and foremost – it’s free to many students, researchers and academics as part of an institutional Office 365 license or their own personal Office 365 subscription.
  2. It’s VERY good and amazingly accurate.
  3. As a result of the first, it is likely to be approved for data management as part of your institutional policies for REC purposes and research data, as it will be covered by the Office 365 agreement – e.g. see this from Lancaster University.
    • While a lot of students may use Descript, Trint, Otter.ai or others, those probably aren’t compliant with ethics requirements on research data!
  4. It’s multi-lingual (though the documentation claims otherwise – so it’s unclear how multilingual!)
  5. It’s familiar.
  6. It’s simple.
  7. There’s good documentation and detailed step-by-step guidance.

So that’s a pretty powerful and solid list.

Limitations, Options and considerations

There’s a key limitation with Word: time. You have up to 5 hours (300 minutes) of transcription per month. After that you can’t transcribe. This may change (it could become a charged service for more – who knows). There’s information and detailed step-by-steps for the Word part here; however, it’s not entirely accurate, as it can and does work in languages other than EN-US.

There are two key options: synchronised or not.

Why synchronise?

Synchronisation allows you to listen to the audio/view the video that accompanies a segment of transcription. This has a lot of potential for analysis, as listening to the audio gives you the opportunity to engage with (and add to analysis) the three Ts of spoken language that are lost in transcription: Tone, Tempo and Tenor (or Timbre), which all carry a LOT of meaning that disappears when spoken interaction is reduced to the written word. Don’t believe me? Try relaxing and unwinding to this.

With video there is even more to be gained by synchronising a transcript to the audio – and the opportunity is then there to add additional information to the transcript to make it a visual transcript too.

So, where synchronisation is easy and you can work with the text easily with or without the synchronised audio/video (as is the case with ATLAS.ti and MAXQDA) this, to me, is a bit of a no-brainer – import synchronised.

Why not synchronise?

This is perhaps a more pragmatic decision where technology is going to get in the way. Unfortunately, with NVivo, it is substantially more fiddly and error-prone, and usually involves quite a lot of to-and-fro correction, so it’s worth thinking a little more about the costs vs benefits for correction and engagement.

Working with Word

The video below shows some basic steps for working with MS Word

Preparing and importing a transcript into ATLAS.ti

Preparing transcripts in word

Importing audio and transcript into ATLAS.ti 9

NOTES:
The (awesome) ATLAS.ti focus group coding tool requires the speaker name to be on a new line either preceded by @ or followed by :

So, when renaming the speakers in Word, include a colon after the speaker name

OR, if you forget, do this when running the search and replace described below.

The main search and replace changes the format from timestamp, space, speaker ID, paragraph mark (^p), text:

00:00:01 SW

content here

To:

00:00:01 
SW   content here  

That is: timestamp, paragraph mark (^p), speaker ID, tab (^t), text.

So if the speaker ID was SW it would be:

Search for: SW^p

Replace with: ^pSW:^t
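
If you have a batch of transcripts to prepare, the same transformation can be scripted outside Word. Here’s a minimal Python sketch of my own (just an illustration, not part of Word or ATLAS.ti) which assumes you’ve saved the Word transcript as plain text, with each segment as a “timestamp speaker” line followed by the content on the next line:

    import re
    import sys

    # Assumes a plain-text export of the Word transcript where each segment looks like:
    #   00:00:01 SW
    #   content here
    # and rewrites it into the layout described above:
    #   00:00:01
    #   SW:<tab>content here
    SEGMENT = re.compile(r"^(\d{2}:\d{2}:\d{2})[ \t]+(\S+)[ \t]*\r?\n(.*)$", re.MULTILINE)

    def reformat(text):
        # Timestamp on its own line, then "speaker:" + tab + content, which matches
        # the speaker format the focus group coding tool expects.
        return SEGMENT.sub(lambda m: f"{m.group(1)}\n{m.group(2)}:\t{m.group(3)}", text)

    if __name__ == "__main__":
        infile, outfile = sys.argv[1], sys.argv[2]
        with open(infile, encoding="utf-8") as f:
            converted = reformat(f.read())
        with open(outfile, "w", encoding="utf-8") as f:
            f.write(converted)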

Preparing and importing a transcript into MAXQDA

The video below shows this process. (Please bear with – I need to work more on grasping the process of speaker coding via search and code before I document this further here.)

The paraphrase function in MAXQDA makes it an amazing tool for this process as it’s so good at supporting the move from correction and making notes to coding.

Preparing and importing a transcript into NVivo

With NVivo the question of whether or not to import synchronised is worth a little more consideration.

Importing is a bit of a pain: the steps to take are more complex, and the extra steps needed to debug errors make it harder still. The interface is a bit clunky and it’s harder to see, code and work with the transcript.

Preparing an unsynchronised transcript

So… if you’re happy with just correcting the transcript in Word and don’t really need to engage with the three Ts, then consider sticking to that and just export with speaker names and import into NVivo as a document.

I’d definitely recommend having a second document open (e.g. in a text editor or notes app like OneNote) as you make your corrections in Word, so you can make notes and reflections as you work through correcting the transcript. You’d then be able to add that as a linked source memo in NVivo to connect those initial notes from correcting.

Preparing a synchronised transcript to import into NVivo – step-by-step:

To make this work effectively you’ll need to convert the text into a table and number the rows – this makes auto-coding for speaker and debugging errors WAY easier. It’s not too hard – honest!

Preparing Transcripts for NVivo

STEPS

  1. Search and replace in Word to get the timestamp, speaker name and transcript onto a single row
  2. Check it’s correct
  3. Number the table rows
  4. Close
  5. Import
  6. Use error messages and table row info to debug
  7. Import again
  8. Repeat 6 & 7 until it works
  9. Listen, correct and engage with the data through annotating
  10. Write up reflections and insights in a memo
  11. Auto-code by speaker name

The video below takes you through this step-by-step.

NOTES:

00:00:01 SW
Content here

The main search and replace changes the format from timestamp, space, speaker ID, paragraph mark (^p), text

To: timestamp, tab (^t), speaker ID, tab (^t), text

 00:00:01   SW   Content here 

If the speaker ID was SW it would be

Search for: SW^p

Replace with: ^tSW^t

You’d then insert a column to the left and auto-number it.
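
If you have several transcripts, the search and replace plus the numbering can also be scripted outside Word. A minimal Python sketch of my own (assuming the same plain-text export of the Word transcript as in the ATLAS.ti section above) that produces numbered, tab-separated rows you can paste back into Word and convert to a table:

    import re
    import sys

    # Assumes each segment in the plain-text export looks like:
    #   00:00:01 SW
    #   Content here
    # Outputs numbered, tab-separated rows:
    #   1<TAB>00:00:01<TAB>SW<TAB>Content here
    SEGMENT = re.compile(r"^(\d{2}:\d{2}:\d{2})[ \t]+(\S+)[ \t]*\r?\n(.*)$", re.MULTILINE)

    def to_rows(text):
        rows = []
        for n, match in enumerate(SEGMENT.finditer(text), start=1):
            timestamp, speaker, content = match.groups()
            rows.append(f"{n}\t{timestamp}\t{speaker}\t{content.strip()}")
        return "\n".join(rows)

    if __name__ == "__main__":
        with open(sys.argv[1], encoding="utf-8") as f:
            print(to_rows(f.read()))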

Importing transcripts into NVivo

Having prepared the transcript you then import it – and, as you’ll see, you’ll usually cycle back a few times, which is where putting the transcript in a table and auto-numbering the rows proves invaluable!
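
If you find yourself cycling back a lot, a quick sanity check of the table before import can save a round trip. Here’s a rough Python sketch of my own (assuming the numbered, tab-separated rows prepared above – that column layout is this post’s convention, not something NVivo requires) that flags the usual causes of import errors:

    import sys

    def check_rows(tsv_path):
        """Sanity-check numbered transcript rows before import.

        Assumes tab-separated columns: row number, timestamp, speaker, text.
        Flags rows with the wrong number of columns, timestamps that go
        backwards, or empty text.
        """
        previous = ""
        with open(tsv_path, encoding="utf-8") as f:
            for lineno, line in enumerate(f, start=1):
                fields = line.rstrip("\n").split("\t")
                if len(fields) != 4:
                    print(f"Row {lineno}: expected 4 columns, found {len(fields)}")
                    continue
                _, timestamp, speaker, text = fields
                if timestamp < previous:
                    print(f"Row {lineno}: timestamp {timestamp} is earlier than {previous}")
                previous = timestamp
                if not text.strip():
                    print(f"Row {lineno}: empty transcript text")

    if __name__ == "__main__":
        check_rows(sys.argv[1])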

Using ATLAS.ti to Correct and Code Automatically generated Transcripts from Teams or Zoom

This post takes you through the process of automatically creating a full written transcript for an audio or video file and importing it into ATLAS.ti to correct and code.

There is now excellent documentation of this in the online manual for ATLAS.ti Mac and the online manual for Windows. So this post is more focused on getting those transcripts out of other platforms, and on opportunities for analysis once they are imported into ATLAS.ti.

ATLAS.ti has led the way in making this really easy and has cut out a step, as it will clean the file on import.

UPDATES:

ATLAS.ti has great documentation of update changes here.
March 2021 – VTT and SRT import supported in Windows.

December 2020 – VTT and SRT import added in Mac

See this post for a cross-platform text step-by-step with lots of links to how to’s and documentation etc.

Prerequisites

The following are important prerequisites. You will need:

  1. A media file that is either:
    1. A recording within Microsoft Teams saved to Stream
      OR
    2. A media file you can convert and upload to Microsoft Stream*
      OR
    3. An audio or video recording through an institutionally licensed Zoom account (with subtitling enabled)
      OR
    4. A recording from another system that outputs a subtitle file (that you will then convert to VTT)
  2. Installed version of ATLAS.ti v9
  3. Installation of the free VLC media player

Process

  1. Create a media file with subtitle file in VTT format
  2. Download the media file and the subtitle file
  3. Clean the subtitle file ready for import.
  4. Import the media file into ATLAS.ti
  5. Import the cleaned subtitles as a synchronised transcript
  6. Listen to the media file and read the synchronised transcript in order to begin analysis through
    • Correcting the transcript
    • Labeling speakers
    • Making notes (annotation)
    • Initial coding of the transcript

Step One – Create a media file with subtitle file in VTT format

Depending where you start there are a few ways this will work – all have the same end point: a media file and a VTT transcript. It’s all detailed over in this post.

The introductory video was created with Teams, another was created in Zoom. You can also (currently) upload videos to Stream or use a wide range of other applications and systems to create an automatic transcript of a media file.

Step Two – Download the media file and the subtitle file

Here’s a copy of the interview video you can download and a VTT file if you want to try it:

Interview with Friedrich Markgraf (mp4 42Mb)

VTT file from Stream.

Step Three – Clean the subtitle file ready for import using an online tool

This step is no longer needed in ATLAS.ti as native support for VTT import is now enabled.

Option 1 – Clean the VTT file into CAQDAS ready format online

Go to https://www.lancaster.ac.uk/staff/ellist/vtttocaqdas.html

Upload your VTT file, Click convert, download the text file.

Option 2 – create your own copy of the converter

Go to the GitHub page at https://github.com/TimEllis/vttprocessor

Step Four – Import the media file into your CAQDAS package

This varies a little between packages. The previous difference is no longer the case – you can now edit timestamps on both Mac and Windows; however, as these are auto-generated, you shouldn’t need to.

ATLAS.ti 9 Windows

It’s now well documented in the online manual for Windows.

There is information on page 11 of the manual, and details here about the Windows-supported media formats used by ATLAS.ti.

Details of adding documents to a project are in the online quick tour documentation here and in the manual on page 24. Details about working with transcripts are on page 10.

ATLAS.ti 9 Mac

Working with transcripts is covered in the online manual for ATLAS.ti Mac.

Adding documents to ATLAS.ti for Mac is in the online quick tour here

There is further information in the online manual for ATLAS.ti Mac about transcript formats on page 48, about adding media files on page 51. There is also extensive information about working with transcripts on pages 52-54.

Step Five – Import the cleaned subtitles as a synchronised transcript

ATLAS.ti 9 Windows

There is now excellent documentation on this process in the online manual, and much improved information on editing transcripts in the 90-page “quick (?! :-O ) tour” manual (pages 18-19).

ATLAS.ti 9 Mac

There is excellent information in the online manual for ATLAS.ti Mac about transcript formats on page 48, about adding media files on page 51. There is also extensive information about working with transcripts on pages 52-54 – again there is at present no information on editing the transcript and correcting it – so here’s a video:

Step Six – Listen to the media and correct the transcript (and begin initial analysis steps)

So this is where it all pays off!

This process allows you to use the powerful tools within the CAQDAS package to play back the audio/video (including slowing playback speed, adjusting volume and setting rewind intervals when you press play/pause, plus keyboard shortcuts for the play/pause functions) whilst you read the transcript and make corrections. But not only corrections! You can also annotate the transcript and even start coding at this stage.

ATLAS.ti 9 Windows

ATLAS.ti Mac

Resources

Here’s the ATLAS.ti file (89Mb) with one corrected plus focus group coded transcript and several uncorrected transcripts from the videos above if you want to have a look / play.

The blog bit – background, next steps, context

So this has been a real focus for me recently. I’ve had a lot of help and encouragement – see acknowledgements below – but also NEED from students and groups who are wondering how to do transcription better.

I’ve REALLY liked working with this in ATLAS.ti 9 – the way that you can integrate annotation and auto-coding via the focus group coding tool into the transcription process is key.

I also think it really gives the lie to the idea that manual transcription is “the best way” to get in touch with audio. I’m kind of hoping that the sudden shifts the pandemic has caused in practice and process might lead to some developments and rethinking of analysis. This quote has been too true for too long:

Over the past 50 years the habitual nature of our research practice has obscured serious attention to the precise nature of the devices used by social scientists (Platt 2002, Lee 2004). For qualitative researchers the tape-recorder became the prime professional instrument intrinsically connected to capturing human voices on tape in the context of interviews. David Silverman argues that the reliance on these techniques has limited the sociological imagination: “Qualitative researchers’ almost Pavlovian tendency to identify research design with interviews has blinkered them to the possible gains of other kinds of data” (Silverman 2007: 42). The strength of this impulse is widely evident from the methodological design of undergraduate dissertations to multimillion pound research grant applications. The result is a kind of inertia, as Roger Slack argues: “It would appear that after the invention of the tape-recorder, much of sociology took a deep sigh, sank back into the chair and decided to think very little about the potential of technology for the practical work of doing sociology” (Slack 1998: 1.10).

Back L. (2010) Broken Devices and New Opportunities: Re-imagining the tools of Qualitative Research. ESRC National Centre for Research Methods

Citing:
Lee, R. M. (2004) ‘Recording Technologies and the Interview in Sociology, 1920-2000’, Sociology, 38(5): 869-899
Platt, J. (2002) ‘The History of the Interview,’ in J. F. Gubrium and J. A. Holstein (eds) Handbook of Interview Research: Context and Method, Thousand Oaks, CA: Sage, pp. 35-54.
Silverman D. (2007) A very short, fairly interesting and reasonably cheap book about qualitative research, Los Angeles, Calif.: SAGE.
Slack R. (1998) On the Potentialities and Problems of a www based naturalistic Sociology. Sociological Research Online 3. http://socresonline.org.uk/3/2/3.html

Various additional links and notes:

How and when Stream will be changing https://docs.microsoft.com/en-gb/stream/streamnew/new-stream

Bits about Zoom needing transcripts switched on and how to do this (i.e. send this link to your institutional Zoom administrator: https://support.zoom.us/hc/en-us/articles/115004794983-Using-audio-transcription-for-cloud-recordings- )

A cool free online tool for converting other transcript formats (e.g. from EStream, Panopto or other systems) https://subtitletools.com/

And finally for more information on the VTT format see this excellent page.

Thanks and acknowledgements

This hasn’t happened alone. Throughout this Friedrich Markgraf has been incredibly accommodating – and giving his time for the demo interview was just a part of that. Thanks definitely due for all his excellent, encouraging and very helpful input via Twitter – and for working on the great new features for direct import into ATLAS.ti.

Many thanks to Tim Ellis especially for his work on the VTT cleaner and sharing it via GitHub.

And to Amir Michalovich for his enthusiasm and sharing some Excel tricks, and of course Christina Silver for her draft reading, promoting and general enthusiasm, encouragement and suggestions. And also to Sandra Flynn and her great blog post about the trials and tribulations of a PhD student working with NVivo, which really helped me realise that time spent on this stuff can have an impact and a value.

If you’ve got suggestions, ideas, updates, developments or found this useful please post a comment, link to this or build on it.

Auto-Creating, Correcting and Coding Transcripts from Microsoft Teams or Zoom in CAQDAS Software (ATLAS.ti, NVivo or MAXQDA)

COVID-19 has had a HUGE impact on qualitative and mixed-methods research processes. A key change I’ve seen and heard about with the PhD candidates and research teams I support is a shift to interviewing via MS Teams or Zoom. And this has prompted more than one person to ponder: “surely if I can automatically create subtitles I must be able to use that for analysis – can’t I?” Well yes – you now can 🙂

NOTES:

This page is text-heavy; there are then additional pages with sequences of video demos.

There will also be changes to the process and software – I’ll note these and work to keep the page up to date as there are exciting developments coming in this area.

Now – I really dislike those cookery blogs where this bit would continue for several pages about who those people were and what they said etc. etc. when all you wanted was the recipe. So I’m going to cut straight to the details – then come back to some of the context and next steps after that. 🙂

Video resources

Step-by-step for ATLAS.ti (with video demonstrations and example files)

Step-by-step for MAXQDA (with video demonstrations and example files)

Step-by-step for NVivo (with video demonstrations and example files)

Getting yourself free transcripts to correct and code in ATLAS.ti, NVivo or MAXQDA

This post takes you through the process of automatically creating a full written transcript for an audio or video file and importing it into CAQDAS software to correct and code.

The audio/video could start from Teams or Zoom – or you could have it from another audio or video recorder.

Prerequisites

The following are important prerequisites. You will need:

  1. A media file that is either:
    1. A recording within Microsoft Teams saved to Stream
      OR
    2. A media file you can convert and upload to Microsoft Stream*
      OR
    3. An audio or video recording through an institutionally licensed Zoom account (with subtitling enabled)
      OR
    4. A recording from another system that outputs a subtitle file (that you will then convert to VTT)
  2. Installed version of ATLAS.ti v9 or NVivo or MAXQDA
  3. Installation of the free VLC media player

Process

  1. Create a media file with subtitle file in VTT format
  2. Download the media file and the subtitle file
  3. Clean the subtitle file ready for import.
  4. Import the media file into your CAQDAS package (ATLAS.ti, NVivo, MAXQDA)
  5. Import the cleaned subtitles as a synchronised transcript into your CAQDAS package
  6. Listen to the media file and read the synchronised transcript in order to begin analysis through
    • Correcting the transcript
    • Labeling speakers
    • Making notes (annotation)
    • Initial coding of the transcript

Each step is documented below with descriptions and specific illustrative videos.

I’m hearing exciting rumours that ATLAS.ti will very soon support other formats for subtitle files so steps 3 and 4 will be integrated.

Step One – Create a media file with subtitle file in VTT format

Depending where you start there are a few ways this will work – all have the same end point: a media file and a VTT transcript. There are other routes but these are the main ones.

1a A recording within Microsoft Teams saved to Stream and auto captioned.

Currently, if you’re using MS Teams through an institutional installation, then when you record a meeting it is added to Stream.

This post from Microsoft takes you through the process of call recording in Teams – and also notes the changes coming in 2021 to Stream.

You will then need to access your institution’s Microsoft stream server and login and locate your video. There’s support about that from Microsoft here.

This post from Microsoft then takes you through the process of autocaptioning your recording(s)

Note: This is changing in 2021, with educational institutions delayed till July. It’s not entirely clear what will happen, and it sounds like there are some live discussions with Microsoft over required features. The current expectation is that when it moves over to Teams recordings being added to OneDrive, a VTT file will be created and uploaded as well – a process that sounds similar to the one with Zoom calls outlined below, but managed via your institutional OneDrive.

1b Upload a file to Microsoft Stream for auto-captioning.

Another option (at the moment at least – though probably only till July 2021 for HE institutions) is to upload a recording from another source to Stream for auto-captioning. To do this you need to upload a video file.

The good news is it’s easy to convert an audio file (or a video) to a stream-compatible video using the free VLC media player (many institutions will make this available on the network or via AppsAnywhere.)

So you’d find your audio or video file and follow guidance here to convert it to a video.
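
If you’d rather script this than click through VLC, ffmpeg (also free) can do the same job from the command line – this is my own alternative suggestion, not the route in the guidance linked above. A minimal Python sketch, assuming ffmpeg is installed and on your PATH, that wraps the audio in a plain black video:

    import subprocess
    import sys

    def audio_to_video(audio_path, video_path):
        """Wrap an audio file in a black video so it can be uploaded to Stream.

        Assumes ffmpeg is installed and on the PATH; adjust the size/codecs
        to whatever your Stream/Teams setup will accept.
        """
        subprocess.run(
            [
                "ffmpeg",
                "-f", "lavfi", "-i", "color=c=black:s=1280x720:r=25",  # blank video track
                "-i", audio_path,                                       # the interview audio
                "-shortest",                                            # stop when the audio ends
                "-c:v", "libx264", "-pix_fmt", "yuv420p",
                "-c:a", "aac",
                video_path,
            ],
            check=True,
        )

    if __name__ == "__main__":
        audio_to_video(sys.argv[1], sys.argv[2])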

Then you’d upload the video to Stream – detailed here.

(Note: if you need to convert or downsample any videos in step 4 you’ll need to follow the same process)

1c A media file and VTT file from Zoom

Zoom can create captions/transcripts as VTT files – see further details here.

NOTE: you will need to have a Business, Education, or Enterprise license with cloud recording enabled and account owner or admin privileges or to request those from the account admin.

Start your meeting and record to the cloud in the usual way using Zoom (e.g. start the meeting, discuss ethics etc., then start recording when you say you are, record the consent segment and any questions before starting, end that recording and start a second one for the content etc.)

When you’ve finished the session and the recording is processed you’ll receive an email with a link so you can download the video or audio and (in due course) the transcript.

The transcription can take a little while – initially you’ll see this, then it will show the transcript to download (so an excuse for one of those slider image-compare things 🙂 ):

Once the transcript is completed you can download that file as a VTT. You’re then set for step 2.

1d A recording from another system

There are many other systems that create subtitle files from recordings – for example eStream or Panopto are widely used in higher education and research institutions. There are also a few hacks to download subtitles from YouTube.

If your system creates a different format of subtitle (e.g. SRT) then you can use an online converter such as Subtitle Tools to convert it to VTT. Some CAQDAS software looks set to support direct SRT import soon – watch this space!

What you need is a media file and a VTT file with auto-generated captions that have the correct timestamps.
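
For the curious, the conversion itself is mostly trivial: SRT and VTT differ mainly in the “WEBVTT” header and the decimal separator in the timestamps. A minimal Python sketch of my own (the online converters handle far more edge cases, so treat this as an illustration rather than a replacement):

    import re
    import sys

    def srt_to_vtt(srt_text):
        """Convert SRT subtitle text to basic WebVTT.

        A rough sketch: adds the WEBVTT header and swaps the comma decimal
        separator in timestamps (00:00:01,000 -> 00:00:01.000). SRT cue
        numbers are left in place, which VTT tolerates as cue identifiers.
        """
        body = re.sub(r"(\d{2}:\d{2}:\d{2}),(\d{3})", r"\1.\2", srt_text)
        return "WEBVTT\n\n" + body

    if __name__ == "__main__":
        with open(sys.argv[1], encoding="utf-8") as f:
            print(srt_to_vtt(f.read()))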

Step Two – Download the media file and the subtitle file

This bit is subject to change so for now here are links to other resources plus video demonstrations:

1a and 1b – Downloading media and transcript from Stream

First you need to update the video details to set the language to English so a transcript is generated.

See step by step from Microsoft here which details how to update video details and language to generate a subtitle file.

Second you need to download the video and then transcript – see screenshots here.

Both of these are from the … menu:

First download the video, second click to Update video details. On the update screen that then displays you’ll see 3 panes, i.e. Details, Permissions and Options. From the Options pane on the right you can download the captions file, as shown below:

1c From Zoom

This was covered above; you also get an email from Zoom when the transcript is done. Then download the video/audio and then the transcript. Make sure you take some care with file names and which transcript file is for which video/audio.

Step Three – Clean the subtitle file ready for import using an online tool

There’s an increasing range of options here: either the software will do it (ATLAS.ti now imports VTT or SRT directly on Mac and PC, and MAXQDA are reportedly looking into this), or you can use the online tool my Lancaster colleague Tim Ellis developed.

UPDATE: December 2021: New tool developed from Tim’s by Charles Weir now available at https://securityessentials.github.io/Teams2NVivo/

Background: Tim created a simple VTT cleanup tool to help support moving transcripts from MS Stream to eStream for teaching and accessibility purposes. He then did some great additional development based on my looking at the requirements across CAQDAS packages for transcript sequencing. The updated page is a VTT cleaner that keeps the initial timestamp and the text of the transcript in a text file that can be imported into ATLAS.ti, NVivo or MAXQDA. And he’s put it online for anyone to use, and the code on GitHub if you need to run it locally.

So you can go for option 1 – use his tool online (no data is saved – it is just a converter). Or, if you must do this on your own computer or network for ethics compliance reasons, you can download the code and styles from github, put them on your computer and clean your own transcripts (option 2). And if you’ve got ideas on how to improve it (e.g. removing notes?) then you can do that via GitHub.

Option 1 – Clean the VTT file into CAQDAS ready format online

Go to https://www.lancaster.ac.uk/staff/ellist/vtttocaqdas.html

Upload your VTT file, Click convert, download the text file.

Option 2 – create your own copy of the converter (e.g. if required by REC)

Go to the GitHub page at https://github.com/TimEllis/vttprocessor

Grab the html file and the css file.

Save them to your computer (or a network location) in the same folder.

Double click the vtttocaqdas.html file to open it in a browser.

Use it to convert the files as above.

NOTES:

Yes, notes indeed. Note that any NOTES / comments created in the VTT file won’t be cleaned up by this script, so you might want to do a quick search for NOTES and remove any such lines. These can include notes about the confidence of the transcription.
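
If you can’t use the online tool, or just want to see what the clean-up involves, here’s a rough Python sketch of my own – emphatically not Tim’s code – which assumes a typical auto-generated VTT (a WEBVTT header, optional cue identifiers, “start --> end” timestamp lines, caption text, possibly NOTE blocks) and reduces it to tab-separated timestamp-and-text lines, dropping NOTE blocks along the way:

    import re
    import sys

    # Matches the start time of a cue line such as "00:00:01.000 --> 00:00:03.500"
    # (assumes the hours are present, as they are in Stream/Zoom output).
    TIMESTAMP = re.compile(r"^(\d{2}:\d{2}:\d{2})[.,]\d{3}\s+-->")

    def clean_vtt(vtt_text):
        """Reduce an auto-generated VTT file to "timestamp<TAB>text" lines."""
        lines_out = []
        current_time = None
        in_note = False
        for line in vtt_text.splitlines():
            stripped = line.strip()
            if not stripped:
                # A blank line ends the current cue (and any NOTE block).
                current_time = None
                in_note = False
                continue
            if stripped.startswith("NOTE"):
                in_note = True
                continue
            if in_note or stripped.startswith("WEBVTT"):
                continue
            match = TIMESTAMP.match(stripped)
            if match:
                current_time = match.group(1)
                continue
            if current_time is None:
                continue  # cue identifiers or metadata before the timestamp line
            lines_out.append(f"{current_time}\t{stripped}")
        return "\n".join(lines_out)

    if __name__ == "__main__":
        with open(sys.argv[1], encoding="utf-8") as f:
            print(clean_vtt(f.read()))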

Step Four – Import the media file into your CAQDAS package

This varies a little between packages.

ATLAS.ti 9 Windows

There is information on page 11 of the manual, and details here about the Windows-supported media formats used by ATLAS.ti.

Details of adding documents to a project are in the online quick tour documentation here and in the manual on page 24. Details about working with transcripts are on page 10.

ATLAS.ti 9 Mac

Adding documents to ATLAS.ti for Mac is in the online quick tour here

There is further information in the online manual for ATLAS.ti Mac about transcript formats on page 48, about adding media files on page 51. There is also extensive information about working with transcripts on pages 52-54.

NVivo Windows

NVivo Release 1 for windows transcript import is documented at https://help-nv.qsrinternational.com/20/win/Content/files/audio-and-videos.htm

(Unchanged process but slight interface changes from v12 instructions available here )

Note that it is likely you’ll need to install a codec pack for any video files.

NVivo Mac

NVivo Release 1 for Mac audio and media importing is documented here https://help-nv.qsrinternational.com/20/mac/Content/files/audio-and-videos.htm

(Unchanged process but slight interface changes compared with the NVivo 12 notes on audio and video files here)

It’s usually pretty straightforward – if the media will play in Quicktime it will play in NVivo.

MAXQDA (Win and Mac)

Documented at https://www.maxqda.com/help-mx20/import/inserting-audio-and-video-files-in-a-maxqda-project

Step Five – Import the cleaned subtitles as a synchronised transcript

ATLAS.ti 9 Windows

There is relatively sparse information in the manual – working with transcripts is on page 10 – and currently nothing about editing/updating a transcript to correct it within ATLAS.ti, which is a key new opportunity in version 9. So here’s a video instead (and I’ll share the VTT file too so you can practice!)

ATLAS.ti 9 Mac

There is further information in the online manual for ATLAS.ti Mac about transcript formats on page 48, about adding media files on page 51. There is also extensive information about working with transcripts on pages 52-54 – again there is at present no information on editing the transcript and correcting it – so here’s a video:

NVivo Windows

NVivo Release 1 for windows transcript import is documented at https://help-nv.qsrinternational.com/20/win/Content/files/import-audio-video-transcripts.htm

(Unchanged process but slight interface changes from v12 instructions available here )

NVivo Mac

NVivo Release 1 for Mac transcript import is documented here https://help-nv.qsrinternational.com/20/mac/Content/files/import-audio-video-transcripts.htm

MAXQDA (Win and Mac)

Documented at https://www.maxqda.com/help-mx20/import/transcripts-with-timestamps

Step Six – Listen to the media and correct the transcript (and begin initial analysis steps)

So this is where it all pays off!

This process allows you to use the powerful tools within the CAQDAS package to play back the audio/video (including slowing playback speed, adjusting volume and setting rewind intervals when you press play/pause, plus keyboard shortcuts for the play/pause functions) whilst you read the transcript and make corrections. But not only corrections! You can also annotate the transcript and even start coding at this stage.

The blog bit – background, next steps, context

How and when Stream will be changing https://docs.microsoft.com/en-gb/stream/streamnew/new-stream

Bits about Zoom needing transcripts switched on and how to do this (i.e. send this link to your institutional Zoom administrator: https://support.zoom.us/hc/en-us/articles/115004794983-Using-audio-transcription-for-cloud-recordings- )

A cool free online tool for converting other transcript formats (e.g. from EStream, Panopto or other systems) https://subtitletools.com/

And finally for more information on the VTT format see this excellent page.

Thanks and acknowledgements

This hasn’t happened alone. SO huge thanks to Tim Ellis especially for his work on the VTT cleaner and sharing it via GitHub.

Also to Friedrich Markgraf for some excellent, encouraging and very helpful conversations via Twitter.

And to Amir Michalovich for his enthusiasm and sharing some Excel tricks, and of course Christina Silver for her draft reading, promoting and general enthusiasm, encouragement and suggestions. And also to Sandra Flynn.

Working with Arabic in NVivo (as well as Hebrew, Urdu, Persian and other Right-to-Left Scripts)

This blog is in four key parts:

  1. The background of this investigation including links to the diagnosis, data and existing information on the limitations of NVivo with Right-to-Left scripts.
  2. A detailed explanation and illustration of how Arabic and other right to left scripts are rendered in NVivo.
  3. Proposed workarounds and alternative software products including their benefits and potential limitations.
  4. Next steps and updates

Background

I recently had the amazing opportunity to work with the Palestinian Central Bureau of Statistics in Ramallah to provide technical consultancy and capacity building in qualitative research methods. This was through working with CLODE Consultants, a UAE-based business specialising in statistics and the use and management of data. CLODE Consultants operates in both Arabic and English, providing worldwide training, research and consultancy services. I am working as a consultant with CLODE Consultants to provide expertise on qualitative and mixed-methods approaches in order to meet the growing needs of customers for those approaches in this data-driven age.

The PCBS approached us to provide technical consultancy in using NVivo as the market-leading product. They had engaged with the built-in projects and excellent YouTube videos and identified it as having the features required to increase their engagement with qualitative and mixed-methods approaches to inform and enhance statistical analyses.

However, through working to develop materials and workshops I rapidly encountered hard limits with working with NVivo and Arabic text, combined with a relative lack of clear documentation or explanation of the limits or workarounds.

NVivo say that:

NVivo may not operate as expected when attempting to use right to left languages such as Arabic. We recommend you download and work with your data in our NVivo free trial Software first.

Searching online forums identified some cursory information interspersed with promotional puff on ResearchGate, a proposed workaround to use images or region coding on PDFs on the NVivo forums, pleas for improvements in this area dating back to 2010 on the NVivo feature request forum, and the most comprehensive response in the QDA Training forum by Ben Meehan.

So I was left to do some experimentation myself and then to work with staff at PCBS who could read Arabic to explore and consider what the limits are and how they affect research.

Example data:

Whilst I would normally steer WELL away from such a politically sensitive topic or text, in this case as example data I am drawing on the interview in June 2018 between Jared Kushner and Walid Abu-Zalaf, Editor of the Al Quds newspaper. I STRONGLY emphasise this is NOT because of the subject matter, nor in any way agreement with, support of or condonation of the content (in fact I find the person pictured and the politics he represents really repulsive); it was selected purely for practical purposes: it is freely available and includes a full English translation. The text – both Arabic and English – is available from http://www.alquds.com/articles/1529795861841079700/

The text was copied and pasted into a Word document and formatted with the Traditional Arabic font, with minimal clearing up of opening links etc.

Arabic text Word file available here.

Additionally the page was printed as a PDF – available here and converted to a PDF via https://webpagetopdf.com/ as well – resulting PDF available here.

Finally it was captured as both article as PDF and page as PDF via NCapture creating 2 .nvcx files (linked).

Computer System Setup:

I added Arabic (Jordan) as a language pack following information from Microsoft about adding languages. (Previously, without the language pack installed, the computer rendered Arabic script in western fonts (e.g. Times New Roman), which slightly reduces legibility and affects rendering.)

Working with NVivo and Arabic Script

NVivo works strictly left-to-right. This has serious implications when importing Arabic, Hebrew, Urdu, Persian or other Right-to-Left scripts as data.

If we look at the Word document in Word – the text copied from the web and pasted into the Word file – it appears like this:

Arabic text copied and pasted into a Word file (available here). When text is selected it selects right-to-left. Font set to Traditional Arabic.

When imported into NVivo substantial changes are made through the import process:

The Word document imported into NVivo and converted – the text now flows left-to-right and is relatively illegible. Selection now works left-to-right.

A number of serious issues follow. Firstly, the text is now VERY hard to read. Secondly, while you can edit the document to make the text right-aligned so it appears better, the reading and selecting direction remain left-to-right.

Thirdly, and most seriously, you cannot select – and therefore cannot search for, code or annotate – the start of paragraphs:

NVivo text selection limitations for a Word document in Arabic.

The workaround would then seem to be PDFs – while accepting limitations with those in NVivo, e.g. you cannot auto-code for speaker or using document structure.

However the selection issues remain. Importing web pages as PDFs via NCapture produces similarly odd results – apparently OK until you try to select content:

NCapture Page Cap

As you can see, selecting (and therefore coding) text is all over the place.

Article as PDF fares best, however selection still runs left-to-right:

NCapture ‘Article as PDF’ produces the best version but still has incorrect text flow.

The print as PDF and convert to PDF versions also had substantial issues with text selection – showing it isn’t just NVivo and NCapture that struggle here.

Effects on queries

There are then a series of oddities that result. Copying and pasting the text بأنهم ي and running a text search does work but gives odd results when there should be four identical copies of the same text:

Text search results – note the different number of references per file of the “same” content!

Furthermore when you look into the results they seem not to be the actual text searched for:

Text search results – not matching the input string?

At this point I must point out that I do not speak nor read Arabic so what remains is what I have been told about query results.

Word frequencies appear to work. As this was bi-lingual I had to spend a VERY frustrating period of time trying to select just the Arabic text in the PDFs without selecting English as well and then coding it with a node for “Script-arabic” to scope the word frequency query to that node. Here are the results – pretty, but I also think pretty useless:

Pretty – but pretty useless word cloud output?

You can then double-click a word in the cloud and view a text search – however the results are as problematic in legibility as those identified above.

If you do select and code Arabic text, then when you run a coding query and look at the results, the staff I worked with at PCBS told me that the results were illegible – “like looking at text in a mirror”:

Node query results – legible?

What to do?

The limits are pretty serious, as I’ve set out. It is more than just fiddly selection – it runs through to whether the text is legible, readable or usable at all.

Recommendations for approaches in NVivo and alternative packages:

If you MUST use NVivo:

Then use PDFs and use region selection, i.e. treat Arabic text as an image and accept the limitations.
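
One extra pre-processing idea – my own suggestion, not an NVivo feature – that may help with bilingual sources like the example above: split the text by script before import, so queries can be scoped to a file per language rather than fighting the selection behaviour. A minimal Python sketch using the Arabic Unicode ranges:

    import sys

    def is_arabic(text):
        """Very rough check: does the line contain any Arabic-block characters?"""
        return any("\u0600" <= ch <= "\u06FF" or "\u0750" <= ch <= "\u077F" for ch in text)

    def split_by_script(text):
        """Split a bilingual plain-text file into Arabic and non-Arabic lines.

        A pre-processing sketch, not an NVivo feature: importing the two halves
        as separate files avoids trying to select one script inside a mixed
        document just to scope a query.
        """
        arabic, other = [], []
        for line in text.splitlines():
            (arabic if is_arabic(line) else other).append(line)
        return "\n".join(arabic), "\n".join(other)

    if __name__ == "__main__":
        with open(sys.argv[1], encoding="utf-8") as f:
            arabic_text, other_text = split_by_script(f.read())
        for suffix, content in (("arabic", arabic_text), ("other", other_text)):
            with open(f"{sys.argv[1]}.{suffix}.txt", "w", encoding="utf-8") as out:
                out.write(content)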

If you can choose another package

All (yes ALL!) the other leading CAQDAS packages support Arabic and other right-to-left scripts. So it then comes down to making an informed choice of package.

The Surrey CAQDAS project provides a good overview of packages and choices at https://www.surrey.ac.uk/computer-assisted-qualitative-data-analysis/resources/choosing-appropriate-caqdas-package

For resources, the excellent books by Christina Silver and Nick Woolf cover the three leading packages: NVivo, ATLAS.ti and MAXQDA.

Getting clear information on which packages are leading and their relative use is very difficult – however this paper provides some circumstantial evidence for their use in academic research:

Woods, M., Paulus, T., Atkins, D. P., & Macklin, R. (2016). Advancing Qualitative Research Using Qualitative Data Analysis Software (QDAS)? Reviewing Potential Versus Practice in Published Studies using ATLAS.ti and NVivo, 1994–2013. Social Science Computer Review, 34(5), 597–617. https://doi.org/10.1177/0894439315596311

It reviews patterns of publication citing the use of ATLAS.ti or NVivo (which were selected “because they are two of the longest used QDAS tools (Muhr, 1991; Richards & Richards, 1991). They are also the programs that we ourselves are familiar with; without this familiarity our analysis would not have been possible” (p599)) and includes the following graph:

Subject disciplines publishing ATLAS.ti and NVivo studies.

Another key consideration should NOW be whether the software adopted locks you in, or enables project sharing and exporting via the recently published REFI standard – see Christina Silver’s excellent blog post on why this matters and why it should inform decisions about packages, especially for R-to-L scripts.

Suggested alternatives:

COMPREHENSIVE FULL-FEATURED CAQDAS PACKAGE SIMILAR IN SCOPE AND APPROACH TO NVIVO BUT WORKING WITH RIGHT-TO-LEFT TEXT:

My top recommendation: ATLAS.ti 

Why? It supports REFI format for project exchange so you are not locked in.

Its quotation approach – identifying data segments then attaching codes, linking to other data segments and linking memos – provides unrivalled support for multi-lingual work, for example coding one script and then linking to translated sections in another (uncoded) script, or attaching a translation to a data segment via a quotation comment.

Alternative Recommendation: MAXQDA

Another full-featured package with extensive support for mixed methods and an excellent interface. The lack of support for the REFI standard risks your being locked in and unable to exchange or archive in a standard format – hence recommending ATLAS.ti instead.

MIXED METHODS FOCUS, COLLABORATIVE, CLOUD REQUIRED/DESIRED

Consider DeDoose for a mixed-methods focussed, collaborative package. However, in some settings an online collaborative cloud-based tool may not be appropriate so serious consideration needs to be given to the implications of that approach.

LARGE SCALE ANALYSIS AND TEXT MINING (i.e. functions promoted as part of NVivo Plus)

Consider QDA Miner with or without WordStat for support of all text together with advanced text mining capabilities.

Alternatively DiscoverText plays nicely in this space with some very clever features. (However it doesn’t support REFI)

SIMPLER FEATURES SOUGHT, PARTICIPATORY ANALYSIS METHODS, SOMETHING DIFFERENT

If you want to work with something visual, simple and just for text then Quirkos is fantastic and supports R-to-L scripts.

And finally…

Comments welcome and updates will follow here if/when NVivo changes or other packages adopt REFI standard for example.

Rethinking the guiding ethos of 5LQDA: from managing contradiction to harnessing creative tensions

I attended the excellent 5LQDA workshop for NVivo last week. I really can’t recommend these highly enough, as well as the books. I am actively working to integrate and develop my teaching and materials to work with, incorporate and work within the broad structure of 5LQDA and I don’t think I can personally give it a much stronger seal of approval than that!

However, this isn’t a unilateral adoption nor unthinking acceptance. I want to work to adapt my materials and use them to scaffold and structure gaining awareness of the components of NVivo and ATLAS.ti.

The core of 5LQDA: Managing Contradiction

There is one guiding rationale of 5LQDA where my views diverge, quite strongly, from the printed word – and as it is so fundamental to the model I want to document and explore my perspective and how it differs from Christina and Nick’s book.

They state that:

The central issue is the contradiction between the nature of qualitative analysis and the nature of software used to conduct the analysis. The way these contradictions are reconciled determines the approach to harnessing the software. (P13)

And furthermore that:

there is a contradiction between the emergent nature of qualitative analysis and the step-by-step nature of computer software. The Five-Level QDA method is a way of managing this contradiction. (P157 and back cover and other blurb)

This is THE core argument of 5LQDA as a method. However there’s something that doesn’t sit quite right for me about “managing the contradiction”. The tenor of that statement and the language it evokes – of management and compromise – also seems to permeate some of the ways that potential is treated e.g.

“the potential misuse of rudimentary automated features that may be introduced in the future are concerning”. (P18)

So how to acknowledge this fundamental rationale, its reason and importance, but find a way to manage the contradiction between that and my somewhat different view? Could it, itself, be translated into something a little more positive and evocative, not of managerialism and compromise but of potential and opportunity?

A potential translation: from manager-subordinate to creative partnership?

I therefore hope that one way to productively resolve this and incorporate the 5LQDA approach into my practice and teaching is through a slight tweak that I hope stays true to the intention of the original but also draws on my interests and desire for supporting step-changes in how software works in and with qualitative research.

“Harnessing the creative tension between the emergent nature of qualitative analysis and the potential of new and developing components in software that work in a pre-programmed way”

To me the idea of a “creative tension” is a really positive way of viewing the way that this contradiction could be played out and one that also gives a little more agency and acknowledgement to the potential of software tools to undertake new and different ways of approaching qualitative analysis in terms of scale, approach and intentions.

Thus it is neither to let new tools drive analysis nor to place software as entirely and absolutely subservient to analytic tasks conceived without acknowledging its potential. For if those ideas and tasks and approaches are always and already prior to selecting a component, then how would those tasks develop and change in order to take advantage of the new opportunities software affords (see my previous post on technology, tactics and strategy and the tanks in WW1 for more on this)?

I’m not alone in this concern – to me there is a running theme through 5LQDA that reminds me of this quote:

Over the past 50 years, the habitual nature of our research practice has obscured serious attention to the precise nature of the devices used by social scientists (Platt, 2002; Lee, 2004). For qualitative researchers, the tape recorder became the prime professional instrument intrinsically connected to capturing human voices on tape in the context of interviews. David Silverman argues that the reliance on these techniques has limited the sociological imagination: ‘Qualitative researchers’ almost Pavlovian tendency to identify research design with interviews has blinkered them to the possible gains of other kinds of data’ (Silverman, 2007: 42).

The strength of this impulse is widely evident from the methodological design of undergraduate dissertations to multimillion pound research grant applications. The result is a kind of inertia, as Roger Slack argues:

It would appear that after the invention of the tape recorder, much of sociology took a deep sigh, sank back into the chair and decided to think very little about the potential of technology for the practical work of doing sociology. (Slack, 1998: 1.10)

(Back,  2010)

And it is thinking about the potential that I think is important – rather than incredibly powerful software being subservient to the habitual nature of our research practices. “Managing the contradiction” seems to prolong that, to promote analytic strategies derived prior to and without serious attention to the potential of tools for their transformation and translation into new and different ways of working. Which segues into this great quote about how that has played out to date:

Qualitative analysts have mostly reacted to their new-found wealth of data by ignoring it. They have used their new computerized analysis possibilities to do more detailed analysis of the same (small) amount of data. Qualitative analysis has not really come to terms with the fact that enormous amounts of qualitative data are now available in electronic form. Analysis techniques have not been developed that would allow researchers to take advantage of this fact.

(Blank, 2008, p258)

An example – new tools enabling the exploration of new approaches

The NSS analysis I worked on is a case in point – I was interested in seeing if and how tools could help with analysing large(r) quantities of qualitative data, and in finding out what sort of questions and analytic needs could be met by the software tools. The project was therefore an exploratory one – to look at what these could do and how they could be used. But that seems to run entirely counter to the 5LQDA rationale, where I should have defined the analytic task in advance and then selected the tools, rather than selecting the tools and then seeing what questions they could help with. Of course at the strategic level that was the intention of the project – but the point is that with the increase in tools in QDA software opening up new and interesting ways of doing things, how is that potential going to be filtered up into developing strategies to fit new tools and their appropriate tactics? How do we follow tanks with tanks, not horses?

Another example: CAQDAS and the ethnographic imagination

One of the key ideas in ethnography is to “make the familiar strange” (see for example Myers, 2011 here ). This runs counter to the idea of “immersion in data” and creates a dynamic, creative tension with it as a useful and essential step to reconsider conclusions or ways of thinking that are merely confirmation bias of an initial reading.

Tools such as those in NVivo to explore content and view word frequencies, for example, are an excellent way of “making the familiar strange” and highlighting patterns in word use that you may not have spotted – prompting new and potentially productive ways of looking at the data. Hunches about language differences can be explored further with tools such as cluster analysis. However, “I want to make my data strange to help me identify things I may not spot otherwise” seems too tool-led for 5LQDA, with the concept unlikely to be rendered as a strategy precisely because it runs counter to the analytic intent of immersion and is produced by tools. (There are loads of ways to make data strange, so how would you translate that into a component? But a specific component affords this potential, and from it a series of creative, perhaps unknown, opportunities.)

A quick example, but one that hopefully helps to illustrate why I prefer thinking of creative tensions rather than the managing of contradiction – the seriousness of Lennon jarring with, and also working with, the playfulness of McCartney created a myriad of tunes that individually wouldn’t have been realised. To me creative tension captures the same tensions and issues and contradictions and disputes and challenges but re-casts them in a more bi-directional and creative way, rather than the manager-subordinate relationship of 5LQDA’s phrasing.

References:

Back, Les (2010) Broken Devices and New Opportunities: Re-imagining the tools of Qualitative Research http://eprints.ncrm.ac.uk/1579/1/0810_broken_devices_Back.pdf

Blank, G. (2008). Online Research Methods and Social Theory. In N. Fielding, R. M. Lee, & G. Blank (Eds.), The SAGE Handbook of Online Research Methods. Los Angeles, CA; London: SAGE.

Responses to 5LQDA pt2 – Much Ado About Affordances

Ahhh affordances – something of a bête noire for me!

This term has resurfaced for me twice in the last two days – in reading the 5LQDA textbook on NVivo, and in a discussion session/seminar I was at today with Chris Jones about devices, teaching and learning analytics.

Chris argued FOR affordances on two fronts:

  1. they bring a focus on BOTH the materiality AND the interaction between the perceiver and the perceived and de-centre agency so that it exists in the interaction rather than as entirely in/of an object or in/of a person’s perception of it.
  2. despite quite a lot of well argued criticism, no-one has really proposed an equivalent or better term.

I would entirely agree with both of those statements, backing down from my usual strong view of affordances as being necessarily problematic when invoked.

(I was once told that the way to “make it” in academia was to pick an adversarial position and argue from that all the time, never compromising – and affordance critique seems a good one for that. Maybe that’s why I don’t/won’t succeed in academia: I’m too willing to change position!)

BUT BUT BUT

Then someone does something like this:

“Think of the affordances of the program as frozen – they come fully formed, designed as the software developer thought best. In contrast think of TOOLS as emergent – we create them and they only exist in the context of use.”
(Woolf and Silver, 2017, p50)

And I end up back in my sniping position of “affordances have little merit as they mean all things to all people and even their supposedly best qualities can be cast out on a whim”. Here we see affordances stripped of ALL those interactive properties. They are now “fully formed, designed”, not emergent or interactive. All of that is now being placed onto the idea of a “tool” as being something that only has agency in use and in action and through interaction.

So if affordances are now tools – what then of affordances? And why is TOOL a better term?

A little background and further reading on affordances

Affordances are both an easy shorthand and a contested term (see Oliver, 2005), but one that usually retains both a common-sense understanding of “what’s easy to do” combined with a more interactionist idea of “what actions are invited”. (The latter appealing to my ANT-oriented interests in, or sensibility towards, considering “non-human agency”.) I’ve read quite a lot on affordances and written on this before in Wright and Parchoma (2011), whilst my former colleague Gale Parchoma has really extended that consideration in her 2014 paper (and also in this recorded presentation), with both of us drawing on Martin Oliver’s (2005) foundational critique. I also really like Tim Ingold’s (2000) excellent extended explorations and extensions of Gibson’s work.

Should we keep and use a term that lacks the sort of theoretical purity or precision that may be desired, because its very fuzziness partly evokes and exemplifies its concept? Probably.

But if it is so woolly then could “the affordances of CAQDAS” be explored systematically, empirically and meaningfully?

Could we actually investigate affordances meaningfully?

Thompson and Adams (2013, 2014) propose phenomenological enquiry as providing a basis. Within this there are opportunities to record user experience at particular junctures – moments of disruption and change being obvious ones. So for me currently encountering ATLAS.ti 8 presents an opportunity to look at the interaction of the software with my expectations and ideas and desires to achieve certain outcomes. Adapting my practices to a new environment creates an encounter between the familiar and the strange – between the known and the unknown.

However, is there a way to bring alternative ideas and approaches – perhaps even those which are normally regarded as oppositional or incommensurable with such a reflexive self-as-object-and-subject mode of enquiry? Could “affordances” be (dare I say it?) quantified? Or could at least some methods and measures be proposed to support assertions?

For example, if an action is ever-present in the interface or only takes one click to achieve, could that be regarded as a measure of ease – an indicator of “affordance”? Or does that stray into this fixed idea of affordances as being frozen and designed in? Or does the language used affect the “affordance”, so there is a greater level of complexity still? Could that be explored through disruption – can software presented with a different interface language still “afford” things? Language is rarely part of the terminology of affordance, with its roots in the psychology of perception, yet language and specific terminology seem to be the overlooked element of “software affordances”.

Could counting the steps required add to an investigation of the tacit knowledge and/or prior experience and/or comparable and parallel experience that is drawn on? Or would it merely fudge it and dilute it all?

My sense is that counts such as this, supplemented by screenshots, could provide a useful measure but one that would have to be embedded in a more multi-modal approach rather than narrow quantification. This could however provide a dual function – both mapping and uncovering the easiest path or the fewest steps to achieving a programmed action, which will not only provide a sense or indication of simplicity/affordance vs complexity/un-afforded* (Hmmm – what is the opposite of an affordance? If there isn’t one doesn’t that challenge its over-use?) but also help inform teaching and action based on that research – in particular to show and teach and support ways to harness, and also avoid or rethink, these easy routes written into software that act to configure the user.

A five minute exploration – coding

Cursory checks – how much does the software invite the user to “code” without doing any of the work associated with “coding”?

Coding is usually the job identified with qualitative data analysis and the function software is positioned to primarily support. However coding in qualitative analysis terms is NOT the same as “tagging” in software. Is “tagging” or “marking up” conflated with coding and made easy? Are bad habits “afforded” by the interface?

Looking at ATLAS.ti 8 – select text and right-click:

VERY easy to create one or more codes – just right-click and code is created, no option there and then to add a code comment/definition.

Could we say then that an “affordance” of ATLAS.ti 8 is therefore creating codes and not defining them?

Looking at NVivo 11

Slightly different in that adding a new node does bring up the dialogue with an area for description – however pressing enter saves it.

From data, right-click and Code > New Node: there is no place for defining, further supporting a code-and-code approach. This does allow adding into the hierarchy by first selecting the parent node so relational meaning is easily created – affordance = hierarchy?

AFFORDANCE = very short or one-sentence code definitions?

No way of easily identifying or differentiating commented and un-commented nodes.

Can only attach one memo to a node. The place for a longer consideration but separated.

Where next?

This is the most basic of explorations but it involves a range of approaches and also suggests interventions and teaching methods.

I really see where the 5LQDA approach seeks to work with this and get you to think and plan and NOT get sucked into bad and problematic use of software – however I’m unsure of their differentiation of affordances as fixed and tools as having the properties usually ascribed to affordances… So I definitely need to think about it more – and get other views too (so please feel free to comment) – but a blog is a good place to record and share ideas-in-development. Could that be “the affordance” of WordPress? 😉

 

References

Adams, C., & Thompson, T. L. (2014). Interviewing the Digital Materialities of Posthuman Inquiry: Decoding the encoding of research practices. Paper presented at the 9th International Conference on Networked Learning, Edinburgh. http://www.lancaster.ac.uk/fss/organisations/netlc/past/nlc2014/abstracts/adams.htm

Ingold, T. (2000). The perception of the environment: essays on livelihood, dwelling & skill. London; New York: Routledge.

Oliver, M. (2005). The Problem with Affordance. E-Learning, 2, 402-413. doi:10.2304/elea.2005.2.4.402 http://journals.sagepub.com/doi/pdf/10.2304/elea.2005.2.4.402

Parchoma, G. (2014). The contested ontology of affordances: Implications for researching technological affordances for fostering networked collaborative learning and knowledge creation. Computers in Human Behavior, 37, 360-368. doi:10.1016/j.chb.2012.05.028

Thompson, T. L., & Adams, C. (2013). Speaking with things: encoded researchers, social data, and other posthuman concoctions. Distinktion: Scandinavian Journal of Social Theory, 14(3), 342-361. doi:10.1080/1600910x.2013.838182 http://www.tandfonline.com/doi/full/10.1080/1600910X.2013.838182

Woolf, N. H., & Silver, C. (2017). Qualitative analysis using NVivo : the five-level QDA method. Abingdon: Taylor and Francis.

Wright, S., & Parchoma, G. (2011). Technologies for learning? An actor-network theory critique of ‘affordances’ in research on mobile learning. Research in Learning Technology, 19(3), 247-258. doi:10.1080/21567069.2011.624168 https://doi.org/10.3402/rlt.v19i3.17113

 

Engaging with Five Level QDA pt1 – Initial responses

The Five-Level QDA textbooks have been top of my reading list since before they were published late last year. I’m finally getting to read them now and am making notes of my reactions, responses, ideas, questions and approaches to implementation.

I’ve finally picked up the book and started reading through the NVivo edition – it took a while what with the first term of the Uni year making me very, VERY busy through to the Xmas break.

I can certainly see several blog posts coming out of it and other responses too I hope. I’m going to be very fortunate in having opportunities for learning and engagement. I’m really looking forward to attending the two training workshops for NVivo (Jan 18-19) and ATLAS.ti (Feb 12-13). (MaxQDA still remains on my “to do/learn” list.) And then I’ll be contributing to a session with Christina at the NCRM research methods festival, 3rd-5th July, 2018 in Bath. So this current blogging bash is partly prep for, and also a response to, those.

Three chapters in and I must say that I really like the book. So far its best feature has been the “real world examples” of Christina cooking and how this evokes and illustrates the model through a non-theoretical, non-research and therefore eminently relatable and cleverly chosen analogy. It is far more effective than I expected it to be from its rather detailed and cautious rationalisation and preamble.

There are aspects I’m intrigued to see how they are developed further as I read on, and one or two points where I think my views and experience and situation differ from the authors. So I’m back in Scrivener drafting and collecting together responses to write and post here in a couple of blog posts.

Translation – in theory and practice

One of the key aspects I’m interested in – in practice and in theory – is how the idea of “translation” is central to the model:

And there is clearly a real focus and concern with getting this right – evidenced by Nick’s response and correction of Susanne Friese’s interpretation of translation for ATLAS.ti. Having worked with, drawn on, and argued for Actor-Network Theory as having a well developed set of methods, intellectual tools and concepts, as well as a serious and sustained engagement with social science methods and their messiness (e.g. in John Law’s conference paper “Making a mess with method” [1] and subsequent book “After Method” [2]), this is an area I’m interested to explore further.

ANT also provides some rich resources to challenge and move beyond the often simplistic evoking of “affordances” to explain how users and technologies and methods interact – which I see lurking on page . I’ve written on this before in Wright and Parchoma (2011) [3] and my former colleague Gale Parchoma has really extended that consideration too in her 2014 paper [4] (and also in this recorded presentation), with both of us drawing on Martin Oliver’s (2005) foundational critique [5].

Teaching Models and Their Contexts and Levels

The other BIG THING for me at least is how the 5LQDA approach can/will/could fit with other models and approaches. I’ve developed my own model for teaching ATLAS.ti and NVivo using the “backronym” POETS for:

  • Prepare (data – e.g. Formatting transcripts, naming files, organising and selecting literature)
  • Organise (importing and organising documents and literature into project folders and sets)
  • Explore (using data exploration and visualisation tools and writing annotations and memos about what you find)
  • Tag (using nodes/codes to tag and index your data to help identify phenomena of interest such as themes)
  • Synthesise (use the powerful query tools to search your data, systematically explore dimensions and variations between cases in your coding, synthesise these insights and then summarise them for your reports)

Therefore a pressing set of questions for me are:

  1. Should I just adopt 5LQDA and replace my materials and models? (I.e. is it just straight up better and something to adopt – or are there issues of translating it from Nick and Christina’s external expert status to the contexts in which I work?)
    OR
  2.  Should I adapt and develop my model and approach to work with/within 5LQDA? (Could this fit in with / be adapted to / work with the 5LQDA approach, or should that approach replace it?)
    OR
  3.  Should I borrow what I like from 5LQDA and use it to develop and adapt my teaching and materials?

There are quite a few considerations in those decisions – a blog post is in development exploring where and how the levels of 5LQDA fit with the model and other approaches and conceptualisations of instruction. Some of which I anticipate will link in to my previous posts on strategies, tactics and technological possibilities.

And finally:

Future developments – opportunities or threats?

One sentence that threw me a little was on p18 of the NVivo book: “the potential misuse of rudimentary automated features that may be introduced in the future are concerning”. Hmmm – what about them also having the potential to transform and adapt qualitative methods and push back against the apparent ceding of the territory of “big data” as a quant-only space? YES there are threats and risks but there are also opportunities. Reminds me again of one of my favourite quotes (with thanks to Daniel Turner at Quirkos for alerting me to this gem at the KWALON conference):

Qualitative analysts have mostly reacted to their new-found wealth of data by ignoring it. They have used their new computerized analysis possibilities to do more detailed analysis of the same (small) amount of data. Qualitative analysis has not really come to terms with the fact that enormous amounts of qualitative data are now available in electronic form. Analysis techniques have not been developed that would allow researchers to take advantage of this fact.
(Blank, 2008, p258 [6])

What’s next?

An aspiration is certainly to see which of the preceding areas generate interest and conversation, and if those might then help to lay the foundations for a more structured / serious exploration and development…  which seems to cluster around the future directions of CAQDAS and how to help prepare people for that. So if you have questions, ideas or responses please post a comment below or on your blog and let’s see where this could go…

References and Links:

1 – Law, J. (2003). Making a Mess with Method In Practice (pp. 1-12).
http://www.lancaster.ac.uk/fass/resources/sociology-online-papers/papers/law-making-a-mess-with-method.pdf

2 – Law, J. (2004). After method: mess in social science research. London: Routledge.
https://books.google.co.uk/books?id=dtZ-AgAAQBAJ&lpg=PP1&dq=after%20method&pg=PP1#v=onepage&q=after%20method&f=false

3 – Wright, S., & Parchoma, G. (2011). Technologies for learning? An actor-network theory critique of ‘affordances’ in research on mobile learning. Research in Learning Technology, 19(3), 247-258. doi:10.1080/21567069.2011.624168 https://doi.org/10.3402/rlt.v19i3.17113

4 – Parchoma, G. (2014) The contested ontology of affordances: Implications for researching technological affordances for fostering networked collaborative learning and knowledge creation. Computers in Human Behavior, 37, 360-368. 10.1016/j.chb.2012.05.028

5 – Oliver, M. (2005). The Problem with Affordance. E-Learning, 2, 402-413. doi:10.2304/elea.2005.2.4.402 http://journals.sagepub.com/doi/pdf/10.2304/elea.2005.2.4.402

6 – Blank, G. (2008). Online Research Methods and Social Theory. In N. Fielding, R. M. Lee, & G. Blank (Eds.), The SAGE handbook of online research methods [electronic resource]: Los Angeles, Calif. ; London : SAGE.
https://uk.sagepub.com/en-gb/eur/the-sage-handbook-of-online-research-methods/book245027#preview

“But can I bring my notes?” Ideas and investigations on improving Literature Import into CAQDAS Software

Update: ATLAS.ti 22 finally does it!!!!

5 years on from this post and the latest update from ATLAS.ti has finally brought this feature in… Game changer? I think so. ATLAS.ti 22 is certainly leading the way with amazing features right now! More details in this article from ATLAS.ti.

Introduction

It’s evident that a key area where CAQDAS software is having an impact on research practices is in literature reviews. This may be rather “old news” for those working with NVivo (available since version 9, released 2010) or MaxQDA (available since v11, released in 2012) but it is relatively recent for ATLAS.ti (version 8, released 2016) and for some seems a strange departure – this software is for empirical data isn’t it? We have reference management software for managing literature and notes on it, don’t we? Well – yes, and no – and this blog post explores some of the crossovers, continuities and contested spaces between these two major types of research support software for unstructured data.

Now, my background is as an ATLAS.ti user so this trend still seems relatively recent to me – it wasn’t even on my radar when I was setting out on my PhD thesis in 2012. The focus in books and articles and tutorials was on working with empirical data using varying shades of Grounded Theory-derived and/or thematic-orientated approaches to analysis. I didn’t even think of importing literature – but by the time I came to writing up and was desperately searching through my PDF notes made in Endnote X7, and finding the search function to be very (very) poor, I was frustrated by my inability to seamlessly link my annotations and their groupings via codes from my empirical work with the theoretical ideas I’d already written a lot of notes on and highlighted extensively in Endnote.

Come the end of my thesis and in subsequent work – especially with the launch of the ATLAS.ti iPad app as a great PDF reader – I started to engage with literature reviews. A few blog posts were starting to appear (e.g. Dr. Ken Riopelle’s experiments with the mobile app http://atlasti.com/2014/03/26/how-to-use-atlas-ti-mobile-app-with-the-browzine-app-for-literature-reviews/ ). As prep for a job interview I used the ATLAS.ti app to look at connections between my PhD work and the work related to the research and the research team – I didn’t get the job (though I came a close 2nd and got useful feedback) but I did get to write it up and begin building connections with ATLAS.ti’s training programme ( http://atlasti.com/2014/06/12/1722/ )

Part 2) The most frequent questions about importing literature

When I’m teaching PhD students and research staff about making an informed choice and then using CAQDAS effectively, I draw on these experiences to strongly advocate for the sense and power and potential of undertaking the lit review in a CAQDAS package. This is often seen as rather novel; however, the potential is typically recognised pretty quickly, especially when contrasted with the limits on classification, grouping, search and retrieval of notes made in current ref management software. But it is essential to consider and account for the fact that this recognition of the potential is always in the context of, and in relation to, existing practices of managing, highlighting, annotating and summarising literature.

Unsurprisingly therefore… the following question always comes up:

“OK so I can import the reference info and the documents – can I import the notes I’ve made?”

The answer is… no.

The result is disappointment, and frequently a decision to stick with current practices due to these barriers. And it’s those barriers and steps to remove them that are the focus of this extended blog post.

And to show I’m not just making this up here’s an example – from a presentation on NVivo for lit reviews by Silvana di Gregorio at the NVivo @ Lancaster event:
https://vimeo.com/223259096/84d441ca75#t=1195s

This student has made extensive notes in Mendeley and understandably wants to import those as well as the PDFs.

Now the highlights will display but they will not be integrated into the programme architecture and all the work and ideas in those notes are left behind – to be re-created slowly and repetitively one-by-one via copy and paste. Or abandoned. Or (more likely) the lit and this practice will stay in Mendeley as a result.

HOWEVER, that phrase “slowly and repetitively one-by-one via copy and paste” seems all wrong – as it is EXACTLY that sort of thing that computers excel at doing reliably, quickly and automatically. If you have to do exactly the same thing over and over and over to move data from one place to another SURELY a computer should be doing that for you?

With that as the basis the rest of this article considers in turn:

Part 3) What Reference management software is, does and the practices it supports and has extended in to and the relationships with CAQDAS

Part 4) Turns to look more broadly at good ways and recommended practices for working with research literature and how these are supported in RM software compared with CAQDAS.

Part 5) Takes a deeper focus on RM software and changing priorities and associated practices from a focus on bibliographic accuracy to supporting reading and review.

Part 6) Turns towards practical ideas and proposals for improving import of PDFs from RM software

Part 7) Turns to applying this in practice, in the hope of giving some help to the developers by bringing together my explorations through linking to standards, code, APIs etc.

Part 8) Lays out annotated segments of the code exported from Acrobat Reader of PDF annotations and notes and the relationship to the XML exported from ATLAS.ti to put these ideas into a coded context.

Part 9) Concludes this essay and also anticipates possible objections and potential approaches to mitigate those.

Then there are appendices of links to resources and some extended detail on the development and feature history of leading RM and CAQDAS packages

I draw on my experience of using and teaching CAQDAS software (ATLAS.ti and NVivo) and also using and teaching effective use and workflows for literature management and review software.

Part 3) What does ref management software do?

Reference Management software has evolved to extend beyond its original place in the research process: the end – writing up, inserting in-text citations and constructing a bibliography.

They were extended to support the start of a literature review (searching for and importing references and attaching the full text).

Increasingly they are now seeking to support the middle – the actual work not just the admin – which is the reading, and working with the literature.

There’s a good table of comparisons and history at https://en.wikipedia.org/wiki/Comparison_of_reference_management_software

Gilmour and Kuo (2011) give a succinct list for reference managers (RM):
RMs serve a variety of functions. Generally, we would expect an RM to be able to:

  1. Import citations from bibliographic databases and websites
  2. Gather metadata from PDF files
  3. Allow organization of citations within the RM database
  4. Allow annotation of citations
  5. Allow sharing of the RM database or portions thereof with colleagues
  6. Allow data interchange with other RM products through standard metadata formats (e.g., RIS, BibTeX)
  7. Produce formatted citations in a variety of styles
  8. Work with word processing software to facilitate in-text citation

http://www.istl.org/11-summer/refereed2.html

Some of these are specific to RM functionality, others have continuities and impact on working with CAQDAS in literature reviews.

Point 4 is the key point of continuity in practice and the focus for this blog post/series, as that is where the interaction with CAQDAS software becomes important in terms of annotation of citations.

CAQDAS software is in a different area from points 1 and 2 which concern finding and organising literature (though with potential to learn from 2 for auto coding perhaps?).
Point 3 – organising references in the database – is important for CAQDAS to help organise imported data.
It has its own way(s) of addressing point 5 with regard to sharing projects in a research team.
There is a need to have connections to point 6 to support exporting a literature review with a meaningful connection to the references.
When it comes to writing and creating a bibliography it is not, currently, in the same game for points 7 and 8. However, in “next-generation CAQDAS” there could well be similar requirements for this sort of export to enable referencing to project items stored in data archives and referenced via open-data formats to support referencing the underlying data in a project.

Part 4) Approaches and Recommendations for working with research literature

With both RM software and CAQDAS contesting and seeking to become key actors in the middle stage of working with literature – what is this work? Well here are some useful quotes I often draw on:

Recording your Reading

By the time you begin a research degree, it is likely that you will have learned the habit of keeping your reading notes in a word processed file, organized in terms of (emerging) topics. I stress ‘reading notes’ because it is important from the start that you do not simply collate books or photocopies of articles for ‘later’ reading but read as you go. Equally, your notes should not just consist of chunks of written or scanned extracts from the original sources but should represent your ideas on the relevance of what you are reading for your (emerging) research problem.
(Silverman, 2013, p. 340)

Silverman then goes on to cite Phelps et. al.’s succinct suggestions:

Phelps, Fisher, and Ellis (2007) TABLE 19.1 Reading and Note Taking

▪ Never pick up and put down an article without doing something with it
▪ Highlight key points, write notes in the margins, and write summaries elsewhere
▪ Transfer notes and summaries to where you will use them in your dissertation
▪ Ensure that each note will stand alone without you needing to go back to the original
Adapted from Phelps et al. (2007)
(cited in Silverman, 2013, p. 341)

Drawing on these we can see that working with literature is another qualitative practice – literature after all is text that you are reading, analysing and interacting with in ways that are analogous to many qualitative analysis practices.

Phelps’ four points can be translated into CAQDAS and RM software support features and practices – which equate to “doing something”.

  • Notes in the margins (quotation comments – ATLAS.ti, Annotations – NVivo, Sticky Notes – RM software)
  • Summaries elsewhere (linked memos and/or document comments – ATLAS.ti and NVivo, Notes – RM software)
  • Transfer notes and summaries to where you will use them: on a computer that’s the promise of these packages: they ARE where you will use them, and for RM software they hook in to where you will cite them (through memo links and project exports in CAQDAS, through cite-while-you-write plugins with access to the notes in RM software)

The “lightbulb” moment for students comes when contrasting how these approaches are supported by RM software compared with how CAQDAS can/could. I pose the following questions:

  1. What do you do currently?
  2. Where and when do you read?
  3. How do you highlight/annotate/summarise?
  4. How do you group those highlights/annotations/summaries together?
  5. How do you relate these pertinent segments of literature to each other?
  6. How do you find and retrieve highlights, quotes and their associated notes?

It is points 4, 5 and 6 that really articulate the power and potential of CAQDAS – the issues of grouping, relating and locating the notes and ideas and insights they have had.

This can be contrasted with the limited grouping and search functions in RM software:

Illustration – search in PDF notes in Endnote X7, which identifies 8 documents with multiple comments where the word I’m searching for appears – but doesn’t show the content of the notes or even which note the word appears in!:

BLOG-image-Endnote-searchingPDfNotes

CAQDAS software opens up the potential of doing this by using coding to group together quotes and notes on them. Bazeley (2013) suggests that these will cluster around three areas: methods, topic and theory. This would suggest highlighting, annotating and grouping those highlights and annotations (via codes) based on:

  • different methods used
  • previous explorations of the topic
  • collecting together results and their significance
  • the different framing of the topic and methods in different theories by different authors and theorists

(The terms in italics can be used to structure coding for a literature review; CAQDAS software then enables you to explore co-occurrences between those codes, which can be further explored using the reference information to track patterns within and across different types or eras of literature.)

Part 5) RM packages in focus: priorities, changes and practices

Gilmour and Cobus-Kuo’s (2011) paper “compares four prominent RMs: CiteULike, RefWorks, Mendeley, and Zotero, in terms of features offered and the accuracy of the bibliographies that they generate.” The focus, which is the historical place of RM software, is on generating a bibliography and the accuracy of that. This is the core work of RM software and clearly differentiated from, and not commensurate with, CAQDAS. However developments of the packages have led them to engage with Mead and Berryman’s argument that: “it is not the users themselves who have changed, but their workflow” (Mead and Berryman 2010).

“The all-too-familiar scenario as discussed in the literature depicts the researcher with many PDFs stored in various places who needs a tool to simply upload the documents and pull the citation information into their RM product of choice (Mead and Berryman 2010; Barsky 2010).” (Gilmour and Cobus-Kuo, 2011)

However, this scenario differs markedly from the location that CAQDAS software seeks to engage with in the lit review workflow, the one that has only recently become integrated into RM software – that of actually working with PDFs in the terms considered above: annotating, highlighting and grouping segments and notes together based on shared features.

In terms of CAQDAS’ role in the lit review process extracting reference data plays a supporting role of organising the documents in a project for the purposes of helping to order or filter queries of the metadata added through coding and annotating.

In terms of a literature review then Gilmour and Kuo’s question of “What are the primary and secondary needs of the user based on workflow?” is particularly pertinent.

What I find interesting in the questions I receive from users – exemplified earlier – is just how much the workflow and use of RM software seems to have changed. For those who are reading and annotating electronically this aligns with Gilmour and Cobus-Kuo’s (2011) observation that there will be a shift towards “new researchers who are more flexible in their work habits and may be more willing to learn new RMs that provide Web 2.0 functionality and PDF features.”

What emerges from these overlapping (albeit unsystematic and partial) views derived from my practices and those of students I have worked with is a picture of some RM software having taken up residence in the space that CAQDAS is seeking to (re)define and “own” – that of working WITH literature in terms of active reading and engagement with the texts. However, CAQDAS software has a set of compelling features and options that are substantially more developed than those in RM software, as well as the prospect of being the core management environment for analysing and connecting both literature and empirical data – which RM software will (probably) never do – with even the ambitious ideas of Colwiz (https://www.colwiz.com/about ) sticking to group management of literature not project management or empirical data.

This space is therefore outside the historic and traditional realm of RM software and is potentially an area where RM software could learn from both CAQDAS and Note making software and CAQDAS needs to substantially enhance its integrations if it wishes to really tempt and engage existing, increasingly sophisticated RM software users.

Part 6) Ideas and approaches to improving import of literature into CAQDAS software

If CAQDAS software is to make a bigger play for recognition as a particularly useful type of tool for conducting lit reviews – which manufacturers certainly seem keen to do (cf blog for ATLAS.ti at http://atlasti.com/2017/02/09/lit-reviews/ and blog for NVivo at http://www.qsrinternational.com/blog/hone-your-nvivo-skills-with-literature-reviews and guide from MaxQDA http://www.maxqda.com/maxqda-literature-reviews-reference-management-software ) – then there is surely a strong case for substantially removing barriers and improving the migration from some of the tools and practices considered here to both facilitate and encourage transition. This would also attract users to do reviews in these more powerful packages with the features outlined previously – namely multiple categorisation of notes and quotes (through coding), advanced retrieval (through queries), and connected writing (through memos).

As noted previously and illustrated with an example of the question being asked: If you’ve started doing a lot of your lit review in Mendeley, or Zotpad, or Endnote and you’ve made a lot of highlights and notes on PDFs you will want to preserve and use this work. It seems reasonable that software claiming to do all of those things better should be able to import the work you have already done and support you to build on it.

It doesn’t.

Could it?

From what I’ve been finding out it seems the answer is potentially yes – and what I now proceed to do is to sketch out ideas of how this could be done and some of the initial things I’ve been finding out.

There’s quite a caveat though: I’m a user of software with reasonable technical understanding but I’m not, never have been, and never will be “a programmer”, so there are parts of this where I’m speculating, making educated guesses or don’t fully understand it at present – but I would really welcome input from those more adept at programming and knowledgeable of the complex and convoluted PDF standard(s).

High level view of improved import of PDFs from RM software

Import references and linked PDFs with additional option to include PDF comments (and ideally highlights) to be translated into the CAQDAS programme structure (e.g. as quotations with comments in ATLAS.ti, or as annotations in NVivo)

Other desirable features:

1) Importing highlights as well?

Whilst it is the case that highlights will display on the imported PDF they will not be translated into actual project elements. If they were imported rather than merely displayed then “highlight” annotations would appear in the list of all annotations in NVivo, allowing quick retrieval of highlighted passages. The merit may be rather marginal but shouldn’t be dismissed.
So, if these could be imported then they could come in as quotations without a comment (in ATLAS.ti) with a code of “highlight” and either an element colour or a code such as “highlight – yellow”.
In NVivo they could be imported as coded segments with a node named “highlight” and the appropriate element colour.
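As a rough illustration of how mechanical that translation is, here is a minimal Python sketch (purely my own, assuming the annotations have already been exported as an XFDF file as discussed in Part 7 below; the file name is a placeholder) that pulls out each highlight with its page, colour and position – the pieces a CAQDAS package would need to create a quotation or coded segment plus a colour-matched “highlight” code:

import xml.etree.ElementTree as ET

# XFDF namespace as used in the Acrobat Reader export shown in Part 8
NS = {"xfdf": "http://ns.adobe.com/xfdf/"}

def highlights_for_import(xfdf_path):
    """Yield (page, colour, rect) for every highlight in an XFDF export."""
    root = ET.parse(xfdf_path).getroot()
    for hl in root.findall(".//xfdf:annots/xfdf:highlight", NS):
        yield int(hl.get("page", 0)), hl.get("color", ""), hl.get("rect", "")

# Example: print a suggested code name per highlight ("export.xfdf" is a placeholder)
for page, colour, rect in highlights_for_import("export.xfdf"):
    print("code: highlight %s | page %s | rect %s" % (colour, page, rect))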

2) Import any keywords from notes

(if applicable – still exploring this in Mendeley and Zotero) as code names for these items.

3) Import metadata

This could include colours, authors, dates etc.

Part 7) Exploring this in practical terms for developers – standards, codebases, APIs etc

So… HOW COULD YOU DO THE EXPORT?
It looks like I’m not the only one puzzling about this based on this on github: https://github.com/nichtich/marginalia/wiki/Support-of-PDF-annotations

So this is where it gets a little more sketchy and I hit the limits of my knowledge – I’m hoping there are some good ways this could be built in as a second loop or option in the import procedure so it would be seamless across ref management programmes.

I anticipate this would involve some sort of loop for the programme on import – import ref management data, check if PDF attached (so far the same), then check if the imported (or to-be-imported) PDFs have annotations, if so export annotations as XFDF and then import the details from the XFDF into the programme structure.
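Purely as a sketch of that loop (Python-ish, and every function name here is a hypothetical placeholder, not an API that exists in any real CAQDAS or RM package):

# Hypothetical sketch only: none of these helper functions exist in any real package.
def import_library_with_annotations(ris_entries):
    for entry in ris_entries:                        # 1. import ref management data
        document = create_document(entry)            # 2. create the document plus metadata
        pdf_path = entry.get("pdf")                  # 3. check if a PDF is attached
        if not pdf_path:
            continue
        if has_annotations(pdf_path):                # 4. check for existing annotations
            xfdf = export_annotations_as_xfdf(pdf_path)
            for annotation in parse_xfdf(xfdf):      # 5. translate each annotation into a
                add_to_project(document, annotation) #    quotation/comment or NVivo annotation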

I explore this in more detail below.

Alternative / interim approaches – getting the RM software to do the annotation export
However, as this is something of a “nice to have”, alternatives could be clear sets of instructions for using features in software or third party apps to export data into a format that can then be imported and mapped onto the PDFs. This sort of interim/experimental release stage could simply require the user to export the XFDF files.

Mendeley

This seems more advanced in some packages than others e.g. Mendeley enables this on a document by document basis to export an annotated PDF (see https://blog.mendeley.com/2012/04/19/how-to-series-how-to-export-your-annotations-alone-or-with-your-pdf-part-8-of-12/ )

Illustration: Exporting annotated PDFs from Mendeley

BLOG-image-Mendeley Export PDF menu Screen Shot 2017-07-03 at 19.41.52

There is a python library on GitHub: https://github.com/Xunius/Menotexport to do this in bulk. However this wouldn’t create XFDF files.

Zotero

ZotPad as a plugin for zotero appears to offer bulk export of PDFs and extraction of annotations (see http://zotfile.com/#extract-pdf-annotations )
Again, no XFDF export.

Endnote

Unsurprisingly Endnote doesn’t seem to do much here – despite user requests dating back to 2014 http://community.thomsonreuters.com/t5/EndNote-Product-Suggestions/Export-PDF-annotations-highlight-notes-etc/td-p/59388
However there are ways to export multiple PDFs to a folder (see http://community.thomsonreuters.com/t5/EndNote-How-To/Exporting-PDFs-to-a-separate-folder/td-p/53127 ) in order to then work with them via Acrobat Reader or Pro. Bulk export of comments therefore isn’t great, but is possible.

Papers

Papers is Mac only but does support exporting notes, annotations and comments.
http://support.mekentosj.com/kb/share-share-and-export-collections-and-content/how-to-export-notes-and-annotations-from-papers-3-for-mac

Adobe Acrobat Reader DC

Acrobat Reader enables exporting via FDF (proprietary) and XFDF (XML based) formats (see https://helpx.adobe.com/acrobat/using/importing-exporting-comments.html ) which can be done from the free Acrobat Reader DC (see https://forums.adobe.com/thread/1942791 )

BLOG-image-exportingCommentsFromAcrobatPro

Acrobat Pro

This can be automated to be done in bulk via Acrobat Pro using a script (see https://forums.adobe.com/thread/1385576 ), otherwise Aspose offer a commercial .net library to do this (see https://docs.aspose.com/display/pdfnet/Importing+and+Exporting+Annotations+to+XFDF )

LINK: Example FDF file for comparison with XFDF (proprietary file) https://lancaster.box.com/s/utjh0s72unmfxdvxhuh1xddnn0myxuh5

Mobile Apps

If you’re not using ATLAS.ti for iPad for annotating PDFs (which is great! Unfortunately though there’s no app for iPhone and the Android app doesn’t support PDFs), or the MaxQDA app for iOS (iPhone/iPad) or Android, then it is likely that other apps have inserted themselves into the space for reading and annotating.

Popular apps include:

GoodReader

Has excellent export of annotations as a flat text file via email, but doesn’t look set to create XFDF files.

iAnnotate

Similar export options to GoodReader – as a text file identifying properties (page, highlight or underline colour, text highlighted) but no clear pathway to XFDF export.

Code Snippets, APIs and Scripts I’ve identified

Commercial libraries and APIs for .net along with clear articles setting out principles, processes and formats are available from ASPOSE https://docs.aspose.com/display/pdfnet/Importing+and+Exporting+Annotations+to+XFDF

There’s some java code for getting the annotations: https://gist.github.com/i000313/6372210

And a python script to extract PDF comments too https://gist.github.com/ckolumbus/10103544
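For what it’s worth, extracting annotations directly from a PDF is also only a few lines if you are happy to rely on a library. A minimal sketch of my own using PyMuPDF (assuming it is installed; this is not the gist linked above, and the file name is a placeholder):

import fitz  # PyMuPDF

def list_pdf_annotations(path):
    """Print every annotation in a PDF with its page, type and text content."""
    doc = fitz.open(path)
    for page_number, page in enumerate(doc, start=1):
        for annot in page.annots():
            kind = annot.type[1]                  # e.g. 'Highlight', 'Text', 'Underline'
            text = annot.info.get("content", "")  # the comment text, if any
            print(page_number, kind, repr(text))

# Example call with a placeholder file name
list_pdf_annotations("annotated-article.pdf")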

The XFDF standard

So XFDF is the standard for this area – here’s some more on it:
XFDF ISO Documentation https://www.iso.org/obp/ui/#iso:std:iso:19444:-1:ed-1:v1:en
And these are the latest Q’s on stack overflow
https://stackoverflow.com/questions/tagged/xfdf
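Since XFDF is just XML, even a quick summary of what an exported file contains is trivial to produce. A small sketch using Python’s standard library (the namespace URI matches the Acrobat Reader export shown in Part 8 below; the file name is a placeholder):

import xml.etree.ElementTree as ET
from collections import Counter

NS = "{http://ns.adobe.com/xfdf/}"

def summarise_xfdf(path):
    """Count the annotation types (highlight, underline, text, ...) in an XFDF export."""
    root = ET.parse(path).getroot()
    annots = root.find(NS + "annots")
    counts = Counter(child.tag.replace(NS, "") for child in annots)
    return dict(counts)

print(summarise_xfdf("export.xfdf"))   # e.g. {'highlight': 5, 'underline': 1, 'text': 4}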

Part 8) Mapping elements from Acrobat Reader XFDF Export to ATLAS.ti XML Export.

Whilst the inner workings of NVivo are rather obfuscated and it offers no comparable export, ATLAS.ti by contrast is somewhat clearer in the ways it works with programme elements, which can be exported as XML. (MaxQDA does as well – see http://www.maxqda.com/maxqda-export-options-the-new-xml-export – however as I’m only just starting to learn that software I hope to look at this again later.) Whilst there is (as yet) no XML standard for interoperability between CAQDAS packages – something the KWALON project has been working on (see conference report at http://www.dlib.org/dlib/march17/karcher/03karcher.html for an account of the conference session) – nor an option to import the ATLAS.ti XML, it at least gives an opportunity for looking at continuities between XFDF and ATLAS.ti elements for potential import.

My Process for exploring and annotating XFDF and ATLAS.ti XML code:

1 – I marked up a PDF document in Endnote, using highlights, underlines and comments.

BLOG-image-exampleOfUnderlyingAnnotatedPDF

2 – Opened the annotated PDF attachment from Endnote in Acrobat Reader DC. Exported comments from Acrobat Reader as an XFDF file

BLOG-image-PDFannotationPaneInAcrobatPro  > BLOG-image-exportingCommentsFromAcrobatPro

FILE LINK – XFDF export – https://lancaster.box.com/s/edon8znhjh4py9f606t1qtf349vjaq1m

3 – Imported the document into ATLAS.ti Mac and marked it up in an equivalent way to how I envisage import could/would work as outlined above.

BLOG-image-ATLAS.ti marking up PDF

LINK – ATLAS.ti Project bundle https://lancaster.box.com/s/62c6xzeor9t74xoojn78lev6eqxpi7ti

4 – Opened the XFDF file in Dreamweaver to look at the structure, elements and attributes

5 – Exported the ATLAS.ti project as XML and opened that in Dreamweaver to explore the structure, elements and attributes.

BLOG-image-ExportATLAStiXML Screen Shot 2017-07-06 at 13.09.13

ATLAS.ti PROJECT FILE LINK https://lancaster.box.com/s/vx48sl3vixtktukgl5rjzyja0z56pyhr

6 – Commented the two XML files to note continuities and potential equivalencies between them – see below.

Links
Annotated XFDF FILE https://lancaster.box.com/s/tw3qiud5bdxziz08bgzaso26wq8mkn1f
Annotated ATLAS.TI XML FILE https://lancaster.box.com/s/vx48sl3vixtktukgl5rjzyja0z56pyhr

7 – Made all the above available via Box

8 – Added the example code with my annotations below within textarea tags

NEXT STEPS:
(9 – Hustle and flatter the awesome ATLAS.ti Mac developer Friedrich Markgraf, aka Fritz, aka @fzwob to read this and think about implementing it 😉

10 – Do the same for NVivo and MaxQDA and see if either the competitiveness of this market or the co-operation of developers around things like XML standards helps get this implemented in one or more packages.

11 – Get on with something less geeky… 😉

Annotated XML Examples

The key annotations here are all between the brackets.

Annotated XFDF File Exported from Acrobat Reader

The following code is displayed based on information on using the sourcecode element – detailed at https://en.support.wordpress.com/code/posting-source-code/.

<!-- XML DTD omitted -->
<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
<!-- annots collects together all the annotations -->
	<annots>
		<!-- *** highlight *** is one of the main ways of marking up text in a PDF – potentially useful to import as a quotation based on the coords and then add a code of "highlight" along with allocating the same color to the code  -->
		<highlight  			color="#FFFF00"  			flags="print"  			date="D:20130615195221+01'00'"  			name="c0096ebd-aa1b-7d48-894a-95b72c9f2399"  			page="0"  			coords="514.652000,326.134000,622.026000,326.134000,514.652000,314.528000,622.026000,314.528000,624.150000,326.139000,781.075000,326.139000,624.150000,314.533000,781.075000,314.533000,471.566000,313.191000,602.565000,313.191000,471.566000,301.585000,602.565000,301.585000,604.330000,313.189000,780.594000,313.189000,604.330000,301.583000,780.594000,301.583000,471.590000,300.231000,540.806000,300.231000,471.590000,288.624000,540.806000,288.624000,542.050000,300.229000,781.168000,300.229000,542.050000,288.623000,781.168000,288.623000,471.490000,287.269000,781.711000,287.269000,471.490000,275.663000,781.711000,275.663000,471.500000,274.299000,780.689000,274.299000,471.500000,262.693000,780.689000,262.693000,471.476000,261.341000,551.463000,261.341000,471.476000,249.734000,551.463000,249.734000,550.690000,261.339000,781.594000,261.339000,550.690000,249.733000,781.594000,249.733000,471.490000,248.379000,774.987000,248.379000,471.490000,236.773000,774.987000,236.773000,471.510000,235.429000,611.514000,235.429000,471.510000,223.823000,611.514000,223.823000" rect="471.476000,223.823000,781.711000,326.139000"  			title="Steve" 			>
			<popup  				flags="print,nozoom,norotate"  				open="no"  				page="0"  				rect="827.640015,206.134003,1007.640015,326.134003" 			/>
		</highlight>
<!-- other lines cut here -->
	<!-- *** underline *** is one of the ways of marking up text in a PDF – potentially useful to import as a quotation based on the coords and then add a code of "underline" along with allocating the same color to the code -->
		<underline  			color="#0000FF"  			flags="print"  			date="D:20130616180638+01'00'"  			name="847814b0-ca2c-434a-bdc1-8fb56b678584"  			page="1"  			coords="71.422000,383.299000,167.391000,383.299000,71.422000,371.805000,167.391000,371.805000,190.030000,383.639000,356.882000,383.639000,190.030000,371.732000,356.882000,371.732000,47.047000,370.332000,51.620000,370.332000,47.047000,358.837000,51.620000,358.837000,52.550000,370.329000,130.752000,370.329000,52.550000,358.835000,130.752000,358.835000,132.380000,370.576000,149.254000,370.576000,132.380000,358.785000,149.254000,358.785000,156.620000,370.689000,331.049000,370.689000,156.620000,358.747000,331.049000,358.747000"  			rect="47.047000,358.747000,356.882000,383.639000"  			title="Steve">
			<popup  				flags="print,nozoom,norotate"  				open="no"  				page="1"  				rect="825.119995,263.298996,1005.119995,383.298996"/>
		</underline>

	<!-- *** text *** is the most important element for importing - these are the comments -->
	<!-- *** color *** attribute could be used to give a color to the element in the CAQDAS package -->
	<!-- <icon> could be used to give a code for this element in the CAQDAS package -->
	<!-- *** rect *** is co-ordinates for this comment on the PDF, nearest equivalent would be a selection by area and then coding that -->
	<!-- <title> seems to map to author -->
	<text  		color="#FFFF00"  		flags="print,nozoom,norotate"  		date="D:20130616180638+01'00'"  		name="f7a56df4-b0b6-3342-b856-2a54b4bd250b"  		icon="Comment"  		page="1"  		rect="361.296997,333.329010,379.296997,351.329010"  		title="Steve" 	>
		<!-- *** contents *** is the KEY element - this is the actual content of a textual comment -->
		<contents>
			Contrasts with views from Bourdieu where taste is a way of at ratifying and dominating rather than something constructed
		</contents>
		<!-- * popup * appears redundant as this controls the display on screen of the comment which has no equivalent or relevance in CAQDAS packages -->
		<popup  			flags="print,nozoom,norotate"  			open="no"  			page="1"  			rect="396.297000,239.329000,646.297000,351.329000" 		/>
	</text>
</annots>
<!-- **<f>** is the file reference for the file itself - will be essential for co-ordinating the XFDF with the imported file -->
<f href="../Documents/My EndNote Library.Data/PDF/0914600930/Akrich-1992-DeScriptionOfTechnicalObjects_inSh.pdf" />
<ids original="EEE4ED80D36A11E280FEA0F5ADA9D1EA" modified="9C468E0F3E2DC5E695A4B9500B40565A" />
</xfdf>
<!-- remaining code omitted in this illustration -->
 

Annotated ATLAS.ti XML File Exported from ATLAS.ti Mac

The following code is displayed based on information on using the sourcecode element – detailed at https://en.support.wordpress.com/code/posting-source-code/.

<!-- DTD and initial tags omitted -->
<!-- Identifying the primary documents -->
    <primDocs size="2">
        <primDoc name="Akrich-1992-DeScriptionOfTechnicalObjects_inSh.pdf" id="pd_1_1" loc="doc_1" au="Steve Admin" cDate="2017-07-04T09:48:58" mDate="2017-07-04T09:48:58" qIndex="">
			<!-- Identifying start of quotations -->
            <quotations size="12">
				<!-- q is the tag for an individual quotation -->
                <q name="Iamarguing,therefore,thattechnicalobjectsparticipatein   ing heterogeneous networks that bring toget…" id="q1_1_1" au="Steve Admin" cDate="2017-07-04T10:04:34" mDate="2017-07-04T10:04:34" loc="start=368 end=531 startpage=1 endpage=1">
					<!-- ***  content  *** denotes the actual content of the quotation, i.e. the actual copy on the page; the equivalent in XFDF for a highlight would be the mass of co-ords -->
                    <content size="163">

Iamarguing,therefore,thattechnicalobjectsparticipatein   ing heterogeneous networks that bring together actants of all types and sizes, whether human or nonhuman.3

                    </content>
                </q>
                <q name="But how can we describe the specific role they play within these networks? Because the answer has to…" id="q1_2_2" au="Steve Admin" cDate="2017-07-04T10:04:40" mDate="2017-07-04T10:04:40" loc="start=532 end=820 startpage=1 endpage=1">
                    <content size="288">

But how can we describe the specific role they play within these networks? Because the answer has to do with the way in which they build, maintain, and stabilize a structure of links between diverse actants, we can adopt neither simple technological determinism nor social constructivism.

                    </content>
                </q>
				<!-- q is the tag for a quotation for an area of the PDF that is empty - equivalent to the display of the comment icon on screen. The loc values map to rect values for the text element in XFDF -->
                <q name="Quotation 1:3" id="q1_3_3" au="Steve Admin" cDate="2017-07-04T10:06:18" mDate="2017-07-04T10:12:16" loc="x=359 y=338 width=23 height=23 page=1">
					<!-- A *** comment *** with a type of text is equivalent to the contents element within the text element in XFDF -->
                    <comment type="text/html" size="121">

Contrasts with views from Bourdieu where taste is a way of at ratifying and dominating rather than something constructed

                    </comment>
                </q>
                <q name="To do this we have to move constantly between the technical and the social" id="q1_4_4" au="Steve Admin" cDate="2017-07-04T10:06:31" mDate="2017-07-04T10:06:31" loc="start=3748 end=3822 startpage=1 endpage=1">
                    <content size="74">

To do this we have to move constantly between the technical and

the social

                    </content>
                </q>
                <q name="To do this we have to move constantly between the technical and the social." id="q1_5_5" au="Steve Admin" cDate="2017-07-04T10:07:16" mDate="2017-07-04T10:07:16" loc="start=3748 end=3823 startpage=1 endpage=1">
                    <content size="75">

To do this we have to move constantly between the technical and

the social.

                    </content>
                </q>
                <q name="echnological determinism pays no attention to what is brought together, and ultimately replaced, by…" id="q1_7_6" au="Steve Admin" cDate="2017-07-04T10:08:13" mDate="2017-07-04T10:08:13" loc="start=827 end=1070 startpage=1 endpage=1">
                    <content size="243">

echnological determinism pays no attention to what is brought together, and ultimately replaced, by the structural effects of a net- work. By contrast social GO tivi denies the Q.bchu:a"C_J ofobjects and assumes that oul peupi ean ave at1Js s.

                    </content>
                </q>
                <q name="The boundary is turned into a line of demarcation traced, .. within a geography ofdelegation,4 betwe…" id="q1_8_7" au="Steve Admin" cDate="2017-07-04T10:08:33" mDate="2017-07-04T10:08:33" loc="start=4051 end=4232 startpage=1 endpage=1">
                    <content size="181">

The boundary is turned into a line of demarcation traced, ..

within a geography ofdelegation,4 between what is assumed by the technical object and the competences of other actants.

                    </content>
                </q>
                <q name="the description of these elementary mechanisms ofad- justment poses two problems, one ofmethod and t…" id="q1_9_8" au="Steve Admin" cDate="2017-07-04T10:09:09" mDate="2017-07-04T10:09:09" loc="start=4241 end=4365 startpage=1 endpage=1">
                    <content size="124">

the description of these elementary mechanisms ofad- justment poses two problems, one ofmethod and the other ofvocab- ulary.

                    </content>
                </q>
                <q name="Quotation 1:10" id="q1_10_9" au="Steve Admin" cDate="2017-07-04T10:09:54" mDate="2017-07-04T10:09:54" loc="x=361 y=245 width=22 height=21 page=1"/>
                <q name="Quotation 1:11" id="q1_11_10" au="Steve Admin" cDate="2017-07-04T10:10:01" mDate="2017-07-04T10:10:01" loc="x=362 y=183 width=20 height=27 page=1">
                    <comment type="text/html" size="265">

Hugely significant para and one to empirically investigate in my data: firstly to what extent do style guides constrain how bodies relate to tasted objects, and second how can these links be characterised, how far can style guides be re-shaped, manipulated or used?

                    </comment>
                </q>
                <q name="Quotation 1:12" id="q1_12_11" au="Steve Admin" cDate="2017-07-04T10:10:09" mDate="2017-07-04T10:10:09" loc="x=362 y=108 width=27 height=28 page=1">
                    <comment type="text/html" size="193">

Competences being significant here as it is that competency that is being assessed, but the assessment is contingent on knowing, remembering and applying (implicitly accepting) the style guides

                    </comment>
                </q>
                <q name="Quotation 1:13" id="q1_13_12" au="Steve Admin" cDate="2017-07-04T10:10:51" mDate="2017-07-04T10:10:51" loc="x=361 y=156 width=32 height=28 page=1">
                    <comment type="text/html" size="61">

Boundary here, does or can this relate to "boundary objects"?

                    </comment>
                </q>
            </quotations>
        </primDoc>
        <primDoc name="Back - 2012 - Tape recorder-annotated.pdf" id="pd_2_2" loc="doc_2" au="Steve Admin" cDate="2017-07-04T10:01:28" mDate="2017-07-04T10:12:34" qIndex="">
            <quotations size="0"/>
        </primDoc>
    </primDocs>
    <codes size="2">
		<!-- codes is the list of codes - potentially used to transfer highlight types in with the name equalling their colour?-->
        <code name="highlight color=yellow" id="co_1" au="Steve Admin" cDate="2017-07-04T10:06:49" mDate="2017-07-04T10:06:49" color="" cCount="0" qCount="5"/>
        <code name="underline" id="co_2" au="Steve Admin" cDate="2017-07-04T10:08:21" mDate="2017-07-04T10:08:21" color="" cCount="0" qCount="1"/>
    </codes>
<!-- remaining code omitted in this illustration -->
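Finally, to make the mapping above concrete, here is an illustrative Python sketch that takes the <text> comments from the XFDF export and prints them in the shape of the ATLAS.ti quotation/comment elements shown above. To be clear: ATLAS.ti has no XML import, so this does nothing useful yet – it simply demonstrates how directly the two structures correspond. The file name, and the page-number offset (XFDF pages look 0-based in the export above, ATLAS.ti's 1-based), are my own assumptions from these examples.

import xml.etree.ElementTree as ET

NS = {"xfdf": "http://ns.adobe.com/xfdf/"}

def xfdf_comments_to_quotations(xfdf_path):
    """Build ATLAS.ti-style <q> elements (with <comment>) from XFDF <text> comments.

    Illustrative only: it shows the correspondence, not a real import routine."""
    root = ET.parse(xfdf_path).getroot()
    quotations = ET.Element("quotations")
    for i, note in enumerate(root.findall(".//xfdf:annots/xfdf:text", NS), start=1):
        x1, y1, x2, y2 = (float(v) for v in note.get("rect", "0,0,0,0").split(","))
        q = ET.SubElement(quotations, "q", {
            "name": "Quotation 1:%d" % i,
            "au": note.get("title", ""),          # XFDF 'title' appears to hold the author
            # assumed offset: XFDF page 0-based, ATLAS.ti startpage/endpage 1-based
            "loc": "x=%d y=%d width=%d height=%d page=%d"
                   % (x1, y1, x2 - x1, y2 - y1, int(note.get("page", 0)) + 1),
        })
        contents = note.find("xfdf:contents", NS)
        comment = ET.SubElement(q, "comment", {"type": "text/html"})
        comment.text = (contents.text or "").strip() if contents is not None else ""
    quotations.set("size", str(len(quotations)))
    return ET.tostring(quotations, encoding="unicode")

print(xfdf_comments_to_quotations("export.xfdf"))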
 

Part 9) Concluding thoughts (and anticipating objections)

So that’s been rather long but hopefully with some point and use value! However it’s always clear that development priorities are set to allocate limited resources to an extended and never-ending list of fixes and improvements. Despite this coming up so often when teaching, whether it has registered in terms of “user requests” is an unknown.

There are also two probable lines of objection I anticipate:

Developers – this is too difficult/varied/complex and marginal benefit

Companies/Sales/Marketing: this is too complex to do slickly and simply for our users.

Potential approaches to mitigate these objections:

Lots of tech companies are enabling “experimental features” – for example Tumblr https://www.theverge.com/2016/5/11/11655050/tumblrs-new-labs-program-lets-users-test-experimental-features , Google Chrome –http://ccm.net/faq/32470-google-chrome-how-to-access-and-enable-experimental-features and Firefox https://developer.mozilla.org/en-US/Firefox/Experimental_features
This approach enables development and prototyping beta testing then an experimental/opt-in release for a self-selecting group of typically more advanced users. It’s like an extra beta test and can do several key things:

  1. Enable engaging with a skilled user base for a practical pre-release test period
  2. Build a relationship with users to suggest features and develop what amount to support materials and workarounds – helping those working on programme documentation.
  3. Creating a space for features where the expectation is that the user may need to do some work or define some procedures and processes to get data to the stage needed for import – thus reducing the developer load

(In this model an interim stage may be that advanced users opting in can import comments from Mendeley but they either have to export one-by-one or use a third party tool. Once they’ve done what’s needed the experimental feature will do the import they requested. It then becomes an imperative on the RM user base to request a feature for bulk-export of annotated PDFs from their respective RM manufacturer or consortium, or via third party development. Which sets up Mendeley and Zotero to do this quickly, whilst Endnote developers Thomson Reuters are pretty poor at responding to feedback and requests – certainly in my experience!)

These then become potentially powerful ways of improving a product pre-launch but also showing a more engaged and open way of working with a user base. Furthermore, this sort of approach might enable some more collaborative and innovative ways of trialling new features and collecting feedback and even crowd-sourcing support and documentation.

Conclusion:

So there we have it – ideas and approaches to improving lit import for PDF notes along with a bunch of ideas about working with lit in CAQDAS and relationships between practices. I personally think the prize for “converting” new users to a product might be quite significant as whoever nails it first and/or best can expect to have a real jump in usage if other factors are equal.

Next steps include looking at MaxQDA more to explore ideas for import there – however the programmers there are VERY adept and I hope there’s enough here to support translation into their architecture and terminology.

Anyway, thanks for reading, PLEASE comment. Oh, and if anyone thinks some of this might be worth presenting or publishing (in a newsletter for a company? A book chapter? A practitioner journal, or in a different form in an academic journal?) then suggestions are VERY welcome too.


Appendix 1 – Lit Import Development and History in the leading CAQDAS packages

Lit import into NVivo arrived in version 9 (http://help-nv9-en.qsrinternational.com/procedures/exchange_data_between_nvivo_and_reference_management_tools.htm ) and has remained relatively stable since – importing RIS information into the source classification sheet as well as the document description and a linked memo. The full text is imported with any highlighting visible and can then be annotated and coded.

Lit import into ATLAS.ti only arrived much more recently, with an update to version 8 (see http://atlasti.com/2017/02/09/lit-reviews/ and http://downloads.atlasti.com/docs/whatsnew8.pdf).

MaxQDA introduced literature import in v11 in 2012. They have brought increasing focus to this by providing a guide to lit reviews for users: http://www.maxqda.com/maxqda-literature-reviews-reference-management-software

Appendix 2 – Details of Lit Management Apps

Mendeley:

Mendeley is popular, based on a freemium model, and – from my perspective at least – made a BIG impact on changing the view of the potential for reference management software to become a core part of the research process, far beyond its basic origins of compiling reference lists on a single workstation. It has extensively supported working across computers via cloud sync, as well as having a very slick way of annotating PDFs on screen and being able to search those notes (see https://blog.mendeley.com/2012/08/28/how-to-series-how-to-search-your-notes-and-other-fields-part-10-of-12/).

Some Mendeley history:

Inception in 2008 (https://blog.mendeley.com/2008/03/11/hello-world/)
Launch of iPhone app in 2010 ( https://blog.mendeley.com/2010/07/21/our-first-iphone-app-has-arrived/ )
Improvements to app in 2011 (https://blog.mendeley.com/2011/05/23/mendeley-ios-app-gets-an-update/ )

Endnote:

Endnote has been around for a long time to manage reference lists in Word. Mendeley came along and kind of re-wrote what reference management software could achieve – not just citing work, but actually integrating into the whole process of locating, grouping, reading and annotating, then citing. Endnote has been playing catch-up for years, with a few bumps and BAD mis-steps on the road (like trying to sue the open-source competition: https://en.wikipedia.org/wiki/EndNote#Legal_dispute_with_Zotero )
In terms of functionality it finally got to where Mendeley was in 2008 – about five years late – with the launch of X7 in 2013 (see the Endnote version history: https://en.wikipedia.org/wiki/EndNote#Version_history_and_compatibility ), though in a FAR less well-designed or easy-to-use way that still feels clunky and retrofitted rather than designed-in.

However, the mobile implementation was also a challenge (a high price for the app initially at £12.99, with start-of-year sales, then dropped to £2.99, now free). Initially it was VERY limited: version 1 (launched Jan 25th 2013) did little more than let you (literally) scribble on your iPad screen – it was only with the release of 1.1 on Jan 31st 2014 that the Mendeley-type functionality became available:

Version 1.1 (Jan 31, 2014)

– Expanded set of PDF annotation tools include inserting notes, highlighting, underlining, shapes, strikethrough and free hand drawing
– PDF annotations made on EndNote desktop or online can be viewed, edited, and searched in the app
– PDF annotations made in older versions of the app will be saved and made editable with the new tools
– New Reference Types include Podcast, Press Release, and Interview
– Updated Reference Types include Conference Paper, Blog, Data set, Thesis, and Manuscript
Details from – https://www.appannie.com/apps/ios/app/endnote-for-ipad/details/

In practice: Analysing large datasets and developing methods for that

A quick post here, but one that seeks to place the rather polemical and borderline-ranty previous post about realising the potential of CAQDAS tools into an applied rather than abstract context.

Here’s a quote that I really like:

The signal characteristic that distinguishes online from offline data collection is the enormous amount of data available online….

Qualitative analysts have mostly reacted to their new-found wealth of data by ignoring it. They have used their new computerized analysis possibilities to do more detailed analysis of the same (small) amount of data. Qualitative analysis has not really come to terms with the fact that enormous amounts of qualitative data are now available in electronic form. Analysis techniques have not been developed that would allow researchers to take advantage of this fact.

(Blank, 2008, p. 548)

I’m working on a project to analyse the NSS (National Student Survey) qualitative textual data for Lancaster University (around 7,000 comments). Next steps include analysing the PRES and PTES survey comments. But that’s small fry – the biggie is looking at the module evaluation data for all modules for all years (~130,000 comments!)

This requires using tools to help automate the classification, sorting and sampling of that unstructured data in order to be able to engage with interpretations. This sort of work NEEDS software – there’s a prevailing view that this either can’t be done (you can only work with numbers) or that it will only quantify data and somehow corrupt it and make it non-qualitative.

I would argue that isn’t the case – the tools I’m testing and comparing (the ProSUITE from Provalis, including QDA Miner/WordSTAT, Leximancer, and NVivo Plus, which incorporates Lexalytics) enable this sort of work with large datasets based on principles of content analysis and data mining. A minimal sketch of the kind of dictionary-based classification these tools automate follows below.
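
To make that a little more concrete, here is a tiny, purely illustrative Python sketch of dictionary-based classification in the spirit of content-analysis tools such as WordSTAT. The categories, keywords and comments are all invented for illustration, and the real tools add weighting, disambiguation and validation on top of this basic idea.

```python
# Illustrative sketch only: dictionary-based classification of free-text
# survey comments, in the spirit of content-analysis tools such as
# QDA Miner/WordSTAT. Categories, keywords and comments are invented.
from collections import defaultdict

# A simple categorisation dictionary: category -> trigger keywords
CATEGORIES = {
    "feedback": ["feedback", "marking", "comments on my work"],
    "workload": ["workload", "deadline", "too much"],
    "teaching": ["lecture", "seminar", "lecturer", "teaching"],
}

def classify(comment):
    """Return every category whose keywords appear in the comment."""
    text = comment.lower()
    return [cat for cat, keywords in CATEGORIES.items()
            if any(kw in text for kw in keywords)]

def sort_comments(comments):
    """Group comments by category so they can be sampled for closer reading."""
    grouped = defaultdict(list)
    for comment in comments:
        for cat in classify(comment) or ["uncategorised"]:
            grouped[cat].append(comment)
    return grouped

if __name__ == "__main__":
    comments = [
        "The feedback on assignments was slow and unhelpful.",
        "Too much crammed in before the deadline in week 10.",
        "Lectures were engaging but seminars felt rushed.",
    ]
    for category, items in sort_comments(comments).items():
        print(category, "-", len(items), "comment(s)")
```

The point isn’t the code: it’s that automated classification sorts a corpus of 130,000 comments into piles that a human analyst can then sample and read purposively rather than exhaustively.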

However these only go so far – they enable the classification and sorting of data, but there is still a requirement for more traditional qualitative methods of analysis and synthesis. I’ve been using (and hacking) framework matrices in NVivo Plus in order to synthesise and summarise the comments – an application of a method that is more overtly “qualitative data analysis” in a much more traditional vein, yet applied to and mediated by tools that enable application to much, MUCH larger datasets than would perhaps normally be used in qual analysis. (A toy illustration of the matrix shape follows below.)
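
Since “framework matrix” may be unfamiliar, here is a tiny, purely illustrative sketch of the shape of the thing – cases as rows, themes as columns, analyst-written summaries in the cells – using pandas. The module names and summaries are invented, and this is emphatically not how NVivo implements the feature; it just shows the structure being filled in.

```python
# Illustrative sketch only: a framework-matrix-like structure in pandas.
# Rows are cases/units of analysis, columns are themes, and cells hold
# short analyst-written summaries. All names and summaries are invented.
import pandas as pd

cases = ["Module A", "Module B", "Module C"]           # rows: cases
themes = ["Feedback", "Workload", "Teaching quality"]  # columns: themes

# Start with an empty grid, then fill cells with summaries as analysis proceeds
matrix = pd.DataFrame("", index=cases, columns=themes)
matrix.loc["Module A", "Feedback"] = "Slow turnaround; students want exemplars."
matrix.loc["Module B", "Workload"] = "Deadlines bunch up in weeks 9-10."

print(matrix)
```

The value is in the discipline of the grid: every case ends up with a summary against every theme, however large the underlying pile of coded comments.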

And this is the sort of thing I’m talking about in terms of enabling the potential of the tools to guide the strategies and tactics used. But it took an awareness of the capabilities of these tools, and an extended period of playing with them to find out what they could do, in order to scope the project and consider which sorts of questions could meaningfully be asked, considered and explored. This seems to be oppositional to some of the prescriptions in the 5LQDA view about defining strategies separately from the capabilities of the tools – and is one of the reasons for taking this stance and considering it here.

Interestingly this has also led to a rejection of some tools (e.g. MaxQDA and ATLAS.ti) precisely because of their absence of functions for this sort of automated classification – again, capabilities and features are a key consideration prior to defining strategies. However I’m now reassessing this as MaxQDA can do lemmatisation, which is more advanced than NVivo Plus… (a quick illustration of what lemmatisation does is sketched below).
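
For anyone unsure what lemmatisation adds, here is a tiny illustrative sketch using NLTK’s WordNet lemmatiser – chosen purely because it is a free, well-known library, not because it is what MaxQDA or NVivo use internally (I make no claims about their implementations).

```python
# Illustrative only: lemmatisation collapses inflected word forms to a base
# form, so "taught", "teaches" and "teaching" all count as "teach" when
# building word frequencies or categorisation dictionaries. NLTK is used
# purely as a free, convenient example, not as any CAQDAS package's internals.
import nltk
from nltk.stem import WordNetLemmatizer

nltk.download("wordnet", quiet=True)   # one-off download of the WordNet data
nltk.download("omw-1.4", quiet=True)   # extra WordNet data needed by newer NLTK

lemmatizer = WordNetLemmatizer()
for word in ["taught", "teaches", "teaching", "lectures", "lecturing"]:
    # pos="v" tells the lemmatiser to treat each word as a verb
    print(word, "->", lemmatizer.lemmatize(word, pos="v"))
```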

This is just one example, but to me it is an important one for considering what could be achieved if we explore features and opportunities first rather than defining strategies that don’t account for them. In other words: a symbiotic exploration of the features and potentials of tools to shape and define strategies and tactics can open up new possibilities that were previously rejected, rather than those tools and features necessarily or properly being subservient to strategies that fail to account for their possibilities.

On data mining and content analysis

I would highly recommend reading Leetaru (2012) for a good, accessible overview of data mining methods and how these are used in content analysis. It gives a clear insight into the methods, assumptions, applications and limitations of the aforementioned tools, helping to demystify and open up what can otherwise seem to be a black box that automagically “does stuff”.

Krippendorf’s (2013) book is also an excellent overview of content analysis with several considerations of human-centred analysis using for example ATLAS.ti or NVivo as well as automated approaches like those available in the tools above.

References:

Blank G. (2008) Online Research Methods and Social Theory. In: Fielding N, Lee RM and Blank G (eds) The SAGE handbook of online research methods. Los Angeles, Calif.: SAGE, 537-549.

Preview of Ch1 available at https://uk.sagepub.com/en-gb/eur/the-Sage-handbook-of-online-research-methods/book245027

Krippendorff, K. (2013). Content analysis: An introduction to its methodology (3rd ed.). Sage.

Preview chapters available at https://uk.sagepub.com/en-gb/eur/content-analysis/book234903#preview

Leetaru, K. (2012). Data mining methods for the content analyst: An introduction to the computational analysis of content. New York: Routledge.

Preview available at https://books.google.co.uk/books?id=2VJaG5cQ61kC&lpg=PA98&ots=T4gZpIk4in&dq=leetaru%20data%20mining%20methods&pg=PP1#v=onepage&q=leetaru%20data%20mining%20methods&f=false 

On agency and technology: relating to tactics, strategies and tools

This continues my response to Christina Silver’s tweet and blog post. While my initial response to one aspect of that argument was pretty simple, this is the much more substantive consideration.

From my perspective qualitative research reached a crossroads a while ago – though actually I think crossroads is the wrong term here. A crossroads requires a decision; it is a place steeped in mystery and mythology (see https://en.wikipedia.org/wiki/Crossroads_(mythology) ). I sometimes feel as though qualitative research did a very British thing: turned the crossroads into a roundabout, thus enabling driving round and round rather than moving forwards or making a decision.

The crossroads was the explosion in the availability of qualitative data. Previously, access to accounts of experience was rather limited – you had to go into the field and write about it, find people to interview, or use the letters pages of newspapers as a site of public discourse. These paper-based records were slow and time-consuming to assemble, construct and analyse. For the sake of the metaphor that follows I shall refer to this as “the cavalry era” of qualitative research: much romanticised, and with doctrines from the (often ageing, pre-digital) professoriat that still dominate.

Then the digital didn’t so much happen as explode and social life expanded or shifted online:

For researchers used to gathering data in the offline world, one of the striking characteristics of online research is the sheer volume of data. (Blank, 2008, p. 539)

BUT…

Qualitative analysts have mostly reacted to their new-found wealth of data by ignoring it. They have used their new computerized analysis possibilities to do more detailed analysis of the same (small) amount of data. Qualitative analysis has not really come to terms with the fact that enormous amounts of qualitative data are now available in electronic form. Analysis techniques have not been developed that would allow researchers to take advantage of this fact. (Blank, 2008, p. 548)

Furthermore, the same methods continue to dominate – the much-vaunted reflexivity that lies at the heart of claims for authenticity and trustworthiness does not seem to have been extended to tools and methods:

Over the past 50 years the habitual nature of our research practice has obscured serious attention to the precise nature of the devices used by social scientists (Platt 2002, Lee 2004). For qualitative researchers the tape-recorder became the prime professional instrument intrinsically connected to capturing human voices on tape in the context of interviews. David Silverman argues that the reliance on these techniques has limited the sociological imagination: “Qualitative researchers’ almost Pavlovian tendency to identify research design with interviews has blinkered them to the possible gains of other kinds of data” (Silverman 2007: 42). The strength of this impulse is widely evident from the methodological design of undergraduate dissertations to multimillion pound research grant applications. The result is a kind of inertia, as Roger Slack argues: “It would appear that after the invention of the tape-recorder, much of sociology took a deep sigh, sank back into the chair and decided to think very little about the potential of technology for the practical work of doing sociology” (Slack 1998: 1.10). (Back, 2010)

My concern with the approach presented and advocated by Silver and Woolf is that it holds the potential to reinforce and prolong this inertia. There are solid arguments FOR that position – especially given the conservatism of academia, mistrust of software, the apparently un-slayable discourses of hopes and fears around technology (Paulus, Lester & Britt, 2013), and entrenched critical views and misconceptions of QDAS software that “by its very nature decontextualizes data or primarily supports coding [which] have caused concerned researchers” (Paulus, Woods, Atkins & Macklin, 2017).

BUT… BUT… BUT…

New technologies enable new things – when they first arrive they are usually, perhaps inevitably, restrictively fitted into pre-existing approaches and methods, made subservient to old ways of doing things.

A metaphor – planes, tanks and tactics

I’ve been trying to think of a metaphor for this. The one I’ve ended up with is particularly militaristic and I’m not entirely comfortable with it – especially as metaphors sometimes invite over-extension, which I fear may happen here. It also feels rather jingoistically “Boys’ Own” and British, and may be alienating to key developers and methodologists in Germany. So comments on alternative metaphors would be MOST welcome. However, given the rather martial themes around strategies and tactics used in Silver and Woolf’s (2015) paper and models for Five-Level QDA, I’ll stick with it and explore tactics, strategies and technologies and how they historically related to two new technologies: the tank and the plane.

WW1 saw the rapid development of new and terrifying technologies in collision with old tactics and strategies for their use. The overarching strategies were the same (defeat the enemy); however, the tactics used failed to take account of the potential of these new tools, thus restricting it.

Cavalry were still deployed at the start of WW1. Even with the invention of tanks, the tactics used in their early deployments were for mounted cavalry to follow up the breakthroughs achieved by tanks – with predictably disastrous failure at the Battle of Cambrai (see https://en.wikipedia.org/wiki/Tanks_in_World_War_I#Battle_of_Cambrai ).

Planes were deployed from early in WW1 but in very limited capacities – as artillery spotters and for reconnaissance. Their potential to change the tactics of warfare was barely recognised, let alone exploited.

These strategies were developed by generals from an earlier era – still wedded to the cavalry charge as the ultimate glory (see https://en.wikipedia.org/wiki/Cavalry#First_World_War ) – which seems a rather appropriate metaphor for professorial supervision of junior academics and PhD students today.

The point I’m seeking to make is to suggest that new technologies vary in their complexity, but they also vary in their potential. Old methods of working are used with new technologies and the transformative potential of those new technologies on methods or tactics to achieve strategic aims is often far slower, and can be slowed further when there is little immediate incentive to change (unlike say a destructive war) in the face of an established doctrine.

My view is therefore that those who do work with and seek to innovate with CAQDAS tools need to seek to do more than just fit in with the professorial Field Marshal Haigs of our day and talk in terms of CAQDAS being “fine for breaching the front old chap you know use CAQDAS to open up the data but you send in the printouts and transcripts to really do the work of harrying the data, what what old boy”.

Meanwhile Big Data is the BIG THING – and this entire sphere of large datasets and access to public discourse and digital social life threatens to be ceded entirely to quantitative methods. Yet we have tools, methods and tactics to engage in that area meaningfully by drawing on existing approaches which have always been both qual and quant (with corpus linguistics and content analysis springing to mind).

Currently the scope of any transformation seems to be pitched to taking strategies from a “cavalry era” of qualitative research. My suggestion is that to realise the full potential of some of the tools now available in order to generate new, and extend existing, qualitative analysis practices into the diverse new areas of digital social life and digital social data we need to be bolder in proposing what these tools can achieve and what new questions and datasets can be worked with. And that means developing new strategies to enter new territories – which need to understand the potential of these tools and explore ways that they can transform and extend what is possible.

If, however, we were to place the potential of these tools as subservient to existing strategies, and to attempt to locate all of the agency for their use with the user and the way that we “configure the user” (Grint and Woolgar, 1997) in relation to these tools through our pedagogies and demonstrations, we could limit those potentials. Using NVivo Plus or QDA Miner/WordSTAT to reproduce what could be done with a tape recorder, paper, pen and envelopes seems akin to sending horses chasing after tanks. What I am advocating (as well, not instead) is also trying to work out what a revolutionary engagement with the potential of the new tools we have would look like for qualitative analysis with big unstructured qualitative data and the tools now ready for it.

To continue the parallel: the realisation of what could be accomplished by combining the new technologies of tanks and planes created an entirely new form of attacking warfare – named Blitzkrieg by the journalists who witnessed its lightning speed. This was developed to achieve the same overarching strategy as in WW1 (conquering the enemy), but by considering the potential and integration of new tools it developed a whole new mid-level strategy and associated tactics that utilised and realised the potential of those relatively new technologies. It thus avoided becoming bogged down in the nightmare that dominated WW1: using strategies and tactics from a bygone, pre-industrial era of warfare with new technologies that undermined their effectiveness. My suggestion is that there is a new territory now – big data – and it is one that is being rapidly and extensively ceded to a very quantitative paradigm and methods. To make the kind of rapid advances into that territory needed to re-establish the relevance of qualitative analysis, we need to be bolder in developing new strategies that utilise the tools, rather than making them subservient to strategies from an earlier era in deference to a frequently luddite professoriat.

My argument thus simplifies to the idea that the potential of tools can and should productively shape not only the planning and consideration of the territories now amenable to exploration and engagement, but also the strategies and tactics for doing that. Doing so involves engaging with the conceptualisation, design and thinking about what qualitative or mixed-methods studies are and what they can do, in order that this potential is realised. From this viewpoint, Blitzkrieg was performed into being by the new technologies of the tank and the plane and their combination with new strategies and tactics. This contrasts with the earlier subsuming of the plane’s potential to merely being a tool to achieve strategies conceptualised before its existence – a plane was then simply the equivalent of a tree or a balloon for spotting cannon fire. Much of CAQDAS use today seems to be just like this – sending horses chasing after tanks – rather than seeking to achieve things that couldn’t be done without it, and celebrating that.

This is all rather abstract I know so I’ve tried to extend and apply this into a consideration of implementation in practice working with large unstructured datasets in a new post.

References

Back L. (2010) Broken Devices and New Opportunities: Re-imagining the tools of Qualitative Research. ESRC National Centre for Research Methods

Available from: http://eprints.ncrm.ac.uk/1579/1/0810_broken_devices_Back.pdf

Citing:

Lee, R. M. (2004) ‘Recording Technologies and the Interview in Sociology, 1920-2000’, Sociology, 38(5): 869-899

E-Print available at: https://repository.royalholloway.ac.uk/file/046b0d22-f470-9890-79ad-b9ca08241251/7/Lee_(2004).pdf

Platt, J. (2002) ‘The History of the Interview,’ in J. F. Gubrium and J. A. Holstein (eds) Handbook of the Interview Research: Context and Method, Thousand Oaks, CA: Sage pp. 35-54.

Limited Book Preview available at https://books.google.co.uk/books?id=uQMUMQJZU4gC&lpg=PA27&dq=Handbook%20of%20the%20Interview%20Research%3A%20Context%20and%20Method&pg=PA27#v=onepage&q=Handbook%20of%20the%20Interview%20Research:%20Context%20and%20Method&f=false

Silverman D. (2007) A very short, fairly interesting and reasonably cheap book about qualitative research, Los Angeles, Calif.: SAGE.

Limited Book Preview at: https://books.google.co.uk/books?id=5Nr2XKtqY8wC&lpg=PP1&pg=PP1#v=onepage&q&f=false

Slack R. (1998) On the Potentialities and Problems of a www based naturalistic Sociology. Sociological Research Online 3.

Available from: http://socresonline.org.uk/3/2/3.html

Blank G. (2008) Online Research Methods and Social Theory. In: Fielding N, Lee RM and Blank G (eds) The SAGE handbook of online research methods. Los Angeles, Calif.; London: SAGE.

Grint K and Woolgar S. (1997) Configuring the user: inventing new technologies. The machine at work: technology, work, and organization. Cambridge, Mass.: Polity Press, 65-94.

Paulus TM, Lester JN and Britt VG. (2013) Constructing Hopes and Fears Around Technology. Qualitative Inquiry 19: 639-651.

Paulus T, Woods M, Atkins DP, et al. (2017) The discourse of QDAS: reporting practices of ATLAS.ti and NVivo users with implications for best practices. International Journal of Social Research Methodology 20: 35-47.

Silver C and Woolf NH. (2015) From guided-instruction to facilitation of learning: the development of Five-level QDA as a CAQDAS pedagogy that explicates the practices of expert users. International Journal of Social Research Methodology 18: 527-543.