Seminar Recordings on Transcription, CAQDAS and more

Quick post this – just sharing and embedding two recent events I’ve hosted.

The first one is from the inspiring, fantastic and highly-recommended CAQDAS Networking Project Seminar Series.

Qualitative Transcription for the 21st Century: combining automated transcription with human interpretation using CAQDAS packages. 

Dr. Steven Wright, Learning and research technologist at Lancaster University, UK

17th CAQDAS webinar, in which Dr Steve Wright discusses combining automated transcription with human interpretation using CAQDAS packages.

And one for the Centre for Technology Enhanced Learning in the Dept. of Educational Research at Lancaster University, for their Show and TEL seminars:

Technologies for qualitative analysis, transcription and mixed methods

Discussion with Steve Wright

Downloading YouTube videos, captions and comments in ATLAS.ti

The scraper also works for Facebook and other social media comment scraping!

Background

From working with a PhD student it became clear that NCapture in NVivo no longer works – and hasn’t for about a year:

So this knocked out the use of NCapture to streamline importing a YouTube video and comments (and potentially in parallel the captions) into NVivo for analysis. However, this also creates an opportunity.

I generally prefer working with video in ATLAS.ti: its interface is different, its options for transcripts are less restrictive and less rigidly structured, and it lets you code video and transcripts together rather than separately.


Furthermore, for the project that prompted this investigation, there were methodological considerations that somewhat favoured ATLAS.ti over NVivo, as the priority was making conceptual connections between data (i.e. links) rather than allocating data segments to conceptual categories (i.e. via coding).

And that is where ATLAS.ti excels in comparison to other CAQDAS programmes.

So – how to replicate the functions of NCapture without it? Time to put the call out!

What can and can’t you do with YouTube videos?

NCapture allows one-click browser-based capture of a YouTube video – and should enable comment download too. The video is then streamed into NVivo and can be treated as if it were local in terms of working with the video – you can select, code, add transcript rows etc. You can also import and view the comments as a dataset. However, there is no direct import of transcript/captions, and for at least the last year comment scraping has been broken (it doesn't work in Chrome, and IE is not supported).

ETHICS and LEGALITY:

So it's not technically illegal to download a YouTube video – however, there are ethical considerations around what sort of video it is and who published it. With this project – looking at TED videos – these are not from an individual reliant on advertising revenue, and there's no individual to ask for permission to analyse. (See more at https://www.techadvisor.com/how-to/internet/is-it-legal-download-youtube-videos-3420353/#:~:text=For%20personal%20use%2C%20no%20it,of%20our%20industry%2C%20too).

Does it violate the terms of use? Maybe – but by a strict reading so does NCapture, with its offline playback outside the YouTube platform or app.

So if there are no legal or ethical barriers to downloading videos and comments for analysis, it becomes a question of technical implementation.

YouTube Video Downloaders and Comment Scrapers

NCapture provided video linking to YouTube (treating them like an external video file hosted on your machine) – it would be great to see that sort of internet linking supported in ATLAS.ti and MAXQDA (if you can link to a local file why not an online one?).

So to work in ATLAS.ti with the video you need to download it.

I looked at Freemake Video Downloader (https://freemake.com/free_video_downloader/) but hit issues, as it adds a logo at the start, thereby breaking the timestamp link for any downloaded transcript.

Downloading Video and Captions with 4k Video Downloader

4K Video Downloader is the best I found (and the highest rated/recommended in TechRadar's great article).

You just copy and paste the YouTube URL – and then it gives you options for quality *and* caption download (where available) including second captions in auto-translated languages:

The subtitles are downloaded as an SRT file – which can be directly imported into ATLAS.ti as a linked/synchronised transcript – RESULT (and way more than NCapture did).

So that gets the video and captions in – what about the comments?

Scraping and Importing the Comments

This has proved a little trickier, and it still has some slight anomalies I'm trying to iron out.

I started out with https://app.coberry.com/ for comments – it has some great features, including output as PDF or as a dataset, and enables sentiment analysis. BUT I also found a load of issues in the TED transcript I started working with, including strings of characters like this:

😨😨😨
Â
’
‘

These require a search-and-replace in Excel – but it wasn't easy to pick up all of them. What ARE they?
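A likely answer: these look like classic mojibake – UTF-8 text that has been decoded as Windows-1252 somewhere in the export chain. A quick Python sketch (my assumption about the cause, not something Coberry documents) shows how "â€™" can be repaired back into a curly apostrophe:

```python
# "â€™" is what the right single quote (U+2019) becomes when its UTF-8
# bytes (E2 80 99) are misread as Windows-1252 text.
garbled = "â€™"

# Reverse the mistake: re-encode as Windows-1252, then decode as UTF-8.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired)  # a single curly apostrophe: '
```

The lone "Â" is the same effect on a non-breaking space (UTF-8 bytes C2 A0), which is why a plain search-and-replace in Excel struggles to catch them all.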

I then tried a few others and read this good but out-of-date article with a bunch of recommendations. However, the YouTube Comments Exporter Chrome plugin is broken (it looks like the same issue as NCapture). I also looked at SEOBOTS, but it charges you, had very poor data exports and threw a bunch of errors.

The best tool I found – which also opens up Facebook scraping into ATLAS.ti – is exportcomments.com. This also identified where those strange text strings in Coberry came from, as it would download various emoji-style characters, e.g.

👍
🌴
etc

Exportcomments.com DOES charge for over 100 comments – however, the rates allow you to pay a modest US$11 for 3 days, and the options for TikTok, Facebook etc. plug a perceived "gap" for ATLAS.ti.

THOUGHT: Is it a gap? The breakage of NCapture suggests to me that enabling import of comments from custom tools might be more important than trying to develop a new software suite based on external APIs that is subject to breakage by a third-party change…

An anomaly in ATLAS.ti Windows – wherefore art thou, emojis?

So here's where it got a bit weird – those characters that caused a 'mare in Coberry don't display properly in ATLAS.ti on Windows.

So the good news – ATLAS.ti for Mac works fine with emojis:

Emojis from YouTube comments displaying in ATLAS.ti for Mac
And in the Document preview

However, there is an issue in Windows – though there's no problem in the document preview:

Document Preview in document manager on ATLAS.ti 9 Windows – icon shows

But when you open the document itself, the emoji is neither shown nor codable:

So where's the palm tree gone? ATLAS.ti 9 on Windows, main document view for coding.

Considerations: Onscreen style comments printed or dataset?

The challenge here comes from HOW to bring in comments – and that's one of the current limits with ATLAS.ti: there are only "Codes", and working with code groups is limited. So you can't have a type of code for "cases" (e.g. comment authors) and give those codes attributes. You sort of can, using code groups, but they can't be used in the same way as document groups to easily compare across conceptual/thematic codes, so then you have to create smart codes and it all gets complex.

So you need to decide: see comments together in a similar presentation to the way they appear on screen (and forgo document groups), OR import as a dataset and work with document groups, but with less familiar and potentially decontextualised chains of replies.

If you use Coberry you can easily print as PDF to have all comments on one page and then code for author AND/OR export. However any emojis will be garbled. Exportcomments.com only offers dataset download.

Marking up the comments for import

This next bit is a work in progress as I try to figure out which bits still work, and which work well, when working with the downloaded comments as a dataset. Some of the information I'd found on survey import no longer works, so it's best to refer to the online manual for ATLAS.ti Mac or the online manual for ATLAS.ti Windows for which prefixes can be added to column headings. However, some do not seem to work as described (I had no joy with .

I have had some issue with time/date imports as well which I’m trying to resolve.

Export Comments raw data is like this:

Name (click to view profile) | Date | Likes | isHearted | isPinned | Comment | (view source)
Raw export comments data

I chose to work with this data as follows:

Comment Export List

!CommentNo | Name | #Name | Date: | Date: | Likes. | isHearted. | isPinned | Comment | Source URL
1 | JC | JC | 26/05/2021 21:51:53 | 2021-05-26 | 0 | no | no | So this is where the last line of Rhett&Link's Geek vs Nerd Rap Battle came from. | view comment
Data prepared for ATLAS.ti Import

As you can hopefully see, I duplicated the name and date columns to use them both as data in the document and as a way to classify details. With the duplicated date column I formatted it as yyyy-mm-dd, then cut and pasted it into Notepad/TextEdit, formatted the cells as text, and pasted it back via Paste Special → Paste as Text to get dates-as-text.

This created a document per comment, numbered by column 1, with author name and date both in the comment and available as groups.

This *seems* to work quite well – but I’m still working on it!
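The manual duplication and date-as-text steps above can also be scripted for larger exports; here's a minimal pandas sketch (the column names are my guesses at the exportcomments.com layout, so adjust to your actual file):

```python
import pandas as pd

# A couple of rows standing in for the exportcomments.com download
df = pd.DataFrame({
    "Name": ["JC", "AB"],
    "Date": ["26/05/2021 21:51:53", "27/05/2021 09:12:04"],
    "Comment": ["First comment", "Second comment"],
})

df.insert(0, "CommentNo", list(range(1, len(df) + 1)))  # numbering for document order
df["NameCopy"] = df["Name"]                             # duplicated name column
# Duplicate the date as plain yyyy-mm-dd text (the Notepad round-trip, scripted)
df["DateText"] = pd.to_datetime(df["Date"], dayfirst=True).dt.strftime("%Y-%m-%d")

# df.to_excel("comments_prepared.xlsx", index=False)  # then import into ATLAS.ti
print(df[["CommentNo", "NameCopy", "DateText"]])
```

Because `strftime` returns strings, the date column survives the spreadsheet round-trip as text rather than being re-coerced to a date type on import.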

Coming soon:

Here: final bits on the comment import process, and also alternatives via Coberry.

A new post: A proper focus on analysis processes and opportunities – making use of split screens/tab groups, hyperlinks, networks and some coding to stitch all of this together.

Transcribing with Microsoft Word Online and CAQDAS packages

This blog post outlines the steps of working with MS Word online (part of Office 365) to generate automatic transcripts and import them into your CAQDAS package as a transcript.

By bringing audio and synchronised transcripts into a CAQDAS package you gain the opportunity to engage with the data as you correct the transcript. This brings the immersion often touted as the key benefit of manual transcription, links it with the tools you'll use for analysis (annotation, memoing, coding), and keeps the efficiencies of automated transcription (typically cutting the 5–7 hours of manual transcription per hour of audio down to 1–2 hours of correction and engagement per hour of auto-transcribed audio).

CONTENTS:

Why use Word?

Working with Word – video

Preparing and importing a transcript into ATLAS.ti

Preparing and importing a transcript into MAXQDA

Preparing and importing a transcript into NVivo

Why use Word?

Microsoft Word transcription has a LOAD of advantages:

  1. First and foremost – it’s free to many students, researchers and academics as part of an institutional Office 365 license or their own personal Office 365 subscription.
  2. It’s VERY good and amazingly accurate.
  3. As a result of the first, it is likely to be approved for data management as part of your institutional policies for REC purposes and research data, as that will be part of the Office 365 agreement – e.g. see this from Lancaster University.
    • While a lot of students may use Descript, Trint, Otter.ai or others, these probably aren't compliant with ethics requirements on research data!
  4. It’s multi-lingual (though the documentation claims otherwise – so it’s unclear how multilingual!)
  5. It’s familiar.
  6. It’s simple.
  7. There's good documentation and detailed step-by-steps.

So that's a pretty powerful and solid list.

Limitations, Options and considerations

There's a key limitation with Word: time. You have up to 5 hours (300 minutes) of transcription per month; after that you can't transcribe. This may change (it could become a charged service for more – who knows). There's information and detailed step-by-steps for the Word part here; however, it's not entirely accurate, as transcription can and does work in languages other than EN-US.

There are two key options: synchronised or not.

Why synchronise?

Synchronisation allows you to listen to the audio (or view the video) that accompanies a segment of transcription. This has a lot of potential for analysis, as listening to the audio gives you the opportunity to engage with (and add to analysis) the three Ts of spoken language that are lost in transcription – Tone, Tempo and Tenor (or Timbre) – which all carry a LOT of meaning that is lost when spoken interaction is reduced to the written word. Don't believe me? Try relaxing and unwinding to this.

With video there is even more to be gained by synchronising a transcript to the audio – and the opportunity is then there to add additional information to the transcript to make it a visual transcript too.

So, where synchronisation is easy and you can work with the text easily with or without the synchronised audio/video (as is the case with ATLAS.ti and MAXQDA) this, to me, is a bit of a no-brainer – import synchronised.

Why not synchronise?

This is perhaps a more pragmatic decision where technology is going to get in the way. Unfortunately, with NVivo, it is substantially more fiddly, error-prone and usually involves quite a lot of to-and-fro correction so it’s then worth thinking a little more about costs vs benefits for correction and engagement.

Working with Word

The video below shows some basic steps for working with MS Word

Preparing and importing a transcript into ATLAS.ti

Preparing transcripts in word

Importing audio and transcript into ATLAS.ti 9

NOTES:
The (awesome) ATLAS.ti focus group coding tool requires the speaker name to be on a new line, either preceded by @ or followed by :

So… when renaming the speakers in Word, include a colon after the speaker name,

OR if you forget – do this when running the search and replace above.

The main search and replace changes the pattern timestamp, space, speaker ID, paragraph mark (^p), text:

00:00:01 SW

content here

To:

00:00:01 
SW   content here  

i.e. timestamp, paragraph mark (^p), speaker ID, tab (^t), text.

So if the speaker ID was SW it would be:

Search for: SW^p

Replace with: ^pSW:^t
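If you prefer scripting to Word's search-and-replace dialog, the same transformation can be sketched in Python with a regular expression (a sketch of the pattern above, assuming HH:MM:SS timestamps and a single-word speaker ID):

```python
import re

raw = "00:00:01 SW\ncontent here"

# Move the speaker ID onto its own line after the timestamp, add the
# colon that ATLAS.ti's focus-group coding expects, then a tab.
prepared = re.sub(r"^(\d{2}:\d{2}:\d{2}) (\w+)\n",
                  r"\1\n\2:\t",
                  raw, flags=re.MULTILINE)
print(prepared)  # 00:00:01<newline>SW:<tab>content here
```

Unlike Word's per-speaker replace, the `\w+` capture handles every speaker ID in one pass.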

Preparing and importing a transcript into MAXQDA

The video below shows this process. (Please bear with – I need to work more on grasping the process of speaker coding via search and code before I document this further here.)

The paraphrase function in MAXQDA makes it an amazing tool for this process as it’s so good at supporting the move from correction and making notes to coding.

Preparing and importing a transcript into NVivo

With NVivo, the question of whether or not to import synchronised is worth a little more consideration.

Importing is a bit of a pain: the steps to take are more complex, and the extra steps added in order to debug make it harder still. The interface is a bit clunky and it's harder to see, code and work with the transcript.

Preparing an unsynchronised transcript

So… if you're happy with just correcting the transcript in Word and don't really need to engage with the three Ts, then consider sticking to that: just export with speaker names and import into NVivo as a document.

I'd definitely recommend having a second document open (e.g. in a text editor or a notes app like OneNote) as you make your corrections in Word, to record notes and reflections as you work through the transcript. You can then add that as a linked source memo in NVivo to connect those initial notes from correcting.

Preparing a synchronised transcript to import into NVivo- step-by-step:

To make this work effectively you’ll need to convert the text into a table and number the rows – this makes auto-coding for speaker and debugging errors WAY easier. It’s not too hard – honest!

Preparing Transcripts for NVivo

STEPS

  1. Search and replace in Word to get timestamp, speaker name and transcript onto a single row
  2. Check it’s correct
  3. Number the table rows
  4. Close
  5. Import
  6. Use error messages and table row info to debug
  7. Import again
  8. Repeat 6&7 till it works
  9. Listen, Correct and engage with the data through annotating
  10. Write up reflections and insights in a memo
  11. Auto-code by speaker name

The video below takes you through this step-by-step.

NOTES:

00:00:01 SW
Content here

The main search and replace changes the pattern timestamp, space, speaker ID, paragraph mark (^p), text

to: timestamp, tab (^t), speaker ID, tab (^t), text

 00:00:01   SW   Content here 

If the speaker ID was SW it would be

Search for: SW^p

Replace with: ^tSW^t

You’d then insert the column to the left and auto-number.
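The same one-row-per-turn transformation can also be scripted; here's a Python sketch of the pattern above (same assumptions: HH:MM:SS timestamps, single-word speaker IDs):

```python
import re

raw = "00:00:01 SW\nContent here"

# Collapse timestamp, speaker and text onto one tab-separated row for NVivo
row = re.sub(r"^(\d{2}:\d{2}:\d{2}) (\w+)\n",
             r"\1\t\2\t",
             raw, flags=re.MULTILINE)
print(row)  # 00:00:01<tab>SW<tab>Content here
```

You'd still convert the result to a table and auto-number the rows in Word, as debugging failed imports by row number is much easier.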

Importing transcripts into NVivo

Having prepared the transcript you then import it – and, as you'll see, you usually cycle back a few times, so you'll find the steps of doing this in a table and auto-numbering the rows invaluable!

Using NVivo to Correct and Code Transcripts generated automatically by Teams, Stream or Zoom

Introduction

Prerequisites

The following are important prerequisites. You will need:

  1. A media file that is either:
    1. A recording within Microsoft Teams saved to Stream
      OR
    2. A media file you can convert and upload to Microsoft Stream*
      OR
    3. An audio or video recording through an institutionally licensed Zoom account (with subtitling enabled)
      OR
    4. A recording from another system that outputs a subtitle file (that you will then convert to VTT)
  2. Installed version of NVivo – the following is illustrated for R1.
  3. Installation of the free VLC media player

Process

  1. Create a media file with subtitle file in VTT format
  2. Download the media file and the subtitle file
  3. Clean the subtitle file ready for import.
  4. Import the media file into NVivo
  5. Import the cleaned subtitles as a synchronised transcript into NVivo
  6. Listen to the media file and read the synchronised transcript in order to begin analysis through
    • Correcting the transcript
    • Labeling speakers
    • Making notes (annotation)
    • Initial coding the transcript

Step One – Create a media file with subtitle file in VTT format

Depending where you start there are a few ways this will work – all have the same end point: a media file and a VTT transcript. It’s all detailed over in this post.

The introductory video was created with Teams; another was created in Zoom. You can also (currently) upload videos to Stream, or use a wide range of other applications and systems to create an automatic transcript of a media file.

Step Two – Download the media file and the subtitle file

Exporting from Zoom

NOTE: Zoom will (attempt to) label speakers based on their name in the Zoom call – consider whether you need to anonymise this, as it's easily done at this stage.

Step Three – Clean the subtitle file ready for import using an online tool

This is the essential step of development work that bridged the gap from a VTT file to an NVivo-ready file.

Option 1 – Clean the VTT file into NVivo-ready format online

Go to https://www.lancaster.ac.uk/staff/ellist/vtttocaqdas.html

Upload your VTT file, click Convert, and download the text file.

Option 2 – create your own copy of the converter

Go to the GitHub page at https://github.com/TimEllis/vttprocessor
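If you'd rather not rely on the hosted page, the core of the cleaning step can be sketched in a few lines of Python (my own minimal take, not Tim Ellis's code; I'm assuming a simple timestamp-plus-text output and keeping only the first line of each cue, so check the result against what NVivo actually accepts):

```python
import re

def clean_vtt(vtt: str) -> str:
    """Reduce a WebVTT file to one 'HH:MM:SS<TAB>text' line per cue."""
    rows, cue_start = [], None
    for line in vtt.splitlines():
        line = line.strip()
        match = re.match(r"(\d{2}:\d{2}:\d{2})[.,]\d{3} --> ", line)
        if match:
            cue_start = match.group(1)   # keep only the cue's start time
        elif line and cue_start:
            rows.append(f"{cue_start}\t{line}")
            cue_start = None             # ignore continuation and blank lines
    return "\n".join(rows)

sample = "WEBVTT\n\n00:00:01.000 --> 00:00:04.000\nHello there\n"
print(clean_vtt(sample))  # 00:00:01<tab>Hello there
```

The `WEBVTT` header and any cue numbers fall out naturally, since only lines following a timestamp cue are kept.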

Step Four – Import the media file into NVivo

First, there's a STRONGLY recommended preparatory step for NVivo, which adds a column for labelling the speaker in a synchronised transcript. It's a bit hidden in the documentation, though: under the Audio/Video section in Changing application options.

I would recommend:

  1. Set a skip-back on play in transcribe mode of 1 second (this means the audio skips back when correcting, so you resume where you were), AND
  2. Add a Custom Transcript Field for speaker.

NVivo Windows

NVivo Release 1 for windows transcript import is documented at https://help-nv.qsrinternational.com/20/win/Content/files/audio-and-videos.htm

(Unchanged process, but slight interface changes from the v12 instructions available here.)

Note that it is likely you’ll need to install a codec pack for any video files.

NVivo Mac

NVivo Release 1 for Mac audio and media importing is documented here: https://help-nv.qsrinternational.com/20/mac/Content/files/audio-and-videos.htm

(Unchanged process but slight interface changes compared with the NVivo 12 notes on audio and video files here)

It’s usually pretty straightforward – if the media will play in Quicktime it will play in NVivo.

Step Five – Import the cleaned subtitles as a synchronised transcript

NVivo Windows

NVivo Release 1 for windows transcript import is documented at https://help-nv.qsrinternational.com/20/win/Content/files/import-audio-video-transcripts.htm

(Unchanged process, but slight interface changes from the v12 instructions available here.)

NVivo Mac

NVivo Release 1 for Mac transcript import is documented here https://help-nv.qsrinternational.com/20/mac/Content/files/import-audio-video-transcripts.htm

Step Six – Listen to the media and correct the transcript (and begin initial analysis steps)

So this is where it all pays off!

This process allows you to use the powerful tools within NVivo to play back the audio/video (including slowing playback speed, adjusting volume and setting rewind intervals when you press play/pause, plus keyboard shortcuts for the play/pause functions) whilst you read the transcript and make corrections. But not only corrections! You can also annotate the transcript, label speakers and even start coding at this stage.

Resources

Example project file NVivo R1 (Windows) here

Example project NVivo R1 (Mac) here

Example media file and VTT file from the first video also available here.

The blog bit – background, next steps, context

So this has been a real focus for me recently. I’ve had a lot of help and encouragement – see acknowledgements below – but also NEED from students and groups who are wondering how to do transcription better.

I also think this really gives the lie to the idea that manual transcription is “the best way” to get in touch with audio. I’m kind of hoping that the sudden shifts the pandemic has caused in practice and process might lead to some developments and rethinking of analysis. This quote has been too true for too long:

Over the past 50 years the habitual nature of our research practice has obscured serious attention to the precise nature of the devices used by social scientists (Platt 2002, Lee 2004). For qualitative researchers the tape-recorder became the prime professional instrument intrinsically connected to capturing human voices on tape in the context of interviews. David Silverman argues that the reliance on these techniques has limited the sociological imagination: “Qualitative researchers’ almost Pavlovian tendency to identify research design with interviews has blinkered them to the possible gains of other kinds of data” (Silverman 2007: 42). The strength of this impulse is widely evident from the methodological design of undergraduate dissertations to multimillion pound research grant applications. The result is a kind of inertia, as Roger Stack argues: “It would appear that after the invention of the tape-recorder, much of sociology took a deep sigh, sank back into the chair and decided to think very little about the potential of technology for the practical work of doing sociology” (Slack 1998: 1.10).

Back L. (2010) Broken Devices and New Opportunities: Re-imagining the tools of Qualitative Research. ESRC National Centre for Research Methods

Citing:
Lee, R. M. (2004) ‘Recording Technologies and the Interview in Sociology, 1920-2000’, Sociology, 38(5): 869-899
Platt, J. (2002) ‘The History of the Interview,’ in J. F. Gubrium and J. A. Holstein (eds) Handbook of the Interview Research: Context and Method, Thousand Oaks, CA: Sage pp. 35-54.
Silverman D. (2007) A very short, fairly interesting and reasonably cheap book about qualitative research, Los Angeles, Calif.: SAGE.
Slack R. (1998) On the Potentialities and Problems of a www based naturalistic Sociology. Sociological Research Online 3. http://socresonline.org.uk/3/2/3.html

Various additional links and notes:

How and when Stream will be changing https://docs.microsoft.com/en-gb/stream/streamnew/new-stream

Bits about Zoom needing transcripts switched on and how to do this (i.e. send this link to your institutional Zoom administrator: https://support.zoom.us/hc/en-us/articles/115004794983-Using-audio-transcription-for-cloud-recordings- )

A cool free online tool for converting other transcript formats (e.g. from EStream, Panopto or other systems) https://subtitletools.com/

And finally for more information on the VTT format see this excellent page.

Thanks and acknowledgements

This hasn’t happened alone. Many thanks to Tim Ellis especially for his work on the VTT cleaner and sharing it via GitHub.

If you’ve got suggestions, ideas, updates, developments or found this useful please post a comment, link to this or build on it.

Using MAXQDA to Correct and Code Automatically generated Transcripts from Teams or Zoom

Update – May 2021

MAXQDA now supports direct subtitle file (SRT) import

https://www.maxqda.com/help-mx20/import/subtitle-data-srt

You can easily convert VTT files to SRT using https://subtitletools.com/convert-to-srt-online
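The conversion itself is mechanical – SRT is essentially VTT without the header, with numbered cues and comma decimal separators in the timestamps – so if the online tool ever disappears, here's a rough Python sketch (it handles plain cues only, not VTT styling or settings):

```python
import re

def vtt_to_srt(vtt: str) -> str:
    """Very rough VTT -> SRT: drop header blocks, number cues, swap '.' for ','."""
    cues = [b for b in re.split(r"\n\s*\n", vtt.strip()) if "-->" in b]
    out = []
    for number, cue in enumerate(cues, start=1):
        cue = re.sub(r"(\d{2}:\d{2})\.(\d{3})", r"\1,\2", cue)  # timestamp commas
        out.append(f"{number}\n{cue}")
    return "\n\n".join(out) + "\n"

sample = "WEBVTT\n\n00:00:01.000 --> 00:00:04.000\nHello\n"
print(vtt_to_srt(sample))
```

Filtering blocks on `-->` is what discards the `WEBVTT` header while keeping every timed cue.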

MAXDAYS presentation

Here’s my presentation and session from the (excellent) MAXDAYS conference in 2021:

Introduction:

Prerequisites

The following are important prerequisites. You will need:

  1. A media file that is either:
    1. A recording within Microsoft Teams saved to Stream
      OR
    2. A media file you can convert and upload to Microsoft Stream*
      OR
    3. An audio or video recording through an institutionally licensed Zoom account (with subtitling enabled)
      OR
    4. A recording from another system that outputs a subtitle file (that you will then convert to VTT)
  2. Installed version of MAXQDA (this is all illustrated and linked to the 2020 Pro edition – with a common experience and manual across Windows and Mac).
  3. Installation of the free VLC media player

Process

Steps four and five below are reversed from the main sequence with MAXQDA, as it picks up the timestamps in a file and then asks you to import the media (you can do it the other way around, but it's a little more efficient this way).

  1. Create a media file with subtitle file in VTT format
  2. Download the media file and the subtitle file
  3. Clean the subtitle file ready for import.
  4. Import the subtitle file into MAXQDA
  5. Import the associated media file
  6. Listen to the media file and read the synchronised transcript in order to begin analysis through
    • Correcting the transcript
    • Labeling speakers
    • Making notes (annotation)
    • Initial coding the transcript

Step One – Create a media file with subtitle file in VTT format

Depending where you start there are a few ways this will work – all have the same end point: a media file and a VTT transcript. It’s all detailed over in this post.

The introductory video was created with Teams; another was created in Zoom. You can also (currently) upload videos to Stream, or use a wide range of other applications and systems to create an automatic transcript of a media file.

Step Two – Download the media file and the subtitle file

Step Three – Clean the subtitle file ready for import using an online tool

This area is developing quite rapidly – the following steps will give you a MAXQDA ready file.

Option 1 – Clean the VTT file into CAQDAS ready format online

Go to https://www.lancaster.ac.uk/staff/ellist/vtttocaqdas.html

Upload your VTT file, click Convert, and download the text file.

Option 2 – create your own copy of the converter

Go to the GitHub page at https://github.com/TimEllis/vttprocessor

Step Four – Import the cleaned subtitles as a synchronised transcript

Documented at https://www.maxqda.com/help-mx20/import/transcripts-with-timestamps

Step Five – Import the media file into MAXQDA

Documented at https://www.maxqda.com/help-mx20/import/inserting-audio-and-video-files-in-a-maxqda-project

Step Six – Listen to the media and correct the transcript (and begin initial analysis steps)

So this is where it all pays off!

This process allows you to use the powerful tools within MAXQDA to play back the audio/video (including slowing playback speed, adjusting volume and setting rewind intervals when you press play/pause, plus keyboard shortcuts for the play/pause functions) whilst you read the transcript and make corrections. But not only corrections! You can also annotate the transcript and even start coding at this stage.

An AMAZING tool for this process in MAXQDA is the "Summarise" function to use when correcting the transcript – it allows you to annotate and make notes, but also to use those notes to create candidate codes. It's a really nice tool for this process.

Resources

The example MAXQDA 2020 Project file from the above is available here.

The additional files (recordings and VTT files) can be downloaded from here if you want to have a play at cleaning, importing and correcting them.


Using ATLAS.ti to Correct and Code Automatically generated Transcripts from Teams or Zoom

This post takes you through the process of automatically creating a full written transcript for an audio or video file and importing it into ATLAS.ti to correct and code.

There is now excellent documentation of this in the online manual for ATLAS.ti Mac and the online manual for Windows. So this post focuses more on getting those transcripts out of other platforms, and on opportunities for analysis once they are imported into ATLAS.ti.

ATLAS.ti has led the way in making this really easy and has cut out a step, as it will clean the subtitle file on import.

UPDATES:

ATLAS.ti has great documentation of update changes here.
March 2021 – VTT and SRT import supported in Windows.

December 2020 – VTT and SRT import added in Mac

See this post for a cross-platform text step-by-step with lots of links to how to’s and documentation etc.

Prerequisites

The following are important prerequisites. You will need:

  1. A media file that is either:
    1. A recording within Microsoft Teams saved to Stream
      OR
    2. A media file you can convert and upload to Microsoft Stream*
      OR
    3. An audio or video recording through an institutionally licensed Zoom account (with subtitling enabled)
      OR
    4. A recording from another system that outputs a subtitle file (that you will then convert to VTT)
  2. Installed version of ATLAS.ti v9
  3. Installation of the free VLC media player

Process

  1. Create a media file with subtitle file in VTT format
  2. Download the media file and the subtitle file
  3. Clean the subtitle file ready for import.
  4. Import the media file into ATLAS.ti
  5. Import the cleaned subtitles as a synchronised transcript
  6. Listen to the media file and read the synchronised transcript in order to begin analysis through
    • Correcting the transcript
    • Labeling speakers
    • Making notes (annotation)
    • Initial coding of the transcript

Step One – Create a media file with subtitle file in VTT format

Depending on where you start there are a few ways this will work – all have the same end point: a media file and a VTT transcript. It’s all detailed over in this post.

The introductory video was created with Teams, another was created in Zoom. You can also (currently) upload videos to Stream or use a wide range of other applications and systems to create an automatic transcript of a media file.

Step Two – Download the media file and the subtitle file

Here’s a copy of the interview video you can download and a VTT file if you want to try it:

Interview with Friedrich Markgraf (mp4 42Mb)

VTT file from Stream.

Step Three – Clean the subtitle file ready for import using an online tool

This step is no longer needed in ATLAS.ti as native support for VTT import is now enabled.

Option 1 – Clean the VTT file into CAQDAS ready format online

Go to https://www.lancaster.ac.uk/staff/ellist/vtttocaqdas.html

Upload your VTT file, click Convert, then download the text file.

Option 2 – create your own copy of the converter

Go to the GitHub page at https://github.com/TimEllis/vttprocessor

Step Four – Import the media file into your CAQDAS package

This varies a little between packages. The previous difference between platforms no longer applies – you can now edit timestamps in both Mac and Windows, though as these are auto-generated you shouldn’t need to.

ATLAS.ti 9 Windows

It’s now well documented in the online manual for Windows.

There is information on page 11 of the manual and details here about the Windows-supported media formats used by ATLAS.ti.

Details of adding documents to a project are in the online quick tour documentation here and in the manual on page 24. Details about working with transcripts are on page 10.

ATLAS.ti 9 Mac

Working with transcripts is covered in the online manual for ATLAS.ti Mac.

Adding documents to ATLAS.ti for Mac is in the online quick tour here

There is further information in the online manual for ATLAS.ti Mac about transcript formats on page 48, about adding media files on page 51. There is also extensive information about working with transcripts on pages 52-54.

Step Five – Import the cleaned subtitles as a synchronised transcript

ATLAS.ti 9 Windows

There is now excellent documentation of this process in the online manual, with much improved information on editing transcripts in the 90-page “quick (?! :-O ) tour” manual (pages 18-19).

ATLAS.ti 9 Mac

There is excellent information in the online manual for ATLAS.ti Mac about transcript formats on page 48, about adding media files on page 51. There is also extensive information about working with transcripts on pages 52-54 – again there is at present no information on editing the transcript and correcting it – so here’s a video:

Step Six – Listen to the media and correct the transcript (and begin initial analysis steps)

So this is where it all pays off!

This process allows you to use the powerful tools within the CAQDAS package to play back the audio/video (including slowing playback speed, adjusting volume and setting rewind intervals when you press play/pause, plus keyboard shortcuts for the play/pause functions) whilst you read the transcript and make corrections. But not only corrections! You can also annotate the transcript and even start coding at this stage.

ATLAS.ti 9 Windows

ATLAS.ti Mac

Resources

Here’s the ATLAS.ti file (89Mb) with one corrected plus focus group coded transcript and several uncorrected transcripts from the videos above if you want to have a look / play.

The blog bit – background, next steps, context

So this has been a real focus for me recently. I’ve had a lot of help and encouragement – see acknowledgements below – but also NEED from students and groups who are wondering how to do transcription better.

I’ve REALLY liked working with this in ATLAS.ti 9 – the way that you can integrate annotation and auto-coding via the focus group coding tool into the transcription process is key.

I also think it really gives the lie to the idea that manual transcription is “the best way” to get in touch with audio. I’m kind of hoping that the sudden shifts the pandemic has caused in practice and process might lead to some developments and rethinking of analysis. This quote has been too true for too long:

Over the past 50 years the habitual nature of our research practice has obscured serious attention to the precise nature of the devices used by social scientists (Platt 2002, Lee 2004). For qualitative researchers the tape-recorder became the prime professional instrument intrinsically connected to capturing human voices on tape in the context of interviews. David Silverman argues that the reliance on these techniques has limited the sociological imagination: “Qualitative researchers’ almost Pavlovian tendency to identify research design with interviews has blinkered them to the possible gains of other kinds of data” (Silverman 2007: 42). The strength of this impulse is widely evident from the methodological design of undergraduate dissertations to multimillion pound research grant applications. The result is a kind of inertia, as Roger Slack argues: “It would appear that after the invention of the tape-recorder, much of sociology took a deep sigh, sank back into the chair and decided to think very little about the potential of technology for the practical work of doing sociology” (Slack 1998: 1.10).

Back L. (2010) Broken Devices and New Opportunities: Re-imagining the tools of Qualitative Research. ESRC National Centre for Research Methods

Citing:
Lee, R. M. (2004) ‘Recording Technologies and the Interview in Sociology, 1920-2000’, Sociology, 38(5): 869-899
Platt, J. (2002) ‘The History of the Interview,’ in J. F. Gubrium and J. A. Holstein (eds) Handbook of Interview Research: Context and Method, Thousand Oaks, CA: Sage, pp. 35-54.
Silverman D. (2007) A very short, fairly interesting and reasonably cheap book about qualitative research, Los Angeles, Calif.: SAGE.
Slack R. (1998) On the Potentialities and Problems of a www based naturalistic Sociology. Sociological Research Online 3. http://socresonline.org.uk/3/2/3.html

Various additional links and notes:

How and when Stream will be changing https://docs.microsoft.com/en-gb/stream/streamnew/new-stream

Bits about Zoom needing transcripts switched on and how to do this (i.e. send this link to your institutional Zoom administrator: https://support.zoom.us/hc/en-us/articles/115004794983-Using-audio-transcription-for-cloud-recordings- )

A cool free online tool for converting other transcript formats (e.g. from EStream, Panopto or other systems) https://subtitletools.com/

And finally for more information on the VTT format see this excellent page.

Thanks and acknowledgements

This hasn’t happened alone. Throughout this Friedrich Markgraf has been incredibly accommodating – and giving his time for the demo interview was just a part of that. Thanks definitely due for all his excellent, encouraging and very helpful input via Twitter – and for working on the great new features for direct import into ATLAS.ti.

Many thanks to Tim Ellis especially for his work on the VTT cleaner and sharing it via GitHub.

And to Amir Michalovich for his enthusiasm and sharing some Excel tricks, and of course Christina Silver for her draft reading, promoting and general enthusiasm, encouragement and suggestions. And also to Sandra Flynn and her great blog post about the trials and tribulations of a PhD student working with NVivo, which really helped me realise that time spent on this stuff can have an impact and a value.

If you’ve got suggestions, ideas, updates, developments or found this useful please post a comment, link to this or build on it.

Featured

Auto-Creating, Correcting and Coding Transcripts from Microsoft Teams or Zoom in CAQDAS Software (ATLAS.ti, NVivo or MAXQDA)

COVID-19 has had a HUGE impact on qualitative and mixed-methods research processes. A key change I’ve seen and heard about with the PhD candidates and research teams I support is a shift to interviewing via MS Teams or Zoom. And this has prompted more than one person to ponder: “surely if I can automatically create subtitles I must be able to use that for analysis – can’t I?” Well yes – you now can 🙂

NOTES:

This page is text-heavy, there are then additional pages with sequences of video demos.

There will also be changes to the process and software – I’ll note these and work to keep the page up to date as there are exciting developments coming in this area.

Now – I really dislike those cookery blogs where this bit would continue for several pages about who those people were and what they said etc. when all you wanted was the recipe. So I’m going to cut straight to the details, then come back to some of the context and next steps after that. 🙂

Video resources

Step-by-step for ATLAS.ti (with video demonstrations and example files)

Step-by-step for MAXQDA (with video demonstrations and example files)

Step-by-step for NVivo (with video demonstrations and example files)

Getting yourself free transcripts to correct and code in ATLAS.ti, NVivo or MAXQDA

This post takes you through the process of automatically creating a full written transcript for an audio or video file and importing it into CAQDAS software to correct and code.

The audio/video could start from Teams or Zoom – or you could have it from another audio or video recorder.

Prerequisites

The following are important prerequisites. You will need:

  1. A media file that is either:
    1. A recording within Microsoft Teams saved to Stream
      OR
    2. A media file you can convert and upload to Microsoft Stream*
      OR
    3. An audio or video recording through an institutionally licensed Zoom account (with subtitling enabled)
      OR
    4. A recording from another system that outputs a subtitle file (that you will then convert to VTT)
  2. Installed version of ATLAS.ti v9 or NVivo or MAXQDA
  3. Installation of the free VLC media player

Process

  1. Create a media file with subtitle file in VTT format
  2. Download the media file and the subtitle file
  3. Clean the subtitle file ready for import.
  4. Import the media file into your CAQDAS package (ATLAS.ti, NVivo, MAXQDA)
  5. Import the cleaned subtitles as a synchronised transcript in your CAQDAS package
  6. Listen to the media file and read the synchronised transcript in order to begin analysis through
    • Correcting the transcript
    • Labeling speakers
    • Making notes (annotation)
    • Initial coding of the transcript

Each step is documented below with descriptions and illustrative videos.

I’m hearing exciting rumours that ATLAS.ti will very soon support other formats for subtitle files so steps 3 and 4 will be integrated.

Step One – Create a media file with subtitle file in VTT format

Depending on where you start there are a few ways this will work – all have the same end point: a media file and a VTT transcript. There are other routes but these are the main ones.

1a A recording within Microsoft Teams saved to Stream and auto captioned.

Currently, if you’re using MS Teams through an institutional installation, then when you record a meeting it is added to Stream.

This post from Microsoft takes you through the process of call recording in Teams – and also notes the changes coming in 2021 to Stream.

You will then need to access your institution’s Microsoft stream server and login and locate your video. There’s support about that from Microsoft here.

This post from Microsoft then takes you through the process of autocaptioning your recording(s)

Note: This is changing in 2021, with educational institutions delayed till July. It’s not entirely clear what will happen, and it sounds like there are some live discussions with Microsoft over required features. The current expectation is that when Teams recordings move over to being added to OneDrive, a VTT file will be created and uploaded as well – a process that sounds similar to the one with Zoom calls outlined below, but managed via your institutional OneDrive.

1b Upload a file to Microsoft Stream for auto-captioning.

Another option (at the moment at least – though probably only till July 2021 for HE institutions) is to upload a recording from another source to Stream for auto-captioning. To do this you need to upload a video file.

The good news is it’s easy to convert an audio file (or a video) to a stream-compatible video using the free VLC media player (many institutions will make this available on the network or via AppsAnywhere.)

So you’d find your audio or video file and follow guidance here to convert it to a video.

Then you’d upload the video to Stream – detailed here.

(Note: if you need to convert or downsample any videos in step 4 you’ll need to follow the same process)

1c A media file and VTT file from Zoom

Zoom can create captions/transcripts as VTT files – see further details here.

NOTE: you will need to have a Business, Education, or Enterprise license with cloud recording enabled and account owner or admin privileges or to request those from the account admin.

Start your meeting and record to the cloud in the usual way using Zoom (e.g. start the meeting, discuss ethics etc., then start recording when you say you are, record the consent segment and any questions before starting, end that recording and start a second one for the content etc.)

When you’ve finished the session and the recording has been processed you’ll receive an email with a link so you can download the video or audio and (in due course) the transcript.

The transcription can take a little while – initially you’ll see this, then it will show the transcript to download (so an excuse for one of those slider image compare things 🙂 ):

Once the transcript is completed you can download that file as a VTT. You’re then set for step 2.

1d A recording from another system

There are many other systems that create subtitle files from recordings – for example eStream or Panopto are widely used in higher education and research institutions. There are also a few hacks to download subtitles from YouTube.

If your system creates a different format of subtitle (e.g. SRT) then you can use an online converter such as Subtitle Tools to convert to VTT. Some CAQDAS software looks set to support direct SRT import soon – watch this space!
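To see how small the difference between the two formats is, here is a minimal Python sketch of an SRT-to-VTT conversion. This is illustrative only – an online converter such as Subtitle Tools handles the many edge cases this ignores, and the function name is my own:

```python
import re

def srt_to_vtt(srt: str) -> str:
    """Minimal SRT -> VTT conversion sketch.

    The two formats are nearly identical: VTT adds a WEBVTT header
    and uses a dot rather than a comma as the decimal separator in
    cue timestamps (00:00:01,000 becomes 00:00:01.000).
    """
    body = re.sub(r"(\d\d:\d\d:\d\d),(\d{3})", r"\1.\2", srt)
    return "WEBVTT\n\n" + body
```

Real-world files can also contain styling tags, byte-order marks and multi-line cues, which is why a maintained converter is the safer route.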

What you need is a media file and a VTT file with auto-generated captions that have the correct timestamps.
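For reference, a VTT file is just plain text: a WEBVTT header followed by timed cues. The snippet below is an invented example of the shape auto-captioning services produce:

```
WEBVTT

00:00:01.000 --> 00:00:04.200
so thanks for joining me today

00:00:04.200 --> 00:00:07.500
could you start by introducing yourself
```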

Step Two – Download the media file and the subtitle file

This bit is subject to change so for now here are links to other resources plus video demonstrations:

1a and 1b – Downloading media and transcript from Stream

First you need to update the video details to set the language to English so a transcript is generated.

See step by step from Microsoft here which details how to update video details and language to generate a subtitle file.

Second you need to download the video and then transcript – see screenshots here.

Both of these are from the … menu:

First download the video, then click to Update video details. On the update screen that then displays you’ll see 3 panes, i.e. Details, Permissions and Options. From the Options pane on the right, you can download the captions file, as shown below:

1c From Zoom

This was covered above – you also get an email from Zoom when the transcript is done. Then download the video/audio and then the transcript. Make sure you take some care with file names and which transcript file is for which video/audio.

Step Three – Clean the subtitle file ready for import using an online tool

There is an increasing range of options here: either the software will do it (ATLAS.ti now imports VTT or SRT directly on Mac and PC; MAXQDA are reportedly looking into this), or use the online tool my colleague at Lancaster, Tim Ellis, developed.

Background: Tim created a simple VTT cleanup tool to help support moving transcripts from MS Stream to eStream for teaching and accessibility purposes. He then did some great additional development based on my look at transcript requirements across CAQDAS packages. The updated page is a VTT cleaner that keeps the initial timestamp of each caption followed by its text, in a text file that can be imported into ATLAS.ti, NVivo or MAXQDA. He’s put it online for anyone to use, and the code on GitHub if you need to run it locally.

So you can go for option 1 – use his tool online (no data is saved – it is just a converter). Or, if you must do this on your own computer or network for ethics compliance reasons, you can download the code and styles from GitHub, put them on your computer and clean your own transcripts (option 2). And if you’ve got ideas on how to improve it (e.g. removing notes?) then you can do that via GitHub.
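If you’re curious what the cleaning step amounts to, here is a minimal Python sketch of the same idea: keep one [hh:mm:ss] timestamp per caption followed by its text, and drop everything else. This is NOT Tim Ellis’s code (use his tool for real work); the function name and exact output format are my own illustration:

```python
import re

def vtt_to_caqdas_text(vtt: str) -> str:
    """Reduce a WEBVTT file to timestamped plain text, e.g.

        [00:00:01]
        Hello there.

    which is the kind of layout CAQDAS packages can read as a
    synchronised transcript. Illustrative sketch only.
    """
    out = []
    in_note = False
    for line in vtt.splitlines():
        line = line.strip()
        if line.startswith("NOTE"):  # NOTE blocks run to the next blank line
            in_note = True
            continue
        if not line:
            in_note = False
            continue
        if in_note or line == "WEBVTT":
            continue
        m = re.match(r"(\d\d:\d\d:\d\d)\.\d{3} --> ", line)
        if m:
            out.append(f"[{m.group(1)}]")    # keep only the cue start time
        elif "-->" not in line and not line.isdigit():
            out.append(line)                 # caption text (skip cue numbers)
    return "\n".join(out)
```

A sketch like this assumes well-formed hh:mm:ss timestamps; the real cleaner is more robust.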

Option 1 – Clean the VTT file into CAQDAS ready format online

Go to https://www.lancaster.ac.uk/staff/ellist/vtttocaqdas.html

Upload your VTT file, click Convert, then download the text file.

Option 2 – create your own copy of the converter (e.g. if required by REC)

Go to the GitHub page at https://github.com/TimEllis/vttprocessor

Grab the html file and the css file.

Save them to your computer (or a network location) in the same folder.

Double click the vtttocaqdas.html file to open it in a browser.

Use it to convert the files as above.

NOTES:

Yes, notes indeed. Note that any NOTES / comments created in the VTT file won’t be cleaned up by this script, so you might want to do a quick search for NOTES and remove any such lines. These can include notes about the confidence of transcription.
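That search-and-delete can also be scripted. A small filter along these lines (my own illustrative snippet, not part of the cleaner) drops any line that starts with NOTE:

```python
def strip_note_lines(text: str) -> str:
    """Remove lines beginning with the WebVTT NOTE keyword, which
    auto-captioning services use for metadata such as transcription
    confidence. Illustrative sketch - check the result by eye, since
    genuine caption text could in principle also start with NOTE."""
    kept = [ln for ln in text.splitlines()
            if not ln.lstrip().startswith("NOTE")]
    return "\n".join(kept)
```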

Step Four – Import the media file into your CAQDAS package

This varies a little between packages.

ATLAS.ti 9 Windows

There is information on page 11 of the manual and details here about the Windows-supported media formats used by ATLAS.ti.

Details of adding documents to a project are in the online quick tour documentation here and in the manual on page 24. Details about working with transcripts are on page 10.

ATLAS.ti 9 Mac

Adding documents to ATLAS.ti for Mac is in the online quick tour here

There is further information in the online manual for ATLAS.ti Mac about transcript formats on page 48, about adding media files on page 51. There is also extensive information about working with transcripts on pages 52-54.

NVivo Windows

NVivo Release 1 for windows transcript import is documented at https://help-nv.qsrinternational.com/20/win/Content/files/audio-and-videos.htm

(The process is unchanged, with slight interface changes, from the v12 instructions available here.)

Note that it is likely you’ll need to install a codec pack for any video files.

NVivo Mac

NVivo Release 1 for Mac audio and media importing is documented here https://help-nv.qsrinternational.com/20/mac/Content/files/audio-and-videos.htm

(Unchanged process but slight interface changes compared with the NVivo 12 notes on audio and video files here)

It’s usually pretty straightforward – if the media will play in Quicktime it will play in NVivo.

MAXQDA (Win and Mac)

Documented at https://www.maxqda.com/help-mx20/import/inserting-audio-and-video-files-in-a-maxqda-project

Step Five – Import the cleaned subtitles as a synchronised transcript

ATLAS.ti 9 Windows

There is relatively sparse information in the manual – working with transcripts is on page 10 – and currently nothing about editing/updating a transcript to correct it within ATLAS.ti, which is a key new opportunity in version 9. So here’s a video instead (and I’ll share the VTT file too so you can practice!)

ATLAS.ti 9 Mac

There is further information in the online manual for ATLAS.ti Mac about transcript formats on page 48, about adding media files on page 51. There is also extensive information about working with transcripts on pages 52-54 – again there is at present no information on editing the transcript and correcting it – so here’s a video:

NVivo Windows

NVivo Release 1 for windows transcript import is documented at https://help-nv.qsrinternational.com/20/win/Content/files/import-audio-video-transcripts.htm

(The process is unchanged, with slight interface changes, from the v12 instructions available here.)

NVivo Mac

NVivo Release 1 for Mac transcript import is documented here https://help-nv.qsrinternational.com/20/mac/Content/files/import-audio-video-transcripts.htm

MAXQDA (Win and Mac)

Documented at https://www.maxqda.com/help-mx20/import/transcripts-with-timestamps

Step Six – Listen to the media and correct the transcript (and begin initial analysis steps)

So this is where it all pays off!

This process allows you to use the powerful tools within the CAQDAS package to play back the audio/video (including slowing playback speed, adjusting volume and setting rewind intervals when you press play/pause, plus keyboard shortcuts for the play/pause functions) whilst you read the transcript and make corrections. But not only corrections! You can also annotate the transcript and even start coding at this stage.

The blog bit – background, next steps, context

Various additional links and notes:

How and when Stream will be changing https://docs.microsoft.com/en-gb/stream/streamnew/new-stream

Bits about Zoom needing transcripts switched on and how to do this (i.e. send this link to your institutional Zoom administrator: https://support.zoom.us/hc/en-us/articles/115004794983-Using-audio-transcription-for-cloud-recordings- )

A cool free online tool for converting other transcript formats (e.g. from EStream, Panopto or other systems) https://subtitletools.com/

And finally for more information on the VTT format see this excellent page.

Thanks and acknowledgements

This hasn’t happened alone. SO huge thanks to Tim Ellis especially for his work on the VTT cleaner and sharing it via GitHub.

Also to Friedrich Markgraf for some excellent, encouraging and very helpful conversations via Twitter.

And to Amir Michalovich for his enthusiasm and sharing some Excel tricks, and of course Christina Silver for her draft reading, promoting and general enthusiasm, encouragement and suggestions. And also to Sandra Flynn and her great blog post about the trials and tribulations of a PhD student working with NVivo.

Working with Arabic in NVivo (as well as Hebrew, Urdu, Persian and other Right-to-Left Scripts)

This blog is in four key parts:

  1. The background of this investigation including links to the diagnosis, data and existing information on the limitations of NVivo with Right-to-Left scripts.
  2. A detailed explanation and illustration of how Arabic and other right to left scripts are rendered in NVivo.
  3. Proposed workarounds and alternative software products including their benefits and potential limitations.
  4. Next steps and updates

Background

I recently had the amazing opportunity to work with the Palestinian Central Bureau of Statistics in Ramallah to provide technical consultancy and capacity building in qualitative research methods. This was through working with CLODE Consultants, a UAE-based business specialising in statistics and the use and management of data. CLODE Consultants operates in both Arabic and English, providing worldwide training, research, and consultancy services. I am working as a consultant with CLODE Consultants to provide expertise on qualitative and mixed-methods approaches in order to meet the growing needs of customers for those approaches in this data-driven age.

The PCBS approached us to provide technical consultancy in using NVivo as the market-leading product. They had engaged with the built-in projects and excellent YouTube videos and identified it as having the features required to increase their engagement with qualitative and mixed-methods approaches to inform and enhance statistical analyses.

However, through working to develop materials and workshops I rapidly encountered hard limits with working with NVivo and Arabic text, combined with a relative lack of clear documentation or explanation of the limits or workarounds.

NVivo say that:

NVivo may not operate as expected when attempting to use right to left languages such as Arabic. We recommend you download and work with your data in our NVivo free trial Software first.

Searching online forums identified some cursory information interspersed with promotional puff on ResearchGate, a proposed workaround to use images or region coding on PDFs on the NVivo forums, pleas for improvements in this area dating back to 2010 on the NVivo feature request forum, and the most comprehensive response in the QDA Training forum by Ben Meehan.

So I was left to do some experimentation myself and then to work with staff at PCBS who could read Arabic to explore what the limits are and how they affect research.

Example data:

Whilst I would normally steer WELL away from such a politically sensitive topic or text, in this case as example data I am drawing on the interview in June 2018 between Jared Kushner and Walid Abu-Zalaf, Editor of the Al Quds newspaper. I STRONGLY emphasise this is NOT because of the subject matter nor in any way agreement with, support of or condonation of the content (in fact I find the person pictured and the politics he represents really repulsive) – it was selected purely for practical purposes: it is freely available and includes a full English translation. The text – both Arabic and English – is available from http://www.alquds.com/articles/1529795861841079700/

The text was copied and pasted into a Word document and formatted with the “Traditional Arabic” font, with minimal clearing up of opening links etc.

Arabic text Word file available here.

Additionally the page was printed as a PDF – available here – and converted to a PDF via https://webpagetopdf.com/ as well; the resulting PDF is available here.

Finally it was captured both as article-as-PDF and as page-as-PDF via NCapture, creating two .nvcx files (linked).

Computer System Setup:

I added Arabic (Jordan) as a language pack following information from Microsoft about adding languages. (Previously, without the language pack installed, the computer rendered Arabic script in western fonts (e.g. Times New Roman), which slightly reduced legibility and affected rendering.)

Working with NVivo and Arabic Script

NVivo works strictly left-to-right. This has serious implications when importing Arabic, Hebrew, Urdu, Persian or other Right-to-Left scripts as data.

If we look at the Word document in Word – the text copied from the web and pasted in – it appears like this:

[Screenshot] Arabic text copied and pasted into a Word file (available here). When text is selected it selects right-to-left. Font set to Traditional Arabic.

When imported into NVivo substantial changes are made through the import process:

[Screenshot] The Word document imported into NVivo and converted – the text now flows left-to-right and is relatively illegible. Selection now works left-to-right.

A number of serious issues follow. Firstly, the text is now VERY hard to read. Secondly, while you can edit the document to make the text right-aligned so it appears better, the reading and selecting direction remain unaffected.

Thirdly, and most seriously – you cannot select, and therefore cannot search for, code or annotate, the start of paragraphs:

[Screenshot] NVivo text selection limitations for a Word doc in Arabic.

The workaround would then seem to be PDFs – while accepting the limitations of those in NVivo, e.g. you cannot auto-code by speaker or using document structure.

However, the selection issues remain: importing web pages as PDFs via NCapture produces similarly odd results, apparently OK until you try to select content:

[Screenshot] NCapture page capture.

As you can see selecting (and therefore coding text) is all over the place.

The article-as-PDF version fares best; however, selection still runs left-to-right:

[Screenshot] NCapture article-as-PDF produces the best version but still has incorrect text flow.

The print as PDF and convert to PDF versions also had substantial issues with text selection – showing it isn’t just NVivo and NCapture that struggle here.

Effects on queries

There are then a series of oddities that result. Copying and pasting the text بأنهم ي and running a text search does work but gives odd results when there should be four identical copies of the same text:

[Screenshot] Text search results – note the different number of references per file for the “same” content!

Furthermore when you look into the results they seem not to be the actual text searched for:

[Screenshot] Text search results – not matching the input string?

At this point I must point out that I do not speak nor read Arabic so what remains is what I have been told about query results.
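One possible contributor to these mismatches – offered as a hedged guess about Unicode handling in general, not a diagnosis of NVivo’s internals – is that Arabic text can be stored either as standard letters or as visually identical “presentation form” code points, which naive string matching treats as different:

```python
import unicodedata

# The lam-alef ligature can be stored as a single presentation-form
# code point or as the two standard letters lam + alef.
ligature = "\uFEFB"       # ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM
letters = "\u0644\u0627"  # ARABIC LETTER LAM, ARABIC LETTER ALEF

# Visually identical, but compared as raw strings they differ:
print(ligature == letters)  # False

# Compatibility normalisation (NFKC) maps both to the same sequence,
# so text that "looks the same" can match or miss depending on whether
# a tool normalises before searching.
print(unicodedata.normalize("NFKC", ligature) == letters)  # True
```

If nothing else, this is worth checking whenever search counts differ across files containing the “same” Arabic string.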

Word frequencies appear to work. As this was bilingual, I had to spend a VERY frustrating period of time trying to select just the Arabic text in the PDFs without selecting English as well, and then coding it with a node for “Script-arabic” to scope the word frequency query to that node. Here are the results – pretty, but I also think pretty useless:

[Screenshot] Pretty – but pretty useless – word cloud output?

You can then double-click a word in the cloud and view a text search – however the results are as problematic in legibility as those identified above.

If you do select and code Arabic text, then when you run a coding query and look at the results, the staff I worked with at PCBS told me the results were illegible – “like looking at text in a mirror”:

[Screenshot] Node query results – legible?

What to do?

The limits are pretty serious, as I’ve set out. It is more than just fiddly selection; it runs through to whether text is legible, readable or usable at all.

Recommendations for approaches in NVivo and alternative packages:

If you MUST use NVivo:

Then use PDFs with region selection, i.e. treat Arabic text as an image and accept the limitations.

If you can choose another package

All (yes ALL!) the other leading CAQDAS packages support Arabic and other right-to-left scripts. So it then comes down to making an informed choice of package.

The Surrey CAQDAS project provides a good overview of packages and choices at https://www.surrey.ac.uk/computer-assisted-qualitative-data-analysis/resources/choosing-appropriate-caqdas-package

For resources, the excellent books by Christina Silver and Nick Woolf cover the three leading packages: NVivo, ATLAS.ti and MaxQDA.

Getting clear information on which packages are leading and their relative use is very difficult – however, this paper provides some circumstantial evidence of their use in academic research:

Woods, M., Paulus, T., Atkins, D. P., & Macklin, R. (2016). Advancing Qualitative Research Using Qualitative Data Analysis Software (QDAS)? Reviewing Potential Versus Practice in Published Studies using ATLAS.ti and NVivo, 1994–2013. Social Science Computer Review, 34(5), 597–617. https://doi.org/10.1177/0894439315596311

It reviews patterns of publication citing the use of ATLAS.ti or NVivo, which were selected “because they are two of the longest used QDAS tools (Muhr, 1991; Richards & Richards, 1991). They are also the programs that we ourselves are familiar with; without this familiarity our analysis would not have been possible” (p. 599), and includes the following graph:

publicationPatterns
Subject disciplines publishing ATLAS.ti and NVivo studies. 

Another key consideration should NOW be whether the software adopted locks you in, or enables project sharing and exporting via the recently published REFI standard – see Christina Silver’s excellent blog post on why this matters and why it should inform the choice of package, especially for right-to-left scripts.

Suggested alternatives:

COMPREHENSIVE FULL-FEATURED CAQDAS PACKAGE SIMILAR IN SCOPE AND APPROACH TO NVIVO BUT WORKING WITH RIGHT-TO-LEFT TEXT:

My top recommendation: ATLAS.ti 

Why? It supports REFI format for project exchange so you are not locked in.

The quotation approach – identifying data segments, then attaching codes, linking to other data segments and linking memos – provides unrivalled support for multi-lingual work: for example, coding one script and then linking to translated sections in another (uncoded) script, or attaching a translation to a data segment via a quotation comment.

Alternative Recommendation: MaxQDA

Another full-featured package with extensive support for mixed methods and an excellent interface. However, its lack of support for the REFI standard risks your being locked in and unable to exchange or archive projects in a standard format – hence recommending ATLAS.ti instead.

MIXED METHODS FOCUS, COLLABORATIVE, CLOUD REQUIRED/DESIRED

Consider Dedoose for a mixed-methods-focussed, collaborative package. However, in some settings an online, collaborative, cloud-based tool may not be appropriate, so serious consideration needs to be given to the implications of that approach.

LARGE SCALE ANALYSIS AND TEXT MINING (i.e. functions promoted as part of NVivo Plus)

Consider QDA Miner, with or without WordStat, for support of all scripts together with advanced text mining capabilities.

Alternatively, DiscoverText plays nicely in this space with some very clever features (however, it doesn’t support REFI).

SIMPLER FEATURES SOUGHT, PARTICIPATORY ANALYSIS METHODS, SOMETHING DIFFERENT

If you want to work with something visual, simple and just for text, then Quirkos is fantastic and supports right-to-left scripts.

And finally…

Comments welcome and updates will follow here if/when NVivo changes or other packages adopt REFI standard for example.

Rethinking the guiding ethos of 5LQDA: from managing contradiction to harnessing creative tensions

I attended the excellent 5LQDA workshop for NVivo last week. I really can’t recommend these highly enough, as well as the books. I am actively working to integrate and develop my teaching and materials to work with, incorporate and work within the broad structure of 5LQDA and I don’t think I can personally give it a much stronger seal of approval than that!

However, this isn’t unilateral adoption or unthinking acceptance. I want to adapt my materials and use them to scaffold and structure gaining awareness of the components of NVivo and ATLAS.ti.

The core of 5LQDA: Managing Contradiction

There is one guiding rationale of 5LQDA where my views diverge, quite strongly, from the printed word – and as it is so fundamental to the model I want to document and explore my perspective and how it differs from Christina and Nick’s book.

They state that:

The central issue is the contradiction between the nature of qualitative analysis and the nature of software used to conduct the analysis. The way these contradictions are reconciled determines the approach to harnessing the software. (p. 13)

And furthermore that:

there is a contradiction between the emergent nature of qualitative analysis and the step-by-step nature of computer software. The Five-Level QDA method is a way of managing this contradiction. (p. 157, back cover and other blurb)

This is THE core argument of 5LQDA as a method. However, there’s something that doesn’t sit quite right for me about “managing the contradiction”. The tenor of that statement and the language it evokes – of management and compromise – also seems to permeate some of the ways that potential is treated, e.g.

“the potential misuse of rudimentary automated features that may be introduced in the future are concerning”. (p. 18)

So how to acknowledge this fundamental rationale, its reasoning and importance, but find a way to manage the contradiction between it and my somewhat different view? Could it, itself, be translated into something a little more positive – evocative not of managerialism and compromise but of potential and opportunity?

A potential translation: from manager-subordinate to creative partnership?

I therefore hope that one way to productively resolve this, and to incorporate the 5LQDA approach into my practice and teaching, is through a slight tweak that I hope stays true to the intention of the original but also draws on my interest in, and desire to support, step-changes in how software works in and with qualitative research:

“Harnessing the creative tension between the emergent nature of qualitative analysis and the potential of new and developing components in software that work in a pre-programmed way”

To me the idea of a “creative tension” is a really positive way of viewing the way that this contradiction could be played out and one that also gives a little more agency and acknowledgement to the potential of software tools to undertake new and different ways of approaching qualitative analysis in terms of scale, approach and intentions.

Thus the aim is neither to let new tools drive analysis, nor to place software as entirely and absolutely subservient to analytic tasks conceived without acknowledging its potential. For if those ideas, tasks and approaches are always and already prior to selecting a component, then how would those tasks develop and change in order to take advantage of the new opportunities software affords (see my previous post on technology, tactics and strategy and the tanks in WW1 for more on this)?

I’m not alone in this concern. To me there is a running theme through 5LQDA that reminds me of this quote:

Over the past 50 years, the habitual nature of our research practice has obscured serious attention to the precise nature of the devices used by social scientists (Platt, 2002; Lee, 2004). For qualitative researchers, the tape recorder became the prime professional instrument intrinsically connected to capturing human voices on tape in the context of interviews. David Silverman argues that the reliance on these techniques has limited the sociological imagination: ‘Qualitative researchers’ almost Pavlovian tendency to identify research design with interviews has blinkered them to the possible gains of other kinds of data’ (Silverman, 2007: 42).

The strength of this impulse is widely evident, from the methodological design of undergraduate dissertations to multimillion pound research grant applications. The result is a kind of inertia, as Roger Slack argues:

It would appear that after the invention of the tape recorder, much of sociology took a deep sigh, sank back into the chair and decided to think very little about the potential of technology for the practical work of doing sociology. (Slack, 1998: 1.10)

(Back, 2010)

And it is thinking about the potential that I think is important – rather than incredibly powerful software being subservient to the habitual nature of our research practices. “Managing the contradiction” seems to prolong that: to promote analytic strategies derived prior to, and without serious attention to, the potential of tools for their transformation and translation into new and different ways of working. This segues into a great quote about how that has played out to date:

Qualitative analysts have mostly reacted to their new-found wealth of data by ignoring it. They have used their new computerized analysis possibilities to do more detailed analysis of the same (small) amount of data. Qualitative analysis has not really come to terms with the fact that enormous amounts of qualitative data are now available in electronic form. Analysis techniques have not been developed that would allow researchers to take advantage of this fact.

(Blank, 2008, p. 258)

An example – new tools enabling the exploration of new approaches

The NSS analysis I worked on is a case in point – I was interested in seeing if and how tools could help with analysing large(r) quantities of qualitative data, and in finding out what sorts of questions and analytic needs the software tools could address. The project was therefore an exploratory one – to look at what these tools could do and how they could be used. But that seems to run entirely counter to the 5LQDA rationale, where I should have defined the analytic task in advance and then selected the tools, rather than selecting the tools and then seeing what questions they could help with. Of course, at the strategic level that was the intention of the project – but the point is that, with the increase in tools in QDA software opening up new and interesting ways of doing things, how is that potential going to filter up into developing strategies to fit new tools and their appropriate tactics? How do we follow tanks with tanks, not horses?

Another example: CAQDAS and the ethnographic imagination

One of the key ideas in ethnography is to “make the familiar strange” (see for example Myers, 2011 here). This runs counter to the idea of “immersion in data” and creates a dynamic, creative tension with it – a useful and essential step for reconsidering conclusions or ways of thinking that are merely confirmation bias of an initial reading.

Tools such as those in NVivo to explore content and view word frequencies, for example, are an excellent way of “making the familiar strange” and highlighting patterns in word use that you may not have spotted – prompting new and potentially productive ways of looking at the data. Hunches about language differences can then be explored further with tools such as cluster analysis. However, “I want to make my data strange to help me identify things I may not spot otherwise” seems too tool-led for 5LQDA, with the concept unlikely to be rendered as a strategy for immersion precisely because it runs counter to the analytic intent of immersion and is produced by tools. (There are loads of ways to make data strange, so how would you translate that into a component? Yet a specific component affords this potential, and from it a series of creative, perhaps unknown, opportunities.)

A quick example, but one that hopefully helps to illustrate why I prefer thinking of creative tensions rather than the managing of contradiction – the seriousness of Lennon jarring with, and also working with, the playfulness of McCartney created a myriad of tunes that individually wouldn’t have been realised. To me, “creative tension” captures the same tensions, issues, contradictions, disputes and challenges, but re-casts them in a more bi-directional and creative way than the manager-subordinate relationship of 5LQDA’s phrasing.

References:

Back, Les (2010) Broken Devices and New Opportunities: Re-imagining the tools of Qualitative Research http://eprints.ncrm.ac.uk/1579/1/0810_broken_devices_Back.pdf

Blank, G. (2008). Online Research Methods and Social Theory. In N. Fielding, R. M. Lee, & G. Blank (Eds.), The SAGE Handbook of Online Research Methods. Los Angeles; London: SAGE.

Responses to 5LQDA pt2 – Much Ado About Affordances

Ahhh affordances – something of a bête noire for me!

This term has resurfaced for me twice in the last two days – in reading the 5LQDA textbook on NVivo, and in a discussion session/seminar I was at today with Chris Jones about devices, teaching and learning analytics.

Chris argued FOR affordances on two fronts:

  1. they bring a focus on BOTH the materiality AND the interaction between the perceiver and the perceived and de-centre agency so that it exists in the interaction rather than as entirely in/of an object or in/of a person’s perception of it.
  2. despite quite a lot of well-argued criticism, no-one has really proposed an equivalent or better term.

I would entirely agree with both of those statements, backing down from my usual strong view of affordances as being necessarily problematic when invoked.

(I was once told that the way to “make it” in academia was to pick an adversarial position and argue from it all the time, never compromising – and affordance critique seems a good candidate for that. Maybe that’s why I don’t/won’t succeed in academia: I’m too willing to change position!)

BUT BUT BUT

Then someone does something like this:

“Think of the affordances of the program as frozen – they come fully formed, designed as the software developer thought best. In contrast think of TOOLS as emergent – we create them and they only exist in the context of use.”
(Woolf and Silver, 2017, p50)

And I end up back in my sniping position of “affordances have little merit as they mean all things to all people, and even their supposedly best qualities can be cast out on a whim”. Here we see affordances stripped of ALL those interactive properties. They are now “fully formed, designed”, not emergent or interactive. All of that is now being placed onto the idea of a “tool” as something that only has agency in use, in action and through interaction.

So if affordances are now tools – what then of affordances? And why is TOOL a better term?

A little background and further reading on affordances

Affordances are both an easy shorthand and a contested term (see Oliver, 2005), but one that usually retains a common-sense understanding of “what’s easy to do” combined with a more interactionist idea of “what actions are invited”. (The latter appeals to my ANT-oriented interest in, or sensibility towards, considering “non-human agency”.) I’ve read quite a lot on affordances and written on this before in Wright and Parchoma (2011), whilst my former colleague Gale Parchoma has really extended that consideration in her 2014 paper (and also in this recorded presentation), with both of us drawing on Martin Oliver’s (2005) foundational critique. I also really like Tim Ingold’s (2000) excellent extended explorations and extensions of Gibson’s work.

Should we keep and use a term that lacks the sort of theoretical purity or precision that may be desired, because its very fuzziness partly evokes and exemplifies its concept? Probably.

But if it is so woolly then could “the affordances of CAQDAS” be explored systematically, empirically and meaningfully?

Could we actually investigate affordances meaningfully?

Thompson and Adams (2013, 2014) propose phenomenological enquiry as providing a basis. Within this there are opportunities to record user experience at particular junctures – moments of disruption and change being obvious ones. So for me currently encountering ATLAS.ti 8 presents an opportunity to look at the interaction of the software with my expectations and ideas and desires to achieve certain outcomes. Adapting my practices to a new environment creates an encounter between the familiar and the strange – between the known and the unknown.

However, is there a way to bring in alternative ideas and approaches – perhaps even those normally regarded as oppositional or incommensurable with such a reflexive self-as-object-and-subject mode of enquiry? Could “affordances” be (dare I say it?) quantified? Or could at least some methods and measures be proposed to support assertions?

For example, if an action is ever-present in the interface, or only takes one click to achieve, could that be regarded as a measure of ease – an indicator of “affordance”? Or does that stray into the fixed idea of affordances as frozen and designed in? Or does the language used affect the “affordance”, so there is a greater level of complexity still? Could that be explored through disruption – can software presented in a different interface language still “afford” things? Language is rarely part of the terminology of affordance, with its roots in the psychology of perception, yet language and specific terminology seem to be the overlooked element of “software affordances”.

Could counting the steps required add to an investigation of the tacit knowledge and/or prior experience and/or comparable and parallel experience that is drawn on? Or would it merely fudge it and dilute it all?

My sense is that counts such as this, supplemented by screenshots, could provide a useful measure, but one that would have to be embedded in a more multi-modal approach rather than narrow quantification. This could, however, serve a dual function: both mapping and uncovering the easiest path, or fewest steps, to achieving a programmed action. That would not only provide a sense or indication of simplicity/affordance vs complexity/un-affordedness* (hmmm – what is the opposite of an affordance? If there isn’t one, doesn’t that challenge its over-use?) but also help inform teaching and action based on that research – in particular, to show, teach and support ways to harness, and also to avoid or rethink, the easy routes written into software that act to configure the user.
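As a purely illustrative sketch of what such step-counting might look like in practice – the package names below are real, but the recorded step sequences are my assumptions for illustration, not measured observations of either interface:

```python
# Hypothetical step logs for one action: creating a code from selected text.
# The steps recorded for each package are illustrative assumptions.
step_logs = {
    "ATLAS.ti 8": ["select text", "right-click", "choose Code", "type name"],
    "NVivo 11": ["select text", "right-click", "choose Code Selection",
                 "choose At New Node", "type name", "press Enter"],
}

def step_count(package: str) -> int:
    """Crude 'affordance' indicator: fewer steps = more readily afforded."""
    return len(step_logs[package])

# Rank packages by how few steps the action takes.
ranked = sorted(step_logs, key=step_count)
for package in ranked:
    print(f"{package}: {step_count(package)} steps")
```

Even a toy tally like this makes the limitation obvious: the count says nothing about what the dialogue invites (or fails to invite) at each step, which is exactly what the exploration below looks at.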

A five minute exploration – coding

Cursory checks – how much does software invite the user to “code” without doing any of the work associated with “coding”?

Coding is usually the job identified with qualitative data analysis and the function software is primarily positioned to support. However, coding in qualitative analysis terms is NOT the same as “tagging” in software. Is “tagging” or “marking up” conflated with coding and made easy? Are bad habits “afforded” by the interface?

Looking at ATLAS.ti 8 – select text and right-click:

It is VERY easy to create one or more codes – just right-click and the code is created, with no option there and then to add a code comment/definition.

Could we say, then, that an “affordance” of ATLAS.ti 8 is creating codes and not defining them?

Looking at NVivo 11

Slightly different, in that adding a new node does bring up a dialogue with an area for a description – however, pressing Enter saves it immediately.

From data, right-click and Code > New Node: there is no place for defining, further supporting a code-and-code approach. This does allow adding into the hierarchy by first selecting the parent node, so relational meaning is easily created – affordance = hierarchy?

AFFORDANCE = very short or one-sentence code definitions?

There is no way of easily identifying or differentiating commented and un-commented nodes.

You can only attach one memo to a node – the place for a longer consideration, but separated from it.

Where next?

This is the most basic of explorations but it involves a range of approaches and also suggests interventions and teaching methods.

I really see where the 5LQDA approach seeks to work with this and get you to think and plan rather than get sucked into bad and problematic use of software – however, I’m unsure of their differentiation of affordances as fixed and tools as having the properties usually ascribed to affordances… So I definitely need to think about it more – and get other views too (so please feel free to comment) – but a blog is a good place to record and share ideas-in-development. Could that be “the affordance” of WordPress? 😉

 

References

Adams, C., & Thompson, T. L. (2014). Interviewing the Digital Materialities of Posthuman Inquiry: Decoding the encoding of research practices. Paper presented at the 9th International Conference on Networked Learning, Edinburgh. http://www.lancaster.ac.uk/fss/organisations/netlc/past/nlc2014/abstracts/adams.htm

Ingold, T. (2000). The Perception of the Environment: Essays on Livelihood, Dwelling and Skill. London; New York: Routledge.

Oliver, M. (2005). The Problem with Affordance. E-Learning, 2, 402-413. doi:10.2304/elea.2005.2.4.402 http://journals.sagepub.com/doi/pdf/10.2304/elea.2005.2.4.402

Parchoma, G. (2014). The contested ontology of affordances: Implications for researching technological affordances for fostering networked collaborative learning and knowledge creation. Computers in Human Behavior, 37, 360–368. doi:10.1016/j.chb.2012.05.028

Thompson, T. L., & Adams, C. (2013). Speaking with things: encoded researchers, social data, and other posthuman concoctions. Distinktion: Scandinavian Journal of Social Theory, 14(3), 342-361. doi:10.1080/1600910x.2013.838182 http://www.tandfonline.com/doi/full/10.1080/1600910X.2013.838182

Woolf, N. H., & Silver, C. (2017). Qualitative analysis using NVivo : the five-level QDA method. Abingdon: Taylor and Francis.

Wright, S., & Parchoma, G. (2011). Technologies for learning? An actor-network theory critique of ‘affordances’ in research on mobile learning. Research in Learning Technology, 19(3), 247-258. doi:10.1080/21567069.2011.624168 https://doi.org/10.3402/rlt.v19i3.17113