Downloading YouTube videos, captions and comments in ATLAS.ti

The scraper also works for Facebook and other social media comment scraping!

Background

From working with a PhD student it became clear that NCapture in NVivo no longer works – and hasn’t for about a year.

So this knocked out the use of NCapture to streamline importing a YouTube video and comments (and potentially in parallel the captions) into NVivo for analysis. However, this also creates an opportunity.

I generally prefer working with video in ATLAS.ti: its interface is different, its approach to transcripts is less restrictive and less rigidly structured, and it lets you code the video and transcript together rather than separately.


Furthermore, for the project that prompted this investigation, there are methodological considerations that somewhat favoured ATLAS.ti over NVivo, as the priority was making conceptual connections between data (i.e. links) rather than allocating data segments to conceptual categories (i.e. via coding).

And that is where ATLAS.ti excels in comparison to other CAQDAS programmes.

So – how to replicate the functions of NCapture without it? Time to put the call out!

What can and can’t you do with YouTube videos?

NCapture allows one-click browser-based capture of a YouTube video – and should enable comment download too. The video is then streamed into NVivo and can be treated as if it is in NVivo in terms of working with the video – you can select, code, add transcript rows etc. You can also import and view the comments as a dataset. However, there is no direct import of transcript/captions, and for at least the last year comment scraping has been broken (it doesn’t work in Chrome, and IE is not supported).

ETHICS and LEGALITY:

So it’s not technically illegal to download a YouTube video – however, there are ethical considerations over what sort of video it is and who published it. With this project – looking at TED videos – these are not from an individual reliant on advertising revenue, and there’s no individual to ask for permission to analyse. (See more at https://www.techadvisor.com/how-to/internet/is-it-legal-download-youtube-videos-3420353/#:~:text=For%20personal%20use%2C%20no%20it,of%20our%20industry%2C%20too).

Does it violate terms of use? Maybe – but by a strict reading so does NCapture, with its offline playback outside the YouTube platform or app.

So if there are no legal or ethical barriers to downloading videos and comments for analysis, it becomes a question of technical implementation.

YouTube Video Downloaders and Comment Scrapers

NCapture provided video linking to YouTube (treating them like an external video file hosted on your machine) – it would be great to see that sort of internet linking supported in ATLAS.ti and MAXQDA (if you can link to a local file why not an online one?).

So to work in ATLAS.ti with the video you need to download it.

I looked at Freemake Video Downloader (https://freemake.com/free_video_downloader/) but hit issues, as it adds a logo at the start of the video, thereby breaking the timestamp link for any downloaded transcript.
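If you do end up with captions that are out of sync because a downloader prepends an intro, the SRT timestamps can be shifted to compensate before importing. This is a minimal sketch (the offset you need would depend on the length of the added intro, which is an assumption here):

```python
import re
from datetime import timedelta

# SRT timestamps look like 00:01:23,456 (hours:minutes:seconds,milliseconds)
TS = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def shift_timestamp(match, offset):
    """Shift one SRT timestamp by `offset` (a timedelta), clamping at zero."""
    h, m, s, ms = (int(g) for g in match.groups())
    t = timedelta(hours=h, minutes=m, seconds=s, milliseconds=ms) + offset
    if t < timedelta(0):
        t = timedelta(0)
    total_ms = int(t.total_seconds() * 1000)
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def shift_srt(text, seconds):
    """Shift every timestamp in an SRT document by `seconds` (may be negative)."""
    offset = timedelta(seconds=seconds)
    return TS.sub(lambda m: shift_timestamp(m, offset), text)
```

So if a downloader added a 3-second logo, `shift_srt(srt_text, 3)` would push the captions back into sync with the downloaded file.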

Downloading Video and Captions with 4k Video Downloader

4K Video Downloader is the best I found (and the highest rated/recommended in TechRadar’s great article).

You just copy and paste the YouTube URL – and then it gives you options for quality *and* caption download (where available), including secondary captions in auto-translated languages:

The subtitles are downloaded as an SRT file – which can be directly imported into ATLAS.ti as a linked/synchronised transcript – RESULT (and way more than NCapture did).

So that gets the video and captions in – what about the comments?

Scraping and Importing the Comments

This has proved a little trickier and still has some slight anomalies I’m trying to iron out.

I started out with https://app.coberry.com/ for comments – it has some great features, including output as PDF or as a dataset, and it enables sentiment analysis. BUT I also found a load of issues with the TED transcript I started working with, including strings of characters like this:

😨😨😨
Â
’
‘

These require a search-and-replace (SnR) in Excel – but it wasn’t easy to pick up all of them. What ARE they?
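(For what it’s worth, strings like these are classic mojibake: UTF-8 bytes that were decoded as Windows-1252 at some point, so a right single quote becomes â€™ and a non-breaking space becomes Â. Rather than hunting them down one SnR at a time, the round trip can often be reversed in bulk – a minimal sketch, which leaves text untouched when the re-encoding fails:)

```python
def fix_mojibake(text: str) -> str:
    """Repair UTF-8 text that was wrongly decoded as Windows-1252.

    e.g. 'â€™' -> a right single quote ('), 'â€˜' -> a left one.
    If the round trip fails, the text is returned unchanged.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text
```

The ftfy Python library does the same job more robustly if the garbling is mixed or nested.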

I then tried a few others and read this good but out-of-date article with a bunch of recommendations. However, the YouTube Comments Exporter Chrome plugin is broken (it looks like the same issue as NCapture). I also looked at SEOBOTS, but it charges you, had very poor data exports, and threw a bunch of errors.

The best tool I found – which also opens up Facebook scraping into ATLAS.ti – is exportcomments.com. This also identified where those strange text strings in Coberry came from, as it downloads various emoji-style characters, e.g.

👍
🌴
etc

Exportcomments.com DOES charge for over 100 comments – however the rates allow you to pay a modest US$11 for 3 days and the options for TikTok, Facebook etc. plug a perceived “gap” for ATLAS.ti.

THOUGHT: Is it a gap? The NCapture issue suggests to me that enabling import of comments from custom tools might be more important than trying to develop a new software suite based on external APIs that is subject to breakage by a third-party change…

An anomaly in ATLAS.ti Windows – wherefore art thou, emojis?

So here’s where it’s got a bit weird – those characters that caused a mare in Coberry don’t display properly in ATLAS.ti on Windows.

So the good news – ATLAS.ti for Mac works fine with emojis:

Emojis from YouTube comments displaying in ATLAS.ti for Mac
And in the Document preview

However there is an issue in Windows – no problem in document preview:

Document Preview in document manager on ATLAS.ti 9 Windows – icon shows

But when you open the document itself, it’s neither shown nor codable:

So where’s the palm tree gone? ATLAS.ti 9 on Windows, main document view for coding.

Considerations: Onscreen style comments printed or dataset?

The challenge here comes from HOW to bring in comments – and that’s one of the current limits of ATLAS.ti: there are only “Codes”, and working with code groups is limited. So you can’t have a type of code for “cases” (e.g. comment authors) and give those codes attributes. You sort of can, using code groups – but they can’t be used like document groups to easily compare across conceptual/thematic codes, so you then have to create smart codes and it all gets complex.

So you need to decide: see the comments together in a presentation similar to the way they appear on screen (and forgo document groups), OR import them as a dataset and work with document groups but with less familiar and potentially decontextualised chains of replies.

If you use Coberry you can easily print as PDF to have all comments on one page and then code for author AND/OR export. However any emojis will be garbled. Exportcomments.com only offers dataset download.

Marking up the comments for import

This next bit is a bit of a work in progress as I try to figure out which bits still work, and which work well, for handling the downloaded comments as a dataset. Some of the information I’d found on survey import no longer works, so it’s best to refer to the online manual for ATLAS.ti Mac or the online manual for ATLAS.ti Windows for which prefixes can be added to column headings. However, some do not seem to work as described (I had no joy with one in particular).

I have had some issue with time/date imports as well which I’m trying to resolve.

Export Comments raw data is like this:

  Name (click to view profile) | Date | Likes | isHearted | isPinned | Comment (view source)
Raw export comments data

I chose to work with this data as follows:

Comment Export List

 !CommentNo | Name | #Name | Date: | Date: | Likes. | isHearted. | isPinned | Comment | Source URL
 1 | JC | JC | 26/05/2021 21:51:53 | 2021-05-26 | 0 | no | no | So this is where the last line of Rhett&Link’s Geek vs Nerd Rap Battle came from. | view comment
Data prepared for ATLAS.ti Import

As you can hopefully see, I duplicated the name and date columns to use them both as data in the document and as a way to classify details. With the duplicated date column, I formatted it as yyyy-mm-dd, then cut and pasted it into Notepad/TextEdit, formatted the cells as text, and pasted it back via Paste Special > Paste as Text, to get dates-as-text.
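The same preparation can be scripted rather than done by hand in Excel. This is a sketch only – the column names (“Name”, “Date”) are assumed to match the exportcomments.com file, and the prefixes (!, #, :) are the ATLAS.ti survey-import markers:

```python
import pandas as pd

def prepare_for_atlasti(df: pd.DataFrame) -> pd.DataFrame:
    """Duplicate the Name and Date columns as described above.

    Assumes the input has 'Name' and 'Date' columns (as in the
    exportcomments.com export); adjust to match your own file.
    """
    out = df.copy()
    # Sequential comment number for the document name (! prefix in ATLAS.ti)
    out.insert(0, "!CommentNo", range(1, len(out) + 1))
    # Duplicate Name so it can also be used as a document group (# prefix)
    out["#Name"] = out["Name"]
    # Duplicate the date as plain yyyy-mm-dd text so spreadsheet tools
    # can't silently re-format it on the way through
    out["DateText"] = pd.to_datetime(out["Date"], dayfirst=True).dt.strftime("%Y-%m-%d")
    return out

# Usage sketch (file names are made up for illustration):
# df = pd.read_excel("comments.xlsx")
# prepare_for_atlasti(df).to_excel("comments_for_atlasti.xlsx", index=False)
```

This avoids the Notepad/TextEdit round trip entirely, since the dates-as-text conversion happens in one step.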

This created a document per comment, numbered by column 1, with the author name and date included both in the document and in the comment, as well as set up as groups.

This *seems* to work quite well – but I’m still working on it!

Coming soon:

Here: final bits on the comment import process, and also alternatives via Coberry.

A new post: A proper focus on analysis processes and opportunities – making use of split screens/tab groups, hyperlinks, networks and some coding to stitch all of this together.