In practice: Analysing large datasets and developing methods for that

A quick post here but one that seeks to place the rather polemic and borderline-ranty previous post about realising the potential of CAQDAS tools into an applied rather than abstract context.

Here’s a quote that I really like:

The signal characteristic that distinguishes online from offline data collection is the enormous amount of data available online….

Qualitative analysts have mostly reacted to their new-found wealth of data by ignoring it. They have used their new computerized analysis possibilities to do more detailed analysis of the same (small) amount of data. Qualitative analysis has not really come to terms with the fact that enormous amounts of qualitative data are now available in electronic form. Analysis techniques have not been developed that would allow researchers to take advantage of this fact.

(Blank, 2008, p258)

I’m working on a project to analyse the NSS (National Student Survey) qualitative textual data for Lancaster University (around 7000 comments). Next steps include analysing the PRES and PTES survey comments. But that’s small fry – the biggie is looking at the module evaluation data for all modules for all years (~130,000 comments!)

This requires using tools to help automate the classification, sorting and sampling of that unstructured data in order to be able to engage with interpretations. This sort of work NEEDS software – there’s a prevailing view that this either can’t be done (you can only work with numbers) or that it will only quantify data and somehow corrupt it and make it non-qualitative.
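As a rough illustration of what “classification, sorting and sampling” can mean at its very simplest, here is a minimal Python sketch of dictionary-based coding followed by per-category sampling. It is not how QDA Miner/WordSTAT, Leximancer or NVivo Plus work internally; the coding dictionary, category names and function names are all invented for the example.

```python
import random
import re
from collections import defaultdict

# Hypothetical coding dictionary (category -> indicative keywords).
# These categories and keywords are invented for illustration only.
CODING_DICTIONARY = {
    "feedback": {"feedback", "marking", "marks"},
    "teaching": {"lecturer", "lectures", "seminar", "teaching"},
    "resources": {"library", "books", "computers"},
}


def classify(comment):
    """Return the categories whose keywords occur in the comment."""
    words = set(re.findall(r"[a-z']+", comment.lower()))
    matched = sorted(cat for cat, keys in CODING_DICTIONARY.items() if words & keys)
    return matched or ["uncoded"]


def sort_by_category(comments):
    """Group comments under every category they were classified into."""
    groups = defaultdict(list)
    for comment in comments:
        for category in classify(comment):
            groups[category].append(comment)
    return dict(groups)


def sample_per_category(groups, n, seed=0):
    """Draw up to n comments per category for close qualitative reading."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    return {cat: rng.sample(items, min(n, len(items))) for cat, items in groups.items()}


if __name__ == "__main__":
    # Made-up comments, standing in for real survey responses.
    comments = [
        "The feedback on essays was slow",
        "Great lecturer but the library shuts early",
        "Marking took weeks",
        "No complaints",
    ]
    groups = sort_by_category(comments)
    for category, picked in sample_per_category(groups, n=2).items():
        print(category, picked)
```

The point of the sampling step is that the software narrows 130,000 comments down to a manageable, category-stratified set that a person can then read and interpret; the interpretation itself stays with the analyst.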

I would argue that isn’t the case. The tools I’m testing and comparing – the ProSUITE from Provalis (including QDA Miner/WordSTAT), Leximancer and NVivo Plus (incorporating Lexalytics) – enable this sort of work with large datasets, based on principles of content analysis and data mining.

However these only go so far – they enable the classification of data and its sorting, but there is still a requirement for more traditional qualitative methods of analysis and synthesis. I’ve been using (and hacking) framework matrices in NVivo Plus in order to synthesise and summarise the comments – an application of a method that is more overtly “qualitative data analysis” in a much more traditional vein, yet applied to and mediated by tools that enable application to much, MUCH larger datasets than would normally be used in qual analysis.

And this is the sort of thing I’m talking about in terms of enabling the potential of the tools to guide the strategies and tactics used. But it took an awareness of the capabilities of these tools and an extended period of playing with them to find out what they could do in order to scope the project and consider which sorts of questions could be meaningfully asked, considered and explored as well. This seems to be oppositional to some of the prescriptions in the 5LQDA views about defining strategies separate from the capabilities of the tools – and is one of the reasons for taking this stance and considering it here.

Interestingly this has also led to a rejection of some tools (e.g. MaxQDA and ATLAS.ti) precisely due to their absence of functions for this sort of automated classification – again, capabilities and features are a key consideration prior to defining strategies. However I’m now reassessing this, as MaxQDA can do lemmatisation, which is more advanced than what NVivo Plus offers…

This is just one example, but to me it seems an important one for considering what could be achieved if we explore features and opportunities first rather than defining strategies that don’t account for them. In other words: a symbiotic exploration of the features and potentials of tools, allowed to shape and define strategies and tactics, can open up possibilities that were previously rejected. The alternative is that those tools and features are treated as necessarily or properly subservient to strategies that fail to account for their possibilities.

On data mining and content analysis

I would highly recommend reading Leetaru (2012) for a good, accessible overview of data mining methods and how these are used in content analysis. It gives a clear insight into the methods, assumptions, applications and limitations of the aforementioned tools, helping to demystify and open up what can otherwise seem to be a black box that automagically “does stuff”.

Krippendorff’s (2013) book is also an excellent overview of content analysis, with several considerations of human-centred analysis using, for example, ATLAS.ti or NVivo as well as automated approaches like those available in the tools above.


Blank G. (2008) Online Research Methods and Social Theory. In: Fielding N, Lee RM and Blank G (eds) The SAGE handbook of online research methods. Los Angeles, Calif.: SAGE, 537-549.

Krippendorff, K. (2012). Content analysis: An introduction to its methodology. Sage.

Leetaru, Kalev (2012). Data mining methods for the content analyst : an introduction to the computational analysis of content. Routledge, New York


On agency and technology: relating to tactics, strategies and tools

This continues my response to Christina Silver’s tweet and blog post. While my initial response to one aspect of that argument was pretty simple, this is the much more substantive consideration.

From my perspective qualitative research reached a crossroads a while ago, though actually I think crossroads is the wrong term here. A crossroads requires a decision; it is a place steeped in mystery and mythology. I sometimes feel as though qualitative research did a very British thing: turned a crossroads into a roundabout, thus enabling driving round and round rather than moving forwards or making a decision.

The crossroads was the explosion in the availability of qualitative data. Previously access to accounts of experience was rather limited – you had to go into the field and write about it, find people to interview, or use the letters pages of newspapers as a site of public discourse. These paper-based records were slow and time consuming to assemble, construct and analyse. For the sake of the metaphor that follows I shall refer to this as “the cavalry era” of qualitative research: much romanticised, and with doctrines that still dominate from the (often ageing, pre-digital) professoriat.

Then the digital didn’t so much happen as explode and social life expanded or shifted online:

For researchers used to gathering data in the offline world, one of the striking characteristics of online research is the sheer volume of data. (Blank, 2008, p. 539)

Qualitative analysts have mostly reacted to their new-found wealth of data by ignoring it. They have used their new computerized analysis possibilities to do more detailed analysis of the same (small) amount of data. Qualitative analysis has not really come to terms with the fact that enormous amounts of qualitative data are now available in electronic form. Analysis techniques have not been developed that would allow researchers to take advantage of this fact. (Blank, 2008, p. 548)

Furthermore the same methods continue to dominate – the much vaunted reflexivity that lies at the heart of claims for authenticity and trustworthiness does not seem to have been extended to tools and methods:

Over the past 50 years the habitual nature of our research practice has obscured serious attention to the precise nature of the devices used by social scientists (Platt 2002, Lee 2004). For qualitative researchers the tape-recorder became the prime professional instrument intrinsically connected to capturing human voices on tape in the context of interviews. David Silverman argues that the reliance on these techniques has limited the sociological imagination: “Qualitative researchers’ almost Pavlovian tendency to identify research design with interviews has blinkered them to the possible gains of other kinds of data” (Silverman 2007: 42). The strength of this impulse is widely evident from the methodological design of undergraduate dissertations to multimillion pound research grant applications. The result is a kind of inertia, as Roger Slack argues: “It would appear that after the invention of the tape-recorder, much of sociology took a deep sigh, sank back into the chair and decided to think very little about the potential of technology for the practical work of doing sociology” (Slack 1998: 1.10). (Back, 2010)

My concern with the approach presented and advocated by Silver and Woolf is that it holds the potential to reinforce and prolong this inertia. There are solid arguments FOR that position – especially given the conservatism of academia, mistrust of software and the apparently un-slayable discourses (Paulus, Lester & Britt, 2013), entrenched critical views and misconceptions of QDAS software that “by its very nature decontextualizes data or primarily supports coding [which] have caused concerned researchers” (Paulus, Woods, Atkins and Macklin, 2017).


New technologies enable new things – but when they first arrive they are usually, perhaps inevitably, fitted restrictively into pre-existing approaches and methods and made subservient to old ways of doing things.

A metaphor – planes, tanks and tactics

I’ve been trying to think of a metaphor for this. The one I’ve ended up with is particularly militaristic and I’m not entirely comfortable with it – especially as metaphors sometimes invite over-extension, which I fear may happen here. It also feels rather jingoistically “Boy’s Own” and British, and may be alienating to key developers and methodologists in Germany. So comments on alternative metaphors would be MOST welcome. However, given the rather martial themes around strategies and tactics used in Silver and Woolf’s (2015) paper and models for Five-level QDA, I’ll stick with it and explore tactics, strategies and technologies and how they historically related to two new technologies: the tank and the plane.

WW1 saw the rapid development of new and terrifying technologies in collision with old tactics and strategies for their use. The overarching strategies were the same (defeat the enemy) however the tactics used failed to take account of the potential of these new tools thus restricting their potential.

Cavalry were still deployed at the start of WW1. Even with the invention of tanks, the tactics used in their early deployments were for mounted cavalry to follow up the breakthroughs achieved by tanks – with predictably disastrous failure at the Battle of Cambrai.

Planes were deployed from early in WW1 but in very limited capacities – as artillery spotters and for reconnaissance. Their potential to change warfare tactics was barely recognised or exploited.

These strategies were developed by generals from an earlier era – still wedded to the cavalry charge as the ultimate glory – which seems to be a rather appropriate metaphor for professorial supervision today with regard to junior academics and PhD students.

The point I’m seeking to make is that new technologies vary in their complexity, but they also vary in their potential. Old methods of working are used with new technologies, and the realisation of those technologies’ transformative potential for the methods or tactics used to achieve strategic aims is often far slower. It can be slowed further when, in the face of an established doctrine, there is little immediate incentive to change (unlike, say, a destructive war).

My view is therefore that those who work with and seek to innovate with CAQDAS tools need to do more than just fit in with the professorial Field Marshal Haigs of our day, who talk in terms of CAQDAS being “fine for breaching the front, old chap, you know, use CAQDAS to open up the data, but you send in the printouts and transcripts to really do the work of harrying the data, what what, old boy”.

Meanwhile Big Data is the BIG THING – and this entire sphere of large datasets and access to public discourse and digital social life threatens to be ceded entirely to quantitative methods. Yet we have tools, methods and tactics to engage in that area meaningfully by drawing on existing approaches which have always been both qual and quant (with corpus linguistics and content analysis springing to mind).

Currently the scope of any transformation seems to be pitched to taking strategies from a “cavalry era” of qualitative research. My suggestion is that to realise the full potential of some of the tools now available in order to generate new, and extend existing, qualitative analysis practices into the diverse new areas of digital social life and digital social data we need to be bolder in proposing what these tools can achieve and what new questions and datasets can be worked with. And that means developing new strategies to enter new territories – which need to understand the potential of these tools and explore ways that they can transform and extend what is possible.

If, however, we were to place the potential of these tools as subservient to existing strategies, and to attempt to locate all of the agency for their use with the user and the way that we “configure the user” (Grint and Woolgar, 1997) in relation to these tools through our pedagogies and demonstrations, we could limit those potentials. Using NVivo Plus or QDA Miner/WordSTAT to reproduce what could be done with a tape recorder, paper, pen and envelopes seems akin to sending horses chasing after tanks. What I am advocating for (as well, not instead) is to also try to work out what a revolutionary engagement with the potential of the new tools we have would look like for qualitative analysis with big unstructured qualitative data and big unstructured qualitative data-ready tools.

To continue the parallel here – the realisation of what could be accomplished by combining the new technologies of tanks and planes created an entirely new form of attacking warfare, named Blitzkrieg by the journalists who witnessed its lightning speed. This was developed to achieve the same overarching strategies as deployed in WW1 (conquering the enemy), but by considering the potential and integration of new tools it developed a whole new mid-level strategy and associated tactics that utilised and realised the potential of those relatively new technologies. Thus it avoided becoming bogged down in the nightmare that dominated WW1: using the strategies and tactics of a bygone era of pre-industrial warfare alongside new technologies that prevented their effectiveness. My suggestion is that there is a new territory now – big data – and it is one that is being rapidly and extensively ceded to a very quantitative paradigm and methods. To make the kind of rapid advances into that territory needed to re-establish qualitative analysis as having relevance, we need to be bolder in developing new strategies that utilise the tools rather than making these subservient to strategies from an earlier era in deference to a frequently luddite professoriat.

My argument thus simplifies to the idea that the potential of tools can and should productively shape not only the planning and consideration of the territories now amenable to exploration and engagement but also the strategies and tactics to do that. Doing that involves engagement with the conceptualisation, design and thinking about what qualitative or mixed-methods studies are and what they can do in order that this potential is realised. From this viewpoint Blitzkrieg was performed into being by the new technologies of the tank and the plane and their combination with new strategies and tactics. This contrasts with the earlier subsuming of the plane’s potential to merely being a tool to achieve strategies that were conceptualised before its existence. A plane there was equivalent to a tree or a balloon for spotting cannon fire. Much of CAQDAS use today seems to be just like this – sending horses chasing after tanks – rather than seeking to achieve things that couldn’t be done without it and celebrating that.

This is all rather abstract, I know, so I’ve tried to extend and apply this into a consideration of implementation in practice, working with large unstructured datasets, in a new post.


Back L. (2010) Broken Devices and New Opportunities: Re-imagining the tools of Qualitative Research. ESRC National Centre for Research Methods


Lee, R. M. (2004) ‘Recording Technologies and the Interview in Sociology, 1920-2000’, Sociology, 38(5): 869-899

Platt, J. (2002) ‘The History of the Interview,’ in J. F. Gubrium and J. A. Holstein (eds) Handbook of Interview Research: Context and Method, Thousand Oaks, CA: Sage pp. 35-54.

Silverman D. (2007) A very short, fairly interesting and reasonably cheap book about qualitative research, Los Angeles, Calif.: SAGE.

Slack R. (1998) On the Potentialities and Problems of a www based naturalistic Sociology. Sociological Research Online 3.

Blank G. (2008) Online Research Methods and Social Theory. In: Fielding N, Lee RM and Blank G (eds) The SAGE handbook of online research methods [electronic resource]. Los Angeles, Calif. ; London : SAGE.

Grint K and Woolgar S. (1997) Configuring the user: inventing new technologies. The machine at work: technology, work, and organization. Cambridge, Mass.: Polity Press, 65-94.

Paulus TM, Lester JN and Britt VG. (2013) Constructing Hopes and Fears Around Technology. Qualitative Inquiry 19: 639-651.

Paulus T, Woods M, Atkins DP, et al. (2017) The discourse of QDAS: reporting practices of ATLAS.ti and NVivo users with implications for best practices. International Journal of Social Research Methodology 20: 35-47.

Silver C and Woolf NH. (2015) From guided-instruction to facilitation of learning: the development of Five-level QDA as a CAQDAS pedagogy that explicates the practices of expert users. International Journal of Social Research Methodology 18: 527-543.

Approaches to defining Basic vs Advanced Features… Manufacturers, Existing Definitions or Other Conceptualisations?

Continuing from my previous post and the extended response from Christina Silver, this post takes up the first of my key questions:

  1. On what grounds is the basic vs advanced distinction rejected? Is there alternative evidence to suggest this might not be such an easy rejection to defend? (Spoiler: Lots IMHO)

Now Christina has, most flatteringly, responded to my initial blog post with a very extended consideration. This enables me to engage in dialogue with something much, MUCH more considered and nuanced than a tweet – which is great. In her response she argues that:

Distinguishing between ‘basic’ and ‘advanced’ features implies that when learning a CAQDAS package it makes sense to first learn the ‘basic’ features and only later move on to learning the ‘advanced’ features. In developing an instructional design this begs the question of which features are ‘basic’ and which are ‘advanced’, in order to know which features are taught first and which later. We remain to be convinced how this distinction can meaningfully be made. What criteria are used to decide which features are ‘basic’ or ‘advanced’? Is it that some features are easier to use than others? Or that some features are more commonly used than others? Or that some features are used earlier in a project than others? I’m interested to hear what others criteria are in this regard. We believe that attempting to distinguish between ‘basic’ and ‘advanced’ features is unhelpful.

Now, I can really see the point and purpose of this approach, but also wonder if there is some merit in exploring and contesting it.

What criteria are used to decide which features are ‘basic’ or ‘advanced’?

Option 1 – using Manufacturers’ product differentiation

One way of defining this would be to draw on the way packages are marketed, developed and positioned. And the manufacturers provide plenty of text and charts and details to do just this. WHY? Well these classifications exist, they are in play, they are acting as differentiators between packages. They will be guiding people and positioning options as well as costs.

From a teaching perspective I can also see a huge benefit – stripped down software with fewer options is just far, FAR less daunting! I have seen students who looked slightly terrified of the complexity and options of NVivo or ATLAS.ti really light up when F4 analyse is introduced.

F4 analyse is part of the new generation of “QDA Lite” packages. These include the EXCELLENT F4 analyse as well as the quirky, touch-oriented QUIRKOS. Joining this grouping are the cut-down versions of “full featured” packages: NVivo Starter and MaxQDA Base. Potentially we could also include tablet versions of key packages such as the ATLAS.ti app and the MaxQDA app.

Looking across these we could come up with a list of common features that would provide an empirically based list of “features that are included in basic versions of QDA software” and thus achieve a working definition of “basic features”.

The list from F4 Analyse seems pretty good to work from:

  • Write memos, code contents
  • Display and filter quotations
  • Develop a hierarchical code system
  • Description and differentiation of codes
  • Distribution of code frequencies
  • Export the results

My suggestion here is that these packages DO position some technologies as simple and others as advanced – seeking to erase rather than reposition that difference could therefore be less productive even if it is theoretically justified.

Option 2: Established definitions

Alternatively we could go back to older, established definitions, e.g. those proposed by the CAQDAS Networking Project:


We use the term ‘CAQDAS’ to refer to software packages which include tools designed to facilitate a qualitative approach to qualitative data. Qualitative data includes texts, graphics, audio or video. CAQDAS packages may also enable the incorporation of quantitative (numeric) data and/or include tools for taking quantitative approaches to qualitative data. However, they must directly handle at least one type of qualitative data and include some – but not necessarily all – of the following tools for handling and analysing it:

  • Content searching tools
  • Linking tools
  • Coding tools
  • Query tools
  • Writing and annotation tools
  • Mapping or networking tools

The combination of tools within CAQDAS packages varies, with many providing additional options to those listed here. The relative sophistication and ease of use also varies and we aim to uncover some of these differences in our comparative reviews

So here again we have a list of tools that could be considered to be “basic” with the additional criteria of “relative sophistication” and “ease of use” giving dimensions for considering those criteria.

But – does that do anything?

Option 3 – (A bit of a “thought in progress…”) Conceptualising Affordances

Affordances are both an easy shorthand and a contested term (see Oliver, 2005). Yet the term retains a common-sense understanding of “what’s easy to do” or – with a more interactionist or even ANTy sensibility of non-human agency – “what actions are invited”. Whilst it may lack the theoretical purity or precision that may be desired, it remains a useful concept.

How then could “the affordances of CAQDAS” be explored systematically, empirically and meaningfully?

Thompson and Adams (2011, 2013, 2016) propose phenomenological enquiry as providing a basis. Within this there are opportunities to record user experience at particular junctures – moments of disruption and change being obvious ones. So for me encountering ATLAS.ti 8 presents an opportunity to look at the interaction of the software with my expectations and ideas and desires to achieve certain outcomes. Adapting my practices to a new environment creates an encounter between the familiar and the strange – between the known and the unknown.

However, is there a way to bring in alternative ideas and approaches – perhaps even those normally regarded as oppositional to or incommensurable with such a reflexive self-as-object-and-subject mode of enquiry? Could “affordances” be (dare I say it?) quantified? Or could at least some measures be proposed to support assertions? For example, if an action is ever-present in the interface, or takes only one click to achieve, could that be regarded as a measure of ease – an indicator of affordance?

Could counting the steps required add to an investigation of the tacit knowledge and/or prior experience and/or comparable and parallel experience that is drawn on? Or would it merely fudge it and dilute it all?

My sense is that counts such as this, supplemented by screenshots, could serve a twin function. First, mapping the easiest path or the fewest steps to a desired outcome gives an indication of simplicity/affordance vs complexity/un-afforded* action (Hmmm – what is the opposite of an affordance? If there isn’t one, doesn’t that challenge its over-use?). Second, it provides a basis for teaching and action grounded in that research – to show, teach and support ways around the easy routes written into software that configure the user.
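To make the step-counting idea concrete, here is a minimal Python sketch of how such observations might be recorded and compared. Everything in it is an assumption made for illustration: the package names are placeholders rather than real products, the recorded steps are invented rather than measured, and the `Route`, `shortest_routes` and `step_counts` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    package: str          # placeholder package name, not a real product
    task: str             # the outcome the user is trying to achieve
    steps: tuple          # ordered user actions observed for this route
    always_visible: bool  # is the starting control permanently on screen?


# Invented observations, standing in for notes taken alongside screenshots.
ROUTES = [
    Route("Package A", "code a segment", ("select text", "click Code button"), True),
    Route("Package A", "code a segment",
          ("select text", "right-click", "choose Code", "pick code"), False),
    Route("Package B", "code a segment",
          ("select text", "open menu", "choose Code"), False),
]


def shortest_routes(routes, task):
    """Map each package to its shortest recorded route for the given task."""
    best = {}
    for route in routes:
        if route.task != task:
            continue
        current = best.get(route.package)
        if current is None or len(route.steps) < len(current.steps):
            best[route.package] = route
    return best


def step_counts(routes, task):
    """Fewest recorded steps per package: one crude indicator of ease."""
    return {pkg: len(r.steps) for pkg, r in shortest_routes(routes, task).items()}


if __name__ == "__main__":
    print(step_counts(ROUTES, "code a segment"))
```

A count like this says nothing about tacit knowledge or prior experience, so it would only ever supplement, not replace, the phenomenological account of what the encounter with the software was like.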

Drawing this together

This is part of my consideration of simplicity vs complexity and how this distributes agency when working with complex technologies for qualitative analysis. I’m not convinced that the erasing of simplicity vs complexity is the right way to approach this. Here I’ve tried to set out some ideas and existing approaches which are already circulating, and to propose some ideas around the influence these have, along with my own experiences.

This is in part to anticipate lines of argument or proposals about something being simple, basic or easy that have some demonstrable grounding.

But where is this going? Well, there are two aspects to my thinking:

  • one aspect is about complexity in practice: how do software packages shape our practices and make some things very visible and very simple to achieve? I’ve started sketching this out with the affordances bit here but there’s something more to it.  I do believe this can be empirically considered and assessed in terms of visibility and complexity in local practice – whether that is the number of clicks to get to something or the number of options available to customise a feature. It can also be considered more generally in terms of consideration of the shaping of method and patterns of use and non-use and how certain approaches to qualitative research become reinforced whilst others become marginalised from a software supported paradigm.
  • the other is a more comprehensive argument about the challenge and problems and potential for missed opportunities. My concern here is if and how the transformative potential of tools is not realised when they are made subservient to strategies based on older ways of working from when such tools were not available. The outcome of that is that the potential of tools would be something important to foreground and explore, as it can (and I would argue should) lead to new strategies that were simply not possible before… And that’s the topic of my next post.

So this was a first step to respond to one aspect of the argument Christina and Nicholas advance. Their approach is one which I think has huge merit. However, as with anything of merit for teaching and practice, I also believe there is value in contesting it in order to explore, deepen and enhance it and anticipate lines of critique, as well as developing responses to support its use, implementation and adaptation.



Adams, C., & Thompson, T. L. (2016). Researching a Posthuman World. Palgrave Macmillan UK.

Adams, C. A., & Thompson, T. L. (2011). Interviewing objects: Including educational technologies as qualitative research participants. International Journal of Qualitative Studies in Education, 24(6), 733-750.

Oliver M. (2005) The Problem with Affordance. E-Learning 2: 402-413.  DOI:10.2304/elea.2005.2.4.402

Thompson TL and Adams C. (2013) Speaking with things: encoded researchers, social data, and other posthuman concoctions. Distinktion: Scandinavian Journal of Social Theory 14: 342-361.

Basic vs advanced CAQDAS features?

Part one of a series of posts in dialogue with Christina.

There are no basic or advanced #CAQDAS features, but straightforward or more sophisticated uses of tools appropriate for different tasks

— Christina Silver (@Christina_QDAS) April 27, 2017

This tweet got me thinking a LOT about the ideas in it. It’s a tweet, so it’s trying to distil a complex argument down into a pithy soundbite. However, something about it doesn’t sit quite right with me. This blog post is an attempt to start working out some of those questions, and hopefully to do so somewhere with sufficient space (rather than Twitter character limits) to engage in dialogue but also work out the issues at some length.

I want to try and break it down into its key aspects, then engage with each:

There are no basic or advanced #CAQDAS features

CAQDAS = Computer Assisted Qualitative Data Analysis Software

Basic vs Advanced features = not only a false dichotomy but something that doesn’t exist

Instead there’s a new dichotomy proposed of:

Straightforward vs sophisticated uses of tools.

And the straightforwardness or sophistication is to be judged in terms of their “appropriateness for different tasks”.

My key questions therefore are:

  1. On what grounds is the basic vs advanced distinction rejected? Is there alternative evidence to suggest this might not be such an easy rejection to defend? (Spoiler: Lots IMHO)
  2. The more complex exploration: on what would a judgement of appropriateness be based when considering whether you are making “straightforward” or “more sophisticated” use of tools, and how would those tasks be determined in a way that (to me at least) reads as being independent of, preceding or separable from the tools?

Fundamentally, I see this as a question of the distribution of agency between

  1. manufacturers and designers of tools,
  2. the tools,
  3. the tasks that can be done, and
  4. the users.

I interpret this formulation as being one that sees or proposes that the agency is (or should be) primarily with the users. Which I further interpret as proposing a new way to (re)configure the user – to draw on Grint and Woolgar’s (1997) conceptualisations.


I’m VERY pleased to say that Christina has responded to this post and expanded those ideas substantially. So I shall compose further responses in other linked posts.


On considering and defining basic vs advanced tools – which is pretty minimal but proposes possible criteria.

And a much more extended consideration of the distribution of agency and relationships between tools, potentials, strategies and tactics.