KWIC interfaces and concordances

This image from the excellent QD in Practice event organised at Leeds University really drove home to me just how powerful and useful KWIC (Key Words In Context) concordance displays can be.

kwicinarabic

In the image above I cannot even read the script, because I don’t read Arabic. Not only can I not read it, it is also written from right to left, yet KWIC still works.

I can see, without being able to understand, that there is a difference between lines 1, 2 and 3, that lines 4 through 11 are the same, that line 12 is different and that lines 13 through 20 are the same, in terms of the words in red that appear before (it’s R>L text, remember!) the highlighted keyword.

Since I first encountered KWIC in a module on corpus approaches to language teaching I have recognised that it has an incredible simplicity and power compared to many other ways of showing highlighted text.
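
For anyone who hasn’t met KWIC before, here is a minimal sketch of the idea in Python. It is my own illustration, not how any of the packages discussed here build their displays. The function names (kwic, format_kwic), the width parameter and the sample sentences are all assumptions for the example.

```python
def kwic(text, keyword, width=4):
    """Return (left, match, right) tuples for every token equal to `keyword`."""
    tokens = text.split()
    rows = []
    for i, token in enumerate(tokens):
        # Compare case-insensitively, ignoring punctuation stuck to the token.
        if token.strip(".,;:!?\"'()").lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return rows


def format_kwic(rows):
    """Right-align the left context so every keyword starts in the same column."""
    pad = max((len(left) for left, _, _ in rows), default=0)
    return "\n".join(f"{left.rjust(pad)}  {match}  {right}" for left, match, right in rows)


# Hypothetical sample text, made up for this sketch.
sample = ("I used the software four years ago. I used to hate the software. "
          "I got used to the software.")
rows = kwic(sample, "used", width=3)
print(format_kwic(rows))
```

The only real trick is the right-alignment of the left-hand context: once every keyword sits in the same column, the eye can run down the words either side of it without reading the lines in full.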

From text to context – displaying search results in NVivo at present

Compare it to this:

nvivowordsincontextsearchview1

Which is the results output from a text search in NVivo.

This is not a bad output: I see context in a similar way to a KWIC concordance and can access the underlying data immediately. However, the appearance precludes some rather more important options that KWIC enables.

Another way to reach this sort of word search is by running a word frequency query in NVivo – which will then create a list of words along with information on their length, their count, a weighted percentage (need to learn more on that) and a list of “similar words”.

The similar words are derived by including stemmed words – a process which has some issues associated with it which I’ll go into a little later. Here I’m going to focus on the representation of that information:

nvivowordfrequencyresults

So double-clicking on a word takes me to the same display as previously for a stemmed text search:

nvivowordsincontextsearchview2

Again not bad – I get some context and information on the source. And from it I can go and find the word in context in the original text by clicking the link – and the word is helpfully highlighted:

highlightedwordsincontext

A closer view – word trees

EDIT/UPDATE – from chatting with Silvana (and revisiting Kathleen’s comments in the NVivo Users Group). Word tree is indeed *very* similar to KWIC:

wordTree-NVivo

They show the key word in the middle and the branching before and after. The differences, however, are still important – while you can select the text to see connections:

wordTree-highlighted

What you cannot see as easily are the sentences across, or any variation. It’s a powerful tool that does much of the work of KWIC – but I’m not sure if the simplification comes at a cost. This is one for me to look at further – thanks to Kathleen for flagging it to me to cogitate on and explore further!

Of course MaxQDA does have KWIC 

What you can’t do or see easily with this… but could with KWIC

However, there are a bunch of things I can’t do or easily see which KWIC would enable:

  • Which words come before or after? (visible in word tree)
    • Consider, for example, the potentially very important differences between the pronouns that precede or follow a key term that is emerging as a theme (for example work/working or team/s), and if or how these might vary between groups or align with attributes you’re interested in (e.g. managers vs subordinates)
    • Consider, for example, the important differences between how use and used can appear as a verb, as a quasi-modal or within an adjective phrase:
      • I used the software four years ago (verb, p/t)
      • I used to hate the software (quasi-modal)
      • I got used to the software (adjective phrase)
  • Which stems are associated? (Not sure if this is visible with word tree???)
    • Consider the spurious stemming that can occur, e.g.
      • Office
      • Officer
      • Official
  • Which words are associated with particular stems or synonyms?
    • Consider the difference between stems of
      • be, been, being
    • Compared to lemmatisation as
      • am, was, are, were
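
To make the stemming versus lemmatisation point concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the suffix list, the tiny lemma table and the function names are mine, and the stemmer is deliberately crude. It is not how NVivo derives its “similar words”.

```python
from collections import defaultdict

# Hypothetical suffix list for a deliberately crude stemmer.
SUFFIXES = ("ial", "ing", "en", "er", "ed", "e", "s")

# Hypothetical hand-made lemma table: irregular forms are looked up, not derived.
LEMMAS = {"am": "be", "is": "be", "are": "be", "was": "be", "were": "be",
          "been": "be", "being": "be"}


def crude_stem(word):
    """Strip the first matching suffix, keeping at least two letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


def lemma(word):
    """Look the word up in the lemma table, falling back to the word itself."""
    word = word.lower()
    return LEMMAS.get(word, word)


def group_by(key, words):
    """Group words by whatever `key` (stemmer or lemmatiser) returns."""
    groups = defaultdict(list)
    for w in words:
        groups[key(w)].append(w)
    return dict(groups)


words = ["Office", "Officer", "Official", "be", "been", "being",
         "am", "was", "are", "were"]
by_stem = group_by(crude_stem, words)
by_lemma = group_by(lemma, words)
print(by_stem)
print(by_lemma)
```

The crude stemmer lumps Office, Officer and Official together and groups be, been and being, but it has no way of connecting am, was, are and were. The lemma table does the opposite: it keeps the three “offic-” words apart and pulls all the forms of “be” together, but only because somebody typed them into a dictionary.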

And here’s where the power yet simplicity of KWIC really holds potential for working with this sort of query and any coding from that. Consider what you can see when the data is presented in a KWIC concordance:

Ref 1: 0.01%  |            a little while since I’ve | use   | d Adobe Connect. Okay [pause] oh
Ref 2: 0.02%  |              STS and how you’ve been | using | caqdas software, but it’s just
Ref 3: 0.02%  |       that particularly made it seem | use   | ful or relevant or drew you
Ref 4: 0.02%  |          ANT, but nevertheless he is | using | some of the principles of
Ref 5: 0.01%  |         by Actor-network theory have | use   | d software in their research. Erm
Ref 6: 0.02%  |               poll is people who are | using | CAQDAS packages, some is people
Ref 7: 0.02%  |                is people who are not | using | those. Erm, and some is
Ref 8: 0.02%  |               some is people who are | using | a mixture of-, a sort
Ref 9: 0.02%  |      wondered, what software are you | using | ? Erm, and one info [skip
Ref 10: 0.02% |                       you know, beca | use   | -, I start using what I knew at that
Ref 11: 0.02% |               start my PhD, we start | using | a specific software that I
Ref 12: 0.02% |             software that I had been | using | before, which is a qualitative
Ref 13: 0.01% |                study, then I have to | use   | something that I knew and
Ref 14: 0.01% |                  with Atlas T, and I | use   | it-, I will explain it
Ref 15: 0.01% |            but later …[15.34] Then I | use   | d Atlas T from the very
Ref 16: 0.01% |            the very beginning, and I | use   | d it only to qualify all
Ref 17: 0.01% |             of my research. Erm, the | use   | of Atlas T was useful
Ref 18: 0.02% |                                   my | use   | of Atlas T was useful at some extent,
Ref 19: 0.01% |                 best tool that I can | use   | , but I will explain it
Ref 20: 0.01% |          apply principles of ANT and | use   | a specific software?’ [18.54] So
Ref 21: 0.01% |       of mine, err, quite frequently | use   | s the phrase ‘auto-magical’, and
Ref 22: 0.02% |           understand how ANTA can be | use   | ful in that sense. Of course
Ref 23: 0.02% |    learning, analytics, big data and | using | those special softwares, but I
Ref 24: 0.01% |                 didn’t get how I can | use   | it for my research, really
Ref 25: 0.01% |                  and show me how you | use   | Atlas.ti that would be really
Ref 26: 0.01% |             tools and options you do | use   | , that have supported you the
Ref 27: 0.02% |        broken.’ So which-, so you’re | using | Atlas T on a Mac
Ref 28: 0.01% |              Yes I [skip]-, I’m just | use   | [skip] [25.47] Steve W Okay
Ref 29: 0.02% |             finished my thesis, I am | using | [skip] as a module from
Ref 30: 0.02% |             you. This paper is about | using | ANT principles through my research
Ref 31: 0.01% |           yesterday found that I can | use   | AtlasT not in my Windows
Ref 33: 0.02% | with statements from other documents | using | categories of analysis. I mean
Ref 34: 0.01% |             you generate and did you | use   | ? Alberto There is no [unclear

The power and importance of sorting

What I would like to be able to see is the kind of output shown above as an option along with the normal contextual view. I would want to be able to sort it by the middle column and/or the words immediately preceding or following that. This then really helps spot patterns:

Loc %         |                               Text 1 | Stem  | Text 2
Ref 13: 0.01% |                study, then I have to | use   | something that I knew and
Ref 14: 0.01% |                  with Atlas T, and I | use   | it-, I will explain it
Ref 17: 0.01% |             of my research. Erm, the | use   | of Atlas T was useful
Ref 18: 0.02% |                                   my | use   | of Atlas T was use ful at some extent, to some
Ref 19: 0.01% |                 best tool that I can | use   | , but I will explain it
Ref 20: 0.01% |          apply principles of ANT and | use   | a specific software?’ [18.54] So
Ref 24: 0.01% |                 didn’t get how I can | use   | it for my research, really
Ref 25: 0.01% |                  and show me how you | use   | Atlas.ti that would be really
Ref 26: 0.01% |             tools and options you do | use   | , that have supported you the
Ref 28: 0.01% |              Yes I [skip]-, I’m just | use   | [skip] [25.47] Steve W Okay
Ref 31: 0.01% |           yesterday found that I can | use   | AtlasT not in my Windows
Ref 34: 0.01% |             you generate and did you | use   | ? Alberto There is no [unclear
Ref 10: 0.02% |                       you know, beca | use   | -, I start using what I knew at that
Ref 1: 0.01%  |            a little while since I’ve | use   | d Adobe Connect. Okay [pause] oh
Ref 5: 0.01%  |         by Actor-network theory have | use   | d software in their research. Erm
Ref 15: 0.01% |            but later …[15.34] Then I | use   | d Atlas T from the very
Ref 16: 0.01% |            the very beginning, and I | use   | d it only to qualify all
Ref 3: 0.02%  |       that particularly made it seem | use   | ful or relevant or drew you
Ref 22: 0.02% |           understand how ANTA can be | use   | ful in that sense. Of course
Ref 21: 0.01% |       of mine, err, quite frequently | use   | s the phrase ‘auto-magical’, and
Ref 2: 0.02%  |              STS and how you’ve been | using | caqdas software, but it’s just
Ref 4: 0.02%  |          ANT, but nevertheless he is | using | some of the principles of
Ref 6: 0.02%  |               poll is people who are | using | CAQDAS packages, some is people
Ref 7: 0.02%  |                is people who are not | using | those. Erm, and some is
Ref 8: 0.02%  |               some is people who are | using | a mixture of-, a sort
Ref 9: 0.02%  |      wondered, what software are you | using | ? Erm, and one info [skip
Ref 11: 0.02% |               start my PhD, we start | using | a specific software that I
Ref 12: 0.02% |             software that I had been | using | before, which is a qualitative
Ref 23: 0.02% |    learning, analytics, big data and | using | those special softwares, but I
Ref 27: 0.02% |        broken.’ So which-, so you’re | using | Atlas T on a Mac
Ref 29: 0.02% |             finished my thesis, I am | using | [skip] as a module from
Ref 30: 0.02% |             you. This paper is about | using | ANT principles through my research
Ref 33: 0.02% | with statements from other documents | using | categories of analysis. I mean

This would help with viewing the associations created from a query.
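
As a rough sketch of what that sorting involves, here is some Python using a handful of rows from the output above, with the highlighted stem and its suffix joined back into one word. The Row type, the sort_rows function and the three sort options are names I have made up for the illustration. No package I have mentioned necessarily works this way.

```python
from typing import NamedTuple


class Row(NamedTuple):
    ref: int
    left: str
    keyword: str
    right: str


def last_word(text):
    """Word immediately preceding the keyword (empty if there is none)."""
    words = text.split()
    return words[-1].lower() if words else ""


def first_word(text):
    """Word (or punctuation) immediately following the keyword."""
    words = text.split()
    return words[0].lower() if words else ""


def sort_rows(rows, by="keyword"):
    """Sort concordance rows by the middle column or by a neighbouring word."""
    keys = {
        "keyword": lambda r: (r.keyword.lower(), first_word(r.right)),
        "preceding": lambda r: (last_word(r.left), r.keyword.lower()),
        "following": lambda r: (first_word(r.right), r.keyword.lower()),
    }
    return sorted(rows, key=keys[by])


rows = [
    Row(1, "a little while since I’ve", "used", "Adobe Connect. Okay"),
    Row(2, "STS and how you’ve been", "using", "caqdas software, but"),
    Row(3, "that particularly made it seem", "useful", "or relevant or"),
    Row(13, "study, then I have to", "use", "something that I knew"),
    Row(19, "best tool that I can", "use", ", but I will"),
    Row(24, "didn’t get how I can", "use", "it for my research,"),
]

for option in ("keyword", "preceding", "following"):
    print(option, [r.ref for r in sort_rows(rows, by=option)])
```

Sorting on the preceding word, for instance, puts the two “I can use” lines next to each other, which is exactly the sort of pattern that is hard to spot in the unsorted view.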

The next level – making this KWIC view a way of shaping the associations of stems and synonyms

However, to really have power you would need to be able to use it to interact with and change those associations. The functions I would really like (via right click or similar) are:

1 – remove link of stem (e.g. De-link office and officer as being the same word)

2 – remove synonym association (e.g.

3 – (Ideally – probably harder!) create a link for lemmatisation and ideally save it to a dictionary or thesaurus, AND/OR differentiate one set of “used to” from another set of “used to”.

All of these are hugely facilitated by a KWIC concordance view. Hopefully some of this is fairly simple, whilst other aspects may need to go on a longer list, but I believe they are really worthy of consideration, especially for approaches oriented more towards content analysis and data mining than inductive analysis.
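
To show what I mean by the first and third functions, here is a small Python sketch of researcher-made overrides layered on top of an automatic stemmer. It is purely an illustration of the idea: the Associations class, its method names and the toy_stem function (which just keeps the first five letters) are all hypothetical, and nothing here reflects how NVivo stores its stems or synonyms.

```python
import json


def toy_stem(word):
    """Hypothetical stand-in for an automatic stemmer: keep the first five letters."""
    return word.lower()[:5]


class Associations:
    """Editable overrides layered on top of an automatic stemmer."""

    def __init__(self, stem):
        self.stem = stem        # function: word -> automatic stem
        self.overrides = {}     # word -> group chosen by the researcher

    def group_of(self, word):
        word = word.lower()
        return self.overrides.get(word, self.stem(word))

    def unlink(self, word):
        """Function 1: detach a word from its automatic stem group."""
        self.overrides[word.lower()] = word.lower()

    def link(self, word, group):
        """Function 3: attach a word to a group by hand, e.g. a lemma."""
        self.overrides[word.lower()] = group

    def to_json(self):
        """Serialise the overrides so they could be saved as a dictionary."""
        return json.dumps(self.overrides, sort_keys=True)


assoc = Associations(toy_stem)
linked_before = assoc.group_of("Office") == assoc.group_of("Officer")
assoc.unlink("Officer")          # office and officer are no longer the same word
assoc.link("was", "be")          # hand-made lemmatisation links
assoc.link("were", "be")
linked_after = assoc.group_of("Office") == assoc.group_of("Officer")
print(linked_before, linked_after, assoc.to_json())
```

The design choice worth noting is that the overrides sit on top of the automatic grouping rather than replacing it, so the decisions made while looking at the concordance can be saved and reused as a small dictionary.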

2 thoughts on “KWIC interfaces and concordances”

    1. Hi Christina, Thanks for taking the time to read and comment.

      I’ve just started using QDA Miner/WordStat (and REALLY liking it) – one of the first things I really liked seeing was KWIC concordancing – which in part prompted this post.

      Using and learning MaxQDA is still (sadly) stuck on my “to do” list, though their trainers certification programme is hugely attractive, and it’s great to see they’ve done KWIC: http://www.maxqda.com/maxqda-update-12-3-maxdictio

      Will be updating/extending this post soon to explore how it could come in to ATLAS.ti 8 and Mac.

      Next post will be about relationships as I continue trying to work out what relationship nodes do in NVivo and how they contrast with hyperlinking in ATLAS.ti in terms of visual representation alone vs functional querying… something of a work in progress but trying to become more disciplined about writing every day and getting things up here as a place to begin a conversation and get comments so your input and responses are a big boost 😀
