6  The Categorization-Coding Continuum

6.1 Categorization

For some entities, the potential values are knowable in advance in a given systematic review context. For example, when reviewing primary studies in humans or other animals, the sample size must be a positive integer (e.g. 1, 2, 3, …); and the publication year will usually be a positive integer of four digits. In other cases, it is clear that free text will be extracted, for example when author names or source titles are extracted.
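To illustrate, here is a minimal sketch of what checking extracted values against such a known “answer space” could look like; the function names and rules are hypothetical illustrations, not part of any prescribed extraction system:

```python
# Hypothetical validators for entities whose answer space is known in advance.

def validate_sample_size(value):
    """A sample size must be a positive integer (1, 2, 3, ...)."""
    return isinstance(value, int) and value > 0

def validate_publication_year(value):
    """A publication year will usually be a four-digit positive integer."""
    return isinstance(value, int) and 1000 <= value <= 9999

print(validate_sample_size(42))         # True
print(validate_sample_size(-3))         # False
print(validate_publication_year(2021))  # True
```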

For many entities, however, it is less obvious how to operationalize them. When something is extracted as free text, often as many unique values will be extracted as there are sources. This means that synthesis (i.e., “analysis”, see below) first requires transformation of those values. A list of raw free text values cannot be synthesized: the strings of characters have no encoded meaning and cannot be collapsed or summarized. Nothing can be calculated from a list of free text values; and if that list is used in a table, that table will have as many rows or columns as the list has distinct values. Especially in scoping reviews, where including hundreds of sources is quite common, this often isn’t feasible.
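A small illustration of this problem (the free text fragments below are invented): tabulating free text yields one row per source, whereas the same information, once transformed into categories, collapses into something that can be counted and summarized.

```python
from collections import Counter

# Invented free-text extractions: nearly every source yields a unique string,
# so nothing can be tabulated or calculated until the values are transformed.
free_text = [
    "We coded the transcripts openly, letting themes emerge.",
    "Codes were derived from the theoretical framework beforehand.",
    "An emergent, bottom-up coding process was followed.",
]
print(Counter(free_text))    # three values, three table rows: nothing collapses

# After categorization, the same information becomes summarizable.
categorized = ["inductive", "deductive", "inductive"]
print(Counter(categorized))  # Counter({'inductive': 2, 'deductive': 1})
```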

In many cases, this problem can be avoided by having extractors categorize that information during extraction. For example, imagine a scoping review into qualitative research practices in a given field. One of the entities that will be extracted is how the researchers coded the data. In principle, an infinite number of coding approaches can be used. Many textbooks on qualitative research use categorizations to organize these. For example, coding can be categorized as “inductive” versus “deductive”, or as “open” versus “axial”. Like any categorization, these simplify reality, making it easier for humans to deal with. If this simplification is not problematic given the scoping reviewers’ research question(s), they can choose to adopt one of those categorizations.

In that case, they would decide which categories to use (e.g. inductive coding and deductive coding, or two symbols representing these two categories, such as 1 and 2) and specify clear coding instructions for each (often with special attention to edge cases). After extraction, instead of having one or several sentences of free text extracted for each source (where the original authors describe their coding approach), they would then have a list with only two possible values (e.g. inductive coding and deductive coding, or 1 and 2). This lends itself to easy synthesis: the percentage of sources using inductive coding could easily be obtained, and it would be possible to answer questions such as whether that percentage seems stable over time, differs between subdomains, or varies by geographical area.
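A sketch of the kind of synthesis this enables, using invented data for a handful of sources (one categorized entity plus the publication year per source):

```python
# Invented extracted data: one categorized entity ("coding") per source.
sources = [
    {"year": 2018, "coding": "inductive"},
    {"year": 2018, "coding": "deductive"},
    {"year": 2019, "coding": "inductive"},
    {"year": 2019, "coding": "inductive"},
    {"year": 2020, "coding": "deductive"},
]

# Overall percentage of sources using inductive coding.
inductive = sum(1 for s in sources if s["coding"] == "inductive")
print(f"{100 * inductive / len(sources):.1f}% inductive")  # 60.0% inductive

# The same percentage per year, to see whether it is stable over time.
for year in sorted({s["year"] for s in sources}):
    subset = [s for s in sources if s["year"] == year]
    pct = 100 * sum(1 for s in subset if s["coding"] == "inductive") / len(subset)
    print(year, f"{pct:.0f}% inductive")
```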

6.2 Ambiguity

However, the extractors would also encounter sources where the authors used both types of coding – and sources where a coding approach was used that could arguably belong in either (or neither) category. There are two strategies to try to prevent such problems.

The first is developing very, very comprehensive coding instructions. If the scoping reviewers have a clear idea of all potential coding approaches, discussing all edge cases extensively in the coding instructions can ensure unequivocal (and correct) categorizations of most potential descriptions extractors can encounter in the sources. For example, the coding instructions can instruct extractors to categorize all sources using both inductive and deductive coding as “inductive” (or “deductive”, depending on what makes sense given the scoping review’s goals).
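Such an instruction can be thought of as an explicit decision rule. A minimal sketch (the tie-break chosen here, counting sources that use both approaches as “inductive”, is just one possibility, and would depend on the review’s goals):

```python
# A coding instruction expressed as an explicit decision rule; hypothetical.
def categorize_coding(uses_inductive: bool, uses_deductive: bool) -> str:
    if uses_inductive and uses_deductive:
        return "inductive"  # edge case: sources using both count as inductive
    if uses_inductive:
        return "inductive"
    if uses_deductive:
        return "deductive"
    return "unclear"

print(categorize_coding(True, True))   # inductive
print(categorize_coding(False, True))  # deductive
```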

The second is putting a lot of thought into the categories that are used for each entity. For example, instead of using two categories, the scoping reviewers could add a third category, “inductive and deductive coding”. They could also split the entity into two dichotomous entities, having extractors extract whether inductive coding was used into one, and whether deductive coding was used into another. By adding a third category, “unclear”, to each entity, ambiguous cases could easily be spotted – however, at the cost of no longer knowing what the extractor would guess if forced to choose. That could be solved by adding more categories, for example extracting the entity inductive coding into the categories “no”, “unlikely”, “likely”, and “yes”; or, alternatively, by adding a second entity that holds the extractor’s confidence in the categorization.
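A sketch of what this second strategy could look like as a data structure; all entity names and values here are illustrative:

```python
# Hypothetical: the coding entity split into two ordinal entities, plus a
# separate entity holding the extractor's confidence in the categorization.
ORDINAL_CATEGORIES = ("no", "unlikely", "likely", "yes")

extraction = {
    "inductive_coding": "yes",        # one entity per coding approach
    "deductive_coding": "unlikely",
    "coding_confidence": 0.8,         # extractor's confidence (illustrative)
}

# Check that each ordinal entity holds one of the predefined categories.
for entity in ("inductive_coding", "deductive_coding"):
    assert extraction[entity] in ORDINAL_CATEGORIES, f"invalid value for {entity}"
```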

6.3 The cost of categorization

Each of these solutions to the problem that reality (including researchers’ decisions as extracted in scoping reviews) is usually not neatly organized into categories entails some cost. The more entities that are used to store the information extracted from the sources, and the more categories that are used for each entity, the less information is lost during extraction – but the more time and effort the extraction costs.

In addition, any categorization by definition means that what can be learned from the systematic review is limited to the “potential answer space” formed by what the systematic reviewers knew a priori. If a research question is “which coding approaches are used”, and the entities that systematic reviewers extract into are inductive coding and deductive coding (both with categories “no”, “unlikely”, “likely”, and “yes”), then the synthesis can never result in conclusions about the proportion of sources where the researchers reported they used guinea pigs, neural networks, or magic crystals for coding, even if a sizeable proportion of the sources reports those approaches. Each of these three types of coding approaches will instead be categorized as either inductive coding or deductive coding (or potentially both) – if the coding instructions are of sufficient quality, they will be categorized unequivocally and consistently, but still, a lot of information will be lost.

This can be problematic depending on the research questions. Often, what the systematic reviewers do not see coming a priori can be the most interesting. When the nature or scope of the “potential answer space” is not itself of interest (i.e., the researchers are interested in where the set of included sources falls within that space), the costs of categorization can be zero or low. However, when it is not clear in advance what that space looks like, researchers may not be able to afford categorization at extraction time. In that case, coding can happen after the extraction stage.

6.4 Coding after extraction

When coding happens after extraction, the only decision the extractors face during extraction is which fragments to extract. They don’t need to interpret anything beyond identifying which part(s) of the source contain(s) the relevant information, which decreases the probability of errors. The interpretation then comes afterwards.

The extracted original raw text fragments can then be exported to .rock files that can be coded using the Reproducible Open Coding Kit (ROCK) standard. The coded files can then be imported again and merged into the object holding all extracted data.
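As an illustration, here is a minimal sketch of reading coded fragments back in. It assumes the ROCK convention of appending codes to utterances in double square brackets; the fragment texts and code identifiers below are invented, and this is not the ROCK software’s actual import routine:

```python
import re

# Invented coded fragments, assuming a ROCK-style [[code]] convention.
coded_lines = [
    "We let the codes emerge from the data. [[inductive]]",
    "Coding followed our a priori framework. [[deductive]]",
]

CODE_PATTERN = re.compile(r"\[\[([^\]]+)\]\]")

for line in coded_lines:
    codes = CODE_PATTERN.findall(line)           # extract the applied codes
    fragment = CODE_PATTERN.sub("", line).strip()  # recover the raw fragment
    print(codes, "<-", fragment)
```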

There are many advantages to this approach. First, it makes the review much more transparent: it’s easy for others to see which fragments were selected, and so what the results were ultimately based on. Second, it lends itself well to rigorous quality control: having a file with extracted fragments coded by multiple coders is relatively straightforward and ‘cheap’ (timewise), since the selection of the relevant fragments is often a large part of the task. Third, it scales very well: the tasks of selecting the relevant fragments and interpreting those fragments can be distributed over multiple extractors and coders. Fourth, closely related, it enables a decentralized approach, where different groups can work on different parts of a project; this means it facilitates involving students or citizen scientists. Fifth, it provides flexibility regarding the distribution of effort over time. If twenty entities are extracted as raw free text fragments, reviewers can decide to start with coding the first five, which might be enough to answer their main research questions. The other fragments can then be coded later on, without delaying the rest of the project. Sixth, it allows relatively efficient re-coding using different categories, which is very useful, for example, when conducting living reviews, where insights about how to categorize can change over time.

There are also disadvantages to this approach. First, it costs more time to record raw text fragments (which requires copy-pasting, usually from PDFs, which means it often also requires some cleaning of the pasted text) than it takes to record a selection from a predefined set of categories. Second, experienced extractors develop competences that make them more efficient and more consistent over time; cutting up the tasks and potentially distributing them over more people diminishes this training effect.

6.5 The categorization-coding continuum

Whether a given entity is extracted as a raw free text fragment or categorized during extraction – and, if the latter, which categories are used and whether the entity is split up into multiple entities – has to be decided in the planning stage. Changing this decision once the extraction has started is extremely expensive in terms of time, energy, and error-proneness, which means that it is worthwhile to put a lot of thought into this decision for every entity.

In fact, together with the decision of which entities are extracted, how to extract each entity is the most important decision taken when planning a systematic review. These decisions largely determine how much time and energy the review will take, as well as how flexible the compiled database will be, how extensible the review will be, and whether the process is scalable and lends itself to decentralization.

Whether an entity should be extracted as raw free text fragments that are then coded later, or categorized during extraction, or any of the options in between (e.g. categorization into one entity with a second entity to hold raw text fragments in case of a misfit with the prespecified categories; or coding into predefined categories but using multiple entities and many categories to lose as little information as possible), depends on a number of things. For example, when few resources are available (e.g. time, people), extracting raw text fragments and coding afterwards may not be feasible. Conversely, if the reviewers aren’t confident they can specify a well-defined set of mutually exclusive categories with clear coding instructions, they will have to extract raw text fragments and defer the category definition to the coding stage.
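One way to make these decisions explicit is to record each entity’s chosen position on the continuum as part of the extraction specification. A hypothetical sketch (all field and entity names are invented for illustration):

```python
# Hypothetical extraction specification documenting, per entity, its position
# on the categorization-coding continuum and the justification for it.
entities = {
    "sample_size": {"approach": "closed answer space",
                    "type": "positive integer"},
    "coding":      {"approach": "categorize at extraction",
                    "categories": ["inductive", "deductive", "both", "unclear"]},
    "analysis":    {"approach": "extract raw fragments, code afterwards",
                    "justification": "answer space unknown a priori"},
}

for name, spec in entities.items():
    print(f"{name}: {spec['approach']}")
```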

In addition, in one-shot reviews, some of the benefits of extracting raw text fragments and separating the categorization from the extraction disappear, and the remaining benefits may not outweigh the costs. Conversely, when conducting a living review, being able to re-code extracted text fragments at some point in the future using a different perspective, or having different coders code the text fragments with different goals and instructions, can be useful.

In any case, given the importance of this decision, it is worthwhile to carefully document for each entity what the justifications are for its chosen position on the categorization-coding continuum. Later in the process, it is likely you will forget those, and you may even regret your decision for one or more entities – so future you will be grateful for reminders of why that position seemed like a good idea at the time.