A recent conversation with a colleague on Lighthouse’s Focus Discovery team resonated with me. We got to chatting about TAR protocols and the evolution of TAR, analytics, and AI. Only five years ago, people were skeptical of TAR, and the discussions revolved around understanding the technology itself. The focus has since shifted to understanding how to evaluate the process behind your own team’s or opposing counsel’s production. Although an understanding of TAR technology helps with that task, it is not enough to evaluate items like the parity of types of sample documents, the impact of using production data versus one’s own data, and the type of seed documents. That discussion prompted me to grab one of our experts, Tobin Dietrich, to discuss the CliffsNotes version of how one should evaluate a TAR protocol. It is not uncommon for lawyers to receive a technology-assisted review methodology from producing counsel, especially in government matters but also in civil matters. In the vein of the typical law school course, this blog will teach you how to issue spot if one of those methodologies comes across your desk. Once you’ve spotted the issues, bringing in the experts is the right next step.
Issue 1: Clear explanation of technology and process. If the party cannot name the TAR tool or algorithm they used, that is a sign there is an issue. Similarly, if they cannot clearly describe their analytics or AI process, this is a sign they do not understand what they did. Given that the technology was trained by this process, this lack of understanding is an indicator that the output may be flawed.
Issue 2: Document selection – how and why. In the early days of TAR, training documents were selected fairly randomly. Today, people are choosier about which documents they use for training. This is generally a positive thing, but it does require you to think about what may be over- or under-represented in the opposing party’s choice of documents. More specifically, this comes up in three ways:
- Number of documents used for training. A TAR system needs to understand what responsive and non-responsive documents look like, so it needs to see many examples in each category to approach certainty in its categorization. Too small a sample, e.g. 100 or 200 documents, risks causing the TAR system to categorize incorrectly. Although a system can technically build a predictive model from a single document, it will only effectively locate documents that are very similar to that starting document. A typical document corpus is not uniform enough to rely on a single-document predictive model.
- Types of seed documents. It is important to use a variety of documents in the training. The goal is to have the inputs represent the conceptual variety in the broader document corpus. Using another party’s production documents, for example, can be very misleading for the system as the vocabulary used by other parties is different, the people are different, and the concepts discussed are very different. This can then lead to incorrect categorization of documents. Production data, specifically, can also add confusion with the presence of Bates or confidentiality stamps. If the types of seed documents/training documents used do not mirror typical types of documents expected from the document corpus, you should be suspicious.
- Parity of seed document samples. Although you do not need anything approaching perfect parity between responsive and non-responsive documents, using 10x as many non-responsive as responsive documents can be problematic. This kind of disparity can distort the TAR model. It can also exacerbate either of the above issues: the number or the type of seed documents.
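To make the three document-selection checks above concrete, here is a minimal Python sketch of how one might screen a disclosed seed set. Everything in it is an illustrative assumption rather than part of any TAR tool: the function name `review_seed_set`, the label strings, the `"production"` source tag, and both thresholds, which are only loosely based on the figures mentioned above (100 to 200 documents, a 10x disparity).

```python
from collections import Counter

# Hypothetical thresholds, loosely based on the figures in the text above.
MIN_SEED_DOCS = 200    # assumption: 100-200 documents is described as too small
MAX_IMBALANCE = 10.0   # assumption: a 10x disparity is described as a concern


def review_seed_set(labels, sources):
    """Return a list of warnings about a seed (training) set.

    labels  -- one "responsive" / "non-responsive" string per seed document
    sources -- one origin string per seed document, e.g. "own" or "production"
    """
    warnings = []
    total = len(labels)
    counts = Counter(labels)
    responsive = counts["responsive"]
    non_responsive = counts["non-responsive"]

    # Check 1: number of documents used for training.
    if total <= MIN_SEED_DOCS:
        warnings.append(f"only {total} seed documents")

    # Check 2: parity between the two categories.
    if responsive == 0 or non_responsive == 0:
        warnings.append("one category has no examples")
    else:
        ratio = max(responsive, non_responsive) / min(responsive, non_responsive)
        if ratio >= MAX_IMBALANCE:
            warnings.append(f"class imbalance of {ratio:.1f}x")

    # Check 3: type of seed documents (another party's production data).
    production = sum(1 for s in sources if s == "production")
    if production:
        warnings.append(f"{production} seed documents come from production data")

    return warnings


# Example: 150 seed documents, heavily skewed, a few from production data.
labels = ["responsive"] * 10 + ["non-responsive"] * 140
sources = ["own"] * 145 + ["production"] * 5
for warning in review_seed_set(labels, sources):
    print("FLAG:", warning)
```

A real evaluation would also look at whether the seed documents reflect the conceptual variety of the corpus, which a simple count like this cannot capture. That is where the experts come in.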
Issue 3: How is performance measured? People throw around common TAR metrics like recall and precision without clarifying what they are referring to. You should always be able to tell what population of documents these statistics relate to. Also, don’t skip over precision. People often cite recall alone as sufficient, but precision can provide important insight into the quality of model training as well.
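For readers who want the arithmetic behind those two metrics, the following Python sketch computes both from a reviewed validation sample using the standard definitions: recall is the share of truly responsive documents the model found, and precision is the share of documents the model flagged that are truly responsive. The function name, the `population` field, and the sample counts are hypothetical and chosen only for illustration.

```python
def recall_and_precision(population, true_pos, false_pos, false_neg):
    """Compute recall and precision for a named document population.

    true_pos  -- responsive documents the model marked responsive
    false_pos -- non-responsive documents the model marked responsive
    false_neg -- responsive documents the model marked non-responsive
    """
    actually_responsive = true_pos + false_neg
    marked_responsive = true_pos + false_pos
    recall = true_pos / actually_responsive if actually_responsive else 0.0
    precision = true_pos / marked_responsive if marked_responsive else 0.0
    # Keep the population label with the numbers so the statistics are
    # never reported without saying which documents they describe.
    return {"population": population, "recall": recall, "precision": precision}


# Hypothetical counts from a validation sample: recall looks acceptable,
# but most of what the model flagged is not actually responsive.
stats = recall_and_precision(
    "validation sample (hypothetical)", true_pos=80, false_pos=120, false_neg=20
)
print(f"{stats['population']}: recall={stats['recall']:.0%}, "
      f"precision={stats['precision']:.0%}")
```

In this made-up example the model finds 80 of the 100 responsive documents (80% recall), but only 80 of the 200 documents it flagged are responsive (40% precision), which is the kind of gap that recall alone would hide.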
By starting with these three areas, you should be able to flag some of the more common issues in TAR processes and either avoid them or ask for them to be remedied.