What is TAR?
Technology Assisted Review (TAR), when narrowly defined as "predictive coding", relies on algorithms similar to those utilised by Amazon, Netflix and other websites to suggest products and box sets based on previous choices. When used to assist those carrying out disclosure for court proceedings, TAR software requires the active involvement of a lawyer: it "learns" from the lawyer's decisions and applies that learning to predict the decisions it should make in respect of unreviewed documents. Most predictive coding software now utilises "Continuous Active Learning", by which it continuously recalibrates its predictions based on new human input – i.e. the more documents are reviewed, the more refined and accurate TAR's predictions become.
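By way of illustration, a single round of that learning loop can be sketched in a few lines of Python, assuming the scikit-learn library; the model choice, the simple relevance-ranking strategy and all variable names here are illustrative assumptions, not a description of any commercial TAR product:

```python
# A minimal sketch of one round of a continuous active learning loop.
# The model, feature extraction and batch size are illustrative;
# commercial TAR platforms use their own proprietary approaches.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def cal_round(reviewed_texts, reviewed_labels, unreviewed_texts, batch_size=50):
    """Retrain on all human decisions so far, then pick the next batch to review."""
    vectoriser = TfidfVectorizer(stop_words="english")
    X_reviewed = vectoriser.fit_transform(reviewed_texts)
    X_unreviewed = vectoriser.transform(unreviewed_texts)

    model = LogisticRegression(max_iter=1000)
    model.fit(X_reviewed, reviewed_labels)  # labels: 1 = relevant, 0 = not relevant

    # Predicted probability that each unreviewed document is relevant.
    scores = model.predict_proba(X_unreviewed)[:, 1]

    # Surface the highest-scoring documents for the next round of human review.
    next_batch = np.argsort(scores)[::-1][:batch_size]
    return next_batch, scores
```

Each batch the lawyer reviews is then added to the labelled set and the function is called again – the continuous recalibration described above.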
It is now generally recognised (and has been confirmed in various studies) that TAR can be more effective than human-only review. The advantages are significant savings in time and cost, without sacrificing quality.
When more widely defined, the term TAR can extend to more "passive" technologies which assist lawyers (and others) without requiring any prior "training" or other human input. These are useful tools for practitioners, particularly in the early stages of a matter when trying to understand, in broad terms, a population of documents. Such "passive" TAR tools include:
- email threading – the organising of email exchanges into complete "threads", avoiding the need to review each email in the chain separately;
- data visualisation software, which uses passive analytics of data, metadata and language to identify:
  - groups of documents dealing with the same or similar themes and concepts, often represented visually as "clusters" (a minimal sketch of such clustering follows this list);
  - patterns of communication between individuals, which can assist in identifying custodians;
  - potential gaps in the dataset (e.g. a period in which the volume of email communication falls significantly below the average, which may indicate a gap in the data collected);
- sentiment analysis tools, which can identify changes in the tone of communications between individuals that may indicate something worth investigating further.
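As a rough sketch of the clustering idea referred to above, again assuming scikit-learn; the fixed cluster count and keyword summaries are simplifications of what commercial visual-analytics tools do:

```python
# A minimal sketch of thematic "clustering" over a document population.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def cluster_documents(texts, n_clusters=10, top_terms=5):
    vectoriser = TfidfVectorizer(stop_words="english", max_features=5000)
    X = vectoriser.fit_transform(texts)

    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = kmeans.fit_predict(X)

    # Summarise each cluster by the terms closest to its centroid,
    # giving a rough "theme" label for that group of documents.
    terms = vectoriser.get_feature_names_out()
    summaries = {}
    for i, centroid in enumerate(kmeans.cluster_centers_):
        top = centroid.argsort()[::-1][:top_terms]
        summaries[i] = [terms[t] for t in top]
    return labels, summaries
```

The keyword summaries stand in for the visual cluster maps that commercial tools present to reviewers.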
TAR and the Disclosure Pilot
TAR in all its forms received a boost in the legal sector with the introduction of the two-year disclosure pilot scheme in the Business and Property Courts across England and Wales (Pilot). Among the more heralded changes, a line in the guidance to the Disclosure Review Document (DRD) created a presumption that, in larger cases involving more than 50,000 documents, the parties should use TAR (or explain why not).
As a general rule, predictive coding adds more value the larger the pool of documents to be reviewed. Models C, D and E in the Pilot involve the handling (if not review) of increasingly large volumes of documents. The "sweet spot" for the use of TAR is probably Model D, if ordered to be given with narrative documents (i.e. the model closest to old-fashioned "standard disclosure"). This is not surprising, as TAR was in any event initially developed for use in relation to standard disclosure.
Using TAR under the Pilot
More than ever before, the rules in the Pilot have required parties (and their representatives) to have a clear understanding of the documents in their control. As noted by Sir Geoffrey Vos, Chancellor of the High Court in McParland & Partners Limited & Another v Whitehead [2020] EWHC 298 (Ch), such an understanding is a vital starting point for formulating and responding to Model C requests for documents: “The parties need to start by considering what categories of documents likely to be in the parties’ possession are relevant to the contested issues before the court.”
A party which has harvested data (whether a complete dataset or a sample from a key custodian or two) is in a good position to use a "passive" analytics tool to gain an initial understanding of the document population. These tools can also assist with assessing the likely volume of data that may be relevant to a particular issue for disclosure and can therefore inform discussions (and arguments) as to whether a party's proposed approach is proportionate.
Highlighting the difference between the old approach and the new approach under the Pilot, the Chancellor further commented in McParland that “Under standard disclosure, the test was whether a document supported or adversely affected a party’s ‘case’. This was far too general. Under the disclosure pilot, the reviewer has defined issues against which documents can be considered. The review should be a far more clinical exercise.”
So, how could that clinical exercise be managed using the available technology? Take, for example, a typical commercial dispute that may require disclosure on a range of issues relating to the contract between the parties (as amended over time) and to the quantum of the claim.
Assuming, as is often the case, that the documents are not neatly collated in folders (electronic or hard copy) but stored across custodians' emails and various locations on a server, a first step might be to use data analytics to identify the "clusters" of documents most likely to be relevant.
Once the format of any contract amendments has been identified, software can be used to locate "near duplicates" from which human reviewers can find and understand the variations made to the agreement over time.
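The underlying idea of near-duplicate matching can be sketched using only Python's standard library; real platforms use more scalable techniques (such as shingling), and the 0.8 similarity threshold here is purely an illustrative assumption:

```python
# A minimal near-duplicate sketch: compare each candidate document against
# a reference (e.g. the executed contract) and keep the close matches.
from difflib import SequenceMatcher

def find_near_duplicates(reference_text, candidate_texts, threshold=0.8):
    """Return (index, similarity) pairs for candidates similar to the reference."""
    matches = []
    for i, text in enumerate(candidate_texts):
        ratio = SequenceMatcher(None, reference_text, text).ratio()
        if ratio >= threshold:
            matches.append((i, ratio))
    return sorted(matches, key=lambda m: m[1], reverse=True)
```

Ranking the matches by similarity lets reviewers work outwards from the documents closest to the executed agreement.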
To the extent that email correspondence between the parties and internal communications are likely to be relevant, these could be hived off into a "workflow" focussed on locating key communications relevant to a particular Model C or D disclosure request. If the volume of documents to be reviewed is sufficiently large, predictive coding could be utilised to speed up the review, locate the most relevant documents early and minimise documents requiring review by a lawyer.
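To illustrate how predictive coding can minimise the documents requiring lawyer review, the following sketch assumes per-document relevance scores such as those produced by the learning loop above; the target recall figure and the simple stopping rule are illustrative assumptions, not a description of any platform's methodology:

```python
import numpy as np

def review_cutoff(scores, target_recall=0.95):
    """Estimate how far down the ranked list review should continue.

    Treats each score as the probability that the document is relevant, so
    the expected number of relevant documents is the sum of all scores;
    review stops once the ranked prefix is expected to capture
    `target_recall` of that total.
    """
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)[::-1]           # best-first review order
    captured = np.cumsum(scores[order])        # expected relevant docs found so far
    cutoff = int(np.searchsorted(captured, target_recall * scores.sum())) + 1
    return order[:cutoff]                      # document indices to review
```

Everything below the cutoff can then be sampled or validated rather than read in full, which is where the time and cost savings arise.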
To the extent that quantum issues may need to be addressed by disclosure of spreadsheets, financial reports, purchase orders and invoices, these may be quickly identified by using a combination of locating near duplicates, using data analytics and/or applying predictive coding.
The identification of privileged documents may also be assisted by the use of predictive coding. Finally, quality control checks can be undertaken by using data analytics to detect any anomalies in the coding applied to similar documents.
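A simple form of such a quality-control check can be sketched as follows; the grouping of similar documents (for example via the near-duplicate matching above) and the data structures are illustrative assumptions:

```python
def flag_inconsistent_coding(doc_groups, coding):
    """Flag groups of similar documents whose members were coded differently.

    doc_groups: {group_id: [doc_id, ...]} - e.g. near-duplicate clusters
    coding:     {doc_id: "relevant" | "not relevant" | "privileged"}
    """
    anomalies = {}
    for group_id, doc_ids in doc_groups.items():
        decisions = {coding[d] for d in doc_ids if d in coding}
        if len(decisions) > 1:  # similar documents, conflicting decisions
            anomalies[group_id] = {d: coding[d] for d in doc_ids if d in coding}
    return anomalies
```

Flagged groups can then be re-reviewed by a senior lawyer before the disclosure is finalised.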
What next?
The Pilot is to be extended for another year from the end of 2020. It seems likely that it will eventually be adopted, in more or less its current form, by the Business and Property Courts and potentially other divisions. Data volumes will likely continue to increase, and new technologies and new ways of working will create new forms of document that may become the focus of Model C (or D) requests.
Each case will present its own challenges, but a good working knowledge of the capabilities (and limitations) of the technologies available is important to ensure that practitioners can guide parties through the disclosure process in the most defensible and cost-effective manner.