Editor: Please share with us your role at Xerox Litigation Services.
Jones: I am a senior search consultant with Xerox Litigation Services, the e-discovery division of Xerox. We help clients address complex e-discovery challenges – especially document review and production – utilizing a range of novel technologies and services designed to create efficiencies and cost savings throughout these processes.
In my role, I’ve had the opportunity to design and validate e-discovery strategies for our clients that maximize the efficacy of document review projects by leveraging the most effective search and text classification tools available for their specific matters. We operate in a very crowded market, but there are innovative approaches that are evolving to help clients manage their document review projects.
Editor: Please tell us more about how the market is advancing and some of the innovations that are on the rise to help organizations better manage costs.
Jones: Technology-assisted review, also known as “predictive coding,” is one of the newer approaches. It’s a trend that’s been evolving for years – in fact, we introduced our own automated document classification tool, CategoriX, over two years ago. Many of our clients opted to use it then, and many more are using it now for their high-volume reviews. As you know, technology-assisted review holds the promise of transforming the economics of e-discovery in certain types of matters, and it’s an approach our clients are embracing to reduce review costs in both government investigations and civil litigation.
Our CategoriX process draws upon the expertise of a limited number of attorneys and/or subject-matter experts to review and code a small sample of documents for iterative training and testing of the system. The software then extrapolates assessments across the entire review population, ranking each document according to its likely responsiveness.
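The train-then-rank workflow described above can be illustrated with a minimal sketch. It is not CategoriX's actual algorithm, which the interview does not describe; it assumes a simple naive Bayes-style word weighting as a stand-in, and the sample documents, function names and labels are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train(coded_sample):
    """Learn per-word log-odds of responsiveness from (text, is_responsive) pairs."""
    counts = {True: Counter(), False: Counter()}
    for text, label in coded_sample:
        counts[label].update(tokenize(text))
    vocab = set(counts[True]) | set(counts[False])
    # Add-one smoothing keeps unseen word/label pairs from producing log(0).
    totals = {label: sum(counts[label].values()) + len(vocab) for label in counts}
    return {
        word: math.log((counts[True][word] + 1) / totals[True])
        - math.log((counts[False][word] + 1) / totals[False])
        for word in vocab
    }


def score(text, weights):
    """Sum the learned weights of the words in a document; unknown words count as 0."""
    return sum(weights.get(word, 0.0) for word in tokenize(text))


# Hypothetical sample coded by attorneys: (text, responsive?)
coded_sample = [
    ("brake failure test report", True),
    ("brake design change request", True),
    ("lunch menu for friday", False),
    ("holiday party schedule", False),
]

# Hypothetical uncoded review population.
population = [
    "friday party menu",
    "report on brake design",
    "schedule for test",
]

weights = train(coded_sample)
ranked = sorted(population, key=lambda text: score(text, weights), reverse=True)
for text in ranked:
    print(f"{score(text, weights):6.2f}  {text}")
```

In practice the coding and training would repeat over several rounds, with each round's results tested before the ranking is relied upon.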
The advantages of using technology-assisted review have proven significant across a variety of production situations, some involving millions of documents that must be assessed within very tight timeframes. The CategoriX technology and process provide ways to prioritize documents for review that can be much faster and more accurate than traditional “brute force” manual review. That alternative typically would involve armies of contract reviewers looking at every document in no particular order – a far less efficient and less cost-effective approach, and one that is notoriously prone to errors and inconsistencies.
However, depending on the specific attributes of the matter, we often will employ advanced search and analysis methods as well, including sophisticated keyword searching, advanced metadata analysis and near-dupe detection, to help clients move through an automated review process faster and with greater accuracy.
Editor: You mentioned keyword searching. There’s a perception that technology-assisted review replaces other tools and methods, including keyword search. Are these tools mutually exclusive?
Jones: We don’t believe that technology-assisted review is a substitute for predecessor technologies and methodologies. Rather, it represents a new alternative that is appropriate for some cases and most often can be complementary to other established techniques, such as keyword search. Discarding search entirely in favor of technology-assisted review would unnecessarily forfeit the many benefits search has to offer.
The goal in any e-discovery review is to identify as many responsive documents as possible, while reviewing as few non-responsive documents as possible, at a cost proportionate to the value of the case. Review efficacy is measured by the information retrieval metrics known as recall and precision. Recall represents the extent to which all responsive materials are captured in a review. Precision represents the extent to which only responsive documents are captured. Thus, recall is a measure of completeness, while precision is a measure of accuracy. While perfect recall and precision are impossible to achieve, a strategic review will strive to attain high scores on both metrics simultaneously in order to ensure that clients maximize the return on their review investment.
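As a worked illustration of the two metrics just defined, here is a small sketch that computes recall and precision from sets of document IDs. The IDs and the function name are hypothetical; in a real matter the set of truly responsive documents is not known in advance and is typically estimated from a coded sample.

```python
def recall_precision(retrieved, responsive):
    """Return (recall, precision) for a set of retrieved document IDs."""
    retrieved = set(retrieved)
    responsive = set(responsive)
    true_positives = len(retrieved & responsive)
    # Recall: share of all responsive documents that the review captured.
    recall = true_positives / len(responsive) if responsive else 0.0
    # Precision: share of captured documents that are actually responsive.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical review: four responsive documents exist, five were retrieved.
responsive_ids = {1, 2, 3, 4}
retrieved_ids = {1, 2, 3, 7, 8}

recall, precision = recall_precision(retrieved_ids, responsive_ids)
print(f"recall={recall:.2f} precision={precision:.2f}")
# prints "recall=0.75 precision=0.60"
```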
To achieve this objective, we use the tool or set of tools most appropriate for an individual matter based on multiple factors, such as type of case, volume of data, budget and timelines. Often, this means utilizing keyword search strategically in conjunction with technology-assisted review.
In fact, Judge Peck of the Southern District of New York affirmed the use of technology-assisted review for particular cases in his recent opinion in Da Silva Moore, et al., v. Publicis Groupe, et al. The court also acknowledged in this opinion that keywords, along with predictive coding, can be very instructive. In this particular case, keywords were used to enrich the training population and to ensure the adequacy of subject-matter coverage for key topics of particular interest to the attorney teams. Thus, technology-assisted review should not be viewed as a replacement for other methods, such as keyword searching, but rather as an additional tool that can be employed to improve the effectiveness of a document review effort when the circumstances are appropriate.
Editor: Where in the technology-assisted review process can keyword searching prove useful?
Jones: In our experience, keyword searching can be instrumental in the culling phase of any document review effort, including technology-assisted review. During culling, utilizing thoughtfully developed and thoroughly tested keyword searches can help save significant downstream hosting costs by eliminating patently off-topic documents that don’t hit on the searches. Eliminating non-responsive documents also helps to ensure that the technology is operating over a population that is rich in the relevant materials that are critical for effective training.
To that point, most technology-assisted review systems can be trained more quickly and effectively when the document collection is rich in relevant examples from which to learn, and when there is less “noise” and there are more positive examples for the system’s algorithms to leverage.
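The effect of culling on the richness of a collection can be sketched as follows. The documents, keywords and responsiveness labels are hypothetical and are known here only for the sake of illustration; whole-word, case-insensitive matching is an assumption, not a description of any particular search tool.

```python
import re


def hits_any(text, keywords):
    """True if the text contains any keyword as a whole word, ignoring case."""
    return any(
        re.search(rf"\b{re.escape(keyword)}\b", text, re.IGNORECASE)
        for keyword in keywords
    )


def richness(docs):
    """Fraction of documents labeled responsive."""
    return sum(doc["responsive"] for doc in docs) / len(docs) if docs else 0.0


# Hypothetical collection with illustrative labels.
collection = [
    {"id": 1, "text": "Brake assembly test results attached", "responsive": True},
    {"id": 2, "text": "Lunch on Friday?", "responsive": False},
    {"id": 3, "text": "Revised brake design drawings", "responsive": True},
    {"id": 4, "text": "Fantasy football standings", "responsive": False},
    {"id": 5, "text": "Parking garage closed for testing of alarms", "responsive": False},
    {"id": 6, "text": "Holiday party RSVP", "responsive": False},
]
keywords = ["brake", "testing", "design"]

# Keep only documents that hit on at least one search term.
culled = [doc for doc in collection if hits_any(doc["text"], keywords)]

print(f"before: {len(collection)} docs, richness {richness(collection):.2f}")
print(f"after:  {len(culled)} docs, richness {richness(culled):.2f}")
```

In this toy collection, culling halves the population and doubles its richness, although one off-topic document still hits on "testing", which is why the searches need to be tested and refined before they are relied upon.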
There are other scenarios as well, in which utilizing keyword searching as part of an automated review can optimize results. One scenario is when there are extremely rare but very important topics represented in the collection. Keyword searching can be used to target those documents in a way that random sampling alone would not, enhancing the system’s ability to recognize and retrieve this critical content.
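A quick calculation shows why random sampling alone can miss a rare topic. The sketch below assumes simple random sampling with independent draws, which is a reasonable approximation when the collection is much larger than the sample; the prevalence and sample sizes are hypothetical.

```python
def chance_sample_includes_topic(prevalence, sample_size):
    """Probability that a random sample contains at least one document on a topic.

    Treats each draw as independent with probability `prevalence` of hitting
    the topic, so the chance of missing it every time is (1 - prevalence) ** n.
    """
    return 1.0 - (1.0 - prevalence) ** sample_size


# Hypothetical topic that appears in 1 of every 10,000 documents.
prevalence = 1 / 10_000

for n in (500, 2_000, 10_000):
    p = chance_sample_includes_topic(prevalence, n)
    print(f"sample of {n:>6,}: {p:.1%} chance of at least one example")
```

Under these assumptions a 500-document training sample has only about a 5 percent chance of containing even one example of the topic, whereas a targeted keyword search can surface such documents directly.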
Keyword searching also can be used strategically to capitalize on the explicit knowledge and insights of the humans working on the case. The statistical algorithms utilized by most technology-assisted review systems capitalize on the latent linguistic patterns correlated with responsiveness in the review population. Humans typically cannot access those patterns. Keywords, on the other hand, offer ways to incorporate valuable and uniquely human perspectives.
Editor: What would be an example?
Jones: Say there’s a product liability case in which a company is interested in uncovering information that may reveal root causes for a product defect. If responsive documents that offer insights into product marketing and product testing dramatically outnumber the relevant documents that relate to product design, it is possible that the technology-assisted review system will have much greater difficulty with accurately recognizing the crucial product design documents. Targeted keyword search can help unearth these documents, though, and counter the statistical biases inherent in the baseline distribution of this subject matter across the population.
Editor: You mentioned Judge Peck’s recent Da Silva Moore opinion affirming the use of technology-assisted review. What does it say about the role of keyword searching?
Jones: Da Silva Moore says a lot about how keyword searching can be applied to various parts of a technology-assisted review. In that particular matter, search was employed in developing the initial seed set for training the system. The process began with the attorneys developing an understanding of the entire email collection while identifying a small number of documents, the initial seed set, which represented the categories to be reviewed and coded. This was done using search and analytical tools, including keyword, Boolean and concept search and concept grouping. Keyword searches then supplied documents for the expanded seed set used to train the technology.
The defendants also employed a number of targeted searches to locate documents responsive to several of the plaintiffs’ specific discovery requests. This is one application we systematically employ with our clients during any review process, including technology-assisted review. So there is solid legal precedent for employing keyword searching as part of a technology-assisted review and strong practical advantages as well.
Editor: What types of competencies should parties look for when employing keyword search as part of technology-assisted review?
Jones: Keyword search has to involve the right combination of people, processes and technology – which is called for by the courts and is part of any defensible methodology – whether search is used as the sole approach or in conjunction with a technology-assisted review process. A thoughtfully crafted approach includes attorneys and subject-matter experts knowledgeable about the particular matter, and this approach can be supplemented by technical professionals with specialized search skills.
Leveraging the expertise of linguists and statisticians, for instance, can be advantageous in a number of ways. First, they can assist in the design and articulation of a principled process that is iterative, includes testing and QC and provides measurements that statistically validate the reliability and quality of search results. Second, they can improve the quality of the review results by bringing specialized tools and techniques to bear on customized, innovative approaches to clients’ unique search challenges. And finally, they can provide detailed documentation explaining which search tactics are used and why – including key parameters, inputs, decisions and results to demonstrate that the process and output are consistent and replicable – lifting this burden from the shoulders of the attorneys and making it possible for clients to achieve higher-quality results more quickly and cost-effectively.
Editor: How do you see the use of keyword search evolving as part of technology-assisted review?
Jones: Technology-assisted review has now received formal support from the courts for certain types of matters, and I think it’s now a matter of litigants understanding that it is not an either/or situation when it comes to keyword searching. If applied correctly, keyword searching will continue to have an important role in the e-discovery toolbox.
E-discovery is complicated, and it is continuing to change every day. Even when counsel is very familiar with electronic information issues, it helps to have technical experts involved in the process to be able to explain complicated processes in ways that are understandable to the courts. This is what Judge Peck jokingly referred to in Da Silva Moore as “bring your geek to court day.”
I think life will remain extremely gratifying and rewarding for everyone working in this field as long as they are willing to think creatively and devise new and ingenious ways to exploit any and all of the tools available to manage e-discovery challenges – including technology-assisted review, keyword search and whatever comes next. We are far from finished when it comes to innovative information retrieval in e-discovery, and we are looking forward to continuing to partake in that dialogue.
Published March 19, 2012.