As originally published in Forbes.
Co-authored by Amanda Jones & Ben Kerschberg
If 2011 was the year of technology-assisted document review, 2012 will be the year of re-humanizing technology-assisted review at its most strategic points. Going forward, the focus will be not only on the foundational role humans play in guiding document assessment, but also on the role human expertise can play during the earliest stages of case strategy development and later during optimization of the review process. During these phases, experts from various fields may serve as a vital extension of the legal team, providing critical perspectives that legal subject matter experts alone may not possess.
The past year’s most seminal article on technology-assisted review (commonly known as “automated document classification” or “predictive coding”) was Maura Grossman and Gordon Cormack’s law review piece, which effectively debunked the notion that manual review offers an unimpeachable gold standard. The authors succinctly summarized their statistically validated findings as follows:
This article offers evidence that . . . technology-assisted processes, while indeed more efficient, can also yield results superior to those of exhaustive manual review, as measured by recall and precision.
Maura R. Grossman & Gordon V. Cormack, Technology-Assisted Review in E-Discovery Can Be More Effective And More Efficient Than Exhaustive Manual Review, 17 Rich. J.L. & Tech. 11 (2011). Anne Kershaw and Joe Howie agree. In a survey of 11 e-discovery vendors who use technology-assisted review in the form of predictive coding, they found that technology-assisted review outpaced not only their aptly termed “brute force [human] linear review of electronic data,” but also technologies that have been used in the not-so-distant past. They write:
The results report that, on average, predictive coding saved 45% of the costs of normal review – beyond the savings that could be obtained by duplicate consolidation and email-threading. Seven respondents reported that in individual cases the savings were 70% or more.
Anne Kershaw & Joe Howie, Crash or Soar: Will The Legal Community Accept “Predictive Coding?” (Law Technology News Oct. 2010).
From a purely pragmatic standpoint, the volume of electronically stored information now doubles every 18-24 months. Forrester Research maintains that 70% of e-discovery costs are spent on processing, analysis, review, and production. These costs are not abating. Moreover, reducing costs isn’t just a monetary concern, but also a strategic one. As Chris Dale points out, if technology-assisted review “can save significant costs without significantly reducing accuracy then the burden falls on its opponents to point out its flaws.” Chris Dale, Having The Acuity to Determine Relevance with Predictive Coding (e-Disclosure Information Project Oct. 15, 2010).
Methodology, Not Technology: The Courts and Technology-Assisted Review
Given the significant benefits that technology-assisted review can bring to e-discovery from the dual perspectives of quality and cost, expert commentators have asked a key question: Why isn’t everyone using it? Of the 11 e-discovery vendors surveyed by Kershaw & Howie, “[t]he most mentioned reason cited by respondents was uncertainty or fear about whether judges will accept predictive coding.” Kershaw & Howie, Crash or Soar (supra).
In October, U.S. Magistrate Judge Andrew Peck of the Southern District of New York attempted to put counsel at ease. See Andrew Peck, Search, Forward (Law Technology News Oct. 1, 2011). After an exhaustive review of the manner in which keyword searches have been subjected to judicial critiques of methodology, he acknowledged that the nature of more advanced technology-assisted review presents a different paradigm for the courts. Judge Peck advised counsel to refer first to the highly respected Sedona Conference Cooperation Proclamation. If cooperation does not result in accord between litigants, they should then “go to the court for advance approval.” Id. On a macro level, Judge Peck further advised:
Until there is a judicial opinion approving (or even critiquing) the use of predictive coding, counsel will just have to rely on this article as a sign of judicial approval. In my opinion, computer-assisted coding should be used in those cases where it will help “secure the just, speedy, and inexpensive” determination of cases in our e-discovery world.
Id. (citing Fed. R. Civ. P. 1).
Judge Peck’s article precipitated a flurry of commentary all but proclaiming judicial acceptance of technology-assisted review. While Judge Peck’s article falls short of judicial approval as a matter of law, e-discovery counsel are sure to cite the expert personal opinion of a leading magistrate in the electronic discovery community, just as they cite other judges’ scholarship for equally non-binding but persuasive statements.
Judge Peck did far more than just provide an attention-grabbing statement. He turned back the clock to highlight seminal opinions by Judges Grimm and Facciola, as well as his own “wake up call” to the Bar in William A. Gross Constructions Associates, Inc. v. American Manufacturers Mutual Insurance, 256 F.R.D. 134 (S.D.N.Y. 2009), in order to analogize those keyword search-focused opinions to the technology-assisted review issues now confronting the courts. Judge Peck thereby accomplished at least two goals: (i) he emphasized once again the need to focus on methodology, as opposed to technology, and (ii) he effectively laid the groundwork upon which counsel can base legal arguments as to the reasonableness of technology-assisted review. While not binding, the article is certainly worthy of close attention.
Fundamentally, the focus always has been—and remains—methodology. A detailed analysis of these cases, while beyond the scope of this piece, has been nicely written by H. Christopher Boehning and David Toal in Wake-up Call on Slipshod Search Terms (Law.com Apr. 29, 2009). See also William A. Gross Constr. Assocs. (admonishing counsel for formulating keywords “by the seat of their pants” and calling for careful deliberation, quality control, and testing); Victor Stanley, Inc. v. Creative Pipe, Inc., 250 F.R.D. 251 (D. Md. 2008) (Grimm, M.J.) (criticizing litigants’ inability to explain why keywords were chosen and whether the methodology had been tested for reliability); United States v. O’Keefe, 537 F. Supp. 2d 14 (D.D.C. 2008) (Facciola, M.J.) (demanding that the government describe how it devised its keyword methodology and deeming the formulation of reasonable keyword search terms complicated enough to require “at least” the cooperation of linguists, statisticians, and computer scientists); id. (opining that the sufficiency of keyword terms “is clearly beyond the ken of a layman” and requires counsel “truly to go where angels fear to tread”); Equity Analytics, LLC v. Lundin, 248 F.R.D. 331 (D.D.C. 2008) (Facciola, M.J.).
Re-Humanizing Technology-Assisted Review
In light of (i) the independent findings of Grossman & Cormack and Kershaw & Howie, respectively, challenging the efficacy of manual review and (ii) growing support from judges for technology-assisted review, it would appear that the technology is here to stay. E-discovery data sets are too large; litigation deadlines too demanding; and judicial scrutiny too rigorous to sustain traditional approaches to review.
The question thus becomes: What are the new roles and responsibilities for human expertise in this paradigm? The answer is that humans will continue to apply their insights and intelligence strategically to guide the technology. Automated document review technology is a tool like any other with potential that cannot be realized fully without the worldly knowledge and creativity that only humans can bring to bear in solving complex problems.
Statistical algorithms for text classification are capable of amazing feats when it comes to detecting and quantifying meaningful patterns amongst large data sets, but they are not capable of making the type of subjective qualitative assessments that constitute the art of discovery.
Chris Dale aptly points out that “[n]one of this technology solves the problem on its own. It needs a brain, and a legally trained brain at that . . . to [meet] the clients’ objective . . . [of] disposing of a dispute in the shortest time by the most cost-effective method.” Chris Dale, Having The Acuity (supra); see Fed. R. Civ. P. 1 (“These rules . . . should be construed and administered to secure the just, speedy, and inexpensive determination of every action and proceeding.”).
Accordingly, humans will continue to define the methodology deemed so critical in the judicial guidance discussed above. For defensibility considerations, it will be less important to dissect the technology than it will be to scrutinize the ongoing involvement of experts—e.g., lawyers, linguists, and statisticians—who must attempt to optimize technology-assisted review to (i) maximize precision and recall, (ii) find the appropriate balance between the two, and (iii) ensure that technology-generated results meet the unique demands of a given matter, regardless of what the quantitative picture alone may indicate.
Statisticians and Linguists
Only attorneys can make the type of subjective determinations required for assessment of proportionality and reasonableness in e-discovery. They also play an essential role in guiding the assessments of any technology-assisted review; they are typically the sole source of coding decisions for training sets; and they are ultimately responsible for certifying the quality of the review’s results. In this sense, their active involvement forms the bedrock upon which every aspect of automated classification is built and validated.
However, relying upon technology and legal and subject matter knowledge alone—without the support of any additional expertise—will rarely allow attorneys to achieve the best possible results, and it may weaken the overall defensibility of the approach. Given that most technology-assisted review is founded on statistical algorithms and linguistic pattern detection, empowering these systems with the expertise of linguists and statisticians adds considerable flexibility and often yields higher-quality, more readily defensible results in less time. It also enables a more effective allocation of resources: statisticians and linguists can develop protocols for attorneys’ sampled reviews, perform in-depth data analyses, generate reports and summaries of findings, and implement innovative solutions that would, at best, be distractions for attorneys, who should ideally be free to focus their attention on case strategy. With each team member playing to his or her talents and training, the review effort realizes greater efficiency, higher-quality results, and reduced production time and costs.
Statisticians, in addition to serving as a resource for the generation of sound performance metrics, provide a wealth of data-mining tools and techniques that can be utilized to supplement and enhance built-in classification algorithms for more tailored results. Linguists, meanwhile, have specialized analytic skills that make them especially well-suited to the task of leveraging patterns in language to expedite and improve the quality of document classification. Both linguists and statisticians bring unique perspectives and a rich set of tools to the automated document classification process that provide attorney teams with options and alternatives from which they may not otherwise benefit.
To illustrate the roles these experts can play to broaden attorney perspectives and contribute to a more robust technology-assisted review, consider the fact that, for any matter, there will be multiple sub-topics that will be considered relevant, some of which will be much more prevalent than others. Variation in the availability of different types or categories of relevant information in a data population can have a serious impact on the performance of technology-assisted review systems, which depend upon sampling and statistics-based machine learning, especially if the importance of the information is inversely correlated with its abundance. Bringing specialized human insight to bear to address these challenges can be instrumental for ensuring a positive outcome for the technology-assisted review.
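The sampling problem can be illustrated numerically. In a simple random sample, the expected number of training examples for a category is proportional to its prevalence, and the statistical uncertainty around an estimated prevalence grows sharply, relative to the estimate itself, as the category becomes rarer. The sketch below uses a standard normal-approximation confidence interval; the sample size and prevalence figures are hypothetical.

```python
# Illustrative sketch: why rare categories are hard for sampling-based methods.
# Sample size and prevalence figures are hypothetical.
import math

def expected_examples(sample_size, prevalence):
    """Expected count of a category's documents in a simple random sample."""
    return sample_size * prevalence

def margin_of_error(sample_size, prevalence, z=1.96):
    """Half-width of an approximate 95% confidence interval for a proportion."""
    return z * math.sqrt(prevalence * (1 - prevalence) / sample_size)

sample = 2000
for prev in (0.20, 0.01, 0.001):  # common topic vs. increasingly rare topics
    n = expected_examples(sample, prev)
    moe = margin_of_error(sample, prev)
    print(f"prevalence {prev:.1%}: ~{n:.0f} training examples, "
          f"CI half-width {moe:.4f} ({moe/prev:.0%} of the estimate)")
```

A category appearing in one document per thousand yields only a couple of training examples in a 2,000-document sample, and the relative uncertainty around its estimated prevalence exceeds 100% — precisely the situation where targeted human analysis earns its keep.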
For example, in a manufacturing product liability case, the review population and associated training samples could be expected to contain numerous responsive documents involving routine quality reports, but relatively few “whistleblower” documents where individual employees report concerns about the adequacy of a product’s safety features. The rare whistleblower documents would contain important linguistic patterns marking responsiveness, but those patterns may involve idiosyncratic or elliptical language that the statistical model could miss due to insufficient example data. Linguists or statisticians engaged in targeted analysis of preliminary modeling outputs can isolate instances where the system failed to detect particular patterns of responsiveness resulting from a paucity of input and take corrective action to ensure that such patterns would not be missed in the final result set.
Considering further the type of “rare event” documents described above, linguists and statisticians would certainly take steps to train the system to recognize these materials more readily. However, these documents are often by their very nature idiosyncratic and difficult to generalize based on statistical frequencies alone. Important documents of this type, though, present an ideal opportunity for the application of linguistic modeling techniques. Linguistic modeling offers more flexibility and greater precision for targeting special topics of particular interest that are low in frequency but high in importance. In this way, linguists and statisticians, collaborating closely with attorneys, can offer additional assurance that the most critical documents in their review will be discovered, even when relying upon an expedited technology-assisted approach to review.
Finally, the modeling techniques and algorithms that perform best for any given matter will vary, but it is often the case that multiple inputs generate outcomes that are superior to results generated by any single algorithm. Identifying which techniques to utilize and the specific weighting principles that will be used to synthesize them for final results generation requires special skills and on-demand experimentation. A team that includes statisticians and linguists will have the proper resources to engage in this type of real-time analysis for fully optimized results, whereas an attorney team alone may not.
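One common way to synthesize multiple inputs into a single final ranking is a weighted combination of per-document scores from several models. The sketch below is a minimal illustration of that idea; the model names, scores, and weights are hypothetical, and real weighting schemes would be chosen through the kind of on-demand experimentation described above.

```python
# Illustrative sketch: combining scores from multiple classification models
# with a weighted average. Model names, scores, and weights are hypothetical.

def combine_scores(model_scores, weights):
    """Weighted average of per-document scores from several models.

    model_scores: {model_name: {doc_id: score in [0, 1]}}
    weights:      {model_name: weight}; weights are normalized to sum to 1.
    """
    total = sum(weights.values())
    docs = set().union(*(s.keys() for s in model_scores.values()))
    return {
        doc: sum(weights[m] * model_scores[m].get(doc, 0.0)
                 for m in model_scores) / total
        for doc in docs
    }

scores = {
    "statistical_model": {"doc1": 0.90, "doc2": 0.20},
    "linguistic_model":  {"doc1": 0.60, "doc2": 0.80},
}
weights = {"statistical_model": 0.7, "linguistic_model": 0.3}
combined = combine_scores(scores, weights)
# doc1: 0.7 * 0.90 + 0.3 * 0.60 = 0.81
# doc2: 0.7 * 0.20 + 0.3 * 0.80 = 0.38
```

Documents on which the models disagree (such as doc2 here) are exactly the ones a statistician would flag for attorney review or for adjustment of the weighting.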
Toward A More Legally Defensible E-Discovery Methodology
Ideally, technology-assisted review should be viewed as a methodology predicated not only on the input and direction of legal and subject matter experts, but also on the expertise of linguists and statisticians. Their combined contributions result in methodologies that more effectively satisfy the requirements of the Federal Rules of Civil Procedure and Evidence and established case law, thereby resulting in a more defensible process. Moreover, linguists and statisticians add unparalleled flexibility. A standalone technology-assisted review system is inherently static in terms of the classification tools and algorithms it offers, whereas human beings are endlessly creative and adaptive. Linguists and statisticians have a wide array of tools and techniques they can use to build upon the core output of these systems in order to craft dynamic customized solutions tailored to the demands of each unique matter for more nuanced, higher-quality results. Utilizing linguistic and statistical expertise to drive situation-specific innovation in a technology-assisted review can also lift the technical burden from attorneys’ shoulders, resulting in significant time and cost savings and enhanced defensibility. Attorneys who work with an appropriate team of experts will never want for support and options, while attorneys working with technology alone may find themselves wishing for more when presenting their methodology to the court.