From Rome, Italy: The DESI V Workshop on predictive coding, machine learning, and review in e-discovery [WITH VIDEO INTERVIEWS]


 

[ For the Italian version of this article click here; translation by Valentina Agostinelli ]

 

By: Gregory P. Bufithis, Esq., Founder/Chairman of The Project Counsel Group

(with special thanks to Ryan Costello, Esq., our Italy-based consultant, who conducted the video interviews below and who also contributed his analysis of the event)

22 July 2013 - The fifth edition of the DESI workshops was held in Rome, Italy last month. It was part of the 14th International Conference on Artificial Intelligence and Law (ICAIL 2013) and sponsored by the Italian National Research Council (CNR).

ICAIL is promoted by the International Association for Artificial Intelligence and Law (IAAIL), of which we have been long-time members. The IAAIL is an organization devoted to research and development in the field of artificial intelligence and law, with members throughout the world. CNR is the largest public research institution in Italy, the only one under the Italian government’s Research Ministry performing multidisciplinary activities. Its primary mission is to perform research in its own Institutes and to promote innovation and competitiveness in Italy’s industrial system.

The Rome, Italy venue could not have been better.  Our Italy offices are in Frascati, a town in the Lazio region of Italy, located about 20 kilometres (12 miles) south-east of Rome. We chose Frascati because it is closely associated with science and technology, being the location of several international scientific laboratories that include ENEA (Italian National Agency for New Technologies, Energy and Sustainable Economic Development), INFN (National Institute for Nuclear Physics) and ESA (the European Space Agency).  It is also a major media, artistic and cultural center. Well, ok.  It is also renowned for its brilliant white wine, Frascati Superiore, which might have had an influence on my decision.

But I have had a long association with Italy through my work with the fashion industry (my wife once worked for Versace) in the area of intellectual property litigation as well as the IP asset management side (brand strategy, licensing, trademark, etc.).

Plus every year we send a staffer to the “Summer School on Law and Logic”, held at the European University Institute in Florence and sponsored by Harvard Law School and by Cirsfid-University of Bologna.  This summer’s special sessions at the school are extremely timely: an introduction to the legal framework for evidence which compares the U.S. and representative European legal systems, and a series of classes on comparing/using Bayesian, narrative and argumentation-based approaches to legal presentation of a case. Spot on.

But my major legal experience in Italy involved several litigations surrounding the Parmalat scandal, which matched the complexity and cacophony of issues I worked with in the Société Générale/Jérôme Kerviel trading scandal case. Parmalat involved complex issues of accounting fraud, securities fraud, professional liability, vicarious liability for international accounting organizations and their member firms, etc.

The firestorm of Parmalat litigations ensued from the 2003 collapse of Parmalat in both Italy and the U.S. upon the discovery of a massive fraud that involved the understatement of Parmalat’s debt by nearly $10 billion and the overstatement of its net assets by $16 billion. Coming shortly after the Enron and WorldCom scams, the Parmalat scandal was a good opportunity to compare failures on both sides of the Atlantic. In Italy Parmalat was called “il fallimento dei guardiani” (“the failure of the gatekeepers”), with fingers pointed at the auditors, Deloitte & Touche and Grant Thornton, plus the banks. Gatekeepers are substantially undeterred in Italy because of poor enforcement rather than legislative black holes. In fact, the law on the books, in particular the civil law concerning auditors, is even more severe than common law. But this under-enforcement was the reason why Parmalat generated litigation in the U.S. rather than Italy. The Italian public learned from the mass media shortly after Parmalat’s collapse that civil actions were being launched, at a speed unthinkable for Italy, by class-action lawyers in the U.S., and that those actions could also involve unsuspecting Italian investors. And if you were following these events in the Italian and U.S. media you often heard Lord Denning’s dictum that “as a moth is drawn to the light, so is a litigant drawn to the United States”. There was a deep need for “boots on the ground” in Italy, and since I was based in Europe, had a large listserv of Italy-based attorneys and paralegals, and knew several of the U.S. law firms involved, I was able to field teams for investigation and review. It gave birth to our European e-discovery document review unit.

I was introduced to the Swiss Federal Institute of Technology through its work analyzing the Parmalat case. The Institute is a leading player in artificial intelligence, machine learning, artificial neural networks and predictive analytics. As I have noted in several client posts, its researchers often collaborate with IBM Research Zurich, the European branch of IBM Research, in areas of computational sciences/information science, and they have presented numerous papers on the law and information science/information retrieval at the EU Corporate Law Making Conferences, which they co-sponsor with Harvard University.

ICAIL

The whole week of ICAIL 2013 was actually quite interesting. Outside of the DESI workshop, there was a brilliant session, “An Introduction to Artificial Intelligence and Law”, by Kevin Ashley and Matthias Grabmair. Kevin is Professor of Law and Intelligent Systems at the University of Pittsburgh and many of us know him for his work on ontology concepts in the field of law. Matthias is a Ph.D. candidate in the Intelligent Systems Program at the University of Pittsburgh. In addition there were sessions on textual information extraction from legal resources and on argumentation in artificial intelligence and law, plus a fabulous session, “Network Analysis in Law”, which brought together researchers from computational social science, computational legal theory, network science and related disciplines to discuss the use and usefulness of network analysis in the legal domain.

My favorite, though, was a paper on IP analysis (well, I am an intellectual property nerd) titled “Identifying Patent Monetization Entities” by Mihai Surdeanu and Sara Jeruss of Lex Machina. It describes a number of the technical details of their Patent Troll Identification System and the painstaking ways in which they evaluated it and performed error analysis (for a link to their slide deck click here and for a link to their paper click here). Companies are paying significant amounts to license the application, and Lex Machina has just come out of a Series A funding round in which they’ve fared well.

But the standout presentation was from Oanh Thi Tran, who won the Donald H. Berman Award for Best Student Paper. Her paper, “Reference Resolution in Legal Texts,” was co-authored with Minh Le Nguyen and Akira Shimazu and focused on resolving terms, definitions and provisions in Japanese-language legal documents. Oanh Thi Tran is a Vietnamese graduate student studying at the Japan Advanced Institute of Science and Technology. She is fluent in Vietnamese, Japanese and English. Even though English is her third language, she was simply brilliant, articulate and detailed. Spot on job.

DESI

DESI … Discovery of Electronically Stored Information … was conceived by Jason R. Baron and Doug Oard in 2007. Doug is a professor in the College of Information Studies at the University of Maryland. Most readers will recognize Doug’s name from the “Thinking Big” series, which is part of the Oral History in the Digital Age project funded by the Institute of Museum and Library Services. Here is a video clip from that series in which Doug discusses the current state of automatic speech recognition and its applications to oral history:

 

 

Jason is … for our neuroscience and artificial intelligence listserv members who are receiving this … the Director of Litigation at the U.S. National Archives and Records Administration, plus a major cog in The Sedona Conference, plus a founding coordinator of the TREC Legal Track. And that’s just scratching the surface. For Jason’s full CV click here.

What did Jason have in mind when conceiving DESI? As he has said, “I had in mind a forum that would essentially bring Picasso and Einstein together for dinner: i.e., academics with PhDs in Information Retrieval and Artificial Intelligence mixing it up with lawyers and legal service provider reps, all in an effort to advance the ball on how to introduce more advanced and sophisticated search and review methods into the ediscovery space”.

This edition of DESI was focused on standards for using predictive coding, machine learning, and other advanced search and review methods in e-discovery. The scope of methods considered included applications of automation to any aspect of e-discovery (e.g., early case assessment, review for responsiveness, or review for privilege) with the goal of improving accuracy, reducing cost, or both. The workshop was intended to build upon past discussions at ICAIL/DESI forums in promoting the use of AI and other advanced forms of search techniques in legal settings, as cost-efficient alternatives to traditional Boolean and manual searching.

It picked up where the DESI IV Workshop in Pittsburgh left off, focusing on best practices and standards for using predictive coding, machine learning and other advanced search and review methods in e-discovery. The Rome edition was marked by spirited discussion and enthusiastic give and take throughout the day, particularly during the lunch hour breakout session and the final afternoon panel chaired by Debra Logan of Gartner Research. As Jason reiterated several times, “we need to get information retrieval (IR) scientists and academics from a variety of disciplines in the same room as lawyers, two groups of people who don’t normally talk to one another”. With the continuing emergence of technology and advanced search methods in the e-Discovery space, it’s immensely important, as Jason says, to “foster a conversation between IR scientists who know A LOT about the subject and lawyers” in order to “share a peek under the hood” or to “look into the black box”, and find out a bit more about how these tools actually work. The goal of this exercise is, of course, to discuss, evaluate and perhaps even eventually propose standards for the emerging field of predictive coding.

We had the opportunity to sit down with Jason for an extended interview after the event was over to discuss the background of the DESI workshop series, delve into some of the technology at issue, and even touch upon his own career and role in this space at the National Archives:

 

If indeed the goals of the workshop were to foster discussion and break new ground on standards and potential best practices, DESI V certainly did not disappoint. The conversation ebbed and flowed throughout the day across a number of different areas. Fabrizio Sebastiani of the Italian Institute of Science and Information Technology (one of the world’s leading experts in text classification) presented on how to measure and how to maximize error reduction in text classification processes. Keynote speakers Conor Crowley and Bill Butterfield discussed the leading recent court decisions (Da Silva Moore and In re Actos, among others), the question of transparency between parties, and the need for best practices for the efficacy of legal review. They were also gracious enough to sit for an interview during the workshop to discuss their insights and their recent collaborative paper, Reality Bites: Why TAR’s Promises Have Yet to be Fulfilled:

Just as fascinating was the take presented by Simon Attfield and Lawrence Chapin on the need for narrative in the e-Discovery process, part of their paper entitled Predictive Coding, Story Telling and God (discussed in Ralph Losey’s blog, which you can access by clicking here).

 

For a link to all the papers presented at DESI V click here.

Note: Jason has done his own review of the event with some detailed comments on several of the papers presented (click here).

 

So what of the best practices and standards for the uncharted world of predictive coding and vector machine technology in e-Discovery? Where are we after DESI V? While there are a variety of different takes and points of view on the matter, it does seem clear that standards could add a level of quality and process, and indeed could also be varied: there could be differing standards for vendors, established best practices for lawyers, and so on. To paraphrase Tom Barnett, who presented his paper Similar Document Detection and Electronic Discovery: So Many Documents, So Little Time during the morning session, and as Jason touches upon in the interview above, at some level of abstraction you can have a standard that gets it right, but the key is to not “fall all over yourself” in terms of being too specific. With quality discussion, adequate cooperation in the industry and the establishment of agreed-upon best practices, we might just be able to coalesce around a standard that makes sense.

 

Postscript: we also had the opportunity to sit down with our gracious host, Enrico Francesconi of the Institute of Legal Information Theory and Techniques, part of the Italian National Research Council, for a Q&A (in Italian) about the Council and his take-aways from the Conference:

 
