What Large Language Models, Predictive AI, and Generative AI Mean for eDiscovery
April 3, 2024
Large language models (LLMs) have changed how people think and talk about AI. As legal teams become increasingly open to using AI in eDiscovery, it helps to get a little more familiar with what LLMs are and what they can do.
LLMs enable two types of AI
LLMs entered the lexicon thanks to chat platforms like ChatGPT, which use LLMs to mimic human language and conversation. LLMs are also behind powerful tools for assessing responsiveness and supporting other human judgments in eDiscovery. Compared to the machine learning models most commonly used in TAR (technology-assisted review), LLMs can analyze data in a more sophisticated way, including the nuances of language.
These examples demonstrate the two types of AI enabled by LLMs: generative and predictive. Both types do exactly what their names suggest.
- Predictive AI predicts things, such as the likelihood that a document should be classified (as responsive, privileged, etc.) based on prior coding.
- Generative AI generates things, creating new content, such as answers to questions and summaries of source material (ChatGPT is an example).
Predictive AI and generative AI both have applications in eDiscovery. Neither is “better” than the other. In fact, they are most powerful when stacked together. And there is still a lot of untapped potential within each type—including how they can complement each other.
Predictive AI: established and measurable
Predictive AI can play a valuable role in classifying documents for responsiveness and in flagging other sensitive data types, including privilege, PII, PHI, and trade secrets, at the scale of documents moving through eDiscovery today. We have been working with predictive LLMs since 2019 and have used LLMs to analyze 2 billion documents.
Predictive AI classifiers can significantly reduce eyes-on review. Attorneys start by reviewing a sample of the full document set. Then predictive AI learns from the attorneys’ decisions and predicts how a human would classify the rest of the documents. Meticulous QC and retraining enhance the AI model as it goes.
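The review loop described above can be sketched in a few lines of code. This is a deliberately simplified illustration, not the actual LLM-based system the article describes: it swaps in a trivial word-overlap classifier so the shape of the workflow (attorneys label a sample, the model learns from it, then predicts the rest) is easy to see. All document text and label names here are hypothetical.

```python
from collections import Counter

def train(labeled_docs):
    """Learn per-label word frequencies from the attorney-reviewed sample."""
    model = {"responsive": Counter(), "not_responsive": Counter()}
    for text, label in labeled_docs:
        model[label].update(text.lower().split())
    return model

def predict(model, text):
    """Score a document against each label's vocabulary; highest overlap wins."""
    words = text.lower().split()
    scores = {label: sum(counts[w] for w in words)
              for label, counts in model.items()}
    return max(scores, key=scores.get)

# Step 1: attorneys review a small sample of the full document set.
sample = [
    ("contract breach damages claim", "responsive"),
    ("lunch menu for the office party", "not_responsive"),
]
model = train(sample)

# Step 2: the model predicts how a human would classify the rest.
print(predict(model, "damages under the contract"))  # → responsive
```

In a real deployment, the QC and retraining step would add the attorneys' corrections of the model's predictions back into the labeled sample and retrain, improving the model as review proceeds.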
The precision of AI predictions can be astonishing. One of the first times we ever used predictive AI for responsiveness, it proved to be 10 times more precise than the machine learning models that are still standard today.
But because LLMs deliver more nuanced analysis, they also require more computing power, and as with many newer technologies, that power comes at a higher price. So there is a cost-benefit calculation to using AI this way: the larger the dataset, the more you gain from AI's help.
Generative AI: compelling and still emerging
Generative AI (or “gen AI,” for short) is dominating attention right now because it’s the new kid on the block. It’s also extremely compelling—you can experience it firsthand.
But legal teams have to be smart about how they weave it into their eDiscovery work so it actually adds value. For example, clients sometimes ask us whether gen AI can form the basis of a new kind of TAR workflow for responsiveness. Technically, the answer is yes, but gen AI hasn't been optimized for this use case, so the results are not necessarily worth the investment.
To see where (and whether) gen AI can add value, it’s important to identify and test pragmatic use cases. With gen AI, we are focused on an AI-powered privilege log builder. When tested on real matter data, outside counsel found the AI-generated log lines were 12% more accurate than log lines written by third-party contract reviewers.
Integrating predictive and generative AI in eDiscovery
Present and future use cases will include opportunities for predictive AI and gen AI to feed outputs into one another.
We are already seeing this in the area of privilege detection and logging. We can use a predictive AI model to predict privilege classifications for a set of documents (based on a sample reviewed by attorneys, as above). Documents classified as privileged can then be fed into a gen AI model that drafts a privilege log description, based on intricate analysis of the documents and understanding of the privilege parameters.
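The two-stage pipeline above can be sketched as follows. Both functions are hypothetical stand-ins: `classify_privilege` stands in for a predictive model trained on attorney-coded samples, and `draft_log_line` stands in for a gen AI call that drafts a log description. The document IDs, dates, and text are invented for illustration.

```python
def classify_privilege(doc):
    """Stand-in for a predictive AI model trained on an attorney-reviewed sample."""
    return "privileged" if "attorney-client" in doc["text"].lower() else "not_privileged"

def draft_log_line(doc):
    """Stand-in for a gen AI model that drafts a privilege log description."""
    return f"{doc['id']}: Email reflecting legal advice ({doc['date']})"

corpus = [
    {"id": "DOC-001", "date": "2023-05-01", "text": "Attorney-client advice on merger terms"},
    {"id": "DOC-002", "date": "2023-05-02", "text": "Quarterly sales figures"},
]

# Stage 1: predictive AI flags the privileged documents.
privileged = [d for d in corpus if classify_privilege(d) == "privileged"]

# Stage 2: gen AI drafts a privilege log entry for each flagged document.
log = [draft_log_line(d) for d in privileged]
```

The design point is the hand-off: the predictive model's output (a classification) becomes the generative model's input (a document to describe), so each technology does the part it is best at.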
Someday, the sequence may be able to flow in the other direction, with outputs from gen AI feeding into a predictive model. For example, you might use gen AI to create hypothetical documents of a certain type. These documents would then serve as the seed set that predictive AI studies and uses as a basis for identifying other documents of the same type.
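This reverse flow can be sketched the same way. Here `generate_examples` is a hypothetical stand-in for a gen AI call that drafts documents of a requested type, and the "predictive" side is reduced to a simple word-overlap match against the synthetic seed set; the document type and text are invented for illustration.

```python
from collections import Counter

def generate_examples(doc_type, n=3):
    """Stand-in for a gen AI call that drafts hypothetical documents of a given type."""
    templates = {"severance_agreement": "severance agreement release of claims payment terms"}
    return [templates[doc_type] for _ in range(n)]

def build_profile(texts):
    """Build a word-frequency profile from the synthetic seed set."""
    profile = Counter()
    for t in texts:
        profile.update(t.lower().split())
    return profile

def matches(profile, text, threshold=2):
    """Flag documents whose vocabulary overlaps the seed profile."""
    return sum(profile[w] > 0 for w in text.lower().split()) >= threshold

# Gen AI drafts the seed set; predictive AI learns from it and scans the corpus.
seed = generate_examples("severance_agreement")
profile = build_profile(seed)
print(matches(profile, "draft severance payment terms"))  # True
print(matches(profile, "marketing plan for next quarter"))  # False
```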
A critical component of using and integrating these technologies will continue to be the experts who know how to build them into review workflows to unleash their highest value. The technology on its own will not bring the results and value that are needed in eDiscovery.
As we build AI-powered solutions, we make sure we can answer the following questions. We encourage you to ask them as well, as you evaluate what AI will mean for your eDiscovery practice:
- How do we know the model performed well?
- How do we know that this was a good review?
- How defensible is the model and workflow?
For a deeper look at how we’re using predictive and generative AI at Lighthouse, check out our AI-Powered Privilege Review solution.