Beyond Relevance: Finding Evidence in a Fraction of the Time
Lighthouse goes beyond linear review to help a global technology company make its case to the IRS.
Key Actions
- Targeting critical case documents with Key Document Identification rather than performing linear review on the whole document set.
- Identifying key events that took place within specific hours, by applying advanced linguistic modelling to overcome challenges presented by multiple time zones and different time stamp formats within email traffic.
Key Results
- 1.5 million total documents reduced to roughly 37,500.
- Results delivered faster and at lower cost: linear review would have taken 100-500% longer and cost 90-240% more.
Building a Case for Tax-Exempt Lunches
A global technology company was facing IRS scrutiny over the complimentary lunches it provided to staff. Full-time workers received the meals free of charge because, the company claimed, staff were required to respond to emergencies during lunch hours. The IRS was skeptical of that claim and inclined to consider the lunches a taxable benefit.
To prevent the meals from being taxed, the company needed to demonstrate to the IRS that, over a two-year period, at least 50% of employees at its San Francisco office had in fact responded to an emergency between the hours of 11 a.m. and 2 p.m. local time. For evidence, the company had 1.5 million documents—mostly emails—pertaining to about 1,000 employees.
The company reached out to Lighthouse for help finding the best case-building documents within those 1.5 million. Lighthouse offered its Key Document Identification service. Rather than prioritize documents for linear review, the Lighthouse team promised to identify the most valuable and evidential documents—and do so in less time and at a lower cost.
Hacking Through the Haystack
The Lighthouse team eliminated less-valuable documents in stages. First, they used an advanced algorithm to remove junk and duplicative documents, reducing the document set to 943,000 (a 38% reduction). Among those, the team targeted San Francisco employee names and emails, which brought the total down to 484,000 (an additional 49% reduction).
From here, the team employed nuanced, multi-layered linguistic search techniques to zero in on the most necessary and informative documents. Along the way, Lighthouse encountered a number of challenges that would have thwarted other search tools and teams.
One of these was the knot of different time stamps attached to emails: the most recent email in every thread was converted to Coordinated Universal Time (UTC), while every earlier email in the thread was stamped according to the local time zone of the sender. The Lighthouse team worked around this by searching the emails’ metadata, which recorded all times in UTC. Using this metadata, the team was able to search using a single timeframe (6 to 9 p.m. UTC, corresponding with 11 a.m. to 2 p.m. Pacific).
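The idea of searching on a single UTC timeframe can be illustrated with a minimal Python sketch. It assumes each email carries an ISO 8601 time stamp with an explicit offset; the function names and sample time stamps are hypothetical and not part of Lighthouse's tooling. The fixed 18:00 to 21:00 UTC window is taken from the text above and lines up with 11 a.m. to 2 p.m. Pacific Daylight Time; handling of standard-time months is not shown.

```python
from datetime import datetime, time, timezone

# Window from the case study: 6 p.m. to 9 p.m. UTC (end exclusive by assumption).
WINDOW_START = time(18, 0)
WINDOW_END = time(21, 0)


def to_utc(stamp: str) -> datetime:
    """Parse an ISO 8601 time stamp with an explicit offset and convert it to UTC."""
    parsed = datetime.fromisoformat(stamp)
    if parsed.tzinfo is None:
        raise ValueError(f"time stamp has no UTC offset: {stamp!r}")
    return parsed.astimezone(timezone.utc)


def in_window(stamp: str) -> bool:
    """Return True if the time stamp falls inside the UTC search window."""
    utc_time = to_utc(stamp).time()
    return WINDOW_START <= utc_time < WINDOW_END


# Hypothetical time stamps written in different zones.
samples = [
    "2019-06-03T19:15:00+00:00",  # already UTC
    "2019-06-03T12:30:00-07:00",  # sender in a UTC-7 zone
    "2019-06-03T09:00:00-04:00",  # sender in a UTC-4 zone
]
matches = [s for s in samples if in_window(s)]
```

Normalizing every time stamp before filtering means one comparison covers all senders, regardless of the zone in which each email was written.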
Another challenge was linking together all emails stemming from the same incident, so that Lighthouse could provide the company with a complete account of each emergency response (and avoid counting a given emergency more than once). The team did this by flagging one email tied to a specific emergency and using proprietary threading technology to propagate that flag to all other emails associated with that emergency.
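Flag propagation of this kind can be sketched in a few lines of Python. Lighthouse's threading technology is proprietary, so this stand-in simply assumes every email already carries a thread identifier; the field names (`id`, `thread_id`) and the emergency label are hypothetical.

```python
def propagate_flags(emails, seed_flags):
    """Copy each manually flagged emergency to every email in the same thread.

    emails: list of dicts with hypothetical keys "id" and "thread_id".
    seed_flags: mapping of email id -> emergency label, set by a reviewer.
    Returns a mapping of email id -> {"emergency": label, "propagated": bool}.
    """
    thread_of = {email["id"]: email["thread_id"] for email in emails}

    # Each flagged email marks its whole thread as belonging to one emergency.
    emergency_of_thread = {
        thread_of[email_id]: emergency for email_id, emergency in seed_flags.items()
    }

    flagged = {}
    for email in emails:
        emergency = emergency_of_thread.get(email["thread_id"])
        if emergency is not None:
            flagged[email["id"]] = {
                "emergency": emergency,
                # Seed emails are the "non-propagated" documents.
                "propagated": email["id"] not in seed_flags,
            }
    return flagged


emails = [
    {"id": "e1", "thread_id": "t1"},
    {"id": "e2", "thread_id": "t1"},
    {"id": "e3", "thread_id": "t2"},
]
flagged = propagate_flags(emails, {"e1": "INC-1"})
unique_emergencies = {entry["emergency"] for entry in flagged.values()}
```

Counting distinct emergency labels rather than flagged emails is what keeps a single incident from being counted more than once.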
Finally, the Lighthouse team had to classify documents by level of emergency, to help the company build the strongest case. The emergency level of some documents was already classified, thanks to a system installed by the company toward the end of the two years under investigation. But for the majority of documents, it was unknown. Lighthouse was able to classify them using advanced search features of proprietary technology, which identified key terms like “time-sensitive” and other ways emergencies were referenced in the document population.
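As a rough illustration of term-based classification, the Python sketch below keeps any level already assigned by the company's system and otherwise falls back to key-term matching. Only "time-sensitive" comes from the text above; the other terms, the level names, and the mapping of terms to levels are assumptions made for the example, not Lighthouse's actual search logic.

```python
import re

# Hypothetical term lists, checked in order. Only "time-sensitive" is from the case study.
LEVEL_TERMS = {
    "high": ["time-sensitive", "outage"],
    "standard": ["please respond today"],
}

LEVEL_PATTERNS = {
    level: re.compile("|".join(re.escape(term) for term in terms), re.IGNORECASE)
    for level, terms in LEVEL_TERMS.items()
}


def classify(text, existing_level=None):
    """Return an emergency level for a document, or None if no term matches."""
    # Documents already classified by the company's own system keep that level.
    if existing_level is not None:
        return existing_level
    for level, pattern in LEVEL_PATTERNS.items():
        if pattern.search(text):
            return level
    return None


level = classify("This is TIME-SENSITIVE: the service is down.")
```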
Major Savings and Critical Insights
In only two weeks, a two-person team delivered on Lighthouse’s promise to help the company gather evidence, shrink the document population, and save time and money. Had the company tried to build a case with linear review instead, it would have taken up to 5 times longer and cost up to twice as much.
Of the 1.5 million total documents, Lighthouse escalated approximately 37,500 (2.5% of the original dataset). To help with case building, the team sorted documents into three tiers of descending priority: employees responding to high-level emergencies during the lunch hour, employees responding to any level of emergency during the lunch hour, and employees responding to high-level emergencies at any time in the day. The Lighthouse team also normalized the metadata for all documents to make it easy for company counsel to see which employees were involved in each document and thread.
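The three tiers described above reduce to a small decision rule. The sketch below is one way to express it in Python, assuming each document already has an emergency level and a flag for whether it falls in the lunch-hour window; the function name and the "high" label are hypothetical.

```python
def assign_tier(level, in_lunch_window):
    """Map a document to tier 1, 2, or 3, or None if it fits no tier.

    level: emergency level such as "high", or None if the document is not an emergency.
    in_lunch_window: True if the document falls within the lunch-hour timeframe.
    """
    if in_lunch_window and level == "high":
        return 1  # high-level emergency during the lunch hour
    if in_lunch_window and level is not None:
        return 2  # any level of emergency during the lunch hour
    if level == "high":
        return 3  # high-level emergency at any time of day
    return None


tiers = [assign_tier("high", True), assign_tier("standard", True), assign_tier("high", False)]
```

Checking the strictest condition first ensures each document lands in the highest-priority tier it qualifies for.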
Across the three tiers:
- 78% of San Francisco employees were tied to at least one document.
- 74% were tied to at least one non-propagated document (i.e., an email associated with a unique emergency).
- 68% were the sender of at least one non-propagated document.
This strongly suggested that more than 50% of employees actively responded to emergencies in the target timeframe and helped counsel hit the ground running in collecting the facts to prove it.
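Coverage figures like those above can be computed directly from normalized metadata. The following Python sketch assumes each document records a sender, a set of participants, and whether its flag was propagated; the field names and the sample data are invented for illustration and do not reproduce the case study's numbers.

```python
def coverage(employees, documents):
    """Return the share of employees in each of the three coverage categories."""
    tied_any, tied_unique, sender_unique = set(), set(), set()
    for doc in documents:
        involved = (doc["participants"] | {doc["sender"]}) & employees
        tied_any |= involved
        if not doc["propagated"]:
            tied_unique |= involved
            if doc["sender"] in employees:
                sender_unique.add(doc["sender"])
    total = len(employees)
    return {
        "tied_any": len(tied_any) / total,
        "tied_unique": len(tied_unique) / total,
        "sender_unique": len(sender_unique) / total,
    }


# Invented sample data: four employees, two documents.
employees = {"ana", "bo", "cy", "di"}
documents = [
    {"sender": "ana", "participants": {"ana", "bo"}, "propagated": False},
    {"sender": "cy", "participants": {"cy", "ana"}, "propagated": True},
]
shares = coverage(employees, documents)
meets_threshold = shares["sender_unique"] >= 0.5
```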