Lighthouse Blog

Read the latest insights from industry experts on the rapidly evolving legal and technology landscapes, covering topics such as strategic and technology-driven approaches to eDiscovery, innovation in artificial intelligence and analytics, modern data challenges, and more.

Blog

Preparing for Big Data Battles: How to Win Over AI and Analytics Naysayers

Artificial intelligence (AI), advanced analytics, and machine learning are no longer new to the eDiscovery field. While the legal industry admittedly trends towards caution in its embrace of new technology, the ever-growing surge of data is forcing most legal professionals to accept that basic machine learning and AI are becoming necessary eDiscovery tools.

However, the constant evolution and improvement of legal tech offer an excellent opportunity to the forward-thinking eDiscovery professional who seeks to triumph over the growing inefficiencies and ballooning costs of older technology and workflow models. Below, we’ll provide you with arguments to pull from your quiver when you need to convince Luddites that leveraging the most advanced AI and analytics solutions can give your organization or law firm a competitive and financial advantage, while also reducing risk.

Argument 1: “We already use analytical and AI technology like Technology Assisted Review (TAR) when necessary. Why bring on another AI/analytical tool?”

Solutions like TAR and other in-case analytical tools remain worthwhile for specific use cases (for example, standalone cases with massive amounts of data, short deadlines, and static data sets). However, more advanced analytical technology can now provide incredible insight into a wider variety of cases, or even across multiple matters. For example, newer solutions can analyze previous attorney work product across a company’s entire legal portfolio, giving legal teams unprecedented insight into institutional challenges like identifying attorney-client privilege, trade secret information, and the irrelevant junk data that gets pulled into cases and re-reviewed time and time again. This gives legal teams the ability to make better decisions about how to review documents on new matters.

Additionally, new technology has become more powerful, with the ability to run multiple algorithms and search within metadata, where older tools could only use single algorithms to search text alone. This means that newer tools are more effective and efficient at identifying critical information such as privileged communications, confidential information, or protected personal information. In short, printing out roadmap directions was advanced and useful at the time, but we’ve all moved on to more efficient and reliable methods of finding our way.

Argument 2: “I don’t understand this technology, so I won’t use it.”

This is one of the easiest arguments to overcome. A good eDiscovery solution provider can offer a myriad of options to help users understand and leverage the advances in analytics and AI to achieve the best possible results. Whether you want to take a hands-off approach and have a team of experts show you what is possible (“Here are a million documents. Show me all the documents that are very likely to be privileged by next week”), or you want to really dive into the technology yourself (“Show me how to use this tool so that I can delve into the privilege rate of every custodian across multiple matters in order to effectuate a better overall privilege review strategy”), a quality solution provider should be able to accommodate. Look for providers that offer training and can clearly explain how these new technologies work and how they will improve legal outcomes. Your provider should have a dedicated team of analytics experts with the credentials and hands-on experience to quell any technology fears.
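To make Argument 1’s point about “multiple algorithms and search within metadata” concrete for skeptical stakeholders, here is a minimal, hypothetical sketch in Python. It is not any vendor’s actual implementation; the column names and toy data are invented for illustration. It simply shows how text features can be blended with sender-domain and recipient-count metadata in a single privilege model, which is the kind of signal a text-only tool misses.

# Hypothetical sketch: combine text and metadata features so a privilege model
# weighs who is communicating as well as what is said. Data is illustrative only.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

docs = pd.DataFrame({
    "body_text": ["please keep this legal advice confidential", "lunch on friday?"],
    "sender_domain": ["lawfirm.com", "example.com"],
    "recipient_count": [2, 14],
    "is_privileged": [1, 0],  # prior attorney coding
})

features = ColumnTransformer([
    ("text", TfidfVectorizer(), "body_text"),
    ("domain", OneHotEncoder(handle_unknown="ignore"), ["sender_domain"]),
    ("counts", "passthrough", ["recipient_count"]),
])

model = Pipeline([("features", features), ("clf", LogisticRegression())])
model.fit(docs.drop(columns="is_privileged"), docs["is_privileged"])
print(model.predict_proba(docs.drop(columns="is_privileged"))[:, 1])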
Argument 3: “This technology will be too expensive.”

Again, this one should be a simple argument to overcome. The efficiencies achieved through effective use of AI and analytics can far outweigh the cost of the technology. Look for a solution provider that offers a variety of predictable pricing structures, like per-gigabyte pricing, flat fees, fees generated by case, fees generated across multiple cases, or subscription-based fees. Before presenting your desired solution to stakeholders, draft your battle plan by preparing a comparison of your favored pricing structure vs. the cost of performing a linear review with a traditional pricing structure (say, $1 per doc). Also, be sure to identify and outline any efficiencies a more advanced analytical tool can provide in future cases (for example, the ability to analyze and re-use past attorney work product). Finally, when battling against risk-averse stakeholders, come armed with a cost/benefit analysis outlining all of the ways in which newer AI can mitigate risk, such as by enabling more accurate and consistent work product, case after case.
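For the stakeholder comparison described in Argument 3, even a back-of-the-envelope calculation helps. The sketch below uses entirely hypothetical volumes, rates, and cull percentages (substitute your own figures) to frame linear review cost against an analytics-assisted workflow that removes a share of documents before human review.

# Hypothetical cost comparison: linear review vs. analytics-assisted review.
# All volumes, rates, and cull percentages are placeholders for illustration.
total_docs = 1_000_000
linear_rate_per_doc = 1.00          # "$1 per doc" traditional pricing
linear_cost = total_docs * linear_rate_per_doc

culled_fraction = 0.60              # share removed by analytics before human review
analytics_platform_fee = 150_000    # e.g., subscription or per-gigabyte tool fee
assisted_docs = total_docs * (1 - culled_fraction)
assisted_cost = assisted_docs * linear_rate_per_doc + analytics_platform_fee

print(f"Linear review:             ${linear_cost:,.0f}")
print(f"Analytics-assisted review: ${assisted_cost:,.0f}")
print(f"Estimated savings:         ${linear_cost - assisted_cost:,.0f}")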
AI and Analytics
Blog

Document Review: It’s Not Location, Location, Location. It’s Process, Process, Process.

Much of the workforce has been forced into remote work due to the pandemic’s social distancing requirements, and that includes the workforce conducting services related to electronic discovery. Many providers have been forced into remote work for services including collection and review. Others had already been conducting those services remotely for years, so they were well prepared to continue providing them during the pandemic.

Make no mistake, it’s important to select a review provider with considerable experience conducting remote reviews that extends to well before the pandemic. Not all providers have that level of experience. But the success of your reviews isn’t about location, location, location; it’s about process, process, process, and the ability to manage the review effectively regardless of where it’s conducted. Here are four best practices to make your document reviews more efficient and cost effective, regardless of where they’re conducted:

Maximize culling and filtering techniques up front: Successful reviews begin with identifying the documents that shouldn’t be reviewed in the first place and removing them from the document collection before starting review. Techniques for culling the collection include de-duplication, de-NISTing, and identification of irrelevant domains (a simple illustration appears at the end of this post). But it’s also important to craft a search that maximizes the balance between recall and precision to exclude thousands of additional documents that might otherwise be needlessly reviewed, saving time and money during document review.

Combine subject matter and best practice expertise: Counsel understands the issues associated with the case, but they often don’t understand how to implement sophisticated discovery workflows that incorporate the latest technological approaches (such as linguistic search) to maximize efficiency. It’s important to select a provider that knows the right questions to ask to combine subject matter expertise with eDiscovery best practices and ensure an efficient and cost-effective review process. It’s also important to continue to communicate and adjust workflows during the case as you learn more about the document collection and how it relates to the issues of the case.

Conduct search and review iteratively: Many people think of eDiscovery document review as a linear process, but the most effective reviews today implement an iterative process that interweaves search and review to continue refining the review corpus. The use of AI algorithms and expert-designed linguistic models to test, measure, and refine searches is important to achieve a high accuracy rate during review, so remember the mantra of “test, measure, refine, repeat” to maximize the quality of your search and review process.

Consider producing iteratively, as well: Discovery is a deadline-driven process, but that doesn’t mean you have to wait for the deadline to provide your entire production to opposing counsel. Rolling productions are common today and enable producing parties to meet their discovery obligations over time, establishing goodwill with opposing counsel and demonstrating to the court that you have been meeting your obligations in good faith along the way if disputes occur. Include discussion of rolling productions in your Rule 26(f) meet and confer with opposing counsel to enable you to manage the production more effectively over the life of the project.

You’re probably familiar with the famous quote from The Art of War by Sun Tzu that “every battle is won or lost before it is ever fought,” which emphasizes the importance of preparation before proceeding with the task or process you plan to perform. Regardless of where your review is being conducted, it’s not the location, location, location that will determine its success, but the process, process, process. After all, it’s called “managed review” for a reason!
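As referenced in the first best practice above, here is a minimal, hypothetical sketch of two common culling steps: exact-duplicate removal by hash and exclusion of documents from known-irrelevant domains. The file paths and the domain list are invented for illustration; production workflows also layer in de-NISTing and family-aware deduplication.

# Hypothetical culling sketch: drop exact duplicates (by MD5) and documents whose
# sender domain is on a known-irrelevant list. Paths and domains are placeholders.
import hashlib
from pathlib import Path

IRRELEVANT_DOMAINS = {"newsletter.example.com", "noreply.example.com"}

def md5_of(path: Path) -> str:
    return hashlib.md5(path.read_bytes()).hexdigest()

def cull(doc_paths, sender_domains):
    seen_hashes = set()
    kept = []
    for path, domain in zip(doc_paths, sender_domains):
        if domain in IRRELEVANT_DOMAINS:
            continue                      # domain-based exclusion
        digest = md5_of(path)
        if digest in seen_hashes:
            continue                      # exact duplicate of an item already kept
        seen_hashes.add(digest)
        kept.append(path)
    return kept

# Example usage with placeholder paths:
# survivors = cull([Path("docs/a.eml"), Path("docs/b.eml")],
#                  ["sales.example.com", "noreply.example.com"])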
eDiscovery and Review
Blog

Building Your Case for Cutting-Edge AI and Analytics in Five Easy Steps

As the amount of data generated by companies exponentially increases each year, leveraging artificial intelligence (AI), analytics, and machine learning is becoming less of an option and more of a necessity for those in the eDiscovery industry. However, some organizations and law firms are still reluctant to utilize more advanced AI technology. There are different reasons for this reluctance, including fear of the learning curve, uncertainty around cost, and unknown return on investment. But where there is uncertainty, there is often great opportunity. Adopting AI provides an excellent opportunity for ambitious legal professionals to act as the catalysts for revitalizing their organization’s or law firm’s outdated eDiscovery model. Below, I’ve outlined a simple, five-step process that can help you build a business case for bringing on cutting-edge AI solutions to reduce cost, lower risk, and improve win rates for both organizations and law firms.

Step 1: Find the Right Test Case

You will want to choose the best possible test case that highlights all the advantages that newer, cutting-edge AI solutions can provide to your eDiscovery program. One of the benefits of newer solutions is that they can be utilized in a much wider variety of cases than older tools. However, when developing a business case to convince reluctant stakeholders, bigger is better. If possible, select a case with a large volume of data. This will enable you to show how effectively your preferred AI solution can cull large volumes of data quickly compared to your current tools and workflows.

Also try to select a case with multiple review issues, like privilege, confidentiality, and protected health information (PHI)/personally identifiable information (PII) concerns. Newer tools hitting the market today have a much higher degree of efficiency and accuracy because they are able to run multiple algorithms and search within metadata. This means they are much better at quickly and correctly identifying types of information that would need to be withheld or redacted than older AI models that only use a single algorithm to search text alone.

Finally, if possible, choose a case that has some connection to, or overlap with, older cases in your (or your client’s) legal portfolio. For a law firm, this means selecting a case where you have access to older, previously reviewed data from the same client (preferably in the same realm of litigation). For a corporation, this means choosing a case, if possible, that shares a common legal nexus or overlapping data/custodians with past matters. This way, you can leverage the ability of new technology to re-use and analyze past attorney work product on previously collected data.

Step 2: Aggregate the Data

Once you’ve selected the best test case, as well as any previous matters from which you want to analyze data, the AI solution vendor will collect the respective data and aggregate it into a big data environment. A quality vendor should be able to aggregate all data, prior coding, and other key information, including text and metadata, into a single database, even if the previously reviewed data was hosted by different providers in different databases and reviewed by different counsel.

Step 3: Analyze the Data

Once all data is aggregated, it’s time for the fun to begin. Cutting-edge AI and machine learning will analyze all prior attorney decisions from previous data, along with metadata and text features found within all the data. Using this analysis, it can then identify key trends and provide a holistic view of the data you are analyzing. This type of powerful technology is completely new to the eDiscovery field and something that will certainly catch the eye of your organization or your clients.

Step 4: Showcase the Analytical Results

Once the data has been analyzed, it’s time to showcase the results to key decision makers, whether that is your clients, partners, or in-house eDiscovery stakeholders. Create a presentation that drills down to the most compelling results and clearly illustrates how the tool will create efficiency, lower costs, and mitigate risk, such as:

- Large numbers of identical documents that had been previously collected, reviewed, and coded non-responsive multiple times across multiple matters
- Large percentages of identical documents picked up by your privilege screen (and thus thrust into costly privilege re-review) that have never actually been coded privileged in any matter
- Large numbers of identical documents that were previously tagged as containing privileged or PII information in past matters (thus eliminating the need to review for those issues in the current test case)
- Large percentages of documents that have been re-collected and re-reviewed across many matters

Step 5: Present the Cost Reduction

Your closing argument should always focus on the bottom line: how much money will this tool save your firm, client, or company? This should be as easy as taking the compelling analytical results above and calculating their monetary value (a simple worked example appears at the end of this post):

- What is the monetary difference between conducting a privilege review in your test case using your traditional privilege screen vs. re-using privilege coding and redactions from previous matters?
- What is the monetary difference between conducting an extensive search for PII or PHI in your test case vs. re-using the PII/PHI coding and redactions from previous matters?
- How much money would you save by cutting out a large percentage of manual review in the test case by culling non-responsive documents identified by the tool?
- How much money would you save by eliminating a large percentage of privilege “false positives” that the tool identified by analyzing previous attorney work product?
- How much money will you (or your client) save in the future if able to continue to re-use attorney work product, case after case?

In the end, if you’ve selected the right AI solution, there will be no question that bringing on best-of-breed AI technology will result in a better, more streamlined, and more cost-effective eDiscovery program.
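To put numbers behind Step 5, here is a minimal, hypothetical calculation. Every figure below (document counts, overlap rates, per-document review costs) is a placeholder; the point is simply to translate re-used attorney work product into dollars for your presentation.

# Hypothetical Step 5 math: translating analytics results into dollar savings.
# All counts and rates below are illustrative placeholders.
test_case_docs = 500_000
review_cost_per_doc = 1.00            # first-pass responsiveness review
priv_review_cost_per_doc = 4.00       # loaded cost of attorney privilege review

# Savings from culling docs already coded non-responsive in prior matters
previously_nonresponsive = 0.25
cull_savings = test_case_docs * previously_nonresponsive * review_cost_per_doc

# Savings from skipping privilege re-review of screen hits never coded privileged before
screen_hit_rate = 0.20
false_positive_rate = 0.70
priv_savings = test_case_docs * screen_hit_rate * false_positive_rate * priv_review_cost_per_doc

print(f"Culling savings:             ${cull_savings:,.0f}")
print(f"Privilege re-review savings: ${priv_savings:,.0f}")
print(f"Total estimated savings:     ${cull_savings + priv_savings:,.0f}")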
AI and Analytics
Blog

Advanced Analytics – The Key to Mitigating Big Data Risks

Big data sets are the “new normal” of discovery and bring with them six sinister large data set challenges, as recently detailed in my colleague Nick’s article. These challenges range from classics like overly broad privilege screens to newer risks in ensuring that sensitive information (such as personally identifiable information (PII) or proprietary information such as source code) does not inadvertently make its way into the hands of opposing parties or government regulators. While these challenges may seem insurmountable due to ever-increasing data volumes (and tend to keep discovery program managers and counsel up at night), there are new solutions that can help mitigate these risks and optimize workflows.

As I previously wrote, eDiscovery is actually a big data challenge. Advances in AI and machine learning, when applied to eDiscovery big data, can help mitigate and reduce these sinister risks by breaking down the silos of individual cases, learning from a wealth of prior case data, and then transferring those learnings to new cases. The capability to analyze and understand large data sets at scale, combined with state-of-the-art methods, provides a number of benefits, five of which I have outlined below.

Pinpointing Sensitive Information - Advances in deep learning and natural language processing have now made pinpointing sensitive content achievable. A company’s most confidential content could be lying in plain sight within its electronic data and yet be completely undetected. Imagine a spreadsheet listing customers, dates of birth, and social security numbers attached to an email between sales reps. What if you are a technology company and two developers are emailing each other snippets of your company’s source code? Now that digital media are the dominant form of communication within workplaces, situations like this are ever-present, and it is very challenging for review teams to effectively identify and triage this content. To solve this challenge, advanced analytics can learn from massive amounts of publicly available and computer-generated data and then be fine-tuned to specific data sets using a recent breakthrough innovation in natural language processing (NLP) called “transfer learning.” In addition, at the core of big data is the capability to process text at scale. Combining these two techniques enables precise algorithms to evaluate massive amounts of discovery data, pinpoint sensitive data elements, and elevate them to review teams for a targeted review workflow.

Prioritizing the Right Documents - Advanced analytics can learn both key trends and deep insights about your documents and review criteria. A normal search term based approach to identifying potentially responsive or privileged content provides a binary output: documents either hit on a search term or they do not. Document review workflows are predicated on this concept, often leading to suboptimal workflows that both over-identify documents that are out of scope and miss documents that should be reviewed. Advanced analytics provide a range of outcomes that enable review teams to create targeted workflow streams tailored to the risk at hand. Descriptive analysis of the data can generate human-interpretable rules that help organize documents, such as “all documents with more than X number of recipients are never privileged” or “99.9% of the time, documents coming from the following domains are never responsive” (a simple sketch of this kind of rule mining appears at the end of this post). Deep learning-based classifiers, again using transfer learning, can generalize language on open source content and then fine-tune models to specific review data sets. Having a combination of analytics, both descriptive and predictive, provides a range of options and gives review teams the ability to prioritize the right content rather than just the next random document. Review teams can concentrate on the most important material while deprioritizing the less important content for a later effort.

Achieving Work-Product Consistency - Big data and advanced analytics approaches can ensure the same document, or similar documents, are treated consistently across cases. Corporations regularly collect, process, and review the same data across cases over and over again, even when the cases are not related. Keeping document treatment consistent across these matters is obviously extremely important when dealing with privileged content, but it is also important when it comes to responsiveness across related cases, such as a multi-district litigation. With the standard approach, cases sit in silos without any connectivity between them to enable consistent approaches. A big data approach enables connectivity between cases, using hub-and-spoke techniques to communicate and transmit learnings and work product between cases. Work product from other cases, such as coding calls, redactions, and even production information, can be utilized to inform workflows on the next case. For big data, activities like this are table stakes.

Mitigating Risk - What do all of these approaches have in common? At their core, big data and analytics are an engine for mitigating risk. Having the ability to pinpoint sensitive data, prioritize what you look at, and ensure consistency across your cases is a no-brainer. This all may sound like a big change, but in reality, it’s pretty seamless to implement. Instead of simply batching out documents that hit on an outdated privilege screen for privilege review, review managers can instead use a combination of analytics and fine-tuned privilege screen hits. Review then occurs largely as it does today, just with the right analytics to give reviewers the context needed to make the best decision.

Reducing Cost - The other side of the coin is cost savings. Every case has a different cost and risk profile, and advanced analytics should provide a range of options to support your decision-making process on where to set the lever. Do you really need to review each of these categories in full, or would an alternative scenario based on sampling high-volume, low-risk documents be a more cost-effective and defensible approach? The point is that having a better and more holistic view of your data provides an opportunity to make data-driven decisions that reduce costs.

One key tip to remember: you do not need to implement this all at once! Start by identifying a key area where you want to make improvements, determine how you can measure the current performance of the process, then apply some of these methods and measure the results. Innovation is about getting one win in order to set up the next.

If you are interested in this topic or just love to talk about big data and analytics, feel free to reach out to me at KSobylak@lighthouseglobal.com.
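As referenced in the “Prioritizing the Right Documents” section above, here is a minimal, hypothetical sketch of the descriptive side: mining prior coding decisions for simple, human-interpretable rules such as privilege rates by recipient count or responsiveness rates by sender domain. The column names, toy data, and thresholds are invented for illustration.

# Hypothetical descriptive-analytics sketch: derive simple rules from prior coding.
# Column names, data, and thresholds are placeholders.
import pandas as pd

coded = pd.DataFrame({
    "sender_domain": ["newsletter.example.com", "lawfirm.com", "newsletter.example.com", "corp.com"],
    "recipient_count": [250, 2, 180, 5],
    "privileged": [0, 1, 0, 0],
    "responsive": [0, 1, 0, 1],
})

# Rule candidate 1: privilege rate for widely distributed documents
broad = coded[coded["recipient_count"] > 50]
print("Privilege rate when recipients > 50:", broad["privileged"].mean())

# Rule candidate 2: responsiveness rate by sender domain
domain_rates = coded.groupby("sender_domain")["responsive"].agg(["mean", "count"])
never_responsive = domain_rates[(domain_rates["mean"] == 0) & (domain_rates["count"] >= 2)]
print("Domains never coded responsive (with enough volume):")
print(never_responsive)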
AI and Analytics
Blog

Automating Legal Operations - A DIY Model

Legal department automation may be top of mind for you, as it is for many other legal operations professionals; however, you might be dependent on IT or engineering resources to execute, or perhaps you are struggling with change management and not able to implement something new. You are not alone. These were the top two blockers to building out an efficient process within legal departments, as shared by recent CLOC conference attendees. The good news is that off-the-shelf technologies have advanced to the point where you may not need any time from those resources and may be able to manage automation without needing to change user behavior. With “no code” automation, you can execute end-to-end automation for your legal operations department, yourself!

What is “No Code” Automation?

As recently highlighted in Forbes magazine, “no-code platforms feature prebuilt drag-and-drop activities and tasks that facilitate integration at the business user level.” This is not the “low code” automation that has been around for decades. Low code refers to using existing code, whether from open source or from other internal development, to lower the need to create new code. Low code allows you to build faster but still requires knowledge of code. With “no code,” however, you do not need to have an understanding of coding. What this really means is that no-code platforms are so user-friendly that even a lawyer, or legal operations professional, can create automated actions…I know because I am a lawyer who has successfully done this!

But, How Does this Apply in Legal Operations?

The short answer is that it lets you, the legal operations professional, automate workflows with little external help. Some legal departments are already taking advantage of this technology. At a recent CLOC conference, Google shared how it had leveraged “no code” automation to remove the change management process for ethics and compliance in the code of conduct, conflict of interest, and anti-bribery and corruption areas. With respect to outside counsel management, Google was similarly able to remove IT/engineering dependencies for conflict waiver approvals, outside counsel engagements, and matter creation. For more details, watch Google describe their no-code automation use cases.

Google’s workflow automation is impressive and more mature than that of those of us who are just starting, so I wanted to share a simple example. A commonplace challenge for smaller legal teams is managing tasks: ensuring all legal requests are captured and assigned to someone on the legal team. Many teams are dealing with dozens, or hundreds, of emails, and it can be cumbersome to look through those to determine who is working on what. Inevitably, some of those requests get missed. It is also challenging to later report on legal requests, e.g., what types of requests the legal team receives daily, how long they take to resolve, and how many requests each person can work on. A “no code” platform can help. For example, you can connect your email to a shared Excel spreadsheet that captures all legal tasks. You would do this by creating a process that has the tool log each email sent to a certain address (e.g., legal@insertconame.com) in an Excel spreadsheet in a shared location (e.g., LegalTasks.xls). You would “map” parts of the email to columns in the spreadsheet; for example, you would want to capture the sender, the date, the time, the subject, and the body. You can even ask users who are sending requests to that address to put the type of request in the subject line. Your legal team can then check the shared spreadsheet daily and “check out” tasks by putting their initials in another column. Once complete, they would also mark that on the spreadsheet. Capturing all this information will allow you to see who is working on what, ensure that all requests are being worked on, and use pivot reporting on all legal tasks later on. Although this is a really simple use case with basic tools, it is also one that takes only a few minutes to set up and can measurably improve organization among legal team members. (For the curious, a rough sketch of the logic a no-code platform automates for this workflow appears at the end of this post.)

You can use “no code” automation in most areas of legal operations department automation. Some of the most common things to automate with “no code” are as follows:

- Legal approvals
- Document generation
- Evidence collection
- Tracking of policy acceptance

Many “no code” companies work with legal departments, so they may have experience with legal operations use cases. Be sure to ask how they have seen their technologies deployed in other legal departments.

Can I Really Do This Without Other Departments?

About 90% of the work can be done by you or your team, and in some cases even 100%. However, sometimes connecting the tools or even installing the software has to be done by your IT and development teams. This is particularly true if you are connecting to proprietary software or have a complex infrastructure. This 10% of work required from those teams, however, is much smaller than if you were asking for those resources to create the automations from scratch. In addition, you often do not have to change user behavior, so change management is removed as a blocker.

I encourage you to explore using “no code” automation in your legal department. Once you start, you’ll be glad you tried. I would be excited to hear your experiences with “no code” in legal operations. If you are using it, drop me a line at djones@lighthouseglobal.com and tell me how.
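For readers who want to see what a no-code platform is doing behind the drag-and-drop interface in the task-tracking example above, here is a rough, hypothetical Python sketch of the same mapping: read saved messages from a mailbox folder and append sender, date, subject, and body to a shared task log. The folder and file names are invented; a real no-code tool wires up the equivalent steps without any of this code.

# Hypothetical sketch of the logic a no-code platform automates for task intake:
# map fields of each saved email (.eml) in a folder to columns in a shared task log.
# Folder and file names are placeholders.
import csv
from email import policy
from email.parser import BytesParser
from pathlib import Path

INBOX_DIR = Path("legal_inbox")        # e.g., exported messages sent to the legal intake address
TASK_LOG = Path("LegalTasks.csv")      # shared task log (CSV used here for simplicity)

def log_new_requests():
    with TASK_LOG.open("a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for eml in sorted(INBOX_DIR.glob("*.eml")):
            with eml.open("rb") as fh:
                msg = BytesParser(policy=policy.default).parse(fh)
            body = msg.get_body(preferencelist=("plain",))
            writer.writerow([
                msg["From"],                                  # sender -> "Requester" column
                msg["Date"],                                  # date/time -> "Received" column
                msg["Subject"],                               # subject -> "Request type" column
                (body.get_content() if body else "")[:500],   # body excerpt -> "Details" column
            ])

if __name__ == "__main__":
    log_new_requests()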
Legal Operations
eDiscovery and Review
Blog

The Sinister Six…Challenges of Working with Large Data Sets

Collectively, we have sent an average of 306.4 billion emails each day in 2020. Add to that 23 billion texts and other app-based messages, and you get roughly 41 million messages sent every minute[1]. Not surprisingly, there have been at least one or two articles written about expanding data volumes and the corresponding impact on discovery. I’ve also seen the occasional post discussing how the methods by which we communicate are changing and how “apps that weren’t built with discovery in mind” are now complicating our daily lives. I figured there is room for at least one more big data post. Here I’ll outline some of the specific challenges we’ll continue to face in our “new normal,” all while teasing what I’m sure will be a much more interesting post that gets into the solutions that will address these challenges.

Without further delay, here are six challenges we face when working with large data sets and some insights into how we can address them through data re-use, AI, and big data analytics:

Sensitive PII / SHI - The combination of expanding data volumes, data sources, and increasing regulation covering the transmission and production of sensitive personally identifiable information (PII) and sensitive health information (SHI) presents several unique challenges. Organizations must be able to quickly respond to Data Subject Access Requests (DSARs), which requires that they be able to efficiently locate and identify data sources that contain this information. When responding to regulatory activity or producing in the course of litigation, the redaction of this content is often required. For example, DOJ second requests require the redaction of non-responsive sensitive PII and/or SHI prior to production. For years, we have relied on solutions based on regular expressions (RegEx) to identify this content (a simple example appears at the end of this post). While useful, these solutions provide somewhat limited accuracy. With improvements in AI and big data analytics come new approaches to identifying sensitive content, both at the source and further downstream during the discovery process. These improvements will establish a foundation for increased accuracy, as well as the potential for proactively identifying sensitive information rather than looking for it reactively.

Proprietary Information - As our society becomes more technologically enabled, we’re experiencing a proliferation of solutions that impact every part of our lives. It seems everything nowadays is collecting data in some fashion with the promise of improving some aspect of quality of life. This, combined with the expanding ways in which we communicate, means that proprietary information, like source code, may be transmitted in a multitude of ways. Further, proprietary formulas, client contacts, customer lists, and other categories of trade secrets must be closely safeguarded. Just as we have to be vigilant in protecting sensitive personal and health information from inadvertent disclosure, organizations need to protect their proprietary information as well. Some of the same techniques we’re going to see leveraged to combat the inadvertent disclosure of sensitive personal and health information can be leveraged to identify source code within document populations and ensure that it is handled and secured appropriately.

Privilege - Every discovery effort is aimed first at identifying information relevant to the matter at hand, and second at ensuring that no privileged information is inadvertently produced. That is… not new information. While we’ve seen the rise of predictive analytics and, for those that have adopted it, a substantial rise in efficiency and positive impact on discovery costs, the identification of privileged content has remained largely an effort centered on search terms and manual review. This has started to change in recent years as solutions become available that promise a similar output to TAR-based responsiveness workflows. The challenge with privilege is that the identification process relies more heavily on “who” is communicating than “what” is being communicated. The primary TAR solutions on the market are text-based classification engines that focus on the substantive portion of conversations (i.e., the “what” portion of the above statement). Improvements in big data analytics mean we can evaluate document properties beyond text to ensure the “who” component is weighted appropriately in the predictive engine. This, combined with the potential for data re-use supported by big data solutions, promises to substantially increase our ability to accurately identify privileged, and not privileged, content.

Responsiveness - “Predictive coding and continuous active learning are going to be major innovations in the electronic discovery industry”…would have been a catchy lead-in five years ago. They’re here, they have been here, and adoption continues to increase, yet it’s still not at the point where it should be, in my opinion. TAR-based solutions are amazing for their capacity to streamline review and to materially reduce the manual effort required to parse data sets. Traditionally, however, existing solutions leverage a single algorithm that evaluates only the text of documents. Additionally, for the most part, we re-create the wheel on every matter: we create a new classifier, review documents, train the algorithm, rinse, and repeat. Inherent in this process is the requirement that we evaluate a broad data set, so even items that have a slim-to-no chance of being relevant are included as part of the process. But there’s more we can be doing on that front. Increases in AI and big data capabilities mean that we have access to more tools than we did five years ago. These solutions are foundational for enabling a world in which we continue to leverage learning from previous matters on each new matter. Because we now have the ability to evaluate a document comprehensively, we can predict with high accuracy the populations that should be subject to TAR-based workflows and those that should simply be sampled and set aside.

Key Docs - Variations of the following phrase have been uttered time and again by numerous people (most often those paying discovery bills or allocating resources to the cause): “I’m going to spend a huge amount of time and money to parse through millions of documents to find the 10-20 that I need to make my case.” They’re not wrong. The challenge here is that what is deemed “key” or “hot” in one matter for an organization may not be similar to what falls into the same category on another. Current TAR-based solutions that focus exclusively on text lay the foundation for honing in on key documents across engagements involving similar subject matter. Big data solutions, on the other hand, offer the capacity to learn over time and to develop classifiers, based on more than just text, that can be repurposed at the organizational and, potentially, industry level.

Risk - Whether related to sensitive, proprietary, or privileged information, every discovery effort utilizes risk-mitigation strategies in some capacity. This, quite obviously, extends to source data, with increasing emphasis on comprehensive records management, data loss prevention, and threat management strategies. Improvements in our ability to accurately identify and classify these categories during discovery can have a positive impact on left-side EDRM functional areas as well. Organizations are challenged not only with identifying this content through the course of discovery, but also with understanding where it resides at the source and ensuring that they have appropriate mechanisms to identify, collect, and secure it. Advances in AI and big data analytics will enable more comprehensive discovery programs that leverage the identification of these data types downstream to improve upstream processes.

As I alluded to above, these big data challenges can be addressed with the use of AI, analytics, data re-use, and more. Now that I have summarized some of the challenges many of you are already tasked with dealing with on a day-to-day basis, you can learn more about actual solutions to these challenges. Check out my colleague’s write-up on how AI and analytics can help you gain a holistic view of your data.

To discuss this topic more or to ask questions, feel free to reach out to me at NSchreiner@lighthouseglobal.com.

[1] Metrics courtesy of Statista
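As referenced in the Sensitive PII / SHI item above, the traditional approach relies on regular expressions. Below is a minimal, hypothetical example with patterns for US Social Security numbers and email addresses; real screens use far broader pattern sets plus validation and context checks, which is exactly where their accuracy limits show up.

# Minimal RegEx-based sensitive-data screen (illustrative only).
# Real-world screens use many more patterns plus validation and context checks.
import re

PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

def flag_sensitive(text: str) -> dict:
    """Return pattern hits found in a document's extracted text."""
    return {name: pat.findall(text) for name, pat in PATTERNS.items()}

sample = "Customer DOB list attached. SSN 123-45-6789, contact jane.doe@example.com."
print(flag_sensitive(sample))
# False-positive risk: any nine-digit ID formatted like 123-45-6789 also matches,
# which is the accuracy gap newer AI-based approaches aim to close.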
Chat and Collaboration Data
AI and Analytics
Blog

Case Preparation - Thinking out Loud! Summarized…

Long gone are the days when the majority of discovery records were kept in paper format and documents, invoices, and other related evidence needed to be scanned and printed in the tens (if not hundreds) of thousands. Today, a huge number of discovery efforts (internal or external) revolve around digital content. This article will therefore highlight the collection of digital evidence and how to best prepare your case when it comes to preservation and collection as well as processing and filtering.

But before we get into that, one of the core factors to keep in mind is time, which will always be a constraint irrespective of what we have at hand. It is especially complicated if multiple parties are involved, such as vendors, multiple data locations, outside counsel, reviewers, and more. For the purposes of this blog, I have divided everything into the following actionable groups: preservation and collection, and processing and filtering.

Preservation and Collection

In an investigation or litigation there could be a number of custodians involved, that is, people who have or had access to relevant data. Whenever there are more than a handful of custodians, their locations may vary, so it is imperative to consider where and what methods to use for data collection. Sometimes an in-person collection is more feasible than a remote collection; other times, a remote collection is the preferred method for all concerned. A concise questionnaire, along with answers to frequently asked questions, is the best approach to educate the custodian, and a consultative service provider should have samples readily available to distribute to facilitate the collection effort.

Irrespective of how large the collection is, or how many custodians there are, it is best to have a designated coordinator. This will keep communication throughout the project manageable; the coordinator can arrange local technicians for remote collections and ship and track the equipment.

The exponential growth in technology presents new challenges in terms of where data can reside. An average person, in today’s world, can have a plethora of potential devices. Desktops and laptops are not the only media where data can be stored: mobile devices like phones and tablets, accessories such as smartwatches, the IoT (everything connected to the internet), cars, doorbells, locks, lights…you name it. Each item presents a new challenge and must be considered when scoping the project.

User-generated data is routinely stored and shared in the Cloud using a variety of platforms, from something as ancient as email servers to “new” rudimentary storage locations such as OneDrive, Google Drive, Dropbox, and Box.com, as well as collaborative applications such as SharePoint, Confluence, and the like. Corporate environments also rely heavily on common exchange media like Slack, Microsoft Teams, and email servers. These applications present their own set of challenges: we have to consider not just what and how to collect, but, equally important, how to present the data collected from these new venues.

The amount of data collected for any litigation can be overwhelming, so it is imperative to have a scope defined based on need. Be warned, there are some caveats to setting limitations beforehand, and they will vary based on what the filters are. The most common and widely accepted limitation is a date range. In most situations a relevant period is known, and it helps to set these parameters ahead of time. In doing so, only the obvious date metadata will be used to filter the contents; for example, in the case of emails, you are limited to either the sent or received date, and the attachments’ metadata will be ignored completely. Each cloud storage platform also presents its own challenges when it comes to dates.

Data can also be pre-filtered with keywords that are relevant to the matter at hand. This can greatly reduce the amount of data collected; however, it is solely dependent on the indexing capabilities of the host, which could be non-existent, and graphical content and other non-indexable items could be excluded unintentionally even if they are relevant.

The least favored type of filter among the digital forensics community is a targeted collection, where the user is allowed to guide where data is stored and only those targeted locations are preserved. This may not be cost effective; however, it can restrict the amount of data being collected. This scope should always be expected to be challenged by other parties and may require a redo.

Processing and Filtering

Once the collected data goes through the processing engine, its contents get fully exposed. This allows the most thorough, consistent, and repeatable filtering of data. In this stage, filtering relies on an application vetted by the vendor and accompanied by a process that is tested, proven, and updated when needed.

The most common filtering in eDiscovery matters is de-NISTing, which excludes known “system” files from the population. Alternatively, an inclusion filter can be applied, which only pushes forward content that a user would typically have created, such as office documents, emails, graphic files, etc. In most cases, both de-NISTing and inclusion filters are applied. (A small sketch of these processing-stage filters appears at the end of this post.)

Once the data is sent through the meat grinder (the core processing engine), further culling can be done. At this stage, the content is fully indexed, and extensive searches and filters will help limit the data population even further to a more manageable quantity. The processing engine will mark potentially corrupt items, which are likely irrelevant, and will identify and remove duplicate items across all collected media for the entire matter data population. Experts can then apply relevant keyword searches on the final product and select the population that will be reviewed and potentially produced.

I hope this article has shed some light on how to best prepare your case when it comes to preservation and collection as well as processing and filtering. To discuss this topic further, please feel free to reach out to me at MMir@lighthouseglobal.com.
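As referenced in the processing section above, here is a minimal, hypothetical sketch of three processing-stage filters: excluding files whose hashes appear on a NIST-style known-file list, removing exact duplicates, and applying a sent-date range. The hash list, field names, and dates are placeholders for illustration.

# Hypothetical processing-stage filters: de-NISTing (known-file hash list),
# exact-duplicate removal, and a sent-date range. All values are placeholders.
from datetime import date

KNOWN_SYSTEM_HASHES = {"d41d8cd98f00b204e9800998ecf8427e"}   # stand-in for an NSRL-style list
DATE_FROM, DATE_TO = date(2019, 1, 1), date(2020, 12, 31)

def cull(items):
    """items: dicts with 'md5', 'sent_date' (date or None), and 'doc_id' keys."""
    seen = set()
    kept = []
    for it in items:
        if it["md5"] in KNOWN_SYSTEM_HASHES:
            continue                                  # de-NISTing
        if it["md5"] in seen:
            continue                                  # duplicate of an item already kept
        sent = it.get("sent_date")
        if sent is not None and not (DATE_FROM <= sent <= DATE_TO):
            continue                                  # outside the agreed date range
        seen.add(it["md5"])
        kept.append(it["doc_id"])
    return kept

print(cull([
    {"doc_id": "DOC-1", "md5": "aaa", "sent_date": date(2020, 6, 1)},
    {"doc_id": "DOC-2", "md5": "aaa", "sent_date": date(2020, 6, 1)},   # duplicate
    {"doc_id": "DOC-3", "md5": "bbb", "sent_date": date(2015, 3, 9)},   # out of range
]))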
Forensics
Information Governance
Chat and Collaboration Data
Blog

Why Moving to the Cloud can Help with DSARs (and Have Some Surprise Benefits)

However you view a DSAR (data subject access request), for any entity that receives one, it is time consuming to complete and disproportionately expensive to fulfill. Combined with the increasing extent to which DSARs are being weaponized, companies are often missing opportunities to mitigate their negative effects by not migrating data to the Cloud.

Existing cloud solutions, such as M365 and Google Workspace (formerly known as G Suite), allow administrators to, for example, set data retention policies, ensuring that data cannot routinely be deleted before a certain date, or that a decision is made as to when data should be deleted. Equally, legal hold functionality can ensure that data cannot be deleted at all. It is not uncommon for companies to discover that when they migrate to the Cloud, all data is by default set to be on permanent legal hold. Whilst this may be required for some market sectors, it is worth re-assessing any existing legal hold policy regularly to prevent data volumes from ballooning out of control.

Such functionality is invaluable in retaining data, but it can have adverse effects when responding to DSARs, as it allows legacy or stale data to be included in any search of documents and inevitably inflates costs. Using built-in eDiscovery tools to search and filter data in place, in combination with a data retention policy managed by multiple stakeholders (such as Legal, HR, IT, and Compliance), can reduce the volume of potentially responsive data, having a significant impact on the downstream costs of fulfilling a DSAR.

Many key internal stakeholders are frequently unaware of the functionality available to their organization that can help mitigate costs, such as Advanced eDiscovery (AED) in Microsoft 365 or Google Vault in Google Workspace. Using AED, a user can quickly identify relevant data sources, from mailboxes, OneDrive, Teams, Skype, and other online data sources, apply filters such as date range and keywords, and establish the potential number of documents for review within minutes. Compare this to on-premise solutions, where you are wholly dependent on an internal IT resource, or even the individual data custodians, to identify all of the data sources, confirm with HR/Legal that they should be collected, and then either apply search criteria or export the data in its entirety to an external provider to be processed. That process can take days, if not weeks, when the clock is ticking to provide a response in 30 days. By leveraging cloud technology, it is possible to identify data sources and search in place in a fraction of the time it takes for on-premise data.

Many cloud platforms include functionality that means when data is required for a DSAR, it can be searched, filtered, and, crucially, reviewed in place. If required, redactions can be performed prior to any data being exported externally. Subject to the level of license held, additional functionality, such as advanced indexing or conceptual searching, can also be deployed, allowing for further filtering of data and thus reducing data volumes for review or export.

The technology also allows for rapid identification of multiple data types, including:

- Stale data
- Sensitive data types (financial information/PII)
- Customer-specific data
- Suspicious/unusual activities

Using the inbuilt functionality to minimize the impact of such data types as part of an Information Governance / Records Management program can drive significant changes and improvements elsewhere, including data retention policies, data loss prevention, and an improved understanding of how data is routinely used and managed in general day-to-day business (a simple sketch of flagging stale data appears at the end of this post). This, in turn, has significant time and cost benefits when required to search for data, whether for a DSAR, an investigation, or litigation. Subject to the agreement with the cloud service provider, this may also reduce the overall volume and cost of data hosted.

With a sufficiently robust internal protocol in place, likely data sources can be identified and mapped. When a DSAR is received, an established process then exists to rapidly search and cull potential cloud-based data sources, including using tools such as labels or sensitivity types to exclude data from the review pool, and to respond efficiently to the request.

Migrating to the Cloud may seem daunting, but the benefits are there and are best maximized when all stakeholders work together, across multiple teams and departments. DSARs do not have to be the burden they are today, and using tools readily available in the Cloud might significantly reduce their burdens and costs.

To discuss this topic further, please feel free to reach out to me at MBicknell@lighthouseglobal.com.
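The stale-data point above lends itself to a simple illustration. The sketch below is hypothetical and deliberately generic: it walks a local folder rather than calling any particular cloud platform’s API, and the path and retention window are placeholders. It flags files not modified within a retention window so stakeholders can decide whether they belong in a retention, deletion, or legal-hold bucket.

# Hypothetical stale-data flagging sketch: list files untouched for longer than a
# retention window. A real program would use the cloud platform's own reporting
# instead of walking a local folder; the path and window are placeholders.
from datetime import datetime, timedelta
from pathlib import Path

RETENTION_WINDOW = timedelta(days=7 * 365)   # e.g., a seven-year retention policy
ROOT = Path("shared_drive_export")           # placeholder export/report location

def find_stale(root: Path):
    cutoff = datetime.now() - RETENTION_WINDOW
    for path in root.rglob("*"):
        if path.is_file():
            modified = datetime.fromtimestamp(path.stat().st_mtime)
            if modified < cutoff:
                yield path, modified

if __name__ == "__main__":
    for path, modified in find_stale(ROOT):
        print(f"STALE  {modified:%Y-%m-%d}  {path}")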
Data Privacy
eDiscovery and Review
Information Governance
Microsoft 365