

In what may be a watershed moment for the Legal-Tech industry, the first-of-its-kind Vals Legal AI Report (VLAIR) has delivered compelling evidence that artificial intelligence can outperform human lawyers in several critical legal tasks. The study, which used actual law firm data and employed a human lawyer control group, presents the most comprehensive and methodologically sound evaluation of legal AI tools to date.
Breaking Down the Legal AI Benchmark
The VLAIR study evaluated four leading legal AI tools across seven common legal tasks, benchmarking their results against a lawyer control group. The study’s headline finding is unmistakable: AI outperformed human lawyers in four of the seven legal performance areas tested. This result challenges previous skepticism about AI’s practical utility in legal work and signals a potential inflection point in how law firms approach technology adoption.
Harvey Assistant emerged as the standout performer, scoring highest in five tasks and claiming the top position overall. It opted into six of the seven tasks and outperformed the Lawyer Baseline in four of them. Harvey Assistant also posted two of the highest scores in the entire study: 94.8% for Document Q&A and 80.2% for Chronology Generation (the latter matching the Lawyer Baseline).
Thomson Reuters CoCounsel also demonstrated impressive capabilities, consistently ranking among the top solutions. It earned the highest score for Document Summarization (77.2%) and scored consistently well across the four tasks it participated in, averaging 79.5%.
The study also highlighted strong performances from vLex Vincent AI and Vecflow’s Oliver, particularly in Document Q&A. The study initially included Lexis+ AI (from LexisNexis), but that tool was withdrawn from the sections covered in the report.
Lawyers vs Legal AI Tools Comparison
The study identifies four key areas where AI tools can outshine lawyers’ performance:
- Data extraction (Lawyer Baseline: 71.1% vs Best AI: 75.1%)
- Document Q&A (Lawyer Baseline: 70.1% vs Best AI: 94.8%)
- Document summarization (Lawyer Baseline: 50.3% vs Best AI: 77.2%)
- Transcript analysis (Lawyer Baseline: 53.7% vs Best AI: 77.8%)


These findings provide valuable guidance for law firms and legal departments looking to prioritize their AI investments, positioning these four tasks as prime candidates for automation or augmentation. In the study, all the evaluated AI tools performed better than the lawyer control group in document summarization tasks, confirming that the technology excels in this area.
Rutvik Rau, Founder at VecFlow, noted the value of such detailed performance analysis:
“Being able to know the tasks where tools such as Vecflow and others excel is the best way to measure long term ROI of legal AI.”
The AI Speed Dilemma for Law Firms
Perhaps most striking is the efficiency advantage demonstrated by the AI solutions. The tools performed tasks significantly faster than the human baseline – from 6 times faster at their slowest to an astonishing 80 times faster at their peak. This speed differential alone presents a compelling case for AI adoption, even where output quality is merely comparable to human work.
However, this very speed advantage conflicts directly with how law firms traditionally make money. Since most firms bill clients by the hour, technologies that complete tasks 6-80 times faster directly threaten their revenue model. Partners face a difficult choice: adopt tools that dramatically reduce billable hours and potentially cut profits, or maintain slower manual processes to preserve income. This tension may finally push firms toward a definitive answer on whether to shift from traditional billable hours to value-based pricing. The rough calculation below illustrates the stakes.
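To make the economics concrete, here is a back-of-the-envelope sketch in Python. The billing rate and task hours are hypothetical assumptions for illustration; only the 6-80x speedup range comes from the study.

```python
# Hypothetical illustration of the billable-hours tension. The rate and
# hours below are assumptions for illustration; only the 6-80x speedup
# range comes from the VLAIR study.

HOURLY_RATE = 400    # assumed blended billing rate, $/hour
MANUAL_HOURS = 10    # assumed lawyer time for a document review task
SPEEDUP = 6          # low end of the study's reported 6-80x range

ai_hours = MANUAL_HOURS / SPEEDUP

manual_revenue = HOURLY_RATE * MANUAL_HOURS  # $4,000 billed manually
ai_revenue = HOURLY_RATE * ai_hours          # ~$667 billed hourly with AI

print(f"Manual billing:             ${manual_revenue:,.0f}")
print(f"AI-assisted hourly billing: ${ai_revenue:,.0f}")
print(f"Hourly revenue at risk:     ${manual_revenue - ai_revenue:,.0f}")

# Under value-based (fixed-fee) pricing, the fee stays constant and the
# time saved becomes margin rather than lost billable hours.
```

Even at the slowest end of the reported range, hourly billing surrenders most of the task’s revenue, which is precisely why pricing model and technology adoption are now entangled questions.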
This tension explains why many law firms remain hesitant to embrace AI despite its proven capabilities. The better these tools perform at saving time, the more they undermine the fundamental business model that legal practices have relied on for decades.
Where Humans Still Hold the Edge
Despite AI’s impressive showing, human lawyers maintained superiority in certain domains. Specifically, lawyers outperformed all AI tools in redlining (79.7%) and EDGAR research tasks (70.1%), suggesting these areas still require the nuanced judgment and contextual understanding that experienced legal professionals bring to the table.
For chronology generation, the results showed parity, with Harvey Assistant matching the lawyer baseline at 80.2% – indicating that in this area, AI has already achieved performance levels equivalent to human experts.
Practitioner experience reinforces the study’s findings that redlining remains a challenging area for AI implementation.
Benjamin Welch, Head of Archer Transactional, shared his ongoing frustrations with AI redlining tools:
“Redlining continues to be the biggest disappointment on AI tools I have tried. Even when I told CoCounsel where the language was that needed to be redlined, it insisted that language either did not exist or did not have the issue I identified as requiring redlining.”


AI in Legal Research Challenge
Interestingly, the study challenges conventional wisdom regarding AI’s application to legal research. While many have assumed research to be an ideal use case for AI in legal settings, the VLAIR study (along with previous research) indicates that AI still falls short of expectations in complex legal research tasks.
This is especially evident in the EDGAR research task, which involves multiple research steps and iterative decision-making. Only one AI tool (Oliver) attempted the challenge, and it scored well below the lawyer baseline (55.2% vs 70.1%).
Danny Katz, an expert in AI and knowledge management, observed:
“The gaps in research and redlining are particularly interesting – suggesting that AI is proving its value in structured tasks like extraction and summarization, but still struggles with more interpretative, context-heavy work.”
He further noted that this pattern…
“seems to point to a few key takeaways: 1) AI models need better domain-specific fine-tuning for legal reasoning, 2) human oversight is still non-negotiable, and 3) firms need to be strategic about workflow integration – leaning on AI where it excels but ensuring lawyers remain central where nuance is key.”
The study suggests that better performance on EDGAR research may require further accuracy and reliability improvements in the nascent field of “AI agents” and “agentic workflows.” In the meantime, firms should approach research-focused AI tools with realistic expectations and implement appropriate verification protocols.
Generative AI Base Model Question
An important critique raised by AI professionals centers on whether specialized legal AI tools provide significant advantages over properly prompted base models (such as GPT-4o or Claude).
Sam Burrett, AI Lead at MinterEllison, raised this fundamental question:
“Why didn’t this include analysis of base model (e.g. GPT4-o or o1) on the same benchmarks? Seems like a lost opportunity to me… The real question (imho) in LegalAI isn’t which provider is the best – it’s whether we should even be buying it at all, given how good [base models + prompting + context] can be.”
While VLAIR compared specific legal AI tools against human performance, this perspective suggests that future studies should also benchmark foundation models given appropriate context and prompting. That would help organizations determine whether investing in specialized tools offers meaningful advantages over configuring more accessible, general-purpose models. The sketch below shows what such a baseline might look like.
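To illustrate what “base models + prompting + context” might look like in practice, here is a minimal, hypothetical Document Q&A sketch using the OpenAI Python SDK. The model choice, system prompt, and helper function are illustrative assumptions, not a configuration tested in VLAIR.

```python
# A minimal sketch of the "base model + prompting + context" approach
# Burrett describes. Model name, prompt wording, and function are
# illustrative assumptions, not a VLAIR benchmark configuration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def document_qa(document_text: str, question: str) -> str:
    """Answer a question strictly from the supplied document text."""
    response = client.chat.completions.create(
        model="gpt-4o",  # a general-purpose base model, per the critique
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a legal assistant. Answer only from the "
                    "document provided. If the document does not contain "
                    "the answer, say so explicitly. Cite the relevant "
                    "clause or section in your answer."
                ),
            },
            {
                "role": "user",
                "content": f"DOCUMENT:\n{document_text}\n\nQUESTION: {question}",
            },
        ],
        temperature=0,  # favor reproducible, conservative answers
    )
    return response.choices[0].message.content

# Hypothetical usage:
# print(document_qa(contract_text, "What is the governing law clause?"))
```

Whether such a thin wrapper can approach a specialized tool’s 94.8% on Document Q&A is exactly the open question Burrett raises.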
Legal AI Build or Buy: A Deeper Look
The VLAIR study’s focus on specialized legal AI tools raises a fundamental strategic question that many organizations now face: Is it better to purchase specialized legal AI solutions or build custom implementations using foundation models?
Kevin Keller, a General Counsel with AI expertise, highlighted this challenge directly:
“I don’t see it as a particularly good solution when with a modicum of tech savvy a legal team can use frontier models, their own data and much deeper legal experience in their areas of focus to deliver fantastic results without vendor dependency.”
This perspective merits closer examination. On one hand, vendor solutions like those evaluated in the VLAIR study offer several advantages:
- Ready-to-deploy functionality: Specialized tools typically require minimal technical configuration
- Purpose-built legal features: They often include document parsers, legal taxonomies, and jurisdiction-specific capabilities
- Vendor support and updates: Ongoing maintenance and improvements without internal resources
- Risk mitigation: Established vendors may offer compliance guarantees and security certifications
- Specialized training: Many tools are trained or fine-tuned specifically on legal data
However, the “build” approach using foundation models (like ChatGPT, Gemini, Claude, or open-source alternatives) offers compelling counterarguments:
- Cost efficiency: Potentially lower costs, especially for firms with significant volume
- Customization control: Ability to tailor outputs precisely to organizational practices and preferences
- Data sovereignty: Greater control over confidential information and reduced dependency on vendors
- Integration flexibility: Custom solutions can be more tightly integrated with existing workflows
- Institutional knowledge capture: Organizations can incorporate their unique legal expertise into prompts and configurations
Burrett’s question about foundation model benchmarks points to a critical gap in the current evaluation landscape. Without direct comparisons between specialized tools and properly configured foundation models on identical tasks, legal organizations lack crucial data for making informed build-versus-buy decisions.
The optimal approach likely varies based on several factors:
- Technical capacity: Organizations with strong technical teams may benefit more from custom solutions
- Practice specificity: Highly specialized practice areas might require more customized approaches
- Volume of use: Higher usage volume can justify greater investment in custom solutions
- Risk tolerance: Some organizations may prefer vendor guarantees despite potential cost premiums
- Security requirements: Different confidentiality needs may favor one approach over another
This nuanced decision extends beyond performance metrics to consider organizational strategy, technical capabilities, and risk management – dimensions not fully captured in the VLAIR study’s tool-versus-human comparison framework.
Real-World Implementation Challenges
A crucial consideration raised in response to the study concerns how these tools perform outside carefully controlled testing environments. Refat Ametov, who focuses on business automation and AI integration, raised this practical concern:
“It’s great to see clear evidence of value, but how much of this performance depends on carefully controlled conditions? Real-world legal workflows are rarely clean or linear – how do these tools perform when data is incomplete, contracts are inconsistent, or client-specific preferences override standard processes?”


The ability of AI tools to handle these messy real-world scenarios remains a critical question for firms considering adoption. The VLAIR study itself acknowledges this limitation, noting that there remains room for improvement both in how legal AI tools are evaluated and in their performance.
Implications for the Legal Industry
For law firms and legal departments, the VLAIR study offers several actionable insights:
The Time for Action is Now – The results effectively eliminate reasonable doubt about AI’s utility in legal work. Organizations that have hesitated to adopt AI due to uncertainty now have empirical evidence supporting its value.
Strategic Investment – Resources should be directed toward tools that excel in the high-performing areas identified by the study – data extraction, document Q&A, summarization, and transcript analysis.
Hybrid Approaches – Even where AI didn’t outperform lawyers, its dramatic speed advantages make it valuable as a first-pass solution that can be refined through human oversight.
Verification Remains Critical – Particularly for complex legal research, human verification of AI outputs should remain standard practice.
Start with High-ROI Workflows – Begin AI implementation with summarization and clause extraction for maximum impact. As Umar Aslam, a Legal Tech Strategist and AI Engineer, suggests: “Start with high-ROI workflows (summarization, clause extraction).” A minimal extraction sketch follows this list.
Consider Build vs Buy – Organizations must weigh the benefits of specialized legal AI tools against custom solutions made using foundation models based on their specific needs and technical capabilities.
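As a concrete starting point for such a high-ROI workflow, here is a hypothetical clause-extraction sketch using a general-purpose model via the OpenAI Python SDK. The clause types, prompt wording, and JSON structure are illustrative assumptions, not features of any tool evaluated in the study.

```python
# A hypothetical clause-extraction sketch: ask a base model to return
# structured JSON so output can feed downstream review workflows.
# Clause types and prompt wording are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI()

CLAUSE_TYPES = ["termination", "indemnification", "limitation of liability"]

def extract_clauses(contract_text: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},  # force valid JSON output
        messages=[
            {
                "role": "system",
                "content": (
                    "Extract clauses from the contract. Return a JSON "
                    "object with one key per clause type, each mapping to "
                    f"the verbatim clause text or null if absent: {CLAUSE_TYPES}"
                ),
            },
            {"role": "user", "content": contract_text},
        ],
        temperature=0,
    )
    return json.loads(response.choices[0].message.content)

# Extracted clauses should still be verified by a lawyer, consistent
# with the study's emphasis on human oversight.
```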
Small Law Firm Considerations
A notable gap in the current VLAIR study is its focus on tools primarily used by larger firms. Carolyn Elefant, a prominent voice for solo practitioners, commented:
“I can’t wait to read this and glad to hear that Vlex and co counsel, two of my personal faves scored so high. But these products are largely aimed at large firms and it would be amazing to also stress test the products that solos and smalls are using.”
Industry experts emphasize that scalable, cost-effective AI implementation is possible for small firms. Aslam also noted that legal organizations should “balance off-the-shelf tools (Harvey, CoCounsel) with tailored solutions for niche needs” and “don’t forget small firms… scalable, cost-effective AI is possible.”
In-House Legal Team Perspective
Some in-house counsel expressed that the evaluated tools may not reflect what corporate legal departments are actually using. Keller offered this differing perspective:
“The thing is that most legal teams (at least in-house) don’t use any of these tools. There are better ones that move the needle for us like Streamline and GCAI.”


Nicola Shaver pointed out that major corporations like Deutsche Telekom, KKR, and Bayer use Harvey. This exchange highlights the diversity of approaches to AI adoption across different segments of the legal profession, with some organizations preferring specialized vendor solutions while others opt for more customized implementations using foundation models.
Collaborative Achievement and Future Directions
The VLAIR study represents a significant collaborative effort between the Vals team and Legaltech Hub, with participating vendors and law firms contributing valuable time and data to make the research possible.
Additional VLAIR studies are planned to expand vendor involvement and jurisdictional coverage, with a forthcoming report focused specifically on legal research. Katz expressed curiosity about this evolution:
“Curious to see if future iterations of VLAIR will track improvement in these areas or if they’ll remain stubbornly human-dependent.”
Industry feedback suggests that future studies might benefit from including tools used by smaller firms and in-house teams, as well as direct comparisons with properly configured foundation models.
Emerging Possibilities: Next Generation of Legal AI
The VLAIR study provides a snapshot of current legal AI capabilities, but it also hints at future directions. The challenges identified in complex tasks like EDGAR research point to areas where emerging technologies like AI agents and agentic workflows may eventually bridge the gap between human and machine performance.
As these AI tools improve, they could shift from handling simple legal tasks to taking on more complex processes that need judgment and a deeper understanding of context. This change will depend on both better technology and thoughtful integration with how lawyers work and exercise their professional expertise.
Vals Legal AI Benchmark – Charting VLAIR’s Journey
The VLAIR study marks a turning point in the evidence-based evaluation of AI’s capability to transform legal practice. Without suggesting that legal AI will replace lawyers entirely, the report clearly shows where AI can enhance legal work – delivering similar or better results much more efficiently.
For forward-thinking law firms, these findings offer guidance on strategically adopting AI with proper human oversight, potentially boosting productivity and client service in a competitive legal market.
The range of opinions from lawyers and AI experts highlights that while AI is clearly valuable for legal work, the best legal AI implementation will differ based on a firm’s size, technical capabilities, practice areas, and specific needs. The most successful firms will be those who carefully match AI tools to their particular requirements rather than looking for universal solutions.
Strategic Symbiosis: Lawyers and Algorithms in Practice
As the study itself concludes, “these legal AI tools have value for lawyers and law firms, although there remains room for improvement in both how we evaluate these tools and their performance.” This balanced view recognizes both the major progress made and the ongoing challenges in developing AI that truly enhances legal practice.
The evidence points toward a future where lawyers and AI tools work in a symbiotic partnership, each leveraging their unique strengths. AI is great at quickly processing massive amounts of data and spotting patterns, while human lawyers provide judgment, creativity, and ethical reasoning for complex legal issues. The best approach isn’t AI replacing lawyers but law firms strategically using AI to boost their capabilities and efficiency.
For the legal industry, the path forward involves careful integration of these tools into existing workflows, ongoing evaluation of their performance, and a willingness to experiment with new approaches. As the VLAIR study shows, the question isn’t whether AI adds value to legal practice but how to best use its capabilities while preserving the human judgment and expertise that remain crucial to quality legal work.
Reference:
VLAIR Study in full: Vals Legal AI Report
Comments source: Nicola Shaver’s LinkedIn Post


Ricci Masero is a Chartered Marketer and content creator who has been writing about technology and innovation for many years. Alongside marketing the award-winning blended learning solutions from Intellek, he has been featured in Forbes, Entrepreneur, Atlassian, and AI Journal – is a regular contributor to eLearning Industry and Training Journal – and co-hosts the L&D Insights segment in Intellek’s monthly client webinars.