Project Overview
SumUp uses LLM-driven evaluation to assess the accuracy and quality of free-text narratives in anti-money laundering (AML) reports.
Layman's Explanation
SumUp uses an AI-powered tool to evaluate text written for financial crime reports, helping ensure that these reports meet regulatory standards without human agents needing to review each one manually.
Analogy
Evaluating LLM-written text this way is like an auditor checking a financial report for completeness: confirming that each section is filled out accurately and that all required information is present, without reading every word by hand.
Details
SumUp developed an LLM-driven evaluation system to assess the quality of the narratives generated for suspicious activity reports (SARs) filed for AML purposes. Traditional NLP metrics often fail to capture qualities essential to these narratives, such as factual accuracy, topic relevance, and structural completeness. SumUp’s approach instead has an LLM evaluate each text against custom benchmarks, such as whether it includes customer data, supporting evidence, and valid conclusions, and then assign a score based on those criteria. This gives an automated, qualitative assessment comparable to human review. The evaluation also points to improvements in the generated text by flagging issues such as missing data or fabricated facts, so data scientists can iterate on and refine the text generation model.
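The criteria-based scoring described above can be sketched as follows. This is only an illustration under stated assumptions, not SumUp's implementation: the criterion names, the Evaluation structure, the equal weighting of criteria, and the evaluate_narrative function are all hypothetical, and keyword_judge is a simple keyword check standing in for a real LLM call so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical criterion names, modelled on the benchmarks named in the text.
CRITERIA: List[str] = [
    "includes_customer_data",
    "includes_supporting_evidence",
    "has_valid_conclusion",
]

# A judge answers one question: does this narrative satisfy this criterion?
# In a real system this would wrap a prompt sent to an LLM (assumption).
Judge = Callable[[str, str], bool]


@dataclass
class Evaluation:
    score: float       # fraction of criteria passed, between 0.0 and 1.0
    flags: List[str]   # criteria the narrative failed, for later refinement


def evaluate_narrative(narrative: str, judge: Judge,
                       criteria: List[str] = CRITERIA) -> Evaluation:
    """Score a narrative as the share of criteria the judge says it meets."""
    if not criteria:
        raise ValueError("at least one criterion is required")
    flags = [c for c in criteria if not judge(narrative, c)]
    score = (len(criteria) - len(flags)) / len(criteria)
    return Evaluation(score=score, flags=flags)


# Stand-in judge: a keyword lookup instead of an LLM, purely for illustration.
_KEYWORDS: Dict[str, str] = {
    "includes_customer_data": "customer",
    "includes_supporting_evidence": "evidence",
    "has_valid_conclusion": "conclusion",
}


def keyword_judge(narrative: str, criterion: str) -> bool:
    return _KEYWORDS[criterion] in narrative.lower()


if __name__ == "__main__":
    text = "Customer 123 made repeated transfers. Evidence: transaction logs."
    result = evaluate_narrative(text, keyword_judge)
    print(result.score, result.flags)
```

Keeping the judge as a plain callable means the scoring logic can be tested without a model, and the returned flags mirror the "missing data" style of feedback that lets data scientists see which criterion a generated narrative failed.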