1.1 Understanding of Problem & Objectives

Theme Statement

Automated AI-Based Financial Transaction Categorisation

Background/Motivation

Modern financial applications—ranging from personal budgeting tools to business accounting platforms—require robust systems for classifying raw transaction strings (such as "Starbucks," "Amazon.com," or "Shell Gas") into meaningful categories ("Coffee/Dining," "Shopping," "Fuel") for budgeting, analytics, or reporting purposes.

Today, many developers rely on expensive third-party APIs to achieve this, resulting in:

  • High scaling costs
  • Limited flexibility
  • Suboptimal user experience

There is a pressing need for cost-effective, in-house AI solutions that give developers:

  • Rapid transaction categorisation
  • Enhanced control
  • Full customisability


Problem Statement

Building a scalable transaction categorisation system is essential for seamless financial management. Reliance on external APIs introduces:

  • Recurring costs
  • Network latency
  • Limits on customising the categorisation logic

Developing an internal AI or ML-based solution enables:

  • Granular control
  • Cost savings
  • Improved responsiveness

However, this also raises new challenges:

  • The need for high accuracy
  • Adaptability to user-defined categories
  • Rigorous evaluation
  • Explainable outcomes

Challenge

Build a standalone, high-performance transaction categorisation system that achieves business-grade accuracy and transparency while eliminating external service dependencies.


Key Considerations

1. End-to-End Autonomous Categorisation

  • The system must ingest raw financial transaction data and output a category and confidence score based on a predefined, user-configurable taxonomy
  • All categorisation logic and inference must take place within the team's environment—no third-party API calls
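
The input/output contract described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the `Prediction` class, the `TAXONOMY` set, the `KEYWORDS` table, and the fixed confidence values are all hypothetical stand-ins for a real model and a taxonomy loaded from configuration.

```python
from dataclasses import dataclass

# Hypothetical taxonomy; a real system would load this from the config file.
TAXONOMY = {"Coffee/Dining", "Shopping", "Fuel", "Other"}

# Toy keyword table standing in for the real classifier (assumption).
KEYWORDS = {"starbucks": "Coffee/Dining", "amazon": "Shopping", "shell": "Fuel"}


@dataclass(frozen=True)
class Prediction:
    """A category from the taxonomy plus a confidence in [0, 1]."""
    category: str
    confidence: float

    def __post_init__(self):
        # Reject outputs that fall outside the configured taxonomy.
        if self.category not in TAXONOMY:
            raise ValueError(f"unknown category: {self.category!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")


def categorise(raw: str) -> Prediction:
    """Map a raw transaction string to a Prediction, entirely in-process."""
    text = raw.lower()
    for keyword, category in KEYWORDS.items():
        if keyword in text:
            return Prediction(category, 0.95)  # illustrative confidence
    return Prediction("Other", 0.10)  # illustrative low-confidence default
```

Validating the category at construction time means a stale or misspelled label fails loudly instead of leaking into reports.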

2. Accuracy & Evaluation

  • Deliver a macro F1-score of at least 0.90 on the dataset used for demonstration
  • Submissions should include a detailed evaluation report with:
      • Confusion matrix
      • Macro and per-class F1 scores
  • End-to-end reproducibility from data processing to inference
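
For reference, the metrics named above can be computed with a few lines of dependency-free Python. This is a sketch for illustration; the function names are assumptions, and an evaluation report would more likely rely on an established metrics library. Macro F1 is the unweighted mean of the per-class F1 scores, so rare categories count as much as common ones.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels, in `labels` order."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def per_class_f1(y_true, y_pred, labels):
    """F1 per label, using F1 = 2*TP / (2*TP + FP + FN)."""
    matrix = confusion_matrix(y_true, y_pred, labels)
    scores = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                  # rest of the true-label row
        fp = sum(row[i] for row in matrix) - tp   # rest of the predicted column
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 over all labels."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)
```

Note that this sketch assumes every true and predicted label appears in `labels`; anything outside that list is silently ignored.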

3. Customisable & Transparent

  • The category taxonomy must be easily updated via a configuration file (e.g., JSON, YAML)
  • Support admin-driven changes without direct code edits
  • Bonus points for explainability: Provide insights or feature attributions explaining classification decisions
  • Incorporate a simple feedback loop mechanism for users to review and correct low-confidence predictions
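
A config-driven taxonomy and a simple review queue might look like the sketch below. JSON is used here only because it needs no third-party parser; the config layout, the `review_threshold` key, and all function names are assumptions made for illustration.

```python
import json

# Hypothetical config; an admin edits this file, not the code.
CONFIG_TEXT = """
{
  "review_threshold": 0.6,
  "categories": {
    "Coffee/Dining": ["starbucks", "cafe"],
    "Fuel": ["shell", "bp"]
  }
}
"""


def load_taxonomy(text):
    """Parse the config and check that it defines at least one category."""
    config = json.loads(text)
    if not config.get("categories"):
        raise ValueError("taxonomy must define at least one category")
    return config


def needs_review(confidence, config):
    """Flag predictions whose confidence falls below the configured threshold."""
    return confidence < config["review_threshold"]


def record_correction(corrections, raw, predicted, corrected):
    """Append a user correction so it can later be used for retraining or rules."""
    corrections.append({"raw": raw, "predicted": predicted, "corrected": corrected})
    return corrections
```

Adding a category is then a matter of adding a key under `categories` and reloading the config.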

4. Robustness & Responsible AI

  • Handle noisy, variable transaction strings robustly
  • Address ethical AI aspects, particularly in mitigating biases (e.g., based on merchant, region, or transaction amount)
  • Strict adherence to data-protection regulations (GDPR, CCPA) and industry standards (SOC 2). Data sovereignty is a priority: no sensitive financial data may leave the team's controlled infrastructure.
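
One common way to handle noisy, variable transaction strings is to normalise them before classification. The sketch below is an assumption about what such a step could look like, not a description of the project's preprocessing. It strips accents, digits, and punctuation, which is a lossy choice: merchants whose names contain digits (for example "7-Eleven") would need special handling.

```python
import re
import unicodedata


def normalise(raw: str) -> str:
    """Reduce a raw transaction string to lowercase ASCII words."""
    # Decompose accented characters so the accent marks can be dropped.
    text = unicodedata.normalize("NFKD", raw)
    text = text.encode("ascii", "ignore").decode("ascii").lower()
    # Replace digits and punctuation (store numbers, '*', '#') with spaces.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Collapse runs of whitespace left behind by the previous step.
    return re.sub(r"\s+", " ", text).strip()
```

For example, `normalise("STARBUCKS #1234  SEATTLE*WA")` returns `"starbucks seattle wa"`.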

Annexure

Out of Scope

  • Full production deployments
  • CI/CD pipelines
  • Extensive user interfaces
  • Real-time streaming
  • Fraud/anomaly detection
  • Financial advice features

Performance Measurement

  • Extra credit for providing throughput and latency benchmarks
  • Transparent measurement notes
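
Throughput and latency can be measured with a small harness like the one below. It is a sketch under stated assumptions: `classify` is any callable that takes one transaction string, the warm-up count is arbitrary, and the 95th percentile uses the nearest-rank method. Reported numbers should be accompanied by the hardware and batch size used.

```python
import math
import statistics
import time


def benchmark(classify, transactions, warmup=10):
    """Time `classify` over `transactions`; return latency and throughput stats."""
    if not transactions:
        raise ValueError("need at least one transaction to benchmark")
    # Warm-up calls are not timed (caches, lazy model loading).
    for text in transactions[:warmup]:
        classify(text)

    latencies_ms = []
    start = time.perf_counter()
    for text in transactions:
        t0 = time.perf_counter()
        classify(text)
        latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    elapsed = time.perf_counter() - start

    latencies_ms.sort()
    # Nearest-rank 95th percentile.
    p95_index = math.ceil(0.95 * len(latencies_ms)) - 1
    return {
        "count": len(latencies_ms),
        "mean_ms": statistics.fmean(latencies_ms),
        "p95_ms": latencies_ms[p95_index],
        "throughput_per_s": len(latencies_ms) / elapsed if elapsed > 0 else float("inf"),
    }
```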

Resources

  • No official dataset is provided
  • Teams should source or generate their own data (e.g., public datasets or synthetic data)
  • Document the data acquisition process clearly

Deliverables

Required

  1. Source code repository
  2. README with setup instructions
  3. Dataset documentation
  4. Metrics report
      • Macro/per-class F1 scores
      • Confusion matrix
      • Accuracy metrics
  5. Short demo
      • Pipeline execution
      • Evaluation results
      • Sample predictions with confidence scores
      • Demo of taxonomy modification via config

Bonus Objectives

  • Explainability UI
  • Robustness to input noise
  • Batch inference performance metrics
  • Simple human-in-the-loop feedback
  • Bias mitigation discussion

Our Solution Approach

This project implements a hybrid ensemble system that combines multiple classification methods to achieve superior accuracy and transparency:

Architecture Components

  1. MCC (Merchant Category Code) Classifier - ISO 18245 standard codes
  2. Rule-based Engine - Deterministic pattern matching
  3. ML Classifier - LightGBM with sentence-transformers embeddings
  4. LLM Fallback - Ollama (local) for edge cases
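
One plausible way to combine these four components is a confidence-gated cascade, sketched below. The ordering, the 0.8 threshold, the lookup tables, and the stubbed `ml_stage` and `llm_stage` functions are all assumptions for illustration; the actual combination logic is described in later sections. The stubs return fixed values where a real system would call LightGBM and a local Ollama model.

```python
# Hypothetical MCC lookup (5541: service stations, 5814: fast food restaurants).
MCC_TABLE = {"5541": "Fuel", "5814": "Coffee/Dining"}


def mcc_stage(txn):
    category = MCC_TABLE.get(txn.get("mcc", ""))
    return (category, 0.99) if category else None


def rule_stage(txn):
    # Toy deterministic rule; a real engine would hold many patterns.
    if "amazon" in txn["description"].lower():
        return ("Shopping", 0.95)
    return None


def ml_stage(txn):
    # Stub standing in for a LightGBM model over sentence embeddings.
    return ("Other", 0.40)


def llm_stage(txn):
    # Stub standing in for a local LLM call.
    return ("Other", 0.50)


STAGES = [("mcc", mcc_stage), ("rules", rule_stage), ("ml", ml_stage), ("llm", llm_stage)]


def classify(txn, threshold=0.8):
    """Return the first result at or above `threshold`, else the best seen."""
    best = None
    for name, stage in STAGES:
        result = stage(txn)
        if result is None:
            continue
        category, confidence = result
        candidate = {"category": category, "confidence": confidence, "method": name}
        if confidence >= threshold:
            return candidate  # cheap stages short-circuit the expensive ones
        if best is None or confidence > best["confidence"]:
            best = candidate
    return best or {"category": "Uncategorised", "confidence": 0.0, "method": "none"}
```

Recording the `method` that produced each result gives a basic form of attribution: a reviewer can see whether a category came from an MCC lookup, a rule, the model, or the fallback.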

Key Features

  • Macro F1 of 0.9842 and 98.43% validation accuracy (exceeds the 0.90 macro F1 requirement)
  • Configurable taxonomy via YAML (28 balanced categories)
  • Explainable results with method attribution and confidence scores
  • Feedback loop with corrections mechanism
  • Zero external API costs (fully autonomous)
  • Production-ready with Docker deployment

Performance Metrics

  • Macro F1 Score: 0.9842 (98.42%)
  • Accuracy: 98.43%
  • Average Latency: <100ms per transaction
  • Batch Processing: 1000+ transactions supported

See detailed implementation documentation in subsequent sections.