1.1 Understanding of Problem & Objectives

Theme Statement

Automated AI-Based Financial Transaction Categorisation

Background/Motivation

Modern financial applications—ranging from personal budgeting tools to business accounting platforms—require robust systems for classifying raw transaction strings (such as "Starbucks," "Amazon.com," or "Shell Gas") into meaningful categories ("Coffee/Dining," "Shopping," "Fuel") for budgeting, analytics, or reporting purposes.

Today, many developers rely on expensive third-party APIs to achieve this, resulting in:

- High scaling costs
- Limited flexibility
- Suboptimal user experience

There is a pressing need for cost-effective, in-house AI solutions that empower developers with:

- Rapid transaction categorisation
- Enhanced control
- Full customisability

Problem Statement

Building a scalable transaction categorisation system is essential for seamless financial management. Reliance on external APIs introduces:

- Recurring costs
- Network latency
- Limits on customising the categorisation logic

Developing an internal AI or ML-based solution enables:

- Granular control
- Cost savings
- Improved responsiveness

However, this also raises new challenges:

- Need for high accuracy
- Adaptability to user-defined categories
- Rigorous evaluation
- Explainable outcomes

Challenge

Build a standalone, high-performance transaction categorisation system that achieves business-grade accuracy and transparency while eliminating external service dependencies.

Key Considerations

1. End-to-End Autonomous Categorisation

The system must ingest raw financial transaction data and output a category and confidence score based on a predefined, user-configurable taxonomy
All categorisation logic and inference must take place within the team's environment—no third-party API calls

2. Accuracy & Evaluation

Deliver a macro F1-score of at least 0.90 on the dataset used for demonstration
Submissions should include a detailed evaluation report with:
Confusion matrix
Macro and per-class F1 scores
End-to-end reproducibility from data processing to inference
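
The metrics named above can be computed without any external dependency. The following is a minimal sketch, assuming predictions and ground-truth labels are available as plain Python lists; the function names and the toy labels are illustrative and not taken from the project code.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Return a nested dict: matrix[true_label][predicted_label] -> count."""
    matrix = {t: {p: 0 for p in labels} for t in labels}
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def per_class_f1(y_true, y_pred, labels):
    """Compute F1 per label from true-positive, false-positive and false-negative counts."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label gains a false positive
            fn[t] += 1  # true label gains a false negative
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # A label with no true or predicted samples is scored 0.0 here (a convention, not a rule).
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1, so rare categories count as much as common ones."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


# Toy data, invented for illustration only.
LABELS = ["Coffee/Dining", "Shopping", "Fuel"]
Y_TRUE = ["Coffee/Dining", "Shopping", "Fuel", "Fuel", "Shopping", "Coffee/Dining"]
Y_PRED = ["Coffee/Dining", "Shopping", "Fuel", "Shopping", "Shopping", "Coffee/Dining"]

if __name__ == "__main__":
    print(per_class_f1(Y_TRUE, Y_PRED, LABELS))
    print(round(macro_f1(Y_TRUE, Y_PRED, LABELS), 4))
```

Because macro F1 averages per-class scores without weighting by support, a single weak category lowers the headline number noticeably, which is why the per-class breakdown belongs in the report alongside it.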

3. Customisable & Transparent

The category taxonomy must be easily updated via a configuration file (e.g., JSON, YAML)
Support admin-driven changes without direct code edits
Bonus points for explainability: Provide insights or feature attributions explaining classification decisions
Incorporate a simple feedback loop mechanism for users to review and correct low-confidence predictions
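
As a rough illustration of the two requirements above, the sketch below loads a taxonomy from a JSON document and routes low-confidence predictions to a review queue. The config keys (`review_threshold`, `categories`), the function names, and the prediction dictionary shape are all assumptions made for this example; JSON is used only because it needs no third-party parser.

```python
import json

# Hypothetical taxonomy config; in practice this text would be read from a file
# that an administrator edits without touching the code.
TAXONOMY_JSON = """
{
  "review_threshold": 0.6,
  "categories": ["Coffee/Dining", "Shopping", "Fuel"]
}
"""


def load_taxonomy(text):
    """Parse and validate the taxonomy so that a bad config fails early."""
    config = json.loads(text)
    if not config.get("categories"):
        raise ValueError("taxonomy must define at least one category")
    threshold = config.get("review_threshold", 0.5)
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("review_threshold must be between 0 and 1")
    config["review_threshold"] = threshold
    return config


def route_prediction(prediction, config, review_queue):
    """Queue a prediction for human review if its confidence is below the threshold."""
    if prediction["confidence"] < config["review_threshold"]:
        review_queue.append(prediction)
        return True
    return False


def apply_correction(prediction, corrected_category, config):
    """Return a corrected copy of the prediction, rejecting categories outside the taxonomy."""
    if corrected_category not in config["categories"]:
        raise ValueError(f"unknown category: {corrected_category}")
    return {**prediction, "category": corrected_category, "confidence": 1.0, "source": "human"}


if __name__ == "__main__":
    config = load_taxonomy(TAXONOMY_JSON)
    queue = []
    uncertain = {"text": "SQ *BLUE BOTTLE", "category": "Shopping", "confidence": 0.41}
    route_prediction(uncertain, config, queue)
    print(apply_correction(queue[0], "Coffee/Dining", config))
```

Corrected records produced this way could later be added to the training data, although how corrections are fed back into the model is a design decision this sketch does not cover.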

4. Robustness & Responsible AI

Handle noisy, variable transaction strings robustly
Address ethical AI aspects, particularly in mitigating biases (e.g., based on merchant, region, or transaction amount)
Adhere strictly to applicable data-protection regulations (e.g., GDPR, CCPA) and industry standards (e.g., SOC 2). Preserve data sovereignty by ensuring that no sensitive financial data leaves the team's controlled infrastructure.
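
One common way to handle noisy, variable transaction strings is to normalise them before classification. The sketch below is a minimal example of that idea; the noise patterns are illustrative guesses rather than the project's actual rules, and a real list would be derived from the dataset in use.

```python
import re

# Illustrative noise patterns (assumptions for this example).
_NOISE_PATTERNS = [
    r"\b(?:pos|debit|purchase|card)\b",  # generic payment-channel words
    r"[#*]\s*\d+",                       # store or reference numbers such as "#1234"
    r"\b\d{4,}\b",                       # long digit runs (terminal IDs, dates)
]


def normalise(raw):
    """Lower-case a raw transaction string and strip tokens that carry no merchant signal."""
    text = raw.lower()
    for pattern in _NOISE_PATTERNS:
        text = re.sub(pattern, " ", text)
    text = re.sub(r"[^a-z0-9.&' ]+", " ", text)  # drop remaining punctuation
    return re.sub(r"\s+", " ", text).strip()     # collapse repeated whitespace


if __name__ == "__main__":
    for sample in ["POS DEBIT STARBUCKS #1234  SEATTLE WA", "Shell Gas 00012345"]:
        print(repr(normalise(sample)))
```

Removing long digit runs also keeps identifiers such as terminal or reference numbers out of downstream features and logs, which fits the data-handling constraint above.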

Annexure

Out of Scope

Full production deployments
CI/CD pipelines
Extensive user interfaces
Real-time streaming
Fraud/anomaly detection
Financial advice features

Performance Measurement

Extra credit for providing throughput and latency benchmarks
Transparent measurement notes
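
A throughput and latency benchmark can be as small as the harness sketched below. It assumes the classifier is exposed as a single callable taking one transaction string; `dummy_classify` is a stand-in so the example runs on its own and is not part of the project.

```python
import statistics
import time


def benchmark(classify, transactions, warmup=10):
    """Time classify() per call and return latency statistics plus overall throughput."""
    for text in transactions[:warmup]:
        classify(text)  # warm caches before measuring
    latencies_ms = []
    start = time.perf_counter()
    for text in transactions:
        t0 = time.perf_counter()
        classify(text)
        latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    elapsed = time.perf_counter() - start
    latencies_ms.sort()
    p95_index = min(len(latencies_ms) - 1, int(0.95 * len(latencies_ms)))
    return {
        "count": len(transactions),
        "mean_ms": statistics.fmean(latencies_ms),
        "p50_ms": statistics.median(latencies_ms),
        "p95_ms": latencies_ms[p95_index],
        "throughput_per_s": len(transactions) / elapsed if elapsed > 0 else float("inf"),
    }


def dummy_classify(text):
    """Stand-in classifier; replace with the real inference call."""
    return ("Fuel", 0.9) if "shell" in text.lower() else ("Shopping", 0.5)


if __name__ == "__main__":
    report = benchmark(dummy_classify, ["Shell Gas", "Amazon.com"] * 500)
    print(report)
```

For the measurement notes, it helps to record the hardware, batch size, and whether model loading time is included, since these change the numbers far more than the harness itself does.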

Resources

No official dataset is provided
Teams should source or generate their own data (e.g., public datasets or synthetics)
Document data acquisition process clearly

Deliverables

Required

- Source code repository
- README with setup instructions
- Dataset documentation
- Metrics report
    - Macro/per-class F1 scores
    - Confusion matrix
    - Accuracy metrics
- Short demo
    - Pipeline execution
    - Evaluation results
    - Sample predictions with confidence scores
    - Demo of taxonomy modification via config

Bonus Objectives

Explainability UI
Robustness to input noise
Batch inference performance metrics
Simple human-in-the-loop feedback
Bias mitigation discussion

Our Solution Approach

This project implements a hybrid ensemble system that combines four classification methods to meet the accuracy and transparency requirements described above:

Architecture Components

MCC (Merchant Category Code) Classifier - ISO 18245 standard codes
Rule-based Engine - Deterministic pattern matching
ML Classifier - LightGBM with sentence-transformers embeddings
LLM Fallback - Ollama (local) for edge cases
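
One plausible way to combine these four components is a priority cascade that accepts the first sufficiently confident answer and records which method produced it. The sketch below illustrates that idea only: the stage ordering, the confidence values, the lookup tables, the `Uncategorised` label, and the stubbed ML and LLM stages are all assumptions for this example, not the project's actual implementation.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    category: str
    confidence: float
    method: str  # which component produced the result, for attribution


# Hypothetical lookup tables; a real system would load these from configuration.
MCC_TO_CATEGORY = {"5541": "Fuel", "5814": "Coffee/Dining"}
KEYWORD_RULES = {"starbucks": "Coffee/Dining", "amazon": "Shopping", "shell": "Fuel"}


def mcc_stage(text, mcc):
    """Map a merchant category code to a category when one is supplied."""
    category = MCC_TO_CATEGORY.get(mcc or "")
    return Prediction(category, 0.99, "mcc") if category else None


def rule_stage(text, mcc):
    """Deterministic keyword match on the transaction string."""
    lowered = text.lower()
    for keyword, category in KEYWORD_RULES.items():
        if keyword in lowered:
            return Prediction(category, 0.95, "rule")
    return None


def ml_stage(text, mcc):
    """Stub standing in for the embedding-based ML classifier; abstains here."""
    return None


def llm_stage(text, mcc):
    """Stub standing in for a locally hosted LLM; always answers, with low confidence."""
    return Prediction("Uncategorised", 0.30, "llm")


STAGES = [mcc_stage, rule_stage, ml_stage, llm_stage]


def categorise(text, mcc=None, min_confidence=0.8):
    """Return the first stage result meeting min_confidence, else the most confident one seen."""
    best = None
    for stage in STAGES:
        result = stage(text, mcc)
        if result is None:
            continue
        if result.confidence >= min_confidence:
            return result
        if best is None or result.confidence > best.confidence:
            best = result
    return best


if __name__ == "__main__":
    print(categorise("STARBUCKS #1234"))
    print(categorise("XYZ FUELS", mcc="5541"))
    print(categorise("unknown merchant"))
```

Ordering cheap deterministic stages first keeps most transactions away from the slower model calls, and the `method` field gives each result a simple form of attribution.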

Key Features

98.43% validation accuracy and 0.9842 macro F1 (exceeds the 0.90 macro F1 requirement)
Configurable taxonomy via YAML (28 balanced categories)
Explainable results with method attribution and confidence scores
Feedback loop with corrections mechanism
Zero external API costs (fully autonomous)
Production-ready with Docker deployment

Performance Metrics

Macro F1 Score: 0.9842 (98.42%)
Accuracy: 98.43%
Average Latency: < 100 ms per transaction
Batch Processing: 1000+ transactions supported

See detailed implementation documentation in subsequent sections.