Document Summarization with GenAI for Compliance Team
- Anand Nerurkar
- Oct 4
- 3 min read
Step 0: Sample Loan Agreement Document
File: sample-loan-agreement.pdf
Uploaded by: Compliance Officer Amit
Document Type: Term Loan Agreement
Sample Content (Excerpt):
1. Parties: ABC Bank (Lender) and XYZ Corp (Borrower)
2. Loan Amount: $20,000,000
3. Term: 5 years
4. Interest Rate: SOFR + 2%, compounded quarterly
5. Repayment: Quarterly installments
6. Collateral: Real estate properties at XYZ Corp HQ
7. Covenants:
7.1 Borrower must maintain DSCR ≥ 1.25
7.2 Borrower may request grace period up to 180 days
7.3 Quarterly financial reports to be submitted within 30 days
8. Prepayment: Allowed with 2% penalty
9. Regulatory:
AML compliance under FATCA & RBI KYC 2023 Master Circular
Basel III exposure norms for large corporate borrowers
10. Action Items:
Compliance verification of FATCA clause
Risk team sensitivity analysis on revenue forecast
Step 1: Angular UI Upload
Actions:
Compliance Officer Amit logs in (RBAC access).
Uploads PDF via multipart form.
Metadata submitted: document type, version, sensitivity.
Tables Updated:
Audit Table: audit_log
audit_id | event_name | document_id | performed_by | timestamp | details |
A1 | DOCUMENT_UPLOAD_INITIATED | null | Amit | 2025-10-04 10:00 | Upload initiated, metadata validated |
Event Emitted: DOCUMENT_UPLOAD_INITIATED
Consumed By: AuditService
AI Guardrails: Input validation (file type and size checks), PII masking.
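The input checks in this step could look roughly like the sketch below. It is a minimal illustration, not the actual controller code: the allowed extension set, the 25 MB limit, and the names UploadRequest and validate_upload are all assumptions made for the example.

```python
from dataclasses import dataclass

# Assumptions for illustration only: the walkthrough does not state the
# accepted file types or the size limit.
ALLOWED_EXTENSIONS = {".pdf"}
MAX_SIZE_BYTES = 25 * 1024 * 1024
REQUIRED_METADATA = ("doc_type", "version", "sensitivity")


@dataclass
class UploadRequest:
    """Hypothetical shape of the multipart upload after parsing."""
    file_name: str
    size_bytes: int
    metadata: dict


def validate_upload(req):
    """Return a list of validation errors; an empty list means the upload may proceed."""
    errors = []

    # File type check based on the extension.
    dot = req.file_name.rfind(".")
    ext = req.file_name[dot:].lower() if dot != -1 else ""
    if ext not in ALLOWED_EXTENSIONS:
        errors.append("unsupported file type: " + (ext or "(none)"))

    # File size check.
    if req.size_bytes <= 0 or req.size_bytes > MAX_SIZE_BYTES:
        errors.append("file size out of range")

    # Metadata check: document type, version, sensitivity must be present.
    for key in REQUIRED_METADATA:
        if not req.metadata.get(key):
            errors.append("missing metadata: " + key)

    return errors
```

Only when the returned list is empty would the DOCUMENT_UPLOAD_INITIATED event be emitted with the detail "metadata validated".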
Step 2: Document Service - Save & Persist
Actions:
File saved locally: /data/documents/sample-loan-agreement.pdf
Metadata stored in document_metadata table.
Table: document_metadata
document_id | file_name | doc_type | uploaded_by | upload_date | version | sensitivity | storage_path | processed_flag | processed_date |
UUID1 | sample-loan-agreement.pdf | LoanAgreement | Amit | 2025-10-04 10:01 | 1.0 | Confidential | /data/documents/sample-loan-agreement.pdf | false | null |
Audit Table Entry:
audit_id | event_name | document_id | performed_by | timestamp | details |
A2 | DOCUMENT_UPLOADED | UUID1 | DocumentService | 2025-10-04 10:01 | File saved, metadata persisted |
Event Emitted: DOCUMENT_UPLOADED
Consumed By: OCRService, AuditService
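As a rough sketch of the persist step, the code below writes the metadata row and the audit row in one transaction and returns the event to emit. SQLite stands in for the real database purely so the example is self-contained; the function names init_db and persist_document are hypothetical, and audit_id is an auto-incrementing integer here rather than the A1, A2 labels used in the tables above.

```python
import sqlite3
import uuid
from datetime import datetime, timezone


def init_db(conn):
    """Create the two tables used in this step (column names follow the walkthrough)."""
    conn.execute(
        """CREATE TABLE document_metadata (
               document_id TEXT PRIMARY KEY, file_name TEXT, doc_type TEXT,
               uploaded_by TEXT, upload_date TEXT, version TEXT, sensitivity TEXT,
               storage_path TEXT, processed_flag INTEGER, processed_date TEXT)"""
    )
    conn.execute(
        """CREATE TABLE audit_log (
               audit_id INTEGER PRIMARY KEY AUTOINCREMENT, event_name TEXT,
               document_id TEXT, performed_by TEXT, timestamp TEXT, details TEXT)"""
    )


def persist_document(conn, file_name, doc_type, uploaded_by, version,
                     sensitivity, storage_path):
    """Insert metadata and audit rows atomically, then return the event payload."""
    document_id = str(uuid.uuid4())
    now = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back if either insert fails
        conn.execute(
            "INSERT INTO document_metadata VALUES (?,?,?,?,?,?,?,?,?,?)",
            (document_id, file_name, doc_type, uploaded_by, now, version,
             sensitivity, storage_path, 0, None),
        )
        conn.execute(
            "INSERT INTO audit_log (event_name, document_id, performed_by, "
            "timestamp, details) VALUES (?,?,?,?,?)",
            ("DOCUMENT_UPLOADED", document_id, "DocumentService", now,
             "File saved, metadata persisted"),
        )
    return {"event_name": "DOCUMENT_UPLOADED", "document_id": document_id}
```

Writing both rows in a single transaction means the audit trail cannot record an upload whose metadata was never stored, or the reverse.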
Step 3: OCR & Text Extraction
Actions:
OCRService consumes DOCUMENT_UPLOADED.
Extracts text and splits into chunks.
Masks or tokenizes PII.
Table: document_text
chunk_id | document_id | text_chunk | pii_masked_flag |
1 | UUID1 | "Parties: ABC Bank (Lender) and XYZ Corp..." | true |
2 | UUID1 | "Loan Amount: $20,000,000; Term: 5 years..." | false |
3 | UUID1 | "Covenants: 7.1 DSCR ≥1.25; 7.2 grace 180d..." | false |
Audit Table Entry:
audit_id | event_name | document_id | performed_by | timestamp | details |
A3 | DOCUMENT_OCR_COMPLETED | UUID1 | OCRService | 2025-10-04 10:03 | 3 text chunks extracted, PII masked |
Event Emitted: DOCUMENT_OCR_COMPLETED
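The chunking and masking in this step might be sketched as follows. This is an illustration under stated assumptions: the OCR call itself is omitted, the two regular expressions (email and 10-digit phone) are deliberately naive stand-ins for a real PII detector, and the helper names and the max_chars parameter are invented for the example.

```python
import re

# Naive illustrative patterns; a production system would use a dedicated
# PII detection component rather than two regular expressions.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"(?<!\d)(?:\+\d{1,3}[\s-]?)?\d{10}(?!\d)")


def mask_pii(text):
    """Replace matches with placeholders; return (masked_text, was_anything_masked)."""
    masked = EMAIL_RE.sub("[EMAIL]", text)
    masked = PHONE_RE.sub("[PHONE]", masked)
    return masked, masked != text


def split_into_chunks(text, max_chars=120):
    """Greedily pack non-empty lines into chunks of at most max_chars characters.

    A single line longer than max_chars is kept whole as its own chunk.
    """
    chunks, current = [], ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        candidate = current + "; " + line if current else line
        if len(candidate) > max_chars and current:
            chunks.append(current)
            current = line
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def extract_chunks(document_id, text, max_chars=120):
    """Build rows shaped like the document_text table."""
    rows = []
    for i, chunk in enumerate(split_into_chunks(text, max_chars), start=1):
        masked, flag = mask_pii(chunk)
        rows.append({"chunk_id": i, "document_id": document_id,
                     "text_chunk": masked, "pii_masked_flag": flag})
    return rows
```

Each returned row carries its own pii_masked_flag, which matches the per-chunk flag shown in the document_text table.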
Step 4: Indexing & Embeddings
Actions:
IndexingService consumes DOCUMENT_OCR_COMPLETED.
Generates embeddings for each chunk via OpenAI API.
Stores embeddings in PGVector table.
Caches frequently accessed embeddings in Redis.
Table: document_embeddings (PGVector)
chunk_id | document_id | embedding_vector | created_at |
1 | UUID1 | [0.12,0.56,...0.78] | 2025-10-04 10:04 |
2 | UUID1 | [0.98,0.34,...0.11] | 2025-10-04 10:04 |
Audit Table Entry:
audit_id | event_name | document_id | performed_by | timestamp | details |
A4 | DOCUMENT_INDEXED | UUID1 | IndexingService | 2025-10-04 10:05 | 3 chunks indexed, embeddings created |
Event Emitted: DOCUMENT_INDEXED
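One way to picture the embedding cache is the cache-aside sketch below. It makes several assumptions: a plain dictionary stands in for Redis, the embedding call is injected as a function so no real API is invoked, and toy_embed is a placeholder that produces hash-derived numbers, not meaningful embeddings. The class and key format are hypothetical.

```python
import hashlib


class EmbeddingIndexer:
    """Cache-aside wrapper: look in the cache first, call the embedder on a miss."""

    def __init__(self, embed_fn, cache):
        self.embed_fn = embed_fn   # stand-in for the real embedding API call
        self.cache = cache         # stand-in for Redis (any dict-like store)
        self.api_calls = 0         # counts cache misses, for illustration

    def _key(self, text):
        # Key on a hash of the text so identical chunks share one entry.
        return "emb:" + hashlib.sha256(text.encode("utf-8")).hexdigest()

    def embed(self, text):
        key = self._key(text)
        if key in self.cache:
            return self.cache[key]
        self.api_calls += 1
        vector = self.embed_fn(text)
        self.cache[key] = vector
        return vector

    def index_chunks(self, document_id, chunks):
        """Build rows shaped like the document_embeddings table (minus created_at)."""
        return [
            {"chunk_id": i, "document_id": document_id,
             "embedding_vector": self.embed(chunk)}
            for i, chunk in enumerate(chunks, start=1)
        ]


def toy_embed(text):
    """Placeholder embedder: four hash-derived floats, NOT a real embedding."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255 for b in digest[:4]]
```

With this shape, re-indexing a document whose chunks have not changed would hit the cache and avoid repeat embedding calls.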
Step 5: Summarization (Executive Summary)
Actions:
SummarizationService consumes DOCUMENT_INDEXED.
LLM generates structured summary with AI Guardrails (hallucination checks, citation, explainability, validation).
Table: document_summary
summary_id | document_id | executive_summary | key_obligations | risks | regulatory_mapping | clause_categorization | action_items | feedback_status |
S1 | UUID1 | "Loan agreement: $20M, 5y, SOFR+2%, quarterly" | "DSCR ≥1.25; Quarterly reports; Prepayment" | "Grace 180d; Cross-border guarantees" | "AML: FATCA & RBI KYC; Basel III exposure" | "Covenants, Prepayment, Repayment" | "Verify FATCA; Risk revenue analysis" | Pending |
Audit Table Entry:
audit_id | event_name | document_id | performed_by | timestamp | details |
A5 | DOCUMENT_SUMMARIZED | UUID1 | SummarizationService | 2025-10-04 10:06 | Summary generated, AI guardrails applied |
Event Emitted: DOCUMENT_SUMMARIZED
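The hallucination and citation guardrails are not specified in detail here, so the following is only one simple possibility: a term-overlap grounding check that tries to attach each summary item to a source chunk and flags items it cannot place. The function names, the three-character term filter, and the 0.6 threshold are assumptions chosen for the example.

```python
import re


def _terms(text):
    """Lowercased alphanumeric tokens of length >= 3."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if len(t) >= 3}


def grounding_report(summary_items, chunks, min_overlap=0.6):
    """For each summary item, find the chunk sharing the most terms with it.

    chunks maps chunk_id -> chunk text. An item counts as grounded when at
    least min_overlap of its terms appear in a single chunk; that chunk is
    then recorded as its citation.
    """
    report = []
    for item in summary_items:
        item_terms = _terms(item)
        best_id, best_score = None, 0.0
        for chunk_id, text in chunks.items():
            if not item_terms:
                break
            score = len(item_terms & _terms(text)) / len(item_terms)
            if score > best_score:
                best_id, best_score = chunk_id, score
        grounded = best_score >= min_overlap
        report.append({"item": item,
                       "cited_chunk": best_id if grounded else None,
                       "grounded": grounded})
    return report
```

Run against the three sample chunks from Step 3, a check like this would cite chunk 3 for "Grace 180d" but would flag "Cross-border guarantees", since that phrase does not appear in the excerpt shown in Step 0 (the full agreement may still contain it). Flagged items are the ones a reviewer would want to look at first in the feedback step.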
Step 6: Feedback Loop
Actions:
Compliance officer reviews the summary and either approves it or requests changes.
Updates document_summary.feedback_status.
Audit Table Entry:
audit_id | event_name | document_id | performed_by | timestamp | details |
A6 | FEEDBACK_UPDATED | UUID1 | Amit | 2025-10-04 10:07 | Feedback: Approved |
Event Emitted: FEEDBACK_UPDATED
Step 7: Approval & Notifications
Actions:
ApprovalService consumes FEEDBACK_UPDATED.
Sets document_metadata.processed_flag = true and populates processed_date.
Notifies risk & compliance dashboards.
Audit Table Entry:
audit_id | event_name | document_id | performed_by | timestamp | details |
A7 | DOCUMENT_APPROVED | UUID1 | ApprovalService | 2025-10-04 10:08 | Document approved, dashboards notified |
Event Emitted: DOCUMENT_APPROVED
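A handler for this step could be sketched as below. It assumes, since the walkthrough does not say so explicitly, that only an "Approved" feedback status marks the document as processed and that a change request leaves it untouched. The event payload shape and the function name are hypothetical.

```python
from datetime import datetime, timezone


def handle_feedback_updated(event, metadata_row):
    """React to FEEDBACK_UPDATED; return a DOCUMENT_APPROVED event or None.

    metadata_row is a dict standing in for the document_metadata record.
    """
    # Assumption: anything other than "Approved" does not complete processing.
    if event.get("feedback_status") != "Approved":
        return None

    metadata_row["processed_flag"] = True
    metadata_row["processed_date"] = datetime.now(timezone.utc).isoformat()

    return {
        "event_name": "DOCUMENT_APPROVED",
        "document_id": event["document_id"],
        "performed_by": "ApprovalService",
    }
```

Returning None for non-approved feedback keeps the document out of the dashboards until a reviewer has signed off.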
Step 8: Semantic Search & Retrieval
Actions:
SearchService queries PGVector embeddings for semantic search.
Returns top-k relevant chunks.
Audit Table Entry:
audit_id | event_name | document_id | performed_by | timestamp | details |
A8 | DOCUMENT_SEARCHED | UUID1 | Amit | 2025-10-04 10:09 | Retrieved 3 similar chunks for query |
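To make the top-k retrieval concrete, here is a small in-memory sketch that ranks stored vectors by cosine similarity to a query vector. In PGVector the same ranking would normally be done in SQL with its cosine distance operator (`<=>`); the pure-Python version below is only for illustration, and the row shape and function names are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, rows, k=3):
    """Return up to k (chunk_id, score) pairs, most similar first."""
    scored = [
        (row["chunk_id"], cosine_similarity(query_vector, row["embedding_vector"]))
        for row in rows
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The query text would first be embedded with the same model used in Step 4, so that the query vector and the stored vectors live in the same space.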
Step 9: Event Flow Summary
Event Name | Produced By | Consumed By | Action Taken |
DOCUMENT_UPLOAD_INITIATED | Angular UI/Controller | AuditService | Log user & metadata |
DOCUMENT_UPLOADED | DocumentService | OCRService, AuditService | OCR extraction, log metadata |
DOCUMENT_OCR_COMPLETED | OCRService | IndexingService, AuditService | Generate chunks, PII masking |
DOCUMENT_INDEXED | IndexingService | SummarizationService, SearchService, AuditService | Embeddings created, cached |
DOCUMENT_SUMMARIZED | SummarizationService | FeedbackService, AuditService | Executive summary generated, AI guardrails applied |
FEEDBACK_UPDATED | FeedbackService | ApprovalService, AuditService | Feedback processed, retraining if needed |
DOCUMENT_APPROVED | ApprovalService | AuditService | Final approval logged |
DOCUMENT_SEARCHED | SearchService | AuditService | Semantic search logged |
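The producer and consumer wiring in this table can be sketched with a tiny in-process event bus. This is a simplification: the real flow is asynchronous and would run on a message broker, whereas the bus below delivers events synchronously in subscription order. EventBus, build_pipeline, and the handlers are hypothetical names, and only the first two hops of the flow are wired up.

```python
from collections import defaultdict


class EventBus:
    """Minimal synchronous publish/subscribe bus (illustrative stand-in for a broker)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_name, handler):
        self._subscribers[event_name].append(handler)

    def publish(self, event_name, payload):
        # Copy the list so handlers may subscribe during delivery.
        for handler in list(self._subscribers[event_name]):
            handler(event_name, payload)


def build_pipeline():
    """Wire AuditService and OCRService as in the first rows of the table."""
    bus = EventBus()
    audit_log = []

    def audit_service(event_name, payload):
        audit_log.append((event_name, payload.get("document_id")))

    def ocr_service(event_name, payload):
        # The actual OCR work is omitted; only the follow-up event is shown.
        bus.publish("DOCUMENT_OCR_COMPLETED", payload)

    # AuditService consumes every event it is interested in.
    for name in ("DOCUMENT_UPLOADED", "DOCUMENT_OCR_COMPLETED"):
        bus.subscribe(name, audit_service)
    bus.subscribe("DOCUMENT_UPLOADED", ocr_service)

    return bus, audit_log
```

Publishing DOCUMENT_UPLOADED on this bus causes the audit handler to record the upload, after which the OCR handler emits DOCUMENT_OCR_COMPLETED and the audit handler records that as well. The remaining rows of the table would be wired the same way.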
✅ Key Features Illustrated in This Walkthrough:
Tables: document_metadata, document_text, document_embeddings (PGVector), document_summary, audit_log
AI Guardrails: Input/output validation, hallucination checks, PII masking, citation, explainability
Caching: Redis cache for embeddings
Event-Driven Architecture: Full asynchronous flow with events emitted and consumed
Audit: Each action is logged with timestamp, actor, document_id, event_name