๐ Log Classification System
3-tier hybrid pipeline โ ๐ข Regex ยท ๐ต BERT + LogReg ยท ๐ก LLM Built to mimic production enterprise log monitoring architecture.
Source System
๐ Example Logs (click to try)
| Source System | Log Message |
|---|
Upload a CSV with columns: source, log_message
Download the classified CSV with added columns: predicted_label, tier_used, confidence.
Sample CSV format:
source,log_message
ModernCRM,User User123 logged in.
LegacyCRM,Case escalation for ticket ID 7324 failed.
BillingSystem,GET /api/v2/invoice HTTP/1.1 status: 500
๐๏ธ 3-Tier Hybrid Pipeline
| Tier | Method | Coverage | Latency | When Used |
|---|---|---|---|---|
| ๐ข Regex | Python re patterns |
~21% | < 1ms | Fixed patterns (login, backup, etc.) |
| ๐ต BERT | all-MiniLM-L6-v2 + LogReg |
~79% | 20โ80ms | High-volume categories with 150+ samples |
| ๐ก LLM | HuggingFace Inference API | ~0.3% | 500โ2000ms | LegacyCRM logs, rare patterns |
๐ Model Performance (from training)
- BERT + LogReg trained on 2,410 synthetic enterprise logs
- Confidence threshold: 0.5 (below โ escalate to LLM)
- Source-aware routing:
LegacyCRMbypasses ML entirely (only 7 training samples)
๐ Environment Variables
| Secret | Required For |
|---|---|
HF_TOKEN |
LLM inference (LegacyCRM logs) |