Real-Time Inference Engine
The Speed Problem
Most AI trading tools on the market today work the same way: they send raw information to an LLM API and wait for a response. Each call takes 1–5 seconds. When a major event breaks and 48 related signals flood in simultaneously, the pipeline chokes.
By the time the AI finishes thinking, the market has already moved.
Roma doesn't do this. We built a layered inference system where 90% of signals never touch a large language model.
Two-Layer Architecture
Layer 1 — Lightweight Domain Model
A purpose-trained model, distilled from large language models and fine-tuned on crypto-specific data. It performs three tasks in under 100 milliseconds:
| Task | Output |
|---|---|
| Entity Recognition | Which coins/projects are mentioned |
| Sentiment Classification | Bullish / Bearish / Neutral |
| Importance Scoring | 0–100 importance rating |
90% of incoming signals are fully processed at this layer. Duplicates are detected and merged. Low-importance noise is filtered out.
Only signals scoring above a configurable threshold advance to Layer 2.
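The routing described above can be sketched in a few lines of Python. This is a minimal illustration, not Roma's implementation: the `Layer1Result` container, the `route` function, the threshold value of 80, and the rule that two signals are duplicates when they share entities and sentiment are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Layer1Result:
    """Hypothetical container for the three Layer 1 outputs."""
    signal_id: str
    entities: Tuple[str, ...]   # coins/projects mentioned
    sentiment: str              # "bullish", "bearish", or "neutral"
    importance: int             # 0-100 rating


# Assumed value; the source only says the threshold is configurable.
IMPORTANCE_THRESHOLD = 80


def route(results, threshold=IMPORTANCE_THRESHOLD):
    """Merge duplicates, then split into (advance_to_layer2, handled_at_layer1)."""
    merged = {}
    for r in results:
        # Assumption: signals with the same entities and sentiment are duplicates.
        key = (frozenset(r.entities), r.sentiment)
        best = merged.get(key)
        # Keep the highest-scoring representative of each duplicate group.
        if best is None or r.importance > best.importance:
            merged[key] = r
    advance = [r for r in merged.values() if r.importance > threshold]
    handled = [r for r in merged.values() if r.importance <= threshold]
    return advance, handled
```

In a real system the duplicate check would more likely rely on text similarity or embeddings; the exact-match key here only keeps the sketch short.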
Layer 2 — Deep LLM Reasoning
For the small number of high-importance signals, a full LLM performs deep analysis in 2–3 seconds:
| Task | Output |
|---|---|
| Event Attribution | What type of event is this? |
| Impact Analysis | Which assets are affected, and how? |
| Historical Comparison | What happened last time? |
| Strategy Suggestion | What trade makes sense? |
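One way to picture the Layer 2 output is as a single structured record with one field per task in the table. The sketch below is illustrative only: the `Layer2Analysis` class and its field names are hypothetical, and the sample values are taken from the Powell example in this section.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Layer2Analysis:
    """Hypothetical record mirroring the four Layer 2 tasks."""
    event_type: str                 # Event Attribution
    impact: Dict[str, str]          # Impact Analysis: asset -> expected effect
    historical_note: Optional[str]  # Historical Comparison, if a precedent exists
    strategy: str                   # Strategy Suggestion


example = Layer2Analysis(
    event_type="Fed policy pivot",
    impact={
        "BTC": "strong positive",
        "ETH": "strong positive",
        "AI tokens": "moderate positive",
    },
    historical_note="2024 rate cut cycle: BTC +15%",
    strategy="Long BTC/ETH, watch AI sector rotation",
)
```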
Example: Powell Speaks
At 19:30, the Fed Chair begins speaking. Within seconds, 48 related signals flood into Roma's pipeline.
Without Roma (typical AI tool):
48 signals × 1-5 seconds per API call = 48-240 seconds total
Result: 48 duplicate alerts, high cost, massive latency
With Roma:
19:30:05 48 signals arrive
Layer 1 (80ms):
→ Entity: BTC, ETH, USD, Treasury
→ Sentiment: Strong bullish
→ Importance: 92/100
→ 47 duplicates detected and merged
→ 1 signal advances to Layer 2
19:30:05 Layer 2 (2.5s):
→ Event type: Fed policy pivot
→ Historical: 2024 rate cut cycle → BTC +15%
→ Impact: BTC (strong positive), ETH (strong positive), AI tokens (moderate positive)
→ Strategy: Long BTC/ETH, watch AI sector rotation
19:30:08 → 1 structured trade signal output
3 seconds. 1 signal. Zero noise.
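The timings in this example follow from simple arithmetic, sketched below. Two assumptions are made that the example does not state outright: the typical tool issues its 48 API calls one after another, and Layer 1 handles the whole burst in a single 80 ms pass. The function names are hypothetical.

```python
def sequential_latency(n_signals, seconds_per_call):
    """Total wait when every signal gets its own LLM call, one after another."""
    return n_signals * seconds_per_call


def layered_latency(layer1_seconds, layer2_seconds, n_escalated=1):
    """Total wait when Layer 1 filters first and only a few signals reach the LLM."""
    return layer1_seconds + n_escalated * layer2_seconds


naive_best = sequential_latency(48, 1.0)      # 48 calls at 1 s each
naive_worst = sequential_latency(48, 5.0)     # 48 calls at 5 s each
roma = round(layered_latency(0.080, 2.5), 2)  # 80 ms + one 2.5 s LLM call

print(naive_best, naive_worst, roma)
```

Under these assumptions the naive range is 48 to 240 seconds, while the layered path takes about 2.6 seconds, which matches the roughly 3-second window between 19:30:05 and 19:30:08.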
Training Data Flywheel
Layer 1's accuracy depends on its training data. This data comes from Roma's own pipeline:
30+ sources generate raw signals daily
↓
Layer 2 LLM processes high-score signals (labeled output)
↓
Labeled data feeds back to retrain Layer 1
↓
Layer 1 becomes more accurate → better filtering → better Layer 2 inputs
This creates a data flywheel: the more signals Roma processes, the better Layer 1 becomes at filtering, which improves the quality of signals reaching Layer 2, which produces better training data for Layer 1.
A competitor can build the same two-layer architecture. But without the pipeline generating domain-specific labeled data at scale, their Layer 1 model will be significantly less accurate.
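The feedback loop can be sketched as a small collector that stores every Layer 2 result as a labeled example for the next Layer 1 retraining run. Everything here is hypothetical: the `Flywheel` and `LabeledExample` names, the threshold of 80, and the `layer2_label` callable, which stands in for the LLM call. How Roma actually stores labels or retrains is not described in the source.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class LabeledExample:
    """One training row: raw signal text plus the labels Layer 2 produced."""
    text: str
    sentiment: str
    importance: int


@dataclass
class Flywheel:
    threshold: int = 80  # assumed value; the source says it is configurable
    training_set: List[LabeledExample] = field(default_factory=list)

    def process(self, text: str, layer1_score: int,
                layer2_label: Callable[[str], Dict]) -> Optional[Dict]:
        """Escalate high-score signals and keep the LLM output as a label."""
        if layer1_score <= self.threshold:
            return None  # handled entirely at Layer 1, no new label produced
        label = layer2_label(text)
        self.training_set.append(
            LabeledExample(text, label["sentiment"], label["importance"])
        )
        return label
```

The accumulated `training_set` is what a periodic retraining job for Layer 1 would consume.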
Cost Efficiency
The layered approach dramatically reduces LLM API costs:
| Approach | API Calls / Day | Estimated Cost |
|---|---|---|
| Naive (every signal → LLM) | ~10,000 | High |
| Roma (layered filtering) | ~500–1,000 | ~10–20x lower |
Layer 1 runs on lightweight infrastructure. Only the signals that truly matter consume expensive LLM compute.
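The call counts in the table can be reproduced with a short calculation. This is a sketch under two assumptions: LLM cost scales linearly with the number of calls, and the fraction of signals resolved at Layer 1 is the only thing that changes. The function name `daily_llm_calls` is hypothetical.

```python
def daily_llm_calls(signals_per_day, layer1_share):
    """LLM calls per day when `layer1_share` of signals never leave Layer 1."""
    return round(signals_per_day * (1 - layer1_share))


naive_calls = daily_llm_calls(10_000, 0.0)   # every signal goes to the LLM
roma_calls = daily_llm_calls(10_000, 0.90)   # 90% resolved at Layer 1
reduction = naive_calls / roma_calls

print(naive_calls, roma_calls, reduction)
```

With 90% of signals handled at Layer 1, about 1,000 of 10,000 daily signals reach the LLM, a 10x reduction in calls. The lower figure of about 500 calls would correspond to roughly 95% being handled at Layer 1; that share is an inference from the table, not a number stated in the source.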