
Real-Time Inference Engine

The Speed Problem

Every AI trading tool on the market today does the same thing: sends raw information to an LLM API and waits for a response. Each call takes 1–5 seconds. When a major event breaks and 48 related signals flood in simultaneously, the pipeline chokes.

By the time the AI finishes thinking, the market has already moved.

Roma doesn't do this. We built a layered inference system where 90% of signals never touch a large language model.

Two-Layer Architecture

[Figure: Inference Engine Signal Funnel]

Layer 1 — Lightweight Domain Model

Layer 1 is a purpose-trained model, distilled from large language models and fine-tuned on crypto-specific data. It performs three tasks in under 100 milliseconds:

Task                        Output
Entity Recognition          Which coins/projects are mentioned
Sentiment Classification    Bullish / Bearish / Neutral
Importance Scoring          0–100 importance rating
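To make the three Layer 1 outputs concrete, here is a minimal Python sketch of one possible result record. The names `Sentiment` and `Layer1Result` and the field layout are hypothetical, chosen for illustration; they are not Roma's published schema.

```python
from dataclasses import dataclass
from enum import Enum


class Sentiment(Enum):
    BULLISH = "bullish"
    BEARISH = "bearish"
    NEUTRAL = "neutral"


@dataclass(frozen=True)
class Layer1Result:
    """Hypothetical Layer 1 output for one incoming signal."""
    entities: frozenset[str]  # entity recognition: coins/projects mentioned
    sentiment: Sentiment      # sentiment classification
    importance: int           # importance scoring, 0-100

    def __post_init__(self) -> None:
        # Reject scores outside the documented 0-100 range.
        if not 0 <= self.importance <= 100:
            raise ValueError(f"importance must be in [0, 100], got {self.importance}")


result = Layer1Result(frozenset({"BTC", "ETH"}), Sentiment.BULLISH, 92)
```

Freezing the record makes it hashable and safe to pass between pipeline stages without accidental mutation.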

90% of incoming signals are fully processed at this layer. Duplicates are detected and merged. Low-importance noise is filtered out.

Only signals scoring above a configurable threshold advance to Layer 2.
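The merge-then-threshold step can be sketched as follows. This is an illustration under stated assumptions, not Roma's implementation: the `Signal` type, the `route` function, the default threshold of 80, and the duplicate key (same entity set and same sentiment) are all hypothetical. A real deduplicator could use a richer key, such as text similarity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """Hypothetical Layer 1 output: a scored signal plus its source text."""
    text: str
    entities: frozenset[str]  # coins/projects recognized in the text
    sentiment: str            # "bullish", "bearish" or "neutral"
    importance: int           # 0-100 importance rating


def route(signals: list[Signal], threshold: int = 80) -> tuple[list[Signal], int]:
    """Merge duplicates, then keep only signals scoring above the threshold.

    Returns (signals advancing to Layer 2, number of duplicates merged).
    Duplicate key: same entity set and same sentiment (a crude illustrative choice).
    """
    groups: dict[tuple[frozenset[str], str], Signal] = {}
    merged = 0
    for sig in signals:
        key = (sig.entities, sig.sentiment)
        kept = groups.get(key)
        if kept is None:
            groups[key] = sig
        else:
            merged += 1
            # Keep the highest-scoring copy as the group's representative.
            if sig.importance > kept.importance:
                groups[key] = sig
    advancing = [s for s in groups.values() if s.importance > threshold]
    return advancing, merged


# 48 near-identical headlines about one event, plus one unrelated low-importance rumor.
powell = frozenset({"BTC", "ETH", "USD", "Treasury"})
burst = [Signal(f"Powell headline {i}", powell, "bullish", 92) for i in range(48)]
burst.append(Signal("Minor listing rumor", frozenset({"DOGE"}), "bullish", 15))

advancing, merged = route(burst, threshold=70)
print(len(advancing), merged)  # 1 47
```

Merging before thresholding means a burst of duplicates costs at most one Layer 2 call, whatever its size.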

Layer 2 — Deep LLM Reasoning

For the small number of high-importance signals, a full LLM performs deep analysis in 2–3 seconds:

Task                     Output
Event Attribution        What type of event is this?
Impact Analysis          Which assets are affected, and how?
Historical Comparison    What happened last time?
Strategy Suggestion      What trade makes sense?
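One way to keep Layer 2's answers machine-readable is to instruct the LLM to reply in JSON and validate the reply before it becomes a trade signal. The sketch below assumes that pattern; the field names (`event_type`, `impacts`, `historical`, `strategy`) and the `parse_layer2` helper are hypothetical and map one-to-one onto the four tasks in the table.

```python
import json
from dataclasses import dataclass


@dataclass
class Layer2Analysis:
    """Hypothetical structured result covering Layer 2's four tasks."""
    event_type: str          # event attribution
    impacts: dict[str, str]  # impact analysis: asset -> direction and strength
    historical: str          # historical comparison
    strategy: str            # strategy suggestion


REQUIRED_FIELDS = ("event_type", "impacts", "historical", "strategy")


def parse_layer2(raw: str) -> Layer2Analysis:
    """Validate an LLM reply that was instructed to answer with a JSON object.

    Raises ValueError for malformed replies so they never become trade signals.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"reply is missing fields: {missing}")
    if not isinstance(data["impacts"], dict):
        raise ValueError("'impacts' must map asset names to descriptions")
    return Layer2Analysis(**{name: data[name] for name in REQUIRED_FIELDS})


reply = json.dumps({
    "event_type": "Fed policy pivot",
    "impacts": {"BTC": "strong positive", "ETH": "strong positive"},
    "historical": "2024 rate cut cycle",
    "strategy": "Long BTC/ETH, watch AI sector rotation",
})
analysis = parse_layer2(reply)
print(analysis.event_type)  # Fed policy pivot
```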

Example: Powell Speaks

At 19:30, the Fed Chair begins speaking. Within seconds, 48 related signals flood into Roma's pipeline.

Without Roma (typical AI tool):

48 signals × 1–5 seconds per API call, processed one at a time = 48–240 seconds total
Result: 48 duplicate alerts, high API cost, massive latency
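The naive figures are plain multiplication, assuming each of the 48 signals waits for its own LLM call in turn. A small hypothetical helper, `naive_pipeline`, makes that assumption explicit:

```python
def naive_pipeline(n_signals: int, min_call_s: float, max_call_s: float) -> dict[str, float]:
    """Totals when every signal gets its own LLM call, processed one after another.

    Hypothetical helper for illustration; per-call latency comes from the 1-5 s range above.
    """
    return {
        "llm_calls": n_signals,                 # one API call per signal
        "alerts": n_signals,                    # no deduplication, so one alert per signal
        "min_total_s": n_signals * min_call_s,  # every call at the fast end
        "max_total_s": n_signals * max_call_s,  # every call at the slow end
    }


totals = naive_pipeline(48, 1.0, 5.0)
print(totals["min_total_s"], totals["max_total_s"])  # 48.0 240.0
```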

With Roma:

19:30:05  48 signals arrive

          Layer 1 (80ms):
          → Entity: BTC, ETH, USD, Treasury
          → Sentiment: Strong bullish
          → Importance: 92/100
          → 47 duplicates detected and merged
          → 1 signal advances to Layer 2

19:30:05  Layer 2 (2.5s):
          → Event type: Fed policy pivot
          → Historical: 2024 rate cut cycle → BTC +15%
          → Impact: BTC (strong positive), ETH (strong positive),
                    AI tokens (moderate positive)
          → Strategy: Long BTC/ETH, watch AI sector rotation

19:30:08  → 1 structured trade signal output

3 seconds. 1 signal. Zero noise.
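The timestamps in the example follow from adding the two layer times. The sketch below assumes, as the example implies, that Layer 1 scores the whole burst of 48 signals in a single 80 ms pass and that only one signal reaches Layer 2; `layered_latency` is a hypothetical helper.

```python
from datetime import timedelta


def layered_latency(layer1_ms: float, layer2_s: float, layer2_calls: int = 1) -> timedelta:
    """End-to-end time for one burst of signals.

    Assumes Layer 1 scores the whole burst in one pass and any Layer 2 calls
    run one after another (the example has exactly one).
    """
    return timedelta(milliseconds=layer1_ms) + layer2_calls * timedelta(seconds=layer2_s)


arrival = timedelta(hours=19, minutes=30, seconds=5)  # 19:30:05 as a time of day
elapsed = layered_latency(80, 2.5)                     # layer times from the example
print(elapsed.total_seconds())  # 2.58
print(arrival + elapsed)        # 19:30:07.580000, i.e. ~19:30:08
```

Because the 47 duplicates are merged in Layer 1, the burst size affects only the 80 ms pass, not the number of 2.5 s LLM calls.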

Training Data Flywheel

Layer 1's accuracy depends on its training data. This data comes from Roma's own pipeline:

1. 30+ sources generate raw signals daily.
2. Layer 2's LLM processes the high-scoring signals, producing labeled output.
3. The labeled data feeds back to retrain Layer 1.
4. Layer 1 becomes more accurate, which means better filtering and better Layer 2 inputs.

This creates a data flywheel: the more signals Roma processes, the better Layer 1 becomes at filtering, which improves the quality of signals reaching Layer 2, which produces better training data for Layer 1.
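The collection half of this loop might look like the sketch below, in which each Layer 2 result is stored as a labeled Layer 1 training example and exported as JSON Lines once a batch is full. `LabeledExample`, `Flywheel`, and the `retrain_every` batch size are hypothetical names; the retraining step itself is not shown.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class LabeledExample:
    """Hypothetical Layer 1 training record labeled by Layer 2."""
    text: str
    entities: list[str]
    sentiment: str   # label produced by Layer 2
    importance: int  # label produced by Layer 2


@dataclass
class Flywheel:
    """Buffers Layer 2 labels and reports when a Layer 1 retrain batch is full."""
    retrain_every: int = 1000  # hypothetical batch size
    buffer: list[LabeledExample] = field(default_factory=list)

    def add(self, example: LabeledExample) -> bool:
        """Store one labeled example; return True once a full batch has accumulated."""
        self.buffer.append(example)
        return len(self.buffer) >= self.retrain_every

    def drain_jsonl(self) -> str:
        """Export the buffer as JSON Lines (one record per line) and clear it."""
        lines = [json.dumps(asdict(example)) for example in self.buffer]
        self.buffer.clear()
        return "\n".join(lines)


flywheel = Flywheel(retrain_every=2)
first_due = flywheel.add(LabeledExample("Powell signals a policy pivot", ["BTC", "ETH"], "bullish", 92))
second_due = flywheel.add(LabeledExample("Major exchange reports a hack", ["ETH"], "bearish", 88))
batch = flywheel.drain_jsonl() if second_due else ""
```

JSON Lines keeps each example independent, so batches can be appended to a growing training set without rewriting earlier data.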

A competitor can build the same two-layer architecture. But without the pipeline generating domain-specific labeled data at scale, their Layer 1 model will be significantly less accurate.

Cost Efficiency

The layered approach dramatically reduces LLM API costs:

Approach                      API Calls / Day    Estimated Cost
Naive (every signal → LLM)    ~10,000            High
Roma (layered filtering)      ~500–1,000         ~10x lower

Layer 1 runs on lightweight infrastructure. Only the signals that truly matter consume expensive LLM compute.
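The "~10x" figure can be checked from the call counts alone if every LLM call is assumed to cost the same. The hypothetical `cost_ratio` helper below expresses all costs in units of one LLM call, with an optional allowance for Layer 1's own infrastructure cost:

```python
def cost_ratio(naive_calls: int, layered_calls: int, layer1_cost: float = 0.0) -> float:
    """Naive daily cost divided by layered daily cost.

    All costs are in units of one LLM call (a simplifying assumption);
    layer1_cost is Layer 1's daily infrastructure cost in those same units.
    """
    return naive_calls / (layered_calls + layer1_cost)


print(cost_ratio(10_000, 1_000))  # 10.0
print(cost_ratio(10_000, 500))    # 20.0
```

At 500–1,000 calls per day the reduction in LLM calls is 10–20x, so "~10x lower" sits at the conservative end; a nonzero `layer1_cost` pulls the ratio down.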