Quant Corner: Building Low-Latency Trading Signals with ClickHouse for On-Chain Events
A technical guide for quant teams to ingest on-chain events and build sub-second trading signals with ClickHouse—schema, queries, and ops tips for 2026.
Why quant teams are still losing time on on-chain signals — and how to fix it
Low-latency trading signals from on-chain events are no longer a research edge — they are table stakes for modern quant trading, market-making and arbitrage strategies. Yet teams still struggle with high ingestion latency, noisy raw logs, and inefficient storage that kills query performance when every millisecond matters. This guide shows how to build a production-grade pipeline using ClickHouse to ingest, normalize and query on-chain events for sub-second signals in 2026.
The 2026 context: why ClickHouse for on-chain signals
ClickHouse continues to gain traction in quant infrastructure. Following the major funding and cloud-adoption waves of 2025, it has become the pragmatic real-time OLAP choice for teams that need high ingestion throughput and low-latency queries. Its vectorized execution engine, native Kafka integration, and the MergeTree engine family give you control over ordering and storage layout, which is crucial for streaming on-chain data such as mempool events and DEX swaps.
Tip: ClickHouse Cloud and self-hosted ClickHouse both matured in late 2025 — choose based on latency SLAs and compliance needs: self-host for colocated execution, ClickHouse Cloud for operational simplicity.
High-level architecture for sub-second on-chain signals
Design the pipeline with latency and deterministic ordering in mind:
- Producers: mempool listeners, RPC/websocket feeds (Alchemy, Infura, Blocknative, custom full nodes), relays (Flashbots), indexers (The Graph or custom walkers).
- Enrichment layer: decode event ABIs, normalize token decimals, lookup token metadata (via a low-latency dictionary service or Redis).
- Streaming bus: Kafka or Pulsar for ordered, durable event streams.
- ClickHouse ingestion: Kafka engine → materialized views → ReplicatedMergeTree / MergeTree tables optimized for time-series reads.
- Realtime aggregation & signals: materialized views or lightweight aggregator services read from ClickHouse and push signals to execution systems (Redis, NATS).
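To make the enrichment layer concrete, here is a minimal Python sketch of the normalization step. The event field names and the decimals lookup are illustrative assumptions, not a fixed wire format; it scales raw integer amounts by token decimals, canonicalizes addresses, and emits a trades-shaped row.

```python
from decimal import Decimal

def normalize_swap(event: dict, decimals: dict[str, int]) -> dict:
    """Normalize a decoded swap event into a trades-shaped row.

    `event` keys and the `decimals` lookup are assumptions about the
    enrichment layer's input, not a fixed schema.
    """
    amount_in = Decimal(event["amount_in_raw"]) / Decimal(10) ** decimals[event["token_in"]]
    amount_out = Decimal(event["amount_out_raw"]) / Decimal(10) ** decimals[event["token_out"]]
    return {
        "event_time": event["event_time"],
        "chain_id": event["chain_id"],
        "pair_address": event["pair_address"].lower(),  # canonicalize addresses
        "token_in": event["token_in"].lower(),
        "token_out": event["token_out"].lower(),
        "amount_in": amount_in,
        "amount_out": amount_out,
        "price": amount_out / amount_in,  # quoted as token_out per token_in
        "tx_hash": event["tx_hash"],
        "block_number": event["block_number"],
    }
```

Doing the decimal scaling here, rather than in SQL, keeps the hot ClickHouse queries free of per-row arithmetic on raw amounts.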
Latency-sensitive placement
- Colocate consumers close to the blockchain relays or running nodes to reduce network RTT.
- Run ClickHouse nodes in the same region and VPC as Kafka to minimize cross-cloud hops.
- Use compact serialization (Protobuf/Avro) on the wire and decode in the enrichment layer.
Schema design: fundamentals for low-latency querying
Schema design in ClickHouse defines your performance. For on-chain events you need a schema that supports fast range scans per market (pair, contract) and quick access to the latest state.
Core event table: onchain_events (recommended)
CREATE TABLE onchain_events
(
event_time DateTime64(9),
chain_id UInt64,
block_number UInt64,
tx_index UInt32,
tx_hash String,
log_index UInt32,
contract_address LowCardinality(String),
event_signature LowCardinality(String),
args Nested(
name String, value String
),
parsed_map Map(String, String), -- optional: parsed indexed args
raw_data String,
mempool Bool DEFAULT 0,
received_at DateTime64(9) DEFAULT now64(9)
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/onchain_events', '{replica}')
PARTITION BY (chain_id, toYYYYMMDD(event_time))
ORDER BY (contract_address, event_time, block_number, tx_hash, log_index)
SETTINGS index_granularity = 8192;
Why this shape? Order by contract_address + event_time lets you scan recent events for a specific market fast. Use LowCardinality for repeated strings (event_signature, contract_address) to reduce memory and improve join speeds. Nested or Map columns let you keep parsed args available without exploding column count.
Specialized tables for DEX/AMM trades
Generate a normalized trades table that extracts numeric fields for price, volume and side. Materialize parsed swaps into this table for fast aggregates.
CREATE TABLE trades
(
event_time DateTime64(9),
chain_id UInt64,
pair_address LowCardinality(String),
exchange LowCardinality(String),
token_in LowCardinality(String),
token_out LowCardinality(String),
amount_in Decimal(38, 18),
amount_out Decimal(38, 18),
price Decimal(38, 18),
taker Bool,
tx_hash String,
block_number UInt64
)
ENGINE = MergeTree()
PARTITION BY (chain_id, toYYYYMMDD(event_time))
ORDER BY (pair_address, event_time)
SETTINGS index_granularity = 4096;
Materialize writes from Kafka -> onchain_events -> trades via a materialized view to keep swap logic in SQL and ensure fast writes.
Static dictionaries for token metadata
Use ClickHouse dictionary engine or a small Redis cache for token decimals, symbols, and known pool addresses. Dictionaries avoid expensive joins in hot queries and can be updated asynchronously.
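As an illustration, here is a tiny in-process cache in the same spirit; `fetch` stands in for whatever hits your dictionary service or Redis, and the TTL policy is a placeholder to tune.

```python
import time

class TokenMetaCache:
    """Tiny in-process cache for token metadata (decimals, symbol).

    `fetch` is any callable hitting a dictionary service or Redis;
    the TTL-based refresh is a sketch, not a tuned eviction policy.
    """
    def __init__(self, fetch, ttl_s: float = 300.0):
        self._fetch = fetch
        self._ttl = ttl_s
        self._cache: dict[str, tuple[float, dict]] = {}

    def get(self, token: str) -> dict:
        token = token.lower()  # canonical key
        hit = self._cache.get(token)
        if hit and time.monotonic() - hit[0] < self._ttl:
            return hit[1]
        meta = self._fetch(token)  # e.g. a Redis or HTTP lookup
        self._cache[token] = (time.monotonic(), meta)
        return meta
```

Keeping this lookup out of the SQL hot path is the same motivation as the ClickHouse dictionary engine: the metadata changes rarely, so it should never cost a join per query.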
Ingestion patterns that preserve order and minimize lag
Preserving ordering matters: arbitrage and market-making need accurate event sequences. Use Kafka partitions keyed by contract_address or pair_address so ordering is deterministic for each market.
- Use small micro-batches (10–200 ms) from enrichment to Kafka to reduce per-message overhead.
- In ClickHouse, use the Kafka engine with a materialized view into MergeTree to process streams quickly.
- Set Kafka consumer batch sizes and linger.ms to tune for latency vs throughput.
- Buffer engine can be useful for absorbing spikes, but it increases tail latency — prefer it only when bursts are common.
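As a starting point for the latency-vs-throughput tuning above, here is a latency-biased producer configuration using librdkafka/confluent-kafka key names; the values are assumptions to benchmark against your own traffic, not universal recommendations.

```python
def latency_tuned_producer_config(brokers: str) -> dict:
    """Producer config biased toward latency over throughput.

    Keys follow librdkafka/confluent-kafka naming; the values are
    starting points to benchmark, not tuned recommendations.
    """
    return {
        "bootstrap.servers": brokers,
        "linger.ms": 5,               # small micro-batch window
        "batch.num.messages": 1000,   # cap batch size so batches flush quickly
        "compression.type": "lz4",    # cheap compression, low CPU latency cost
        "acks": "1",                  # leader-only ack; trades durability for RTT
        "enable.idempotence": False,  # idempotence adds latency; enable if you need it
    }
```

Pass the dict straight to `confluent_kafka.Producer(...)` if you use that client; the same knobs exist under slightly different names in other Kafka clients.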
ClickHouse SQL patterns for low-latency signals
Below are practical query patterns you can drop into your signal workers (Go, Rust, Python) to compute sub-second indicators.
1) Latest price per market (fast read)
SELECT
  pair_address,
  argMax(price, event_time) AS last_price,
  max(event_time) AS last_time
FROM trades
WHERE event_time >= now64() - INTERVAL 3 SECOND
  AND chain_id = 1
GROUP BY pair_address;
Reasoning: argMax(price, event_time) deterministically returns the price attached to the latest event_time in the window. anyLast is cheaper, but it depends on read order, which ClickHouse does not guarantee when it scans data parts in parallel, so reserve it for cases where approximate recency is acceptable.
2) Micro-window VWAP and volume (market-making)
SELECT
pair_address,
sum(price * amount_in) / sum(amount_in) AS vwap,
sum(amount_in) AS volume
FROM trades
WHERE event_time >= now64() - INTERVAL 1 SECOND
AND pair_address IN ('0xPairA','0xPairB')
GROUP BY pair_address;
Use Decimal types for price and amount to avoid floating-point noise. One caveat: multiplication adds scales, so price * amount_in with two Decimal(38, 18) operands carries scale 36 and can overflow for large volumes; cast to Float64 inside the aggregate (or store a smaller scale) where exactness is not critical.
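It helps to keep a pure-Python mirror of the aggregate in your signal workers so the logic can be unit-tested without a live cluster; this sketch reproduces the SQL VWAP over (price, amount_in) rows.

```python
from decimal import Decimal

def vwap(trades_rows: list[tuple[Decimal, Decimal]]) -> Decimal:
    """Mirror of the SQL micro-window VWAP: rows are (price, amount_in)."""
    notional = sum(p * a for p, a in trades_rows)  # sum(price * amount_in)
    volume = sum(a for _, a in trades_rows)        # sum(amount_in)
    if volume == 0:
        raise ValueError("no volume in window")
    return notional / volume
```

The same function doubles as a cross-check when validating that the materialized trades table agrees with raw swap logs.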
3) Detecting cross-exchange price spreads (arbitrage)
WITH
u1 AS (
  SELECT pair_address, argMax(price, event_time) AS p1
  FROM trades
  WHERE event_time >= now64() - INTERVAL 1 SECOND AND exchange = 'UniswapV3'
  GROUP BY pair_address
),
u2 AS (
  SELECT pair_address, argMax(price, event_time) AS p2
  FROM trades
  WHERE event_time >= now64() - INTERVAL 1 SECOND AND exchange = 'SushiSwap'
  GROUP BY pair_address
)
SELECT
  u1.pair_address,
  (u1.p1 - u2.p2) / u2.p2 * 100 AS spread_pct
FROM u1
INNER JOIN u2 USING (pair_address)
WHERE abs((u1.p1 - u2.p2) / u2.p2) > 0.0025; -- threshold 0.25%
This pattern computes the last price per exchange inside a short window and joins them. Note that the WHERE clause filters on the raw ratio (0.0025 = 0.25%) while spread_pct is reported in percent. Because the scan is restricted to the last second, it stays fast when your ORDER BY favors recent ranges.
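Worker-side, the same threshold check can be re-applied to freshly refreshed prices before acting; here is a small helper mirroring the query's filter (the 0.25% default is illustrative).

```python
from decimal import Decimal

def spread_signal(p1: Decimal, p2: Decimal,
                  threshold: Decimal = Decimal("0.0025")):
    """Return the signed spread as a fraction of p2 if it clears the
    threshold, else None. Mirrors the SQL join's WHERE clause."""
    spread = (p1 - p2) / p2
    return spread if abs(spread) > threshold else None
```

Re-checking in the worker guards against the spread collapsing between the ClickHouse query and the execution decision.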
4) Lightweight mempool-based priority signal
SELECT
tx_hash,
contract_address,
parsed_map['amount'] AS amount,
received_at
FROM onchain_events
WHERE mempool = 1
AND received_at >= now64() - INTERVAL 2 SECOND
AND event_signature = 'Swap(address,uint256,uint256)'
ORDER BY received_at DESC
LIMIT 1000;
Use mempool flags to prioritize events before they hit blocks. Feed these into a risk filter to ensure execution safety.
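A sketch of that risk-filter step follows; the amount and age checks are placeholders for real gas-price, slippage, and position-limit rules.

```python
def mempool_candidates(rows, min_amount: int, max_age_ms: int, now_ms: int):
    """Filter mempool swap rows (tx_hash, amount, received_at_ms).

    A stand-in for the risk-filter stage; production filters also check
    gas price, slippage bounds, and position limits.
    """
    out = []
    for tx_hash, amount, received_at_ms in rows:
        stale = now_ms - received_at_ms > max_age_ms  # drop old mempool entries
        if amount >= min_amount and not stale:
            out.append(tx_hash)
    return out
```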
Query optimization checklist
- ORDER BY shape: design ORDER BY to support your most common range scans (pair_address, event_time).
- Partitioning: partition by day and chain_id to speed up drop and TTL operations while keeping recent partitions hot.
- Index granularity: reduce index_granularity for smaller index scans if your queries hit narrow time ranges frequently.
- Use LowCardinality: on repeated strings (exchanges, tokens, event signatures).
- Avoid FINAL: FINAL forces merges and slows queries — use only for rare reconciliation jobs.
- Dictionary lookups: for static token metadata to avoid JOINs during hot-path queries.
- Bloom filter indexes: add token or tx_hash bloom filters if you run many existence checks on large tables.
Operational concerns and failure modes
Plan for these common issues:
- Backpressure: Kafka retention and consumer lag should be monitored. If ClickHouse can't keep up, add more consumers or increase parallelism via multiple partitions per key (with caution).
- Unordered writes: if ordering isn't preserved (for example, with multiple producers), assign sequence numbers in the enrichment layer, or rely on block_number + tx_index + log_index as the canonical ordering at read time.
- Schema drift: event ABI changes; use a flexible parsed_map or JSON column with schema validation in the enrichment step.
- Cost of detailed retention: keep high-cardinality raw logs for short windows (days) and aggregate or downsample for long-term storage.
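For the unordered-writes case above, the canonical sort is easy to express in the worker; field names match the onchain_events schema.

```python
def event_sort_key(ev: dict) -> tuple[int, int, int]:
    """Canonical on-chain ordering, independent of arrival order."""
    return (ev["block_number"], ev["tx_index"], ev["log_index"])

def reorder(events: list[dict]) -> list[dict]:
    """Restore deterministic order when producers interleave writes."""
    return sorted(events, key=event_sort_key)
```

The same key works as a tie-breaker when reconciling replayed Kafka partitions against previously ingested data.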
Signal delivery: moving from ClickHouse rows to execution
ClickHouse should be the source of truth for signals, not the execution layer. Recommended pattern:
- Run ad-hoc low-latency queries (as shown above) in a small pool of signal workers.
- Validate candidate signals against simple safety rules (slippage, gas-price thresholds, cross-check mempool state).
- Write signals to a dedicated signals table (MergeTree) and publish via Redis/NATS for executors to act on.
CREATE TABLE signals
(
signal_id UUID,
generated_at DateTime64(9),
type LowCardinality(String),
payload String,
confidence Float32,
expires_at DateTime64(9)
) ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(generated_at)
ORDER BY (generated_at, signal_id);
This decouples ClickHouse reads from time-critical executions, preventing slow client queries from delaying signal delivery.
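A sketch of the worker-side handoff: column names follow the signals DDL above, while the channel naming convention (signals.{type}) is an assumption.

```python
import json
import time
import uuid

def make_signal(signal_type: str, payload: dict,
                confidence: float, ttl_s: float) -> dict:
    """Build a row for the signals table plus the bus message.

    Field names match the signals DDL; timestamps are epoch seconds.
    """
    now = time.time()
    return {
        "signal_id": str(uuid.uuid4()),
        "generated_at": now,
        "type": signal_type,
        "payload": json.dumps(payload),
        "confidence": confidence,
        "expires_at": now + ttl_s,
    }

def publish(signal: dict, bus_publish) -> None:
    """bus_publish is e.g. redis_client.publish; the 'signals.<type>'
    channel convention is an assumption, not a fixed protocol."""
    bus_publish("signals." + signal["type"], json.dumps(signal))
```

Executors subscribe to the bus and never query ClickHouse directly, which keeps slow analytical reads off the execution path.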
Advanced strategies and 2026 trends
Several trends in late 2025–2026 are relevant:
- On-chain mempool observability: more teams stream mempool via specialized services; integrate them for pre-block arbitrage.
- Rollups and sequencer-based ordering: L2 rollups changed latency characteristics — include chain_id and rollup sequencer timestamps in your schema.
- ClickHouse enhancements: improved cloud-managed offerings and lower operational overhead; new keeper-based consensus reduced ZooKeeper dependence.
- Hybrid execution: combine ClickHouse for real-time aggregates with lightweight in-memory state (Redis or in-process caches) for microsecond-scale decision loops.
Future-proofing
- Make your enrichment idempotent and stateless so replaying Kafka partitions is safe.
- Store both block and mempool timestamp to reconstruct ordering under reorgs.
- Use per-market partition keys so you can scale hot pools independently.
Practical implementation checklist (step-by-step)
- Instrument a mempool and block listener for your target chains. Ensure you include tx_hash, block_number, log_index and precise timestamps.
- Enrich and normalize messages; map token decimals and canonicalize addresses. Emit Protobuf to Kafka keyed by pair_address.
- Provision ClickHouse cluster near Kafka. Set up Kafka engine tables + materialized views that insert into MergeTree tables.
- Design ORDER BY and PARTITION BY for your hot queries. Use LowCardinality and Dictionaries.
- Implement signal workers that run optimized queries (examples above), validate, and publish signals to execution buses.
- Monitor consumer lag, ClickHouse query profiles, partition sizes and compaction metrics; tune index_granularity and partitions as needed.
Case study: detecting a cross-DEX arbitrage in 300ms
Real-world teams in 2025 reported detecting profitable arbitrage windows by combining mempool swaps and on-chain trades. Key elements:
- Kafka partitions per pair allowed deterministic ordering.
- Materialized trades table provided numeric price fields instantly.
- Signal workers polled recent 1s windows and compared last prices across exchanges using anyLast.
- Signals were validated against mempool liquidity and immediately sent to a colocated executor; entire detection-to-broadcast took ~300ms on average.
Security, compliance and auditability
For finance teams and auditors, ensure immutability and traceability:
- Keep raw onchain_events for a short retention window; snapshot important events into an append-only archive (S3/Parquet) with block proofs.
- Log the enrichment mapping decisions and token decimal lookups for reconstruction.
- Use ClickHouse RBAC and network segmentation to control access to signals and raw data.
Actionable takeaways
- Design ORDER BY for your hot-path scans — pair_address + event_time is the default that minimizes scan amplification.
- Use Kafka partitions keyed by market to preserve ordering, then materialize into MergeTree tables.
- Materialize numeric fields (price, amount) into a trades table for quick aggregation; avoid computing price at query time.
- Keep mempool and block data to capture pre-block opportunities and handle reorgs.
- Publish signals through a separate bus and keep execution off your ClickHouse query path.
Next steps and resources
Start by prototyping with a single hot pair: stream swaps to Kafka, create a trades table in ClickHouse, and deploy a signal worker that computes VWAP and cross-DEX spreads every 200ms. Measure end-to-end latency and iterate on partitioning and index_granularity.
Call to action
Ready to instrument your on-chain signals pipeline? Clone our example repo (includes Kafka producers, ClickHouse DDLs and sample signal workers) and run the 10-minute demo. For hands-on help scaling to production, reach out to our engineering team for a workshop tailored to your trading strategies and compliance requirements.