Quant Corner: Building Low-Latency Trading Signals with ClickHouse for On-Chain Events

2026-03-08
10 min read

A technical guide for quant teams to ingest on-chain events and build sub-second trading signals with ClickHouse—schema, queries, and ops tips for 2026.

Why quant teams are still losing time on on-chain signals — and how to fix it

Low-latency trading signals from on-chain events are no longer a research edge — they are table stakes for modern quant trading, market-making and arbitrage strategies. Yet teams still struggle with high ingestion latency, noisy raw logs, and inefficient storage that kills query performance when every millisecond matters. This guide shows how to build a production-grade pipeline using ClickHouse to ingest, normalize and query on-chain events for sub-second signals in 2026.

The 2026 context: why ClickHouse for on-chain signals

ClickHouse continues to gain traction in quant infrastructure. After the major funding and cloud-adoption waves of 2025, it became the pragmatic real-time OLAP choice for teams that need high ingestion throughput and low-latency queries. Its vectorized engine, native Kafka integration, and the MergeTree family give you control over ordering and storage layout, which is crucial for streaming on-chain data such as mempool events and DEX swaps.

Tip: ClickHouse Cloud and self-hosted ClickHouse both matured in late 2025 — choose based on latency SLAs and compliance needs: self-host for colocated execution, ClickHouse Cloud for operational simplicity.

High-level architecture for sub-second on-chain signals

Design the pipeline with latency and deterministic ordering in mind:

  1. Producers: mempool listeners, RPC/websocket feeds (Alchemy, Infura, Blocknative, custom full nodes), relays (Flashbots), indexers (The Graph or custom walkers).
  2. Enrichment layer: decode event ABIs, normalize token decimals, lookup token metadata (via a low-latency dictionary service or Redis).
  3. Streaming bus: Kafka or Pulsar for ordered, durable event streams.
  4. ClickHouse ingestion: Kafka engine → materialized views → ReplicatedMergeTree / MergeTree tables optimized for time-series reads.
  5. Realtime aggregation & signals: materialized views or lightweight aggregator services read from ClickHouse and push signals to execution systems (Redis, NATS).
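The enrichment stage (step 2) can be sketched in a few lines. The token-decimals map and field names below are illustrative stand-ins for a real dictionary service or Redis cache:

```python
from decimal import Decimal

# Hypothetical enrichment step: canonicalize addresses and scale raw integer
# amounts by token decimals before the record is serialized onto the bus.
TOKEN_DECIMALS = {"0xusdc": 6, "0xweth": 18}  # illustrative entries only

def enrich(event: dict) -> dict:
    token = event["token_in"].lower()
    decimals = TOKEN_DECIMALS.get(token, 18)  # fall back to 18 if unknown
    return {
        "pair_address": event["pair_address"].lower(),
        "token_in": token,
        "amount_in": Decimal(event["raw_amount_in"]) / Decimal(10) ** decimals,
        "block_number": event["block_number"],
    }

row = enrich({"token_in": "0xUSDC", "pair_address": "0xPairA",
              "raw_amount_in": 2_500_000, "block_number": 19_000_000})
```

Keeping this step idempotent and stateless (pure function of the input event) is what later makes Kafka replays safe.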

Latency-sensitive placement

  • Colocate consumers close to the blockchain relays or running nodes to reduce network RTT.
  • Run ClickHouse nodes in the same region and VPC as Kafka to minimize cross-cloud hops.
  • Use compact serialization (Protobuf/Avro) on the wire and decode in the enrichment layer.
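To illustrate why compact serialization helps, here is a hypothetical fixed-width wire format using only the standard library; in production you would use Protobuf or Avro with a schema registry, but the size savings over JSON text come from the same idea:

```python
import struct

# Illustrative binary layout (an assumption of this example, not a standard):
# chain_id (u64) | block_number (u64) | log_index (u32) | price (f64)
WIRE = struct.Struct(">QQId")

def encode(chain_id: int, block_number: int, log_index: int, price: float) -> bytes:
    return WIRE.pack(chain_id, block_number, log_index, price)

def decode(payload: bytes) -> tuple:
    return WIRE.unpack(payload)

msg = encode(1, 19_000_000, 42, 1834.25)  # a 28-byte message
```

The equivalent JSON payload would be several times larger and cost a parse on every hop.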

Schema design: fundamentals for low-latency querying

Schema design in ClickHouse largely determines query performance. For on-chain events you need a schema that supports fast range scans per market (pair, contract) and quick access to the latest state.

CREATE TABLE onchain_events
  (
    event_time DateTime64(9),
    chain_id UInt64,
    block_number UInt64,
    tx_index UInt32,
    tx_hash String,
    log_index UInt32,
    contract_address LowCardinality(String),
    event_signature LowCardinality(String),
    args Nested(
      name String, value String
    ),
    parsed_map Map(String, String), -- optional: parsed indexed args
    raw_data String,
    mempool Bool DEFAULT 0,
    received_at DateTime64(9) DEFAULT now64(9)
  )
  ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/onchain_events', '{replica}')
  PARTITION BY (chain_id, toYYYYMMDD(event_time))
  ORDER BY (contract_address, event_time, block_number, tx_hash, log_index)
  SETTINGS index_granularity = 8192;
  

Why this shape? Order by contract_address + event_time lets you scan recent events for a specific market fast. Use LowCardinality for repeated strings (event_signature, contract_address) to reduce memory and improve join speeds. Nested or Map columns let you keep parsed args available without exploding column count.

Specialized tables for DEX/AMM trades

Generate a normalized trades table that extracts numeric fields for price, volume and side. Materialize parsed swaps into this table for fast aggregates.

CREATE TABLE trades
  (
    event_time DateTime64(9),
    chain_id UInt64,
    exchange LowCardinality(String), -- venue, e.g. 'UniswapV3'
    pair_address LowCardinality(String),
    token_in LowCardinality(String),
    token_out LowCardinality(String),
    amount_in Decimal(38, 18),
    amount_out Decimal(38, 18),
    price Decimal(38, 18),
    taker Bool,
    tx_hash String,
    block_number UInt64
  )
  ENGINE = MergeTree()
  PARTITION BY (chain_id, toYYYYMMDD(event_time))
  ORDER BY (pair_address, event_time)
  SETTINGS index_granularity = 4096;
  

Materialize writes from Kafka → onchain_events → trades via a materialized view to keep swap-parsing logic in SQL and ensure fast writes.

Static dictionaries for token metadata

Use ClickHouse dictionary engine or a small Redis cache for token decimals, symbols, and known pool addresses. Dictionaries avoid expensive joins in hot queries and can be updated asynchronously.

Ingestion patterns that preserve order and minimize lag

Preserving ordering matters: arbitrage and market-making need accurate event sequences. Use Kafka partitions keyed by contract_address or pair_address so ordering is deterministic for each market.

  • Use small micro-batches (10–200 ms) from enrichment to Kafka to reduce per-message overhead.
  • In ClickHouse, use the Kafka engine with a materialized view into MergeTree to process streams quickly.
  • Tune producer batching (linger.ms, batch.size) and consumer fetch settings (fetch.min.bytes, fetch.max.wait.ms) to trade latency against throughput.
  • Buffer engine can be useful for absorbing spikes, but it increases tail latency — prefer it only when bursts are common.
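Keying by market preserves ordering because the same key always maps to the same partition. Kafka's own murmur2-based partitioner does this internally when you set a message key; this stdlib sketch just demonstrates the property:

```python
import hashlib

# Deterministic partition selection keyed by market. Canonicalizing the
# address first means differently-cased producers still agree on the
# partition, so per-market event order is preserved.
def partition_for(pair_address: str, num_partitions: int) -> int:
    digest = hashlib.sha256(pair_address.lower().encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

p1 = partition_for("0xPairA", 12)
assert p1 == partition_for("0xpaira", 12)  # same market, same partition
```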

ClickHouse SQL patterns for low-latency signals

Below are practical query patterns you can drop into your signal workers (Go, Rust, Python) to compute sub-second indicators.

1) Latest price per market (fast read)

SELECT
    pair_address,
    anyLast(price) AS last_price,
    anyLast(event_time) AS last_time
  FROM trades
  WHERE event_time >= now64() - INTERVAL 3 SECOND
    AND chain_id = 1
  GROUP BY pair_address;
  

Reasoning: anyLast is a cheap aggregate that returns the last value it encounters in each group; it approximates the most recent value when write order per pair matches event order, which the table's ORDER BY (pair_address, event_time) encourages.
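For intuition, here is the same computation expressed in-process, as a signal worker might replicate it over fetched rows (names are illustrative):

```python
# Keep the last seen price per pair across rows that arrive in write order,
# mirroring what anyLast does inside each GROUP BY bucket.
def latest_prices(rows):
    """rows: iterable of (pair_address, event_time, price) in write order."""
    last = {}
    for pair, ts, price in rows:
        last[pair] = (price, ts)  # later rows overwrite earlier ones
    return last

rows = [("0xPairA", 1.0, 100.0), ("0xPairB", 1.1, 7.0), ("0xPairA", 1.2, 101.0)]
```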

2) Micro-window VWAP and volume (market-making)

SELECT
    pair_address,
    sum(price * amount_in) / sum(amount_in) AS vwap,
    sum(amount_in) AS volume
  FROM trades
  WHERE event_time >= now64() - INTERVAL 1 SECOND
    AND pair_address IN ('0xPairA','0xPairB')
  GROUP BY pair_address;
  

Use Decimal types for price/amount to avoid floating point noise; this query gives you instant liquidity insights.
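The equivalent calculation in a worker, using Python's Decimal to mirror the table's Decimal(38, 18) columns; this is a sketch of the math, not the production read path:

```python
from decimal import Decimal

# Volume-weighted average price over a micro-window of trades.
def vwap(trades):
    """trades: iterable of (price, amount_in) Decimal pairs in the window."""
    notional = sum(p * a for p, a in trades)
    volume = sum(a for _, a in trades)
    return (notional / volume if volume else None), volume

window = [(Decimal("100"), Decimal("2")), (Decimal("110"), Decimal("1"))]
v, vol = vwap(window)  # notional 310 over volume 3
```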

3) Detecting cross-exchange price spreads (arbitrage)

WITH
    u1 AS
    (SELECT pair_address, anyLast(price) AS p1
     FROM trades WHERE event_time >= now64() - INTERVAL 1 SECOND AND exchange = 'UniswapV3'
     GROUP BY pair_address),
    u2 AS
    (SELECT pair_address, anyLast(price) AS p2
     FROM trades WHERE event_time >= now64() - INTERVAL 1 SECOND AND exchange = 'SushiSwap'
     GROUP BY pair_address)
  SELECT
    u1.pair_address,
    (u1.p1 - u2.p2) / u2.p2 * 100 AS spread_pct
  FROM u1
  INNER JOIN u2 USING (pair_address)
  WHERE abs((u1.p1 - u2.p2) / u2.p2) > 0.0025; -- threshold 0.25%
  

This pattern computes last prices for each exchange inside a short window and joins them. Because data is restricted to the last second, it's fast if your ORDER BY favors recent scans.
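The join-and-threshold step looks like this when replicated in a worker; the inputs are assumed to be per-exchange latest-price maps like those the query above produces:

```python
# Flag pairs whose relative price spread across two venues exceeds a
# percentage threshold. Inputs: dict of pair_address -> last price per venue.
def spreads(prices_a, prices_b, threshold_pct=0.25):
    out = {}
    for pair, pa in prices_a.items():
        pb = prices_b.get(pair)
        if pb is None:
            continue  # pair not quoted on the second venue in this window
        pct = (pa - pb) / pb * 100
        if abs(pct) > threshold_pct:
            out[pair] = pct
    return out

hits = spreads({"0xPairA": 100.5, "0xPairB": 50.0},
               {"0xPairA": 100.0, "0xPairB": 49.99})
```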

4) Lightweight mempool-based priority signal

SELECT
    tx_hash,
    contract_address,
    parsed_map['amount'] AS amount,
    received_at
  FROM onchain_events
  WHERE mempool = 1
    AND received_at >= now64() - INTERVAL 2 SECOND
    AND event_signature = 'Swap(address,uint256,uint256)'
  ORDER BY received_at DESC
  LIMIT 1000;
  

Use mempool flags to prioritize events before they hit blocks. Feed these into a risk filter to ensure execution safety.
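A minimal sketch of such a risk filter, with purely illustrative thresholds and field names:

```python
# Drop mempool candidates that are stale or carry implausible amounts before
# they are allowed to become signals. Bounds here are examples, not advice.
def risk_filter(events, now, max_age_s=2.0, max_amount=10**24):
    ok = []
    for ev in events:
        age = now - ev["received_at"]
        amount = int(ev["amount"])
        if age <= max_age_s and 0 < amount < max_amount:
            ok.append(ev)
    return ok

candidates = [
    {"tx_hash": "0xaaa", "amount": "5000", "received_at": 100.0},
    {"tx_hash": "0xbbb", "amount": "0", "received_at": 100.5},   # zero amount
    {"tx_hash": "0xccc", "amount": "7000", "received_at": 90.0}, # stale
]
kept = risk_filter(candidates, now=101.0)
```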

Query optimization checklist

  • ORDER BY shape: design ORDER BY to support your most common range scans (pair_address, event_time).
  • Partitioning: partition by day and chain_id to speed up drop and TTL operations while keeping recent partitions hot.
  • Index granularity: reduce index_granularity for smaller index scans if your queries hit narrow time ranges frequently.
  • Use LowCardinality: on repeated strings (exchanges, tokens, event signatures).
  • Avoid FINAL: FINAL forces merges and slows queries — use only for rare reconciliation jobs.
  • Dictionary lookups: for static token metadata to avoid JOINs during hot-path queries.
  • Bloom filter indexes: add token or tx_hash bloom filters if you run many existence checks on large tables.

Operational concerns and failure modes

Plan for these common issues:

  • Backpressure: Kafka retention and consumer lag should be monitored. If ClickHouse can't keep up, add more consumers or increase parallelism via multiple partitions per key (with caution).
  • Unordered writes: if ordering isn't preserved (multiple producers), assign sequence numbers in the enrichment layer, or rely on the transaction's block_number + tx_index + log_index ordering at insert time.
  • Schema drift: event ABI changes; use a flexible parsed_map or JSON column with schema validation in the enrichment step.
  • Cost of detailed retention: keep high-cardinality raw logs for short windows (days) and aggregate or downsample for long-term storage.

Signal delivery: moving from ClickHouse rows to execution

ClickHouse should be the source of truth for signals, not the execution layer. Recommended pattern:

  1. Run ad-hoc low-latency queries (as shown above) in a small pool of signal workers.
  2. Validate candidate signals against simple safety rules (slippage, gas-price thresholds, cross-check mempool state).
  3. Write signals to a dedicated signals table (MergeTree) and publish via Redis/NATS for executors to act on.

CREATE TABLE signals
  (
    signal_id UUID,
    generated_at DateTime64(9),
    type LowCardinality(String),
    payload String,
    confidence Float32,
    expires_at DateTime64(9)
  ) ENGINE = MergeTree()
  PARTITION BY toYYYYMMDD(generated_at)
  ORDER BY (generated_at, signal_id);
  

This decouples ClickHouse reads from time-critical executions, preventing slow client queries from delaying signal delivery.
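A sketch of the worker-side delivery step, mirroring the shape of the signals table above; the publish transport (Redis/NATS) is deliberately out of scope here, and the TTL is an illustrative choice:

```python
import uuid
from datetime import datetime, timedelta, timezone

# Build a row shaped like the signals table and decide at publish time
# whether it is still live. Executors should drop expired signals.
def make_signal(sig_type: str, payload: str, confidence: float, ttl_ms: int = 500):
    now = datetime.now(timezone.utc)
    return {
        "signal_id": str(uuid.uuid4()),
        "generated_at": now,
        "type": sig_type,
        "payload": payload,
        "confidence": confidence,
        "expires_at": now + timedelta(milliseconds=ttl_ms),
    }

def is_live(signal, now=None):
    now = now or datetime.now(timezone.utc)
    return now < signal["expires_at"]

sig = make_signal("spread", '{"pair":"0xPairA","spread_pct":0.5}', 0.9)
```

Stamping an explicit expires_at keeps a slow consumer from acting on a stale opportunity.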

Trends shaping 2026

Several trends in late 2025–2026 are relevant:

  • On-chain mempool observability: more teams stream mempool via specialized services; integrate them for pre-block arbitrage.
  • Rollups and sequencer-based ordering: L2 rollups changed latency characteristics — include chain_id and rollup sequencer timestamps in your schema.
  • ClickHouse enhancements: improved cloud-managed offerings and lower operational overhead; new keeper-based consensus reduced ZooKeeper dependence.
  • Hybrid execution: combine ClickHouse for real-time aggregates and lightweight in-memory state (Redis) for nanosecond decision loops.

Future-proofing

  • Make your enrichment idempotent and stateless so replaying Kafka partitions is safe.
  • Store both block and mempool timestamp to reconstruct ordering under reorgs.
  • Use per-market partition keys so you can scale hot pools independently.
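One way to sketch a canonical ordering key that survives replays and reorgs, keeping pending mempool events after all confirmed ones; the sentinel value is an assumption of this example:

```python
# Confirmed events order by (block_number, tx_index, log_index); pending
# mempool events have no block yet, so they sort last by received_at.
PENDING_BLOCK = 2**63  # sentinel larger than any real block number

def order_key(ev: dict):
    if ev.get("mempool"):
        return (PENDING_BLOCK, 0, 0, ev["received_at"])
    return (ev["block_number"], ev["tx_index"], ev["log_index"], ev["received_at"])

events = [
    {"mempool": True, "received_at": 5.0},
    {"block_number": 100, "tx_index": 2, "log_index": 0, "received_at": 4.0},
    {"block_number": 100, "tx_index": 1, "log_index": 3, "received_at": 3.0},
]
ordered = sorted(events, key=order_key)
```

Because the key is derived from the event itself, re-sorting after a replay or a reorg always yields the same sequence.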

Practical implementation checklist (step-by-step)

  1. Instrument a mempool and block listener for your target chains. Ensure you include tx_hash, block_number, log_index and precise timestamps.
  2. Enrich and normalize messages; map token decimals and canonicalize addresses. Emit Protobuf to Kafka keyed by pair_address.
  3. Provision ClickHouse cluster near Kafka. Set up Kafka engine tables + materialized views that insert into MergeTree tables.
  4. Design ORDER BY and PARTITION BY for your hot queries. Use LowCardinality and Dictionaries.
  5. Implement signal workers that run optimized queries (examples above), validate, and publish signals to execution buses.
  6. Monitor consumer lag, ClickHouse query profiles, partition sizes and compaction metrics; tune index_granularity and partitions as needed.

Case study: detecting a cross-DEX arbitrage in 300ms

Real-world teams in 2025 reported detecting profitable arbitrage windows by combining mempool swaps and on-chain trades. Key elements:

  • Kafka partitions per pair allowed deterministic ordering.
  • Materialized trades table provided numeric price fields instantly.
  • Signal workers polled recent 1s windows and compared last prices across exchanges using anyLast.
  • Signals were validated against mempool liquidity and immediately sent to a colocated executor; entire detection-to-broadcast took ~300ms on average.

Security, compliance and auditability

For finance teams and auditors, ensure immutability and traceability:

  • Keep raw onchain_events for a short retention window; snapshot important events into an append-only archive (S3/Parquet) with block proofs.
  • Log the enrichment mapping decisions and token decimal lookups for reconstruction.
  • Use ClickHouse RBAC and network segmentation to control access to signals and raw data.

Actionable takeaways

  • Design ORDER BY for your hot-path scans — pair_address + event_time is the default that minimizes scan amplification.
  • Use Kafka partitions keyed by market to preserve ordering, then materialize into MergeTree tables.
  • Materialize numeric fields (price, amount) into a trades table for quick aggregation; avoid computing price at query time.
  • Keep mempool and block data to capture pre-block opportunities and handle reorgs.
  • Publish signals through a separate bus and keep execution off your ClickHouse query path.

Next steps and resources

Start by prototyping with a single hot pair: stream swaps to Kafka, create a trades table in ClickHouse, and deploy a signal worker that computes VWAP and cross-DEX spreads every 200ms. Measure end-to-end latency and iterate on partitioning and index_granularity.

Call to action

Ready to instrument your on-chain signals pipeline? Clone our example repo (includes Kafka producers, ClickHouse DDLs and sample signal workers) and run the 10-minute demo. For hands-on help scaling to production, reach out to our engineering team for a workshop tailored to your trading strategies and compliance requirements.
