Data lake setup: log shipping with Alloy, metrics with Prometheus, analytics with ClickHouse

[DRAFT NOTE] The Promtail → Alloy migration is fully drafted. The ClickHouse adoption and the Prometheus+ClickHouse split are skeletal — they need [TODO] markers filled with specifics (use cases, volumes, schema choices, queries that drove the adoption).

There are roughly two kinds of “data lake” articles. The first kind is about Hudi/Iceberg on S3 and the warehouse-style analytics that go with it — for us, that’s the analytics pipeline pillar. The second is about the operational data lake: where your logs live, where your metrics live, where the high-cardinality observational data goes when Prometheus politely says “no.”

This dive is the second kind.

The shape of the stack

At Kore, the operational data flows through three pipelines that meet at Grafana:

                           ┌──▶ Loki (logs)            ─┐
node ──▶ Alloy DaemonSet ──┼──▶ Prometheus (metrics)   ─┼──▶ Grafana (query, dashboard, alert)
                           └──▶ ClickHouse (high-card) ─┘

Each store handles a different shape of data:

Loki for unstructured logs. Strings, timestamps, a small set of labels. Cheap to ingest, queryable but not analytical.
Prometheus for time-series metrics. Low-to-medium cardinality, regular intervals, mathematical operations (rate, histogram_quantile, predict_linear).
ClickHouse for everything Prometheus can’t do gracefully — high-cardinality dimensions (per-tenant, per-user, per-request) that need analytical queries against weeks of data.

The reason we run all three is that no single store does all three jobs well. Loki is great at logs and terrible at math. Prometheus is great at metrics and terrible at high cardinality. ClickHouse is great at columnar analytical queries and not the right tool for either raw logs or metrics that have to drive fast alerting.

The agent consolidation: Promtail → Grafana Alloy

Before Alloy matured, the observability stack required three agents on every node: Promtail for logs to Loki, node_exporter for host metrics, and Grafana Agent for Prometheus remote-write (and some application-level scraping). Three DaemonSets, three config languages, three upgrade cycles.

It worked. It also accumulated operational cost over time: every observability change required edits across three repos, and the aggregate footprint of three agents on every node was real.

Grafana Alloy folds all of that into a single agent with one configuration language: an HCL-inspired syntax that was called “River” in Grafana Agent Flow and is now simply the Alloy configuration syntax. When it shipped its first production-ready release, we proposed and ran the migration.

The Alloy config for what used to be three agents:

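// Logs: discover pods, tail their logs, process them, ship to Loki.
discovery.kubernetes "pods" {
  role = "pod"
}
loki.source.kubernetes "pods" {
  // In a DaemonSet, restrict targets to the local node (or enable clustering)
  // so each pod's logs are collected once.
  targets    = discovery.kubernetes.pods.targets
  forward_to = [loki.process.app_logs.receiver]
}
loki.process "app_logs" {
  // label hygiene, drops, stage transforms
  forward_to = [loki.write.central.receiver]
}
loki.write "central" {
  endpoint { url = "..." }
}

// Metrics: built-in node_exporter equivalent, scraped and remote-written.
prometheus.exporter.unix "node" { }
prometheus.scrape "node" {
  targets    = prometheus.exporter.unix.node.targets
  forward_to = [prometheus.remote_write.central.receiver]
}
prometheus.remote_write "central" {
  endpoint { url = "..." }
}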

One binary, one config, fewer moving parts.

The migration was tranched by node pool. Each tranche carried a back-out plan (re-enable old DaemonSets, disable Alloy) tested before starting. We ran the old agents and Alloy in parallel during the pilot and diffed dashboards before-and-after to confirm parity. Old DaemonSets stayed disabled for a few weeks before being removed entirely, in case we needed to roll back.
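The parity check during the parallel run boils down to a set comparison. Our comparison was done on dashboards; a scripted version of the same idea might look like the sketch below. It assumes you have exported the series identities (metric name plus labels) that each pipeline produced into two lists of dicts. The function names and sample data are hypothetical, not our actual tooling.

```python
from typing import Iterable, Mapping


def series_key(labels: Mapping[str, str], ignore: Iterable[str] = ()) -> frozenset:
    """Build a hashable identity for a series, skipping labels expected to differ."""
    skip = set(ignore)
    return frozenset((k, v) for k, v in labels.items() if k not in skip)


def parity_diff(old, new, ignore=()):
    """Return (missing_in_new, extra_in_new) between two pipelines' series."""
    old_keys = {series_key(s, ignore) for s in old}
    new_keys = {series_key(s, ignore) for s in new}
    return old_keys - new_keys, new_keys - old_keys


# Illustrative data only: the "agent" label is assumed to differ by design.
old_series = [
    {"__name__": "node_cpu_seconds_total", "instance": "n1", "agent": "node_exporter"},
    {"__name__": "node_load1", "instance": "n1", "agent": "node_exporter"},
]
new_series = [
    {"__name__": "node_cpu_seconds_total", "instance": "n1", "agent": "alloy"},
]

missing, extra = parity_diff(old_series, new_series, ignore=["agent"])
for key in missing:
    print("missing in new pipeline:", dict(key))
```

An empty `missing` set for a node pool is the signal that the tranche can proceed; anything in `extra` is worth a look too, since it usually means a default changed.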

What it bought us:

Agents per node: 3 → 1. Reduced per-node observability overhead.
Config sprawl collapsed. Three repos, three languages → one. Reviews and changes became one-place.
Upgrade cycle simplified. One agent to track, one CVE feed to follow.
Migration completed with zero data gaps and zero customer-visible incidents.

Gotchas worth flagging:

Config compatibility wasn’t 100%. A few Promtail pipeline stages had no exact Alloy equivalent and needed redesigning. List them before you start the migration; don’t discover them mid-flight.
The Alloy config language (HCL-inspired, formerly called River) has a learning curve. Budget engineer time for the language, not just the mechanics.
Running both old and new in parallel during the pilot is cheap insurance, and it makes the parity diff trivial.

If you’re still on Promtail and the rest of the trio, this is the migration to do now. The consolidation pays back fast.

Prometheus: what it’s great at, what it isn’t

Great at: time-series with bounded cardinality. Counters, gauges, histograms. Rate-of-change queries. Multi-burn-rate alerts. The metrics-and-alerts story that PromQL was designed for.

Not great at: high cardinality. Every unique combination of label values creates a new time series. Add a tenant_id label to a metric in a multi-tenant system with thousands of tenants and you’ve just exploded your cardinality. Add a user_id and you’re done — your Prometheus is going to OOM, or your queries are going to time out, or both.
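To make the explosion concrete, here is a back-of-the-envelope sketch. The label names and counts are made up for illustration and are not Kore's real numbers. The product is a worst-case upper bound, since not every combination of label values actually occurs.

```python
from math import prod


def series_upper_bound(label_cardinalities: dict) -> int:
    """Worst-case series count for one metric: product of distinct values per label."""
    return prod(label_cardinalities.values())


# Hypothetical metric with three well-behaved labels.
base = {"service": 20, "endpoint": 15, "status_code": 5}
with_tenant = {**base, "tenant_id": 3000}

print(series_upper_bound(base))         # 1500
print(series_upper_bound(with_tenant))  # 4500000
```

One extra label takes a single metric from about 1,500 potential series to 4.5 million, and that is before histogram buckets multiply it again.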

The discipline that keeps Prometheus working at scale:

Label hygiene reviewed in PRs. No per-user labels. No per-session labels. No request_id as a label. Tenant labels only on metrics where tenant-level views are genuinely needed.
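Recording rules to pre-aggregate. Anything a dashboard queries should already be pre-computed via recording rules, so the expensive aggregation runs once per evaluation interval instead of on every dashboard load. The cardinality a dashboard touches is then fixed when the rule is evaluated, not when the query runs.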
Exporters chosen carefully. mongodb_exporter, redis_exporter, node_exporter are well-behaved. Some application-level exporters expose every internal metric they have; review what they emit and drop what you don’t need.
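Label hygiene is easier to enforce when part of it is mechanical. The sketch below scans a scrape in the Prometheus text exposition format and reports metrics that carry a denied label. The denylist and function name are assumptions for illustration, and the label parsing is deliberately simple: it does not handle every escaping edge case inside label values.

```python
import re

# Assumed denylist, mirroring the "no per-user, per-session, per-request" rule.
FORBIDDEN_LABELS = {"user_id", "session_id", "request_id"}

# Matches a label name immediately followed by =" (simplified parsing).
_LABEL_NAME = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="')


def forbidden_labels_in(exposition_text: str, forbidden=FORBIDDEN_LABELS) -> dict:
    """Map metric name -> sorted list of forbidden labels found on it."""
    offenders = {}
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blanks, HELP/TYPE comments, and samples without labels.
        if not line or line.startswith("#") or "{" not in line:
            continue
        name, _, rest = line.partition("{")
        label_part = rest.rpartition("}")[0]
        found = set(_LABEL_NAME.findall(label_part)) & set(forbidden)
        if found:
            offenders.setdefault(name, set()).update(found)
    return {name: sorted(labels) for name, labels in offenders.items()}


sample_scrape = """\
# HELP http_requests_total Total requests.
# TYPE http_requests_total counter
http_requests_total{method="GET",user_id="42"} 10
http_requests_total{method="POST",request_id="abc-123"} 3
node_load1 0.5
up{job="api"} 1
"""

print(forbidden_labels_in(sample_scrape))
```

Run against a staging scrape of a new exporter, a non-empty result is a prompt for the PR conversation rather than an automatic rejection.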

For the cardinalities and access patterns that fit, Prometheus is excellent and we lean on it heavily. For everything else — particularly anything involving per-tenant or per-request analytics — Prometheus is the wrong shape.

ClickHouse: the high-cardinality answer

[TODO: This section needs your specific use cases and adoption details. The framing below is correct; the specifics need verification.]

ClickHouse fills the gap Prometheus can’t. It’s a columnar OLAP database optimized for analytical queries over wide, high-cardinality tables. You can throw billions of rows at it with hundreds of dimensions and get sub-second query latency if you’ve designed the table right.

What we use it for at Kore:

[TODO: per-tenant usage analytics? request-level dimensions for debugging? specific tables and the queries that drove the adoption]
[TODO: how we ingest — direct from Kafka? CDC from somewhere?]
[TODO: retention policy and partitioning scheme]

What ClickHouse is great at:

Wide tables with many columns, of which any given query reads a few.
High-cardinality dimensions (per-user, per-request, per-tenant).
Analytical aggregations over weeks-to-months of data.
Concurrent users running ad-hoc queries.

What ClickHouse isn’t:

A general-purpose database. Updates and deletes are not cheap; design for append-mostly workloads.
A replacement for Prometheus. The query model is different and PromQL doesn’t translate. Use it for what it’s good at.
A low-latency operational store. Sub-second queries are achievable; sub-millisecond is not.

[TODO: a specific example query or two showing the kind of analytical question ClickHouse makes tractable]

How the three stores actually divide work

A useful framing: which questions does each store answer?

Question	Store
“What’s the p95 latency of service X in the last 5 minutes?”	Prometheus
“What does the error log say for request abc-123?”	Loki
“How many tenants saw degraded performance during yesterday’s incident, broken down by region?”	ClickHouse
“Is the platform meeting its SLO right now?”	Prometheus (recording rule + alert)
“What’s the per-tenant request volume for the last 30 days, ranked?”	ClickHouse
“Show me all errors from pod X between 14:00 and 14:30.”	Loki
The clarity of this mapping is what makes the three-store setup workable. If you’re not sure which store to use, the question is probably ambiguous and worth refining before you write the query.

What I’d do differently

Adopt Alloy earlier. I waited for it to feel “fully production-ready” before proposing the migration. It was ready earlier than I committed to. The conservatism cost a few months of unnecessary triple-agent operation.

[TODO: ClickHouse-specific learnings]. What I underestimated about ClickHouse operationally. What schema choices I’d make differently.

Bring tracing in earlier. This stack handles metrics and logs well. Traces (OpenTelemetry collector → Tempo or similar) came late. For distributed-system debugging, traces are the missing leg of the metrics/logs/traces triad.

Centralise dashboard ownership earlier. “Every team owns its dashboards” produces inconsistent panels and broken links across reorgs. Some central convention for incident-response dashboards, even if individual service dashboards are team-owned, would have paid back.

Kore observability setup — broader observability story; this deep-dive is the data layer, that one is the consumption layer (dashboards, alerts, SLOs)
Analytics pipeline pillar — the other data lake (S3 + Hudi for warehouse-style analytics, not operational)
Kore infrastructure overview — where this stack fits in the broader architecture