Kore infrastructure overview: architecture, reasoning, pain points

[DRAFT NOTE] Architecture description is accurate; specific node counts, pod counts, and capacity numbers tagged [TODO] reflect current production rather than load-test peaks and should be verified before promoting.

Most of the deep-dives on this site reference “the platform” without explaining it. This dive is the explanation. If you’re reading any of the scaling, RMQ, MongoDB, or autoscaling content and finding yourself asking “but what is this thing actually shaped like?” — start here.

The honest framing: this is the architecture as it exists today, with all the compromises and operational debts that accumulated along the way. It’s not the architecture I’d design from scratch. It’s the architecture that evolved, and understanding why each piece is shaped the way it is requires understanding the history that shaped it.

What Kore.AI does, in one paragraph

Kore.AI is a multi-tenant conversational-AI platform serving enterprise customers. Customers build bots — for IVR (voice), webchat, RTM (long-lived WebSocket), FAQ, ML-driven flows — and Kore serves them. Each bot can route through multiple channels, call multiple downstream services (the customer’s internal APIs, third-party integrations), and use multiple ML capabilities (intent detection, FAQ retrieval, language detection, embeddings, sentiment analysis). At runtime, a user message comes in via a channel, gets routed through the bot definition, calls whatever it needs to call, and a response goes back. Multiply by thousands of concurrent users per enterprise customer, hundreds of enterprise customers, and you have a real platform.

The runtime, at a glance

                            ┌─────────────────────────────┐
   Channels                 │      koreserver pool        │
  (IVR / WebSocket /  ─────▶│   (Node.js monolith)        │
   Webhook / RTM)           │   - app pods                │
                            │   - consumers (RMQ workers) │
                            │   - NLP                     │
                            └─────────────────────────────┘
                                       │
        ┌──────────────────────────────┼──────────────────────────────┐
        ▼                              ▼                              ▼
   ┌─────────┐                  ┌─────────────┐                ┌──────────┐
   │ MongoDB │                  │  RabbitMQ   │                │  Redis   │
   │ sharded │                  │ 4 clusters  │                │ 6-shard  │
   │ cluster │                  │ 8 nodes ea  │                │ ElastiC. │
   └─────────┘                  └─────────────┘                └──────────┘
        │                              │                              │
        ▼                              ▼                              ▼
                              ┌─────────────────┐
                              │   shared NFS    │
                              │   (EFS, regret) │
                              └─────────────────┘

   Supporting workloads:
   - ML inference (embeddings, intent classification)
   - FAQ engine
   - Botkit (log shipper, soon-to-be-retired)
   - Analytics consumers

A few things worth highlighting:

koreserver is a Node.js monolith. Same image, different deployments per workload class (app, consumers, NLP). Per-pool resource sizing, per-pool autoscaling. The decision to keep it a monolith — rather than decompose into microservices — was a deliberate call that the scaling pillar discusses.

ML and FAQ are separate services. They have different scaling characteristics (memory-heavy, less request-shaped), different update cadences (ML models change independently), and different SLO profiles. Splitting them out was straightforward and earned its keep.

The data tier is three different systems. MongoDB for state. Redis for cache and session. RabbitMQ for async work. Each chosen for what it’s good at; each adds operational surface area. We’re not consolidating any of them in the near term.

NFS exists for historical reasons, and I would design it out if we did the whole thing again. Botkit writes logs to NFS without logrotate, which has caused inode exhaustion incidents. Plugins and shared artefacts also live on it. Retiring it is in the backlog; it has not happened yet.
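A minimal sketch of a check that would flag inode pressure before it turns into an incident. It relies only on Python’s standard `os.statvfs`. The mount path in the usage comment and the 85% threshold are assumptions for illustration, not production values, and some network filesystems report an inode total of zero, in which case the check gives no signal at all.

```python
import os
from typing import Optional


def inode_usage(path: str) -> Optional[float]:
    """Return the fraction of inodes in use on the filesystem holding `path`.

    Returns None when the filesystem does not report an inode total
    (some network and copy-on-write filesystems report 0).
    """
    stats = os.statvfs(path)
    total = stats.f_files  # total inodes on the filesystem
    free = stats.f_ffree   # free inodes
    if total == 0:
        return None
    return (total - free) / total


def inode_alert(path: str, threshold: float = 0.85) -> bool:
    """True when inode usage on `path` is at or above `threshold` (assumed value)."""
    usage = inode_usage(path)
    return usage is not None and usage >= threshold


# Hypothetical usage: the mount point is an assumption, not the real path.
# if inode_alert("/mnt/shared"):
#     print("inode usage high on shared NFS mount")
```

A check like this only buys warning time; it does not replace rotating or relocating the logs.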

The dual-cloud thing

Kore runs on both AWS (EKS) and Azure (AKS). Same workloads, different regions, different customer commitments. The reasons are historical and commercial: different enterprise customers had different cloud preferences, sometimes contractually. We have a region in GCP too for one specific customer.

The operational consequences:

Cloud-specific code is a constant tax. Instance types differ (c6i.8xlarge on AWS at 3.5GHz vs Azure equivalents at 2.4GHz with turbo boost). EBS gp3/io2 vs Azure managed disks have different IOPS/throughput trade-offs. ALB vs Application Gateway have different timeout and connection behaviours.
We try to keep configuration cloud-agnostic where possible. Helm charts with per-cloud values overlays. Terraform modules with cloud-specific implementations behind a common interface.
Some performance characteristics genuinely differ. koreserver’s NLP paths are clock-speed sensitive; AWS c6i at 3.5GHz outperforms Azure’s 2.4GHz baseline measurably. We compensate with autoscaling thresholds tuned per cloud.
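The overlay idea can be sketched as a deep merge of a shared base with a per-cloud override. This is an illustration in Python rather than the actual chart contents: the key names (`nlp`, `autoscaling`, `targetCPUPercent`, `minReplicas`) and every number are hypothetical, and the choice to scale out earlier on the slower-clocked cloud is an assumption about how the per-cloud tuning might look.

```python
from copy import deepcopy


def merge_values(base: dict, overlay: dict) -> dict:
    """Recursively merge `overlay` onto `base`; the overlay wins on conflicts."""
    merged = deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical shared defaults (not real production settings).
BASE = {"nlp": {"autoscaling": {"targetCPUPercent": 70, "minReplicas": 4}}}

# Hypothetical per-cloud overlays: only the keys that differ are listed.
OVERLAYS = {
    "aws": {},
    "azure": {"nlp": {"autoscaling": {"targetCPUPercent": 55}}},
}


def values_for(cloud: str) -> dict:
    """Effective values for one cloud: base plus that cloud's overlay."""
    return merge_values(BASE, OVERLAYS[cloud])


if __name__ == "__main__":
    print(values_for("azure"))
```

Keeping the overlay to the handful of keys that genuinely differ is what makes the cloud-specific surface visible in review.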

If we were starting fresh and didn’t have the dual-cloud commitment, we’d pick one. The benefits of running both — geographic diversity, customer flexibility — are real but expensive.

The current scale (approximately)

[TODO: verify these against current production before promoting]

100+ nodes across both clouds in the largest region
1,000+ pods in steady state
100+ distinct services (most are different deployments of the same monolith)
15k+ concurrent users supported as of the last scaling round (see scaling pillar)
1,000+ RPS in steady state, much higher peak

These numbers undersell the variability. Lower environments run at a fraction of this. Customer launches and load tests spike well above. Per-tenant load is wildly uneven — a handful of big tenants dominate volume.

The data tier in more detail

MongoDB: sharded cluster, multiple shards on c5d.18xlarge (72 vCPU, 144 GB, io2 disks at 50k IOPS), one or more mongos routers on c5n.4xlarge, config servers on c5.4xlarge. v8.x. Shard keys per collection are covered in MongoDB sharding. The “5 shards + 1 replica set” pattern came out of the scaling work: analytics collections live on a separate cluster to remove load from runtime shards.

Redis: ElastiCache, currently 6 shards × 3 replicas on cache.c7gn.8xlarge. Started at 3 × 4 and grew during the scaling work. Used for cache and ephemeral session state. The shard count is driven by Redis’s single-threaded command execution: once a shard’s main thread saturates, adding shards is the only fix.
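The shard-count reasoning reduces to simple arithmetic, sketched here under stated assumptions: keys spread evenly across shards, and each shard has a measured per-shard throughput ceiling. The throughput figures and the 70% headroom are hypothetical, not measurements from this platform.

```python
import math


def shards_needed(peak_ops_per_sec: float,
                  ops_per_shard: float,
                  headroom: float = 0.7) -> int:
    """Smallest shard count that keeps each shard at or below `headroom`
    of its single-thread ceiling, assuming an even key distribution."""
    if peak_ops_per_sec < 0 or ops_per_shard <= 0 or not 0 < headroom <= 1:
        raise ValueError("invalid capacity inputs")
    usable_per_shard = ops_per_shard * headroom
    return max(1, math.ceil(peak_ops_per_sec / usable_per_shard))


if __name__ == "__main__":
    # Hypothetical numbers: 250k ops/s peak, 50k ops/s per shard.
    print(shards_needed(250_000, 50_000))
```

Hot keys break the even-distribution assumption, so a result like this is a floor rather than a target.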

RabbitMQ: 4 independent clusters, 8 nodes each, c5.18xlarge. Mirror policy ha-two for high-throughput queues, ha-three for the highest-criticality queues. The decision to run 4 clusters rather than 1 large one is in the RMQ optimizations pillar; the HA policy thinking is in message brokers compared.

NFS: EFS, 500GB, io2, 50k IOPS. Used for plugin storage, Botkit log shipping (regrettably), and shared artefacts. Retiring this dependency is in the backlog and has not happened. The IOPS upgrades during scaling work bought us runway, not a solution.

Kafka: separate from the runtime data tier; used for the analytics pipeline (Mongo CDC → Kafka → Glue → Hudi on S3). Detail in the analytics pipeline pillar.

Compute: node groups by workload

After the autoscaling redesign (see autoscaling pillar), workloads live on workload-specific node groups:

Node group	Instance	Workload
`app-compute`	c6i.8xlarge	koreserver, consumers, NLP
`ml-memory`	memory-optimized	ML inference, embeddings
`rmq-dedicated`	c5.18xlarge	RabbitMQ only
`mongo-compute`	c5d.18xlarge	MongoDB shards (EC2, not K8s)

The pre-split state was “everything on one node group sized for ML.” That model was wasteful in a way the autoscaling pillar covers in detail.

The big pain points

The list of things that hurt the most operationally, in roughly descending order of impact:

1. The monolith is hard to decompose incrementally. koreserver does many things. Pulling pieces out is expensive (shared state, shared config, shared assumptions). We’ve separated a few services — ML, FAQ, sandbox — but the core remains monolithic. The decomposition discussion comes up regularly; it has not been funded as a project because the immediate operational pain is manageable. Long-term this is a real constraint.

2. NFS as shared state. Repeated source of incidents (inode exhaustion, IOPS exhaustion, Botkit restarts). Should have been designed out during the K8s migration; wasn’t.

3. The koreserver Node.js single-process model. Per-pod throughput is bounded by the event loop. Scaling means scaling pod count, which means more connections to every downstream, which means more connection-tier load. We’ve worked around this with connection pinning and pool tuning; the fundamental constraint remains.

4. Cross-cloud differences as a constant tax. Most changes need to be tested on both EKS and AKS. Some changes that “work” on AWS quietly behave differently on Azure (turbo boost timing being a memorable example). The dual-cloud cost is real and constant.

5. Multi-tenancy without strong isolation. Tenants share the underlying infrastructure. A noisy tenant affects others. We have rate limiting and quota systems, but they’re tuned rather than enforced — a sufficiently determined load spike from one tenant can still affect others. True tenant isolation (separate clusters, separate Mongo databases) is in the backlog and is expensive.
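Pain point 3 is easiest to see as arithmetic: every added pod brings its own connection pools, so downstream connection counts grow linearly with pod count. A rough sketch follows; the downstream names, pool sizes, and connection ceilings are hypothetical placeholders, not production settings.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Downstream:
    name: str
    pool_size_per_pod: int  # connections each pod keeps open (assumed)
    max_connections: int    # ceiling the downstream tolerates (assumed)


def fan_out(pods: int, downstreams: list[Downstream]) -> dict[str, tuple[int, bool]]:
    """Map each downstream to (total connections, ceiling exceeded?)."""
    result = {}
    for d in downstreams:
        total = pods * d.pool_size_per_pod
        result[d.name] = (total, total > d.max_connections)
    return result


if __name__ == "__main__":
    # Hypothetical tiers and limits, for illustration only.
    tiers = [
        Downstream("mongos", pool_size_per_pod=10, max_connections=5_000),
        Downstream("redis", pool_size_per_pod=4, max_connections=3_000),
    ]
    for pods in (100, 600):
        print(pods, fan_out(pods, tiers))
```

Pool tuning changes the multiplier and connection pinning changes where the connections land, but neither removes the linear relationship.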

What I’d change with a clean slate

If I were designing Kore’s infrastructure from scratch today:

Pick one cloud. Dual-cloud is the largest source of operational tax for too little benefit.
Microservices where they earn their keep. Not blanket decomposition — but ML, FAQ, voice path, and the few other clearly-bounded surfaces would be separate from day one. The monolith would stay smaller.
Per-tenant data isolation by default. Either separate Mongo databases per large tenant, or schema-level isolation with hard quotas. The cross-tenant noisy-neighbour problem is a constant.
No NFS. Logs go to object storage or a log shipper. Shared artefacts go to a registry. The “shared mounted disk” pattern is a relic that doesn’t earn its keep at scale.
Kafka for everything that’s append-style. RMQ for work distribution only. Cleaner separation of broker concerns by workload type.
Ambient-mode Istio (or whatever the equivalent is in two years) from day one. Per-pod sidecars are legacy already; starting fresh means starting in ambient.
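To make “hard quotas” concrete, here is a minimal per-tenant token bucket in Python. It is a generic technique chosen for illustration, not a description of the existing rate limiting, and the rate and burst values are assumptions. It is also single-process and not thread-safe; a real version would need shared state across pods.

```python
import time
from typing import Callable, Dict


class TenantBucket:
    """Token bucket: refills at `rate` tokens/sec up to `burst`."""

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.clock = clock
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # hard rejection: the request is refused, not queued


class TenantLimiter:
    """One bucket per tenant, created lazily with the same limits."""

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self._buckets: Dict[str, TenantBucket] = {}

    def allow(self, tenant_id: str) -> bool:
        bucket = self._buckets.get(tenant_id)
        if bucket is None:
            bucket = TenantBucket(self.rate, self.burst, self.clock)
            self._buckets[tenant_id] = bucket
        return bucket.allow()
```

The point of the sketch is the rejection branch: one tenant exhausting its bucket has no effect on another tenant’s tokens.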

None of this will happen because we have a running platform and the cost-benefit doesn’t justify a rewrite. The point of writing this down is for the next time someone asks “if you were designing this fresh” — there’s an honest answer.

This deep-dive is the context for most of the others. Specifically:

Scaling to 15k CCU — the journey through this architecture’s bottlenecks
VM-to-K8s migration — how we landed on this K8s shape
Autoscaling — how compute provisioning works
Kore CI/CD pipeline — how changes flow into this infrastructure
Kore observability setup — how we watch all of this