Kubernetes Native
Declarative CRDs for end-to-end AI inference lifecycle management. Gang scheduling, network-topology-aware placement, and Volcano integration bring enterprise-grade orchestration to your existing K8s infrastructure.
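As an illustration of the declarative style, a minimal sketch of what a custom resource for an inference deployment might look like. The `kind`, group, and field names here are hypothetical placeholders, not the project's actual CRD schema; the Volcano `schedulerName` and gang-scheduling annotation reflect Volcano's general integration pattern:

```yaml
# Hypothetical example resource; field names are illustrative only.
apiVersion: inference.example.io/v1alpha1
kind: InferenceService          # placeholder kind, not an actual CRD
metadata:
  name: llama-70b
  annotations:
    # Volcano-style gang scheduling: all replicas start together or not at all.
    scheduling.volcano.sh/group-name: llama-70b-gang
spec:
  model: meta-llama/Llama-3-70B
  replicas: 4
  schedulerName: volcano        # delegate placement to the Volcano scheduler
  placement:
    networkTopologyAware: true  # prefer replicas on the same switch/rack
```

Applying a resource like this would let the controller reconcile the full lifecycle (rollout, scaling, teardown) instead of requiring imperative scripts.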
Intelligent Routing
Request-level scheduling with pluggable scoring plugins: least-latency, KV-cache-aware, prefix-cache-match, and LoRA-affinity routing. Per-model fair queuing and token-based rate limiting keep throughput high for every model.
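To make the plugin model concrete, here is a minimal sketch of weighted score-based routing in Python. The `Endpoint` fields, plugin signatures, and weights are assumptions for illustration, not the project's actual interfaces:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set

@dataclass
class Endpoint:
    name: str
    avg_latency_ms: float              # recent moving-average latency
    cached_prefixes: Set[str] = field(default_factory=set)

# A scoring plugin maps (endpoint, request prompt) -> score in [0, 1].
ScorePlugin = Callable[[Endpoint, str], float]

def least_latency(ep: Endpoint, prompt: str) -> float:
    # Lower latency scores higher; normalize against a 1 s budget.
    return max(0.0, 1.0 - ep.avg_latency_ms / 1000.0)

def prefix_cache_match(ep: Endpoint, prompt: str) -> float:
    # Reward endpoints that already hold a matching prompt prefix.
    return 1.0 if any(prompt.startswith(p) for p in ep.cached_prefixes) else 0.0

def route(endpoints: List[Endpoint], prompt: str,
          plugins: List[ScorePlugin], weights: List[float]) -> Endpoint:
    # Weighted sum of plugin scores; the highest total wins.
    def total(ep: Endpoint) -> float:
        return sum(w * p(ep, prompt) for p, w in zip(plugins, weights))
    return max(endpoints, key=total)
```

With equal weights, an endpoint holding a cached prefix can win over a faster cold endpoint, which is exactly the trade-off prefix-cache-aware routing is meant to capture.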
Hierarchical Prefill/Decode (PD) Disaggregation Orchestration
Separate prefill and decode phases into independently scalable serving groups. Prefill nodes maximize compute throughput while decode nodes optimize for low latency, enabling fine-grained GPU utilization and flexible scaling ratios.
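A back-of-the-envelope sketch of how such a scaling ratio could be derived: balance the per-request load of the compute-bound prefill phase against the latency-bound decode phase. The function name and the throughput figures below are hypothetical, not measured values or a real API:

```python
def pd_replica_split(total_gpus: int,
                     prompt_tokens: int, output_tokens: int,
                     prefill_tok_per_s: float, decode_tok_per_s: float):
    """Split a GPU pool between prefill and decode serving groups.

    Balances per-request GPU-seconds spent in each phase; all inputs
    are illustrative workload estimates, not live metrics.
    """
    prefill_load = prompt_tokens / prefill_tok_per_s   # GPU-s per request, prefill
    decode_load = output_tokens / decode_tok_per_s     # GPU-s per request, decode
    frac = prefill_load / (prefill_load + decode_load)
    prefill = max(1, round(total_gpus * frac))
    prefill = min(prefill, total_gpus - 1)             # keep at least one decode GPU
    return prefill, total_gpus - prefill
```

Because prefill processes whole prompts in parallel while decode emits one token per step, decode throughput per GPU is far lower, so a long-output workload naturally tilts the split toward decode replicas.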