
Kthena

A Kubernetes-native platform for scalable AI model serving

Kubernetes Native

Declarative CRDs for end-to-end AI inference lifecycle management. Gang scheduling, network-topology-aware placement, and Volcano integration bring enterprise-grade orchestration to your existing K8s infrastructure.
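As a rough illustration of the declarative style this implies, a serving manifest might look like the sketch below. The `kind`, API group, and every field name here are hypothetical placeholders for illustration, not Kthena's actual CRD schema.

```yaml
# Hypothetical manifest -- apiVersion, kind, and all fields are
# illustrative placeholders, not Kthena's actual CRD schema.
apiVersion: serving.example.io/v1alpha1
kind: ModelServing
metadata:
  name: llama-3-8b
spec:
  model:
    uri: hf://meta-llama/Meta-Llama-3-8B-Instruct
  replicas: 4
  scheduling:
    gang: true              # all-or-nothing placement of the pod group
    topologyAware: true     # prefer placement on the same network segment
    schedulerName: volcano  # delegate placement decisions to Volcano
  resources:
    nvidia.com/gpu: 1
```

The point of such a spec is that scaling, placement, and lifecycle are reconciled by controllers rather than driven by imperative scripts.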

Icon_24px_K8Engine_Color

Intelligent Routing

Request-level scheduling with pluggable scoring plugins: least latency, KV-cache awareness, prefix-cache matching, and LoRA affinity routing. Per-model fair queuing and token-based rate limiting keep throughput high for every model.
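A routing policy combining these plugins could be expressed along the following lines. This is a sketch only: the `kind`, plugin names, and fields are assumptions made for illustration, not Kthena's actual API.

```yaml
# Hypothetical routing policy -- kind, plugin names, and fields
# are illustrative, not Kthena's actual API.
apiVersion: serving.example.io/v1alpha1
kind: ModelRoute
metadata:
  name: llama-3-8b-route
spec:
  model: llama-3-8b
  scorers:                    # pluggable scoring; higher weight = more influence
    - name: least-latency
      weight: 2
    - name: prefix-cache-match
      weight: 3
    - name: lora-affinity
      weight: 1
  queuing: fair-per-model     # fair queuing across models
  rateLimit:
    tokensPerSecond: 50000    # token-based, not request-based, limiting
```

Weighted scorers let operators trade cache locality against load balance per model without changing application code.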


Hierarchical PD Disaggregation Orchestration

Separate prefill and decode phases into independently scalable serving groups. Prefill nodes maximize compute throughput while decode nodes optimize for low latency, enabling fine-grained GPU utilization and flexible scaling ratios.
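A disaggregated deployment splitting the two phases into independently sized groups might be declared as follows. Again a hypothetical sketch: the role structure and field names are illustrative assumptions, not Kthena's actual schema.

```yaml
# Hypothetical PD-disaggregated deployment -- all field names are
# illustrative, not Kthena's actual schema.
apiVersion: serving.example.io/v1alpha1
kind: ModelServing
metadata:
  name: llama-3-70b-pd
spec:
  model:
    uri: hf://meta-llama/Meta-Llama-3-70B-Instruct
  roles:
    - name: prefill           # compute-bound: sized for batch throughput
      replicas: 2
      resources:
        nvidia.com/gpu: 4
    - name: decode            # latency-sensitive: sized for token streaming
      replicas: 6             # scaled independently of prefill
      resources:
        nvidia.com/gpu: 2
```

Because each role scales on its own, the prefill:decode ratio (here 2:6) can track the workload's input-length and concurrency profile instead of being fixed by a monolithic replica count.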