📄️ Model Booster
How the Model Booster Controller works
📄️ Multi-Node Inference
This page describes the multi-node inference capabilities in Kthena, based on real-world examples and configurations.
📄️ Autoscaler
Overview
🗃️ Router
3 items
📄️ Gateway Inference Extension Support
Overview
📄️ Runtime
Kthena Runtime is a lightweight sidecar service that standardizes Prometheus metrics from inference engines, provides LoRA adapter download/load/unload capabilities, and supports model downloading.
🗃️ Prefill Decode Disaggregation
1 item