User Guide Overview

📄️ Model Booster

📄️ Multi-Node Inference

This page describes the multi-node inference capabilities in Kthena, based on real-world examples and configurations.

📄️ Autoscaler

🗃️ Router

📄️ Gateway Inference Extension Support

📄️ Runtime

Kthena Runtime is a lightweight sidecar service that standardizes Prometheus metrics from inference engines, provides LoRA adapter download, load, and unload capabilities, and supports model downloading.

🗃️ Prefill Decode Disaggregation

1 item