Gateway Inference Extension Support
Overview
Gateway Inference Extension provides a standardized way to expose AI/ML inference services through the Kubernetes Gateway API. This guide demonstrates how to integrate Kthena-deployed models with the upstream Gateway API Inference Extension, enabling intelligent routing and traffic management for inference workloads.
The Gateway API Inference Extension extends the standard Kubernetes Gateway API with inference-specific resources:
- InferencePool: Manages collections of model server endpoints with automatic discovery and health monitoring
- InferenceObjective: Defines priority and capacity policies for inference requests
- Gateway Integration: Seamless integration with popular gateway implementations including Envoy Gateway, Istio and Kgateway
- Model-Aware Routing: Advanced routing capabilities based on model names, adapters, and request characteristics
- OpenAI API Compatibility: Full support for OpenAI-compatible endpoints (/v1/chat/completions, /v1/completions)
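For orientation, the sketch below shows roughly what an InferencePool for the model in this guide looks like. The exact schema depends on the extension version you install, and the port number and endpoint picker name here are placeholders, so treat it as illustrative only; the Helm chart used later in this guide renders the real resource for you.
apiVersion: inference.networking.k8s.io/v1
kind: InferencePool
metadata:
  name: kthena-demo
spec:
  # Select Kthena model pods by the label shown later in this guide
  selector:
    matchLabels:
      workload.serving.volcano.sh/model-name: demo
  # Port the model servers listen on (placeholder; adjust to your deployment)
  targetPorts:
  - number: 8000
  # Endpoint picker that performs model-aware endpoint selection (name is illustrative)
  endpointPickerRef:
    name: kthena-demo-epp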
Prerequisites
- Kubernetes cluster with Kthena installed (see Installation)
- Gateway API installed (see Gateway API)
- Basic understanding of Gateway API and Gateway Inference Extension
Getting Started
Deploy Sample Model Server
First, deploy a model that will serve as the backend for the Gateway Inference Extension. Follow the Quick Start guide to deploy a model in the default namespace and ensure it's in Active state.
After deployment, identify the labels on your model pods, as these will be used to associate the InferencePool with your model instances:
# Get the model pods and their labels
kubectl get pods -n <your-namespace> -l workload.serving.volcano.sh/managed-by=workload.serving.volcano.sh --show-labels
# Example output shows labels like:
# modelserving.volcano.sh/name=demo-backend1
# modelserving.volcano.sh/group-name=demo-backend1-0
# modelserving.volcano.sh/role=leader
# workload.serving.volcano.sh/model-name=demo
# workload.serving.volcano.sh/backend-name=backend1
# workload.serving.volcano.sh/managed-by=workload.serving.volcano.sh
Install the Inference Extension CRDs
Install the Gateway API Inference Extension CRDs in your cluster:
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/latest/download/manifests.yaml
Deploy the InferencePool and Endpoint Picker Extension
Install an InferencePool that selects Kthena model endpoints by their labels. The Helm install command installs the endpoint picker and the InferencePool, along with provider-specific resources.
For Istio deployment:
export GATEWAY_PROVIDER=istio
export IGW_CHART_VERSION=v1.0.1-rc.1
# Install InferencePool and Endpoint Picker pointing to your Kthena model pods
helm install kthena-demo \
--set inferencePool.modelServers.matchLabels."workload\.serving\.volcano\.sh/model-name"=demo \
--set provider.name=$GATEWAY_PROVIDER \
--version $IGW_CHART_VERSION \
oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
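To confirm what the chart created, you can list the rendered objects. The endpoint picker name below is an assumption based on the common <release>-epp naming convention; run helm status kthena-demo to see the actual resource names.
# List the InferencePool rendered by the chart
kubectl get inferencepool kthena-demo
# Inspect the endpoint picker (name assumed to follow the <release>-epp convention)
kubectl get deployment,service kthena-demo-epp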
Deploy an Inference Gateway
Deploy the Istio-based inference gateway and routing configuration:
- Install Istio (if not already installed):
TAG=1.27.1
curl -L https://istio.io/downloadIstio | ISTIO_VERSION=$TAG TARGET_ARCH=x86_64 sh -
cd istio-$TAG/bin
./istioctl install --set values.pilot.env.ENABLE_GATEWAY_API_INFERENCE_EXTENSION=true
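Before continuing, you can check that the Istio control plane is up:
# The istiod pod should be Running before you deploy the Gateway
kubectl get pods -n istio-system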
- Deploy the Gateway:
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/istio/gateway.yaml
- Deploy the HTTPRoute:
Create and apply the HTTPRoute configuration that connects the gateway to your InferencePool:
cat <<EOF | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: kthena-demo-route
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: inference-gateway
  rules:
  - backendRefs:
    - group: inference.networking.k8s.io
      kind: InferencePool
      name: kthena-demo
    matches:
    - path:
        type: PathPrefix
        value: /
    timeouts:
      request: 300s
EOF
Verify Gateway Installation
Confirm that the Gateway was assigned an IP address and reports a Programmed=True status:
kubectl get gateway inference-gateway
# Expected output:
# NAME CLASS ADDRESS PROGRAMMED AGE
# inference-gateway istio <GATEWAY_IP> True 30s
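Instead of polling manually, you can also block until the Gateway reports readiness:
# Wait for the Programmed condition (times out after two minutes)
kubectl wait --for=condition=Programmed gateway/inference-gateway --timeout=120s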
Verify that all components are properly configured:
# Check Gateway status
kubectl get gateway inference-gateway -o yaml
# Check HTTPRoute status - should show Accepted=True and ResolvedRefs=True
kubectl get httproute kthena-demo-route -o yaml
# Check InferencePool status
kubectl get inferencepool kthena-demo -o yaml
Try it out
Wait until the gateway is ready and test inference through the gateway:
# Get the gateway IP address
IP=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].value}')
PORT=80
# Test completions endpoint
curl -i ${IP}:${PORT}/v1/completions \
-H 'Content-Type: application/json' \
-d '{
"model": "Qwen2.5-0.5B-Instruct",
"prompt": "Write as if you were a critic: San Francisco",
"max_tokens": 100,
"temperature": 0
}'
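The chat completions endpoint listed in the overview works the same way; this example assumes the same model name as above:
# Test chat completions endpoint
curl -i ${IP}:${PORT}/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "Qwen2.5-0.5B-Instruct",
    "messages": [
      {"role": "user", "content": "Write as if you were a critic: San Francisco"}
    ],
    "max_tokens": 100,
    "temperature": 0
  }'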
Cleanup
To clean up all resources created in this guide:
- Uninstall the InferencePool and model resources:
helm uninstall kthena-demo
kubectl delete modelbooster demo -n <your-namespace> --ignore-not-found
- Remove Gateway API Inference Extension CRDs:
kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/latest/download/manifests.yaml --ignore-not-found
- Clean up Istio Gateway resources:
kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/istio/gateway.yaml --ignore-not-found
kubectl delete httproute kthena-demo-route --ignore-not-found
- Remove Istio (if you want to clean up everything):
istioctl uninstall -y --purge
kubectl delete ns istio-system