ModelServing Plugin Framework
The ModelServing plugin framework lets you customize Pods created by the ModelServing controller through optional plugins. The framework supports different inference engines (e.g., vLLM, TensorRT-LLM) and different accelerators (GPU, NPU, etc.).
Overview
The plugin framework provides an extensible way to customize Pods before creation without modifying the core ModelServing API. Key features include:
- Opt-in: When plugins is not configured, ModelServing behavior remains unchanged
- Multi-plugin composition: Supports multiple plugins per ModelServing, executed in list order
- Observability: Track plugin execution status via Kubernetes Events and Status conditions
Workflow
- The controller generates the base Pod from the ModelServing template
- If spec.plugins is configured, the Plugin Manager invokes each plugin's OnPodCreate hook in order
- Plugins can modify Pod labels, annotations, env, volumes, nodeSelector, affinity, etc.
- The mutated Pod is created in the cluster
- When the Pod is ready, the OnPodReady hook is triggered (for logging, metrics, etc.)
When to Use Plugins
ModelServing already allows users to set role.entryTemplate.spec / role.workerTemplate.spec directly. In practice, organizations often need additional cross-cutting customization that depends on:
- runtime/inference engine specifics: extra sidecars, init containers, env var conventions, ports, probes
- accelerator type/vendor: GPU vs NPU, driver injection, device plugin resources, runtimeClass
- networking: RDMA, topology constraints, hugepages
- integration with cluster infrastructure: custom schedulers, gang scheduling annotations, metadata conventions
For simple configuration, use role.entryTemplate.spec / role.workerTemplate.spec directly; for reusable logic across multiple ModelServing instances, use plugins.
API Configuration
An optional plugins field is added to ModelServingSpec:
| Field | Type | Description |
|---|---|---|
| name | string | Plugin type identifier; must match a registered plugin name (e.g., demo-pod-tweaks) |
| type | string | Plugin type, e.g., BuiltIn (Webhook reserved for future extension) |
| config | object | Plugin-specific configuration, parsed by each plugin |
| scope | object (optional) | Restricts where the plugin runs: roles (role names), target (Entry/Worker/All) |
Developing a Custom Plugin
Built-in plugins live in pkg/model-serving-controller/plugins/. To add a custom plugin:
- Implement the Plugin interface with Name(), OnPodCreate(), and OnPodReady():

    type Plugin interface {
        Name() string
        OnPodCreate(ctx context.Context, req *HookRequest) error
        OnPodReady(ctx context.Context, req *HookRequest) error
    }

- Define a config struct for your plugin and decode it from spec.Config using DecodeJSON():

    type MyConfig struct {
        RuntimeClassName string            `json:"runtimeClassName,omitempty"`
        Annotations      map[string]string `json:"annotations,omitempty"`
        Env              []corev1.EnvVar   `json:"env,omitempty"`
    }

- Implement a factory function that constructs the plugin from PluginSpec:

    func NewMyPlugin(spec workloadv1alpha1.PluginSpec) (Plugin, error) {
        cfg := MyConfig{}
        if err := DecodeJSON(spec.Config, &cfg); err != nil {
            return nil, err
        }
        return &MyPlugin{name: spec.Name, cfg: cfg}, nil
    }

- Register the plugin in init():

    func init() {
        DefaultRegistry.Register("my-plugin-name", NewMyPlugin)
    }

- Rebuild the controller and deploy. The plugin will be available for use in ModelServing.
See pkg/model-serving-controller/plugins/demo_plugin.go for a reference implementation.
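The steps above can be sketched end to end in a single file. This is a minimal illustration under stated assumptions, not the controller's actual code: Pod, HookRequest, and PluginSpec are reduced stand-ins for the real types, the Registry type and its Build method are hypothetical, and encoding/json is used directly in place of DecodeJSON, whose exact signature is not shown here.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
)

// Pod and HookRequest are simplified stand-ins (assumptions for this sketch);
// the real HookRequest is expected to carry a Kubernetes Pod object.
type Pod struct{ RuntimeClassName string }

type HookRequest struct{ Pod *Pod }

// Plugin matches the interface from step 1.
type Plugin interface {
	Name() string
	OnPodCreate(ctx context.Context, req *HookRequest) error
	OnPodReady(ctx context.Context, req *HookRequest) error
}

// PluginSpec is a reduced, hypothetical version of workloadv1alpha1.PluginSpec.
type PluginSpec struct {
	Name   string
	Config []byte // raw JSON; the real field type may differ
}

// Factory builds a plugin from its spec (step 3).
type Factory func(spec PluginSpec) (Plugin, error)

// Registry maps plugin names to factories (hypothetical implementation).
type Registry struct{ factories map[string]Factory }

func NewRegistry() *Registry { return &Registry{factories: map[string]Factory{}} }

func (r *Registry) Register(name string, f Factory) { r.factories[name] = f }

// Build looks up the factory registered under spec.Name and runs it.
func (r *Registry) Build(spec PluginSpec) (Plugin, error) {
	f, ok := r.factories[spec.Name]
	if !ok {
		return nil, fmt.Errorf("plugin %q is not registered", spec.Name)
	}
	return f(spec)
}

type myConfig struct {
	RuntimeClassName string `json:"runtimeClassName,omitempty"`
}

type myPlugin struct {
	name string
	cfg  myConfig
}

func (p *myPlugin) Name() string { return p.name }

// OnPodCreate mutates the Pod before it is created.
func (p *myPlugin) OnPodCreate(_ context.Context, req *HookRequest) error {
	if p.cfg.RuntimeClassName != "" {
		req.Pod.RuntimeClassName = p.cfg.RuntimeClassName
	}
	return nil
}

// OnPodReady is a no-op here; a real plugin might log or emit metrics.
func (p *myPlugin) OnPodReady(context.Context, *HookRequest) error { return nil }

// newMyPlugin decodes the config (step 2) and constructs the plugin (step 3).
func newMyPlugin(spec PluginSpec) (Plugin, error) {
	cfg := myConfig{}
	if len(spec.Config) > 0 {
		if err := json.Unmarshal(spec.Config, &cfg); err != nil {
			return nil, fmt.Errorf("decode config for %q: %w", spec.Name, err)
		}
	}
	return &myPlugin{name: spec.Name, cfg: cfg}, nil
}

func main() {
	reg := NewRegistry()
	reg.Register("my-plugin-name", newMyPlugin) // step 4

	spec := PluginSpec{Name: "my-plugin-name", Config: []byte(`{"runtimeClassName":"nvidia"}`)}
	p, err := reg.Build(spec)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	pod := &Pod{}
	if err := p.OnPodCreate(context.Background(), &HookRequest{Pod: pod}); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(p.Name(), pod.RuntimeClassName)
}
```

Returning an error for an unregistered name is one plausible design: it surfaces a typo in the plugin name at Pod creation time instead of silently skipping the plugin.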
Using Plugins
Kthena ships with a built-in demo plugin demo-pod-tweaks that can set runtimeClassName, add annotations, and inject environment variables. Use it to validate the plugin flow before developing your own.
Example: Using the demo plugin
    apiVersion: workload.kthena.io/v1alpha1
    kind: ModelServing
    metadata:
      name: llama-8b
    spec:
      schedulerName: volcano
      replicas: 2
      plugins:
        - name: demo-pod-tweaks
          type: BuiltIn
          config:
            runtimeClassName: nvidia
            annotations:
              example.com/custom: "value"
            env:
              - name: CUSTOM_VAR
                value: "custom-value"
      template:
        roles: []

The demo-pod-tweaks plugin accepts:
| Config field | Type | Description |
|---|---|---|
| runtimeClassName | string | Sets spec.runtimeClassName on the Pod |
| annotations | map[string]string | Merges annotations into the Pod |
| env | []EnvVar | Appends environment variables to all containers |
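To make the three config fields concrete, the following sketch applies them to a simplified Pod. It is an illustration only: the Pod, Container, EnvVar, and TweaksConfig types and the applyTweaks function are hypothetical stand-ins, and the choice to let config values overwrite existing annotation keys is an assumption, since the table says only that annotations are merged.

```go
package main

import "fmt"

// Simplified stand-ins for the Kubernetes types (assumptions for this sketch).
type EnvVar struct{ Name, Value string }

type Container struct {
	Name string
	Env  []EnvVar
}

type Pod struct {
	RuntimeClassName string
	Annotations      map[string]string
	Containers       []Container
}

// TweaksConfig mirrors the three config fields in the table above.
type TweaksConfig struct {
	RuntimeClassName string
	Annotations      map[string]string
	Env              []EnvVar
}

// applyTweaks is a hypothetical equivalent of the demo plugin's OnPodCreate.
func applyTweaks(pod *Pod, cfg TweaksConfig) {
	// Set the runtime class only when one is configured.
	if cfg.RuntimeClassName != "" {
		pod.RuntimeClassName = cfg.RuntimeClassName
	}
	// Merge annotations; assumption: config wins on key conflicts.
	if len(cfg.Annotations) > 0 && pod.Annotations == nil {
		pod.Annotations = map[string]string{}
	}
	for k, v := range cfg.Annotations {
		pod.Annotations[k] = v
	}
	// Append the env vars to every container in the Pod.
	for i := range pod.Containers {
		pod.Containers[i].Env = append(pod.Containers[i].Env, cfg.Env...)
	}
}

func main() {
	pod := &Pod{
		Annotations: map[string]string{"existing": "kept"},
		Containers:  []Container{{Name: "engine"}, {Name: "sidecar"}},
	}
	applyTweaks(pod, TweaksConfig{
		RuntimeClassName: "nvidia",
		Annotations:      map[string]string{"example.com/custom": "value"},
		Env:              []EnvVar{{Name: "CUSTOM_VAR", Value: "custom-value"}},
	})
	fmt.Println(pod.RuntimeClassName, len(pod.Annotations), len(pod.Containers[1].Env))
}
```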
Example: Scoping a plugin to specific roles or targets
    plugins:
      - name: demo-pod-tweaks
        type: BuiltIn
        scope:
          roles: ["worker"]
          target: Worker
        config:
          runtimeClassName: nvidia

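One way to picture how a scope might be evaluated is the check below. It is a sketch under assumptions rather than the framework's actual logic: the Scope type and inScope function are hypothetical, and treating a missing scope, an empty roles list, or an empty target as "match everything" is a guess at sensible defaults.

```go
package main

import "fmt"

// Scope is a hypothetical Go shape for the scope field shown above.
type Scope struct {
	Roles  []string // role names; empty is assumed to mean all roles
	Target string   // "Entry", "Worker", or "All"; empty is assumed to mean "All"
}

// inScope reports whether a plugin with scope s should run for a Pod that
// belongs to the given role and is of the given target kind.
func inScope(s *Scope, role, target string) bool {
	if s == nil {
		return true // no scope configured: run everywhere (assumption)
	}
	if len(s.Roles) > 0 {
		found := false
		for _, r := range s.Roles {
			if r == role {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	return s.Target == "" || s.Target == "All" || s.Target == target
}

func main() {
	s := &Scope{Roles: []string{"worker"}, Target: "Worker"}
	fmt.Println(inScope(s, "worker", "Worker"))
	fmt.Println(inScope(s, "worker", "Entry"))
	fmt.Println(inScope(s, "prefill", "Worker"))
	fmt.Println(inScope(nil, "prefill", "Entry"))
}
```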
Plugin Execution Order
- Plugins execute in the order defined in the plugins list
- Each plugin sees the Pod after previous plugins have applied their mutations
- If any plugin returns an error, Pod creation fails and the controller retries via the workqueue
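The ordering rules above can be sketched as a simple loop. This is an illustrative approximation, not the Plugin Manager's real code: the Pod type, the tracer and failing helpers, and the runOnPodCreate function are hypothetical, and the retry itself is left to the caller (in the controller, the workqueue).

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Pod records which plugins touched it; a stand-in for a real Pod (assumption).
type Pod struct{ Trace []string }

type HookRequest struct{ Pod *Pod }

// Plugin is reduced to the hook relevant for ordering.
type Plugin interface {
	Name() string
	OnPodCreate(ctx context.Context, req *HookRequest) error
}

type funcPlugin struct {
	name string
	fn   func(*Pod) error
}

func (p funcPlugin) Name() string { return p.name }

func (p funcPlugin) OnPodCreate(_ context.Context, req *HookRequest) error {
	return p.fn(req.Pod)
}

// tracer returns a plugin that appends its own name to the Pod's trace.
func tracer(name string) Plugin {
	return funcPlugin{name: name, fn: func(pod *Pod) error {
		pod.Trace = append(pod.Trace, name)
		return nil
	}}
}

// failing returns a plugin whose hook always errors.
func failing(name string) Plugin {
	return funcPlugin{name: name, fn: func(*Pod) error {
		return errors.New("simulated failure")
	}}
}

// runOnPodCreate invokes plugins in list order on the same request, so each
// plugin observes earlier mutations. The first error aborts the chain; the
// caller is expected to skip Pod creation and retry later.
func runOnPodCreate(ctx context.Context, plugins []Plugin, pod *Pod) error {
	req := &HookRequest{Pod: pod}
	for _, p := range plugins {
		if err := p.OnPodCreate(ctx, req); err != nil {
			return fmt.Errorf("plugin %q OnPodCreate: %w", p.Name(), err)
		}
	}
	return nil
}

func main() {
	pod := &Pod{}
	err := runOnPodCreate(context.Background(), []Plugin{tracer("first"), tracer("second")}, pod)
	fmt.Println(pod.Trace, err)

	pod = &Pod{}
	err = runOnPodCreate(context.Background(), []Plugin{tracer("first"), failing("broken"), tracer("third")}, pod)
	fmt.Println(pod.Trace, err != nil)
}
```

Wrapping the error with the plugin name keeps the failing plugin identifiable in whatever Event or condition the caller records.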
For more design details, see the ModelServing Plugin Framework proposal.