API Reference
Packages
networking.serving.volcano.sh/v1alpha1
Resource Types
BodyMatch
BodyMatch defines the predicate used to match request body content.
Appears in: ModelMatch
| Field | Description | Default | Validation |
|---|---|---|---|
| model string | Model is the name of the model or LoRA adapter to match. If this field is not specified, any model or LoRA adapter is matched. | | |
GlobalRateLimit
GlobalRateLimit contains configuration for global rate limiting.
Appears in: RateLimit
| Field | Description | Default | Validation |
|---|---|---|---|
| redis RedisConfig | Redis contains configuration for Redis-based global rate limiting. | | |
InferenceEngine
Underlying type: string
InferenceEngine defines the inference framework used by the modelServer to serve LLM requests.
Validation:
- Enum: [vLLM SGLang]
Appears in: ModelServerSpec
| Field | Description |
|---|---|
| vLLM | https://github.com/vllm-project/vllm |
| SGLang | https://github.com/sgl-project/sglang |
KVConnectorSpec
KVConnectorSpec defines the KV connector configuration for PD disaggregated routing.
Appears in: ModelServerSpec
| Field | Description | Default | Validation |
|---|---|---|---|
| type KVConnectorType | Type specifies the connector type. If you are unsure which type to use, use "http", which is the default. | http | Enum: [http lmcache nixl mooncake] |
KVConnectorType
Underlying type: string
Appears in: KVConnectorSpec
| Field | Description |
|---|---|
| http | |
| nixl | |
| lmcache | |
| mooncake | |
ModelMatch
ModelMatch defines the predicate used to match LLM inference requests to the given TargetModels. Multiple match conditions are ANDed together, i.e., the match evaluates to true only if all conditions are satisfied.
Appears in: Rule
| Field | Description | Default | Validation |
|---|---|---|---|
| headers object (keys:string, values:StringMatch) | Headers to match, each using a prefix, exact, or regex StringMatch. If unset, any header is matched. | | |
| uri StringMatch | URI to match using a prefix, exact, or regex StringMatch. If this field is not specified, a default prefix match on the "/" path is used. | | |
| body BodyMatch | Body contains conditions to match request body content. | | |
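
The AND semantics above can be illustrated with a minimal Python sketch. This is not the router's implementation. The `model_match` and `_match` helpers are hypothetical, and the match is passed as a plain dict mirroring the CRD fields (`headers`, `uri`, `body.model`). Header names are compared verbatim, and a regex is assumed to require a full match.

```python
import re
from typing import Dict, Optional


def _match(cond: Dict[str, str], value: str) -> bool:
    # cond mirrors a StringMatch, e.g. {"prefix": "/v1"}; one key is set.
    if "exact" in cond:
        return value == cond["exact"]
    if "prefix" in cond:
        return value.startswith(cond["prefix"])
    # Assumption: the regex has to match the whole value.
    return re.fullmatch(cond["regex"], value) is not None


def model_match(match: dict, headers: Dict[str, str], path: str,
                model: Optional[str]) -> bool:
    """Return True only if every condition in `match` holds (logical AND)."""
    for name, cond in match.get("headers", {}).items():
        value = headers.get(name)  # header names compared verbatim here
        if value is None or not _match(cond, value):
            return False
    # An unset uri falls back to a prefix match on "/".
    if not _match(match.get("uri", {"prefix": "/"}), path):
        return False
    # An unset body.model matches any model or LoRA adapter.
    body_model = match.get("body", {}).get("model")
    if body_model is not None and model != body_model:
        return False
    return True
```

An empty match (`{}`) accepts any request, because every condition falls back to its default.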
ModelRoute
ModelRoute is the Schema for the ModelRoutes API.
Appears in: ModelRouteList
| Field | Description | Default | Validation |
|---|---|---|---|
| apiVersion string | networking.serving.volcano.sh/v1alpha1 | | |
| kind string | ModelRoute | | |
| metadata ObjectMeta | Refer to the Kubernetes API documentation for the fields of metadata. | | |
| spec ModelRouteSpec | | | |
| status ModelRouteStatus | | | |
ModelRouteList
ModelRouteList contains a list of ModelRoute.
| Field | Description | Default | Validation |
|---|---|---|---|
| apiVersion string | networking.serving.volcano.sh/v1alpha1 | | |
| kind string | ModelRouteList | | |
| metadata ListMeta | Refer to the Kubernetes API documentation for the fields of metadata. | | |
| items ModelRoute array | | | |
ModelRouteSpec
ModelRouteSpec defines the desired state of ModelRoute.
Appears in: ModelRoute
| Field | Description | Default | Validation |
|---|---|---|---|
| modelName string | The model in the LLM request. It can be a base model name, a LoRA adapter name, or even a virtual model name. This field is used to match scenarios other than LoRA adapter names and may be empty, but it and loraAdapters cannot both be empty. | | |
| loraAdapters string array | The model in the LLM request can be a LoRA adapter name; this is the list of LoRA adapter names to match. | | MaxItems: 10 |
| rules Rule array | An ordered list of route rules for LLM traffic. The first rule that matches an incoming request is used. If no rule matches, an HTTP 404 status code MUST be returned. | | MaxItems: 16 |
| rateLimit RateLimit | Rate limit for LLM requests based on prompt (input) tokens or output tokens. If this field is not set, there is no limit. | | |
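
The first-match rule selection and the `modelName`/`loraAdapters` constraint could look roughly like the following Python sketch. The class and function names are hypothetical. Each rule's `modelMatch` is reduced to an optional predicate, with `None` standing for an empty modelMatch, which matches all requests.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Rule:
    name: str
    # Hypothetical stand-in for modelMatch; None means "match every request".
    predicate: Optional[Callable[[dict], bool]] = None
    target_models: List[str] = field(default_factory=list)


@dataclass
class ModelRouteSpec:
    model_name: str = ""
    lora_adapters: List[str] = field(default_factory=list)
    rules: List[Rule] = field(default_factory=list)

    def validate(self) -> None:
        # Mirrors the documented constraints on ModelRouteSpec.
        if not self.model_name and not self.lora_adapters:
            raise ValueError("modelName and loraAdapters cannot both be empty")
        if len(self.lora_adapters) > 10:
            raise ValueError("loraAdapters accepts at most 10 items")
        if len(self.rules) > 16:
            raise ValueError("rules accepts at most 16 items")


def select_rule(spec: ModelRouteSpec, request: dict) -> Optional[Rule]:
    """Return the first rule matching the request, or None.

    The caller is expected to answer with HTTP 404 when None is returned.
    """
    for rule in spec.rules:
        if rule.predicate is None or rule.predicate(request):
            return rule
    return None
```

Because rules are evaluated in order, a catch-all rule (empty modelMatch) only makes sense as the last entry.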
ModelRouteStatus
ModelRouteStatus defines the observed state of ModelRoute.
Appears in: ModelRoute
ModelServer
ModelServer is the Schema for the ModelServers API.
Appears in: ModelServerList
| Field | Description | Default | Validation |
|---|---|---|---|
| apiVersion string | networking.serving.volcano.sh/v1alpha1 | | |
| kind string | ModelServer | | |
| metadata ObjectMeta | Refer to the Kubernetes API documentation for the fields of metadata. | | |
| spec ModelServerSpec | | | |
| status ModelServerStatus | | | |
ModelServerList
ModelServerList contains a list of ModelServer.
| Field | Description | Default | Validation |
|---|---|---|---|
| apiVersion string | networking.serving.volcano.sh/v1alpha1 | | |
| kind string | ModelServerList | | |
| metadata ListMeta | Refer to the Kubernetes API documentation for the fields of metadata. | | |
| items ModelServer array | | | |
ModelServerSpec
ModelServerSpec defines the desired state of ModelServer.
Appears in: ModelServer
| Field | Description | Default | Validation |
|---|---|---|---|
| model string | The actual model served by the model serving instances. If the model in an LLM inference request differs from this field, it should be overwritten with this value; otherwise, the model in the request is not mutated. | | MaxLength: 256 |
| inferenceEngine InferenceEngine | The inference engine used to serve the model. | | Enum: [vLLM SGLang] Required: {} |
| workloadSelector WorkloadSelector | WorkloadSelector is used to match the model serving instances. Currently, they must be pods in the same namespace as the ModelServer object. | | Required: {} |
| workloadPort WorkloadPort | WorkloadPort defines the port and protocol configuration for the model server. | | |
| trafficPolicy TrafficPolicy | Traffic policy for accessing the model server instances. | | |
| kvConnector KVConnectorSpec | KVConnector specifies the KV connector configuration for PD disaggregated routing. | | |
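
The `model` rewrite described above amounts to replacing one field of the request body. The sketch below makes two assumptions: an OpenAI-style JSON body with a top-level `"model"` key, and that an empty `model` field means "leave the request untouched". `rewrite_model` is a hypothetical helper.

```python
def rewrite_model(body: dict, served_model: str) -> dict:
    """Return the request body with "model" set to the ModelServer's model.

    Assumption: an empty served_model (spec.model unset) leaves the body as is.
    The input dict is never mutated; a shallow copy is returned on rewrite.
    """
    if not served_model or body.get("model") == served_model:
        return body
    rewritten = dict(body)
    rewritten["model"] = served_model
    return rewritten
```

This lets a ModelRoute expose a virtual model name while the backend still receives the model name it actually serves.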
ModelServerStatus
ModelServerStatus defines the observed state of ModelServer.
Appears in: ModelServer
PDGroup
PDGroup specifies the group key of PD instances, as well as the labels used to match the model serving instances for prefill and decode.
Appears in: WorkloadSelector
| Field | Description | Default | Validation |
|---|---|---|---|
| groupKey string | GroupKey is the key used to distinguish different PD groups. Only PD instances with the same group key and value can be paired. | | |
| prefillLabels object (keys:string, values:string) | The labels to match the model serving instances for prefill. | | |
| decodeLabels object (keys:string, values:string) | The labels to match the model serving instances for decode. | | |
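
One way to read the PDGroup semantics is sketched below in Python. Instances are classified as prefill or decode by their labels, then grouped by the value of their `groupKey` label, and only groups that contain both roles can be paired. Treating `groupKey` as a pod label key is an assumption, and the `pair_pd` helper itself is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def _has_labels(labels: Dict[str, str], required: Dict[str, str]) -> bool:
    # True if every required key/value pair is present on the pod.
    return all(labels.get(k) == v for k, v in required.items())


def pair_pd(pods: List[Tuple[str, Dict[str, str]]], group_key: str,
            prefill_labels: Dict[str, str],
            decode_labels: Dict[str, str]) -> Dict[str, Dict[str, List[str]]]:
    """Group prefill and decode pod names by the value of their group_key label."""
    groups = defaultdict(lambda: {"prefill": [], "decode": []})
    for name, labels in pods:
        group = labels.get(group_key)
        if group is None:
            continue  # pods without the group key cannot be paired
        if _has_labels(labels, prefill_labels):
            groups[group]["prefill"].append(name)
        elif _has_labels(labels, decode_labels):
            groups[group]["decode"].append(name)
    # Only groups containing both roles can serve a disaggregated request.
    return {g: r for g, r in groups.items() if r["prefill"] and r["decode"]}
```

For example, a prefill pod and a decode pod that share the same group value form one pairable group. A group with only prefill pods is dropped.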
RateLimit
RateLimit defines token-based rate limiting for LLM requests.
Appears in: ModelRouteSpec
| Field | Description | Default | Validation |
|---|---|---|---|
| inputTokensPerUnit integer | InputTokensPerUnit is the maximum number of input tokens allowed per unit of time. If this field is not set, there is no limit on input tokens. | | Minimum: 1 |
| outputTokensPerUnit integer | OutputTokensPerUnit is the maximum number of output tokens allowed per unit of time. If this field is not set, there is no limit on output tokens. | | Minimum: 1 |
| unit RateLimitUnit | Unit is the time unit for the rate limit. | second | Enum: [second minute hour day month] |
| global GlobalRateLimit | Global contains configuration for global rate limiting using distributed storage. If this field is set, global rate limiting is used; otherwise, local rate limiting is used. | | |
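
This reference does not specify the exact limiting algorithm. As an illustration only, the following Python sketch implements a local fixed-window limiter for one token budget (`inputTokensPerUnit` or `outputTokensPerUnit`). Three things are assumptions: the 30-day `month`, the fixed-window strategy, and the `LocalTokenLimiter` class itself. A global limiter would keep the counter in the Redis instance configured under `global` rather than in process memory.

```python
import time
from typing import Callable, Optional

# Window length in seconds for each RateLimitUnit value.
UNIT_SECONDS = {
    "second": 1,
    "minute": 60,
    "hour": 3600,
    "day": 86400,
    "month": 30 * 86400,  # assumption: a 30-day month
}


class LocalTokenLimiter:
    """Fixed-window limiter for a single token budget, kept in memory."""

    def __init__(self, tokens_per_unit: Optional[int], unit: str = "second",
                 clock: Callable[[], float] = time.monotonic):
        if tokens_per_unit is not None and tokens_per_unit < 1:
            raise ValueError("tokens per unit must be >= 1")  # Minimum: 1
        self.limit = tokens_per_unit
        self.window = UNIT_SECONDS[unit]
        self.clock = clock
        self.window_start = clock()
        self.used = 0

    def allow(self, tokens: int) -> bool:
        if self.limit is None:
            return True  # field not set: no limit
        now = self.clock()
        if now - self.window_start >= self.window:
            # A new window has started: reset the counter.
            self.window_start = now
            self.used = 0
        if self.used + tokens > self.limit:
            return False
        self.used += tokens
        return True
```

Injecting `clock` keeps the limiter testable without real waiting. A fixed window is the simplest choice, though it allows bursts at window boundaries.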
RateLimitUnit
Underlying type: string
Validation:
- Enum: [second minute hour day month]
Appears in: RateLimit
| Field | Description |
|---|---|
| second | |
| minute | |
| hour | |
| day | |
| month | |
RedisConfig
RedisConfig contains Redis connection configuration.
Appears in: GlobalRateLimit
| Field | Description | Default | Validation |
|---|---|---|---|
| address string | Address is the Redis server address in the format "host:port". | | Required: {} |
Retry
Retry defines the retry policy for an inference request.
Appears in: TrafficPolicy
| Field | Description | Default | Validation |
|---|---|---|---|
| attempts integer | The maximum number of times an individual inference request to a model server should be retried. If all retries are exhausted without a successful response, the request is considered failed. | | |
| retryInterval Duration | RetryInterval is the interval between retries. | 100ms | |
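
A minimal Python sketch of the retry policy follows. It rests on three assumptions:
- `attempts` counts retries after the first try, so there are up to `attempts + 1` calls in total.
- Any exception counts as an unsuccessful response.
- An unset `attempts` means no retries.

`call_with_retry` is hypothetical. Its `retry_interval` defaults to 0.1 seconds to match the documented `100ms` default.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(send: Callable[[], T], attempts: int = 0,
                    retry_interval: float = 0.1,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call send(), retrying up to `attempts` times after a failure.

    retry_interval is in seconds; 0.1 mirrors the documented 100ms default.
    """
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    last_error: Exception = RuntimeError("unreachable")
    for i in range(attempts + 1):
        try:
            return send()
        except Exception as err:  # any exception counts as unsuccessful
            last_error = err
            if i < attempts:
                sleep(retry_interval)  # wait retryInterval before retrying
    # Every try failed: the request is considered failed.
    raise last_error
```

The `sleep` parameter is injectable so that the interval can be observed in tests without real delays.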
Rule
Rule defines a route rule for LLM traffic.
Appears in: ModelRouteSpec
| Field | Description | Default | Validation |
|---|---|---|---|
| name string | Name is the name of the rule. | | |
| modelMatch ModelMatch | Match conditions that must be satisfied for the rule to be activated. An empty modelMatch matches all requests. | | |
| targetModels TargetModel array | The target models that receive the traffic matched by this rule. | | MaxItems: 16 |
StringMatch
StringMatch defines the matching conditions for string fields. Only one of the fields may be set.
Appears in: ModelMatch
| Field | Description | Default | Validation |
|---|---|---|---|
| exact string | Exact string match. | | |
| prefix string | Prefix match. | | |
| regex string | Regular expression match. | | |
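
The one-of constraint can be enforced when a StringMatch is evaluated. In this Python sketch, `string_match` is a hypothetical helper. Two behaviors are assumptions rather than documented behavior: a StringMatch with no field set is rejected, and a regex must match the whole value.

```python
import re
from typing import Optional


def string_match(value: str, exact: Optional[str] = None,
                 prefix: Optional[str] = None,
                 regex: Optional[str] = None) -> bool:
    """Evaluate a StringMatch against value; exactly one field must be set."""
    set_count = sum(f is not None for f in (exact, prefix, regex))
    if set_count != 1:
        raise ValueError("StringMatch needs exactly one of exact, prefix, regex")
    if exact is not None:
        return value == exact
    if prefix is not None:
        return value.startswith(prefix)
    # Assumption: the pattern has to match the entire value.
    return re.fullmatch(regex, value) is not None
```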
TargetModel
TargetModel defines a target model for LLM inference traffic.
Appears in: Rule
| Field | Description | Default | Validation |
|---|---|---|---|
| modelServerName string | ModelServerName specifies the corresponding ModelServer within the same namespace. | | |
| weight integer | Weight specifies the percentage of traffic that should be sent to the target model. The value must be in the range [0, 100]. | 100 | Maximum: 100 Minimum: 0 |
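
Weighted traffic splitting across `targetModels` can be sketched with `random.choices`, which picks an element with probability proportional to its weight. The `pick_target` helper is hypothetical. Normalizing weights that do not sum to 100 in proportion is an assumption.

```python
import random
from typing import Optional, Sequence, Tuple


def pick_target(targets: Sequence[Tuple[str, int]],
                rng: Optional[random.Random] = None) -> str:
    """Pick a modelServerName with probability proportional to its weight."""
    for name, weight in targets:
        if not 0 <= weight <= 100:
            raise ValueError(f"weight for {name!r} must be in [0, 100]")
    names = [name for name, _ in targets]
    weights = [weight for _, weight in targets]
    if sum(weights) == 0:
        raise ValueError("at least one target needs a positive weight")
    chooser = rng if rng is not None else random
    return chooser.choices(names, weights=weights, k=1)[0]
```

With targets `[("stable", 90), ("canary", 10)]`, roughly one request in ten goes to `canary`. A target with weight 0 is never chosen.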
TrafficPolicy
TrafficPolicy configures how the model server instances are accessed.
Appears in: ModelServerSpec
| Field | Description | Default | Validation |
|---|---|---|---|
| timeout Duration | The request timeout for the inference request. By default, there is no timeout. | | |
| retry Retry | The retry policy for the inference request. | | |
WorkloadPort
WorkloadPort defines the port and protocol configuration for the model server.
Appears in: ModelServerSpec
| Field | Description | Default | Validation |
|---|---|---|---|
| port integer | The port of the model server. The number must be between 1 and 65535. | | Maximum: 65535 Minimum: 1 Required: {} |
| protocol string | The protocol of the model server. Supported values are "http" and "https". | http | Enum: [http https] |
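
Given `port` and `protocol`, a router needs a base URL for each serving instance. This Python sketch shows one way to build that URL from a pod IP. `endpoint_url` is a hypothetical helper. Bracketing IPv6 addresses follows standard URL syntax rather than anything stated in this reference.

```python
def endpoint_url(pod_ip: str, port: int, protocol: str = "http") -> str:
    """Build the base URL for one model serving instance."""
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    if protocol not in ("http", "https"):
        raise ValueError('protocol must be "http" or "https"')
    # IPv6 literals must be bracketed inside a URL.
    host = f"[{pod_ip}]" if ":" in pod_ip else pod_ip
    return f"{protocol}://{host}:{port}"
```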
WorkloadSelector
WorkloadSelector is used to match the model serving instances. Currently, they must be pods in the same namespace as the ModelServer object.
Appears in: ModelServerSpec
| Field | Description | Default | Validation |
|---|---|---|---|
| matchLabels object (keys:string, values:string) | The base labels to match the model serving instances. All serving instances must match these labels. | | |
| pdGroup PDGroup | PDGroup is used to further match different roles of the model serving instances, mainly in cases such as PD disaggregation. | | |
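
The namespace restriction and `matchLabels` semantics can be sketched as a subset check over each pod's labels. Pods are represented as `(name, namespace, labels)` tuples, and `select_instances` is a hypothetical helper. Role matching through `pdGroup` would then be applied to the pods it selects.

```python
from typing import Dict, List, Tuple


def select_instances(pods: List[Tuple[str, str, Dict[str, str]]],
                     namespace: str,
                     match_labels: Dict[str, str]) -> List[str]:
    """Return names of pods in `namespace` carrying every matchLabels pair."""
    return [
        name
        for name, pod_ns, labels in pods
        if pod_ns == namespace  # only pods in the ModelServer's namespace
        and all(labels.get(k) == v for k, v in match_labels.items())
    ]
```

Extra labels on a pod do not prevent a match. Only the listed pairs must be present.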