Version: v0.1.0

API Reference

Packages

workload.serving.volcano.sh/v1alpha1

Resource Types

  • AutoscalingPolicy
  • AutoscalingPolicyBinding
  • AutoscalingPolicyBindingList
  • AutoscalingPolicyList
  • ModelBooster
  • ModelBoosterList
  • ModelServing
  • ModelServingList

AutoscalingPolicy

AutoscalingPolicy is the Schema for the autoscalingpolicies API.

Appears in:

  • AutoscalingPolicyList

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| apiVersion string | workload.serving.volcano.sh/v1alpha1 | | |
| kind string | AutoscalingPolicy | | |
| spec AutoscalingPolicySpec | | | |
| status AutoscalingPolicyStatus | | | |

AutoscalingPolicyBehavior

AutoscalingPolicyBehavior defines the scaling behaviors for up and down actions.

Appears in:

  • AutoscalingPolicySpec

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| scaleUp AutoscalingPolicyScaleUpPolicy | ScaleUp defines the policy for scaling up (increasing replicas). | | |
| scaleDown AutoscalingPolicyStablePolicy | ScaleDown defines the policy for scaling down (decreasing replicas). | | |

AutoscalingPolicyBinding

AutoscalingPolicyBinding binds AutoscalingPolicy rules to specific ModelServing deployments, enabling either traditional metric-based scaling or multi-target optimization across heterogeneous hardware deployments.

Appears in:

  • AutoscalingPolicyBindingList

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| apiVersion string | workload.serving.volcano.sh/v1alpha1 | | |
| kind string | AutoscalingPolicyBinding | | |
| spec AutoscalingPolicyBindingSpec | | | |
| status AutoscalingPolicyBindingStatus | | | |

AutoscalingPolicyBindingList

AutoscalingPolicyBindingList contains a list of AutoscalingPolicyBinding objects.

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| apiVersion string | workload.serving.volcano.sh/v1alpha1 | | |
| kind string | AutoscalingPolicyBindingList | | |
| items AutoscalingPolicyBinding array | | | |

AutoscalingPolicyBindingSpec

AutoscalingPolicyBindingSpec defines the desired state of AutoscalingPolicyBinding.

Appears in:

  • AutoscalingPolicyBinding

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| policyRef LocalObjectReference | PolicyRef references the AutoscalingPolicy that defines the scaling rules and metrics. | | |
| optimizerConfiguration OptimizerConfiguration | OptimizerConfiguration enables multi-target optimization that dynamically allocates replicas across heterogeneous ModelServing deployments based on overall compute requirements. This is ideal for mixed hardware environments (e.g., H100/A100 clusters) where you want to optimize resource utilization by adjusting deployment ratios between different hardware types using mathematical optimization methods (e.g., integer programming). | | |
| scalingConfiguration ScalingConfiguration | ScalingConfiguration defines traditional autoscaling behavior that adjusts replica counts based on monitoring metrics and target values for a single ModelServing deployment. | | |
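A binding chooses one of the two modes above. The sketch below shows the traditional, single-target form; the object names and replica bounds are illustrative assumptions, not values defined by this API:

```yaml
apiVersion: workload.serving.volcano.sh/v1alpha1
kind: AutoscalingPolicyBinding
metadata:
  name: llama-binding          # hypothetical name
spec:
  policyRef:
    name: llama-policy         # an existing AutoscalingPolicy (hypothetical)
  scalingConfiguration:        # metric-based scaling of a single ModelServing
    target:
      targetRef:
        name: llama-serving    # the ModelServing to monitor and scale (hypothetical)
    minReplicas: 1
    maxReplicas: 8
```

The multi-target form using optimizerConfiguration is sketched under OptimizerParam below.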

AutoscalingPolicyBindingStatus

AutoscalingPolicyBindingStatus defines the observed state of AutoscalingPolicyBinding.

Appears in:

  • AutoscalingPolicyBinding

AutoscalingPolicyList

AutoscalingPolicyList contains a list of AutoscalingPolicy.

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| apiVersion string | workload.serving.volcano.sh/v1alpha1 | | |
| kind string | AutoscalingPolicyList | | |
| items AutoscalingPolicy array | | | |

AutoscalingPolicyMetric

AutoscalingPolicyMetric defines a metric and its target value for scaling.

Appears in:

  • AutoscalingPolicySpec

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| metricName string | MetricName is the name of the metric to monitor. | | |
| targetValue Quantity | TargetValue is the target value for the metric to trigger scaling. | | |

AutoscalingPolicyPanicPolicy

AutoscalingPolicyPanicPolicy defines the policy for panic scaling up.

Appears in:

  • AutoscalingPolicyScaleUpPolicy

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| percent integer | Percent is the maximum percentage of instances to scale up. | 1000 | Maximum: 1000, Minimum: 0 |
| panicThresholdPercent integer | PanicThresholdPercent is the threshold percent at which panic mode is entered. | 200 | Maximum: 1000, Minimum: 110 |
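A minimal sketch of these two fields. The comments reflect one reading of the field descriptions (panic when the metric-derived desired replica count reaches the threshold percentage of the current count), which is an assumption rather than verified controller behavior:

```yaml
panicPolicy:
  percent: 1000               # while in panic mode, scale up by at most 1000%
  panicThresholdPercent: 200  # assumed: panic when desired replicas >= 200% of current
```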

AutoscalingPolicyScaleUpPolicy

Appears in:

  • AutoscalingPolicyBehavior

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| stablePolicy AutoscalingPolicyStablePolicy | The stable policy usually makes decisions based on the average value of metrics calculated over the past few minutes and introduces a cool-down period/delay for scaling down. This mechanism is relatively stable, as it smooths out short-term small fluctuations and avoids overly frequent and unnecessary Pod scaling. | | |
| panicPolicy AutoscalingPolicyPanicPolicy | When the load surges sharply within a short period (for example, a sudden traffic peak or a rush of sudden computing tasks), using the average value over a long time window to calculate the required number of replicas causes significant lag. If the system needs to scale out quickly to cope with such peaks, the ordinary scaling logic may fail to respond in time, resulting in delayed Pod startup, slower service response times or timeouts, and possibly even service paralysis or data backlogs (for workloads such as message queues). PanicPolicy defines the fast-reacting scale-up behavior used in such situations. | | |

AutoscalingPolicySpec

AutoscalingPolicySpec defines the desired state of AutoscalingPolicy.

Appears in:

  • AutoscalingPolicy
  • ModelBackend
  • ModelBoosterSpec

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| tolerancePercent integer | TolerancePercent is the percentage of deviation tolerated before scaling actions are triggered. If the current number of instances is current_replicas and the expected number of instances inferred from monitoring metrics is target_replicas, the scaling operation is only actually performed when \|current_replicas - target_replicas\| >= current_replicas * TolerancePercent / 100. | 10 | Maximum: 100, Minimum: 0 |
| metrics AutoscalingPolicyMetric array | Metrics is the list of metrics used to evaluate scaling decisions. | | MinItems: 1 |
| behavior AutoscalingPolicyBehavior | Behavior defines the scaling behavior for both scale up and scale down. | | |
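Putting the pieces together, a complete AutoscalingPolicy might look like the sketch below. The metric name and all numeric values are illustrative assumptions; with tolerancePercent: 10 and 10 current replicas, scaling only triggers once the desired count differs from the current count by at least 10 * 10 / 100 = 1 replica:

```yaml
apiVersion: workload.serving.volcano.sh/v1alpha1
kind: AutoscalingPolicy
metadata:
  name: llama-policy                          # hypothetical name
spec:
  tolerancePercent: 10
  metrics:
    - metricName: vllm:num_requests_waiting   # assumed metric name, not prescribed here
      targetValue: "5"
  behavior:
    scaleUp:
      stablePolicy:
        instances: 4            # step bounded by instances and/or percent
        percent: 100
        selectPolicy: Or
      panicPolicy:
        percent: 1000
        panicThresholdPercent: 200
    scaleDown:                  # scaleDown is itself a stable policy
      instances: 2
      percent: 50
      selectPolicy: And
```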

AutoscalingPolicyStablePolicy

AutoscalingPolicyStablePolicy defines the policy for stable scaling up or scaling down.

Appears in:

  • AutoscalingPolicyBehavior
  • AutoscalingPolicyScaleUpPolicy

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| instances integer | Instances is the maximum number of instances to scale by. | 1 | Minimum: 0 |
| percent integer | Percent is the maximum percentage of instances to scale by. | 100 | Maximum: 1000, Minimum: 0 |
| selectPolicy SelectPolicyType | SelectPolicy determines the selection strategy for scaling (e.g., Or, And). 'Or' means the scaling operation is performed as long as either the Percent requirement or the Instances requirement is met. 'And' means the scaling operation is performed only when both the Percent requirement and the Instances requirement are met. | Or | Enum: [Or And] |

AutoscalingPolicyStatus

AutoscalingPolicyStatus defines the observed state of AutoscalingPolicy.

Appears in:

  • AutoscalingPolicy

GangPolicy

GangPolicy defines the gang scheduling configuration.

Appears in:

  • ServingGroup

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| minRoleReplicas object (keys:string, values:integer) | MinRoleReplicas defines the minimum number of replicas required for each role in gang scheduling. This map allows users to specify different minimum replica requirements for different roles. Key: role name. Value: minimum number of replicas required for that role. | | |
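A minimal sketch; the role names are illustrative and would have to match roles defined in the ServingGroup:

```yaml
gangPolicy:
  minRoleReplicas:   # role name -> minimum replicas required for gang scheduling
    prefill: 1       # hypothetical role names
    decode: 2
```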

LoraAdapter

LoraAdapter defines a LoRA (Low-Rank Adaptation) adapter configuration.

Appears in:

  • ModelBackend

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| name string | Name is the name of the LoRA adapter. | | Pattern: ^[a-z0-9]([-a-z0-9]*[a-z0-9])?$ |
| artifactURL string | ArtifactURL is the URL where the LoRA adapter artifact is stored. | | Pattern: ^(hf://\|s3://\|pvc://).+ |
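As it would appear in a ModelBackend's loraAdapters list; the adapter name and artifact location are made-up examples:

```yaml
loraAdapters:
  - name: sql-adapter                       # hypothetical adapter name
    artifactURL: hf://example-org/sql-lora  # hf://, s3://, or pvc:// per the pattern
```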

Metadata

Metadata is a simplified version of ObjectMeta in Kubernetes.

Appears in:

  • PodTemplateSpec

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| labels object (keys:string, values:string) | Map of string keys and values that can be used to organize and categorize (scope and select) objects. May match selectors of replication controllers and services. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels | | |
| annotations object (keys:string, values:string) | Annotations is an unstructured key value map stored with a resource that may be set by external tools to store and retrieve arbitrary metadata. They are not queryable and should be preserved when modifying objects. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations | | |

MetricEndpoint

MetricEndpoint defines the endpoint configuration for scraping metrics from pods.

Appears in:

  • Target

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| uri string | URI is the path where metrics are exposed (e.g., "/metrics"). | /metrics | |
| port integer | Port is the network port where metrics are exposed by the pods. | 8100 | |

ModelBackend

ModelBackend defines the configuration for a model backend.

Appears in:

  • ModelBoosterSpec

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| name string | Name is the name of the backend. It must not duplicate the name of any other ModelBackend in the same ModelBooster CR. Note: updating the name causes the old modelInfer to be deleted and a new one to be created. | | Pattern: ^[a-z0-9]([-a-z0-9]*[a-z0-9])?$ |
| type ModelBackendType | Type is the type of the backend. | | Enum: [vLLM vLLMDisaggregated SGLang MindIE MindIEDisaggregated] |
| modelURI string | ModelURI is the URI from which the model is downloaded. Supported schemes: hf://, s3://, pvc://. | | Pattern: ^(hf://\|s3://\|pvc://).+ |
| cacheURI string | CacheURI is the URI where the downloaded model is stored. Supported schemes: hostpath://, pvc://. | | Pattern: ^(hostpath://\|pvc://).+ |
| envFrom EnvFromSource array | List of sources to populate environment variables in the container. The keys defined within a source must be a C_IDENTIFIER. All invalid keys will be reported as an event when the container is starting. When a key exists in multiple sources, the value associated with the last source takes precedence. Values defined by an Env with a duplicate key take precedence. Cannot be updated. | | |
| env EnvVar array | List of environment variables to set in the container. Supported names: "ENDPOINT" (must be specified when downloading the model from S3); "RUNTIME_URL" (default http://localhost:8000); "RUNTIME_PORT" (default 8100); "RUNTIME_METRICS_PATH" (default /metrics); "HF_ENDPOINT" (the Hugging Face URL, default https://huggingface.co/). Cannot be updated. | | |
| minReplicas integer | MinReplicas is the minimum number of replicas for the backend. | | Maximum: 1e+06, Minimum: 0 |
| maxReplicas integer | MaxReplicas is the maximum number of replicas for the backend. | | Maximum: 1e+06, Minimum: 1 |
| scalingCost integer | ScalingCost is the cost associated with running this backend. | | Minimum: 0 |
| routeWeight integer | RouteWeight specifies the percentage of traffic that should be sent to the target backend. It is used to create the model route. | 100 | Maximum: 100, Minimum: 0 |
| workers ModelWorker array | Workers is the list of workers associated with this backend. | | MaxItems: 1000, MinItems: 1 |
| loraAdapters LoraAdapter array | LoraAdapters is a list of LoRA adapters. | | |
| autoscalingPolicy AutoscalingPolicySpec | AutoscalingPolicy specifies the autoscaling policy for this backend. | | |
| schedulerName string | SchedulerName defines the name of the scheduler used by ModelServing for this backend. | | |

ModelBackendStatus

ModelBackendStatus defines the status of a model backend.

Appears in:

  • ModelStatus

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| name string | Name is the name of the backend. | | |
| replicas integer | Replicas is the number of replicas currently running for the backend. | | |

ModelBackendType

Underlying type: string

ModelBackendType defines the type of model backend.

Validation:

  • Enum: [vLLM vLLMDisaggregated SGLang MindIE MindIEDisaggregated]

Appears in:

  • ModelBackend

| Field | Description |
| --- | --- |
| vLLM | ModelBackendTypeVLLM represents a vLLM backend. |
| vLLMDisaggregated | ModelBackendTypeVLLMDisaggregated represents a disaggregated vLLM backend. |
| SGLang | ModelBackendTypeSGLang represents an SGLang backend. |
| MindIE | ModelBackendTypeMindIE represents a MindIE backend. |
| MindIEDisaggregated | ModelBackendTypeMindIEDisaggregated represents a disaggregated MindIE backend. |

ModelBooster

ModelBooster is the Schema for the models API.

Appears in:

  • ModelBoosterList

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| apiVersion string | workload.serving.volcano.sh/v1alpha1 | | |
| kind string | ModelBooster | | |
| spec ModelBoosterSpec | | | |
| status ModelStatus | | | |

ModelBoosterList

ModelBoosterList contains a list of ModelBooster.

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| apiVersion string | workload.serving.volcano.sh/v1alpha1 | | |
| kind string | ModelBoosterList | | |
| items ModelBooster array | | | |

ModelBoosterSpec

ModelBoosterSpec defines the desired state of ModelBooster.

Appears in:

  • ModelBooster

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| name string | Name is the name of the model. The ModelBooster CR name is restricted by Kubernetes (for example, it can't contain uppercase letters), so this field is used to specify the model name. | | MaxLength: 64, Pattern: ^[a-z0-9]([-a-z0-9]*[a-z0-9])?$ |
| owner string | Owner is the owner of the model. | | |
| backends ModelBackend array | Backends is the list of model backends associated with this model. A ModelBooster CR has at least one ModelBackend. ModelBackend is the minimum unit of inference instance; it can be vLLM, SGLang, MindIE, or another type. | | MinItems: 1 |
| autoscalingPolicy AutoscalingPolicySpec | AutoscalingPolicy references the autoscaling policy to be used for this model. | | |
| costExpansionRatePercent integer | CostExpansionRatePercent is the percentage rate at which the cost expands. | | Maximum: 1000, Minimum: 0 |
| modelMatch ModelMatch | ModelMatch defines the predicate used to match LLM inference requests to a given TargetModels. Multiple match conditions are ANDed together, i.e. the match evaluates to true only if all conditions are satisfied. | | |
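A minimal end-to-end sketch of a ModelBooster with a single vLLM backend. The CR name, owner, model, image, and resource values are all illustrative assumptions:

```yaml
apiVersion: workload.serving.volcano.sh/v1alpha1
kind: ModelBooster
metadata:
  name: qwen2-7b                              # hypothetical CR name (lowercase only)
spec:
  name: qwen2-7b                              # model name
  owner: ml-platform                          # hypothetical owner
  backends:
    - name: vllm-backend                      # unique within this CR
      type: vLLM
      modelURI: hf://Qwen/Qwen2-7B-Instruct   # hf://, s3://, or pvc://
      cacheURI: pvc://model-cache             # hostpath:// or pvc://
      minReplicas: 1
      maxReplicas: 4
      workers:
        - type: server
          image: vllm/vllm-openai:latest      # illustrative image
          replicas: 1
          resources:
            limits:
              nvidia.com/gpu: "1"
```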

ModelServing

ModelServing is the Schema for the LLM Serving API.

Appears in:

  • ModelServingList

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| apiVersion string | workload.serving.volcano.sh/v1alpha1 | | |
| kind string | ModelServing | | |
| spec ModelServingSpec | | | |
| status ModelServingStatus | | | |

ModelServingList

ModelServingList contains a list of ModelServing.

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| apiVersion string | workload.serving.volcano.sh/v1alpha1 | | |
| kind string | ModelServingList | | |
| items ModelServing array | | | |

ModelServingSpec

ModelServingSpec defines the specification of the ModelServing resource.

Appears in:

  • ModelServing

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| replicas integer | Number of ServingGroups; that is, the number of instances that run serving tasks. Defaults to 1. | 1 | |
| schedulerName string | SchedulerName defines the name of the scheduler used by ModelServing. | volcano | |
| template ServingGroup | Template defines the template for the ServingGroup. | | |
| rolloutStrategy RolloutStrategy | RolloutStrategy defines the strategy that will be applied to update replicas. | | |
| recoveryPolicy RecoveryPolicy | RecoveryPolicy defines the recovery policy for the failed Pod to be rebuilt. | RoleRecreate | Enum: [ServingGroupRecreate RoleRecreate None] |
| topologySpreadConstraints TopologySpreadConstraint array | | | |
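A sketch of a ModelServing with a 1P1D-style ServingGroup. Names, images, and the choice of roles are illustrative assumptions; the worker-pod templates are omitted for brevity by setting workerReplicas to 0:

```yaml
apiVersion: workload.serving.volcano.sh/v1alpha1
kind: ModelServing
metadata:
  name: llama-serving                # hypothetical name
spec:
  replicas: 2                        # two ServingGroups
  schedulerName: volcano
  recoveryPolicy: RoleRecreate
  template:                          # ServingGroup template
    roles:
      - name: prefill                # 1P...
        replicas: 1
        workerReplicas: 0            # no worker pods in this sketch
        entryTemplate:
          spec:
            containers:
              - name: runtime
                image: vllm/vllm-openai:latest   # illustrative image
      - name: decode                 # ...1D
        replicas: 1
        workerReplicas: 0
        entryTemplate:
          spec:
            containers:
              - name: runtime
                image: vllm/vllm-openai:latest
```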

ModelServingStatus

ModelServingStatus defines the observed state of ModelServing.

Appears in:

  • ModelServing

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| observedGeneration integer | observedGeneration is the most recent generation observed for the ModelServing. It corresponds to the ModelServing's generation, which is updated on mutation by the API Server. | | |
| replicas integer | Replicas tracks the total number of ServingGroups that have been created (updated or not, ready or not). | | |
| currentReplicas integer | CurrentReplicas is the number of ServingGroups created by the ModelServing controller from the current ModelServing version. | | |
| updatedReplicas integer | UpdatedReplicas tracks the number of ServingGroups that have been updated (ready or not). | | |
| availableReplicas integer | AvailableReplicas tracks the number of ServingGroups that are in a ready state (updated or not). | | |

ModelStatus

ModelStatus defines the observed state of ModelBooster.

Appears in:

  • ModelBooster

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| backendStatuses ModelBackendStatus array | BackendStatuses contains the status of each backend. | | |
| observedGeneration integer | ObservedGeneration keeps track of the most recently observed generation. | | |

ModelWorker

ModelWorker defines the model worker configuration.

Appears in:

  • ModelBackend

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| type ModelWorkerType | Type is the type of the model worker. | server | Enum: [server prefill decode controller coordinator] |
| image string | Image is the container image for the worker. | | |
| replicas integer | Replicas is the number of replicas for the worker. | | Maximum: 1e+06, Minimum: 0 |
| pods integer | Pods is the number of pods for the worker. | | Maximum: 1e+06, Minimum: 0 |
| resources ResourceRequirements | Resources specifies the resource requirements for the worker. | | |
| affinity Affinity | Affinity specifies the affinity rules for scheduling the worker pods. | | |
| config JSON | Config contains worker-specific configuration in JSON format. The vLLM configuration options are documented at https://docs.vllm.ai/en/stable/configuration/engine_args.html | | |

ModelWorkerType

Underlying type: string

ModelWorkerType defines the type of model worker.

Validation:

  • Enum: [server prefill decode controller coordinator]

Appears in:

  • ModelWorker

| Field | Description |
| --- | --- |
| server | ModelWorkerTypeServer represents a server worker. |
| prefill | ModelWorkerTypePrefill represents a prefill worker. |
| decode | ModelWorkerTypeDecode represents a decode worker. |
| controller | ModelWorkerTypeController represents a controller worker. |
| coordinator | ModelWorkerTypeCoordinator represents a coordinator worker. |

OptimizerConfiguration

OptimizerConfiguration defines parameters for multi-target optimization across multiple ModelServing deployments with different hardware characteristics.

Appears in:

  • AutoscalingPolicyBindingSpec

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| params OptimizerParam array | Params contains the optimization parameters for each ModelServing group. Each entry defines a different deployment type (e.g., different hardware) to optimize. | | MinItems: 1 |
| costExpansionRatePercent integer | CostExpansionRatePercent defines the acceptable cost expansion percentage when optimizing across multiple deployment types. A higher value allows more flexibility in resource allocation but may increase overall costs. | 200 | Minimum: 0 |

OptimizerParam

OptimizerParam defines optimization parameters for a specific ModelServing deployment type.

Appears in:

  • OptimizerConfiguration

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| target Target | Target specifies the ModelServing deployment and its monitoring configuration. | | |
| cost integer | Cost represents the relative cost factor for this deployment type. Used in optimization calculations to balance performance vs. cost. | | Minimum: 0 |
| minReplicas integer | MinReplicas is the minimum number of replicas to maintain for this deployment type. | | Maximum: 1e+06, Minimum: 0 |
| maxReplicas integer | MaxReplicas is the maximum number of replicas allowed for this deployment type. | | Maximum: 1e+06, Minimum: 1 |
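For the multi-target mode, a binding's optimizerConfiguration lists one OptimizerParam per deployment type. In this sketch the two ModelServing names and the cost factors are illustrative assumptions for an H100/A100 mixed fleet:

```yaml
apiVersion: workload.serving.volcano.sh/v1alpha1
kind: AutoscalingPolicyBinding
metadata:
  name: mixed-fleet-binding      # hypothetical name
spec:
  policyRef:
    name: llama-policy           # hypothetical AutoscalingPolicy
  optimizerConfiguration:
    costExpansionRatePercent: 200
    params:
      - target:
          targetRef:
            name: llama-h100     # ModelServing on H100 nodes (illustrative)
        cost: 5                  # relative cost factor
        minReplicas: 1
        maxReplicas: 8
      - target:
          targetRef:
            name: llama-a100     # ModelServing on A100 nodes (illustrative)
        cost: 3
        minReplicas: 0
        maxReplicas: 16
```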

PodTemplateSpec

PodTemplateSpec describes the data a pod should have when created from a template.

Appears in:

  • Role

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| metadata Metadata | Refer to the Kubernetes API documentation for the fields of metadata. | | |
| spec PodSpec | Specification of the desired behavior of the pod. | | |

RecoveryPolicy

Underlying type: string

Appears in:

  • ModelServingSpec

| Field | Description |
| --- | --- |
| ServingGroupRecreate | ServingGroupRecreate recreates all the pods in the ServingGroup if (1) any individual pod in the group is recreated, or (2) any container/init-container in a pod is restarted. This ensures that all pods/containers in the group are started at the same time. |
| RoleRecreate | RoleRecreate recreates all pods in one Role if (1) any individual pod in the group is recreated, or (2) any container/init-container in a pod is restarted. |
| None | NoneRestartPolicy follows the same behavior as the default pod or deployment. |

Role

Role defines the specific pod instance role that performs the inference task.

Appears in:

  • ServingGroup

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| name string | The name of a role. The name must be unique within a ServingGroup. | | MaxLength: 12, Pattern: ^[a-zA-Z0-9]([-a-zA-Z0-9]*[a-zA-Z0-9])?$ |
| replicas integer | The number of replicas of a certain role. For example, in Disaggregated Prefilling, setting the replica count of both the P and D roles to 1 results in a 1P1D deployment configuration. This approach can similarly be applied to configure an xPyD deployment scenario. Defaults to 1. | 1 | |
| entryTemplate PodTemplateSpec | EntryTemplate defines the template for the entry pod of a role. Required: currently, a role must have exactly one entry pod. | | |
| workerReplicas integer | WorkerReplicas defines the number of worker pods of a role. Required: the number of worker-pod replicas must be set. | | |
| workerTemplate PodTemplateSpec | WorkerTemplate defines the template for the worker pods of a role. | | |

RollingUpdateConfiguration

RollingUpdateConfiguration defines the parameters to be used for RollingUpdateStrategyType.

Appears in:

  • RolloutStrategy

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| maxUnavailable IntOrString | The maximum number of replicas that can be unavailable during the update. The value can be an absolute number (e.g., 5) or a percentage of total replicas at the start of the update (e.g., 10%). An absolute number is calculated from a percentage by rounding down. This cannot be 0 if MaxSurge is 0. By default, a fixed value of 1 is used. | 1 | XIntOrString: {} |
| maxSurge IntOrString | The maximum number of replicas that can be scheduled above the original number of replicas. The value can be an absolute number (e.g., 5) or a percentage of total replicas at the start of the update (e.g., 10%). An absolute number is calculated from a percentage by rounding up. By default, a value of 0 is used. | 0 | XIntOrString: {} |
| partition integer | Partition indicates the ordinal at which the ModelServing should be partitioned for updates. During a rolling update, all ServingGroups from ordinal Replicas-1 down to Partition are updated, while all ServingGroups from ordinal Partition-1 down to 0 remain untouched. The default value is 0. | | |
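A sketch of how these parameters sit under a ModelServing's rolloutStrategy; the numbers are illustrative. With replicas: 5 and partition: 2, ServingGroups 4, 3, and 2 are updated while 1 and 0 stay on the old revision:

```yaml
rolloutStrategy:
  type: ServingGroupRollingUpdate
  rollingUpdateConfiguration:
    maxUnavailable: 1    # absolute number, or a percentage such as "10%" (rounded down)
    maxSurge: 0          # absolute number, or a percentage (rounded up)
    partition: 2         # ordinals >= 2 are updated; ordinals 1..0 are untouched
```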

RolloutStrategy

RolloutStrategy defines the strategy that the ModelServing controller will use to perform replica updates.

Appears in:

  • ModelServingSpec

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| type RolloutStrategyType | Type defines the rollout strategy; it can only be "ServingGroupRollingUpdate" for now. | ServingGroupRollingUpdate | Enum: [ServingGroupRollingUpdate] |
| rollingUpdateConfiguration RollingUpdateConfiguration | RollingUpdateConfiguration defines the parameters to be used when type is RollingUpdateStrategyType. Optional. | | |

RolloutStrategyType

Underlying type: string

Appears in:

  • RolloutStrategy

| Field | Description |
| --- | --- |
| ServingGroupRollingUpdate | ServingGroupRollingUpdate indicates that ServingGroup replicas will be updated one by one. |

ScalingConfiguration

ScalingConfiguration defines the scaling parameters for a single target deployment.

Appears in:

  • AutoscalingPolicyBindingSpec

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| target Target | Target specifies the ModelServing deployment to monitor and scale. | | |
| minReplicas integer | MinReplicas is the minimum number of replicas to maintain. | | Maximum: 1e+06, Minimum: 0 |
| maxReplicas integer | MaxReplicas is the maximum number of replicas allowed. | | Maximum: 1e+06, Minimum: 1 |

SelectPolicyType

Underlying type: string

SelectPolicyType defines the type of select policy.

Validation:

  • Enum: [Or And]

Appears in:

  • AutoscalingPolicyStablePolicy

| Field | Description |
| --- | --- |
| Or | |
| And | |

ServingGroup

ServingGroup is the smallest unit that completes the inference task.

Appears in:

  • ModelServingSpec

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| restartGracePeriodSeconds integer | RestartGracePeriodSeconds defines the grace time for the controller to rebuild the ServingGroup when an error occurs. Defaults to 0 (the ServingGroup is rebuilt immediately after an error). | 0 | |
| gangPolicy GangPolicy | GangPolicy defines the gang scheduler config. | | |
| networkTopology NetworkTopologySpec | NetworkTopology defines the network topology affinity scheduling policy for the roles of the group; it works only when the scheduler supports the network topology feature. | | |
| roles Role array | | | MaxItems: 4, MinItems: 1 |

Target

Target defines a ModelServing deployment that can be monitored and scaled.

Appears in:

  • OptimizerParam
  • ScalingConfiguration

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| targetRef ObjectReference | TargetRef references the ModelServing object to monitor and scale. | | |
| additionalMatchLabels object (keys:string, values:string) | AdditionalMatchLabels provides additional label selectors to refine which pods within the ModelServing deployment should be monitored. | | |
| metricEndpoint MetricEndpoint | MetricEndpoint configures how to scrape metrics from the target pods. If not specified, defaults to port 8100 and path "/metrics". | | |
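A fully specified target might look like this; the ModelServing name and the label used to narrow the pod set are illustrative assumptions:

```yaml
target:
  targetRef:
    name: llama-serving    # ModelServing to monitor (hypothetical)
  additionalMatchLabels:
    role: decode           # scrape only a subset of the deployment's pods (illustrative)
  metricEndpoint:
    uri: /metrics          # defaults shown explicitly
    port: 8100
```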

TopologySpreadConstraint

TopologySpreadConstraint defines the topology spread constraint.

Appears in:

  • ModelServingSpec

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| maxSkew integer | MaxSkew describes the degree to which ServingGroups may be unevenly distributed. | | |
| topologyKey string | TopologyKey is the key of node labels. Nodes that have a label with this key and identical values are considered to be in the same topology. | | |
| whenUnsatisfiable string | WhenUnsatisfiable indicates how to deal with a ServingGroup if it doesn't satisfy the spread constraint. | | |
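A sketch under ModelServing's topologySpreadConstraints. Since whenUnsatisfiable is a plain string here, the DoNotSchedule value below is an assumption borrowed from the core/v1 topology spread constraint rather than an enum documented above:

```yaml
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone   # standard node label for zones
    whenUnsatisfiable: DoNotSchedule           # assumed to mirror core/v1 semantics
```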