Version: v0.2.0

API Reference

Packages

workload.serving.volcano.sh/v1alpha1

Resource Types

AutoscalingPolicy

AutoscalingPolicy defines the autoscaling policy configuration for model serving workloads. It specifies scaling rules, metrics, and behavior for automatic replica adjustment.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `workload.serving.volcano.sh/v1alpha1` | | |
| `kind` _string_ | `AutoscalingPolicy` | | |
| `spec` _AutoscalingPolicySpec_ | | | |
| `status` _AutoscalingPolicyStatus_ | | | |

AutoscalingPolicyBehavior

AutoscalingPolicyBehavior defines the scaling behavior configuration for both scale up and scale down operations.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `scaleUp` _AutoscalingPolicyScaleUpPolicy_ | ScaleUp defines the policy configuration for scaling up (increasing replicas). | | |
| `scaleDown` _AutoscalingPolicyStablePolicy_ | ScaleDown defines the policy configuration for scaling down (decreasing replicas). | | |

AutoscalingPolicyBinding

AutoscalingPolicyBinding binds AutoscalingPolicy rules to specific ModelServing deployments. It enables either traditional metric-based scaling or multi-target optimization across heterogeneous hardware deployments.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `workload.serving.volcano.sh/v1alpha1` | | |
| `kind` _string_ | `AutoscalingPolicyBinding` | | |
| `spec` _AutoscalingPolicyBindingSpec_ | | | |
| `status` _AutoscalingPolicyBindingStatus_ | | | |

AutoscalingPolicyBindingList

AutoscalingPolicyBindingList contains a list of AutoscalingPolicyBinding objects.

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `workload.serving.volcano.sh/v1alpha1` | | |
| `kind` _string_ | `AutoscalingPolicyBindingList` | | |
| `items` _AutoscalingPolicyBinding array_ | | | |

AutoscalingPolicyBindingSpec

AutoscalingPolicyBindingSpec defines the desired state of AutoscalingPolicyBinding.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `policyRef` _LocalObjectReference_ | PolicyRef references the AutoscalingPolicy that defines the scaling rules and metrics. | | |
| `heterogeneousTarget` _HeterogeneousTarget_ | HeterogeneousTarget enables optimization-based scaling across multiple ModelServing deployments with different hardware capabilities. This approach dynamically adjusts the replica distribution across heterogeneous resources (e.g., H100/A100 GPUs) based on overall computing requirements. | | |
| `homogeneousTarget` _HomogeneousTarget_ | HomogeneousTarget enables traditional metric-based scaling for a single ModelServing deployment. This approach adjusts the replica count based on monitoring metrics and their target values. | | |
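A minimal sketch of the homogeneous case, assuming the field names exactly as listed above; every object name here is hypothetical:

```yaml
apiVersion: workload.serving.volcano.sh/v1alpha1
kind: AutoscalingPolicyBinding
metadata:
  name: demo-binding            # illustrative name
spec:
  policyRef:
    name: demo-policy           # an existing AutoscalingPolicy
  homogeneousTarget:
    target:
      targetRef:
        kind: ModelServing      # currently the only supported target kind
        name: demo-serving      # illustrative name
    minReplicas: 1
    maxReplicas: 8
```

Exactly one of `homogeneousTarget` or `heterogeneousTarget` would normally be set, since they select mutually exclusive scaling modes.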

AutoscalingPolicyBindingStatus

AutoscalingPolicyBindingStatus defines the observed state of AutoscalingPolicyBinding.

Appears in:

AutoscalingPolicyList

AutoscalingPolicyList contains a list of AutoscalingPolicy objects.

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `workload.serving.volcano.sh/v1alpha1` | | |
| `kind` _string_ | `AutoscalingPolicyList` | | |
| `items` _AutoscalingPolicy array_ | | | |

AutoscalingPolicyMetric

AutoscalingPolicyMetric defines a metric and its target value for scaling decisions.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `metricName` _string_ | MetricName defines the name of the metric to monitor for scaling decisions. | | |
| `targetValue` _Quantity_ | TargetValue defines the target value for the metric that triggers scaling operations. | | |

AutoscalingPolicyPanicPolicy

AutoscalingPolicyPanicPolicy defines the emergency scaling policy for handling sudden traffic surges.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `percent` _integer_ | Percent defines the maximum percentage of current instances to scale up during panic mode. | 1000 | Maximum: 1000, Minimum: 0 |
| `panicThresholdPercent` _integer_ | PanicThresholdPercent defines the metric threshold percentage that triggers panic mode. When metrics exceed this percentage of their target values, panic mode is activated. | 200 | Maximum: 1000, Minimum: 110 |

AutoscalingPolicyScaleUpPolicy

AutoscalingPolicyScaleUpPolicy defines the scaling up policy configuration.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `stablePolicy` _AutoscalingPolicyStablePolicy_ | StablePolicy defines the stable scaling policy that uses average metric values over time windows. This policy smooths out short-term fluctuations and avoids unnecessary frequent scaling operations. | | |
| `panicPolicy` _AutoscalingPolicyPanicPolicy_ | PanicPolicy defines the emergency scaling policy for handling sudden traffic spikes. This policy activates during rapid load surges to prevent service degradation or timeouts. | | |

AutoscalingPolicySpec

AutoscalingPolicySpec defines the desired state of AutoscalingPolicy.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `tolerancePercent` _integer_ | TolerancePercent defines the percentage of deviation tolerated before scaling actions are triggered. current_replicas is the current number of instances, while target_replicas is the expected number of instances calculated from monitoring metrics. Scaling operations are performed only when \|current_replicas - target_replicas\| >= current_replicas * TolerancePercent / 100. | 10 | Maximum: 100, Minimum: 0 |
| `metrics` _AutoscalingPolicyMetric array_ | Metrics defines the list of metrics used to evaluate scaling decisions. | | MinItems: 1 |
| `behavior` _AutoscalingPolicyBehavior_ | Behavior defines the scaling behavior configuration for both scale-up and scale-down operations. | | |
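Putting these fields together: with `tolerancePercent: 10` and 10 current replicas, scaling only fires once the computed target differs by at least 10 * 10 / 100 = 1 replica. A minimal policy sketch; the metric name and all numeric values are illustrative assumptions, not taken from this reference:

```yaml
apiVersion: workload.serving.volcano.sh/v1alpha1
kind: AutoscalingPolicy
metadata:
  name: demo-policy                 # illustrative name
spec:
  tolerancePercent: 10
  metrics:
    - metricName: num_requests_running   # hypothetical metric name
      targetValue: "8"
  behavior:
    scaleUp:
      stablePolicy:
        instances: 2
        percent: 100
        selectPolicy: Or            # act if either limit allows it
      panicPolicy:
        panicThresholdPercent: 200  # panic once metrics exceed 2x target
        percent: 1000
    scaleDown:                      # scaleDown is a stable policy directly
      instances: 1
      percent: 100
      selectPolicy: And
```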

AutoscalingPolicyStablePolicy

AutoscalingPolicyStablePolicy defines the stable scaling policy for both scale up and scale down operations.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `instances` _integer_ | Instances defines the maximum absolute number of instances to scale per period. | 1 | Minimum: 0 |
| `percent` _integer_ | Percent defines the maximum percentage of current instances to scale per period. | 100 | Maximum: 1000, Minimum: 0 |
| `selectPolicy` _SelectPolicyType_ | SelectPolicy determines the selection strategy for scaling operations. 'Or' means scaling is performed if either the Percent or the Instances requirement is met. 'And' means scaling is performed only if both requirements are met. | Or | Enum: [Or And] |

AutoscalingPolicyStatus

AutoscalingPolicyStatus defines the observed state of AutoscalingPolicy.

Appears in:

GangPolicy

GangPolicy defines the gang scheduling configuration.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `minRoleReplicas` _object (keys: string, values: integer)_ | MinRoleReplicas defines the minimum number of replicas required for each role in gang scheduling. The map key is a role name and the value is the minimum number of replicas required for that role, which lets users specify different minimum replica requirements for different roles. Note: in practice, when determining the minTaskMember for a podGroup, the controller takes the minimum of MinRoleReplicas[role.Name] and role.Replicas. | | |

For example, if you set:

```yaml
gangPolicy:
  minRoleReplicas:
    P: 2
    D: 2
```

and define the roles as:

```yaml
roles:
  - name: P
    replicas: 1
    workerReplicas: 2
  - name: D
    replicas: 3
    workerReplicas: 3
```

the resulting podGroup will have:

```yaml
minTaskMember:
  P-0: 3  # 1 entry pod + 2 worker pods
  D-0: 4  # 1 entry pod + 3 worker pods
  D-1: 4  # 1 entry pod + 3 worker pods
```

The number of gang-required replicas of P is min(minRoleReplicas['P'], role.Replicas) = min(2, 1) = 1, and that of D is min(minRoleReplicas['D'], role.Replicas) = min(2, 3) = 2.

HeterogeneousTarget

HeterogeneousTarget defines the configuration for optimization-based autoscaling across multiple deployments.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `params` _HeterogeneousTargetParam array_ | Params defines the configuration parameters for the multiple ModelServing groups to be optimized. | | MinItems: 1 |
| `costExpansionRatePercent` _integer_ | CostExpansionRatePercent defines the percentage rate at which the cost expands during optimization calculations. | 200 | Minimum: 0 |

HeterogeneousTargetParam

HeterogeneousTargetParam defines the configuration parameters for a specific deployment type in heterogeneous scaling.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `target` _Target_ | Target defines the scaling instance configuration for this deployment type. | | |
| `cost` _integer_ | Cost defines the relative cost factor used in optimization calculations. This factor balances performance requirements against deployment costs. | | Minimum: 0 |
| `minReplicas` _integer_ | MinReplicas defines the minimum number of replicas to maintain for this deployment type. | | Maximum: 1e+06, Minimum: 0 |
| `maxReplicas` _integer_ | MaxReplicas defines the maximum number of replicas allowed for this deployment type. | | Maximum: 1e+06, Minimum: 1 |
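A sketch of a `heterogeneousTarget` fragment inside an AutoscalingPolicyBinding spec, treating `cost` as a unitless relative factor as described above; the deployment names and cost values are illustrative assumptions:

```yaml
spec:
  policyRef:
    name: demo-policy              # illustrative name
  heterogeneousTarget:
    costExpansionRatePercent: 200  # the default
    params:
      - target:
          targetRef:
            kind: ModelServing
            name: serving-h100     # e.g. an H100-backed deployment
        cost: 4                    # relatively expensive capacity
        minReplicas: 0
        maxReplicas: 4
      - target:
          targetRef:
            kind: ModelServing
            name: serving-a100     # e.g. an A100-backed deployment
        cost: 1                    # cheaper capacity, preferred when sufficient
        minReplicas: 1
        maxReplicas: 16
```

The optimizer then distributes the overall required compute across the two deployments, weighing each group's `cost` against the metric targets of the referenced policy.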

HomogeneousTarget

HomogeneousTarget defines the configuration for traditional metric-based autoscaling of a single deployment.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `target` _Target_ | Target defines the object to be monitored and scaled. | | |
| `minReplicas` _integer_ | MinReplicas defines the minimum number of replicas to maintain. | | Maximum: 1e+06, Minimum: 0 |
| `maxReplicas` _integer_ | MaxReplicas defines the maximum number of replicas allowed. | | Maximum: 1e+06, Minimum: 1 |

Metadata

Metadata is a simplified version of ObjectMeta in Kubernetes.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `labels` _object (keys: string, values: string)_ | Map of string keys and values that can be used to organize and categorize (scope and select) objects. May match selectors of replication controllers and services. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels | | |
| `annotations` _object (keys: string, values: string)_ | Annotations is an unstructured key-value map stored with a resource that may be set by external tools to store and retrieve arbitrary metadata. They are not queryable and should be preserved when modifying objects. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations | | |

MetricEndpoint

MetricEndpoint defines the endpoint configuration for scraping metrics from pods.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `uri` _string_ | Uri defines the HTTP path where metrics are exposed (e.g., "/metrics"). | /metrics | |
| `port` _integer_ | Port defines the network port where metrics are exposed by the pods. | 8100 | |

ModelBackend

ModelBackend defines the configuration for a model backend.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `name` _string_ | Name is the name of the backend. It must not duplicate the name of any other ModelBackend in the same ModelBooster CR. Note: updating the name causes the old modelInfer to be deleted and a new modelInfer to be created. | | Pattern: `^[a-z0-9]([-a-z0-9]*[a-z0-9])?$` |
| `type` _ModelBackendType_ | Type is the type of the backend. | | Enum: [vLLM vLLMDisaggregated] |
| `modelURI` _string_ | ModelURI is the URI from which the model is downloaded. Supported schemes: hf://, s3://, pvc://. | | Pattern: `^(hf://\|s3://\|pvc://).+` |
| `cacheURI` _string_ | CacheURI is the URI where the downloaded model is stored. Supported schemes: hostpath://, pvc://. | | Pattern: `^(hostpath://\|pvc://).+` |
| `envFrom` _EnvFromSource array_ | List of sources to populate environment variables in the container. The keys defined within a source must be a C_IDENTIFIER. All invalid keys will be reported as an event when the container is starting. When a key exists in multiple sources, the value associated with the last source takes precedence. Values defined by an Env with a duplicate key take precedence. Cannot be updated. | | |
| `env` _EnvVar array_ | List of environment variables to set in the container. Supported names: "ENDPOINT": required when downloading the model from s3. "RUNTIME_URL": defaults to http://localhost:8000. "RUNTIME_PORT": defaults to 8100. "RUNTIME_METRICS_PATH": defaults to /metrics. "HF_ENDPOINT": the Hugging Face URL; defaults to https://huggingface.co/. Cannot be updated. | | |
| `minReplicas` _integer_ | MinReplicas is the minimum number of replicas for the backend. | | Maximum: 1e+06, Minimum: 0 |
| `maxReplicas` _integer_ | MaxReplicas is the maximum number of replicas for the backend. | | Maximum: 1e+06, Minimum: 1 |
| `workers` _ModelWorker array_ | Workers is the list of workers associated with this backend. | | MaxItems: 1000, MinItems: 1 |
| `schedulerName` _string_ | SchedulerName defines the name of the scheduler used by ModelServing for this backend. | | |
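A hedged sketch of a `backend` block as it might appear in a ModelBoosterSpec; the model URI, image, and PVC names are hypothetical placeholders:

```yaml
backend:
  name: demo-vllm                       # illustrative name
  type: vLLM
  modelURI: hf://org/model              # hf://, s3://, or pvc://
  cacheURI: pvc://model-cache           # hostpath:// or pvc://
  env:
    - name: HF_ENDPOINT                 # one of the supported names above
      value: https://huggingface.co/
  minReplicas: 1
  maxReplicas: 4
  workers:
    - type: server
      image: example.com/vllm:latest    # illustrative image
      replicas: 1
```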

ModelBackendType

Underlying type: string

ModelBackendType defines the type of model backend.

Validation:

  • Enum: [vLLM vLLMDisaggregated]

Appears in:

| Field | Description |
| --- | --- |
| `vLLM` | ModelBackendTypeVLLM represents a vLLM backend. |
| `vLLMDisaggregated` | ModelBackendTypeVLLMDisaggregated represents a disaggregated vLLM backend. |
| `SGLang` | ModelBackendTypeSGLang represents an SGLang backend. |
| `MindIE` | ModelBackendTypeMindIE represents a MindIE backend. |
| `MindIEDisaggregated` | ModelBackendTypeMindIEDisaggregated represents a disaggregated MindIE backend. |

ModelBooster

ModelBooster is the Schema for the models API.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `workload.serving.volcano.sh/v1alpha1` | | |
| `kind` _string_ | `ModelBooster` | | |
| `spec` _ModelBoosterSpec_ | | | |
| `status` _ModelStatus_ | | | |

ModelBoosterList

ModelBoosterList contains a list of ModelBooster.

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `workload.serving.volcano.sh/v1alpha1` | | |
| `kind` _string_ | `ModelBoosterList` | | |
| `items` _ModelBooster array_ | | | |

ModelBoosterSpec

ModelBoosterSpec defines the desired state of ModelBooster.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `name` _string_ | Name is the name of the model. The ModelBooster CR name is restricted by Kubernetes (for example, it can't contain uppercase letters), so this field is used to specify the ModelBooster name. | | MaxLength: 64, Pattern: `^[a-z0-9]([-a-z0-9]*[a-z0-9])?$` |
| `owner` _string_ | Owner is the owner of the model. | | |
| `backend` _ModelBackend_ | Backend is the model backend associated with this model. ModelBackend is the minimum unit of inference instance; it can be vLLM or vLLMDisaggregated. | | |
| `autoscalingPolicy` _AutoscalingPolicySpec_ | AutoscalingPolicy references the autoscaling policy to be used for this model. | | |
| `modelMatch` _ModelMatch_ | ModelMatch defines the predicate used to match LLM inference requests to a given TargetModels. Multiple match conditions are ANDed together, i.e. the match evaluates to true only if all conditions are satisfied. | | |

ModelServing

ModelServing is the Schema for the LLM Serving API.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `workload.serving.volcano.sh/v1alpha1` | | |
| `kind` _string_ | `ModelServing` | | |
| `spec` _ModelServingSpec_ | | | |
| `status` _ModelServingStatus_ | | | |

ModelServingList

ModelServingList contains a list of ModelServing objects.

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `workload.serving.volcano.sh/v1alpha1` | | |
| `kind` _string_ | `ModelServingList` | | |
| `items` _ModelServing array_ | | | |

ModelServingSpec

ModelServingSpec defines the specification of the ModelServing resource.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `replicas` _integer_ | Number of ServingGroups, i.e. the number of instances that run serving tasks. Defaults to 1. | 1 | |
| `schedulerName` _string_ | SchedulerName defines the name of the scheduler used by ModelServing. | volcano | |
| `template` _ServingGroup_ | Template defines the template for the ServingGroup. | | |
| `rolloutStrategy` _RolloutStrategy_ | RolloutStrategy defines the strategy that will be applied to update replicas. | | |
| `recoveryPolicy` _RecoveryPolicy_ | RecoveryPolicy defines the recovery policy for failed Pods to be rebuilt. | RoleRecreate | Enum: [ServingGroupRecreate RoleRecreate None] |
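A minimal ModelServing sketch using the defaults listed above; all names are illustrative, and the role pod templates are omitted for brevity:

```yaml
apiVersion: workload.serving.volcano.sh/v1alpha1
kind: ModelServing
metadata:
  name: demo-serving              # illustrative name
spec:
  replicas: 2                     # two ServingGroups
  schedulerName: volcano          # the default
  recoveryPolicy: RoleRecreate    # the default
  rolloutStrategy:
    type: ServingGroupRollingUpdate
    rollingUpdateConfiguration:
      maxUnavailable: 1
      maxSurge: 0
  template:
    roles:
      - name: server              # illustrative single role
        replicas: 1
        workerReplicas: 1
        # entryTemplate / workerTemplate omitted for brevity
```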

ModelServingStatus

ModelServingStatus defines the observed state of ModelServing.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `observedGeneration` _integer_ | observedGeneration is the most recent generation observed for this ModelServing. It corresponds to the ModelServing's generation, which is updated on mutation by the API Server. | | |
| `replicas` _integer_ | Replicas tracks the total number of ServingGroups that have been created (updated or not, ready or not). | | |
| `currentReplicas` _integer_ | CurrentReplicas is the number of ServingGroups created by the ModelServing controller from the current ModelServing version. | | |
| `updatedReplicas` _integer_ | UpdatedReplicas tracks the number of ServingGroups that have been updated (ready or not). | | |
| `availableReplicas` _integer_ | AvailableReplicas tracks the number of ServingGroups that are in the ready state (updated or not). | | |

ModelStatus

ModelStatus defines the observed state of ModelBooster.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `observedGeneration` _integer_ | ObservedGeneration keeps track of the most recently observed generation. | | |

ModelWorker

ModelWorker defines the model worker configuration.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `type` _ModelWorkerType_ | Type is the type of the model worker. | server | Enum: [server prefill decode controller coordinator] |
| `image` _string_ | Image is the container image for the worker. | | |
| `replicas` _integer_ | Replicas is the number of replicas for the worker. | | Maximum: 1e+06, Minimum: 0 |
| `pods` _integer_ | Pods is the number of pods for the worker. | | Maximum: 1e+06, Minimum: 0 |
| `resources` _ResourceRequirements_ | Resources specifies the resource requirements for the worker. | | |
| `affinity` _Affinity_ | Affinity specifies the affinity rules for scheduling the worker pods. | | |
| `config` _JSON_ | Config contains worker-specific configuration in JSON format. vLLM configuration options are listed at https://docs.vllm.ai/en/stable/configuration/engine_args.html | | |
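Since `config` is free-form JSON, the accepted keys depend entirely on the backend engine. A hypothetical vLLM-style worker fragment; the image and the engine-argument names are assumptions drawn from vLLM's engine-args documentation, not from this reference:

```yaml
workers:
  - type: server
    image: example.com/vllm:latest     # illustrative image
    replicas: 1
    resources:
      limits:
        nvidia.com/gpu: "1"
    config:
      tensor-parallel-size: 1          # assumed vLLM engine arguments
      max-model-len: 8192
```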

ModelWorkerType

Underlying type: string

ModelWorkerType defines the type of model worker.

Validation:

  • Enum: [server prefill decode controller coordinator]

Appears in:

| Field | Description |
| --- | --- |
| `server` | ModelWorkerTypeServer represents a server worker. |
| `prefill` | ModelWorkerTypePrefill represents a prefill worker. |
| `decode` | ModelWorkerTypeDecode represents a decode worker. |
| `controller` | ModelWorkerTypeController represents a controller worker. |
| `coordinator` | ModelWorkerTypeCoordinator represents a coordinator worker. |

NetworkTopology

NetworkTopology defines the network topology affinity scheduling policy for the roles and the group. It works only when the scheduler supports the network topology feature.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `groupPolicy` _NetworkTopologySpec_ | GroupPolicy defines the network topology scheduling requirement for all instances within the ServingGroup. | | |
| `rolePolicy` _NetworkTopologySpec_ | RolePolicy defines the fine-grained network topology scheduling requirement for the instances of a role. | | |

PodTemplateSpec

PodTemplateSpec describes the data a pod should have when created from a template

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `metadata` _Metadata_ | Refer to the Kubernetes API documentation for the fields of metadata. | | |
| `spec` _PodSpec_ | Specification of the desired behavior of the pod. | | |

RecoveryPolicy

Underlying type: string

Appears in:

| Field | Description |
| --- | --- |
| `ServingGroupRecreate` | ServingGroupRecreate will recreate all pods in the ServingGroup if: 1. any individual pod in the group is recreated; 2. any container/init-container in a pod is restarted. This ensures that all pods/containers in the group start at the same time. |
| `RoleRecreate` | RoleRecreate will recreate all pods in one Role if: 1. any individual pod in the group is recreated; 2. any container/init-container in a pod is restarted. |
| `None` | None follows the same behavior as a default pod or deployment. |

Role

Role defines the specific pod instance role that performs the inference task.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `name` _string_ | The name of a role. The name must be unique within a ServingGroup. | | MaxLength: 12, Pattern: `^[a-zA-Z0-9]([-a-zA-Z0-9]*[a-zA-Z0-9])?$` |
| `replicas` _integer_ | The number of replicas of a certain role. For example, in Disaggregated Prefilling, setting the replica count of both the P and D roles to 1 results in a 1P1D deployment configuration. The same approach can be applied to configure an xPyD deployment scenario. Defaults to 1. | 1 | |
| `entryTemplate` _PodTemplateSpec_ | EntryTemplate defines the template for the entry pod of a role. Required: currently, a role must have exactly one entry pod. | | |
| `workerReplicas` _integer_ | WorkerReplicas defines the number of worker pods of a role. Required: the number of worker-pod replicas must be set. | | |
| `workerTemplate` _PodTemplateSpec_ | WorkerTemplate defines the template for the worker pods of a role. | | |
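A sketch of a single role with its entry and worker templates; the role name, container names, and image are illustrative assumptions:

```yaml
roles:
  - name: P                    # e.g. a prefill role; a D role would be added for xPyD
    replicas: 1
    workerReplicas: 1
    entryTemplate:
      spec:
        containers:
          - name: entry
            image: example.com/vllm:latest   # illustrative image
    workerTemplate:
      spec:
        containers:
          - name: worker
            image: example.com/vllm:latest
```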

RollingUpdateConfiguration

RollingUpdateConfiguration defines the parameters to be used for RollingUpdateStrategyType.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `maxUnavailable` _IntOrString_ | The maximum number of replicas that can be unavailable during the update. The value can be an absolute number (e.g., 5) or a percentage of the total replicas at the start of the update (e.g., 10%). The absolute number is calculated from the percentage by rounding down. This cannot be 0 if MaxSurge is 0. By default, a fixed value of 1 is used. | 1 | XIntOrString: {} |
| `maxSurge` _IntOrString_ | The maximum number of replicas that can be scheduled above the original number of replicas. The value can be an absolute number (e.g., 5) or a percentage of the total replicas at the start of the update (e.g., 10%). The absolute number is calculated from the percentage by rounding up. By default, a value of 0 is used. | 0 | XIntOrString: {} |
| `partition` _integer_ | Partition indicates the ordinal at which the ModelServing should be partitioned for updates. During a rolling update, all ServingGroups from ordinal Replicas-1 down to Partition are updated, while all ServingGroups from ordinal Partition-1 down to 0 remain untouched. The default value is 0. | | |
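As a worked sketch of the partition semantics: with `replicas: 4` and the fragment below, a rolling update touches ServingGroups at ordinals 3 and 2 and leaves ordinals 1 and 0 on the old version. The field values are illustrative:

```yaml
rolloutStrategy:
  type: ServingGroupRollingUpdate
  rollingUpdateConfiguration:
    maxUnavailable: 1
    maxSurge: 0
    partition: 2   # update ordinals 3..2; ordinals 1..0 remain untouched
```

Lowering `partition` step by step rolls the update out to the remaining groups, which is useful for canary-style rollouts.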

RolloutStrategy

RolloutStrategy defines the strategy that the ModelServing controller will use to perform replica updates.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `type` _RolloutStrategyType_ | Type defines the rollout strategy; it can only be "ServingGroupRollingUpdate" for now. | ServingGroupRollingUpdate | Enum: [ServingGroupRollingUpdate] |
| `rollingUpdateConfiguration` _RollingUpdateConfiguration_ | RollingUpdateConfiguration defines the parameters to be used when the type is RollingUpdateStrategyType. Optional. | | |

RolloutStrategyType

Underlying type: string

Appears in:

| Field | Description |
| --- | --- |
| `ServingGroupRollingUpdate` | ServingGroupRollingUpdate indicates that ServingGroup replicas will be updated one by one. |

SelectPolicyType

Underlying type: string

SelectPolicyType defines the selection strategy type for scaling operations.

Validation:

  • Enum: [Or And]

Appears in:

| Field | Description |
| --- | --- |
| `Or` | |
| `And` | |

ServingGroup

ServingGroup is the smallest unit that completes an inference task.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `restartGracePeriodSeconds` _integer_ | RestartGracePeriodSeconds defines the grace period for the controller to rebuild the ServingGroup when an error occurs. Defaults to 0 (the ServingGroup is rebuilt immediately after an error). | 0 | |
| `gangPolicy` _GangPolicy_ | GangPolicy defines the gang scheduling configuration. | | |
| `networkTopology` _NetworkTopology_ | NetworkTopology defines the network topology affinity scheduling policy for the roles of the ServingGroup. It works only when the scheduler supports network topology-aware scheduling. | | |
| `roles` _Role array_ | | | MaxItems: 4, MinItems: 1 |

SubTarget

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `kind` _string_ | | | |
| `name` _string_ | | | |

Target

Target defines a ModelServing deployment that can be monitored and scaled.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `targetRef` _ObjectReference_ | TargetRef references the target object to be monitored and scaled. The default target GVK is ModelServing. Currently supported kinds: ModelServing. | | |
| `subTargets` _SubTarget_ | SubTargets defines the sub-target objects to be monitored and scaled. Currently supported kinds: Role, when the TargetRef kind is ModelServing. | | |
| `metricEndpoint` _MetricEndpoint_ | MetricEndpoint defines the configuration for scraping metrics from the target pods. | | |
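A sketch of a target block; it assumes `subTargets` is a list (the field name is plural), and all object and role names are illustrative:

```yaml
target:
  targetRef:
    kind: ModelServing        # currently the only supported kind
    name: demo-serving        # illustrative name
  subTargets:
    - kind: Role              # supported when targetRef is a ModelServing
      name: P                 # illustrative role name
  metricEndpoint:
    uri: /metrics             # the default
    port: 8100                # the default
```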