Updated on 2025-05-21 GMT+08:00

Why Is a VolcanoJob (vcjob) Resource Unable to Function Properly After the Volcano Scheduler Add-on Upgrade?

Symptom

After the Volcano Scheduler add-on is upgraded from 1.4.7 or earlier to a version later than 1.4.7, newly created VolcanoJob (vcjob) resources cannot run properly. The API server logs and the volcano-admission component report the following errors:

The error information in the API server logs:
...
W0318 14:57:51.376736      13 dispatcher.go:142] Failed calling webhook, failing open validatejob.volcano.sh: failed calling webhook "validatejob.volcano.sh": failed to call webhook: Post "https://volcano-admission-service.kube-system.svc:443/jobs/validate?timeout=30s": EOF 
E0318 14:57:51.376768      13 dispatcher.go:149] failed calling webhook "validatejob.volcano.sh": failed to call webhook: Post "https://volcano-admission-service.kube-system.svc:443/jobs/validate?timeout=30s": EOF
...

The error information reported by the volcano-admission component:

... no kind "AdmissionReview" is registered for version "admission.k8s.io/v1beta1" ...

Possible Cause

The value of the webhooks.admissionReviewVersions field used by earlier add-on versions is incompatible with the value used by the upgraded version.

In version 1.4.7 and earlier, the webhooks.admissionReviewVersions field of the MutatingWebhookConfiguration and ValidatingWebhookConfiguration resource objects is set to v1beta1. In versions later than 1.4.7, it is set to v1. If the add-on is upgraded in HA mode and the number of volcano-admission replicas is increased, the ReplicaSet of the old version starts a pod of the old version. If that pod is not destroyed promptly, it overwrites the configurations in the MutatingWebhookConfiguration and ValidatingWebhookConfiguration resource objects and resets webhooks.admissionReviewVersions to v1beta1. The API server then sends AdmissionReview requests of version v1beta1, which the upgraded volcano-admission component does not recognize (see the error it reports above), so the created vcjob resource cannot run properly.

For details about the MutatingWebhookConfiguration and ValidatingWebhookConfiguration resource objects, see Table 1.

Table 1 Involved resource objects

Resource Type                    Resource Name
MutatingWebhookConfiguration     volcano-admission-service-jobs-mutate
                                 volcano-admission-service-podgroups-mutate
                                 volcano-admission-service-queues-mutate
                                 volcano-admission-service-pods-mutate
ValidatingWebhookConfiguration   volcano-admission-service-jobs-validate
                                 volcano-admission-service-pods-validate
                                 volcano-admission-service-queues-validate
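
Before changing anything, it can help to see which value each object in Table 1 currently carries. The following is a minimal sketch, assuming kubectl is already configured for the affected cluster. The function and variable names are illustrative and are not part of the add-on; the object names are taken from Table 1.

```shell
#!/bin/sh
# Object names come from Table 1. MUTATING, VALIDATING, and both function
# names are hypothetical helpers introduced for this sketch.
MUTATING="volcano-admission-service-jobs-mutate
volcano-admission-service-podgroups-mutate
volcano-admission-service-queues-mutate
volcano-admission-service-pods-mutate"
VALIDATING="volcano-admission-service-jobs-validate
volcano-admission-service-pods-validate
volcano-admission-service-queues-validate"

# $1 = resource kind, remaining arguments = object names.
# Prints one line per object: "<kind>/<name>: <admissionReviewVersions values>".
print_versions() {
  kind=$1
  shift
  for name in "$@"; do
    printf '%s/%s: ' "$kind" "$name"
    # Reads admissionReviewVersions from every entry under "webhooks".
    kubectl get "$kind" "$name" \
      -o jsonpath='{.webhooks[*].admissionReviewVersions}'
    printf '\n'
  done
}

# Walks through all seven objects listed in Table 1.
show_review_versions() {
  # The lists are left unquoted on purpose so that each name becomes one argument.
  print_versions mutatingwebhookconfiguration $MUTATING
  print_versions validatingwebhookconfiguration $VALIDATING
}
```

After the definitions are loaded into the shell, run the check as follows. Any object that still reports v1beta1 needs the change described in the Solution section.

    show_review_versions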

Solution

Change the value of webhooks.admissionReviewVersions of the resource objects listed in Table 1 from v1beta1 to v1, and then create the vcjob again.

  1. Run the kubectl edit command to modify the resource objects listed in Table 1 one by one. The following uses volcano-admission-service-jobs-mutate as an example to describe how to modify the webhooks.admissionReviewVersions field.

    Open the YAML of volcano-admission-service-jobs-mutate for editing:
    kubectl edit MutatingWebhookConfiguration volcano-admission-service-jobs-mutate
    In the editor, press i to enter insert mode and change the value of webhooks.admissionReviewVersions from v1beta1 to v1.
    apiVersion: admissionregistration.k8s.io/v1
    kind: MutatingWebhookConfiguration
    metadata:
      annotations:
        meta.helm.sh/release-name: cceaddon-volcano
        meta.helm.sh/release-namespace: kube-system
      creationTimestamp: "2025-02-25T08:11:52Z"
      generation: 2
      labels:
        app.kubernetes.io/managed-by: Helm
        release: cceaddon-volcano
      name: volcano-admission-service-jobs-mutate
      resourceVersion: "252406"
      uid: 7e9bdaaf-1b6c-4975-a171-ada8456c12e5
    webhooks:
    - admissionReviewVersions:
      - v1beta1   # Change the value to v1.
     ...

    After the modification is complete, press Esc to leave insert mode and enter :wq to save the change and exit.

  2. After the webhooks.admissionReviewVersions fields of the objects in Table 1 are modified, delete the vcjob resource that fails to run properly.

    kubectl delete vcjob -n namespace vcjob_name

    Information similar to the following is displayed:

    vcjob vcjob_name deleted

  3. Recreate the vcjob resource:

    kubectl create -f vcjob.yaml   # Replace vcjob.yaml with the YAML file for creating the vcjob resource.

    Information similar to the following is displayed:

    job.batch.volcano.sh/vcjob_name created

  4. Check whether the vcjob resource has been created:

    kubectl get vcjob vcjob_name -n namespace

    If the value of STATUS is Running, the vcjob resource has been created and is running properly.

    NAME         STATUS    MINAVAILABLE   RUNNINGS   AGE
    vcjob_name   Running   1                         2m30s
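
Step 1 can also be carried out without an interactive editor. The following is a minimal sketch that applies a JSON patch with kubectl patch to each object in Table 1. It assumes that every object has a single entry under webhooks (index 0); an object with several entries would need one patch operation per index, so check the YAML first. The function names and the PATCH variable are illustrative.

```shell
#!/bin/sh
# Sketch only. Assumption: each object in Table 1 has exactly one entry under
# "webhooks", so the field to change is at index 0.
PATCH='[{"op":"replace","path":"/webhooks/0/admissionReviewVersions","value":["v1"]}]'

# $1 = resource kind, $2 = object name.
patch_review_version() {
  kubectl patch "$1" "$2" --type=json -p "$PATCH"
}

# Patches the seven objects listed in Table 1 and stops at the first failure.
patch_all() {
  for name in jobs podgroups queues pods; do
    patch_review_version mutatingwebhookconfiguration \
      "volcano-admission-service-$name-mutate" || return 1
  done
  for name in jobs pods queues; do
    patch_review_version validatingwebhookconfiguration \
      "volcano-admission-service-$name-validate" || return 1
  done
}
```

After the definitions are loaded into the shell, run patch_all and then continue with steps 2 to 4 to delete and recreate the vcjob resource.

    patch_all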

Workaround

Before upgrading the add-on, you can take the following measures to avoid this problem:

  • To upgrade the Volcano Scheduler add-on from version 1.4.7 or earlier to any version up to and including 1.13.7, go to the Upgrade Add-on page. Set Add-on Specifications to Preset and choose Standalone, or set Add-on Specifications to Custom and configure the number of pods to 1. After making these changes, proceed with upgrading the add-on. After the add-on is upgraded, find the Volcano Scheduler add-on on the Add-ons page and click Edit. In the window that slides out from the right, change the add-on specifications as required.
  • To upgrade the Volcano Scheduler add-on from version 1.4.7 or earlier to a version later than 1.13.7, go to the Upgrade Add-on page. Set Add-on Specifications to Custom, set the number of replicas of the volcano-admission component to 1, and proceed with upgrading the add-on. After the add-on is upgraded, find the Volcano Scheduler add-on on the Add-ons page and click Edit. In the window that slides out from the right, change the add-on specifications as required.
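
After the upgrade, it may be worth confirming that no pod from an old ReplicaSet is still running, since such a pod is what resets the field. The following is a minimal sketch, assuming the admission Deployment is named volcano-admission (so that its ReplicaSets are named volcano-admission-<hash>) and runs in the kube-system namespace. The PREFIX variable and the function name are illustrative.

```shell
#!/bin/sh
# Assumption: ReplicaSets of the admission component are named
# "volcano-admission-<hash>". Adjust PREFIX if the names differ in your cluster.
PREFIX="volcano-admission-"

# Prints the names of matching ReplicaSets that still have at least one replica.
active_admission_replicasets() {
  kubectl get replicasets -n kube-system \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.replicas}{"\n"}{end}' |
    awk -v p="$PREFIX" 'index($1, p) == 1 && $2 > 0 { print $1 }'
}
```

After the definitions are loaded into the shell, run the check as follows. If more than one ReplicaSet is listed, pods of an old version may still be present, and the fields in Table 1 should be checked again once those pods are gone.

    active_admission_replicasets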