A Training Job Created Using a Custom Image Is Always in the Running State
Symptom
A training job created using a custom image remains in the Running state indefinitely.
Cause Analysis and Solution
The log message below indicates that the CPU architecture of the custom image does not match that of the resource pool node.
standard_init_linux.go:215: exec user process caused "exec format error"
libcontainer: container start initialization failed: standard_init_linux.go:215: exec user process caused "exec format error"
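This usually happens when the resource type or specifications selected during job creation do not match the image. For example, a custom image built for the Arm CPU architecture should use NPU specifications, but x86 CPU or x86 GPU specifications were selected instead. To resolve the issue, create the training job again with specifications whose CPU architecture matches that of the custom image.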
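The architecture comparison described above can be sketched in Python. This is a minimal illustration, not part of the product: the helper names normalize_arch and arch_matches are hypothetical, and the alias table assumes the common naming where container tooling reports amd64 or arm64 while the operating system reports x86_64 or aarch64.

```python
import platform

# Assumed alias table: different tools name the same architecture differently.
_ALIASES = {
    "x86_64": "x86_64",
    "amd64": "x86_64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def normalize_arch(name: str) -> str:
    """Map an architecture name to a canonical form (hypothetical helper)."""
    key = name.strip().lower()
    # Unknown names are returned unchanged (lowercased) so they never match by accident.
    return _ALIASES.get(key, key)


def arch_matches(image_arch: str, node_arch: str) -> bool:
    """Return True if the image and the node use the same CPU architecture."""
    return normalize_arch(image_arch) == normalize_arch(node_arch)


if __name__ == "__main__":
    # platform.machine() reports the architecture of the machine running this script.
    print("This machine reports:", normalize_arch(platform.machine()))
    print("arm64 image on aarch64 node:", arch_matches("arm64", "aarch64"))
    print("arm64 image on x86_64 node:", arch_matches("arm64", "x86_64"))
```

If the comparison returns False for the image and the selected specifications, the container is likely to fail with the "exec format error" shown in the log.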