Updated on 2024-01-26 GMT+08:00

A Training Job Created Using a Custom Image Is Always in the Running State

Symptom

A training job created using a custom image remains in the Running state and does not finish.

Cause Analysis and Solution

The log message below indicates that the CPU architecture of the custom image does not match that of the resource pool node.

standard_init_linux.go:215: exec user process caused "exec format error"
libcontainer: container start initialization failed: standard_init_linux.go:215: exec user process caused "exec format error"
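
When a job log is long, a small script can check whether it contains this error. The following is a minimal sketch, assuming the log has already been downloaded as text. The function name `has_exec_format_error` is hypothetical and is not part of any ModelArts SDK.

```python
def has_exec_format_error(log_text: str) -> bool:
    """Return True if any log line contains the 'exec format error' message."""
    # The marker string is taken from the log excerpt in this article.
    marker = "exec format error"
    return any(marker in line for line in log_text.splitlines())


if __name__ == "__main__":
    sample = (
        'standard_init_linux.go:215: exec user process caused "exec format error"\n'
        "libcontainer: container start initialization failed: "
        'standard_init_linux.go:215: exec user process caused "exec format error"\n'
    )
    print(has_exec_format_error(sample))
```

A match suggests that the image and the node use different CPU architectures, so check the selected specifications before investigating the training code.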

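This usually happens when the resource type and specifications selected during job creation do not match the image. For example, a custom image built for the Arm CPU architecture must run on NPU specifications, but x86 CPU or x86 GPU specifications were selected instead. To resolve the issue, create the training job again and select specifications that match the CPU architecture of the custom image.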
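
The architecture check can be done before the job is submitted. The following is a minimal sketch, assuming the image architecture is already known (for example, from the output of `docker image inspect --format '{{.Architecture}}' <image>`). The mapping from resource type to architecture is an assumption based only on the example in this article, and the names `SPEC_ARCH`, `ARCH_ALIASES`, and `image_matches_spec` are hypothetical. Confirm the actual architecture of your resource pool before relying on it.

```python
# Assumed mapping, taken from the example in this article: Arm images run on
# NPU specifications, x86 images run on CPU or GPU specifications.
SPEC_ARCH = {"npu": "arm64", "cpu": "amd64", "gpu": "amd64"}

# Common aliases for the same CPU architecture.
ARCH_ALIASES = {
    "arm64": "arm64",
    "aarch64": "arm64",
    "amd64": "amd64",
    "x86_64": "amd64",
}


def image_matches_spec(image_arch: str, resource_type: str) -> bool:
    """Return True if the image architecture fits the selected resource type."""
    arch = ARCH_ALIASES.get(image_arch.strip().lower())
    expected = SPEC_ARCH.get(resource_type.strip().lower())
    if arch is None or expected is None:
        # Refuse to guess when either value is not in the assumed tables.
        raise ValueError(
            f"unknown architecture or resource type: {image_arch!r}, {resource_type!r}"
        )
    return arch == expected


if __name__ == "__main__":
    # An Arm image with GPU specifications reproduces the mismatch described above.
    print(image_matches_spec("arm64", "gpu"))
    print(image_matches_spec("aarch64", "npu"))
```

If the function returns False, select a different resource type during job creation.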