Creating a Custom Fault

Scenarios

You can create a custom fault to run routine drills against potential faults and verify whether the fault recovery measures and the fault impact meet expectations. This helps you better prepare for a variety of challenges.

Precautions

A custom fault is defined entirely by the script you write. When such a script attacks an ECS, it may cause exceptions such as high resource usage or network faults. As a result, the UniAgent installed on the ECS may become offline or abnormal. Exercise caution when performing this operation.

Creating a Custom Fault

Create a drill task for a custom fault attack scenario on COC.

1. Log in to COC.
2. In the navigation pane on the left, choose Resilience Center > Chaos Drills. On the displayed page, click the Drill Tasks tab and create an attack task by referring to Creating and Managing Drill Tasks.
3. Enter the attack task name, select Elastic Cloud Server (ECS) as Source of Attack Target, and click Next.

Figure 1 Selecting ECS as the attack target source
4. On the Select Attack Scenario page, click Custom fault and then Custom Scripts.
- If a suitable custom fault script already exists, select it and go to step 6.
- If no custom fault script has been created, or the existing scripts do not apply to the current scenario, create one as described in step 5.
Figure 2 Selecting custom fault as the attack scenario

5. To create a custom fault script, click Scripts. The Automated O&M > Scripts page is displayed. Click Create Script. For details about how to create a script, see Creating a Custom Script. The script must follow the specifications in the following template:

```bash
#!/bin/bash
set +x

# ACTION and CAN_ROLLBACK are script parameters. COC sets ACTION
# automatically according to the current drill phase (see Table 1).

function usage() {
    echo "Usage: {inject_fault|check_fault_status|rollback|clean}"
    exit 2
}

# Fault injection logic.
function inject_fault()
{
    echo "inject fault"
}

# Fault status check, for example exit 0 if the fault is in effect
# and exit 1 if fault injection failed.
function check_fault_status()
{
    echo "check fault status"
}

# Fault rollback logic (executed only when CAN_ROLLBACK is true).
function rollback()
{
    echo "rollback"
}

# Environment cleanup logic.
function clean()
{
    echo "clean"
}

case "$ACTION" in
    inject_fault)
        inject_fault
    ;;
    check_fault_status)
        check_fault_status
    ;;
    rollback)
        if [[ X"${CAN_ROLLBACK}" == X"true" ]]; then
            rollback
        else
            echo "not support to rollback"
        fi
    ;;
    clean)
        clean
    ;;
    *)
        usage
    ;;
esac
```

You are advised to define your custom fault script based on the preceding template. Implement fault injection, fault status checking, fault rollback, and environment cleanup by writing your own logic in the inject_fault(), check_fault_status(), rollback(), and clean() functions, respectively.

The template requires two script parameters, ACTION and CAN_ROLLBACK, which are described in Table 1. Whether your script needs any other parameters depends on its content.

**Table 1** Mandatory parameters for a custom fault script

| Parameter | Value | Description |
| --- | --- | --- |
| ACTION | inject_fault | Drill operation action. The system sets this value automatically in each drill phase. Options: inject_fault (fault injection phase); check_fault_status (fault status query phase); rollback (fault injection rollback phase); clean (environment cleanup phase). |
| CAN_ROLLBACK | false | Whether rollback is supported. Options: true (the rollback() function is executed in the rollback phase); false (the rollback() function is not executed in the rollback phase). |
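Before uploading a script, you may want to see how it reacts to each ACTION value. The following is a minimal local sketch, not a COC feature: the run_phases function and the stand-in script are hypothetical, and passing ACTION and CAN_ROLLBACK as environment variables is an assumption that may not match exactly how COC delivers script parameters. The phase order in the loop is illustrative only.

```bash
#!/bin/bash
# Hypothetical local harness (not a COC feature): runs a custom fault script
# once for each ACTION value, passing ACTION and CAN_ROLLBACK as environment
# variables, and prints the exit code of every run.
run_phases() {
    local script="$1" can_rollback="$2" phase rc
    # Illustrative order only; COC decides which phases run and when.
    for phase in inject_fault check_fault_status rollback clean; do
        ACTION="$phase" CAN_ROLLBACK="$can_rollback" bash "$script"
        rc=$?
        echo "[$phase] exit code: $rc"
    done
}

# Demo: a condensed stand-in for the template's case statement.
demo_script=$(mktemp)
cat > "$demo_script" <<'EOF'
case "$ACTION" in
    inject_fault)       echo "inject fault" ;;
    check_fault_status) echo "check fault status" ;;
    rollback)
        if [[ X"${CAN_ROLLBACK}" == X"true" ]]; then
            echo "rollback"
        else
            echo "not support to rollback"
        fi ;;
    clean)              echo "clean" ;;
    *)                  echo "Usage: {inject_fault|check_fault_status|rollback|clean}"; exit 2 ;;
esac
EOF
run_phases "$demo_script" false
rm -f "$demo_script"
```

With CAN_ROLLBACK set to false, the rollback run prints "not support to rollback" and still exits with code 0, because that branch of the template only echoes a message.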

In the inject_fault function, write a label indicating that fault injection succeeded. In the check_fault_status function, check whether the label exists:

- If the label exists, check_fault_status returns normally (for example, exit 0).
- If the label does not exist, check_fault_status returns an exception (for example, exit 1).
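The label can be as simple as a marker file. In the following sketch, MARKER_FILE and SUCCESS_LABEL are hypothetical names chosen for illustration, and the functions use return instead of exit so that they can be exercised in a single shell. In a real custom fault script, the actual fault injection would precede the line that writes the label, and the script's exit code would carry the result back to the drill.

```bash
#!/bin/bash
# Sketch of the label pattern. MARKER_FILE and SUCCESS_LABEL are hypothetical.
MARKER_FILE="${MARKER_FILE:-/tmp/custom_fault.marker}"
SUCCESS_LABEL="fault injected"

inject_fault() {
    # ... perform the actual fault injection here ...
    # Write the label only after the injection itself has succeeded.
    echo "$SUCCESS_LABEL" > "$MARKER_FILE"
}

check_fault_status() {
    # Succeed only if the marker file contains the label as a whole line.
    if [[ -f "$MARKER_FILE" ]] && grep -qxF "$SUCCESS_LABEL" "$MARKER_FILE"; then
        echo "fault is in effect"
        return 0
    fi
    echo "fault injection not detected"
    return 1
}

clean() {
    # Remove the label so that the next drill starts from a clean state.
    rm -f "$MARKER_FILE"
}
```

Writing the label as the last step of inject_fault means that a failed injection leaves no label, so check_fault_status reports the failure instead of a false success.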

6. If you already have a custom script, select it by name from the drop-down list. The script content and parameters are then displayed automatically on the tab page.

Figure 3 Selecting a custom script
7. Set the timeout interval and click Next.

Timeout Interval: the maximum time allowed for script execution. The timeout interval must be longer than the actual script execution time; it is recommended that it be at least 30 seconds longer.
8. Complete the remaining steps by referring to Creating and Managing Drill Tasks to create a drill task whose attack scenario is Custom fault.
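One way to choose a timeout interval is to time a local run of the script and add the recommended 30-second margin. The recommend_timeout helper below is hypothetical. Bash's SECONDS variable only has one-second resolution, and execution time on the target ECS may differ from a local measurement, so treat the result as a minimum rather than an exact value.

```bash
#!/bin/bash
# Hypothetical helper: runs a command, measures its wall-clock duration in
# whole seconds (bash SECONDS), and prints that duration plus 30 seconds.
recommend_timeout() {
    local start=$SECONDS elapsed
    "$@" >/dev/null 2>&1
    elapsed=$(( SECONDS - start ))
    echo $(( elapsed + 30 ))
}

# Example (assumes a local copy of your script named custom_fault.sh that
# reads its parameters from environment variables):
# recommend_timeout env ACTION=check_fault_status DURATION=10 bash ./custom_fault.sh
```

For a script like the example in the next section, the check phase sleeps for DURATION seconds, so the measured time already includes that wait.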

Custom Script Example

The following is an example of a custom fault script:

```bash
#!/bin/bash
set +x
PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:~/bin
export PATH

function usage() {
    echo "Usage: {inject_fault|check_fault_status|rollback|clean}"
    exit 2
}

# Simulate a fault: create ${FILE} under ${SCRIPT_PATH}/${DIR_NAME} and
# write the success label to it.
function inject_fault()
{
    echo "============start inject fault============"
    if [ ! -d "${SCRIPT_PATH}/${DIR_NAME}" ]; then
        mkdir -p "${SCRIPT_PATH}/${DIR_NAME}"
        echo "mkdir ${SCRIPT_PATH}/${DIR_NAME} successfully"
    fi

    cd "${SCRIPT_PATH}/${DIR_NAME}" || exit 1

    if [ ! -f "${FILE}" ]; then
        touch "${FILE}"
        echo "create tmp file ${FILE}"
        touch inject.log
        chmod u+x "${FILE}"
        chmod u+x inject.log
    else
        echo "append content" >> "${FILE}"
    fi
    # Overwrite the file so that the success label is on the first line.
    echo "successfully inject" > "${FILE}"
    echo "============end inject fault============"
}

# Exit 0 if the directory or file no longer exists (the fault has ended) and
# exit 1 if the label is missing; otherwise keep the fault in effect for
# ${DURATION} seconds.
function check_fault_status()
{
    echo "============start check fault status============"
    if [ ! -d "${SCRIPT_PATH}/${DIR_NAME}" ]; then
        echo "inject has been finished"
        exit 0
    fi
    cd "${SCRIPT_PATH}/${DIR_NAME}" || exit 1
    SUCCESS_FLAG="successfully inject"

    if [ -f "${FILE}" ]; then
        if [[ "$(sed -n '1p' "${FILE}")" = "${SUCCESS_FLAG}" ]]; then
            echo "fault inject successfully"
        else
            echo "fault inject failed"
            exit 1
        fi
    else
        echo "inject finished"
        exit 0
    fi
    # Simulate the fault duration.
    sleep "${DURATION}"
    echo "============end check fault status============"
}

# Remove the fault directory (executed only when CAN_ROLLBACK is true).
function rollback()
{
    echo "============start rollback============"
    cd "${SCRIPT_PATH}" || exit 1
    if [ -d "${DIR_NAME}" ]; then
        # ${VAR:?} aborts if the variable is empty, guarding rm -rf.
        rm -rf "${SCRIPT_PATH:?}/${DIR_NAME:?}"
    fi
    echo "============end rollback============"
}

# Remove the fault directory to restore the environment.
function clean()
{
    echo "============start clean============"
    cd "${SCRIPT_PATH}" || exit 1
    if [ -d "${DIR_NAME}" ]; then
        rm -rf "${SCRIPT_PATH:?}/${DIR_NAME:?}"
    fi
    echo "============end clean============"
}

case "$ACTION" in
    inject_fault)
        inject_fault
    ;;
    check_fault_status)
        check_fault_status
    ;;
    rollback)
        if [[ X"${CAN_ROLLBACK}" == X"true" ]]; then
            rollback
        else
            echo "not support to rollback"
        fi
    ;;
    clean)
        clean
    ;;
    *)
        usage
    ;;
esac
```

The input parameters of the script are described as follows:

**Table 2** Input parameters of the custom script example

| Parameter | Value | Description |
| --- | --- | --- |
| ACTION | inject_fault | Drill operation action. |
| CAN_ROLLBACK | false | Rollback is not supported. |
| SCRIPT_PATH | /tmp | Root directory of the custom fault log. |
| DIR_NAME | test_script | Directory under SCRIPT_PATH that contains the custom fault log. |
| FILE | test.log | Name of the custom fault log file. |
| DURATION | 10 | Duration of the simulated custom fault, in seconds. It is used in the check_fault_status function and does not take effect if it is used in the inject_fault function. |
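Because the example passes SCRIPT_PATH and DIR_NAME to rm -rf and DURATION to sleep, it can be worth validating the parameters before any phase runs. The validate_params function below is a hypothetical addition, not part of the example script; requiring DURATION to be a whole number of seconds and DIR_NAME to be a single directory name are design choices made for this sketch.

```bash
#!/bin/bash
# Hypothetical pre-flight check (not part of the example script): verifies the
# Table 2 parameters before any phase runs.
validate_params() {
    local name
    # All four parameters must be set; SCRIPT_PATH and DIR_NAME feed "rm -rf".
    for name in SCRIPT_PATH DIR_NAME FILE DURATION; do
        if [[ -z "${!name:-}" ]]; then
            echo "missing parameter: $name" >&2
            return 1
        fi
    done
    # DURATION is passed to "sleep"; this sketch expects whole seconds.
    if [[ ! "$DURATION" =~ ^[0-9]+$ ]]; then
        echo "DURATION must be a non-negative integer: $DURATION" >&2
        return 1
    fi
    # DIR_NAME should be one directory name, not a path that escapes SCRIPT_PATH.
    if [[ "$DIR_NAME" == */* || "$DIR_NAME" == "." || "$DIR_NAME" == ".." ]]; then
        echo "DIR_NAME must be a single directory name: $DIR_NAME" >&2
        return 1
    fi
    return 0
}

# Usage at the top of the script, before the case statement:
# validate_params || exit 1
```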

In the example, the inject_fault function simulates a fault by creating the ${FILE} file in ${SCRIPT_PATH}/${DIR_NAME} and writing successfully inject to it. If the file contains this label, fault injection has succeeded.
In the example, the check_fault_status function first checks whether the directory and the ${FILE} file exist. If either does not exist, the fault may already have been rectified, so the function prints a message and returns exit 0. If the file exists, the function checks whether its first line is the label indicating successful fault injection. If it is, fault injection has succeeded, and sleep ${DURATION} simulates the fault duration. If it is not, fault injection has failed and the function returns exit 1.