
Deploying a Local Service for Debugging

You can debug a real-time service locally without using in-cloud resources. To do so, import or debug a model by following the operations in Importing a Model or Debugging a Model, and then deploy a predictor locally for inference.

The local service predictor can be deployed only on the Linux platform. Therefore, use a ModelArts notebook instance to deploy local services.

  • Local service predictor and real-time service predictor
    • Deploying a local service predictor deploys the model file to the local environment. The environment specifications depend on the local host. For example, you can deploy the predictor in a notebook instance of the modelarts.vm.cpu.2u flavor.
    • Deploying the predictor described in Deploying a Real-Time Service deploys the model file stored in OBS to the container provided by the Service Deployment module. The environment specifications (such as CPU and GPU specifications) are determined by the configs parameters of predictor. A configuration comparison is sketched after this list.
    • Deploying the predictor described in Deploying a Real-Time Service requires creating a container based on the AI engine, which is time-consuming. Deploying a local service predictor takes at most 10 seconds. Local service predictors can be used to test models but are not recommended for industrial applications of models.
  • In this version, the following AI engines can be used to deploy a local service predictor: XGBoost, Scikit_Learn, PyTorch, TensorFlow, and Spark_MLlib. For details about the versions, see Supported AI engines and their runtime.
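
The practical difference between the two deployments shows up in the specification field of ServiceConfig. The following is a minimal sketch, assuming a model_instance has already been created as in the sample code below; the cloud flavor name modelarts.vm.cpu.2u is only illustrative.

from modelarts.config.model_config import ServiceConfig

# Local service predictor: the model runs on the local (notebook) host
local_configs = [ServiceConfig(model_id=model_instance.get_model_id(),
                               weight="100",
                               instance_count=1,
                               specification="local")]
local_predictor = model_instance.deploy_predictor(configs=local_configs)

# Real-time service: the model runs in a cloud container whose resources are
# determined by the specification (the flavor name below is illustrative)
cloud_configs = [ServiceConfig(model_id=model_instance.get_model_id(),
                               weight="100",
                               instance_count=1,
                               specification="modelarts.vm.cpu.2u")]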

Sample Code

In ModelArts notebook, you do not need to enter authentication parameters for session authentication. For details about session authentication of other development environments, see Session Authentication.

Sample code of local TensorFlow 1.8 inference

Configure tensorflow_model_server in the environment. You can call the SDK API to configure it quickly, as shown in the following sample code.

  • In a CPU-based environment, call Model.configure_tf_infer_environ(device_type="CPU") to complete the configuration. This configuration needs to be run only once in the environment.
  • In a GPU-based environment, call Model.configure_tf_infer_environ(device_type="GPU") to complete the configuration. This configuration needs to be run only once in the environment.
from modelarts.session import Session
from modelarts.model import Model
from modelarts.config.model_config import ServiceConfig

session = Session()
# GPU-based environment inference configuration
Model.configure_tf_infer_environ(device_type="GPU")  
# CPU-based environment inference configuration
#Model.configure_tf_infer_environ(device_type="CPU")   

model_instance = Model(
                     session,
                     model_name="input_model_name",              # Model name
                     model_version="1.0.0",                      # Model version
                     source_location=model_location,             # Model file path
                     model_type="TensorFlow",                    # Model type
                     model_algorithm="image_classification",     # Model algorithm
                     execution_code="OBS_PATH",
                     input_params=input_params,                  # For details, see the input_params format description.
                     output_params=output_params,                # For details, see the output_params format description.
                     dependencies=dependencies,                  # For details, see the dependencies format description.
                     apis=apis)

configs = [ServiceConfig(model_id=model_instance.get_model_id(), weight="100", instance_count=1, 
                         specification="local")]
predictor_instance = model_instance.deploy_predictor(configs=configs)
if predictor_instance is not None:
    predict_result = predictor_instance.predict(data="your_raw_data_or_data_path", data_type="your_data_type")     # Local inference and prediction. data can be raw data or a file path, and data_type can be JSON, files, or images.
    print(predict_result)
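
As a usage note, the data and data_type arguments of predict() accept different payload forms. The lines below are a hedged sketch; the file paths and the JSON payload are illustrative and must match the model's input definition.

# JSON payload: pass raw JSON data whose shape matches the model's input definition
json_result = predictor_instance.predict(data='{"data": {"req_data": [[1.0, 2.0]]}}', data_type="JSON")
# Image file: pass a local file path (the path is illustrative)
image_result = predictor_instance.predict(data="./test.jpg", data_type="images")
# Other files: pass a local file path (the path is illustrative)
file_result = predictor_instance.predict(data="./input_data.csv", data_type="files")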

Parameters

Table 1 Parameters for deploying a local service predictor

  • service_name (Optional, String): Name of the service. The name consists of 1 to 64 characters and must start with a letter. Only letters, digits, underscores (_), and hyphens (-) are allowed.
  • configs (Mandatory, JSON Array): Local service configurations.

Table 2 configs parameters of predictor

  • model_id (Mandatory, String): Model ID. Obtain the value by calling the API described in Obtaining Models or from the ModelArts management console.
  • weight (Mandatory, Integer): Traffic weight allocated to the model. Set this parameter to 100 when deploying a local service predictor.
  • specification (Mandatory, String): Set this parameter to local when deploying a local service.
  • instance_count (Mandatory, Integer): Number of instances deployed for the model. The maximum number of instances is 5. Set this parameter to 1 when deploying a local service predictor.
  • envs (Optional, Map<String, String>): Environment variable key-value pairs required for running the model. By default, this parameter is left blank.
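
To tie Table 2 together, the following sketch builds a configs list for a local service predictor, including the optional envs map; the environment variable name and value are hypothetical. It assumes a model_instance created as in the sample code above.

from modelarts.config.model_config import ServiceConfig

configs = [ServiceConfig(
    model_id=model_instance.get_model_id(),  # Model ID from Obtaining Models or the console
    weight="100",                            # Set to 100 for a local service predictor
    specification="local",                   # Set to "local" for local deployment
    instance_count=1,                        # Set to 1 for a local service predictor
    envs={"LOG_LEVEL": "info"})]             # Optional environment variables (hypothetical example)
predictor_instance = model_instance.deploy_predictor(configs=configs)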

Table 3 Parameters returned for deploying a local service predictor

  • predictor (Predictor object): Predictor object, which contains only the attributes described in Testing an Inference Service.