Creating an Inference Service

Overview

This section describes how to create an inference service by calling APIs.

This process assumes that the tenant has been authorized on the console to use the DataArts Fabric service. For details about how to call APIs, see Calling APIs.

Preparations

hostname: Obtain the value from Regions and Endpoints.

Procedure

  1. Call the API for creating a workspace to create a workspace and record the workspace ID returned by the API.

    Example request

    POST https://{hostname}/v1/workspaces
    Body:
    {
      "name": "apieworkspace",
      "description": "apie test workspace"
    }
    Example response
    {
      "id": "e935d0ef-f4eb-4b95-aff1-9d33ae9f57a6",
      "name": "apieworkspace",
      "description": "apie test workspace",
      "create_time": "2023-05-30T12:24:30.401Z",
      "create_domain_name": "admin",
      "create_user_name": "user",
      "metastore_id": "2180518f-42b8-4947-b20b-adfc53981a25",
      "access_url": "https://test.fabric.com/",
      "enterprise_project_id": "01049549-82cd-4b2b-9733-ddb94350c125"
    }
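
    For reference, the same request can be issued programmatically. The following is a minimal Python sketch using the requests library; it assumes token-based authentication through an X-Auth-Token header (see Calling APIs), and the hostname and token values are placeholders.

    import requests

    HOSTNAME = "<hostname>"      # obtain from Regions and Endpoints
    TOKEN = "<your-auth-token>"  # assumption: X-Auth-Token authentication

    resp = requests.post(
        f"https://{HOSTNAME}/v1/workspaces",
        headers={"X-Auth-Token": TOKEN, "Content-Type": "application/json"},
        json={"name": "apieworkspace", "description": "apie test workspace"},
    )
    resp.raise_for_status()
    workspace_id = resp.json()["id"]  # record the workspace ID for later steps
    print(workspace_id)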
  2. Call the API for creating an endpoint to create an inference endpoint and record the endpoint ID returned by the API.

    Example request

    POST https://{hostname}/v1/workspaces/{workspace_id}/endpoints

    workspace_id: the workspace ID recorded in step 1.

    Body:
    {
      "name": "apie_test",
      "description": "apie test endpoint",
      "type": "inference",
      "reserved_resource": {
        "mu": {
          "spec_code": "mu.llama3.8b",
          "min": 0,
          "max": 1
        }
      }
    }
    Example response
    {
      "visibility": "PRIVATE",
      "id": "0b5633ba2b904511ad514346f4d23d4b",
      "name": "endpoint1",
      "type": "inference",
      "status": "CREATING",
      "description": "description",
      "create_time": "2023-05-30T12:24:30.401Z",
      "update_time": "2023-05-30T12:24:30.401Z",
      "owner": {
        "domain_name": "string",
        "domain_id": "xxx",
        "user_name": "string",
        "user_id": "xxx"
      },
      "reserved_resource": {
        "mu": {
          "spec_code": "mu.llama3.8b",
          "min": 0,
          "max": 1
        }
      }
    }
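
    For reference, a minimal Python sketch of the same call, under the same authentication assumption as in step 1 (the hostname, token, and workspace ID are placeholders):

    import requests

    HOSTNAME = "<hostname>"                      # obtain from Regions and Endpoints
    TOKEN = "<your-auth-token>"                  # assumption: X-Auth-Token authentication
    workspace_id = "<workspace-id-from-step-1>"

    resp = requests.post(
        f"https://{HOSTNAME}/v1/workspaces/{workspace_id}/endpoints",
        headers={"X-Auth-Token": TOKEN, "Content-Type": "application/json"},
        json={
            "name": "apie_test",
            "description": "apie test endpoint",
            "type": "inference",
            "reserved_resource": {
                "mu": {"spec_code": "mu.llama3.8b", "min": 0, "max": 1}
            },
        },
    )
    resp.raise_for_status()
    endpoint_id = resp.json()["id"]  # record the endpoint ID for step 4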
  3. Call the API for creating a model to create a private model and record the model ID returned by the API.

    Example request

    POST https://{hostname}/v1/workspaces/{workspace_id}/models

    workspace_id: the workspace ID recorded in step 1.

    Body:

    {
      "name": "LLama3-8b",
      "description": "this is a apie test model",
      "type": "LLM_MODEL",
      "version": {
        "name": "v1",
        "description": "test description",
        "config": {
          "llm_model_config": {
            "base_model_type": "",
            "model_path": ""
          }
        }
      }
    }

    Example response

    {
      "id": "ac8111bf-3601-4905-8ddd-b41d3e636a4e"
    }
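
    For reference, a minimal Python sketch of the same call (authentication as in step 1; base_model_type and model_path are left empty here, as in the example body, and must be set for your model):

    import requests

    HOSTNAME = "<hostname>"                      # obtain from Regions and Endpoints
    TOKEN = "<your-auth-token>"                  # assumption: X-Auth-Token authentication
    workspace_id = "<workspace-id-from-step-1>"

    resp = requests.post(
        f"https://{HOSTNAME}/v1/workspaces/{workspace_id}/models",
        headers={"X-Auth-Token": TOKEN, "Content-Type": "application/json"},
        json={
            "name": "LLama3-8b",
            "description": "this is an apie test model",
            "type": "LLM_MODEL",
            "version": {
                "name": "v1",
                "description": "test description",
                "config": {
                    "llm_model_config": {
                        "base_model_type": "",  # set to your base model type
                        "model_path": ""        # set to your model path
                    }
                },
            },
        },
    )
    resp.raise_for_status()
    model_id = resp.json()["id"]  # record the model ID for step 4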
  4. Call the API for creating an inference service to create an inference service and record the inference service ID returned by the API.

    Example request

    POST https://{hostname}/v1/workspaces/{workspace_id}/services/instances

    workspace_id: the workspace ID recorded in step 1.

    Body:

    {
      "source": {
        "id": ""
      },
      "name": "test_serviceInstanceName",
      "description": "description",
      "endpoint_id": ""
    }
    • id: the model ID returned by the API in step 3.
    • endpoint_id: the inference endpoint ID returned by the API in step 2.

    Example response

    {
      "id": "b935d0ef-f4eb-4b95-aff1-9d33ae9f57b6"
    }
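
    For reference, a minimal Python sketch of the same call (authentication as in step 1; the IDs are the values recorded in steps 1 to 3):

    import requests

    HOSTNAME = "<hostname>"                      # obtain from Regions and Endpoints
    TOKEN = "<your-auth-token>"                  # assumption: X-Auth-Token authentication
    workspace_id = "<workspace-id-from-step-1>"
    model_id = "<model-id-from-step-3>"
    endpoint_id = "<endpoint-id-from-step-2>"

    resp = requests.post(
        f"https://{HOSTNAME}/v1/workspaces/{workspace_id}/services/instances",
        headers={"X-Auth-Token": TOKEN, "Content-Type": "application/json"},
        json={
            "source": {"id": model_id},
            "name": "test_serviceInstanceName",
            "description": "description",
            "endpoint_id": endpoint_id,
        },
    )
    resp.raise_for_status()
    instance_id = resp.json()["id"]  # record the inference service ID for step 5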
  5. Call the inference API to initiate an inference request.

    Example request

    POST https://{hostname}/v1/workspaces/{workspace_id}/services/instances/{instance_id}/invocations
    • workspace_id: the workspace ID recorded in step 1.
    • instance_id: the inference service ID recorded in step 4.

    Body:

    {
      "messages": [
        {
          "role": "user",
          "content": "hello"
        }
      ]
    }

    Example response. The inference API returns results in streaming mode.

    {
      "id": "chatcmpl-62dda7304f53451c9477e0",
      "object": "chat.completion.chunk",
      "created": 1730120529,
      "model": "ada1d67d-f2a1-4e77-838f-0d8688d756f4",
      "choices": [
        {
          "index": 0,
          "delta": {
            "role": "assistant",
            "content": "\n\nHello! LLM stands for Large Language Model. It refers to artificial intelligence models, like myself,"
          },
          "finish_reason": null
        }
      ],
      "system_fingerprint": null,
      "usage": null
    }
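
    For reference, a minimal Python sketch that sends the request and consumes the streamed chunks. The chunk framing is an assumption: the sketch accepts both plain newline-delimited JSON and SSE-style lines prefixed with "data: ".

    import json
    import requests

    HOSTNAME = "<hostname>"                      # obtain from Regions and Endpoints
    TOKEN = "<your-auth-token>"                  # assumption: X-Auth-Token authentication
    workspace_id = "<workspace-id-from-step-1>"
    instance_id = "<service-id-from-step-4>"

    resp = requests.post(
        f"https://{HOSTNAME}/v1/workspaces/{workspace_id}/services/instances/{instance_id}/invocations",
        headers={"X-Auth-Token": TOKEN, "Content-Type": "application/json"},
        json={"messages": [{"role": "user", "content": "hello"}]},
        stream=True,  # the API returns results in streaming mode
    )
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if not line:
            continue
        if line.startswith("data: "):   # strip SSE framing if present
            line = line[len("data: "):]
        if line.strip() == "[DONE]":    # common end-of-stream sentinel
            break
        chunk = json.loads(line)
        delta = chunk["choices"][0]["delta"].get("content") or ""
        print(delta, end="", flush=True)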