Creating an Inference Service
Overview
This section describes how to create an inference service by calling APIs.
This process assumes that the end tenant has been authorized on the console to use the DataArtsFabric service. For details about how to call APIs, see Calling APIs.
Preparations
hostname: Obtain the value from Regions and Endpoints.
Procedure
1. Call the API for creating a workspace, and record the workspace ID returned by the API. A request sketch follows the example response.
POST https://{hostname}/v1/workspaces
Body:{ "name": "apieworkspace", "description": "apie test workspace" } Example response { "id": "e935d0ef-f4eb-4b95-aff1-9d33ae9f57a6", "name": "fabric", "description": "fabric", "create_time": "2023-05-30T12:24:30.401Z", "create_domain_name": "admin", "create_user_name": "user", "metastore_id": "2180518f-42b8-4947-b20b-adfc53981a25", "access_url": "https://:test.fabric.com/", "enterprise_project_id": "01049549-82cd-4b2b-9733-ddb94350c125" }
2. Call the API for creating an endpoint to create an inference endpoint, and record the endpoint ID returned by the API. A request sketch follows the example response.
POST https://{hostname}/v1/workspaces/{workspace_id}/endpoints
- workspace_id: the workspace ID recorded in step 1.
Body:{ "name": "apie_test", "description": "apie test endpoint", "type": "inference", "reserved_resource": { "mu": { "spec_code": "mu.llama3.8b", "min": 0, "max": 1 } } }
Example response{ "visibility": "PRIVATE", "id": "0b5633ba2b904511ad514346f4d23d4b", "name": "endpoint1", "type": "inference", "status": "CREATING", "description": "description", "create_time": "2023-05-30T12:24:30.401Z", "update_time": "2023-05-30T12:24:30.401Z", "owner": { "domain_name": "string", "domain_id": "xxx", "user_name": "string", "user_id": "xxx" } "reserved_resource": { "mu": { "spec_code": "mu.llama3.8b", "min": 0, "max": 1 } } }
3. Call the API for creating a model to create a private model, and record the model ID returned by the API. A request sketch follows the example response.
POST https://{hostname}/v1/workspaces/{workspace_id}/models
- workspace_id: the workspace ID recorded in step 1.
Body:

{
  "name": "LLama3-8b",
  "description": "this is a apie test model",
  "type": "LLM_MODEL",
  "version": {
    "name": "v1",
    "description": "test description",
    "config": {
      "llm_model_config": {
        "base_model_type": "",
        "model_path": ""
      }
    }
  }
}
Example response:

{
  "id": "ac8111bf-3601-4905-8ddd-b41d3e636a4e"
}
4. Call the API for creating an inference service, and record the inference service ID returned by the API. A request sketch follows the example response.
POST https://{hostname}/v1/workspaces/{workspace_id}/services/instances
- workspace_id: the workspace ID recorded in step 1.
Body:

{
  "source": {
    "id": ""
  },
  "name": "test_serviceInstanceName",
  "description": "description",
  "endpoint_id": ""
}
- id: the model ID returned by the API in step 3.
- endpoint_id: the inference endpoint ID returned by the API in step 2.
Example response:

{
  "id": "b935d0ef-f4eb-4b95-aff1-9d33ae9f57b6"
}
5. Call the API for initiating an inference request. A request sketch follows the example response.
POST https://{hostname}/v1/workspaces/{workspace_id}/services/instances/{instance_id}/invocations
- workspace_id: the workspace ID recorded in step 1.
- instance_id: the inference service ID recorded in step 4.
Body:

{
  "messages": [
    {
      "role": "user",
      "content": "hello"
    }
  ]
}
Example response (the inference API returns results in streaming mode):
{ "id": "chatcmpl-62dda7304f53451c9477e0", "object": "chat.completion.chunk", "created": 1730120529, "model": "ada1d67d-f2a1-4e77-838f-0d8688d756f4", "choices": [ { "index": 0, "delta": { "role": "assistant", "content": "\n\nHello! LLM stands for Large Language Model. It refers to artificial intelligence models, like myself," }, "finish_reason": null } ], "system_fingerprint": null, "usage": null }