
Updated on 2025-05-28 GMT+08:00

Accessing a Real-Time Service Using Server-Sent Events

Context

Server-Sent Events (SSE) is a server push technology that lets a server send events to a client over a long-lived HTTP connection. It is typically used to deliver real-time data to a client, such as messages in a chat application or live news updates.

SSE provides unidirectional real-time communication from the server to the client, for example, streaming ChatGPT responses. Unlike WebSocket, which provides bidirectional real-time communication, SSE is more lightweight and simpler to implement.
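To make the event format concrete, the following is a minimal sketch of a parser for the standard text/event-stream wire format, in which each event is a group of "field: value" lines terminated by a blank line. It is a simplified illustration rather than a complete client: the function name parse_sse is hypothetical, and reconnection and the retry field are not handled.

```python
from typing import Dict, Iterable, Iterator, List


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per SSE event from an iterable of decoded text lines.

    Hypothetical helper for illustration; not part of any ModelArts SDK.
    """
    event: Dict[str, str] = {}
    data_parts: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                # Multiple data lines are joined with newlines.
                event["data"] = "\n".join(data_parts)
                yield event
            event, data_parts = {}, []
            continue
        if line.startswith(":"):
            # Lines starting with a colon are comments (often keep-alives).
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # Only a single leading space is stripped.
        if field == "data":
            data_parts.append(value)
        else:
            event[field] = value  # For example "event" or "id".


if __name__ == "__main__":
    sample = ["event: message\n", "data: Hello\n", "data: world\n", "\n"]
    for parsed in parse_sse(sample):
        print(parsed)
```

An event that is not followed by a blank line is treated as incomplete and dropped, which matches how browsers handle a truncated stream.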

Prerequisites

The custom image used to import the model supports SSE.

Constraints

SSE is supported only for real-time services.
SSE is supported only for real-time services deployed using models imported from custom images.
When you call an API to access a real-time service, the following limits apply to the request body size and the prediction time:
- The request body cannot exceed 12 MB. Larger requests will fail.
- Due to an API Gateway limitation, the prediction time of each request cannot exceed 40 seconds.
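The request body limit can be checked on the client before a request is sent. The following is a minimal sketch, assuming that "12 MB" means 12 x 1024 x 1024 bytes and that the request body is JSON; the names MAX_BODY_BYTES, MAX_PREDICTION_SECONDS, and encode_body are hypothetical.

```python
import json

# Assumption: "12 MB" is interpreted as 12 * 1024 * 1024 bytes.
MAX_BODY_BYTES = 12 * 1024 * 1024
# Upper bound on the prediction time of one request, in seconds.
MAX_PREDICTION_SECONDS = 40


def encode_body(payload: dict) -> bytes:
    """Serialize a payload to JSON and reject it if it exceeds the size limit."""
    body = json.dumps(payload).encode("utf-8")
    if len(body) > MAX_BODY_BYTES:
        raise ValueError(
            f"Request body is {len(body)} bytes; the limit is {MAX_BODY_BYTES} bytes."
        )
    return body
```

Checking the size locally avoids uploading a large body only to have the request fail.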

Calling an SSE Real-Time Service

The SSE protocol itself does not introduce new authentication mechanisms; it relies on the same methods as HTTP requests.

The following example uses Postman, a GUI tool, with token authentication to describe how to call an SSE service.
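The same call can also be made from code. The following is a minimal sketch that uses only the Python standard library. It assumes that the token is passed in the X-Auth-Token request header and that the service accepts a JSON request body; the URL, the token value, and the payload fields are placeholders that must be replaced with the values for your own service.

```python
import json
import urllib.request
from typing import Iterator


def build_sse_request(url: str, token: str, payload: dict) -> urllib.request.Request:
    """Build a POST request for an SSE real-time service (illustrative helper)."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-Auth-Token": token,  # Assumption: token authentication header.
            "Content-Type": "application/json",
            "Accept": "text/event-stream",
        },
        method="POST",
    )


def stream_lines(request: urllib.request.Request, timeout: float = 40.0) -> Iterator[str]:
    """Send the request and yield the response line by line as it arrives.

    The timeout is a socket timeout that applies to each blocking read,
    not to the total duration of the request.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        for raw in response:
            yield raw.decode("utf-8").rstrip("\r\n")


if __name__ == "__main__":
    # Placeholder values: replace with your service URL, token, and input.
    req = build_sse_request("https://example.com/v1/infer", "YOUR_TOKEN", {"prompt": "hi"})
    for line in stream_lines(req):
        print(line)
```

Reading the response line by line, instead of waiting for the full body, is what allows the client to display events as soon as the server pushes them.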

Figure 1 Calling an SSE service

Figure 2 Response header Content-Type

In normal cases, the value of Content-Type in the response header is text/event-stream;charset=UTF-8.
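A client can verify this header before it starts reading events. The following is a minimal sketch; the function name is_event_stream is hypothetical, and the check ignores parameters such as charset as well as letter case.

```python
def is_event_stream(content_type: str) -> bool:
    """Return True if a Content-Type header value denotes an SSE stream."""
    # Drop parameters such as ";charset=UTF-8" and compare the media type only.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/event-stream"
```

A different media type, such as application/json, may indicate that the service returned an error body instead of an event stream.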

