Accessing a Real-Time Service Using Server-Sent Events

Context

Server-Sent Events (SSE) is a server push technology that lets a server push events to a client over an HTTP connection. It is typically used for one-way, real-time data delivery from the server to the client, for example, real-time news updates or stock prices.

SSE provides unidirectional real-time communication from the server to the client, as in streaming ChatGPT responses. Unlike WebSocket, which provides bidirectional real-time communication, SSE is designed to be more lightweight and simpler to implement.

Key features of SSE include:

Easy to use: SSE is based on the HTTP protocol and is straightforward to implement. No complex configurations or additional libraries are needed, and data can be pushed in real-time over standard HTTP connections.
Automatic reconnection: If the connection is interrupted, the client automatically attempts to reconnect so that data delivery can resume.
Unidirectional communication: SSE is unidirectional, meaning the server can send events to the client, but the client cannot send data back to the server through the same connection.
Low resource usage: SSE uses standard HTTP connections, making it less resource-intensive than other real-time communication protocols such as WebSocket. This makes SSE well suited to lightweight real-time data push scenarios.
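
To make the wire format behind these features concrete, the following is a minimal sketch of how a client might parse an SSE stream. It follows the general SSE text format (fields such as data, event, and id, with a blank line ending each event). The function name parse_sse and the sample stream are illustrative assumptions and are not part of any service SDK.

```python
from typing import Dict, Iterable, Iterator, List


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per event from an iterable of decoded text lines.

    Hypothetical helper for illustration; "retry" and unknown fields are ignored.
    """
    event_type = ""
    data_lines: List[str] = []
    last_id = ""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; events without data are dropped.
            if data_lines:
                yield {
                    "event": event_type or "message",
                    "data": "\n".join(data_lines),
                    "id": last_id,
                }
            event_type = ""
            data_lines = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is stripped
        if field == "data":
            data_lines.append(value)
        elif field == "event":
            event_type = value
        elif field == "id":
            last_id = value


# Sample stream (made up for illustration): one named event, one multi-line event.
sample = [
    "event: token\n", "data: Hello\n", "\n",
    ": keep-alive\n",
    "data: line 1\n", "data: line 2\n", "\n",
]
for evt in parse_sse(sample):
    print(evt)
```

In practice, the lines would come from a streaming HTTP response rather than a list. The parser only needs an iterable of text lines, so it does not depend on a particular HTTP client.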

Prerequisites

The custom image used to import the model is SSE-compliant.
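
This document does not spell out what SSE compliance requires of the image. In general, an SSE server responds with the Content-Type text/event-stream and writes each event as one or more "field: value" lines followed by a blank line. The sketch below shows one way the inference code inside an image might format such events. The names SSE_HEADERS and format_sse_event are hypothetical, and the exact headers expected by the service are an assumption.

```python
from typing import Optional

# Response headers commonly sent for an SSE stream (assumed, not service-specific).
SSE_HEADERS = {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
}


def format_sse_event(
    data: str,
    event: Optional[str] = None,
    event_id: Optional[str] = None,
) -> bytes:
    """Serialize one event in the SSE text format (hypothetical helper)."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    if event_id is not None:
        lines.append(f"id: {event_id}")
    # Each line of a multi-line payload needs its own "data:" field.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # A blank line terminates the event.
    return ("\n".join(lines) + "\n\n").encode("utf-8")


print(format_sse_event("partial result", event="token", event_id="1").decode("utf-8"))
```

A server would write these bytes to the open response and flush after each event so that the client receives data as it is produced.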

Constraints

SSE is supported only for real-time services.
The real-time service must be deployed using a model imported from a custom image.
When you call an API to access a real-time service, the size of the prediction request body and the prediction time are subject to the following limitations:
- The size of a request body cannot exceed 12 MB. Otherwise, the request will fail.
- Due to an API Gateway limitation, the prediction duration of each request cannot exceed 40 seconds.
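
A client can check these limits before sending a request. The following is a minimal sketch, assuming a JSON request body and assuming that 12 MB means 12 x 1024 x 1024 bytes (this document does not state which definition applies). The function and constant names are hypothetical.

```python
import json

# Assumption: "12 MB" is interpreted as 12 * 1024 * 1024 bytes.
MAX_BODY_BYTES = 12 * 1024 * 1024
# Prediction duration limit per request, usable as a reference for a client-side timeout.
MAX_PREDICTION_SECONDS = 40


def encode_request_body(payload: dict) -> bytes:
    """Serialize the payload and fail early if it exceeds the size limit."""
    body = json.dumps(payload).encode("utf-8")
    if len(body) > MAX_BODY_BYTES:
        raise ValueError(
            f"Request body is {len(body)} bytes; the limit is {MAX_BODY_BYTES} bytes."
        )
    return body


body = encode_request_body({"prompt": "hi"})
print(len(body), "bytes")
```

Failing locally avoids sending a request that the service would reject. The 40-second value cannot be enforced on the client, but it indicates how long to wait before treating a request as failed.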