Updated on 2024-11-26 GMT+08:00

Multipart Upload

Scenarios

Multipart upload allows you to upload a single object as a group of parts. Each part is a contiguous portion of the object data. You can upload these parts separately and in any sequence. If a part fails to be uploaded, you can upload it again without affecting other parts. After all parts are uploaded, OBS assembles these parts to create the object.

Generally, multipart upload is recommended for objects of 100 MB or larger. For example, if you want to upload a 500 MB object to an OBS bucket, you can use OBS Browser+ for multipart upload. OBS Browser+ divides the object into small parts and then uploads the parts. Alternatively, you can call the multipart upload APIs directly. Either way, multipart upload improves upload efficiency and reduces failures.

Advantages of Multipart Upload

  • Improved throughput: You can upload parts in parallel to improve throughput.
  • Quick recovery from network errors: Small parts minimize the impact of restarting an upload that failed because of a network error.
  • Convenient pausing and resuming of object uploads: You can upload parts at any time. A multipart upload does not expire after being initiated. You must explicitly complete or abort it.
  • Starting an upload before the object size is known: You can upload an object while creating it.
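
The first two advantages can be illustrated with a minimal Python sketch that uploads parts in parallel and retries only the part that failed. The upload_part argument is a hypothetical callable that stands in for whatever SDK or API call you use; it is not an OBS SDK function, and the worker and retry counts are arbitrary assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def upload_all_parts(parts, upload_part, max_workers=4, max_attempts=3):
    """Upload parts in parallel and retry only the parts that fail.

    parts: dict mapping part number -> bytes
    upload_part: hypothetical callable (part_number, data) -> ETag string
    Returns a dict mapping part number -> ETag.
    """
    def upload_with_retry(item):
        part_number, data = item
        for attempt in range(1, max_attempts + 1):
            try:
                return part_number, upload_part(part_number, data)
            except OSError:
                # Network-style errors: retry this part only.
                if attempt == max_attempts:
                    raise

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(upload_with_retry, parts.items()))
```

Because each part is retried independently, a transient failure costs one part's worth of data rather than the whole object.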

Constraints

Table 1 Constraints on multipart upload

| Item | Constraints |
| --- | --- |
| Maximum object size | 48.8 TB |
| Maximum number of parts for each upload task | 10,000 |
| Part number | 1 to 10,000 (inclusive) |
| Part size | 5 MB to 5 GB. The last part can be from 0 bytes to 5 GB. |
| Maximum number of parts returned in response to the request for listing uploaded parts | 1,000 |
| Maximum number of multipart uploads returned in response to the request for listing initiated multipart uploads | 1,000 |

If you have more than 48.8 TB of data to upload, refer to Migrating Local Data to OBS.
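
The limits in Table 1 are related: 10,000 parts of 5 GB each give 50,000 GB, which is about 48.8 TB if the units are read as binary multiples (an assumption here). The following Python sketch picks a part size that keeps an object within these limits. The function name plan_parts and its parameters are hypothetical and are not part of any OBS SDK.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

MIN_PART_SIZE = 5 * MIB   # lower bound for every part except the last
MAX_PART_SIZE = 5 * GIB   # upper bound for any part
MAX_PARTS = 10_000        # maximum number of parts per upload task


def plan_parts(object_size, preferred_part_size=MIN_PART_SIZE):
    """Return (part_size, part_count) for an object of object_size bytes."""
    if object_size < 0:
        raise ValueError("object_size must be non-negative")
    if object_size > MAX_PART_SIZE * MAX_PARTS:
        raise ValueError("object exceeds the multipart upload size limit")
    # Smallest part size that still fits the object into MAX_PARTS parts
    # (integer ceiling division).
    needed = (object_size + MAX_PARTS - 1) // MAX_PARTS
    part_size = max(MIN_PART_SIZE, preferred_part_size, needed)
    part_size = min(part_size, MAX_PART_SIZE)
    # An empty object is still planned as a single zero-byte part.
    part_count = max(1, (object_size + part_size - 1) // part_size)
    return part_size, part_count
```

For the 500 MB example above, plan_parts(500 * MIB) yields 100 parts of 5 MB each.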

Multipart Upload Process

Figure 1 Multipart upload
  1. Divide a file to be uploaded into parts.
  2. Initiate a multipart upload.

    When you send a request to start multipart upload, OBS returns a response with the upload ID, which is the unique identifier of the multipart upload. This ID must be included in the request for uploading parts, listing uploaded parts, completing a multipart upload, or aborting a multipart upload.

  3. Upload parts.

    When uploading a part, you must specify the upload ID and a part number. You can select any part number between 1 and 10,000. A part number uniquely identifies a part and its location in the object you are uploading. If the number of an uploaded part is used to upload a new part, the uploaded part will be overwritten.

    Whenever you upload a part, OBS returns the ETag header in the response. For each part upload, you must record the part number and the ETag value. These part numbers and ETag values are required in subsequent requests to complete the multipart upload task.

    When concurrent upload operations are performed for the same part of an object, the server follows the Last Write Wins policy, where the time of the last write is the time when the part metadata is created. To ensure data accuracy, the client must use a lock to serialize concurrent uploads of the same part. Concurrent uploads of different parts of an object do not require a lock.

  4. (Optional) Copy parts.

    After initiating a multipart upload, you can specify its upload ID and call the multipart copy API to copy part or all of an already uploaded object as parts of the multipart upload.

    If you copy the source object as a part called part1 and another part1 already exists, the original part1 is overwritten by the new one. After the copy succeeds, only the new part1 can be listed and the original part1 is deleted. Therefore, before copying a part, ensure that the target part does not exist or is no longer needed. Otherwise, data may be deleted by mistake. The source object does not change during the copy.

    You cannot determine whether a request is successful based only on the status_code in the returned HTTP header. A status_code of 200 only indicates that the server has received the request and started to process it. The copy is successful only when the response body contains an ETag.

  5. (Optional) Abort the multipart upload.

    You can abort a multipart upload. After a multipart upload is aborted, its upload ID can no longer be used to upload parts, and OBS releases the storage of all uploaded parts. If part uploads are still in progress when you abort the multipart upload, those uploads still run to completion and may succeed or fail. To release the storage occupied by all parts, abort the multipart upload again after all in-progress part uploads have finished.

  6. (Optional) List parts.
    • Listing uploaded parts

      You can list the parts of a specific multipart upload task or the parts of all the multipart upload tasks in progress. For each request to list uploaded parts, OBS returns information about the uploaded parts in the specified multipart upload. Information about a maximum of 1,000 parts can be returned per request. If there are more than 1,000 parts in a multipart upload, you need to send multiple requests to list all uploaded parts. The list of uploaded parts does not include assembled parts.

      A returned list can only be used for verification. After a multipart upload is complete, the result in the list is no longer valid. Therefore, keep your own record of the part numbers you specified and the ETag values that OBS returned when the parts were uploaded.

    • Listing multipart uploads

      You can list initiated multipart uploads by listing the multipart uploads in the bucket. Initiated multipart uploads refer to the multipart uploads that are not assembled or aborted after initiation. A maximum of 1,000 multipart uploads can be returned for each request. If there are more than 1,000 multipart uploads in progress, you need to send more requests to query the remaining tasks.

  7. Assemble parts.

    OBS creates an object by assembling the parts in ascending order of part number. If any object metadata was provided when the multipart upload was initiated, OBS associates the metadata with the object. After the request to complete the multipart upload succeeds, the parts no longer exist. A part assembling request must contain the upload ID and a list of part numbers with their corresponding ETag values. The OBS response includes an ETag that uniquely identifies the combined object data. The ETag is not an MD5 hash of the object data.

    • After the multipart upload task is initiated and one or more parts are uploaded, you must assemble the parts or abort the multipart upload. Otherwise, you are charged for the storage of the uploaded parts. OBS releases the storage and stops charging for it only after the uploaded parts are assembled or the multipart upload is aborted.
    • If 10 parts are uploaded but only nine parts are assembled, the parts that are not assembled are automatically deleted by the system and cannot be restored. Before assembling the parts, call the API for listing uploaded parts and check that no part is missing.
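
The steps above can be modeled with a small in-memory Python sketch. It is a toy stand-in for a bucket and only illustrates the behavior described here: upload IDs, overwriting by part number, paged listing, assembly in ascending order, and the discarding of parts left out of the assembly. The class ToyMultipartStore and its method names are hypothetical, and the SHA-256 digest used as a part ETag is an assumption for illustration, not how OBS computes ETags.

```python
import hashlib
import uuid


class ToyMultipartStore:
    """In-memory stand-in for a bucket; illustrates the steps, not the OBS API."""

    def __init__(self):
        self.uploads = {}   # upload ID -> {part number: (etag, data)}
        self.objects = {}   # object key -> bytes

    def initiate(self):
        upload_id = uuid.uuid4().hex
        self.uploads[upload_id] = {}
        return upload_id

    def upload_part(self, upload_id, part_number, data):
        if not 1 <= part_number <= 10_000:
            raise ValueError("part number must be between 1 and 10,000")
        etag = hashlib.sha256(data).hexdigest()  # illustrative ETag only
        # Reusing a part number overwrites the earlier part.
        self.uploads[upload_id][part_number] = (etag, data)
        return etag

    def list_parts(self, upload_id, marker=0, max_parts=1000):
        """Return one page of part numbers after marker, plus a truncation flag."""
        numbers = sorted(n for n in self.uploads[upload_id] if n > marker)
        return numbers[:max_parts], len(numbers) > max_parts

    def complete(self, upload_id, key, parts):
        """parts: list of (part number, ETag) pairs recorded by the client."""
        stored = self.uploads[upload_id]
        for number, etag in parts:
            if number not in stored or stored[number][0] != etag:
                raise ValueError(f"part {number} is missing or its ETag does not match")
        # Assemble in ascending part-number order; unlisted parts are discarded.
        self.objects[key] = b"".join(stored[n][1] for n, _ in sorted(parts))
        del self.uploads[upload_id]

    def abort(self, upload_id):
        del self.uploads[upload_id]  # releases all uploaded parts


store = ToyMultipartStore()
upload_id = store.initiate()
recorded = {}  # client-side record of part number -> ETag
recorded[2] = store.upload_part(upload_id, 2, b"world")
recorded[1] = store.upload_part(upload_id, 1, b"hello ")
recorded[1] = store.upload_part(upload_id, 1, b"HELLO ")  # overwrites part 1
store.complete(upload_id, "greeting.txt", sorted(recorded.items()))
print(store.objects["greeting.txt"])  # b'HELLO world'
```

Note that the client keeps its own record of part numbers and ETag values and passes that record to the assembly step, rather than relying on a listing.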

Permissions

You can perform multipart upload only after being granted the required permissions. You can use ACLs, bucket policies, or user policies to grant users the permissions. The following table lists multipart upload operations and the required permissions that can be granted by ACLs, bucket policies, or user policies.

| Operation | Required Permissions |
| --- | --- |
| Initiate a multipart upload. | To perform this operation, you need to have the PutObject permission. A bucket owner can grant the PutObject permission to others. |
| Upload parts. | To perform this operation, you need to have the PutObject permission. Only the initiator of a multipart upload can upload parts. The bucket owner must grant the multipart upload initiator the PutObject permission so that the initiator can upload parts of the object. |
| Copy parts. | To perform this operation, you need to have the PutObject permission as well as the GetObject permission on the object to be copied. Only the initiator of a multipart upload can copy parts. The bucket owner must grant the multipart upload initiator the PutObject permission so that the initiator can copy parts of the object. |
| Assemble parts. | To perform this operation, you need to have the PutObject permission. Only the initiator of a multipart upload can assemble parts. The bucket owner must grant the multipart upload initiator the PutObject permission so that the initiator can complete the multipart upload. |
| Abort the multipart upload. | To perform this operation, you need to have the AbortMultipartUpload permission. By default, only the bucket owner and the multipart upload initiator have this permission. In addition to the default configuration, the bucket owner can allow trustees to perform this operation. The bucket owner can also deny any trustee permission to perform this operation. |
| List uploaded parts. | To perform this operation, you need to have the ListMultipartUploadParts permission. By default, the bucket owner can list the uploaded parts of any multipart upload to the bucket. The multipart upload initiator can list the uploaded parts of a specific multipart upload. In addition to the default configuration, the bucket owner can allow trustees to perform this operation. The bucket owner can also deny any trustee permission to perform this operation. |
| List multipart uploads. | To list multipart uploads to the bucket, you need to have the ListBucketMultipartUploads permission. In addition to the default configuration, the bucket owner can allow trustees to perform this operation. |

Important Notes

  • A directory cannot be uploaded. Only one object can be uploaded at a time.
  • Each request can upload only one part, but multiple requests can be initiated at the same time.
  • If you want to upload a large number of Deep Archive objects, you can upload them in the Standard storage class and then transition them to the Deep Archive storage class through lifecycle rules to reduce the cost of PUT requests.
  • If a large number of objects need to be uploaded, do not name the objects using sequential prefixes, such as timestamps or alphabetically ordered names. Objects named with sequential prefixes may be stored in the same storage partition. In this case, a large number of access requests for the objects cannot be handled efficiently.
  • It is recommended to enable versioning to prevent objects with the same name from being overwritten. Versioning keeps every version of objects in the same bucket. You can restore any historical version at any time.
  • When you upload using SDKs, a checkpoint file records the upload status. Ensure that you have write permissions for the checkpoint file.
  • Do not modify the verification information in the checkpoint file. If the checkpoint file is damaged, all parts will be uploaded again.
  • If the local file changes during the upload, all parts will be uploaded again.
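
For the note above on sequential prefixes, one possible approach (an assumption here, not a method this page prescribes) is to prepend a short hash-derived prefix to each object name so that names no longer sort in upload order. The function spread_key and its prefix_len parameter are hypothetical.

```python
import hashlib


def spread_key(name, prefix_len=4):
    """Prepend a short hash-derived prefix so that keys do not sort sequentially."""
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}-{name}"


# Names that would otherwise share a sequential timestamp prefix.
for name in ("2024-11-26-0001.log", "2024-11-26-0002.log"):
    print(spread_key(name))
```

The prefix is derived from the name itself, so the full key can be recomputed whenever the original name is known.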

Ways to Upload

You can use OBS Console, APIs, SDKs, OBS Browser+, or obsutil to upload objects.