Bucket Inventories

Scenarios

A bucket inventory can list objects in a bucket, save object information in CSV files, and deliver the CSV files to the bucket specified for storing bucket inventory files. In this manner, you can easily manage objects in a bucket. A source bucket can also be the destination bucket.

A bucket inventory file can contain the following object-related information: versions, sizes, storage classes, tags, encryption statuses, and last modification times.
You can encrypt bucket inventory files in SSE-KMS mode.
You can set the frequency (daily or weekly) for generating bucket inventory files.
You can also specify a bucket to store the generated bucket inventory files.

Constraints

Bucket versions

Inventories can be generated only for OBS 3.0 buckets, but they can be stored in either OBS 3.0 or OBS 2.0 buckets.

Maximum number of bucket inventories

A bucket can have a maximum of 10 inventories.

Source and destination buckets

The source bucket (for which a bucket inventory rule is configured) and the destination bucket (where the generated inventory files are stored) must belong to the same account.
The source and destination buckets must be in the same region.
The destination bucket cannot have server-side encryption enabled.

Functions

Inventory files must be in the CSV format.
Inventories can apply to all objects in a bucket or a set of objects with the same name prefix.
Inventory rules in the same bucket cannot overlap.
- If there is already an inventory rule for all objects in the bucket, any other inventory rule with an object name prefix specified cannot be created. To create a rule for only a set of objects, first delete the inventory rule configured for all objects.
- If there is an inventory rule for a set of objects, a rule for all objects in the bucket cannot be created. To create a rule for all objects, first delete all inventory rules that match objects by prefix.
- If a bucket already has an inventory rule that filters objects by the object name prefix ab, the filter of a new inventory rule cannot be a prefix of ab (for example, a) or start with ab (for example, abc). To create such a rule, first delete the existing inventory rule that conflicts with it.
Only SSE-KMS can be used to encrypt bucket inventories.
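
The non-overlap constraint on prefix filters can be checked before an inventory rule is submitted. The following is a minimal sketch, not part of OBS: the function names are hypothetical, and an empty string is assumed to stand for a rule that covers all objects in the bucket.

```python
from typing import Iterable


def filters_overlap(existing: str, new: str) -> bool:
    """Return True if two prefix filters can match the same object name.

    An empty string is assumed to mean "all objects in the bucket".
    """
    # Two prefixes overlap exactly when one of them is a prefix of the other.
    return existing.startswith(new) or new.startswith(existing)


def can_create_rule(existing_filters: Iterable[str], new_filter: str) -> bool:
    """Check a new filter against every filter already configured."""
    return not any(filters_overlap(f, new_filter) for f in existing_filters)


# With an existing rule for prefix "ab", both "a" and "abc" conflict,
# while "ac" does not.
for candidate in ("a", "abc", "ac"):
    print(candidate, can_create_rule(["ab"], candidate))
```

Treating the all-objects rule as an empty prefix lets a single comparison cover all three cases listed above.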

Permissions

Inventory files are uploaded to the destination bucket by an OBS system user, so you must grant this user the write permission for the bucket. That is, the destination bucket must contain a policy with the value of {"Service": "obs"} for Principal. For details, see Add a bucket policy for the destination bucket.

Others

Bucket inventories are free, but the storage they occupy is billed.
The bucket inventory function is not available for federated users.

Configuring a Bucket Inventory

Before the configuration, you need to understand what source and destination buckets are.

Source bucket: A source bucket is the bucket for which an inventory is configured. The inventory lists objects stored in the source bucket.
Destination bucket: A destination bucket is where generated inventory files are stored. A source bucket can also be the destination bucket. You can specify a name prefix for an inventory. The generated inventory files are then saved under a path that starts with this prefix. If you do not specify a name prefix, OBS uses BucketInventory as the default prefix.
- Restrictions on the destination bucket
  - The destination bucket must belong to the same account as the source bucket.
  - The destination bucket must be in the same region as the source bucket.
  - A bucket policy must be configured to grant OBS the permission to write objects to the destination bucket. For details, see Add a bucket policy for the destination bucket.
- The destination bucket contains the following files:
  - A list of inventory files
  - A manifest file that lists all inventory files generated under a given inventory configuration. For details, see Manifest File.

Procedure

You can use OBS Console or make a REST API call to configure a bucket inventory. If you use the OBS Console, a bucket policy is automatically generated for the destination bucket. If you call the REST API, you need to manually configure the bucket policy for the destination bucket.

Add a bucket policy for the destination bucket.
A bucket policy must be configured for the destination bucket, to grant the OBS system users the permission to write objects to the destination bucket. The following lists an example bucket policy. Replace destbucket with the actual name of the destination bucket.
```
{
	"Statement": [
		{
			"Effect": "Allow",
			"Sid": "1",
			"Principal": {"Service": "obs"},
			"Resource": ["destbucket/*"],
			"Action": ["PutObject"]
		}
	]
}
```
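
If you configure inventories through the REST API for several destination buckets, you may prefer to generate this policy rather than edit it by hand. The following is a minimal sketch that only builds the JSON text of the example policy. The function name build_inventory_policy is hypothetical, and applying the policy to the bucket is not shown.

```python
import json


def build_inventory_policy(dest_bucket: str) -> str:
    """Return the example policy as JSON text for the given destination bucket."""
    policy = {
        "Statement": [
            {
                "Effect": "Allow",
                "Sid": "1",
                # The OBS system user that uploads inventory files.
                "Principal": {"Service": "obs"},
                # All objects in the destination bucket.
                "Resource": [f"{dest_bucket}/*"],
                "Action": ["PutObject"],
            }
        ]
    }
    return json.dumps(policy, indent=2)


print(build_inventory_policy("destbucket"))
```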

Configure a bucket inventory.
There are multiple tools to configure a bucket inventory. For details, see Ways to Configure a Bucket Inventory.

Content in an Inventory File

You can configure the content in an inventory file when creating the inventory. For details about all possible fields, see Table 1.

**Table 1** Object metadata listed in an inventory file
Metadata	Description
Bucket	The name of the source bucket
Key	The name of an object. Each object in a bucket has a unique key. (Object names in the inventory file are URL-encoded using UTF-8 character set and can be used only after being decoded.)
VersionId	The version ID of an object. If the value of IncludedObjectVersions in the inventory configuration is Current, this field is not included in the inventory file.
IsLatest	Whether the object version is the latest. If it is, the value of this field is True. This field is not included in the inventory file if IncludedObjectVersions in the inventory configuration is set to Current.
IsDeleteMarker	Whether the object version is a delete marker. If versioning is enabled for the source bucket, deleting an object creates a new piece of object metadata and sets IsDeleteMarker of the metadata to true. This field is not included in the inventory file if IncludedObjectVersions in the inventory configuration is set to Current.
Size	The object size, in bytes
LastModifiedDate	When an object was created or last modified.
ETag	Hexadecimal digest of the object MD5. ETag is the unique identifier of the object content. It can be used to identify whether the object content is changed. For example, if the ETag value is A when an object is uploaded but changes to B when the object is downloaded, it means that the object content has been changed.
StorageClass	The storage class of an object. STANDARD: the Standard storage class; STANDARD_IA (also WARM): the Infrequent Access storage class; COLD: the Archive storage class; DEEP_ARCHIVE: the Deep Archive storage class.
IsMultipartUploaded	Whether an object is uploaded in the multipart mode.
ReplicationStatus	The cross-region replication status of an object
EncryptionStatus	The encryption status of an object
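
Because object names in the Key field are URL-encoded, they must be decoded before use. The following is a minimal sketch of reading inventory rows. It assumes that the CSV file has no header row and that its columns follow the order of the fields selected for the inventory. The function name and the sample line are made up for illustration.

```python
import csv
import io
from urllib.parse import unquote


def read_inventory_rows(csv_text: str, file_schema: str) -> list[dict]:
    """Parse inventory CSV text into dicts keyed by the metadata field names."""
    fields = [name.strip() for name in file_schema.split(",")]
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=fields)
    rows = []
    for row in reader:
        # Object names are URL-encoded (UTF-8) in the inventory file.
        row["Key"] = unquote(row["Key"])
        rows.append(row)
    return rows


# Hypothetical sample line, not taken from a real inventory file.
sample = "user001,photos/my%20cat.jpg,1024,STANDARD\n"
rows = read_inventory_rows(sample, "Bucket,Key,Size,StorageClass")
print(rows[0]["Key"], rows[0]["Size"])
```

All values are returned as strings. Convert fields such as Size to integers as needed.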

Inventory File Name

The name of an inventory file is in the following format:

destinationPrefix/sourceBucketName/inventoryId/yyyy-MM-dd'T'HH-mm'Z'/files/UUID_index.csv

destinationPrefix indicates the prefix specified in the inventory configuration, which can be used to group inventory files. If no prefix is specified, the default prefix is BucketInventory.
sourceBucketName indicates the source bucket for which the inventory is configured. This field can prevent conflicts when inventory files of different source buckets are saved to the same destination bucket.
inventoryId can prevent conflicts when multiple inventory files of the same source bucket are sent to the same destination bucket.
yyyy-MM-dd'T'HH-mm'Z' indicates the date and time when inventory generation starts scanning the bucket. Objects uploaded to the source bucket after this time may not be listed in the inventory file.
UUID_index.csv indicates one of the inventory files.
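
The name format above can be built and split programmatically. The following is a minimal sketch with hypothetical function names. It assumes that the timestamp is in UTC (as the trailing Z suggests) and that inventory IDs do not contain a slash.

```python
from datetime import datetime, timezone

STAMP_FORMAT = "%Y-%m-%dT%H-%MZ"  # yyyy-MM-dd'T'HH-mm'Z'


def inventory_files_path(source_bucket: str, inventory_id: str,
                         scan_start: datetime,
                         destination_prefix: str = "BucketInventory") -> str:
    """Build the path, up to and including files/, for one inventory run.

    scan_start must be a timezone-aware datetime.
    """
    stamp = scan_start.astimezone(timezone.utc).strftime(STAMP_FORMAT)
    return f"{destination_prefix}/{source_bucket}/{inventory_id}/{stamp}/files/"


def split_inventory_key(key: str) -> dict:
    """Split an inventory file name into its components."""
    run_path, file_name = key.rsplit("/files/", 1)
    # Split from the right, because destinationPrefix may itself contain slashes.
    prefix, bucket, inventory_id, stamp = run_path.rsplit("/", 3)
    scan_start = datetime.strptime(stamp, STAMP_FORMAT).replace(tzinfo=timezone.utc)
    return {
        "destination_prefix": prefix,
        "source_bucket": bucket,
        "inventory_id": inventory_id,
        "scan_start": scan_start,
        "file_name": file_name,
    }


start = datetime(2019, 1, 3, 12, 28, tzinfo=timezone.utc)
print(inventory_files_path("user001", "test_id", start, "inventory"))
```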

Manifest File

If there are a large number of objects in a bucket, multiple inventory files may be generated for a single inventory configuration. It takes some time to generate these files. For example, if there are 200,000 objects in a bucket, it will take about 1.5 minutes to generate all inventory files. One or two hours after all inventory files are generated, a manifest.json file will be generated. The manifest.json file contains information about all inventory files generated in that run, including:

sourceBucket: name of the source bucket
destinationBucket: name of the destination bucket
version: version of the inventory
fileFormat: format of inventory files
fileSchema: object metadata fields contained in the inventory files
files: list of all inventory files
- key: inventory file name
- size: size of an inventory file, in bytes
- inventoriedRecord: number of records contained in an inventory file

The following is an example of a simple manifest.json file.

{
        "sourceBucket":"user001",
        "destinationBucket":"bucket001",
        "version":"2019-01-03",
        "fileFormat":"CSV",
        "fileSchema":"Bucket,Key,Size,LastModifiedDate,ETag,StorageClass,IsMultipartUploaded,ReplicationStatus,EncryptionStatus",
        "files":[
                {
                        "key":"inventory/user001/test_id/2019-01-03T12-28Z/files/0000016813AF58E66806C1E2D7F15155_1.csv",
                        "size":6705647390,
                        "inventoriedRecord":70585762
                }
        ]
}

The name of a manifest file is as follows (for details about each field, see Inventory File Name):

destinationPrefix/sourceBucketName/inventoryId/yyyy-MM-dd'T'HH-mm'Z'/manifest.json
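
Because manifest.json is generated only after all inventory files are ready, it is a convenient entry point for processing an inventory run. The following is a minimal sketch that summarizes a manifest using only the fields listed above. The function name is hypothetical, the sample values are illustrative, and downloading the file from the destination bucket is not shown.

```python
import json


def summarize_manifest(manifest_text: str) -> dict:
    """Collect the schema, file names, and totals from manifest.json text."""
    manifest = json.loads(manifest_text)
    files = manifest["files"]
    return {
        "fields": [name.strip() for name in manifest["fileSchema"].split(",")],
        "keys": [item["key"] for item in files],
        "total_bytes": sum(item["size"] for item in files),
        "total_records": sum(item["inventoriedRecord"] for item in files),
    }


# Illustrative manifest with two inventory files.
sample_manifest = """
{
  "sourceBucket": "user001",
  "destinationBucket": "bucket001",
  "version": "2019-01-03",
  "fileFormat": "CSV",
  "fileSchema": "Bucket,Key,Size",
  "files": [
    {"key": "inventory/user001/test_id/2019-01-03T12-28Z/files/a_1.csv", "size": 100, "inventoriedRecord": 3},
    {"key": "inventory/user001/test_id/2019-01-03T12-28Z/files/a_2.csv", "size": 50, "inventoriedRecord": 2}
  ]
}
"""
summary = summarize_manifest(sample_manifest)
print(summary["total_records"], summary["total_bytes"])
```

The fields list can be used as the column names when reading each inventory file listed in keys.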

symlink.txt File

The symlink.txt file records the paths of the inventory files, which helps you quickly find all inventory files in big data scenarios. The symlink.txt file is Apache Hive-compatible. Hive can automatically discover the symlink.txt file and the inventory files listed within it. symlink.txt files are sorted by object name in lexicographical order.

The name of the symlink.txt file is as follows (for details about each field, see Inventory File Name):

destinationPrefix/sourceBucketName/inventoryId/hive/dt=YYYY-MM-DD-00-00/symlink.txt

Ways to Configure a Bucket Inventory

You can use OBS Console, APIs, or SDKs to configure a bucket inventory.

Using OBS Console

In the navigation pane of OBS Console, choose Object Storage.
In the bucket list, click the bucket you want to work with. The Objects page is displayed.
In the navigation pane, choose Data Management > Inventories. The inventory list is displayed.
Click Create. The Create Inventory dialog box is displayed.

Figure 1 Inventory settings

Configure required parameters.

**Table 2** Parameters for configuring a bucket inventory
Parameter	Description
Inventory Name	Name of a bucket inventory
Filter	Filter of an inventory. You can enter an object name prefix for OBS to create an inventory for objects with the specified prefix. Currently, only a prefix can be used as a filter. If the filter is not specified, the inventory covers all objects in the bucket. If a bucket has multiple inventories, their filters cannot overlap with each other.
Save Inventory Files To	Select a bucket (destination bucket) for saving generated inventory files. This bucket must be in the same region as the source bucket.
Inventory File Name Prefix	Prefix of the inventory file path. An inventory file will be saved in the following path: Inventory file name prefix/Source bucket name/Inventory name/Date and time/files/. If this parameter is not specified, OBS automatically adds BucketInventory as the prefix to the inventory file path.
Frequency	How frequently inventory files are generated. It can be set to Daily or Weekly.
Status	Inventory status. You can enable or disable the generation of inventories.

Click Next to go to the Configure Report page.

Figure 2 Configuring the report

Configure the report.

**Table 3** Report-related parameters
Parameter	Description
Inventory Format	Inventory files can only be saved in CSV format.
Object Versions	Object versions that you want to list in an inventory file. It can be set to Current version only or Include all versions.
Optional Fields	Object information fields that can be contained in an inventory file, including Size, Last modified date, Storage class, ETag, Multipart upload, Encryption status, and Replication status. For details about the fields, see Content in an Inventory File.

Click Next to confirm the bucket policy.

OBS then automatically creates a bucket policy on the destination bucket to grant OBS permission to write inventory files to the bucket.
Click OK.