Updated on 2024-02-21 GMT+08:00

Obtaining Column Statistics in a Partition in Batches

Function

This API is used to obtain the statistics of specified columns for one or more partitions of a table in a single request.

URI

POST /v1/{project_id}/instances/{instance_id}/catalogs/{catalog_name}/databases/{database_name}/tables/{table_name}/partitions/column-statistics/batch-get

Table 1 Path Parameters

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| project_id | Yes | String | Project ID. For how to obtain the project ID, see Obtaining a Project ID. |
| instance_id | Yes | String | LakeFormation instance ID. The value is automatically generated when the instance is created, for example, 2180518f-42b8-4947-b20b-adfc53981a25. |
| catalog_name | Yes | String | Catalog name. The value should contain 1 to 256 characters. Only letters, numbers, and underscores (_) are allowed. |
| database_name | Yes | String | Database name. The value should contain 1 to 128 characters. Only letters, numbers, hyphens (-), and underscores (_) are allowed. |
| table_name | Yes | String | Table name. The value should contain 1 to 256 characters. Only letters, numbers, hyphens (-), and underscores (_) are allowed. |
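
The naming rules for catalog_name, database_name, and table_name can be checked on the client before a request is sent. The following is a minimal sketch of such a check; the function name invalid_path_params is hypothetical, and the patterns assume that "letters" and "numbers" mean ASCII letters and digits, which the descriptions do not state explicitly.

```python
import re

# Patterns transcribed from the path parameter descriptions.
# Assumption: "letters" and "numbers" mean ASCII letters and digits.
PATH_PARAM_PATTERNS = {
    "catalog_name": re.compile(r"[A-Za-z0-9_]{1,256}"),
    "database_name": re.compile(r"[A-Za-z0-9_-]{1,128}"),
    "table_name": re.compile(r"[A-Za-z0-9_-]{1,256}"),
}


def invalid_path_params(params):
    """Return the sorted names of parameters that are missing or break the rules."""
    bad = []
    for name, pattern in PATH_PARAM_PATTERNS.items():
        value = params.get(name)
        if not isinstance(value, str) or pattern.fullmatch(value) is None:
            bad.append(name)
    return sorted(bad)


# A hyphen is allowed in database and table names but not in catalog names.
print(invalid_path_params(
    {"catalog_name": "my-catalog", "database_name": "db-1", "table_name": "t_1"}
))
```

An empty result means that all three names satisfy the documented length and character rules. The service may apply further checks that this sketch does not cover.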

Request Parameters

Table 2 Request header parameters

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| X-Auth-Token | Yes | Array of strings | Tenant token. |

Table 3 Request body parameters

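| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| aggregate_statics | Yes | Boolean | Whether to aggregate the statistics before returning them. |
| column_names | Yes | Array of strings | Names of the columns whose statistics are to be obtained. |
| partition_values_list | Yes | Array<Array<String>> | Partitions whose statistics are to be obtained. Each inner array contains the partition values of one partition. |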
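
As an illustration of how the three body fields fit together, the sketch below assembles a request body in Python. The helper name build_request_body is hypothetical. Rejecting empty lists and treating each inner list as the values of one partition are assumptions made here, not documented service behavior.

```python
import json


def build_request_body(column_names, partition_values_list, aggregate=False):
    """Assemble the request body from the three mandatory fields."""
    # Assumption: empty lists are rejected here; the service may behave differently.
    if not column_names:
        raise ValueError("column_names must list at least one column")
    if not partition_values_list:
        raise ValueError("partition_values_list must list at least one partition")
    return {
        # The field name is spelled "aggregate_statics" in this API.
        "aggregate_statics": bool(aggregate),
        "column_names": [str(name) for name in column_names],
        # Assumption: one inner list per partition, holding its partition values.
        "partition_values_list": [
            [str(value) for value in values] for values in partition_values_list
        ],
    }


body = build_request_body(["column1", "column2"], [["value1", "value2"]])
print(json.dumps(body, indent=2))
```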

Response Parameters

Status code: 200

Table 4 Response body parameters

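| Parameter | Type | Description |
| --- | --- | --- |
| found_partition_number | Integer | Number of partitions found. |
| column_statistics | Map<String,Array<ColumnStatisticsObj>> | Column statistics of each partition. Each key is a partition name (for example, part1=value1/part2=value2), and each value is the list of column statistics for that partition. |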

Table 5 ColumnStatisticsObj

| Parameter | Type | Description |
| --- | --- | --- |
| column_name | String | Column name. The value can contain 1 to 767 characters. Only letters, digits, and special characters (_-+*(),) are allowed. |
| column_type | String | Data type, including array, bigint, binary, boolean, char, date, decimal, double, float, int, interval, map, set, smallint, string, struct, timestamp, tinyint, union, and varchar. |
| data_type | String | Statistics type. Enumeration values: binaryStats, booleanStats, dateStats, decimalStats, doubleStats, longStats, and stringStats. |
| binary_statistics_data | BinaryColumnStatisticsData object | Statistics on byte arrays. |
| long_statistics_data | LongColumnStatisticsData object | Statistics on long integers. |
| decimal_statistics_data | DecimalColumnStatisticsData object | Statistics on decimal values. |
| string_statistics_data | StringColumnStatisticsData object | Statistics on strings. |
| double_statistics_data | DoubleColumnStatisticsData object | Statistics on floating point numbers. |
| date_statistics_data | DateColumnStatisticsData object | Statistics on date values. |
| boolean_statistics_data | BooleanColumnStatisticsData object | Statistics on Boolean data. |
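
Each ColumnStatisticsObj carries a data_type value and several optional *_statistics_data objects. The sketch below shows one way a client might pick the object that corresponds to data_type. The pairing of each enumeration value with a field is inferred from the names (and matches the longStats example response); the function name statistics_payload is hypothetical.

```python
# data_type value -> ColumnStatisticsObj field assumed to carry the statistics.
# Assumption: the pairing is inferred from the names, not stated by the API.
STATS_FIELD_BY_DATA_TYPE = {
    "binaryStats": "binary_statistics_data",
    "booleanStats": "boolean_statistics_data",
    "dateStats": "date_statistics_data",
    "decimalStats": "decimal_statistics_data",
    "doubleStats": "double_statistics_data",
    "longStats": "long_statistics_data",
    "stringStats": "string_statistics_data",
}


def statistics_payload(column_stats):
    """Return the statistics object matching data_type, or None if it is absent."""
    data_type = column_stats.get("data_type")
    field = STATS_FIELD_BY_DATA_TYPE.get(data_type)
    if field is None:
        raise ValueError(f"unknown data_type: {data_type!r}")
    return column_stats.get(field)


example = {
    "column_name": "columnName",
    "column_type": "bigint",
    "data_type": "longStats",
    "long_statistics_data": {"minimum_value": 10, "maximum_value": 1000},
}
print(statistics_payload(example))
```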

Table 6 BinaryColumnStatisticsData

| Parameter | Type | Description |
| --- | --- | --- |
| maximum_length | Long | Maximum length of byte arrays in a column. |
| average_length | Double | Average length of byte arrays in a column. |
| number_of_null | Long | Number of null values in a column. |

Table 7 LongColumnStatisticsData

| Parameter | Type | Description |
| --- | --- | --- |
| minimum_value | Long | Minimum long integer value in a column. |
| maximum_value | Long | Maximum long integer value in a column. |
| number_of_null | Long | Number of null values in a column. |
| number_of_distinct_value | Long | Number of distinct long integer values in a column. |
| bit_vector | String | Bitmap used for estimating unique values. |

Table 8 DecimalColumnStatisticsData

| Parameter | Type | Description |
| --- | --- | --- |
| minimum_value | Decimal object | Minimum decimal value in a column. |
| maximum_value | Decimal object | Maximum decimal value in a column. |
| number_of_null | Long | Number of null values in a column. |
| number_of_distinct_value | Long | Number of distinct decimal values in a column. |
| bit_vector | String | Bitmap used for estimating unique values. |

Table 9 Decimal

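| Parameter | Type | Description |
| --- | --- | --- |
| scale | Integer | Scale of the decimal value, that is, the number of digits to the right of the decimal point. |
| unscaled | String | Unscaled value of the decimal. |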
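
Table 9 does not say how unscaled is encoded. One common convention for a scale/unscaled pair, used for example by Java's BigDecimal, is value = unscaled × 10^(-scale). The sketch below applies that convention and additionally assumes that unscaled is a base-10 integer string. Both points are assumptions and should be verified against real responses; the function name to_decimal is hypothetical.

```python
from decimal import Decimal


def to_decimal(obj):
    """Convert {"scale": int, "unscaled": str} to a decimal.Decimal.

    Assumptions (not confirmed by the API reference):
    - "unscaled" is a base-10 integer string, possibly negative;
    - value = unscaled * 10 ** (-scale).
    """
    unscaled = int(obj["unscaled"])
    scale = int(obj["scale"])
    # scaleb shifts the exponent without floating-point rounding.
    return Decimal(unscaled).scaleb(-scale)


print(to_decimal({"scale": 2, "unscaled": "12345"}))
```

If the service returns unscaled in another encoding (for example, encoded bytes), only the int(...) conversion in this sketch would need to change.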

Table 10 StringColumnStatisticsData

| Parameter | Type | Description |
| --- | --- | --- |
| average_length | Double | Average length of strings in a column. |
| maximum_length | Long | Maximum length of strings in a column. |
| number_of_null | Long | Number of null values in a column. |
| number_of_distinct_value | Long | Number of distinct strings in a column. |
| bit_vector | String | Bitmap used for estimating unique values. |

Table 11 DoubleColumnStatisticsData

| Parameter | Type | Description |
| --- | --- | --- |
| minimum_value | Double | Minimum floating point number in a column. |
| maximum_value | Double | Maximum floating point number in a column. |
| number_of_null | Long | Number of null values in a column. |
| number_of_distinct_value | Long | Number of distinct floating point numbers in a column. |
| bit_vector | String | Bitmap used for estimating unique values. |

Table 12 DateColumnStatisticsData

| Parameter | Type | Description |
| --- | --- | --- |
| minimum_value | String | Minimum date value in a column. |
| maximum_value | String | Maximum date value in a column. |
| number_of_null | Long | Number of null values in a column. |
| number_of_distinct_value | Long | Number of distinct date values in a column. |
| bit_vector | String | Bitmap used for estimating unique values. |

Table 13 BooleanColumnStatisticsData

| Parameter | Type | Description |
| --- | --- | --- |
| number_of_true | Long | Number of true values in a column. |
| number_of_false | Long | Number of false values in a column. |
| number_of_null | Long | Number of null values in a column. |

Status code: 400

Table 14 Response body parameters

| Parameter | Type | Description |
| --- | --- | --- |
| error_code | String | Error code. |
| error_msg | String | Error message. |
| solution_msg | String | Solution. |

Status code: 404

Table 15 Response body parameters

| Parameter | Type | Description |
| --- | --- | --- |
| error_code | String | Error code. |
| error_msg | String | Error message. |
| solution_msg | String | Solution. |

Status code: 500

Table 16 Response body parameters

| Parameter | Type | Description |
| --- | --- | --- |
| error_code | String | Error code. |
| error_msg | String | Error message. |
| solution_msg | String | Solution. |

Example Requests

POST https://{endpoint}/v1/{project_id}/instances/{instance_id}/catalogs/{catalog_name}/databases/{database_name}/tables/{table_name}/partitions/column-statistics/batch-get

{
  "aggregate_statics" : false,
  "column_names" : [ "column1", "column2" ],
  "partition_values_list" : [ [ "value1", "value2" ] ]
}
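
For readers calling the API from code, the following sketch builds the same request with Python's standard urllib.request module, without sending it. The helper name build_request and the example endpoint and IDs are hypothetical, and the Content-Type header is an assumption that this page does not list.

```python
import json
import urllib.request

URL_TEMPLATE = (
    "https://{endpoint}/v1/{project_id}/instances/{instance_id}"
    "/catalogs/{catalog_name}/databases/{database_name}"
    "/tables/{table_name}/partitions/column-statistics/batch-get"
)


def build_request(token, body, **path_params):
    """Build (but do not send) the POST request for this API."""
    url = URL_TEMPLATE.format(**path_params)
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "X-Auth-Token": token,
            # Assumption: a JSON content type is expected for the body.
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_request(
    "example-token",  # placeholder, not a real token
    {
        "aggregate_statics": False,
        "column_names": ["column1", "column2"],
        "partition_values_list": [["value1", "value2"]],
    },
    endpoint="lakeformation.example.com",  # hypothetical endpoint
    project_id="my_project",
    instance_id="2180518f-42b8-4947-b20b-adfc53981a25",
    catalog_name="hive",
    database_name="db1",
    table_name="t1",
)
print(request.get_method(), request.full_url)
```

Passing the resulting object to urllib.request.urlopen would send the request. The sketch omits that step so that it runs without network access or credentials.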

Example Responses

Status code: 200

OK

{
  "found_partition_number" : 1,
  "column_statistics" : {
    "part1=value1/part2=value2" : [ {
      "column_name" : "columnName",
      "column_type" : "bigint",
      "data_type" : "longStats",
      "long_statistics_data" : {
        "minimum_value" : 10,
        "maximum_value" : 1000,
        "number_of_null" : 30,
        "number_of_distinct_value" : 20
      }
    } ]
  }
}
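
A response of this shape can be traversed by partition and then by column. The sketch below collects number_of_null for each column of each partition. It assumes that each entry contains at most one *_statistics_data object, as in the example above; the function name null_counts is hypothetical.

```python
def null_counts(response):
    """Map each partition name to {column_name: number_of_null}."""
    result = {}
    for partition, columns in response.get("column_statistics", {}).items():
        per_column = {}
        for column in columns:
            # Assumption: at most one *_statistics_data object is present per entry.
            for key, value in column.items():
                if key.endswith("_statistics_data") and isinstance(value, dict):
                    per_column[column["column_name"]] = value.get("number_of_null")
        result[partition] = per_column
    return result


sample = {
    "found_partition_number": 1,
    "column_statistics": {
        "part1=value1/part2=value2": [
            {
                "column_name": "columnName",
                "column_type": "bigint",
                "data_type": "longStats",
                "long_statistics_data": {
                    "minimum_value": 10,
                    "maximum_value": 1000,
                    "number_of_null": 30,
                    "number_of_distinct_value": 20,
                },
            }
        ]
    },
}
print(null_counts(sample))
```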

Status code: 400

Bad Request

{
  "error_code" : "common.01000001",
  "error_msg" : "failed to read http request, please check your input, code: 400, reason: Type mismatch., cause: TypeMismatchException"
}

Status code: 401

Unauthorized

{
  "error_code" : "APIG.1002",
  "error_msg" : "Incorrect token or token resolution failed"
}

Status code: 403

Forbidden

{
  "error" : {
    "code" : "403",
    "message" : "X-Auth-Token is invalid in the request",
    "error_code" : null,
    "error_msg" : null,
    "title" : "Forbidden"
  },
  "error_code" : "403",
  "error_msg" : "X-Auth-Token is invalid in the request",
  "title" : "Forbidden"
}

Status code: 404

Not Found

{
  "error_code" : "common.01000001",
  "error_msg" : "response status exception, code: 404"
}

Status code: 408

Request Timeout

{
  "error_code" : "common.00000408",
  "error_msg" : "timeout exception occurred"
}

Status code: 500

Internal Server Error

{
  "error_code" : "common.00000500",
  "error_msg" : "internal error"
}
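
The error bodies above share error_code and error_msg, while the 403 example also nests details under error. The sketch below shows one way a client might reduce any of these bodies to a single log line. The function name describe_error and the fallback order are choices made for illustration only.

```python
def describe_error(status_code, body):
    """Return a one-line summary of an error response body."""
    # The 403 example nests details under "error"; top-level fields win if present.
    nested = body.get("error") or {}
    code = body.get("error_code") or nested.get("code") or str(status_code)
    message = body.get("error_msg") or nested.get("message") or "no message"
    summary = f"HTTP {status_code} [{code}] {message}"
    solution = body.get("solution_msg")
    if solution:
        summary += f" (solution: {solution})"
    return summary


print(describe_error(408, {
    "error_code": "common.00000408",
    "error_msg": "timeout exception occurred",
}))
```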

Status Codes

| Status Code | Description |
| --- | --- |
| 200 | OK |
| 201 | Created |
| 400 | Bad Request |
| 401 | Unauthorized |
| 403 | Forbidden |
| 404 | Not Found |
| 408 | Request Timeout |
| 500 | Internal Server Error |

Error Codes

See Error Codes.