Elasticsearch

Elasticsearch is a distributed search and analytics engine built on Apache Lucene. It is used for full-text retrieval, log analysis, real-time data query, and large-scale data aggregation.

Huawei Cloud Cloud Search Service (CSS) is a fully managed distributed search service powered by open-source Elasticsearch. With a CSS link, you can migrate log files and database records to CSS and then search and analyze them using Elasticsearch.

DataArts Migration supports open-source Elasticsearch, which is compatible with Huawei Cloud CSS, and provides stable and efficient data integration capabilities.

Preparation and Constraints

Network requirements
The Elasticsearch data source must be able to communicate with CDM so that data can be transmitted between them. For details, see Enabling Network Connectivity.
Required permissions
- Huawei Cloud CSS permissions:
  - Read permission: DataArts Migration reads cluster information from CSS. You can assign the CSS ReadOnlyAccess policy or a custom read-only permission in IAM. This permission allows you to perform read operations, such as querying the cluster list, viewing cluster details, obtaining monitoring metrics, and viewing snapshot information.
  - Write permission: DataArts Migration creates or changes cluster resources in CSS. You can assign the CSS FullAccess policy or custom read and write permissions in IAM. These permissions allow all read and write operations.
- Open-source Elasticsearch permissions:
  - Read permission: DataArts Migration reads index data. You can assign the built-in read role and bind the role to the corresponding indexes in Elasticsearch.
  - Write permission: DataArts Migration writes, updates, and deletes documents. You can assign the write role (or index role) in Elasticsearch.
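
For open-source Elasticsearch, the read and write roles above could be defined roughly as follows. This is a minimal sketch, assuming the cluster uses the built-in Elasticsearch security features, where a role is created by sending a body like this to the create-role API (PUT /_security/role/<role_name>). The index pattern logs-* and the helper role_body are hypothetical; read, view_index_metadata, and write are standard Elasticsearch index privileges.

```python
import json


def role_body(index_patterns, privileges):
    """Build a request body for the Elasticsearch create-role API."""
    return {
        "indices": [
            {"names": list(index_patterns), "privileges": list(privileges)}
        ]
    }


# Hypothetical index pattern; replace it with the indexes the job accesses.
read_role = role_body(["logs-*"], ["read", "view_index_metadata"])
# The "write" privilege covers indexing, updating, and deleting documents.
write_role = role_body(["logs-*"], ["write"])

print(json.dumps(read_role, indent=2))
print(json.dumps(write_role, indent=2))
```

Clusters that use a different security plugin define roles through their own interfaces, so treat the body above only as an illustration of binding privileges to indexes.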
Enabling ports
Elasticsearch port (9200): TCP port 9200 must be open so that DataArts Migration can access Elasticsearch.
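
Before creating a link, it can help to confirm that the port is reachable from a host in the same network. The following is a minimal sketch using only the Python standard library; the function name port_is_reachable is hypothetical, and a successful TCP connection only shows that the port is open, not that authentication will succeed.

```python
import socket


def port_is_reachable(host, port=9200, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, port_is_reachable("192.0.2.10") would check port 9200 on that address (a placeholder from the documentation address range).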

Supported Data Types

The following table lists the supported Elasticsearch data types.

Category	Type	Read	Write
Character	keyword	√	√
	text	√	√
	string	√	√
Integer	short	√	√
	integer	√	√
	long	√	√
Numeric	double	√	√
	float	√	√
Boolean	boolean	√	√
Object	object	√	√
Nested type	nested	√	√
Date	date	√	√
Special type	ip	√	√
Array	string_array	√	√
	short_array	√	√
	integer_array	√	√
	long_array	√	√
	float_array	√	√
	double_array	√	√
Value range	completion	√	√

Supported Migration Scenarios

DataArts Migration supports the following modes for synchronizing on-premises data:

Single table synchronization
DataArts Migration can synchronize a single table or file when ingesting data into a data lake or migrating data to the cloud.
Database and table shard synchronization
DataArts Migration can synchronize data from multiple databases and tables when ingesting data into a data lake or migrating data to the cloud.
Entire DB migration
DataArts Migration can synchronize all data from an on-premises database when ingesting data into a data lake or migrating data to the cloud.

Database and table shard synchronization and entire DB migration are not supported in all regions. The following table lists the supported Elasticsearch migration scenarios.

Supported Migration Scenario	Single Table Read	Single Table Write	Database/Table Shard Read	Database/Table Shard Write	Entire DB Read	Entire DB Write
Supported	√	√	x	√	x	x

Core Capabilities

Connection configuration

Configuration Item	Supported	Description
Support for Secure Sockets Layer (SSL)	√	SSL encryption ensures secure data transmission. Currently, this function is not supported.

Read capabilities

Configuration Item	Supported	Description
Incremental read	√	The filter condition can be configured to enable incremental read.
Shard concurrency	x	3.x and later versions support concurrent shard read, fully utilizing resources and improving read performance.
Custom fields	√	You can add computed columns, constant columns, or masking functions for tasks to meet personalized service requirements.
Dirty data processing	√	Abnormal data can be written to the dirty data bucket to prevent job failures caused by a small amount of abnormal data.
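
As an illustration of incremental read, a filter condition is typically a range query on a timestamp field, so that each run reads only the documents changed since the previous run. The sketch below builds such a query in the Elasticsearch query DSL. The field name update_time, the timestamps, and the helper incremental_query are assumptions made for the example, not DataArts Migration parameters.

```python
import json


def incremental_query(time_field, last_value, upper_bound=None):
    """Build a query that selects documents newer than the last synced value."""
    bounds = {"gt": last_value}
    if upper_bound is not None:
        # A closed upper bound keeps consecutive runs from overlapping.
        bounds["lte"] = upper_bound
    return {
        "query": {"range": {time_field: bounds}},
        "sort": [{time_field: "asc"}],
    }


# Hypothetical field name and time window.
query = incremental_query(
    "update_time", "2024-01-01T00:00:00Z", "2024-01-02T00:00:00Z"
)
print(json.dumps(query, indent=2))
```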

Write capabilities

Configuration Item	Supported	Description
Data clearance before import	√	Data can be cleansed and processed before it is imported.
Conflict resolution	√	UPSERT, UPDATE, INDEX, and CREATE operations can flexibly handle data conflicts.
Concurrent write	√	Concurrent write improves efficiency.
Batch submission	√	Commit Size can be set to submit data to the server in batches.
Dirty data processing	√	Abnormal data can be written to the dirty data bucket to prevent job failures caused by a small amount of abnormal data.
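
To show how conflict resolution and batch submission fit together, the sketch below builds request bodies for the Elasticsearch _bulk API, with one batch per Commit Size documents. The mapping of the four write modes onto bulk actions is an assumption based on standard Elasticsearch semantics, not a description of how DataArts Migration is implemented; the helper names and the index demo-index are hypothetical.

```python
import json


def bulk_lines(index, doc_id, doc, mode):
    """Return the two NDJSON lines for one document in the chosen write mode."""
    meta = {"_index": index, "_id": doc_id}
    if mode == "INDEX":   # overwrite any existing document with the same ID
        return [json.dumps({"index": meta}), json.dumps(doc)]
    if mode == "CREATE":  # this item fails if the ID already exists
        return [json.dumps({"create": meta}), json.dumps(doc)]
    if mode == "UPDATE":  # partial update; this item fails if the ID is missing
        return [json.dumps({"update": meta}), json.dumps({"doc": doc})]
    if mode == "UPSERT":  # partial update, or insert if the ID is missing
        return [
            json.dumps({"update": meta}),
            json.dumps({"doc": doc, "doc_as_upsert": True}),
        ]
    raise ValueError(f"unknown write mode: {mode}")


def bulk_batches(index, docs, mode, commit_size):
    """Yield _bulk request bodies holding at most commit_size documents each."""
    if commit_size < 1:
        raise ValueError("commit_size must be at least 1")
    for start in range(0, len(docs), commit_size):
        lines = []
        for doc_id, doc in docs[start:start + commit_size]:
            lines.extend(bulk_lines(index, doc_id, doc, mode))
        # The _bulk API requires the body to end with a newline.
        yield "\n".join(lines) + "\n"


docs = [(str(i), {"value": i}) for i in range(5)]
batches = list(bulk_batches("demo-index", docs, "UPSERT", commit_size=2))
print(len(batches))  # 3 batches: 2 + 2 + 1 documents
```

A larger commit size reduces the number of requests but increases the size of each request body, so the value is usually tuned against the cluster's request size limits.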