Elasticsearch
Elasticsearch is a distributed search and analytics engine built on Apache Lucene. It is used for full-text retrieval, log analysis, real-time data query, and large-scale data aggregation.
Huawei Cloud Cloud Search Service (CSS) is a fully hosted distributed search service powered by open-source Elasticsearch. CSS links can be used to migrate log files and database records to CSS for search and analysis using Elasticsearch.
DataArts Migration supports open-source Elasticsearch as well as Huawei Cloud CSS, which is compatible with it, and provides stable and efficient data integration capabilities.
Preparation and Constraints
- Network requirements
The Elasticsearch data source must be able to communicate with CDM so that data can be transmitted smoothly. For details, see Enabling Network Connectivity.
- Required permissions
- Huawei Cloud CSS permissions:
- Read permission: DataArts Migration reads cluster information from CSS. You can assign the CSS ReadOnlyAccess policy or a custom read-only permission in IAM. This permission allows you to perform read operations, such as querying the cluster list, viewing cluster details, obtaining monitoring metrics, and viewing snapshot information.
- Write permission: DataArts Migration creates or changes cluster resources in CSS. You can assign the CSS FullAccess policy or custom read and write permissions in IAM. These permissions also cover all read operations.
- Open-source Elasticsearch permissions:
- Read permission: DataArts Migration reads index data. You can assign the built-in read role and bind the role to the corresponding indexes in Elasticsearch.
- Write permission: DataArts Migration writes, updates, and deletes documents. You can assign the write role (or index role) in Elasticsearch.
- Enabling ports
Elasticsearch port (9200): TCP port 9200 must be open so that DataArts Migration can access Elasticsearch.
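The preparation steps above can be checked with a small preflight script. The sketch below is illustrative only and assumes Python 3; the host name, index patterns, and role layout are hypothetical placeholders. `port_is_reachable` only tests whether a TCP connection to port 9200 succeeds, and `build_role_body` produces a request body for the `PUT /_security/role/<name>` API of Elasticsearch's built-in security features. Clusters secured by a different security plugin expose a different role API, so treat the body as an assumption to adapt.

```python
from __future__ import annotations

import socket


def port_is_reachable(host: str, port: int = 9200, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unresolvable host, etc.
        return False


def build_role_body(indexes: list[str], write: bool = False) -> dict:
    """Build a body for PUT /_security/role/<name> (Elasticsearch native security).

    Hypothetical role layout: "read" lets a source job search and scroll the
    indexes; "write" additionally lets a destination job index, update, and
    delete documents.
    """
    privileges = ["read"]
    if write:
        privileges.append("write")
    return {"indices": [{"names": list(indexes), "privileges": privileges}]}
```

For example, `port_is_reachable("es.example.internal")` (a placeholder host) returning `False` usually points to a security group, firewall, or routing issue rather than an Elasticsearch problem. The dictionary from `build_role_body(["logs-*"], write=True)` would be serialized to JSON and sent as the body of `PUT /_security/role/<role-name>`.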
Supported Data Types
| Category | Type | Read | Write |
|---|---|---|---|
| Character | keyword | √ | √ |
| Character | text | √ | √ |
| Character | string | √ | √ |
| Integer | short | √ | √ |
| Integer | integer | √ | √ |
| Integer | long | √ | √ |
| Numeric | double | √ | √ |
| Numeric | float | √ | √ |
| Boolean | boolean | √ | √ |
| Object | object | √ | √ |
| Nested type | nested | √ | √ |
| Date | date | √ | √ |
| Special type | ip | √ | √ |
| Array | string_array | √ | √ |
| Array | short_array | √ | √ |
| Array | integer_array | √ | √ |
| Array | long_array | √ | √ |
| Array | float_array | √ | √ |
| Array | double_array | √ | √ |
| Value range | completion | √ | √ |
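To make the type table concrete, the sketch below shows a hypothetical index mapping that uses the listed Elasticsearch field types, plus a sample document. It assumes Elasticsearch 7.x or later, and all field names are made up. The legacy `string` type is left out because Elasticsearch 5.0 replaced it with `text` and `keyword`. Elasticsearch has no dedicated array type: any field can hold a list of values of its mapped type, which is presumably what the `*_array` entries in the table correspond to.

```python
# Hypothetical index mapping covering the field types in the table above.
# Field names are illustrative; "string" is omitted (replaced by text/keyword in ES 5.0).
MAPPING = {
    "mappings": {
        "properties": {
            "user_id": {"type": "keyword"},
            "message": {"type": "text"},
            "labels": {"type": "keyword"},
            "level": {"type": "short"},
            "status_code": {"type": "integer"},
            "ports": {"type": "integer"},
            "bytes_sent": {"type": "long"},
            "latency_ms": {"type": "double"},
            "cpu_ratio": {"type": "float"},
            "success": {"type": "boolean"},
            "geo": {"type": "object", "properties": {"city": {"type": "keyword"}}},
            "tags_detail": {"type": "nested", "properties": {"name": {"type": "keyword"}}},
            "timestamp": {"type": "date"},
            "client_ip": {"type": "ip"},
            "suggest": {"type": "completion"},
        }
    }
}

# Sample document. Lists are stored in fields that are mapped with a scalar type.
DOC = {
    "user_id": "u-1001",
    "message": "user logged in",
    "labels": ["auth", "web"],  # array of keyword values
    "level": 3,
    "status_code": 200,
    "ports": [80, 443],  # array of integer values
    "bytes_sent": 5120,
    "latency_ms": 12.5,
    "cpu_ratio": 0.42,
    "success": True,
    "geo": {"city": "Berlin"},
    "tags_detail": [{"name": "a"}, {"name": "b"}],  # nested: list of objects
    "timestamp": "2024-01-01T08:00:00Z",
    "client_ip": "192.168.0.10",
    "suggest": {"input": ["login", "logout"]},
}


def array_fields(doc: dict) -> list:
    """Return the names of top-level fields whose value is a list."""
    return sorted(name for name, value in doc.items() if isinstance(value, list))
```

`MAPPING` would be the JSON body of a `PUT /<index>` request; `array_fields(DOC)` lists the fields that would surface as multi-valued (array) data.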
Supported Migration Scenarios
DataArts Migration supports the following modes for synchronizing on-premises data:
- Single table synchronization
DataArts Migration supports single-table or single-file synchronization when ingesting data into a data lake or migrating data to the cloud.
- Database and table shard synchronization
DataArts Migration supports synchronization of data from multiple databases and tables when ingesting data into a data lake or migrating data to the cloud.
- Entire DB migration
DataArts Migration supports synchronization of an entire on-premises database when ingesting data into a data lake or migrating data to the cloud.
Database and table shard synchronization and entire DB migration are not supported in all regions. The following table lists the supported Elasticsearch migration scenarios.
| Supported Migration Scenario | Single Table Read | Single Table Write | Database/Table Shard Read | Database/Table Shard Write | Entire DB Read | Entire DB Write |
|---|---|---|---|---|---|---|
| Supported | √ | √ | x | √ | x | x |
Core Capabilities
- Connection configuration
| Configuration Item | Supported | Description |
|---|---|---|
| Support for Secure Sockets Layer (SSL) | √ | SSL encryption ensures secure data transmission. Currently, this function is not supported. |
- Read capabilities
| Configuration Item | Supported | Description |
|---|---|---|
| Incremental read | √ | Incremental read can be enabled by configuring a filter condition. |
| Shard concurrency | x | 3.x and later versions support concurrent shard read, which makes full use of resources and improves read performance. |
| Custom fields | √ | You can add computed columns, constant columns, or masking functions to tasks to meet specific service requirements. |
| Dirty data processing | √ | Abnormal data can be written to the dirty data bucket so that a small amount of abnormal data does not cause the job to fail. |
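Incremental read relies on a filter condition. As a rough illustration of what such a filter can look like in Elasticsearch query DSL, the sketch below builds a search body that returns only documents whose timestamp is later than the last value read, plus a helper that derives the next checkpoint from a page of hits. The field name, the checkpoint handling, and the idea that the filter is expressed as a `range` query are all assumptions for illustration; check the job configuration page for the filter syntax DataArts Migration actually accepts.

```python
from __future__ import annotations


def incremental_query(field: str, last_value: str | None, page_size: int = 1000) -> dict:
    """Build a search body that reads documents with field > last_value, oldest first."""
    if last_value is None:
        query = {"match_all": {}}  # first run: no checkpoint yet, read everything
    else:
        query = {"range": {field: {"gt": last_value}}}  # strictly after the checkpoint
    return {"size": page_size, "query": query, "sort": [{field: "asc"}]}


def next_checkpoint(hits: list[dict], field: str, previous: str | None) -> str | None:
    """Return the field value of the last hit; results are sorted ascending."""
    if not hits:
        return previous  # nothing new: keep the old checkpoint
    return hits[-1]["_source"][field]
```

Here `hits` would be `response["hits"]["hits"]` from a search call. Using `gt` keeps the sketch simple but can skip documents that share the checkpoint timestamp when a page boundary falls between them; Elasticsearch's `search_after` with a tiebreaker sort field is the usual remedy.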
- Write capabilities
| Configuration Item | Supported | Description |
|---|---|---|
| Data clearance before import | √ | Data can be cleared before new data is imported. |
| Conflict resolution | √ | UPSERT, UPDATE, INDEX, and CREATE operations can flexibly handle data conflicts. |
| Concurrent write | √ | Data is written concurrently to improve write efficiency. |
| Batch submission | √ | Commit Size can be set to submit data to the server in batches. |
| Dirty data processing | √ | Abnormal data can be written to the dirty data bucket so that a small amount of abnormal data does not cause the job to fail. |
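The write options map naturally onto the Elasticsearch Bulk API. The sketch below is an assumption about how such a writer could work, not DataArts Migration's actual implementation: it turns records into NDJSON bulk bodies for the four conflict modes, splits them into batches of a hypothetical `commit_size`, and pairs bulk response items with the sent records to pick out failed (dirty) records. It assumes Elasticsearch 7.x or later, where bulk metadata carries no mapping type, and treats UPSERT as an `update` with `doc_as_upsert`.

```python
from __future__ import annotations

import json
from typing import Iterator


def bulk_lines(index: str, doc_id: str, doc: dict, mode: str) -> list[str]:
    """Return the action line and payload line for one (hypothetical) record."""
    meta = {"_index": index, "_id": doc_id}
    if mode == "INDEX":  # create, or replace the whole existing document
        return [json.dumps({"index": meta}), json.dumps(doc)]
    if mode == "CREATE":  # rejected with 409 if the ID already exists
        return [json.dumps({"create": meta}), json.dumps(doc)]
    if mode == "UPDATE":  # partial update; rejected with 404 if the ID is missing
        return [json.dumps({"update": meta}), json.dumps({"doc": doc})]
    if mode == "UPSERT":  # partial update, or insert the doc if the ID is missing
        return [json.dumps({"update": meta}), json.dumps({"doc": doc, "doc_as_upsert": True})]
    raise ValueError(f"unknown write mode: {mode}")


def batches(records: list, commit_size: int) -> Iterator[list]:
    """Yield consecutive slices of at most commit_size records."""
    if commit_size <= 0:
        raise ValueError("commit_size must be positive")
    for start in range(0, len(records), commit_size):
        yield records[start:start + commit_size]


def bulk_body(index: str, batch: list, mode: str) -> str:
    """Build one NDJSON request body for POST /_bulk from (doc_id, doc) pairs."""
    lines: list[str] = []
    for doc_id, doc in batch:
        lines.extend(bulk_lines(index, doc_id, doc, mode))
    return "\n".join(lines) + "\n"  # the Bulk API requires a trailing newline


def dirty_records(batch: list, response: dict) -> list:
    """Return (record, error) pairs for records the bulk response marks as failed."""
    if not response.get("errors"):
        return []
    failed = []
    # Bulk response items come back in the same order as the request actions.
    for record, item in zip(batch, response["items"]):
        result = next(iter(item.values()))  # {"create": {...}} -> {...}
        if "error" in result:
            failed.append((record, result["error"]))
    return failed
```

In a real job, each `bulk_body(...)` would be sent to `POST /_bulk` with `Content-Type: application/x-ndjson`, and the pairs returned by `dirty_records` would go to the dirty data location instead of failing the whole batch.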
Creating a Data Source
Create a data source in Management Center. For details, see Configuring Data Connection Parameters.
Creating an Offline Data Migration Job
Create an Elasticsearch migration job in DataArts Factory. For details, see Creating an Offline Processing Migration Job.