Introduction to Data Import to Doris Clusters
The load function imports raw data into Doris. After an import is complete, you can query the data from a MySQL client. Doris provides a variety of data import methods.
Supported Data Sources
You can select different data import methods for different data sources.
Supported Data Formats
Different import methods support different data formats.
Import Method | Supported Format |
---|---|
Broker Load | parquet, orc, and obs |
Stream Load | csv, json, parquet, and orc |
Import Instructions
The data import implementation of Doris has the following common features, which are introduced here to help you better use the data import function.
Import Atomicity Guarantee
Each import job of Doris, whether it is batch import using Broker Load or single import using the INSERT statement, is a complete transaction operation. The import transaction can ensure that the data in a batch takes effect atomically, and there will be no partial data write.
Each import job also has a label, which uniquely identifies the job within a database. A label can be specified by the user; for some import methods, the system generates it automatically.
A label can be used for only one successful import. If a label that has already been imported successfully is used again, the job is rejected with the error Label already used. This mechanism gives Doris At-Most-Once semantics. Combined with At-Least-Once delivery from the upstream system, it achieves Exactly-Once semantics for the imported data.
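The label mechanism can be illustrated with a small in-memory model. This is only a sketch of the idea, not Doris code: the class `ToyLabelRegistry`, the exception `LabelAlreadyUsed`, and the function `deliver_at_least_once` are hypothetical names introduced for illustration. The upstream side resends the same batch under the same label (At-Least-Once), and the registry rejects every repeat (At-Most-Once), so the batch is stored exactly once.

```python
class LabelAlreadyUsed(Exception):
    """Raised when a label that already succeeded is submitted again."""


class ToyLabelRegistry:
    """In-memory stand-in for per-database label tracking (hypothetical, not Doris code)."""

    def __init__(self):
        self.finished_labels = set()
        self.rows = []

    def load(self, label, batch):
        # A label that already succeeded is rejected, so a batch lands at most once.
        if label in self.finished_labels:
            raise LabelAlreadyUsed(f"Label already used: {label}")
        # The whole batch becomes visible together, mirroring the atomic transaction.
        self.rows.extend(batch)
        self.finished_labels.add(label)


def deliver_at_least_once(registry, label, batch, attempts=3):
    """Simulate an upstream that sends the same batch several times under one label."""
    for _ in range(attempts):
        try:
            registry.load(label, batch)
        except LabelAlreadyUsed:
            # A duplicate send is harmless: the batch is already stored.
            pass
    return len(registry.rows)
```

In this model, calling `deliver_at_least_once(ToyLabelRegistry(), "job_001", [1, 2, 3])` sends the batch three times, but only the first send writes any rows.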
Synchronous and Asynchronous Imports
Import methods are either synchronous or asynchronous. If an external program calls the Doris import function, first determine which type of import method it uses, and then design the access logic accordingly.
- Synchronous import
Doris executes the import synchronously when you create an import job and returns the import result after the import is complete. You can check whether the import is successful based on the output of the import creation command.
- Asynchronous import
Doris returns a creation success message as soon as you create an import job. Successful creation does not mean that the data has been imported, because the import job is executed asynchronously. After an import job is created, you need to poll with a query command to check the status of the job. If the creation fails, you can decide whether to create the job again based on the failure information.
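For an asynchronous import, the polling step might be organized as in the following sketch. It makes several assumptions: `get_state` is a hypothetical callable that you supply to run the status query and return the job state as a string, the terminal state names `FINISHED` and `CANCELLED` should be checked against the output of your cluster's status query, and `TIMEOUT` is a value invented here to signal that polling gave up.

```python
import time

# Assumed terminal state names; verify them against your cluster's status output.
TERMINAL_STATES = {"FINISHED", "CANCELLED"}


def wait_for_import(get_state, poll_interval=1.0, max_polls=60, sleep=time.sleep):
    """Poll get_state() until the job reaches a terminal state or polls run out."""
    for _ in range(max_polls):
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        # The job is still running: wait before asking again.
        sleep(poll_interval)
    # Polling is bounded, so the caller is never blocked forever.
    return "TIMEOUT"
```

The `sleep` parameter is injectable so that the loop can be exercised without real delays, for example by passing `sleep=lambda seconds: None` together with a `get_state` that replays a fixed sequence of states.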
With either method, the external system should not retry endlessly after Doris returns an import failure or an import job creation failure. Retry a limited number of times; if the job still fails, stop and retain the failure information. Most of the failures are caused by incorrect usage methods or data skew.
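A bounded retry that keeps the failure information could look like the sketch below. The function `import_with_bounded_retry` and its `submit` argument are hypothetical: `submit` stands for whatever call your program uses to run or create an import job, and it is assumed to raise an exception on failure.

```python
def import_with_bounded_retry(submit, max_attempts=3):
    """Call submit() up to max_attempts times and keep every failure message."""
    failures = []
    for attempt in range(1, max_attempts + 1):
        try:
            result = submit()
            return result, failures
        except Exception as exc:
            # Retain the failure information instead of discarding it.
            failures.append(f"attempt {attempt}: {exc}")
    # All attempts failed: stop retrying and hand the collected errors back.
    return None, failures
```

The returned `failures` list can then be logged or reported so that the usage or data problem behind the failures can be investigated.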