Import GES

Function

Graph Engine Service (GES) facilitates query and analysis of multi-relational graph data structures.

The Import GES node imports graph data files from an OBS bucket into a GES graph.

Parameters

Table 1 and Table 2 describe the parameters of the Import GES node.

**Table 1** Parameters of Import GES nodes
Parameter	Mandatory	Description
Node Name	Yes	Name of a node. The name must contain 1 to 128 characters, including only letters, numbers, underscores (_), hyphens (-), slashes (/), less-than signs (<), and greater-than signs (>).
Graph Name	Yes	Select the graph to import data into, or enter the graph name manually. To create a GES graph, go to the GES console.
Metadata Source	Yes	Two types of metadata sources are available: Existing file: select an existing XML metadata file from an OBS bucket. New: generate an XML metadata file in an OBS bucket from the vertex tables and edge tables in MRS Hive. NOTE: Set at least one of the following parameters: Metadata, Edge Data Set, and Vertex Data Set.
Metadata	No	Set this parameter based on the value of Metadata Source. If Metadata Source is Existing file, click the text box and select the metadata file. If Metadata Source is New, click the text box; in the displayed dialog box, select the vertex table and edge table in MRS Hive, enter the OBS path for storing the metadata, and click Create. The system then generates an XML metadata file and saves it to the OBS path you entered. The vertex table and edge table in MRS Hive are the vertex data set and edge data set normalized to the GES graph data format, and they must be consistent with the values of Vertex Data Set and Edge Data Set, respectively. Both data sets must comply with the GES graph data format requirements, summarized below (for details, see Graph Data Formats). Vertex data set: each row contains the data of one vertex, in the format id,label,property 1,property 2,property 3,..., where id uniquely identifies the vertex. Edge data set: each row contains the data of one edge, in the format id 1,id 2,label,property 1,property 2,..., where id 1 and id 2 are the IDs of the two endpoints of the edge. GES graph specifications are defined by edge count, for example, one million edges. NOTE: When creating metadata, note the following: Select only a vertex table and an edge table that each use a single label; if a selected table has multiple labels, the generated metadata may be incomplete. The metadata XML file is generated when you click Create. If the structure of the vertex table or edge table changes during subsequent job scheduling, the metadata XML file is not updated automatically; open the New dialog box and click Create again to regenerate it. In the generated metadata XML file, the Cardinality (data composite type) of each Property is single and cannot be changed. You can generate metadata XML files for multiple pairs of vertex and edge tables at a time, but the Edge Data Set and Vertex Data Set parameters of an Import GES node each accept only one table. If you have multiple pairs of vertex and edge tables, create their metadata XML files on separate Import GES nodes so that each metadata file corresponds to exactly one pair of vertex and edge tables when the graph data is imported. Figure 1 shows the New dialog box.
Edge Data Set	No	Select the edge data set CSV file in the OBS bucket, or select the OBS path of the edge data set. The edge data set must comply with the GES graph data format: each row contains the data of one edge, in the format id 1,id 2,label,property 1,property 2,..., where id 1 and id 2 are the IDs of the two endpoints of the edge. GES graph specifications are defined by edge count, for example, one million edges. For details, see Graph Data Formats.
Vertex Data Set	No	Select the vertex data set directly, or select the OBS path of the vertex data set. The vertex data set must comply with the GES graph data format: each row contains the data of one vertex, in the format id,label,property 1,property 2,property 3,..., where id uniquely identifies the vertex. For details, see Graph Data Formats.
Edge Processing	Yes	How repetitive edges are handled. Available modes: Allow repetitive edges; Ignore subsequent repetitive edges; Overwrite previous repetitive edges. Whether two edges count as repetitive depends on Ignore Labels on Repetitive Edges.
Offline	No	Whether to use offline import. The value is Yes or No, and the default value is No. Yes: offline import is used. The import is faster, but the graph is locked and cannot be read or written during the import. No: online import is used. Online import is slower than offline import, but the graph can be read (though not written) during the import.
Ignore Labels on Repetitive Edges	No	Whether to ignore labels when identifying repetitive edges. The value is Yes or No, and the default value is Yes. Yes: the label is not part of the repetitive-edge definition; an edge is identified by <source vertex, target vertex>. No: the label is part of the repetitive-edge definition; an edge is identified by <source vertex, target vertex, label>.
Log Storage Path	No	Path for storing the vertex and edge data that does not comply with the metadata definition, as well as the detailed logs generated during the graph import.
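Some of the rules in Table 1 can be checked before a job runs. The following Python sketch is a hypothetical pre-flight check, not part of DataArts Studio or GES: `is_valid_node_name` applies the Node Name rule (assuming "letters" means ASCII letters only), and `find_bad_rows` checks only the row shape described for the vertex and edge data sets, namely enough columns and non-empty IDs. It does not validate rows against the metadata XML; rows that do not match the metadata are what the import reports under Log Storage Path.

```python
import csv
import io
import re

# Node Name rule (Table 1): 1 to 128 characters; letters, digits, "_", "-",
# "/", "<", and ">". Assumption: "letters" means ASCII letters only.
NODE_NAME_RE = re.compile(r"[A-Za-z0-9_\-/<>]{1,128}")


def is_valid_node_name(name: str) -> bool:
    """Return True if name satisfies the Node Name rule."""
    return NODE_NAME_RE.fullmatch(name) is not None


def find_bad_rows(text: str, id_fields: int) -> list[int]:
    """Return 1-based row numbers whose shape does not match the GES format.

    id_fields=1 checks vertex rows: id,label,property 1,...
    id_fields=2 checks edge rows:   id 1,id 2,label,property 1,...
    Only the row shape is checked, not conformance with the metadata XML.
    """
    bad = []
    for rownum, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not row:  # skip blank lines
            continue
        too_short = len(row) < id_fields + 1  # ID column(s) plus a label column
        missing_id = any(not field.strip() for field in row[:id_fields])
        if too_short or missing_id:
            bad.append(rownum)
    return bad


if __name__ == "__main__":
    vertices = "v1,person,Alice,30\nv2,person,Bob,25\n,person,NoId\n"
    edges = "v1,v2,knows,2020\nv1,v2\n"
    print(is_valid_node_name("import_ges-1"))  # True
    print(find_bad_rows(vertices, id_fields=1))  # [3]
    print(find_bad_rows(edges, id_fields=2))  # [2]
```

In the demo, vertex row 3 is flagged because its id is empty, and edge row 2 is flagged because it has no label column.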

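To make the interaction between Edge Processing and Ignore Labels on Repetitive Edges concrete, the sketch below models the three modes on an in-memory list of edges. It reflects one reading of Table 1 only; `EdgeMode` and `import_edges` are hypothetical names, and the actual GES import logic may differ, for example in the order in which retained edges are stored.

```python
from enum import Enum


class EdgeMode(Enum):
    """The three Edge Processing modes listed in Table 1."""
    ALLOW = "Allow repetitive edges"
    IGNORE_SUBSEQUENT = "Ignore subsequent repetitive edges"
    OVERWRITE_PREVIOUS = "Overwrite previous repetitive edges"


def import_edges(edges, mode: EdgeMode, ignore_labels: bool = True):
    """Return the edges kept after applying mode.

    Each edge is a (source, target, label, properties) tuple. With
    ignore_labels=True, edges with the same <source, target> are repetitive;
    with ignore_labels=False, <source, target, label> must all match.
    """
    if mode is EdgeMode.ALLOW:
        return list(edges)
    kept = {}  # key -> edge; a dict keeps first-insertion order
    for edge in edges:
        source, target, label, _properties = edge
        key = (source, target) if ignore_labels else (source, target, label)
        if key in kept and mode is EdgeMode.IGNORE_SUBSEQUENT:
            continue  # keep the first edge, drop later repeats
        kept[key] = edge  # new key, or OVERWRITE_PREVIOUS replacing the old edge
    return list(kept.values())


if __name__ == "__main__":
    sample = [
        ("a", "b", "knows", {"weight": 1}),
        ("a", "b", "likes", {"weight": 2}),
        ("a", "b", "knows", {"weight": 3}),
    ]
    for mode in EdgeMode:
        for ignore in (True, False):
            print(mode.name, ignore, import_edges(sample, mode, ignore))
```

With the sample edges, Ignore subsequent repetitive edges keeps only the first a-to-b edge when labels are ignored, but keeps both the knows and likes edges when labels are part of the definition.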
**Table 2** Advanced parameters
Parameter	Mandatory	Description
Node Status Polling Interval (s)	Yes	How often the system checks whether the node has finished executing. The value ranges from 1 to 60 seconds.
Max. Node Execution Duration	Yes	Execution timeout interval for the node. If retry is configured and the execution is not complete within the timeout interval, the node will be executed again.
Retry upon Failure	Yes	Whether to re-execute the node if it fails. Possible values: Yes: the node is re-executed, and the following parameters must be configured: Retry upon Timeout, Maximum Retries, and Retry Interval (seconds). No: the node is not re-executed. This is the default setting. NOTE: If both retry and a timeout duration are configured for a job node, the node can be retried when its execution times out. If a node is not re-executed when it fails upon timeout, go to the Default Configuration page to modify this policy. Retry upon Timeout is displayed only when Retry upon Failure is set to Yes.
Policy for Handling Subsequent Nodes If the Current Node Fails	Yes	Operation performed if the node fails to be executed. Possible values: Suspend execution plans of the subsequent nodes: stops running the subsequent nodes. The job instance status is Failed. End the current job execution plan: stops running the current job. The job instance status is Failed. Go to the next node: ignores the execution failure of the current node. The job instance status is Failure ignored. Suspend the current job execution plan: if the current job instance is abnormal, the subsequent nodes of this node and the subsequent job instances that depend on the current job enter the waiting state.
Enable Dry Run	No	If you select this option, the node will not be executed, and a success message will be returned.
Task Groups	No	Select a task group. If you select a task group, you can control the maximum number of concurrent nodes in the task group in a fine-grained manner in scenarios where a job contains multiple nodes, a data patching task is ongoing, or a job is rerunning.
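The interaction of Node Status Polling Interval, Max. Node Execution Duration, and Retry upon Failure can be sketched as a polling loop. This is a hypothetical model, not the DataArts Studio scheduler: `start` and `poll` stand in for launching the node and querying its status, the status strings are assumptions, and the clock and sleep functions are injectable so the logic can be tested without real waiting.

```python
import time
from typing import Callable


def run_node(
    start: Callable[[], object],
    poll: Callable[[object], str],
    poll_interval_s: float,
    max_duration_s: float,
    max_retries: int = 0,
    retry_interval_s: float = 0.0,
    retry_on_timeout: bool = False,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Run a node until it succeeds or retries run out.

    poll(handle) must return "running", "success", or "failed".
    Returns "success", "failed", or "timeout" for the last attempt.
    max_retries=0 corresponds to Retry upon Failure = No.
    """
    if not 1 <= poll_interval_s <= 60:
        raise ValueError("Node Status Polling Interval must be 1 to 60 seconds")
    attempt = 0
    while True:
        handle = start()
        deadline = clock() + max_duration_s  # Max. Node Execution Duration
        status = poll(handle)
        while status == "running" and clock() < deadline:
            sleep(poll_interval_s)  # Node Status Polling Interval
            status = poll(handle)
        outcome = "timeout" if status == "running" else status
        if outcome == "success":
            return outcome
        retryable = outcome == "failed" or retry_on_timeout  # Retry upon Timeout
        if not retryable or attempt >= max_retries:  # Maximum Retries
            return outcome
        attempt += 1
        sleep(retry_interval_s)  # Retry Interval (seconds)
```

On timeout the sketch simply starts a new attempt; a real scheduler would also need to stop the run that timed out.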

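Task Groups and Enable Dry Run can likewise be modeled in a few lines. In this hypothetical sketch, a task group is a bounded semaphore that caps how many nodes run at once, and a dry run returns success without calling the node, as Table 2 describes. The `TaskGroup` class and its `peak` counter are illustrative assumptions, not a DataArts Studio API.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


class TaskGroup:
    """Hypothetical model of a task group that caps concurrently running nodes."""

    def __init__(self, max_concurrent: int):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._lock = threading.Lock()
        self._running = 0
        self.peak = 0  # highest number of nodes observed running at once

    def run(self, node: Callable[[], None], dry_run: bool = False) -> str:
        """Run node inside the group; return "success" if it does not raise."""
        if dry_run:
            return "success"  # Enable Dry Run: report success without executing
        with self._slots:  # waits while max_concurrent nodes are already running
            with self._lock:
                self._running += 1
                self.peak = max(self.peak, self._running)
            try:
                node()
            finally:
                with self._lock:
                    self._running -= 1
        return "success"


if __name__ == "__main__":
    group = TaskGroup(max_concurrent=2)
    with ThreadPoolExecutor(max_workers=6) as pool:
        results = list(pool.map(lambda _: group.run(lambda: time.sleep(0.05)), range(6)))
    print(results, "peak:", group.peak)  # peak never exceeds 2
```

Even with six worker threads submitting nodes, at most two run at the same time, which is the kind of fine-grained concurrency limit a task group provides during reruns or data patching.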