Updated on 2024-07-11 GMT+08:00

Import GES

Function

The Import GES node is used to import files from an OBS bucket to a GES graph.

Parameters

Table 1 and Table 2 describe the parameters of the Import GES node.

Table 1 Parameters of Import GES nodes

Parameter

Mandatory

Description

Node Name

Yes

Name of a node. The name must contain 1 to 128 characters, including only letters, numbers, underscores (_), hyphens (-), slashes (/), less-than signs (<), and greater-than signs (>).
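
The naming rule above can be checked before a node is created. The following is a minimal sketch, not part of the product: the helper name `is_valid_node_name` is hypothetical, and it assumes that "letters" and "numbers" mean ASCII letters and digits.

```python
import re

# Assumption: "letters" and "numbers" are ASCII letters and digits.
# Allowed: letters, digits, underscore, hyphen, slash, "<" and ">"; length 1 to 128.
NODE_NAME_PATTERN = re.compile(r"[A-Za-z0-9_\-/<>]{1,128}")


def is_valid_node_name(name: str) -> bool:
    """Return True if the name satisfies the documented Node Name rule."""
    return NODE_NAME_PATTERN.fullmatch(name) is not None
```

For example, `is_valid_node_name("Import_GES-01")` is accepted under this assumption, while a name containing a space is rejected.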

Graph Name

Yes

You can directly select the graph to import or manually enter the graph name.

To create a GES graph, go to the GES console.

Metadata Source

Yes

Two types of metadata sources are available:

  • Existing file: Select an existing XML metadata file from an OBS bucket.
  • New: Generate an XML metadata file in an OBS bucket based on the vertex tables and edge tables in MRS Hive.
    NOTE:

    Set at least one of the following parameters: Metadata, Edge Data Set, and Vertex Data Set.

Metadata

No

Set this parameter based on the value you select for Metadata Source.

  • If you select Existing file for Metadata Source, click in the text box and select the corresponding metadata file.
  • If you select New for Metadata Source, click in the text box. In the displayed dialog box, select the vertex table and edge table in MRS Hive, enter the OBS path for storing the metadata, and click Create. Then the system automatically generates an XML metadata file and saves it to the OBS path you enter.

    The vertex table and edge table in MRS Hive are the vertex data set and edge data set normalized based on the GES graph data format. They must be consistent with the values of Vertex Data Set and Edge Data Set, respectively.

    The vertex and edge data sets must comply with the data format requirements of GES graphs. The requirements are summarized as follows. For details, see Graph Data Formats.
    • The vertex data set contains the data of each vertex. Each row is the data of a vertex. The format is as follows. id is the unique identifier of vertex data.
      id,label,property 1,property 2,property 3,...
    • The edge data set contains the data of each edge. Each row is the data of an edge. Graph specifications in GES are defined based on the edge quantity, for example, one million edges. The format is as follows. id 1 and id 2 are the IDs of the two endpoints of an edge.
      id 1, id 2, label, property 1, property 2,...
    NOTE:

    When creating metadata, note the following:

    1. You can select only a vertex table and an edge table that each use a single label. If you select a vertex table or an edge table that has multiple labels, the generated metadata may be incomplete.
    2. The metadata XML file is generated after you click Create. If the structure of the vertex table and edge table changes during subsequent job scheduling, the metadata XML file will not be updated automatically. In this case, you need to open the New dialog box and click Create again to generate a new metadata XML file.
    3. In the generated metadata XML file, the value of Cardinality (data composite type) in Property is single and cannot be changed.
    4. You can generate metadata XML files for multiple pairs of vertex tables and edge tables at a time. However, only one table can be selected for each of the Edge Data Set and Vertex Data Set parameters of the Import GES node. If there are multiple pairs of vertex tables and edge tables, you are advised to create the metadata XML files on multiple Import GES nodes, one per pair. This ensures that each metadata file corresponds to exactly one pair of vertex and edge tables when the graph data is imported.
    Figure 1 New

Edge Data Set

No

You can select the edge data set CSV file in the corresponding OBS bucket or select the OBS path of the edge data set.

The vertex and edge data sets must comply with the data format requirements of GES graphs. The requirements are summarized as follows. For details, see Graph Data Formats.
  • The vertex data set contains the data of each vertex. Each row is the data of a vertex. The format is as follows. id is the unique identifier of vertex data.
    id,label,property 1,property 2,property 3,...
  • The edge data set contains the data of each edge. Each row is the data of an edge. Graph specifications in GES are defined based on the edge quantity, for example, one million edges. The format is as follows. id 1 and id 2 are the IDs of the two endpoints of an edge.
    id 1, id 2, label, property 1, property 2,...

Vertex Data Set

No

You can directly select the corresponding vertex data set or select the OBS path of the vertex data set.

The vertex and edge data sets must comply with the data format requirements of GES graphs. The requirements are summarized as follows. For details, see Graph Data Formats.
  • The vertex data set contains the data of each vertex. Each row is the data of a vertex. The format is as follows. id is the unique identifier of vertex data.
    id,label,property 1,property 2,property 3,...
  • The edge data set contains the data of each edge. Each row is the data of an edge. Graph specifications in GES are defined based on the edge quantity, for example, one million edges. The format is as follows. id 1 and id 2 are the IDs of the two endpoints of an edge.
    id 1, id 2, label, property 1, property 2,...
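
The row formats above can be checked locally before the files are uploaded to OBS. The following is a minimal sketch under stated assumptions: the function name `find_format_problems` and the sample rows are hypothetical, and the check covers only what this page states (field order, a unique vertex id, and two endpoint IDs plus a label on each edge). It does not replace the full rules in Graph Data Formats.

```python
import csv
import io


def find_format_problems(vertex_csv: str, edge_csv: str) -> list[str]:
    """Return a list of human-readable problems found in the two data sets."""
    problems = []
    seen_ids = set()

    # Vertex rows: id,label,property 1,property 2,...
    for n, row in enumerate(csv.reader(io.StringIO(vertex_csv)), start=1):
        if not row:
            continue  # skip blank lines
        fields = [f.strip() for f in row]
        if len(fields) < 2 or not fields[0] or not fields[1]:
            problems.append(f"vertex line {n}: expected id,label,...")
            continue
        if fields[0] in seen_ids:
            problems.append(f"vertex line {n}: duplicate id {fields[0]!r}")
        seen_ids.add(fields[0])

    # Edge rows: id 1,id 2,label,property 1,...
    for n, row in enumerate(csv.reader(io.StringIO(edge_csv)), start=1):
        if not row:
            continue
        fields = [f.strip() for f in row]
        if len(fields) < 3 or not all(fields[:3]):
            problems.append(f"edge line {n}: expected id 1,id 2,label,...")

    return problems


# Hypothetical sample data, for illustration only.
vertices = "v1,person,Alice,30\nv2,person,Bob,25\n"
edges = "v1,v2,knows,2020\n"
print(find_format_problems(vertices, edges))
```

An empty list means that no problem was found by these limited checks.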

Edge Processing

Yes

Mode for processing repetitive edges. The following modes are supported:

  • Allow repetitive edges
  • Ignore subsequent repetitive edges
  • Overwrite previous repetitive edges

Offline

No

Whether offline import is used. The value is Yes or No, and the default value is No.

  • Yes: Offline import is used. The import is faster, but the graph is locked and cannot be read or written during the import.
  • No: Online import is used. Online import is slower than offline import, but the graph can still be read (not written) during the import.

Ignore Labels on Repetitive Edges

No

Indicates whether to ignore labels on repetitive edges. The value is Yes or No, and the default value is Yes.

  • Yes: The label is not part of the repetitive edge definition. An edge is identified by <source vertex, target vertex>, and the label information is excluded.
  • No: The label is part of the repetitive edge definition. An edge is identified by <source vertex, target vertex, label>.
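
The interaction between Edge Processing and Ignore Labels on Repetitive Edges can be illustrated with a small local simulation. This is a sketch of one reading of the option names, not the GES implementation: it assumes that "Ignore subsequent repetitive edges" keeps the first edge of each repeated group and that "Overwrite previous repetitive edges" keeps the last. The function `process_edges` and the mode strings are hypothetical.

```python
def process_edges(edges, mode="allow", ignore_label=True):
    """Simulate repetitive-edge handling on a list of edge tuples.

    edges: list of (source, target, label, *properties) tuples.
    mode: "allow" keeps everything, "ignore" keeps the first edge of each
          repeated group, "overwrite" keeps the last (assumed semantics).
    ignore_label: if True, edges repeat when <source, target> match;
                  if False, when <source, target, label> match.
    """
    if mode not in ("allow", "ignore", "overwrite"):
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "allow":
        return list(edges)

    kept = {}  # dicts preserve insertion order of keys
    for edge in edges:
        source, target, label = edge[0], edge[1], edge[2]
        key = (source, target) if ignore_label else (source, target, label)
        if mode == "ignore":
            kept.setdefault(key, edge)  # first occurrence wins
        else:
            kept[key] = edge  # last occurrence wins
    return list(kept.values())


# Hypothetical sample edges: (source, target, label, property)
sample = [("a", "b", "knows", 1), ("a", "b", "likes", 2), ("a", "b", "knows", 3)]
print(process_edges(sample, mode="ignore", ignore_label=True))
print(process_edges(sample, mode="overwrite", ignore_label=False))
```

With labels ignored, all three sample edges count as repetitions of one edge. With labels considered, the "likes" edge is distinct from the two "knows" edges.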

Log Storage Path

No

Path for storing the vertex and edge data that does not comply with the metadata definition, as well as the detailed logs generated during graph import.

Table 2 Advanced parameters

Parameter

Mandatory

Description

Node Status Polling Interval (s)

Yes

How often the system checks whether the node execution is complete. The value ranges from 1 to 60 seconds.

Max. Node Execution Duration

Yes

Execution timeout interval for the node. If retry is configured and the execution is not complete within the timeout interval, the node will be executed again.

Retry upon Failure

Yes

Whether to re-execute a node if it fails to be executed. Possible values:

  • Yes: The node will be re-executed, and the following parameters must be configured:
    • Retry upon Timeout
    • Maximum Retries
    • Retry Interval (seconds)
  • No: The node will not be re-executed. This is the default setting.
    NOTE:

    If both retry and a timeout duration are configured for a job node, the node can be retried when its execution times out.

    If a node is not re-executed when it fails upon timeout, you can go to the Default Configuration page to modify this policy.

    Retry upon Timeout is displayed only when Retry upon Failure is set to Yes.
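
How Retry upon Failure, Retry upon Timeout, Maximum Retries, and Retry Interval relate to each other can be sketched as a simple loop. This is an illustration only and does not describe the scheduler's actual implementation: the function `run_with_retry` and its parameters are hypothetical, and a timeout is modeled as a `TimeoutError` raised by the task.

```python
import time


def run_with_retry(task, max_retries, retry_interval,
                   retry_on_timeout=True, sleep=time.sleep):
    """Run task() and re-execute it on failure, up to max_retries times.

    Returns (result, attempts). Re-raises the last error when retries are
    exhausted, or immediately on TimeoutError if retry_on_timeout is False.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return task(), attempts
        except Exception as exc:
            timed_out = isinstance(exc, TimeoutError)
            if timed_out and not retry_on_timeout:
                raise  # timeouts are not retried under this setting
            if attempts > max_retries:
                raise  # all retries used up
            sleep(retry_interval)  # wait before the next attempt
```

Passing a no-op `sleep` function makes the sketch easy to try out without waiting for the retry interval.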

Policy for Handling Subsequent Nodes If the Current Node Fails

Yes

Operation that will be performed if the node fails to be executed. Possible values:

  • Suspend execution plans of the subsequent nodes: stops running subsequent nodes. The job instance status is Failed.
  • End the current job execution plan: stops running the current job. The job instance status is Failed.
  • Go to the next node: ignores the execution failure of the current node. The job instance status is Failure ignored.
  • Suspend the current job execution plan: If the current job instance is in an abnormal state, the subsequent nodes of this node and the subsequent job instances that depend on the current job enter the waiting state.

Enable Dry Run

No

If you select this option, the node will not be executed, and a success message will be returned.

Task Groups

No

Select a task group. A task group gives you fine-grained control over the maximum number of concurrent nodes in the group, which is useful when a job contains multiple nodes, a data patching task is ongoing, or a job is being rerun.