Updated on 2024-01-03 GMT+08:00

Static Graph

Before importing graph data, familiarize yourself with the graph data formats supported by GES.

  • GES can load raw graph data only in the standard CSV format. If your raw data is in another format, convert it to CSV first.
  • GES graph data consists of vertex files, edge files, and a metadata file.
    • Vertex files store vertex data.
    • Edge files store edge data.
    • Metadata is used to describe the formats of data in vertex and edge files.

Concept Description

GES imports graph data as a property graph, so you need to understand the property graph model first.

A property graph is a directed graph consisting of vertices, edges, labels, and properties.

  • A vertex is also called a node, and an edge is also called a relationship. Nodes and relationships are the most important entities.
  • Metadata describes vertex and edge properties. It contains multiple labels, and each label consists of one or more properties.
  • Vertices with the same label belong to a group or a set.
  • Each vertex or edge can have only one label.
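
The model described above can be sketched as two small Python data classes. This is only an illustration of the concepts, not how GES stores data internally. The class and field names (Vertex, Edge, source, target) are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: str                       # unique identifier of the vertex
    label: str                    # a vertex has exactly one label
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str                   # ID of the vertex the edge starts from
    target: str                   # ID of the vertex the edge points to
    label: str                    # an edge has exactly one label
    properties: dict = field(default_factory=dict)


vivian = Vertex("Vivian", "user", {"gender": "F"})
eric = Vertex("Eric", "user", {"gender": "M"})
friends = Edge(vivian.id, eric.id, "friends")

# Vertices that share a label belong to the same group.
users = [v.id for v in (vivian, eric) if v.label == "user"]
```

Because the graph is directed, the edge above runs from Vivian to Eric and not the other way round.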

Metadata

The following figure shows the metadata structure.

Figure 1 Metadata structure

GES metadata is stored in an XML file and is used to define vertex and edge properties.

It contains labels and properties.

  • Label

    A label is a collection of properties. It describes formats of property data contained within a vertex or an edge.

    If different labels define a property with the same name, that property must have the same cardinality and dataType in every label.

  • Property

    A property definition describes the data format of a single property and contains three fields.

    • name: Indicates the name of a property. It contains 1 to 256 characters and cannot contain special characters such as angle brackets (<>) and ampersands (&).

      A label cannot contain two properties with the same name.

    • cardinality: Indicates the composite type of data. Possible values are single, list, and set.
      • single indicates that the property holds one value, such as a number or a character string.

        For a property of the single type, value1;value2 is treated as one value and is not split.

      • list and set indicate that data of this property consists of multiple values separated by semicolons (;).
        • list: The values are placed in sequence and can be repeated. For example, 1;1;1 contains three values.
        • set: The values are unordered and must be unique. Duplicate values are collapsed into one. For example, 1;1;1 contains only one value (1).

        list and set do not support values of the char array data type.

    • dataType: Indicates the data type of the property values. The following table lists the data types supported by GES.
      Table 1 Supported data types

      Type       | Description
      -----------+---------------------------------------------------------------
      char       | Character.
      char array | Fixed-length string. Use the maxDataSize parameter to set the maximum length. For details, see Metadata structure.
                 | NOTE:
                 | • Only the single cardinality supports this data type.
                 | • If the property data is a string, you are advised to set dataType to char array. Data of the string type is imported more slowly.
      float      | Single-precision floating point number (32-bit).
      double     | Double-precision floating point number (64-bit).
      bool       | Boolean. Valid values are 0/1 and true/false.
      long       | Long integer (value range: -2^63 to 2^63-1).
      int        | Integer (value range: -2^31 to 2^31-1).
      date       | Date. The following formats are currently supported:
                 | • YYYY-MM-DD HH:MM:SS
                 | • YYYY-MM-DD
                 | NOTE: MM and DD must each consist of two digits. If the month or day has only one digit, add a leading 0, for example, 05 or 01.
      enum       | Enumeration. Specify the number of enumerated values and the name of each value. For details, see Metadata structure.
      string     | Variable-length string.
                 | NOTE: Data import can be very slow if the strings are long. You are advised to use char array instead. You can set the length of a char array as needed. A length of 32 characters or fewer is recommended.
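
The cardinality rules and the date formats described above can be illustrated with a short Python sketch. It is not part of GES. The function names split_values and looks_like_date are hypothetical, and the date check only tests the shape of the string, not whether the date exists on the calendar.

```python
import re

# Shape check only: two-digit month and day, optional time part.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}( \d{2}:\d{2}:\d{2})?")


def split_values(raw, cardinality):
    """Interpret one raw field according to its cardinality."""
    if cardinality == "single":
        return raw                   # semicolons stay part of the value
    values = raw.split(";")
    if cardinality == "list":
        return values                # order and duplicates are kept
    if cardinality == "set":
        return set(values)           # unordered, duplicates collapse
    raise ValueError(f"unknown cardinality: {cardinality!r}")


def looks_like_date(raw):
    """Return True if raw matches YYYY-MM-DD or YYYY-MM-DD HH:MM:SS."""
    return DATE_PATTERN.fullmatch(raw) is not None
```

For example, split_values("1;1;1", "list") keeps three values, while the same input with the set cardinality keeps one.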

The following is an example of a metadata file:

<?xml version="1.0" encoding="ISO-8859-1"?>
<PMML version="3.0"
  xmlns="http://www.dmg.org/PMML-3-0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema_instance">
  <labels>
    <label name="default">
    </label>
    <label name="movie">
      <properties>
        <property name="movieid" cardinality="single" dataType="int"/>
        <property name="title" cardinality="single" dataType="string"/>
        <property name="genres" cardinality="single" dataType="string"/>
      </properties>
    </label>
    <label name="user">
      <properties>
        <property name="userid" cardinality="single" dataType="int"/>
        <property name="gender" cardinality="single" dataType="string"/>
        <property name="age" cardinality="single" dataType="enum" typeNameCount="7"
          typeName1="Under 18" typeName2="18-24" typeName3="25-34" typeName4="35-44" typeName5="45-49"
          typeName6="50-55" typeName7="56+"/>
        <property name="occupation" cardinality="single" dataType="enum" typeNameCount="21"
          typeName1="other or not specified" typeName2="academic/educator" typeName3="artist" typeName4="clerical/admin" typeName5="college/grad student"
          typeName6="customer service" typeName7="doctor/health care" typeName8="executive/managerial" typeName9="farmer" typeName10="homemaker"
          typeName11="K-12 student" typeName12="lawyer" typeName13="programmer" typeName14="retired" typeName15="sales/marketing"
          typeName16="scientist" typeName17="self-employed" typeName18="technician/engineer" typeName19="tradesman/craftsman" typeName20="unemployed"
          typeName21="writer"/>
        <property name="Zip-code" cardinality="single" dataType="char array" maxDataSize="12"/>
      </properties>
    </label>
    <label name="rate">
      <properties>
        <property name="Rating" cardinality="single" dataType="int"/>
        <property name="Datetime" cardinality="single" dataType="string"/>
      </properties>
    </label>
  </labels>
</PMML>
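
Before uploading a metadata file, it can be useful to check it locally. The following Python sketch reads a trimmed copy of the example above with the standard xml.etree.ElementTree module and applies two of the rules from this section: property names are unique within a label, and a property name shared by several labels has the same cardinality and dataType everywhere. The function names read_schema and check_consistency are hypothetical, and GES may perform further checks that are not modeled here.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the metadata example. The XML declaration is left out
# because the document is parsed from a Python string.
METADATA = """<PMML version="3.0" xmlns="http://www.dmg.org/PMML-3-0">
  <labels>
    <label name="default"></label>
    <label name="rate">
      <properties>
        <property name="Rating" cardinality="single" dataType="int"/>
        <property name="Datetime" cardinality="single" dataType="string"/>
      </properties>
    </label>
  </labels>
</PMML>"""

# All elements live in the default namespace declared on <PMML>.
NS = {"p": "http://www.dmg.org/PMML-3-0"}


def read_schema(xml_text):
    """Return {label name: {property name: (cardinality, dataType)}}."""
    root = ET.fromstring(xml_text)
    schema = {}
    for label in root.findall("p:labels/p:label", NS):
        props = {}
        for prop in label.findall("p:properties/p:property", NS):
            name = prop.get("name")
            if name in props:
                raise ValueError(f"duplicate property {name!r}")
            props[name] = (prop.get("cardinality"), prop.get("dataType"))
        schema[label.get("name")] = props
    return schema


def check_consistency(schema):
    """Raise if one property name has different formats in different labels."""
    seen = {}
    for props in schema.values():
        for name, fmt in props.items():
            if seen.setdefault(name, fmt) != fmt:
                raise ValueError(f"property {name!r} is defined inconsistently")


schema = read_schema(METADATA)
check_consistency(schema)
```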

Vertex Files

A vertex file contains the data of every vertex, with one vertex per line. The format is as follows, where id is the unique identifier of the vertex.

id,label,property 1,property 2,property 3,...
  • The vertex ID cannot contain hyphens (-).
  • You do not need to set the data type of the vertex ID. It is of the string type by default.
  • Do not add spaces before or after a label. Separate fields with commas (,) only. A space next to a label is read as part of the label, so the label may not be recognized and the system may report that the label does not exist.

Example:

Vivian,user,Vivian,F,25-34,artist,98133
Eric,user,Eric,M,18-24,college/grad student,40205
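
The rules for vertex lines can be checked locally before import. The sketch below is a minimal illustration in Python. The function name parse_vertex_line is hypothetical, and the use of the csv module's default quoting rules is an assumption, because this section does not describe how GES handles quoted fields.

```python
import csv
import io


def parse_vertex_line(line, known_labels):
    """Split one vertex line into (id, label, property values)."""
    fields = next(csv.reader(io.StringIO(line)))
    if len(fields) < 2:
        raise ValueError("a vertex line needs at least an ID and a label")
    vertex_id, label, *props = fields
    if "-" in vertex_id:
        raise ValueError(f"vertex ID contains a hyphen: {vertex_id!r}")
    if label not in known_labels:
        # A stray space makes " user" a different label from "user".
        raise ValueError(f"label does not exist: {label!r}")
    return vertex_id, label, props


vertex = parse_vertex_line(
    "Vivian,user,Vivian,F,25-34,artist,98133", {"movie", "user"}
)
```

Only the vertex ID is checked for hyphens. Property values such as 25-34 are left untouched.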

Edge Files

An edge file contains the data of every edge, with one edge per line. The graph size in GES is defined by the order of magnitude of the edge count, for example, one million edges. The format is as follows, where id 1 and id 2 are the IDs of the two endpoint vertices of the edge.

id 1,id 2,label,property 1,property 2,...

Example:

Eric,Lethal Weapon,rate,4,2000-11-21 15:33:18
Vivian,Eric,friends
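
As a rough illustration, the edge lines above can be read in Python as follows. The function name read_edges is hypothetical, and the csv module's default quoting rules are an assumption rather than documented GES behavior.

```python
import csv
import io

EDGE_DATA = """Eric,Lethal Weapon,rate,4,2000-11-21 15:33:18
Vivian,Eric,friends
"""


def read_edges(text):
    """Return a list of (id 1, id 2, label, property values) tuples."""
    edges = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue                  # skip blank lines
        if len(row) < 3:
            raise ValueError(f"incomplete edge line: {row!r}")
        source, target, label, *props = row
        edges.append((source, target, label, props))
    return edges


edges = read_edges(EDGE_DATA)
```

The second line has no property values, so its property list is empty.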