Multi-granularity Word Segmentation

Introduction

This API takes a sentence as input and outputs a hierarchical structure of all the words it contains, at different granularities.

The following figure shows the hierarchical structure of an input text after multi-granularity word segmentation. Each white circle is a character node, and each blue rounded rectangle is a word node.

Figure 1 Multi-granularity word segmentation

This API is free of charge and can be called up to twice per second.
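The API accepts at most two calls per second, so a client that sends requests in a loop may need to space them out. The sketch below shows one way to do that. `Throttle` is a hypothetical client-side helper and is not part of any SDK. It waits until at least 0.5 seconds have passed since the previous call. The clock and sleep functions can be injected, so the helper can be tested without real delays.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Hypothetical client-side limiter: spaces calls at least `min_interval` seconds apart.

    The default of 0.5 s corresponds to the documented limit of two calls per second.
    """

    def __init__(self, min_interval: float = 0.5,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous permitted call

    def wait(self) -> None:
        """Block until the next call is allowed, then record the call time."""
        now = self._clock()
        if self._last is not None:
            remaining = self._last + self.min_interval - now
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()  # re-read the clock after sleeping
        self._last = now
```

Call `throttle.wait()` immediately before each request. A server-side limit may be measured differently, so a client should still be prepared to handle a rate-limit error.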

URI

  • URI format
    POST /v1/{project_id}/nlp-fundamental/multi-grained-segment
  • Parameter description
    Table 1 URI parameters

    Parameter  | Mandatory | Description
    project_id | Yes       | Project ID. For details about how to obtain the project ID, see Obtaining a Project ID.

Request

Table 2 describes the request parameters.

Table 2 Request parameters

Parameter   | Type    | Mandatory | Description
text        | String  | Yes       | Text to be analyzed, encoded in UTF-8. It must contain 1 to 64 characters.
lang        | String  | No        | Language of the text. Chinese (zh) and English (en) are currently supported. The default value is zh.
granularity | Integer | No        | Segmentation granularity. 1 indicates the finest granularity, and 2 indicates the coarsest granularity. If this parameter is omitted or set to any other value, the segmentation tree covering all granularities is returned.
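You can check the constraints in Table 2 locally before you send a request. The following is a minimal sketch. `build_request_body` is a hypothetical helper. It assumes that the service counts characters the same way Python's `len()` counts Unicode code points, but this API description does not confirm that. Values of `granularity` other than 1 and 2 are passed through unchanged, because the service then returns the full segmentation tree.

```python
import json
from typing import Any, Dict, Optional

SUPPORTED_LANGS = {"zh", "en"}


def build_request_body(text: str, lang: str = "zh",
                       granularity: Optional[int] = None) -> str:
    """Validate the Table 2 parameters and serialize them as a JSON request body.

    Hypothetical helper; the 1-64 character check counts Unicode code points.
    """
    if not 1 <= len(text) <= 64:
        raise ValueError("text must contain 1 to 64 characters")
    if lang not in SUPPORTED_LANGS:
        raise ValueError("lang must be 'zh' or 'en'")
    body: Dict[str, Any] = {"text": text, "lang": lang}
    if granularity is not None:
        # 1 = finest, 2 = coarsest; any other value yields the full tree.
        body["granularity"] = granularity
    # Keep non-ASCII text readable; encode the string as UTF-8 when sending it.
    return json.dumps(body, ensure_ascii=False)
```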

Response

Table 3 describes the response parameters.

Table 3 Response parameters

Parameter | Type                                      | Description
result    | Array of node objects or array of strings | Word segmentation result. By default, the segmentation tree covering all granularities is returned as an array of node objects. If granularity is set to 1 or 2, the word list at that granularity is returned as an array of strings.

Table 4 Data structure description of node

Parameter    | Type                  | Description
content      | String                | Text content of the node, normalized according to Unicode. For example, the Chinese comma (，) is mapped to the English comma (,).
type         | String                | Node type. The options are WORD (word node) and CHAR (character node).
sub_contents | Array of node objects | List of child nodes.
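The node structure in Table 4 lets a client reduce the full tree to flat word lists. In the example responses, the root WORD nodes of the tree (Word 1, Word 7, Word 8) are exactly the words returned when granularity is 2. The second function below collects WORD nodes whose children are all CHAR nodes. Treating that level as equivalent to granularity 1 is an assumption that this API description does not state. Both function names are hypothetical.

```python
from typing import Any, Dict, List

Node = Dict[str, Any]  # {"content": str, "type": "WORD" | "CHAR", "sub_contents": [...]}


def top_level_words(result: List[Node]) -> List[str]:
    """Contents of the root WORD nodes, i.e. the coarsest segmentation."""
    return [node["content"] for node in result if node["type"] == "WORD"]


def finest_words(result: List[Node]) -> List[str]:
    """Contents of WORD nodes whose children are all CHAR nodes, in reading order.

    Assumption: this lowest WORD level corresponds to granularity 1.
    """
    words: List[str] = []

    def visit(node: Node) -> None:
        if node["type"] != "WORD":
            return  # CHAR nodes are leaves and are not words themselves
        children = node.get("sub_contents", [])
        if all(child["type"] == "CHAR" for child in children):
            words.append(node["content"])
        else:
            for child in children:
                visit(child)  # descend into nested WORD nodes

    for node in result:
        visit(node)
    return words
```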

Example

  • Example request 1
    POST https://{endpoint}/v1/{project_id}/nlp-fundamental/multi-grained-segment
    
    Request Header:
        Content-Type: application/json
        X-Auth-Token: MIINRwYJKoZIhvcNAQcCoIINODCCDTQCAQExDTALBglghkgBZQMEAgEwgguVBgkqhkiG...
    
    Request Body:
        {
            "text": "Input text",
            "lang": "zh",
            "granularity": 2
        }
    
  • Example response 1
    • Successful example response
      {
        "result": [
          "Word 1",
          "Word 7",
          "Word 8"
        ]
      }
  • Example request 2
    POST https://{endpoint}/v1/{project_id}/nlp-fundamental/multi-grained-segment
    
    Request Header:
        Content-Type: application/json
        X-Auth-Token: MIINRwYJKoZIhvcNAQcCoIINODCCDTQCAQExDTALBglghkgBZQMEAgEwgguVBgkqhkiG...
    
    Request Body:
        {
            "text": "Input text",
            "lang": "zh"
        }
    
  • Example response 2
    • Successful example response
      {
        "result": [
          {
            "content": "Word 1",
            "sub_contents": [
              {
                "content": "Word 2",
                "sub_contents": [
                  {
                    "content": "Character 1",
                    "type": "CHAR"
                  },
                  {
                    "content": "Character 2",
                    "type": "CHAR"
                  }
                ],
                "type": "WORD"
              },
              {
                "content": "Word 3",
                "sub_contents": [
                  {
                    "content": "Character 3",
                    "type": "CHAR"
                  },
                  {
                    "content": "Character 4",
                    "type": "CHAR"
                  }
                ],
                "type": "WORD"
              },
              {
                "content": "Word 4",
                "sub_contents": [
                  {
                    "content": "Word 5",
                    "sub_contents": [
                      {
                        "content": "Character 5",
                        "type": "CHAR"
                      },
                      {
                        "content": "Character 6",
                        "type": "CHAR"
                      }
                    ],
                    "type": "WORD"
                  },
                  {
                    "content": "Word 6",
                    "sub_contents": [
                      {
                        "content": "Character 7",
                        "type": "CHAR"
                      },
                      {
                        "content": "Character 8",
                        "type": "CHAR"
                      }
                    ],
                    "type": "WORD"
                  }
                ],
                "type": "WORD"
              }
            ],
            "type": "WORD"
          },
          {
            "content": "Word 7",
            "sub_contents": [
              {
                "content": "Character 9",
                "type": "CHAR"
              }
            ],
            "type": "WORD"
          },
          {
            "content": "Word 8",
            "sub_contents": [
              {
                "content": "Character 10",
                "type": "CHAR"
              },
              {
                "content": "Character 11",
                "type": "CHAR"
              }
            ],
            "type": "WORD"
          }
        ]
      }
    • Failed example response
      {
          "error_code": "NLP.0301",
          "error_msg": "the length of the text must between 1-64"
      }
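The following Python sketch sends the requests shown above using only the standard library. `build_request`, `extract_result` and `segment` are hypothetical helper names. The caller must supply `{endpoint}`, the project ID and the token, as in the examples. The sketch assumes that a failed call returns a JSON body with `error_code` and `error_msg`, like the failed example response, whether or not it arrives with an HTTP error status.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Dict, Optional


def build_request(endpoint: str, project_id: str, token: str, text: str,
                  lang: str = "zh", granularity: Optional[int] = None) -> urllib.request.Request:
    """Assemble (but do not send) the POST request from the examples above."""
    url = f"https://{endpoint}/v1/{project_id}/nlp-fundamental/multi-grained-segment"
    body: Dict[str, Any] = {"text": text, "lang": lang}
    if granularity is not None:
        body["granularity"] = granularity
    return urllib.request.Request(
        url,
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method="POST",
    )


def extract_result(payload: Dict[str, Any]) -> Any:
    """Return `result`, or raise if the body looks like the failed example response."""
    if "error_code" in payload:
        raise RuntimeError(f"{payload['error_code']}: {payload.get('error_msg', '')}")
    return payload["result"]


def segment(req: urllib.request.Request, timeout: float = 10.0) -> Any:
    """Send the request and return a word list (granularity 1 or 2) or the node tree."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            payload = json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # Assumption: error bodies are JSON with error_code / error_msg.
        payload = json.loads(err.read().decode("utf-8"))
        if "error_code" not in payload:
            raise
    return extract_result(payload)
```

For example, `segment(build_request(endpoint, project_id, token, "Input text", granularity=2))` would return a list of strings like the one in Example response 1. A text that is too long would raise `RuntimeError` carrying the `NLP.0301` code.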

Status Code

For details about status codes, see Status Code.

Error Code

For details about error codes, see Error Code.