Updated on 2025-07-04 GMT+08:00

NLP

The NLP connector is used to connect to the Huawei Cloud NLP service.

Natural Language Processing (NLP) is a cloud service that enables enterprises and developers to mine and analyze text efficiently.

Creating an NLP Connection

  1. Log in to the new ROMA Connect console.
  2. In the navigation pane on the left, choose Connector. On the page displayed, click New Connection.
  3. Select the NLP connector.
  4. In the dialog box displayed, configure the connector and click OK.

    Parameter

    Description

    Name

    Enter the connector instance name.

    App Key

    Access key ID (AK) of the current account. Obtain the AK by referring to Access Keys. If an AK/SK pair has been generated, find the downloaded AK/SK file (such as credentials.csv).

    App Secret

    Secret access key (SK) of the current account. Obtain the SK by referring to Access Keys. If an AK/SK pair has been generated, find the downloaded AK/SK file (such as credentials.csv).

    Description

    Enter the description of the connector to identify it.

Action

  • Text Similarity (Advanced Edition)
  • NER (Domain-specific Edition)
  • Multi-granularity Word Segmentation
  • Document Translation Job Status Query
  • Document Translation
  • Language Recognition
  • Text Translation
  • Intent Understanding
  • Document Categorization
  • Entity-Based Sentiment Analysis
  • Aspect-Based Sentiment Analysis (Advanced Edition)
  • Aspect-Based Sentiment Analysis
  • Sentiment Analysis (Domain-specific Edition)
  • Text Classification
  • Text Summarization (Domain-specific Edition)
  • Constituency Syntax Analysis
  • Poem Generation

Configuration Parameters

Table 1 Text Similarity (Advanced Edition)

Parameter

Description

project_id

Project ID.

region_id

Region ID.

text1

Text 1, on which the text similarity is to be computed. The text is encoded using UTF-8 and contains 1 to 512 characters.

text2

Text 2, on which the text similarity is to be computed. The text is encoded using UTF-8 and contains 1 to 512 characters.

lang

Supported language type. Currently, only Chinese is supported. The default value is zh.
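
The limits in Table 1 can be checked before the action is configured. The following is a minimal sketch, not part of the connector: build_text_similarity_params is a hypothetical helper name, and it assumes that "characters" means Unicode code points as counted by Python's len().

```python
MAX_SIMILARITY_CHARS = 512  # documented upper bound for text1 and text2


def build_text_similarity_params(project_id, region_id, text1, text2, lang="zh"):
    """Return the Table 1 parameters as a dict after checking the documented limits.

    Hypothetical helper: it only mirrors the constraints listed in the table
    and does not call the NLP service.
    """
    for name, value in (("text1", text1), ("text2", text2)):
        # Assumption: one character equals one Unicode code point.
        if not 1 <= len(value) <= MAX_SIMILARITY_CHARS:
            raise ValueError(
                f"{name} must contain 1 to {MAX_SIMILARITY_CHARS} characters, "
                f"got {len(value)}"
            )
    if lang != "zh":
        raise ValueError("only Chinese (zh) is currently supported")
    return {
        "project_id": project_id,
        "region_id": region_id,
        "text1": text1,
        "text2": text2,
        "lang": lang,
    }
```

An empty string or a string longer than 512 characters raises ValueError, so the problem surfaces before the action runs.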

Table 2 NER (domain-specific edition)

Parameter

Description

project_id

Project ID.

region_id

Region ID.

text

Text to be analyzed. The text is encoded using UTF-8 and contains 1 to 512 characters.

lang

Supported language type. Currently, only Chinese is supported. The default value is zh.

domain

Supported domain type. The value can be general (default), business, or entertainment.

Table 3 Multi-granularity word segmentation

Parameter

Description

project_id

Project ID.

region_id

Region ID.

text

Text to be segmented. The text must be encoded in UTF-8 and can contain 1 to 64 characters.

lang

Supported language type. Currently, only Chinese is supported. The default value is zh.

granularity

Segmentation granularity. 1 indicates the finest granularity, and 2 indicates the coarsest granularity. In other cases, the segmentation tree result of all granularities is returned by default.

Table 4 Document translation job status query

Parameter

Description

project_id

Project ID.

job_id

Document translation job ID, which can be obtained by calling the document translation job creation API.

region_id

Region ID.

Table 5 Document translation

Parameter

Description

project_id

Project ID.

region_id

Region ID.

url

Path of the document stored in OBS. For private files, you are advised to use a temporary authorization URL to call the service. For details about how to obtain the OBS file URL and temporary authorization URL, see Configuring the Access Permission of OBS. The region of OBS must be the same as that of the requested service. Otherwise, OBS is unavailable, even if it allows public access.

from

Source language. Currently, Chinese and English are supported.

to

Target language. Currently, Chinese and English are supported.

type

Document format. Currently, docx, pptx, and txt files can be translated.

Table 6 Language recognition

Parameter

Description

project_id

Project ID.

region_id

Region ID.

text

The text whose language needs to be recognized, which must be encoded in UTF-8 and can contain a maximum of 2,000 characters.

Table 7 Text translation

Parameter

Description

project_id

Project ID.

region_id

Region ID.

text

The text to be translated, which must be encoded in UTF-8 and can contain a maximum of 2,000 characters.

from

Source language. Supported languages: Arabic (ar), German (de), Russian (ru), French (fr), Korean (ko), Portuguese (pt), Japanese (ja), Thai (th), Turkish (tr), Spanish (es), English (en), Vietnamese (vi), simplified Chinese (zh), and traditional Chinese (zh-tw).

The system automatically detects the input language and translates it to the target language. You need to specify the target language.

to

Target language. Supported languages: Arabic (ar), German (de), Russian (ru), French (fr), Korean (ko), Portuguese (pt), Japanese (ja), Thai (th), Turkish (tr), Spanish (es), English (en), Vietnamese (vi), simplified Chinese (zh), and traditional Chinese (zh-tw).

scene

The default value is common. Currently, only common scenarios are supported.
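
The Table 7 constraints can be expressed as a small pre-check. This is a sketch under stated assumptions: check_translation_request is a hypothetical helper, the arguments are named source and target because from is a reserved word in Python, and the character count is assumed to be a count of Unicode code points.

```python
# Language codes listed in Table 7.
SUPPORTED_LANGUAGES = {
    "ar", "de", "ru", "fr", "ko", "pt", "ja",
    "th", "tr", "es", "en", "vi", "zh", "zh-tw",
}
MAX_TRANSLATION_CHARS = 2000  # documented maximum text length


def check_translation_request(text, source, target, scene="common"):
    """Return a list of problems; an empty list means every check passed.

    Hypothetical helper: "source" and "target" stand for the "from" and "to"
    parameters of the action.
    """
    problems = []
    if len(text) > MAX_TRANSLATION_CHARS:
        problems.append(f"text exceeds {MAX_TRANSLATION_CHARS} characters")
    if source not in SUPPORTED_LANGUAGES:
        problems.append(f"unsupported source language: {source}")
    if target not in SUPPORTED_LANGUAGES:
        problems.append(f"unsupported target language: {target}")
    if scene != "common":
        problems.append("only the common scene is currently supported")
    return problems
```

Collecting all problems in one list, rather than stopping at the first, lets a caller report every invalid field at once.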

Table 8 Intent understanding

Parameter

Description

project_id

Project ID.

region_id

Region ID.

lang

Supported language type. Currently, only Chinese is supported. The default value is zh.

text

Text list to be analyzed. The text is encoded in UTF-8 and the value contains a maximum of 32 characters. If the value exceeds 32 characters, only the first 32 characters are detected.
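
Because text beyond 32 characters is ignored rather than rejected, it can help to see which part of an input is actually analyzed. The sketch below illustrates that rule only; analyzed_portion and is_truncated are hypothetical helper names, and characters are assumed to be Unicode code points.

```python
INTENT_MAX_CHARS = 32  # documented limit for the intent understanding text


def analyzed_portion(text, limit=INTENT_MAX_CHARS):
    """Return the leading part of text that falls within the documented limit."""
    return text[:limit]


def is_truncated(text, limit=INTENT_MAX_CHARS):
    """Return True if part of text lies beyond the limit and would be ignored."""
    return len(text) > limit
```

A caller might log a warning when is_truncated returns True, since the trailing characters do not influence the result.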

Table 9 Document categorization

Parameter

Description

project_id

Project ID.

region_id

Region ID.

content

Document you provide. This API can process a maximum of 10,000 characters at once. If your document exceeds 10,000 characters, only the first 10,000 characters are detected.

lang

Supported language type. Currently, only Chinese is supported. The default value is zh.

Table 10 Entity-based sentiment analysis

Parameter

Description

project_id

Project ID.

region_id

Region ID.

content

Request text. The text must be encoded in UTF-8. Only Chinese is supported currently. The content and entity in total cannot exceed 512 characters. Otherwise, only the first 512 characters are detected.

entity

Request entity. The text must be encoded in UTF-8. Only Chinese is supported currently. The content and entity in total cannot exceed 512 characters. Otherwise, only the first 512 characters are detected.

type

Value:

3: Finance
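
The 512-character limit in Table 10 applies to the request text and the entity together, so a longer entity leaves less room for the text. A minimal sketch of that arithmetic follows; content_budget and fits_combined_limit are hypothetical helper names, and characters are assumed to be Unicode code points.

```python
COMBINED_LIMIT = 512  # documented total for request text plus entity


def content_budget(entity):
    """Return how many characters remain for the request text."""
    return max(COMBINED_LIMIT - len(entity), 0)


def fits_combined_limit(text, entity):
    """Return True if text and entity together stay within the limit."""
    return len(text) + len(entity) <= COMBINED_LIMIT
```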

Table 11 Aspect-based sentiment analysis (advanced edition)

Parameter

Description

project_id

Project ID.

region_id

Region ID.

content

Text to be analyzed. The text must be encoded in UTF-8. Only Chinese is supported currently. The text can contain a maximum of 4,096 characters. A length of 300 characters is recommended.

type

Values:

  • 1: Mobile phone
  • 2: Automobile

Table 12 Aspect-based sentiment analysis

Parameter

Description

project_id

Project ID.

region_id

Region ID.

content

Text to be analyzed. The text must be encoded in UTF-8. Only Chinese is supported currently. The text can contain a maximum of 4,096 characters. A length of 300 characters is recommended.

type

Value:

1: Mobile phone

Table 13 Sentiment analysis (domain-specific edition)

Parameter

Description

project_id

Project ID.

region_id

Region ID.

content

Text to be analyzed. The text must be encoded in UTF-8. Only Chinese text sentiment analysis is supported. If type is set to 1 (e-commerce), the value contains a maximum of 200 characters. If the value exceeds 200 characters, only the first 200 characters are detected. If type is set to 2 (automotive), the value contains a maximum of 400 characters. If the value exceeds 400 characters, only the first 400 characters are detected.

type

Values:

  • 0: domain adaptation. The system automatically identifies the domain based on the input content.
  • 1: e-commerce, which can be applied as comments in the e-commerce industry.
  • 2: automotive, which can be applied as comments in the automotive industry.
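
In Table 13 the length limit of the text depends on the value of type. The sketch below shows one way to model that dependency. It is an illustration only: analyzed_sentiment_text is a hypothetical helper name, and because the table states no limit for type 0, the sketch leaves such text unchanged.

```python
# Documented character limits per domain type. Type 0 (domain adaptation)
# has no limit stated in the table, so it is deliberately absent here.
SENTIMENT_LIMITS = {1: 200, 2: 400}


def analyzed_sentiment_text(text, domain_type):
    """Return the part of text that is analyzed for the given domain type."""
    if domain_type not in (0, 1, 2):
        raise ValueError("type must be 0, 1, or 2")
    limit = SENTIMENT_LIMITS.get(domain_type)
    if limit is None:
        # Assumption: no truncation is applied when no limit is documented.
        return text
    return text[:limit]
```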

Table 14 Text classification

Parameter

Description

project_id

Project ID.

region_id

Region ID.

content

Text to be analyzed. The text must be encoded in UTF-8. The value contains a maximum of 400 characters. If the value exceeds 400 characters, only the first 400 characters are detected.

domain

Value:

1: advertisement detection

Table 15 Text summarization (domain-specific edition)

Parameter

Description

project_id

Project ID.

region_id

Region ID.

length_limit

Length limit of the generated summary. If length_limit is greater than 1, the returned summary is the one whose length is greater than or equal to, and closest to, the value. If length_limit ranges from 0 to 1, the value is treated as a length percentage, and the returned summary is the one whose length percentage is greater than or equal to, and closest to, the value.

title

Text title, which must be encoded in UTF-8 and cannot exceed 1,000 characters.

lang

Supported language type. Currently, only Chinese is supported. The default value is zh.

content

Text body, which must be encoded in UTF-8 and cannot exceed 10,000 characters.

type

Domain type. Value:

0 (default): General domain. Currently, only the general domain is supported.
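
The two interpretations of length_limit in Table 15 can be illustrated by computing the target length that a given value implies. This is a sketch under stated assumptions: target_summary_length is a hypothetical helper, the percentage is assumed to be relative to the length of the text body, and rounding up is an assumption rather than documented behavior.

```python
import math


def target_summary_length(length_limit, source_length):
    """Return the summary length, in characters, that length_limit refers to.

    Values greater than 1 are treated as an absolute length. Values from 0 to 1
    are treated as a fraction of source_length (assumption: the percentage is
    relative to the text body).
    """
    if length_limit < 0:
        raise ValueError("length_limit must not be negative")
    if length_limit > 1:
        return int(length_limit)
    # Assumption: a fractional result is rounded up to a whole character.
    return math.ceil(length_limit * source_length)
```

For example, a value of 150 refers to 150 characters, whereas 0.25 applied to a 2,000-character body refers to 500 characters.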

Table 16 Constituency syntax analysis

Parameter

Description

project_id

Project ID.

region_id

Region ID.

lang

Supported language type. Currently, only Chinese is supported. The default value is zh.

text

Text to be analyzed. The value contains 1 to 32 characters.

Table 17 Poem generation

Parameter

Description

project_id

Project ID.

region_id

Region ID.

title

Poem title. Only Chinese titles are supported currently. The title must be encoded in UTF-8 and contain 1 to 10 characters.

type

Poem type. The values are as follows:

  • 0: five-character quatrain
  • 1: seven-character quatrain
  • 2: five-character octave
  • 3: seven-character octave

acrostic

Acrostic. The values are as follows:

  • true: acrostic
  • false (default): non-acrostic
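
The value ranges in Table 17 can be checked with a short helper before the action is configured. The following is a minimal sketch, not part of the connector: build_poem_params is a hypothetical helper name, the argument poem_type stands for the type parameter, and characters are assumed to be Unicode code points.

```python
# Poem types listed in Table 17.
POEM_TYPES = {
    0: "five-character quatrain",
    1: "seven-character quatrain",
    2: "five-character octave",
    3: "seven-character octave",
}


def build_poem_params(project_id, region_id, title, poem_type, acrostic=False):
    """Return the Table 17 parameters as a dict after checking documented ranges."""
    if not 1 <= len(title) <= 10:
        raise ValueError("title must contain 1 to 10 characters")
    if poem_type not in POEM_TYPES:
        raise ValueError(f"type must be one of {sorted(POEM_TYPES)}")
    if not isinstance(acrostic, bool):
        raise ValueError("acrostic must be true or false")
    return {
        "project_id": project_id,
        "region_id": region_id,
        "title": title,
        "type": poem_type,
        "acrostic": acrostic,
    }
```

The table states no default for type, so the sketch requires it explicitly, while acrostic defaults to false as documented.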