NLP

The NLP connector is used to connect to the Huawei Cloud NLP service.

Natural Language Processing (NLP) is a cloud service that provides enterprises and developers with efficient text mining and analysis.

Creating an NLP Connection

Log in to the new ROMA Connect console.
In the navigation pane on the left, choose Connector. On the page displayed, click New Connection.
Select the NLP connector.

In the dialog box displayed, configure the connector and click OK.

Parameter	Description
Name	Enter the connector instance name.
App Key	Access key ID (AK) of the current account. Obtain the AK by referring to Access Keys. If an AK/SK pair has already been generated, use the AK in the downloaded AK/SK file (for example, credentials.csv).
App Secret	Secret access key (SK) of the current account. Obtain the SK by referring to Access Keys. If an AK/SK pair has already been generated, use the SK in the downloaded AK/SK file (for example, credentials.csv).
Description	Enter the description of the connector to identify it.
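
The App Key and App Secret values can be copied by hand from the downloaded file. As a minimal sketch, the following Python helper reads them from the CSV text instead. The column names `Access Key Id` and `Secret Access Key` and the sample values are assumptions for illustration; adjust them to match the header of your own file.

```python
import csv
import io


def read_ak_sk(csv_text, ak_column="Access Key Id", sk_column="Secret Access Key"):
    """Return (AK, SK) from the first data row of a credentials CSV.

    The default column names are assumptions, not a documented format.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    row = next(reader, None)
    if row is None:
        raise ValueError("credentials file has no data row")
    ak = row.get(ak_column)
    sk = row.get(sk_column)
    if not ak or not sk:
        raise ValueError("AK or SK column is missing or empty")
    return ak.strip(), sk.strip()


# Hypothetical file content, used only to show the call.
sample = "User Name,Access Key Id,Secret Access Key\nalice,EXAMPLEAK,EXAMPLESK\n"
app_key, app_secret = read_ak_sk(sample)
```

Enter `app_key` as App Key and `app_secret` as App Secret in the dialog box.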

Action

Text Similarity (Advanced Edition)
NER (Domain-specific Edition)
Multi-granularity Word Segmentation
Document Translation Job Status Query
Document Translation
Language Recognition
Text Translation
Intent Understanding
Document Categorization
Entity-Based Sentiment Analysis
Aspect-Based Sentiment Analysis (Advanced Edition)
Aspect-Based Sentiment Analysis
Sentiment Analysis (Domain-specific Edition)
Text Classification
Text Summarization (Domain-specific Edition)
Constituency Syntax Analysis
Poem Generation

Configuration Parameters

**Table 1** Text Similarity (Advanced Edition)
Parameter	Description
project_id	Project ID.
region_id	Region ID.
text1	Text 1, on which the text similarity is to be computed. The text is encoded using UTF-8 and contains 1 to 512 characters.
text2	Text 2, on which the text similarity is to be computed. The text is encoded using UTF-8 and contains 1 to 512 characters.
lang	Supported language type. Currently, only Chinese is supported. The default value is zh.
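
The constraints in Table 1 can be checked before the action runs. The following Python sketch is illustrative only: the function name is hypothetical, and it assumes that "characters" means Unicode code points, which is what `len()` counts in Python.

```python
def build_text_similarity_params(project_id, region_id, text1, text2, lang="zh"):
    """Validate the Table 1 constraints and return the parameters as a dict."""
    for name, value in (("text1", text1), ("text2", text2)):
        # Each text must contain 1 to 512 characters.
        if not 1 <= len(value) <= 512:
            raise ValueError(f"{name} must contain 1 to 512 characters, got {len(value)}")
    # Only Chinese is currently supported.
    if lang != "zh":
        raise ValueError("only Chinese (zh) is supported")
    return {
        "project_id": project_id,
        "region_id": region_id,
        "text1": text1,
        "text2": text2,
        "lang": lang,
    }


# Placeholder IDs; replace them with your own project and region.
params = build_text_similarity_params("my-project-id", "my-region-id", "今天天气很好", "今天天气不错")
```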

**Table 2** NER (domain-specific edition)
Parameter	Description
project_id	Project ID.
region_id	Region ID.
text	Text to be analyzed. The text is encoded using UTF-8 and contains 1 to 512 characters.
lang	Supported language type. Currently, only Chinese is supported. The default value is zh.
domain	Supported domain type. The value can be general (default), business, or entertainment.

**Table 3** Multi-granularity word segmentation
Parameter	Description
project_id	Project ID.
region_id	Region ID.
text	Text to be segmented. The text must be encoded in UTF-8 and can contain 1 to 64 characters.
lang	Supported language type. Currently, only Chinese is supported. The default value is zh.
granularity	Segmentation granularity. 1 indicates the finest granularity, and 2 indicates the coarsest granularity. For any other value, the segmentation tree covering all granularities is returned by default.

**Table 4** Document translation job status query
Parameter	Description
project_id	Project ID.
job_id	Document translation job ID, which can be obtained by calling the document translation job creation API.
region_id	Region ID.

**Table 5** Document translation
Parameter	Description
project_id	Project ID.
region_id	Region ID.
url	Path of the document stored in OBS. For private files, you are advised to use a temporary authorization URL to call the service. For details about how to obtain the OBS file URL and temporary authorization URL, see Configuring the Access Permission of OBS. The OBS region must be the same as the region of the requested service. Otherwise, the OBS file cannot be accessed, even if it allows public access.
from	Source language. Currently, Chinese and English are supported.
to	Target language. Currently, Chinese and English are supported.
type	Document format. Currently, docx, pptx, and txt files can be translated.
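
Because the document format must match the file referenced by the URL, it can be convenient to derive it from the file name. The following Python sketch does this under two assumptions: the format values are the lowercase strings `docx`, `pptx`, and `txt`, and the file extension reflects the real format. The URL is a made-up example.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

SUPPORTED_TYPES = {"docx", "pptx", "txt"}


def infer_document_type(url):
    """Return the document format from the URL path, ignoring any query string."""
    # urlparse separates the path from the query, so signature parameters
    # of a temporary authorization URL do not affect the result.
    suffix = PurePosixPath(urlparse(url).path).suffix.lower().lstrip(".")
    if suffix not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported document format: {suffix!r}")
    return suffix


# Hypothetical temporary authorization URL.
doc_type = infer_document_type("https://bucket.example.com/docs/report.DOCX?Expires=1&Signature=abc")
```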

**Table 6** Language recognition
Parameter	Description
project_id	Project ID.
region_id	Region ID.
text	The text whose language needs to be recognized, which must be encoded in UTF-8 and can contain a maximum of 2,000 characters.

**Table 7** Text translation
Parameter	Description
project_id	Project ID.
region_id	Region ID.
text	The text to be translated, which must be encoded in UTF-8 and can contain a maximum of 2,000 characters.
from	Source language. Supported languages: Arabic (ar), German (de), Russian (ru), French (fr), Korean (ko), Portuguese (pt), Japanese (ja), Thai (th), Turkish (tr), Spanish (es), English (en), Vietnamese (vi), Simplified Chinese (zh), and Traditional Chinese (zh-tw). The system automatically detects the input language and translates it to the target language. You need to specify the target language.
to	Target language. Supported languages: Arabic (ar), German (de), Russian (ru), French (fr), Korean (ko), Portuguese (pt), Japanese (ja), Thai (th), Turkish (tr), Spanish (es), English (en), Vietnamese (vi), Simplified Chinese (zh), and Traditional Chinese (zh-tw).
scene	The default value is common. Currently, only common scenarios are supported.
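
The language codes and the length limit in Table 7 can be validated up front. The Python sketch below is one way to do so. The function name is hypothetical, and it accepts only the codes listed in the table. Because `from` is a reserved word in Python, the parameters are returned as a dict rather than passed as keyword arguments.

```python
SUPPORTED_LANGUAGES = frozenset(
    {"ar", "de", "ru", "fr", "ko", "pt", "ja", "th", "tr", "es", "en", "vi", "zh", "zh-tw"}
)
MAX_TEXT_CHARS = 2000


def build_text_translation_params(project_id, region_id, text, source, target, scene="common"):
    """Validate the Table 7 constraints and return the parameters as a dict."""
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")
    for label, code in (("from", source), ("to", target)):
        if code not in SUPPORTED_LANGUAGES:
            raise ValueError(f"unsupported {label} language: {code!r}")
    # Only the common scenario is currently supported.
    if scene != "common":
        raise ValueError("only the common scene is supported")
    return {
        "project_id": project_id,
        "region_id": region_id,
        "text": text,
        "from": source,
        "to": target,
        "scene": scene,
    }


# Placeholder IDs; replace them with your own project and region.
params = build_text_translation_params("my-project-id", "my-region-id", "Hello", "en", "zh")
```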

**Table 8** Intent understanding
Parameter	Description
project_id	Project ID.
region_id	Region ID.
lang	Supported language type. Currently, only Chinese is supported. The default value is zh.
text	Text list to be analyzed. The text is encoded in UTF-8 and the value contains a maximum of 32 characters. If the value exceeds 32 characters, only the first 32 characters are detected.

**Table 9** Document categorization
Parameter	Description
project_id	Project ID.
region_id	Region ID.
content	Document you provide. This API can process a maximum of 10,000 characters at once. If your document exceeds 10,000 characters, only the first 10,000 characters are detected.
lang	Supported language type. Currently, only Chinese is supported. The default value is zh.

**Table 10** Entity-based sentiment analysis
Parameter	Description
project_id	Project ID.
region_id	Region ID.
content	Request text. The text must be encoded in UTF-8. Only Chinese is supported currently. The content and entity in total cannot exceed 512 characters. Otherwise, only the first 512 characters are detected.
entity	Request entity. The text must be encoded in UTF-8. Only Chinese is supported currently. The content and entity in total cannot exceed 512 characters. Otherwise, only the first 512 characters are detected.
type	Value: 3 (finance).

**Table 11** Aspect-based sentiment analysis (advanced edition)
Parameter	Description
project_id	Project ID.
region_id	Region ID.
content	Text to be analyzed. The text must be encoded in UTF-8. Only Chinese is supported currently. The text can contain a maximum of 4,096 characters. A length of 300 characters is recommended.
type	Values: 1 (mobile phone) or 2 (automobile).

**Table 12** Aspect-based sentiment analysis
Parameter	Description
project_id	Project ID.
region_id	Region ID.
content	Text to be analyzed. The text must be encoded in UTF-8. Only Chinese is supported currently. The text can contain a maximum of 4,096 characters. A length of 300 characters is recommended.
type	Value: 1 (mobile phone).

**Table 13** Sentiment analysis (domain-specific edition)
Parameter	Description
project_id	Project ID.
region_id	Region ID.
content	Text to be analyzed. The text must be encoded in UTF-8. Only Chinese text sentiment analysis is supported. If type is set to 1 (e-commerce), the text can contain a maximum of 200 characters. If it exceeds 200 characters, only the first 200 characters are detected. If type is set to 2 (automotive), the text can contain a maximum of 400 characters. If it exceeds 400 characters, only the first 400 characters are detected.
type	Values: 0: domain adaptation, in which the system automatically identifies the domain based on the input content; 1: e-commerce, which applies to comments in the e-commerce industry; 2: automotive, which applies to comments in the automotive industry.
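
Because text beyond the limit is ignored rather than rejected, it can help to see which part of a long input is actually analyzed. The following Python sketch mirrors the limits in Table 13. The function name is hypothetical, and returning the text unchanged for type 0 is an assumption, because the table states no limit for that value.

```python
# Character limits per type, as listed in Table 13.
SENTIMENT_LIMITS = {1: 200, 2: 400}


def analyzed_portion(text, domain_type):
    """Return the leading part of the text that falls within the limit for the type."""
    if domain_type not in (0, 1, 2):
        raise ValueError("type must be 0, 1, or 2")
    limit = SENTIMENT_LIMITS.get(domain_type)
    # Assumption: no truncation is applied here for type 0 (domain adaptation).
    return text if limit is None else text[:limit]


review = "好" * 500
ecommerce_part = analyzed_portion(review, 1)
automotive_part = analyzed_portion(review, 2)
```

Splitting a long review into segments within the limit before calling the action avoids silently losing the tail of the text.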

**Table 14** Text classification
Parameter	Description
project_id	Project ID.
region_id	Region ID.
content	Text to be analyzed. The text must be encoded in UTF-8. The value contains a maximum of 400 characters. If the value exceeds 400 characters, only the first 400 characters are detected.
domain	Value: 1 (advertisement detection).

**Table 15** Text summarization (domain-specific edition)
Parameter	Description
project_id	Project ID.
region_id	Region ID.
length_limit	Length limit of the generated summary. If length_limit is greater than 1, it is treated as an absolute length: the length of the returned summary is greater than or equal to, and as close as possible to, this value. If length_limit ranges from 0 to 1, it is treated as a proportion: the length percentage of the returned summary is greater than or equal to, and as close as possible to, this value.
title	Text title, which must be encoded in UTF-8 and cannot exceed 1,000 characters.
lang	Supported language type. Currently, only Chinese is supported. The default value is zh.
content	Text body. The text must be encoded in UTF-8 and cannot exceed 10,000 characters.
type	Domain type. Value: 0 (default): General domain. Currently, only the general domain is supported.
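
The two interpretations of length_limit can be illustrated with a short calculation. The Python sketch below follows the wording of Table 15 and returns the threshold that the summary length is compared against. The function name is hypothetical, and it assumes that the percentage is measured against the length of the text body.

```python
import math


def summary_length_threshold(length_limit, text_length):
    """Return the length threshold implied by length_limit for a text of text_length characters."""
    if length_limit > 1:
        # Values greater than 1 are read as an absolute length.
        return math.ceil(length_limit)
    if 0 <= length_limit <= 1:
        # Values from 0 to 1 are read as a fraction of the text length (assumption).
        return math.ceil(length_limit * text_length)
    raise ValueError("length_limit must not be negative")


absolute = summary_length_threshold(200, 1000)
proportional = summary_length_threshold(0.25, 1000)
```

For a 1,000-character text, a length_limit of 200 and a length_limit of 0.2 therefore describe the same threshold.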

**Table 16** Constituency syntax analysis
Parameter	Description
project_id	Project ID.
region_id	Region ID.
lang	Supported language type. Currently, only Chinese is supported. The default value is zh.
text	Text to be analyzed. The value contains 1 to 32 characters.

**Table 17** Poem generation
Parameter	Description
project_id	Project ID.
region_id	Region ID.
title	Poem title. Only the Chinese title is supported currently and it is encoded using UTF-8. The text length ranges from 1 to 10 characters.
type	Poem type. Values: 0: five-character quatrain; 1: seven-character quatrain; 2: five-character octave; 3: seven-character octave.
acrostic	Whether to generate an acrostic poem. Values: true: acrostic; false (default): non-acrostic.
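
The numeric poem types are easier to read when given names. The following Python sketch wraps the Table 17 parameters in a small helper. The enum and function names are hypothetical, and the length check assumes that "characters" means Unicode code points.

```python
from enum import IntEnum


class PoemType(IntEnum):
    """Poem types and their numeric values from Table 17."""
    FIVE_CHARACTER_QUATRAIN = 0
    SEVEN_CHARACTER_QUATRAIN = 1
    FIVE_CHARACTER_OCTAVE = 2
    SEVEN_CHARACTER_OCTAVE = 3


def build_poem_params(project_id, region_id, title, poem_type, acrostic=False):
    """Validate the Table 17 constraints and return the parameters as a dict."""
    # The title must contain 1 to 10 characters.
    if not 1 <= len(title) <= 10:
        raise ValueError("title must contain 1 to 10 characters")
    return {
        "project_id": project_id,
        "region_id": region_id,
        "title": title,
        "type": int(PoemType(poem_type)),  # rejects values outside 0 to 3
        "acrostic": bool(acrostic),
    }


# Placeholder IDs; replace them with your own project and region.
params = build_poem_params("my-project-id", "my-region-id", "春晓", PoemType.SEVEN_CHARACTER_QUATRAIN)
```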