Natural Language Processing

Updated at: Feb 22, 2022 GMT+08:00

Word Segmentation

Introduction

This API segments the input text into words and, optionally, tags each word with its part of speech.

For details about endpoints, see Endpoints.

URI

  • URI format
    POST /v1/{project_id}/nlp-fundamental/segment
  • Parameter description
    Table 1 URI parameters

    | Parameter | Mandatory | Description |
    |---|---|---|
    | project_id | Yes | Project ID. For details about how to obtain the project ID, see Obtaining a Project ID. |
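
The URI format above can be turned into a full request URL as in the following minimal sketch. The helper name `build_segment_url` is hypothetical, the host is the one used in the example request later on this page, and `my-project-id` is a placeholder rather than a real project ID.

```python
from urllib.parse import quote


def build_segment_url(endpoint: str, project_id: str) -> str:
    """Return the full URL of the word segmentation API (illustrative helper)."""
    if not project_id:
        raise ValueError("project_id is mandatory")
    # Percent-encode the project ID so it is safe as a single path segment.
    encoded_id = quote(project_id, safe="")
    return f"https://{endpoint}/v1/{encoded_id}/nlp-fundamental/segment"


# Placeholder project ID; replace it with the value obtained for your account.
print(build_segment_url("nlp-ext.ap-southeast-3.myhuaweicloud.com", "my-project-id"))
```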

Request

Table 2 describes the request parameters.

Table 2 Request parameters

| Parameter | Type | Mandatory | Description |
|---|---|---|---|
| text | String | Yes | Text to be segmented. The text must be UTF-8 encoded and contain 1 to 2,000 characters. |
| pos_switch | Integer | No | Whether to enable part-of-speech (POS) tagging. The options are 1 (yes) and 0 (no). The default value is 0. |
| lang | String | No | Language of the text. English (en) is currently supported. |
| criterion | String | No | Word segmentation criterion. For English text, the default criterion is Penn TreeBank, so you do not need to configure this parameter. |
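
As a rough illustration of the constraints in Table 2, the sketch below builds a request body and checks it on the client side before it is sent. The function name `build_segment_body` is hypothetical. Counting characters with Python's `len` is an assumption, since the page does not say how the service counts characters.

```python
import json


def build_segment_body(text, pos_switch=0, lang=None, criterion=None):
    """Return the UTF-8 encoded JSON body for the segmentation API (sketch)."""
    # Assumption: the 1 to 2,000 character limit is counted in Unicode code points.
    if not 1 <= len(text) <= 2000:
        raise ValueError("text must contain 1 to 2,000 characters")
    if pos_switch not in (0, 1):
        raise ValueError("pos_switch must be 0 or 1")
    body = {"text": text, "pos_switch": pos_switch}
    # Optional parameters are only sent when the caller sets them.
    if lang is not None:
        body["lang"] = lang
    if criterion is not None:
        body["criterion"] = criterion
    return json.dumps(body, ensure_ascii=False).encode("utf-8")


print(build_segment_body("today is a good day.", pos_switch=1, lang="en"))
```

Leaving `criterion` unset matches the note above that English text uses Penn TreeBank by default.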

Response

Table 3 describes the response parameters.

Table 3 Response parameters

| Parameter | Type | Description |
|---|---|---|
| words | Array of words | Word segmentation result. For details, see Table 4. |
| error_code | String | Error code returned when the API call fails. For details, see Error Code. This parameter is not included when the API call succeeds. |
| error_msg | String | Error message returned when the API call fails. This parameter is not included when the API call succeeds. |

Table 4 Word field data structure

| Parameter | Type | Description |
|---|---|---|
| content | String | Word text. |
| pos | String | Part of speech of the word. For details, see Table 5, Table 6, and Table 7. |
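
The response structure in Table 3 and Table 4 could be handled as in the following sketch. `SegmentError` and `parse_segment_response` are hypothetical names. The code also assumes that `pos` may be missing from a word when POS tagging is disabled, which this page does not state explicitly.

```python
import json


class SegmentError(Exception):
    """Raised when the response body carries error_code and error_msg."""


def parse_segment_response(raw: str):
    """Return a list of (content, pos) tuples from a JSON response body."""
    data = json.loads(raw)
    # error_code is only present when the call failed (see Table 3).
    if "error_code" in data:
        raise SegmentError(f'{data["error_code"]}: {data.get("error_msg", "")}')
    # Assumption: pos can be absent when pos_switch is 0, so default to None.
    return [(word["content"], word.get("pos")) for word in data.get("words", [])]


sample = '{"words": [{"content": "today", "pos": "NN"}, {"content": "is", "pos": "VBZ"}]}'
print(parse_segment_response(sample))  # [('today', 'NN'), ('is', 'VBZ')]
```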

Table 5 Part of speech (POS) description (PKU)

| Class-1 POS | Class-2 POS | Class-3 POS |
|---|---|---|
| n: Noun | nr: Name of a person | nr1: Chinese surname; nr2: Chinese given name; nrj: Japanese name; nrf: Transliterated name |
| | ns: Place name | nsf: Transliterated place name |
| | nt: Organization or group name | - |
| | nz: Other exclusive name | - |
| | nl: Nominal locution | - |
| | ng: Nominal morpheme | - |
| t: Time word | tg: Time morpheme | - |
| s: Locative word | - | - |
| f: Positional word | - | - |
| v: Verb | vd: Adverbial form of a verb | - |
| | vn: Gerund | - |
| | vshi: Copula verb | - |
| | vyou: Verb indicating "has/have" | - |
| | vf: Directional verb | - |
| | vx: Formal verb | - |
| | vi: Intransitive verb | - |
| | vl: Verbal locution | - |
| | vg: Verbal morpheme | - |
| a: Adjective | ad: Adverbial adjective | - |
| | an: Nominal adjective | - |
| | ag: Adjective morpheme | - |
| | al: Adjective locution | - |
| b: Distinguishing word | bl: Distinguishing locution | - |
| z: Status word | - | - |
| r: Pronoun | rr: Personal pronoun | - |
| | rz: Demonstrative pronoun | rzt: Demonstrative pronoun for time; rzs: Demonstrative pronoun for location; rzv: Demonstrative pronoun for predicate |
| | ry: Interrogative pronoun | ryt: Interrogative pronoun for time; rys: Interrogative pronoun for location; ryv: Interrogative pronoun for predicate |
| | rg: Pronominal morpheme | - |
| m: Numeral | mq: Number word | - |
| | mg: A, B, C, D, E, F, G, H, N, and G | - |
| q: Classifier | qv: Verbal classifier | - |
| | qt: Time classifier | - |
| d: Adverb | - | - |
| p: Preposition | pba: Preposition ba | - |
| | pbei: Preposition bei | - |
| c: Conjunction | cc: Coordinating conjunction | - |
| u: Particle | uzhe: Particle | - |
| | ule: Particle | - |
| | uguo: Particle | - |
| | ude1: Particle | - |
| | ude2: Particle | - |
| | ude3: Particle | - |
| | usuo: Particle | - |
| | udeng: Particle | - |
| | uyy: Particle | - |
| | udh: Particle | - |
| | uls: Particle | - |
| | uzhi: Particle | - |
| | ulian: Particle | - |
| e: Exclamation | - | - |
| y: Discourse word | - | - |
| o: Onomatopoeia | - | - |
| h: Prefix | - | - |
| k: Suffix | - | - |
| x: Character string | xe: Email character string | - |
| | xs: Weibo session separator | - |
| | xm: Emoticon | - |
| | xu: Website URL | - |
| w: Punctuation | wkz: Chinese left brackets | - |
| | wky: Chinese right brackets | - |
| | wyz: Chinese left quotation marks | - |
| | wyy: Chinese right quotation marks | - |
| | wj: Chinese full stop | - |
| | ww: Question marks | - |
| | wt: Exclamation marks | - |
| | wd: Commas | - |
| | wf: Semicolons | - |
| | wn: Enumeration comma | - |
| | wm: Colons | - |
| | ws: Ellipsis | - |
| | wp: Dashes | - |
| | wb: Percentile and permil | - |
| | wh: Unit | - |

Table 6 POS description (CTB)

| POS | Description | Example |
|---|---|---|
| AD | Adverb | word-1, word-2, word-3 |
| AS | Dynamic particle | word-4, word-5, word-6 |
| BA | "ba" structure | word-7 |
| CC | Coordinating conjunction | word-8, word-9 |
| CD | Quantifier | One, two, three |
| CS | Subordinating conjunction | Although, if, when |
| DEC | Complement or nominalization | word-10, word-11 |
| DEG | Conjunctive or possessive | word-12, word-13 |
| DER | Complement de | de |
| DEV | Adverb di | di |
| DT | Determiner | word-14, word-15, word-16 |
| ETC | word-17 | word-17, word-18 |
| FW | Loanword | A E B |
| IJ | Exclamation | word-18, word-19 |
| JJ | Modifier for noun | Big, new, small |
| LB | Long bei structure | word-20, word-21, word-22 |
| LC | Positional word | middle, upper |
| M | Classifier | Unit, year, dollar |
| MSP | Particle | Particle-1, particle-2, particle-3 |
| NN | Noun | Economy, enterprise, person |
| NR | Proper noun | China, Zhejiang |
| NT | Time noun | Present, last year |
| OD | Numeral | First, second, top |
| ON | Onomatopoeia | O |
| P | Preposition | Preposition-1, preposition-2, preposition-3 |
| PN | Pronoun | He, I, myself |
| PU | Punctuation | Chinese comma, Chinese full stop |
| SB | Short bei structure | word-23, word-24 |
| SP | Particle at the end of a sentence | Particle-1, particle-2, particle-3 |
| VA | Predicative adjective | Big, many, good |
| VC | Linking verb | Verb-1, verb-2, verb-3 |
| VE | Verb indicating "has/have" | Verb-4, verb-5, verb-6 |
| VV | Verb | Verb-7, verb-8, verb-9 |

Table 7 POS description (Penn TreeBank)

| POS | Description | Example |
|---|---|---|
| CC | Coordinating conjunction | and, but, or |
| CD | Cardinal number | one, two |
| DT | Determiner | a, the |
| EX | Existential "there" | there |
| FW | Foreign word | mea, culpa |
| IN | Preposition or subordinating conjunction | of, in, by |
| JJ | Adjective | yellow |
| JJR | Adjective, comparative | bigger |
| JJS | Adjective, superlative | wildest |
| LS | List item marker | 1, 2, One |
| MD | Modal verb | can, could, might |
| NN | Noun, singular or mass | llama |
| NNS | Noun, plural | llamas |
| NNP | Proper noun, singular | IBM |
| NNPS | Proper noun, plural | Carolinas |
| PDT | Predeterminer | all, both |
| POS | Possessive ending | 's |
| PRP | Personal pronoun | I, me, you |
| PRP$ | Possessive pronoun | my, your, yours |
| RB | Adverb | quickly |
| RBR | Adverb, comparative | faster |
| RBS | Adverb, superlative | fastest |
| RP | Particle | up, off |
| SYM | Symbol (mathematical or scientific) | +, %, & |
| TO | to | to |
| UH | Exclamation | ah, oops |
| VB | Verb, base form | eat |
| VBD | Verb, past tense | ate |
| VBG | Verb, gerund or present participle | eating |
| VBN | Verb, past participle | eaten |
| VBP | Verb, non-third-person singular present | eat |
| VBZ | Verb, third-person singular present | eats |
| WDT | wh-determiner | which, that |
| WP | wh-pronoun | what, who |
| WP$ | wh-possessive pronoun | whose |
| WRB | wh-adverb | how, where |
| PU | Punctuation | , . : |
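
Because related Penn TreeBank tags share a prefix (for example NN, NNS, NNP, and NNPS for nouns), a client can select a word class by tag prefix. The following is a small sketch with a hypothetical helper, `words_with_tag_prefix`, applied to a list shaped like the `words` array of the response.

```python
def words_with_tag_prefix(words, prefix):
    """Return the content of every word whose pos tag starts with prefix."""
    return [w["content"] for w in words if w.get("pos", "").startswith(prefix)]


sample_words = [
    {"content": "today", "pos": "NN"},
    {"content": "is", "pos": "VBZ"},
    {"content": "a", "pos": "DT"},
    {"content": "good", "pos": "JJ"},
    {"content": "day", "pos": "NN"},
    {"content": ".", "pos": "PU"},
]

print(words_with_tag_prefix(sample_words, "NN"))  # ['today', 'day']
print(words_with_tag_prefix(sample_words, "VB"))  # ['is']
```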

Example

  • Example request
    POST https://nlp-ext.ap-southeast-3.myhuaweicloud.com/v1/{project_id}/nlp-fundamental/segment
    
    Request Header:
        Content-Type: application/json
        X-Auth-Token: MIINRwYJKoZIhvcNAQcCoIINODCCDTQCAQExDTALBglghkgBZQMEAgEwgguVBgkqhkiG...
    
    Request Body:
        {
            "text":"today is a good day.",
            "pos_switch":1,
            "lang":"en",
            "criterion":"PKU"
        }
  • Example response
    • Successful response example
      {
          "words": [
              {
                  "content": "today",
                  "pos": "NN"
              },
              {
                  "content": "is",
                  "pos": "VBZ"
              },
              {
                  "content": "a",
                  "pos": "DT"
              },
              {
                  "content": "good",
                  "pos": "JJ"
              },
              {
                  "content": "day",
                  "pos": "NN"
              },
              {
                  "content": ".",
                  "pos": "PU"
              }
          ]
      }
    • Failed response example
      {
          "error_code": "NLP.0301",
          "error_msg": "The length of text should be in the range of 1-512"
      }
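
The example request above could be issued from Python roughly as follows, using only the standard library. This is a sketch under stated assumptions: `make_segment_request` and `segment` are hypothetical helper names, the token must be obtained separately, and error handling is left to the caller (`urllib` raises `HTTPError` for non-2xx status codes).

```python
import json
import urllib.request


def make_segment_request(url, token, text, pos_switch=1, lang="en"):
    """Build the POST request with the headers shown in the example request."""
    body = json.dumps({"text": text, "pos_switch": pos_switch, "lang": lang})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
    )


def segment(url, token, text):
    """Send the request and return the decoded JSON response (needs network access)."""
    request = make_segment_request(url, token, text)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, with your own project ID and token substituted:
# url = "https://nlp-ext.ap-southeast-3.myhuaweicloud.com/v1/{project_id}/nlp-fundamental/segment"
# print(segment(url, "<your token>", "today is a good day."))
```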

Status Code

For details about status codes, see Status Code.

Error Code

For details about error codes, see Error Code.
