Word Segmentation
Introduction
This API segments the input text into words.
For details about endpoints, see Endpoints.
Calling NLP APIs incurs fees. NLP packages are classified into the basic and domain-specific editions. Before purchasing a package, check which APIs the basic package and the domain-specific packages support in the Natural Language Processing Price Calculator.
URI
- URI format
POST /v1/{project_id}/nlp-fundamental/segment
- Parameter description
Table 1 URI parameters

| Parameter | Mandatory | Description |
|---|---|---|
| project_id | Yes | Project ID. For details about how to obtain the project ID, see Obtaining a Project ID. |
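
As an illustration of the URI format above, the following minimal sketch assembles the full request URL. The function name `build_segment_url` and the sample endpoint `nlp.example.com` are hypothetical and not part of the API; substitute the real endpoint from Endpoints and your own project ID.

```python
from urllib.parse import quote


def build_segment_url(endpoint: str, project_id: str) -> str:
    """Return the word segmentation URL for an endpoint and project ID.

    `endpoint` is the host name only (no scheme), as in the
    `https://{endpoint}/...` pattern used in the example request.
    """
    if not endpoint or not project_id:
        raise ValueError("endpoint and project_id are both required")
    # Percent-encode the project ID so it is safe as a single path segment.
    return f"https://{endpoint}/v1/{quote(project_id, safe='')}/nlp-fundamental/segment"


# Hypothetical values, for illustration only.
print(build_segment_url("nlp.example.com", "my-project-id"))
```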
Request
Table 2 describes the request parameters.
| Parameter | Type | Mandatory | Description |
|---|---|---|---|
| text | String | Yes | Text to be segmented. The text must be UTF-8 encoded and contain 1 to 512 characters. |
| pos_switch | Integer | No | Whether to enable part-of-speech tagging (POS tagging). The options are 1 (yes) and 0 (no). The default value is 0. |
| lang | String | No | Language of the text. Currently, Chinese (zh) and English (en) are supported. The default value is zh. |
| criterion | String | No | Word segmentation criterion. Currently, the Peking University standard (PKU) and Chinese Penn Treebank (CTB) are supported. The default value is PKU. For English text, the word segmentation criterion is Penn Treebank and this parameter does not need to be configured. |
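
The constraints in the request parameter table can be checked on the client before a call is made. The sketch below is one possible way to do that; the helper name `build_segment_body` is hypothetical, and it assumes that the 512-character limit counts Unicode characters the way Python's `len` does, which the service may or may not match exactly.

```python
import json

ALLOWED_LANGS = {"zh", "en"}
ALLOWED_CRITERIA = {"PKU", "CTB"}


def build_segment_body(text, pos_switch=0, lang="zh", criterion=None):
    """Validate the documented parameters and return the request body as a dict."""
    # Assumption: the 1-512 limit is measured in Unicode characters.
    if not 1 <= len(text) <= 512:
        raise ValueError("text must contain 1 to 512 characters")
    # Reject booleans explicitly, because True == 1 in Python.
    if isinstance(pos_switch, bool) or pos_switch not in (0, 1):
        raise ValueError("pos_switch must be 0 or 1")
    if lang not in ALLOWED_LANGS:
        raise ValueError("lang must be 'zh' or 'en'")
    body = {"text": text, "pos_switch": pos_switch, "lang": lang}
    # criterion is optional; omit it to let the service apply its default.
    if criterion is not None:
        if criterion not in ALLOWED_CRITERIA:
            raise ValueError("criterion must be 'PKU' or 'CTB'")
        body["criterion"] = criterion
    return body


body = build_segment_body("Text to segment", pos_switch=1, lang="zh", criterion="PKU")
print(json.dumps(body, ensure_ascii=False))
```

Leaving `criterion` out of the body when it is not set keeps the request valid for English text, where the parameter does not need to be configured.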
Response
Table 3 describes the response parameters.
| Parameter | Type | Description |
|---|---|---|
| words | Array of words | Word segmentation result. For details, see Table 4. |
| error_code | String | Error code when the API fails to be called. For details, see Error Code. The parameter is not included when the API call succeeds. |
| error_msg | String | Error message returned when the API fails to be called. The parameter is not included when the API call succeeds. |

Table 4 Data structure of words

| Parameter | Type | Description |
|---|---|---|
| content | String | Word text. |
| pos | String | Part-of-speech tag of the word. For details, see Table 5, Table 6, and Table 7. |
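
The response fields described above can be unpacked as in the following sketch. The names `SegmentError` and `parse_segment_response` are hypothetical. The code assumes that `pos` may be absent from a word entry (for example, when POS tagging is disabled), which the tables above do not state explicitly, so it reads that field defensively.

```python
import json
from typing import List, Optional, Tuple


class SegmentError(Exception):
    """Raised when the response carries error_code and error_msg."""

    def __init__(self, code: str, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def parse_segment_response(raw: str) -> List[Tuple[str, Optional[str]]]:
    """Turn a JSON response string into a list of (content, pos) pairs."""
    data = json.loads(raw)
    # error_code is only present when the call failed.
    if "error_code" in data:
        raise SegmentError(data["error_code"], data.get("error_msg", ""))
    # Assumption: pos can be missing, so fall back to None.
    return [(word["content"], word.get("pos")) for word in data.get("words", [])]


sample = '{"words": [{"content": "word-1", "pos": "t"}, {"content": "word-2", "pos": "n"}]}'
print(parse_segment_response(sample))
```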

Table 5 POS tags for the PKU criterion

| Class-1 POS | Class-2 POS | Class-3 POS |
|---|---|---|
| n: Noun | nr: Name of a person | |
| | ns: Place name | nsf: Transliterated place name |
| | nt: Organization or group name | - |
| | nz: Other exclusive name | - |
| | nl: Nominal locution | - |
| | ng: Nominal morpheme | - |
| t: Time word | tg: Time morpheme | - |
| s: Locative word | - | - |
| f: Positional word | - | - |
| v: Verb | vd: Adverbial form of a verb | - |
| | vn: Gerund | - |
| | vshi: Copula verb | - |
| | vyou: Verb indicating "has/have" | - |
| | vf: Directional verb | - |
| | vx: Formal verb | - |
| | vi: Intransitive verb | - |
| | vl: Verbal locution | - |
| | vg: Verbal morpheme | - |
| a: Adjective | ad: Adverbial adjective | - |
| | an: Nominal adjective | - |
| | ag: Adjective morpheme | - |
| | al: Adjective locution | - |
| b: Distinguishing word | bl: Distinguishing locution | - |
| z: Status word | - | - |
| r: Pronoun | rr: Personal pronoun | - |
| | rz: Demonstrative pronoun | |
| | ry: Interrogative pronoun | |
| | rg: Pronominal morpheme | - |
| m: Numeral | mq: Number word | - |
| | mg: A, B, C, D, E, F, G, H, N, and G | - |
| q: Classifier | qv: Verbal classifier | - |
| | qt: Time classifier | - |
| d: Adverb | - | - |
| p: Preposition | pba: Preposition ba | - |
| | pbei: Preposition bei | - |
| c: Conjunction | cc: Coordinating conjunction | - |
| u: Particle | uzhe: Particle | - |
| | ule: Particle | - |
| | uguo: Particle | - |
| | ude1: Particle | - |
| | ude2: Particle | - |
| | ude3: Particle | - |
| | usuo: Particle | - |
| | udeng: Particle | - |
| | uyy: Particle | - |
| | udh: Particle | - |
| | uls: Particle | - |
| | uzhi: Particle | - |
| | ulian: Particle | - |
| e: Exclamation | - | - |
| y: Discourse word | - | - |
| o: Onomatopoeia | - | - |
| h: Prefix | - | - |
| k: Suffix | - | - |
| x: Character string | xe: Email character string | - |
| | xs: Weibo session separator | - |
| | xm: Emoticon | - |
| | xu: Website URL | - |
| w: Punctuation | wkz: Chinese left brackets | - |
| | wky: Chinese right brackets | - |
| | wyz: Chinese left quotation marks | - |
| | wyy: Chinese right quotation marks | - |
| | wj: Chinese full stop | - |
| | ww: Question marks | - |
| | wt: Exclamation marks | - |
| | wd: Commas | - |
| | wf: Semicolons | - |
| | wn: Enumeration comma | - |
| | wm: Colons | - |
| | ws: Ellipsis | - |
| | wp: Dashes | - |
| | wb: Percentile and permil | - |
| | wh: Unit | - |
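
In the three-level tag table above, every class-2 and class-3 tag begins with the letter of its class-1 tag (for example, nr and nsf both fall under n). Assuming that pattern holds for all returned tags, a coarse grouping can be derived from the first character of `pos`. The helper name `group_by_class1` is hypothetical, and this sketch applies only to tags from that table, not to the CTB or English tag sets.

```python
from collections import defaultdict


def group_by_class1(words):
    """Group word texts by the first letter of their POS tag.

    `words` is a list of dicts shaped like the entries of the
    `words` array in the response: {"content": ..., "pos": ...}.
    """
    groups = defaultdict(list)
    for word in words:
        pos = word.get("pos")
        if pos:  # skip entries without a POS tag
            groups[pos[0]].append(word["content"])
    return dict(groups)


words = [
    {"content": "word-1", "pos": "t"},
    {"content": "word-2", "pos": "n"},
    {"content": "word-3", "pos": "nr"},
    {"content": "word-4", "pos": "vshi"},
]
print(group_by_class1(words))
```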

Table 6 POS tags for the CTB criterion

| POS | Description | Example |
|---|---|---|
| AD | Adverb | word-1, word-2, word-3 |
| AS | Dynamic particle | word-4, word-5, word-6 |
| BA | "ba" structure | word-7 |
| CC | Coordinating conjunction | word-8, word-9 |
| CD | Cardinal number | One, two, three |
| CS | Subordinating conjunction | Although, if, when |
| DEC | Complement or nominalization | word-10, word-11 |
| DEG | Conjunctive or possessive | word-12, word-13 |
| DER | Complement de | de |
| DEV | Adverb di | di |
| DT | Determiner | word-14, word-15, word-16 |
| ETC | word-17 | word-17, word-18 |
| FW | Loanword | A E B |
| IJ | Exclamation | word-18, word-19 |
| JJ | Modifier for noun | Big, new, small |
| LB | Long bei structure | word-20, word-21, word-22 |
| LC | Positional word | middle, upper |
| M | Classifier | Unit, year, dollar |
| MSP | Particle | Particle-1, particle-2, particle-3 |
| NN | Noun | Economy, enterprise, person |
| NR | Proper noun | China, Zhejiang |
| NT | Time noun | Present, last year |
| OD | Ordinal number | First, second, top |
| ON | Onomatopoeia | O |
| P | Preposition | Preposition-1, preposition-2, preposition-3 |
| PN | Pronoun | He, I, myself |
| PU | Punctuation | Chinese comma, Chinese full stop |
| SB | Short bei structure | word-23, word-24 |
| SP | Particle at the end of a sentence | Particle-1, particle-2, particle-3 |
| VA | Predicative adjective | Big, many, good |
| VC | Linking verb | Verb-1, verb-2, verb-3 |
| VE | Verb indicating "has/have" | Verb-4, verb-5, verb-6 |
| VV | Verb | Verb-7, verb-8, verb-9 |

Table 7 POS tags for English text (Penn Treebank)

| POS | Description | Example |
|---|---|---|
| CC | Coordinating conjunction | and, but, or |
| CD | Cardinal number | one, two |
| DT | Determiner | a, the |
| EX | Existential there | there |
| FW | Foreign word | mea, culpa |
| IN | Preposition, subordinating conjunction | of, in, by |
| JJ | Adjective | yellow |
| JJR | Comparative form of adjectives | bigger |
| JJS | Superlative form of adjectives | wildest |
| LS | List item marker | 1, 2, One |
| MD | Modal verb | can, could, might |
| NN | Noun, countable or uncountable | llama |
| NNS | Noun, in plural form | llamas |
| NNP | Proper noun, in singular form | IBM |
| NNPS | Proper noun, in plural form | Carolinas |
| PDT | Predeterminer | all, both |
| POS | Possessive ending | 's |
| PRP | Personal pronoun | I, me, you |
| PRP$ | Possessive pronoun | my, your, yours |
| RB | Adverb | quickly |
| RBR | Comparative form of adverbs | faster |
| RBS | Superlative form of adverbs | fastest |
| RP | Particle | up, off |
| SYM | Symbol (mathematical or scientific) | +, %, & |
| TO | to | to |
| UH | Exclamation | ah, oops |
| VB | Basic form of verbs | eat |
| VBD | Past tense of verbs | ate |
| VBG | Gerund or present participle | eating |
| VBN | Past participle | eaten |
| VBP | Non-third person singular form of verbs | eat |
| VBZ | Third person singular form of verbs | eats |
| WDT | wh-determiner | which, that |
| WP | wh-pronoun | what, who |
| WP$ | wh-possessive pronoun | whose |
| WRB | wh-adverb | how, where |
| PU | Punctuation | , . : |
Example
- Example request
    POST https://{endpoint}/v1/{project_id}/nlp-fundamental/segment

    Request Header:
        Content-Type: application/json
        X-Auth-Token: MIINRwYJKoZIhvcNAQcCoIINODCCDTQCAQExDTALBglghkgBZQMEAgEwgguVBgkqhkiG...

    Request Body:
        {
            "text": "Text to segment",
            "pos_switch": 1,
            "lang": "zh",
            "criterion": "PKU"
        }

- Example response
  - Successful response example

        {
            "words": [
                {
                    "content": "word-1",
                    "pos": "t"
                },
                {
                    "content": "word-2",
                    "pos": "n"
                },
                {
                    "content": "word-3",
                    "pos": "d"
                },
                {
                    "content": "word-4",
                    "pos": "a"
                }
            ]
        }

  - Failed response example

        {
            "error_code": "NLP.0301",
            "error_msg": "The length of text should be in the range of 1-512"
        }
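
The example request above can be reproduced with Python's standard library, as in the following sketch. The function names `make_segment_request` and `send`, the endpoint `nlp.example.com`, and the token value are placeholders rather than part of the API. Whether a failed call returns a non-2xx HTTP status is an assumption; the code handles both cases by reading the JSON body either way.

```python
import json
import urllib.error
import urllib.request


def make_segment_request(endpoint, project_id, token, body):
    """Build (but do not send) the POST request shown in the example."""
    url = f"https://{endpoint}/v1/{project_id}/nlp-fundamental/segment"
    return urllib.request.Request(
        url,
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method="POST",
    )


def send(request, timeout=10.0):
    """Send the request and return the decoded JSON body."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            payload = response.read()
    except urllib.error.HTTPError as err:
        # Assumption: error responses still carry a JSON body
        # with error_code and error_msg.
        payload = err.read()
    return json.loads(payload.decode("utf-8"))


# Placeholder values; replace them with a real endpoint, project ID, and token.
request = make_segment_request(
    "nlp.example.com",
    "my-project-id",
    "YOUR_TOKEN",
    {"text": "Text to segment", "pos_switch": 1, "lang": "zh", "criterion": "PKU"},
)
print(request.get_method(), request.full_url)
# result = send(request)  # uncomment once real credentials are in place
```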
Status Code
For details about status codes, see Status Code.
Error Code
For details about error codes, see Error Code.