Word Segmentation

  • Example request
    // Construct the client once and reuse it (singleton pattern) to avoid frequent object creation.
    NlpfClient client = new NlpfClient(AuthMode.AKSK, authInfo);
    
    SegmentReq req = new SegmentReq();
    // Mandatory. Text to be segmented.
    req.setText("Text to segment");
    
    // Optional. SegmentConstant.POS_SWITCH_ON (value 1) enables part-of-speech tagging and SegmentConstant.POS_SWITCH_OFF (value 0) disables it.
    req.setPosSwitch(SegmentConstant.POS_SWITCH_ON);
    
    // Optional. Language. The default value is zh (Chinese). Only Chinese is currently supported.
    req.setLang("zh");
    
    // Optional. Word segmentation criterion. Currently, the Peking University standard (PKU) and the Chinese Penn Treebank standard (CTB) are supported. The default value is PKU.
    req.setCriterion("PKU");
    
    try {
        SegmentResp resp = client.segment(req);
    } catch (NlpException e) {
        // A failed call throws an NlpException. For details about exceptions, see section "Exceptions". Handle the exception in your client code.
    }
  • For details about SegmentReq parameters, see Table 1. Obtain and set the parameters by using the getter and setter methods.
    Table 1 SegmentReq parameters

    Parameter   Mandatory   Type      Description
    text        Yes         String    Text to be segmented
    posSwitch   No          Integer   1 indicates enabling part-of-speech tagging and 0 indicates disabling part-of-speech tagging. The default value is 0.
    lang        No          String    Language. zh indicates Chinese. The default value is zh.
    criterion   No          String    Supported word segmentation criteria. Currently, Peking University standard (PKU) and Chinese Penn Treebank (CTB) are supported. The default value is PKU.

  • For details about SegmentResp parameters, see Table 2. Obtain and set the parameters by using the getter and setter methods.
    Table 2 SegmentResp parameters

    Parameter   Type            Description
    words       Array of Word   Word segmentation result

    Table 3 Data structure description of the Word field

    Parameter   Type     Description
    content     String   Word text
    pos         String   Part of speech of a word. For details, see Word Segmentation.
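
Tables 2 and 3 describe the response as a collection of Word objects, each with content and pos fields. The sketch below shows one way such a response might be walked and printed. It does not use the SDK itself: Word and SegmentResp here are hypothetical stand-in classes that mirror the tables. The getter names (getWords, getContent, getPos), the use of a List for "Array of Word", and the handling of a null pos when tagging is disabled are all assumptions, and the sample words are hand-built placeholders, not real service output.

```java
import java.util.ArrayList;
import java.util.List;

public class SegmentRespSketch {

    // Hypothetical stand-in for the SDK's Word structure (Table 3).
    static class Word {
        private final String content; // word text
        private final String pos;     // part of speech; assumed null when tagging is off

        Word(String content, String pos) {
            this.content = content;
            this.pos = pos;
        }

        String getContent() { return content; }
        String getPos() { return pos; }
    }

    // Hypothetical stand-in for the SDK's SegmentResp (Table 2).
    static class SegmentResp {
        private List<Word> words = new ArrayList<>();

        List<Word> getWords() { return words; }
        void setWords(List<Word> words) { this.words = words; }
    }

    // Joins the words as "content/pos", separated by spaces.
    // Words without a part of speech are emitted as plain content.
    static String format(SegmentResp resp) {
        StringBuilder sb = new StringBuilder();
        for (Word w : resp.getWords()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(w.getContent());
            if (w.getPos() != null) {
                sb.append('/').append(w.getPos());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hand-built placeholder data standing in for a real response.
        List<Word> words = new ArrayList<>();
        words.add(new Word("wordA", "n"));
        words.add(new Word("wordB", null));

        SegmentResp resp = new SegmentResp();
        resp.setWords(words);

        System.out.println(format(resp));
    }
}
```

With the real SDK, the same loop would presumably run over the SegmentResp returned by client.segment(req) inside the try block of the example request, provided the actual getters match the names assumed here.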