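Using Real-Time Speech Recognition
Prerequisites
- Make sure the environment has been set up as described in Configuring the CPP Environment (Linux).
- Obtain the latest SDK package as described in SDK (websocket).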
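Initializing the Client
Initialize RasrClient. Its parameters include AuthInfo, which the sample code constructs from the AK, SK, region, project ID, and endpoint.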
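The sample code warns against hard-coding the AK and SK and notes that the endpoint follows the pattern sis-ext.${region}.myhuaweicloud.com. A minimal sketch of both ideas is shown here. The helper names `GetEnvOr` and `BuildEndpoint` are hypothetical and not part of the SDK, and any environment variable names you pass in are your own choice.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of the SDK): returns the value of an
// environment variable, or `fallback` when the variable is not set.
std::string GetEnvOr(const char *name, const std::string &fallback) {
    const char *value = std::getenv(name);
    return value != nullptr ? std::string(value) : fallback;
}

// Hypothetical helper (not part of the SDK): builds the service endpoint from
// the region, following the pattern quoted in the sample code comment:
// sis-ext.${region}.myhuaweicloud.com
std::string BuildEndpoint(const std::string &region) {
    return "sis-ext." + region + ".myhuaweicloud.com";
}
```

The resulting strings could then be passed to the `AuthInfo` constructor in place of the `--ak`, `--sk`, and `--endpoint` flags, for example `BuildEndpoint("cn-east-3")` for the endpoint argument.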
Request Parameters
The request class is RasrRequest. For details, see Table RasrRequest.
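Parameter | Mandatory | Type | Description |
---|---|---|---|
audioFormat | Yes | String | Audio format. pcm and other formats are supported, for example, pcm8k16bit. See the section on starting recognition in the API Reference. |
property | Yes | String | Property string in the form language_sampleRate_domain, for example, chinese_8k_common. See the section on starting recognition in the API Reference. |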
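As an illustration of the two string formats, the sketch below assembles them from their components. Both helpers are hypothetical and not part of the SDK. The audioFormat layout (encoding, then sample rate, then bit depth) is inferred from the values pcm8k16bit and pcm16k16bit that appear on this page, so check the API Reference for the values the service actually accepts.

```cpp
#include <string>

// Hypothetical helper (not part of the SDK): joins the three components of the
// property string in the documented order language_sampleRate_domain.
std::string MakeProperty(const std::string &language,
                         const std::string &sampleRate,
                         const std::string &domain) {
    return language + "_" + sampleRate + "_" + domain;
}

// Hypothetical helper (not part of the SDK): concatenates encoding, sample
// rate, and bit depth. The layout is an assumption inferred from example
// values such as pcm8k16bit.
std::string MakeAudioFormat(const std::string &encoding,
                            const std::string &sampleRate,
                            const std::string &bitDepth) {
    return encoding + sampleRate + bitDepth;
}
```

For example, `MakeProperty("chinese", "8k", "common")` yields `chinese_8k_common`, and `MakeAudioFormat("pcm", "8k", "16bit")` yields `pcm8k16bit`. Using the same sample rate string in both calls keeps the two parameters consistent.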
Individual parameters can be set through the set methods. For details, see Table RasrRequest set methods.
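Method | Mandatory | Parameter Type | Description |
---|---|---|---|
SetPunc | No | String | Whether to add punctuation to the recognition result. Valid values: yes, no. Default: no. |
SetDigitNorm | No | String | Whether to convert spoken numbers to Arabic numerals. Valid values: yes, no. Default: yes. |
SetVadHead | No | Integer | Maximum leading silence, in ms. Range: [0, 60000]. Default: 10000. |
SetVadTail | No | Integer | Maximum trailing silence, in ms. Range: [0, 3000]. Default: 500. |
SetMaxSeconds | No | Integer | Maximum audio duration, in seconds. Range: [1, 60]. Default: 30. |
SetIntermediateResult | No | String | Whether to return intermediate results. Valid values: yes, no. Default: no. |
SetVocabularyId | No | String | ID of the hot word table. Leave it unset if there is none. |
SetNeedWordInfo | No | String | Whether to include word segmentation information in the recognition result. Valid values: yes, no. Default: no. |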
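The ranges and defaults in the table above can be checked on the client side before the request is sent. The following is a minimal sketch under stated assumptions: `RasrOptions` and `FirstInvalidOption` are hypothetical names, not SDK types, and the SDK itself may or may not perform similar checks.

```cpp
#include <string>

// Hypothetical container (not an SDK type) holding the optional settings.
// Default values follow the table above.
struct RasrOptions {
    std::string punc = "no";
    std::string digitNorm = "yes";
    int vadHeadMs = 10000;
    int vadTailMs = 500;
    int maxSeconds = 30;
    std::string intermediateResult = "no";
    std::string needWordInfo = "no";
};

bool IsYesNo(const std::string &v) { return v == "yes" || v == "no"; }

bool InRange(int v, int lo, int hi) { return v >= lo && v <= hi; }

// Returns an empty string when every value is within its documented range,
// otherwise the name of the first set method whose value is out of range.
std::string FirstInvalidOption(const RasrOptions &o) {
    if (!IsYesNo(o.punc)) return "SetPunc";
    if (!IsYesNo(o.digitNorm)) return "SetDigitNorm";
    if (!InRange(o.vadHeadMs, 0, 60000)) return "SetVadHead";
    if (!InRange(o.vadTailMs, 0, 3000)) return "SetVadTail";
    if (!InRange(o.maxSeconds, 1, 60)) return "SetMaxSeconds";
    if (!IsYesNo(o.intermediateResult)) return "SetIntermediateResult";
    if (!IsYesNo(o.needWordInfo)) return "SetNeedWordInfo";
    return "";
}
```

Calling `FirstInvalidOption` before invoking the corresponding set methods on RasrRequest lets a caller report a configuration mistake locally instead of waiting for an error response from the service.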
Sample Code
The following sample is for reference only. Obtain and run the latest code from the SDK (websocket) section.
    /*
     * Copyright (c) Huawei Technologies Co., Ltd. 2020-2020. All rights reserved.
     */
    #include <memory>
    #include <string>

    #include "Utils.h"
    #include "RasrClient.h"
    #include "gflags/gflags.h"

    // auth info
    // refer to https://support.huaweicloud.com/api-sis/sis_03_0051.html
    // Hard-coding the AK and SK in code or storing them in plaintext is a serious security risk.
    // Store them encrypted in a configuration file or environment variables and decrypt them when used.
    DEFINE_string(ak, "", "access key");
    DEFINE_string(sk, "", "secret key");
    // region, for example cn-east-3, cn-north-4
    DEFINE_string(region, "cn-east-3", "project region, such as cn-east-3");
    // projectId, refer to https://support.huaweicloud.com/api-sis/sis_03_0008.html
    DEFINE_string(projectId, "", "project id");
    // endpoint, relevant to region, sis-ext.${region}.myhuaweicloud.com
    DEFINE_string(endpoint, "", "service endpoint");

    DEFINE_string(audioFormat, "pcm16k16bit", "such as pcm16k16bit, alaw16k16bit, etc.");
    DEFINE_string(property, "chinese_16k_general", "");
    DEFINE_string(audioPath, "xx.wav", "audio path");
    DEFINE_int32(chunkSize, 3000, "bytes per send");
    DEFINE_int32(sampleRate, 16000, "sample rate of audio");
    DEFINE_int32(readTimeOut, 20000, "read time out, default 20s");
    DEFINE_int32(connectTimeOut, 20000, "connecting time out, default 20s");
    DEFINE_int32(bytesPerSecond, 32000, "32000 bytes per second");

    void OnOpen()
    {
        LOG(INFO) << "now rasr Connect success";
    }

    void OnStart(std::string text)
    {
        LOG(INFO) << "now rasr receive start response: " << text;
    }

    void OnResp(std::string text)
    {
        // text is UTF-8 encoded and may contain Chinese characters, which can show up
        // garbled on a non-UTF-8 console; convert it to ANSI there if needed.
        LOG(INFO) << "rasr receive " << text;
    }

    void OnEnd(std::string text)
    {
        LOG(INFO) << "now rasr receive end response: " << text;
    }

    void OnClose()
    {
        LOG(INFO) << "now rasr receive Close";
    }

    void OnError(std::string text)
    {
        LOG(INFO) << "now rasr receive error: " << text;
    }

    void OnEvent(std::string text)
    {
        LOG(INFO) << "now rasr receive event: " << text;
    }

    void RasrTest(const std::string &filePath)
    {
        const int sleepTime = FLAGS_bytesPerSecond / FLAGS_chunkSize;
        speech::huawei_asr::AuthInfo authInfo(FLAGS_ak, FLAGS_sk, FLAGS_region, FLAGS_projectId, FLAGS_endpoint);

        // config Connect parameter
        speech::huawei_asr::HttpConfig httpConfig;
        httpConfig.SetReadTimeout(FLAGS_readTimeOut);
        httpConfig.SetConnectTimeout(FLAGS_connectTimeOut);

        // config callbacks; they are optional, and if not set, the functions in RasrListener are used
        speech::huawei_asr::WebsocketService::ptr websocketServicePtr =
            websocketpp::lib::make_shared<speech::huawei_asr::WebsocketService>();
        websocketServicePtr->SetOnConnectFunc(OnOpen);  // Connect success callback
        websocketServicePtr->SetOnStartFunc(OnStart);   // receive start response callback
        websocketServicePtr->SetOnRespFunc(OnResp);     // receive transcribe result callback
        websocketServicePtr->SetOnEndFunc(OnEnd);       // receive end response callback
        websocketServicePtr->SetOnCloseFunc(OnClose);   // Close callback
        websocketServicePtr->SetOnEventFunc(OnEvent);   // receive event callback
        websocketServicePtr->SetOnErrorFunc(OnError);   // receive error callback

        // step1 create client
        std::shared_ptr<speech::huawei_asr::RasrClient> rasrClient =
            std::make_shared<speech::huawei_asr::RasrClient>(authInfo, websocketServicePtr, httpConfig);

        // step2 connect, select exactly one mode; the following is continue stream connect
        rasrClient->ContinueStreamConnect();
        // short stream connect
        // rasrClient->ShortStreamConnect();
        // sentence stream connect
        // rasrClient->SentenceStreamConnect();

        // step3 construct request params
        speech::huawei_asr::RasrRequest request(FLAGS_audioFormat, FLAGS_property);
        // set whether to add punctuation, yes or no, default no, optional operation.
        request.SetPunc("no");
        // set whether to transcribe numbers into Arabic numerals, yes or no, default yes, optional operation.
        request.SetDigitNorm("yes");
        // set vad head, max silent head, [0, 60000], default 10000, optional operation.
        request.SetVadHead(10000);
        // set vad tail, max silent tail, [0, 3000], default 500, optional operation.
        request.SetVadTail(500);
        // set max seconds of one sentence, [1, 60], default 30, optional operation.
        request.SetMaxSeconds(30);
        // set whether to return intermediate result, yes or no, default no, optional operation.
        request.SetIntermediateResult("no");
        // set whether to return word_info, yes or no, default no, optional operation.
        request.SetNeedWordInfo("no");
        // set vocabulary_id; fill it in only if the vocabulary exists, otherwise an error is reported
        // request.SetVocabularyId("");

        // step4 send start
        rasrClient->SendStart(request);

        // step5 send audio
        std::string audioContent;
        int ret = speech::huawei_asr::ReadBinary(filePath, audioContent);
        if (ret != 0) {
            LOG(ERROR) << "RasrDemo running failed";
            rasrClient->Close();
            return;
        }
        unsigned char *buf = reinterpret_cast<unsigned char *>(&audioContent[0]);
        rasrClient->SendBinary(buf, audioContent.size(), FLAGS_chunkSize, sleepTime);

        // step6 send end
        rasrClient->SendEnd();

        // step7 close
        rasrClient->Close();
    }

    int main(int argc, char *argv[])
    {
        FLAGS_alsologtostderr = true;
        FLAGS_log_dir = "./logs";
        gflags::ParseCommandLineFlags(&argc, &argv, true);
        google::InitGoogleLogging(argv[0]);
        RasrTest(FLAGS_audioPath);
        return 0;
    }
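The sample's default flags are related by simple arithmetic: 16 kHz audio at 16 bits per sample produces 32000 bytes per second, and the audio is sent in chunks of 3000 bytes. The sketch below works through that arithmetic independently of the SDK. The function names are hypothetical, mono audio is assumed, and nothing here describes how `SendBinary` interprets its last argument, which this page does not document (the sample passes `FLAGS_bytesPerSecond / FLAGS_chunkSize`, which is 10 with the default flags).

```cpp
#include <cstddef>

// Bytes of raw PCM audio produced per second.
// Assumes bitsPerSample is a multiple of 8.
int BytesPerSecond(int sampleRateHz, int bitsPerSample, int channels) {
    return sampleRateHz * (bitsPerSample / 8) * channels;
}

// Playback duration, in milliseconds, of one chunk of chunkSize bytes.
double ChunkDurationMs(int chunkSize, int bytesPerSecond) {
    return 1000.0 * chunkSize / bytesPerSecond;
}

// Number of chunks needed to send totalBytes, rounding up so that a final,
// shorter chunk is counted. chunkSize must be greater than zero.
std::size_t ChunkCount(std::size_t totalBytes, std::size_t chunkSize) {
    return (totalBytes + chunkSize - 1) / chunkSize;
}
```

With the defaults, `BytesPerSecond(16000, 16, 1)` is 32000 and one 3000-byte chunk holds 93.75 ms of audio, so one second of audio is sent as 11 chunks. A client that wanted to stream at roughly real-time speed could wait about that long between chunks.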
Build Script
The following build script is for reference only. You can customize RasrDemo.cpp to suit your service requirements.
    cd ${project_dir}
    mkdir build && cd build
    mkdir logs
    cmake ..
    make -j
    ./RasrDemo --audioPath=yourAudioPath --ak=yourAk --sk=yourSk --region=yourRegion --projectId=yourProjectId