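Back when I was a student I wrote a small Snake game and planned to make it voice-controlled, but once I started working I never found the time. A few days ago a junior schoolmate asked me about it, so today I'll use this little game as the example and show you how to do speech recognition on Linux.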
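In the SDK's bin folder you'll find a file called asr_keywords_utf8.txt. The idea behind this SDK is as follows. You write the words you want recognized into asr_keywords_utf8.txt and upload it to the server, which returns a GrammarID. Reportedly one upload is valid "for life", presumably so people don't keep re-uploading and filling up the server. Either way, once you have the GrammarID, any program that needs to recognize the same words can simply reuse it. For example, to recognize "左, 右, 上, 下, 图书馆, 独自" (left, right, up, down, library, alone), write those characters into asr_keywords_utf8.txt. The file must be UTF-8, which is the default on Linux anyway. Below is the code I wrote to upload this txt file and get the GrammarID: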
    #include <stdio.h>
    #include "qisr.h"        /* iFlytek MSC SDK headers; exact names/paths depend on your SDK version */
    #include "msp_errors.h"

    #define MAX_KEYWORD_LEN 4096

    int main(void)
    {
        int ret = QISRInit("appid=xxxxxxx");   /* replace with your own appid */
        if (ret != MSP_SUCCESS) {
            printf("QISRInit with errorCode: %d\n", ret);
            return ret;
        }

        /* No GrammarID yet, so the session is opened with NULL just to upload the keyword list */
        const char *sessionID = QISRSessionBegin(NULL, "ssm=1,sub=asr", &ret);
        if (ret != MSP_SUCCESS) {
            printf("QISRSessionBegin with errorCode: %d\n", ret);
            QISRFini();
            return ret;
        }

        /* Read the UTF-8 keyword file, leaving room for the terminating NUL */
        char UserData[MAX_KEYWORD_LEN];
        FILE *fp = fopen("asr_keywords_utf8.txt", "rb");
        if (fp == NULL) {
            printf("keyword file cannot open\n");
            QISRSessionEnd(sessionID, "normal");
            QISRFini();
            return -1;
        }
        unsigned int len = (unsigned int)fread(UserData, 1, MAX_KEYWORD_LEN - 1, fp);
        UserData[len] = '\0';
        fclose(fp);

        /* Upload the keyword list; on success the server hands back the GrammarID */
        const char *testID = QISRUploadData(sessionID, "contact", UserData, len, "dtt=keylist", &ret);
        if (ret != MSP_SUCCESS || testID == NULL) {
            printf("QISRUploadData with errorCode: %d\n", ret);
            QISRSessionEnd(sessionID, "normal");
            QISRFini();
            return ret != MSP_SUCCESS ? ret : -1;
        }

        char GrammarID[128];
        snprintf(GrammarID, sizeof(GrammarID), "%s", testID);   /* bounded copy */
        printf("GrammarID: \"%s\"\n", GrammarID);

        QISRSessionEnd(sessionID, "normal");
        QISRFini();
        return 0;
    }
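The upload program sends asr_keywords_utf8.txt exactly as it reads it, so a file accidentally saved in another encoding such as GB2312 goes through unnoticed. Below is a minimal sketch of a check you might run on UserData right after fread and before QISRUploadData. is_valid_utf8 is a hypothetical helper, not an SDK function; it follows the UTF-8 rules of RFC 3629 (no overlong forms, no surrogates, nothing above U+10FFFF).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the SDK): returns 1 if buf[0..len)
 * is well-formed UTF-8 per RFC 3629, otherwise 0. */
int is_valid_utf8(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t need = 0;        /* number of continuation bytes expected */
        unsigned int cp = 0;    /* code point being decoded */
        if (c < 0x80) {         /* plain ASCII byte */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            need = 1; cp = c & 0x1F;
        } else if ((c & 0xF0) == 0xE0) {
            need = 2; cp = c & 0x0F;
        } else if ((c & 0xF8) == 0xF0) {
            need = 3; cp = c & 0x07;
        } else {
            return 0;           /* stray continuation byte or invalid lead byte */
        }
        if (len - i - 1 < need)
            return 0;           /* sequence cut off at end of buffer */
        for (size_t k = 1; k <= need; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80)
                return 0;       /* expected a 10xxxxxx continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }
        /* reject overlong encodings, UTF-16 surrogates and out-of-range values */
        if ((need == 1 && cp < 0x80) || (need == 2 && cp < 0x800) ||
            (need == 3 && cp < 0x10000) || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
        i += need + 1;
    }
    return 1;
}
```

In the upload program this could sit just before the QISRUploadData call, e.g. bailing out with an error message when is_valid_utf8((const unsigned char *)UserData, len) returns 0.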
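Once the upload program prints the GrammarID, just note it down. Pay close attention to the next step, which is recording. One thing to watch out for: you can't use Ubuntu's built-in sound recorder, or recognition will fail, because it records 32-bit samples by default. Record with ffmpeg or with your own code instead. The ffmpeg command below captures from ALSA device hw:0 and resamples to 16 kHz (-ar 16000) mono (-ac 1); ffmpeg writes WAV files as 16-bit PCM by default: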
ffmpeg -f alsa -i hw:0 -ar 16000 -ac 1 lib.wav
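Since the recording format is the main gotcha, one way to confirm a file really is 16-bit, 16 kHz, mono PCM (matching auf=audio/L16;rate=16000 in the recognition code) is to read its WAV header. The sketch below is hypothetical (parse_wav_header and wav_matches_asr_format are not SDK functions). It walks the RIFF chunks instead of assuming a fixed 44-byte header, so it should also cope with extra chunks such as the LIST/INFO chunk ffmpeg typically adds, and it reports data_offset in case you want to skip the header rather than send it as audio.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Format fields pulled from a RIFF/WAVE header. */
typedef struct {
    uint16_t format;           /* 1 = integer PCM */
    uint16_t channels;
    uint32_t sample_rate;
    uint16_t bits_per_sample;
    size_t   data_offset;      /* byte offset where the samples start */
    uint32_t data_size;        /* declared size of the "data" chunk */
} wav_info;

/* WAV fields are little-endian. */
static uint16_t rd16(const unsigned char *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Scans chunks after "RIFF....WAVE" until "data" is found, reading "fmt "
 * on the way and skipping anything else (odd sizes are padded to even).
 * buf may hold just the start of the file. Returns 0 on success, -1 otherwise. */
int parse_wav_header(const unsigned char *buf, size_t len, wav_info *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 12 || memcmp(buf, "RIFF", 4) != 0 || memcmp(buf + 8, "WAVE", 4) != 0)
        return -1;
    int have_fmt = 0;
    size_t pos = 12;
    while (pos < len && len - pos >= 8) {
        const unsigned char *chunk = buf + pos;
        uint32_t size = rd32(chunk + 4);
        if (memcmp(chunk, "fmt ", 4) == 0) {
            if (size < 16 || len - pos - 8 < 16)
                return -1;
            out->format = rd16(chunk + 8);
            out->channels = rd16(chunk + 10);
            out->sample_rate = rd32(chunk + 12);
            out->bits_per_sample = rd16(chunk + 22);
            have_fmt = 1;
        } else if (memcmp(chunk, "data", 4) == 0) {
            if (!have_fmt)
                return -1;
            out->data_offset = pos + 8;
            out->data_size = size;
            return 0;
        }
        if (len - pos - 8 < size)   /* chunk runs past what we were given */
            return -1;
        pos += 8 + (size_t)size + (size & 1u);
    }
    return -1;
}

/* 16-bit integer PCM, 16 kHz, mono: what the ffmpeg command above produces. */
int wav_matches_asr_format(const wav_info *w)
{
    return w->format == 1 && w->channels == 1 &&
           w->sample_rate == 16000 && w->bits_per_sample == 16;
}
```

In practice you would fread the first few kilobytes of lib.wav into a buffer and pass that in; a recording from the built-in recorder should then fail wav_matches_asr_format because of its 32-bit samples.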
I recorded 2 seconds of audio. Here is the recognition code:
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>      /* usleep */
    #include "qisr.h"        /* iFlytek MSC SDK headers; adjust paths to your SDK */
    #include "msp_errors.h"

    #define BUFFER_NUM 4096

    int run_asr(const char *asrfile);

    int main(void)
    {
        const char *asrfile = "lib.wav";
        int ret = QISRInit("appid=xxxxxx");
        if (ret != MSP_SUCCESS) {
            printf("QISRInit with errorCode: %d\n", ret);
            return ret;
        }
        ret = run_asr(asrfile);
        QISRFini();
        getchar();   /* wait for a key before exiting */
        return ret;
    }

    int run_asr(const char *asrfile)
    {
        int ret = MSP_SUCCESS;
        char buff[BUFFER_NUM];
        unsigned int len;
        int status = MSP_AUDIO_SAMPLE_CONTINUE;
        int ep_status = -1, rec_status = -1, rslt_status = -1;
        int count = 0;
        const char *GrammarID = "c66d4eecd37d4fe1c8274a2224b832d5";   /* from the upload program */
        /* rst=json returns the result as UTF-8 JSON; note sub=asr */
        const char *param = "rst=json,sub=asr,ssm=1,aue=speex,auf=audio/L16;rate=16000";
        const char *sess_id = QISRSessionBegin(GrammarID, param, &ret);
        if (MSP_SUCCESS != ret) {
            printf("QISRSessionBegin err %d\n", ret);
            return ret;
        }

        /* The file is sent from byte 0, so the WAV header goes along with the samples */
        FILE *fp = fopen(asrfile, "rb");
        if (NULL == fp) {
            printf("failed to open file, please check the file.\n");
            QISRSessionEnd(sess_id, "normal");
            return -1;
        }

        printf("writing audio...\n");
        while (!feof(fp)) {
            len = (unsigned int)fread(buff, 1, BUFFER_NUM, fp);
            /* the read that hits end of file is flagged as the last chunk */
            status = feof(fp) ? MSP_AUDIO_SAMPLE_LAST : MSP_AUDIO_SAMPLE_CONTINUE;
            puts(status == MSP_AUDIO_SAMPLE_LAST ? "MSP_AUDIO_SAMPLE_LAST" : "MSP_AUDIO_SAMPLE_CONTINUE");
            ret = QISRAudioWrite(sess_id, buff, len, status, &ep_status, &rec_status);
            if (ret != MSP_SUCCESS) {
                printf("\nQISRAudioWrite err %d\n", ret);
                break;
            }
            printf("%d\n", count++);
            /* a partial result may already be ready while audio is still being written */
            if (rec_status == MSP_REC_STATUS_SUCCESS) {
                const char *result = QISRGetResult(sess_id, &rslt_status, 0, &ret);
                if (ret != MSP_SUCCESS) {
                    printf("error code: %d\n", ret);
                    break;
                } else if (rslt_status == MSP_REC_STATUS_NO_MATCH) {
                    printf("get result nomatch\n");
                } else if (result != NULL) {
                    printf("get result[%d/%d]:len:%zu\n %s\n", ret, rslt_status, strlen(result), result);
                }
            }
            printf(".");
        }
        printf("\n");

        if (ret == MSP_SUCCESS) {
            printf("get reuslt~~~~~~~\n");
            char asr_result[1024] = "";
            size_t pos_of_result = 0;
            int loop_count = 0;
            /* poll every 0.5 s until the session reports COMPLETE, giving up after ~30 tries */
            do {
                const char *result = QISRGetResult(sess_id, &rslt_status, 0, &ret);
                if (ret != MSP_SUCCESS) {
                    printf("QISRGetResult err %d\n", ret);
                    break;
                }
                if (rslt_status == MSP_REC_STATUS_NO_MATCH) {
                    printf("get result nomatch\n");
                } else if (result != NULL) {
                    size_t n = strlen(result);
                    printf("~~~%zu\n", n);
                    printf("[%d]:get result[%d/%d]: %s\n", loop_count, ret, rslt_status, result);
                    /* bounded append instead of strcpy */
                    if (n >= sizeof(asr_result) - pos_of_result)
                        n = sizeof(asr_result) - pos_of_result - 1;
                    memcpy(asr_result + pos_of_result, result, n);
                    pos_of_result += n;
                    asr_result[pos_of_result] = '\0';
                } else {
                    printf("[%d]:get result[%d/%d]\n", loop_count, ret, rslt_status);
                }
                usleep(500000);
            } while (rslt_status != MSP_REC_STATUS_COMPLETE && loop_count++ < 30);
            if (asr_result[0] == '\0')
                printf("no result\n");
        }

        QISRSessionEnd(sess_id, NULL);
        printf("QISRSessionEnd.\n");
        fclose(fp);
        return 0;
    }
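Because the upload only has to happen once and the GrammarID can be reused across programs, an alternative to pasting it into run_asr by hand is to have the upload program save it to a file and have later programs read it back. A minimal sketch, assuming a plain one-line text file; save_grammar_id and load_grammar_id are hypothetical helpers, and the file name is up to you.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the GrammarID plus a newline to path.
 * Returns 0 on success, -1 on any I/O error. */
int save_grammar_id(const char *path, const char *id)
{
    assert(path != NULL && id != NULL);
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    int ok = fprintf(fp, "%s\n", id) >= 0;
    if (fclose(fp) != 0)   /* fclose flushes, so a write error can show up here */
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical helper: reads the first line of path into out (capacity cap),
 * stripping the trailing newline. Returns 0 on success, -1 if the file is
 * missing or empty. */
int load_grammar_id(const char *path, char *out, size_t cap)
{
    assert(path != NULL && out != NULL && cap > 0);
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    char *line = fgets(out, (int)cap, fp);
    fclose(fp);
    if (line == NULL)
        return -1;
    out[strcspn(out, "\r\n")] = '\0';   /* drop the newline, if any */
    return out[0] != '\0' ? 0 : -1;
}
```

For example, the upload program could call save_grammar_id right after printing the GrammarID, and the game could call load_grammar_id at startup and only fall back to uploading when it returns -1.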
Running the recognition program produces output like this:
    kl@kl-Latitude:~/xunfeiSDK$ ./a.out
    writing audio...
    MSP_AUDIO_SAMPLE_CONTINUE
    0
    .MSP_AUDIO_SAMPLE_CONTINUE
    1
    .MSP_AUDIO_SAMPLE_CONTINUE
    2
    .MSP_AUDIO_SAMPLE_CONTINUE
    3
    .MSP_AUDIO_SAMPLE_CONTINUE
    4
    .MSP_AUDIO_SAMPLE_CONTINUE
    5
    .MSP_AUDIO_SAMPLE_CONTINUE
    6
    .MSP_AUDIO_SAMPLE_CONTINUE
    7
    .MSP_AUDIO_SAMPLE_CONTINUE
    8
    .MSP_AUDIO_SAMPLE_CONTINUE
    9
    .MSP_AUDIO_SAMPLE_CONTINUE
    10
    .MSP_AUDIO_SAMPLE_CONTINUE
    11
    .MSP_AUDIO_SAMPLE_CONTINUE
    12
    .MSP_AUDIO_SAMPLE_CONTINUE
    13
    .MSP_AUDIO_SAMPLE_CONTINUE
    14
    .MSP_AUDIO_SAMPLE_CONTINUE
    15
    .MSP_AUDIO_SAMPLE_CONTINUE
    16
    .MSP_AUDIO_SAMPLE_CONTINUE
    17
    .MSP_AUDIO_SAMPLE_CONTINUE
    18
    .MSP_AUDIO_SAMPLE_CONTINUE
    19
    .MSP_AUDIO_SAMPLE_CONTINUE
    20
    .MSP_AUDIO_SAMPLE_CONTINUE
    21
    .MSP_AUDIO_SAMPLE_CONTINUE
    22
    .MSP_AUDIO_SAMPLE_LAST
    23
    .
    get reuslt~~~~~~~
    [0]:get result[0/2]
    ~~~123
    [1]:get result[0/5]: {"sn":1,"ls":true,"bg":0,"ed":0,"ws":[{"bg":0,"cw":[{"sc":"85","gm":"0","w":"图书馆","mn":[{"contact":"图书馆"}]}]}]}
    QISRSessionEnd.
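Another gotcha is the output format. The official example prints the recognition result directly by default, but that result is GB2312-encoded and shows up as garbage in a Linux terminal. The fix: in the param string passed as the second argument of QISRSessionBegin(), set rst to json. The full result then comes back as JSON with the Chinese text in UTF-8, and decoding it with a JSON parser works without any trouble. The overall flow of the code is straightforward.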
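For the Snake game, the part of that JSON that matters is the recognized word in the "w" field (图书馆 in the sample output). The sketch below pulls out the first "w" value with a plain strstr scan and maps it to a snake direction using the 左/右/上/下 keywords from the list above. This is a swapped-in shortcut, not a JSON parser: it ignores escape sequences and returns only the first word, so a real JSON library such as cJSON is the safer choice. first_word, direction and word_to_direction are hypothetical names, and the source file is assumed to be saved as UTF-8 so the Chinese literals match the result bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies the value of the first "w":"..." field of a recognition result
 * into out. Naive string scan, not a JSON parser: assumes the word has no
 * escaped quotes. Returns 0 on success, -1 if not found or too long. */
int first_word(const char *json, char *out, size_t cap)
{
    assert(json != NULL && out != NULL && cap > 0);
    const char *key = "\"w\":\"";
    const char *start = strstr(json, key);
    if (start == NULL)
        return -1;
    start += strlen(key);
    const char *end = strchr(start, '"');
    if (end == NULL || (size_t)(end - start) >= cap)
        return -1;
    memcpy(out, start, (size_t)(end - start));
    out[end - start] = '\0';
    return 0;
}

/* Hypothetical directions for the Snake game. */
typedef enum { DIR_NONE, DIR_LEFT, DIR_RIGHT, DIR_UP, DIR_DOWN } direction;

/* Maps a recognized UTF-8 keyword to a direction; other words map to DIR_NONE. */
direction word_to_direction(const char *w)
{
    if (strcmp(w, "左") == 0) return DIR_LEFT;    /* left */
    if (strcmp(w, "右") == 0) return DIR_RIGHT;   /* right */
    if (strcmp(w, "上") == 0) return DIR_UP;      /* up */
    if (strcmp(w, "下") == 0) return DIR_DOWN;    /* down */
    return DIR_NONE;                              /* e.g. 图书馆, 独自 */
}
```

In run_asr, asr_result could be passed to first_word once polling finishes, and the resulting direction handed to the game loop.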
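1. First call QISRInit() with your own appid. The SDK can only be downloaded after registering, and the appid you get is unique to you, so it is used to identify the user. Each user tier has a daily limit on SDK calls; after all, the more people use the service, the more recognition performance would suffer.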
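2. Next, pass the GrammarID, the input/output parameter string param, and the status variable ret to QISRSessionBegin() to start a session. It returns a sessionID, which is a key argument to every function that follows.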
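3. Open your audio file and write it with QISRAudioWrite(), either in chunks or all at once. The first argument is the sessionID returned by QISRSessionBegin(); the second is a pointer to the start of the audio data; the third is the length in bytes of the data passed in this call; the fourth is the audio status, which tells the server whether this is the last piece; the remaining two receive the server's endpoint-detection status and recognition status.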
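4. Call QISRGetResult() to fetch the recognition result. The first argument is again the sessionID; the second receives the recognition status; the third is the interval for interacting with the server (the official recommendation is 5000, I use 0); the fourth receives the return code ret. The function's return value is the JSON result shown above.
5. Finally, call QISRSessionEnd() to close the session and QISRFini() to release the SDK.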
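Step 3 leaves the chunking up to you. The sketch below shows one way to split an audio buffer and flag the final chunk, with a dry-run writer standing in for QISRAudioWrite so it runs without the SDK. CHUNK_CONTINUE and CHUNK_LAST are hypothetical stand-ins for MSP_AUDIO_SAMPLE_CONTINUE and MSP_AUDIO_SAMPLE_LAST; feed_audio, chunk_writer and dry_run_writer are likewise illustrative names. Unlike the feof() loop in run_asr, the status is decided from the bytes remaining, so a buffer that is an exact multiple of the chunk size ends on a full chunk marked LAST rather than an extra empty one.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for MSP_AUDIO_SAMPLE_CONTINUE / MSP_AUDIO_SAMPLE_LAST. */
enum chunk_status { CHUNK_CONTINUE, CHUNK_LAST };

/* Plays the role of QISRAudioWrite: receives one chunk and its status,
 * returns 0 on success or nonzero to abort. */
typedef int (*chunk_writer)(const unsigned char *data, size_t len,
                            enum chunk_status status, void *ctx);

/* Sends audio in chunks of at most chunk_size bytes, marking the final one
 * LAST. An empty buffer still yields a single zero-length LAST call.
 * Returns the number of chunks sent, or -1 if the writer failed. */
long feed_audio(const unsigned char *audio, size_t len, size_t chunk_size,
                chunk_writer writer, void *ctx)
{
    assert(audio != NULL && chunk_size > 0 && writer != NULL);
    long count = 0;
    size_t pos = 0;
    do {
        size_t n = len - pos < chunk_size ? len - pos : chunk_size;
        enum chunk_status st = (pos + n == len) ? CHUNK_LAST : CHUNK_CONTINUE;
        if (writer(audio + pos, n, st, ctx) != 0)
            return -1;
        pos += n;
        count++;
    } while (pos < len);
    return count;
}

/* Dry-run writer: records what would have been sent. */
struct dry_run { size_t bytes; long lasts; };

int dry_run_writer(const unsigned char *data, size_t len,
                   enum chunk_status st, void *ctx)
{
    struct dry_run *d = ctx;
    (void)data;
    d->bytes += len;
    if (st == CHUNK_LAST)
        d->lasts++;
    return 0;
}

/* 16 kHz * 2 bytes per sample * 1 channel = 32000 bytes per second. */
double chunk_duration_ms(size_t bytes)
{
    return bytes * 1000.0 / 32000.0;
}
```

With the BUFFER_NUM of 4096 used in run_asr, chunk_duration_ms(4096) gives 128 ms of audio per write at this format.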
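So, have you got the hang of writing a speech recognition program on Linux? Read through it a few more times, build something simple first, then work your way up from easy to hard. Once you understand the principles, you'll be a Linux speech recognition pro.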