Azure Voicemail Transcription

Status
Not open for further replies.

bryanredeagle

New Member
Apr 17, 2019
La Porte, IN
haway.io
I've gotten transcription working using the new Cognitive Services API on Microsoft Azure. I plan to submit a proper pull request once I've read through the contribution guidelines. In the meantime, I've attached the modified script (use the script editor to edit app/voicemail/resources/functions/record_message.lua).

You'll need to sign up for Cognitive Services on Azure. Be sure to note the region where you placed the resource group; you'll need it later.

Edit the default settings as described in the original instructions (https://docs.fusionpbx.com/en/latest/applications/voicemail_transcription.html), with two differences:
1. You only need microsoft_key1 (both keys work, but one is enough).
2. You need to add a new setting called azure_region. Set it using this list as a reference. The value is the part of the endpoint domain right after the https://, which is centralus in this example: https://centralus.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1

Turn on transcription on the needed voicemail boxes and you should be good to go. Hit me up if you run into any issues.
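
For anyone wondering how the region and key end up in the request, here is a minimal Python sketch. It is not the code from the attached script. The URL template is taken from the example endpoint above; the Ocp-Apim-Subscription-Key header and the language query parameter are the ones the Azure speech-to-text REST endpoint normally uses; the function name, the exact Content-Type value, and the placeholder key are assumptions for illustration. The request is only built, not sent.

```python
import urllib.request

# URL template based on the example endpoint in the post above.
AZURE_STT_TEMPLATE = (
    "https://{region}.stt.speech.microsoft.com"
    "/speech/recognition/conversation/cognitiveservices/v1"
)


def build_azure_request(region, key, wav_bytes, language="en-US"):
    """Build (but do not send) a speech-to-text request for one WAV recording.

    `region` plays the role of the azure_region setting and `key` the role
    of microsoft_key1. The function name is hypothetical.
    """
    url = AZURE_STT_TEMPLATE.format(region=region) + "?language=" + language
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        # Assumed audio format; adjust to match the actual recording.
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, data=wav_bytes, headers=headers, method="POST")


# Placeholder key and audio bytes, for illustration only.
req = build_azure_request("centralus", "YOUR_KEY", b"RIFF")
print(req.full_url)
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen`, which needs a valid key and real audio.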
 

Attachments

  • record_message.lua.zip
    4.8 KB · Views: 22

KonradSC

Active Member
Mar 10, 2017
From my research, it seems the best option right now is IBM's transcription. That will probably require a bigger rewrite, though, as that API is more of a 'dry cleaner' model: you drop off the clothes, get a ticket, and then come back with your ticket to pick them up. Right now the 'send email' portion of the voicemail Lua is handled in more or less real time. Send Email and Transcription probably need to be moved to a subroutine in the voicemail Lua scripts. That will take buy-in from Mark.
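
To make the 'dry cleaner' flow concrete, here is a rough Python sketch of a submit-then-poll routine that only sends the email once the ticket has been redeemed. Everything in it is hypothetical: FakeAsyncTranscriber is an in-memory stand-in, not IBM's API, and transcribe_then_email is not a FusionPBX function. It only illustrates why email and transcription would need to live in the same subroutine.

```python
import itertools


class FakeAsyncTranscriber:
    """In-memory stand-in for an asynchronous speech service (hypothetical)."""

    def __init__(self, polls_until_done=2):
        self._ids = itertools.count(1)
        self._jobs = {}
        self._polls_until_done = polls_until_done

    def submit(self, audio_path):
        """Drop off the audio and get a ticket back immediately."""
        ticket = "job-{}".format(next(self._ids))
        self._jobs[ticket] = {"path": audio_path, "polls": 0}
        return ticket

    def check(self, ticket):
        """Return the transcript if it is ready, otherwise None."""
        job = self._jobs[ticket]
        job["polls"] += 1
        if job["polls"] < self._polls_until_done:
            return None
        return "transcript of " + job["path"]


def transcribe_then_email(service, audio_path, send_email, max_polls=10):
    """Submit the audio, poll for the result, then send the email."""
    ticket = service.submit(audio_path)
    transcript = None
    for _ in range(max_polls):
        transcript = service.check(ticket)
        if transcript is not None:
            break
        # A real script would sleep between polls instead of looping tightly.
    # The email goes out either way; transcript is None if polling gave up.
    send_email(audio_path, transcript)
    return transcript


sent = []
result = transcribe_then_email(
    FakeAsyncTranscriber(),
    "msg_100.wav",
    lambda path, text: sent.append((path, text)),
)
print(result)
```

The point of the design is that the email step waits on the poll loop, with a cap on attempts so a stuck job cannot hold up voicemail delivery forever.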
 

bcmike

Active Member
Jun 7, 2018
Has anyone tried Mozilla DeepSpeech or Kaldi ASR, as mentioned in the documentation?
 

bcmike

Active Member
Jun 7, 2018
KonradSC said:
From my research, it seems the best option right now is IBM's transcription. That will probably require a bigger rewrite, though, as that API is more of a 'dry cleaner' model: you drop off the clothes, get a ticket, and then come back with your ticket to pick them up. Right now the 'send email' portion of the voicemail Lua is handled in more or less real time. Send Email and Transcription probably need to be moved to a subroutine in the voicemail Lua scripts. That will take buy-in from Mark.

OK, I know I'm being a bit of a PITA about this, but I'm keen to get it going.

This is the procedure I got from the Watson docs. I tried it manually using curl with my API key and the results came back pretty quickly. I just don't know how to modify the Lua script to process the results.

Step 1: Transcribe audio with no options
Call the POST /v1/recognize method to request a basic transcript of a FLAC audio file with no additional request parameters.

  1. Download the sample audio file audio-file.flac.
  2. Issue the following command to call the service's /v1/recognize method for basic transcription with no parameters. The example uses the Content-Type header to indicate the type of the audio, audio/flac. The example uses the default language model, en-US_BroadbandModel, for transcription.
    • Modify {path_to_file} to specify the location of the audio-file.flac file.

  3. Windows users, replace the backslash (\) at the end of each line with a caret (^). Make sure there are no trailing spaces.


curl -X POST -u "apikey:{apikey}" \
--header "Content-Type: audio/flac" \
--data-binary @{path_to_file}audio-file.flac \
"{url}/v1/recognize"



The service returns the following transcription results:

{
  "results": [
    {
      "alternatives": [
        {
          "confidence": 0.96,
          "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday "
        }
      ],
      "final": true
    }
  ],
  "result_index": 0
}
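
As a rough illustration of where the text lives in that response, here is a small Python sketch; the same path (first alternative of each entry in results) would apply when decoding the JSON in Lua. The function name is made up for this example, and taking the lowest confidence across results is just one possible choice, not something the Watson docs prescribe.

```python
import json

# Sample response from the Watson docs quoted above.
SAMPLE = """
{
  "results": [
    {
      "alternatives": [
        {
          "confidence": 0.96,
          "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday "
        }
      ],
      "final": true
    }
  ],
  "result_index": 0
}
"""


def extract_transcript(body):
    """Return (text, confidence) from a /v1/recognize response body.

    Assumes every result has at least one alternative with both fields,
    as in the sample above.
    """
    data = json.loads(body)
    # Use the first alternative listed for each result.
    best = [result["alternatives"][0] for result in data["results"]]
    text = " ".join(alt["transcript"].strip() for alt in best)
    # Report the weakest segment's confidence (a design choice).
    confidence = min(alt["confidence"] for alt in best)
    return text, confidence


text, confidence = extract_transcript(SAMPLE)
print(text)
print(confidence)
```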
 

Dan

Member
Jul 23, 2017
bcmike said:
Has anyone tried Mozilla DeepSpeech or Kaldi ASR, as mentioned in the documentation?

We use Mozilla DeepSpeech in production with FusionPBX (I wrote the integration & documentation). There is a very active community in #machinelearning on Mozilla IRC that is working on tuning and improvements.
 

bcmike

Active Member
Jun 7, 2018
Hi Dan,

Do you have a link to the documentation? There are a couple of lines on the official wiki, but it's pretty vague. My struggle always seems to be getting it into record_message.lua.

Thanks
 

bcmike

Active Member
Jun 7, 2018
Update on Watson: I got record_message.lua to run the curl command and it receives the transcription, but it can't parse it:

2019-07-19 15:07:15.524647 [NOTICE] switch_cpp.cpp:1443 [voicemail] transcribe_provider: custom
2019-07-19 15:07:15.524647 [NOTICE] switch_cpp.cpp:1443 [voicemail] transcribe_language: en-US
2019-07-19 15:07:18.324650 [NOTICE] switch_cpp.cpp:1443 [voicemail] CMD: curl -X POST -u "apikey:XXXXXXXXXXXXXXXXXXXXXX" --header "Content-type: audio/wav" --data-binary @/var/lib/freeswitch/storage/voicemail/default/random.pbx.com/100/msg_fc5f49f3-e2fd-4676-88ac-e27a03ceb578.wav "https://stream.watsonplatform.net/speech-to-text/api/v1/recognize"
2019-07-19 15:07:18.324650 [NOTICE] switch_cpp.cpp:1443 [voicemail] RESULT: {
"results": [
{
"alternatives": [
{
"confidence": 0.99,
"transcript": "test one to test one to test one to test one two "
}
],
"final": true
}
],
"result_index": 0
}
2019-07-19 15:07:18.324650 [NOTICE] switch_cpp.cpp:1443 [voicemail] TRANSCRIPTION: (null)
2019-07-19 15:07:18.324650 [NOTICE] switch_cpp.cpp:1443 [voicemail] CONFIDENCE: (null)
2019-07-19 15:07:18.324650 [ERR] mod_lua.cpp:202 ...pts/app/voicemail/resources/functions/record_message.lua:92: attempt to index global 'transcription' (a nil value)
stack traceback:
...pts/app/voicemail/resources/functions/record_message.lua:92: in function 'transcribe'
...pts/app/voicemail/resources/functions/record_message.lua:279: in function 'record_message'
/usr/share/freeswitch/scripts/app/voicemail/index.lua:417: in main chunk
/usr/share/freeswitch/scripts/app.lua:48: in main chunk
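
The traceback shows the script indexing a nil transcription value. One way that can happen (an assumption, since the modified script isn't shown here) is that the parse step looks for fields the Watson response doesn't contain, or that results comes back empty. The Python sketch below shows the kind of guard that avoids the crash; the function name is hypothetical, and the Lua version would need equivalent nil checks before indexing.

```python
import json


def safe_transcript(body):
    """Return (text, confidence), or (None, None) if nothing usable is found."""
    try:
        data = json.loads(body)
    except (TypeError, ValueError):
        # Not JSON at all (empty output, error page, None, ...).
        return None, None
    if not isinstance(data, dict):
        return None, None

    texts, scores = [], []
    for result in data.get("results") or []:
        alternatives = result.get("alternatives") or []
        if not alternatives:
            continue
        first = alternatives[0]
        if first.get("transcript"):
            texts.append(first["transcript"].strip())
        if "confidence" in first:
            scores.append(first["confidence"])

    if not texts:
        return None, None
    return " ".join(texts), (min(scores) if scores else None)


# The RESULT payload from the log above, on one line.
logged = (
    '{"results": [{"alternatives": [{"confidence": 0.99, '
    '"transcript": "test one to test one to test one to test one two "}], '
    '"final": true}], "result_index": 0}'
)
print(safe_transcript(logged))
print(safe_transcript('{"results": [], "result_index": 0}'))
```

With a guard like this the caller can skip the transcription section of the email instead of aborting the whole voicemail script.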
 

Dan

Member
Jul 23, 2017
bcmike said:
Hi Dan,

Do you have a link to the documentation? There are a couple of lines on the official wiki, but it's pretty vague. My struggle always seems to be getting it into record_message.lua.

Thanks

You shouldn't need to modify record_message.lua to use the custom API; I wrote the ReadTheDocs entry to be as straightforward as possible. Stand up the Mozilla DeepSpeech Frontend (first link in the Custom API guide), add the variables listed just below that, flush Memcache, reload the XML & Rescan, then enable transcription for an extension and you should be good to go.

The last bit I still need to poke at, as it's a bit of a PITA to have every extension/domain default to having transcription disabled. Most FusionPBX users will want transcriptions enabled, seeing as it's a fairly trivial cost.
 