Azure Voicemail Transcription

bryanredeagle · May 17, 2019

I've gotten transcription working using the new Cognitive Services Speech API on Microsoft Azure. I plan to submit a proper pull request once I've read through the contribution guidelines. In the meantime, I've attached the modified script (use the script editor to edit app/voicemail/resources/functions/record_message.lua).

You'll need to sign up for Cognitive Services on Azure. Be sure to note the region where you placed the resource group; you'll need it later.

Edit the default settings as in the original instructions (https://docs.fusionpbx.com/en/latest/applications/voicemail_transcription.html), with two differences:
1. You only need microsoft_key1 (both keys work, but you only need one of them).
2. You need to set a new setting called azure_region, using Azure's list of regions as a reference. The value is the part of the domain just after the https://. In this example, the value you need is centralus: https://centralus.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1
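For anyone wiring this up by hand, here is a rough Python sketch of how azure_region and microsoft_key1 end up in a request. It is not the code from the attached script, and the function name is hypothetical. The `Ocp-Apim-Subscription-Key` header, the `language` query parameter, and the 16 kHz PCM WAV content type are assumptions based on Azure's short-audio REST API, not something stated in this post.

```python
import urllib.parse
import urllib.request

def build_azure_stt_request(region: str, key: str, wav_bytes: bytes,
                            language: str = "en-US") -> urllib.request.Request:
    """Build (but do not send) a short-audio speech-to-text request.

    Hypothetical helper: `region` is the azure_region setting (e.g. "centralus")
    and `key` is microsoft_key1.
    """
    # The region becomes the first label of the host name.
    base = (f"https://{region}.stt.speech.microsoft.com"
            "/speech/recognition/conversation/cognitiveservices/v1")
    url = base + "?" + urllib.parse.urlencode({"language": language})
    headers = {
        # Assumed header name for the subscription key.
        "Ocp-Apim-Subscription-Key": key,
        # Assumes 16 kHz, 16-bit mono PCM WAV audio.
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, data=wav_bytes, headers=headers, method="POST")
```

Sending it would be `urllib.request.urlopen(req)`; parsing the JSON reply is left out here.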

Turn on transcription for the voicemail boxes that need it and you should be good to go. Hit me up if you run into any issues.

KonradSC · May 17, 2019

Thanks for the update!

Were you able to get around the 10-second limit on recorded audio?

https://docs.microsoft.com/en-us/az...ice/rest-speech-to-text#regions-and-endpoints

Requests that use the REST API can only contain 10 seconds of recorded audio.
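One rough way to stay under that limit is to cut a voicemail into pieces that each fit, sketched below with Python's standard `wave` module. Nobody in this thread reported trying it; the function name and the 10-second default are illustrative, and cutting at fixed offsets can split words, so accuracy at chunk boundaries will likely suffer.

```python
import io
import wave

def split_wav(wav_bytes: bytes, max_seconds: float = 10.0) -> list[bytes]:
    """Split a PCM WAV file into consecutive WAV chunks of at most max_seconds each."""
    chunks = []
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        # At least one frame per chunk so the loop always makes progress.
        frames_per_chunk = max(1, int(params.framerate * max_seconds))
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            out = io.BytesIO()
            with wave.open(out, "wb") as dst:
                # Copy channels, sample width and rate; the frame count in the
                # header is patched to the actual chunk length when writing.
                dst.setparams(params)
                dst.writeframes(frames)
            chunks.append(out.getvalue())
    return chunks
```

Each chunk would then be sent as its own request and the transcripts joined in order.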

bryanredeagle · May 17, 2019

Nope! I didn't even see that bit. I was focused on getting the endpoints working. I will surely investigate that though.

bryanredeagle · May 17, 2019

Looking further, would it be unreasonable for the project to have an external script in Python or Node that uses the SDK?

yukon · May 17, 2019

bryanredeagle said:
Looking further, would it be unreasonable for the project to have an external script in Python or Node that uses the SDK?

Yes, I highly doubt Mark would accept that, but you could ask him on IRC.

KonradSC · May 17, 2019

Looks like IBM Watson might be a viable alternative. Their synchronous HTTP service can take files of up to 100 MB.

https://cloud.ibm.com/docs/services/speech-to-text?topic=speech-to-text-http

The asynchronous interface can take files of up to 1 GB. That wouldn't be as real-time as the synchronous interface, though. We'd have to write some kind of queue for the voicemail emails and provide a callback URL for Watson.
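The size limits quoted above suggest a simple rule for picking between the two Watson interfaces. The sketch below encodes it; the thresholds are taken from this post and treated as binary megabytes and gigabytes (an assumption), and the function name is hypothetical.

```python
SYNC_LIMIT_BYTES = 100 * 1024 * 1024    # synchronous HTTP limit, per the post above
ASYNC_LIMIT_BYTES = 1024 * 1024 * 1024  # asynchronous interface limit, per the post above

def choose_watson_interface(size_bytes: int) -> str:
    """Return "sync", "async", or "too_large" for an audio file of the given size."""
    if size_bytes < 0:
        raise ValueError("size must be non-negative")
    if size_bytes <= SYNC_LIMIT_BYTES:
        return "sync"
    if size_bytes <= ASYNC_LIMIT_BYTES:
        return "async"
    return "too_large"
```

For voicemail the synchronous path should almost always win: a five-minute 8 kHz, 16-bit mono WAV is about 4.8 MB, far below the 100 MB limit.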

bryanredeagle · May 17, 2019

I'm actually looking into that exact thing. I have a list of services to investigate and integrate (if they're not silly like Azure).

bcmike · Jul 18, 2019

Hi,

Did you guys happen to get this working? I couldn't seem to get it going.

Microsoft gave me a different URL, though, when I set up the cognitive voice service: https://westus.api.cognitive.microsoft.com/sts/v1.0

I tried both in the script but had no luck. Are there any other services that work?

KonradSC · Jul 18, 2019

From my research, it seems that the best option right now is going to be IBM's transcription. That will probably require a bigger rewrite, though, as that API follows more of a 'dry cleaner' model: you drop off the clothes, get a ticket, and then come back with your ticket to pick them up. Right now the 'send email' portion of the voicemail Lua scripts is handled more or less in real time. Sending the email and transcription probably need to be moved into a subroutine in the voicemail Lua scripts. That will take buy-in from Mark.
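The 'dry cleaner' flow can be sketched as a small queue: the voicemail is stored under the ticket (job ID) the service hands back, and the email goes out only when the transcription for that ticket arrives. Everything here (class names, methods, the `send_email` callback) is hypothetical and written in Python rather than the FusionPBX Lua; it only illustrates deferring the email until the result comes back.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class PendingVoicemail:
    mailbox: str
    audio_path: str

@dataclass
class TranscriptionQueue:
    # Called as send_email(voicemail, transcript) once a result is available.
    send_email: Callable[[PendingVoicemail, str], None]
    pending: dict[str, PendingVoicemail] = field(default_factory=dict)

    def drop_off(self, ticket: str, voicemail: PendingVoicemail) -> None:
        """Remember the voicemail under the job ID returned by the transcription service."""
        self.pending[ticket] = voicemail

    def pick_up(self, ticket: str, transcript: str) -> bool:
        """Handle a finished job (e.g. from a callback URL) and send the deferred email."""
        voicemail = self.pending.pop(ticket, None)
        if voicemail is None:
            return False  # unknown or already-handled ticket
        self.send_email(voicemail, transcript)
        return True
```

A real version would also need a timeout that sends the email without a transcript if no result ever arrives.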

bcmike · Jul 18, 2019

Has anyone tried Mozilla DeepSpeech or Kaldi ASR, as mentioned in the documentation?

bcmike · Jul 18, 2019

KonradSC said:
From my research, it seems that the best option right now is going to be IBM's transcription. That will probably require a bigger rewrite, though, as that API follows more of a 'dry cleaner' model: you drop off the clothes, get a ticket, and then come back with your ticket to pick them up. Right now the 'send email' portion of the voicemail Lua scripts is handled more or less in real time. Sending the email and transcription probably need to be moved into a subroutine in the voicemail Lua scripts. That will take buy-in from Mark.

OK, I know I'm being a bit of a PITA about this, but I'm keen to get it going.

This is the procedure I got from the Watson docs. I tried it manually using curl with my API key and the results came back pretty quickly. I just don't know how to modify the Lua script to process the results.

Step 1: Transcribe audio with no options
Call the POST /v1/recognize method to request a basic transcript of a FLAC audio file with no additional request parameters.

Download the sample audio file audio-file.flac.

Issue the following command to call the service's /v1/recognize method for basic transcription with no parameters. The example uses the Content-Type header to indicate the type of the audio, audio/flac. The example uses the default language model, en-US_BroadbandModel, for transcription.

Modify {path_to_file} to specify the location of the audio-file.flac file.

Windows users, replace the backslash (\) at the end of each line with a caret (^). Make sure there are no trailing spaces.

curl -X POST -u "apikey:{apikey}" \
--header "Content-Type: audio/flac" \
--data-binary @{path_to_file}audio-file.flac \
"{url}/v1/recognize"

The service returns the following transcription results:

{
  "results": [
    {
      "alternatives": [
        {
          "confidence": 0.96,
          "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday "
        }
      ],
      "final": true
    }
  ],
  "result_index": 0
}
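For reference, here is a rough Python equivalent of the curl call plus the parsing step. The function names are hypothetical and `service_url` stands for the `{url}` in your own Watson credentials. curl's `-u "apikey:{apikey}"` is HTTP Basic authentication with the literal user name `apikey`. Taking the first alternative of each result and averaging the confidences is one choice among several, not something the Watson docs prescribe.

```python
import base64
import json
import urllib.request
from typing import Optional, Tuple

def build_watson_request(service_url: str, apikey: str, audio: bytes,
                         content_type: str = "audio/flac") -> urllib.request.Request:
    """Build (but do not send) the equivalent of the curl call above."""
    # curl -u "apikey:{apikey}" sends HTTP Basic auth with the user name "apikey".
    token = base64.b64encode(f"apikey:{apikey}".encode()).decode("ascii")
    return urllib.request.Request(
        service_url.rstrip("/") + "/v1/recognize",
        data=audio,  # equivalent of --data-binary @file
        headers={"Content-Type": content_type, "Authorization": f"Basic {token}"},
        method="POST",
    )

def parse_watson_result(body: str) -> Tuple[str, Optional[float]]:
    """Return (transcript, confidence) from a /v1/recognize JSON body."""
    data = json.loads(body)
    texts, confidences = [], []
    for result in data.get("results", []):
        alternatives = result.get("alternatives") or []
        if not alternatives:
            continue
        best = alternatives[0]  # first alternative = best hypothesis
        texts.append(best.get("transcript", "").strip())
        if "confidence" in best:
            confidences.append(best["confidence"])
    # Averaging per-result confidences is one choice; min() would be stricter.
    confidence = sum(confidences) / len(confidences) if confidences else None
    return " ".join(t for t in texts if t), confidence
```

The body would come from `urllib.request.urlopen(req).read().decode()`. For voicemail recordings the content type would be `audio/wav` rather than `audio/flac`.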

Dan · Jul 19, 2019

bcmike said:
Has anyone tried Mozilla DeepSpeech or Kaldi ASR, as mentioned in the documentation?

We use Mozilla DeepSpeech in production with FusionPBX (I wrote the integration & documentation). There is a very active community in #machinelearning on Mozilla IRC that is working on tuning and improvements.

bcmike · Jul 19, 2019

Hi Dan,

Do you have a link to the documentation? There are a couple of lines on the official wiki, but it's pretty vague. My struggle always seems to be getting it into record_message.lua.

Thanks

bcmike · Jul 19, 2019

Update on Watson: I got record_message.lua to run the curl command and it receives the transcription, but it can't parse it:

2019-07-19 15:07:15.524647 [NOTICE] switch_cpp.cpp:1443 [voicemail] transcribe_provider: custom
2019-07-19 15:07:15.524647 [NOTICE] switch_cpp.cpp:1443 [voicemail] transcribe_language: en-US
2019-07-19 15:07:18.324650 [NOTICE] switch_cpp.cpp:1443 [voicemail] CMD: curl -X POST -u "apikey:XXXXXXXXXXXXXXXXXXXXXX" --header "Content-type: audio/wav" --data-binary @/var/lib/freeswitch/storage/voicemail/default/random.pbx.com/100/msg_fc5f49f3-e2fd-4676-88ac-e27a03ceb578.wav "https://stream.watsonplatform.net/speech-to-text/api/v1/recognize"
2019-07-19 15:07:18.324650 [NOTICE] switch_cpp.cpp:1443 [voicemail] RESULT: {
"results": [
{
"alternatives": [
{
"confidence": 0.99,
"transcript": "test one to test one to test one to test one two "
}
],
"final": true
}
],
"result_index": 0
}
2019-07-19 15:07:18.324650 [NOTICE] switch_cpp.cpp:1443 [voicemail] TRANSCRIPTION: (null)
2019-07-19 15:07:18.324650 [NOTICE] switch_cpp.cpp:1443 [voicemail] CONFIDENCE: (null)
2019-07-19 15:07:18.324650 [ERR] mod_lua.cpp:202 ...pts/app/voicemail/resources/functions/record_message.lua:92: attempt to index global 'transcription' (a nil value)
stack traceback:
...pts/app/voicemail/resources/functions/record_message.lua:92: in function 'transcribe'
...pts/app/voicemail/resources/functions/record_message.lua:279: in function 'record_message'
/usr/share/freeswitch/scripts/app/voicemail/index.lua:417: in main chunk
/usr/share/freeswitch/scripts/app.lua:48: in main chunk
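The traceback shows the script indexing `transcription` after it came back nil. The thread does not say why. One plausible explanation, not confirmed here, is that the parser expected a different response layout than the one Watson returns. Either way, the parsing step should fail soft. The Python sketch below (hypothetical names, not the FusionPBX code) tries the Watson layout, falls back to Azure's simple-format `DisplayText` field (an assumption about that API), and returns None instead of raising when neither matches.

```python
import json
from typing import Optional

def extract_transcript(raw: Optional[str]) -> Optional[str]:
    """Return transcript text from a Watson or Azure JSON reply, or None."""
    try:
        data = json.loads(raw)
    except (ValueError, TypeError):
        return None  # empty, missing, or non-JSON output (e.g. a curl error)
    if not isinstance(data, dict):
        return None
    # Watson layout: results[].alternatives[0].transcript
    results = data.get("results")
    if isinstance(results, list) and results:
        parts = []
        for r in results:
            alts = r.get("alternatives") if isinstance(r, dict) else None
            if alts and isinstance(alts[0], dict) and "transcript" in alts[0]:
                parts.append(str(alts[0]["transcript"]).strip())
        return " ".join(p for p in parts if p) or None
    # Azure short-audio "simple" layout: {"RecognitionStatus": "Success", "DisplayText": ...}
    if data.get("RecognitionStatus") == "Success" and "DisplayText" in data:
        return str(data["DisplayText"])
    return None
```

The caller would check for None and send the voicemail email without a transcription rather than indexing a nil value.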

Dan · Jul 19, 2019

bcmike said:
Hi Dan,

Do you have a link to the documentation? There are a couple of lines on the official wiki, but it's pretty vague. My struggle always seems to be getting it into record_message.lua.

Thanks

You shouldn't need to modify record_message.lua to use the custom API, I wrote the ReadTheDocs entry to be as straightforward as possible. Stand up the Mozilla DeepSpeech Frontend (first link in the Custom API guide), add the variables listed just below that, flush Memcache, reload the XML & Rescan, then enable transcription for an extension and you should be good to go.

The last bit I still need to poke at, as it's a bit of a PITA to have every extension/domain default to transcription being disabled. Most FusionPBX users will want transcription enabled, seeing as it's a fairly trivial cost.

bcmike · Jul 20, 2019

Thanks for the reply. I'll give it a try.

bcmike · Jul 25, 2019

Started a new thread for watson integration: https://www.pbxforums.com/threads/ibm-watson-integration.3397/

Still want to try Mozilla as I'm not sure if Watson will end up being cost prohibitive.
