I am using a disconnected container for Azure Speech.
Please let me know if there is a way to reduce the response time of speech-to-text processing.
With a single call using stereo recognition (2 channels), the current system returns the final result about 3 seconds after the end of a 5-second speech segment.
However, in actual operation, multiple calls need to be processed simultaneously, and it can sometimes take 30 seconds or more from speech input to response.
Is there anything that can be improved to enable real-time text conversion?
Upgrading the server specifications and increasing the resources allocated to the containers did not improve performance.
Please let me know if there are any other settings I can adjust.
Thank you.
<server spec> vcpu=48, memory=96
<container> Note: 1 container is running on 1 server.
docker run --name "azurestt-container01" \
-itd \
--restart always -p 5000:5000 \
--memory 60g \
--cpus 42 \
-v /home/ca-stt/STT/license:/path/to/license/directory \
-v /home/ca-stt/STT/output:/path/to/output/directory \
-e Speech:Concurrency=100 \
-e DECODER_MAX_COUNT=40 \
-e Eula=accept \
-e Mounts:License=/path/to/license/directory \
-e Mounts:Output=/path/to/output/directory \
-e Logging:Disk:Format=json \
-e Logging:Disk:LogLevel:Default=Information \
mcr.microsoft.com/azure-cognitive-services/speechservices/speech-to-text:5.2.0-amd64-ja-jp
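
For reference, here is a rough capacity estimate based on the settings above. It rests on an unconfirmed assumption: that each channel of a stereo call counts as one concurrent request and holds one decoder for the duration of the call. The variable names are only illustrative, and the container may map channels to decoders differently.

```shell
#!/bin/sh
# Values copied from the docker run command above.
DECODER_MAX_COUNT=40   # -e DECODER_MAX_COUNT
CONCURRENCY=100        # -e Speech:Concurrency
CHANNELS_PER_CALL=2    # stereo recognition, 2 channels per call

# ASSUMPTION (not confirmed): one decoder per channel for the whole call.
max_calls_by_decoders=$((DECODER_MAX_COUNT / CHANNELS_PER_CALL))

# ASSUMPTION (not confirmed): one concurrent request per channel.
max_calls_by_concurrency=$((CONCURRENCY / CHANNELS_PER_CALL))

# Calls that would be accepted but left waiting for a free decoder.
waiting_calls=$((max_calls_by_concurrency - max_calls_by_decoders))

echo "calls served before decoders run out: $max_calls_by_decoders"
echo "calls admitted by Speech:Concurrency: $max_calls_by_concurrency"
echo "calls admitted but waiting for a decoder: $waiting_calls"
```

If that assumption holds, more calls can be accepted than there are decoders to serve them, and the extra calls would queue. That might explain why the delay grows under load even when CPU and memory are increased.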