Hello @tomoe
Thank you for reaching out to Microsoft Q&A.
Azure Speech-to-Text Phrase Lists are designed to improve recognition accuracy by providing runtime hints for words and phrases that are important to your application. This is particularly useful for proper nouns, product names, company names, technical terminology, acronyms, and other domain-specific vocabulary that may not be recognized accurately by the base model.
What level of granularity is recommended?
Phrase Lists are not limited to individual words. They support both:
- Single words
- Multi-word phrases
For example:
- Company names
- Product names
- Brand names
- Department names
- Frequently used business terms
- Short fixed expressions
The Speech service uses these entries as recognition biasing hints, increasing the likelihood that the specified words or phrases will be selected when the audio is ambiguous.
Considerations for Japanese
Your observation is particularly relevant for Japanese because many words share the same pronunciation while having different meanings (homophones).
For example:
- 会議 (meeting) and 懐疑 (skepticism) are both read かいぎ (kaigi)
- 公称 (nominal designation) and 交渉 (negotiation) are both read こうしょう (kōshō)
In these cases, recognition accuracy depends heavily on context.
Instead of registering only a single keyword, it can often be beneficial to register the complete business phrase that users commonly speak.
For example, rather than only adding:
- 製品A
- 株式会社〇〇
consider adding:
- 製品Aの登録
- 株式会社〇〇サポートセンター
- △△サービス
The additional context may help the recognizer select the intended term when multiple interpretations share the same pronunciation.
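As a minimal sketch, assuming the Speech SDK for Python (the `azure-cognitiveservices-speech` package) is installed, the entries above could be registered roughly like this. The helper names `add_phrases` and `recognize_once_with_phrases`, and the key and region arguments, are placeholders for illustration and are not part of the SDK.

```python
from typing import Iterable

# Keywords plus the longer business phrases from the example above.
PHRASES = [
    "製品A",
    "株式会社〇〇",
    "製品Aの登録",
    "株式会社〇〇サポートセンター",
    "△△サービス",
]


def add_phrases(grammar, phrases: Iterable[str]) -> int:
    """Add each non-empty, not-yet-seen phrase to the grammar.

    `grammar` is any object with an addPhrase(str) method, such as the
    SDK's PhraseListGrammar. Returns the number of phrases added.
    """
    seen = set()
    for phrase in phrases:
        phrase = phrase.strip()
        if phrase and phrase not in seen:
            grammar.addPhrase(phrase)
            seen.add(phrase)
    return len(seen)


def recognize_once_with_phrases(key: str, region: str, phrases: Iterable[str]) -> str:
    """Recognize a single utterance from the default microphone with phrase biasing."""
    # Imported lazily so add_phrases can be used without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    config = speechsdk.SpeechConfig(subscription=key, region=region)
    config.speech_recognition_language = "ja-JP"
    recognizer = speechsdk.SpeechRecognizer(speech_config=config)

    # The phrase list is attached to this recognizer and applied at runtime.
    grammar = speechsdk.PhraseListGrammar.from_recognizer(recognizer)
    add_phrases(grammar, phrases)

    return recognizer.recognize_once().text
```

Calling `recognize_once_with_phrases("<your-key>", "<your-region>", PHRASES)` would then run one recognition with the list applied. Keeping the keyword and the longer phrase side by side makes it easy to test whether the added context actually helps.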
Can Phrase Lists contain phrases?
Yes. The Azure Speech documentation explicitly describes Phrase Lists as supporting both words and phrases. The feature is intended to boost recognition of specific terms that are expected to appear in the transcript.
However, Phrase Lists are not intended to model grammar, sentence structure, or conversational context.
What should not be added?
Generally, Phrase Lists are not designed for:
- Full sentences
- Long conversational utterances
- Large collections of example text
- Language-model style contextual training
The feature works best when used as a targeted vocabulary enhancement mechanism.
Best practice for Japanese Speech-to-Text
A common and effective approach is as follows.
Add:
- Company names
- Product names
- Person names
- Industry terminology
- Frequently spoken business phrases
- Multi-word expressions where context helps distinguish homophones
Avoid:
- Entire conversations
- Long sentences
- Hundreds of contextual examples intended to teach grammar
Phrase List limitations
Microsoft recommends keeping Phrase Lists focused and manageable.
Key considerations:
- Phrase Lists are applied at runtime to bias recognition.
- They are supported for real-time transcription and Fast Transcription scenarios.
- They are not supported for Batch Transcription.
- A Phrase List should not contain more than 500 phrases.
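A small pre-flight check can keep a list within these guidelines before it is sent to the service. The sketch below is an illustration only: the 500-phrase limit comes from the point above, while the 30-character cut-off and the sentence-ending punctuation test are assumptions of mine for spotting sentence-like entries and should be tuned to your data.

```python
MAX_PHRASES = 500      # limit stated above
MAX_PHRASE_CHARS = 30  # assumed cut-off for a "short" phrase, not an official value
SENTENCE_ENDINGS = ("。", "？", "！", ".", "?", "!")


def check_phrase_list(phrases):
    """Return (clean_phrases, warnings) for a candidate phrase list.

    Blank entries and duplicates are dropped silently. Entries that look
    like full sentences are skipped with a warning. If more than
    MAX_PHRASES entries remain, the list is truncated with a warning.
    """
    clean, warnings, seen = [], [], set()
    for raw in phrases:
        phrase = raw.strip()
        if not phrase or phrase in seen:
            continue
        seen.add(phrase)
        if len(phrase) > MAX_PHRASE_CHARS or phrase.endswith(SENTENCE_ENDINGS):
            warnings.append(f"looks like a sentence, skipped: {phrase}")
            continue
        clean.append(phrase)
    if len(clean) > MAX_PHRASES:
        warnings.append(
            f"{len(clean)} phrases exceed the limit of {MAX_PHRASES}; truncating"
        )
        clean = clean[:MAX_PHRASES]
    return clean, warnings
```

The order of the input is preserved, so placing the most important terms first ensures they survive truncation.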
When Phrase Lists may not be enough
If you find yourself needing to provide extensive contextual examples, sentence patterns, or large numbers of expressions to achieve acceptable accuracy, a Phrase List may no longer be the best tool.
In such cases, consider using Custom Speech, which allows model adaptation using domain-specific training data and can provide better results for specialized vocabularies and language patterns.
Recommended approach
For your scenario, Microsoft would generally recommend:
- Start by adding key proper nouns (company names, product names, etc.).
- Evaluate recognition accuracy.
- Add commonly spoken multi-word business phrases where homophone ambiguity exists.
- Compare results against the baseline.
- If significant contextual adaptation is still required, evaluate Custom Speech rather than continuously expanding the Phrase List.
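To make the "evaluate" and "compare against the baseline" steps concrete, one option is to compute the character error rate (CER) of each transcript against a hand-checked reference. CER is used here rather than word error rate because Japanese text has no spaces between words. This is a sketch with made-up transcripts for illustration, not measured results, and the function names are my own.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by the reference length."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical transcripts for one utterance (not real service output).
reference = "会議の議事録を製品Aに登録"
baseline = "懐疑の議事録を製品Aに登録"      # homophone error without a phrase list
with_phrases = "会議の議事録を製品Aに登録"  # assumed output with the phrase list

print(f"baseline CER:     {cer(reference, baseline):.3f}")
print(f"phrase list CER:  {cer(reference, with_phrases):.3f}")
```

Running the same test set through both configurations and averaging CER gives a simple number to watch as phrases are added. If it stops improving, that is a signal to look at Custom Speech instead.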
Please refer to the following resources:
Improve recognition accuracy with Phrase Lists https://learn.microsoft.com/azure/ai-services/speech-service/improve-accuracy-phrase-list
Speech Containers https://learn.microsoft.com/azure/ai-services/speech-service/speech-container-overview
Custom Speech training and testing data preparation https://docs.microsoft.com/azure/cognitive-services/speech-service/how-to-custom-speech-test-and-train
For Japanese Speech-to-Text, Phrase Lists are intended to support both individual words and short phrases. When dealing with homophones, adding meaningful business phrases in addition to proper nouns can improve recognition accuracy. However, Phrase Lists should be treated as vocabulary biasing hints rather than a mechanism for teaching the service broader linguistic context or sentence structure. If extensive contextual adaptation is required, Custom Speech is generally the recommended solution.
I hope this helps. Let me know if you have any further questions.
If this answers your question, please click Accept Answer and select Yes for "Was this answer helpful".
Thank you!