Optimal Length of Phrase Lists for Azure Speech AI STT (Japanese)

tomoe 110 Reputation points
2026-06-25T06:44:31.3233333+00:00

I am using a disconnected container for Azure Speech.

I added a few words to the phrase list to improve speech recognition. For example, “company name” and “product name.”

While this helps improve accuracy for proper nouns, Japanese has words that sound the same but have different meanings, so it’s necessary to choose the right word based on context.

Since it’s called a “phrase list,” I thought it might be designed not just for registering individual words, but also for phrases and set expressions to some extent. However, on the “Phrase List” explanation page, I only saw examples of registering individual words.

I’d like to know what level of granularity is expected for the content added to the phrase list in Japanese, as intended by Speech to Text.

Thank you.

Azure Speech in Foundry Tools

Answer accepted by question author

SRILAKSHMI C 19,550 Reputation points Microsoft External Staff Moderator
2026-06-25T11:36:43.9533333+00:00

Hello @tomoe

Thank you for reaching out to Microsoft Q&A.

Azure Speech-to-Text Phrase Lists are designed to improve recognition accuracy by providing runtime hints for words and phrases that are important to your application. This is particularly useful for proper nouns, product names, company names, technical terminology, acronyms, and other domain-specific vocabulary that may not be recognized accurately by the base model.

What level of granularity is recommended?

Phrase Lists are not limited to individual words. They support both:

  • Single words
  • Multi-word phrases

For example:

  • Company names
  • Product names
  • Brand names
  • Department names
  • Frequently used business terms
  • Short fixed expressions

The Speech service uses these entries as recognition biasing hints, increasing the likelihood that the specified words or phrases will be selected when the audio is ambiguous.

Considerations for Japanese

Your observation is particularly relevant for Japanese because many words share the same pronunciation while having different meanings (homophones).

For example:

  • 会議 (kaigi, "meeting") and 懐疑 (kaigi, "skepticism")
  • 公称 (kōshō, "nominal designation") and 交渉 (kōshō, "negotiation")

In these cases, recognition accuracy depends heavily on context.

Instead of registering only a single keyword, it can often be beneficial to register the complete business phrase that users commonly speak.

For example:

Rather than only adding:

  • 製品A
  • 株式会社〇〇

Consider adding:

  • 製品Aの登録
  • 株式会社〇〇サポートセンター
  • △△サービス

The additional context may help the recognizer select the intended term when multiple interpretations share the same pronunciation.
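As a rough illustration of the entries above, the sketch below registers both the bare nouns and the longer phrases on a phrase-list grammar. It assumes the Speech SDK for Python (`azure-cognitiveservices-speech`), where the object returned by `PhraseListGrammar.from_recognizer(...)` exposes `addPhrase`. The helper name `register_phrases` and the container host URL in the comments are placeholders and do not come from this thread.

```python
from typing import Iterable, List


def register_phrases(grammar, phrases: Iterable[str]) -> List[str]:
    """Add each non-empty, de-duplicated phrase to a phrase-list grammar.

    `grammar` is anything with an `addPhrase(str)` method, such as the object
    returned by speechsdk.PhraseListGrammar.from_recognizer(recognizer).
    """
    added: List[str] = []
    seen = set()
    for phrase in phrases:
        text = phrase.strip()
        if text and text not in seen:
            grammar.addPhrase(text)
            seen.add(text)
            added.append(text)
    return added


# Bare proper nouns plus the short phrases users actually say.
JA_PHRASES = [
    "製品A",
    "製品Aの登録",
    "株式会社〇〇",
    "株式会社〇〇サポートセンター",
    "△△サービス",
]

# Assumed wiring against a Speech container (host URL is a placeholder):
#   import azure.cognitiveservices.speech as speechsdk
#   config = speechsdk.SpeechConfig(host="ws://localhost:5000")
#   config.speech_recognition_language = "ja-JP"
#   recognizer = speechsdk.SpeechRecognizer(speech_config=config)
#   grammar = speechsdk.PhraseListGrammar.from_recognizer(recognizer)
#   register_phrases(grammar, JA_PHRASES)
```

Keeping the helper independent of the SDK import makes it easy to try with a stub object before pointing it at a real recognizer.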

Can Phrase Lists contain phrases?

Yes. The Azure Speech documentation explicitly describes Phrase Lists as supporting both words and phrases. The feature is intended to boost recognition of specific terms that are expected to appear in the transcript.

However, Phrase Lists are not intended to model grammar, sentence structure, or conversational context.

What should not be added?

Generally, Phrase Lists are not designed for:

  • Full sentences
  • Long conversational utterances
  • Large collections of example text
  • Language-model style contextual training

The feature works best when used as a targeted vocabulary enhancement mechanism.

Best practice for Japanese Speech-to-Text

A common and effective approach is to add:

  • Company names
  • Product names
  • Person names
  • Industry terminology
  • Frequently spoken business phrases
  • Multi-word expressions where context helps distinguish homophones

Avoid:

  • Entire conversations
  • Long sentences
  • Hundreds of contextual examples intended to teach grammar

Phrase List limitations

Microsoft recommends keeping Phrase Lists focused and manageable.

Key considerations:

  • Phrase Lists are applied at runtime to bias recognition.
  • They are supported for real-time transcription and Fast Transcription scenarios.
  • They are not supported for Batch Transcription.
  • A Phrase List should not contain more than 500 phrases.
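To keep a list within these limits before it reaches the recognizer, a small pre-flight check can help. The sketch below is only one possible approach: `MAX_PHRASES` mirrors the 500-phrase figure above, while `MAX_CHARS` and the "contains 。" test are hypothetical heuristics for spotting full sentences, not service rules.

```python
from typing import Iterable, List, Tuple

MAX_PHRASES = 500  # limit stated in the list above
MAX_CHARS = 30     # hypothetical cut-off for "too long to be a phrase"


def check_phrase_list(phrases: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (accepted phrases, warnings) for a candidate phrase list."""
    accepted: List[str] = []
    warnings: List[str] = []
    seen = set()
    for raw in phrases:
        text = raw.strip()
        if not text:
            continue  # silently drop blank entries
        if text in seen:
            warnings.append(f"duplicate skipped: {text}")
            continue
        seen.add(text)
        # Sentence-like entries are skipped rather than registered.
        if len(text) > MAX_CHARS or "。" in text:
            warnings.append(f"looks like a sentence, skipped: {text}")
            continue
        accepted.append(text)
    if len(accepted) > MAX_PHRASES:
        warnings.append(f"{len(accepted)} phrases exceeds {MAX_PHRASES}; truncating")
        accepted = accepted[:MAX_PHRASES]
    return accepted, warnings
```

Truncation keeps the first 500 entries in input order, so ordering the candidates by importance before calling the function is a reasonable convention.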

When Phrase Lists may not be enough

If you find yourself needing to provide extensive contextual examples, sentence patterns, or large numbers of expressions to achieve acceptable accuracy, a Phrase List may no longer be the best tool.

In such cases, consider using Custom Speech, which allows model adaptation using domain-specific training data and can provide better results for specialized vocabularies and language patterns.

Recommended approach

For your scenario, Microsoft would generally recommend:

  1. Start by adding key proper nouns (company names, product names, etc.).
  2. Evaluate recognition accuracy.
  3. Add commonly spoken multi-word business phrases where homophone ambiguity exists.
  4. Compare results against the baseline.
  5. If significant contextual adaptation is still required, evaluate Custom Speech rather than continuously expanding the Phrase List.
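Steps 2 and 4 need a number to compare. For Japanese, where text is not split by spaces, character error rate (CER) is a common choice. The sketch below assumes you already have pairs of human reference transcripts and recognizer output; the function names are illustrative.

```python
from typing import List, Tuple


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(pairs: List[Tuple[str, str]]) -> float:
    """Character error rate over (reference, hypothesis) pairs."""
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total = sum(len(ref) for ref, _ in pairs)
    return errors / total if total else 0.0


# Made-up transcripts for illustration only.
baseline = [("会議を始めます", "懐疑を始めます")]
with_phrase_list = [("会議を始めます", "会議を始めます")]
improvement = cer(baseline) - cer(with_phrase_list)
```

Running the same recordings with and without the phrase list and comparing the two CER values gives a simple basis for deciding whether to keep expanding the list or move to Custom Speech.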

Please refer to the following:

Improve recognition accuracy with Phrase Lists https://learn.microsoft.com/azure/ai-services/speech-service/improve-accuracy-phrase-list

Speech Containers https://learn.microsoft.com/azure/ai-services/speech-service/speech-container-overview

Custom Speech training and testing data https://docs.microsoft.com/azure/cognitive-services/speech-service/how-to-custom-speech-test-and-train

For Japanese Speech-to-Text, Phrase Lists are intended to support both individual words and short phrases. When dealing with homophones, adding meaningful business phrases in addition to proper nouns can improve recognition accuracy. However, Phrase Lists should be treated as vocabulary biasing hints rather than a mechanism for teaching the service broader linguistic context or sentence structure. If extensive contextual adaptation is required, Custom Speech is generally the recommended solution.

I hope this helps. Do let me know if you have any further queries.

If this answers your query, please click Accept Answer and Yes for "Was this answer helpful?".

Thank you!

2 people found this answer helpful.

Answer accepted by question author

Alex Burlachenko 23,170 Reputation points MVP Volunteer Moderator
2026-06-25T08:43:29.8633333+00:00

hi tomoe, thx for sharing ur issue here at the Q&A portal,

phrase lists can include single words and short multi-word phrases. For Japanese, short phrases are useful for company names, product names, department names, or fixed domain terms.

But it’s not meant to work like a prompt or a full context dictionary. It only biases recognition toward terms that are more likely. It doesn’t force Speech to always return that phrase. For homophones, add the exact written term u want in the transcript. If needed, add a short natural phrase around it too. I wouldn’t add full sentences. Long phrases can over-bias the recognizer, so it may start choosing words from the phrase list even when the speaker said something else.

Start w/ proper nouns and repeated fixed terms. If one word isn’t enough to resolve a homophone, try a short phrase, maybe 2–5 words. Test it w/ real recordings, not only clean test audio. Keep the list focused, bc adding everything usually makes results worse, not better.
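One way to spot over-biasing when testing w/ real recordings is to count how often a phrase-list entry shows up in the recognizer output more times than in the human transcript. This is just a sketch under that assumption; `overbias_report` is a made-up helper, and it uses plain substring counts, so an entry like 製品A is also counted inside 製品Aの登録.

```python
from typing import Dict, Iterable, List, Tuple


def overbias_report(phrases: Iterable[str],
                    pairs: List[Tuple[str, str]]) -> Dict[str, int]:
    """For each phrase, count extra occurrences in hypotheses compared
    with the reference transcripts. Phrases with no extras are omitted."""
    report: Dict[str, int] = {}
    for phrase in phrases:
        extra = 0
        for ref, hyp in pairs:
            extra += max(0, hyp.count(phrase) - ref.count(phrase))
        if extra:
            report[phrase] = extra
    return report


# Made-up (reference, hypothesis) pairs for illustration only.
pairs = [
    ("交渉を進めます", "公称を進めます"),  # list entry chosen over the spoken word
    ("公称値を確認", "公称値を確認"),      # correct use of the list entry
]
report = overbias_report(["公称", "会議"], pairs)
```

Entries that keep appearing in the report are candidates to shorten, replace w/ a more specific phrase, or drop from the list.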

If phrase lists still don’t solve the domain vocabulary issue, Custom Speech is the better option. Phrase list is lightweight biasing. Custom Speech is for more serious domain adaptation.

https://learn.microsoft.com/en-us/azure/ai-services/speech-service/improve-accuracy-phrase-list

https://learn.microsoft.com/en-us/azure/ai-services/speech-service/custom-speech-overview

https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-container-howto

So yeah, start w/ individual words, use short meaningful phrases only when needed, don’t load it w/ full conversational sentences.

rgds,

Alex

If my answer was helpful pls mark it and additional thx if u follow me at Q&A portal

2 people found this answer helpful.
