Speech Language Models in Low-Resource Settings:
Performance, Evaluation, and Bias Analysis

Co-located with LREC 2026 • Full-Day Workshop

About the Workshop

Why This Workshop, and Why Now

Speech-native large language models (Speech LLMs) have substantially broadened the scope of spoken language technology, enabling intent understanding, dialogue management, long-form summarization, and expressive text-to-speech. However, these advances have not translated evenly across languages and speaker communities. In low-resource (LR) settings, researchers and developers confront persistent constraints on data availability, annotation quality, and computational budget. These limitations are further compounded by deployment conditions that differ markedly from laboratory benchmarks, such as channel mismatch, dialectal variation, and spontaneous speech phenomena.

Even state-of-the-art multilingual foundation ASR models that look competitive on clean test sets degrade under real-world variability such as accents, low-SNR or forensic audio, and channel or microphone mismatch. Whisper, for example, shows strong zero-shot multilingual results yet exhibits error spikes on poor-quality, streaming, and lecture audio, along with uneven accuracy across languages and accents; recent audits and robustness benchmarks corroborate these disparities. Ensuring reliable performance across all languages and devices is therefore critical to preserving cultural-linguistic diversity and democratizing speech technologies.

There is a pressing need for a forum that consolidates practical methodology for LR modeling, such as transfer learning and cross-lingual adaptation, along with evaluation protocols that reflect real-world use and explicitly characterize uncertainty, robustness, and cost.

What SPEAKABLE Brings

SPEAKABLE focuses on three intertwined strands:

1. Efficient Adaptation: Adapting Speech LLMs to LR languages efficiently through parameter-efficient methods (e.g., adapters, LoRA, prompt/prefix tuning), multilingual transfer and layer selection, knowledge distillation, and streaming or edge-constrained inference. Such methods have been shown to reduce the number of trainable parameters substantially while maintaining or improving performance on low-resource speech tasks (see the adaptation sketch after this list).

2. Meaningful Evaluation: Moving beyond word error rate to task-appropriate metrics for ASR, SLU, and speech generation; incorporating calibration and reliability analysis and slice-aware reporting by accent, dialect, channel, and speaking style (see the evaluation sketch after this list); and making principled comparisons between end-to-end and cascaded pipelines with attention to error propagation.

3. Responsible Practice: Treating bias analysis and responsible reporting as a routine part of scientific practice rather than an optional appendix. This includes transparent documentation of data provenance and consent, disclosure of synthetic data use, and minimal guardrails for privacy and safety in speech I/O.
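
To make the first strand concrete, below is a minimal sketch of parameter-efficient adaptation using low-rank adapters (LoRA). It assumes the Hugging Face transformers and peft libraries, uses openai/whisper-small purely as an illustrative multilingual backbone, and the rank, scaling, and target modules are placeholder choices rather than recommended settings.

```python
# Minimal LoRA adaptation sketch (illustrative settings, not a recipe).
from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model

# Load a multilingual speech foundation model as the frozen backbone.
base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# Inject low-rank adapters into the attention projections only;
# all original model weights remain frozen.
lora_cfg = LoraConfig(
    r=8,                               # adapter rank (placeholder choice)
    lora_alpha=32,                     # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(base, lora_cfg)

# Reports the trainable share of parameters, typically a small fraction
# of the full model.
model.print_trainable_parameters()
```

The resulting model can then be fine-tuned on LR-language speech data with a standard training loop, updating only the adapter weights.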
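For the second strand, the sketch below illustrates slice-aware reporting: instead of one aggregate score, error rates are broken out by a grouping variable. It assumes the jiwer library and uses hypothetical evaluation records with an "accent" field; in practice the slices could equally be dialect, channel, or speaking style, and metrics beyond WER would be reported alongside.

```python
# Slice-aware WER reporting sketch (hypothetical toy data).
from collections import defaultdict
import jiwer

# Hypothetical (reference, hypothesis, slice) records from an evaluation run.
records = [
    {"ref": "turn on the lights", "hyp": "turn on the light", "accent": "accent_a"},
    {"ref": "call my sister",     "hyp": "call my sister",    "accent": "accent_a"},
    {"ref": "play the news",      "hyp": "played the news",   "accent": "accent_b"},
]

# Group references and hypotheses by slice, then report WER per slice
# rather than a single pooled number.
by_slice = defaultdict(lambda: ([], []))
for rec in records:
    refs, hyps = by_slice[rec["accent"]]
    refs.append(rec["ref"])
    hyps.append(rec["hyp"])

for slice_name, (refs, hyps) in sorted(by_slice.items()):
    print(f"{slice_name}: WER = {jiwer.wer(refs, hyps):.2%}")
```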

"Build strong models, measure what matters, and make bias analysis routine for speech in the long tail."