Speech-to-Text
The Speech-to-Text feature adds AI-powered voice input to your <x-input> and <x-textarea> components. When enabled, a microphone icon appears on hover, allowing users to dictate text using OpenAI's Whisper API.
Audio is recorded directly in the browser, uploaded to S3 via a presigned URL (bypassing server upload size limits), and transcribed by OpenAI. The temporary audio file is deleted immediately after transcription. Optionally, the raw transcript can be post-processed via a Livewire method before insertion — for example to clean up filler words, restructure the text, or send it through an AI for formatting.
Quick Setup
To enable Speech-to-Text, add these two environment variables to your .env file:
```shell
# .env

BC_SPEECH_TO_TEXT_ENABLED=true
OPENAI_API_KEY=sk-your-api-key-here
```
All Environment Variables
Below is a complete list of the environment variables that control the Speech-to-Text feature. Except for BC_SPEECH_TO_TEXT_ENABLED and OPENAI_API_KEY, which you must set yourself, the values shown are the defaults — you only need to set variables you want to override.
```shell
# .env – Full Speech-to-Text Configuration

BC_SPEECH_TO_TEXT_ENABLED=true
OPENAI_API_KEY=sk-your-api-key-here
BC_STT_OPENAI_MODEL=whisper-1
BC_STT_OPENAI_TIMEOUT=30
BC_STT_STORAGE_DISK=s3
BC_STT_STORAGE_FOLDER=tmp/speech-to-text
BC_STT_DEFAULT_LANGUAGE=de
BC_STT_MAX_DURATION=900
BC_STT_RATE_LIMIT=10
BC_STT_RATE_LIMIT_WINDOW=5
```
Usage with Input
Add the speech-to-text attribute to any <x-input> component to enable voice input. A microphone icon will appear when the user hovers over the input field.
```blade
<x-input speech-to-text wire:model="title" label="Title" />
```
Usage with Textarea
The same attribute works on <x-textarea> components.
```blade
<x-textarea speech-to-text wire:model="description" label="Description" />
```
Component Attributes
The following attributes can be used on <x-input> and <x-textarea> components to customize the Speech-to-Text behavior per component.
| Attribute | Type | Default | Description |
|---|---|---|---|
| speech-to-text | Boolean | — | Enables the speech-to-text feature on this component. Required. |
| stt-language | String | Config default | ISO-639-1 language code (e.g. en, de, fr). Overrides the global default for this component. |
| stt-prompt | String | — | Comma-separated prompt words to help Whisper recognize domain-specific terminology. |
| stt-max-duration | Integer | 900 | Maximum recording duration in seconds. The recording stops automatically after this time. |
| stt-endpoint | String | Config route prefix | Custom endpoint URL for the transcription API. Overrides the route defined in the config. |
| stt-process | Boolean | — | Enables post-processing of the transcript via a Livewire method before inserting it into the field. Requires the WithSpeechToTextProcessing trait. |
Per-Component Language Override
By default, all components use the language defined in config('basic-components.speech-to-text.default-language'). You can override the language per component using the stt-language attribute with an ISO-639-1 code.
Supported examples: de, en, fr, es, it, pt, nl, ja, zh
```blade
<x-input speech-to-text stt-language="en" wire:model="title" label="Title (English)" />

<x-textarea speech-to-text stt-language="fr" wire:model="description" label="Description (French)" />
```
Prompt Words
Prompt words help the Whisper model recognize domain-specific terminology (such as brand names, technical terms, or product names) more accurately. You can add them per component using the stt-prompt attribute.
```blade
<x-textarea speech-to-text stt-prompt="Laravel, Livewire, Eloquent, Pest" wire:model="notes" label="Technical Notes" />
```
Max Recording Duration
Control the maximum recording duration per component using the stt-max-duration attribute (in seconds). The recording will automatically stop after this duration. The global default is 900 seconds (15 minutes).
```blade
<x-input speech-to-text stt-max-duration="60" wire:model="summary" label="Quick Summary (max 60s)" />
```
Combined Example
You can combine all per-component attributes to fully customize the behavior for a specific field.
```blade
<x-textarea
    speech-to-text
    stt-language="en"
    stt-prompt="MaskowLabs, BasicComponents, Livewire"
    stt-max-duration="120"
    wire:model="report"
    label="Meeting Report"
    rows="6"
/>
```
Full Configuration Reference
The complete configuration array lives in config/basic-components.php under the speech-to-text key. Below is the full reference with all available options and their defaults.
```php
// config/basic-components.php

'speech-to-text' => [
    'enabled' => env('BC_SPEECH_TO_TEXT_ENABLED', false),
    'openai-api-key' => env('OPENAI_API_KEY'),
    'openai-model' => env('BC_STT_OPENAI_MODEL', 'whisper-1'),
    'openai-timeout' => env('BC_STT_OPENAI_TIMEOUT', 30),
    'storage-disk' => env('BC_STT_STORAGE_DISK', 's3'),
    'storage-folder' => env('BC_STT_STORAGE_FOLDER', 'tmp/speech-to-text'),
    'default-language' => env('BC_STT_DEFAULT_LANGUAGE', 'de'),
    'default-prompt-words' => [
        // 'YourCompanyName', 'YourProductName',
    ],
    'max-recording-duration' => env('BC_STT_MAX_DURATION', 900),
    'route-prefix' => 'basic-components/speech-to-text',
    'middleware' => ['web'],
    'rate-limit' => env('BC_STT_RATE_LIMIT', 10),
    'rate-limit-window' => env('BC_STT_RATE_LIMIT_WINDOW', 5),
],
```
Configuration Options
Detailed explanation of every configuration option:
| Key | Env Variable | Default | Description |
|---|---|---|---|
| enabled | BC_SPEECH_TO_TEXT_ENABLED | false | Master switch. When false, the microphone button will not be rendered, even if the attribute is present on a component. |
| openai-api-key | OPENAI_API_KEY | null | Your OpenAI API key. Required for transcription. Get one at platform.openai.com/api-keys. |
| openai-model | BC_STT_OPENAI_MODEL | whisper-1 | The OpenAI model used for transcription. Currently, whisper-1 is the only available model. |
| openai-timeout | BC_STT_OPENAI_TIMEOUT | 30 | Timeout in seconds for the OpenAI API request. Increase this value if you experience timeouts with longer recordings. |
| storage-disk | BC_STT_STORAGE_DISK | s3 | The storage disk for temporary audio files. Must support temporaryUploadUrl() (S3-compatible). Audio is uploaded from the browser via a presigned URL and deleted immediately after transcription. |
| storage-folder | BC_STT_STORAGE_FOLDER | tmp/speech-to-text | The folder path on the storage disk where temporary audio files are saved. Created automatically if it doesn't exist. |
| default-language | BC_STT_DEFAULT_LANGUAGE | de | Default transcription language (ISO-639-1). Can be overridden per-component via stt-language or dynamically via the SpeechToText facade. |
| default-prompt-words | — | [] | Array of words always sent with every transcription request. Helps Whisper recognize specialized terminology (brand names, product names, etc.). |
| max-recording-duration | BC_STT_MAX_DURATION | 900 | Maximum recording duration in seconds (default: 15 minutes). The recording stops automatically after this time. Can be overridden per-component via stt-max-duration. |
| route-prefix | — | basic-components/speech-to-text | The URL prefix for the STT API endpoints. The full routes will be /{prefix}/presigned-url and /{prefix}/transcribe. |
| middleware | — | ['web'] | Middleware applied to the STT API endpoints. The web middleware provides CSRF protection. Add auth to restrict access to authenticated users. |
| rate-limit | BC_STT_RATE_LIMIT | 10 | Maximum number of transcription requests per user within the time window. Set to 0 or null to disable rate limiting entirely. |
| rate-limit-window | BC_STT_RATE_LIMIT_WINDOW | 5 | Rate limit time window in minutes. Together with rate-limit, this defines the throttle (e.g. 10 requests per 5 minutes). |
Dynamic Language Resolution
Instead of using a static default language, you can resolve the language dynamically using the SpeechToText facade. This is useful when your application supports multiple languages and you want to match the transcription language to the authenticated user's preference.
Register the resolver in your AppServiceProvider:
```php
use MaskowLabs\BasicComponents\Facades\SpeechToText;

class AppServiceProvider extends ServiceProvider
{
    public function boot(): void
    {
        // Resolve the STT language dynamically (e.g. based on the authenticated user)
        SpeechToText::resolveLanguageUsing(function () {
            return auth()->user()?->preferred_language ?? config('app.locale');
        });
    }
}
```
Dynamic Prompt Words
In addition to the static default-prompt-words in the config, you can add prompt words dynamically using the SpeechToText facade. These words are merged with the static config and any per-component stt-prompt attribute values.
This is useful for injecting context-aware vocabulary, such as company-specific terms loaded from the database.
```php
use MaskowLabs\BasicComponents\Facades\SpeechToText;

class AppServiceProvider extends ServiceProvider
{
    public function boot(): void
    {
        // Add dynamic prompt words to improve transcription accuracy
        SpeechToText::resolvePromptWordsUsing(function () {
            return [
                'MaskowLabs',
                'BasicComponents',
                'Laravel',
                'Livewire',
                'Eloquent',
            ];
        });
    }
}
```
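As a rough illustration of the merge described above, the three sources (static config, facade resolver, per-component attribute) could be combined as follows. This is only a sketch — the function name and exact merge semantics are assumptions, not the package's actual implementation:

```javascript
// Hypothetical sketch of merging prompt words from the three sources.
// The package handles this internally; names here are illustrative only.
function mergePromptWords(configWords, dynamicWords, attributeValue) {
  // The per-component stt-prompt attribute is a comma-separated string
  const attrWords = (attributeValue ?? '')
    .split(',')
    .map((w) => w.trim())
    .filter(Boolean);
  // Deduplicate while preserving first-seen order
  return [...new Set([...configWords, ...dynamicWords, ...attrWords])];
}
```

The result is one flat, duplicate-free word list sent with the transcription request.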
Securing the Endpoint
By default, the STT endpoints only use the web middleware (CSRF protection). If you want to restrict access to authenticated users only, add the auth middleware in the config:
```php
// config/basic-components.php

'speech-to-text' => [
    // ...
    'middleware' => ['web', 'auth'],
    // ...
],
```
Rate Limiting
Rate limiting is enabled by default and protects your OpenAI API key from excessive usage. Authenticated users are identified by their user ID, unauthenticated users by their IP address.
The default allows 10 requests per 5 minutes. Customize it as needed:
```php
// config/basic-components.php

'speech-to-text' => [
    // ...
    'rate-limit' => 20,        // Allow 20 requests ...
    'rate-limit-window' => 10, // ... per 10 minutes
    // ...
],
```
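Conceptually, the throttle behaves like a per-key counter over a time window: each user ID (or IP) may make at most `rate-limit` requests per `rate-limit-window` minutes. The sketch below illustrates that idea as a fixed-window counter — it is not the package's actual implementation (which would normally build on Laravel's rate limiter):

```javascript
// Illustrative fixed-window rate limiter: allow `limit` requests per
// `windowMinutes` for each key (user ID or IP). Not the package's real code.
function createRateLimiter(limit, windowMinutes) {
  if (!limit) return () => true; // rate-limit = 0 or null disables throttling
  const windowMs = windowMinutes * 60 * 1000;
  const hits = new Map(); // key -> { windowStart, count }
  return function allow(key, nowMs = Date.now()) {
    const entry = hits.get(key);
    if (!entry || nowMs - entry.windowStart >= windowMs) {
      hits.set(key, { windowStart: nowMs, count: 1 }); // new window
      return true;
    }
    if (entry.count < limit) {
      entry.count += 1;
      return true;
    }
    return false; // over the limit: respond with 429 Too Many Requests
  };
}
```

With the defaults (10 requests per 5 minutes), the eleventh request inside a window is rejected, and the counter resets when a new window begins.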
Disabling Rate Limiting
If you want to disable rate limiting entirely (e.g. in a trusted internal application), set the rate-limit to 0:
```php
// config/basic-components.php

'speech-to-text' => [
    // ...
    'rate-limit' => 0, // Disable rate limiting entirely
    // ...
],
```
Custom Endpoint
If you need to use a custom transcription endpoint instead of the built-in one (e.g. you have your own transcription service), you can override the endpoint URL per component using the stt-endpoint attribute:
```blade
<x-input speech-to-text stt-endpoint="/custom/my-transcribe" wire:model="title" label="Title" />
```
API Routes
When Speech-to-Text is enabled, the package automatically registers the following API routes:
| Method | Route | Description |
|---|---|---|
| POST | /{prefix}/presigned-url | Generates a presigned S3 upload URL for the browser to upload the audio file directly. |
| POST | /{prefix}/transcribe | Triggers transcription of the uploaded audio file using OpenAI's Whisper API. Returns the transcribed text. |
The {prefix} is defined by the route-prefix config value (default: basic-components/speech-to-text). These routes are only registered when BC_SPEECH_TO_TEXT_ENABLED=true.
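The two URLs derive directly from the configured prefix. As a small sketch (the helper below is hypothetical, not part of the package's API):

```javascript
// Hypothetical helper that builds the two STT endpoint URLs from the
// configured route-prefix. Not part of the package's public API.
function sttRoutes(prefix = 'basic-components/speech-to-text') {
  const base = '/' + prefix.replace(/^\/+|\/+$/g, ''); // normalize slashes
  return {
    presignedUrl: `${base}/presigned-url`,
    transcribe: `${base}/transcribe`,
  };
}
```

With the default prefix this yields `/basic-components/speech-to-text/presigned-url` and `/basic-components/speech-to-text/transcribe`.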
Post-Processing with AI (stt-process)
Add the stt-process attribute to send the raw transcript through a Livewire method before it is inserted. This lets you clean up, reformat, or completely transform the dictated text — for example using an AI model.
When stt-process is set, the component calls processSpeechToText(string $transcript, string $model) on your Livewire component after transcription but before the text is inserted. If the call fails or returns an empty result, the raw transcript is used as a fallback.
```blade
<x-textarea
    wire:model="notes"
    label="Meeting Notes"
    speech-to-text
    stt-process
    rows="6"
    placeholder="Dictate your notes — they will be processed before insertion…"
/>
```
Implementing the Trait
To handle post-processing, use the WithSpeechToTextProcessing trait in your Livewire component and override the processSpeechToText() method. The trait provides a default implementation that simply returns the transcript unchanged.
```php
use Livewire\Component;
use MaskowLabs\BasicComponents\Traits\WithSpeechToTextProcessing;
use OpenAI\Laravel\Facades\OpenAI;

class MeetingNotes extends Component
{
    use WithSpeechToTextProcessing;

    public string $notes = '';

    /**
     * Process the raw transcript before it is inserted into the field.
     * The $model parameter tells you which field triggered the transcription.
     */
    public function processSpeechToText(string $transcript, string $model): string
    {
        // Example: call OpenAI to clean up the transcript
        $response = OpenAI::chat()->create([
            'model' => 'gpt-4o-mini',
            'messages' => [
                ['role' => 'system', 'content' => 'Clean up the following transcript. Fix grammar, remove filler words, and structure it into clear paragraphs. Return only the cleaned text.'],
                ['role' => 'user', 'content' => $transcript],
            ],
        ]);

        return $response->choices[0]->message->content;
    }

    public function render()
    {
        return view('livewire.meeting-notes');
    }
}
```
Per-Field Processing
When your form has multiple STT fields, use the $model parameter to apply different processing logic per field. For example, you could title-case a subject line but use AI formatting for a message body:
```php
use Illuminate\Support\Str;
use Livewire\Component;
use MaskowLabs\BasicComponents\Traits\WithSpeechToTextProcessing;

class ContactForm extends Component
{
    use WithSpeechToTextProcessing;

    public string $subject = '';
    public string $message = '';

    /**
     * Process the transcript differently depending on which field was dictated.
     */
    public function processSpeechToText(string $transcript, string $model): string
    {
        return match ($model) {
            'subject' => Str::limit(Str::title($transcript), 80),
            'message' => $this->formatWithAI($transcript),
            default => $transcript,
        };
    }

    private function formatWithAI(string $text): string
    {
        // Your AI formatting logic here…
        return $text;
    }

    public function render()
    {
        return view('livewire.contact-form');
    }
}
```
Per-Field Processing — Blade
The corresponding Blade view — both fields use stt-process, but the Livewire component decides how to process each one based on the wire:model value:
```blade
<x-input
    wire:model="subject"
    label="Subject"
    speech-to-text
    stt-process
    placeholder="Dictate a subject line…"
/>

<x-textarea
    wire:model="message"
    label="Message"
    speech-to-text
    stt-process
    rows="8"
    placeholder="Dictate your message…"
/>
```
How It Works
Here's a step-by-step overview of the complete Speech-to-Text flow:
- The user hovers over an input or textarea with the speech-to-text attribute — a microphone icon appears.
- The user clicks the microphone to start recording. The browser's MediaRecorder API captures the audio.
- The user clicks again to stop recording (or it stops automatically after max-recording-duration).
- The browser requests a presigned S3 upload URL from the server (POST /{prefix}/presigned-url).
- The audio file is uploaded directly to S3 from the browser, bypassing server upload size limits.
- The browser sends a transcription request to the server (POST /{prefix}/transcribe).
- The server downloads the audio from S3, sends it to OpenAI's Whisper API, and returns the transcribed text.
- The temporary audio file is deleted from S3 immediately after transcription.
- If stt-process is enabled, the transcript is sent to the Livewire component's processSpeechToText() method for post-processing (e.g. AI cleanup, reformatting). During this step, the spinner remains visible with an isPostProcessing state.
- The final text (processed or raw) is inserted into the input/textarea field at the current cursor position.
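Steps 4 through 8 of the flow above can be sketched as client-side pseudocode. Everything here is illustrative: the payload field names (`url`, `key`, `text`) and the helper itself are assumptions, not the package's documented wire format.

```javascript
// Illustrative client-side flow: presigned URL -> direct S3 upload -> transcribe.
// Field names (url, key, text) are assumed, not the package's actual contract.
async function transcribe(audioBlob, { routes, language, fetchFn = fetch }) {
  // Step 4: ask the server for a presigned S3 upload URL
  const presignedRes = await fetchFn(routes.presignedUrl, { method: 'POST' });
  const { url, key } = await presignedRes.json();

  // Step 5: upload the audio directly to S3, bypassing the app server
  await fetchFn(url, { method: 'PUT', body: audioBlob });

  // Steps 6-8: ask the server to transcribe (it downloads the file from S3,
  // calls Whisper, deletes the temporary file) and return the text
  const transcribeRes = await fetchFn(routes.transcribe, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ key, language }),
  });
  const { text } = await transcribeRes.json();
  return text;
}
```

The component ships its own implementation of this flow; the sketch only makes the division of labor visible — the app server hands out the upload URL and performs transcription, while the audio bytes go straight to S3.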