Speech-to-Text

The Speech-to-Text feature adds AI-powered voice input to your <x-input> and <x-textarea> components. When enabled, a microphone icon appears on hover, allowing users to dictate text using OpenAI's Whisper API.

Audio is recorded directly in the browser, uploaded to S3 via a presigned URL (bypassing server upload size limits), and transcribed by OpenAI. The temporary audio file is deleted immediately after transcription. Optionally, the raw transcript can be post-processed via a Livewire method before insertion — for example to clean up filler words, restructure the text, or send it through an AI for formatting.

Quick Setup

To enable Speech-to-Text, add these two environment variables to your .env file:

# .env

BC_SPEECH_TO_TEXT_ENABLED=true
OPENAI_API_KEY=sk-your-api-key-here

All Environment Variables

Below is the complete list of environment variables that control the Speech-to-Text feature. Apart from the first two, which you must set yourself, the values shown are the defaults — you only need to set variables you want to override.

# .env – Full Speech-to-Text Configuration

BC_SPEECH_TO_TEXT_ENABLED=true
OPENAI_API_KEY=sk-your-api-key-here
BC_STT_OPENAI_MODEL=whisper-1
BC_STT_OPENAI_TIMEOUT=30
BC_STT_STORAGE_DISK=s3
BC_STT_STORAGE_FOLDER=tmp/speech-to-text
BC_STT_DEFAULT_LANGUAGE=de
BC_STT_MAX_DURATION=900
BC_STT_RATE_LIMIT=10
BC_STT_RATE_LIMIT_WINDOW=5

Usage with Input

Add the speech-to-text attribute to any <x-input> component to enable voice input. A microphone icon will appear when the user hovers over the input field.

<x-input speech-to-text wire:model="title" label="Title" />

Usage with Textarea

The same attribute works on <x-textarea> components.

<x-textarea speech-to-text wire:model="description" label="Description" />

Component Attributes

The following attributes can be used on <x-input> and <x-textarea> components to customize the Speech-to-Text behavior per component.

| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| speech-to-text | Boolean | — | Enables the speech-to-text feature on this component. Required. |
| stt-language | String | Config default | ISO-639-1 language code (e.g. en, de, fr). Overrides the global default for this component. |
| stt-prompt | String | — | Comma-separated prompt words to help Whisper recognize domain-specific terminology. |
| stt-max-duration | Integer | 900 | Maximum recording duration in seconds. The recording stops automatically after this time. |
| stt-endpoint | String | Config route prefix | Custom endpoint URL for the transcription API. Overrides the route defined in the config. |
| stt-process | Boolean | — | Enables post-processing of the transcript via a Livewire method before inserting it into the field. Requires the WithSpeechToTextProcessing trait. |

Per-Component Language Override

By default, all components use the language defined in config('basic-components.speech-to-text.default-language'). You can override the language per component using the stt-language attribute with an ISO-639-1 code.

Supported examples: de, en, fr, es, it, pt, nl, ja, zh

<x-input speech-to-text stt-language="en" wire:model="title" label="Title (English)" />

<x-textarea speech-to-text stt-language="fr" wire:model="description" label="Description (French)" />

Prompt Words

Prompt words help the Whisper model recognize domain-specific terminology (such as brand names, technical terms, or product names) more accurately. You can add them per component using the stt-prompt attribute.

<x-textarea speech-to-text stt-prompt="Laravel, Livewire, Eloquent, Pest" wire:model="notes" label="Technical Notes" />

Max Recording Duration

Control the maximum recording duration per component using the stt-max-duration attribute (in seconds). The recording will automatically stop after this duration. The global default is 900 seconds (15 minutes).

<x-input speech-to-text stt-max-duration="60" wire:model="summary" label="Quick Summary (max 60s)" />

Combined Example

You can combine all per-component attributes to fully customize the behavior for a specific field.

<x-textarea
    speech-to-text
    stt-language="en"
    stt-prompt="MaskowLabs, BasicComponents, Livewire"
    stt-max-duration="120"
    wire:model="report"
    label="Meeting Report"
    rows="6"
/>

Full Configuration Reference

The complete configuration array lives in config/basic-components.php under the speech-to-text key. Below is the full reference with all available options and their defaults.

// config/basic-components.php

'speech-to-text' => [
    'enabled' => env('BC_SPEECH_TO_TEXT_ENABLED', false),
    'openai-api-key' => env('OPENAI_API_KEY'),
    'openai-model' => env('BC_STT_OPENAI_MODEL', 'whisper-1'),
    'openai-timeout' => env('BC_STT_OPENAI_TIMEOUT', 30),
    'storage-disk' => env('BC_STT_STORAGE_DISK', 's3'),
    'storage-folder' => env('BC_STT_STORAGE_FOLDER', 'tmp/speech-to-text'),
    'default-language' => env('BC_STT_DEFAULT_LANGUAGE', 'de'),
    'default-prompt-words' => [
        // 'YourCompanyName', 'YourProductName',
    ],
    'max-recording-duration' => env('BC_STT_MAX_DURATION', 900),
    'route-prefix' => 'basic-components/speech-to-text',
    'middleware' => ['web'],
    'rate-limit' => env('BC_STT_RATE_LIMIT', 10),
    'rate-limit-window' => env('BC_STT_RATE_LIMIT_WINDOW', 5),
],

Configuration Options

Detailed explanation of every configuration option:

| Key | Env Variable | Default | Description |
| --- | --- | --- | --- |
| enabled | BC_SPEECH_TO_TEXT_ENABLED | false | Master switch. When false, the microphone button will not be rendered, even if the attribute is present on a component. |
| openai-api-key | OPENAI_API_KEY | null | Your OpenAI API key. Required for transcription. Get one at platform.openai.com/api-keys. |
| openai-model | BC_STT_OPENAI_MODEL | whisper-1 | The OpenAI model used for transcription. Currently, whisper-1 is the only available model. |
| openai-timeout | BC_STT_OPENAI_TIMEOUT | 30 | Timeout in seconds for the OpenAI API request. Increase this value if you experience timeouts with longer recordings. |
| storage-disk | BC_STT_STORAGE_DISK | s3 | The storage disk for temporary audio files. Must support temporaryUploadUrl() (S3-compatible). Audio is uploaded from the browser via a presigned URL and deleted immediately after transcription. |
| storage-folder | BC_STT_STORAGE_FOLDER | tmp/speech-to-text | The folder path on the storage disk where temporary audio files are saved. Created automatically if it doesn't exist. |
| default-language | BC_STT_DEFAULT_LANGUAGE | de | Default transcription language (ISO-639-1). Can be overridden per component via stt-language or dynamically via the SpeechToText facade. |
| default-prompt-words | — | [] | Array of words always sent with every transcription request. Helps Whisper recognize specialized terminology (brand names, product names, etc.). |
| max-recording-duration | BC_STT_MAX_DURATION | 900 | Maximum recording duration in seconds (default: 15 minutes). The recording stops automatically after this time. Can be overridden per component via stt-max-duration. |
| route-prefix | — | basic-components/speech-to-text | The URL prefix for the STT API endpoints. The full routes will be /{prefix}/presigned-url and /{prefix}/transcribe. |
| middleware | — | ['web'] | Middleware applied to the STT API endpoints. The web middleware provides CSRF protection. Add auth to restrict access to authenticated users. |
| rate-limit | BC_STT_RATE_LIMIT | 10 | Maximum number of transcription requests per user within the time window. Set to 0 or null to disable rate limiting entirely. |
| rate-limit-window | BC_STT_RATE_LIMIT_WINDOW | 5 | Rate limit time window in minutes. Together with rate-limit, this defines the throttle (e.g. 10 requests per 5 minutes). |

Dynamic Language Resolution

Instead of using a static default language, you can resolve the language dynamically using the SpeechToText facade. This is useful when your application supports multiple languages and you want to match the transcription language to the authenticated user's preference.

Register the resolver in your AppServiceProvider:

use Illuminate\Support\ServiceProvider;
use MaskowLabs\BasicComponents\Facades\SpeechToText;

class AppServiceProvider extends ServiceProvider
{
    public function boot(): void
    {
        // Resolve the STT language dynamically (e.g. based on the authenticated user)
        SpeechToText::resolveLanguageUsing(function () {
            return auth()->user()?->preferred_language ?? config('app.locale');
        });
    }
}

Dynamic Prompt Words

In addition to the static default-prompt-words in the config, you can add prompt words dynamically using the SpeechToText facade. These words are merged with the static config and any per-component stt-prompt attribute values.

This is useful for injecting context-aware vocabulary, such as company-specific terms loaded from the database.

use Illuminate\Support\ServiceProvider;
use MaskowLabs\BasicComponents\Facades\SpeechToText;

class AppServiceProvider extends ServiceProvider
{
    public function boot(): void
    {
        // Add dynamic prompt words to improve transcription accuracy
        SpeechToText::resolvePromptWordsUsing(function () {
            return [
                'MaskowLabs',
                'BasicComponents',
                'Laravel',
                'Livewire',
                'Eloquent',
            ];
        });
    }
}

Securing the Endpoint

By default, the STT endpoints only use the web middleware (CSRF protection). If you want to restrict access to authenticated users only, add the auth middleware in the config:

// config/basic-components.php

'speech-to-text' => [
    // ...
    'middleware' => ['web', 'auth'],
    // ...
],

Rate Limiting

Rate limiting is enabled by default and protects your OpenAI API key from excessive usage. Authenticated users are identified by their user ID, unauthenticated users by their IP address.

The default allows 10 requests per 5 minutes. Customize it as needed:

// config/basic-components.php

'speech-to-text' => [
    // ...
    'rate-limit' => 20,        // Allow 20 requests ...
    'rate-limit-window' => 10, // ... per 10 minutes
    // ...
],
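Conceptually, the throttle acts as a fixed-window counter keyed by user ID (or IP address for guests). The JavaScript sketch below only illustrates those semantics; it is not the package's server-side implementation, and the `FixedWindowLimiter` name is made up for illustration.

```javascript
// Conceptual sketch of the "N requests per M minutes" throttle described
// above. NOT the package's implementation (which runs server-side in
// Laravel); it only illustrates the fixed-window semantics.
class FixedWindowLimiter {
  constructor(limit, windowMinutes) {
    this.limit = limit;                     // e.g. 10 requests…
    this.windowMs = windowMinutes * 60_000; // …per 5 minutes
    this.hits = new Map();                  // key (user ID or IP) -> { count, windowStart }
  }

  // Returns true if the request identified by `key` is allowed right now.
  attempt(key, now = Date.now()) {
    if (!this.limit) return true; // rate-limit = 0/null disables throttling
    const entry = this.hits.get(key);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.hits.set(key, { count: 1, windowStart: now }); // fresh window
      return true;
    }
    if (entry.count < this.limit) {
      entry.count += 1;
      return true;
    }
    return false; // limit exhausted for this window
  }
}

// Authenticated users are keyed by user ID, guests by IP address.
const limiter = new FixedWindowLimiter(10, 5);
```

With the defaults above, an eleventh request from the same key inside a 5-minute window would be rejected; the counter resets once the window elapses.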

Disabling Rate Limiting

If you want to disable rate limiting entirely (e.g. in a trusted internal application), set rate-limit to 0 or null:

// config/basic-components.php

'speech-to-text' => [
    // ...
    'rate-limit' => 0, // Disable rate limiting entirely
    // ...
],

Custom Endpoint

If you need to use a custom transcription endpoint instead of the built-in one (e.g. you have your own transcription service), you can override the endpoint URL per component using the stt-endpoint attribute:

<x-input speech-to-text stt-endpoint="/custom/my-transcribe" wire:model="title" label="Title" />

API Routes

When Speech-to-Text is enabled, the package automatically registers the following API routes:

| Method | Route | Description |
| --- | --- | --- |
| POST | /{prefix}/presigned-url | Generates a presigned S3 upload URL for the browser to upload the audio file directly. |
| POST | /{prefix}/transcribe | Triggers transcription of the uploaded audio file using OpenAI's Whisper API. Returns the transcribed text. |

The {prefix} is defined by the route-prefix config value (default: basic-components/speech-to-text). These routes are only registered when BC_SPEECH_TO_TEXT_ENABLED=true.

Post-Processing with AI (stt-process)

Add the stt-process attribute to send the raw transcript through a Livewire method before it is inserted. This lets you clean up, reformat, or completely transform the dictated text — for example using an AI model.

When stt-process is set, the component calls processSpeechToText(string $transcript, string $model) on your Livewire component after transcription but before the text is inserted. If the call fails or returns an empty result, the raw transcript is used as a fallback.

<x-textarea
    wire:model="notes"
    label="Meeting Notes"
    speech-to-text
    stt-process
    rows="6"
    placeholder="Dictate your notes — they will be processed before insertion…"
/>
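The insert-or-fallback rule can be captured in a few lines. In this sketch, `process` is a hypothetical stand-in for the round trip to your Livewire component's `processSpeechToText()` method, not part of the package's API:

```javascript
// Sketch of the fallback rule described above: prefer the post-processed
// text, but fall back to the raw transcript when processing fails or
// returns an empty result. `process` is a hypothetical stand-in for the
// round trip to the Livewire component's processSpeechToText() method.
function resolveFinalText(transcript, process) {
  try {
    const processed = process(transcript);
    if (typeof processed === 'string' && processed.trim() !== '') {
      return processed; // use the post-processed text
    }
  } catch (e) {
    // processing failed; fall through to the raw transcript
  }
  return transcript; // fallback: insert the raw transcript unchanged
}
```

Either way, the user always gets some text inserted, so a failing post-processing step never loses a dictation.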

Implementing the Trait

To handle post-processing, use the WithSpeechToTextProcessing trait in your Livewire component and override the processSpeechToText() method. The trait provides a default implementation that simply returns the transcript unchanged.

use Livewire\Component;
use MaskowLabs\BasicComponents\Traits\WithSpeechToTextProcessing;
use OpenAI\Laravel\Facades\OpenAI;

class MeetingNotes extends Component
{
    use WithSpeechToTextProcessing;

    public string $notes = '';

    /**
     * Process the raw transcript before it is inserted into the field.
     * The $model parameter tells you which field triggered the transcription.
     */
    public function processSpeechToText(string $transcript, string $model): string
    {
        // Example: call OpenAI to clean up the transcript
        $response = OpenAI::chat()->create([
            'model' => 'gpt-4o-mini',
            'messages' => [
                ['role' => 'system', 'content' => 'Clean up the following transcript. Fix grammar, remove filler words, and structure it into clear paragraphs. Return only the cleaned text.'],
                ['role' => 'user', 'content' => $transcript],
            ],
        ]);

        return $response->choices[0]->message->content;
    }

    public function render()
    {
        return view('livewire.meeting-notes');
    }
}

Per-Field Processing

When your form has multiple STT fields, use the $model parameter to apply different processing logic per field. For example, you could title-case a subject line but use AI formatting for a message body:

use Illuminate\Support\Str;
use Livewire\Component;
use MaskowLabs\BasicComponents\Traits\WithSpeechToTextProcessing;

class ContactForm extends Component
{
    use WithSpeechToTextProcessing;

    public string $subject = '';
    public string $message = '';

    /**
     * Process the transcript differently depending on which field was dictated.
     */
    public function processSpeechToText(string $transcript, string $model): string
    {
        return match ($model) {
            'subject' => Str::limit(Str::title($transcript), 80),
            'message' => $this->formatWithAI($transcript),
            default => $transcript,
        };
    }

    private function formatWithAI(string $text): string
    {
        // Your AI formatting logic here…
        return $text;
    }

    public function render()
    {
        return view('livewire.contact-form');
    }
}

Per-Field Processing — Blade

The corresponding Blade view — both fields use stt-process, but the Livewire component decides how to process each one based on the wire:model value:

<x-input
    wire:model="subject"
    label="Subject"
    speech-to-text
    stt-process
    placeholder="Dictate a subject line…"
/>

<x-textarea
    wire:model="message"
    label="Message"
    speech-to-text
    stt-process
    rows="8"
    placeholder="Dictate your message…"
/>

How It Works

Here's a step-by-step overview of the complete Speech-to-Text flow:

  1. The user hovers over an input or textarea with the speech-to-text attribute — a microphone icon appears.
  2. The user clicks the microphone to start recording. The browser's MediaRecorder API captures the audio.
  3. The user clicks again to stop recording (or it stops automatically after max-recording-duration).
  4. The browser requests a presigned S3 upload URL from the server (POST /{prefix}/presigned-url).
  5. The audio file is uploaded directly to S3 from the browser, bypassing server upload size limits.
  6. The browser sends a transcription request to the server (POST /{prefix}/transcribe).
  7. The server downloads the audio from S3, sends it to OpenAI's Whisper API, and returns the transcribed text.
  8. The temporary audio file is deleted from S3 immediately after transcription.
  9. If stt-process is enabled, the transcript is sent to the Livewire component's processSpeechToText() method for post-processing (e.g. AI cleanup, reformatting). During this step, the spinner remains visible with an isPostProcessing state.
  10. The final text (processed or raw) is inserted into the input/textarea field at the current cursor position.
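The browser-side half of this flow (steps 4 to 7) can be sketched as follows. The request and response field names (`uploadUrl`, `key`, `text`) are assumptions for illustration only; the package's actual payloads may differ. The `http` helper is injected so the sketch stays self-contained and testable without a server:

```javascript
// Sketch of the browser-side flow (steps 4–7 above). The payload shapes
// (`uploadUrl`, `key`, `text`) are assumed for illustration; the package's
// real request/response formats may differ. `http(method, url, body)` is
// an injected helper standing in for fetch().
async function transcribeRecording(audioBlob, { prefix, http }) {
  // Step 4: ask the server for a presigned S3 upload URL.
  const { uploadUrl, key } = await http('POST', `/${prefix}/presigned-url`);

  // Step 5: upload the audio directly to S3, bypassing the app server.
  await http('PUT', uploadUrl, audioBlob);

  // Steps 6–7: ask the server to transcribe the uploaded file
  // and return the text to insert into the field.
  const { text } = await http('POST', `/${prefix}/transcribe`, { key });
  return text;
}
```

Because `http` is injected, the same orchestration can be exercised with stubbed responses; in the real component it would wrap `fetch()` with the CSRF token required by the web middleware.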