Speech-to-Text

The Speech-to-Text feature adds AI-powered voice input to your <x-input> and <x-textarea> components. When enabled, a microphone icon appears on hover, allowing users to dictate text using OpenAI's Whisper API.

Audio is recorded directly in the browser, uploaded to S3 via a presigned URL (bypassing server upload size limits), and transcribed by OpenAI. The temporary audio file is deleted immediately after transcription. Optionally, the raw transcript can be post-processed via a Livewire method before insertion — for example to clean up filler words, restructure the text, or send it through an AI for formatting.

Quick Setup

To enable Speech-to-Text, add these two environment variables to your .env file:

# .env

BC_SPEECH_TO_TEXT_ENABLED=true
OPENAI_API_KEY=sk-your-api-key-here

All Environment Variables

Below is the complete list of environment variables that control the Speech-to-Text feature. Apart from the first two, which you must set yourself, the values shown are the defaults — you only need to set variables you want to override.

# .env – Full Speech-to-Text Configuration

BC_SPEECH_TO_TEXT_ENABLED=true
OPENAI_API_KEY=sk-your-api-key-here
BC_STT_OPENAI_MODEL=whisper-1
BC_STT_OPENAI_TIMEOUT=30
BC_STT_STORAGE_DISK=s3
BC_STT_STORAGE_FOLDER=tmp/speech-to-text
BC_STT_DEFAULT_LANGUAGE=de
BC_STT_MAX_DURATION=900
BC_STT_RATE_LIMIT=10
BC_STT_RATE_LIMIT_WINDOW=5

Usage with Input

Add the speech-to-text attribute to any <x-input> component to enable voice input. A microphone icon will appear when the user hovers over the input field.

<x-input speech-to-text wire:model="title" label="Title" />

Usage with Textarea

The same attribute works on <x-textarea> components.

<x-textarea speech-to-text wire:model="description" label="Description" />

Component Attributes

The following attributes can be used on <x-input> and <x-textarea> components to customize the Speech-to-Text behavior per component.

| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| speech-to-text | Boolean | — | Enables the speech-to-text feature on this component. Required. |
| stt-language | String | Config default | ISO-639-1 language code (e.g. en, de, fr). Overrides the global default for this component. |
| stt-prompt | String | — | Comma-separated prompt words to help Whisper recognize domain-specific terminology. |
| stt-max-duration | Integer | 900 | Maximum recording duration in seconds. The recording stops automatically after this time. |
| stt-endpoint | String | Config route prefix | Custom endpoint URL for the transcription API. Overrides the route defined in the config. |
| stt-process | Boolean | — | Enables post-processing of the transcript via a Livewire method before inserting it into the field. Requires the WithSpeechToTextProcessing trait. |

Per-Component Language Override

By default, all components use the language defined in config('basic-components.speech-to-text.default-language'). You can override the language per component using the stt-language attribute with an ISO-639-1 code.

Supported examples: de, en, fr, es, it, pt, nl, ja, zh

<x-input speech-to-text stt-language="en" wire:model="title" label="Title (English)" />

<x-textarea speech-to-text stt-language="fr" wire:model="description" label="Description (French)" />

Prompt Words

Prompt words help the Whisper model recognize domain-specific terminology (such as brand names, technical terms, or product names) more accurately. You can add them per component using the stt-prompt attribute.

<x-textarea speech-to-text stt-prompt="Laravel, Livewire, Eloquent, Pest" wire:model="notes" label="Technical Notes" />

Max Recording Duration

Control the maximum recording duration per component using the stt-max-duration attribute (in seconds). The recording will automatically stop after this duration. The global default is 900 seconds (15 minutes).

<x-input speech-to-text stt-max-duration="60" wire:model="summary" label="Quick Summary (max 60s)" />

Combined Example

You can combine all per-component attributes to fully customize the behavior for a specific field.

<x-textarea
    speech-to-text
    stt-language="en"
    stt-prompt="MaskowLabs, BasicComponents, Livewire"
    stt-max-duration="120"
    wire:model="report"
    label="Meeting Report"
    rows="6"
/>

Full Configuration Reference

The complete configuration array lives in config/basic-components.php under the speech-to-text key. Below is the full reference with all available options and their defaults.

// config/basic-components.php

'speech-to-text' => [
    'enabled' => env('BC_SPEECH_TO_TEXT_ENABLED', false),
    'openai-api-key' => env('OPENAI_API_KEY'),
    'openai-model' => env('BC_STT_OPENAI_MODEL', 'whisper-1'),
    'openai-timeout' => env('BC_STT_OPENAI_TIMEOUT', 30),
    'storage-disk' => env('BC_STT_STORAGE_DISK', 's3'),
    'storage-folder' => env('BC_STT_STORAGE_FOLDER', 'tmp/speech-to-text'),
    'default-language' => env('BC_STT_DEFAULT_LANGUAGE', 'de'),
    'default-prompt-words' => [
        // 'YourCompanyName', 'YourProductName',
    ],
    'max-recording-duration' => env('BC_STT_MAX_DURATION', 900),
    'route-prefix' => 'basic-components/speech-to-text',
    'middleware' => ['web'],
    'rate-limit' => env('BC_STT_RATE_LIMIT', 10),
    'rate-limit-window' => env('BC_STT_RATE_LIMIT_WINDOW', 5),
],

Configuration Options

Detailed explanation of every configuration option:

| Key | Env Variable | Default | Description |
| --- | --- | --- | --- |
| enabled | BC_SPEECH_TO_TEXT_ENABLED | false | Master switch. When false, the microphone button will not be rendered, even if the attribute is present on a component. |
| openai-api-key | OPENAI_API_KEY | null | Your OpenAI API key. Required for transcription. Get one at platform.openai.com/api-keys. |
| openai-model | BC_STT_OPENAI_MODEL | whisper-1 | The OpenAI model used for transcription. Currently, whisper-1 is the only available model. |
| openai-timeout | BC_STT_OPENAI_TIMEOUT | 30 | Timeout in seconds for the OpenAI API request. Increase this value if you experience timeouts with longer recordings. |
| storage-disk | BC_STT_STORAGE_DISK | s3 | The storage disk for temporary audio files. Must support temporaryUploadUrl() (S3-compatible). Audio is uploaded from the browser via a presigned URL and deleted immediately after transcription. |
| storage-folder | BC_STT_STORAGE_FOLDER | tmp/speech-to-text | The folder path on the storage disk where temporary audio files are saved. Created automatically if it doesn't exist. |
| default-language | BC_STT_DEFAULT_LANGUAGE | de | Default transcription language (ISO-639-1). Can be overridden per component via stt-language or dynamically via the SpeechToText facade. |
| default-prompt-words | — | [] | Array of words always sent with every transcription request. Helps Whisper recognize specialized terminology (brand names, product names, etc.). |
| max-recording-duration | BC_STT_MAX_DURATION | 900 | Maximum recording duration in seconds (default: 15 minutes). The recording stops automatically after this time. Can be overridden per component via stt-max-duration. |
| route-prefix | — | basic-components/speech-to-text | The URL prefix for the STT API endpoints. The full routes will be /{prefix}/presigned-url and /{prefix}/transcribe. |
| middleware | — | ['web'] | Middleware applied to the STT API endpoints. The web middleware provides CSRF protection. Add auth to restrict access to authenticated users. |
| rate-limit | BC_STT_RATE_LIMIT | 10 | Maximum number of transcription requests per user within the time window. Set to 0 or null to disable rate limiting entirely. |
| rate-limit-window | BC_STT_RATE_LIMIT_WINDOW | 5 | Rate limit time window in minutes. Together with rate-limit, this defines the throttle (e.g. 10 requests per 5 minutes). |

Dynamic Language Resolution

Instead of using a static default language, you can resolve the language dynamically using the SpeechToText facade. This is useful when your application supports multiple languages and you want to match the transcription language to the authenticated user's preference.

Register the resolver in your AppServiceProvider:

use Illuminate\Support\ServiceProvider;
use MaskowLabs\BasicComponents\Facades\SpeechToText;

class AppServiceProvider extends ServiceProvider
{
    public function boot(): void
    {
        // Resolve the STT language dynamically (e.g. based on the authenticated user)
        SpeechToText::resolveLanguageUsing(function () {
            return auth()->user()?->preferred_language ?? config('app.locale');
        });
    }
}

Dynamic Prompt Words

In addition to the static default-prompt-words in the config, you can add prompt words dynamically using the SpeechToText facade. These words are merged with the static config and any per-component stt-prompt attribute values.

This is useful for injecting context-aware vocabulary, such as company-specific terms loaded from the database.

use Illuminate\Support\ServiceProvider;
use MaskowLabs\BasicComponents\Facades\SpeechToText;

class AppServiceProvider extends ServiceProvider
{
    public function boot(): void
    {
        // Add dynamic prompt words to improve transcription accuracy
        SpeechToText::resolvePromptWordsUsing(function () {
            return [
                'MaskowLabs',
                'BasicComponents',
                'Laravel',
                'Livewire',
                'Eloquent',
            ];
        });
    }
}

Securing the Endpoint

By default, the STT endpoints only use the web middleware (CSRF protection). If you want to restrict access to authenticated users only, add the auth middleware in the config:

// config/basic-components.php

'speech-to-text' => [
    // ...
    'middleware' => ['web', 'auth'],
    // ...
],

Rate Limiting

Rate limiting is enabled by default and protects your OpenAI API key from excessive usage. Authenticated users are identified by their user ID, unauthenticated users by their IP address.

The default allows 10 requests per 5 minutes. Customize it as needed:

// config/basic-components.php

'speech-to-text' => [
    // ...
    'rate-limit' => 20,        // Allow 20 requests ...
    'rate-limit-window' => 10, // ... per 10 minutes
    // ...
],
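Conceptually, the throttle acts as a fixed-window counter keyed by user ID (or IP address for guests). The JavaScript sketch below only illustrates those semantics; it is not the package's server-side implementation, and the `FixedWindowLimiter` name is made up for illustration.

```javascript
// Conceptual sketch of the "N requests per M minutes" throttle described
// above. NOT the package's implementation (which runs server-side in
// Laravel); it only illustrates the fixed-window semantics.
class FixedWindowLimiter {
  constructor(limit, windowMinutes) {
    this.limit = limit;                     // e.g. 10 requests…
    this.windowMs = windowMinutes * 60_000; // …per 5 minutes
    this.hits = new Map();                  // key (user ID or IP) -> { count, windowStart }
  }

  // Returns true if the request identified by `key` is allowed right now.
  attempt(key, now = Date.now()) {
    if (!this.limit) return true; // rate-limit = 0/null disables throttling
    const entry = this.hits.get(key);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.hits.set(key, { count: 1, windowStart: now }); // fresh window
      return true;
    }
    if (entry.count < this.limit) {
      entry.count += 1;
      return true;
    }
    return false; // limit exhausted for this window
  }
}

// Authenticated users are keyed by user ID, guests by IP address.
const limiter = new FixedWindowLimiter(10, 5);
```

With the defaults above, an eleventh request from the same key inside a 5-minute window would be rejected; the counter resets once the window elapses.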

Disabling Rate Limiting

If you want to disable rate limiting entirely (e.g. in a trusted internal application), set rate-limit to 0 or null:

// config/basic-components.php

'speech-to-text' => [
    // ...
    'rate-limit' => 0, // Disable rate limiting entirely
    // ...
],

Custom Endpoint

If you need to use a custom transcription endpoint instead of the built-in one (e.g. you have your own transcription service), you can override the endpoint URL per component using the stt-endpoint attribute:

<x-input speech-to-text stt-endpoint="/custom/my-transcribe" wire:model="title" label="Title" />

API Routes

When Speech-to-Text is enabled, the package automatically registers the following API routes:

| Method | Route | Description |
| --- | --- | --- |
| POST | /{prefix}/presigned-url | Generates a presigned S3 upload URL for the browser to upload the audio file directly. |
| POST | /{prefix}/transcribe | Triggers transcription of the uploaded audio file using OpenAI's Whisper API. Returns the transcribed text. |

The {prefix} is defined by the route-prefix config value (default: basic-components/speech-to-text). These routes are only registered when BC_SPEECH_TO_TEXT_ENABLED=true.

Post-Processing with AI (stt-process)

Add the stt-process attribute to send the raw transcript through a Livewire method before it is inserted. This lets you clean up, reformat, or completely transform the dictated text — for example using an AI model.

When stt-process is set, the component calls processSpeechToText(string $transcript, string $model) on your Livewire component after transcription but before the text is inserted. If the call fails or returns an empty result, the raw transcript is used as a fallback.

<x-textarea
    wire:model="notes"
    label="Meeting Notes"
    speech-to-text
    stt-process
    rows="6"
    placeholder="Dictate your notes — they will be processed before insertion…"
/>
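The insert-or-fallback rule can be captured in a few lines. In this sketch, `process` is a hypothetical stand-in for the round trip to your Livewire component's `processSpeechToText()` method, not part of the package's API:

```javascript
// Sketch of the fallback rule described above: prefer the post-processed
// text, but fall back to the raw transcript when processing fails or
// returns an empty result. `process` is a hypothetical stand-in for the
// round trip to the Livewire component's processSpeechToText() method.
function resolveFinalText(transcript, process) {
  try {
    const processed = process(transcript);
    if (typeof processed === 'string' && processed.trim() !== '') {
      return processed; // use the post-processed text
    }
  } catch (e) {
    // processing failed; fall through to the raw transcript
  }
  return transcript; // fallback: insert the raw transcript unchanged
}
```

Either way, the user always gets some text inserted, so a failing post-processing step never loses a dictation.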

Implementing the Trait

To handle post-processing, use the WithSpeechToTextProcessing trait in your Livewire component and override the processSpeechToText() method. The trait provides a default implementation that simply returns the transcript unchanged.

use Livewire\Component;
use MaskowLabs\BasicComponents\Traits\WithSpeechToTextProcessing;
use OpenAI\Laravel\Facades\OpenAI;

class MeetingNotes extends Component
{
    use WithSpeechToTextProcessing;

    public string $notes = '';

    /**
     * Process the raw transcript before it is inserted into the field.
     * The $model parameter tells you which field triggered the transcription.
     */
    public function processSpeechToText(string $transcript, string $model): string
    {
        // Example: call OpenAI to clean up the transcript
        $response = OpenAI::chat()->create([
            'model' => 'gpt-4o-mini',
            'messages' => [
                ['role' => 'system', 'content' => 'Clean up the following transcript. Fix grammar, remove filler words, and structure it into clear paragraphs. Return only the cleaned text.'],
                ['role' => 'user', 'content' => $transcript],
            ],
        ]);

        return $response->choices[0]->message->content;
    }

    public function render()
    {
        return view('livewire.meeting-notes');
    }
}

Per-Field Processing

When your form has multiple STT fields, use the $model parameter to apply different processing logic per field. For example, you could title-case a subject line but use AI formatting for a message body:

use Illuminate\Support\Str;
use Livewire\Component;
use MaskowLabs\BasicComponents\Traits\WithSpeechToTextProcessing;

class ContactForm extends Component
{
    use WithSpeechToTextProcessing;

    public string $subject = '';
    public string $message = '';

    /**
     * Process the transcript differently depending on which field was dictated.
     */
    public function processSpeechToText(string $transcript, string $model): string
    {
        return match ($model) {
            'subject' => Str::limit(Str::title($transcript), 80),
            'message' => $this->formatWithAI($transcript),
            default => $transcript,
        };
    }

    private function formatWithAI(string $text): string
    {
        // Your AI formatting logic here…
        return $text;
    }

    public function render()
    {
        return view('livewire.contact-form');
    }
}

Per-Field Processing — Blade

The corresponding Blade view — both fields use stt-process, but the Livewire component decides how to process each one based on the wire:model value:

<x-input
    wire:model="subject"
    label="Subject"
    speech-to-text
    stt-process
    placeholder="Dictate a subject line…"
/>

<x-textarea
    wire:model="message"
    label="Message"
    speech-to-text
    stt-process
    rows="8"
    placeholder="Dictate your message…"
/>

How It Works

Here's a step-by-step overview of the complete Speech-to-Text flow:

  1. The user hovers over an input or textarea with the speech-to-text attribute — a microphone icon appears.
  2. The user clicks the microphone to start recording. The browser's MediaRecorder API captures the audio.
  3. The user clicks again to stop recording (or it stops automatically after max-recording-duration).
  4. The browser requests a presigned S3 upload URL from the server (POST /{prefix}/presigned-url).
  5. The audio file is uploaded directly to S3 from the browser, bypassing server upload size limits.
  6. The browser sends a transcription request to the server (POST /{prefix}/transcribe).
  7. The server downloads the audio from S3, sends it to OpenAI's Whisper API, and returns the transcribed text.
  8. The temporary audio file is deleted from S3 immediately after transcription.
  9. If stt-process is enabled, the transcript is sent to the Livewire component's processSpeechToText() method for post-processing (e.g. AI cleanup, reformatting). During this step, the spinner remains visible with an isPostProcessing state.
  10. The final text (processed or raw) is inserted into the input/textarea field at the current cursor position.
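The browser-side half of this flow (steps 4 to 7) can be sketched as follows. The request and response field names (`uploadUrl`, `key`, `text`) are assumptions for illustration only; the package's actual payloads may differ. The `http` helper is injected so the sketch stays self-contained and testable without a server:

```javascript
// Sketch of the browser-side flow (steps 4–7 above). The payload shapes
// (`uploadUrl`, `key`, `text`) are assumed for illustration; the package's
// real request/response formats may differ. `http(method, url, body)` is
// an injected helper standing in for fetch().
async function transcribeRecording(audioBlob, { prefix, http }) {
  // Step 4: ask the server for a presigned S3 upload URL.
  const { uploadUrl, key } = await http('POST', `/${prefix}/presigned-url`);

  // Step 5: upload the audio directly to S3, bypassing the app server.
  await http('PUT', uploadUrl, audioBlob);

  // Steps 6–7: ask the server to transcribe the uploaded file
  // and return the text to insert into the field.
  const { text } = await http('POST', `/${prefix}/transcribe`, { key });
  return text;
}
```

Because `http` is injected, the same orchestration can be exercised with stubbed responses; in the real component it would wrap `fetch()` with the CSRF token required by the web middleware.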