Streaming TTS API
This page documents Kotoba's streaming Text-to-Speech (TTS) endpoints.
Authentication
All requests require an API key.
- HTTP: send the key as a bearer token in the request header:
  Authorization: Bearer <KOTOBA_API_KEY>
- Do not embed your API key in a browser or mobile app bundle, where end users can read it.
Errors & error codes
Kotoba APIs return errors as a structured JSON body (see the Errors schema page).
For the SSE streaming endpoint, fatal errors are additionally emitted inside the stream as:
event: error
Common error codes you may see for TTS:
- invalid_api_key (HTTP 401): missing or invalid API key
- rate_limit_exceeded / too_many_concurrent_requests (HTTP 429): request throttled or concurrency limit reached
- quota_exceeded: insufficient credits
- invalid_parameters / bad_request (HTTP 400): invalid request
- text_string_required (HTTP 422): empty or invalid input
- max_character_length_exceeded: input too long
- voice_not_found: unknown voice
- audio_format_not_found: invalid response_format
- language_not_found: invalid language
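These codes fall into two groups: HTTP 429 throttling errors, which may succeed if the same request is retried later, and errors that keep failing until the request, key, or account changes. The sketch below encodes that split. The helper names, the choice to retry only the 429 codes, and the backoff parameters are assumptions for illustration, not part of the Kotoba API.

```python
# Hypothetical client-side helper: classify the TTS error codes listed above.
# Assumption: only the HTTP 429 codes are retried, using exponential backoff.
from typing import Dict, Optional

# HTTP status per documented code; None where this page does not state one.
ERROR_STATUS: Dict[str, Optional[int]] = {
    "invalid_api_key": 401,
    "rate_limit_exceeded": 429,
    "too_many_concurrent_requests": 429,
    "quota_exceeded": None,
    "invalid_parameters": 400,
    "bad_request": 400,
    "text_string_required": 422,
    "max_character_length_exceeded": None,
    "voice_not_found": None,
    "audio_format_not_found": None,
    "language_not_found": None,
}

# Throttling errors: waiting may be enough for the same request to succeed.
RETRYABLE_CODES = {"rate_limit_exceeded", "too_many_concurrent_requests"}


def is_retryable(code: str) -> bool:
    """Return True if the request may succeed unchanged after waiting."""
    return code in RETRYABLE_CODES


def backoff_seconds(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Exponential backoff delay for retry number `attempt` (0-based), capped."""
    return min(cap, base * (2 ** attempt))
```

Errors such as quota_exceeded or voice_not_found are deliberately not retried: resending the same request cannot fix missing credits or an unknown voice id.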
Streaming TTS over SSE (recommended)
Endpoint
POST https://api.kotobatech.ai/v1/tts/sse
Request body (JSON)
On the server side, the request is modeled as TTSSSERequest / TTSRequest.
{
  "input": "Hello! This is a streaming TTS test.",
  "voice": "ja-man-1",
  "response_format": "pcm16",
  "language": "en"
}
- input: text to synthesize
- voice: preset voice id (default: ja-man-1)
- response_format: pcm16 | float32 | mulaw
- language: ja | en
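Checking these fields before sending avoids a round trip that would end in text_string_required, audio_format_not_found, or language_not_found. A minimal Python sketch, assuming the allowed values listed above are exhaustive; build_tts_request is a hypothetical helper, and only the voice default comes from this page.

```python
import json

# Allowed values as listed in the request field descriptions above.
RESPONSE_FORMATS = {"pcm16", "float32", "mulaw"}
LANGUAGES = {"ja", "en"}
DEFAULT_VOICE = "ja-man-1"  # documented default voice


def build_tts_request(text: str, response_format: str, language: str,
                      voice: str = DEFAULT_VOICE) -> str:
    """Return the JSON body for POST /v1/tts/sse, validated client-side."""
    if not text.strip():
        # The server would reject this with text_string_required (HTTP 422).
        raise ValueError("input must be a non-empty string")
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language}")
    body = {
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "language": language,
    }
    # ensure_ascii=False keeps Japanese text readable; encode as UTF-8 to send.
    return json.dumps(body, ensure_ascii=False)
```

Voice ids are not validated here because this page does not list the full set of presets.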
Response (Server-Sent Events)
The response has content type text/event-stream and carries three event types:
- event: delta (an audio chunk, base64-encoded)
- event: done (the stream has completed)
- event: error (an error message)
Example SSE payloads:
event: delta
data: {"type":"speech.audio.delta","item_id":"...","delta":"<base64 audio bytes>"}

event: done
data: {"type":"speech.audio.done","item_id":"..."}
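A client has to group raw lines into events (a blank line ends each event), then base64-decode every delta and append the bytes in order. The sketch below does this with only the standard library. The function names are hypothetical, and treating a non-JSON data payload (for example a plain-text error message) as {"message": ...} is an assumption, since this page does not show the error payload.

```python
import base64
import json
from typing import Iterable, Iterator, Tuple


def _decode_data(text: str) -> dict:
    """Parse a data payload; wrap non-JSON or non-object payloads (assumption)."""
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        value = text
    return value if isinstance(value, dict) else {"message": value}


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Group raw SSE lines into (event name, parsed data) pairs."""
    event, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield event, _decode_data("\n".join(data_lines))
            event, data_lines = "message", []
        elif line.startswith(":"):
            continue  # SSE comment / keep-alive line
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # SSE strips one space after the colon
            if field == "event":
                event = value
            elif field == "data":
                data_lines.append(value)
    if data_lines:  # stream ended without a trailing blank line
        yield event, _decode_data("\n".join(data_lines))


def collect_audio(lines: Iterable[str]) -> bytes:
    """Decode all delta chunks in order and return the concatenated audio bytes."""
    audio = bytearray()
    for event, data in iter_sse_events(lines):
        if event == "delta":
            audio += base64.b64decode(data["delta"])
        elif event == "error":
            raise RuntimeError(f"TTS stream error: {data}")
        elif event == "done":
            break
    return bytes(audio)
```

Concatenating decoded bytes before interpreting them as samples is deliberate: nothing on this page guarantees that a chunk boundary falls on a sample boundary.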
Curl example
curl -N "https://api.kotobatech.ai/v1/tts/sse" \
-H "Authorization: Bearer $KOTOBA_API_KEY" \
-H "Content-Type: application/json" \
-d '{
  "input": "Hello! This is a streaming TTS test.",
  "voice": "ja-man-1",
  "response_format": "pcm16",
  "language": "en"
}'
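The same request can be sent from a backend with Python's standard library. This is a sketch under stated assumptions: make_sse_request and stream_sse_lines are hypothetical helpers, the key is read from a KOTOBA_API_KEY environment variable (mirroring $KOTOBA_API_KEY in the curl call), and no retry or timeout handling is included.

```python
import json
import os
import urllib.request

SSE_URL = "https://api.kotobatech.ai/v1/tts/sse"


def make_sse_request(body: dict, api_key: str) -> urllib.request.Request:
    """Build the same POST that the curl example sends."""
    return urllib.request.Request(
        SSE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def stream_sse_lines(body: dict):
    """Yield decoded SSE lines as they arrive (run server-side only)."""
    api_key = os.environ["KOTOBA_API_KEY"]  # keep the key out of client bundles
    req = make_sse_request(body, api_key)
    with urllib.request.urlopen(req) as resp:
        for raw in resp:  # HTTPResponse iterates line by line, like curl -N
            yield raw.decode("utf-8").rstrip("\r\n")
```

The yielded values are raw SSE lines (event:, data:, and blank separators) that still need to be grouped into events and base64-decoded. Non-2xx responses raise urllib.error.HTTPError before any line is yielded; its read() method returns the structured error body.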
Realtime TTS over WebSocket (experimental)
Endpoint
wss://api.kotobatech.ai/v1/realtime_tts
Flow
- Connect to the WebSocket with API key auth.
- Wait for the server event tts_session.created.
- Send tts_session.update once to start synthesis.
- Receive tts.output_audio.delta events (base64 audio) until tts.output.completed.
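The flow above amounts to a small client-side state machine: reply to tts_session.created with a single update, collect deltas, and stop at completion. The sketch below separates that logic from the transport so it works with any WebSocket library. The class name is hypothetical, and it assumes server messages are JSON objects with a top-level type field and that delta events carry a top-level delta string; these shapes are inferred from the client message and event list on this page.

```python
import base64
import json
from typing import Optional


class RealtimeTTSClient:
    """Client-side state for the realtime TTS flow (transport not included)."""

    def __init__(self, session: dict):
        self.session = session      # the "session" object for tts_session.update
        self.sent_update = False
        self.completed = False
        self.audio = bytearray()

    def on_server_event(self, message: str) -> Optional[str]:
        """Handle one server text frame; return a frame to send back, if any."""
        event = json.loads(message)
        kind = event.get("type")
        if kind == "tts_session.created" and not self.sent_update:
            self.sent_update = True  # the update is sent exactly once
            return json.dumps({"type": "tts_session.update", "session": self.session})
        if kind == "tts.output_audio.delta":
            self.audio += base64.b64decode(event["delta"])
        elif kind == "tts.output.completed":
            self.completed = True
        # Other events (e.g. tts_session.updated) need no reply.
        return None
```

In use, call on_server_event for each received text frame, send any returned string over the socket, and close the connection once completed is True.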
Client → Server: tts_session.update
{
  "type": "tts_session.update",
  "session": {
    "input_text": "Hello from realtime TTS.",
    "preset_voice": "ja_man_1",
    "output_audio_format": "pcm16",
    "output_audio_sample_rate": 24000,
    "output_audio_number_of_channels": 1,
    "language": "en"
  }
}
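Building this message in code keeps the nested session object consistent. A minimal sketch: build_session_update is a hypothetical helper whose defaults are simply the values from the example above, not documented server defaults, and the accepted ranges for sample rate and channel count are not stated on this page, so only basic sanity checks are applied.

```python
import json


def build_session_update(text: str, voice: str = "ja_man_1", audio_format: str = "pcm16",
                         sample_rate: int = 24000, channels: int = 1,
                         language: str = "en") -> str:
    """Serialize a tts_session.update message with the fields shown above."""
    if not text.strip():
        raise ValueError("input_text must be non-empty")
    if sample_rate <= 0 or channels <= 0:
        raise ValueError("sample rate and channel count must be positive")
    message = {
        "type": "tts_session.update",
        "session": {
            "input_text": text,
            "preset_voice": voice,
            "output_audio_format": audio_format,
            "output_audio_sample_rate": sample_rate,
            "output_audio_number_of_channels": channels,
            "language": language,
        },
    }
    return json.dumps(message, ensure_ascii=False)
```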
Server → Client events
- tts_session.created
- tts_session.updated
- tts.output_audio.delta (contains delta: base64 audio bytes)
- tts.output.completed
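Once all deltas are concatenated, pcm16 audio can be wrapped in a WAV container for playback, and its length follows from duration = bytes / (sample_rate × channels × 2). The sketch below uses Python's wave module and defaults to the 24000 Hz mono settings from the example session. It assumes pcm16 means little-endian signed 16-bit samples; for the SSE endpoint, whose sample rate this page does not state, the rate is a further assumption. float32 and mulaw output are not handled.

```python
import io
import wave


def pcm16_to_wav(pcm: bytes, sample_rate: int = 24000, channels: int = 1) -> bytes:
    """Wrap raw pcm16 audio in a WAV container (assumes little-endian samples)."""
    frame_size = 2 * channels  # 2 bytes per 16-bit sample, per channel
    if len(pcm) % frame_size:
        raise ValueError("pcm16 data is not a whole number of frames")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


def pcm16_duration_seconds(num_bytes: int, sample_rate: int = 24000,
                           channels: int = 1) -> float:
    """Playback length in seconds: bytes / (sample_rate * channels * 2)."""
    return num_bytes / (sample_rate * channels * 2)
```

At 24000 Hz mono, one second of pcm16 audio is 24000 × 1 × 2 = 48000 bytes.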