
Streaming TTS API

This page documents Kotoba's streaming Text-to-Speech (TTS) endpoints: an SSE endpoint over HTTP and an experimental realtime endpoint over WebSocket.

Authentication

All requests require an API key.

  • HTTP header: Authorization: Bearer <KOTOBA_API_KEY>
  • Do not embed your API key in a browser or mobile app bundle.
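
As a minimal sketch of keeping the key out of client bundles, a server-side process can read it from an environment variable and build the header at request time. The helper name auth_headers is hypothetical; the variable name KOTOBA_API_KEY matches the one used in the curl example on this page.

```python
import os


def auth_headers(env=os.environ):
    """Build HTTP headers from the KOTOBA_API_KEY environment variable.

    Hypothetical helper: the key is read at call time so it never has to
    be written into source code or a shipped client bundle.
    """
    key = env.get("KOTOBA_API_KEY", "").strip()
    if not key:
        raise RuntimeError("KOTOBA_API_KEY is not set")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Passing a plain dict as env makes the helper easy to exercise without touching the real environment.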

Errors & error codes

Kotoba APIs return errors as structured JSON (see the Errors schema page in the sidebar).

For streaming SSE, fatal errors are also emitted as:

  • event: error

Common error codes you may see for TTS:

  • invalid_api_key (HTTP 401): missing/invalid API key
  • rate_limit_exceeded / too_many_concurrent_requests (HTTP 429): throttled or concurrency limit reached
  • quota_exceeded: insufficient credits
  • invalid_parameters / bad_request (HTTP 400): invalid request
  • text_string_required (HTTP 422): empty/invalid input
  • max_character_length_exceeded: input too long
  • voice_not_found: unknown voice
  • audio_format_not_found: invalid response_format
  • language_not_found: invalid language
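
A client may want to treat the throttling codes differently from the others. The sketch below is one possible policy, not something this API prescribes: it assumes that only the HTTP 429 codes listed above are worth retrying, and it uses capped exponential backoff. The function names and the delay values are illustrative assumptions.

```python
# Codes documented above with HTTP 429. Treating only these as retryable
# is an assumption of this sketch, not a documented guarantee.
RETRYABLE_CODES = {"rate_limit_exceeded", "too_many_concurrent_requests"}


def is_retryable(status, code):
    """Return True if the request may succeed when sent again later."""
    return status == 429 or code in RETRYABLE_CODES


def backoff_delays(attempts, base=0.5, cap=8.0):
    """Return capped exponential delays in seconds, one per retry attempt."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]
```

Errors such as invalid_api_key or voice_not_found describe the request itself, so resending the same request unchanged is unlikely to help.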

Endpoint

POST https://api.kotobatech.ai/v1/tts/sse

Request body (JSON)

The request body corresponds to the server-side models TTSSSERequest / TTSRequest in this repo.

{
  "input": "Hello! This is a streaming TTS test.",
  "voice": "ja-man-1",
  "response_format": "pcm16",
  "language": "en"
}

  • input: text to synthesize
  • voice: preset voice id (default: ja-man-1)
  • response_format: pcm16 | float32 | mulaw
  • language: ja | en
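
The field list above can be checked on the client before a request is sent. This is a minimal sketch: build_tts_body is a hypothetical helper, the allowed values are copied from the list above, and only voice has a default because it is the only default this page documents.

```python
import json

# Allowed values copied from the field list above.
RESPONSE_FORMATS = {"pcm16", "float32", "mulaw"}
LANGUAGES = {"ja", "en"}


def build_tts_body(text, *, response_format, language, voice="ja-man-1"):
    """Validate the documented fields and return the JSON request body."""
    if not isinstance(text, str) or not text.strip():
        raise ValueError("input must be a non-empty string")
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    body = {
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "language": language,
    }
    # ensure_ascii=False keeps Japanese text readable in the payload.
    return json.dumps(body, ensure_ascii=False)
```

Client-side checks like these only catch obvious mistakes early; the server remains the authority on what is valid (for example, which voice ids exist).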

Response (Server-Sent Events)

The response is text/event-stream.

  • event: delta carries a base64-encoded audio chunk
  • event: done signals that the stream has completed
  • event: error carries an error message

Example SSE payloads:

event: delta
data: {"type":"speech.audio.delta","item_id":"...","delta":"<base64 audio bytes>"}

event: done
data: {"type":"speech.audio.done","item_id":"..."}
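
The payloads above can be consumed with a small line-based parser. The sketch below is a simplified reading of the SSE wire format (it handles only event and data fields) and assumes the audio bytes live in the delta field, as in the example payload. The function names are hypothetical.

```python
import base64
import json


def iter_sse_events(lines):
    """Yield (event, data) pairs from an iterable of decoded SSE lines."""
    event, data = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if event is not None or data:
                yield event or "message", "\n".join(data)
            event, data = None, []
        elif line.startswith(":"):
            continue  # comment / keep-alive line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data.append(value[1:] if value.startswith(" ") else value)
    if event is not None or data:
        yield event or "message", "\n".join(data)


def collect_audio(lines):
    """Concatenate decoded audio from delta events until done arrives."""
    audio = bytearray()
    for event, data in iter_sse_events(lines):
        if event == "delta":
            audio += base64.b64decode(json.loads(data)["delta"])
        elif event == "done":
            break
        elif event == "error":
            raise RuntimeError(f"TTS stream failed: {data}")
    return bytes(audio)
```

For playback with low latency, a real client would hand each decoded chunk to the audio device as it arrives rather than buffering the whole stream as collect_audio does.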

Curl example

curl -N "https://api.kotobatech.ai/v1/tts/sse" \
  -H "Authorization: Bearer $KOTOBA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Hello! This is a streaming TTS test.",
    "voice": "ja-man-1",
    "response_format": "pcm16",
    "language": "en"
  }'
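
For readers who prefer Python, the same request can be sketched with the standard library alone. This mirrors the curl call above and has not been verified against the live service; make_request and stream_lines are hypothetical helper names, and the 30 second timeout is an arbitrary choice.

```python
import json
import urllib.request

SSE_URL = "https://api.kotobatech.ai/v1/tts/sse"


def make_request(api_key, body):
    """Build the POST request that the curl example sends."""
    return urllib.request.Request(
        SSE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def stream_lines(request, timeout=30):
    """Open the request and yield response lines as they arrive."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        for raw in resp:  # the response object iterates line by line
            yield raw.decode("utf-8")
```

The yielded lines are raw SSE text (event: and data: lines separated by blank lines), so they still need to be parsed and base64-decoded by the caller.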

Realtime TTS over WebSocket (experimental)

Endpoint

wss://api.kotobatech.ai/v1/realtime_tts

Flow

  1. Connect to the WebSocket with API key auth.
  2. Wait for the server event tts_session.created.
  3. Send tts_session.update once to start synthesis.
  4. Receive tts.output_audio.delta events (base64 audio) until tts.output.completed.
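
The four steps above can be sketched independently of any particular WebSocket library by passing in two callables: recv returns the next server message as JSON text, and send transmits one client message. The sketch assumes every server message is a JSON object with a type field and that audio events carry their base64 payload in a delta field; neither shape is spelled out on this page, so treat both as assumptions.

```python
import base64
import json


def run_tts_session(recv, send, session):
    """Drive one synthesis: wait for created, send update, gather audio."""
    first = json.loads(recv())
    if first.get("type") != "tts_session.created":
        raise RuntimeError(f"unexpected first event: {first.get('type')!r}")

    # Step 3: tts_session.update is sent exactly once.
    send(json.dumps({"type": "tts_session.update", "session": session}))

    audio = bytearray()
    while True:
        msg = json.loads(recv())
        kind = msg.get("type")
        if kind == "tts.output_audio.delta":
            audio += base64.b64decode(msg["delta"])  # assumed field name
        elif kind == "tts.output.completed":
            return bytes(audio)
        # Other events, such as tts_session.updated, are ignored here.
```

With a real WebSocket client, recv and send would be thin wrappers around the connection's receive and send methods.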

Client → Server: tts_session.update

{
  "type": "tts_session.update",
  "session": {
    "input_text": "Hello from realtime TTS.",
    "preset_voice": "ja_man_1",
    "output_audio_format": "pcm16",
    "output_audio_sample_rate": 24000,
    "output_audio_number_of_channels": 1,
    "language": "en"
  }
}
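
The audio settings in this message determine how many bytes correspond to one second of sound. As a worked example, assuming pcm16 means 2 bytes per sample, 24000 Hz mono audio amounts to 2 × 1 × 24000 = 48000 bytes per second. The helper name below is hypothetical.

```python
def pcm16_duration_seconds(num_bytes, sample_rate=24000, channels=1):
    """Return the playback duration of raw PCM16 audio in seconds.

    Assumes 2 bytes per sample; defaults mirror the session values above.
    """
    bytes_per_frame = 2 * channels
    return num_bytes / (bytes_per_frame * sample_rate)
```

This is handy for estimating how much audio has been received so far from the total size of the decoded chunks.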

Server → Client events

  • tts_session.created
  • tts_session.updated
  • tts.output_audio.delta (contains base64-encoded audio bytes in delta)
  • tts.output.completed
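
Once tts.output.completed arrives, the decoded audio can be wrapped in a WAV container for playback or inspection. The sketch below uses Python's standard wave module and assumes the pcm16 output is signed 16-bit little-endian PCM, which this page does not state explicitly; the defaults mirror the sample rate and channel count from the tts_session.update example. The function name is hypothetical.

```python
import io
import wave


def pcm16_to_wav(pcm, sample_rate=24000, channels=1):
    """Wrap raw PCM16 bytes in a WAV container and return the file bytes."""
    if len(pcm) % (2 * channels) != 0:
        raise ValueError("PCM16 data must contain whole frames")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

Writing the returned bytes to a file with a .wav extension yields something most audio players can open.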