
Text to speech

  • Endpoint: /audio/speech
  • Main request parameters:
    • model: the model used for speech synthesis (see the supported model list).
    • input: text content to be converted into audio.
    • voice: reference voice, supports system preset voices, user preset voices, and user dynamic voices.
```bash
curl https://BASE_URL/v1/audio/speech \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini-tts",
    "input": "The quick brown fox jumped over the lazy dog.",
    "voice": "alloy"
  }' \
  --output speech.mp3
```
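The same request can be made from Python. This is a minimal stdlib-only sketch mirroring the curl example above; `BASE_URL`, `API_KEY`, and the output filename are placeholders, and the model/voice values are the ones shown in the example.

```python
import json
import urllib.request


def build_speech_request(base_url: str, api_key: str, text: str,
                         model: str = "gpt-4o-mini-tts",
                         voice: str = "alloy") -> urllib.request.Request:
    """Assemble the POST request for the /audio/speech endpoint."""
    payload = json.dumps({"model": model, "input": text, "voice": voice})
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=payload.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(base_url: str, api_key: str, text: str,
               out_path: str = "speech.mp3") -> None:
    """Send the request and write the returned audio bytes to disk."""
    req = build_speech_request(base_url, api_key, text)
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
```

Splitting request construction from sending keeps the payload easy to inspect before any network call is made.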

Speech to text

  • Endpoint: /audio/transcriptions
  • Content-Type: multipart/form-data
  • Main request parameters:
    • model: the model used for speech-to-text (see the supported model list).
    • file: audio file to be converted to text.
```bash
curl https://BASE_URL/v1/audio/transcriptions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/file/audio.mp3" \
  -F model="gpt-4o-transcribe"
```
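Because this endpoint takes multipart/form-data rather than JSON, a client has to build a multipart body. A stdlib-only Python sketch of the curl call above (`BASE_URL`, `API_KEY`, and the audio path are placeholders; the boundary string is arbitrary):

```python
import uuid
import urllib.request


def build_transcription_request(base_url: str, api_key: str,
                                audio_path: str,
                                model: str = "gpt-4o-transcribe"
                                ) -> urllib.request.Request:
    """Assemble a multipart/form-data POST for /audio/transcriptions."""
    boundary = uuid.uuid4().hex
    with open(audio_path, "rb") as f:
        audio = f.read()

    parts = [
        # Plain text field: the model name.
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="model"\r\n\r\n'
         f"{model}\r\n").encode("utf-8"),
        # Binary field: the audio file itself.
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; '
         f'filename="{audio_path}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode("utf-8")
        + audio + b"\r\n",
        # Closing boundary.
        f"--{boundary}--\r\n".encode("utf-8"),
    ]
    return urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=b"".join(parts),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

In practice a library such as `requests` (with its `files=` parameter) handles the multipart encoding for you; the sketch just makes explicit what curl's `-F` flags produce.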

Speech to speech

This scenario is currently supported only by Elevenlabs models; please refer to the corresponding model documentation.

Things to note

  1. When using the API, set `OPENAI_BASE_URL` to `https://BASE_URL/v1`.
  2. Set `OPENAI_API_KEY` to your API key.
  3. Most models have been adapted to the OpenAI-compatible interface; some have not. Please refer to the model documentation.
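A client can pick up both of the environment variables described above at startup. A minimal Python sketch (the fallback URL is the placeholder from these docs; `load_openai_config` is an illustrative helper name, not part of any SDK):

```python
import os


def load_openai_config() -> tuple:
    """Read the gateway base URL and API key from the environment."""
    # OPENAI_BASE_URL falls back to the docs' placeholder if unset.
    base_url = os.environ.get("OPENAI_BASE_URL", "https://BASE_URL/v1")
    api_key = os.environ.get("OPENAI_API_KEY", "")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return base_url, api_key
```

Official OpenAI SDKs read these same two variables automatically, which is why pointing `OPENAI_BASE_URL` at the gateway is usually the only change an existing integration needs.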