
Text to speech

  • Endpoint: /audio/speech
  • Main request parameters:
    • model: the model used for speech synthesis (see the supported model list).
    • input: text content to be converted into audio.
    • voice: reference voice, supports system preset voices, user preset voices, and user dynamic voices.
```bash
curl https://BASE_URL/v1/audio/speech \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini-tts",
    "input": "The quick brown fox jumped over the lazy dog.",
    "voice": "alloy"
  }' \
  --output speech.mp3
```
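The same request can be made from Python. This is a minimal stdlib-only sketch mirroring the curl example above; `BASE_URL`, `API_KEY`, and the output filename are placeholders, and the model/voice values are the ones shown in the example.

```python
import json
import urllib.request


def build_speech_request(base_url: str, api_key: str, text: str,
                         model: str = "gpt-4o-mini-tts",
                         voice: str = "alloy") -> urllib.request.Request:
    """Assemble the POST request for the /audio/speech endpoint."""
    payload = json.dumps({"model": model, "input": text, "voice": voice})
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=payload.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(base_url: str, api_key: str, text: str,
               out_path: str = "speech.mp3") -> None:
    """Send the request and write the returned audio bytes to disk."""
    req = build_speech_request(base_url, api_key, text)
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
```

Splitting request construction from sending keeps the payload easy to inspect before any network call is made.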

Speech to text

  • Endpoint: /audio/transcriptions
  • Content-Type: multipart/form-data
  • Main request parameters:
    • model: the model used for speech-to-text (see the supported model list).
    • file: audio file to be converted to text.
```bash
curl https://BASE_URL/v1/audio/transcriptions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/file/audio.mp3" \
  -F model="gpt-4o-transcribe"
```
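Because this endpoint takes multipart/form-data rather than JSON, a client has to build a multipart body. A stdlib-only Python sketch of the curl call above (`BASE_URL`, `API_KEY`, and the audio path are placeholders; the boundary string is arbitrary):

```python
import uuid
import urllib.request


def build_transcription_request(base_url: str, api_key: str,
                                audio_path: str,
                                model: str = "gpt-4o-transcribe"
                                ) -> urllib.request.Request:
    """Assemble a multipart/form-data POST for /audio/transcriptions."""
    boundary = uuid.uuid4().hex
    with open(audio_path, "rb") as f:
        audio = f.read()

    parts = [
        # Plain text field: the model name.
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="model"\r\n\r\n'
         f"{model}\r\n").encode("utf-8"),
        # Binary field: the audio file itself.
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; '
         f'filename="{audio_path}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode("utf-8")
        + audio + b"\r\n",
        # Closing boundary.
        f"--{boundary}--\r\n".encode("utf-8"),
    ]
    return urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=b"".join(parts),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

In practice a library such as `requests` (with its `files=` parameter) handles the multipart encoding for you; the sketch just makes explicit what curl's `-F` flags produce.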

Speech to speech

This scenario is currently supported only by Elevenlabs models; please refer to the corresponding model documentation.

Things to note

  1. When using the API, set `OPENAI_BASE_URL` to `https://BASE_URL/v1`.
  2. Set `OPENAI_API_KEY` to your API key.
  3. Most models have been adapted to the OpenAI-compatible interface; some have not. Please refer to the model documentation.
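A client can pick up both of the environment variables described above at startup. A minimal Python sketch (the fallback URL is the placeholder from these docs; `load_openai_config` is an illustrative helper name, not part of any SDK):

```python
import os


def load_openai_config() -> tuple:
    """Read the gateway base URL and API key from the environment."""
    # OPENAI_BASE_URL falls back to the docs' placeholder if unset.
    base_url = os.environ.get("OPENAI_BASE_URL", "https://BASE_URL/v1")
    api_key = os.environ.get("OPENAI_API_KEY", "")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return base_url, api_key
```

Official OpenAI SDKs read these same two variables automatically, which is why pointing `OPENAI_BASE_URL` at the gateway is usually the only change an existing integration needs.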