
Text to speech

  • Endpoint: /audio/speech
  • Main request parameters:
    • model: the model used for speech synthesis; see the supported model list.
    • input: the text content to be converted into audio.
    • voice: the reference voice; system preset voices, user preset voices, and user dynamic voices are supported.
```bash
curl https://api.ephone.ai/v1/audio/speech \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini-tts",
    "input": "The quick brown fox jumped over the lazy dog.",
    "voice": "alloy"
  }' \
  --output speech.mp3
```
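If the relay mirrors OpenAI's Audio API, the optional `response_format` parameter should let you request other container formats than the default MP3 (this is assumed, not confirmed by this page). The sketch below is guarded so the request is only sent when `API_KEY` is set:

```shell
# Request WAV output instead of the default MP3.
# response_format is an OpenAI Audio API parameter; support by this relay is assumed.
if [ -n "${API_KEY:-}" ]; then
  curl https://api.ephone.ai/v1/audio/speech \
    -H "Authorization: Bearer $API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "gpt-4o-mini-tts",
      "input": "The quick brown fox jumped over the lazy dog.",
      "voice": "alloy",
      "response_format": "wav"
    }' \
    --output speech.wav
  STATUS="requested"
else
  STATUS="skipped: API_KEY not set"
fi
echo "$STATUS"
```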

Speech to text

  • Endpoint: /audio/transcriptions
  • Content-Type: multipart/form-data
  • Main request parameters:
    • model: the model used for speech-to-text; see the supported model list.
    • file: the audio file to be transcribed.
```bash
curl https://api.ephone.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/file/audio.mp3" \
  -F model="gpt-4o-transcribe"
```
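The transcription endpoint returns JSON. Assuming the response shape mirrors OpenAI's (a top-level `text` field; not confirmed by this page), a minimal shell sketch for extracting it looks like this, shown here against a sample response:

```shell
# Sample response body (shape assumed to mirror OpenAI's transcription API).
RESPONSE='{"text": "The quick brown fox jumped over the lazy dog."}'
# Extract the "text" field with sed; in practice, prefer a JSON tool such as jq.
TEXT=$(printf '%s' "$RESPONSE" | sed -n 's/.*"text": *"\([^"]*\)".*/\1/p')
echo "$TEXT"
```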

Speech to speech

This scenario is currently supported only by Elevenlabs models; please refer to the corresponding documentation.

Things to note

  1. When using, set `OPENAI_BASE_URL` to `https://api.ephone.ai/v1`.
  2. Set `OPENAI_API_KEY` to your API Key.
  3. Most models have been adapted to the OpenAI-compatible interface; some have not. Please refer to the model documentation.
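Points 1 and 2 above can be applied in a shell before running any OpenAI-compatible client (the `sk-...` value is a placeholder for your own key):

```shell
# Point an OpenAI-compatible client at the relay (values from the notes above).
export OPENAI_BASE_URL="https://api.ephone.ai/v1"
export OPENAI_API_KEY="sk-..."   # placeholder; replace with your actual API Key
echo "$OPENAI_BASE_URL"
```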

OpenAI official documentation

Click to view OpenAI official documentation