API Documentation | Pomelo

Convert text to lifelike speech with ElevenLabs voice synthesis. The API generates speech in multiple languages and voices.

Text to Speech

POST /api/v1/elevenlabs

Convert text to speech using a variety of voices and settings.

Request Parameters

Parameter	Type	Required	Description
text	string	Yes	The text to convert to speech
voice_id	string	Yes	ID of the voice to use for synthesis
model_id	string	No	ID of the model to use (default: "eleven_monolingual_v1")
stability	float	No	Stability factor (0-1, default: 0.5)
similarity_boost	float	No	Similarity boost factor (0-1, default: 0.75)
style	float	No	Speaking style emphasis (0-1, default: 0)
output_format	string	No	Output audio format ("mp3" or "wav", default: "mp3")
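
The constraints in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the helper name build_tts_payload is hypothetical, and it assumes that optional fields may simply be omitted so the documented server-side defaults apply.

```python
def build_tts_payload(text, voice_id, *, model_id=None, stability=None,
                      similarity_boost=None, style=None, output_format=None):
    """Assemble a request body, omitting optional fields left as None.

    Hypothetical helper: assumes omitted fields fall back to the
    documented defaults on the server.
    """
    if not text:
        raise ValueError("text is required")
    if not voice_id:
        raise ValueError("voice_id is required")

    payload = {"text": text, "voice_id": voice_id}

    # The three float settings are documented as ranging from 0 to 1.
    for name, value in (("stability", stability),
                        ("similarity_boost", similarity_boost),
                        ("style", style)):
        if value is None:
            continue
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
        payload[name] = value

    # Only "mp3" and "wav" are listed as output formats.
    if output_format is not None:
        if output_format not in ("mp3", "wav"):
            raise ValueError('output_format must be "mp3" or "wav"')
        payload["output_format"] = output_format

    if model_id is not None:
        payload["model_id"] = model_id

    return payload


payload = build_tts_payload("Hello", "EXAVITQu4vr4xnSDxMaL", stability=0.6)
```

The resulting dictionary can be passed as the JSON body of the POST request.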

Example Request

# Python example
import requests

url = "https://api.pomeloapi.example.com/api/v1/elevenlabs"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}
data = {
    "text": "Welcome to our new AI platform. We're excited to have you here!",
    "voice_id": "EXAVITQu4vr4xnSDxMaL",
    "stability": 0.6,
    "similarity_boost": 0.8,
    "output_format": "mp3"
}

# json= serializes the body; the timeout keeps a stalled request from hanging
response = requests.post(url, headers=headers, json=data, timeout=60)

# Save the audio file, or report the failure
if response.status_code == 200:
    with open("output_speech.mp3", "wb") as f:
        f.write(response.content)
else:
    print(f"Request failed with status {response.status_code}: {response.text}")

Response

The API returns the raw audio bytes in the requested format as the response body, with a matching Content-Type header (e.g., "audio/mpeg" for MP3).

Response Headers

Content-Type: audio/mpeg

Content-Length: 58372
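
Because the body is raw audio, the Content-Type header is the natural place to decide which file extension to save under. A minimal sketch follows; the helper name extension_for_content_type is hypothetical, and the WAV media types ("audio/wav", "audio/x-wav") are assumptions, since only "audio/mpeg" is stated above.

```python
def extension_for_content_type(content_type, default=".bin"):
    """Map a Content-Type header value to a file extension."""
    # Drop any parameters after ";" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    known = {
        "audio/mpeg": ".mp3",   # documented above for MP3
        "audio/wav": ".wav",    # assumption: common WAV media type
        "audio/x-wav": ".wav",  # assumption: legacy WAV media type
    }
    return known.get(media_type, default)


filename = "output_speech" + extension_for_content_type("audio/mpeg")
```

With a real response object, the header value would come from response.headers.get("Content-Type", "").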

Available Voices

GET /api/v1/elevenlabs/voices

Get a list of available voices for text-to-speech synthesis.

Example Request

# Python example
import requests

url = "https://api.pomeloapi.example.com/api/v1/elevenlabs/voices"
headers = {
    "Authorization": "Bearer YOUR_API_KEY"
}

response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()  # Stop on an HTTP error before parsing the body
print(response.json())

Response Format

{
  "voices": [
    {
      "voice_id": "EXAVITQu4vr4xnSDxMaL",
      "name": "Rachel",
      "category": "premade",
      "description": "A professional female voice with a neutral American accent",
      "preview_url": "https://storage.pomeloapi.example.com/audio/voice-samples/rachel.mp3",
      "gender": "female"
    },
    {
      "voice_id": "21m00Tcm4TlvDq8ikWAM",
      "name": "Michael",
      "category": "premade",
      "description": "A deep male voice with an American accent",
      "preview_url": "https://storage.pomeloapi.example.com/audio/voice-samples/michael.mp3",
      "gender": "male"
    },
    {
      "voice_id": "AZnzlk1XvdvUeBnXmlld",
      "name": "Emily",
      "category": "premade",
      "description": "A soft female voice with a British accent",
      "preview_url": "https://storage.pomeloapi.example.com/audio/voice-samples/emily.mp3",
      "gender": "female"
    }
    // Additional voices...
  ]
}
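
Since the text-to-speech endpoint needs a voice_id, a common next step is to pick one out of this list. The sketch below filters the parsed response by any of the fields shown above; the helper name find_voices is hypothetical, and the sample data is an abbreviated copy of the example response.

```python
def find_voices(voices_response, **criteria):
    """Return the voices whose fields match every criterion.

    String comparison is case-insensitive; a missing field never matches.
    """
    matches = []
    for voice in voices_response.get("voices", []):
        if all(str(voice.get(key, "")).lower() == str(value).lower()
               for key, value in criteria.items()):
            matches.append(voice)
    return matches


# Abbreviated copy of the example response above.
sample = {
    "voices": [
        {"voice_id": "EXAVITQu4vr4xnSDxMaL", "name": "Rachel",
         "category": "premade", "gender": "female"},
        {"voice_id": "21m00Tcm4TlvDq8ikWAM", "name": "Michael",
         "category": "premade", "gender": "male"},
        {"voice_id": "AZnzlk1XvdvUeBnXmlld", "name": "Emily",
         "category": "premade", "gender": "female"},
    ]
}

female_ids = [v["voice_id"] for v in find_voices(sample, gender="female")]
michael = find_voices(sample, name="michael")
```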

Best Practices

Tips for Better Speech Generation

Use punctuation appropriately to control pacing and intonation
Use SSML tags for more precise control over pronunciation
Keep texts relatively short (under 1000 characters) for consistent output
Adjust stability and similarity_boost parameters to fine-tune voice characteristics
Test different output formats based on your quality vs. file size requirements
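
The length guideline above can be applied by splitting long input at sentence boundaries and sending one request per chunk. This is a rough sketch rather than a recommended algorithm: the helper name split_text is hypothetical, sentence detection is a simple punctuation heuristic, and a single sentence longer than the limit is cut at the limit.

```python
import re


def split_text(text, max_chars=999):
    """Split text into chunks of at most max_chars characters.

    The default of 999 keeps each chunk under the 1000-character guideline.
    """
    # Heuristic: a sentence ends at ".", "!" or "?" followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # Hard-split a single sentence that exceeds the limit on its own.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


chunks = split_text("One. Two! Three?", max_chars=9)
```

Each chunk can then be sent as the text of its own request, with the returned audio segments concatenated in order.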

SSML Support

Our ElevenLabs integration supports Speech Synthesis Markup Language (SSML) for more precise control over speech synthesis. You can include SSML tags in your text to control aspects like:

Emphasis on specific words
Pauses and breaks
Pronunciation of specific words or acronyms
Speech rate and pitch

SSML Example

<speak>
  Welcome to <emphasis level="strong">Pomelo</emphasis>!
  <break time="1s"/>
  Our platform offers state-of-the-art AI capabilities.
  <prosody rate="slow" pitch="+2st">This text will be spoken slower and with a higher pitch.</prosody>
</speak>
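
When SSML is assembled from user-supplied text, characters such as & and < need escaping so the markup stays well-formed. The sketch below builds a document like the example above using only the Python standard library; the helper names (plain, emphasis, pause, build_ssml) are hypothetical, and it assumes the finished SSML string is sent as the text parameter.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def plain(text):
    """Escape &, < and > in literal text."""
    return escape(text)


def emphasis(text, level="strong"):
    """Wrap escaped text in an <emphasis> element."""
    return f"<emphasis level={quoteattr(level)}>{escape(text)}</emphasis>"


def pause(time="1s"):
    """Emit a <break> element of the given duration."""
    return f"<break time={quoteattr(time)}/>"


def build_ssml(*parts):
    """Join already-escaped parts inside a <speak> root element."""
    return "<speak>" + "".join(parts) + "</speak>"


ssml = build_ssml(
    plain("Welcome to "),
    emphasis("Pomelo"),
    plain("!"),
    pause("1s"),
    plain("Q&A starts now."),
)

# Sanity check: raises xml.etree.ElementTree.ParseError if the markup is malformed.
root = ET.fromstring(ssml)
```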
