Pomelo

Voice Generation (ElevenLabs)

Convert text to lifelike speech using ElevenLabs' voice synthesis. The API generates speech in multiple languages and lets you choose among the available voices.

Text to Speech

POST /api/v1/elevenlabs

Convert text to speech using a variety of voices and settings.

Request Parameters

Parameter         Type    Required  Description
text              string  Yes       The text to convert to speech
voice_id          string  Yes       ID of the voice to use for synthesis
model_id          string  No        ID of the model to use (default: "eleven_monolingual_v1")
stability         float   No        Stability factor (0-1, default: 0.5)
similarity_boost  float   No        Similarity boost factor (0-1, default: 0.75)
style             float   No        Speaking style emphasis (0-1, default: 0)
output_format     string  No        Output audio format ("mp3" or "wav", default: "mp3")
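
The ranges and defaults in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the Pomelo API: build_tts_payload is a hypothetical helper name, and the decision to omit model_id unless it is overridden is an assumption.

```python
from typing import Any, Dict, Optional


def build_tts_payload(
    text: str,
    voice_id: str,
    stability: float = 0.5,
    similarity_boost: float = 0.75,
    style: float = 0.0,
    output_format: str = "mp3",
    model_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate values against the documented ranges and build the JSON body."""
    if not text:
        raise ValueError("text must be a non-empty string")
    if not voice_id:
        raise ValueError("voice_id must be a non-empty string")

    # stability, similarity_boost and style are all documented as 0-1.
    for name, value in (
        ("stability", stability),
        ("similarity_boost", similarity_boost),
        ("style", style),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")

    if output_format not in ("mp3", "wav"):
        raise ValueError('output_format must be "mp3" or "wav"')

    payload: Dict[str, Any] = {
        "text": text,
        "voice_id": voice_id,
        "stability": stability,
        "similarity_boost": similarity_boost,
        "style": style,
        "output_format": output_format,
    }
    # Assumption: leave model_id out so the server applies its own default.
    if model_id is not None:
        payload["model_id"] = model_id
    return payload
```

The returned dictionary can be passed as the JSON body of the POST request. Out-of-range values raise ValueError locally instead of costing a round trip.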

Example Request

# Python example
import requests
import json

url = "https://api.pomeloapi.example.com/api/v1/elevenlabs"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}
data = {
    "text": "Welcome to our new AI platform. We're excited to have you here!",
    "voice_id": "EXAVITQu4vr4xnSDxMaL",
    "stability": 0.6,
    "similarity_boost": 0.8,
    "output_format": "mp3"
}

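# Serialize the payload as JSON to match the Content-Type header above
response = requests.post(url, headers=headers, data=json.dumps(data), timeout=30)

# Save the audio file on success; otherwise report the failure
if response.status_code == 200:
    with open("output_speech.mp3", "wb") as f:
        f.write(response.content)
else:
    print(f"Request failed with status {response.status_code}")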

Response

The API returns the audio data directly in the format requested. The response has the appropriate Content-Type header (e.g., "audio/mpeg" for MP3).

Response Headers

Content-Type: audio/mpeg
Content-Length: 58372
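
Because the audio format is reported in the Content-Type header, a client can derive the file extension from the response rather than hard-coding it. This is a rough sketch under assumptions: only the "audio/mpeg" mapping comes from this page, the WAV media types are guesses at what the service returns, and extension_for is a hypothetical helper.

```python
from typing import Mapping

# "audio/mpeg" for MP3 is documented above; the WAV entries are assumptions.
CONTENT_TYPE_TO_EXT = {
    "audio/mpeg": ".mp3",
    "audio/wav": ".wav",
    "audio/x-wav": ".wav",
}


def extension_for(headers: Mapping[str, str]) -> str:
    """Return a file extension for the audio described by the response headers."""
    content_type = headers.get("Content-Type", "")
    # Drop any parameters such as "; charset=..." and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type not in CONTENT_TYPE_TO_EXT:
        raise ValueError(f"Unexpected Content-Type: {content_type!r}")
    return CONTENT_TYPE_TO_EXT[media_type]
```

With the requests library this could be called as extension_for(response.headers). An unexpected type, such as a JSON error body, raises ValueError instead of being written to disk as audio.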

Available Voices

GET /api/v1/elevenlabs/voices

Get a list of available voices for text-to-speech synthesis.

Example Request

# Python example
import requests

url = "https://api.pomeloapi.example.com/api/v1/elevenlabs/voices"
headers = {
    "Authorization": "Bearer YOUR_API_KEY"
}

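response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()  # Fail early on an HTTP error instead of parsing an error body
print(response.json())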

Response Format

{
  "voices": [
    {
      "voice_id": "EXAVITQu4vr4xnSDxMaL",
      "name": "Rachel",
      "category": "premade",
      "description": "A professional female voice with a neutral American accent",
      "preview_url": "https://storage.pomeloapi.example.com/audio/voice-samples/rachel.mp3",
      "gender": "female"
    },
    {
      "voice_id": "21m00Tcm4TlvDq8ikWAM",
      "name": "Michael",
      "category": "premade",
      "description": "A deep male voice with an American accent",
      "preview_url": "https://storage.pomeloapi.example.com/audio/voice-samples/michael.mp3",
      "gender": "male"
    },
    {
      "voice_id": "AZnzlk1XvdvUeBnXmlld",
      "name": "Emily",
      "category": "premade",
      "description": "A soft female voice with a British accent",
      "preview_url": "https://storage.pomeloapi.example.com/audio/voice-samples/emily.mp3",
      "gender": "female"
    }
    // Additional voices...
  ]
}
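
Once the voice list has been parsed, a voice_id can be looked up by name or the list filtered by gender. The sketch below assumes only the response shape shown above. find_voice_id and voices_by_gender are hypothetical helper names, and the sample data is abbreviated from the example response.

```python
from typing import Any, Dict, List, Optional


def find_voice_id(voices_response: Dict[str, Any], name: str) -> Optional[str]:
    """Return the voice_id whose name matches (case-insensitive), or None."""
    for voice in voices_response.get("voices", []):
        if voice.get("name", "").lower() == name.lower():
            return voice.get("voice_id")
    return None


def voices_by_gender(voices_response: Dict[str, Any], gender: str) -> List[Dict[str, Any]]:
    """Return every voice entry whose "gender" field equals the given value."""
    return [v for v in voices_response.get("voices", []) if v.get("gender") == gender]


# Abbreviated copy of the example response above, used here for illustration.
sample_response = {
    "voices": [
        {"voice_id": "EXAVITQu4vr4xnSDxMaL", "name": "Rachel", "gender": "female"},
        {"voice_id": "21m00Tcm4TlvDq8ikWAM", "name": "Michael", "gender": "male"},
        {"voice_id": "AZnzlk1XvdvUeBnXmlld", "name": "Emily", "gender": "female"},
    ]
}

rachel_id = find_voice_id(sample_response, "rachel")
female_names = [v["name"] for v in voices_by_gender(sample_response, "female")]
```

In practice sample_response would be replaced by the parsed JSON from the voices endpoint, and the resulting ID passed as voice_id in a text-to-speech request.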

Best Practices

Tips for Better Speech Generation

  • Use punctuation appropriately to control pacing and intonation
  • Use SSML tags for more precise control over pronunciation
  • Keep the text in each request short (under 1000 characters) for consistent output
  • Adjust stability and similarity_boost parameters to fine-tune voice characteristics
  • Test different output formats based on your quality vs. file size requirements
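
One way to follow the length guideline for longer passages is to split the text at sentence boundaries and send each chunk as its own request. This is a sketch of that idea, not a documented feature. split_text is a hypothetical helper, the regular expression is a simple heuristic that will mis-split abbreviations such as "e.g.", and the default limit is an approximation of the guideline.

```python
import re
from typing import List


def split_text(text: str, max_chars: int = 1000) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    # Heuristic: a sentence ends at ".", "!" or "?" followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is cut at fixed offsets.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The sentence does not fit: close this chunk and start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent in the text field of a separate request and the resulting audio files played or concatenated in order.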

SSML Support

Our ElevenLabs integration supports Speech Synthesis Markup Language (SSML) for more precise control over speech synthesis. You can include SSML tags in your text to control aspects like:

  • Emphasis on specific words
  • Pauses and breaks
  • Pronunciation of specific words or acronyms
  • Speech rate and pitch

SSML Example

<speak>
  Welcome to <emphasis level="strong">Pomelo</emphasis>!
  <break time="1s"/>
  Our platform offers state-of-the-art AI capabilities.
  <prosody rate="slow" pitch="+2st">This text will be spoken slower and with a higher pitch.</prosody>
</speak>
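
When SSML is assembled from user-supplied text, characters such as "&" and "<" need escaping so the markup stays well formed. The following sketch builds a document like the one above using only the Python standard library. The helper names (plain, emphasis, pause, speak) are hypothetical, and the final parse is only a local well-formedness check, not a guarantee that the service supports every tag.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def plain(text: str) -> str:
    """Escape literal text so it cannot be mistaken for markup."""
    return escape(text)


def emphasis(text: str, level: str = "strong") -> str:
    """Wrap escaped text in an <emphasis> element."""
    return f"<emphasis level={quoteattr(level)}>{escape(text)}</emphasis>"


def pause(seconds: float) -> str:
    """Return a <break> element lasting the given number of seconds."""
    return f'<break time="{seconds:g}s"/>'


def speak(*parts: str) -> str:
    """Join already-built SSML fragments inside a <speak> root element."""
    return "<speak>" + " ".join(parts) + "</speak>"


ssml = speak(
    plain("Welcome to"),
    emphasis("Pomelo"),
    pause(1),
    plain("Questions & answers start soon."),
)

# Raises xml.etree.ElementTree.ParseError if the markup is not well formed.
ET.fromstring(ssml)

# The SSML string goes in the "text" field of the request body.
payload = {"text": ssml, "voice_id": "EXAVITQu4vr4xnSDxMaL"}
```

Building fragments through helpers keeps escaping in one place, so literal text passed through plain() or emphasis() cannot break the surrounding tags.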