Voice Generation (ElevenLabs)
Convert text to lifelike speech with ElevenLabs voice synthesis. The API generates speech in multiple languages and voices.
Text to Speech
POST /api/v1/elevenlabs
Convert text to speech using a variety of voices and settings.
Request Parameters
Parameter | Type | Required | Description |
---|---|---|---|
text | string | Yes | The text to convert to speech |
voice_id | string | Yes | ID of the voice to use for synthesis |
model_id | string | No | ID of the model to use (default: "eleven_monolingual_v1") |
stability | float | No | Stability factor (0-1, default: 0.5) |
similarity_boost | float | No | Similarity boost factor (0-1, default: 0.75) |
style | float | No | Speaking style emphasis (0-1, default: 0) |
output_format | string | No | Output audio format ("mp3" or "wav", default: "mp3") |
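The ranges and defaults in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: `build_tts_payload` is a hypothetical helper name, and it assumes the server rejects values outside the documented 0-1 ranges and formats other than "mp3" or "wav".

```python
def build_tts_payload(text, voice_id, *, model_id="eleven_monolingual_v1",
                      stability=0.5, similarity_boost=0.75, style=0.0,
                      output_format="mp3"):
    """Build a request body using the documented defaults and ranges.

    Hypothetical helper: the name and the client-side checks are
    illustrative assumptions, not part of the documented API.
    """
    if not text:
        raise ValueError("text is required")
    if not voice_id:
        raise ValueError("voice_id is required")

    # stability, similarity_boost, and style are all documented as 0-1 floats.
    for name, value in (("stability", stability),
                        ("similarity_boost", similarity_boost),
                        ("style", style)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")

    if output_format not in ("mp3", "wav"):
        raise ValueError(f"output_format must be 'mp3' or 'wav', got {output_format!r}")

    return {
        "text": text,
        "voice_id": voice_id,
        "model_id": model_id,
        "stability": stability,
        "similarity_boost": similarity_boost,
        "style": style,
        "output_format": output_format,
    }
```

The returned dictionary can be serialized with `json.dumps` and sent as the request body.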
Example Request
```python
# Python example
import requests
import json

url = "https://api.pomeloapi.example.com/api/v1/elevenlabs"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}
data = {
    "text": "Welcome to our new AI platform. We're excited to have you here!",
    "voice_id": "EXAVITQu4vr4xnSDxMaL",
    "stability": 0.6,
    "similarity_boost": 0.8,
    "output_format": "mp3"
}

response = requests.post(url, headers=headers, data=json.dumps(data))

# Save the audio file
if response.status_code == 200:
    with open("output_speech.mp3", "wb") as f:
        f.write(response.content)
```
Response
The response body is the raw audio data in the requested format. The Content-Type header matches that format (for example, "audio/mpeg" for MP3).
Response Headers
Content-Type: audio/mpeg
Content-Length: 58372
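Because the body is raw audio, the Content-Type header is a convenient way to choose a file extension when saving it. The sketch below is illustrative only: `extension_for` is a hypothetical helper, and the "audio/wav" entry is an assumption, since only "audio/mpeg" is documented above.

```python
# Map response media types to file extensions.
# "audio/mpeg" is documented; "audio/wav" is an assumed value for WAV output.
EXTENSIONS = {
    "audio/mpeg": ".mp3",
    "audio/wav": ".wav",
}


def extension_for(content_type):
    """Return a file extension for a Content-Type header value."""
    # Drop any parameters (e.g. "; charset=...") and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    try:
        return EXTENSIONS[media_type]
    except KeyError:
        raise ValueError(f"unexpected Content-Type: {content_type!r}") from None
```

With the `requests` library, the header value is available as `response.headers["Content-Type"]`, so a file name could be built as `"output_speech" + extension_for(response.headers["Content-Type"])`.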
Available Voices
GET /api/v1/elevenlabs/voices
Get a list of available voices for text-to-speech synthesis.
Example Request
```python
# Python example
import requests

url = "https://api.pomeloapi.example.com/api/v1/elevenlabs/voices"
headers = {
    "Authorization": "Bearer YOUR_API_KEY"
}

response = requests.get(url, headers=headers)
print(response.json())
```
Response Format
```json
{
  "voices": [
    {
      "voice_id": "EXAVITQu4vr4xnSDxMaL",
      "name": "Rachel",
      "category": "premade",
      "description": "A professional female voice with a neutral American accent",
      "preview_url": "https://storage.pomeloapi.example.com/audio/voice-samples/rachel.mp3",
      "gender": "female"
    },
    {
      "voice_id": "21m00Tcm4TlvDq8ikWAM",
      "name": "Michael",
      "category": "premade",
      "description": "A deep male voice with an American accent",
      "preview_url": "https://storage.pomeloapi.example.com/audio/voice-samples/michael.mp3",
      "gender": "male"
    },
    {
      "voice_id": "AZnzlk1XvdvUeBnXmlld",
      "name": "Emily",
      "category": "premade",
      "description": "A soft female voice with a British accent",
      "preview_url": "https://storage.pomeloapi.example.com/audio/voice-samples/emily.mp3",
      "gender": "female"
    }
    // Additional voices...
  ]
}
```
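Once the voices list is parsed, picking a `voice_id` for the text-to-speech request is a simple filter over the `voices` array. The following is a small sketch under the assumption that every entry carries the fields shown in the response above; `find_voices` is a hypothetical helper, not an API function.

```python
def find_voices(voices_response, **criteria):
    """Return the voices whose fields equal every given criterion.

    Hypothetical helper: expects the parsed JSON body of the voices
    endpoint, i.e. a dict with a "voices" list.
    """
    return [
        voice
        for voice in voices_response.get("voices", [])
        if all(voice.get(field) == wanted for field, wanted in criteria.items())
    ]


# Trimmed copy of the documented response, used here as sample data.
sample = {
    "voices": [
        {"voice_id": "EXAVITQu4vr4xnSDxMaL", "name": "Rachel", "gender": "female"},
        {"voice_id": "21m00Tcm4TlvDq8ikWAM", "name": "Michael", "gender": "male"},
        {"voice_id": "AZnzlk1XvdvUeBnXmlld", "name": "Emily", "gender": "female"},
    ]
}

female_names = [voice["name"] for voice in find_voices(sample, gender="female")]
michael_id = find_voices(sample, name="Michael")[0]["voice_id"]
```

In real use, `sample` would be replaced by `response.json()` from the voices request.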
Best Practices
Tips for Better Speech Generation
- Use punctuation appropriately to control pacing and intonation
- Use SSML tags for more precise control over pronunciation
- Keep texts relatively short (under 1000 characters) for consistent output
- Adjust stability and similarity_boost parameters to fine-tune voice characteristics
- Test different output formats based on your quality vs. file size requirements
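To follow the length guideline above with longer inputs, the text can be split at sentence boundaries and sent as several requests. This is a rough sketch, not a prescribed approach: `split_text` is a hypothetical helper, it treats 1000 characters as an upper bound per chunk, and its sentence detection is a simple punctuation-based heuristic.

```python
import re


def split_text(text, max_chars=1000):
    """Split text into chunks of at most max_chars, preferring sentence breaks."""
    # Heuristic: a sentence ends at ".", "!" or "?" followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as the `text` of its own request, and the resulting audio files played or joined in order.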
SSML Support
Our ElevenLabs integration supports Speech Synthesis Markup Language (SSML) for more precise control over speech synthesis. You can include SSML tags in your text to control aspects like:
- Emphasis on specific words
- Pauses and breaks
- Pronunciation of specific words or acronyms
- Speech rate and pitch
SSML Example
```xml
<speak>
  Welcome to <emphasis level="strong">Pomelo</emphasis>!
  <break time="1s"/>
  Our platform offers state-of-the-art AI capabilities.
  <prosody rate="slow" pitch="+2st">This text will be spoken slower and with a higher pitch.</prosody>
</speak>
```
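When SSML is assembled from user-supplied text, characters such as `&` and `<` need escaping so the markup stays well-formed. The sketch below uses Python's standard `xml.sax.saxutils` module. The helper names (`plain`, `emphasis`, `pause`, `speak`) are hypothetical and cover only the tags used in the example above.

```python
from xml.sax.saxutils import escape, quoteattr


def plain(text):
    """Escape literal text so it is safe inside SSML."""
    return escape(text)


def emphasis(text, level="strong"):
    """Wrap text in an <emphasis> element."""
    return f"<emphasis level={quoteattr(level)}>{escape(text)}</emphasis>"


def pause(seconds):
    """Return a <break> element lasting the given number of seconds."""
    return f'<break time="{seconds:g}s"/>'


def speak(*parts):
    """Join already-escaped parts into a single <speak> document."""
    return "<speak>" + " ".join(parts) + "</speak>"


ssml = speak(plain("Q&A with"), emphasis("Pomelo"), pause(1))
```

The resulting string can be passed as the `text` parameter of the text-to-speech request.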