Support for SSML in python interface? #50

EllenOrange · 2024-09-10T16:40:21Z

Hi folks,

I'm evaluating PlayHT for potential use. The API docs say SSML is supported, but I can't figure out how to use it through the Python interface. Is it simply not implemented yet, or is there a flag of some sort that I'm missing?

https://docs.play.ht/reference/api-convert-tts-ssml-standard-premium-voices

Here's the simple test case that leads to my confusion:

# import the playht SDK
from pyht import Client, TTSOptions, Format

import io
import pyaudio
from pydub import AudioSegment

def play_audio_stream(byte_iterator):
    # Combine the bytes from the iterator into a single bytes object
    mp3_data = b"".join(byte_iterator)

    # Load the mp3 data into an AudioSegment
    audio = AudioSegment.from_file(io.BytesIO(mp3_data), format="mp3")

    # Convert the AudioSegment to raw audio data
    raw_data = audio.raw_data
    sample_rate = audio.frame_rate
    num_channels = audio.channels
    sample_width = audio.sample_width

    # Initialize PyAudio
    p = pyaudio.PyAudio()

    # Open a stream
    stream = p.open(format=p.get_format_from_width(sample_width),
                    channels=num_channels,
                    rate=sample_rate,
                    output=True)

    # Play the audio by writing to the stream
    stream.write(raw_data)

    # Stop and close the stream
    stream.stop_stream()
    stream.close()

    # Terminate PyAudio
    p.terminate()


# Initialize PlayHT API with your credentials
client = Client("<id>", "<key>")

# configure your stream
options = TTSOptions(
    # this voice id can be one of our prebuilt voices or your own voice clone id; refer to the `listVoices()` method for a list of supported voices.
    # voice="s3://voice-cloning-zero-shot/d9ff78ba-d016-47f6-b0ef-dd630f59414e/female-cs/manifest.json",
    voice="s3://voice-cloning-zero-shot/a59cb96d-bba8-4e24-81f2-e60b888a0275/charlottenarrativesaad/manifest.json",

    # you can pass any value between 8000 and 48000, 24000 is default
    sample_rate=44_100,
    # the generated audio encoding, supports 'raw' | 'mp3' | 'wav' | 'ogg' | 'flac' | 'mulaw'
    format=Format.FORMAT_MP3,

    # playback rate of generated speech
    speed=1,
)

# start streaming!
text = '<speak><p>This is the beginning of a beautiful <break time="1.0s"/> friendship</p></speak>'

# must use turbo voice engine for the best latency
audio_stream = client.tts(text=text, voice_engine="PlayHT2.0-turbo", options=options)

play_audio_stream(iter(audio_stream))
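
To rule out a malformed string as the cause, here is a minimal sketch that checks the SSML is well-formed XML with a `<speak>` root before it is sent. It uses only the Python standard library and is not part of pyht; the helper name `is_well_formed_ssml` is hypothetical, and it assumes the markup carries no XML namespace.

```python
import xml.etree.ElementTree as ET


def is_well_formed_ssml(ssml: str) -> bool:
    """Return True if `ssml` parses as XML and its root element is <speak>.

    Hypothetical helper for local sanity checking only. It says nothing
    about whether the PlayHT service or the pyht client interprets SSML.
    Assumes no XML namespace is declared on the root element.
    """
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError:
        # Unbalanced or mismatched tags, bad attributes, and so on.
        return False
    return root.tag == "speak"


text = '<speak><p>This is the beginning of a beautiful <break time="1.0s"/> friendship</p></speak>'
print(is_well_formed_ssml(text))
```

The test string above passes this check, so the question is about how the Python client handles the markup rather than about the markup itself.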
