Support for SSML in python interface? #50

Open
EllenOrange opened this issue Sep 10, 2024 · 0 comments
EllenOrange commented Sep 10, 2024

Hi folks,

I'm evaluating Play HT for potential use. SSML is apparently supported, but I can't figure out how to use it through the Python interface. Is it simply not implemented yet, or is there a flag of some sort that I'm missing?

https://docs.play.ht/reference/api-convert-tts-ssml-standard-premium-voices

Here's the simple test case that leads to my confusion:

# import the playht SDK
from pyht import Client, TTSOptions, Format

import io
import pyaudio
from pydub import AudioSegment

def play_audio_stream(byte_iterator):
    # Combine the bytes from the iterator into a single bytes object
    mp3_data = b"".join(byte_iterator)

    # Load the mp3 data into an AudioSegment
    audio = AudioSegment.from_file(io.BytesIO(mp3_data), format="mp3")

    # Convert the AudioSegment to raw audio data
    raw_data = audio.raw_data
    sample_rate = audio.frame_rate
    num_channels = audio.channels
    sample_width = audio.sample_width

    # Initialize PyAudio
    p = pyaudio.PyAudio()

    # Open a stream
    stream = p.open(format=p.get_format_from_width(sample_width),
                    channels=num_channels,
                    rate=sample_rate,
                    output=True)

    # Play the audio by writing to the stream
    stream.write(raw_data)

    # Stop and close the stream
    stream.stop_stream()
    stream.close()

    # Terminate PyAudio
    p.terminate()


# Initialize PlayHT API with your credentials
client = Client("<id>", "<key>")

# configure your stream
options = TTSOptions(
    # this voice id can be one of our prebuilt voices or your own voice clone id; refer to the `listVoices()` method for a list of supported voices.
    # voice="s3://voice-cloning-zero-shot/d9ff78ba-d016-47f6-b0ef-dd630f59414e/female-cs/manifest.json",
    voice="s3://voice-cloning-zero-shot/a59cb96d-bba8-4e24-81f2-e60b888a0275/charlottenarrativesaad/manifest.json",

    # you can pass any value between 8000 and 48000, 24000 is default
    sample_rate=44_100,
  
    # the generated audio encoding, supports 'raw' | 'mp3' | 'wav' | 'ogg' | 'flac' | 'mulaw'
    format=Format.FORMAT_MP3,

    # playback rate of generated speech
    speed=1,
)

# start streaming!
text = '<speak><p>This is the beginning of a beautiful <break time="1.0s"/> friendship</p></speak>'

# must use turbo voice engine for the best latency
audio_stream = client.tts(text=text, voice_engine="PlayHT2.0-turbo", options=options)

play_audio_stream(iter(audio_stream))
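
To rule out a problem with the markup itself, here is a minimal sketch that checks the test string parses as XML with a `speak` root and lists the `break` durations it contains. It uses only the standard library; `describe_ssml` is a hypothetical helper written for this issue, not part of pyht, and it says nothing about how the client treats the tags.

```python
import xml.etree.ElementTree as ET


def describe_ssml(text):
    """Parse text as XML; return the root tag and all <break> time values.

    Hypothetical helper for this issue, not part of the pyht SDK.
    """
    # ET.fromstring raises ET.ParseError if the markup is malformed
    root = ET.fromstring(text)
    # Collect the "time" attribute of every <break> element in the tree
    breaks = [el.get("time") for el in root.iter("break")]
    return root.tag, breaks


text = '<speak><p>This is the beginning of a beautiful <break time="1.0s"/> friendship</p></speak>'
print(describe_ssml(text))
```

The string parses cleanly, so the open question is only whether the Python client forwards the markup as SSML or treats it as plain text.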