Synthesizer
A set of methods for voice synthesis.
Call | Description |
---|---|
UtteranceSynthesis | Synthesizing text into speech. |
Calls Synthesizer
UtteranceSynthesis
Synthesizing text into speech.
rpc UtteranceSynthesis (UtteranceSynthesisRequest) returns (stream UtteranceSynthesisResponse)
UtteranceSynthesisRequest
Field | Description |
---|---|
model | string The name of the model. Specifies basic synthesis functionality. Currently must be left empty; do not use it. |
Utterance | oneof: text or text_template Text to synthesize, given in one of the text synthesis markups. |
text | string Raw text (e.g. "Hello, Alice"). |
text_template | TextTemplate Text template instance, e.g. "Hello, {username}" with username="Alice". |
hints[] | Hints Optional hints for synthesis. |
output_audio_spec | AudioFormatOptions Optional. Format of the output audio. Default: 22050 Hz, linear 16-bit signed little-endian PCM, with a WAV header. |
loudness_normalization_type | enum LoudnessNormalizationType Optional. Type of loudness normalization. Default: LUFS, with a default value of -19. |
unsafe_mode | bool Optional. Automatically split long text into several utterances and bill accordingly. Some degradation in service quality is possible. |
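The field table above can be read as a request shape. The following is a minimal sketch that assembles such a request as a plain Python dict, assuming nothing beyond the field names in the table. It does not use the generated gRPC stubs; `build_request` and the voice name `example_voice` are hypothetical and exist only for illustration.

```python
import json


def build_request(text=None, text_template=None, hints=(), output_audio_spec=None):
    """Assemble a dict shaped like UtteranceSynthesisRequest (illustration only)."""
    # Utterance is a oneof: exactly one of text / text_template may be set.
    if (text is None) == (text_template is None):
        raise ValueError("set exactly one of text or text_template")
    if text is not None:
        request = {"text": text}
    else:
        request = {"text_template": text_template}
    # The model field is left unset, as the table says it should be empty.
    if hints:
        request["hints"] = list(hints)
    if output_audio_spec is not None:
        request["output_audio_spec"] = output_audio_spec
    return request


request = build_request(
    text="Hello, Alice",
    hints=[{"voice": "example_voice"}, {"speed": 1.0}],  # placeholder voice name
)
print(json.dumps(request, indent=2))
```

Each entry in `hints` carries a single key because Hint is itself a oneof.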
TextTemplate
Field | Description |
---|---|
text_template | string Template text. Sample: The {animal} goes to the {place}. |
variables[] | TextVariable Defining variables in template text. Sample: {animal: cat, place: forest} |
TextVariable
Field | Description |
---|---|
variable_name | string The name of the variable. |
variable_value | string The text of the variable. |
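To preview what a TextTemplate resolves to, the sketch below substitutes variables locally. It assumes plain `{name}` replacement, which matches the samples above but is not confirmed as the service's exact behavior; `render_template` is a hypothetical helper, and the actual substitution happens on the service side.

```python
import re


def render_template(text_template, variables):
    """Replace each {name} placeholder with the matching variable_value."""
    values = {v["variable_name"]: v["variable_value"] for v in variables}

    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for template variable {name!r}")
        return values[name]

    return re.sub(r"\{(\w+)\}", substitute, text_template)


template = {
    "text_template": "The {animal} goes to the {place}.",
    "variables": [
        {"variable_name": "animal", "variable_value": "cat"},
        {"variable_name": "place", "variable_value": "forest"},
    ],
}
print(render_template(template["text_template"], template["variables"]))
# prints: The cat goes to the forest.
```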
Hints
Field | Description |
---|---|
Hint | oneof: voice, audio_template, speed, volume or role A hint for the TTS engine that specifies characteristics of the synthesized audio. |
voice | string Name of the speaker to use. |
audio_template | AudioTemplate Template for synthesizing. |
speed | double Hint to change the speech speed. |
volume | double Hint to regulate the volume. With LOUDNESS_NORMALIZATION_TYPE_UNSPECIFIED, normalization uses MAX_PEAK if volume is in (0, 1] and LUFS if volume is in [-145, 0). |
role | string A hint for the TTS engine that specifies characteristics of the synthesized audio. |
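The volume rule for an unspecified normalization type can be written out as a small function. This is a sketch of the documented ranges only; `normalization_for_volume` is a hypothetical helper, and rejecting values outside both ranges is an assumption, since the table does not say how they are handled.

```python
def normalization_for_volume(volume):
    """Pick the normalization implied by a volume hint when the
    normalization type is LOUDNESS_NORMALIZATION_TYPE_UNSPECIFIED."""
    if 0 < volume <= 1:
        return "MAX_PEAK"   # volume in (0, 1]
    if -145 <= volume < 0:
        return "LUFS"       # volume in [-145, 0)
    # Assumption: values outside both documented ranges are treated as invalid.
    raise ValueError(f"volume {volume} is outside (0, 1] and [-145, 0)")


print(normalization_for_volume(0.7))   # MAX_PEAK
print(normalization_for_volume(-19))   # LUFS
```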
AudioTemplate
Field | Description |
---|---|
audio | AudioContent Audio file. |
text_template | TextTemplate Template and description of its variables. |
variables[] | AudioVariable Description of the variables in the audio. |
AudioContent
Field | Description |
---|---|
AudioSource | oneof: content The audio source to read the data from. |
content | bytes Bytes with audio data. |
audio_spec | AudioFormatOptions Description of the audio format. |
AudioFormatOptions
Field | Description |
---|---|
AudioFormat | oneof: raw_audio or container_audio |
raw_audio | RawAudio The audio format specified in request parameters. |
container_audio | ContainerAudio The audio format specified inside the container metadata. |
RawAudio
Field | Description |
---|---|
audio_encoding | enum AudioEncoding Encoding type. |
sample_rate_hertz | int64 Sampling frequency of the signal. |
ContainerAudio
Field | Description |
---|---|
container_audio_type | enum ContainerAudioType |
AudioVariable
Field | Description |
---|---|
variable_name | string The name of the variable. |
variable_start_ms | int64 Start time of the variable in milliseconds. |
variable_length_ms | int64 Length of the variable in milliseconds. |
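An AudioVariable marks a time span inside the template audio. The sketch below converts that span into a byte range, assuming the audio is headerless mono PCM with 16-bit samples; that layout is an assumption for illustration, and `variable_byte_range` is a hypothetical helper.

```python
def variable_byte_range(variable, sample_rate_hertz, bytes_per_sample=2):
    """Return (start, end) byte offsets of an AudioVariable in raw mono PCM."""
    # Integer arithmetic keeps the offsets aligned to whole samples.
    start_sample = variable["variable_start_ms"] * sample_rate_hertz // 1000
    length_samples = variable["variable_length_ms"] * sample_rate_hertz // 1000
    start = start_sample * bytes_per_sample
    return start, start + length_samples * bytes_per_sample


name_slot = {
    "variable_name": "username",
    "variable_start_ms": 500,
    "variable_length_ms": 200,
}
print(variable_byte_range(name_slot, sample_rate_hertz=22050))
# prints: (22050, 30870)
```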
UtteranceSynthesisResponse
Field | Description |
---|---|
audio_chunk | AudioChunk Part of synthesized audio. |
AudioChunk
Field | Description |
---|---|
data | bytes Sequence of bytes of the synthesized audio in the format specified in output_audio_spec. |
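Because UtteranceSynthesis returns a stream, the audio arrives as a sequence of chunks that the client joins in order. The sketch below shows this with a stand-in stream: `fake_stream` is hypothetical and produces a short silent WAV in the default format (22050 Hz, 16-bit PCM, WAV header) cut into pieces. Responses are modeled as plain dicts here; with generated stubs the access would presumably be `response.audio_chunk.data`.

```python
import io
import wave


def collect_audio(responses):
    """Concatenate audio_chunk.data from an iterable of response-like dicts."""
    return b"".join(r["audio_chunk"]["data"] for r in responses)


def fake_stream(chunk_size=1000):
    """Stand-in for the server stream: a silent WAV split into chunks."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)                  # 16-bit samples
        wav.setframerate(22050)
        wav.writeframes(b"\x00\x00" * 2205)  # 0.1 s of silence
    payload = buffer.getvalue()
    for offset in range(0, len(payload), chunk_size):
        yield {"audio_chunk": {"data": payload[offset:offset + chunk_size]}}


audio = collect_audio(fake_stream())
with wave.open(io.BytesIO(audio), "rb") as wav:
    print(wav.getframerate(), wav.getsampwidth(), wav.getnframes())
```

The joined bytes form a complete WAV file only when the output format includes a container header; for raw_audio output the same join yields headerless samples.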