Commit 277899e

feat(adapter): Adds 11labs voice adapter.
1 parent 87ce443 commit 277899e

12 files changed

Lines changed: 1014 additions & 3659 deletions

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
1+
## ElevenLabs
2+
ELEVENLABS_API_KEY='XXXXX'
3+
4+
## Watson TTS Config
5+
WATSON_TTS_URL='<TTS URL>'
6+
WATSON_TTS_API_KEY='XXXXXXX'
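
Review note: these values are loaded through `dotenv`, so a missing key would likely only surface when the first synthesis request fails. A minimal sketch of a startup check, assuming the `ELEVENLABS_API_KEY` name that `lib/services/ElevenLabs.js` reads; `findMissingEnv` is a hypothetical helper and not part of this commit:

```javascript
// Hypothetical helper: returns the names of required variables that are
// unset or blank in the given environment object.
function findMissingEnv(env, requiredNames) {
  return requiredNames.filter((name) => {
    const value = env[name];
    return typeof value !== 'string' || value.trim() === '';
  });
}

// Assumed requirement for the ElevenLabs engine; adjust the list for Watson.
const REQUIRED_FOR_ELEVENLABS = ['ELEVENLABS_API_KEY'];

const missing = findMissingEnv(
  { ELEVENLABS_API_KEY: '' }, // stand-in for process.env
  REQUIRED_FOR_ELEVENLABS,
);
if (missing.length > 0) {
  console.error(`Missing environment variables: ${missing.join(', ')}`);
}
```

In the real entry point the first argument would be `process.env`, and the process could exit early instead of only logging.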
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
1+
FROM node:20
2+
3+
# Create app directory
4+
WORKDIR /usr/src/app
5+
6+
# Install app dependencies
7+
COPY package.json ./
8+
9+
RUN npm install --omit=dev
10+
11+
# Bundle app source
12+
COPY . .
13+
14+
EXPOSE 8010
15+
16+
CMD [ "npm", "start" ]

speech-adapter-samples/text-to-speech/README.md

Lines changed: 50 additions & 24 deletions
@@ -7,10 +7,11 @@ This sample text to speech adapter uses the Watson SDK for Text To Speech found
77
By default, IBM Voice Gateway uses the Watson Speech services for Text To Speech synthesis. The purpose of this project is to show how a developer can integrate a third-party Text To Speech engine with IBM Voice Gateway. This project uses the Watson SDK for Text To Speech as the example for text synthesis.
88

99
## Requires
10-
- [NodeJS v6 and higher](https://nodejs.org/en/download/)
10+
- [Node.js v20 or higher](https://nodejs.org/en/download/)
1111
- [IBM Voice Gateway](https://www.ibm.com/support/knowledgecenter/SS4U29/deploydocker.html) Setup
1212

13-
## Setup with Watson Text To Speech
13+
14+
## Setup
1415
1. Clone the Samples Repository
1516
```
1617
git clone https://github.com/WASdev/sample.voice.gateway.git
@@ -20,33 +21,33 @@ By default IBM Voice Gateway uses the Watson Speech services for Text To Speech
2021
```
2122
npm install
2223
```
23-
1. Add in your credentials, under `config/default.json`:
24-
```json
25-
{
26-
"Server": {
27-
"port": 8010
28-
},
29-
"WatsonTextToSpeech": {
30-
"credentials": {
31-
"username": "<username>",
32-
"password": "<password>"
33-
}
34-
}
35-
}
24+
25+
1. (Optional) If working with a remote Voice Gateway you can use [ngrok](https://ngrok.com/) to expose your service:
26+
```
27+
ngrok http 8010
28+
```
29+
30+
1. Copy the `.env.sample` file to `.env`.
31+
32+
## Configure Watson Text To Speech
33+
34+
1. Configure the Voice Gateway to connect to the adapter by setting `WATSON_TTS_URL` under `media.relay` to point to this sample proxy:
35+
```
36+
- WATSON_TTS_URL=http://{hostname}:8010
3637
```
3738
38-
You can also set environment variables, WATSON_TTS_USERNAME and WATSON_TTS_PASSWORD like so:
39-
```bash
40-
WATSON_TTS_USERNAME=<username> WATSON_TTS_PASSWORD=<password> npm start
39+
For example:
40+
```
41+
- WATSON_TTS_URL=https://fcea70235af5.ngrok-free.app
4142
```
4243
43-
1. Run the test cases to validate it's working:
44+
1. Make a call
4445
45-
```bash
46-
npm test
47-
```
46+
## Configure ElevenLabs
47+
48+
1. Set `ELEVENLABS_API_KEY` in your `.env` file
4849
49-
1. Connect the Voice Gateway to this proxy, set the `WATSON_TTS_URL` under the media.relay to point to this sample proxy
50+
1. Configure the Voice Gateway to connect to the adapter by setting `WATSON_TTS_URL` under `media.relay` to point to this sample proxy:
5051
```
5152
- WATSON_TTS_URL=http://{hostname}:8010
5253
```
@@ -55,7 +56,7 @@ By default IBM Voice Gateway uses the Watson Speech services for Text To Speech
5556
5657
### Implement your own Text To Speech Engine
5758
58-
Currently, this sample only demonstrates how to use Watson Text To Speech as the Text To Speech engine for the Voice Gateway. You can use the `lib/WatsonTextToSpeechEngine.js` as a guideline on how to implement your own Text To Speech Engine. Essentially, you'll be implementing a [Readable NodeJS Stream](http://nodejs.org/api/stream.html#stream_class_stream_readable). Once you implement your own class, you can modify the `lib/TextToSpeechAdapter.js` to `require` it.
59+
This sample demonstrates how to use Watson Text To Speech or ElevenLabs as the Text To Speech engine for the Voice Gateway. You can use `lib/services/WatsonTextToSpeechEngine.js` as a guideline on how to implement your own Text To Speech Engine. Essentially, you'll be implementing a [Readable NodeJS Stream](http://nodejs.org/api/stream.html#stream_class_stream_readable). Once you implement your own class, you can modify `lib/ConnectionHandler.js` to `require` it.
5960
6061
For example,
6162
@@ -142,6 +143,31 @@ By default IBM Voice Gateway uses the Watson Speech services for Text To Speech
142143
```
143144
npm test
144145
```
146+
147+
## IBM Cloud Code Engine Deployment
148+
149+
**TBD (Work in Progress)**
150+
See [Deploying your app from local source code with the CLI](https://cloud.ibm.com/docs/codeengine?topic=codeengine-app-local-source-code)
151+
152+
Before you begin
153+
154+
1. Set up your [Code Engine CLI](https://cloud.ibm.com/docs/codeengine?topic=codeengine-install-cli) environment.
155+
2. [Create and work with a project.](https://cloud.ibm.com/docs/codeengine?topic=codeengine-manage-project)
156+
158+
159+
The server ships with a Dockerfile for Code Engine deployment. To deploy the server to Code Engine, follow the steps below:
160+
161+
**Note:** The steps guide how to push with a `.env` file, but
162+
1. Build the docker image
163+
1. Copy the `.env.sample` file to `.env` and fill in the required information
164+
2. Build the image with `docker build -t <image-name> .`. The image name should follow the format `<registry>/<namespace>/<image-name>:<tag>`, for example `us.icr.io/testitall_ns/testitall_server:latest`.
165+
2. Push the image to the container registry with:
166+
```
167+
docker push <image-name>
168+
```
169+
3. Create a Code Engine project and deploy the image
170+
145171
## License
146172

147173
Licensed under [Apache 2.0 License](https://github.com/WASdev/sample.voice.gateway/blob/master/LICENSE)
Lines changed: 3 additions & 3 deletions
@@ -1,5 +1,5 @@
1-
const Config = require('config');
1+
require('dotenv').config();
22

3-
const PORT = Config.get('Server.port');
3+
const PORT = process.env.PORT || 8010;
44

5-
require('./lib/TextToSpeechAdapter').start({ port: PORT });
5+
require('./lib/ConnectionHandler').start({ port: PORT });
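
Review note: `process.env.PORT` is a string whenever it is set, so `PORT` is a string in one environment and the number `8010` in another. A small sketch of normalizing it before it reaches the WebSocket server; `resolvePort` is a hypothetical helper and not part of this commit:

```javascript
const DEFAULT_PORT = 8010;

// Hypothetical helper: parses a port from an env-style value and falls
// back to the default when the value is missing or not a valid port.
function resolvePort(rawValue, fallback = DEFAULT_PORT) {
  const parsed = Number.parseInt(rawValue, 10);
  const isValid = Number.isInteger(parsed) && parsed > 0 && parsed < 65536;
  return isValid ? parsed : fallback;
}

const port = resolvePort(process.env.PORT);
console.log(`resolved port: ${port}`);
```

Falling back silently is a design choice; a stricter variant could throw on an invalid value so a typo in `.env` is noticed at startup.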

speech-adapter-samples/text-to-speech/config/custom-environment-variables.json

Lines changed: 0 additions & 12 deletions
This file was deleted.

speech-adapter-samples/text-to-speech/config/default.json

Lines changed: 0 additions & 12 deletions
This file was deleted.

speech-adapter-samples/text-to-speech/lib/TextToSpeechAdapter.js renamed to speech-adapter-samples/text-to-speech/lib/ConnectionHandler.js

Lines changed: 18 additions & 14 deletions
@@ -17,14 +17,17 @@ const WebSocketServer = require('ws').Server;
1717

1818
// Change to your own Text to Speech Engine implementation, you can use
1919
// the WatsonTextToSpeechEngine.js for guidance
20-
const TextToSpeechEngine = require('./WatsonTextToSpeechEngine');
20+
const TextToSpeechEngine = require('./services/ElevenLabs');
21+
22+
// Uncomment to enable Watson Text-To-Speech
23+
// const TextToSpeechEngine = require('./services/WatsonTextToSpeechEngine');
24+
2125

2226
const url = require('url');
23-
const Config = require('config');
2427

2528
const DEFAULT_PORT = 8010;
26-
const LOG_LEVEL = Config.get('LogLevel');
27-
const logger = require('pino')({ level: LOG_LEVEL, name: 'TextToSpeechAdapter' });
29+
// const LOG_LEVEL = Config.get('LogLevel');
30+
const logger = require('pino')({ level: 'debug', name: 'TextToSpeechAdapter' });
2831

2932
function handleTextToSpeechConnection(webSocket, incomingMessage) {
3033
logger.debug('connection received');
@@ -35,16 +38,18 @@ function handleTextToSpeechConnection(webSocket, incomingMessage) {
3538

3639
// Get headers
3740
const { headers } = incomingMessage;
38-
logger.trace(headers, 'headers on websocket connection:');
41+
logger.debug(headers, 'headers on websocket connection:');
3942

4043
const sessionID = headers['vgw-session-id'];
4144

4245
logger.debug(`connection with session-id: ${sessionID}`);
4346
let textToSpeechEngine;
44-
webSocket.on('message', (data) => {
47+
let audioStream;
48+
webSocket.on('message', async (data) => {
4549
if (typeof data === 'string') {
4650
try {
4751
const message = JSON.parse(data);
52+
logger.info('message starting');
4853
// Message contains, text and accept
4954
// Combine the start message with query parameters to generate a config
5055
const config = Object.assign(queryParams, message);
@@ -54,24 +59,27 @@ function handleTextToSpeechConnection(webSocket, incomingMessage) {
5459
// NodeJS Stream API
5560
textToSpeechEngine = new TextToSpeechEngine(config);
5661

57-
textToSpeechEngine.on('data', (ttsData) => {
62+
audioStream = await textToSpeechEngine.synthesize();
63+
64+
audioStream.on('data', (ttsData) => {
5865
logger.trace(`data from engine ${ttsData.length}`);
5966
webSocket.send(ttsData);
6067
});
6168

62-
textToSpeechEngine.on('error', (error) => {
69+
audioStream.on('error', (error) => {
6370
logger.error(error, 'TextToSpeechEngine encountered an error: ');
6471
const errorMessage = {
6572
error: error.message,
6673
};
6774
webSocket.send(JSON.stringify(errorMessage));
6875
});
6976

70-
textToSpeechEngine.on('end', (reason = 'No close reason defined') => {
77+
audioStream.on('end', (reason = 'No close reason defined') => {
7178
logger.debug('TextToSpeechEngine closed');
7279
webSocket.close(1000, reason);
7380
});
7481
} catch (e) {
82+
// TODO send Error back
7583
logger.error(e);
7684
webSocket.close(1000, 'Invalid start message');
7785
}
@@ -83,9 +91,6 @@ function handleTextToSpeechConnection(webSocket, incomingMessage) {
8391
// Close event
8492
webSocket.on('close', (code, reason) => {
8593
logger.debug(`onClose, code = ${code}, reason = ${reason}`);
86-
if (textToSpeechEngine) {
87-
textToSpeechEngine.destroy();
88-
}
8994
});
9095
}
9196
let wsServer = null;
@@ -95,6 +100,7 @@ function startServer(options = { port: DEFAULT_PORT }) {
95100
try {
96101
wsServer = new WebSocketServer({ port: options.port });
97102
} catch (e) {
103+
// eslint-disable-next-line no-promise-executor-return
98104
return reject(e);
99105
}
100106

@@ -108,7 +114,6 @@ function startServer(options = { port: DEFAULT_PORT }) {
108114
});
109115

110116
wsServer.on('connection', handleTextToSpeechConnection);
111-
return wsServer;
112117
});
113118
}
114119
module.exports.start = startServer;
@@ -128,4 +133,3 @@ function stopServer() {
128133
});
129134
}
130135
module.exports.stop = stopServer;
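
Review note: this change removes the `destroy()` call from the `close` handler, so if the caller hangs up mid-synthesis the audio stream appears to keep flowing with nothing to consume it. A minimal sketch of tying the stream's lifetime to the socket, using stand-ins for the WebSocket and the engine output; `destroyStreamOnClose` is a hypothetical helper:

```javascript
const { EventEmitter } = require('events');
const { Readable } = require('stream');

// Hypothetical helper: destroys the current audio stream, if any, when the
// socket closes. The getter is used because the stream is created later,
// inside the 'message' handler.
function destroyStreamOnClose(webSocket, getAudioStream) {
  webSocket.on('close', () => {
    const audioStream = getAudioStream();
    if (audioStream && !audioStream.destroyed) {
      audioStream.destroy();
    }
  });
}

// Stand-ins: an EventEmitter for the WebSocket and an idle Readable for
// the stream returned by synthesize().
const fakeSocket = new EventEmitter();
const fakeAudio = new Readable({ read() {} });

destroyStreamOnClose(fakeSocket, () => fakeAudio);
fakeSocket.emit('close', 1000, 'caller hung up');
```

In `handleTextToSpeechConnection` the getter would simply return the `audioStream` variable declared next to `textToSpeechEngine`.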
131-
Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
1+
/**
2+
* (C) Copyright IBM Corporation 2025.
3+
*
4+
* Licensed under the Apache License, Version 2.0 (the "License");
5+
* you may not use this file except in compliance with the License.
6+
* You may obtain a copy of the License at
7+
*
8+
* http://www.apache.org/licenses/LICENSE-2.0
9+
*
10+
* Unless required by applicable law or agreed to in writing, software
11+
* distributed under the License is distributed on an "AS IS" BASIS,
12+
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13+
* See the License for the specific language governing permissions and
14+
* limitations under the License.
15+
*/
16+
17+
const { Readable } = require('stream');
18+
19+
const { ElevenLabsClient } = require('elevenlabs');
20+
const TextToSpeechAdapter = require('./TextToSpeechAdapter');
21+
22+
const elevenlabs = new ElevenLabsClient({
23+
apiKey: process.env.ELEVENLABS_API_KEY,
24+
});
25+
26+
class ElevenLabsTextToSpeechEngine extends TextToSpeechAdapter {
27+
constructor(config = {}) {
28+
super();
29+
this.config = config;
30+
}
31+
32+
async synthesize() {
33+
const audioStream = await elevenlabs.generate({
34+
stream: true,
35+
voice_id: this.config.voice_id,
36+
voice: this.config.voice,
37+
text: this.config.text,
38+
model_id: this.config.model_id,
39+
voice_settings: this.config.voice_settings,
40+
// TODO - We need to dynamically pick the output format from the config,
41+
// but for now it's likely going to be mulaw
42+
output_format: 'ulaw_8000',
43+
});
44+
const nodeStream = Readable.fromWeb(audioStream);
45+
46+
return nodeStream;
47+
}
48+
}
49+
module.exports = ElevenLabsTextToSpeechEngine;
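
Review note: the TODO about choosing the output format dynamically could be handled with a small lookup from the `accept` value in the start message to an ElevenLabs `output_format`. This is only a sketch: the `accept` strings below are assumptions about what the Voice Gateway sends, `pickOutputFormat` is a hypothetical helper, and the `pcm_16000` entry should be checked against the ElevenLabs documentation before use. `ulaw_8000` is the value this commit already hardcodes.

```javascript
// Assumed mapping from normalized "accept" values to ElevenLabs formats.
const FORMAT_BY_ACCEPT = {
  'audio/basic': 'ulaw_8000',
  'audio/l16;rate=16000': 'pcm_16000',
};

// Hypothetical helper: normalizes the accept value (case and whitespace)
// and falls back to mu-law when the value is missing or unknown.
function pickOutputFormat(accept, fallback = 'ulaw_8000') {
  if (typeof accept !== 'string') {
    return fallback;
  }
  const normalized = accept.replace(/\s+/g, '').toLowerCase();
  return FORMAT_BY_ACCEPT[normalized] || fallback;
}

console.log(pickOutputFormat('audio/L16; rate=16000'));
console.log(pickOutputFormat(undefined));
```

Inside `synthesize()` the call would be `output_format: pickOutputFormat(this.config.accept)`, assuming the start message's `accept` field is merged into `config` as the connection handler's comments suggest.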

speech-adapter-samples/text-to-speech/lib/TextToSpeechEngine.js renamed to speech-adapter-samples/text-to-speech/lib/services/TextToSpeechAdapter.js

Lines changed: 9 additions & 12 deletions
@@ -1,5 +1,5 @@
11
/**
2-
* (C) Copyright IBM Corporation 2018.
2+
* (C) Copyright IBM Corporation 2025.
33
*
44
* Licensed under the Apache License, Version 2.0 (the "License");
55
* you may not use this file except in compliance with the License.
@@ -13,18 +13,15 @@
1313
* See the License for the specific language governing permissions and
1414
* limitations under the License.
1515
*/
16-
const { Readable } = require('stream');
17-
18-
class TextToSpeechEngine extends Readable {
19-
/* eslint-disable class-methods-use-this */
20-
_read() {}
16+
class TextToSpeechAdapter {
17+
constructor(config) {
18+
this.config = config;
19+
}
2120

22-
/**
23-
* Destroys the Text To Speech Engine if a close from the other side occurs
24-
*/
2521
// eslint-disable-next-line class-methods-use-this
26-
destroy() {
27-
throw new Error('not implemented');
22+
async synthesize() {
23+
throw new Error('Not implemented');
2824
}
2925
}
30-
module.exports = TextToSpeechEngine;
26+
27+
module.exports = TextToSpeechAdapter;
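
Review note: with this base class, a custom engine only has to return a Node.js `Readable` from `synthesize()`. A minimal sketch of such an engine, with the base class copied inline so the snippet runs on its own; `SilenceEngine`, the `frames` option, and `collectLength` are hypothetical and exist only for illustration:

```javascript
const { Readable } = require('stream');

// Stand-in for lib/services/TextToSpeechAdapter.js.
class TextToSpeechAdapter {
  constructor(config) {
    this.config = config;
  }

  async synthesize() {
    throw new Error('Not implemented');
  }
}

// Hypothetical engine: ignores the text and emits mu-law silence.
class SilenceEngine extends TextToSpeechAdapter {
  async synthesize() {
    const frameCount = this.config.frames || 2;
    // 160 bytes of 0xFF is 20 ms of mu-law silence at 8 kHz.
    const frames = Array.from({ length: frameCount }, () => Buffer.alloc(160, 0xff));
    return Readable.from(frames);
  }
}

// Consumes the stream the same way a caller could, counting bytes.
async function collectLength(engine) {
  const audioStream = await engine.synthesize();
  let total = 0;
  for await (const chunk of audioStream) {
    total += chunk.length;
  }
  return total;
}

const totalBytes = collectLength(new SilenceEngine({ text: 'hello', frames: 3 }));
totalBytes.then((n) => console.log(`streamed ${n} bytes`));
```

The connection handler in this commit attaches `data`, `error`, and `end` listeners to the returned stream, so any `Readable` that emits audio buffers should fit.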
