Telegram to HeyGen Avatar Video with Text or Voice

This n8n workflow lets you create a HeyGen avatar video directly from Telegram, using either a voice note or a text message as the input.

It is designed as a simple “message in, talking-avatar video out” system: you send content to a Telegram bot, the workflow sends it to HeyGen using your chosen avatar, waits for the render to finish, and then delivers the finished video back to you in Telegram.

Use cases

This workflow is ideal for creators, coaches, consultants, and marketers who want to generate talking-head videos quickly without recording each one manually.

It’s especially useful if you:

Want to turn voice notes into avatar videos
Need a fast way to create text-to-video spokesperson content
Use Telegram as a lightweight mobile control panel for content creation
Want to reuse a HeyGen avatar for repeatable short-form videos
Need a simple workflow for testing avatar video ideas on demand

Good to know

This workflow requires a HeyGen account and API access.
It uses a specific preconfigured HeyGen avatar and voice ID.
Telegram must be configured to receive both text messages and voice notes.
HeyGen rendering is not instant, so the workflow includes a polling loop that keeps checking the render status until the video is ready.
The output format is vertical 720 × 1280, which is suitable for short-form social content.

How it works

This workflow has two input paths: one for voice notes and one for text messages.

1. Trigger from Telegram

A Telegram Trigger starts the workflow whenever a new message is sent to the bot.
A Set node stores the incoming text value.
An IF node checks whether the message contains text; if it does not, the message is routed to the voice-note flow.
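
The routing decision above can be sketched outside n8n as a small function. This is only an illustration: `classify_message` is a hypothetical helper name, and the `unsupported` branch is an addition that the workflow as described does not have (it treats anything without text as a voice note). The `message.text` and `message.voice` fields follow the Telegram Bot API update format.

```python
from typing import Any, Dict


def classify_message(update: Dict[str, Any]) -> str:
    """Return 'text', 'voice', or 'unsupported' for a Telegram update."""
    message = update.get("message") or {}

    # Text messages carry a non-empty "text" field.
    text = message.get("text")
    if isinstance(text, str) and text.strip():
        return "text"

    # Voice notes carry a "voice" object that holds the file_id.
    if "voice" in message:
        return "voice"

    # Assumption: anything else (photos, stickers, ...) is ignored.
    return "unsupported"
```

In n8n itself, the same check is a single condition on the IF node rather than code.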

2. Voice-note flow

If the incoming Telegram message is a voice note:

The workflow retrieves the file using Get Voice File
It converts the Telegram file path into an audio URL that HeyGen can download
That audio URL is passed into HeyGen Video Generate
HeyGen creates a video where the configured avatar speaks using the supplied audio

This path is useful when you want to preserve your original pacing and delivery instead of generating speech from text.
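
As a rough sketch of the two steps above, the snippet below builds the Telegram download URL and a request body for the voice-note path. The URL pattern is the one the Telegram Bot API uses for file downloads; note that it embeds the bot token, so treat it as a secret. The payload field names (`video_inputs`, `character`, `voice`, `audio_url`, `dimension`) are an assumption modelled on HeyGen's video generation API and should be checked against the HTTP Request node in your copy of the workflow. Both function names are hypothetical.

```python
from typing import Any, Dict

TELEGRAM_FILE_BASE = "https://api.telegram.org/file"


def telegram_file_url(bot_token: str, file_path: str) -> str:
    """Turn the file_path returned by Telegram's getFile into a download URL."""
    return f"{TELEGRAM_FILE_BASE}/bot{bot_token}/{file_path}"


def build_audio_payload(avatar_id: str, audio_url: str) -> Dict[str, Any]:
    """Request body for an avatar video driven by existing audio (assumed shape)."""
    return {
        "video_inputs": [
            {
                "character": {"type": "avatar", "avatar_id": avatar_id},
                # The avatar lip-syncs to the audio file at this URL.
                "voice": {"type": "audio", "audio_url": audio_url},
            }
        ],
        # Vertical 720 x 1280 output, as used by this workflow.
        "dimension": {"width": 720, "height": 1280},
    }
```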

3. Text-to-video flow

If the incoming Telegram message contains text:

The workflow extracts the message text
It sends the text directly to HeyGen
HeyGen uses the configured avatar and voice to generate a talking-avatar video from the written script

This is the faster option for turning short scripts or ideas into presenter-style videos.
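
For comparison, here is a minimal sketch of the request body for the text path. As with any HeyGen payload shown outside the workflow, the field names (`input_text`, `voice_id`, and the surrounding structure) are assumptions based on HeyGen's video generation API, and `build_text_payload` is a hypothetical helper. The empty-script check is an addition for illustration only.

```python
from typing import Any, Dict


def build_text_payload(avatar_id: str, voice_id: str, script: str) -> Dict[str, Any]:
    """Request body for an avatar video spoken from a written script (assumed shape)."""
    script = script.strip()
    if not script:
        # Assumption: reject empty scripts instead of sending them to HeyGen.
        raise ValueError("script must not be empty")

    return {
        "video_inputs": [
            {
                "character": {"type": "avatar", "avatar_id": avatar_id},
                # HeyGen synthesises speech from the text using the chosen voice.
                "voice": {"type": "text", "input_text": script, "voice_id": voice_id},
            }
        ],
        "dimension": {"width": 720, "height": 1280},
    }
```

The only difference from the voice-note path is the `voice` object: text plus a voice ID here, an audio URL there.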

4. Render tracking

After either generation path starts:

The returned video_id is saved
A Wait node pauses the workflow
The workflow checks render progress using HeyGen's video_status.get endpoint
An IF node checks whether the video status is completed

If the video is not ready yet:

The workflow sends a Telegram message saying “It’s still rendering”
It waits again
It checks the status again

This loop continues until the final video is available.
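
The loop described above can be sketched as follows. The status lookup and the Telegram notification are passed in as functions so the example stays self-contained. The names `wait_for_video`, `get_status`, and `notify`, the 30-second default wait, and the `status` and `video_url` keys of the returned dictionary are all assumptions for illustration, not values taken from the workflow.

```python
import time
from typing import Any, Callable, Dict


def wait_for_video(
    get_status: Callable[[str], Dict[str, Any]],
    video_id: str,
    notify: Callable[[str], None],
    wait_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the render is completed, then return the video URL."""
    while True:
        # Wait node: pause before each status check.
        sleep(wait_seconds)

        # Status check (video_status.get in the workflow).
        result = get_status(video_id)

        # IF node: stop as soon as the status is "completed".
        if result.get("status") == "completed":
            return result["video_url"]

        # Otherwise tell the user and go around again.
        notify("It's still rendering")
```

Like the workflow as described, this sketch has no exit for a failed or stalled render, so it would keep polling indefinitely in that case.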

5. Deliver the finished video

Once the video status is completed:

The final video URL is returned from HeyGen
The workflow sends the video back to the same Telegram user using Send a video

That means Telegram becomes both the input channel and the delivery channel.
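
To illustrate the delivery step, the sketch below builds the request that a Send a video step would make. `sendVideo` is a Telegram Bot API method that accepts a `chat_id` and a `video`, where `video` can be an HTTP URL for Telegram to fetch. The helper name `build_send_video_request` and the optional caption are assumptions added for the example.

```python
from typing import Any, Dict, Optional, Tuple


def build_send_video_request(
    bot_token: str,
    chat_id: int,
    video_url: str,
    caption: Optional[str] = None,
) -> Tuple[str, Dict[str, Any]]:
    """Return the sendVideo endpoint URL and JSON body for a finished video."""
    endpoint = f"https://api.telegram.org/bot{bot_token}/sendVideo"

    # chat_id comes from the original incoming message, so the video
    # goes back to the same user who sent the text or voice note.
    body: Dict[str, Any] = {"chat_id": chat_id, "video": video_url}
    if caption:
        body["caption"] = caption
    return endpoint, body
```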

How to use

Send either a text message or a voice note to your Telegram bot
Wait while HeyGen renders the video
Receive the finished avatar video back in Telegram

For best results:

Keep voice notes clear and free from background noise
Use short, direct scripts for cleaner avatar delivery
Test with shorter scripts first to speed up rendering

Requirements

A Telegram Bot connected to n8n
A HeyGen account with API access
A configured HeyGen avatar
Valid API credentials for:
- Telegram
- HeyGen HTTP requests

Customising this workflow

You can extend this workflow in several ways:

Let users choose between multiple avatars
Add AI rewriting before text is sent to HeyGen
Generate captions or subtitles automatically
Save finished videos to Google Drive or cloud storage
Post completed videos directly to social platforms
Add error handling for failed or stalled renders
Support different output formats such as square or landscape