Video to Text Extract

vtte is a cross-platform command-line tool (Windows, Linux, and Mac) that transcribes video and audio files (.mp4, .mov, .mkv, .mp3, .m4a, .wav) into Markdown files. It runs whisper.cpp locally and fully offline, so no data is sent to any server.

Prerequisites

Before using vtte, install the following tools on your system:

1. FFmpeg

Mac (Homebrew):

brew install ffmpeg

Linux (Debian/Ubuntu):

sudo apt install ffmpeg

Windows (Winget):

winget install ffmpeg

2. Whisper CLI (whisper.cpp)

Mac (Homebrew):

brew install whisper-cpp

Linux / Windows (build from source):

git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
cmake -B build && cmake --build build --config Release

Once built, copy the whisper-cli binary (whisper-cli.exe on Windows), typically found under build/bin, to a directory that is on your PATH.

3. Whisper Model

Download the AI model. ggml-base is a good starting point (a balance between speed and accuracy):

# Create a folder for the models
mkdir -p ~/whisper-models

# Download the base model
curl -L https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin \
  -o ~/whisper-models/ggml-base.bin

Available models (from lightest to most accurate):

ggml-tiny.bin — fastest
ggml-base.bin — recommended
ggml-small.bin
ggml-medium.bin
ggml-large-v3.bin — most accurate, requires more memory

Installing vtte

Requirement: Go 1.22+

go install github.com/isadfrn/vtte@latest

Or build locally:

git clone https://github.com/isadfrn/vtte
cd vtte
go build -o vtte .

Usage

vtte [options] <file_or_directory>

Options

| Flag | Default | Description |
| --- | --- | --- |
| `-lang` | `auto` | Transcription language code (e.g., `pt`, `en`, `es`), or `auto` for automatic detection |
| `-model` | `ggml-base.bin` | Path to the Whisper model file |

The model can also be set via the VTTE_MODEL environment variable.
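How the flag and the environment variable interact is not spelled out here. The sketch below shows one common arrangement, assuming the precedence is `-model` flag first, then VTTE_MODEL, then the documented default. The `options` struct and `parseOptions` function are hypothetical names for illustration and may not match vtte's actual code.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
)

// options mirrors the two documented flags (hypothetical struct).
type options struct {
	Lang  string
	Model string
	Paths []string
}

// parseOptions assumes this precedence: -model flag, then VTTE_MODEL,
// then the documented default ggml-base.bin.
func parseOptions(args []string, getenv func(string) string) (options, error) {
	defaultModel := "ggml-base.bin"
	if m := getenv("VTTE_MODEL"); m != "" {
		defaultModel = m // the environment variable replaces the default
	}
	fs := flag.NewFlagSet("vtte", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep parse errors out of stderr; the caller reports them
	lang := fs.String("lang", "auto", "transcription language or auto")
	model := fs.String("model", defaultModel, "path to the Whisper model file")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	return options{Lang: *lang, Model: *model, Paths: fs.Args()}, nil
}

func main() {
	opts, err := parseOptions(os.Args[1:], os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(2)
	}
	fmt.Printf("lang=%s model=%s inputs=%v\n", opts.Lang, opts.Model, opts.Paths)
}
```

Using the environment value as the flag's default keeps the logic in one place: an explicit `-model` on the command line always wins without any extra bookkeeping.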

Examples

Transcribe a single file (language detected automatically):

vtte meeting.mp4
vtte podcast.mp3
vtte interview.wav

Force the transcription language to Portuguese:

vtte -lang pt meeting.mp4

Use a larger model for higher accuracy:

vtte -model ~/whisper-models/ggml-large-v3.bin -lang pt meeting.mp4

Transcribe an entire folder (mixed formats):

vtte -lang pt ~/recordings/

Set the model via an environment variable:

export VTTE_MODEL=~/whisper-models/ggml-base.bin
vtte folder/with/videos/
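For the folder examples above, the tool has to decide which files to pick up. The following is a rough sketch of such a filter, using the extensions listed at the top of this README. Matching extensions case-insensitively is an assumption of this sketch, and `isSupported` and `filterSupported` are hypothetical names.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// supported lists the extensions named in this README.
var supported = map[string]bool{
	".mp4": true, ".mov": true, ".mkv": true,
	".mp3": true, ".m4a": true, ".wav": true,
}

// isSupported reports whether a file name ends in a listed extension.
// Case-insensitive matching is an assumption, not documented behavior.
func isSupported(name string) bool {
	return supported[strings.ToLower(filepath.Ext(name))]
}

// filterSupported keeps only the names with a supported extension.
func filterSupported(names []string) []string {
	var out []string
	for _, n := range names {
		if isSupported(n) {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	names := []string{"meeting.mp4", "notes.txt", "podcast.MP3", "interview.wav"}
	fmt.Println(filterSupported(names))
}
```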

Output

For each processed file, vtte generates a .md file in the same folder as the original:

meeting.mp4    →  meeting.md
podcast.mp3    →  podcast.md
interview.wav  →  interview.md

The Markdown file contains a title taken from the source file name, followed by the transcribed text, ready to be imported into Google NotebookLM, Claude, or any other AI tool.
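The naming and layout described above can be sketched as follows. The exact heading level and spacing vtte writes are not stated here, so the level-1 heading in `renderMarkdown` is a guess, and both function names are hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// outputPath swaps the input extension for .md, keeping the same folder.
func outputPath(input string) string {
	return strings.TrimSuffix(input, filepath.Ext(input)) + ".md"
}

// renderMarkdown puts the file's base name in a heading above the text.
// The level-1 heading and blank line are assumptions of this sketch.
func renderMarkdown(input, transcript string) string {
	base := filepath.Base(input)
	title := strings.TrimSuffix(base, filepath.Ext(base))
	return fmt.Sprintf("# %s\n\n%s\n", title, strings.TrimSpace(transcript))
}

func main() {
	fmt.Println(outputPath("recordings/meeting.mp4"))
	fmt.Print(renderMarkdown("recordings/meeting.mp4", " Hello everyone. \n"))
}
```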

How it works

1. Audio extraction: FFmpeg extracts the audio track and converts it to 16 kHz mono PCM, the input format Whisper expects.
2. Transcription: Whisper CLI processes the audio locally with the chosen model.
3. Markdown: the text is saved as a .md file with the source file name as the title.
4. Cleanup: temporary .wav and .txt files are removed automatically.

Contributing

This repository uses the Gitflow workflow and Conventional Commits. To contribute:

1. Create a branch from the develop branch.
2. Make your changes.
3. Open a Pull Request against the develop branch.
4. Wait for discussion and approval.

Thank you in advance for any contribution.

Status

Maintaining

License

MIT
