A local AI assistant that runs on your computer. No cloud. No subscription. Your data stays with you.
- What is speak?
- Features
- Quick Start
- Requirements
- Usage
- Configuration
- Architecture
- How speak saves RAM
- Troubleshooting
- Contributing
- License
speak is a local AI assistant that runs entirely on your machine. It is designed for normal computers (4-6GB RAM), not expensive servers. No internet connection needed. No monthly fee. Your conversations never leave your computer.
speak is inspired by antirez's ds4 but built for everyday laptops, not high-end Macs.
It remembers who you are across sessions. It can read your files. It can search the web. And it does all of this using less than 2GB of RAM.
| Feature | What it means for you | Status |
|---|---|---|
| 100% Local | Runs on your laptop. No data sent to anyone. | Yes |
| Persistent Memory | Tell speak something once. It remembers forever. | Yes |
| File Reading | "Read my config.json" - speak shows you the content. | Yes |
| Web Search | "Search for latest news" - speak finds current information. | Yes |
| Low RAM Usage | Uses disk caching. Long conversations don't fill your memory. | Yes |
| Hardware Detection | Auto-configures itself for your computer. | Yes |
| Offline First | Works without internet. Web search is optional. | Yes |
| Streaming Output | Tokens appear as they are generated. | Yes |
| Agent Loop | Multi-step tool use (search, read, then answer). | Yes |
| Disk KV Cache | Conversation state saved to SSD, not RAM. | Yes |
| Resumable Downloads | Interrupted model downloads continue where they stopped. | Yes |
git clone https://github.com/zendrx/speak.git && cd speak && shards install && crystal build src/speak.cr --release -o speak && ./speakStep by step
# Clone the repository
git clone https://github.com/zendrx/speak.git
cd speak
# Install dependencies
shards install
# Build the binary
crystal build src/speak.cr --release -o speak
# Run speak
./speakOn first run, speak will:
- Detect your RAM and CPU
- Create a config file for your hardware
- Download the AI model (2.5GB)
- Start the chat
| Component | Minimum | Recommended |
|---|---|---|
| RAM | 4 GB | 8 GB |
| Storage | 3 GB free | 10 GB free |
| OS | Linux | Ubuntu 22.04+ |
| CPU | Any 64-bit | AVX2 support |
macOS works but is experimental. Windows support coming soon.
Chat Commands
Inside the chat, type these commands:
| Command | Action |
|---|---|
| exit or quit | Save and quit |
| clear | Clear the screen |
| history | Show conversation history |
| save | Save conversation manually |
| memory | Show what speak remembers about you |
| clearmemory | Clear all memories |
| reset | Reset working memory |
Example conversation
$ ./speak
======================================================================
speak - Local AI Assistant
======================================================================
> Hello, who are you?
speak: I am speak, a local AI assistant running on your computer.
I can read files, search the web, and remember things about you.
> My name is Sarah and I love Python
speak: I've remembered that your name is Sarah and you love Python.
> Read my config.json
speak: {
"detected": {
"total_ram_mb": 8192,
"available_ram_mb": 6200
},
"active": {
"context_size": 2048,
"model_quant": "Q4_K_M"
}
}
> Search for Python 3.13 features
speak: Search results for: Python 3.13 features
1. Python 3.13 Release Notes
URL: https://docs.python.org/3.13/whatsnew/3.13.html
Improved error messages, incremental garbage collection...
> What do you know about me?
speak: Your name is Sarah and you love Python.
> exit
Goodbye.
All settings are stored in ./speak/config.json. You can edit this file to customize speak.
Configuration Structure
{
"detected": {
"total_ram_mb": 8192,
"available_ram_mb": 6200,
"os_reserved_ram_mb": 512
},
"active": {
"cpu_cores": 4,
"has_avx2": true,
"free_disk_space_mb": 51200,
"context_size": 2048,
"kv_cache_type": "standard",
"model_quant": "Q4_K_M",
"model_file": "nanbeige-3b-q4_k_m.gguf",
"temperature": 0.7,
"max_tokens": 512,
"use_mmap": true
},
"user_overrides": {
"os_reserved_ram_mb": null,
"context_size": null,
"kv_cache_type": null,
"model_quant": null,
"temperature": null,
"max_tokens": null,
"use_mmap": null
}
}| Setting | What it does | Default |
|---|---|---|
| context_size | How many tokens the AI remembers | 2048 |
| temperature | Creativity (0.0 = strict, 1.5 = creative) | 0.7 |
| max_tokens | Maximum response length | 512 |
| model_quant | Quality vs speed (Q2_K, Q4_K_M, Q6_K) | Q4_K_M |
Make AI more creative
Edit ./speak/config.json:
"user_overrides": {
"temperature": 1.2
}Reduce RAM usage
"user_overrides": {
"context_size": 1024,
"model_quant": "Q2_K"
}Custom System Prompt
Edit src/speak/system_prompt.txt and recompile. The prompt is embedded at build time.
System Flow
User Input
|
v
[Launch] ---> [Agent Loop] ---> [Tool Calls]
| | |
v v v
[Config] [Memory] [Tool Executor]
| | |
v v v
[Model] <---- [Disk Cache] <---- [Web Search]
|
v
Response
speak/
+-- src/
+-- speak.cr Entry point
+-- speak/
+-- system.cr Hardware detection
+-- config.cr JSON configuration
+-- install.cr Model downloader
+-- disk.cr Disk-backed KV cache
+-- tool.cr Tool system
+-- memory.cr Agent memory
+-- launch.cr Chat interface
+-- system_prompt.txt Embedded prompt
+-- lib/ Shards
+-- shard.yml
+-- README.md
RAM Tiers
| Available RAM | Model Quant | Context Size | mmap |
|---|---|---|---|
| 3 GB | Q2_K | 512 | Enabled |
| 3-6 GB | Q4_K_M | 1024 | Enabled |
| 6-12 GB | Q4_K_M | 2048 | Enabled |
| 12 GB | Q6_K | 4096 | Disabled |
speak uses two techniques to keep memory low:
- Memory Mapping (mmap)
The model stays on disk. Only the parts needed are loaded into RAM. This reduces RAM usage from 2.5GB to under 500MB for the model.
- Disk KV Cache
Conversation memory (KV cache) is saved to SSD, not RAM. Each turn extends the cache on disk, not in memory.
Without Disk Cache: RAM usage grows with conversation length (2GB -> 8GB crash)
With Disk Cache: RAM usage stays flat (2GB for 10 turns or 10,000 turns)
Unable to create dir ./speak
Your binary is named speak and conflicts with the data directory. Rename the binary:
mv speak speak_app
./speak_app401 Unauthorized during download
The model repository requires authentication. Run:
./hfd.sh Edge-Quant/Nanbeige4.1-3B-Q4_K_M-GGUF --include *.gguf --local-dir ./speak/modelsThen run ./speak again.
Model loads slowly on HDD
Use the smaller Q2_K model. Edit config.json:
"user_overrides": {
"model_quant": "Q2_K"
}Then delete the old model file in ./speak/models/ and restart speak.
Readline not working
Install the system library:
# Ubuntu/Debian
sudo apt install libreadline-dev
# macOS
brew install readlineUndefined method 'tokenize'
Ensure you are using the correct API: @vocab.tokenize not context.tokenize.
Contributions are welcome. Please see CONTRIBUTING.md for guidelines.
git clone https://github.com/zendrx/speak.git
cd speak
# Make your changes
crystal build src/speak.cr --release -o speak_app
./speak_appMIT License. See LICENSE file for details.
| Project | Role |
|---|---|
| **Crystal | Language** |
| **llama.cpp | Inference engine** |
| **llama.cr | Crystal bindings** |
| **Nanbeige | Model** |
| **ds4 | Disk cache inspiration** |
Built with Crystal