OpenClaw In The Real World
From Toy to Tool
You installed OpenClaw, connected it to Telegram, and felt the magic: an AI agent that remembers you, runs tasks while you sleep, and feels almost alive. You’re in the “Mac Mini in a closet” phase - it works, it’s exciting, but it’s fragile.
Then you hit the wall.
Your agent can’t find a decision you made two months ago. You’re afraid to restart it because you might lose context. You have three agents and they keep interfering with each other. You spent an hour re-explaining preferences the agent should already know. You lost configuration when your laptop died.
Three inflection points signal you need production patterns:
1. Memory is breaking. Daily logs pile up faster than you can navigate them. Semantic search times out. The agent says “I don’t have that information” about something you definitely told it.
2. You’re losing work. You made tweaks to AGENTS.md that worked perfectly, then restarted the agent and couldn’t remember what you changed. Or your machine crashed and you lost a week of configuration.
3. Reliability matters more than experimentation. You’ve moved from “cool demo” to “actually depending on this.” When it breaks, your day gets harder. You need a system, not a toy.
This article shows you production patterns from running OpenClaw agents at scale. We’ll cover memory architecture that doesn’t collapse, version control that enables disaster recovery, when to use scripts instead of prompts, how to organize multi-agent systems, and a real-world case study of DevBot—a parent agent managing five specialized sub-agents for software development.
This isn’t a step-by-step tutorial. It’s a field guide to the patterns you’ll need when you graduate from experimentation to production. Use what you need, when you need it.
If these problems sound familiar, consider asking your OpenClaw agent to read this article and implement the patterns.
Part 1: Memory Architecture That Scales
The Memory Problem Nobody Talks About
Your agent writes a daily log. Every conversation, every task, every decision gets appended to `memory/2026-03-02.md`. Tomorrow it writes `2026-03-03.md`. The day after, `2026-03-04.md`.
This works beautifully for the first month. Your agent can search recent memory instantly. Then month two hits. You have 60 files. Month three: 90 files. Six months in, you have 180+ daily logs, and `memory_search` is slowing down or timing out trying to scan them all.
The agent tells you, “I don’t have information about that budget decision” when you *know* you discussed it just a couple of weeks ago. The information exists—it’s buried in `memory/2026-01-15.md` somewhere—but the agent can’t find it anymore.
The root problem: OpenClaw’s default memory model assumes all history is equally important and equally accessible. It’s not. Ninety percent of what your agent logs is transactional noise: “I sent an email,” “I checked the calendar,” “I ran a search.” Five percent is operational context: what happened and why. The final five percent (decisions, commitments, relationships, pattern changes) is institutional knowledge that should be available forever.
Without intervention, they all get equal weight. Your memory folder becomes an archaeological dig site.
QMD: Quality Memory Digest
The solution is weekly memory compaction. Think of it like database maintenance or filesystem defragmentation, but for your agent’s long-term memory.
QMD (Quality Memory Digest) is a scheduled process that runs once a week and converts seven days of raw operational logs into one curated summary. It scans for:
- Decisions made: “Decided to prioritize security patches over feature work”
- Commitments created: “Promised client we’d deliver API v2 by March 15”
- New contacts: “Met Jack (jack@domain.com), Dev Director”
- Pattern changes: “Email response latency from vendor X increased from 2 hours to 8 hours”
- Blockers and resolutions: “Budget work blocked waiting on Q2 orders. Resolved March 5 when orders finalized.”
It drops:
- Routine operations: “Checked email at 3:15 PM, no urgent items”
- Ephemeral context: “User mentioned needing butter, added to grocery list”
- Repeated information: Same status update sent three days in a row
File structure:
memory/
  archive/
    2026-01-01-to-07.qmd.md   # Week 1 digest (5-8 KB)
    2026-01-08-to-14.qmd.md   # Week 2 digest
    2026-01-15-to-21.qmd.md   # Week 3 digest
    ...
  2026-03-15.md   # Current week (active)
  2026-03-16.md
  2026-03-17.md
MEMORY.md   # Curated long-term memory
After six months, instead of 180 daily logs (900+ KB), you have 26 QMD digests (130-200 KB) plus the current week. Semantic search becomes fast again. The agent can find “that budget decision from January” in under a second because it’s in `2026-01-08-to-14.qmd.md` with clear section headers.
Implementation:
A cron job runs Sunday night at 2 AM:
#!/bin/bash
# qmd-compress.sh
# Requires GNU date for -d (on macOS, install coreutils and use gdate)
WEEK_START=$(date -d "last monday" +%Y-%m-%d)
WEEK_END=$(date -d "sunday" +%Y-%m-%d)
AGENT_DIR=~/.openclaw/agents/work-agent
# Call agent with QMD prompt
openclaw --agent work-agent --prompt "$(cat <<EOF
Run your weekly QMD (Quality Memory Digest) process:
1. Load all daily logs from ${WEEK_START} to ${WEEK_END}
2. Extract significant events: decisions, commitments, new contacts, pattern changes, blockers
3. Drop routine operations and ephemeral context
4. Generate structured digest with clear sections
5. Save to memory/archive/${WEEK_START}-to-${WEEK_END}.qmd.md
6. Archive original daily logs to memory/archive/daily/${WEEK_START}/
Output only the digest filename when complete.
EOF
)"
Add to crontab: `0 2 * * 0 ~/bin/qmd-compress.sh`
Your agent now maintains a sliding window of active memory (current week) plus compressed historical digests.
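The week arithmetic in the script depends on GNU `date`. As a portable cross-check, here is a minimal Python sketch of the same calculation. It assumes a Monday-to-Sunday week and the full-date filename format the script produces; the function names are hypothetical and not part of OpenClaw.

```python
from datetime import date, timedelta


def week_bounds(run_day: date) -> tuple[date, date]:
    """Return the Monday and Sunday of the week that contains run_day."""
    monday = run_day - timedelta(days=run_day.weekday())  # weekday(): Monday == 0
    return monday, monday + timedelta(days=6)


def daily_log_names(run_day: date) -> list[str]:
    """List the seven daily log paths that one weekly digest should cover."""
    monday, _ = week_bounds(run_day)
    return [f"memory/{(monday + timedelta(days=i)).isoformat()}.md" for i in range(7)]


def digest_name(run_day: date) -> str:
    """Build the digest filename in the same format as the shell script."""
    monday, sunday = week_bounds(run_day)
    return f"memory/archive/{monday.isoformat()}-to-{sunday.isoformat()}.qmd.md"
```

Keeping the date math in a small pure function makes it easy to check by hand before trusting a cron job with it.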
Dream Routines: The Overnight Shift
QMD handles weekly compaction. Dream routines handle nightly consolidation and proactive preparation.
Named deliberately—just as human sleep involves memory consolidation and pattern recognition—dream routines run during off-peak hours (typically 2-4 AM) and perform cognitive maintenance:
Memory consolidation: Scan today’s log, identify significant events, promote important items to MEMORY.md. Not everything deserves to be in the permanent curated memory, but the big stuff does.
Pattern detection: Analyze communication response times (who’s responsive, who’s dropped off), task velocity (what gets done quickly, what stalls), recurring issues (same problem appearing multiple times).
Proactive preparation: Surface upcoming deadlines in the next 3-7 days, identify tasks waiting on responses with no progress, flag blocked items where the blocker might now be resolved.
Dream routines run via cron, not via HEARTBEAT.md. Why? Because cron guarantees execution. If your agent is busy processing a long task at 3 AM, a heartbeat-based dream routine might get skipped. Cron doesn’t skip.
# crontab
0 3 * * * /usr/local/bin/openclaw dream-routine --agent work-agent
The dream routine script calls the agent with a special prompt, logs the output to `memory/dream-YYYY-MM-DD.md`, and optionally sends a summary if significant findings warrant human attention.
Dream routines vs. heartbeats:
Transaction Memory vs. Operational Memory
Not all memory is created equal. Transaction memory is what happened: emails sent, files edited, commands run, API calls made. It’s high-volume and low long-term value. Operational memory is why it happened and what you learned. It’s low-volume and high long-term value.
OpenClaw’s default behavior treats them the same—everything goes into daily logs. The result: transaction noise drowns out operational signal.
Separation strategy:
Transaction memory (high volume, archive aggressively):
- Communication logs: `communications/2026-03.jsonl` (one JSON object per email/message)
- Command execution logs: `scripts/execution-log-2026-03.jsonl`
- Tool usage logs: `tools/usage-2026-03.jsonl`
- Retention: Keep 30 days active, compress and archive monthly, delete after 1 year (unless compliance requires longer)
Operational memory (low volume, keep forever):
- Daily logs: `memory/2026-03-15.md` (narrative of what happened and why)
- QMD digests: `memory/archive/2026-03-01-to-07.qmd.md` (weekly summaries)
- Curated long-term: `MEMORY.md` (decisions, relationships, policies, critical context)
- Retention: Current week active, compress weekly via QMD, keep digests forever
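The transaction-log retention rule can be written as one small pure function, which is easy to test before wiring it into a cleanup job. This is a minimal sketch: the 30-day and 365-day thresholds come from the list above, while the function name and return values are hypothetical.

```python
from datetime import date

ACTIVE_DAYS = 30    # keep logs active for 30 days
DELETE_DAYS = 365   # delete after 1 year (raise this if compliance requires longer)


def transaction_log_action(last_modified: date, today: date) -> str:
    """Decide what to do with a transaction log based on its age in days."""
    age = (today - last_modified).days
    if age > DELETE_DAYS:
        return "delete"
    if age > ACTIVE_DAYS:
        return "archive"
    return "keep"
```

Operational memory never reaches the "delete" branch because digests are kept forever, so this rule applies to transaction logs only.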
Archiving implementation:
#!/bin/bash
# archive-transaction-logs.sh
# Handle each agent separately so files land in that agent's own archive
for comm_dir in ~/.openclaw/agents/*/communications; do
  [ -d "$comm_dir" ] || continue
  mkdir -p "$comm_dir/archive"
  # Compress communication logs older than 30 days and move them to the archive
  find "$comm_dir" -maxdepth 1 -name "*.jsonl" -mtime +30 \
    -exec gzip {} \; \
    -exec mv {}.gz "$comm_dir/archive/" \;
  # Delete archived transaction logs older than 1 year
  find "$comm_dir/archive" -name "*.jsonl.gz" -mtime +365 -delete
done
# Daily logs older than the current week are archived by QMD
Run monthly via cron: `0 4 1 * * ~/bin/archive-transaction-logs.sh`
Semantic Search at Scale
With QMD digests and transaction log archiving, your active memory footprint stays manageable. But six months in, you still have 26 weekly digests to search. A year in: 52 digests. Two years: 104 digests.
Semantic search (`memory_search` tool) works by generating vector embeddings for each memory file and comparing them to your query. This is fast for 50 files, acceptable for 100 files, and starts to degrade beyond that.
The solution: pre-process memory into a searchable index.
Instead of the agent generating embeddings on every search, run a nightly job that:
1. Scans all QMD digests and MEMORY.md
2. Chunks them into searchable segments (by section header or paragraph)
3. Generates embeddings for each chunk
4. Stores the vectors in a dedicated vector database (or, for keyword search without embeddings, indexes the chunk text in SQLite FTS5)
5. Agent queries the preprocessed index, not raw files
Benefits:
- Sub-second retrieval even with 2+ years of history
- Lower token costs (no need to re-embed on every search)
- Better ranking (can tune relevance scoring separately from embedding)
Implementation details depend on your index of choice (a vector database such as Chroma or Qdrant, or plain full-text search with SQLite FTS5), but the pattern is the same: **offload expensive preprocessing to scheduled batch jobs**, let the agent query a fast index.
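To make the nightly indexing job concrete, here is a minimal sketch of the keyword-search variant: it chunks digests by section header and loads them into an in-memory SQLite FTS5 table. It deliberately skips embeddings, so it illustrates the "preprocess once, query a fast index" pattern rather than true semantic search. The function names are hypothetical, and it assumes your Python build ships SQLite with FTS5 enabled.

```python
import re
import sqlite3

HEADER = re.compile(r"^#{1,6}\s+(.*)$")


def chunk_by_header(text):
    """Split markdown into (heading, body) chunks, one per non-empty section."""
    chunks, heading, lines = [], "(preamble)", []
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            if any(part.strip() for part in lines):
                chunks.append((heading, "\n".join(lines).strip()))
            heading, lines = match.group(1).strip(), []
        else:
            lines.append(line)
    if any(part.strip() for part in lines):
        chunks.append((heading, "\n".join(lines).strip()))
    return chunks


def build_index(digests):
    """Index {filename: markdown_text} into an in-memory FTS5 table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(source, heading, body)")
    for source, text in digests.items():
        rows = [(source, heading, body) for heading, body in chunk_by_header(text)]
        conn.executemany("INSERT INTO chunks VALUES (?, ?, ?)", rows)
    return conn


def search(conn, query, limit=5):
    """Return (source, heading) pairs for the best-matching chunks."""
    sql = "SELECT source, heading FROM chunks WHERE chunks MATCH ? ORDER BY rank LIMIT ?"
    return conn.execute(sql, (query, limit)).fetchall()
```

A production job would write to a file-backed database and rebuild only the digests that changed. Note that FTS5 has its own query syntax, so user input containing characters such as hyphens or quotes may need escaping before it is passed to `search`.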
Part 2: Agent Hierarchy & Workspace Organization
The Multi-Agent Pattern
One human, multiple agents. Why?
Context isolation. Your work context has different needs, tools, and risk profiles than your personal context. Your work agent manages email at `you@company.com`, has access to internal databases, and operates under company security policies. Your personal agent manages `you@gmail.com`, has access to your home calendar and grocery list, and can take more risks because it’s not handling company data.
Mixing them is asking for trouble. Your personal agent accidentally sends a work email. Your work agent reveals company information in a personal chat. Or worse: they start interfering with each other’s tasks because they’re both monitoring the same inbox or editing the same files.
Standard structure:
~/.openclaw/
  agents/
    work/   # Professional context (company email, code, operations)
      SOUL.md
      AGENTS.md
      USER.md
      MEMORY.md
      ...
Each agent has:
- Separate credentials: Different email accounts, API keys, database connections
- Isolated workspaces: File system permissions prevent cross-agent access
- Distinct memory: No shared memory or knowledge base (unless explicitly designed)
- Independent operation: One agent crashing doesn’t affect others
When to split agents:
- Different human principals (you vs. team)
- Different credential sets (work email vs. personal email)
- Different risk profiles (conservative production vs. experimental dev)
- Different communication channels (Telegram for personal, Google Chat for work)
Sub-Agent Conventions
Sub-agents are *not* peer agents. They don’t get their own workspace. They inherit their parent’s context, credentials, and memory. They’re spawned for specialized tasks, run for a defined period, and terminate.
Rule: Sub-agents don’t have workspace directories. Their configuration lives in the parent agent’s `AGENTS.md` file.
File: devbot-agents-config.md (10KB)
Path: /Users/rahulsub/.openclaw/agents/rahul-ai/devbot-agents-config.md
Contents:
5 Sub-Agent Definitions (ready for AGENTS.md):
pr-reviewer
• Authority: Level 2 (draft + approve)
• Criteria: Auto-approve <50 lines, require review for security/API changes
• Tools: GitHub API, ESLint, Pylint, TypeScript
• Escalate: Security vulns, architectural concerns
issue-triager
• Authority: Level 3 (autonomous)
• Criteria: P0-P3 priority based on keywords
• Auto-assign by component labels
• Escalate: P0 issues, security reports
codemod
• Authority: Level 2 (create PRs + review)
• Process: AST parsing → transform → test → PR
• Criteria: Auto-PR if tests pass, flag >500 lines
• Escalate: Test failures, AST errors
dependency-auditor
• Authority: Level 3 for patches, Level 2 for majors
• Criteria: Auto-merge patches, flag majors, emergency for CVEs
• Schedule: Weekly Sunday 2 AM
• Escalate: Critical CVEs, breaking changes
ci-monitor
• Authority: Level 3 (autonomous)
• Mode: Session (always-on)
• Categories: Flaky test, dependency, lint, test failure, infrastructure
• Auto-retry: Max 2 retries for flaky/infra
• Escalate: Persistent failures, main branch breaks
Additional Sections:
DevBot Coordination: Morning standup automation, real-time triggers, bulk operations
Configuration Files: JSON configs for each sub-agent (coding standards, priority keywords, failure patterns)
Wrapper Scripts: gh-pr-review, gh-issue-triage, gh-codemod
When the parent agent spawns a sub-agent, it reads this configuration, constructs the appropriate prompt with authority boundaries and escalation criteria, and launches the sub-agent session.
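The spawn step can be sketched as a function that turns one configuration entry into a prompt. This is an illustration only: the dictionary layout and the `build_subagent_prompt` helper are hypothetical, and the sample values are copied from the issue-triager definition above. How OpenClaw actually launches the session is not shown.

```python
SUBAGENTS = {
    # Values copied from the issue-triager definition; the dict shape is an assumption.
    "issue-triager": {
        "authority": "Level 3 (autonomous)",
        "criteria": "P0-P3 priority based on keywords",
        "escalate": ["P0 issues", "security reports"],
    },
}


def build_subagent_prompt(name: str, task: str, registry=SUBAGENTS) -> str:
    """Assemble a spawn prompt that states authority and escalation rules explicitly."""
    config = registry[name]  # KeyError here means the sub-agent is not defined
    lines = [
        f"You are the {name} sub-agent.",
        f"Task: {task}",
        f"Authority: {config['authority']}",
        f"Criteria: {config['criteria']}",
        "Escalate to the parent agent instead of acting when you encounter:",
    ]
    lines += [f"- {item}" for item in config["escalate"]]
    return "\n".join(lines)
```

Because the prompt is generated from the tracked configuration, two spawns of the same sub-agent receive identical boundaries, which is the "no prompt drift" property described next.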
Why this matters:
1. Version control: Sub-agent logic is in a file you can track in Git
2. Consistency: Same sub-agent configuration every time (no prompt drift)
3. Testability: You can spawn a sub-agent manually to test changes
4. Auditability: Clear record of what each sub-agent is authorized to do
Workspace File Discipline
Not everything in your agent’s workspace should be treated the same. Some files are human-authored configuration that should be version controlled. Some are agent-generated data that should be tracked but can be regenerated. Some are ephemeral cache that should never be committed.
Core files (human-edited, version controlled):
- `SOUL.md` - Agent identity, principles, personality
- `AGENTS.md` - Operating instructions, startup procedures, sub-agent configs
- `USER.md` - Principal context, preferences, special handling
- `MEMORY.md` - Curated long-term memory
- `TOOLS.md` - Tool configurations, credentials (without secrets)
- `IDENTITY.md` - Name, role, signature
- `HEARTBEAT.md` - Periodic task definitions
Generated files (agent-edited, version controlled):
- `tasks.json` - Task ledger with dependencies
- `memory/archive/*.qmd.md` - Weekly memory digests
- `scripts/*.sh`, `scripts/*.py` - Helper scripts (may be agent-generated)
Ephemeral files (not version controlled):
- `memory/2026-*.md` - Daily logs (archived by QMD, not committed)
- `communications/*.jsonl` - Transaction logs (may contain PII)
- `.cache/`, `temp/`, `*.log` - Runtime artifacts
- Session transcripts, model outputs, temporary files
Gitignore strategy:
# .gitignore for OpenClaw agent workspace
# Ephemeral / runtime
.cache/
temp/
*.log
session-*.json
# Daily logs (archived via QMD); one pattern covers 2020-2029
memory/202[0-9]-*.md
# Raw daily logs that QMD moves into the archive
memory/archive/daily/
# Transaction logs (may contain PII, archived separately)
communications/*.jsonl
# Secrets
.env
*.key
credentials.json
# Keep the archive directories in the repo via their .gitkeep placeholders
!memory/archive/.gitkeep
!communications/archive/.gitkeep
Commit: configuration, curated memory, scripts. Archive: transaction logs, daily logs. Ignore: secrets, cache, ephemeral data.
TOOLS.md as Configuration Management
Your agent needs to call external tools: `gog gmail`, `gh` CLI, custom scripts. Each tool has options: account, flags, defaults. If you let the agent figure this out every time, you get inconsistency and errors.
Stop doing this:
Agent: “I’ll search your email now”
Agent runs: `gog gmail search "urgent"`
Error: “No account specified”
Agent: “Let me try again”
Agent runs: `gog gmail --account you@company.com search "urgent"`
Error: “No client specified”
Agent: “One more time”
Agent runs: `gog gmail --account you@company.com --client work-agent search "urgent"`
Success (after three tries and wasted tokens)
Start doing this:
Document defaults in `TOOLS.md`:
## Gmail
- **Account**: you@company.com
- **Client**: work-agent
- **Wrapper**: `~/bin/work-gmail` (auto-adds --account and --client)
- **CRITICAL**: Always use `work-gmail` command, never call `gog gmail` directly
### Usage
```bash
# Search
work-gmail search "urgent"
# Send
work-gmail send --to recipient@example.com --subject "Subject" --body "Message"
```
Create wrapper script `~/bin/work-gmail`:
```bash
#!/bin/bash
# work-gmail - Wrapper for gog gmail with work account defaults
exec gog gmail --account you@company.com --client work-agent "$@"
```
Now when the agent needs to search email, it runs `work-gmail search "urgent"` and it works the first time, every time.
Apply this pattern to every tool:
- `work-calendar` wraps `gog calendar`
- `personal-gmail` wraps `gog gmail` with personal account
- `db-query` wraps database client with connection string
- `api-call` wraps curl with authentication headers
Agent calls simple, memorable commands. Wrapper scripts handle configuration. Defaults live in version-controlled `TOOLS.md`.
Part 3: Version Control & Deployment
Why Git + Stow
Your agent’s workspace lives in `~/.openclaw/agents/work-agent/`. That’s where OpenClaw expects to find it. But you want your configuration in a Git repository—`~/code/openclaw-config/`—so you can version control it, test changes in branches, and deploy to multiple machines.
The naive approach: Keep your workspace in Git, point OpenClaw at `~/code/openclaw-config/work-agent/`. This works until you have multiple agents and they start colliding on paths, or you want to test a change without affecting your production agent.
The production approach: Keep source in Git repository, use **GNU Stow** to create symlinks from `~/.openclaw/agents/` to your Git repo.
Structure:
~/code/openclaw-config/
  work-agent/
    SOUL.md
    AGENTS.md
    USER.md
    MEMORY.md
    TOOLS.md
    IDENTITY.md
    HEARTBEAT.md
    memory/
      archive/
        .gitkeep
    scripts/
      work-gmail
      qmd-compress.sh
  personal-agent/
    SOUL.md
    AGENTS.md
    ...
  scripts/
    dream-routine.sh
    archive-logs.sh
  stow.sh   # Deployment script
  .gitignore
  README.md
# Stow links a package's contents into the target directory, so target the agent's own directory
mkdir -p ~/.openclaw/agents/work-agent
stow -d ~/code/openclaw-config -t ~/.openclaw/agents/work-agent work-agent
# This creates symlinks:
# `~/.openclaw/agents/work-agent/SOUL.md` → `~/code/openclaw-config/work-agent/SOUL.md`
# `~/.openclaw/agents/work-agent/AGENTS.md` → `~/code/openclaw-config/work-agent/AGENTS.md`
# And so on for the rest of `work-agent/`
Result: OpenClaw reads from `~/.openclaw/agents/work-agent/` (where it expects files), but you edit and version control `~/code/openclaw-config/work-agent/` (where Git lives). Changes in Git immediately reflect in the agent’s workspace via symlinks.
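If the symlink mechanics feel opaque, the following Python sketch models what a Stow-style deployment does: it mirrors each file in a package directory as a symlink under a target directory. It is a simplified illustration, not a replacement for Stow. Real Stow creates relative links, can link whole directories at once, and reports conflicts, whereas the hypothetical `link_package` helper below links file by file with absolute paths and silently skips anything that already exists.

```python
import tempfile
from pathlib import Path


def link_package(package_dir: Path, target_dir: Path) -> list[Path]:
    """Symlink every file under package_dir to the same relative path in target_dir."""
    created = []
    for source in sorted(package_dir.rglob("*")):
        if source.is_dir():
            continue  # directories are created on demand below
        link = target_dir / source.relative_to(package_dir)
        link.parent.mkdir(parents=True, exist_ok=True)
        if link.is_symlink() or link.exists():
            continue  # Stow would report a conflict; this sketch just skips
        link.symlink_to(source.resolve())
        created.append(link)
    return created


if __name__ == "__main__":
    # Demonstrate with throwaway directories instead of a real workspace.
    with tempfile.TemporaryDirectory() as tmp:
        package = Path(tmp) / "openclaw-config" / "work-agent"
        package.mkdir(parents=True)
        (package / "SOUL.md").write_text("identity")
        target = Path(tmp) / "agents" / "work-agent"
        for link in link_package(package, target):
            print(link, "->", link.resolve())
```

Editing the file in the package directory changes what the agent reads through the link, which is why no copy step is needed after a `git pull`.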
**Deployment script** (`stow.sh`):
```bash
#!/bin/bash
# Deploy OpenClaw agent configuration via stow
set -e
REPO_DIR="$HOME/code/openclaw-config"
TARGET_DIR="$HOME/.openclaw/agents"
cd "$REPO_DIR"
# Deploy each agent that exists in the repo
for agent in work-agent personal-agent family-agent; do
  if [ -d "$agent" ]; then
    echo "Deploying $agent..."
    # Stow links a package's contents into the target, so target the agent's own directory
    mkdir -p "$TARGET_DIR/$agent"
    stow -d "$REPO_DIR" -t "$TARGET_DIR/$agent" "$agent"
  fi
done
echo "Deployment complete. Restart agents to pick up changes."
```
Run `./stow.sh` after pulling updates from Git. Your agents pick up the new configuration.
What to Commit
Commit to Git:
✅ All workspace `.md` files (SOUL, AGENTS, USER, MEMORY, TOOLS, IDENTITY, HEARTBEAT)
✅ QMD digests (`memory/archive/*.qmd.md`)
✅ Helper scripts (`scripts/*.sh`, `scripts/*.py`)
✅ Configuration files (`tasks.json` *template*, not active ledger)
✅ Documentation (`README.md`, setup instructions)
Don’t commit:
❌ Daily logs (`memory/2026-*.md`) - too noisy, handled by QMD
❌ Transaction logs (`communications/*.jsonl`) - may contain PII
❌ Cache, temp files, session transcripts
❌ API keys, tokens, credentials - use environment variables or OS keychain
❌ Active task ledger (`tasks.json` with live data) - too much churn
Sensitive data:
For credentials, use one of:
1. Environment variables (‘.envrc’ with direnv, not committed)
2. OS keychain (macOS Keychain, Linux Secret Service)
3. Separate credentials file (‘.credentials.json’, gitignored)
Document in ‘TOOLS.md’ how to set up credentials, but don’t commit the credentials themselves.
Branching Strategy
main: Stable, deployed to production agent
dev: Experimental changes, test before deploying
feature/X: Specific improvements (new sub-agent, dream routine enhancements)
Deploy flow:
1. Develop in feature branch
git checkout -b feature/add-pr-reviewer-subagent
# Edit work-agent/AGENTS.md to add PR-Reviewer config
git commit -am "Add PR-Reviewer sub-agent configuration"
2. Test in dev mode
git checkout dev
git merge feature/add-pr-reviewer-subagent
./stow.sh
# Start agent with dev config to test
openclaw --agent work-agent-dev
# Test sub-agent spawn, verify behavior
3. Merge to main after 24-hour soak test
git checkout main
git merge dev
./stow.sh
# Agent picks up changes on next restart or SIGHUP
4. Rollback if needed
git revert <commit>
./stow.sh
# Or: git checkout main~1 && ./stow.sh
Branch protection rules (if team is involved):
- ‘main’: Requires pull request review
- ‘dev’: Direct commits allowed
- ‘feature/*’: Delete after merge
Disaster Recovery
Scenario: Your machine dies. Your agent is gone.
Without Git + Stow:
- Reinstall OpenClaw
- Reconfigure agent from memory (or start over)
- Lost context: past decisions, relationships, preferences
- Recovery time: Days to weeks
With Git + Stow:
# Clone the repository to a new machine
git clone git@github.com:yourname/openclaw-config.git ~/code/openclaw-config
# Run deployment scripts
cd ~/code/openclaw-config
./stow.sh
# Restore archived logs (optional)
rsync -av backup-server:openclaw-archives/ ~/.openclaw/agents/work-agent/memory/archive/
# Restart the gateway
openclaw gateway start
Agent has full memory from `MEMORY.md` and QMD digests. Operational continuity restored.
Recovery time: 10 minutes (clone, stow, start) vs. “start from scratch and lose institutional knowledge.”
Bonus: Push your Git repo to a private repository (GitHub, GitLab, self-hosted). Your agent configuration is backed up offsite automatically.
Part 4: Determinism Over Prompting
The Prompt Fatigue Problem
You tell your agent, “Check my email every 15 minutes and let me know if anything urgent comes in.”
The agent adds this to its heartbeat routine. For the first day, it works great. Then:
- Day 2: Agent is processing a long task at 3:15 PM, skips the email check
- Day 3: Agent checks email but interprets “urgent” differently (yesterday it was executive emails, today it’s anything with “ASAP” in the subject)
- Day 4: Agent checks email, finds nothing urgent, sends you “All clear!” message (noise)
- Day 5: Agent forgets to check email entirely because heartbeat prompt got lost during a restart
The problem: You’re using the LLM—expensive, probabilistic, context-dependent—for a task that should be deterministic.
Cron > Heartbeat for Scheduled Tasks
Heartbeat (agent-managed):
# HEARTBEAT.md
Check email every 15 minutes. If urgent items found (executive senders, keywords: urgent/ASAP/EOD), alert immediately.
Problems:
- Unreliable (agent might skip if busy with other work)
- Expensive (LLM call every 15 minutes = 96 calls/day)
- Unpredictable (timing drifts, “every 15 minutes” becomes “every 12-18 minutes”)
- Non-deterministic (”urgent” interpretation varies)
Cron (system-managed):
# crontab
*/15 * * * * ~/bin/check-email-and-notify work-agent

#!/bin/bash
# check-email-and-notify
AGENT="${1:-work-agent}"   # agent name comes from the crontab argument
URGENT_SENDERS="ceo@company.com|cto@company.com|emergency@company.com"
URGENT_KEYWORDS="urgent|asap|eod|deadline"
# Query inbox (deterministic, fast)
URGENT=$(work-gmail search "is:unread (from:${URGENT_SENDERS} OR subject:${URGENT_KEYWORDS})" --json | \
  jq -r '.messages[]? | "\(.from) - \(.subject)"')
# Count non-empty lines so an empty result logs 0 instead of 1
COUNT=$(printf '%s' "$URGENT" | grep -c . || true)
if [ -n "$URGENT" ]; then
  # Call agent only if urgent items found; printf expands the \n escapes
  openclaw --agent "$AGENT" --prompt "$(printf 'Alert: Urgent emails found:\n\n%s\n\nPlease review and respond.' "$URGENT")"
fi
# Log check (even if nothing found, for monitoring)
echo "$(date +%Y-%m-%dT%H:%M:%S) - Checked email, ${COUNT} urgent items" >> ~/.openclaw/agents/$AGENT/email-check.log
Benefits:
- Reliable: OS guarantees execution at :00, :15, :30, :45
- Cheap: Shell script handles filtering, agent called only if urgent items found (96 potential calls → ~5-10 actual calls)
- Predictable: Runs exactly on schedule, no drift
- Deterministic: Same inbox state = same result (regex patterns, not LLM interpretation)
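The determinism claim is easiest to see when the urgency rule is written as a plain function. The sketch below reuses the sender list and keywords from the shell script; the `is_urgent` helper is hypothetical and assumes you already have each message's sender and subject as strings.

```python
import re

# Same senders and keywords as the shell script, expressed as compiled patterns.
URGENT_SENDERS = re.compile(r"^(ceo|cto|emergency)@company\.com$", re.IGNORECASE)
URGENT_KEYWORDS = re.compile(r"\b(urgent|asap|eod|deadline)\b", re.IGNORECASE)


def is_urgent(sender: str, subject: str) -> bool:
    """Return True when the sender is on the list or the subject has an urgent keyword."""
    return bool(URGENT_SENDERS.match(sender.strip()) or URGENT_KEYWORDS.search(subject))
```

The word boundaries matter: without them, "eod" would also match inside unrelated words such as "geode". A rule like this gives the same answer for the same message every time, so changing what counts as urgent becomes a reviewed edit to two patterns.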
Scripts as Agent Tools
Anti-pattern:
User: “Agent, analyze this sales CSV and send me a summary by region with revenue trends.”
Agent: Reads CSV (2,000 rows), uses LLM to parse and aggregate data (expensive, slow, error-prone for structured data), writes summary.
Token cost: ~$0.50
Time: 45 seconds
Reliability: 80% (LLM sometimes miscalculates aggregations)
Production pattern:
User: “Agent, analyze sales.csv by region.”
Agent: Runs `python scripts/analyze-sales.py sales.csv --format markdown`
#!/usr/bin/env python3
# analyze-sales.py - deterministic sales summary (needs pandas; to_markdown also needs tabulate)
import sys

import pandas as pd


def analyze_sales(csv_path):
    df = pd.read_csv(csv_path)
    # Total, average, and number of sales per region
    summary = df.groupby('region').agg({
        'revenue': ['sum', 'mean', 'count']
    }).round(2)
    print("## Sales Summary by Region\n")
    print(summary.to_markdown())
    # Trend analysis: monthly revenue per region, limited to the last 3 months
    df['month'] = pd.to_datetime(df['date']).dt.to_period('M')
    monthly = df.groupby(['region', 'month'])['revenue'].sum().unstack(fill_value=0)
    recent = monthly.iloc[:, -3:]
    trend = recent.apply(lambda row: 'up' if row.iloc[-1] > row.iloc[0] else 'down', axis=1)
    print("\n## Trends (last 3 months)\n")
    for region, direction in trend.items():
        print(f"- {region}: {direction}")


if __name__ == '__main__':
    analyze_sales(sys.argv[1])
Agent: Reads script output, includes in response.
Token cost: ~$0.02 (just reading script output)
Time: 2 seconds
Reliability: 100% (deterministic calculation)
When to use scripts:
✅ Structured data processing (CSV, JSON, XML, logs)
✅ Mathematical calculations (aggregations, statistics, forecasts)
✅ API interactions with deterministic inputs (query database, call REST API)
✅ File system operations (find files, check disk usage, organize directories)
✅ Scheduled operations (backups, log rotation, health checks)
When to use LLM:
✅ Unstructured input (natural language requests, interpreting user intent)
✅ Decision-making (apply judgment to novel situations, escalation logic)
✅ Content generation (write emails, summarize documents, explain findings)
✅ Tool orchestration (which script to run, in what order, with what parameters)
Let the LLM do what it’s good at—understanding intent and generating language. Let scripts do what they’re good at—deterministic, reliable, fast computation.
Configuration Files > Prompts
Bad:
User: “Remember to always CC me on emails the portfolio-manager sub-agent sends.”
Agent: “Got it, I’ll remember that.”
Three days later, agent restarts and sends emails without CC’ing you because the agent forgot.
Good:
Create `tools/portfolio-manager-config.json`:
{
  "email_defaults": {
    "cc": ["you@company.com"],
    "use_html": true,
    "signature": "Portfolio Manager (Automated)\nYour Company"
  },
  "funding_criteria": {
    "fund_threshold_revenue": 10000,
    "fund_threshold_milestone": 0.8,
    "defund_threshold_days": 60
  },
  "escalation": {
    "notify_on": ["criteria_conflict", "missing_data", "team_dispute"],
    "notify_to": "you@company.com"
  }
}
Portfolio-manager sub-agent (or its wrapper script) reads this config on every run. Behavior is consistent, documented, and version-controlled.
Update the config:
git checkout -b feature/update-pm-cc
jq '.email_defaults.cc += ["cfo@company.com"]' tools/portfolio-manager-config.json > tmp && mv tmp tools/portfolio-manager-config.json
git commit -am "Add CFO to portfolio-manager email CC list"
git push
# Merge, deploy via stow
Change is tracked in Git history. No “did I tell the agent to do this or not?” ambiguity.
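On the consuming side, a wrapper script might load the config and apply the email defaults roughly like this. It is a minimal sketch: the key names match the JSON above, but the fallback values and both helper functions are assumptions made for illustration.

```python
import json

# Fallbacks used when a key is missing from the config file (assumed values).
FALLBACK_EMAIL_DEFAULTS = {"cc": [], "use_html": False, "signature": ""}


def load_email_defaults(raw_json: str) -> dict:
    """Parse the config text and merge its email_defaults over the fallbacks."""
    config = json.loads(raw_json)
    merged = dict(FALLBACK_EMAIL_DEFAULTS)
    merged.update(config.get("email_defaults", {}))
    return merged


def prepare_email(to: str, subject: str, body: str, defaults: dict) -> dict:
    """Build an outgoing message with the CC list and signature always applied."""
    return {
        "to": to,
        "cc": list(defaults["cc"]),
        "subject": subject,
        "body": f"{body}\n\n{defaults['signature']}".rstrip(),
        "html": defaults["use_html"],
    }
```

Because the CC list is applied in code on every send, a restart cannot make the sub-agent forget it.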
When to Use LLM vs. Script: Decision Framework
Hybrid approach (best):
User request → LLM interprets intent → LLM decides which script to run → Script executes deterministically → LLM reads output and responds to user
The LLM is the orchestrator. Scripts are the reliable workers.
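One way to keep the orchestrator honest is to let the LLM choose only from a fixed table of scripts and have ordinary code build and run the command. The sketch below assumes the LLM step has already produced an intent name and arguments. The table contents and function names are hypothetical, apart from `scripts/analyze-sales.py`, which is the example script from earlier.

```python
import subprocess

# Allowlist of intents the LLM may select; anything else is rejected.
SCRIPTS = {
    "analyze_sales": ["python", "scripts/analyze-sales.py"],
    "check_email": ["bash", "scripts/check-email-and-notify"],  # hypothetical path
}


def build_command(intent: str, args: list[str]) -> list[str]:
    """Map an LLM-chosen intent to a fixed command; never execute free-form text."""
    if intent not in SCRIPTS:
        raise ValueError(f"Unknown intent: {intent}")
    return SCRIPTS[intent] + list(args)


def run_intent(intent: str, args: list[str]) -> str:
    """Run the script and return its stdout for the LLM to summarize."""
    result = subprocess.run(
        build_command(intent, args), capture_output=True, text=True, check=True
    )
    return result.stdout
```

Passing arguments as a list rather than a shell string avoids quoting problems and keeps model output from being interpreted by a shell.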
Part 5: Real-World Case Study - The DevBot Ecosystem
The Problem: Rahul, the human manager, oversees active product development across 15+ projects. That means:
- 50+ GitHub repositories
- Pull requests that need review (architecture decisions, security implications, coding standards)
- Issues that need triage (bugs, feature requests, questions from external teams)
- CI/CD pipelines that need monitoring (flaky tests, build failures, deployment issues)
- Dependencies that need updates (security patches, version upgrades)
- Legacy code that needs modernization (tech debt, framework migrations)
He can’t personally review every PR or triage every issue—that would be 40+ hours per week just on GitHub admin. But he needs to stay informed on critical changes, ensure coding standards are enforced, and respond quickly to security issues.
He needs force multiplication, not another task list.
The Solution: DevBot + Specialized Sub-Agents
DevBot is a parent agent that lives in Rahul’s work agent context. It monitors GitHub activity, routes work to specialized sub-agents, and consolidates reports back. It doesn’t do the work itself—it orchestrates.
Five specialized sub-agents:
1. PR-Reviewer Sub-Agent
Purpose: Automated code review for pull requests
Authority: Level 2 (Propose & Execute - draft reviews, post after approval)
Process:
1. GitHub webhook triggers DevBot when new PR opened
2. DevBot spawns PR-Reviewer with PR number
3. PR-Reviewer:
- Fetches PR diff, commit messages, CI results
- Checks against coding standards documented in repo’s `CONTRIBUTING.md`
- Runs static analysis (linting, type checking, complexity metrics)
- Identifies potential issues: security vulnerabilities, performance anti-patterns, style violations
- Drafts review comments with specific line references
- Posts draft to internal review channel (Slack/Telegram)
4. Rahul reviews the draft, approves or adjusts
5. PR-Reviewer posts approved comments to GitHub
Why it’s a sub-agent: Each PR needs deep, focused context. The review task is time-bounded (10-30 minutes). Sub-agent inherits DevBot’s GitHub credentials and coding standards from memory.
2. Issue-Triager Sub-Agent
Purpose: Classify and prioritize GitHub issues
Authority: Level 3 (Full Autonomy - label, assign, close; escalate critical)
Process:
1. Runs daily at 8 AM via cron
2. DevBot spawns Issue-Triager in batch mode
3. Issue-Triager:
- Queries GitHub API for new issues across watched repos (last 24 hours)
- Classifies: bug / feature / question / invalid
- Assigns priority labels based on keywords:
- `P0: Critical` - security, data loss, outage
- `P1: High` - major functionality broken
- `P2: Medium` - feature request, minor bug
- `P3: Low` - enhancement, cleanup
- Mentions relevant team members based on component labels
- Closes obvious duplicates or spam
- Escalates `P0` issues immediately to Rahul via notification
4. Logs triage decisions to `communications/issue-triage-YYYY-MM.jsonl`
Why it’s a sub-agent: Batch operation (process all new issues at once). Clear, deterministic criteria. Fully autonomous within guidelines.
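Because the triage criteria are deterministic, the priority step and the monthly log are good candidates for a script. The sketch below assumes keyword matching on the issue title and body. The labels and the log filename pattern come from the process above, while the specific keyword lists beyond P0 and the record fields are illustrative guesses.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

# Checked in order; the first match wins. Keywords past P0 are assumptions.
PRIORITY_KEYWORDS = [
    ("P0: Critical", ("security", "data loss", "outage")),
    ("P1: High", ("broken", "crash", "regression")),
    ("P2: Medium", ("feature request", "bug")),
]
DEFAULT_PRIORITY = "P3: Low"


def assign_priority(title: str, body: str = "") -> str:
    """Return the first priority label whose keywords appear in the issue text."""
    text = f"{title} {body}".lower()
    for label, keywords in PRIORITY_KEYWORDS:
        if any(keyword in text for keyword in keywords):
            return label
    return DEFAULT_PRIORITY


def log_decision(log_dir: Path, issue_number: int, label: str,
                 now: Optional[datetime] = None) -> Path:
    """Append one triage decision to issue-triage-YYYY-MM.jsonl under log_dir."""
    now = now or datetime.now(timezone.utc)
    path = log_dir / f"issue-triage-{now:%Y-%m}.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    record = {"ts": now.isoformat(), "issue": issue_number, "priority": label}
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return path
```

Anything the keyword table cannot classify falls through to `P3: Low`; in practice you would probably route those cases to the LLM instead of trusting the default.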
3. CodeMod Sub-Agent
Purpose: Bulk refactoring and code migrations
Authority: Level 2 (Propose & Execute - create PRs, request review)
Process:
1. Rahul to DevBot: “Migrate all repos from Winston to Pino for logging”
2. DevBot spawns one CodeMod sub-agent per repository (parallel execution)
3. Each CodeMod sub-agent:
- Clones repository, creates feature branch
- Analyzes current Winston usage (AST parsing, not regex)
- Generates codemod script using jscodeshift or libcst:
// Example jscodeshift transform (before → after)
import winston from 'winston' → import pino from 'pino'
winston.createLogger(...) → pino(...)
- Runs codemod on codebase
- Runs test suite, verifies tests pass
- Opens PR with changes, adds label `automated-refactor`
- Posts PR link back to DevBot
4. DevBot consolidates: “Opened 23 PRs across repositories. 18 passed tests, 5 need manual review.”
5. Rahul’s team reviews failed PRs, provides guidance
Why it’s a sub-agent: Long-running (can take 1-3 hours for large repos). Specialized tooling (AST parsers). Parallel execution across repos.
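The fan-out and consolidation steps are plain orchestration, so they can be sketched without any OpenClaw-specific API. In the example below, `run_one` is a stand-in for whatever actually spawns a CodeMod sub-agent for one repository; `CodemodResult`, `run_codemods`, and `consolidate` are hypothetical names, and the thread pool is just one reasonable way to get parallelism.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class CodemodResult:
    """What each per-repo job reports back to the parent agent."""
    repo: str
    pr_url: str
    tests_passed: bool


def run_codemods(repos, run_one, max_workers=4):
    """Run one codemod job per repo in parallel; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, repos))


def consolidate(results):
    """Build the one-line summary the parent agent sends back."""
    passed = sum(1 for r in results if r.tests_passed)
    failed = len(results) - passed
    return (f"Opened {len(results)} PRs across repositories. "
            f"{passed} passed tests, {failed} need manual review.")
```

Note that `pool.map` re-raises the first exception from any job, so a real version would likely catch failures inside `run_one` and report them as a result instead.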
4. Dependency-Auditor Sub-Agent
Purpose: Monitor and update dependencies
Authority: Level 3 for patch/minor updates, escalate major updates
Process:
1. Runs weekly via cron (Sunday 2 AM)
2. DevBot spawns Dependency-Auditor
3. Dependency-Auditor:
- Scans `package.json`, `requirements.txt`, `go.mod`, etc. across all repos
- Identifies outdated dependencies
- For patch/minor updates (1.2.3 → 1.2.4 or 1.3.0):
- Creates PR automatically with dependency updates
- CI runs tests
- If tests pass, auto-merges (Rahul’s policy: trust patch updates)
- For major updates (1.x → 2.x):
- Flags for human review
- Includes changelog summary, breaking changes analysis
- Escalates via notification, waits for approval
- Monitors security advisories (GitHub Dependabot, Snyk, etc.)
- Critical vulnerabilities: Immediate notification + emergency PR
Why it’s a sub-agent: Scheduled operation with clear upgrade policy. Autonomous for routine updates, escalates ambiguity.
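The upgrade policy boils down to comparing version numbers, which is exactly the kind of decision that belongs in a script. Here is a minimal sketch that assumes plain `MAJOR.MINOR.PATCH` versions (pre-release tags such as `1.2.3-beta` are not handled). The function names are hypothetical, and sending a failing patch or minor update to a human is my assumption, since the policy above only covers the passing case.

```python
def parse_version(version: str) -> tuple:
    """Parse 'MAJOR.MINOR.PATCH' (optionally prefixed with 'v') into ints."""
    major, minor, patch = (int(part) for part in version.lstrip("v").split(".")[:3])
    return major, minor, patch


def classify_update(current: str, latest: str) -> str:
    """Return 'none', 'patch', 'minor', or 'major' for current -> latest."""
    cur, new = parse_version(current), parse_version(latest)
    if new <= cur:
        return "none"
    if new[0] != cur[0]:
        return "major"
    if new[1] != cur[1]:
        return "minor"
    return "patch"


def decide(current: str, latest: str, tests_pass: bool) -> str:
    """Apply the policy: auto-merge passing patch/minor updates, escalate the rest."""
    kind = classify_update(current, latest)
    if kind == "none":
        return "skip"
    if kind == "major":
        return "escalate"
    # Assumption: a patch/minor update with failing tests also goes to a human.
    return "auto-merge" if tests_pass else "escalate"
```

For example, `decide("1.2.3", "1.3.0", tests_pass=True)` yields `"auto-merge"`, while any `1.x` to `2.x` jump yields `"escalate"` regardless of test results.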
5. CI-Monitor Sub-Agent
Purpose: Watch CI/CD pipelines, triage build failures
Authority: Level 3 (Full Autonomy - retry builds, categorize failures, escalate persistent issues)
Process:
1. Runs continuously in session mode (always-on)
2. Listens to CI/CD webhooks (GitHub Actions, CircleCI, Jenkins)
3. On build failure:
- Fetches build logs
- Categorizes failure type:
- Flaky test: Same test failing/passing inconsistently → Retries build, logs flakiness
- Dependency issue: Package install failed → Checks if transient (retry) or permanent (escalate)
- Lint/style: Formatting or linting errors → Suggests auto-fix PR
- Test failure: New or persistent test failure → Escalates with diagnostic info
- Infrastructure: Timeout, OOM, runner issue → Retries with different runner
4. Tracks flaky tests over time, opens an issue when a test flakes more than 3 times in 7 days
5. Auto-retries known flaky builds (manual retries waste developer time)
6. Escalates if failure persists after 2 retries
Why it’s a sub-agent: Always-on monitoring. Immediate response needed. Clear categorization logic. Autonomous retry authority.
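The flake threshold and the retry limit are simple counters, so they can live in code rather than in the agent's memory. The sketch below uses the numbers from the process above (more than 3 flakes in 7 days, escalate after 2 retries). The category strings and the `FlakeTracker` and `next_action` names are hypothetical, and treating every dependency failure as retryable up to the limit is a simplification of the transient-versus-permanent check.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

FLAKE_WINDOW = timedelta(days=7)
FLAKE_THRESHOLD = 3  # open an issue once flakes exceed this within the window
MAX_RETRIES = 2


class FlakeTracker:
    """Sliding-window count of flaky runs per test."""

    def __init__(self):
        self._events = defaultdict(deque)

    def record(self, test_name: str, when: datetime) -> bool:
        """Record a flake; return True at the moment the threshold is first crossed."""
        events = self._events[test_name]
        events.append(when)
        # Drop flakes that have aged out of the 7-day window.
        while events and when - events[0] > FLAKE_WINDOW:
            events.popleft()
        return len(events) == FLAKE_THRESHOLD + 1


def next_action(category: str, retries_so_far: int) -> str:
    """Map a failure category to retry, suggest-autofix-pr, or escalate."""
    if category == "lint":
        return "suggest-autofix-pr"
    if category == "test_failure":
        return "escalate"
    # Remaining categories (flaky, dependency, infrastructure) get retried.
    if retries_so_far >= MAX_RETRIES:
        return "escalate"
    return "retry"
```

Returning True only when the count crosses the threshold keeps the monitor from opening a duplicate issue on the fifth and sixth flake.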
The Coordination Pattern
DevBot doesn’t micromanage. It orchestrates at key moments.
Morning Standup (Automated)
DevBot dream routine (runs 6 AM daily):
1. Spawn Issue-Triager → process yesterday’s issues (5 min)
2. Spawn Dependency-Auditor → check for new security advisories (5 min)
3. Query PR-Reviewer results from yesterday → collect drafted reviews
4. Query CI-Monitor → get overnight build failure summary
5. Aggregate results
6. Send digest to Rahul (Telegram):
---
Morning Summary - March 3, 2026
PRs:
• 5 PRs opened yesterday, 3 reviews drafted (attached)
• 2 PRs auto-merged after tests passed
Issues:
• 12 new issues triaged: 2 critical (flagged), 8 medium, 2 low
• Critical: Auth service memory leak (Issue #482)
• Critical: Payment API returning 500s intermittently (Issue #483)
Dependencies:
• 3 security updates auto-merged (all patch versions)
• 1 major update needs review: React 17 → 18 (breaking changes)
CI:
• 8 builds failed overnight, 6 auto-retried successfully
• 2 persistent failures need attention:
- api-service: new test failure in auth module
- web-app: flaky test detected (image-upload-test, 4th flake this week)
Recommendation: Review critical issues first, then React upgrade.
---
Rahul’s morning now starts with context, not chaos. He knows what needs attention, what was handled automatically, and what’s waiting on him.
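The aggregation step itself is deterministic formatting, so it does not need an LLM. As a rough sketch, assuming each sub-agent hands back a list of one-line findings, a hypothetical `render_digest` helper could assemble the message like this:

```python
def render_digest(date_label: str, sections: dict, recommendation: str = "") -> str:
    """Render sub-agent findings into a plain-text morning summary.

    sections maps a heading (e.g. "PRs") to a list of one-line findings;
    empty sections are skipped so the digest stays short.
    """
    lines = [f"Morning Summary - {date_label}"]
    for title, items in sections.items():
        if not items:
            continue
        lines.append(f"{title}:")
        lines.extend(f"• {item}" for item in items)
    if recommendation:
        lines.append(f"Recommendation: {recommendation}")
    return "\n".join(lines)
```

The judgment call, such as what to recommend first, is the part worth leaving to the model; the layout is not.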
DevBot isn’t magic. It’s finely tuned infrastructure.
Conclusion: From Toy to Infrastructure
You learned the magic: agents that text back, order groceries, code while you sleep. You spun up your first OpenClaw instance, gave it access to Telegram, and felt the thrill of delegating real work to AI.
This article is what comes next. When your agent’s memory becomes unmanageable. When you lose configuration in a crash. When you realize you’re spending more time fixing agent mistakes than you save from automation. When you need it to be reliable, not just clever.
The patterns in this article aren’t sexy:
- Memory compaction schedules (QMD weekly digests)
- Git repositories with Stow symlinks
- Cron jobs instead of heartbeat prompts
- Scripts for deterministic operations
- Sub-agent hierarchies documented in version control
But they’re what separate toys from tools.
DevBot isn’t magic. It’s:
- 5 specialized sub-agents with clear authority boundaries
- Coding standards in version-controlled config files
- Cron-scheduled weekly audits
- Dream routines that consolidate GitHub activity into CTO-digestible summaries
- Scripts that handle AST parsing and API calls
- An LLM that handles judgment and natural language
You’ll know you’re ready for production patterns when:
- Your agent can’t remember something you definitely told it
- You’re afraid to restart because you might lose context
- You’ve re-explained the same preferences 3+ times
- Your daily log folder has 200+ files and `memory_search` takes 5+ seconds
- A sub-agent sent an email with the wrong formatting because you forgot to specify a flag
- You lost work because you didn’t commit your agent’s configuration
These patterns aren’t requirements on day 1. They’re requirements when you stop experimenting and start depending on your agent.
Start simple. Get the magic working. Feel the dopamine hit of your agent texting you unprompted.
Then, when it hurts, level up. Add memory compaction. Version control your config. Move scheduled tasks to cron. Write scripts for deterministic work. Build sub-agent hierarchies.
The boring parts are what make it work in production.
The difference between a demo and infrastructure is whether you can rely on it tomorrow, next month, next year. OpenClaw is powerful enough for both. This article is your map from one to the other.
---
*Want to discuss production OpenClaw patterns? Find me on [X/Twitter](https://x.com/rsubbuilds)*





