The Algorithm: Engineering Decisions Behind a Million Impressions
How I built an AI engagement system for X by choosing robustness over perfection
Seven figures of impressions. Hundreds of profile visits daily. All from a tool I built out of frustration with X’s noise. The Algorithm isn’t another scheduling tool or chatbot. It’s an engagement operating system that maintains human authenticity at machine scale.
This isn’t about the code (that technical breakdown comes later). This is about the engineering principles and architectural decisions that made it work. Because building AI tools isn’t about chasing perfect outputs. It’s about choosing the right constraints.
Section 1: Three Technical Decisions That Actually Mattered
Decision 1: Multi-LLM Orchestration Over Single Provider
Most builders pick one LLM and optimize prompts forever. I built The Algorithm to run OpenAI, Anthropic, and Gemini simultaneously on every generation. Not because one is superior; in my testing, their engagement outcomes were statistically indistinguishable.
The breakthrough: showing all three outputs side by side revealed that quality isn't provider-dependent, it's prompt-dependent. Same insight, three voices, wildly different engagement. The system assigns quality scores (typically ranging from 30% to 80%) to each output, but here's what matters: humans are terrible at predicting which variant will resonate.
This parallel processing approach costs more but eliminates single points of failure. When one API throttles, others continue. When one model updates and breaks, two remain stable.
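The real code isn't the point of this post, but a minimal sketch of the fan-out idea looks something like this. The provider wrappers and function names below are placeholders, not the actual implementation:

```python
import asyncio

async def call_openai(prompt: str) -> str:      # placeholder: wrap the real SDK call here
    return "openai draft"

async def call_anthropic(prompt: str) -> str:   # placeholder
    return "anthropic draft"

async def call_gemini(prompt: str) -> str:      # placeholder
    return "gemini draft"

PROVIDERS = {"openai": call_openai, "anthropic": call_anthropic, "gemini": call_gemini}

async def generate_variants(prompt: str) -> dict[str, str]:
    """Fan the same prompt out to every provider and keep whatever comes back.

    A throttled or failing provider is dropped instead of failing the batch,
    so one outage never blocks reply generation.
    """
    tasks = {name: asyncio.create_task(fn(prompt)) for name, fn in PROVIDERS.items()}
    results: dict[str, str] = {}
    for name, task in tasks.items():
        try:
            results[name] = await task
        except Exception:
            continue  # degrade gracefully: two healthy providers are enough
    return results

# Usage: variants = asyncio.run(generate_variants("Reply to this post about ..."))
```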
Decision 2: Voice Profiles Over Generic Prompting
Generic AI replies read like corporate LinkedIn. The Algorithm builds mathematical representations of writing style from URLs and samples you provide. Feed it your blog posts, tweets, emails. It extracts tone patterns, phrase structures, emotional ranges.
The engineering insight: consistency beats authenticity. Your voice profile becomes the statistical median of all your writing modes. Tired you, caffeinated you, frustrated you, all averaged into one consistent voice. The profile is more reliably “you” than you are on any given day.
This required building a vector analysis system that maps writing patterns to generation parameters. Eight tone modes (analytical, supportive, challenging, curious, etc.) can blend in weighted combinations. The “Combined” mode merges all selected tones into a unified voice that maintains personality while adapting to context.
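The actual profile comes from vector analysis of your writing samples, which I'm not reproducing here. As a rough illustration of the "Combined" mode only, weighted tone blending could look like this (the tone presets and knob names are made up for the example):

```python
# Illustrative only: each tone is reduced to a bag of generation knobs,
# and "Combined" mode is a weighted average of the selected presets.
TONE_PRESETS = {
    "analytical":  {"temperature": 0.4, "formality": 0.8, "warmth": 0.3},
    "supportive":  {"temperature": 0.7, "formality": 0.4, "warmth": 0.9},
    "challenging": {"temperature": 0.6, "formality": 0.6, "warmth": 0.2},
    "curious":     {"temperature": 0.8, "formality": 0.3, "warmth": 0.6},
}

def blend_tones(weights: dict[str, float]) -> dict[str, float]:
    """Weighted average of the selected tone presets."""
    total = sum(weights.values())
    blended: dict[str, float] = {}
    for tone, w in weights.items():
        for knob, value in TONE_PRESETS[tone].items():
            blended[knob] = blended.get(knob, 0.0) + value * (w / total)
    return blended

# e.g. 60% analytical, 40% curious:
params = blend_tones({"analytical": 0.6, "curious": 0.4})
```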
Decision 3: Queue and Review Over Full Automation
I disabled auto-posting deliberately. Not for compliance (though that matters) but for quality assurance. The system generates replies in bulk, holds them in queue, allows comparison and editing, then posts on command.
The workflow: Select posts, generate multiple replies per post, review in queue, compare providers, edit if needed, bulk post. This human-in-the-loop approach holds quality at a consistent ~65% rather than swinging between 30% and 95%.
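A stripped-down sketch of that queue, with hypothetical field names, just to show where the human sits in the loop:

```python
from dataclasses import dataclass
from enum import Enum

class ReplyStatus(Enum):
    QUEUED = "queued"       # generated, waiting for review
    APPROVED = "approved"   # human picked (and possibly edited) a variant
    POSTED = "posted"       # sent to X on explicit command
    SKIPPED = "skipped"     # human decided not to engage

@dataclass
class QueuedReply:
    post_id: str
    variants: dict[str, str]            # provider name -> drafted reply
    chosen_text: str | None = None
    status: ReplyStatus = ReplyStatus.QUEUED

def approve(item: QueuedReply, provider: str, edited_text: str | None = None) -> None:
    """Human-in-the-loop step: nothing reaches POSTED without this call."""
    item.chosen_text = edited_text or item.variants[provider]
    item.status = ReplyStatus.APPROVED
```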
Why this matters: One “AI guy” callout destroys months of trust building. Better to have slightly worse performance with zero catastrophic failures than occasional brilliance with occasional disasters. The judgment people levy is on the failures, not the successes.
Section 2: The Data Behind the Decisions
Patterns That Emerged From Testing
This project didn’t spring from a huge A/B test; it was born out of a simple desire to spotlight colleagues’ work. In the spring and summer I tried to manually keep up with coworkers’ posts, liking and commenting to help them gain visibility. Between May 1 and Sept 21, 2025, that effort generated a modest 33.3K impressions and 659 total engagements (only 46 replies and 172 likes) with a 1.9% engagement rate. It was clear that the intent was there but the scale was missing.
In late September I wired up The Algorithm to surface those posts for me and let an LLM draft replies that I could approve, rewrite, and post. The difference was dramatic:
Volume dwarfed the baseline. In the five-week window from Sept 22 to Oct 28, impressions jumped from tens of thousands to nearly one million, and engagements leapt to roughly 11.7K. That means about 95% of my lifetime engagement (12.4K) happened after flipping the switch. Replies surged from 46 in the pre-tool era to 437 (roughly 9.5x), likes went from 172 to about 1.2K (roughly 7x), and reposts from 29 to 139 (nearly 5x). Even bookmarks and shares, previously negligible, registered meaningful numbers.
Engagement rate settled at scale. While the raw engagement count exploded, the engagement rate naturally dipped from 1.9% to around 1.1%. That’s expected when moving from occasional high-touch interactions to a larger volume of automated drafts; it indicates that the audience is bigger and more diverse, but responses are still resonating.
Human‑in‑the‑loop matters. Because every drafted reply still goes through me, the tone stays authentic. I can skip posts that aren’t relevant, adjust the voice profile when needed and make sure I’m never spamming a thread.
These data points show that the system didn’t just amplify existing engagement, it created engagement that wasn’t happening at all before.
Architecture That Scales
Under the hood, the system looks more like a production-ready SaaS than a side project. The core services listed in the repo’s README (an X.com monitor, a reply engine with LLM integration, and a user-management module for auth and list-sharing) sit on top of PostgreSQL for persistent data and Redis for caching and queueing. A Celery-based task queue schedules generation jobs and throttles API calls to stay within X’s limits. Recent work has hardened the OAuth flows (PKCE, encrypted tokens, automatic refresh), fixed cross-device cookie persistence, and even added an “ignore” button so I can filter out posts I don’t want to respond to. This foundation means the system can handle bursts of activity without collapsing or accidentally double-replying.
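The actual task definitions aren't quoted in this post, but assuming the Celery-plus-Redis stack described above, a rate-limited generation job might be declared roughly like this (the names, limits, and broker URL are illustrative):

```python
from celery import Celery

# Redis plays broker here, the same instance already used for caching and queueing.
app = Celery("the_algorithm", broker="redis://localhost:6379/0")

# Celery's per-task rate_limit keeps bursts of generation jobs under X's API
# ceiling without hand-rolled sleep logic; failures retry with backoff.
@app.task(rate_limit="30/m", autoretry_for=(Exception,), retry_backoff=True, max_retries=3)
def generate_replies_for_post(post_id: str) -> None:
    """Fetch the post, fan out to the LLM providers, and queue drafts for review."""
    ...  # monitor -> reply engine -> review queue
```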
What Failed and What We Learned
Building in public meant embracing some failures. Early on, I experimented with alternative posting APIs so I could schedule replies without needing official X credentials. At one point a queue bug caused those generated posts to publish automatically without my review, a clear violation of X’s terms. Another time, while integrating OAuth properly, I hammered the authentication endpoint so hard that my account was temporarily locked. Both incidents reinforced the importance of a human‑in‑the‑loop workflow: the system drafts, the user decides, and nothing goes out without an explicit approval.
A more serious hiccup occurred when shared cookie files in a third‑party library leaked authentication between users. The fix required tearing out the shared cookies.json file, implementing strict user isolation and having everyone re‑authenticate. And in the earliest prototypes, the system had no notion of “thread memory,” so it sometimes suggested multiple replies to the same conversation. A thread‑memory layer is now on the roadmap to prevent duplicate engagement.
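The shape of the cookie fix, sketched loosely (this assumes the `cryptography` package and one encrypted file per user; the real code differs):

```python
from pathlib import Path
from cryptography.fernet import Fernet

# Illustrative fix: one encrypted session blob per user id, instead of a
# shared cookies.json that every account read from.
SESSIONS_DIR = Path("sessions")
SESSIONS_DIR.mkdir(exist_ok=True)

# key = Fernet.generate_key()  # held by the deployment, never stored next to sessions

def save_session(user_id: str, session_bytes: bytes, key: bytes) -> None:
    (SESSIONS_DIR / f"{user_id}.bin").write_bytes(Fernet(key).encrypt(session_bytes))

def load_session(user_id: str, key: bytes) -> bytes:
    # Raises FileNotFoundError if this user never authenticated: no silent
    # fallback to someone else's cookies.
    return Fernet(key).decrypt((SESSIONS_DIR / f"{user_id}.bin").read_bytes())
```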
Despite those bumps, the numbers don’t lie. Focusing on highlighting coworkers’ content through an LLM‑assisted queue gave my account its first real taste of scale. It validated the idea that combining human judgment with machine speed can achieve algorithmic reach without sacrificing authenticity.
Section 3: What’s Next for Algorithmic Engagement
The Correlation Problem
The Algorithm includes a post optimizer that scores content before publishing. It uses publicly available X algorithm factors: reply weights, engagement multipliers, visibility coefficients. But here’s the open question: do these internal scores correlate with actual performance?
The next phase involves reverse engineering from high performers if our predictions prove weak. Every optimized post becomes a data point. Every engagement metric feeds back into the scoring algorithm. This creates a continuously learning system that adapts to algorithm changes.
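To make the open question concrete, here is a toy version of the score-versus-reality check. The weights are illustrative, loosely shaped like the publicly circulated ranking factors, not the optimizer's real coefficients:

```python
import statistics

# Illustrative weights only (a reply is worth far more than a like in the
# public discussions of X's ranker); the real coefficients live in the
# optimizer and get re-fit from observed performance if predictions prove weak.
WEIGHTS = {"predicted_replies": 13.5, "predicted_reposts": 1.0, "predicted_likes": 0.5}

def optimizer_score(features: dict[str, float]) -> float:
    """Pre-publish score for a drafted post."""
    return sum(WEIGHTS[k] * features.get(k, 0.0) for k in WEIGHTS)

def score_vs_reality(scores: list[float], impressions: list[float]) -> float:
    """Pearson correlation between pre-publish scores and actual reach;
    if this stays near zero, the weights need re-fitting from high performers."""
    n = len(scores)
    ms, mi = statistics.fmean(scores), statistics.fmean(impressions)
    cov = sum((s - ms) * (i - mi) for s, i in zip(scores, impressions)) / (n - 1)
    return cov / (statistics.stdev(scores) * statistics.stdev(impressions))
```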
The Context Challenge
Current limitations include no image comprehension, no thread awareness, and no conversation memory. The system can pile replies into the same thread without realizing it. These aren’t bugs; they’re architectural decisions I made to ship fast.
The roadmap includes thread memory layers, conversation context tracking, and reply-to-reply awareness. But each addition increases complexity exponentially. The question becomes: when does intelligence become overhead?
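Thread memory isn't built yet, so the following is purely a roadmap sketch: a Redis-backed guard that refuses to queue a second draft for a conversation the account has already joined (key names and limits are hypothetical):

```python
import redis

r = redis.Redis()  # same Redis instance the rest of the stack already uses

def should_engage(thread_id: str, user_id: str, max_replies: int = 1) -> bool:
    """Refuse to queue a draft if this account already replied in the thread."""
    key = f"thread_memory:{user_id}:{thread_id}"
    return int(r.get(key) or 0) < max_replies

def record_engagement(thread_id: str, user_id: str, ttl_days: int = 30) -> None:
    """Remember the engagement long enough to prevent near-term duplicates."""
    key = f"thread_memory:{user_id}:{thread_id}"
    r.incr(key)
    r.expire(key, ttl_days * 86400)
```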
The Scale Question
The Algorithm works for individual power users. But what happens when hundreds run similar systems? When every reply might be AI generated? When authenticity becomes computationally defined?
I’m exploring federation models where multiple instances share learning without sharing data. Imagine distributed intelligence about engagement patterns, updated in real time across all users. The technical challenges are significant, but the potential is transformative.
The Algorithm launches in limited beta next week. Built from frustration, validated through iteration, shipping despite imperfection.



