Blog

The Agentic Architect: Running a One-Person AI Engineering Team for Under $100/Month

How I use a three-model delegation pattern with cheap orchestrators and OpenAI Codex CLI on a single cloud instance to automate blog writing, email drafts, and competitive intelligence.

Peter Steinberger, the creator of OpenClaw, logged over 6,600 commits in a single month. When he described his workflow, the phrase that stuck was "PRs are dead." He wasn't talking about abandoning code review. He was describing a shift where the human becomes the architect and the AI handles implementation. You stop reviewing lines of code and start reviewing the prompts and systems that generate them.

I read that and thought: I can build this. Not at Steinberger's scale, not with a team behind me, but as a solo operator running automated workflows for a law practice and a cybersecurity blog. So I did. Here's the architecture, what it costs, and what I've learned running it in production.

The Three-Model Pattern

The core idea isn't which models you pick. It's how they divide labor.

Most people throw their best model at everything. That works if budget isn't a constraint. For solo operators and small teams, it usually is. The pattern that makes the economics work is simple: use the cheapest model that can handle each tier of complexity.

Tier 1 - Orchestrator (cheap, fast). Handles routing, coordination, and deciding what needs to happen next. Processes incoming messages, triggers scheduled tasks, and figures out which action to take. This is high-volume, low-complexity work. You want a model that's fast, reliable, and costs almost nothing per request.

Tier 2 - Fallback (cheap, resilient). Same role as the orchestrator, activated automatically when the primary hits rate limits or goes down. Zero-downtime failover without human intervention.

Tier 3 - Worker (capable, expensive). Handles the work that actually requires intelligence: writing coherent long-form content, understanding codebases, making architectural decisions in code, running builds, committing to git. This model only activates when the orchestrator delegates a real task.

The result is a system where roughly 80% of the compute (by request count) runs on the cheapest available models, and 20% runs on the most capable one. The 20% produces all the valuable output. The 80% makes sure it happens on schedule.

Model Choices

There's no single right answer here. The landscape shifts every few months. What matters is the pattern, not the specific models. That said, here's what's viable for each tier right now:

Tier Options Why
Orchestrator Kimi K2.5, DeepSeek R1, Qwen 3.5 Plus All under $1/M input tokens. Kimi has built-in Agent Swarm. DeepSeek is cheapest. Qwen scores highest on agentic benchmarks.
Fallback Gemini Flash 3, any of the above Gemini's context caching cuts repeat-prompt costs by 75%. Any orchestrator-tier model works here.
Worker OpenAI Codex CLI (ChatGPT Plus), Claude Code (Max subscription), Claude Sonnet/Opus via API Codex CLI is now bundled with ChatGPT subscriptions — $20/month for Plus, $200/month for Pro. No separate API key or per-token billing required. Claude Code on a Max subscription is more capable but costs $100-200/month.

Update (March 2026): The economics of this stack shifted significantly when OpenAI bundled Codex CLI into ChatGPT subscriptions. You can now sign in to Codex CLI with your ChatGPT account and use it as your worker model at no extra cost beyond the subscription. ChatGPT Plus at $20/month gets you 33-168 local messages per 5 hours and 10-60 cloud tasks — enough for most solo automation workloads. Pro at $200/month scales to 223-1120 local messages and 50-400 cloud tasks. When I originally wrote this post, the worker tier was the largest expense at $100-200/month. That's no longer the case.

I've switched my worker to Codex CLI on a Plus subscription. My orchestrator costs still run $5-20/month. The fallback barely registers. The total model spend for this pattern now stays under $50/month.

One consideration: some of these models are hosted by providers in different jurisdictions. If API continuity matters to your business, factor in geopolitical risk when choosing providers. Having a fallback from a different provider in a different region is free insurance.

Infrastructure

Everything runs on a single small cloud instance. No Kubernetes, no multi-region deployment, no container orchestration. One box running the agent gateway on localhost with the three models plugged in.

Provider Instance Specs Monthly Cost
AWS t3.large 2 vCPU, 8GB RAM ~$61
GCP e2-standard-2 2 vCPU, 8GB RAM ~$49
Azure D2as v5 2 vCPU, 8GB RAM ~$63

GCP is the cheapest at list price. All three providers offer reserved/committed pricing that drops costs 40-70% if you commit to 1-3 years. For a persistent agent workload that runs 24/7, reserved pricing makes sense.

2 vCPUs and 8GB RAM is the practical floor for this kind of setup. The orchestrator gateway is lightweight, but when the worker model kicks off a coding session that involves building a project, running tests, and managing git operations, you need headroom. Burstable instances (AWS t3, Azure B-series) work well since the heavy compute is intermittent.

Running models locally on this hardware is not practical. A 2-vCPU box with no GPU gives you roughly 5-10 tokens per second on even a small quantized model. API-based models are the right call unless you're willing to invest in GPU instances at $300-400/month, which defeats the budget purpose.

Full Cost Breakdown

Component Monthly Cost
Cloud compute (small instance) $49-63
Worker model (Codex CLI via ChatGPT Plus) $20
Orchestrator + fallback tokens $10-30
Domains, DNS, hosting $15-20
Total $95-135

The range depends on usage intensity and which cloud provider you choose. The worker model used to be the largest line item at $100-200/month with Claude Code Max. Since OpenAI bundled Codex CLI into ChatGPT subscriptions, it dropped to a flat $20. If you need higher rate limits, ChatGPT Pro at $200/month pushes the total to $275-315, but most solo operators won't hit the Plus ceiling.

What It Actually Automates

This isn't a demo. These workflows run on a regular schedule and produce real output.

Competitive Intelligence and Content

On a regular schedule, a cron job scans competitor websites in a target niche. The orchestrator fetches each site, checks for new content, cross-references against existing posts to avoid duplication, and scores topic ideas by SEO potential. It picks the best one and delegates to the worker model. The worker writes the full post, formats the metadata, adds internal links, updates the sitemap, commits to git, and pushes. The orchestrator reports results to a dedicated messaging channel.

The entire pipeline runs in under 15 minutes. I review what was published, edit where needed, and move on.

Automated Content for a Second Domain

Same pattern, different domain. A separate scheduled job scans news sources in another topic area, identifies trending angles, and delegates a full blog post to the worker. This is how many of the posts on this site get their first draft. The ones that meet the bar after review stay published.

Email Draft Processing

Multiple times daily, the agent checks an inbox, reads new messages, and drafts replies. This is the workflow where I'm most cautious about automation boundaries. The agent drafts. It never sends. Every reply sits in the drafts folder until a human reviews and hits send.

That boundary is enforced at multiple layers, not just in the prompt. When you're handling client communications, an AI sending a reply with incorrect guidance isn't just embarrassing. It's a liability issue with real consequences.

Messaging as Command Center

All of this reports to a messaging platform with channels organized by topic. Content reports go to one channel. Email summaries go to another. System alerts go to a third. This becomes the dashboard where I check what happened overnight, what needs attention, and what ran clean. No custom UI required.

The Memory Problem (and Solution)

Most coding agents have session amnesia. Every run starts fresh with no memory of previous sessions. For one-off tasks, that's fine. For automated workflows that need continuity, it's a problem.

The orchestrator solves this by acting as the memory layer. After every worker session, the orchestrator summarizes what happened and stores it. The next time a scheduled job runs, it passes relevant context from previous runs into the prompt. The content writing job knows what topics were covered yesterday. The email drafting job knows which threads are ongoing.

This isn't perfect. The context window is finite, and long-running threads eventually compress. But it's good enough for daily-cadence workflows. The orchestrator remembers what the worker can't.

Security Posture

Running an AI agent gateway on a cloud instance with access to email, messaging, and code repositories creates real attack surface. I wrote about this when the ClawJacked vulnerability dropped, and the lessons apply to any similar setup.

The short version: binding a gateway to localhost doesn't make it secure. "Not reachable from the network" isn't the same as "not exploitable." Authentication, rate limiting, and origin validation are still required. The ClawJacked disclosure proved that any website could open a WebSocket to localhost from a browser tab and hijack an agent session.

OpenClaw specifically has had a rough security track record. A January 2026 audit by Argus Security identified over 500 vulnerabilities, including 8 critical. Researchers have demonstrated that third-party skills from the ClawHub marketplace can contain prompt injection. If you run OpenClaw (or any agent gateway), treat security as an ongoing concern, not a one-time configuration.

Some principles that apply regardless of which platform you choose:

  • Bind to loopback only. Never expose the gateway to the public internet.
  • Enforce token-based authentication on the gateway, even for local connections.
  • Restrict capabilities to minimum required. Deny any command categories your workflows don't need.
  • Run host-based connection monitoring to catch unexpected outbound connections.
  • Keep the agent platform updated. The ecosystem is young and vulnerabilities will keep appearing.
  • Enforce automation boundaries at multiple layers. Don't rely on prompt-level restrictions alone for safety-critical controls like "don't send emails." Use API-level permissions, tool restrictions, and architectural constraints.
  • Review activity logs regularly. Automated systems can drift. Periodic review catches issues before they compound.

Alternative Platforms

If OpenClaw's security posture concerns you (and it should give you pause), other orchestration frameworks exist:

  • n8n is a visual, low-code workflow builder with strong AI agent capabilities. Self-hosted, open source, no licensing cost. Good if you prefer drag-and-drop to config files.
  • CrewAI is the most popular multi-agent framework right now (44k+ GitHub stars). Role-based agent model with structured task orchestration. More Pythonic, less infrastructure-heavy.
  • LangGraph (by LangChain) offers the most granular control over agent state and tool orchestration. Steeper learning curve but maximum flexibility.
  • n8n + custom scripts is a pragmatic middle ground. Wire up cron jobs, API calls, and LLM invocations through n8n's visual editor, then call your worker model via shell or HTTP nodes.

Each has different tradeoffs in flexibility, security, and operational overhead. The three-model delegation pattern works with any of them. The orchestration platform is the plumbing, not the architecture.

Competing Approaches

Full transparency: there are other ways to solve this that may be better fits depending on your constraints.

The Unwind AI's "Monica" setup runs a similar autonomous agent team on a $5/month VPS for under $400/month total. It uses a structured file-based workflow instead of a gateway-based agent platform. Simpler, fewer moving parts, but less real-time responsiveness.

CrewAI-based setups define agents by role (researcher, writer, reviewer) and orchestrate them in structured pipelines. More opinionated than OpenClaw but easier to reason about. If your workflows are well-defined and don't need ad-hoc flexibility, CrewAI can be more robust.

Pure n8n workflows replace the agent orchestrator entirely with visual automation. Each workflow is a DAG of nodes: fetch RSS, call LLM, write file, commit to git. Less intelligent routing but more predictable execution. Good if you want reliability over adaptability.

Enterprise-grade options like Microsoft's Semantic Kernel or the OpenAI Agents SDK offer tighter integration with their respective ecosystems. Higher floor on cost and complexity, but better support and security guarantees.

OpenAI Codex CLI as a standalone worker is the approach that changed the economics of this post. If your workflows are straightforward — write code, write content, commit to git — you can skip the orchestration platform entirely and drive Codex CLI from cron jobs and shell scripts. At $20/month for ChatGPT Plus, it's the cheapest capable coding agent available. The tradeoff is that you lose the multi-agent orchestration layer, so you need to handle routing and memory yourself.

I chose OpenClaw because it gave me the most flexibility with the least code. That flexibility comes with operational responsibility, particularly around security. If I were starting today, I'd use Codex CLI as the worker, skip OpenClaw entirely, and wire orchestration through n8n or simple cron scripts. The three-model delegation pattern stays the same — only the plumbing changes.

What I'd Do Differently

Better error recovery. When a scheduled job fails (usually due to a rate limit or a flaky web scrape), the current behavior is to report the failure and wait for the next run. Automatic retry with exponential backoff for transient failures would reduce manual intervention.

Tighter memory integration. The memory keeper pattern works, but it's manual in places where it should be automatic. A structured knowledge base would be better than appending to flat files.

Automated cost tracking. I track costs manually by checking provider dashboards. A scheduled job that pulls usage data from each provider's API and posts a weekly cost summary would close the loop.

More granular reporting. The current setup reports success or failure. Richer summaries (word count, SEO estimates, diff stats) would make the daily review faster.

Making It Production-Ready

The architecture gets you running. These patterns keep it running safely.

Isolate the Orchestrator From External State

The orchestrator should route and report. Nothing else. If a workflow touches email, git, or any external API, that call goes through the worker via a wrapper script — never inline in the cron job payload. When I let the orchestrator call email scripts directly, I got race conditions (two sessions hitting the same inbox) and context leaks (the orchestrator accumulating token-heavy email threads it had no business holding). From a security perspective, this also shrinks the orchestrator's attack surface. If it's compromised or prompt-injected, it can't touch your email, your repos, or your credentials. It can only delegate to a worker that operates within its own sandboxed permissions.

Use Wrapper Scripts With Output Contracts

Put the worker instructions in parameterized wrapper scripts, not in the cron job definition. The orchestrator passes four arguments (topic, angle, source, related slugs) and the script handles template substitution, worker invocation, and output parsing. Every script should return a strict JSON contract: {"status": "ok", "published": true, "title": "...", "url": "..."} or {"status": "error", "phase": "seo-review"}. Without this, the orchestrator interprets freeform output, and interpretation drifts. More importantly, a defined contract means you can validate output programmatically. If the JSON doesn't parse or contains unexpected fields, something went wrong — possibly including injection into the worker's output.

Chain Reviews Before Anything Goes Public

For content that gets published automatically, the worker should run a multi-stage review pipeline before committing: write the post, run a copy and voice review, run an SEO audit, and only commit if no critical issues surface. This isn't just quality control. It's a security gate. An agent writing content from scraped news sources is one prompt injection away from publishing something you didn't intend. Each review stage is an independent check that catches drift, tone violations, and content that doesn't match your editorial constraints.

Never Let the Agent Write Its Own Configuration

This one cost me time. I let the orchestrator generate its own cron job configs and scheduling logic. The results looked reasonable but had subtle issues: wrong timezone handling, missing error paths, overly broad permissions. The agent doesn't understand the security implications of the permissions it grants itself. The fix is either to build a dedicated skill that enforces your conventions when creating cron jobs (validated schedules, mandatory timeouts, restricted delivery channels) or to write the configs manually. Configuration is where your constraints live. An agent defining its own constraints is a circular dependency with security implications.

Delegate Heavy Work to Your Best Tool — With a Plan

When the orchestrator hands off to the worker, the worker should receive a plan, not just a raw prompt. Codex CLI and Claude Code both support structured prompting where you can front-load reasoning in a planning step before the expensive implementation pass. That means better output per dollar. But it also means the worker's actions are more predictable and auditable. A plan that says "write post, review copy, review SEO, commit if clean" is easier to verify than "here's a topic, figure it out." Predictability is a security property.

Use Heartbeats for Monitoring, Cron for Actions

Not everything needs a cron job. Batch lightweight checks — inbox polling, disk usage, calendar lookups, credential health — into a heartbeat: a periodic pulse where the orchestrator runs through a short checklist and only reports if something needs attention. Reserve cron for tasks that need exact timing, isolated sessions, or produce external side effects. One heartbeat running every 30 minutes replaces half a dozen cron jobs and burns fewer tokens. It also gives you a natural place to run security hygiene: verify OAuth tokens haven't expired, check that container images are pruned (a small instance fills its disk in days without cleanup), and alert if disk usage crosses a threshold. Treat the heartbeat as your canary.

File-Based Memory — Simple, Auditable, Deletable

The agent wakes up fresh every session. Continuity comes from flat files: daily logs capture raw session notes, and a curated long-term memory file distills what's worth keeping. Before each job, the worker reads the last 48 hours of context. It's not a vector database. It's not RAG. It's cat and markdown. The security advantage is underrated: you can grep your agent's entire memory in seconds, audit exactly what it knows, and delete anything that shouldn't persist. Try doing that with an embedding store. When your agent handles client emails and personal data, "I can see and delete everything it remembers" isn't a limitation. It's a feature.

The Blueprint

If you want to replicate this kind of setup, here's the minimum viable version:

  1. Provision a small cloud instance. 2 vCPU and 8GB RAM is the floor. Pick whichever cloud provider offers the best pricing for your commitment level. Reserved instances save 40-70% for persistent workloads.

  2. Choose your orchestration platform. OpenClaw, n8n, CrewAI, or LangGraph. Evaluate based on your comfort with the security model, your preference for code vs. config vs. visual tools, and how much flexibility you need.

  3. Set up your model hierarchy. Pick a cheap orchestrator. Add a fallback from a different provider. Point heavy work at your best available model. The specific models matter less than the delegation pattern.

  4. Build one automated workflow first. Don't try to automate everything on day one. Pick the most repetitive task in your week and build a scheduled job for it. Get the prompt right. Get the output format right. Get the notification delivery right. Then add the next one.

  5. Organize notifications by topic. One channel per workflow in whatever messaging platform you use. Clean separation makes debugging dramatically easier.

  6. Establish security boundaries before you grant access. Decide what the agent can and can't do before connecting it to anything. Draft-only for emails. Read-only for sensitive data. No destructive operations without human review. These boundaries are harder to add retroactively.

Steinberger was right about the core idea. The future of software engineering isn't writing more code. It's designing better systems and letting agents handle the implementation. You don't need thousands of commits a month to prove the concept. You just need a system that reliably produces useful output while you sleep, at a cost that makes sense for a solo operator.

That's what this kind of stack does. Under $100 a month if you use Codex CLI on ChatGPT Plus as your worker. One instance. Three models. Real output, every day.


Keep reading:

Building an AI-powered automation stack for your business? Let's talk.