Maybe building an AI operating system is simpler than you imagined. Maybe you don’t need a full RAG system, and you’d still get 90% of the benefit. When, or if, to scale is a matter of growth trajectory, concurrent usage, and whether current performance actually meets your needs. Starting simple is possibly more powerful than you think.
This article explores smaller-scale examples that actually perform while saving complexity and money. Understanding when each approach makes sense is one of the most valuable judgements you can make for your business.
Personal Scale: Your AI Operating System
The personal scale is the most underrated. It is also where you should consider having your employees start.
The goal at this scale is not to build something that scales. It is to build something that thinks with the user, a simple but powerful system that knows their context, organises their knowledge, and offloads cognitive overhead so they can focus on higher-order work.
AI agent
Your AI agent (e.g. Claude Code) can read, write, and reorganise files on instruction, using a structured set of executable skills. To set up the agent skills, you simply explain what you do day to day, and the agent creates the skills it will use later on.
Folder structure
Andrej Karpathy popularised a version of this small-scale approach: a local folder for dropping raw information files, a folder where the agent maintains organised knowledge, and an output folder where the agent saves its outputs.
The architecture is simply Markdown files, navigable by an AI agent. Plain text is the most durable, portable, and AI-readable format that exists. No database, no proprietary format, no platform lock-in.
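As a rough illustration of that three-folder layout, the sketch below scaffolds it with a short Markdown note in each folder. The folder names (`raw`, `knowledge`, `output`) and the `scaffold` helper are assumptions made for this example; the approach only describes the roles of the folders, not what they are called.

```python
from pathlib import Path

# Hypothetical folder names: only the three roles come from the article.
FOLDERS = {
    "raw": "drop raw information files here",
    "knowledge": "organised Markdown notes maintained by the agent",
    "output": "files the agent saves as its outputs",
}


def scaffold(root: Path) -> list[Path]:
    """Create the three folders under `root`, each with a short README.md."""
    created = []
    for name, purpose in FOLDERS.items():
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        readme = folder / "README.md"
        # Never overwrite a note the user or the agent has already edited.
        if not readme.exists():
            readme.write_text(f"# {name}\n\nPurpose: {purpose}.\n", encoding="utf-8")
        created.append(folder)
    return created
```

Calling `scaffold(Path("ai-os"))` would create the layout in the current directory. Everything it writes is plain Markdown, so the agent, a text editor, and version control can all read it.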
Knowledge base
The result is a system where you can drop in a 40-page document, tell your agent to use a skill (based on your workflow), and it will use the knowledge base and your skill to complete an eight-hour task in minutes.
This is not a toy. For a solo practitioner, consultant, or small team, a well-designed personal AI operating system is more useful day-to-day than some enterprise AI deployments.
SMB: Training and Fine-Tuning
When larger teams, or multiple teams, need to share skills and knowledge, it’s time to scale up: add access permissions and the ability to search your knowledge base by intent, not just by keyword. The small-scale approach makes this possible without spending a fortune.
Higher quality responses
Set up a RAG system on a single consumer GPU, or even a CPU. If you have the time to move at a learning pace, you can use cheaper models that run overnight, reducing costs. You’ll be setting up document loaders, chunkers, embedding models and vector databases.
You’ll fine-tune a small model on a specific task. The point is not the output; it is that you have built the thing from first principles and now understand it. That understanding is not optional for serious practitioners; it is the foundation everything else stands on.
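To show how those pieces fit together, here is a minimal sketch of the chunk, embed, store, and search steps in plain Python. Everything in it is an assumption for illustration: `embed` is a toy bag-of-words stand-in (so it matches on shared words only), and `VectorStore` is an in-memory list. In a real setup an embedding model and a vector database would replace them, and the embedding model is what would provide search by intent.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows of `size` words (size > overlap)."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """In-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self.items: list[tuple[Counter, str]] = []

    def add(self, chunks: list[str]) -> None:
        for text in chunks:
            self.items.append((embed(text), text))

    def search(self, query: str, k: int = 3) -> list[str]:
        """Return the k stored chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]
```

At question time, the chunks returned by `search` would be pasted into the model prompt as context.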
Distributed knowledge
At this level you are building and maintaining infrastructure so you can fine-tune on proprietary data, learn how models work, and build domain-specific classifiers or generators where general-purpose models underperform.
Large Scale: Enterprise AI Infrastructure
At large scale, the architecture changes category. You are no longer primarily training models; you are orchestrating them at speed, across distributed systems, with reliability and cost constraints that require real engineering. Expect £2k–£20k+ and months of engineering setup, plus ongoing maintenance, depending on complexity.
At this scale your RAG system won’t be fine-tuning models; it fetches relevant documents at inference time and passes them as context. The goal is low latency and high throughput, not model innovation. Single-model calls give way to multi-agent pipelines. Specialist agents handle discrete tasks (research, writing, validation, routing), coordinated by an orchestrator. Cost routing makes use of cheaper models for simpler tasks and frontier models where it matters.
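A minimal sketch of an orchestrator with cost routing might look like the following. The tier names (`cheap-model`, `frontier-model`), the routing rule, and the stub agents are all hypothetical. A production system would call real model APIs and would likely route on measured task difficulty rather than on task type and prompt length.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical tier names: substitute real model identifiers.
CHEAP, FRONTIER = "cheap-model", "frontier-model"


@dataclass
class Task:
    kind: str    # "research", "writing", "validation", or "routing"
    prompt: str


def pick_model(task: Task, long_prompt_words: int = 200) -> str:
    """Crude cost router: use the frontier tier only for open-ended or long work."""
    open_ended = task.kind in {"research", "writing"}
    long_prompt = len(task.prompt.split()) > long_prompt_words
    return FRONTIER if open_ended or long_prompt else CHEAP


def make_agent(name: str) -> Callable[[Task, str], str]:
    """Build a stand-in specialist agent; a real one would call a model API."""
    def agent(task: Task, model: str) -> str:
        return f"[{name} via {model}] {task.prompt}"
    return agent


AGENTS = {kind: make_agent(kind) for kind in ("research", "writing", "validation", "routing")}


def orchestrate(tasks: list[Task]) -> list[str]:
    """Send each task to its specialist agent on the model tier chosen for it."""
    return [AGENTS[task.kind](task, pick_model(task)) for task in tasks]
```

Keeping the routing rule in a single function such as `pick_model` makes the cost policy easy to audit and to change as volume grows.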
Now guardrails and governance protect against high-stakes risks. Systems require input and output validation, content filtering, PII detection, audit logging, and human-in-the-loop escalation paths for edge cases. These are not optional features; they are what makes the system safe to deploy at scale. Observability will also be factored in.
This requires significant engineering investment upfront, and ongoing maintenance, which pays back with volume. At this scale the question is whether the ROI is good enough.
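The sketch below shows, in very simplified form, how PII detection, content filtering, audit logging, and an escalation flag might be combined in one check. The regular expressions, the blocked-term list, and the in-memory `AUDIT_LOG` are assumptions for illustration only. Real deployments typically rely on dedicated PII and moderation tooling and on durable, access-controlled logs.

```python
import re
from dataclasses import dataclass, field

# Deliberately simple patterns: real PII detection needs far broader coverage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s-]{8,}\d")
BLOCKED_TERMS = {"password", "api key"}  # hypothetical content filter

AUDIT_LOG: list[dict] = []  # stand-in for a durable audit store


@dataclass
class Verdict:
    text: str                      # text with detected PII redacted
    escalate: bool                 # True means route to a human reviewer
    reasons: list[str] = field(default_factory=list)


def guard(text: str, stage: str) -> Verdict:
    """Validate one input or output message and record the decision."""
    reasons: list[str] = []
    redacted = text
    for label, pattern in (("email", EMAIL), ("phone", PHONE)):
        if pattern.search(redacted):
            reasons.append(f"pii:{label}")
            redacted = pattern.sub(f"[{label} redacted]", redacted)
    lowered = text.lower()
    reasons += [f"blocked:{term}" for term in sorted(BLOCKED_TERMS) if term in lowered]
    escalate = any(reason.startswith("blocked:") for reason in reasons)
    # Log the reasons only, never the raw text, so the log holds no PII.
    AUDIT_LOG.append({"stage": stage, "reasons": reasons, "escalate": escalate})
    return Verdict(redacted, escalate, reasons)
```

The same `guard` function would be run once on the user input (`stage="input"`) and again on the model response (`stage="output"`) before anything is shown or stored.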
Choosing Your Architecture
A team of 12 does not need a distributed agent orchestration platform. A solo consultant does not need a fine-tuning pipeline. And almost nobody needs to train a model from scratch.
Start with the personal scale. Get sharp on what AI can actually do for you. Then move to fine-tuning only when a general-purpose model demonstrably fails at your specific task. Move to large-scale infrastructure only when you have a production use case with real volume that justifies the engineering cost.
The architecture that serves you best is almost always the simplest one that solves the problem in front of you. Scale when you have to, not before.