Building an AI Team: Lessons from My Multi-Agent Experiment
- Jeffrey Cortez
- 4 days ago
- 11 min read

What I learned from turning three AI agents into a small digital operations team.
By Jeffrey V. Cortez
A Quick Introduction
For most of my career, I’ve worked at the intersection of technology leadership, education, and operational systems. I recently completed my Executive Master’s in Technology Management at Columbia University, where my work focused on how emerging technologies—particularly artificial intelligence—are reshaping how organizations operate.
Professionally, I’ve spent years designing systems, leading technology initiatives, and helping organizations adopt new technologies responsibly. Through my consulting work with 2Nspira, and community initiatives like Open Goal Soccer, I constantly explore how technology can amplify human potential rather than simply automate tasks.
So when the recent wave of AI tools began evolving from simple chat interfaces into programmable systems, I found myself asking a deeper question:
What happens when we stop treating AI as a chat interface that generates answers, and start building an AI team capable of taking on real tasks and executing them?
That question sparked an experiment that has turned into one of the most fascinating technology journeys I’ve taken in recent years.
The Original Idea
The idea started simply.
Instead of relying on a single AI model to do everything, I wondered what would happen if I built a team of AI specialists, each with a clearly defined role.
Three agents quickly emerged:
Brian — the orchestrator
Responsible for coordinating workflows and routing tasks.
Adam — the research specialist
Focused on analysis, research, and structured reasoning.
Maya — the communications specialist
Focused on writing, messaging, and outreach.
The structure mirrored something very familiar: a real working team.
Brian receives a task and determines the workflow. Sometimes Adam works independently on research. Sometimes Maya handles communication tasks. In more complex cases, the agents collaborate:
Adam researches → Maya drafts → Adam reviews → Maya revises.
At least, that was the theory.
But the first version of the system looked very different.
The First Prototype (And The First Major Problem)
The earliest version of the system had no orchestration at all.
At that stage, I was asking a very practical question:
Could I build something powerful without immediately spending money on API tokens?
More specifically, could I run this as lean as possible using the tools I already had — including my existing $20 ChatGPT Pro subscription?
So instead of building around traditional API access, I began by running each agent independently through OpenClaw, using OpenAI’s ChatGPT 5.3 Codex model authenticated through OAuth.
That OAuth integration felt like a revelation.
Because authentication was handled through the account session, I didn’t need to manage API keys, token accounting, or separate billing infrastructure. Within minutes, I had three independent AI agents running:
• Adam — research
• Maya — writing
• Brian — strategy
Each agent simply responded to prompts.
At first, it felt incredibly powerful.
More importantly, it felt economical.
I wasn’t trying to build an expensive AI lab on day one. I was trying to see how far I could go with a lean architecture and the resources I already had.
For a moment, it seemed like I had found the sweet spot.
But very quickly I ran into something many developers eventually encounter:
The dreaded API Rate Limit error.
Once the agents started operating simultaneously, the system began hitting OpenAI rate limits.
Requests stalled.
Responses failed.
Workflows broke unpredictably.
What initially felt like an elegant and cost-efficient architecture suddenly revealed a serious scaling problem.
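These failures are not exotic; they are the normal behavior of a shared API under concurrent load. The standard client-side mitigation is retrying with exponential backoff and jitter. A minimal sketch, where `RateLimitError` is a hypothetical stand-in for whatever exception your actual client library raises:

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the rate-limit exception your API client actually raises."""

def with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry a callable on rate-limit errors, doubling the wait each attempt."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            # Wait 1s, 2s, 4s, ... (scaled by base_delay), with random jitter
            # so concurrent agents don't all retry at the same instant.
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
    raise RuntimeError("rate limit retries exhausted")
```

Backoff softens the symptom but does not remove it: with three agents sharing one account's quota, the ceiling is still the ceiling.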
The challenge was no longer just Can I get three agents working?
It became something much more important:
Can I build an AI team that remains lean, practical, and affordable as it grows?
I had also read horror stories of API token bills reaching $400-$600 a month during the testing phase alone.
That question forced me to rethink the entire approach.
The Pivot That Changed Everything
Instead of treating OpenAI as the engine behind every task, I began experimenting with a different model:
Local AI first.
Cloud AI only when necessary.
The idea seemed promising.
If I could run smaller models locally, the system could handle most of its work without constantly calling external APIs. That meant more control, lower costs, and the ability to experiment freely without worrying about usage limits.
So I began deploying local models, experimenting with 7B, 14B, 24B, and 32B parameter models, to handle the majority of the workflow.
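Models at these sizes are typically served through a local runtime such as Ollama (the local model runtime listed in the stack below), which exposes a small HTTP API. A minimal sketch of a non-streaming call, assuming a default local server and a pulled model tag such as `mistral-small:24b`:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_request(model: str, prompt: str) -> dict:
    """Build a non-streaming generate request for Ollama's HTTP API."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send a prompt to a locally running model and return its response text."""
    payload = json.dumps(build_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires a running Ollama server and a pulled model):
# print(generate("mistral-small:24b", "Summarize the benefits of inclusive sports."))
```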
In theory, this was exactly what I wanted.
In practice, it revealed a new problem.
Response times.
When tasks started moving through the local models, I quickly discovered that even reasonably capable hardware could take close to two minutes to generate a full response.
Two minutes might not sound like much at first, especially when you are used to the near-instant responses of hosted OpenAI models.
But when multiple agents are collaborating — researching, drafting, reviewing, revising — those delays compound quickly.
A single workflow could stretch far longer than expected.
It became clear that running everything locally wasn’t going to be sustainable for interactive use.
So I made a strategic architectural decision.
Instead of forcing every agent onto the same model infrastructure, I split the system into two layers.
Brian, the orchestrator—the agent I interact with directly—runs on a cloud model, providing fast responses and a smooth interface for conversation and task assignment.
Meanwhile, Adam and Maya operate primarily on local models, handling research, drafting, and internal workflow tasks behind the scenes.
Brian coordinates the work between them, assigning tasks to Adam and Maya, gathering their outputs, and presenting the results back to me.
In practice, Adam and Maya perform most of the heavy lifting locally, while Brian manages the workflow and ensures the process moves forward efficiently.
Once the work is complete, Brian evaluates the results and determines whether the task is finished or whether it should be escalated to a stronger cloud model for deeper reasoning or final refinement.
In simple terms, the architecture became:
Local AI handles thinking and iteration.
Cloud AI handles speed and complex reasoning.
What started as a limitation — slow local inference — ended up shaping one of the most important architectural decisions in the entire experiment.
Instead of choosing between local or cloud AI, the system now uses both, each where it performs best.
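That split can be reduced to a single routing decision: try the local model first, score the result, and escalate only when quality falls short. A minimal sketch, where the model callables, the quality check, and the 0.7 threshold are all illustrative placeholders rather than the system's actual code:

```python
def route_task(task, local_model, cloud_model, quality_check, threshold=0.7):
    """Run a task on the local model first; escalate to cloud only if needed."""
    draft = local_model(task)
    score = quality_check(task, draft)  # e.g. the orchestrator's self-evaluation, 0..1
    if score >= threshold:
        return draft, "local"           # good enough: keep it cheap and local
    return cloud_model(task), "cloud"   # escalate for deeper reasoning
```

The interesting design question is not the routing itself but the `quality_check`: deciding when work is "good enough" is exactly the organizational problem discussed later in the article.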
Building A Real Digital Workspace For The Agents
As the system evolved, I realized something important.
If these agents were going to function like a team, they needed more than just language models.
They needed access to a real working environment.
So I began expanding the system’s capabilities.
Starting through the Google API ecosystem, the agents were given access to the full productivity suite:
• Gmail
• Google Calendar
• Google Docs
• Google Sheets
• Google Drive
This allowed Maya, the communications specialist, to do more than simply generate text. She could draft and respond to emails, prepare documents, organize information, and operate within the same productivity tools that human teams use every day.
Meanwhile, Adam, the research specialist, was given access to the internet, allowing him to gather open source information, analyze sources, and synthesize insights before passing structured findings to the rest of the system.
I also connected the agents to several of my company websites and digital properties, allowing them to review content and assist with SEO improvements, blog drafts, messaging updates, and website refinements. In effect, the agents could analyze existing content and propose improvements to strengthen search visibility, clarity, and audience engagement.
Behind the scenes, I created a GitHub repository where the system maintains configuration files, shared prompts, and development structure for the agents.
To maintain continuity across tasks, I asked the agents to set up a relational database, allowing them to store workflow state, task history, and operational context.
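The article doesn't show the schema, but a workflow-state store in this spirit needs little more than a tasks table and a history table. An illustrative SQLite sketch (table and column names are assumptions, not the system's actual schema):

```python
import sqlite3

def init_db(path=":memory:"):
    """Create a minimal task-state store: current tasks plus an audit trail."""
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS tasks (
        id INTEGER PRIMARY KEY,
        description TEXT NOT NULL,
        assigned_to TEXT,                  -- 'adam', 'maya', or 'brian'
        status TEXT DEFAULT 'pending',     -- pending / in_progress / review / done
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    )""")
    conn.execute("""CREATE TABLE IF NOT EXISTS task_history (
        id INTEGER PRIMARY KEY,
        task_id INTEGER REFERENCES tasks(id),
        agent TEXT,
        note TEXT,
        logged_at TEXT DEFAULT CURRENT_TIMESTAMP
    )""")
    return conn

conn = init_db()
conn.execute("INSERT INTO tasks (description, assigned_to) VALUES (?, ?)",
             ("Research inclusive soccer benefits", "adam"))
row = conn.execute("SELECT assigned_to, status FROM tasks").fetchone()
```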
Over time something interesting happened.
The agents were not just using this environment — they were managing it as a shared workspace.
Documents were created and updated as part of workflows.
Information was stored and referenced across tasks.
Research notes and drafts were passed between agents as part of collaborative processes.
In effect, the agents began operating inside their own digital office, where research, communication, documentation, and project state all lived together.
In many ways, the workspace began functioning much like a small operations team’s internal environment — except the day-to-day coordination of that environment was being handled by the AI agents themselves.
But one important rule remained.
Even though the agents could draft content, analyze websites, manage documents, and prepare updates, nothing was allowed to publish automatically.
Before any change could be pushed live—whether an email, a document, or a website update—it required my review and approval.
This mirrored how many real organizations operate.
The agents could behave like junior team members, preparing research, drafts, and recommendations.
But the final decision always passed through a human supervisor.
That design choice turned out to be essential.
It allowed the system to remain productive without becoming reckless, giving the agents autonomy to do meaningful work while ensuring that final responsibility stayed where it belongs: with the human operator.
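That approval boundary can be enforced structurally rather than by convention: agents get a handle that can only queue outbound actions, and the send step is reachable only through an explicit human approval call. A minimal sketch with hypothetical names:

```python
class ApprovalGate:
    """Holds outbound actions (emails, site updates) until a human approves them."""

    def __init__(self):
        self.pending = []
        self.sent = []

    def submit(self, action):
        """Agents can only queue actions; nothing goes out from here."""
        self.pending.append(action)

    def approve(self, index, send):
        """Human-only step: the send callable runs only after explicit approval."""
        action = self.pending.pop(index)
        send(action)
        self.sent.append(action)

gate = ApprovalGate()
gate.submit({"type": "email", "to": "school@example.org", "draft": "..."})
outbox = []
gate.approve(0, outbox.append)  # human reviews, then releases the action
```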
Security and containment
Another important design consideration was containment.
Each agent runs on dedicated hardware that was wiped and reset before deployment, with separate operating system accounts and restricted permissions.
This separation serves two purposes.
First, it limits each agent’s access strictly to the tools and resources required for its role.
Second—and perhaps more importantly—it creates a safety boundary.
If something ever behaves unpredictably in the future, the system can be shut down, isolated, or reset without affecting other environments or systems.
Even though the system experiments with autonomous workflows, it was built around a simple principle:
Experiment boldly, but design for control.
When the system started behaving like a team
Of course, not everything worked smoothly.
In the early tests, the agents occasionally entered strange loops—rewriting the same draft repeatedly or escalating tasks unnecessarily. At one point, Maya produced five different versions of the same email because Adam kept asking for “slightly clearer messaging.”
It felt less like a perfectly engineered system and more like managing a very enthusiastic junior team.
But those early hiccups led to one of the most important breakthroughs in the system.
I introduced a review loop.
Instead of producing a single response and stopping there, the agents began operating more like a real editorial workflow:
Adam researches → Maya drafts → Adam critiques → Maya revises
The impact was immediate.
The output became clearer, more structured, and noticeably higher quality.
But the real surprise wasn’t just the improvement in the results.
Something about the experience changed.
For the first time, the system didn’t feel like a tool generating answers.
It felt like a collaborative workflow unfolding.
I wasn’t simply prompting a model anymore.
I was assigning work to a team and watching it move through a process.
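A review loop like this also needs a cap, or you get the five-versions-of-the-same-email behavior described earlier. A minimal sketch, with the agent steps passed in as placeholder callables:

```python
def review_loop(task, research, draft, critique, revise, max_rounds=3):
    """Research -> draft -> critique -> revise, capped so agents can't loop forever."""
    findings = research(task)
    text = draft(task, findings)
    for _ in range(max_rounds):
        feedback = critique(task, findings, text)
        if feedback is None:   # critic is satisfied: stop revising
            return text
        text = revise(text, feedback)
    return text                # cap reached: return the best effort so far
```

The cap turns "slightly clearer messaging, forever" into a bounded editorial pass, which is how real teams handle the same tendency.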
A Real Task Example
To see how the system behaved in practice, I started giving it real-world tasks connected to projects I was already working on.
One example came from my work with Open Goal Soccer, an initiative focused on creating inclusive soccer opportunities for neurodiverse children.
The objective wasn’t just to write a message.
The real goal was to build awareness within the local community and identify potential referral partners who regularly interact with families that might benefit from inclusive programs.
So I asked the system to take on a more structured task:
Research the benefits of inclusive soccer programs for neurodiverse children and develop targeted outreach messaging for organizations within a 12-mile radius that serve neurodiverse families.
The target audience included several different community groups:
local schools and special education departments
pediatric medical offices and therapy centers
public service organizations
family support groups and advocacy organizations
Each of these groups interacts with families in different ways, so the communication couldn’t be generic.
The messaging needed to be adapted to each audience.
This is where the multi-agent workflow became especially useful.
Instead of producing a single response, the task moved through the AI team as a structured collaboration process.
The workflow looked like this:
Adam gathered research and summarized key findings about the developmental, social, and emotional benefits of inclusive sports programs for neurodiverse children.
Adam also identified relevant categories of organizations likely to serve families who might benefit from the program. The target audience also needed to serve families with children ages 8-15; cosmetic surgery practices, for example, were not a target.
Maya translated that research into clear outreach messaging designed for each audience segment—schools, medical professionals, and family support organizations.
Adam then reviewed the drafts, checking for missing context, weak connections to the research, and opportunities to strengthen the value proposition.
Maya revised the communications using Adam’s feedback, improving clarity and tailoring the language to each specific audience.
Finally, Brian, acting as the orchestrator, evaluated the overall output and determined whether the workflow was complete or required additional refinement using his Cloud model.
By the time the task reached me, it wasn’t simply a generated email.
It was the result of a structured collaboration process involving research, audience segmentation, messaging strategy, critique, and revision.
In other words, instead of receiving a single AI response, I was reviewing the output of a small digital team working through a real operational task.
That difference may seem subtle, but it fundamentally changes how you interact with AI.
You’re no longer asking a tool for an answer.
You’re assigning a problem and allowing a team to work through it.
And the best part?
After my final review and approval, the team actually handled the outreach and sent the 352 email communications out.
The Hardest Problems Were Not Technical
Interestingly, the hardest challenges were not coding problems; AI now helps me write and review all of the Python code.
They were design problems.
Questions like:
• How do agents know when their work is good enough?
• When should a task escalate to a stronger model?
• How do you prevent agents from repeating the same work?
• How much autonomy should each agent have?
These feel less like software engineering questions and more like organizational design challenges.
Building multi-agent AI systems forces you to think like a manager, not just a developer.
The Stack Behind The Experiment
The architecture in simple terms
The system runs across three primary AI roles:
Brian — the orchestrator - Coordinates workflows and determines when tasks are complete or require escalation.
Mac mini (Late 2014)
Processor: 2.6 GHz Dual-Core Intel Core i5
Memory: 16 GB 1600 MHz DDR3
Storage: 1 TB
macOS: Monterey 12.7.6
Adam — the research specialist - Handles analysis, research gathering, and structured reasoning.
Mac Studio (2025)
Chip: Apple M3 Ultra
Memory: 256 GB
Storage: 1 TB
macOS: Tahoe 26.3
Maya — the communications specialist - Transforms research into clear messaging and outreach.
MacBook Pro (Retina, 13-inch, Early 2015)
Processor: 3.1 GHz Dual-Core Intel Core i7
Memory: 16 GB 1867 MHz DDR3
Storage: 1 TB
macOS: Monterey 12.7.6
Tasks typically move through the system like this:
Task → Brian → Adam (research)
↓
Maya (draft)
↓
Adam (review)
↓
Maya (revision)
↓
Brian (evaluation)
For those curious about the architecture, the system currently runs on a lightweight distributed stack designed for multi-agent collaboration.
Agent runtime
• OpenClaw
Orchestration
• LangGraph
• FastAPI
AI models
• OpenAI ChatGPT 5.3 Codex
• Mistral-Small 24B
• Qwen 2.5 models
Local model runtime
• Ollama
Communication interface
• Telegram
Workspace integrations
• Google APIs (Gmail, Calendar, Docs, Sheets, Drive)
Development infrastructure
• GitHub repository
• Python worker services
• SQLite database
Operational capabilities
• distributed worker nodes
• automatic worker registration
• heartbeat health monitoring
• task state persistence
• approval layers and budget control
The architecture is intentionally modular so new agents can be introduced without redesigning the entire system.
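Of the operational capabilities listed, heartbeat health monitoring is the simplest to illustrate: each worker periodically reports in, and anything that hasn't reported within a timeout is treated as unhealthy. A minimal sketch (the names and the 30-second timeout are assumptions, not the system's actual values):

```python
import time

class WorkerRegistry:
    """Tracks distributed workers and flags ones whose heartbeats go stale."""

    def __init__(self, timeout=30.0):
        self.timeout = timeout
        self.last_seen = {}

    def heartbeat(self, worker_id, now=None):
        """Record that a worker just checked in."""
        self.last_seen[worker_id] = time.monotonic() if now is None else now

    def healthy(self, now=None):
        """Return workers whose last heartbeat is within the timeout window."""
        now = time.monotonic() if now is None else now
        return [w for w, t in self.last_seen.items() if now - t <= self.timeout]

reg = WorkerRegistry(timeout=30.0)
reg.heartbeat("adam", now=0.0)
reg.heartbeat("maya", now=25.0)
alive = reg.healthy(now=40.0)  # adam last seen 40s ago, maya 15s ago
```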
The Unexpected Lesson
The most surprising realization from this journey is this:
The future of AI may not be a single super-intelligent model.
It may be teams of specialized models working together.
Each with strengths.
Each with limitations.
Coordinated through structured workflows.
In other words, we may be rediscovering something very familiar:
Teams still matter.
Even when the team members are artificial.
Where This Experiment Goes Next
The next stage of this project is exploring how these agents can support real operational workflows such as:
• research pipelines
• communication campaigns
• automated analysis
• structured collaboration loops
I’m also continuing to experiment with ways to make the system more resilient, autonomous, and scalable.
Because the real goal isn’t simply building clever technology.
It’s learning how to design human-centered AI systems that actually work in practice.
Final Reflection
Building this system has been a reminder that innovation rarely follows a straight line.
There are breakthroughs.
There are dead ends.
There are moments when the architecture finally clicks.
But most importantly, there is learning.
And that’s the real reward of the journey.
If you’re experimenting with AI orchestration, agent frameworks, or multi-agent architectures, I’d love to hear what you’re discovering.
What’s working?
What surprised you?
What questions are you wrestling with?
Because one thing is clear:
We are only at the beginning of figuring this out.
