When you build an agent-powered app, the instinct is to start with the app — set up a project, install dependencies, write scaffolding. Then somewhere in the middle, you start figuring out what the agent should actually do.
This is backwards.
Agent-first means starting with the agent. Get the brain working first. Once the agent behaves the way you want, expand outward: add tools, then build the shell around it. The agent is the product — everything else is infrastructure.
This matters because the agent will keep evolving. Prompts change, capabilities expand, behavior gets refined. If the agent is tangled with your application code, every change risks breaking something unrelated. Keep the brain separate from the body, and both can evolve on their own terms.
This guide uses Perstack — a toolkit for agent-first development. In Perstack, agents are called Experts: modular micro-agents defined in plain text (perstack.toml), executed by a runtime that handles model access, tool orchestration, and state management. Perstack supports multiple LLM providers including Anthropic, OpenAI, and Google. You define what the agent should do; the runtime makes it work.
Writing TOML by hand works, but there’s a faster way. create-expert is a CLI that generates Expert definitions from natural language descriptions; it is itself an Expert that builds other Experts.
npx create-expert "A code review assistant that checks for security vulnerabilities, suggests fixes, and explains the reasoning behind each finding"
create-expert takes your description, generates a perstack.toml, test-runs the Expert against sample inputs, and iterates on the definition until behavior stabilizes. You get a working Expert — no code, no setup.
The description doesn’t need to be precise. Start vague:
npx create-expert "Something that helps with onboarding new team members"
create-expert will interpret your intent, make decisions about scope and behavior, and produce a testable Expert. You can always refine from there.
create-expert reads the existing perstack.toml in your current directory. Run it again with a refinement instruction, and it modifies the definition in place:
npx create-expert "Make it more concise. It's too verbose when explaining findings"
npx create-expert "Add a severity rating to each finding: critical, warning, or info"
npx create-expert "Run 10 tests with different code samples and show me the results"
Each iteration refines the definition. The Expert gets better, and you never open an editor.
Prototyping isn’t just about getting the agent to run — it’s about finding where it fails.
Write a test case that your agent should catch. For the code reviewer, create a file with a deliberate vulnerability:
npx create-expert "Read the file test/vulnerable.py and review it. It contains a SQL injection — make sure the reviewer catches it and suggests a parameterized query fix"
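The test file itself can be ordinary application code with one flaw planted in it. Here is a minimal sketch of what test/vulnerable.py might contain; the file name comes from the command above, and the code is purely illustrative:

```python
# test/vulnerable.py -- a deliberately flawed fixture for exercising the reviewer.
import sqlite3


def get_user(db_path: str, username: str):
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()
    # Flaw: user input is concatenated straight into the SQL string.
    # A safe version would pass it as a parameter instead:
    #   cursor.execute("SELECT id, email FROM users WHERE name = ?", (username,))
    query = "SELECT id, email FROM users WHERE name = '" + username + "'"
    cursor.execute(query)
    return cursor.fetchone()
```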
If the reviewer misses it, you’ve found a gap in the instruction. Refine and test again:
npx create-expert "The reviewer missed the SQL injection in the raw query on line 12. Update the instruction to pay closer attention to string concatenation in SQL statements"
This is the feedback loop that matters: write a scenario the agent should handle, test it, fix the instruction when it fails, repeat. By the time you build the app around it, you already know what the agent can and can’t do.
At some point you need feedback beyond your own testing. perstack start makes this easy — hand someone the perstack.toml and they can run the Expert themselves:
npx perstack start reviewer
The interactive UI lets them try their own queries and see how the Expert responds. No app to deploy, no environment to configure beyond the API key.
Every execution is recorded as checkpoints in the local perstack/ directory. After a round of feedback, inspect what happened:
npx perstack log
npx perstack log --tools    # what tools were called
npx perstack log --errors   # what went wrong
You can review specific runs, filter by step, or export as JSON for deeper analysis. See the CLI Reference for the full set of options.
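Once you have a JSON export, even a short script can summarize a round of feedback. A minimal sketch, assuming the export is a list of checkpoint records with fields such as "tool" and "error"; the field names and the export file name are assumptions for illustration, not Perstack's documented schema:

```python
# summarize_log.py -- rough sketch; "perstack-log.json" and the field names
# ("tool", "error") are assumptions, not Perstack's documented export format.
import json
from collections import Counter

with open("perstack-log.json") as f:
    checkpoints = json.load(f)

# Count how often each tool was called and collect the failed steps.
tool_calls = Counter(c["tool"] for c in checkpoints if c.get("tool"))
errors = [c for c in checkpoints if c.get("error")]

print(f"{len(checkpoints)} checkpoints, {len(errors)} errors")
for tool, count in tool_calls.most_common():
    print(f"  {tool}: {count} calls")
```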
This gives you a lightweight evaluation workflow: distribute the TOML, collect usage, analyze the logs, refine the instruction.