Best Practices
These principles help you avoid common pitfalls in agent development: monoliths, complexity explosions, debugging nightmares, and fragile systems. Building a large agent head-on almost always fails.
The key insight: we tend to over-control, but LLMs work best when you trust their reasoning and define goals rather than procedures. These principles are grounded in the Hard Signal Framework — the design philosophy behind Perstack’s architecture.
The Five Principles
- Do One Thing Well
- Trust the LLM, Define Domain Knowledge
- Let Them Collaborate
- Keep It Verifiable
- Ship Early
Do One Thing Well
Pitfall: Experts that do everything eventually break under their own weight.
Bad — An Expert that handles too many responsibilities:
```toml
[experts."assistant"]
description = "Handles customer inquiries, writes reports, schedules meetings, and manages expenses"
```

Good — Focused Experts with clear boundaries:
```toml
[experts."customer-support"]
description = "Answers customer questions about products and orders"

[experts."report-writer"]
description = "Creates weekly summary reports from data"

[experts."scheduler"]
description = "Finds available time slots and books meetings"
```

When something goes wrong in a monolith, you can’t tell which part failed. Focused Experts are easier to debug, test, and improve independently.
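Focused Experts also compose. A thin coordinator can route work to them using the `delegates` field that appears later in this guide. This is a sketch only; the `support-desk` Expert is a hypothetical name, not part of the example above:

```toml
# Hypothetical coordinator: routes requests to the focused Experts above.
# The "support-desk" name is illustrative, not part of the original example.
[experts."support-desk"]
description = "Routes incoming requests to the right specialist"
delegates = ["customer-support", "report-writer", "scheduler"]
```

The coordinator stays thin on purpose: its only job is routing, so a failure in one specialist never hides inside another's logic.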
Trust the LLM, Define Domain Knowledge
Pitfall: Step-by-step instructions that become unmaintainable.
Bad — Every requirement change means rewriting the entire procedure:
```toml
instruction = """1. First, greet the customer
2. Ask for their order number
3. Look up the order
4. If shipped, give tracking number
5. If not shipped, apologize and give estimated date"""
```

Good — Domain knowledge lets the LLM adapt:
```toml
instruction = """You are a customer support specialist for an online store.

Key policies:
- Orders ship within 2 business days
- Free returns within 30 days
- VIP customers (order history > $1000) get priority handling

Tone: Friendly but professional. Apologize for delays, offer solutions."""
```

The LLM knows how to have a conversation. What it doesn’t know is your company’s policies — that’s domain knowledge.
Let Them Collaborate
Pitfall: Monolithic agents that can’t be reused, tested, or improved independently.
Bad — A monolith that only the original author can maintain:
```toml
[experts."event-planner"]
instruction = """Plan the company event: survey preferences, find venue, arrange catering, send invitations."""
```

Good — Modular Experts that anyone can reuse and improve:
```toml
[experts."event-coordinator"]
delegates = ["venue-finder", "caterer", "invitation-sender"]

[experts."venue-finder"]
description = "Finds and books venues for given date and capacity"
```

Modular Experts unlock collaboration — between Experts, and between people. The same venue-finder works for any event. One person improves caterer while another builds invitation-sender. Test each Expert independently. Replace one without touching others.
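Replacing a module is equally local. A sketch, assuming a hypothetical `venue-finder-v2` replacement: only the coordinator's `delegates` list changes, while `caterer` and `invitation-sender` are untouched:

```toml
# Hypothetical upgrade: swap one delegate without touching the others.
[experts."event-coordinator"]
delegates = ["venue-finder-v2", "caterer", "invitation-sender"]

[experts."venue-finder-v2"]
description = "Finds and books venues, preferring accessible locations"
```

Because each Expert is addressed only through its description and name, a drop-in replacement never forces edits to its siblings.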
Keep It Verifiable
Pitfall: Experts whose output can only be checked by another LLM.
If the only way to verify an Expert’s output is to have another model (or the same model) review it, the verification loop is soft. The agent will oscillate — “looks good” one iteration, “has issues” the next — without converging.
Bad — Verification relies solely on LLM judgment:
```toml
[experts."code-generator"]
instruction = "Generate TypeScript code for the given task."
delegates = ["code-reviewer"]

[experts."code-reviewer"]
instruction = "Review the generated code for correctness and style."
```

The reviewer uses the same kind of judgment as the generator. It can miss the same bugs, approve the same anti-patterns, and disagree with itself across runs. As the sole gate, this is a soft signal loop — the system oscillates.
Good — Hard signals as the final authority, with an optional soft gate for semantic checks:
```toml
[experts."builder"]
delegates = ["code-writer", "reviewer", "verifier"]

[experts."reviewer"]
description = "Checks whether the code reflects the requirements. Returns PASS or CONTINUE."
instruction = """Read the requirements and the generated code.
Check whether each requirement is addressed. Flag omissions.
Do NOT evaluate code quality — that is the verifier's job."""

[experts."reviewer".skills."@perstack/base"]
type = "mcpStdioSkill"
command = "npx"
packageName = "@perstack/base"
pick = ["readTextFile", "attemptCompletion"]

[experts."verifier"]
description = "Executes hard signal checks against the code. Returns PASS or CONTINUE with specific failures."
instruction = """Run the verification commands. Compare actual output against expected.
Report pass/fail per check. Do NOT read the code and form opinions."""

[experts."verifier".skills."@perstack/base"]
type = "mcpStdioSkill"
command = "npx"
packageName = "@perstack/base"
pick = ["readTextFile", "exec", "attemptCompletion"]
```

The reviewer (soft gate) catches semantic misalignment early — “does the code address the requirements?” is a qualitative judgment that only an LLM can make. The verifier (hard gate) provides the final pass/fail — compiler errors, test failures, and structural checks that are deterministic and independent of LLM judgment. The reviewer has no exec; the verifier has exec. Neither replaces the other. See combining soft and hard signals for the full pattern.
“Verifiable” means the Expert’s output is ultimately checked by a process that does not depend on LLM judgment. Soft signals can supplement hard signals — catching semantic drift that no compiler can detect — but the final gate must be hard. When designing an Expert, ask: what hard signal can verify this Expert’s output? If the only answer is “another LLM reads it,” look for something harder — a compiler, a test suite, a schema validator, a screenshot diff.
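As a concrete sketch, a verifier's instruction can pin its checks to specific commands. The commands below (a TypeScript compile and a test run) are illustrative assumptions about one possible project toolchain, not part of Perstack itself:

```toml
# Sketch only: the exact commands depend on your project's toolchain.
[experts."verifier"]
description = "Runs deterministic checks. Returns PASS or CONTINUE with specific failures."
instruction = """Run each check with exec and report its exit code:
- `npx tsc --noEmit` must exit 0 (no type errors)
- `npx vitest run` must exit 0 (all tests pass)
Report pass/fail per command. Do NOT read the code and form opinions."""
```

Naming the commands in the instruction keeps the gate hard: the verifier reports exit codes rather than forming opinions, so the loop converges instead of oscillating.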
Ship Early
Pitfall: Over-engineering for hypothetical scenarios.
Bad — Trying to handle every case before launch:
```toml
instruction = """You are a travel assistant. Handle:
- Flight bookings (compare airlines, handle cancellations, rebooking)
- Hotel reservations (check availability, loyalty programs, special requests)
- Ground transportation (rental cars, trains, rideshare)
- Travel insurance (compare policies, process claims)
- Visa requirements (check by nationality, application assistance)
Support multiple languages and currencies."""
```

Good — Start minimal, expand based on real usage:
```toml
instruction = """You are a flight booking assistant.
Help users find flights between cities.
For hotels or other travel needs, suggest they contact the full-service desk."""
```

Real users reveal the actual edge cases. A complex initial design often solves the wrong problems. Ship, observe, iterate.
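A plausible second iteration might add exactly one capability once real requests justify it. The added hotel line below is a hypothetical example of this expansion, not a prescribed next step:

```toml
# Hypothetical iteration two: one capability added after observing real requests.
instruction = """You are a flight booking assistant.
Help users find flights between cities.
You can also check hotel availability for the same trip.
For other travel needs, suggest the full-service desk."""
```

Each expansion stays small enough to verify against actual usage before the next one ships.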