
Specs as Boundaries: A Lightweight SDD Workflow for AI Coding Agents

AI coding agents are no longer just autocomplete tools. They can create project structures, implement features, write tests, refactor code, and explain their decisions.

That is exactly why they need better boundaries.

When an agent is weak, the problem is getting it to produce useful code. When an agent is strong, the problem becomes controlling what it changes, what it ignores, and how it stays aligned with the product.

If we only prompt the agent directly, the project can drift.

A single prompt may be clear in your head, but the agent does not always know the product boundary, the non-goals, the previous decisions, or the intended lifecycle of a feature. Over time, this creates a familiar mess:

“Why did the agent implement this extra thing?”
“Why did it change unrelated code?”
“Where is the actual product behavior documented?”
“Which spec is current and which one was just a temporary plan?”
“Why is the README saying one thing and the code doing another?”

This is where Spec-Driven Development, or SDD, becomes useful.

In this post, I want to explain SDD from a practical point of view: not as theory, not as a strict enterprise process, and not as a replacement for engineering judgment. I see it as a lightweight workflow for building software with AI coding agents.

I will use the workflow I used in my own Java CLI project, AgentTokenInsight, as the example.


I do not see SDD as a replacement for coding, testing, or engineering judgment.

I also do not see it as a heavy enterprise process that every tiny change must follow.

For me, lightweight SDD is useful when an AI coding agent is powerful enough to change a lot of code quickly. The goal is not to slow the agent down. The goal is to give it a clear boundary.

Use SDD for product behavior, APIs, CLI behavior, libraries, compatibility-sensitive changes, and multi-step features.

Do not force it for typos, tiny refactors, throwaway scripts, or experiments.

The point is not documentation. The point is controlled delegation.


What is Spec-Driven Development?

Spec-Driven Development means we do not jump directly from idea to implementation.

Instead, every meaningful change goes through a small lifecycle:

propose → review → apply → test → sync → archive

In plain English:

  1. First, define what should change.
  2. Clarify why it should change.
  3. Write the expected behavior.
  4. List tasks.
  5. Review the scope before implementation.
  6. Implement only that approved scope.
  7. Run tests.
  8. Update the current product spec if behavior changed.
  9. Archive the completed change.

The propose step can create several files, such as:

proposal.md
design.md
tasks.md
spec.md

This is not the same as writing huge requirements documents. In fact, it should be the opposite.

A good SDD workflow should be small, practical, and close to the code.


Why SDD Matters More with AI Coding Agents

Traditional development already benefits from clear specs. But with AI agents, the value becomes much higher.

Why?

Because AI agents are powerful, but they do not always know the real boundary of a task.

Imagine you have a large Java backend project and you ask the agent:

Fix the login timeout bug.

To solve that bug, the agent may only need to inspect a few files:

src/main/java/com/example/auth/LoginService.java
src/main/java/com/example/auth/SessionManager.java
src/test/java/com/example/auth/LoginServiceTest.java

But if the task is not scoped, the agent may start exploring much more:

- all authentication classes
- security configuration
- application.yml
- web.xml
- all session-related tests
- unrelated user-management code
- generated files
- build output
- old archived code

At that point, the agent is not necessarily “wrong”. It is trying to understand the system. But the context has become too large, the token cost has increased, and the change is now harder to review.

The original task was:

Fix the login timeout bug.

But the actual work may turn into:

Analyze half of the authentication module, read several config files, update tests, and maybe touch unrelated code.

This is the kind of drift SDD helps prevent.

Before implementation, the spec can define the boundary clearly:

Scope:
- investigate login timeout behavior
- start with LoginService and SessionManager
- update related tests only if needed

Non-goals:
- refactoring the whole authentication module
- changing security configuration
- changing production config files
- touching unrelated user-management code

Now the agent has a much clearer task. It can still ask to expand the scope, but it should not silently turn a focused bug fix into a broad project investigation.

This gives the AI agent a much better box to work inside.
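One way to make that box checkable is to compare the files an agent actually changed against the paths the spec allows. The sketch below is only an illustration: `ScopeCheck`, its method, and the `src/main/resources/application.yml` path are hypothetical and not part of the workflow described in this post.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: flags changed files that fall outside the approved scope.
final class ScopeCheck {

    // Returns the changed paths that do not start with any allowed prefix.
    static List<String> outOfScope(List<String> changedPaths, List<String> allowedPrefixes) {
        List<String> violations = new ArrayList<>();
        for (String path : changedPaths) {
            boolean allowed = false;
            for (String prefix : allowedPrefixes) {
                if (path.startsWith(prefix)) {
                    allowed = true;
                    break;
                }
            }
            if (!allowed) {
                violations.add(path);
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        // Scope from the login timeout example: two services plus their tests.
        List<String> allowed = List.of(
                "src/main/java/com/example/auth/LoginService.java",
                "src/main/java/com/example/auth/SessionManager.java",
                "src/test/java/com/example/auth/");
        // Assumed change set; the application.yml location is illustrative.
        List<String> changed = List.of(
                "src/main/java/com/example/auth/LoginService.java",
                "src/main/resources/application.yml");
        // Prints only the paths that drifted outside the approved scope.
        System.out.println(outOfScope(changed, allowed));
    }
}
```

A check like this does not replace review. It only surfaces drift early, so the agent has to ask before the scope expands.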


The Workflow I Use

I use a lightweight internal spec workflow with this structure:

spec/
  WORKFLOW.md
  project.md
  current/
    agenttokeninsight.md
  changes/
    .gitkeep
  archive/

And project-local Codex skills:

.codex/skills/
  spec-propose/
    SKILL.md
  spec-apply/
    SKILL.md
  spec-sync/
    SKILL.md
  spec-archive/
    SKILL.md
  spec-explore/
    SKILL.md

The commands are intentionally simple:

$spec-propose add-feature-name
$spec-apply add-feature-name
$spec-sync add-feature-name
$spec-archive add-feature-name
$spec-explore

These are not universal built-in commands. In my project, they are project-local Codex skills that enforce the workflow.

The important point is not the exact command name. The important point is the lifecycle.


Influences

This workflow is my own lightweight version for this project, but it is influenced by the broader Spec-Driven Development ecosystem.

Tools and projects such as OpenSpec and GitHub Spec Kit helped popularize the idea of using structured specs to guide AI coding agents.

I did not want to copy a full framework into this project. Instead, I kept the parts that matched my needs: small proposals, explicit non-goals, living specs, and archived change history.


The Core Folder Roles

spec/project.md

This is the project guidance file.

It contains things like:

- project purpose
- tech stack
- constraints
- naming decisions
- architectural boundaries
- things not to add unless explicitly requested

For AgentTokenInsight, this includes guidance like:

- Java 25
- Gradle Kotlin DSL
- Picocli
- Jackson YAML and JSON
- JUnit 5
- AssertJ
- deterministic approximate token estimates
- no Spring Boot, no web dashboard, no database unless later specified

This file helps the agent understand the project before touching code.

spec/current/agenttokeninsight.md

This is the living spec.

The living spec is not the dream, the roadmap, or the original proposal.

It is the current observable behavior of the product.

If the code changes user-visible behavior, the living spec should change too. If the code only refactors internals, the living spec probably should not change.

For example:

AgentTokenInsight must support loading configuration from agenttoken.yml.
AgentTokenInsight must support scanning a repository from a root directory.
AgentTokenInsight must support generating Markdown reports.
AgentTokenInsight must support JSON output from agenttoken scan --format json.

This file is the product truth.

spec/changes/<change-name>/

Every active change gets its own folder:

spec/changes/add-scan-insights/
  proposal.md
  design.md
  tasks.md
  spec.md

This is where new work is planned before implementation.
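As a rough illustration of that layout, the scaffolding could be done with a few lines of Java. This is a sketch under assumptions: in my project the propose step is a Codex skill, not Java code, and `ChangeScaffold` is a hypothetical name used only here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical scaffold helper; it only mirrors the folder layout shown above.
final class ChangeScaffold {

    static final List<String> CHANGE_FILES =
            List.of("proposal.md", "design.md", "tasks.md", "spec.md");

    // Creates spec/changes/<changeName>/ under repoRoot with four empty files.
    // Fails if the folder already exists, so an active change is never overwritten.
    static Path create(Path repoRoot, String changeName) throws IOException {
        Path changeDir = repoRoot.resolve("spec").resolve("changes").resolve(changeName);
        if (Files.exists(changeDir)) {
            throw new IOException("Change already exists: " + changeDir);
        }
        Files.createDirectories(changeDir);
        for (String name : CHANGE_FILES) {
            Files.createFile(changeDir.resolve(name));
        }
        return changeDir;
    }
}
```

Refusing to overwrite an existing folder is a deliberate choice in this sketch: one change name should map to exactly one active proposal.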

spec/archive/

When a change is completed, it moves to the archive.

Archive is history. Do not rewrite it casually.


What Goes into a Change?

A typical change contains four files.

proposal.md

The proposal answers:

What are we changing?
Why are we changing it?
What is in scope?
What is out of scope?

Example:

# add-cli-json-format-option

## Summary
Add a --format option to agenttoken scan so the CLI can render either Markdown or JSON output.

## Why
Human users need Markdown by default, while automation needs JSON.

## Scope
- agenttoken scan continues to print Markdown by default
- agenttoken scan --format markdown prints Markdown
- agenttoken scan --format json prints JSON
- invalid format values fail clearly

## Non-goals
- changing repository scanning behavior
- changing budget validation behavior
- writing reports to files
- adding GitHub Actions

The non-goals are extremely important. They reduce agent drift.

design.md

The design explains the approach.

It does not need to be long.

Example:

# Design

The scan command should treat report format as an output concern only.

The command should map:
- markdown -> MarkdownReportGenerator
- json -> JsonReportGenerator

The scanner and validator must not change.
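To show how small that mapping can stay, here is a minimal Java sketch. The `ReportGenerator` interface, the `render` signature, and the `ReportFormats` class are assumptions made for illustration; the real generators in AgentTokenInsight are not shown in this post, and the real CLI parses the option with Picocli rather than a plain string.

```java
// All types below are illustrative stand-ins, not the project's real classes.
interface ReportGenerator {
    String render(int scannedFileCount);
}

final class MarkdownReportGenerator implements ReportGenerator {
    public String render(int scannedFileCount) {
        return "# Scan Report\n\nScanned files: " + scannedFileCount + "\n";
    }
}

final class JsonReportGenerator implements ReportGenerator {
    public String render(int scannedFileCount) {
        return "{\"scannedFileCount\": " + scannedFileCount + "}";
    }
}

final class ReportFormats {

    // Maps the --format value to a generator. A missing value means Markdown,
    // and anything unsupported fails with a message listing the valid choices.
    static ReportGenerator forName(String format) {
        if (format == null) {
            return new MarkdownReportGenerator();
        }
        switch (format) {
            case "markdown":
                return new MarkdownReportGenerator();
            case "json":
                return new JsonReportGenerator();
            default:
                throw new IllegalArgumentException(
                        "Unsupported format '" + format + "'. Supported formats: markdown, json");
        }
    }
}
```

Because the selection happens in one place, the scanner and validator never need to know which format was requested.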

tasks.md

This is the implementation checklist.

Example:

# Tasks

- [ ] Add --format option to scan.
- [ ] Keep Markdown as default.
- [ ] Route JSON output through JsonReportGenerator.
- [ ] Fail clearly on unsupported format values.
- [ ] Add tests for default Markdown.
- [ ] Add tests for JSON output.
- [ ] Add tests for invalid format.
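Because the checklist uses plain Markdown checkboxes, it is easy to inspect mechanically, for example before deciding that a change is ready to sync. The following is a minimal sketch; `TaskChecklist` is a hypothetical helper and not part of my workflow's skills.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper that reads the checkbox state from tasks.md content.
final class TaskChecklist {

    private static final String OPEN_MARKER = "- [ ]";

    // Returns the text of every unchecked "- [ ]" item, in document order.
    static List<String> openTasks(String tasksMarkdown) {
        return tasksMarkdown.lines()
                .map(String::strip)
                .filter(line -> line.startsWith(OPEN_MARKER))
                .map(line -> line.substring(OPEN_MARKER.length()).strip())
                .collect(Collectors.toList());
    }
}
```

An empty result would mean every task is checked off. Anything else is a concrete list of what the apply step still owes.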

spec.md

This describes observable behavior.

Example:

# AgentTokenInsight Scan Format Selection

## Requirement 1: AgentTokenInsight must support selecting scan report format from the CLI

The CLI must provide a --format option for agenttoken scan.
When the option is omitted, the command must print Markdown by default.
When --format markdown is provided, the command must print Markdown.
When --format json is provided, the command must print JSON.
The command must fail clearly when an unsupported format value is provided.

### Scenario: Markdown is the default
Given a user runs agenttoken scan without specifying --format
When the command completes successfully
Then the CLI prints Markdown output

### Scenario: JSON is requested explicitly
Given a user runs agenttoken scan --format json
When the command completes successfully
Then the CLI prints JSON output

This gives the agent a testable contract.


Apply: Implementation Comes After Review

After the change is reviewed, implementation starts.

The apply step should read:

spec/project.md
spec/changes/<change-name>/proposal.md
spec/changes/<change-name>/design.md
spec/changes/<change-name>/tasks.md
spec/changes/<change-name>/spec.md

Then it should implement only that scope.

The rules are:

Do not implement non-goals.
Do not sync living specs automatically.
Do not archive automatically.

That separation matters.

Implementation is one step.
Spec sync is another.
Archiving is another.

This keeps the workflow controlled.
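A pre-flight check can make the reading list above explicit: if any of the five files is missing, the change has not been fully proposed and implementation should not start. This is a sketch under assumptions; `ApplyPreflight` is a hypothetical name, and my actual apply step is a Codex skill rather than Java code.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check for the apply step; names are illustrative.
final class ApplyPreflight {

    // The files the apply step is expected to read, in reading order.
    static List<Path> requiredFiles(Path repoRoot, String changeName) {
        Path spec = repoRoot.resolve("spec");
        Path change = spec.resolve("changes").resolve(changeName);
        return List.of(
                spec.resolve("project.md"),
                change.resolve("proposal.md"),
                change.resolve("design.md"),
                change.resolve("tasks.md"),
                change.resolve("spec.md"));
    }

    // Returns the required files that do not exist as regular files.
    static List<Path> missingFiles(Path repoRoot, String changeName) {
        List<Path> missing = new ArrayList<>();
        for (Path file : requiredFiles(repoRoot, changeName)) {
            if (!Files.isRegularFile(file)) {
                missing.add(file);
            }
        }
        return missing;
    }
}
```

Note that the living spec under spec/current/ is deliberately absent from this list, which matches the rule that apply does not sync.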


Sync: Updating the Living Spec

After code is implemented and tested, the living spec must be updated if product behavior changed.

This is the step many teams skip.

They write a design once, implement the code, and then the design becomes stale.

That is not a living spec. That is documentation debt.

For example, after adding JSON output, the current spec should now include:

AgentTokenInsight must support agenttoken scan --format json.

But it should not include implementation notes that are no longer relevant.

The living spec should describe the product, not the temporary development plan.


Archive: Keeping History Without Polluting Current Truth

Once a change is implemented and the living spec is updated, the change folder moves to archive:

spec/changes/add-cli-json-format-option/

becomes:

spec/archive/add-cli-json-format-option/
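Mechanically, that move is a single directory rename. The sketch below assumes both folders live in the same repository (and therefore on the same file system) and uses a hypothetical `ChangeArchiver` class; it is not the implementation of my spec-archive skill.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper that moves a finished change into the archive.
final class ChangeArchiver {

    // Moves spec/changes/<changeName>/ to spec/archive/<changeName>/.
    // Refuses to run if the change is missing or an archive entry already exists,
    // so archived history is never silently overwritten.
    static Path archive(Path repoRoot, String changeName) throws IOException {
        Path spec = repoRoot.resolve("spec");
        Path source = spec.resolve("changes").resolve(changeName);
        Path target = spec.resolve("archive").resolve(changeName);
        if (!Files.isDirectory(source)) {
            throw new IOException("No active change found: " + source);
        }
        if (Files.exists(target)) {
            throw new IOException("Change is already archived: " + target);
        }
        Files.createDirectories(target.getParent());
        // Within one file system this is a rename, so the folder contents move with it.
        return Files.move(source, target);
    }
}
```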

The archive is useful because it answers:

Why did we add this?
What was the original scope?
What were the non-goals?
What tasks were done?

But the archive is not the current product truth.

The living spec is.


When Should You Update the Living Spec?

You should update the living spec when behavior changes.

Good reasons to update:

- new CLI command
- new config option
- changed output format
- changed validation behavior
- changed error handling
- changed public API
- changed generated report structure

You probably do not need to update the living spec for:

- internal refactor with no behavior change
- test-only cleanup
- formatting changes
- dependency patch with no observable behavior change
- README typo fix

A simple rule:

If a user, caller, or integrator can observe the change, update the living spec.


SDD and Token Usage

Spec-Driven Development can reduce token waste when working with AI agents.

Not because specs are free. They are not. Specs also consume tokens.

But SDD reduces the expensive kind of waste:

- repeated explanation
- agent misunderstanding
- unrelated code changes
- rework
- large context dumps
- rediscovering project decisions
- debugging wrong assumptions

Without SDD, you may keep writing prompts like:

No, don't change the CLI.
No, don't add JSON yet.
No, don't touch scanner behavior.
No, the config file is agenttoken.yml now.
No, Java packages should stay under com.agenttoken.

With SDD, these decisions live in project guidance and current specs.

That makes prompts shorter.

Instead of explaining the whole project again, you can say:

$spec-apply add-scan-insights
Implement only this approved change.

The agent has a smaller and clearer target.

SDD Uses Tokens Upfront to Save Tokens Later

This is the tradeoff.

SDD adds upfront writing:

proposal.md
design.md
tasks.md
spec.md

But it can save tokens later by reducing:

misimplementation
backtracking
scope creep
review noise
repeated context

For small changes, the overhead may not be worth it.

For medium or complex changes, it often is.


Tradeoffs of Spec-Driven Development

SDD is not magic. It has costs.

It also does not make bad intent safe. If the spec is vague, wrong, or too broad, the agent may still produce the wrong thing - just with more confidence.

Benefits

- clearer feature boundaries
- better AI agent control
- fewer unrelated changes
- better review process
- current product behavior is documented
- easier onboarding
- easier release preparation
- more stable architecture decisions

Costs

- more files
- more process
- slower start for tiny changes
- living specs require discipline
- bad specs can mislead agents
- too much detail can waste tokens

The goal is not to write more documentation.

The goal is to write just enough structure to make implementation safer and faster.


When SDD Makes Sense

SDD is a good fit when:

- you are using an AI coding agent heavily
- the project has multiple features
- scope control matters
- behavior should remain stable
- you want repeatable implementation flow
- multiple sessions or contributors will touch the project
- release quality matters

It is especially useful for:

- CLI tools
- libraries
- APIs
- developer tools
- automation tools
- products with evolving behavior

AgentTokenInsight is a good example because it has:

- CLI commands
- config files
- validators
- reports
- packaging
- user-visible behavior

That kind of project benefits from a living spec.


When SDD Is Too Much

SDD may not be worth it for:

- a 30-line script
- a quick local experiment
- a throwaway prototype
- one-time data conversion
- purely visual exploration

For those, a simple checklist or short prompt is enough.

Do not turn SDD into ceremony.

The point is control, not bureaucracy.


Example: AgentTokenInsight Feature Flow

Here is a realistic flow, based on a feature from AgentTokenInsight.

Step 1: Propose

$spec-propose add-scan-insights

Creates:

spec/changes/add-scan-insights/
  proposal.md
  design.md
  tasks.md
  spec.md

Proposal:

## Summary
Add actionable insights to scan reports.

## Why
Users need to understand why a scope is too large and what to do next.

## Scope
- Add Insights section to Markdown reports.
- Add insights array to JSON reports.
- Generate deterministic insights from scan and validation results.

## Non-goals
- AI-generated recommendations
- model-specific tokenization
- changing scanner behavior
- changing CLI commands

Step 2: Review

Before coding, review the spec.

Ask:

Is the scope small?
Are non-goals clear?
Is the behavior testable?
Could this accidentally change existing behavior?

Step 3: Apply

$spec-apply add-scan-insights

The implementation should only do what the change says.

Step 4: Test

./gradlew test

Step 5: Sync

$spec-sync add-scan-insights

Update:

spec/current/agenttokeninsight.md

Step 6: Archive

$spec-archive add-scan-insights

Move the completed change to:

spec/archive/add-scan-insights/

This gives us both current truth and historical context.


How This Helps AI Agent Work

The most important benefit is not documentation.

The most important benefit is controlled delegation.

When working with an AI coding agent, you are delegating implementation. SDD gives the agent a contract.

Without SDD:

“Add scan insights.”

With SDD:

“Implement the approved add-scan-insights change.
Read proposal, design, tasks, and spec.
Do not implement non-goals.
Do not change scanner behavior.
Add tests.”

That is a much better instruction.

It reduces ambiguity.

It also gives you better review points:

Review proposal before implementation.
Review implementation before sync.
Review living spec before archive.


The Future of SDD

I think SDD will become more important as AI coding agents become more capable.

Why?

Because the more powerful the agent, the more important the boundary becomes.

A weak assistant needs help writing code.
A strong assistant needs clear constraints.

Future development may look less like:

developer writes every line of code

and more like:

developer defines intent, constraints, tests, and product boundaries
agent implements
developer reviews and steers

In that world, specs are not bureaucracy.

Specs are the interface between human intent and agent execution.

A good spec tells the agent:

what to do
why it matters
what not to do
how success is observed
where current truth lives

That is exactly the kind of structure AI development needs.


Practical Rules for Lightweight SDD

1. One change at a time

Do not combine unrelated changes.

Good:

add-json-output

Bad:

add-json-output-and-github-actions-and-refactor-scanner

2. Always write non-goals

Non-goals are often more important than scope.

They stop agent drift.

3. Keep design short

A design does not need to be a novel.

A few clear paragraphs are enough.

4. Requirements should be observable

Avoid vague specs like:

The system should be better.

Prefer:

When agenttoken scan --format json runs, the output must be valid JSON and include scannedFileCount.

5. Do not sync before implementation

The living spec should describe current behavior, not planned behavior.

6. Archive is history

Do not rewrite old archived changes just because the product was renamed later.

If the current behavior changed, update the current spec.
If history says something old, let it remain history.

7. Use SDD where it pays off

Use it for product behavior.

Do not force it for every typo.


Conclusion

Spec-Driven Development is not about writing documents for the sake of documents.

It is about creating a lightweight control system for software changes, especially when AI coding agents are involved.

For me, the useful pattern is:

propose → review → apply → test → sync → archive

The propose step may create proposal.md, design.md, tasks.md, and spec.md. The important part is that implementation starts only after the scope is clear enough to review.

This gives the AI agent enough structure to work effectively without giving it unlimited freedom to reshape the project.

In AgentTokenInsight, this workflow helped keep changes small, reviewable, and aligned with the product goal.

The biggest lesson is simple:

The stronger the coding agent becomes, the more valuable clear specs become.

Not because the agent is weak.

Because the agent is powerful.

And powerful tools need boundaries.
