
I didn't use AI to build a design system. I built a design system that AI can use.

Stack: Vue 3, CSS Custom Properties, Figma, Storybook 10, Figma MCP.

Most AI-assisted design system work follows the same pattern: a designer prompts an AI to scaffold components, generate tokens, or write documentation. The AI builds things for you. But the output is static — once generated, the system can't maintain itself.

This experiment asks a different question: what if the design system were structured so that an AI agent could operate within it autonomously, auditing tokens, catching drift between Figma and code, consolidating redundancy, and keeping both sides in sync?

My role

As Design System Champion at Bauer Media Group, I'm always looking for ways to make our system more maintainable. Rather than experimenting directly in a production codebase, I used my portfolio's design system as a proving ground — a real system with real complexity, where I could validate the approach before bringing it into my workflow at work.

The hypothesis

Vallaure's framework for agentic design systems identifies six structural requirements: a variables architecture, property alignment, complete state design, slots, auto-layout with semantic naming, and Code Connect. The core insight is that a design system is no longer documentation for developers — it's instructions for a machine.

I wanted to test this with a real system. Not a demo with two buttons and a colour palette, but the actual design system powering this portfolio — with 468 tokens, 23 components, dark mode theming, and two canonical Figma files that needed to stay in sync with the codebase.

The hypothesis: if the token architecture is semantically layered, the component descriptions are machine-readable, and the Figma structure mirrors the code structure, then an agent should be able to perform design system maintenance tasks that currently require a human designer.

Token architecture

The foundation is a three-layer token system. Primitive tokens hold raw values: hex colours, pixel sizes, font stacks. Semantic tokens alias primitives by intent: --color-text-primary, --color-surface-card. Component tokens are scoped to specific UI patterns, such as the Figma-style widget chrome or the Storybook panel.

This layering is what makes the system machine-readable. When an agent encounters --color-text-primary, it doesn't need to understand colour theory — it just needs to follow the chain: semantic → primitive → raw value. Dark mode works the same way: the semantic layer swaps which primitives it points to, and every component updates automatically.

Primitive
Raw values named by hue + scale. Never used directly in components. Example tokens: --color-primitive-teal-500, --color-primitive-purple-600, --color-primitive-neutral-750, --color-primitive-neutral-50, --color-primitive-indigo-100, --color-primitive-indigo-800.
Semantic
Intent-based aliases. Components reference these — they swap in dark mode. Example tokens: --color-text-primary (→ Neutral/750 light, → Indigo/100 dark), --color-surface-card (→ Neutral/50 light, → Indigo/800 dark), --color-primary (→ Teal/500), --gradient-brand (Teal/500 → Purple/600).
Component
Scoped to specific UI patterns like widgets and panels. Example tokens: --color-widget-bg, --color-widget-accent, --color-panel-bg, --color-panel-focus.
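
The chain an agent follows can be sketched in a few lines of TypeScript. This is a minimal illustration, not the portfolio's actual code: the token names come from the lists above, but every hex value is a placeholder, the binding of --color-panel-bg to --color-surface-card is assumed for the example, and resolveToken is a hypothetical helper.

```typescript
// Sketch only: an in-memory model of the three token layers.
// Token names are taken from the article; all hex values are placeholders.
type Theme = "light" | "dark";

// Primitive layer: raw values, never referenced by components directly.
const primitives: Partial<Record<string, string>> = {
  "--color-primitive-neutral-750": "#2c2c2c", // placeholder
  "--color-primitive-neutral-50": "#fafafa", // placeholder
  "--color-primitive-indigo-100": "#e0e7ff", // placeholder
  "--color-primitive-indigo-800": "#3730a3", // placeholder
};

// Semantic layer: each alias names one primitive per theme.
const semantic: Partial<Record<string, Record<Theme, string>>> = {
  "--color-text-primary": {
    light: "--color-primitive-neutral-750",
    dark: "--color-primitive-indigo-100",
  },
  "--color-surface-card": {
    light: "--color-primitive-neutral-50",
    dark: "--color-primitive-indigo-800",
  },
};

// Component layer: scoped tokens that alias a semantic token.
// This specific binding is an assumption made for the example.
const component: Partial<Record<string, string>> = {
  "--color-panel-bg": "--color-surface-card",
};

// Follow component -> semantic -> primitive -> raw value.
function resolveToken(name: string, theme: Theme): string {
  const semanticName = component[name] ?? name;
  const alias = semantic[semanticName];
  const primitiveName = alias ? alias[theme] : semanticName;
  const raw = primitives[primitiveName];
  if (raw === undefined) {
    throw new Error(`Unresolved token: ${name}`);
  }
  return raw;
}
```

With these placeholder values, resolveToken("--color-panel-bg", "dark") walks all three layers and returns "#3730a3", while the same call with "light" returns "#fafafa". No colour knowledge is involved, only lookups.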

Every token in code has a corresponding entry in the Figma Design Tokens file. The primitives are documented as raw swatches. The semantic aliases show which primitive they reference with an arrow notation (→ Neutral/750). This means an agent reading the Figma file via MCP gets the same information as an agent reading tokens.css — the mapping is explicit, not implicit.

Dark mode as proof

Dark mode isn't just a feature — it's the simplest proof that the token architecture works. If every component references semantic tokens, and the semantic layer swaps its primitive bindings under [data-theme="dark"], then dark mode is automatic. No component needs to know it's in dark mode.
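
To make the swap concrete, here is a hedged sketch that renders the semantic layer as two CSS blocks. It assumes light bindings live under :root and dark bindings under [data-theme="dark"]; the article confirms only the dark selector. SemanticBinding and renderThemeCss are hypothetical names.

```typescript
// Sketch only: emit the semantic layer as CSS custom properties.
// Assumption: light values sit on :root, dark values on [data-theme="dark"].
interface SemanticBinding {
  name: string; // semantic token, e.g. --color-text-primary
  light: string; // primitive token used in light mode
  dark: string; // primitive token used in dark mode
}

function renderThemeCss(bindings: SemanticBinding[]): string {
  const light = bindings.map((b) => `  ${b.name}: var(${b.light});`);
  const dark = bindings.map((b) => `  ${b.name}: var(${b.dark});`);
  return [":root {"]
    .concat(light, ["}", '[data-theme="dark"] {'], dark, ["}"])
    .join("\n");
}

// Bindings taken from the semantic examples in the article.
const themeCss = renderThemeCss([
  {
    name: "--color-text-primary",
    light: "--color-primitive-neutral-750",
    dark: "--color-primitive-indigo-100",
  },
  {
    name: "--color-surface-card",
    light: "--color-primitive-neutral-50",
    dark: "--color-primitive-indigo-800",
  },
]);
```

A component that declares color: var(--color-text-primary) never appears in this output, which is the point: only the bindings change between the two blocks.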

This was validated in Storybook 10, where a background toggle sets the data-theme attribute and every component responds through CSS custom properties. The agent was able to identify components that weren't responding (BentoCard labels and Icon names had hardcoded text colours) and fix them by adding var(--color-text-primary) references.
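
The kind of check that surfaces such components can be approximated with a simple scan. The sketch below is not the agent's actual method, which the article does not describe; it assumes one declaration per line and flags any color declaration that uses a literal value instead of a var() reference. The class names in the sample are hypothetical.

```typescript
// Sketch only: flag text colours that bypass the token layer.
// Assumption: stylesheets are formatted with one declaration per line.
interface HardcodedColour {
  line: number; // 1-based line number in the stylesheet
  declaration: string; // the offending declaration, trimmed
}

function findHardcodedTextColours(css: string): HardcodedColour[] {
  const findings: HardcodedColour[] = [];
  // Matches "color: #hex" or "color: rgb(...)" but not "background-color".
  const pattern = /^\s*color\s*:\s*(#[0-9a-fA-F]{3,8}|rgba?\([^)]*\))\s*;?\s*$/;
  css.split("\n").forEach((text, index) => {
    if (pattern.test(text)) {
      findings.push({ line: index + 1, declaration: text.trim() });
    }
  });
  return findings;
}

// Hypothetical component styles for illustration.
const sampleStyles = [
  ".bento-card__label {",
  "  color: #2c2c2c;",
  "}",
  ".icon__name {",
  "  color: var(--color-text-primary);",
  "  background-color: #ffffff;",
  "}",
].join("\n");

const hardcoded = findHardcodedTextColours(sampleStyles);
```

For the sample, the scan reports only line 2. The var() reference passes, and background-color is ignored because the pattern is anchored to the color property.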

Component descriptions

The Figma MCP reads component descriptions and passes them to the agent as context. This is the bridge between design and code. Each of the 23 components has a description that documents what it renders, which tokens it uses, which props it accepts, which states it has, and how it behaves on interaction.

These descriptions aren't written for humans browsing Figma — they're written for an agent that needs to decide which component to use, what tokens to reference, and how the component will behave. It's the difference between "A card component" and "Base card wrapper. Uses --color-surface-card background, --color-border-card border, --color-shadow-card-* elevation. Click spawns a ripple at --color-card-ripple. Accepts dark boolean prop for --color-surface-card-dark variant."
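
Because the descriptions name tokens verbatim, they can be checked mechanically. The following sketch pulls token references out of the example description and tests them against a list of defined tokens. extractTokenRefs and isDefinedToken are hypothetical helpers, and treating a trailing * as a prefix wildcard is an assumption about how --color-shadow-card-* should be read.

```typescript
// Sketch only: extract CSS custom property names from a description.
function extractTokenRefs(description: string): string[] {
  const pattern = /--[a-z0-9]+(?:-[a-z0-9]+)*(?:-\*)?/g;
  const matches: string[] = description.match(pattern) || [];
  // De-duplicate while preserving first-seen order.
  return matches.filter((token, i) => matches.indexOf(token) === i);
}

// Assumption: a trailing "*" means "any token with this prefix".
function isDefinedToken(ref: string, defined: string[]): boolean {
  if (ref.slice(-1) === "*") {
    const prefix = ref.slice(0, -1);
    return defined.some((name) => name.indexOf(prefix) === 0);
  }
  return defined.indexOf(ref) !== -1;
}

// The card description quoted in the article.
const cardDescription =
  "Base card wrapper. Uses --color-surface-card background, " +
  "--color-border-card border, --color-shadow-card-* elevation. " +
  "Click spawns a ripple at --color-card-ripple. " +
  "Accepts dark boolean prop for --color-surface-card-dark variant.";

const cardTokenRefs = extractTokenRefs(cardDescription);
```

Running this on the card description yields five references, in order: --color-surface-card, --color-border-card, --color-shadow-card-*, --color-card-ripple, and --color-surface-card-dark. Any reference that fails isDefinedToken is a description that has drifted from tokens.css.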

Agent workflows

With the system structured, I tested four autonomous agent workflows — tasks a human designer would normally do manually. Each one succeeded because the agent could read the token architecture, cross-reference Figma and code, and make decisions based on semantic naming.

Drift audit
  • Input: the agent scanned every CSS token against the Figma variables.
  • Finding: 35 semantic tokens in code were missing from Figma.
  • Outcome: synced; all tokens now exist in both files.
Token consolidation
  • Input: Text/Primary (#2c2c2c) and Text/Body (#181818), two near-black text tokens.
  • Finding: the agent identified that they served the same purpose.
  • Outcome: Text/Body aliased to Text/Primary in both code and Figma.
Unused token cleanup
  • Input: the agent grepped all 468 tokens against every component file.
  • Finding: 9 dead tokens: Surface/Bulb On, Surface/Ceiling Mount, Gradient Start/Mid/End, plus 4 overlay tokens.
  • Outcome: removed from tokens.css; a Figma cleanup checklist was generated.
Font mismatch
  • Input: the agent inspected the Data/Streak text style via Figma MCP.
  • Finding: the streak counter used Inter instead of Fredoka.
  • Outcome: corrected in Figma to match the code's --font-family-display.
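
At its core, the drift audit is a two-way set difference. The sketch below assumes a naming convention in which a Figma variable such as Text/Primary corresponds to the CSS token --color-text-primary; the real mapping is not spelled out in the article, so figmaNameToCssToken is a hypothetical stand-in, as is auditDrift.

```typescript
// Sketch only: compare token names in code with variable names in Figma.
// Assumed convention: "Text/Primary" in Figma <-> "--color-text-primary" in CSS.
function figmaNameToCssToken(figmaName: string): string {
  return "--color-" + figmaName.toLowerCase().replace(/[\/\s]+/g, "-");
}

interface DriftReport {
  missingInFigma: string[]; // defined in code, absent from Figma
  missingInCode: string[]; // defined in Figma, absent from code
}

function auditDrift(cssTokens: string[], figmaVariables: string[]): DriftReport {
  const fromFigma = figmaVariables.map(figmaNameToCssToken);
  return {
    missingInFigma: cssTokens.filter((t) => fromFigma.indexOf(t) === -1),
    missingInCode: fromFigma.filter((t) => cssTokens.indexOf(t) === -1),
  };
}

// Illustrative inputs, not the real token inventory.
const driftReport = auditDrift(
  ["--color-text-primary", "--color-surface-card", "--color-border-card"],
  ["Text/Primary", "Surface/Card", "Surface/Bulb On"]
);
```

For these inputs the report lists --color-border-card as missing in Figma and --color-surface-bulb-on as missing in code. An empty report in both directions is what "synced" means in practice.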

The critical insight from these workflows: the agent wasn't following a script. From a single prompt asking it to "check if code and Figma are in sync", it decided to grep every token against every component, identified which ones were unused, cross-referenced the Figma file, and produced a Figma cleanup checklist. The system's structure gave the agent enough context to make judgment calls.
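
The grep step itself is easy to approximate. This sketch treats a token as used if any stylesheet, including tokens.css itself, references it through var(). It is a simplification under stated assumptions: it expects no whitespace inside var(), and a token referenced only by another unused token still counts as used. The function names, the --color-surface-bulb-on token name, and its hex value are hypothetical.

```typescript
// Sketch only: list custom properties declared in a tokens stylesheet.
function declaredTokens(tokensCss: string): string[] {
  const matches: string[] = tokensCss.match(/--[a-zA-Z0-9-]+(?=\s*:)/g) || [];
  return matches.filter((name, i) => matches.indexOf(name) === i);
}

// A token counts as used if some source contains var(--name) or var(--name, ...).
function findUnusedTokens(tokensCss: string, componentSources: string[]): string[] {
  const haystack = [tokensCss].concat(componentSources).join("\n");
  return declaredTokens(tokensCss).filter(
    (name) =>
      haystack.indexOf("var(" + name + ")") === -1 &&
      haystack.indexOf("var(" + name + ",") === -1
  );
}

// Illustrative inputs; the hex values are placeholders.
const sampleTokensCss = [
  ":root {",
  "  --color-primitive-neutral-750: #2c2c2c;",
  "  --color-text-primary: var(--color-primitive-neutral-750);",
  "  --color-surface-bulb-on: #ffe066;",
  "}",
].join("\n");

const unusedTokens = findUnusedTokens(sampleTokensCss, [
  ".card { color: var(--color-text-primary); }",
]);
```

Here only --color-surface-bulb-on is reported. The primitive survives because the semantic layer references it, and the semantic token survives because a component does.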

Figma as source of truth

The system uses two canonical Figma files. The Design Tokens file holds every variable, text style, and colour swatch — organised into Primitive and Semantic sections that mirror the CSS custom property structure. The Design System file holds all 23 components organised in Figma sections, each with descriptions the MCP can read.

This separation matters for agents. When the agent needs to audit token coverage, it reads the Tokens file. When it needs to understand a component's API, it reads the Design System file. The agent knows which file to query because the architecture is explicit — not a single monolithic file where everything is mixed together.

Results

  • 468 — tokens synced between code and Figma
  • 23 — components with machine-readable descriptions
  • 9 — unused tokens found and removed by agent
  • 4 — autonomous agent workflows validated

What I learned

The biggest lesson: quality becomes measurable. When every token has a semantic name, every component has a description, and every Figma variable maps to a CSS custom property, an agent can audit the entire system and tell you exactly where the gaps are. "Design system health" stops being a feeling and starts being a number.
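
One way to turn that into a number is a plain coverage percentage per dimension. The sketch below is illustrative only: the metric, the CoverageInput shape, and the sample counts are assumptions, not figures measured on this system.

```typescript
// Sketch only: express coverage as a percentage with one decimal place.
interface CoverageInput {
  total: number; // items that should be covered, e.g. tokens in code
  covered: number; // items that are covered, e.g. tokens also in Figma
}

function coveragePercent(input: CoverageInput): number {
  if (input.total === 0) {
    return 100; // nothing to cover counts as fully covered
  }
  return Math.round((input.covered / input.total) * 1000) / 10;
}

// Hypothetical counts for illustration.
const systemHealth = {
  tokenSync: coveragePercent({ total: 50, covered: 45 }),
  componentDescriptions: coveragePercent({ total: 10, covered: 10 }),
};
```

With these made-up counts, tokenSync is 90 and componentDescriptions is 100. Tracking the two figures separately keeps a gap in one dimension from hiding behind a healthy score in the other.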

The limitation I hit was Code Connect — Figma's official mapping between components and code files requires an Organisation plan. But the component descriptions effectively serve the same purpose for an agent: they document the file path, props, tokens, and behaviour. The system works without the enterprise tooling.

The risk Vallaure warns about is real: fast generic systems produce forgettable output. An agent assembling from a poorly crafted design system will produce bland interfaces. But an agent operating within a system that has visual intentionality — deliberate colour choices, considered typography scales, opinionated spacing — produces output that looks designed. The craft isn't in the assembly. It's in the vocabulary the agent assembles from.

The design system is no longer just documentation for developers. It's instructions for a machine. And the designer's job is to make those instructions worth following.

Written by Alex Chiu, Senior Product Designer in London. Contact: alex@mchiu.co.uk.