How a Brutal Designer Review Made FlowPilot's Landing Page Redesign Easy

With help from Claude Design and Claude Code, and two months of foundation work that made it possible.

Magritte-inspired oil painting: a faceless French designer in a striped shirt and beret points a judgmental gloved finger at a glowing laptop, while a hooded engineer next to him cries Matrix-code green tears onto the keyboard. A red apple floats in the window between them, like Magritte's Son of Man.

"C'est de la merde ! Qu'est-ce que tu veux que je te dise, c'est de la merde !"

("It's shit. What do you want me to say, it's shit.")

That's a close friend, a stellar web app designer whose judgment I trust on web app design more than my own, looking at the FlowPilot UI I had proudly shown him as "ready to launch." He didn't bury the lede.

We bit the bullet. Postponed the launch. Spent the next two months rebuilding the whole UI on top of a real design system: 40 atomic components, 57 UI test pages, dark mode, design tokens, complete page rewrites. Date pickers from scratch with complex logic. Editable tabs. Flow Production Tracking-specific filters and entity refs. Chat components. Log pages. All of it.

Months later, I used Claude Design and Claude Code to redesign our landing page over a weekend. The two months of challenging, often tedious foundation work were the only reason it was possible.

This is a post about the math of compound interest on infrastructure.

I'm not a designer

I'm a software engineer who came up in VFX. People in VFX tend to like adjacent things (movies, photography, comics, design), even on the engineering side. So I always cared about how the UIs I shipped looked. I had opinions about typography. I noticed when something was off.

What I didn't have was a designer's eagle eye. The thing that catches the small detail you've stared past a hundred times because you wrote it. The thing that holds a mental map of every component on every page and notices the second one of them drifts. The thing that designs a Figma component with the right props on it so the code system that grows from it has solid bones.

Without that eye, FlowPilot's UI in late 2025 was, technically, working. Every page shipped what it needed to ship. But each page had been built fast and somewhat independently. Spacing was vibes. Buttons looked similar but not identical. Modal weights diverged. We were not a design system. We were a collection of small accidents.

The brutal review

A Matisse-style painting of a Frankenstein monster made of mismatched UI components: dropdown-menu head, neon-green Submit-button arm, slider-bar legs in comic sans, rampaging through a bright minimalist design studio.
Every page, in late 2025.

Before launch, I asked the friend mentioned above to take a pass. I expected pointed feedback on a few rough edges. I got the opening line of this post instead.

When the smoke cleared and we walked through the actual product, the complaints were specific:

General coherence

The biggest sin. Spacing, gaps, font sizes, color usage, and button styles were inconsistent across pages. Each screen had been built in isolation, and it showed. Nothing was outright broken; everything was slightly off-key.

No tokens. No props. No system.

Hard-coded colors, magic-number spacing, button variants implemented by copy-paste. There was no emphasis="high|low|none", no color="primary|secondary|positive|negative", no semantic tokens. Each component was a snowflake, which meant any change had to be made everywhere. Death by a thousand divergences.
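To make the missing vocabulary concrete, here is a minimal sketch of what prop-driven styling can look like. The emphasis and color values come from the props named above; the `buttonStyle` function and the `--color-*` token names are hypothetical, not FlowPilot's actual code.

```typescript
// Prop vocabulary modeled on the emphasis/color values named in the text.
type Emphasis = "high" | "low" | "none";
type Color = "primary" | "secondary" | "positive" | "negative";

interface ButtonProps {
  emphasis?: Emphasis;
  color?: Color;
}

// Hypothetical helper: resolves props to styles that only reference
// semantic tokens (assumed CSS custom properties), never raw hex values.
function buttonStyle(
  { emphasis = "high", color = "primary" }: ButtonProps = {}
): Record<string, string> {
  const token = `var(--color-${color})`;
  switch (emphasis) {
    case "high":
      // Filled button: token as background, paired "on" token for text.
      return { background: token, color: `var(--color-on-${color})`, border: "none" };
    case "low":
      // Outlined button: token used for text and border.
      return { background: "transparent", color: token, border: `1px solid ${token}` };
    case "none":
      // Text-only button.
      return { background: "transparent", color: token, border: "none" };
  }
}
```

With something of this shape, changing what "negative" looks like is a single token edit instead of a hunt through copy-pasted variants.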

The color palette picker

A worked example we kept coming back to. The picker let you choose a chart palette by name, and you could only see the colors elsewhere on the page, after you had already made the choice. Backwards. The fix was obvious in retrospect: render the colors inside the select item, so you choose visually, not by typing or guessing. We rewrote it. Then we noticed five other components with the same flavor of bug.

The brutal review was not, fundamentally, about taste. It was about compounding inconsistency. Every divergent button, every off-by-4px spacing, every modal that didn't quite match the others: each cost almost nothing in isolation, and together they accumulated into a product that felt unserious.

I don't regret the mess that came first

A Matisse-style painting of a majestic suspension bridge made of glowing blue holographic code, soaring through a vibrant sky. The bridge misses the gleaming futuristic city it was supposed to connect to, and lands in a muddy swamp full of confused capybaras.
A perfectly engineered bridge to the wrong city.

Worth saying clearly, because the rest of the post sounds like a love letter to design systems: I do not regret iterating without one for the first year. The earliest stretch of a product is for finding out what the product is. You're validating functionality, UX flows, who actually uses what, what to delete. If you spend that stretch picking type ramps and tokenizing your spacing scale, you're solving the wrong problem.

Premature design systems have their own failure mode. You commit to patterns that end up wrong for the product you eventually become. You make changes feel expensive when they should feel cheap. You build infrastructure for a destination that turned out to be a different city.

Could a tool like Claude Design change this calculus, by letting a designer collaborate on the prototype from day one? Maybe. I don't think so. The opening stretch of a product wants chaos. You can clean it up later, after you know what "later" looks like.

But "later" did arrive. And when it did, the next two months were ours to spend.

Two months. One bullet. Bitten.

The friction was real. Postponing launch was painful: internally, externally, psychologically. We had a date. We let it go.

The work was longer than expected. It always is. The friend gave us the design system to anchor on; Claude Code did the bulk of the implementation. We added components one at a time, with UI test pages for each, with documentation, with dark mode from day one because we knew VFX people would expect it (ask any artist on a client call: they're using dark mode).

  • Duration: ~2 months, Dec 2025 to Feb 2026
  • 40 components: atomic, in src/lib/components/ui/
  • 57 UI test pages: one per component, plus variants
  • Result: launch postponed, but the foundation was real

The rewrites were aggressive. Among other things:

  • Date pickers from scratch, including a date-time picker, range pickers, relative date tokens ("last 7 days", "this quarter") that resolve at query time, and the long tail of edge cases nobody enjoys writing.
  • Editable tabs with reorder, rename in place, persistence, the full contract.
  • Flow Production Tracking-specific components: filter editor with the full Flow Production Tracking field semantics, status pills with display-name resolution from the Flow Production Tracking schema, entity reference chips with linked navigation.
  • Chat components for the AI assistant, with thinking indicators and tool-call rendering.
  • Log pages for automation execution, with structured filtering.
  • Every existing page rewritten on top of the new system. Dashboards, charts (now widgets), pages, settings, super-admin, all of it.
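The relative date tokens in the first bullet are worth a sketch, because the point is that the saved filter stores the token, not the dates. This is a minimal illustration under my own assumptions: the token names, the UTC convention, and the half-open range are all hypothetical, not the actual FlowPilot implementation.

```typescript
// Hypothetical token names; a real picker would support many more.
type RelativeToken = "last_7_days" | "this_quarter";

// Half-open range: start is inclusive, end is exclusive.
interface DateRange {
  start: Date;
  end: Date;
}

// Resolve a stored token against "now" at query time, so a saved filter
// keeps meaning "last 7 days" whenever it runs. Assumes UTC day boundaries.
function resolveRelative(token: RelativeToken, now: Date): DateRange {
  const y = now.getUTCFullYear();
  const m = now.getUTCMonth();
  const d = now.getUTCDate();
  switch (token) {
    case "last_7_days":
      // Seven calendar days ending with today. Date.UTC normalizes
      // out-of-range day values, so month boundaries need no special case.
      return {
        start: new Date(Date.UTC(y, m, d - 6)),
        end: new Date(Date.UTC(y, m, d + 1)),
      };
    case "this_quarter": {
      const quarterStart = Math.floor(m / 3) * 3;
      return {
        start: new Date(Date.UTC(y, quarterStart, 1)),
        end: new Date(Date.UTC(y, quarterStart + 3, 1)),
      };
    }
  }
}
```

Passing `now` in as a parameter, rather than reading the clock inside the function, is what makes the edge cases cheap to unit test.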

Months later

Spring 2026. The design system has had time to settle. We just shipped a big refactor renaming charts to widgets and promoting data sources to first-class entities. The landing page is a little stranded: the copy talks about charts and recipes, the screenshots are from the old builder, the YouTube demo is outdated. The whole thing needs a refresh.

The pre-redesign FlowPilot landing page in light mode: large central FlowPilot wordmark, plain hero, Blog and theme toggle floating in the top-right, Scroll to explore indicator at the bottom. The same pre-redesign landing page in dark mode.
Before. The landing as of mid-March 2026. The component library underneath had been rebuilt over the previous two months, but the landing itself was unchanged from launch.

Claude Design launches. It can connect to a GitHub repo, import the design system already in there, and let you prototype against it in HTML/CSS/JS. The premise is exactly what I want.

The designer hat

A Matisse-style painting of a smug robotic assistant proudly holding out a beautifully wrapped gift box that is on fire and leaking neon green slime. The robot is unbothered, beaming.
"Here's your landing page!"

I connected the FlowPilot repo. Claude Design pulled in the design tokens, the logos, the components. It took maybe ten minutes. The interface for working on prototypes is genuinely interesting: you can iterate visually, generate variations, drop components onto canvas, and ask for animations. For someone who isn't a designer but has opinions, this is leverage.

Claude will invent things

The first attempt was an aggressive miss. I asked for a redesigned landing.

What came back was a slick dark-aurora landing with claims that weren't true, features we don't ship, and a multi-section navigation menu I had never built. Claude Design was, it turns out, very confident by default. It happily filled in anything it didn't know. So I pushed back:

"A lot of the claims are untrue. Stick to the copy that's already there. You can add things, but you can't invent them."
"We don't have a top nav. Just a Blog link and a theme toggle. You invented the rest."

The first draft Claude Design produced: a dark aurora landing with an invented top nav (Product, Solutions, Pricing, Customers, Resources), an invented '#1 AI Platform for Production' kicker, an invented 'Revolutionize your creative workflow' headline, and an invented 'Start 14-Day Free Trial' button.
The first draft. A top nav with sections we don't have, a #1 claim we never made, a 14-day trial that doesn't exist.

The lesson holds whatever AI design tool you use: point it at the source of truth, hard, before you ask for variations. The agent will read your real source code if you tell it to. It will not, by default, treat your real product as more authoritative than its priors.

Three variants in an afternoon

Once Claude Design had read the actual landing source and stopped inventing, we generated three full landing-page directions side by side:

  • Cinematic: dark, brand-forward, big animated logo, scroll-pinned showcases, a horizontally paginated product pager, a credits-roll final CTA.
  • Editorial: light, magazine-style, Instrument Serif display type, an asymmetric grid, numbered pull-quotes for features, a ledger-style pricing table.
  • Terminal: developer-tool dark, mono-heavy, a live-typing prompt that builds a chart, a tabbed-IDE feature showcase, a compare.csv-style pricing layout.

Cinematic variant: dark hero, large floating helmet logo, gradient headline 'in natural language', the AI for Flow Production Tracking kicker pill, two product cards.
Cinematic. Dark, brand-forward, animated.
Editorial variant: light cream background, oversized Instrument Serif headline 'Tracking production, in plain English', a 'The Dispatch' sidebar, a numbered pull-quote grid below.
Editorial. Newsprint typography, asymmetric grid.
Terminal variant: dark IDE-style chrome, monospace breadcrumb 'flowpilot / landing / home.tsx', a live terminal window typing 'flowpilot build-chart'.
Terminal. Developer-tool dark, mono-heavy, IDE chrome.

Three full directions, in an afternoon. This is the unlock. I would never have built three. I would have picked one based on a vibe and committed. With three sitting next to each other, the choice was obvious. We took Cinematic.

The counterintuitive lesson: AI design tools are at their best when you use them to reject directions, not commit to one.

The engineer hat

A Matisse-style painting of an exhausted solo developer in an oversized hoodie swinging a glowing wrench at a chaotic swarm of tiny gremlin-like CSS bugs in a server room.
The "I shipped this with AI" tax, paid by a solo operator.

Claude Design exports a handoff bundle: the prototype HTMLs, a tokens CSS file, a SKILL.md for the receiving agent, and a README.md with explicit "READ THIS FIRST, ASK BEFORE IMPLEMENTING" instructions. It's a thoughtful handoff format.

I handed the bundle to Claude Code and asked it to plan. It read the bundle, read our existing landing components, read app.css, and came back with an implementation plan organized as a "what to take, what to keep, what to drop" matrix. We argued through it. I rejected some things (their pricing redesign was almost identical to ours, no need to touch). I added others. Then we built.

The unglamorous wins

The seam between AI design output and AI code implementation is where most of the "I shipped this in a weekend with AI" posts gloss over the work. Some of what we actually had to do:

Different colors for the same intent

The prototype invented its own dark mode palette. Not bad, just not ours. We mapped every color back to the design system's existing semantic tokens, so dark mode on the new landing matches the rest of the product instead of looking like a visiting cousin.
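The mapping itself is mechanical once the table exists. A rough sketch, assuming the prototype uses six-digit hex colors; the hex values, the token names, and the `mapColors` helper are made up for illustration and are not what we actually ran.

```typescript
// Hypothetical lookup table from prototype hex values to semantic tokens.
const PROTOTYPE_TO_TOKEN: Record<string, string> = {
  "#0b0d12": "var(--color-surface)",
  "#e6e8ee": "var(--color-text)",
};

// Replace every known six-digit hex color with its token and report the
// ones with no mapping, so nothing slips through unnoticed.
function mapColors(
  css: string,
  table: Record<string, string>
): { css: string; unmapped: string[] } {
  const unmapped = new Set<string>();
  const mapped = css.replace(/#[0-9a-fA-F]{6}\b/g, (hex) => {
    const key = hex.toLowerCase();
    const token = table[key];
    if (token === undefined) {
      unmapped.add(key);
      return hex; // leave unknown colors untouched for manual review
    }
    return token;
  });
  return { css: mapped, unmapped: Array.from(unmapped) };
}
```

The `unmapped` list is the useful part: every entry is a color the prototype invented that needs a human decision about which semantic intent it was reaching for.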

Dark mode quietly broken

First dark mode pass was completely broken. Navbar light grey on a black background, illegible. The cause was a CSS quirk specific to our framework that the browser silently ignored. No error, no warning, just nothing applying. Five minutes of head scratching, two characters changed, dark mode came alive.

A blocked font load

The prototype wanted to load a custom monospace font from a third-party CDN. Our security policy doesn't allow that. We dropped the load entirely and let the browser fall back to the system monospace font, which looks almost identical for free.

The flickering logo

The server rendered the light-mode logo. The client, in dark mode, swapped to the dark-mode one. The framework rightly complained about the mismatch. Fix: render both logos and let CSS pick the visible one. No JavaScript, no flicker.

An animation that wouldn't run

The fade up on scroll animation worked for sections below the fold and not for the ones already on screen when the page loaded. The browser was collapsing the transition because the class change happened in the same frame as the element's first paint. Fix: defer the change by one frame. One line.
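The shape of that fix, as a minimal sketch. The scheduler is injected so the logic runs outside a browser; the `revealOnNextFrame` name and the `is-visible` class are assumptions for illustration, not our actual code.

```typescript
// Minimal structural types so the sketch does not depend on DOM typings.
interface ClassListLike {
  add(name: string): void;
}
interface ElementLike {
  classList: ClassListLike;
}
type FrameScheduler = (callback: () => void) => void;

// Add the "visible" class on a later frame instead of immediately, so the
// initial hidden state is painted first and the CSS transition has a
// starting point to animate from.
function revealOnNextFrame(
  el: ElementLike,
  schedule: FrameScheduler,
  className: string = "is-visible"
): void {
  schedule(() => el.classList.add(className));
}
```

In the browser, the scheduler would be a thin wrapper around `requestAnimationFrame`, for example `(cb) => requestAnimationFrame(() => cb())`.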

None of these are interesting individually. All of them are exactly the kind of friction that lives between "the prototype renders" and "the page is shippable." Every AI design tool that exists today produces output that carries some form of this tax. The solo operator pays it.

The component the design system couldn't give us

A Matisse-style painting of a towering steampunk machine held together by duct tape and tangled cables. In the center, a stressed sweat-drenched hamster runs on a wheel keeping the system from collapsing. A sticky note on the side reads 'DO NOT TOUCH'.
The first version of every interesting component.

The most ambitious piece of the new landing is a live AI demo. Real prompts, real Flow Production Tracking data, real chart rendering. It cycles through a handful of scenarios that cover the widget types: stacked bar, pie, gauge, metrics card. It's the centerpiece, and there was no atomic component that did this. We built it from scratch.

And here's where the engineer hat fully comes back on. It is not enough to vibe-code a component into shape. The component has to be testable, reviewable, performant, documented. AI assistance doesn't change the standard; it changes the throughput.

Before any code: prepare the data

Before we wrote a single line of the demo component, we had to make sure we had real, interesting data to demo with. The whole pitch of the component is "real FlowPilot, real Flow Production Tracking data, real widgets." That means the data has to actually exist on a real Flow Production Tracking instance, and it has to look good on screen.

We started with the existing integration tests for the AI assistant. They cover what kinds of widgets the assistant produces from common prompts, which gave us a vocabulary of "scenarios known to work." From there, a few small lab scripts probed our internal Big Buck Bunny test project on Flow Production Tracking to check what data we actually had: shot counts per status, version histories, task bid versus actuals. Some entities were sparse or unrealistic. We seeded what was missing.

From that, four scenarios that exercise the four widget types we care about:

  • Stacked bar: Shots by Sequence broken down by Status. Becomes a trellis after the second prompt (one chart per sequence).
  • Pie: Versions by Status, with a follow-up that filters out N/A and shows percentage labels.
  • Gauge: Bid utilization on Tasks (time logged versus total bid), with a follow-up that swaps the color preset and adds a sublabel.
  • Metrics card: Total Tasks, Time Logged, Total Bid as three KPIs, with a follow-up that switches the visual theme and label position.

Every scenario was verified end to end (prompt, tool call, widget config, real data, rendered output) before we built any UI around them.

The principle: visually simplified, semantically truthful

The component on the landing is not a literal recording of the AI assistant interaction. The real assistant emits structured tool calls, intermediate states, logs, and retries: useful in the product, distracting on a marketing page. So we made a deliberate trade: visually simplified, semantically truthful.

What that means concretely:

  • The widgets that render are the real widgets: same code your dashboards use, same chart engine, same gauge SVG, same metrics card.
  • The data is real, fetched from our internal Flow Production Tracking instance via the same data-source executor the product uses.
  • The prompts are real prompts that produce these results in the real assistant.
  • The widget configs the demo "applies" are exactly the configs the assistant actually generates when you run those prompts.
  • What's simplified: the chat shows only the human-facing exchange. No tool calls, no intermediate logs, no retry loops. The visual rhythm is tightened so a viewer who scrolls past in three seconds still gets the point.

This is the rule that justifies the existence of the demo: visually simplified, semantically truthful. If we ever drift into "the demo can do things the product can't," the demo is a lie. So far, it isn't.

One component, the full discipline

The first version of the demo component was about 700 lines that did everything in one place. It worked. It was also untestable. The patch logic, the data shaping, the chart rendering, and the fetch logic were all tangled together inside a single file.

Mid-build I told the agent the rule that runs in the rest of our codebase: components are for what's on screen. Anything that does real work belongs in a plain module with unit tests next to it. No exceptions for "small" helpers. No "I'll extract it later." Now.

The component dropped to about 300 lines of orchestration. The patch resolution, the data shaping for the four widget types, and the browser-side fetch wrappers each landed in their own small module, each with a focused purpose. The metrics extractor stopped using a regex hack and called the real formula parser the product already ships. Twenty unit tests cover the moving parts. The brain of the feature is now something you can argue with.
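To show what "a plain module with unit tests next to it" buys you, here is a deliberately tiny sketch of patch resolution as a pure function. The `WidgetConfig` shape, the shallow-merge semantics, and the field names are assumptions for illustration; the real module is more involved.

```typescript
// Hypothetical config shape: a flat bag of widget settings.
type WidgetConfig = Record<string, unknown>;

// Apply follow-up patches in order on top of a base config. Pure function:
// no fetches, no rendering, and the inputs are never mutated, which is
// what makes it trivial to unit test.
function applyPatches(base: WidgetConfig, patches: WidgetConfig[]): WidgetConfig {
  return patches.reduce<WidgetConfig>(
    (config, patch) => ({ ...config, ...patch }),
    { ...base }
  );
}

// Example: a gauge scenario whose follow-up prompt swaps the color preset
// and adds a sublabel (the values here are illustrative).
const gaugeBase: WidgetConfig = { type: "gauge", colorPreset: "default" };
const afterFollowUp = applyPatches(gaugeBase, [
  { colorPreset: "traffic-light", sublabel: "of total bid" },
]);
```

Nothing in it touches the DOM or the network, so a test is three lines and a failure points at exactly one function.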

Loading without blocking

The first version of the demo also loaded every scenario on the homepage's server-side render. Every visitor to the front page paid for the data fetch even if they never scrolled to the demo. We refactored.

Two small public API endpoints replaced the upfront load. Each scenario is fetched on demand, then cached server-side for a minute so a burst of visitors only triggers one round trip to Flow Production Tracking per scenario. On the browser side, the component kicks off all four scenario fetches in parallel the moment it mounts, in the background. By the time the user has watched the first scenario play out, the rest are already cached. The homepage no longer waits on anything.
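The caching idea can be sketched in a few lines. This is an in-memory illustration under my own assumptions (the `createTtlCache` and `prefetchAll` names, the scenario keys, and the choice to cache promises so concurrent requests share one round trip), not the code behind our endpoints.

```typescript
type Loader<T> = (key: string) => Promise<T>;

// Cache the promise per key for ttlMs. Concurrent callers within the
// window share a single upstream call. The clock is injectable for tests.
function createTtlCache<T>(
  load: Loader<T>,
  ttlMs: number,
  now: () => number = Date.now
): (key: string) => Promise<T> {
  const entries = new Map<string, { at: number; value: Promise<T> }>();
  return (key: string): Promise<T> => {
    const hit = entries.get(key);
    if (hit !== undefined && now() - hit.at < ttlMs) {
      return hit.value;
    }
    const value = load(key);
    entries.set(key, { at: now(), value });
    // Do not keep failures around: drop the entry so the next call retries.
    value.catch(() => {
      if (entries.get(key)?.value === value) {
        entries.delete(key);
      }
    });
    return value;
  };
}

// Client side: start every scenario fetch at once instead of one by one.
function prefetchAll<T>(keys: string[], get: (key: string) => Promise<T>): Promise<T>[] {
  return keys.map((key) => get(key));
}
```

With a 60,000 ms TTL, a burst of visitors asking for the same scenario produces one upstream call per minute, and the component can await whichever scenario the viewer reaches first while the others resolve in the background.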

After

The redesigned landing. Same brand. Same tokens. Same data. Different page.

The redesigned FlowPilot landing page in light mode: glow blobs, the AI for Flow Production Tracking kicker pill, gradient headline 'in natural language', a marquee of production verticals peeking above the fold. The same redesigned landing page in dark mode.
After. New navigation, new hero composition, new copy that talks about widgets and webhooks instead of charts and recipes, and a marquee of production verticals peeking above the fold.

See it in action

Below is the actual <AiDemoShowcase /> component from the new landing page, embedded directly in this blog post. It's pulling from the same lazy-fetch endpoints, hitting the same FlowPilot internal Flow Production Tracking instance, rendering the same widgets. Same code, same data.

[Embedded live demo: the AI Assistant at app.flowpilot.studio/widgets/new]

The component cycles through scenarios automatically. Click any tab to jump directly. Scenarios prefetch in the background, so tab switching is instant after the first one resolves.

The numbers

The redesign weekend

One developer. One PR. Two days of focused work, mostly Saturday and Sunday.

  • 29 files changed
  • +2,780 / −1,596 lines
  • 11 components touched or new (Hero, MarqueeBanner, LiveBuildSection, LandingNav, LandingFooter, etc.)
  • 4 new modules in src/lib/ai-demo/
  • 2 public API endpoints (lazy AI demo data)
  • 20 unit tests for the extracted pure logic, all passing

The two months that made it possible

For comparison, the design system rebuild that this whole post leans on:

  • 40 atomic components in src/lib/components/ui/
  • 57 UI test pages across categories
  • ~2 months of dedicated work, December 2025 through February 2026
  • One launch postponed

The redesign weekend used the design system as a constraint at every level. Every button is a Button with emphasis and color props. Every card uses the existing tokens. Every spacing value resolves to a --spacing-* variable. Without that backbone, Claude Design would have given me HTML soup and Claude Code would have produced CSS we'd be paying interest on for years.

What I'd tell a fellow engineer

Foundations compound. Skip them at your peril.

The two months of design system work felt like a tax at the time. Months later, that work is the only reason a redesign-with-AI weekend was even possible. Without tokens, atomic components, and component props, the same AI tools would have produced unshippable output. The foundation is what makes the AI useful.

AI design is for exploring. AI code is for implementing.

Claude Design's superpower is generating multiple full-fidelity directions fast. Use it to reject directions, not to commit to one. Claude Code's superpower is implementation against a real codebase. Use it for the work you'd otherwise do yourself.

You still wear both hats.

The handoff between Claude Design and Claude Code is not seamless. There is real engineering work in between: token mapping, CSP compliance, hydration correctness, accessibility, performance. AI tools amplify what an operator can do. They don't replace the operator's judgment.

Don't start with a design system.

The earliest stretch of a product is for finding out what the product is. Premature design systems lock in patterns that turn out wrong. Iterate chaotically first. Build the system after you know what you're building. Time it right and the cost is bounded; time it wrong and you pay forever.

"C'est de la merde" was the most expensive sentence I've ever heard. It cost two months of focused work and a delayed launch. It was worth every day.

Try FlowPilot

Build dashboards, automate workflows, and pipe live data into Google Sheets straight from plain English. The new landing page (and the demo above) is the product talking about itself, in real time, with real Flow Production Tracking data.