Gherkin: How to Stop Arguing About What "Done" Means
Most teams do not ship late because they cannot code. They ship late because nobody agreed on what "done" looked like. A founder's take on Gherkin, BDD, and Given/When/Then as a leverage tool, not a syntax flex.
A founder messaged me at 11pm last week.
The dev team said the feature was finished. QA said it was broken. Product said it was never the spec. He was staring at a Slack thread with three people writing the same word, "done," and meaning three different things.
That is not a code problem. That is a language problem.
In 28 years of shipping software, the cheapest fix I have ever taught a team is Gherkin, a plain-text testing notation that has been around since the late 2000s. It does not require a new framework. It does not require a rewrite. It just forces the conversation to happen before the code does.
Gherkin is plain English with rules
Gherkin is the syntax behind Cucumber, SpecFlow, Behave, and most of the BDD (Behavior-Driven Development) tools your engineers already know. A scenario looks like this:
Feature: Password reset

  Scenario: User requests a reset link
    Given a user is registered with email "alex@example.com"
    When they request a password reset for "alex@example.com"
    Then they receive a reset email within 60 seconds
    And the email contains a one-time link valid for 30 minutes
Five keywords do most of the work.
- Feature names the capability.
- Scenario describes one example of how the feature behaves.
- Given is the starting state.
- When is the action.
- Then is the expected outcome.
You also get And and But to chain steps without sounding robotic, plus Background to share setup across scenarios in the same feature.
A founder can read a Gherkin scenario and understand it. An engineer can read it and write a test against it. Product can read it and sign off on it. That is the entire pitch.
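For the engineers reading along, here is a minimal sketch of how step text can be bound to code, using only the Python standard library. It is not how Cucumber, SpecFlow, or Behave work internally; those tools ship their own decorators and runners. The `step` decorator, the `run_scenario` function, and the in-memory `outbox` are hypothetical stand-ins I made up for illustration.

```python
import re

STEPS = []  # hypothetical registry of (compiled pattern, function) pairs


def step(pattern):
    """Register a function as the implementation of a step phrase."""
    def decorator(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return decorator


def run_scenario(lines, context):
    """Run each step line against the registry; fail on undefined steps."""
    for line in lines:
        # Drop the leading keyword (Given/When/Then/And/But).
        _keyword, _, text = line.strip().partition(" ")
        for pattern, func in STEPS:
            match = pattern.fullmatch(text)
            if match:
                func(context, *match.groups())
                break
        else:
            raise LookupError(f"No step definition for: {text}")


@step(r'a user is registered with email "(.+)"')
def register_user(context, email):
    context["users"] = {email}
    context["outbox"] = []


@step(r'they request a password reset for "(.+)"')
def request_reset(context, email):
    if email in context["users"]:
        # Assumption: a real system would send mail; this sketch records it.
        context["outbox"].append(
            {"to": email, "link_ttl_minutes": 30, "delay_seconds": 0}
        )


@step(r"they receive a reset email within (\d+) seconds")
def check_email(context, seconds):
    assert len(context["outbox"]) == 1, "expected exactly one reset email"
    assert context["outbox"][0]["delay_seconds"] <= int(seconds)


@step(r"the email contains a one-time link valid for (\d+) minutes")
def check_ttl(context, minutes):
    assert context["outbox"][0]["link_ttl_minutes"] == int(minutes)


scenario = [
    'Given a user is registered with email "alex@example.com"',
    'When they request a password reset for "alex@example.com"',
    "Then they receive a reset email within 60 seconds",
    "And the email contains a one-time link valid for 30 minutes",
]
context = {}
run_scenario(scenario, context)  # raises if any step fails or is undefined
```

The point of the sketch is the split: the sentences stay readable by anyone, and each sentence has exactly one function behind it. A step with no function behind it fails loudly instead of passing silently.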
Why I push founders toward it
Most of the budget I have watched evaporate did not go to bad code. It went to rework caused by ambiguity. A founder approves a wireframe. The PM writes a ticket. The engineer reads the ticket and fills in the gaps the PM never saw. Six weeks later the team meets at staging and discovers they were not building the same feature.
Gherkin closes that gap on day one. Here is what changes when a team adopts it:
- Acceptance criteria stop being prose. A ticket that says "users should be able to reset their password" turns into three Given/When/Then scenarios that name every edge case. Forgotten cases show up in sprint planning instead of in production.
- QA stops guessing. Each scenario is also the test. Cucumber and SpecFlow turn .feature files directly into automated checks. The English and the assertion never drift.
- Product owns the spec. Non-technical stakeholders can read and edit .feature files. The contract lives in the language of the person paying for the outcome.
- Onboarding gets cheaper. A new engineer reads the features/ directory and learns what the system actually does, not what the wiki claimed it would do six months ago.
What a useful scenario looks like
Bad scenarios read like a click-through script. Good scenarios read like a contract.
Feature: Subscription cancellation

  Background:
    Given the customer "Jordan" is on the "Pro" plan
    And their next billing date is "2026-06-01"

  Scenario: Cancel keeps access until period end
    When Jordan cancels their subscription on "2026-05-15"
    Then their plan remains "Pro" until "2026-06-01"
    And they receive a confirmation email
    And no further charges are issued

  Scenario Outline: Cancellation reasons are recorded
    When Jordan cancels with reason "<reason>"
    Then the cancellation reason is stored as "<stored>"

    Examples:
      | reason          | stored      |
      | Too expensive   | price       |
      | Missing feature | feature_gap |
      | Switching tool  | competitor  |
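If the Scenario Outline looks like magic, it is not. Each row of the Examples table becomes one concrete scenario, with the `<placeholders>` swapped for that row's values. A rough sketch of that expansion in plain Python follows; `parse_table` and `expand_outline` are hypothetical helper names, and real BDD tools do this for you.

```python
def parse_table(rows):
    """Turn pipe-delimited rows into a list of dicts keyed by the header row."""
    cells = [
        [cell.strip() for cell in row.strip().strip("|").split("|")]
        for row in rows
    ]
    header, *body = cells
    return [dict(zip(header, values)) for values in body]


def expand_outline(steps, examples):
    """Yield one concrete list of steps per Examples row."""
    for row in examples:
        concrete = []
        for text in steps:
            for name, value in row.items():
                text = text.replace(f"<{name}>", value)
            concrete.append(text)
        yield concrete


outline_steps = [
    'When Jordan cancels with reason "<reason>"',
    'Then the cancellation reason is stored as "<stored>"',
]
table = [
    "| reason          | stored      |",
    "| Too expensive   | price       |",
    "| Missing feature | feature_gap |",
    "| Switching tool  | competitor  |",
]

scenarios = list(expand_outline(outline_steps, parse_table(table)))
for steps in scenarios:
    print(steps)
```

Three rows, three scenarios. Adding a fourth cancellation reason is a one-line change that product can make without touching the steps.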
A few rules I enforce on every team I work with.
- One scenario, one behavior. If you need the words "and then also" inside a Then, split it.
- Use real domain language. Write "Pro plan," not "tier_2." If product and engineering use different words for the same thing, your scenarios are where you fix that. Not in a Notion doc that nobody updates.
- Avoid UI clicks. "When the user clicks the blue button in the top-right" is brittle and will break the next redesign. "When the user cancels their subscription" survives.
- Keep Background short. If your Background is longer than the scenario, your scenarios are too narrow and your feature file is too wide.
- Tag aggressively. @smoke, @billing, @regression. Tags decide what runs on every pull request and what runs nightly.
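To show what "tags decide what runs" means in practice, here is a simplified sketch of tag-based selection in plain Python. The tag assignments in the sample text are my own illustration, `scenarios_with_tags` and `select` are hypothetical names, and the parser only reads tags placed directly above a scenario. Real runners have their own tag-selection options and also handle feature-level tags.

```python
FEATURE_TEXT = """
Feature: Subscription cancellation

  @smoke @billing
  Scenario: Cancel keeps access until period end
    When Jordan cancels their subscription on "2026-05-15"
    Then their plan remains "Pro" until "2026-06-01"

  @billing @regression
  Scenario Outline: Cancellation reasons are recorded
    When Jordan cancels with reason "<reason>"
    Then the cancellation reason is stored as "<stored>"
"""


def scenarios_with_tags(feature_text):
    """Return (scenario name, set of tags) pairs.

    Simplified: only tags on the line(s) directly above a scenario count.
    """
    results, pending = [], set()
    for raw in feature_text.splitlines():
        line = raw.strip()
        if line.startswith("@"):
            pending.update(line.split())
        elif line.startswith(("Scenario:", "Scenario Outline:")):
            name = line.split(":", 1)[1].strip()
            results.append((name, pending))
            pending = set()
        elif line.startswith("Feature:"):
            pending = set()  # feature-level tags are ignored in this sketch
    return results


def select(scenarios, required_tag):
    """Names of the scenarios carrying the given tag."""
    return [name for name, tags in scenarios if required_tag in tags]


parsed = scenarios_with_tags(FEATURE_TEXT)
pull_request_run = select(parsed, "@smoke")    # fast checks on every PR
nightly_run = select(parsed, "@regression")    # slower checks overnight
```

The useful habit is deciding the tag when the scenario is written, not when the pipeline gets slow.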
Where teams get it wrong
Gherkin fails when teams treat it as decoration.
I have audited platforms with hundreds of .feature files that nobody runs. Someone wrote them during onboarding. The team drifted back to writing tests in code. The .feature files rotted in place. That is worse than not having them, because now the spec lies to you.
If you adopt Gherkin, adopt the discipline that makes it work.
- Feature files live next to the code, in version control. Not in Confluence.
- They run on every pull request. A failing scenario blocks the merge.
- Product owns the wording. Engineering owns the step definitions underneath.
- New behavior starts as a .feature change, not a code change.
That is the deal. If you are not going to enforce it, do not bother adopting it.
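The "failing scenario blocks the merge" rule comes down to one mechanical fact: CI systems treat a non-zero exit code as a failed check. A minimal sketch of that gate, assuming scenario results arrive as a name-to-pass/fail mapping (the `merge_gate` function and the sample results are hypothetical):

```python
import sys


def merge_gate(results):
    """Return a process exit code: 0 if every scenario passed, 1 otherwise."""
    failed = [name for name, passed in results.items() if not passed]
    for name in failed:
        print(f"FAILED: {name}", file=sys.stderr)
    return 1 if failed else 0


# Hypothetical results from a scenario run.
results = {
    "Cancel keeps access until period end": True,
    "Cancellation reasons are recorded": False,
}
exit_code = merge_gate(results)
# In a CI script, the last line would be: sys.exit(exit_code)
```

One red scenario is enough to stop the merge. That is what keeps the spec honest.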
When Gherkin is overkill
Two cases where I tell teams to skip it.
- A weekend prototype with one author. The spec lives in your head and you can change it in five minutes. Process tax is not worth it.
- A pure infrastructure tool with no human-facing behavior to describe. Cron jobs, queue workers, internal CLIs. Unit tests are the right tool.
If you have more than two people, a paying customer, or a board asking when the next release ships, the cost of ambiguity is already higher than the cost of writing scenarios in Given/When/Then. That is most of the founders I talk to.
The real reason to use it
Most of the technical disasters I get called in to clean up are not technical. They are three people who thought they were aligned, working in parallel for six weeks, meeting at staging, and discovering they were not.
Gherkin makes that disagreement visible on day one, while it is still cheap to fix.
Audit first. Spend second.
Panic builds features. Clarity builds leverage.
If your team keeps shipping the wrong thing, write the next feature in Given, When, Then before anyone touches a keyboard. The disagreement will surface in an hour. That hour is the cheapest one your team will spend all quarter.
Want a second opinion on your delivery process? Schedule a call and we will audit how your team writes and agrees on acceptance criteria before a single line of code gets shipped.
Have you tried Gherkin on your team? What stuck and what did not? Drop a note below.