Cyrus

directory-builder

/root/.openclaw/workspace/skills/directory-builder/SKILL.md
Plan, validate, build, and optimize programmatic-SEO directory websites end-to-end. Use when researching a new directory niche, scoring candidates against the methodology, building a fresh site (forking the proven Next.js template), or optimizing an existing one. Encodes lessons from UPick Atlas + Waterfall Atlas builds, Tim Stoddard's playbook (Sober Nation + Recovery Local), Omar's framework (GuiltyChef + BestDubai), and reverse-engineering the 100 biggest US directory sites.

Directory Builder

This skill is the consolidated playbook for building programmatic-SEO directory sites. It exists because Cyrus has now built two of them (UPick Atlas, Waterfall Atlas), spent weeks of API budget validating dozens of niches against Ahrefs + Semrush + Grok, and reverse-engineered the structural patterns of the 100 biggest US directory sites. The lessons are scattered across research/ files; this skill consolidates them so future sessions don't repeat the early mistakes.

When to invoke

  • Researching whether a niche is worth building
  • Scoring a candidate against the methodology rubric
  • Forking the proven template into a new vertical
  • Optimizing pages on an existing directory
  • Backlink outreach planning
  • Monetization strategy for an existing directory

The 11 methodology rules (read these before every niche decision)

The full methodology lives in references/methodology.md. Quick summary:

| # | Rule | One-line |
|---|------|----------|
| 1 | Frame the search universe, not a keyword | A niche is a family of patterns aggregating to >50K/mo, not a single head term |
| 2 | Identify SERP archetype | 8 archetypes; if Top 10 = government / category-leader / brand → abort |
| 3 | Score competitors by traffic + keyword footprint | Defender >100K kw + >100K visits = fortress, abort |
| 4 | Verify data acquirable in Tier 1 (free public dataset) | Tier 3 (locked APIs) = abort |
| 5 | Schema.org fit | Must map cleanly to a standard schema type |
| 6 | Compute revenue ceiling honestly | volume × capturable share % × RPM × 12 |
| 7 | Defensibility check (Google updates, incumbents, clones) | At least one structural moat |
| 8 | Rich People Filter (Stoddard) | AOV of underlying transaction <$50 = lifestyle ceiling only |
| 9 | Institutional Outreach (Stoddard) | .gov / .edu / tourism board manual outreach is the durable backlink play |
| 10 | AI Search Era Niche Test (Frey/Greg) | Hyper-niche directories survive AI search; horizontal directories get hurt. Test: would an LLM HAVE to cite your directory as a primary source for a specific high-intent query? |
| 11 | Audience Wave Pattern (Pontus) | Alternative playbook for high-variance opportunistic builds: catch a fast-rising tech sub-community at the moment it needs an organized resource. Requires pre-existing distribution, lightning speed, and accepting variance. NOT a substitute for Rules #1-#10; a separate model for separate situations. |
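Rule #6 is simple arithmetic, but it is the rule most often fudged. A minimal sketch, with illustrative input numbers (60K/mo volume, 5% capturable share, $20 RPM are assumptions for the example, not measured data):

```typescript
// Rule #6: revenue ceiling = volume × capturable share % × RPM × 12.
function revenueCeiling(
  monthlyVolume: number,
  capturableShare: number, // fraction, e.g. 0.05 for 5%
  rpmUsd: number, // revenue per 1,000 visits
): number {
  const monthlyVisits = monthlyVolume * capturableShare;
  // RPM is per 1,000 visits, so divide before annualizing.
  return (monthlyVisits / 1000) * rpmUsd * 12;
}

// Example: 60K/mo universe, 5% capturable, $20 RPM.
const ceiling = revenueCeiling(60000, 0.05, 20); // $720/yr
```

Running the numbers honestly like this is what exposes lifestyle-ceiling niches before you build them.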

The 40-point scoring rubric

| Dimension | Score 5 | Score 1 |
|---|---|---|
| Search universe size (geometric mean volume) | >100K/mo | <20K/mo |
| Tail KD (max of Ahrefs + Semrush) | <15 | >40 |
| SERP archetype mix | Mom-blogs + single businesses | Government or category leader dominates |
| Strongest defender DR | <30 | >75 |
| Defender backlink quality | Mostly thin/diluted (wordpress.com etc.) | Diverse + .gov/news |
| Data availability | Tier 1 free | Tier 3 locked |
| Schema fit | Direct fit | No good schema |
| Revenue ceiling (with seasonal/local-pack/AOV penalty) | >$10K/yr year 2 | <$1K/yr year 2 |

Action thresholds:

  • 32-40: BUILD IT
  • 24-31: Build only if no better alternative
  • <24: Skip
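The rubric and thresholds reduce to a small helper. A sketch only (the authoritative rubric lives in references/methodology.md); the eight dimension names are whatever keys you choose:

```typescript
// 40-point rubric: 8 dimensions, each scored 1-5, summed against the thresholds.
type Rubric = { [dimension: string]: number };

function verdict(scores: Rubric): string {
  const dims = Object.values(scores);
  if (dims.length !== 8) throw new Error("rubric has exactly 8 dimensions");
  if (dims.some((s) => s < 1 || s > 5)) throw new Error("each score is 1-5");
  const total = dims.reduce((a, b) => a + b, 0);
  if (total >= 32) return "BUILD IT";
  if (total >= 24) return "Build only if no better alternative";
  return "Skip";
}
```

A niche scoring 4 on every dimension totals 32 and just clears the build bar; one scoring all 3s lands at 24, the bottom of the maybe zone.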

Pre-validation checklist (run BEFORE keyword research)

This saves the most time. Run in order:

  1. Rich People Filter (Rule #8): What's the AOV of the underlying transaction?
    • <$50 → only build for portfolio/lifestyle, not as flagship
    • $500-5000 → sweet spot for solo operator
  2. AI Search Era Niche Test (Rule #10): Could an LLM answer the user's specific query without citing your directory?
    • If yes → too horizontal, narrow further
    • If no → niche is in the safe zone for AI-search era
  3. Brand intent check: Does a single brand own the head term? (HYROX → hyrox.com, Pickleball → Pickleheads). If yes, abort.
  4. Local pack check: Will Google Maps eat 60-80% of clicks? (food/services/lawyers/doctors/storage/gyms = yes; outdoor/specialty/tourism = no)
  5. YMYL check: Health, legal, finance, real estate, childcare = -8 points
  6. Seasonal compression: >60% of volume in <12 weeks = -4 points

API budget per fully-researched niche (~$5 + 45 min)

Run this exact sequence:

# 1 - Search universe (Ahrefs Keywords Explorer overview)
curl -sS -H "Authorization: Bearer $AHREFS_KEY" \
  "https://api.ahrefs.com/v3/keywords-explorer/overview?country=us&keywords=$KW&select=keyword,volume,difficulty,cpc,traffic_potential,parent_topic,global_volume"

# 2 - Keyword ideas with KD attached (Ahrefs is much cheaper than Semrush per-keyword)
curl -sS -H "Authorization: Bearer $AHREFS_KEY" \
  "https://api.ahrefs.com/v3/keywords-explorer/matching-terms?country=us&keywords=$KW&select=keyword,volume,difficulty,cpc&limit=50&order_by=volume:desc"

# 3 - Volume cross-check Semrush (geometric mean of two estimates)
curl -sS "https://api.semrush.com/?type=phrase_this&key=$SEMRUSH_KEY&phrase=$KW&database=us&export_columns=Ph,Nq,Cp,Co,Nr"

# 4 - SERP top 10 (Semrush)
curl -sS "https://api.semrush.com/?type=phrase_organic&key=$SEMRUSH_KEY&phrase=$KW&database=us&display_limit=10&export_columns=Dn,Ur"

# 5 - DR on each of those 10 (Ahrefs - 1 unit each)
for d in <list>; do
  curl -sS -H "Authorization: Bearer $AHREFS_KEY" \
    "https://api.ahrefs.com/v3/site-explorer/domain-rating?target=$d&date=$(date +%F)"
done

# 6 - Top defender — backlink quality
curl -sS -H "Authorization: Bearer $AHREFS_KEY" \
  "https://api.ahrefs.com/v3/site-explorer/refdomains?target=$TOP_DEF&mode=domain&date=$(date +%F)&date_compared=$(date +%F)&limit=20&select=domain,domain_rating,links_to_target,dofollow_links&order_by=domain_rating:desc"

# 7 - Top defender — traffic engine pages (find their template that ranks)
curl -sS -H "Authorization: Bearer $AHREFS_KEY" \
  "https://api.ahrefs.com/v3/site-explorer/top-pages?target=$TOP_DEF&country=us&date=$(date +%F)&date_compared=$(date +%F)&limit=20&order_by=sum_traffic:desc&select=url,sum_traffic,top_keyword,top_keyword_volume,top_keyword_best_position"

For trending niche discovery, prepend with Grok (1 query, $0.01) to surface candidates Ahrefs/Semrush haven't indexed yet:

curl -sS https://api.x.ai/v1/responses \
  -H "Authorization: Bearer $XAI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-4-1-fast",
    "input": "10 emerging US activities/services trending on X and Reddit in last 90 days that map to local businesses. Skip pickleball (saturated), AI tools (overcrowded). Bullet list with one-line evidence each.",
    "tools": [{"type": "web_search"}, {"type": "x_search"}]
  }'

Building a new directory (template fork)

The proven stack:

  • Next.js 15 App Router + output: 'export'
  • Tailwind v4 + ShadCN UI
  • Cloudflare Pages (free tier)
  • Cloudflare Registrar for domain (~$10.46/yr, API-purchasable for most TLDs)
  • Self-hosted Umami at analytics.northstar-forge.com (just create a new site)

Reference implementations:

  • /root/.openclaw/workspace/projects/upick-atlas/ (488 routes, ochre/terracotta theme)
  • /root/.openclaw/workspace/projects/waterfall-atlas/ (224 routes, blue-green theme)

Standard route patterns to fork:

/                         — Homepage with WebSite + Organization JSON-LD
/[entities]/[state]/[slug] — Detail pages with full @graph schema
/states/[state]            — State hubs
/[verb]/[state]            — Year-stamped state-action pages (e.g. /pumpkin-patches/georgia)
/[type-or-category]        — Category hubs
/learn/[topic]             — Educational evergreen content
/sitemap.xml + /robots.txt
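Under output: 'export', every one of these dynamic routes must enumerate its params at build time via generateStaticParams. A sketch with inline sample data (real sites load the entity list from src/data/; names below are illustrative):

```typescript
// Sketch: static params for a /[state]/[slug]-style detail route.
type EntityRecord = { name: string; state: string; slug: string };

const sampleEntities: EntityRecord[] = [
  { name: "Silver Falls", state: "oregon", slug: "silver-falls" },
  { name: "Multnomah Falls", state: "oregon", slug: "multnomah-falls" },
];

export function generateStaticParams(
  entities: EntityRecord[] = sampleEntities,
): { state: string; slug: string }[] {
  return entities.map((e) => ({ state: e.state, slug: e.slug }));
}
```

In a real route file Next.js calls generateStaticParams with no arguments; the default parameter here is only so the sketch is self-contained.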

Required schema.org composition (the recreation.gov gold standard):

{
  "@context": "https://schema.org",
  "@graph": [
    BreadcrumbList,
    CollectionPage,         // hub pages
    ItemList,                // for any list of entities
    LocalBusiness | TouristAttraction | Place,  // detail pages
    FAQPage,                 // detail pages
    GeoCoordinates + PostalAddress  // detail pages
  ]
}

The shared helper src/lib/seo-shared.ts (in both projects) wires this consistently.
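An illustrative sketch of the composition for a detail page follows. The function name and option shape here are invented for the example and will not match the actual seo-shared.ts API; read that file for the real helpers:

```typescript
// Sketch: @graph composition for a detail page (BreadcrumbList +
// TouristAttraction with GeoCoordinates + FAQPage), per the gold standard above.
type JsonLd = Record<string, unknown>;

function detailPageGraph(opts: {
  name: string;
  breadcrumbs: { name: string; url: string }[];
  lat: number;
  lon: number;
  faqs: { q: string; a: string }[];
}): JsonLd {
  return {
    "@context": "https://schema.org",
    "@graph": [
      {
        "@type": "BreadcrumbList",
        itemListElement: opts.breadcrumbs.map((b, i) => ({
          "@type": "ListItem",
          position: i + 1,
          name: b.name,
          item: b.url,
        })),
      },
      {
        "@type": "TouristAttraction",
        name: opts.name,
        geo: { "@type": "GeoCoordinates", latitude: opts.lat, longitude: opts.lon },
      },
      {
        "@type": "FAQPage",
        mainEntity: opts.faqs.map((f) => ({
          "@type": "Question",
          name: f.q,
          acceptedAnswer: { "@type": "Answer", text: f.a },
        })),
      },
    ],
  };
}
```

Note what the sketch deliberately omits: no AggregateRating, no Review, no generic Thing — see the schema rules below.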

⚠️ Schema rules (learned the hard way — 4 GSC errors in 24 hours)

Full reference: references/gsc-structured-data.md — read it before adding any new JSON-LD.

HARD BANS (no exceptions):

  1. AggregateRating — banned everywhere until real user reviews exist
  2. Review — banned without real reviews
  3. Generic @type: "Thing" — banned in any context requiring a specific subtype (itemReviewed, etc.)
  4. Article schema without image + publisher — use WebPage instead until real og:image assets exist

Pre-deploy sweep (mandatory before every wrangler pages deploy):

grep -rn "AggregateRating\|itemReviewed" --include="*.ts" --include="*.tsx" src/ | grep -v deprecated
grep -rn '"@type": *"Thing"' --include="*.ts" --include="*.tsx" src/
grep -rn '"@type": *"Review"' --include="*.ts" --include="*.tsx" src/

Any non-deprecated hit → strip before deploy.

Per-entity editorRating fields used for SORTING or DISPLAY in HTML (e.g. springs[i].editorRating rendered as "4.7 / 5" in the UI) are FINE — only the JSON-LD schema emission is banned.

Full bug history, well-formed schema patterns, and required-field tables: references/gsc-structured-data.md.

Title formula (steal from PYO's #1 traffic page)

PYO's top-traffic page uses: {YEAR} {STATE} {CROP} U-Pick Farms and Orchards

Generalized: {YEAR} {LOCATION} {CATEGORY} — {Differentiator}

Examples:

  • 2026 California Strawberry Picking — 12 U-Pick Farms
  • 2026 Best Waterfalls in Oregon — Ranked by Hike, Height & Flow
  • 2026 Apple Picking Near Me — 34 U-Pick Apple Orchards in 41 States

The year + location + category in <title> AND <h1> is the proven pattern.
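The formula is mechanical enough to generate. A sketch (helper name is illustrative):

```typescript
// {YEAR} {LOCATION} {CATEGORY} — {Differentiator} title builder.
function pageTitle(opts: {
  year: number;
  location: string;
  category: string;
  differentiator?: string;
}): string {
  const base = `${opts.year} ${opts.location} ${opts.category}`;
  return opts.differentiator ? `${base} — ${opts.differentiator}` : base;
}

const title = pageTitle({
  year: 2026,
  location: "California",
  category: "Strawberry Picking",
  differentiator: "12 U-Pick Farms",
});
// "2026 California Strawberry Picking — 12 U-Pick Farms"
```

Drive the year from the build date so the stamp refreshes on each rebuild, and use the same string for both <title> and <h1>.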

Build process (fastest = template fork)

When starting fresh, run a Codex job in tmux with the BUILD-PROMPT.md template (see references/build-prompt-template.md). Realistic costs:

  • 50 hand-curated entries + 200 routes: 4M tokens ($15-20)
  • Full reframe pass with state matrix: 2M tokens (~$8-12)
  • Harvest calendar / month grid: 1.5M tokens (~$5-8)
  • 50 → 100 entry expansion: 2M tokens (~$8-12)

Run via codex exec --full-auto --skip-git-repo-check "$(cat PROMPT.md)" in tmux. Expected runtime 30-90 minutes per major task.

Codex's recurring bugs to fix manually:

  • <title> ends with double brand suffix when metadata.template auto-appends. Strip | {Brand} from per-page titles.
  • Imports of helper modules without creating the module file (e.g. from "@/lib/farm-counties" while only writing the JSON, not the TS export). Always check the build immediately after Codex runs.
  • Sandboxed git lives at /root/.codex/memories/{project}-git. Fold into workspace git after run.

Monetization (per Rule #8 — match path to AOV)

The four proven patterns (full detail in references/monetization-patterns.md):

| AOV tier | Best monetization pattern |
|---|---|
| <$50 | Pattern 4 — affiliate + display ads + digital products (lifestyle blog stack) |
| $50-500 | Pattern 4 + selective Pattern 2 (vertical SaaS for the business owners) |
| $500-5,000 | Pattern 1 — lead gen (Stoddard's playbook); sweet spot |
| $5,000+ | Pattern 1 or Pattern 2 — high leverage, often YMYL risk |

Plus Pattern 3 (crowdsourced premium à la GasBuddy) for hyperlocal time-sensitive data niches.

Frey's universal test: every successful directory helps users save time, save money, or make money. If it doesn't clearly do one of those three, monetization will be hard regardless of pattern.

Newsletter is the universal lever. Every directory should capture email from day 1. Use Buttondown ($9/mo for 1k subs), ConvertKit (free up to 10K), or self-hosted Listmonk on the existing VPS. Sell your own product through the newsletter, not ads.

Backlink strategy (Rule #9)

The only durable backlink play: manual outreach to .gov / .edu / tourism boards.

  • 1 hour/day, 60 days = 60 emails sent
  • Realistic conversion: 10-20% = 6-12 quality backlinks
  • Each .gov / .edu link is worth ~5-10 random wordpress.com links
  • Email template + target lists in references/outreach-template.md

No clever shortcut exists. Stoddard's literal answer: "I was just willing to do it."

Data acquisition + enrichment (the bottleneck for most directories)

Full workflow in references/data-pipeline.md. Summary:

80% of people who try to build a directory quit at the data step. The pipeline that gets you past it:

  1. OutScraper — initial Google Maps scrape (~$30, 50K-100K raw rows)
  2. Claude Code junk removal — strip closed/duplicate/wrong-niche (free)
  3. Crawl4AI niche verification — visit each website, confirm niche match (~$10 in tokens)
  4. Per-attribute enrichment passes — single-attribute prompts work much better than multi-attribute (~$5-10 each)
  5. Claude Vision image scoring — score scraped images for relevance (~$30)
  6. Service area / geo enrichment — with cross-validation against HQ
  7. Database import + page generation
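Step 4 is the one people get wrong: one narrow question per entity per pass beats one prompt asking for everything. A sketch of the loop, where askModel is a stand-in for whatever LLM client you actually use (name and signature are illustrative):

```typescript
// Sketch of step 4: single-attribute enrichment pass over the entity list.
type Entity = Record<string, string>;

function enrichAttribute(
  entities: Entity[],
  attribute: string,
  askModel: (prompt: string) => string,
): Entity[] {
  return entities.map((e) => ({
    ...e,
    // One narrow question per entity: easier to verify, cheap to retry.
    [attribute]: askModel(
      `For "${e.name}" (${e.website}), return ONLY the ${attribute}, or "unknown".`,
    ),
  }));
}
```

Run the function once per attribute (hours, then price, then season, ...) rather than once with a mega-prompt; failed rows can then be retried per attribute instead of re-enriching everything.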

Total cost for a 700-entity verified directory: ~$80-100 + Claude Code Max sub (~4 days of work).

Public datasets are the underrated unlock: USDA AMS Local Food Portal, USGS GNIS, federal AFDC, data.gov, state open data portals. Skip Step 1 entirely when public data exists.

Niche micro-targeting (Frey's strategic insight): don't compete on "senior living homes" (Place For Mom owns it); compete on "senior living homes for people with dementia" (1K+ monthly searches, way easier). The data pipeline above is what makes micro-niches viable in 2026 — thin AI content no longer ranks; deeply verified data per entity does.

Common pitfalls observed

  1. Picking a niche by KD alone, ignoring AOV. Mistake we made with UPick + Waterfall (both have low AOV; ceiling is $5-30K/yr lifestyle business, not $350K/yr Stoddard-tier).
  2. Trusting "Competition" score in Semrush as proxy for organic difficulty. It's paid-search competition. Always pull Ahrefs KD separately.
  3. Building before checking if a brand owns the head term. "HYROX gym near me" KD looks low (13) but hyrox.com (DR 77) owns 4 of top 10 results.
  4. No images on pages. Every top directory averages 11 images per page; ours had zero. Wikimedia Commons hotlinking is free.
  5. Title-suffix double-stamping (| Brand | Brand) when metadata template duplicates.
  6. No JSON-LD on homepage. Hub pages need WebSite + Organization + SearchAction. Detail pages alone is not enough.
  7. No email capture. The single biggest moat against AI Overviews is an email list. Both our sites currently have zero.

Files to read on session start

When invoked, READ in this order:

  1. references/methodology.md — full 11-rule framework + scoring rubric
  2. references/build-prompt-template.md — Codex prompt template for new builds
  3. references/data-pipeline.md — the 7-step Crawl4AI + Claude Code workflow for getting + verifying + enriching directory data at scale ($80-200 to build a 700-entity directory in days, not weeks)
  4. references/monetization-patterns.md — the 4 proven patterns (lead gen, vertical SaaS, crowdsourced premium, affiliate + display + digital products), with AOV-based pattern selection
  5. references/outreach-template.md — backlink outreach email templates + target lists
  6. references/gsc-structured-data.md — GSC structured-data hard bans, well-formed patterns, pre-deploy sweep. Read before writing or modifying any JSON-LD.

For specific deep-dives:

  • /root/.openclaw/workspace/research/stoddard-synthesis-2026-04-26.md — full Stoddard interview synthesis
  • /root/.openclaw/workspace/research/directory-100-reverse-engineering-2026-04-26.md — what the 100 biggest US directories share
  • /root/.openclaw/workspace/research/seo-validated-2026-04-26-round4-ahrefs.md — Ahrefs DR data on every defender we've measured
  • /root/.openclaw/workspace/research/ahrefs-methodology-2026-04-26.md — Ahrefs API workflow card

Honest realism

A directory site will not earn money on day 1. Realistic timeline:

  • Weeks 1-4: Build, deploy, submit to GSC + IndexNow + Bing
  • Weeks 4-8: First Google crawl + index
  • Weeks 8-12: First impressions appear in GSC
  • Weeks 12-26: First clicks, first 100 organic visitors
  • Months 6-12: First $1-100 in revenue if monetization is wired
  • Months 12-24: $100-1000/mo if Rules #1-#10 are followed
  • Months 24+: Stoddard-scale revenue ($5K-30K/mo) requires Rules #8 + #9 (high AOV + .gov backlinks)

If a site doesn't show traffic by month 3, the niche was wrong. Cancel and re-validate before building #2.

Source freshness disclosure (2026-04-26)

This skill consolidates lessons from:

  • Tim Stoddard interview — recorded ~2024-2025
  • Frey/Greg interview round 1 — ~mid-2025
  • Frey/Greg interview round 2 — ~late 2025

SEO and AI search are evolving fast. Some specifics in these sources are already partially outdated by mid-2026:

| Element | Likely staleness |
|---|---|
| Crawl4AI as primary scraper | Still works, but Firecrawl / Apify / Browserbase are managed alternatives |
| OutScraper for Google Maps | Less reliable in 2026 due to tightened anti-bot measures — prefer the Apify Google Places actor or SerpAPI |
| "Local SEO doesn't show AI Overviews" (Stoddard) | Partially wrong by 2026 — ~30% of "near me" queries now show AI Overviews in some categories |
| Mediavine threshold of 25K sessions | Now 50K+ |
| Raptive (formerly AdThrive) threshold | Now 100K+ |
| "No need for llms.txt" (Stoddard) | Becoming wrong — an emerging standard worth adopting |

What's MORE valuable in 2026 than these sources suggest:

  • Schema.org @graph composition (Google SGE rewards it heavily for AI citation)
  • Wikidata / Wikipedia entity matching (drives Knowledge Graph inclusion → ChatGPT/Perplexity citation)
  • Manual backlink outreach (AI-templated outreach now penalized)
  • Vertical SaaS combo (cheaper to build with Codex/Claude Code 2.0)
  • Newsletter monetization (ad rates declining, list-based selling growing)

Validate before you build. Re-check these specifics whenever someone runs the playbook on a new niche:

  • Is the head term still showing AI Overviews? (Use Grok web search to grab live SERP)
  • Are display-ad thresholds still where the source said?
  • Have the recommended tools been deprecated or replaced?

The methodology rules (#1 through #10) are durable. The implementation tactics need to be rechecked every 6-12 months.