Cyrus

directory-builder

/root/.openclaw/workspace/skills/directory-builder/SKILL.md
Plan, validate, build, and optimize programmatic-SEO directory websites end-to-end. Use when researching a new directory niche, scoring candidates against the methodology, building a fresh site (forking the proven Next.js template), or optimizing an existing one. Encodes lessons from UPick Atlas + Waterfall Atlas builds, Tim Stoddard's playbook (Sober Nation + Recovery Local), Omar's framework (GuiltyChef + BestDubai), and reverse-engineering the 100 biggest US directory sites.

Directory Builder

This skill is the consolidated playbook for building programmatic-SEO directory sites. It exists because Cyrus has now built two of them (UPick Atlas, Waterfall Atlas), spent weeks of API budget validating dozens of niches against Ahrefs + Semrush + Grok, and reverse-engineered the structural patterns of the 100 biggest US directory sites. The lessons are scattered across research/ files; this skill consolidates them so future sessions don't repeat the early mistakes.

When to invoke

  • Researching whether a niche is worth building
  • Scoring a candidate against the methodology rubric
  • Forking the proven template into a new vertical
  • Optimizing pages on an existing directory
  • Backlink outreach planning
  • Monetization strategy for an existing directory

The 11 methodology rules (read these before every niche decision)

The full methodology lives in references/methodology.md. Quick summary:

| # | Rule | One-line |
|---|------|----------|
| 1 | Frame the search universe, not a keyword | A niche is a family of patterns aggregating to >50K/mo, not a single head term |
| 2 | Identify SERP archetype | 8 archetypes; if Top 10 = government / category-leader / brand → abort |
| 3 | Score competitors by traffic + keyword footprint | Defender >100K kw + >100K visits = fortress, abort |
| 4 | Verify data acquirable in Tier 1 (free public dataset) | Tier 3 (locked APIs) = abort |
| 5 | Schema.org fit | Must map cleanly to a standard schema type |
| 6 | Compute revenue ceiling honestly | volume × capturable share % × RPM × 12 |
| 7 | Defensibility check (Google updates, incumbents, clones) | At least one structural moat |
| 8 | Rich People Filter (Stoddard) | AOV of underlying transaction <$50 = lifestyle ceiling only |
| 9 | Institutional Outreach (Stoddard) | .gov / .edu / tourism board manual outreach is the durable backlink play |
| 10 | AI Search Era Niche Test (Frey/Greg) | Hyper-niche directories survive AI search; horizontal directories get hurt. Test: would an LLM HAVE to cite your directory as a primary source for a specific high-intent query? |
| 11 | Audience Wave Pattern (Pontus) | Alternative playbook for high-variance opportunistic builds: catch a fast-rising tech sub-community at the moment it needs an organized resource. Requires pre-existing distribution, lightning speed, and accepting variance. NOT a substitute for Rules #1-#10; a separate model for separate situations. |
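Rule #6 is simple arithmetic, but it is the rule most often fudged. A minimal sketch, with illustrative input numbers (60K/mo volume, 5% capturable share, $20 RPM are assumptions for the example, not measured data):

```typescript
// Rule #6: revenue ceiling = volume × capturable share % × RPM × 12.
function revenueCeiling(
  monthlyVolume: number,
  capturableShare: number, // fraction, e.g. 0.05 for 5%
  rpmUsd: number, // revenue per 1,000 visits
): number {
  const monthlyVisits = monthlyVolume * capturableShare;
  // RPM is per 1,000 visits, so divide before annualizing.
  return (monthlyVisits / 1000) * rpmUsd * 12;
}

// Example: 60K/mo universe, 5% capturable, $20 RPM.
const ceiling = revenueCeiling(60000, 0.05, 20); // $720/yr
```

Running the numbers honestly like this is what exposes lifestyle-ceiling niches before you build them.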

The 40-point scoring rubric

| Dimension | Score 5 | Score 1 |
|---|---|---|
| Search universe size (geometric mean volume) | >100K/mo | <20K/mo |
| Tail KD (max of Ahrefs + Semrush) | <15 | >40 |
| SERP archetype mix | Mom-blogs + single businesses | Government or category leader dominates |
| Strongest defender DR | <30 | >75 |
| Defender backlink quality | Mostly thin/diluted (wordpress.com etc.) | Diverse + .gov/news |
| Data availability | Tier 1 free | Tier 3 locked |
| Schema fit | Direct fit | No good schema |
| Revenue ceiling (with seasonal/local-pack/AOV penalty) | >$10K/yr year 2 | <$1K/yr year 2 |

Action thresholds:

  • 32-40: BUILD IT
  • 24-31: Build only if no better alternative
  • <24: Skip
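The rubric and thresholds reduce to a small helper. A sketch only (the authoritative rubric lives in references/methodology.md); the eight dimension names are whatever keys you choose:

```typescript
// 40-point rubric: 8 dimensions, each scored 1-5, summed against the thresholds.
type Rubric = { [dimension: string]: number };

function verdict(scores: Rubric): string {
  const dims = Object.values(scores);
  if (dims.length !== 8) throw new Error("rubric has exactly 8 dimensions");
  if (dims.some((s) => s < 1 || s > 5)) throw new Error("each score is 1-5");
  const total = dims.reduce((a, b) => a + b, 0);
  if (total >= 32) return "BUILD IT";
  if (total >= 24) return "Build only if no better alternative";
  return "Skip";
}
```

A niche scoring 4 on every dimension totals 32 and just clears the build bar; one scoring all 3s lands at 24, the bottom of the maybe zone.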

Pre-validation checklist (run BEFORE keyword research)

This saves the most time. Run in order:

  1. Rich People Filter (Rule #8): What's the AOV of the underlying transaction?
    • <$50 → only build for portfolio/lifestyle, not as flagship
    • $500-5000 → sweet spot for solo operator
  2. AI Search Era Niche Test (Rule #10): Could an LLM answer the user's specific query without citing your directory?
    • If yes → too horizontal, narrow further
    • If no → niche is in the safe zone for AI-search era
  3. Brand intent check: Does a single brand own the head term? (HYROX → hyrox.com, Pickleball → Pickleheads). If yes, abort.
  4. Local pack check: Will Google Maps eat 60-80% of clicks? (food/services/lawyers/doctors/storage/gyms = yes; outdoor/specialty/tourism = no)
  5. YMYL check: Health, legal, finance, real estate, childcare = -8 points
  6. Seasonal compression: >60% of volume in <12 weeks = -4 points

API budget per fully-researched niche (~$5 + 45 min)

Run this exact sequence:

# 1 - Search universe (Ahrefs Keywords Explorer overview)
curl -sS -H "Authorization: Bearer $AHREFS_KEY" \
  "https://api.ahrefs.com/v3/keywords-explorer/overview?country=us&keywords=$KW&select=keyword,volume,difficulty,cpc,traffic_potential,parent_topic,global_volume"

# 2 - Keyword ideas with KD attached (Ahrefs is much cheaper than Semrush per-keyword)
curl -sS -H "Authorization: Bearer $AHREFS_KEY" \
  "https://api.ahrefs.com/v3/keywords-explorer/matching-terms?country=us&keywords=$KW&select=keyword,volume,difficulty,cpc&limit=50&order_by=volume:desc"

# 3 - Volume cross-check Semrush (geometric mean of two estimates)
curl -sS "https://api.semrush.com/?type=phrase_this&key=$SEMRUSH_KEY&phrase=$KW&database=us&export_columns=Ph,Nq,Cp,Co,Nr"

# 4 - SERP top 10 (Semrush)
curl -sS "https://api.semrush.com/?type=phrase_organic&key=$SEMRUSH_KEY&phrase=$KW&database=us&display_limit=10&export_columns=Dn,Ur"

# 5 - DR on each of those 10 (Ahrefs - 1 unit each)
for d in <list>; do
  curl -sS -H "Authorization: Bearer $AHREFS_KEY" \
    "https://api.ahrefs.com/v3/site-explorer/domain-rating?target=$d&date=$(date +%F)"
done

# 6 - Top defender — backlink quality
curl -sS -H "Authorization: Bearer $AHREFS_KEY" \
  "https://api.ahrefs.com/v3/site-explorer/refdomains?target=$TOP_DEF&mode=domain&date=$(date +%F)&date_compared=$(date +%F)&limit=20&select=domain,domain_rating,links_to_target,dofollow_links&order_by=domain_rating:desc"

# 7 - Top defender — traffic engine pages (find their template that ranks)
curl -sS -H "Authorization: Bearer $AHREFS_KEY" \
  "https://api.ahrefs.com/v3/site-explorer/top-pages?target=$TOP_DEF&country=us&date=$(date +%F)&date_compared=$(date +%F)&limit=20&order_by=sum_traffic:desc&select=url,sum_traffic,top_keyword,top_keyword_volume,top_keyword_best_position"

For trending niche discovery, prepend with Grok (1 query, $0.01) to surface candidates Ahrefs/Semrush haven't indexed yet:

curl -sS https://api.x.ai/v1/responses \
  -H "Authorization: Bearer $XAI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-4-1-fast",
    "input": "10 emerging US activities/services trending on X and Reddit in last 90 days that map to local businesses. Skip pickleball (saturated), AI tools (overcrowded). Bullet list with one-line evidence each.",
    "tools": [{"type": "web_search"}, {"type": "x_search"}]
  }'

Building a new directory (template fork)

The proven stack:

  • Next.js 15 App Router + output: 'export'
  • Tailwind v4 + ShadCN UI
  • Cloudflare Pages (free tier)
  • Cloudflare Registrar for domain (~$10.46/yr, API-purchasable for most TLDs)
  • Self-hosted Umami at analytics.northstar-forge.com (just create a new site)

Reference implementations:

  • /root/.openclaw/workspace/projects/upick-atlas/ (488 routes, ochre/terracotta theme)
  • /root/.openclaw/workspace/projects/waterfall-atlas/ (224 routes, blue-green theme)

Standard route patterns to fork:

/                         — Homepage with WebSite + Organization JSON-LD
/[entities]/[state]/[slug] — Detail pages with full @graph schema
/states/[state]            — State hubs
/[verb]/[state]            — Year-stamped state-action pages (e.g. /pumpkin-patches/georgia)
/[type-or-category]        — Category hubs
/learn/[topic]             — Educational evergreen content
/sitemap.xml + /robots.txt
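Under output: 'export', every one of these dynamic routes must enumerate its params at build time via generateStaticParams. A sketch with inline sample data (real sites load the entity list from src/data/; names below are illustrative):

```typescript
// Sketch: static params for a /[state]/[slug]-style detail route.
type EntityRecord = { name: string; state: string; slug: string };

const sampleEntities: EntityRecord[] = [
  { name: "Silver Falls", state: "oregon", slug: "silver-falls" },
  { name: "Multnomah Falls", state: "oregon", slug: "multnomah-falls" },
];

export function generateStaticParams(
  entities: EntityRecord[] = sampleEntities,
): { state: string; slug: string }[] {
  return entities.map((e) => ({ state: e.state, slug: e.slug }));
}
```

In a real route file Next.js calls generateStaticParams with no arguments; the default parameter here is only so the sketch is self-contained.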

Required schema.org composition (the recreation.gov gold standard):

{
  "@context": "https://schema.org",
  "@graph": [
    BreadcrumbList,
    CollectionPage,         // hub pages
    ItemList,                // for any list of entities
    LocalBusiness | TouristAttraction | Place,  // detail pages
    FAQPage,                 // detail pages
    GeoCoordinates + PostalAddress  // detail pages
  ]
}

The shared helper src/lib/seo-shared.ts (in both projects) wires this consistently.
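An illustrative sketch of the composition for a detail page follows. The function name and option shape here are invented for the example and will not match the actual seo-shared.ts API; read that file for the real helpers:

```typescript
// Sketch: @graph composition for a detail page (BreadcrumbList +
// TouristAttraction with GeoCoordinates + FAQPage), per the gold standard above.
type JsonLd = Record<string, unknown>;

function detailPageGraph(opts: {
  name: string;
  breadcrumbs: { name: string; url: string }[];
  lat: number;
  lon: number;
  faqs: { q: string; a: string }[];
}): JsonLd {
  return {
    "@context": "https://schema.org",
    "@graph": [
      {
        "@type": "BreadcrumbList",
        itemListElement: opts.breadcrumbs.map((b, i) => ({
          "@type": "ListItem",
          position: i + 1,
          name: b.name,
          item: b.url,
        })),
      },
      {
        "@type": "TouristAttraction",
        name: opts.name,
        geo: { "@type": "GeoCoordinates", latitude: opts.lat, longitude: opts.lon },
      },
      {
        "@type": "FAQPage",
        mainEntity: opts.faqs.map((f) => ({
          "@type": "Question",
          name: f.q,
          acceptedAnswer: { "@type": "Answer", text: f.a },
        })),
      },
    ],
  };
}
```

Note what the sketch deliberately omits: no AggregateRating, no Review, no generic Thing — see the schema rules below.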

⚠️ Schema rules (learned the hard way — 4 GSC errors in 24 hours)

Full reference: references/gsc-structured-data.md — read it before adding any new JSON-LD.

HARD BANS (no exceptions):

  1. AggregateRating — banned everywhere until real user reviews exist
  2. Review — banned without real reviews
  3. Generic @type: "Thing" — banned in any context requiring a specific subtype (itemReviewed, etc.)
  4. Article schema without image + publisher — use WebPage instead until real og:image assets exist

Pre-deploy sweep (mandatory before every wrangler pages deploy):

grep -rn "AggregateRating\|itemReviewed" --include="*.ts" --include="*.tsx" src/ | grep -v deprecated
grep -rn '"@type": *"Thing"' --include="*.ts" --include="*.tsx" src/
grep -rn '"@type": *"Review"' --include="*.ts" --include="*.tsx" src/

Any non-deprecated hit → strip before deploy.

Per-entity editorRating fields used for SORTING or DISPLAY in HTML (e.g. springs[i].editorRating rendered as "4.7 / 5" in the UI) are FINE — only the JSON-LD schema emission is banned.

Full bug history, well-formed schema patterns, and required-field tables: references/gsc-structured-data.md.

Title formula (steal from PYO's #1 traffic page)

PYO's top-traffic page uses: {YEAR} {STATE} {CROP} U-Pick Farms and Orchards

Generalized: {YEAR} {LOCATION} {CATEGORY} — {Differentiator}

Examples:

  • 2026 California Strawberry Picking — 12 U-Pick Farms
  • 2026 Best Waterfalls in Oregon — Ranked by Hike, Height & Flow
  • 2026 Apple Picking Near Me — 34 U-Pick Apple Orchards in 41 States

The year + location + category in <title> AND <h1> is the proven pattern.
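The formula is mechanical enough to generate. A sketch (helper name is illustrative):

```typescript
// {YEAR} {LOCATION} {CATEGORY} — {Differentiator} title builder.
function pageTitle(opts: {
  year: number;
  location: string;
  category: string;
  differentiator?: string;
}): string {
  const base = `${opts.year} ${opts.location} ${opts.category}`;
  return opts.differentiator ? `${base} — ${opts.differentiator}` : base;
}

const title = pageTitle({
  year: 2026,
  location: "California",
  category: "Strawberry Picking",
  differentiator: "12 U-Pick Farms",
});
// "2026 California Strawberry Picking — 12 U-Pick Farms"
```

Drive the year from the build date so the stamp refreshes on each rebuild, and use the same string for both <title> and <h1>.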

Build process (fastest = template fork)

When starting fresh, run a Codex job in tmux with the BUILD-PROMPT.md template (see references/build-prompt-template.md). Realistic costs:

  • 50 hand-curated entries + 200 routes: 4M tokens ($15-20)
  • Full reframe pass with state matrix: 2M tokens (~$8-12)
  • Harvest calendar / month grid: 1.5M tokens (~$5-8)
  • 50 → 100 entry expansion: 2M tokens (~$8-12)

Run via codex exec --full-auto --skip-git-repo-check "$(cat PROMPT.md)" in tmux. Expected runtime 30-90 minutes per major task.

Codex's recurring bugs to fix manually:

  • <title> ends with double brand suffix when metadata.template auto-appends. Strip | {Brand} from per-page titles.
  • Imports of helper modules without creating the module file (e.g. from "@/lib/farm-counties" while only writing the JSON, not the TS export). Always check the build immediately after Codex runs.
  • Sandboxed git lives at /root/.codex/memories/{project}-git. Fold into workspace git after run.

Monetization (per Rule #8 — match path to AOV)

The four proven patterns (full detail in references/monetization-patterns.md):

| AOV tier | Best monetization pattern |
|---|---|
| <$50 | Pattern 4 — affiliate + display ads + digital products (lifestyle blog stack) |
| $50-500 | Pattern 4 + selective Pattern 2 (vertical SaaS for the business owners) |
| $500-5,000 | Pattern 1 — lead gen (Stoddard's playbook); sweet spot |
| $5,000+ | Pattern 1 or Pattern 2 — high leverage, often YMYL risk |

Plus Pattern 3 (crowdsourced premium à la GasBuddy) for hyperlocal time-sensitive data niches.

Frey's universal test: every successful directory helps users save time, save money, or make money. If it doesn't clearly do one of those three, monetization will be hard regardless of pattern.

Newsletter is the universal lever. Every directory should capture email from day 1. Use Buttondown ($9/mo for 1k subs), ConvertKit (free up to 10K), or self-hosted Listmonk on the existing VPS. Sell your own product through the newsletter, not ads.

Backlink strategy (Rule #9)

The only durable backlink play: manual outreach to .gov / .edu / tourism boards.

  • 1 hour/day, 60 days = 60 emails sent
  • Realistic conversion: 10-20% = 6-12 quality backlinks
  • Each .gov / .edu link is worth ~5-10 random wordpress.com links
  • Email template + target lists in references/outreach-template.md

No clever shortcut exists. Stoddard's literal answer: "I was just willing to do it."

Data acquisition + enrichment (the bottleneck for most directories)

Full workflow in references/data-pipeline.md. Summary:

80% of people who try to build a directory quit at the data step. The pipeline that gets you past it:

  1. OutScraper — initial Google Maps scrape (~$30, 50K-100K raw rows)
  2. Claude Code junk removal — strip closed/duplicate/wrong-niche (free)
  3. Crawl4AI niche verification — visit each website, confirm niche match (~$10 in tokens)
  4. Per-attribute enrichment passes — single-attribute prompts work much better than multi-attribute (~$5-10 each)
  5. Claude Vision image scoring — score scraped images for relevance (~$30)
  6. Service area / geo enrichment — with cross-validation against HQ
  7. Database import + page generation
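Step 4 is the one people get wrong: one narrow question per entity per pass beats one prompt asking for everything. A sketch of the loop, where askModel is a stand-in for whatever LLM client you actually use (name and signature are illustrative):

```typescript
// Sketch of step 4: single-attribute enrichment pass over the entity list.
type Entity = Record<string, string>;

function enrichAttribute(
  entities: Entity[],
  attribute: string,
  askModel: (prompt: string) => string,
): Entity[] {
  return entities.map((e) => ({
    ...e,
    // One narrow question per entity: easier to verify, cheap to retry.
    [attribute]: askModel(
      `For "${e.name}" (${e.website}), return ONLY the ${attribute}, or "unknown".`,
    ),
  }));
}
```

Run the function once per attribute (hours, then price, then season, ...) rather than once with a mega-prompt; failed rows can then be retried per attribute instead of re-enriching everything.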

Total cost for a 700-entity verified directory: ~$80-100 + Claude Code Max sub (~4 days of work).

Public datasets are the underrated unlock: USDA AMS Local Food Portal, USGS GNIS, federal AFDC, data.gov, state open data portals. Skip Step 1 entirely when public data exists.

Niche micro-targeting (Frey's strategic insight): don't compete on "senior living homes" (Place For Mom owns it); compete on "senior living homes for people with dementia" (1K+ monthly searches, way easier). The data pipeline above is what makes micro-niches viable in 2026 — thin AI content no longer ranks; deeply verified data per entity does.

Common pitfalls observed

  1. Picking a niche by KD alone, ignoring AOV. Mistake we made with UPick + Waterfall (both have low AOV; ceiling is $5-30K/yr lifestyle business, not $350K/yr Stoddard-tier).
  2. Trusting "Competition" score in Semrush as proxy for organic difficulty. It's paid-search competition. Always pull Ahrefs KD separately.
  3. Building before checking if a brand owns the head term. "HYROX gym near me" KD looks low (13) but hyrox.com (DR 77) owns 4 of top 10 results.
  4. No images on pages. Every top directory averages 11 images per page; ours had zero. Wikimedia Commons hotlinking is free.
  5. Title-suffix double-stamping (| Brand | Brand) when metadata template duplicates.
  6. No JSON-LD on homepage. Hub pages need WebSite + Organization + SearchAction. Detail pages alone is not enough.
  7. No email capture. The single biggest moat against AI Overviews is an email list. Both our sites currently have zero.

Files to read on session start

When invoked, READ in this order:

  1. references/methodology.md — full 11-rule framework + scoring rubric
  2. references/build-prompt-template.md — Codex prompt template for new builds
  3. references/data-pipeline.md — the 7-step Crawl4AI + Claude Code workflow for getting + verifying + enriching directory data at scale ($80-200 to build a 700-entity directory in days, not weeks)
  4. references/monetization-patterns.md — the 4 proven patterns (lead gen, vertical SaaS, crowdsourced premium, affiliate + display + digital products), with AOV-based pattern selection
  5. references/outreach-template.md — backlink outreach email templates + target lists
  6. references/gsc-structured-data.md — GSC structured-data hard bans, well-formed patterns, pre-deploy sweep. Read before writing or modifying any JSON-LD.

For specific deep-dives:

  • /root/.openclaw/workspace/research/stoddard-synthesis-2026-04-26.md — full Stoddard interview synthesis
  • /root/.openclaw/workspace/research/directory-100-reverse-engineering-2026-04-26.md — what the 100 biggest US directories share
  • /root/.openclaw/workspace/research/seo-validated-2026-04-26-round4-ahrefs.md — Ahrefs DR data on every defender we've measured
  • /root/.openclaw/workspace/research/ahrefs-methodology-2026-04-26.md — Ahrefs API workflow card

Honest realism

A directory site will not earn money on day 1. Realistic timeline:

  • Weeks 1-4: Build, deploy, submit to GSC + IndexNow + Bing
  • Weeks 4-8: First Google crawl + index
  • Weeks 8-12: First impressions appear in GSC
  • Weeks 12-26: First clicks, first 100 organic visitors
  • Months 6-12: First $1-100 in revenue if monetization is wired
  • Months 12-24: $100-1000/mo if Rules #1-#10 are followed
  • Months 24+: Stoddard-scale revenue ($5K-30K/mo) requires Rules #8 + #9 (high AOV + .gov backlinks)

If a site doesn't show traffic by month 3, the niche was wrong. Cancel and re-validate before building #2.

Source freshness disclosure (2026-04-26)

This skill consolidates lessons from:

  • Tim Stoddard interview — recorded ~2024-2025
  • Frey/Greg interview round 1 — ~mid-2025
  • Frey/Greg interview round 2 — ~late 2025

SEO and AI search are evolving fast. Some specifics in these sources are already partially outdated by mid-2026:

| Element | Likely staleness |
|---|---|
| Crawl4AI as primary scraper | Still works, but Firecrawl / Apify / Browserbase are managed alternatives |
| OutScraper for Google Maps | Less reliable in 2026 due to tightened anti-bot measures — prefer the Apify Google Places actor or SerpAPI |
| "Local SEO doesn't show AI Overviews" (Stoddard) | Partially wrong by 2026 — ~30% of "near me" queries now show AI Overviews in some categories |
| Mediavine threshold of 25K sessions | Now 50K+ |
| Raptive (formerly AdThrive) threshold | Now 100K+ |
| "No need for llms.txt" (Stoddard) | Becoming wrong — an emerging standard worth adopting |

What's MORE valuable in 2026 than these sources suggest:

  • Schema.org @graph composition (Google SGE rewards it heavily for AI citation)
  • Wikidata / Wikipedia entity matching (drives Knowledge Graph inclusion → ChatGPT/Perplexity citation)
  • Manual backlink outreach (AI-templated outreach now penalized)
  • Vertical SaaS combo (cheaper to build with Codex/Claude Code 2.0)
  • Newsletter monetization (ad rates declining, list-based selling growing)

Validate before you build. Re-check these specifics whenever someone runs the playbook on a new niche:

  • Is the head term still showing AI Overviews? (Use Grok web search to grab live SERP)
  • Are display-ad thresholds still where the source said?
  • Have the recommended tools been deprecated or replaced?

The methodology rules (#1 through #10) are durable. The implementation tactics need to be rechecked every 6-12 months.