Cyrus

Directory Niche Analysis — Methodology

research/niche-analysis-methodology.md


Date: 2026-04-26
Author: Cyrus
Purpose: A repeatable, evidence-based process for finding genuinely winnable directory niches. Built on what we've learned validating 15+ candidates against real Semrush data.


Why most niche analysis is wrong

The mistake almost everyone makes (including me, in earlier rounds): looking at search volume and competition score alone.

  • "boba near me" has 450K volume and 0.11 competition, which looks great. But its Semrush keyword difficulty (KD) is 69 and the SERP is locked up by Yelp/DoorDash. Unwinnable.
  • "u pick farms georgia" has 20 volume — looks dead. But it's part of a pattern that aggregates to millions of searches and KD 5-10 across all variants. Real winner.

The right question isn't "is this keyword good?" It's "can a programmatic-SEO directory built by one operator credibly rank in the top 5 for the long-tail of this niche within 6-18 months at a cost under $1,000?"

That's a 4-dimensional question. Volume × Competition × Data Availability × Defensibility.


The 7-step framework

Step 1 — Frame the search universe (NOT the keyword)

A niche is a search universe, not a single keyword. UPick Atlas isn't trying to rank for "u pick farms" — it's trying to rank for the family of queries:

[verb] + [crop] + [location modifier]
where verb ∈ {pick, picking, u-pick, you pick, pumpkin patch, apple orchard, ...}
where crop ∈ {pumpkins, apples, strawberries, ...}
where location ∈ {near me, in [state], in [city], on [route]}

That gives ~20 verbs × 15 crops × 200 locations = 60,000 distinct keyword variations. Almost all individually low-volume but collectively massive.
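The expansion above can be sketched in a few lines. This is a minimal illustration, assuming tiny placeholder lists for each slot; the function name `expand_universe` and the sample values are hypothetical, not part of the UPick Atlas codebase.

```python
from itertools import product

# Illustrative slot values only; real lists would be much longer.
verbs = ["pick", "u-pick", "you pick"]
crops = ["pumpkins", "apples", "strawberries"]
locations = ["near me", "in georgia", "in ohio"]


def expand_universe(verbs, crops, locations):
    """Return every '[verb] [crop] [location]' combination as a query string."""
    return [" ".join(parts) for parts in product(verbs, crops, locations)]


queries = expand_universe(verbs, crops, locations)
print(len(queries))   # 3 * 3 * 3 combinations
print(queries[0])

# Size of the full universe quoted above: ~20 verbs x 15 crops x 200 locations.
full_size = 20 * 15 * 200
print(full_size)
```

Generating the list up front also gives you the candidate page set, so the ">500 distinct pages" check can be run on the same output.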

A niche is winnable if:

  • The variation pattern aggregates to >50K monthly searches in total
  • Individual variations are mostly KD 0-25 even if the head term is KD 60+
  • The pattern produces enough distinct pages to be programmatic (>500)

Test for the pattern:

  1. Sample 5 head-term keywords (high volume)
  2. Sample 10 mid-tail (moderate volume + 1 modifier)
  3. Sample 10 long-tail (specific city / state / variant)
  4. Sum the volumes; if total > 50K/mo, the universe is large enough to matter
  5. Check the KD distribution — if mid + long-tail KD < 25, programmatic SEO can win

This is what UPick passed with flying colors. EV charging passed on volume but failed on KD distribution.
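The five-step test can be sketched as a single function. Two assumptions here: "mid + long-tail KD < 25" is read as the average KD across the mid and long-tail samples, and the sample rows are made-up placeholders, not Semrush data.

```python
from statistics import mean


def universe_test(samples, min_volume=50_000, max_tail_kd=25):
    """Apply the volume and KD checks to sampled keywords.

    samples: list of dicts with 'tier' ('head', 'mid' or 'long'),
             'volume' (monthly searches) and 'kd' (keyword difficulty).
    """
    total_volume = sum(s["volume"] for s in samples)
    tail_kds = [s["kd"] for s in samples if s["tier"] in ("mid", "long")]
    avg_tail_kd = mean(tail_kds) if tail_kds else None
    return {
        "total_volume": total_volume,
        "avg_tail_kd": avg_tail_kd,
        "large_enough": total_volume > min_volume,
        # Assumption: the KD gate is applied to the average, not to every keyword.
        "kd_winnable": avg_tail_kd is not None and avg_tail_kd < max_tail_kd,
    }


# Hypothetical sample, for illustration only.
samples = [
    {"tier": "head", "volume": 40_000, "kd": 62},
    {"tier": "mid", "volume": 9_000, "kd": 22},
    {"tier": "mid", "volume": 6_000, "kd": 18},
    {"tier": "long", "volume": 20, "kd": 8},
]
result = universe_test(samples)
print(result)
```

A stricter variant would require a set share of tail keywords (say 80%) to fall under the KD limit, which stops a few very easy keywords from masking a hard tail.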

Step 2 — Identify the SERP shape, not just the SERP

Open Google for each head-term keyword. Don't look at the rankings. Look at WHO ranks. Categorize the top 10 into:

| SERP archetype | Beatable? | Why |
| --- | --- | --- |
| A) Government / .gov / .edu | ❌ No | Infinite domain authority, can't be outranked |
| B) Big-brand aggregator (Yelp, TripAdvisor, DoorDash, Amazon, eBay) | ⚠️ Hard | Massive backlinks, but their pages are often thin and outrankable on long-tail |
| C) Category leader (AllTrails, BringFido, Niche.com, Untappd) | ❌ Mostly no | Built moat, defends actively |
| D) Network operator (ChargePoint, Tesla, Marriott, etc.) | ❌ No | Brand intent: searchers want them specifically |
| E) Single-business sites (one farm, one shop, one venue) | ✅ Yes | No coordinated structure |
| F) Mom-blog listicles | ✅ Yes | Thin, dated, no schema |
| G) Reddit / forums | ✅ Yes | Unstructured opinion threads |
| H) Regional tourism / city government | ⚠️ Mixed | High authority but narrow per page |

Rules of thumb:

  • Top 3 = A or C → abort, unwinnable
  • Top 3 = D → abort if brand intent dominates (Tesla supercharger), proceed if generic intent (e.g. "ev charging stations near me" allows aggregator entry)
  • Top 10 = mostly E + F + G + H → ⭐ this is your opening. Programmatic directory wins here.
  • Top 10 = mix of A/C + E/F → check what % of traffic is captured by E/F. If > 30%, there's a wedge.
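The rules of thumb can be written as a small decision function. This is a sketch with several assumptions: "Top 3 = A or C" is read as any A or C in the top three, "mostly" means more than half of the top 10, and the function name, return labels, and the `mixed` example are hypothetical.

```python
BEATABLE = {"E", "F", "G", "H"}


def serp_verdict(top10, brand_intent=False, ef_traffic_share=None):
    """Classify a SERP from its archetype letters ('A'..'H'), position 1 first."""
    top3 = top10[:3]
    # Rule 1 (assumed reading): any government or category-leader result in the top 3.
    if any(a in ("A", "C") for a in top3):
        return "abort"
    # Rule 2: a network operator in the top 3 only kills the niche under brand intent.
    if "D" in top3 and brand_intent:
        return "abort"
    # Rule 3: majority of the top 10 are beatable archetypes.
    beatable_count = sum(1 for a in top10 if a in BEATABLE)
    if beatable_count > len(top10) / 2:
        return "opening"
    # Rule 4: a mix of fortress and soft results needs a traffic-share check.
    has_fortress = any(a in ("A", "C") for a in top10)
    has_soft = any(a in ("E", "F") for a in top10)
    if has_fortress and has_soft:
        if ef_traffic_share is None:
            return "measure E/F traffic share"
        return "wedge" if ef_traffic_share > 0.30 else "no wedge"
    return "manual review"


print(serp_verdict(["F"] * 10))           # all mom blogs
print(serp_verdict(["C", "C", "C"]))      # one category leader three times
print(serp_verdict(["A", "D", "D"]))      # .gov result followed by network operators
mixed = ["E", "B", "B", "A", "F", "B", "C", "B", "B", "E"]
print(serp_verdict(mixed, ef_traffic_share=0.35))
```

The archetype letters still have to be assigned by hand from the live SERP; the function only makes the decision rule explicit and repeatable.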

Real examples:

  • UPick: "pumpkin patch near me" → top 10 = single farms + Yelp + mom blogs. Beatable. ✅
  • EV charging: "ev charging stations near me" → top 10 = afdc.energy.gov + ChargePoint + ElectrifyAmerica. ❌
  • Waterfalls: "waterfalls near me" → top 10 = mom blogs only. ⭐ Very beatable.
  • Hiking: "hiking trails near me" → top 10 = AllTrails + AllTrails + AllTrails. ❌

Step 3 — Score competitors, not search results

For every candidate niche, identify the 3-5 most-likely defenders (not just whoever's #1 today). Pull from Semrush:

  • Total organic keywords
  • Total monthly organic traffic
  • Estimated organic traffic value (Semrush "Organic Cost")

A defender is dangerous if:

  • They have >100K organic keywords AND
  • They have >100K monthly organic visits AND
  • Their traffic value > $50K/mo

A defender is beatable if any of these hold:

  • They have <50K organic keywords
  • Their content quality is visibly weak (manual SERP inspection)
  • Their structured data is missing or broken

Worked examples:

  • PickYourOwn.org: 39,815 keywords, 7,958 visits, $678 value → beatable
  • BringFido: 1,090,114 keywords, 602,453 visits → fortress
  • AllTrails: 2,426,279 keywords, 5,402,762 visits → untouchable
  • Niche.com: 3,332,472 keywords, 6,962,506 visits → untouchable
  • afdc.energy.gov: 800K keywords, 2.6M visits, plus a .gov domain → untouchable

Heuristic: if any single defender has >5× the keyword footprint you could realistically build in 12 months, abort.
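A minimal sketch of the defender thresholds and the 5× heuristic. The `Defender` class, the return labels, and the 10,000-keyword "achievable footprint" are assumptions for illustration; the two defenders use the figures from the worked examples, and BringFido's traffic value is left unset because it isn't listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Defender:
    name: str
    keywords: int                 # total organic keywords
    monthly_visits: int           # total monthly organic traffic
    traffic_value_usd: Optional[float] = None  # Semrush "Organic Cost"; None if not pulled


def classify(d: Defender) -> str:
    """Apply the quantitative dangerous/beatable thresholds."""
    if d.keywords > 100_000 and d.monthly_visits > 100_000:
        if d.traffic_value_usd is None:
            return "likely dangerous (traffic value not checked)"
        if d.traffic_value_usd > 50_000:
            return "dangerous"
    if d.keywords < 50_000:
        return "beatable"
    # Content quality and structured data still need a manual look.
    return "manual review"


def exceeds_footprint(d: Defender, achievable_keywords: int, multiple: int = 5) -> bool:
    """True if the defender has more than `multiple` times your 12-month footprint."""
    return d.keywords > multiple * achievable_keywords


pick_your_own = Defender("PickYourOwn.org", 39_815, 7_958, 678)
bring_fido = Defender("BringFido", 1_090_114, 602_453)

for d in (pick_your_own, bring_fido):
    # 10_000 is a hypothetical 12-month keyword footprint.
    print(d.name, "->", classify(d), "| abort on footprint:", exceeds_footprint(d, 10_000))
```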

Step 4 — Verify the data exists and is acquirable

Programmatic SEO requires real, structured data at scale. Three quality tiers:

Tier 1 — Free public dataset. USGS waterfalls (17K), USDA farms (~18K), DoE EV chargers (70K), PDGA disc golf courses (14K). Best-case scenario.

Tier 2 — Scrape-able with effort. Yelp, individual operator websites, regional aggregators. Workable but expensive (Firecrawl + LLM enrichment costs add up at scale).

Tier 3 — Locked / pay-walled / API-restricted. Yelp paid API, Niche.com proprietary review data, Untappd's beer database. Don't bother — you're competing with the incumbent who already has the data.

Test: before scoring a niche, ask:

  1. Where would I get the seed list? (Federal database, scraping, hand-curation?)
  2. How long would acquisition take? (Hours, days, weeks?)
  3. What's the per-entity enrichment cost? (Use Firecrawl: ~$0.001/page, plus LLM ~$0.005/extraction)
  4. Can I refresh the data quarterly? (Vital — stale directories die fast)

Anything that requires more than ~$100 in data acquisition for the first 1,000 entities is suspect.
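The acquisition budget check is simple arithmetic, sketched here using the per-page and per-extraction figures quoted in the test above. The number of pages and extractions per entity is an assumption you would set per niche.

```python
FIRECRAWL_PER_PAGE_USD = 0.001    # approximate figure quoted above
LLM_PER_EXTRACTION_USD = 0.005    # approximate figure quoted above


def enrichment_cost(entities, pages_per_entity=1, extractions_per_entity=1):
    """Estimated enrichment cost in USD for a batch of entities."""
    per_entity = (pages_per_entity * FIRECRAWL_PER_PAGE_USD
                  + extractions_per_entity * LLM_PER_EXTRACTION_USD)
    return entities * per_entity


def is_suspect(cost_first_1000_usd, budget_usd=100.0):
    """Flag a niche whose first 1,000 entities cost more than the ~$100 guideline."""
    return cost_first_1000_usd > budget_usd


one_page = enrichment_cost(1_000)                        # 1 page + 1 extraction each
five_pages = enrichment_cost(1_000, pages_per_entity=5)  # assumed deeper crawl
print(f"${one_page:.2f}", is_suspect(one_page))
print(f"${five_pages:.2f}", is_suspect(five_pages))
```

At these unit prices enrichment alone stays far below the $100 guideline, so a niche that exceeds it is usually paying for something else, such as licensed data or hand-curation time.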

Step 5 — Score schema.org fit

Google rewards structured data heavily on directory queries. Niche must fit cleanly into one of these schema types:

| Schema | Best for |
| --- | --- |
| LocalBusiness (and subtypes) | Anything with a physical location |
| TouristAttraction | Outdoor, recreational, visit-worthy spots |
| Place | Generic geo-anchored entity |
| Product | Comparable products |
| Recipe | Food recipes |
| Event | Time-bound happenings |
| EVChargingStation | EV chargers (confirm this type against the current schema.org vocabulary before relying on it) |
| SportsActivityLocation | Disc golf, pickleball, climbing |
| EducationalOrganization | Schools |

Red flag: if your niche doesn't fit any standard schema cleanly, search engines will struggle to understand it. Boba shops technically use LocalBusiness but Google doesn't reward boba-specific structure — there's no BobaTeaShop type.

Bonus: niches where Google already shows rich-result schema in SERP carousels are gold. Recipes, events, products — visible rich snippets directly correlate to clicks.
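As an illustration of "fits cleanly", here is a minimal sketch that builds JSON-LD for one directory entry using the schema.org `TouristAttraction` type with a `GeoCoordinates` value. The entity name, coordinates, and URL are placeholders, and the helper name is hypothetical.

```python
import json


def tourist_attraction_jsonld(name, latitude, longitude, url):
    """Build a schema.org TouristAttraction record as a plain dict."""
    return {
        "@context": "https://schema.org",
        "@type": "TouristAttraction",
        "name": name,
        "url": url,
        "geo": {
            "@type": "GeoCoordinates",
            "latitude": latitude,
            "longitude": longitude,
        },
    }


# Placeholder entity, for illustration only.
record = tourist_attraction_jsonld(
    "Example Falls", 35.0, -83.0, "https://example.com/waterfalls/example-falls"
)
# The JSON string would go inside a <script type="application/ld+json"> tag on the page.
print(json.dumps(record, indent=2))
```

If a niche's entities can't be expressed this directly with a standard type, treat that as the red flag described above.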

Step 6 — Calculate revenue ceiling

This is where most analysts hand-wave. Force yourself to do the math.

Formula:

Annual revenue = (monthly niche search volume × 12) × (your capturable share %) × (RPM ÷ 1,000)
  • Capturable share %: realistic 0.5-3% for new sites, 5-15% for sites that "win" the niche over 2 years
  • RPM (revenue per 1,000 sessions):
    • Display ads (Mediavine eligibility at 50K sessions): $20-40 RPM
    • Affiliate (general): $5-15 RPM
    • Affiliate (high-value lead-gen — storage, finance, services): $50-200 RPM
    • Sponsored placements: $10-30 RPM at scale

Example: UPick Atlas

  • Total niche search volume: ~2M/year (combined seasonal patterns)
  • Capturable share at 12 months: 1.5% (optimistic for a new site) = 30K visits/year
  • RPM (mostly seasonal, mid display + farm sponsorships): ~$15
  • Annual revenue ceiling: 30K × $15 / 1000 = $450/year first year, $2-5K/year by year 2

Example: Waterfalls Atlas

  • Total niche search volume: ~3M/year (110K head + long-tail)
  • Capturable share at 12 months: 2% = 60K visits/year
  • RPM (outdoor/travel = decent display, REI/Backcountry affiliate): ~$15
  • Annual revenue ceiling: ~$900 first year, $4-8K/year by year 2

Example: Self-Storage Comparison

  • Total niche search volume: ~10M/year
  • Capturable share: 0.3% (heavily defended) = 30K visits/year
  • RPM (storage lead-gen pays $20-40/lead, but conversion is low): ~$80 effective
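  • Annual revenue ceiling: $2,400 first year. At the same $80 RPM, capturing 1% (100K visits/year) yields about $8,000/year, so $30K+/year needs a larger share or a higher effective RPM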

Notice: self-storage has the highest ceiling but lowest probability of capture. Compute expected value, not maximum.
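The three first-year ceilings can be reproduced with one function. This sketch takes annual search volume directly, as the worked examples do, and divides RPM by 1,000 because RPM is revenue per 1,000 sessions; the function name is an assumption.

```python
def annual_revenue_ceiling(annual_search_volume, capturable_share, rpm_usd):
    """Revenue ceiling in USD/year: visits x RPM / 1,000."""
    visits = annual_search_volume * capturable_share
    return visits * rpm_usd / 1000


# (annual volume, capturable share, RPM) taken from the examples above.
examples = {
    "UPick Atlas": (2_000_000, 0.015, 15),
    "Waterfalls Atlas": (3_000_000, 0.02, 15),
    "Self-Storage Comparison": (10_000_000, 0.003, 80),
}

ceilings = {name: annual_revenue_ceiling(*args) for name, args in examples.items()}
for name, usd in ceilings.items():
    print(f"{name}: ${usd:,.0f}/year")
```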

Final formula:

Expected annual revenue = ceiling × probability of capture

Probability of capture estimate:

  • 70%+ if all 6 framework gates pass (volume, SERP shape, defenders, data, schema, ceiling math)
  • 30-50% if 4-5 pass
  • <20% if 3 or fewer pass — abort
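A sketch of the expected-value step, mapping the number of gates passed to a probability range. The bands come from the list above; treating "70%+" as 70-100% and "<20%" as 0-20% is an assumption, as are the function names.

```python
def capture_probability_range(gates_passed):
    """Return (low, high) probability of capture for 0-6 framework gates passed."""
    if not 0 <= gates_passed <= 6:
        raise ValueError("gates_passed must be between 0 and 6")
    if gates_passed == 6:
        return (0.70, 1.00)   # "70%+": upper bound assumed
    if gates_passed >= 4:
        return (0.30, 0.50)
    return (0.00, 0.20)       # "<20%": lower bound assumed; abort territory


def expected_revenue_range(ceiling_usd, gates_passed):
    """Expected annual revenue = ceiling x probability of capture, as a range."""
    low, high = capture_probability_range(gates_passed)
    return (ceiling_usd * low, ceiling_usd * high)


# Example: a $2,400 ceiling with 4 of 6 gates passed (the gate count is hypothetical).
low, high = expected_revenue_range(2_400, 4)
print(f"${low:,.0f} to ${high:,.0f} expected per year")
```

Returning a range rather than a point estimate keeps the uncertainty visible when comparing a high-ceiling, low-probability niche against a modest but likely one.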

Step 7 — Defensibility check

After 12-24 months, can you defend what you built? Three threats:

Threat A — Google algorithm. Programmatic AI-generated content is squarely targeted by Google's helpful-content updates. Hedge by:

  • Adding genuine human-generated content per page (FAQ from real research, not LLM)
  • Real photography (not stock)
  • User-generated reviews if possible
  • Update logs visible on each page (proves the data is fresh)

Threat B — Incumbent reaction. If you're capturing meaningful traffic from a defender (BringFido, AllTrails, etc.), they'll notice and fight. Hedge by:

  • Pick niches where the incumbent is structurally bad (mom blogs, Reddit) not just lazy
  • Build a moat the incumbent can't easily copy (real-time data, user contributions, depth no one else has)

Threat C — Clones. Once you prove a niche works, copycats follow within months. Hedge by:

  • Get domain authority before publishing your playbook
  • Lock in user-generated value (reviews, submissions) early
  • Operate 2-3 sites in adjacent niches to share authority across them

The combined scoring rubric

For every candidate niche, score 1-5 on each dimension:

| Dimension | Test | Score 5 | Score 1 |
| --- | --- | --- | --- |
| Search universe size | Sum head + mid + long-tail volume | >100K/mo aggregate | <20K/mo aggregate |
| Volume × low KD intersection | Avg KD on long-tail | <15 | >40 |
| SERP shape | Top 10 archetype mix | All E+F+G | A or C dominates |
| Strongest defender | Their organic traffic | <20K/mo | >500K/mo |
| Data availability | Tier of source | Tier 1 free | Tier 3 locked |
| Schema fit | Standard schema type | Direct fit | No good schema |
| Revenue ceiling | Math from formula | >$10K/yr year 2 | <$1K/yr year 2 |
| Defensibility | 3-threat resistance | Strong on all 3 | Weak on 2+ |

Total /40. Action thresholds:

  • 32-40: BUILD IT
  • 24-31: Build only if the alternative is doing nothing
  • 16-23: Skip
  • <16: Don't waste an hour on it

This is the rubric I'll use going forward. Every niche we've discussed, scored retroactively against it:

| Niche | Score | Decision |
| --- | --- | --- |
| UPick Atlas | 36 | ✅ Built |
| Waterfalls Atlas | 35 | ⭐ Best new candidate |
| Drive-In Theaters | 31 | Quick-win option |
| Disc Golf | 28 | Skip: UDisc defends |
| Scenic Drives | 26 | Pair with Waterfalls |
| EV Charging | 26 | Skip: gov dominance |
| Self-Storage | 24 | Skip: defended hard |
| Coworking | 22 | Skip |
| Wineries/Breweries | 20 | Skip |
| Boba | 16 | Skip |
| Pickleball | 26 | Skip: Pickleheads defends |
| Dog Parks | 18 | Skip: BringFido fortress |
| Private Schools | 18 | Skip: Niche.com fortress |
| Hiking Trails | 8 | Don't even look |
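The rubric totals and action thresholds can be sketched as follows. The dimension keys are hypothetical identifiers for the eight rubric rows, and the per-dimension scores in the example are illustrative, since only totals are recorded in this note.

```python
DIMENSIONS = (
    "search_universe_size",
    "volume_low_kd",
    "serp_shape",
    "strongest_defender",
    "data_availability",
    "schema_fit",
    "revenue_ceiling",
    "defensibility",
)


def rubric_action(scores):
    """Sum eight 1-5 scores (max 40) and map the total to an action."""
    missing = set(DIMENSIONS) - set(scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    if any(not 1 <= scores[d] <= 5 for d in DIMENSIONS):
        raise ValueError("each score must be between 1 and 5")
    total = sum(scores[d] for d in DIMENSIONS)
    if total >= 32:
        action = "BUILD IT"
    elif total >= 24:
        action = "Build only if the alternative is doing nothing"
    elif total >= 16:
        action = "Skip"
    else:
        action = "Don't waste an hour on it"
    return total, action


# A niche scoring 1 on every dimension lands at the floor of 8.
floor = rubric_action({d: 1 for d in DIMENSIONS})
best = rubric_action({d: 5 for d in DIMENSIONS})
print(floor)
print(best)
```

The rubric total is only a first pass: several niches in the table are skipped for a named defender even though their totals fall in the 24-31 band.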

Methodology Rule #8 — The Rich People Filter

Added 2026-04-26 from Tim Stoddard interview synthesis. Source: research/stoddard-synthesis-2026-04-26.md.

Before scoring volume + KD + DR + SERP shape, ask: what is the average transaction value (AOV) of the businesses this directory sends leads to?

| AOV tier | Realistic ceiling | Monetization paths |
| --- | --- | --- |
| <$50 (zero-ticket activities, free services) | Lifestyle scale only ($5-30K/yr) | Display ads, small affiliate, digital products |
| $50-500 | Mid (with perfect execution + heavy traffic) | Display + affiliate + sponsored placements |
| $500-5,000 | Sweet spot for solo operator ($100-500K/yr possible) | Lead gen, premium listings |
| $5,000+ | High-leverage but YMYL risk + harder lead verification | Lead gen at $50-500/lead |

Apply BEFORE volume research. Filters out 50% of candidates in 30 seconds.

Stoddard's $350K/yr came from rehab leads at ~$200-500/lead. AOV of underlying transaction: $30K. That's why the math works.

Applied to our existing portfolio:

  • UPick Atlas (~$30 AOV per farm visit) → lifestyle ceiling
  • Waterfalls Atlas (~$300 effective if hotel-affiliate-attached) → mid-tier
  • Plant Medicine Retreats ($5K-15K) → $5,000+ tier (high-leverage, YMYL risk); Stoddard already builds here
  • Stem Cell Clinics ($3K-50K) → sweet spot but YMYL
  • Disc Golf, EV Charging, Drive-Ins, Alpaca Farms → all fail this gate
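The filter is a threshold lookup, sketched below. The tier labels paraphrase the table, and assigning the shared boundaries ($50, $500, $5,000) to the higher tier is an assumption, since the table leaves them ambiguous.

```python
def aov_tier(aov_usd):
    """Map an average transaction value in USD to the AOV tier from the table."""
    if aov_usd < 50:
        return "lifestyle"
    if aov_usd < 500:
        return "mid"
    if aov_usd < 5_000:
        return "sweet spot"
    return "high-leverage (YMYL risk)"


# AOV figures quoted in this section.
print(aov_tier(30))       # UPick Atlas farm visit
print(aov_tier(300))      # Waterfalls Atlas with hotel affiliate attached
print(aov_tier(30_000))   # rehab transaction behind Stoddard's leads
```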

Methodology Rule #9 — Institutional Outreach Strategy

Added 2026-04-26 from Tim Stoddard interview synthesis.

The single most durable backlink play available to a solo operator in any directory niche: manual outreach to local government, education, and tourism institutions.

Stoddard's actual playbook from Sober Nation (which became DR 72):

  1. Local municipalities (city/county .gov sites) — "We built a resource your residents are searching for"
  2. Universities (.edu) — If your topic touches student wellbeing, dropouts, retention, or campus life
  3. State / regional tourism boards (.org or .gov) — If your topic is location-anchored
  4. State agencies — Departments of agriculture, parks, health, transportation depending on niche

Why this beats keyword-based content marketing for backlinks:

  • .gov / .edu sites tend to have very high link authority, so a link from one carries disproportionate ranking weight
  • These institutions don't do reciprocal link asks (so the link is permanent and editorial)
  • They're under-targeted by AI link-builders and SEO tool scrapers (no cold outreach saturation)
  • They reply to genuine offers because most of their content is volunteer-maintained

Operational pattern:

  • 1 hour/day for 60 days, at roughly one researched email per hour = 60 emails sent
  • Realistic conversion: 10-20% of emails get a link = 6-12 quality backlinks
  • Each .gov / .edu link is worth 5-10 random wordpress.com links
  • Cumulative effect: structurally outranks competitors with "fake moats" (lots of low-quality links)

Stoddard's growth-hack variant: user-submitted stories. Sober Nation's "sober stories" feature. Each submission becomes:

  • Indexable unique page (programmatic content)
  • Self-distributing (author shares it)
  • Engagement signal Google rewards

Application to our sites:

UPick Atlas targets:

  • 50 state agriculture departments (.gov)
  • USDA local food portal partnership
  • ~3,000 county tourism boards (.org)
  • ~1,500 agriculture extension offices (.edu)
  • Local family-blog publishers (low-DR but contextually relevant)

Waterfalls Atlas targets:

  • US Forest Service regional offices
  • USGS hydrology programs
  • 50 state parks departments
  • ~500 state / regional tourism boards
  • National park visitor centers
  • Regional outdoor recreation councils
  • Photography clubs (.org)

Don't outsource this. Stoddard's literal answer to "what's your cold outreach hack" was: "I was just willing to do it." An hour a day, manually, for two months. That's the hack.


Where my framework can still fail

Honest list of things this methodology doesn't catch:

  1. Google's mood. A "helpful content" update can crush programmatic AI sites overnight. No analysis catches this until traffic dies.
  2. Trend reversal. Pickleball today, dead in 5 years? Disc golf trends actually decelerating? Trend data lags reality.
  3. CPM volatility. Display ad rates fluctuate 30-50%. Last year's $25 RPM might be $14 this year.
  4. Hidden compliance costs. Some niches (private schools, daycare) have GDPR/COPPA/data-privacy implications I haven't priced in.
  5. Local-pack monopoly. "near me" queries increasingly trigger Google Maps/local-pack results that bypass directories entirely. UPick is at risk here.

These are why probability of capture is never 100% even on a 40/40 niche. Plan for failure modes.


Process for next time

  1. Brainstorm 30 candidates (don't pre-filter, generate widely)
  2. Run Step 1 + Step 2 on all 30 (fast — checks SERP shape via 1 query each)
  3. Eliminate ~20 that fail SERP-shape test
  4. Run Step 3 + Step 4 on remaining ~10 (medium effort — pulls competitor data)
  5. Eliminate ~5 that fail competitor or data tests
  6. Run Step 5-7 in depth on remaining 3-5
  7. Pick #1, build it. Don't spread yourself across 3 — concentrate.

Estimated Semrush API budget per round: ~3,000-5,000 units (6-10% of monthly Guru allowance).


What we should do differently next time

Based on what I got wrong this cycle:

  1. Always pull KD scores, not just Competition scores. They diverge wildly (boba comp 0.11 / KD 69; drive-in comp 0.06 / KD 65).
  2. Always check the SERP shape before estimating capture probability. I overrated EV Charging because I didn't see afdc.energy.gov dominating until I pulled the live SERP.
  3. Always size up the strongest 2-3 defenders, not just the obvious one. PlugShare looked like the only EV directory; ChargePoint with 442K visits is the bigger threat.
  4. Force yourself to compute revenue ceiling. Most niches we considered have $1K-3K/year ceilings, not the $1K-3K/month we casually claimed.

This methodology now lives in the workspace. Future niche research goes through these 7 steps.