Cyrus

Reverse-Engineering the 100 Biggest US Directory Sites

← All research
research/directory-100-reverse-engineering-2026-04-26.md

Reverse-Engineering the 100 Biggest US Directory Sites

Date: 2026-04-26 06:30-08:00 UTC Sources: Ahrefs Lite + Semrush Guru + curl HTML scraping (~12 usable pages from 30 sampled) Spend: Ahrefs 6,745 / 25,000 (27%); Semrush 9,050 / 50,000 (18%) Status: PHASE 1+2 ANALYSIS — no code shipped. Awaiting your review before patches go live.


TL;DR — the 7 highest-leverage findings

After analyzing 113 directory sites by traffic + DR and reverse-engineering 12 representative pages we could access, here are the patterns the giants share that we don't yet have:

  1. Schema.org @graph composition — the biggest sites use a single @graph block combining 5-15 entity types per page. Recreation.gov has 14 types on one page (TouristAttraction + Campground + GeoCoordinates + AggregateRating + Reservation + ReserveAction + Place + Organization + ListItem + BreadcrumbList + ...). We use single-type blocks.
  2. List-item density — top pages average 113 <li> elements per page. Our state pages have 5-13. PageRank flows through links; we're starving our pages of internal link density.
  3. Year-stamped titles AND meta descriptions — GreatSchools has "2026 Colorado Schools | Public, Charter, & Private School Ratings" as the live <title> (not just H1). PYO does the same. We added year to UPick H1s tonight but not yet to titles.
  4. Aggregate ratings + review counts even on hub pages — competitors bake "★ 4.5 (1,247 reviews)" into list cards. We have no rating system.
  5. Filter UI on every list page — 33% of usable pages have visible filter/sort. Apartments.com, Indeed, HappyCow, OpenTable all default to filter-first UX. We have search but no filters.
  6. Year-relative content ("What's in season this month") — PYO has this in their H2 ("What's in season in April 2026, and other articles..."). We have nothing dynamic.
  7. Programmatic image density — competitors average 11 images per page (median 6). We have zero images on any page sitewide.

Domain Rating + traffic landscape (113 sites)

The 113 directories we measured cluster into 4 tiers:

Tier 1 — Untouchable goliaths (DR 91+, 14M+ monthly visits)

yelp.com, tripadvisor.com, linkedin.com, ebay.com, indeed.com, zillow.com, airbnb.com, expedia.com, booking.com, foursquare.com, mapquest.com, allrecipes.com, foodnetwork.com, carfax.com, realtor.com, theknot.com, opentable.com, vrbo.com, mindbodyonline.com, glassdoor.com, mercari.com, homedepot.com, nps.gov, thumbtack.com, weddingwire.com, webmd.com, nytimes.com/recipes (DR 94 / 145M monthly), homeadvisor.com

Lesson: these all share WebSite + Organization + BreadcrumbList + category-specific schema (Recipe, LocalBusiness, Place, JobPosting). Their moats are too deep for us to copy structurally — their content depth is what sets them apart.

Tier 2 — Strong defenders (DR 80-90, 100K-13M traffic)

chewy.com, kbb.com, autotrader.com, cars.com, hotels.com, viator.com, lonelyplanet.com, edmunds.com, bonappetit.com, foodandwine.com, seriouseats.com, alltrails.com, healthgrades.com, trulia.com, eatingwell.com, delish.com, niche.com, greatschools.org, recreation.gov, gigsalad.com, manta.com, apartments.com, classpass.com, fodors.com, gasbuddy.com, plugshare.com, chargepoint.com, trailforks.com, chamberofcommerce.com, ratemyprofessors.com, electrifyamerica.com

Lesson: these pages all use LocalBusiness/Place patterns + aggregateRating heavily. Ratings are the universal directory currency.

Tier 3 — Mid-tier directories (DR 60-80, 1K-500K traffic)

udisc.com, pickleheads.com, sniffspot.com, happycow.net, sparefoot.com, pickyourown.org, tasty.co, ratemds.com, vitals.com, hotpads.com, gaiagps.com, hikingproject.com, mtbproject.com, campendium.com, edamam.com, livingsocial.com, yellowbook.com, usdalocalfoodportal.com, chargehub.com, goldsgym.com, mapquest.com, tastykitchen.com

These are our actual competitors. Patterns we can copy: tight schema, year-stamping, comparison tables.

Tier 4 — Where we sit

DR 0-30: bestdubai.com (DR 26), upickfarmlocator.com (DR 19), guiltychef.com (DR 4 — Omar's actual site), waterfall-atlas.com (DR 0, ours), upickatlas.com (DR 0, ours).

Reality check: Guilty Chef's $700-800/mo case study site is DR 4. Ours start at 0. Domain authority is irrelevant on day 1; what matters is page-level schema + content quality so Google's algorithm has reasons to crawl deeply.


Pattern frequency on the 12 usable pages

(Pages that returned real content: airbnb-home, chargehub-state, greatschools-state, happycow-state, healthgrades, indeed, mtbproject-area, nps-park, pickyourown-state, plugshare-city, recreation-page, udisc-courses)

Pattern% of usable pagesWe have it?
BreadcrumbList JSON-LD25%Detail pages YES, hubs NO
Visible search box33%Homepages YES, hubs NO
Filter / sort UI33%NO
Year stamp in H18%YES (UPick matrix only)
Star ratings displayed8%NO
<table> for comparisonvariesUPick harvest-calendar YES, others NO
10+ images per page50%NO (zero everywhere)
aggregateRating schemavariesNO

Schema.org @types observed across competitors

Most-used schema types in sample:
  ListItem: 3 sites
  BreadcrumbList: 2
  GeoCoordinates: 2
  Organization: 2 (incl. ours)
  Place: 2
  PostalAddress: 2
  TouristDestination: 2
  AggregateRating: 1 (recreation.gov — strongest marker)
  Campground: 1
  EntryPoint: 1
  WebSite: 1 (incl. ours)
  Reservation: 1
  ReserveAction: 1
  TouristAttraction: 1 (incl. ours)
  City: 1
  CollectionPage: 1
  ItemList: 1
  SearchAction: 1 (incl. ours)

The gold-standard composition for our use case (recreation.gov approach):

@graph: [
  WebSite → SearchAction (already have)
  Organization (already have)
  BreadcrumbList (have on detail, missing on hubs)
  Place (have on waterfall pages)
  TouristAttraction (have on detail)
  AggregateRating (MISSING — biggest gap)
  ItemList (have on matrix)
  CollectionPage (MISSING)
  GeoCoordinates (have on detail)
  PostalAddress (have on detail)
]

Direct comparison: our pages vs. PickYourOwn.org's #1 traffic page

PYO's /strawberry-farms/CO-strawberries.php earns 5,576 monthly visits. Anatomy:

TITLE:      Colorado Strawberry U-Pick Orchards in !          (broken — visible in DOM)
H1:         2026 Colorado Strawberry U-Pick Farms and Orchards - PickYourOwn.org
WORD COUNT: 2,423
IMAGES:     22
H2 COUNT:   6
H2 LIST:
  - Strawberry U-Pick Orchards or farms in Colorado in 2026, by area of state
  - Strawberry
  - Strawberry Picking Tips, Recipes and Information
  - Strawberry Recipes, Canning and Freezing Strawberries
  - Strawberry Facts, Measurements and Tips
  - More conversions
JSON-LD:    NONE (they're old-school)
INTERNAL LINKS: many (county-level breakdown links)

Our equivalent (UPick Atlas /strawberry-picking/california) — by comparison:

TITLE:      Strawberry Picking in California — 2 U-Pick Farms       (cleaner)
H1:         2026 California Strawberry Picking                       (we adopted PYO formula tonight ✓)
WORD COUNT: ~150 (estimate)
IMAGES:     0
H2 COUNT:   2
H2 LIST:    "Featured farms" + "Other states with..."
JSON-LD:    ItemList, ListItem
INTERNAL LINKS: 8-15 (pagination + related crops)

The exact gaps to close:

  1. 6 H2 sections instead of 2: add "Picking Tips", "When is strawberry season in California" (high-volume long-tail), "Recipes for fresh strawberries", "Strawberry varieties in California", "Where to find more strawberry farms"
  2. 2,000+ words of genuine content per state-crop matrix page
  3. Photos (placeholder color tiles or Wikimedia Commons CC0 images)
  4. County-level internal linking ("San Diego County" → list 3 farms, "Sacramento County" → list 3 farms)

Where Waterfall Atlas falls short of competitors

Waterfall Atlas state pages currently have:

TITLE:      Waterfalls in California
H1:         Waterfalls in California (rendered correctly, scraping artifact in earlier run)
WORD COUNT: ~80-100
IMAGES:     0
H2 COUNT:   4 ("Featured picks", "Type coverage", "Easy itineraries", "Explore another state")
JSON-LD:    NONE on state hubs (only on detail pages)
INTERNAL LINKS: ~58 (heavily state-grid focused)

vs. AllTrails state hub (DR 89, the dominant competitor):

  • Faceted filters by length, difficulty, elevation, dog-friendly, etc.
  • "Best of [State]" lists (top 25 trails)
  • User reviews + photo strip
  • ~4,000 words of curated trail descriptions
  • Dynamic "Popular trails this week"

We can't beat AllTrails on volume, but we can beat them on:

  1. Better schema — they have surprisingly thin JSON-LD per page; we can be exhaustive
  2. Specific to waterfalls — they index trails, not waterfalls per se. The exact intent ("waterfalls in [state]") often rewards us
  3. Honest difficulty + permit info — niche depth beats general

The 15 highest-leverage patches I'd ship (ranked by effort/impact ratio)

Cost color coding: 🟢 cheap & quick, 🟡 medium effort, 🔴 big task

1. 🟢 Add BreadcrumbList JSON-LD to all hub pages (states, types, regions, crops, matrix)

Effort: 30 min. Impact: HIGH. Currently only detail pages have breadcrumb schema. Google uses these to render breadcrumb trails in SERP.

2. 🟢 Add year-stamps to titles (not just H1) on all matrix + state pages

Effort: 10 min. Impact: HIGH. GreatSchools has "2026 Colorado Schools" as

3. 🟢 Fix Waterfall state hub: add "Best [Type] Waterfalls in [State]" sub-sections

Effort: 30 min. Impact: MEDIUM. Add 3 H2 sections: "Tallest Waterfalls in [State]", "Easy Waterfalls for Families", "Hidden Gems off the Beaten Path". 4 H2s → 7 H2s.

4. 🟡 Add aggregateRating schema with internal-only ratings

Effort: 1 hour. Impact: HIGH. Even modest 4.0-4.8 star ratings (from internal "editor score" based on data quality) make our pages eligible for SERP rich snippets. Recreation.gov's killer feature.

5. 🟡 Add filter/sort UI to state hub pages

Effort: 2-3 hours (Codex). Impact: MEDIUM. Filter by crop/type/distance/dogs-allowed. Engagement signal Google watches.

6. 🟡 Add real images to every page using Wikimedia Commons hotlinking (CC0 or attributed)

Effort: 2-3 hours (Codex). Impact: HIGH. 0 → 5-10 images per page. Pinterest + Google Images traffic now reachable.

7. 🟢 Add dateModified to every JSON-LD block + visible "Updated [Month] [Year]" in DOM

Effort: 30 min. Impact: MEDIUM. Freshness signal Google rewards on evergreen queries.

8. 🔴 Deepen state-crop matrix pages with PYO's content pattern

Effort: 4-6 hours (Codex). Impact: HIGHEST. 300 pages × +2,000 words each = ~600K words of genuine, ranking-eligible content. This is what closes the word-count gap.

9. 🟡 Add county-level internal linking to UPick state pages

Effort: 1-2 hours. Impact: MEDIUM. Currently we list farms; we should group by county. Makes "u-pick farms in [county] [state]" a viable long-tail target.

10. 🟡 Add "What's in season this month" dynamic widget on UPick homepage + state pages

Effort: 2 hours. Impact: MEDIUM-HIGH. PYO has this. Combines current month + farm data → actionable section. Can also generate current-month-stamped content.

11. 🟢 Add CollectionPage schema wrapper to all hub pages

Effort: 20 min. Impact: LOW-MEDIUM. Tells Google these are aggregator pages, not single articles. Affects rich-snippet eligibility.

12. 🟢 Add comparison tables to each crop hub

Effort: 30 min. Impact: MEDIUM. "Pumpkin Patch comparison: variety / season / amenity" — schema.org Table markup is real. Easy SEO win.

13. 🟡 Build a /search page (currently SearchAction targets a path that doesn't filter)

Effort: 2-3 hours. Impact: MEDIUM. Activates Google sitelinks search box.

14. 🟢 Add anchor-text-rich internal links to detail pages (currently many are "View details →" — meaningless to crawlers)

Effort: 30 min. Impact: MEDIUM. Replace with "Visit Yosemite Falls in Mariposa County, CA →" — anchor text is a ranking signal.

15. 🔴 Build a /blog/ or /guides/ evergreen content layer

Effort: 8-12 hours (Codex). Impact: HIGHEST LONG-TERM. PYO's secondary win is /peachvarieties.htm-style educational content. A dedicated blog hub gives us link targets + topical authority. Defer to next session.


What I would NOT do

  • ❌ Try to match nytimes.com (DR 94), yelp.com (DR 94), tripadvisor.com (DR 93). Different league.
  • ❌ Chase user reviews. Cold-start problem; we don't have visitors yet. Wait until real traffic.
  • ❌ Add affiliate links yet. Google wants quality first. Affiliate-ize after we have organic traffic.
  • ❌ Add FAQ accordions to homepages just for sake of FAQPage schema. Engagement matters more than schema density.

Recommendation: ship Patches 1, 2, 3, 4, 7, 11, 12, 14 tonight

These are all 🟢 cheap & quick (10-30 min each), totaling ~3-4 hours of Codex + manual edits, ~$3-5 in API.

Combined estimated impact: Lighthouse SEO +5-10 points, schema rich-result eligibility on all 700+ hub pages, freshness signal across the board, comparison-table SEO win.

Defer to a fresh session: Patches 5, 6, 8, 9, 10, 13, 15 (these need more focused effort + your input on design choices).


What I need from you

Reply with one of:

  • "ship 1-4, 7, 11, 12, 14" — I run a Codex pass + manual edits, ~3-4 hours, target completion ~12:00 UTC
  • "ship all green-tier (1, 2, 3, 7, 11, 12, 14)" — slightly less, ~2 hours, ~$3
  • "just 1, 2, 3, 7" — minimum viable improvements, ~30 min, ~$1
  • "hold for tomorrow" — I save the report, you decide later
  • Custom subset — pick the patches you want by number

API budget remaining:

  • Ahrefs: 18,255 / 25,000 (73%)
  • Semrush: 40,950 / 50,000 (82%)
  • Codex: cheap for any subset

Sleep status: don't burn yourself out tonight. This report holds and the patches stack. The data shows we're not in the wrong shape; we're just thinly executed. Polishing what we have beats building anything new.