Top-10 Directory Deep Analysis — what the giants do that we don't
Date: 2026-05-05 18:10 UTC
Method: Firecrawl JS-rendered scrape (concurrency 2) of homepage + hub + detail page across 10 reference directories, paired with raw-HTML JSON-LD extraction. Compared head-to-head against UPick Atlas, Waterfall Atlas, Hot Springs Atlas, Aurora Atlas.
Files: raw extractions in /tmp/top10/*.json, aggregated CSV /tmp/top10/aggregated.csv, JSON-LD breakdown /tmp/top10/jsonld-summary.json.
Spend: ~190 Firecrawl credits (709 / 899 remaining).
TL;DR — 6-line executive summary
- Hub pages are our worst gap. Competitor hub pages average 4,400 words / 41 images / 38 list-items / 73% with ratings. Our hubs (`/states`, `/states/al`) are 213 words / 0 images / 0 ratings. The hub is where SERP intent ("waterfalls in California") lands — and we're shipping near-empty pages.
- JSON-LD: we already do well at the homepage and detail level, but our hubs miss `CollectionPage` + `BreadcrumbList`. AllTrails/Yelp/Recreation.gov each ship 5–14 entity types per detail page; we ship 4–11. Closing this gap is hours, not weeks.
- `AggregateRating` is the universal directory currency. 73% of competitor hubs and 83% of competitor detail pages render visible ratings; we render zero. Even an internal "editor score" earns SERP rich-snippet eligibility.
- Year-stamping in `<title>` is rare among the top 10 (only TripAdvisor detail does it consistently — `(2026)`). That's a lever we already pull on most pages and should keep pulling, contrary to the giants — it's a small-site advantage.
- Image density is the single biggest visible gap. Competitor hubs average 41 images; ours have 0. Even minimal Wikimedia/Unsplash hotlinks would move our pages from "thin aggregator" to "real directory" in Google's eyes.
- Two patches will close more SEO ground than the next ten: (a) deepen state hubs to ~2,000 words with sub-sections + 5–10 images + visible "editor's pick" ratings; (b) add `CollectionPage` + `BreadcrumbList` + `ItemList` + `AggregateRating` to the hub schema graph.
Findings file: /root/.openclaw/workspace/research/top10-directory-deep-analysis-2026-05-05.md
What I scraped
Reference set (12 sites × up to 3 page types)
| Site | Home | Hub | Detail |
|---|---|---|---|
| Yelp | ✅ | ✅ | ✅ |
| TripAdvisor | ✅ | ✅ | ✅ |
| Zillow | ✅ | ✅ | — |
| Airbnb | ✅ | ✅ | — |
| Booking.com | ✅ | ✅ | — |
| OpenTable | ✅ | ✅ (blocked, partial) | ✅ (blocked) |
| AllTrails | ✅ | ✅ | ✅ |
| Realtor.com | ❌ (bot-blocked) | ❌ | — |
| GreatSchools | ✅ | ✅ | ✅ |
| Healthgrades | ✅ | ✅ | — |
| Recreation.gov | ✅ | — | ✅ |
| Apartments.com | ✅ | ✅ | — |
Realtor.com served a bot-block page; OpenTable served partial pages. Everything else returned real, JS-rendered content.
Our set (8 scrapes)
UPick Atlas home + hub. Waterfall Atlas home + hub + detail. Hot Springs Atlas home + hub + detail. Aurora Atlas home (only live page).
The numbers — competitor vs. us
Homepages (n=12 competitors, 4 ours)
| Signal | Competitors | Ours | Gap |
|---|---|---|---|
| H2 count | 7.8 avg | 4.0 avg | −3.8 |
| Images | 20.9 avg | 0 | −20.9 ❌ |
| List-items | 55.9 avg | 30 | −25.9 |
| Internal links | 27.4 avg | 53.2 avg | +25.8 ✅ |
| Word count | 1,100 | 1,694 | +594 ✅ |
| Ratings visible | 33% | 25% | −8% |
| Year stamp | 0% | 75% | +75% ✅ |
| Filter UI | 0% | 75% | +75% ✅ |
Read: Our homepages already over-deliver on word count, internal linking, year-stamping, and visible filter UI. Where we lose is images (20× behind) and list-items (link density to crawl targets).
Hub pages — our biggest gap (n=11 competitors, 3 ours)
| Signal | Competitors | Ours | Gap |
|---|---|---|---|
| H2 count | 2.9 avg | 0.3 avg | −2.6 |
| Images | 41.3 avg | 0 | −41.3 ❌❌ |
| List-items | 38.5 avg | 0 | −38.5 ❌ |
| Word count | 4,403 | 213 | −4,190 ❌❌ |
| Internal links | 29.0 avg | 42.0 avg | +13 ✅ |
| Ratings visible | 73% | 0% | −73% ❌ |
| Filter UI | 55% | 33% | −22% |
| Search box | 64% | 0% | −64% ❌ |
Read: This table is the single most important finding in this report. Our state-hub pages (/states, /states/al) are essentially link grids with zero content depth. Booking.com's San Francisco hub is 7,500 words, 21 images, 88 list-items, ratings everywhere. AllTrails California hub is 5,000 words, 40 images, 126 list-items, "4.4 (7,067,442 reviews)" right next to the H1. Our equivalent is "State Hot Springs Hubs / 0 images / 0 H2s / 213 words". This is what's keeping us out of the long-tail SERPs.
Detail pages (n=6 competitors, 2 ours)
| Signal | Competitors | Ours | Gap |
|---|---|---|---|
| H2 count | 7.5 | 4.0 | −3.5 |
| Images | 10.5 | 0 | −10.5 ❌ |
| List-items | 43.8 | 4 | −39.8 |
| Word count | 2,221 | 361 | −1,860 ❌ |
| Ratings visible | 83% | 0% | −83% ❌ |
| Breadcrumb | 67% | 0% (visible UI) | −67% |
| JSON-LD types | 1.0 (per-block; many blocks) | 11 (Hot Springs detail) | (we win schema-volume) |
Read: On schema, our detail pages are competitive — Hot Springs Atlas detail ships 11 entity types vs. AllTrails detail's 10. But we're missing the visible signals (images, ratings, longer body copy) that distinguish a real directory entry from a thin generated page.
JSON-LD: who ships what
This is the part of the analysis that surprised me most. I extracted JSON-LD from raw HTML on every page where I could:
| Page | Blocks | Total @type count | Notable types |
|---|---|---|---|
| yelp-detail (Gary Danko) | 4 | 13 unique | Restaurant, AggregateRating, ImageObject×42, Person×72, Review×24, VideoObject×6, Place×42 |
| alltrails-detail (Lands End) | 8 | 10 | LocalBusiness×6, AggregateRating, Review×5, Person×5, BreadcrumbList, WebPage |
| recreation-detail (Upper Pines) | 2 | 11 | Campground, AggregateRating, LocationFeatureSpecification×8, Place×2, Organization×2, Offer, BreadcrumbList |
| tripadvisor-detail | 3 | 10 | LocalBusiness, AggregateRating, WebSite+SearchAction+EntryPoint, BreadcrumbList |
| greatschools-detail | 2 | 7 | School, AggregateRating, Review×5, BreadcrumbList |
| hotsprings-detail (ours) | 2 | 11 | WebSite+Organization, BreadcrumbList, TouristAttraction+Place, GeoCoordinates+PostalAddress, FAQPage+Question×5+Answer×5 |
| waterfall-detail (ours) | 2 | 10 | similar to hot-springs (no Place type) |
| upickatlas-hub (ours) | 2 | 6 | WebSite, Organization, BreadcrumbList, CollectionPage, ItemList, ListItem×4 |
| hotsprings-home / waterfall-home / upickatlas-home (ours) | 1 each | 2 each | only WebSite + Organization |
| auroratlas-home | 1 | 2 | WebSite + Organization |
| healthgrades-hub / opentable-hub | 0 | 0 | Their JS injects schema post-load; Firecrawl missed it |
Three concrete schema wins for us
1. Add `AggregateRating` to detail pages. Every directory in the top tier does this. We can compute an internal "editor score" from data quality (verified address + GPS + photo + review confidence + last-updated freshness) → 4.0–4.8 stars. This makes our pages eligible for SERP star snippets — the highest-impact schema win available.
2. Add `CollectionPage` + `ItemList` + `BreadcrumbList` to every hub page that doesn't have it. UPick Atlas hub already has this composition (6 types). Waterfall Atlas and Hot Springs Atlas hubs do not — they'd benefit from copying the UPick pattern verbatim.
3. For UGC-heavy directories (after we have any reviews), add `Review` + `Rating` types nested inside the `LocalBusiness`/`TouristAttraction`. Yelp ships 24 reviews per page in JSON-LD. AllTrails ships 5. We ship 0. This is a "wait until we have content" item — but the schema slot should be ready.
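The hub schema graph we'd be copying is small enough to sketch directly. This is a minimal illustration, not our build code — the domain, routes, and function name are placeholders:

```python
import json

def hub_jsonld(state_name, state_url, listings):
    """Build the CollectionPage + BreadcrumbList + ItemList @graph that the
    UPick Atlas hub already ships. `listings` is a list of (name, url) pairs.
    All URLs here are placeholders, not our real routes."""
    return {
        "@context": "https://schema.org",
        "@graph": [
            {"@type": "CollectionPage", "@id": state_url,
             "name": f"Hot Springs in {state_name}", "url": state_url},
            {"@type": "BreadcrumbList", "itemListElement": [
                {"@type": "ListItem", "position": 1, "name": "Home",
                 "item": "https://example.com/"},
                {"@type": "ListItem", "position": 2, "name": "States",
                 "item": "https://example.com/states"},
                {"@type": "ListItem", "position": 3, "name": state_name,
                 "item": state_url}]},
            {"@type": "ItemList", "numberOfItems": len(listings),
             "itemListElement": [
                 {"@type": "ListItem", "position": i + 1, "name": name, "url": url}
                 for i, (name, url) in enumerate(listings)]},
        ],
    }

# Serialized payload would land in a <script type="application/ld+json"> tag:
graph = hub_jsonld("California", "https://example.com/states/ca",
                   [("Travertine Hot Springs",
                     "https://example.com/springs/travertine")])
payload = json.dumps(graph, indent=2)
```

Since the three nodes share one `@graph`, crawlers see the page type, breadcrumb trail, and listing inventory in a single block.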
Patterns we should steal from specific top-10 sites
From AllTrails: lead with the rating
AllTrails' California hub H2 #1 is literally "4.4 (7,067,442 reviews)". They put the social proof above everything else on the page. Even without reviews, we could do "Verified by Hot Springs Atlas — 4.5 editorial confidence" as a structured rating element near the H1. It moves the page from "list of links" to "curated authority".
From Booking.com: long-form FAQ + market snapshot on hub pages
Booking's San Francisco hub has H2s like "FAQs about hotels in San Francisco" and "Best hotels with breakfast in San Francisco and nearby". They turn one URL into a hub for 6–8 long-tail intents (parking, breakfast, late-night, etc.). For us:
- Waterfall Atlas California hub could have H2s: "Best easy waterfalls in California", "Tallest waterfalls in California", "Year-round waterfalls in California", "FAQs about visiting California waterfalls".
- Hot Springs Atlas California hub: "Free hot springs in California", "Family-friendly hot springs in California", "Wild vs. developed California hot springs", "Best California hot springs by season".

Each H2 → 200–300 words → a ~2,000-word hub in 30 minutes of Codex per state.
From Recreation.gov: feature-spec lists
Recreation.gov's campground page ships 8 `LocationFeatureSpecification` JSON-LD blocks (one per amenity). We display amenity badges on Hot Springs detail pages but don't emit them as schema. Easy win: add an `amenityFeature` array with `LocationFeatureSpecification` entries.
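A sketch of that easy win — the amenity names below are illustrative, not our real taxonomy, and the merge target is whatever node our detail pages already emit:

```python
def amenity_feature(amenities):
    """Map the amenity badges we already render into schema.org
    LocationFeatureSpecification entries, one per amenity, following
    Recreation.gov's pattern. Amenity names here are illustrative."""
    return [
        {"@type": "LocationFeatureSpecification", "name": name, "value": True}
        for name in amenities
    ]

# Would be merged into the existing TouristAttraction node, e.g.:
# attraction["amenityFeature"] = amenity_feature(["Parking", "Soaking pools"])
features = amenity_feature(["Parking", "Changing area", "Soaking pools"])
```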
From TripAdvisor: year stamp in detail title
Their detail title is "Superstition Mountains (2026) — All You SHOULD Know Before Going". This is the formula our Hot Springs Atlas detail pages already use ("Alvord Hot Springs — Day-use fee or pay-per-soak in Fields, OR (2026)"). Confirmed: we're aligned with the highest-volume travel directory's title formula. Keep doing this and extend it to UPick + Waterfall.
From Yelp: image-rich detail with ImageObject schema
Yelp's Gary Danko page ships 42 `ImageObject` JSON-LD entries. Even when we don't have 42 photos, we should ship every `<img>` with `ImageObject` schema (one per image, with `contentUrl`, `caption`, `creator`). This is what powers Google Image carousel inclusion.
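The per-image mapping could look like the sketch below. The `url`/`caption`/`credit` dict shape is an assumption — our real image model may differ — and the URLs are placeholders:

```python
def image_objects(images, site_name="Hot Springs Atlas"):
    """One ImageObject per rendered <img>, following Yelp's pattern.
    Falls back to the site name as creator when no credit is recorded."""
    return [
        {
            "@type": "ImageObject",
            "contentUrl": img["url"],
            "caption": img.get("caption", ""),
            "creator": {"@type": "Organization",
                        "name": img.get("credit", site_name)},
        }
        for img in images
    ]

schema = image_objects([
    {"url": "https://example.com/photos/alvord-1.jpg",
     "caption": "Alvord Hot Springs at sunrise", "credit": "Wikimedia Commons"},
    {"url": "https://example.com/photos/alvord-2.jpg"},
])
```

The credit fallback matters for hotlinked Wikimedia images, where attribution belongs to the uploader rather than to us.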
From GreatSchools: visible "Compare and choose" widget
GreatSchools' homepage H2 is literally "Compare and choose." — they make comparison the homepage's primary CTA. We have a comparison-table opportunity on every state-crop matrix (UPick) and on state hub pages. Concrete patch: add a 3-column comparison table to each state hub ("Best for families | Best for adventure | Best free options").
What we already do better than the giants
It's worth marking these so we don't undo them in optimization passes:
- Year-stamping in titles + H1s — only TripAdvisor detail does this consistently among the top 10. We do it on most matrix and detail pages. Keep.
- Word count on homepages — our homepages average 1,694 words vs. competitor 1,100. We're not under-writing the homepage; we're under-writing the hub.
- Internal link density on homepages — we have 53 vs. their 27. Don't reduce this; it's how Google discovers our long-tail pages.
- Filter UI on homepages — 75% of our homepages have visible filter/sort vs. 0% of theirs (most save filters for the hub). Aligns with our small-site UX strategy of putting power in the user's hands faster.
- JSON-LD on hub pages — UPick Atlas hub has `CollectionPage` + `ItemList` schema. Most competitor hubs (when scraped from raw HTML) don't ship that. Extend this composition to Waterfall Atlas and Hot Springs Atlas hubs.
Ranked patch list — by impact / effort
🟢 Tier 1 — Ship this week (high impact, <2 hours each)
- Hot Springs Atlas + Waterfall Atlas: copy UPick Atlas's hub schema graph (`CollectionPage` + `ItemList` + `BreadcrumbList`). 30 min, applies to ~100 hub pages each. Impact: HIGH.
- Add `AggregateRating` schema to every detail page with an internal editorial-confidence score (4.0–4.8). 1–2 hours one-time, applies to all current + future detail pages. Impact: HIGHEST (rich-snippet eligibility).
- Add visible "Verified" / "Editor's pick" badges on detail pages that mirror the rating schema. 30 min. Impact: MEDIUM-HIGH (CTR + trust).
- Replace Waterfall + Hot Springs hub pages' empty H2s with 4 sub-sections each ("Best easy / Tallest / Year-round / FAQs about [state]"). 2 hours of Codex per atlas. Impact: HIGHEST (closes the 4,000-word hub gap).
🟡 Tier 2 — Schedule next 2 weeks (high impact, 2–6 hours each)
- Image hotlinks from Wikimedia Commons (CC0/public domain) for at least the home + state hub + first 50 detail pages of each atlas. 4 hours. Impact: HIGHEST visible signal.
- Hub-page word-count expansion to 1,500–2,500 words via sub-section content per state (UPick already has this for matrix pages; do the same for state hubs across Waterfall + Hot Springs). 6 hours of Codex per atlas. Impact: HIGHEST long-tail.
- Add `LocationFeatureSpecification` schema for amenities on Hot Springs + UPick detail pages. 1 hour. Impact: MEDIUM (feature-rich snippets).
- Add a comparison table (`<table>`) to every hub page ("Best for X | Best for Y | Best free"). 2 hours. Impact: MEDIUM (table-snippet eligibility).
🔴 Tier 3 — Defer (high impact but expensive or premature)
- Real photography per detail page — defer until after image hotlinks prove their lift.
- User reviews / UGC — cold-start; don't simulate.
- Sitelinks search box activation — needs the `/search` route to actually filter; defer to post-traffic.
❌ Don't do
- Don't try to match the giants' image counts on homepages (they're CDN-backed; we're statically generated). 5–10 images is plenty.
- Don't chase year-stamps in `<title>` for every page; we already do it on the highest-impact ones.
- Don't add FAQ schema for the sake of FAQ schema. Hot Springs Atlas detail already has `FAQPage` + 5×Q&A, which is competitive. Going further is diminishing returns.
Estimated combined impact of Tier 1 patches
If we ship patches 1–4 next session (~5–6 hours of focused Codex):
- Lighthouse SEO score: +5–10 across all atlases
- Schema rich-snippet eligibility: AggregateRating stars on ~700 detail pages
- Hub page indexability: moves Waterfall + Hot Springs hubs from "thin content risk" to "competitive aggregator" tier
- Time-to-rank for long-tail "[type] in [state]" queries: ~2–3 weeks faster
- Estimated traffic lift over 90 days: +30–60% on hub pages (from current low baseline)
Cost: ~$5–10 in Codex API.
Open questions / things I'd want a human to sanity-check
- Editorial confidence score formula for `AggregateRating` — I'd default to: 4.0 base + 0.2 verified address + 0.1 GPS confirmed + 0.1 image present + 0.1 phone/website + 0.1 reviewed in last 12 months. Cap at 4.8 (no perfect 5s). Want sign-off before shipping.
- Wikimedia hotlinking risk — bandwidth throttling exists but is rarely enforced. Acceptable risk for the first 1,000 pages; revisit if we see 429s.
- Year-stamp policy — keep aggressive (every matrix + detail page) or pull back (only on time-sensitive pages)? My read: keep aggressive — it's a small-site advantage the giants don't exploit.
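The proposed scoring formula is small enough to pin down in code for sign-off. One thing worth flagging: as stated, the increments sum to only 4.6, so the 4.8 cap is never reached — either the cap or the weights may want adjusting. A sketch, with assumed flag names:

```python
def editorial_score(verified_address, gps_confirmed, has_image,
                    has_contact, reviewed_in_12mo):
    """Proposed editorial-confidence score: 4.0 base, +0.2 for a verified
    address, +0.1 each for GPS confirmation, an image, phone/website, and a
    review within the last 12 months; capped at 4.8 (no perfect 5s)."""
    score = 4.0
    if verified_address:
        score += 0.2
    for flag in (gps_confirmed, has_image, has_contact, reviewed_in_12mo):
        if flag:
            score += 0.1
    # Round to one decimal so the JSON-LD ratingValue is stable across builds.
    return round(min(score, 4.8), 1)
```

The result would feed directly into the detail-page `AggregateRating` node as `ratingValue`.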
Appendix — quick reference for next session
- Aggregated CSV: `/tmp/top10/aggregated.csv` (38 rows, all observed metrics)
- Per-page Firecrawl JSON: `/tmp/top10/{site}-{home|hub|detail}.json`
- Full JSON-LD type breakdown: `/tmp/top10/jsonld-summary.json`
- Schema used for extraction: `/tmp/top10/schema.json`