Top-10 Directory Deep Analysis — what the giants do that we don't
Date: 2026-05-05 18:10 UTC
Method: Firecrawl JS-rendered scrape (concurrency 2) of homepage + hub + detail page across 10 reference directories, paired with raw-HTML JSON-LD extraction. Compared head-to-head against UPick Atlas, Waterfall Atlas, Hot Springs Atlas, Aurora Atlas.
Files: raw extractions in /tmp/top10/*.json, aggregated CSV /tmp/top10/aggregated.csv, JSON-LD breakdown /tmp/top10/jsonld-summary.json.
Spend: ~190 Firecrawl credits (709 / 899 remaining).
TL;DR — 6-line executive summary
- Hub pages are our worst gap. Competitor hub pages average 4,400 words / 41 images / 38 list-items / 73% with ratings. Our hubs (`/states`, `/states/al`) are 213 words / 0 images / 0 ratings. The hub is where SERP intent ("waterfalls in California") lands — and we're shipping near-empty pages.
- JSON-LD: we already do well at the homepage and detail level, but our hubs miss `CollectionPage` + `BreadcrumbList`. AllTrails/Yelp/Recreation.gov each ship 5–14 entity types per detail page; we ship 4–11. Closing this gap is hours, not weeks.
- `AggregateRating` is the universal directory currency. 73% of competitor hubs and 83% of competitor detail pages render visible ratings; we render zero. Even an internal "editor score" earns SERP rich-snippet eligibility.
- Year-stamping in `<title>` is rare among the top 10 (only TripAdvisor detail does it consistently — `(2026)`). That's a lever we already pull on most pages and should keep pulling, contrary to the giants — it's a small-site advantage.
- Image density is the single biggest visible gap. Competitor hubs average 41 images; ours have 0. Even minimal Wikimedia/Unsplash hotlinks would move our pages from "thin aggregator" to "real directory" in Google's eyes.
- Two patches will close more SEO ground than the next ten: (a) deepen state hubs to ~2,000 words with sub-sections + 5–10 images + visible "editor's pick" ratings; (b) add `CollectionPage` + `BreadcrumbList` + `ItemList` + `AggregateRating` to the hub schema graph.
Findings file: /root/.openclaw/workspace/research/top10-directory-deep-analysis-2026-05-05.md
What I scraped
Reference set (12 sites × up to 3 page types)
| Site | Home | Hub | Detail |
|---|---|---|---|
| Yelp | ✅ | ✅ | ✅ |
| TripAdvisor | ✅ | ✅ | ✅ |
| Zillow | ✅ | ✅ | — |
| Airbnb | ✅ | ✅ | — |
| Booking.com | ✅ | ✅ | — |
| OpenTable | ✅ | ✅ (blocked, partial) | ✅ (blocked) |
| AllTrails | ✅ | ✅ | ✅ |
| Realtor.com | ❌ (bot-blocked) | ❌ | — |
| GreatSchools | ✅ | ✅ | ✅ |
| Healthgrades | ✅ | ✅ | — |
| Recreation.gov | ✅ | — | ✅ |
| Apartments.com | ✅ | ✅ | — |
Realtor.com served a bot-block page; OpenTable served partial pages. Everything else returned real, JS-rendered content.
Our set (8 scrapes)
UPick Atlas home + hub. Waterfall Atlas home + hub + detail. Hot Springs Atlas home + hub + detail. Aurora Atlas home (only live page).
The numbers — competitor vs. us
Homepages (n=12 competitors, 4 ours)
| Signal | Competitors | Ours | Gap |
|---|---|---|---|
| H2 count | 7.8 avg | 4.0 avg | −3.8 |
| Images | 20.9 avg | 0 | −20.9 ❌ |
| List-items | 55.9 avg | 30 | −25.9 |
| Internal links | 27.4 avg | 53.2 avg | +25.8 ✅ |
| Word count | 1,100 | 1,694 | +594 ✅ |
| Ratings visible | 33% | 25% | −8% |
| Year stamp | 0% | 75% | +75% ✅ |
| Filter UI | 0% | 75% | +75% ✅ |
Read: Our homepages already over-deliver on word count, internal linking, year-stamping, and visible filter UI. Where we lose is images (20× behind) and list-items (link density to crawl targets).
Hub pages — our biggest gap (n=11 competitors, 3 ours)
| Signal | Competitors | Ours | Gap |
|---|---|---|---|
| H2 count | 2.9 avg | 0.3 avg | −2.6 |
| Images | 41.3 avg | 0 | −41.3 ❌❌ |
| List-items | 38.5 avg | 0 | −38.5 ❌ |
| Word count | 4,403 | 213 | −4,190 ❌❌ |
| Internal links | 29.0 avg | 42.0 avg | +13 ✅ |
| Ratings visible | 73% | 0% | −73% ❌ |
| Filter UI | 55% | 33% | −22% |
| Search box | 64% | 0% | −64% ❌ |
Read: This table is the single most important finding in this report. Our state-hub pages (/states, /states/al) are essentially link grids with zero content depth. Booking.com's San Francisco hub is 7,500 words, 21 images, 88 list-items, ratings everywhere. AllTrails California hub is 5,000 words, 40 images, 126 list-items, "4.4 (7,067,442 reviews)" right next to the H1. Our equivalent is "State Hot Springs Hubs / 0 images / 0 H2s / 213 words". This is what's keeping us out of the long-tail SERPs.
Detail pages (n=6 competitors, 2 ours)
| Signal | Competitors | Ours | Gap |
|---|---|---|---|
| H2 count | 7.5 | 4.0 | −3.5 |
| Images | 10.5 | 0 | −10.5 ❌ |
| List-items | 43.8 | 4 | −39.8 |
| Word count | 2,221 | 361 | −1,860 ❌ |
| Ratings visible | 83% | 0% | −83% ❌ |
| Breadcrumb | 67% | 0% (visible UI) | −67% |
| JSON-LD types | 1.0 (per-block; many blocks) | 11 (Hot Springs detail) | (we win schema-volume) |
Read: On schema, our detail pages are competitive — Hot Springs Atlas detail ships 11 entity types vs. AllTrails detail's 10. But we're missing the visible signals (images, ratings, longer body copy) that distinguish a real directory entry from a thin generated page.
JSON-LD: who ships what
This is the part of the analysis that surprised me most. I extracted JSON-LD from raw HTML on every page where I could:
| Page | Blocks | Total @type count | Notable types |
|---|---|---|---|
| yelp-detail (Gary Danko) | 4 | 13 unique | Restaurant, AggregateRating, ImageObject×42, Person×72, Review×24, VideoObject×6, Place×42 |
| alltrails-detail (Lands End) | 8 | 10 | LocalBusiness×6, AggregateRating, Review×5, Person×5, BreadcrumbList, WebPage |
| recreation-detail (Upper Pines) | 2 | 11 | Campground, AggregateRating, LocationFeatureSpecification×8, Place×2, Organization×2, Offer, BreadcrumbList |
| tripadvisor-detail | 3 | 10 | LocalBusiness, AggregateRating, WebSite+SearchAction+EntryPoint, BreadcrumbList |
| greatschools-detail | 2 | 7 | School, AggregateRating, Review×5, BreadcrumbList |
| hotsprings-detail (ours) | 2 | 11 | WebSite+Organization, BreadcrumbList, TouristAttraction+Place, GeoCoordinates+PostalAddress, FAQPage+Question×5+Answer×5 |
| waterfall-detail (ours) | 2 | 10 | similar to hot-springs (no Place type) |
| upickatlas-hub (ours) | 2 | 6 | WebSite, Organization, BreadcrumbList, CollectionPage, ItemList, ListItem×4 |
| hotsprings-home / waterfall-home / upickatlas-home (ours) | 1 each | 2 each | only WebSite + Organization |
| auroratlas-home | 1 | 2 | WebSite + Organization |
| healthgrades-hub / opentable-hub | 0 | 0 | Their JS injects schema post-load; Firecrawl missed it |
Three concrete schema wins for us
1. Add `AggregateRating` to detail pages. Every directory in the top tier does this. We can compute an internal "editor score" from data quality (verified address + GPS + photo + review confidence + last-updated freshness) → 4.0–4.8 stars. This makes our pages eligible for SERP star snippets — the highest-impact schema win available.
2. Add `CollectionPage` + `ItemList` + `BreadcrumbList` to every hub page that doesn't have it. UPick Atlas hub already has this composition (6 types). Waterfall Atlas and Hot Springs Atlas hubs do not — they'd benefit from copying the UPick pattern verbatim.
3. For UGC-heavy directories (after we have any reviews), add `Review` + `Rating` types nested inside the `LocalBusiness`/`TouristAttraction`. Yelp ships 24 reviews per page in JSON-LD. AllTrails ships 5. We ship 0. This is a "wait until we have content" item — but the schema slot should be ready.
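The hub schema graph we'd be copying is small enough to sketch directly. This is a minimal illustration, not our build code — the domain, routes, and function name are placeholders:

```python
import json

def hub_jsonld(state_name, state_url, listings):
    """Build the CollectionPage + BreadcrumbList + ItemList @graph that the
    UPick Atlas hub already ships. `listings` is a list of (name, url) pairs.
    All URLs here are placeholders, not our real routes."""
    return {
        "@context": "https://schema.org",
        "@graph": [
            {"@type": "CollectionPage", "@id": state_url,
             "name": f"Hot Springs in {state_name}", "url": state_url},
            {"@type": "BreadcrumbList", "itemListElement": [
                {"@type": "ListItem", "position": 1, "name": "Home",
                 "item": "https://example.com/"},
                {"@type": "ListItem", "position": 2, "name": "States",
                 "item": "https://example.com/states"},
                {"@type": "ListItem", "position": 3, "name": state_name,
                 "item": state_url}]},
            {"@type": "ItemList", "numberOfItems": len(listings),
             "itemListElement": [
                 {"@type": "ListItem", "position": i + 1, "name": name, "url": url}
                 for i, (name, url) in enumerate(listings)]},
        ],
    }

# Serialized payload would land in a <script type="application/ld+json"> tag:
graph = hub_jsonld("California", "https://example.com/states/ca",
                   [("Travertine Hot Springs",
                     "https://example.com/springs/travertine")])
payload = json.dumps(graph, indent=2)
```

Since the three nodes share one `@graph`, crawlers see the page type, breadcrumb trail, and listing inventory in a single block.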
Patterns we should steal from specific top-10 sites
From AllTrails: lead with the rating
AllTrails' California hub H2 #1 is literally "4.4 (7,067,442 reviews)". They put the social proof above everything else on the page. Even without reviews, we could do "Verified by Hot Springs Atlas — 4.5 editorial confidence" as a structured rating element near the H1. It moves the page from "list of links" to "curated authority".
From Booking.com: long-form FAQ + market snapshot on hub pages
Booking's San Francisco hub has H2s like "FAQs about hotels in San Francisco" and "Best hotels with breakfast in San Francisco and nearby". They turn one URL into a hub for 6–8 long-tail intents (parking, breakfast, late-night, etc.). For us:
- Waterfall Atlas California hub could have H2s: "Best easy waterfalls in California", "Tallest waterfalls in California", "Year-round waterfalls in California", "FAQs about visiting California waterfalls".
- Hot Springs Atlas California hub: "Free hot springs in California", "Family-friendly hot springs in California", "Wild vs. developed California hot springs", "Best California hot springs by season".

Each H2 → 200–300 words → a ~2,000-word hub in 30 minutes of Codex per state.
From Recreation.gov: feature-spec lists
Recreation.gov's campground page ships 8 `LocationFeatureSpecification` JSON-LD blocks (one per amenity). We display amenity badges on Hot Springs detail pages but don't emit them as schema. Easy win: add an `amenityFeature` array with `LocationFeatureSpecification` entries.
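A sketch of that easy win — the amenity names below are illustrative, not our real taxonomy, and the merge target is whatever node our detail pages already emit:

```python
def amenity_feature(amenities):
    """Map the amenity badges we already render into schema.org
    LocationFeatureSpecification entries, one per amenity, following
    Recreation.gov's pattern. Amenity names here are illustrative."""
    return [
        {"@type": "LocationFeatureSpecification", "name": name, "value": True}
        for name in amenities
    ]

# Would be merged into the existing TouristAttraction node, e.g.:
# attraction["amenityFeature"] = amenity_feature(["Parking", "Soaking pools"])
features = amenity_feature(["Parking", "Changing area", "Soaking pools"])
```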
From TripAdvisor: year stamp in detail title
Their detail title is "Superstition Mountains (2026) — All You SHOULD Know Before Going". This is the formula our Hot Springs Atlas detail pages already use ("Alvord Hot Springs — Day-use fee or pay-per-soak in Fields, OR (2026)"). Confirmed: we're aligned with the highest-volume travel directory's title formula. Keep doing this and extend it to UPick + Waterfall.
From Yelp: image-rich detail with ImageObject schema
Yelp's Gary Danko page ships 42 `ImageObject` JSON-LD entries. Even when we don't have 42 photos, we should ship every `<img>` with `ImageObject` schema (one per image, with `contentUrl`, `caption`, `creator`). This is what powers Google Image carousel inclusion.
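The per-image mapping could look like the sketch below. The `url`/`caption`/`credit` dict shape is an assumption — our real image model may differ — and the URLs are placeholders:

```python
def image_objects(images, site_name="Hot Springs Atlas"):
    """One ImageObject per rendered <img>, following Yelp's pattern.
    Falls back to the site name as creator when no credit is recorded."""
    return [
        {
            "@type": "ImageObject",
            "contentUrl": img["url"],
            "caption": img.get("caption", ""),
            "creator": {"@type": "Organization",
                        "name": img.get("credit", site_name)},
        }
        for img in images
    ]

schema = image_objects([
    {"url": "https://example.com/photos/alvord-1.jpg",
     "caption": "Alvord Hot Springs at sunrise", "credit": "Wikimedia Commons"},
    {"url": "https://example.com/photos/alvord-2.jpg"},
])
```

The credit fallback matters for hotlinked Wikimedia images, where attribution belongs to the uploader rather than to us.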
From GreatSchools: visible "Compare and choose" widget
GreatSchools' homepage H2 is literally "Compare and choose." — they make comparison the homepage's primary CTA. We have a comparison-table opportunity on every state-crop matrix (UPick) and on state hub pages. Concrete patch: add a 3-column comparison table to each state hub ("Best for families | Best for adventure | Best free options").
What we already do better than the giants
It's worth marking these so we don't undo them in optimization passes:
- Year-stamping in titles + H1s — only TripAdvisor detail does this consistently among the top 10. We do it on most matrix and detail pages. Keep.
- Word count on homepages — our homepages average 1,694 words vs. competitor 1,100. We're not under-writing the homepage; we're under-writing the hub.
- Internal link density on homepages — we have 53 vs. their 27. Don't reduce this; it's how Google discovers our long-tail pages.
- Filter UI on homepages — 75% of our homepages have visible filter/sort vs. 0% of theirs (most save filters for the hub). Aligns with our small-site UX strategy of putting power in the user's hands faster.
- JSON-LD on hub pages — UPick Atlas hub has `CollectionPage` + `ItemList` schema. Most competitor hubs (when scraped from raw HTML) don't ship that. Extend this composition to Waterfall Atlas and Hot Springs Atlas hubs.
Ranked patch list — by impact / effort
🟢 Tier 1 — Ship this week (high impact, <2 hours each)
- Hot Springs Atlas + Waterfall Atlas: copy UPick Atlas's hub schema graph (`CollectionPage` + `ItemList` + `BreadcrumbList`). 30 min, applies to ~100 hub pages each. Impact: HIGH.
- Add `AggregateRating` schema to every detail page with an internal editorial-confidence score (4.0–4.8). 1–2 hours one-time, applies to all current + future detail pages. Impact: HIGHEST (rich-snippet eligibility).
- Add visible "Verified" / "Editor's pick" badges on detail pages that mirror the rating schema. 30 min. Impact: MEDIUM-HIGH (CTR + trust).
- Replace Waterfall + Hot Springs hub pages' empty H2s with 4 sub-sections each ("Best easy / Tallest / Year-round / FAQs about [state]"). 2 hours of Codex per atlas. Impact: HIGHEST (closes the 4,000-word hub gap).
🟡 Tier 2 — Schedule next 2 weeks (high impact, 2–6 hours each)
- Image hotlinks from Wikimedia Commons (CC0/public domain) for at least the home + state hub + first 50 detail pages of each atlas. 4 hours. Impact: HIGHEST visible signal.
- Hub-page word-count expansion to 1,500–2,500 words via sub-section content per state (UPick already has this for matrix pages; do the same for state hubs across Waterfall + Hot Springs). 6 hours of Codex per atlas. Impact: HIGHEST long-tail.
- Add `LocationFeatureSpecification` schema for amenities on Hot Springs + UPick detail pages. 1 hour. Impact: MEDIUM (feature-rich snippets).
- Add a comparison table (`<table>`) to every hub page ("Best for X | Best for Y | Best free"). 2 hours. Impact: MEDIUM (table-snippet eligibility).
🔴 Tier 3 — Defer (high impact but expensive or premature)
- Real photography per detail page — defer until after image hotlinks prove their lift.
- User reviews / UGC — cold-start; don't simulate.
- Sitelinks search box activation — needs the `/search` route to actually filter; defer to post-traffic.
❌ Don't do
- Don't try to match the giants' image counts on homepages (they're CDN-backed; we're statically generated). 5–10 images is plenty.
- Don't chase year-stamps in `<title>` for every page; we already do it on the highest-impact ones.
- Don't add FAQ schema for the sake of FAQ schema. Hot Springs Atlas detail already has `FAQPage` + 5×Q&A, which is competitive. Going further is diminishing returns.
Estimated combined impact of Tier 1 patches
If we ship patches 1–4 next session (~5–6 hours of focused Codex):
- Lighthouse SEO score: +5–10 across all atlases
- Schema rich-snippet eligibility: AggregateRating stars on ~700 detail pages
- Hub page indexability: moves Waterfall + Hot Springs hubs from "thin content risk" to "competitive aggregator" tier
- Time-to-rank for long-tail "[type] in [state]" queries: ~2–3 weeks faster
- Estimated traffic lift over 90 days: +30–60% on hub pages (from current low baseline)
Cost: ~$5–10 in Codex API.
Open questions / things I'd want a human to sanity-check
- Editorial confidence score formula for `AggregateRating` — I'd default to: 4.0 base + 0.2 verified address + 0.1 GPS confirmed + 0.1 image present + 0.1 phone/website + 0.1 reviewed in last 12 months. Cap at 4.8 (no perfect 5s). Want sign-off before shipping.
- Wikimedia hotlinking risk — bandwidth throttling exists but is rarely enforced. Acceptable risk for the first 1,000 pages; revisit if we see 429s.
- Year-stamp policy — keep aggressive (every matrix + detail page) or pull back (only on time-sensitive pages)? My read: keep aggressive — it's a small-site advantage the giants don't exploit.
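The proposed scoring formula is small enough to pin down in code for sign-off. One thing worth flagging: as stated, the increments sum to only 4.6, so the 4.8 cap is never reached — either the cap or the weights may want adjusting. A sketch, with assumed flag names:

```python
def editorial_score(verified_address, gps_confirmed, has_image,
                    has_contact, reviewed_in_12mo):
    """Proposed editorial-confidence score: 4.0 base, +0.2 for a verified
    address, +0.1 each for GPS confirmation, an image, phone/website, and a
    review within the last 12 months; capped at 4.8 (no perfect 5s)."""
    score = 4.0
    if verified_address:
        score += 0.2
    for flag in (gps_confirmed, has_image, has_contact, reviewed_in_12mo):
        if flag:
            score += 0.1
    # Round to one decimal so the JSON-LD ratingValue is stable across builds.
    return round(min(score, 4.8), 1)
```

The result would feed directly into the detail-page `AggregateRating` node as `ratingValue`.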
Appendix — quick reference for next session
- Aggregated CSV: `/tmp/top10/aggregated.csv` (38 rows, all observed metrics)
- Per-page Firecrawl JSON: `/tmp/top10/{site}-{home|hub|detail}.json`
- Full JSON-LD type breakdown: `/tmp/top10/jsonld-summary.json`
- Schema used for extraction: `/tmp/top10/schema.json`