Reverse-Engineering How LLMs Find & Cite Directories
Date: 2026-04-26 17:30 UTC
Method: Ran 8 directory-style queries through Grok-4-fast (with web search), captured all cited URLs, and analyzed citation patterns.
Compared against: Ahrefs DR + backlink data on every cited directory.
Cost: Grok ~$0.07, Ahrefs ~150 units.
TL;DR — what we discovered
LLMs cite directories disproportionately when they exist for a niche. Of 134 citations across 8 queries, 80% went to directories or listicle-style aggregators, only 14% to individual businesses, 5% to big aggregators (Yelp/TripAdvisor), and <1% to government sources.
The directory wins the citation game in 2026. This is the data validation we needed for the entire methodology.
But there's a critical nuance: DR is NOT the dominant citation factor. Several low-DR sites (DR 1-26) were cited heavily: thetrekkingmama.com (DR 1), coloradohikesandhops.com (DR 26), and 49miles.com (DR 11). What mattered more: does the URL match the user's query intent and present structured information?
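A minimal sketch of how shares like these can be tallied, assuming every cited URL has already been given one of four category labels. The label names are hypothetical, the labelling step itself (the judgment-heavy part) is not shown, and the example input is a toy list, not the 134 real citations.

```python
from collections import Counter

# Hypothetical label names for the four buckets used in the TL;DR.
CATEGORIES = (
    "directory_or_listicle",
    "individual_business",
    "big_aggregator",
    "government",
)


def citation_shares(labels: list[str]) -> dict[str, float]:
    """Return each category's share of all citations, in percent.

    `labels` holds one category label per cited URL, assigned beforehand
    (by hand or by a separate classifier, which is not shown here).
    """
    if not labels:
        raise ValueError("no citations to tally")
    unknown = set(labels) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unexpected categories: {sorted(unknown)}")
    counts = Counter(labels)
    return {cat: 100 * counts[cat] / len(labels) for cat in CATEGORIES}


# Toy input, not the real 134 labels.
example = ["directory_or_listicle"] * 8 + ["individual_business"] * 2
print(citation_shares(example))
```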
The 8 queries we ran
| # | Query | Vertical | AOV (avg. order value) | Most-cited domain (cite count) | DR |
|---|---|---|---|---|---|
| 1 | "Pumpkin patches in Georgia for fall 2026 with kids" | tourism / family | $30 | atlantaparent.com (8x) | 55 |
| 2 | "5 best US waterfalls in Oregon w/ hike difficulty" | outdoor recreation | $0 | thetrekkingmama.com (5x) | 1 |
| 3 | "Rehab center in Texas for alcohol addiction" | YMYL high-AOV | $30K | recovery.com (9x) | 74 |
| 4 | "Hot springs resort romantic getaway Colorado" | tourism premium | $1.5K-12K | coloradohikesandhops.com (4x) | 26 |
| 5 | "3 senior living facilities in Chicago, dementia care" | YMYL high-AOV | $5K-15K/mo | seniorly.com (4x) | 60 |
| 6 | "Dog park in SF, large breeds, fenced" | local recreation | $0 | 49miles.com (7x) | 11 |
| 7 | "Wedding venues Austin TX under $10K, outdoor" | events | $10K | weddingwire.com (21x) | 90 |
| 8 | "Free disc golf courses near Denver, 18+ holes" | hobby specific | $0 | udisc.com (5x) | 76 |
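The "most-cited domain" column amounts to grouping each query's cited URLs by domain and counting them. A sketch, assuming the cited URLs for one query are already collected in a list. `domain_of` and `most_cited` are illustrative names, not part of any tool used in this run, and the example URLs are placeholders.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Hostname with a leading "www." removed.

    A simplification: other subdomains (blog.x.com) are not collapsed
    the way a public-suffix-aware parser would collapse them.
    """
    host = urlparse(url).hostname or ""  # .hostname is already lowercased
    return host.removeprefix("www.")


def most_cited(cited_urls: list[str]) -> tuple[str, int]:
    """Return (domain, count) for the most-cited domain in one query's citations."""
    if not cited_urls:
        raise ValueError("query returned no citations")
    return Counter(domain_of(u) for u in cited_urls).most_common(1)[0]


urls = [
    "https://www.example-directory.com/a",
    "https://example-directory.com/b",
    "https://www.example-review-site.com/c",
]
print(most_cited(urls))
```

Stripping only `www.` keeps the sketch dependency-free; pages on other subdomains of the same site would be counted separately.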
Pattern 1: LLMs cite by query-intent match, not just authority
The DR-1 mom blog (thetrekkingmama.com) was cited 5 times for the Oregon waterfalls query — more than nps.gov, AllTrails, or any major outdoor publication.
Why? Her page was titled "20 Best Oregon Waterfall Hikes" and gave structured difficulty and accessibility info for each waterfall, exactly the format the user asked for.
Implication: if you write the page that EXACTLY matches a high-intent query format, LLMs will cite you regardless of your domain authority, at least for non-YMYL queries (see Pattern 4). This is huge for a new directory at DR 0.
Pattern 2: LLMs prefer specialized verticals over horizontal aggregators
Yelp + TripAdvisor accounted for only 5% of citations; big horizontal aggregators were rarely cited.
udisc.com was cited 5x for disc golf (it's specialized), while Yelp barely registered across the directory queries (it's horizontal).
weddingwire.com was cited 21x for the wedding venue query (specialized). Yelp showed up only 4x there as a fallback.
Implication: the data supports Frey's "AI Search Era" thesis (Rule #10): niche directories survive AI search; horizontal directories get hurt.
Pattern 3: The "LLM citation triangle" — three signals that drive citation
Cross-referencing what got cited heavily, three factors emerged:
Signal A — Query phrasing in title/H1 (most important)
Sites whose page titles literally matched the user's query format got cited disproportionately.
- "Best Oregon Waterfall Hikes" → 5 citations
- "Best Pumpkin Patches Near Atlanta" → 8 citations
- "Best Free Disc Golf Courses in Denver" → cited every time
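One rough way to score Signal A is the share of the query's content words that also appear in the page title. This sketch is an assumption about how to measure the signal, not a description of what Grok does internally; the stopword list and function names are hypothetical.

```python
import re

# Hypothetical minimal stopword list ("w" covers "w/"); a real scorer needs a fuller one.
STOPWORDS = {"a", "an", "the", "in", "of", "for", "with", "near", "to", "and", "w"}


def content_tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens with stopwords removed."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def title_match(query: str, title: str) -> float:
    """Share of the query's content tokens that also appear in the title (0.0 to 1.0)."""
    q = content_tokens(query)
    if not q:
        return 0.0
    return len(q & content_tokens(title)) / len(q)


score = title_match(
    "Free disc golf courses near Denver, 18+ holes",
    "Best Free Disc Golf Courses in Denver",
)
print(round(score, 2))
```

Exact token matching misses plurals and synonyms: the Oregon query ("waterfalls", "hike") and the "20 Best Oregon Waterfall Hikes" title ("waterfall", "hikes") share only "best" and "oregon" here, so a production scorer would want stemming or embeddings.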
Signal B — Listicle structure with metadata
LLMs visibly prefer pages that present:
- Numbered lists ("1. Name. 2. Name.")
- Per-entry metadata (height, difficulty, AOV, hours)
- Clear "best of" framing
- A specific count ("5 of the best")
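Signal B can be checked mechanically. The sketch below flags the four traits listed above on a page's title and plain-text body; the regexes, the metadata field names (with "price" standing in for AOV), and the `min_items` cutoff of 3 are assumptions chosen for illustration.

```python
import re

NUMBERED_ITEM = re.compile(r"^[ \t]*\d+[.)][ \t]+\S", re.MULTILINE)
# 1-3 digits, so a year such as 2026 is not mistaken for a list count.
COUNT_IN_TITLE = re.compile(r"\b\d{1,3}\b")
BEST_OF = re.compile(r"\b(?:best|top)\b", re.IGNORECASE)
# Hypothetical per-entry metadata labels; swap in the fields a vertical actually uses.
METADATA_LABEL = re.compile(
    r"^[ \t]*(?:height|difficulty|price|hours)[ \t]*:", re.IGNORECASE | re.MULTILINE
)


def listicle_signals(title: str, body: str, min_items: int = 3) -> dict[str, bool]:
    """Flag the Signal B traits on a page title and its plain-text body."""
    return {
        "numbered_list": len(NUMBERED_ITEM.findall(body)) >= min_items,
        "per_entry_metadata": len(METADATA_LABEL.findall(body)) >= min_items,
        "best_of_framing": BEST_OF.search(title) is not None,
        "specific_count": COUNT_IN_TITLE.search(title) is not None,
    }


body = "1. Falls A\nDifficulty: easy\n2. Falls B\nDifficulty: moderate\n3. Falls C\nDifficulty: hard"
print(listicle_signals("5 Best Oregon Waterfall Hikes", body))
```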
Signal C — DR ≥ 60 or DR-irrelevant if Signals A+B are perfect
- High-DR specialty directories dominated YMYL queries (recovery.com DR 74, weddingwire.com DR 90)
- For non-YMYL, even DR 1 wins if Signals A+B are present
Pattern 4: YMYL queries get the strict-DR treatment
For Q3 (rehab) and Q5 (dementia care):
- Apart from one government source (HHS Texas), 100% of cited domains were DR 60+ specialized directories
- No DR <50 sources cited
- No mom blogs cited
For non-YMYL:
- Mom blogs cited freely (DR 1 is fine)
- Smaller niche directories cited (DR 11-38)
Implication: Stoddard's playbook (rehab directory at DR 72) is uniquely defensible against new entrants because YMYL queries refuse to cite low-DR sources. If we ever build in YMYL niches, we must reach DR 50+ before LLMs will cite us at all.
Pattern 5: The "first-cited" position has a predictable bias
For each query, the first URL Grok cited (often as [[1]]) tended to be a domain that:
- Had the user's query phrase in the page title verbatim
- Was a structured directory or listicle
- Had been around for >2 years
Implication: outside YMYL, the page title is the single biggest LLM citation lever, even more than DR.
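To test the first-cited claim on more runs, one can compare how often a trait appears in the first citation versus later ones. A sketch, assuming each query's citations are stored in the order Grok emitted them and the traits are labelled by hand; the `Citation` fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    url: str
    # Hypothetical hand-labelled traits from Pattern 5.
    title_has_query_phrase: bool
    is_directory_or_listicle: bool


def first_vs_rest(runs: list[list[Citation]], trait: str) -> tuple[float, float]:
    """Rate of `trait` among first-cited URLs vs. among all later citations.

    `runs` holds one list per query, in the order the model cited the URLs.
    """
    firsts = [run[0] for run in runs if run]
    rest = [c for run in runs for c in run[1:]]

    def rate(cs: list[Citation]) -> float:
        return sum(getattr(c, trait) for c in cs) / len(cs) if cs else 0.0

    return rate(firsts), rate(rest)
```

With 8 queries there are only 8 first-cited data points, so a gap between the two rates is directional evidence rather than a significant result.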
How this validates / changes our directory strategy
What this validates
- ✅ Schema.org `@graph` composition matters: directories with rich schema dominated citations. WeddingWire (DR 90) and udisc.com (DR 76) both have heavy `LocalBusiness` + `ItemList` markup. Our investment in this is validated.
- ✅ Year-stamped, intent-matching titles work: the "2026 Georgia Pumpkin Patches" formula we shipped on UPick state-crop matrix pages is the right architecture. Pages whose titles match what users ask for win.
- ✅ Niche directories survive AI search: confirmed by data. Horizontal aggregators are losing ground.
- ✅ Hot Springs Atlas thesis is sound: the "hot springs Colorado" query cited coloradohikesandhops.com (DR 26) 4x. A specialized DR-26 site beat outsideonline.com (DR 90) on this query, so a new specialized hot-springs directory that grows from DR 0 to 30 in 12 months has a real path.
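For reference, a minimal example of `LocalBusiness` + `ItemList` composed in a single JSON-LD `@graph`, built with Python's standard `json` module. It uses real schema.org types and properties (`ItemList`, `ListItem`, `position`, `numberOfItems`, `PostalAddress`), but the field selection, the `#biz-N` fragment IDs, and the input dict shape are assumptions; it is not a copy of our production markup or of WeddingWire's or udisc.com's.

```python
import json


def directory_jsonld(page_url: str, page_name: str, businesses: list[dict]) -> str:
    """Build a JSON-LD @graph: one ItemList whose entries reference LocalBusiness nodes by @id.

    Assumes each business dict has hypothetical keys name, city, region.
    """
    nodes, elements = [], []
    for i, biz in enumerate(businesses, start=1):
        biz_id = f"{page_url}#biz-{i}"
        nodes.append({
            "@type": "LocalBusiness",
            "@id": biz_id,
            "name": biz["name"],
            "address": {
                "@type": "PostalAddress",
                "addressLocality": biz["city"],
                "addressRegion": biz["region"],
            },
        })
        elements.append({"@type": "ListItem", "position": i, "item": {"@id": biz_id}})
    item_list = {
        "@type": "ItemList",
        "@id": f"{page_url}#list",
        "name": page_name,
        "numberOfItems": len(businesses),
        "itemListElement": elements,
    }
    graph = {"@context": "https://schema.org", "@graph": [item_list, *nodes]}
    return json.dumps(graph, indent=2)


print(directory_jsonld(
    "https://example.com/best-of/georgia/pumpkin",
    "Best Pumpkin Patches in Georgia",
    [{"name": "Farm A", "city": "Atlanta", "region": "GA"}],
))
```

The output would go inside a `<script type="application/ld+json">` tag on the page.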
What this changes
- Our title formula needs sharper user-intent matching. Current UPick titles like "2026 California Pumpkin Patches" need to match more search intents:
  - "Best Pumpkin Patches in California"
  - "California Pumpkin Patches with Hayrides"
  - "Family-Friendly Pumpkin Patches Near LA"
  Most useful: have the matrix system generate multiple title variants per page based on common LLM query patterns, and use the highest-volume one as the `<title>`.
- We should add an "anti-hallucination" content section. Grok occasionally cited individual business sites (5x in the pumpkin-patches query, 5x in the hot-springs query) when the directory wasn't comprehensive. The fix: include exhaustive entity coverage so LLMs cite our directory instead of the underlying single business.
- YMYL and high-AOV niches require a longer DR-investment horizon. Senior Living (YMYL) plus the high-AOV Hot Springs and Wedding Venues niches all need 12-18 months of DR-building before LLMs cite them. UPick + Waterfall (non-YMYL) get LLM citation faster.
- Mom-blog-style listicles are a gap on our sites. Our state hubs are filter-driven; competitors that win citation are listicle-driven ("5 Best..."). Add a `/best-of/[state]/[crop|type]` listicle template that uses the matrix data to render a ranked listicle format. This is Patch 12 from our earlier optimization round, expanded.
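The `/best-of/` template can be sketched as a function that ranks matrix rows and renders a numbered listicle with per-entry metadata. Ranking by a `rating` field, the row keys (`name`, `city`, `hours`, `rating`), and the output text format are all assumptions; the real matrix data may carry different fields and a different ranking signal.

```python
def slugify(text: str) -> str:
    """Lowercase and hyphenate, e.g. "New York" -> "new-york"."""
    return "-".join(text.lower().split())


def render_best_of(state: str, crop: str, label: str, rows: list[dict], top_n: int = 10) -> tuple[str, str]:
    """Return (URL path, page text) for a ranked "best of" listicle.

    Assumes each matrix row has hypothetical keys name, city, hours, rating,
    and that a higher rating should rank first.
    """
    ranked = sorted(rows, key=lambda r: r["rating"], reverse=True)[:top_n]
    path = f"/best-of/{slugify(state)}/{slugify(crop)}"
    lines = [f"{len(ranked)} Best {label} in {state}", ""]
    for i, row in enumerate(ranked, start=1):
        lines.append(f"{i}. {row['name']} ({row['city']})")
        lines.append(f"   Hours: {row['hours']} | Rating: {row['rating']}")
    return path, "\n".join(lines)


farms = [
    {"name": "Farm A", "city": "Macon", "hours": "9-5", "rating": 4.2},
    {"name": "Farm B", "city": "Athens", "hours": "10-6", "rating": 4.8},
]
path, text = render_best_of("Georgia", "pumpkin", "Pumpkin Patches", farms)
print(path)
print(text)
```

Deriving the count in the title from the rendered list keeps the "N Best" claim honest when a state has fewer than `top_n` entries.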
Updated playbook implications
New action items added to the directory-builder skill
- Page title formula expansion (Methodology Rule #12 candidate): every directory page should test multiple title variants and pick the highest-volume LLM-query-shape match. Tools: Grok query simulator + Ahrefs title-tag tracking.
- The DR threshold table:
  - Non-YMYL niches: LLM citation possible at DR 0+ if title matches intent
  - YMYL niches: LLM citation requires DR 50+
  - Local-recreation niches: DR 11-38 sweet spot, mom-blog-format wins
- Always include listicle pages alongside directory pages. "Best of" + count + state + intent qualifier wins citations heavily.
- The 80% citation share: 80% of LLM citations across all 8 queries went to directories or listicles. We're in the right business model. Stop second-guessing this.
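The DR threshold table can be encoded as a screening rule. This is a rule of thumb distilled from 8 queries, not a model of how any LLM ranks sources; folding in Signal C's DR 60+ fallback for non-YMYL pages is our own reading. The local-recreation row describes where winners sat rather than a gate, so it is left out.

```python
def passes_dr_gate(is_ymyl: bool, dr: int, title_matches_intent: bool) -> bool:
    """Screening heuristic from the DR threshold table (hypothetical helper).

    YMYL: no cited source was below DR 50, so DR 50+ is treated as a hard
    gate (necessary, not sufficient).
    Non-YMYL: an intent-matching title is enough at any DR; failing that,
    high authority (DR 60+, per Signal C) can still carry a page.
    """
    if is_ymyl:
        return dr >= 50
    return title_matches_intent or dr >= 60


print(passes_dr_gate(is_ymyl=False, dr=1, title_matches_intent=True))
```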
Recommendations for our existing sites (in order of leverage)
Immediate (this week, no new code)
- Add `/best-of/[state]/[crop]` listicle templates to UPick (mirrors Pattern 1 win)
- Add `/best-of/[state]/[type]` listicle templates to Waterfall
- Use Grok to generate the optimal title for each state hub: query "5 best [crops] in [state]" and use whatever phrasing it returns
- Add explicit count language to titles: "5 Best...", "Top 10...", "12 Verified..."
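Generating count-led title variants and choosing one by estimated volume is a small templating step. A sketch, assuming some per-variant volume estimate is available (for example from a keyword tool); the templates and function names are hypothetical, modelled on the title examples earlier in this note.

```python
# Hypothetical templates, modelled on the title examples earlier in this note.
TEMPLATES = (
    "{count} Best {thing} in {place}",
    "Top {count} {thing} in {place}",
    "Best {thing} in {place} ({year})",
    "Family-Friendly {thing} Near {city}",
)


def title_variants(thing: str, place: str, city: str, count: int, year: int) -> list[str]:
    return [t.format(thing=thing, place=place, city=city, count=count, year=year) for t in TEMPLATES]


def pick_title(variants: list[str], volume: dict[str, int]) -> str:
    """Highest estimated volume wins; variants missing from `volume` count as 0, ties keep list order."""
    return max(variants, key=lambda v: volume.get(v, 0))


variants = title_variants("Pumpkin Patches", "California", "LA", 10, 2026)
# Placeholder volume estimates, not real data.
best = pick_title(variants, {"Top 10 Pumpkin Patches in California": 900, variants[0]: 400})
print(best)
```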
Short-term (next month)
- Pre-publish "best of" listicles for the 10 highest-volume state combinations on each site
- Build a Grok-monitoring cron that runs the queries we tested and tracks whether our sites get cited (early warning system for SEO health)
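A sketch of the monitoring job's core loop. `fetch_cited_urls` is a placeholder for whatever client queries Grok with web search and extracts the cited URLs (it is not a real library function), and the domains below are placeholders; scheduling it with cron and storing the rows is left out.

```python
from datetime import datetime, timezone
from typing import Callable
from urllib.parse import urlparse


def is_ours(url: str, our_domains: set[str]) -> bool:
    """True if the URL's host is one of our domains or a subdomain of one."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in our_domains)


def check_citations(
    queries: list[str],
    fetch_cited_urls: Callable[[str], list[str]],  # placeholder for a Grok client
    our_domains: set[str],
) -> list[dict]:
    """Run each query once and record whether any of our domains was cited."""
    run_at = datetime.now(timezone.utc).isoformat()
    report = []
    for query in queries:
        urls = fetch_cited_urls(query)
        ours = [u for u in urls if is_ours(u, our_domains)]
        report.append({
            "run_at": run_at,
            "query": query,
            "cited": bool(ours),
            "our_urls": ours,
            "total_citations": len(urls),
        })
    return report


# Usage with a stub fetcher and placeholder domains:
def fake_fetch(query: str) -> list[str]:
    if "pumpkin" in query.lower():
        return ["https://www.example-upick.com/georgia/pumpkins", "https://other.example/x"]
    return ["https://other.example/y"]


rows = check_citations(
    ["Pumpkin patches in Georgia for fall 2026 with kids", "Dog park in SF, large breeds, fenced"],
    fake_fetch,
    {"example-upick.com"},
)
for row in rows:
    print(row["query"], "->", "cited" if row["cited"] else "not cited")
```

LLM answers vary from run to run, so a single miss is noise; the trend across many scheduled runs is the useful signal.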
Medium-term (next 3 months)
- Hot Springs Atlas: launch with listicle-first information architecture (every state has a "10 Best Hot Springs in [State]" page as flagship)
- Migrate UPick + Waterfall homepages to listicle-led above-the-fold (currently they're search-led)
Cumulative API spend this round
- Grok: $0.07 (8 queries × ~$0.009 each)
- Ahrefs: 14,645 / 25,000 used (~59%)
- Semrush: ~14,500 / 50,000 used (29%)
Plenty of budget remaining for further validation.
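The spend figures above are simple ratios; a quick check, using the numbers listed (the Semrush figure is approximate):

```python
def pct_used(used: float, quota: float) -> float:
    """Percentage of a quota consumed, rounded to one decimal place."""
    return round(100 * used / quota, 1)


grok_cost = 8 * 0.009  # 8 queries at ~$0.009 each, about $0.07
quotas = {"Ahrefs": (14_645, 25_000), "Semrush": (14_500, 50_000)}  # Semrush usage is approximate
for tool, (used, quota) in quotas.items():
    print(f"{tool}: {pct_used(used, quota)}% used, {quota - used:,} units left")
print(f"Grok: ${grok_cost:.3f}")
```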
Single-sentence takeaway
LLMs disproportionately cite specialized directories with intent-matching titles and structured listicle content; DR matters most in YMYL but is irrelevant for non-YMYL if the title matches user query phrasing perfectly — meaning a new directory has a real path to LLM citation in 2026 if it gets the title formula right.