How AI search actually works (2-minute version)
Think of it like a research assistant. When you ask ChatGPT a question, here's what happens:
Search
It searches the web for relevant pages.
Chunk
It breaks each page into small chunks (~150 words each).
Match
It picks the chunks most relevant to your question.
Cite
It writes an answer and cites the sources.
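The four steps above can be sketched in a few lines of Python. This is a toy illustration, not how any production system works: real AI search uses embedding models and learned rankers, while this sketch scores chunks by simple word overlap so the mechanics are easy to follow.

```python
def chunk(text, max_words=150):
    """Split a page into ~150-word chunks on word boundaries."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

def score(question, passage):
    """Toy relevance score: count of words shared with the question."""
    return len(set(question.lower().split()) & set(passage.lower().split()))

def best_chunk(question, page_text):
    """Return the chunk most relevant to the question."""
    return max(chunk(page_text), key=lambda c: score(question, c))

# A page that buries the answer after 150 words of history:
page = ("Our history began in 2010. " * 30) + \
       "Pricing starts at 10 dollars per month."
answer = best_chunk("What is the pricing per month?", page)
```

Note what happens here: the history paragraph fills the first chunk entirely, so the pricing sentence lands in its own chunk and wins on relevance. If the answer had been blended into a 3,000-word wall of text, it could easily be diluted below the retrieval threshold.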
Key insight
AI cites PASSAGES, not pages. A 3,000-word article with one great paragraph buried at the bottom will lose to a 500-word article that leads with the answer.
The implication is simple: your website needs to be structured for AI to extract answers easily, not just for humans to read.
Review your top pages: is the key answer in the first paragraph or buried below? AI looks at the top third first.
Structure your content for AI
This is the highest-impact, lowest-effort change. Every rule below is explained with its WHY and measured impact.
Lead with the answer
44% of citations come from the top third of the page. Put your key message first, then elaborate. No long intros: answer first.
Write in 120-180 word sections
This is how AI chunks content. One clear topic per section. Result: 70% more citations than long text blocks.
Use consistent headings
H1 > H2 > H3, never skip levels. AI uses headings to understand structure: proper hierarchy earns 40% more citations.
Add data and sources
Statistics (+22%), quotes (+37%), and source links (+40%) all boost visibility. LLMs favor verifiable data.
Keep content fresh
Content updated within the last 30 days earns 3.2x more citations. Show "Last updated" dates prominently.
Use lists and tables
Listicles account for 50% of top citations; tables earn a 2.5x citation rate. Structure your data.
This is the single highest-ROI action. Ask your content team to restructure your top 10 pages this month.
Checklist for each page: (1) Answer in first paragraph? (2) Sections under 180 words? (3) Stats cited? (4) Last updated date visible? (5) Proper H1>H2>H3 hierarchy?
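Parts of this checklist can be automated. The script below is a rough sketch: the heuristics (heading-level skips, 180-word sections, a visible "Last updated" string) are simplifications of the rules in this guide, not an official audit tool, so treat any flagged issue as a prompt for review rather than a hard failure.

```python
import re

def audit(md):
    """Flag checklist violations in a markdown page (rough heuristics)."""
    issues = []

    # (5) Heading hierarchy: levels must not skip (H1 -> H2 -> H3).
    levels = [len(m.group(1)) for m in re.finditer(r"^(#+) ", md, re.M)]
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:
            issues.append(f"heading skips from H{prev} to H{cur}")

    # (2) Section length: text under each heading should stay <= 180 words.
    sections = re.split(r"^#+ .*$", md, flags=re.M)
    for i, body in enumerate(sections):
        if len(body.split()) > 180:
            issues.append(f"section {i} exceeds 180 words")

    # (4) Freshness signal: a visible "Last updated" date.
    if "last updated" not in md.lower():
        issues.append('no "Last updated" date found')

    return issues
```

Running it on an exported markdown copy of each top page gives a quick pass/fail list; checks (1) and (3) (answer-first and cited stats) still need a human read.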
Add Schema.org markup
Schema markup is invisible code that tells AI what your content IS — 'this is a FAQ,' 'this is a how-to guide,' 'this is written by John Smith who works at Company X.' It's like adding labels to your content.
Organization
Tell AI who you are. Include links to Wikipedia, LinkedIn, social profiles. This is your digital ID card for LLMs.
FAQPage
Directly maps to the AI Q&A format: a 30% citation improvement. Keep answers 40-60 words.
Article / BlogPosting
Establishes authorship and dates. AI trusts content with clear authors.
HowTo
Step-by-step content AI can decompose. Ideal for guides and tutorials.
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Organization",
      "@id": "https://example.com/#org",
      "name": "Your Brand",
      "url": "https://example.com",
      "logo": "https://example.com/logo.png",
      "sameAs": [
        "https://www.linkedin.com/company/your-brand",
        "https://twitter.com/yourbrand",
        "https://www.wikidata.org/wiki/Q12345"
      ],
      "foundingDate": "2020-01-01",
      "founder": { "@id": "https://example.com/#founder" }
    },
    {
      "@type": "Person",
      "@id": "https://example.com/#founder",
      "name": "Jane Doe",
      "jobTitle": "CEO",
      "worksFor": { "@id": "https://example.com/#org" },
      "sameAs": [
        "https://www.linkedin.com/in/janedoe"
      ]
    },
    {
      "@type": "WebSite",
      "@id": "https://example.com/#website",
      "url": "https://example.com",
      "publisher": { "@id": "https://example.com/#org" }
    }
  ]
}

Note the "sameAs" entries: linking to your Wikidata item and social profiles is how AI verifies who you are. Ask your dev team: "Do we have schema markup on our key pages?" If the answer is no or "I don't know," this is Priority 1.
Add Organization schema to your homepage this week. Then add FAQPage schema to your top 5 FAQ/knowledge base pages. Validate with Google's Rich Results Test.
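For reference, a minimal FAQPage block might look like the following. The question and answer text are placeholders to adapt to your own FAQ content; keep each answer in the 40-60 word range recommended above.

{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What does Your Brand do?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Your Brand helps teams do X. Replace this placeholder with a direct, self-contained answer of 40-60 words."
      }
    }
  ]
}

Each Question/Answer pair maps one-to-one onto the Q&A format AI assistants produce, which is why this type converts so well into citations.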
Configure robots.txt for AI crawlers
Your robots.txt file tells AI bots what they can and can't access. The key decision: let them INDEX your content (so they cite you) but DON'T let them TRAIN on it (so they can't copy it).
ALLOW (they cite you)
- OAI-SearchBot
- ChatGPT-User
- Claude-SearchBot
- PerplexityBot
BLOCK (they copy you)
- GPTBot
- ClaudeBot
- Google-Extended
- Meta-ExternalAgent
- CCBot
- Bytespider
ALWAYS ALLOW
- Googlebot
- Bingbot
Traditional search engines also feed AI answers.
# AI search bots (ALLOW: they cite you)
User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

# AI training bots (BLOCK: they copy you)
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

# Traditional search (ALWAYS ALLOW)
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

Sitemap: https://www.yoursite.com/sitemap.xml
Note: only 19% of sites have configured robots.txt for AI. This is a 30-minute task.
Ask your tech team to review your robots.txt this week.
Copy the robots.txt template from this guide. Deploy it today. Test with Google's robots.txt tester.
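You can also sanity-check the deployed rules with Python's standard-library robots.txt parser. The snippet below parses a trimmed excerpt of the template against a hypothetical URL and confirms that a citation bot gets through while a training bot is blocked; swap in your own domain and full rule set.

```python
from urllib.robotparser import RobotFileParser

# Trimmed excerpt of the template above; use your full robots.txt in practice.
RULES = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /
"""

def check(rules_text, url="https://www.yoursite.com/some-page"):
    """Return which bots may fetch the given URL under these rules."""
    rp = RobotFileParser()
    rp.parse(rules_text.splitlines())
    return {
        "OAI-SearchBot": rp.can_fetch("OAI-SearchBot", url),  # citation bot
        "GPTBot": rp.can_fetch("GPTBot", url),                # training bot
    }
```

Running `check(RULES)` should report the search bot allowed and the training bot denied; if both come back the same, the rules are not doing what you intended.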
Build your entity presence
AI trusts brands it can verify. Wikipedia is ChatGPT's #1 source (7.8% of all citations). If your brand has a Wikipedia page and Wikidata entry, AI treats you as a 'verified fact.' Without them, you're an 'unverified claim.'
7.8% of ChatGPT's total citations come from Wikipedia, its most-cited source. Your Wikipedia page is your most valuable asset.
Create/update your Wikidata entry
Free, 1-2 hours. Fill in: instance of, official website, founder, inception, industry, headquarters.
Create/improve Wikipedia page
If eligible (Wikipedia notability guidelines). The biggest AI visibility amplifier.
Appear on 4+ platforms
Brands on 4+ platforms are 2.8x more likely to be cited. G2, Capterra, Reddit, industry directories.
Get mentioned in authoritative publications
85% of AI mentions come from third-party pages. Plan 2-3 PR placements per quarter.
85% of AI brand mentions come from third-party pages. Invest in digital PR — it's now the highest-leverage AEO activity.
Create a Wikidata entry if you don't have one. List your brand on G2, Capterra, or relevant industry directories. Plan 2-3 PR placements per quarter in outlets AI trusts.
llms.txt
A new file format that describes your site to AI. 844,000 sites use it. No proof it actually works yet, but takes 1-2 hours and zero risk.
Reality Check
No major AI company has confirmed using llms.txt during crawling. Implement it if you have time, but do priorities 1-4 first.
Create an llms.txt file at yoursite.com/llms.txt if you have time. Not a priority; do priorities 1-4 first.
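For reference, the proposed llms.txt format is plain markdown: an H1 with the site name, a blockquote summary, then H2 sections of annotated links. The example below is hypothetical; adapt the names and URLs to your own site.

# Your Brand

> Your Brand helps teams do X. This file summarizes our key pages for AI systems.

## Docs

- [Getting started](https://example.com/docs/start): setup guide
- [Pricing](https://example.com/pricing): plans and pricing

## Company

- [About](https://example.com/about): who we are and what we do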
The complete checklist
Restructure top 10 pages: answer-first, 120-180 word sections, proper headings
Add stats, quotes, and source links to key content
Add Organization schema to homepage
Add FAQPage schema to top 5 FAQ pages
Configure robots.txt for AI crawlers
Create/update Wikidata entry
List brand on G2, Capterra, or industry directories
Add Article/BlogPosting schema to blog posts
Show 'Last updated' dates on all key pages
Create llms.txt (optional)
Key Takeaways
- 1
AI cites passages, not pages. Structure content so the answer is in the first paragraph, in sections of 120-180 words.
- 2
Content restructuring is the highest-ROI action. Start with your top 10 pages this week.
- 3
Schema.org markup (Organization + FAQPage) gives AI the context it needs to trust and cite you.
- 4
Configure robots.txt in 30 minutes: allow search bots, block training bots.
- 5
Wikipedia + Wikidata = 'verified fact' status for AI. Your PR team is your most valuable AEO asset.