Speakable Schema: Optimizing for Voice AI and Audio Search
The Rise of Voice AI Search
Voice-first AI interactions are growing rapidly. Users speak to Siri, Alexa, Google Assistant, and now ChatGPT's voice mode to get information. When an AI assistant reads an answer aloud, it needs content that sounds natural when spoken — not content designed to be scanned visually on a screen.
This is where Speakable schema comes in. It is a structured data markup that explicitly tells AI systems which parts of your content are suitable for text-to-speech and audio presentation.
What Is Speakable Schema?
Speakable is a Schema.org property that identifies sections within a web page that are best suited for audio playback using text-to-speech (TTS). Originally developed for Google Assistant's news reading feature, Speakable has broader implications for any AI system that delivers content verbally.
The markup tells AI engines: "This specific section of my page is written in a way that sounds good when read aloud."
Why It Matters for GEO
AI assistants with voice capabilities need to select content that:
- Is concise enough to be read in under 30 seconds per section
- Uses natural language (not keyword-stuffed SEO text)
- Makes sense without visual context (no "as shown in the image below")
- Provides complete, self-contained answers
By implementing Speakable schema, you give AI systems an explicit signal about which passages are suited to voice responses, which may improve your chances of being cited in audio interactions.
How to Implement Speakable Schema
JSON-LD Implementation
The recommended approach is a JSON-LD block, wrapped in a <script type="application/ld+json"> element, in your page's <head> section:
{
  "@context": "https://schema.org",
  "@type": "Article",
  "name": "Your Article Title",
  "speakable": {
    "@type": "SpeakableSpecification",
    "cssSelector": [
      ".article-summary",
      ".key-takeaway",
      "#main-definition"
    ]
  }
}
Alternatively, use XPath to target specific content:
{
  "@context": "https://schema.org",
  "@type": "Article",
  "name": "Your Article Title",
  "speakable": {
    "@type": "SpeakableSpecification",
    "xpath": [
      "/html/body/article/p[1]",
      "/html/body/article/section[1]/p[1]"
    ]
  }
}
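If your pages are generated by a script or template, the markup above can be produced programmatically. The following is a minimal Python sketch, not a definitive implementation: the function name build_speakable_jsonld and its parameters are hypothetical, and it assumes you want an Article with either CSS selectors, XPath expressions, or both.

```python
import json


def build_speakable_jsonld(title, css_selectors=None, xpaths=None):
    """Return a <script> element holding Article JSON-LD with a speakable spec.

    Hypothetical helper for illustration; adapt the fields to your own pages.
    """
    if not css_selectors and not xpaths:
        raise ValueError("provide at least one cssSelector or xpath")

    spec = {"@type": "SpeakableSpecification"}
    if css_selectors:
        spec["cssSelector"] = list(css_selectors)
    if xpaths:
        spec["xpath"] = list(xpaths)

    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "name": title,
        "speakable": spec,
    }
    # Escape "</" so a title containing "</script>" cannot end the element early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    print(build_speakable_jsonld("Your Article Title",
                                 [".article-summary", ".key-takeaway"]))
```

The output can be pasted into the page's <head> or emitted by a template. Building the object with json.dumps rather than string concatenation keeps quoting and escaping correct when titles contain special characters.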
WordPress Implementation
For WordPress sites, you can add Speakable schema through:
- Custom fields: Store speakable sections as meta data and output via your theme's JSON-LD
- GEO plugins: Tools like Arvo GEO can automatically identify and mark speakable content
- Manual markup: Add CSS classes to speakable sections and reference them in your schema
Writing Speakable Content
Not all content works well when spoken aloud. Follow these guidelines for sections you mark as speakable:
Do
- Write in complete sentences that make sense without visual context
- Keep sections to roughly 75-110 words, about 30-45 seconds of speech at a typical rate of 150 words per minute
- Use natural conversational language that flows when read aloud
- Front-load the key information in case the audio is interrupted
- Spell out abbreviations the first time they appear (GEO becomes "Generative Engine Optimization")
Avoid
- Visual references: "as shown in the chart above" or "click the button below"
- Complex formatting: tables, bullet points with sub-bullets, nested lists
- URL mentions: "visit example.com/long-path/to-page" sounds terrible in audio
- Parenthetical asides: "(see section 3 for details)" breaks audio flow
- Ambiguous pronouns: "It" and "this" without clear antecedents confuse listeners
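Some of the items in the Avoid list can be caught automatically before a human review. The sketch below is a rough heuristic in Python, assuming a hypothetical lint_speakable function and illustrative phrase lists that are far from exhaustive. It does not attempt to detect ambiguous pronouns or complex formatting, and it will flag every parenthetical, including a spelled-out abbreviation, so treat its output as prompts for review rather than as errors.

```python
import re

# Heuristic patterns only; the phrase lists are illustrative assumptions.
CHECKS = {
    "visual reference": re.compile(
        r"\b(as shown|shown (above|below)|"
        r"(chart|image|table|figure|button) (above|below)|click)\b",
        re.IGNORECASE,
    ),
    "url": re.compile(
        r"\b(https?://\S+|www\.\S+|\w+\.(com|org|net)(/\S*)?)",
        re.IGNORECASE,
    ),
    "parenthetical aside": re.compile(r"\([^)]*\)"),
}


def lint_speakable(text):
    """Return a sorted list of issue labels found in a candidate passage."""
    return sorted(label for label, pattern in CHECKS.items()
                  if pattern.search(text))


if __name__ == "__main__":
    sample = "As shown in the chart above, visit example.com/long-path/to-page."
    print(lint_speakable(sample))
```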
Best Practices for Speakable Sections
Identify Your Most Speakable Content
Not every section deserves Speakable markup. Focus on:
- Article summaries: The 2-3 sentence overview of what the page covers
- Key definitions: Clear, concise definitions of terms
- Direct answers: Sections that directly answer a specific question
- Conclusions and takeaways: Summarized advice or findings
- Data points: Single, notable statistics with context
Structure for Audio Consumption
Create dedicated speakable sections within your articles:
<div class="key-takeaway">
Generative Engine Optimization is the practice of making your web content
easily discoverable and citable by AI-powered search engines like ChatGPT,
Perplexity, and Claude. Unlike traditional SEO, which targets ranking positions,
GEO focuses on being cited as a trusted source in AI-generated answers.
</div>
This section works perfectly as a voice response because it:
- Provides a complete definition without external context
- Uses natural language flow
- Is concise (under 50 words)
- Contains no visual references or complex formatting
Test Your Speakable Content
Before marking content as speakable, read it aloud. Ask yourself:
- Does it sound natural when spoken?
- Does it make complete sense without seeing the page?
- Is it the right length (15-45 seconds of speech)?
- Would it satisfy a user who asked a voice assistant this question?
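The length check can be approximated from a word count before you read anything aloud. This is a small Python sketch under a stated assumption: a speaking rate of 150 words per minute, which is a common rule of thumb rather than a property of any particular voice assistant. The function names are hypothetical.

```python
def estimate_speech_seconds(text, words_per_minute=150):
    """Estimate read-aloud time in seconds, assuming a fixed speaking rate."""
    word_count = len(text.split())
    return word_count * 60 / words_per_minute


def fits_voice_window(text, min_seconds=15, max_seconds=45,
                      words_per_minute=150):
    """Check whether the estimated duration falls in the 15-45 second range."""
    seconds = estimate_speech_seconds(text, words_per_minute)
    return min_seconds <= seconds <= max_seconds


if __name__ == "__main__":
    passage = " ".join(["word"] * 75)  # stand-in for a 75-word passage
    print(estimate_speech_seconds(passage), fits_voice_window(passage))
```

At this rate, the 43-word key-takeaway example earlier in this article comes to about 17 seconds, which sits inside the range. Real text-to-speech engines vary in pace, so the estimate is a first filter and not a substitute for listening to the passage.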
Speakable and AI Citation Patterns
Voice AI searches tend to cite fewer sources than text-based AI search. When Perplexity reads an answer aloud, it might reference one or two sources rather than five or six. This makes the competition for voice citations more intense — and Speakable markup more valuable.
Content marked as speakable has an advantage because:
- AI systems can identify it as voice-ready without additional processing
- It signals that the content creator considered audio delivery
- The content selected by speakable markup tends to be more concise and clear
- It reduces the risk of awkward audio experiences for the AI product
Combining Speakable With Other Schema
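Speakable works best when combined with other relevant schema types. Schema.org defines the speakable property on Article and WebPage (and their subtypes), so for other types, attach it to the WebPage that contains them:
- FAQPage + Speakable: FAQPage is a WebPage subtype, so you can mark your FAQ answers as speakable directly for voice assistant queries
- HowTo + Speakable: On the WebPage that contains the HowTo, identify step summaries that work in audio format
- Article + Speakable: Highlight key findings and conclusions
- LocalBusiness + Speakable: On the WebPage that describes the business, make business hours and contact info voice-ready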
Measuring Speakable Impact
Track these signals to understand whether your Speakable implementation is working:
- Voice assistant referral traffic (if attributable)
- AI crawler activity on pages with Speakable markup versus without
- Rich result appearances for voice-related queries
- Brand mention growth in voice-first platforms
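The crawler comparison in the list above can be sketched as a simple count. The Python below assumes your access logs have already been parsed into (path, user agent) pairs and that you maintain a set of paths carrying Speakable markup; both inputs and the function name are hypothetical. The user-agent tokens listed are a few known AI crawlers and should be extended for your own traffic.

```python
# Substrings identifying some AI crawlers; extend this tuple for your own logs.
AI_CRAWLER_TOKENS = ("GPTBot", "PerplexityBot", "ClaudeBot")


def crawler_hits_by_markup(requests, speakable_paths):
    """Count AI crawler requests for pages with and without Speakable markup.

    requests: iterable of (path, user_agent) pairs parsed from access logs.
    speakable_paths: set of paths known to carry Speakable markup.
    """
    counts = {"with_speakable": 0, "without_speakable": 0}
    for path, user_agent in requests:
        if any(token in user_agent for token in AI_CRAWLER_TOKENS):
            if path in speakable_paths:
                counts["with_speakable"] += 1
            else:
                counts["without_speakable"] += 1
    return counts


if __name__ == "__main__":
    sample = [("/guide", "Mozilla/5.0 (compatible; GPTBot/1.0)"),
              ("/about", "Mozilla/5.0 (compatible; ClaudeBot/1.0)")]
    print(crawler_hits_by_markup(sample, {"/guide"}))
```

Raw counts do not show that the markup caused any difference. Dividing by the number of pages in each group and comparing similar pages gives a fairer picture.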
Getting Started Today
Implementing Speakable schema is straightforward:
- Audit your top 20 pages for voice-friendly sections
- Write or rewrite those sections to sound natural when spoken aloud
- Add CSS classes to identify speakable sections
- Implement JSON-LD Speakable markup referencing those classes
- Validate your markup with the Schema Markup Validator at validator.schema.org
- Monitor AI crawler behavior on marked pages
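One easy mistake in the steps above is a selector in the JSON-LD that no longer matches anything on the page, for example after a class is renamed. The following Python sketch checks for that using only the standard library. It is deliberately limited: it assumes selectors are simple ".class" or "#id" forms like those used in this article, and the function name unmatched_selectors is hypothetical.

```python
from html.parser import HTMLParser


class _ClassIdCollector(HTMLParser):
    """Collect every class name and id attribute used in an HTML document."""

    def __init__(self):
        super().__init__()
        self.classes = set()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value:
                continue
            if name == "class":
                self.classes.update(value.split())
            elif name == "id":
                self.ids.add(value)


def unmatched_selectors(html, selectors):
    """Return the simple .class / #id selectors that match nothing in html."""
    collector = _ClassIdCollector()
    collector.feed(html)
    missing = []
    for selector in selectors:
        if selector.startswith("."):
            found = selector[1:] in collector.classes
        elif selector.startswith("#"):
            found = selector[1:] in collector.ids
        else:
            raise ValueError(f"unsupported selector: {selector}")
        if not found:
            missing.append(selector)
    return missing


if __name__ == "__main__":
    page = '<div class="key-takeaway">...</div>'
    print(unmatched_selectors(page, [".key-takeaway", ".article-summary"]))
```

An empty list means every selector points at something on the page. This complements a schema validator, which checks the structure of the markup but not necessarily whether the selectors match your HTML.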
Voice AI search is still early, but the infrastructure you build now — clear, speakable content with proper markup — compounds over time. When voice search fully matures, your content will already be positioned for citation.