Semantic SEO is the nascent art of optimizing websites and other web-based resources for semantic search. But, strictly speaking, it’s unnecessary to speak of “semantic SEO” or “semantic search,” because the reality of contemporary search engines has made the qualifier redundant.
Semantic web technologies are now intrinsic to the way modern search engines work, and organic search marketing strategies need to address this reality.
This is a version of a talk I gave at SMX East in New York City on 2 October, 2013, under the slightly different title “Strategic Semantic SEO”.
I think that the subject Mike, Jon and I will be addressing today is a vitally important one for SEO. In fact, I think that the changes wrought by semantic web technologies on the way search engines operate signal a turning point in the way in which SEO is practiced.
Before I start to explore the nature of this change, a quick word about me. I’ve been a student of semiotics, a librarian, a web designer and an SEO. I don’t know if that background makes me especially qualified to talk about the intersection of search marketing and the semantic web, but it’s certainly what’s led to my interest in this field.
And it’s a field that’s exploding! From the Google Knowledge Graph to Bing Snapshots to Google Hummingbird, all of a sudden semantic search is popping up all over the place, and people are taking notice because, well, it’s hard to ignore how search has changed.
How search, I think, has fundamentally changed.
What is the nature of this seismic shift?
Fundamentally it is a shift from strings to things.
From keywords to entities. From the words that are used to describe things to the things being described.
As a catchphrase used to promote the Knowledge Graph “strings to things” is a minor marketing triumph. But as actionable information for SEO the phrase is pretty useless, at least on its own.
Not that SEOs have been idle in their efforts to optimize for semantic search.
In particular, search marketers have been quick to embrace schema.org and Google authorship.
These have been relatively easy sells for search marketers, because the reward is obvious and demonstrable: get rich snippets, get a higher CTR from the SERPs, make more money.
These are fine tactics, and they produce good results, but they’ve mostly been carried out in a strategic void.
As search becomes ever more semantic, marketers need to understand the context of these tactics, so they can develop strategies for semantic SEO, and from there develop effective and innovative tactics of their own.
That’s what I hope to help provide today – context, and some strategies that emerge from better understanding that context.
And that context is all about the semantic web technologies without which Google and Bing wouldn’t look remotely like they do today.
But don’t be alarmed. I don’t think semantic web stuff is anywhere near as difficult as people make out. In the interest of putting my money where my mouth is, I’m going to talk about just two really simple principles of the semantic web and, after that, provide you with a three-word definition of semantic SEO that, unlike “strings to things,” will be actionable in buckets if I’m successful in explaining it to you.
So what about those strings and things?
The first semantic web technology I want to talk about has to do very much with strings, not things, so a quick word on the differences between the two.
In search engine and semantic parlance those “things” are called “entities.”
Entities are different than keywords. They’re what keywords are used to identify.
Keywords themselves are imprecise.
Different keywords can be used to refer to the same entity.
“Dean Martin,” “Jerry Lewis’ sidekick” and “the hardest-drinking member of the Rat Pack” all refer to the same named personal entity.
And the same keywords can be used to refer to different entities. I used to pass through Paris from time to time as a kid – but not the Paris that probably comes to mind when you hear the word.
There are many types of entities. “Dean Martin” is a named personal entity. “Paris” is a geographic entity. But entities need not be proper nouns: concepts like “cat” and “desk” are also entities – topical entities.
So how – now finally arriving at the first of those two simple principles I want to discuss – are entities handled in the world of semantic search?
In semantic web applications each entity is assigned a unique identifier.
Unique identifiers allow computers to talk about things: a unique identifier represents the actual thing that a word is talking about. Not a keyword, but the meaning underlying a keyword.
This is a critical distinction, because in the keyword universe there’s no canonical “aubergine” or “eggplant” word that can be used to reliably and unambiguously refer to the concept of that particular vegetable, but in the entity universe – the formal universe of the semantic web – there is.
In the semantic web world those unique identifiers tend to be URLs – URIs if you want to get all fancy and semantic-webby – like the Wikipedia and IMDb addresses on the slide.
And URLs make awesome identifiers for a whole whack of reasons, including the fact that they’re readily accessible on the web, and that you can provide useful information about the thing at that URL.
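Pulling those two ideas together – one entity, one URL – here’s a hypothetical snippet (the markup and URL are mine, not from the talk) in which a page that says “aubergine” points at the same Wikipedia address a page that says “eggplant” would point at: two strings, one identifier.

```html
<!-- Different keywords, one unique identifier: whichever word the page uses,
     sameAs pins the item to the same entity -->
<div itemscope itemtype="http://schema.org/Thing">
  <span itemprop="name">Aubergine</span>
  <link itemprop="sameAs" href="http://en.wikipedia.org/wiki/Eggplant">
</div>
```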
What sort of information? That brings us to the other semantic web fundamental that I want to talk about.
The semantic web has a standard for describing things, a description framework that’s based on a triple.
As the name suggests, a triple is a three-part statement about something.
A triple is composed of a subject, a predicate and an object.
The subject is what’s being described – in the first example Mr. Dean Martin.
The predicate states what thing about the subject is being described – that would be his height.
The object is the value – which can be a piece of text or a number – of the thing about the subject described by the predicate. Here that’s the value of Dean Martin’s height – 5'10".
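Rendered in a form a machine can read – a hypothetical snippet using schema.org microdata, which we’ll come back to in a moment – that triple looks something like this:

```html
<!-- The item being described (Dean Martin) is the subject, itemprop="height"
     is the predicate, and the text 5'10" is the object -->
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Dean Martin</span>
  <span itemprop="height">5'10"</span>
</div>
```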
This framework enables you to describe pretty much anything about any entity in a format that computers can readily understand. That format is the structure that’s referred to in the phrase “structured data.”
And if you’ve ever marked up HTML using microformats, or schema.org, or Open Graph, you’ve been using triples.
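The slide showed a snippet of schema.org product markup; I haven’t reproduced it here, but a minimal microdata sketch along the same lines would look something like this (the rating value is purely illustrative):

```html
<!-- A sketch of schema.org product markup of the kind shown on the slide -->
<div itemscope itemtype="http://schema.org/Product">
  <span itemprop="name">Acme 8 Gigabyte USB Drive</span>
  <span itemprop="color">Blue</span>
  <div itemprop="offers" itemscope itemtype="http://schema.org/Offer">
    $<span itemprop="price">10.00</span>
    <meta itemprop="priceCurrency" content="USD">
  </div>
  <div itemprop="aggregateRating" itemscope itemtype="http://schema.org/AggregateRating">
    rated <span itemprop="ratingValue">4.5</span> by
    <span itemprop="reviewCount">32</span> reviewers
  </div>
</div>
```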
When you publish the schema.org code above you’re saying to Google, very unambiguously, “this product has the name ‘Acme 8 Gigabyte USB Drive’.”
And this, in turn, allows you to say other very unambiguous things about the named USB drive, like that it has a price of ten bucks, or that 32 people have reviewed it, or that it’s blue.
And when you put that description framework together with unique URL identifiers, it all starts to get terribly exciting.
No, really!
Because when you’re able to figure out what things are and how to find them – unique identifiers – and understand information provided to you about those things – the description framework – you’re able to make all sorts of meaningful connections between all sorts of things.
Take a look at these Knowledge Graph results for the query “GDP of France.” Aside from the desired answer, you’ll see that the results are awash in other entities and information about them.
Among the facts displayed you’ll find things like the population of France and the GDP of the UK. Why is Google displaying this information?
It knows through its query logs that people who searched for “GDP of France” also searched for these other figures. But it doesn’t know this by simply adding up the occurrences of keywords and keyword phrases in queries executed in the same session.
It knows this because it has used those keywords and the context of the queries to extract and disambiguate entities (to a unique identifier), and to store statements about them (as triples). Results like this – or, say, the comparison feature of Google Hummingbird – simply wouldn’t be possible without the technologies I’ve described.
This is the semantic web at work, and it’s the new face of search.
It changes web pages from isolated islands to islands joined by billions of bridges. It’s a search environment that tries to provide answers not only about things, but about the connections between things. And it’s the environment for which search marketers require an optimization strategy.
And with that we’re back to strings and things.
SEO strategy to date has been focused on keywords – strings describing things.
While keywords will continue to play a central role in search – precisely because they do describe things – strategies developed for keywords alone, for strings, are inadequate for the dynamic world of things.
That entities are important for semantic SEO is obvious, but simply replacing “keywords” with “entities” as your optimization target isn’t particularly helpful, and it doesn’t address what makes semantic search so powerful.
That power is the ability to understand what things are and how they’re connected – and it’s those relationships you want your web page, or video, or email, or tweet, or pin, or picture, or post to play a role in.
You want your site to make an appearance just at the moment Google connects the dots for a searcher.
You need your search engine optimization strategy to include not just nouns, but verbs.
Semantic SEO is not about optimizing for strings, or for things, but for the connections between things.
Semantic SEO is optimizing for relationships.
The relationships between entities facilitated by the ability to uniquely and unambiguously identify them, and to provide unambiguous data about them.
And if you’re successful in this, your presence in search will be extended, and you’ll be connected to searchers looking for very specific things. You’ll appear not just for “blender,” but for “blender recommendations,” “good blenders under $200,” and “blender under 18 inches tall,” along with implicit queries that the search engines are increasingly able to work out from the query context and information about the user, like “blenders recommended by my friends” or “machine for crushed ice margaritas” or “compare blenders and juicers.”
As semantic SEO is rooted in the world of things, a logical starting point for semantic SEO strategy is the identification of things, and in particular the things found on your website.
There are powerful tools that you can use – like entity extraction APIs – to identify the entities present in your content. Many of these APIs, in fact, lean on the same resources, like Wikipedia and Freebase, used by the Google Knowledge Graph or Bing’s Snapshots.
But identifying entities is not unlike the tried-and-true task of identifying keywords to target in search, and a lot of the techniques and tools used in keyword research can be applied to the task – though with the critical difference that entities are actual things that keywords are used to describe.
Just as identifying entities is not unlike keyword research, entity disambiguation is not unlike hunting down and consolidating pages that cannibalize each other – keyword cannibalization – and it isn’t conceptually dissimilar to specifying the canonical version of a URL.
However, a site free of pages that cannibalize keywords may have several pages that refer to the same entity – different strings that point to the same thing. In the age of semantic search, using multiple pages to cover off synonyms referring to the same underlying thing is exactly the wrong approach.
Another important approach is to start thinking of your content – or rather the data that resides in that content – in the same way that a search engine does.
With your entities identified you can then work out the properties associated with them, the types of values you’d expect to see for those properties and – most importantly – the properties and values that are shared between entities.
You may or may not end up creating triples of your own – like marking up code with schema.org – but understand that Google and Bing are going to use triples in storing your stuff and processing queries whether the data on your page is structured or not.
And approaching your content from this vantage point will help you immensely for all sorts of tasks, from query targeting to site architecture.
I have no time to go into this in detail, but I think this sort of data organization is the keyword analysis of the future – and, indeed, keyword analysis is a crucial tool for organizing data in this fashion.
Of course, a primary means of ensuring that search engines unambiguously understand your entities is to formally declare them and provide information about them.
And the most obvious way of doing this is with structured data markup. This includes marking up existing code with schema.org (using microdata or RDFa), microformats, and Open Graph meta tags.
If a particular type of entity is important to your business, but it isn’t a part of any readily usable schema, find a way – any way – of declaring those entities and their properties. Leverage an existing structured vocabulary or, better yet, extend schema.org and work at getting that extension added to the vocabulary.
Did you know that there is no schema available for video games? I know it’s now a mere $15 billion industry in the US, but I nonetheless think that a well-thought-out extension that supports the markup of video games would be favorably received.
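In the meantime, one way to leverage an existing vocabulary – a hypothetical sketch, with a made-up title and a class URI from the Product Types Ontology standing in – is to declare a generic schema.org type and point additionalType at an external definition of “video game”:

```html
<!-- No VideoGame type exists in schema.org, so declare a generic type and
     point additionalType at an external class definition for "video game" -->
<div itemscope itemtype="http://schema.org/CreativeWork">
  <link itemprop="additionalType" href="http://www.productontology.org/id/Video_game">
  <span itemprop="name">Example Quest III</span>
  <span itemprop="genre">Role-playing</span>
</div>
```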
But is it worthwhile marking up entities for which search result rich snippets aren’t currently generated?
In a word, yes.
Search engines are going out of their way to get webmasters to feed them structured data, which suggests that they find it useful for reasons other than producing rich snippets.
Where’s the rich snippet generated by the “musicBy” property for schema.org/TVSeries? Where’s the rich snippet when you tell Google about a restaurant’s cuisine with the Data Highlighter? Is Google ignoring this information?
No, it’s using the data to get a better understanding of the resources being described. And while the promise of rich snippets continues to be the carrot dangled in front of webmasters to encourage the use of structured data markup, ultimately this markup – in the words of the Data Highlighter – helps search engines “understand your site’s data.”
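To make that concrete with a hypothetical example (the series and composer names are placeholders): marking up who scored a TV series earns no rich snippet, but the statement still reaches the search engines.

```html
<!-- No rich snippet results from musicBy, but the markup still tells
     search engines who scored the series -->
<div itemscope itemtype="http://schema.org/TVSeries">
  <span itemprop="name">Example Detective Show</span>
  <div itemprop="musicBy" itemscope itemtype="http://schema.org/Person">
    <span itemprop="name">Jane Composer</span>
  </div>
</div>
```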
Finally, if – as I’ve argued – semantic SEO is about optimizing for relationships, then you need to know how things are connected across your site or sites.
Fortunately, the mechanism for exposing the relationships that exist between things on the web is not a mysterious one: it’s the hyperlink.
Structured data provides a method of explicitly declaring relationships between things, but whatever the type of resource, a search engine won’t connect the dots when there’s nothing connecting them, so you need to ensure that your content is sensibly linked.
Take a product page on an ecommerce site.
Is it linked to similar types of items? To products that belong to that same brand? To an upper-level page that represents the brand on that domain? Does a company blog link to this same page when discussing that brand? Are share buttons on that page connected to the verified accounts of the company? And so on.
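To sketch just one of those connections – hypothetically, with a placeholder domain – the product markup can both declare the brand and link to the brand’s page on the same site:

```html
<!-- The product's brand property names the brand and links to its page on the same domain -->
<div itemscope itemtype="http://schema.org/Product">
  <span itemprop="name">Acme 8 Gigabyte USB Drive</span>
  <div itemprop="brand" itemscope itemtype="http://schema.org/Brand">
    <a itemprop="url" href="http://www.example.com/brands/acme">
      <span itemprop="name">Acme</span>
    </a>
  </div>
</div>
```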
Identifying things, providing information about them and making connections between them on and beyond your pages is necessary – but it isn’t sufficient. You must also demonstrate to the search engines that the data you’re providing is trustworthy.
Keywords – strings – aren’t judged by the search engines on data quality, since they’re only indirectly related to data. But when semantically declared entities are in play it’s all about data – after all, it’s not called structured data for nothing.
So while the search engines can judge how relevant a resource like, say, a web page or video might be for a particular keyword by looking at the keyword universe of that resource, for semantic search they’re also concerned about the veracity of data that’s been offered.
How do you demonstrate to the search engines that your data is trustworthy?
Certainly use verification methods when they’re available, and as they become available.
What makes Google Authorship perhaps the killer search application is Google+. While I’m sure Google has every hope that Google+ will evolve into everyone’s favorite social network, it would have enormous value to Google even if it contained exactly zero posts, photos and videos by zero contributors. It is a verified identity network that allows Google to disambiguate individuals, businesses and other corporate entities, and connect all of these to websites and website pages.
From a data point of view what a byline says is, “this article was written by so-and-so.” When that byline is linked to a verified identity, Google knows exactly who so-and-so is, including, possibly, the people, organizations, social networks, websites and topics to which they have a connection.
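In practice that link is typically made with a rel="author" pointer from the byline to a Google+ profile, and the profile’s “Contributor to” section links back to the site, closing the verification loop. A hypothetical byline (the name and profile URL are placeholders):

```html
<!-- rel="author" ties this article to a verified Google+ profile -->
<p>By <a rel="author" href="https://plus.google.com/110000000000000000000">Jane Doe</a></p>
```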
Bing Tags, Twitter Cards and Pinterest Rich Pins are all similar methods of verifying identities, and in turn help search engines and other data consumers see your data as more trustworthy.
In addition to verifying data – and especially in the absence of verification methods – you should ensure that your data is consistent between sources, and even go out of your way to demonstrate that data fidelity.
In an ecommerce environment, this means that the same product information should be displayed on your site, encoded in the structured data on your site, listed in your search engine product feeds, and shown anywhere else you might display it – like Facebook, Twitter or Pinterest.
Google Shopping now requires unique product identifiers in merchant feeds. Why? To “continue improving data quality on Google Shopping.” And Bing, in a move that explicitly demonstrates the principle of data fidelity, has now started offering “Rich Captions” that display product price and availability information if – and only if – the information displayed on the merchant’s site is identical to the information provided to Bing in a Product Ads feed. Tied together, of course, by a unique identifier, the product URL.
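As a hypothetical sketch of that principle (the identifier and values are placeholders), the on-page markup carries the same identifier, price and availability that go into the merchant feed:

```html
<!-- On-page markup mirroring the merchant feed: same identifier, price and availability -->
<div itemscope itemtype="http://schema.org/Product">
  <span itemprop="name">Acme 8 Gigabyte USB Drive</span>
  <meta itemprop="gtin13" content="0012345678905">
  <div itemprop="offers" itemscope itemtype="http://schema.org/Offer">
    $<span itemprop="price">10.00</span>
    <meta itemprop="priceCurrency" content="USD">
    <link itemprop="availability" href="http://schema.org/InStock">
  </div>
</div>
```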
If these are all good strategies for semantic SEO, what are some expected outcomes of employing them?
First, you should see improved search visibility in the form of “rich snippets.” I use the phrase “rich snippets” in quotes because I mean any sort of enhanced search result, call-out, answer box, vertical and anything else that looks like or is related to Google’s Knowledge Graph or Hummingbird, or Bing’s Snapshots.
Pete Meyers of Moz recently identified 85 – count ‘em, 85 – different types of “rich SERPs,” and it’s likely we’ll see even more diversity with time. The era of 10 blue links is well and truly dead, and semantic search killed it.
The less readily visible outcome of effective semantic SEO – and, I think in the long run, the more important one – is that the search engines will come to understand your content much better. You’ll “rank” better in the sense that you’ll be better associated with the entities referenced by your content – making timely appearances in the SERPs as the search engines make connections on behalf of their users.
For both of these outcomes, measuring success is – alas – currently problematic.
Reporting on organic search success has, until recently, focused on keywords. Any efforts to classify traffic by the things referenced – as opposed to strings referencing them – require a lot of manual heavy lifting, and there’s virtually no way of reliably tracing back a click to an enhanced search result, let alone the type of the answer box or vertical or rich snippet where that result appeared.
And even when it comes to strings, semantic search is rendering keyword data less and less reliable because it facilitates information discovery.
But I no longer need to walk you through this slide showing how keyword data is being muddied by semantic search – based largely on an excellent and prescient presentation titled “Breaking Up With Your Keyword Data” by our Q&A coordinator Annie Cushing – because in the two or so weeks since I created this deck Google has announced its intention to break up with all of us.
At a general level, successful semantic SEO should result in increased traffic from search, insofar as your content supports it.
But even if that’s not the case, you should expect the quality of search traffic to improve because the search engines are better matching the things present in user queries to the things present on your site.
So you’d expect to see higher conversion rates, fewer bounces, increased engagement and more return visits for search-derived traffic.
I’m hopeful that the coming (not provided) apocalypse will stimulate the development of reporting tools and techniques, but it’s likely that producing metrics for semantic search will remain a challenge for the foreseeable future.
To conclude, semantic search is all about search engines connecting users with data. Make those connections your targets, and let the search engines be your matchmakers.