With this in mind, duplicate content is one of the (many) things at the front of our minds right now as Search Marketers. Besides some of the inherent issues that some CMSs can cause due to complicated URLs, filters, facets and all the rest, one of the most common issues and questions about duplicate content is internationalisation. This is what I wanted to focus on in this post.
French, English, German, Russian
Google generally isn’t fussed about duplicate content in multiple languages, provided the content is targeted at different countries. However, it’s not always that easy, depending on the website with which you’re trying to achieve this international visibility.
The ultimate question is “what about English content for the UK and English content for the States?” Sure, both sides of the pond have their vocabulary differences; Mom/Mum and labor/labour to name a few. These differences in spelling provide a strong indication to Google that specific content is targeted at specific countries, but it doesn’t end there.
Sub Folders and Sub Domains
Ideally, this is the way to go about things. Divide the content up into country specific folders or subdomains. That way, there is a clear indication of which is which and you can also control geographic targeting through Google’s Webmaster Tools. But, again, that’s not always possible either.
If you can’t implement anything
I recently had a situation where we needed to implement changes across a site to demonstrate that content was divided between the UK and US. Unfortunately, due to the CMS we were unable to implement the usual methods such as <head> changes using hreflang tags, sub folders, addresses in footers and everything in between. So we needed a solution that bypassed that completely but still allowed us to show search engines that particular content was meant for particular geographies.
In March 2012 Google introduced a new way of differentiating content geographically: through the sitemap.xml file. The method employs the rel="alternate" and hreflang="x" annotations, but within the XML file itself.
Let’s say we have a website called www.example.com (original, I know) that has two pages targeting two countries: www.example.com/english.html and www.example.com/usa.html. Each has the same content, with a few subtle differences. You can tell Google, via the sitemap.xml, which pages are equivalent in each country using the following syntax:
<url>
  <loc>http://www.example.com/english.html</loc>
  <xhtml:link rel="alternate" hreflang="en-us" href="http://www.example.com/usa.html" />
</url>
The part above says “hey, this is content for English speakers in the UK, but this other URL is for English speakers in the US”.
Then for the main USA page entry, you use the syntax:
<url>
  <loc>http://www.example.com/usa.html</loc>
  <xhtml:link rel="alternate" hreflang="en-gb" href="http://www.example.com/english.html" />
</url>
The above says “this URL has content for English speakers in the US, but this alternative URL has content for English speakers in the UK”.
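Putting the two entries together, a complete sitemap.xml for this example might look like the sketch below. One detail the snippets above don’t show: the xhtml namespace needs to be declared on the <urlset> element for the xhtml:link elements to be valid XML.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>http://www.example.com/english.html</loc>
    <xhtml:link rel="alternate" hreflang="en-us" href="http://www.example.com/usa.html" />
  </url>
  <url>
    <loc>http://www.example.com/usa.html</loc>
    <xhtml:link rel="alternate" hreflang="en-gb" href="http://www.example.com/english.html" />
  </url>
</urlset>
```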
The same goes for content in other countries. For example, you could target German speakers in Switzerland using de-ch. Or target English speaking users in Australia using en-au. Cool stuff.
For a site with hundreds, thousands or even millions of multilingual pages crying out for a solution like this, automating the process would be the answer. However, I’m not currently aware of a tool that does this – I would love to hear from anyone who does know of a sitemap.xml generator that accurately provides this solution.
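In the meantime, a rough sketch of how you might automate the build yourself. Everything here is hypothetical (the PAGE_GROUPS structure and the build_sitemap function are my own invention, not part of any existing tool) – it just assumes you can list each group of equivalent pages alongside their hreflang codes:

```python
# Sketch of a sitemap.xml generator with hreflang annotations.
# Assumption: each dict groups the country/language variants of one page.
from xml.sax.saxutils import escape

PAGE_GROUPS = [
    {
        "en-gb": "http://www.example.com/english.html",
        "en-us": "http://www.example.com/usa.html",
    },
]

def build_sitemap(groups):
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"',
        '        xmlns:xhtml="http://www.w3.org/1999/xhtml">',
    ]
    for group in groups:
        # Every variant gets its own <url> entry, and each entry lists an
        # alternate link for every variant in the group (self included).
        for url in group.values():
            lines.append("  <url>")
            lines.append("    <loc>%s</loc>" % escape(url))
            for code, alt_url in group.items():
                lines.append(
                    '    <xhtml:link rel="alternate" hreflang="%s" href="%s" />'
                    % (code, escape(alt_url))
                )
            lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines)

print(build_sitemap(PAGE_GROUPS))
```

Note the script emits an alternate link for every variant in each entry, including the page itself, so each entry carries the full set of equivalents.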
Hope you find the above useful!