Panda Recovery: A Guide to Recovering from Google's Panda Update

Over the past two years, I’ve worked on about five projects where a company has come to me or Venture Harbour after being hit either by Google’s Panda or Penguin update. I also had one of my own side project websites get hit.

Out of these 5-6 cases, three have recovered. One is still in the process of re-inclusion, and the other two were fundamental business model problems that the client wouldn’t change (making the site ‘panda friendly’ would result in the business turning over a loss).

I wanted to create this post as a guide to anyone who has had their site hit by one of Google’s major algorithm updates and is seeking ideas on how they might be able to recover.

Step 1: Confirm which algorithm updates affected you

When conducting a Panda / Penguin site audit, the first thing I usually try to confirm is which updates affected the site’s traffic and by what magnitude (Moz.com’s algorithm history tool is very useful here). You’ll end up with something that looks like this:

More often than not, this exercise reveals that the situation is not as black and white as “we got hit by Panda”. A site may have been hit by one update, benefited from a data refresh, and then been hit again by another update. Knowing which specific update hit your site is incredibly useful for pinpointing the root cause of the penalty.
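If you have daily visit counts exported from your analytics tool, a rough way to line them up against update dates is to compare average traffic just before and just after each date. The sketch below assumes a simple dictionary of date to visits; the traffic figures, update names and dates are all made up for illustration, so swap in your own export and the real dates from the algorithm history tool.

```python
from datetime import date, timedelta
from statistics import mean

def change_around(traffic, update_day, window=7):
    """Percentage change in mean daily visits: `window` days from update_day vs. `window` days before it."""
    before = [traffic[update_day - timedelta(days=i)] for i in range(1, window + 1)
              if update_day - timedelta(days=i) in traffic]
    after = [traffic[update_day + timedelta(days=i)] for i in range(window)
             if update_day + timedelta(days=i) in traffic]
    if not before or not after:
        return None  # not enough data on one side of the date
    return (mean(after) - mean(before)) / mean(before) * 100

# Hypothetical data: 1,000 visits a day, dropping to 600 from 15 January onwards.
start = date(2013, 1, 1)
traffic = {start + timedelta(days=i): (1000 if i < 14 else 600) for i in range(28)}

# Hypothetical update dates, not real Panda or Penguin dates.
updates = {"Update A": date(2013, 1, 15), "Update B": date(2013, 1, 22)}

for name, day in updates.items():
    print(name, round(change_around(traffic, day), 1))
```

A large drop around one date and no change around another is a reasonable hint as to which update did the damage, although seasonality and tracking changes can produce the same pattern.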

For the purpose of this blog post, I’m going to assume that you were hit by the Panda updates. If you were hit by Penguin, I’d recommend reading Jonnie’s post about recovering from penguin, and then looking at the “manual actions” tab in Google Webmaster Tools to identify what’s causing the problem. You can then begin removing and disavowing anything that looks suspicious.

Step 2: Audit the site on every possible cause

When it comes to creating a panda recovery plan, I think it helps to take a bird’s-eye view of the situation, and assess everything that could be causing the penalty. This prevents you from jumping to conclusions like “we have a high bounce rate, we must fix this immediately!” when in actual fact, the penalty may be related to duplicate content, page speed, or something else. Here’s a pretty extensive list of technical factors which you’ll want to assess as a part of your panda audit.

Ad-content ratio
Duplicate Content (internal)
Duplicate Content (external)
Thin Content
Page Speed
Clean Design
Canonical Redirects
Grammar & Spelling
Unique Meta Data
Clear Site Architecture
Cross-browser compatibility
Broken Links
Title Tag Over-Optimisation
Rich Snippet Implementation

Many of these are derived from Google’s Panda recovery guidelines, which advise that a website must adhere to the following guidelines to be considered high quality.

Would you trust the information presented in this article?
Is this article written by an expert or enthusiast who knows the topic well, or is it more shallow in nature?
Does the site have duplicate, overlapping, or redundant articles on the same or similar topics with slightly different keyword variations?
Would you be comfortable giving your credit card information to this site?
Does this article have spelling, stylistic, or factual errors?
Are the topics driven by genuine interests of readers of the site, or does the site generate content by attempting to guess what might rank well in search engines?
Does the article provide original content or information, original reporting, original research, or original analysis?
Does the page provide substantial value when compared to other pages in search results?
How much quality control is done on content?
Does the article describe both sides of a story?
Is the site a recognized authority on its topic?
Is the content mass-produced by or outsourced to a large number of creators, or spread across a large network of sites, so that individual pages or sites don’t get as much attention or care?
Was the article edited well, or does it appear sloppy or hastily produced?
For a health related query, would you trust information from this site?
Would you recognize this site as an authoritative source when mentioned by name?
Does this article provide a complete or comprehensive description of the topic?
Does this article contain insightful analysis or interesting information that is beyond obvious?
Is this the sort of page you’d want to bookmark, share with a friend, or recommend?
Does this article have an excessive amount of ads that distract from or interfere with the main content?
Would you expect to see this article in a printed magazine, encyclopedia or book?
Are the articles short, unsubstantial, or otherwise lacking in helpful specifics?
Are the pages produced with great care and attention to detail vs. less attention to detail?
Would users complain when they see pages from this site?

I usually like to sum up this audit with a prioritised one-pager of actions. It can be tough knowing which fixes are more important than others, so I’m going to go through some of the things that I find typically appear at the top of the list.

Thin / Low Quality Content

Usually, the most justified reason for a Google Panda slap is if you’ve got a significant amount of thin content sitting on your site. These are pages that add little value to the user – or do not contain enough information. Below is a real example of this. When I searched for a “2 bedroom house for sale in East Sussex” I scrolled down to find this result ranking on page 9-10 of Google.

It clearly offers absolutely no value to the user, and therefore shouldn’t rank very well for that specific search term. To see whether this might be affecting your site, do a site search (type ‘site:www.yourdomain.com’ into Google) and see how many low quality pages come up. If none, great – onto the next thing!
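If you have a crawl of your own pages to hand, a crude first pass is to flag anything with very little visible text. This is only a sketch: the 300-word threshold is my own assumption rather than anything Google has published, the regex-based tag stripping is deliberately rough, and the example pages are invented.

```python
import re

def word_count(html):
    """Rough visible word count: drop script/style blocks and tags, then count words."""
    text = re.sub(r"(?is)<(script|style).*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", text)
    return len(re.findall(r"\w+", text))

def thin_pages(pages, min_words=300):
    """Return the URLs whose visible word count is below min_words (an assumed threshold)."""
    return sorted(url for url, html in pages.items() if word_count(html) < min_words)

# Hypothetical crawl output: URL -> raw HTML.
pages = {
    "/guides/buying-a-house": "<p>" + "word " * 400 + "</p>",
    "/search/east-sussex/2-bed": "<h1>2 bedroom house</h1><p>No results found.</p>",
}
print(thin_pages(pages))
```

Word count says nothing about quality on its own, so treat the output as a list of pages to look at by hand, not a list of pages to delete.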

Duplicate Content

Similarly to above, duplicate content adds very little value to the user, and is a major culprit for causing panda penalties. Broadly speaking, there are two kinds of duplicate content – external and internal.

External duplicate content is where multiple domains are showing basically the same content, and therefore Google will typically try to identify which website is the original source, or adds the most value. External duplicate content looks like this.

In my experience, if you have a low volume of duplicate pages, the best way to combat this type of duplicate content is to hire copywriters to write 300-500 unique words per page for you. However, sites that get hit by external duplicate content tend to be massive, so this solution often doesn’t scale affordably. In those instances, I’d recommend getting smart with your data. By pulling data from relevant APIs and databases you may be able to generate a unique and valuable paragraph on every page that helps your result stand out from the other copies. It’s not ideal, but it may help.
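To get a feel for how close two pages really are before deciding which ones to rewrite, you can compare them by overlapping word sequences. The sketch below uses Jaccard similarity over three-word shingles, which is a standard near-duplicate technique but is my own choice here, not a description of how Google measures duplication.

```python
import re

def shingles(text, k=3):
    """Set of all k-word sequences in the text, lowercased."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b, k=3):
    """Share of k-word shingles the two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

ours = "the quick brown fox jumps"
theirs = "the quick brown fox sleeps"
print(jaccard(ours, theirs))
```

Scores close to 1.0 suggest the pages are effectively copies. Where you draw the line is a judgment call for your own site.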

Internal duplicate content comes in many shapes and sizes. The most common is URL-based duplicate content, where different URLs show the same content. One example is the URL capitalisation issue. All of the following URLs on zildjian.com show the exact same content:

/Products/dRumset-Cymbals
/Products/dRuMsEt-CyMbaLs
/Products/DRUMSET-CYMBALS
/PRODUCTS/dRumset-Cymbals

As far as Google is concerned, if each of these URLs gets linked to, they can each be indexed and considered as different pages showing the same content i.e. duplicate content. Similarly, when a URL does not have proper canonicalisation between the www-version and non-www version of the website’s pages, this can also create similar issues.

https://yoursite.com should redirect to https://www.yoursite.com (or vice versa)

While all of these issues should be resolved using the correct server header responses, it’s also recommended that you have rel="canonical" tags implemented to ensure that Google knows which page they should be indexing whenever there is any confusion.
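As a rough illustration of the rule you would apply in a redirect or a template, here is a sketch that maps any variant of a URL to a single preferred form and builds the matching canonical tag. It assumes you have chosen the https www-version as canonical and that your server treats paths case-insensitively (as in the example above); www.yoursite.com is a placeholder and the function names are my own.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

def canonical_url(url, host="www.yoursite.com"):
    """Force https, the preferred host and a lowercase path; keep the query string."""
    parts = urlsplit(url)
    path = parts.path.lower() or "/"
    return urlunsplit(("https", host, path, parts.query, ""))

def canonical_tag(url):
    """The <link> element to place in the <head> of every variant of the page."""
    return '<link rel="canonical" href="%s">' % escape(canonical_url(url), quote=True)

print(canonical_url("https://yoursite.com/Products/DRUMSET-CYMBALS"))
print(canonical_tag("http://yoursite.com/PRODUCTS/dRumset-Cymbals"))
```

Only lowercase paths if your URLs genuinely are case-insensitive. On a server where /Page and /page are different documents, this would point the canonical at the wrong place.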

Ad-Content Ratio

When clients ask how much advertising they can get away with before they’re in panda territory, I usually advise to keep ads to a minimum until you’ve recovered. Long banner ads above the fold are an absolute no-no. To answer the question visually, this is ideal:

This is on the edge:

This is too much:

Unfortunately, if you want out of panda, you’re probably going to have to ditch the ads for the time being until the site recovers.

Page Loading Speed

Google have made it very clear that poor page loading time can hinder a website’s rankings. If that weren’t enough, there are plenty of case studies kicking around that demonstrate how page speed is linked to your conversion rate.

I recommend using Pingdom and GTMetrix to diagnose your site’s page loading speed. The general rule of thumb is to keep page loading time below 2 seconds. One thing to bear in mind when optimising page speed is that external widgets can be a pain in the backside. If you’re using lots of embedded widgets (email subscription forms, social plugins, content sliders etc.), these can often be the culprits slowing your site down. Keep the number of external DNS requests to a minimum where possible.
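One quick way to see how many third-party hosts a page depends on is to list the distinct external hostnames in its script, image, iframe and link tags. This sketch works on a saved copy of the page's HTML; the tag list and the example hostnames are my own assumptions, and it will miss anything loaded later by JavaScript.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

class ExternalHosts(HTMLParser):
    """Collects hostnames other than own_host from src/href attributes."""
    def __init__(self, own_host):
        super().__init__()
        self.own_host = own_host
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag not in ("script", "img", "iframe", "link"):
            return
        for name, value in attrs:
            if name in ("src", "href") and value:
                host = urlsplit(value).hostname  # None for relative URLs
                if host and host != self.own_host:
                    self.hosts.add(host)

def external_hosts(html, own_host):
    parser = ExternalHosts(own_host)
    parser.feed(html)
    return sorted(parser.hosts)

# Hypothetical page source.
page = (
    '<script src="https://widgets.example.com/a.js"></script>'
    '<img src="/logo.png">'
    '<iframe src="//social.example.org/like"></iframe>'
    '<link href="https://www.yoursite.com/style.css" rel="stylesheet">'
)
print(external_hosts(page, "www.yoursite.com"))
```

Each hostname in the result is at least one extra DNS lookup for a first-time visitor, so a long list is a good place to start trimming.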

Clean design

Clean design is quite a subjective topic – one man’s trash is another man’s treasure. In general, avoid too many contrasting colours, make sure there’s a lot of ‘white space’, and make it easy for users to navigate. Your site needs to look trustworthy and professional, not cheap and tacky. Which one of these websites would you trust with your credit card details more?

Over optimisation

When it comes to over-optimisation, I think the only thing you can really go too far with (before getting into the realms of black hat / keyword stuffing) is over-shooting with your title tags.

I don’t think title tag over optimisation is something that you’d ever be penalised for in isolation. However, if you’re already suffering from another major signal, and you’re also found to be stuffing your titles with every key term under the sun, it’s unlikely to help you in any way. In the screenshot below, I’d consider the top result okay, and the second result as over-doing it.
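If you want to scan a whole site for titles that might be over-doing it, a simple heuristic is to flag titles that are very long or that repeat the same term several times. The 60-character limit and the "more than two repeats" rule below are my own rules of thumb, not thresholds Google has stated, and the example titles are invented.

```python
import re
from collections import Counter

def title_warnings(title, max_length=60, max_repeats=2):
    """Return a list of reasons a title tag looks over-optimised (empty if none)."""
    warnings = []
    if len(title) > max_length:
        warnings.append(f"longer than {max_length} characters")
    # Ignore short words such as "and", "buy" or "for" when counting repeats.
    words = [w for w in re.findall(r"[a-z0-9]+", title.lower()) if len(w) > 3]
    repeated = sorted(w for w, n in Counter(words).items() if n > max_repeats)
    if repeated:
        warnings.append("repeated terms: " + ", ".join(repeated))
    return warnings

print(title_warnings("Running Shoes | Example Store"))
print(title_warnings("Cheap Shoes | Buy Cheap Shoes Online | Cheap Shoes UK | Cheap Running Shoes"))
```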

Clear Site Architecture

There are a handful of reasons why you’ll want to fix up your site architecture following a panda hit. First of all, a clear site structure will improve your indexation – enabling Google to spot your changes and update their index in less time. It will also help your users navigate the site better, which should be noticeable in the behavioural factors (reduced bounce rate, longer time on site etc).

The rules of thumb when it comes to site architecture are:

No page should have more than 100 links (internal & external combined)
No page should be more than 3 clicks from the homepage (i.e. your site’s hierarchy should be as flat as possible)
Navigation should be clear and intuitive for the user
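The first two rules are easy to check mechanically if you can export a list of links per page from a crawler. The sketch below assumes a dictionary that maps every internal page to the links found on it (internal and external), and uses a breadth-first search from the homepage to work out click depth. The data structure and the example site are hypothetical.

```python
from collections import deque

def click_depths(links, home="/"):
    """Fewest clicks from the homepage to each reachable internal page (breadth-first search)."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            # Only follow internal pages, i.e. those that are keys in `links`.
            if target in links and target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

def audit(links, home="/", max_links=100, max_depth=3):
    """Return (pages with too many links, pages too deep, pages not reachable from home)."""
    depths = click_depths(links, home)
    too_many_links = sorted(p for p, out in links.items() if len(out) > max_links)
    too_deep = sorted(p for p, d in depths.items() if d > max_depth)
    unreachable = sorted(set(links) - set(depths))
    return too_many_links, too_deep, unreachable

# Hypothetical crawl: page -> links found on that page.
links = {
    "/": ["/a", "/hub"],
    "/a": ["/b"],
    "/b": ["/c"],
    "/c": ["/d"],
    "/d": [],
    "/hub": ["https://ext.example/%d" % i for i in range(101)],
    "/orphan": ["/"],
}
print(audit(links))
```

The third rule, clear and intuitive navigation, still needs a human to judge.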

You’ll also want to ensure that your site isn’t riddled with broken links and 404 pages. This not only makes navigating the site a pain for the user, but it also hinders indexation. To check for broken links and 404s, I’d recommend running a Screaming Frog or Xenu crawl of the site, and checking your Webmaster Tools ‘crawl errors’ section.

XML & HTML Sitemaps

To improve your indexation, it’s recommended that you submit an HTML and XML sitemap to Google in Webmaster Tools. One tip that I’ve used in the past to identify which particular sections of a site are being hit by a Panda update is to put each of your website’s page categories into a separate sitemap, so that you can see the indexation ratio of each page category.

For example:

Swimwear Sitemap – 3,120 pages submitted / 2,492 pages indexed
Sunglasses Sitemap – 8,429 pages submitted / 1,231 pages indexed
Women’s Shoes Sitemap – 1,412 pages submitted / 1,313 pages indexed
Blog Sitemap – 724 pages submitted / 701 pages indexed

By segmenting our sitemaps into four category-specific ones, it’s clear that the sunglasses section is what’s causing us problems. While panda hits are typically site-wide, they do tend to hit certain sections of a site harder than others. This is a good way of confirming the epicentre of the problem.
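The comparison is just indexed pages divided by submitted pages for each sitemap. Here is a small sketch using the example figures above; for Swimwear it assumes the larger of the two numbers is the submitted count, since a sitemap can't have more pages indexed than were submitted.

```python
# (submitted, indexed) per category sitemap, taken from the example figures.
sitemaps = {
    "Swimwear": (3120, 2492),  # assumes the larger figure is the submitted count
    "Sunglasses": (8429, 1231),
    "Women's Shoes": (1412, 1313),
    "Blog": (724, 701),
}

def indexation_ratios(sitemaps):
    """Fraction of submitted pages that are indexed, per sitemap."""
    return {name: indexed / submitted for name, (submitted, indexed) in sitemaps.items()}

ratios = indexation_ratios(sitemaps)

# List the worst-indexed section first.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name}: {ratio:.0%}")
```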

Grammar & Spelling

I think this is pretty self-explanatory. If you’re worried your site may have poor spelling and grammar (a typical issue with user-generated content sites), it’s worth using Grammarly to highlight all of the suspect issues. Alternatively, if there’s a link between your thin / low quality content and poor grammar, you may be better off culling those pages or combining them into a more comprehensive and well written piece of content.

Behavioural Factors

It’s been suggested that having an unusually low click through rate from search results is a probable factor taken into account by the post-panda algorithm. This once again highlights the importance of having compelling title tags and descriptions, but also brings up the opportunity for using rich snippets to increase your CTR.

Which one of these is more compelling to click on?

There are several different options for marking your content up like this, but I’d recommend using Schema.org markup, as it’s probably the most universal and comprehensive option. You can also connect your page with your Google+ publisher and author profile, to show your staff’s faces alongside search results.
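As a minimal sketch of what this can look like, the snippet below builds Schema.org markup for a product with a star rating, using JSON-LD (one of the syntaxes Schema.org supports). The product name and rating figures are invented, the helper function is my own, and whether Google actually shows a rich snippet for your page is entirely up to them.

```python
import json

def product_jsonld(name, rating, review_count):
    """Return a JSON-LD script tag describing a product and its aggregate rating."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": rating,
            "reviewCount": review_count,
        },
    }
    return '<script type="application/ld+json">' + json.dumps(data) + "</script>"

# Hypothetical product and review figures.
markup = product_jsonld("Example Sunglasses", 4.5, 27)
print(markup)
```

Only mark up ratings and reviews that are genuinely visible on the page.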

Step 3: Fix everything and wait!

Once you’re absolutely sure that there is nothing more that you can do to improve the website, it’s time to sit back and wait. In the past, I’ve noticed that when you’re making these changes you will see slight increases in traffic and site performance, but nothing significant. It’s not until Google re-run the algorithm in a data refresh that you leap up to pre-panda traffic levels. The last time this was refreshed was around July 18th.

Conclusion

I wrote this very long article because I know how it feels to have a site hit by panda; it sucks. I hope this has sparked a few actionable ideas that might help you to get things back on track. Of course, if you have any questions feel free to drop me a tweet, send me an email, or give our office a call. I’d be more than happy to help.

Good luck!
Marcus
