Beginner’s Guide to Preventing Blog Content Scraping in WordPress

If you write original content day in and day out, you are probably already aware that your posts end up on a bunch of spam sites within a few days, sometimes even within minutes. Some users have even noticed that the site with the stolen content outranked the original post. As a website owner, it is very frustrating to see someone stealing your content without permission, monetizing it, outranking you in the SERPs, and stealing your audience. Nowadays content scraping is a huge problem because it is so easy for someone to steal your content. In this article, we will cover what blog content scraping is, how to catch content scrapers, how to deal with them, how to reduce and prevent content scraping, how to take advantage of content scrapers and even make money from them, and whether content scraping is ever good.

What is Blog Content Scraping?

Blog content scraping is performed with scripts that extract content from numerous sources and pull it into one site. Setting up a WordPress site is very easy: anyone can install one, put a free or commercial theme on it, and install a few plugins that go out, scrape content from selected blogs, and publish it on their site.
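To illustrate why feeds make scraping so easy, here is a minimal Python sketch (not a WordPress plugin) that reads every item out of an RSS 2.0 document using only the standard library. The sample feed and the function name are hypothetical. It prefers the content:encoded element, which WordPress includes when the feed is set to full text, and falls back to description otherwise.

```python
import xml.etree.ElementTree as ET

# Namespace used by the <content:encoded> element in RSS 2.0 feeds.
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"

def extract_feed_items(rss_xml: str) -> list[dict]:
    """Return title, link and body text for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        # Full-text feeds carry the whole post in content:encoded;
        # otherwise fall back to the <description> element.
        body = item.findtext(CONTENT_NS + "encoded") or item.findtext("description", default="")
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "body": body,
        })
    return items

# Hypothetical sample feed used for illustration only.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example Blog</title>
  <item>
    <title>My Original Post</title>
    <link>https://example.com/my-original-post/</link>
    <description>Every word of the post is right here.</description>
  </item>
</channel></rss>"""

if __name__ == "__main__":
    for entry in extract_feed_items(SAMPLE_FEED):
        print(entry["title"], "->", entry["link"])
```

A scraper plugin does essentially this on a schedule and then republishes each body as a new post, which is why the rest of this article focuses on what ends up inside your feed.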

Why are they Stealing my Content?

Users who notice that their content is being stolen naturally wonder why anyone would steal it. The simple answer is because you are AWESOME. The truth, however, is that these content scrapers have ulterior motives. Here are a few reasons why someone would scrape your content:

  • Affiliate commission – Some dirty affiliate marketers want to exploit the system to make a few extra bucks. They use your content and other people’s content to bring search engine traffic to their site. These sites are usually targeted toward a specific niche, so they have related products to promote.
  • Lead Generation – We generally see lawyers and realtors doing this. They want to seem like industry leaders in their small communities. They do not have the bandwidth to produce quality content, so they scrape content from other sources. Sometimes they are not even aware of it, because they are paying some scumbag $30/month to add content and help them get better SEO. We have encountered quite a few of these in the past.
  • Advertising Revenue – Some folks just want to create a “hub” of knowledge: a one-stop shop for users in a specific niche. If we had a penny for every time someone has done this with our content, we would have a few hundred pennies. We often notice our content being scraped this way, and when we contact the scraper, the reply is always the same: “I was doing this for the good of the community.” Except the site is plastered with ads.

These are just a few of the reasons why someone would steal your content.

How to Catch Content Scrapers?

Catching content scrapers is not an easy task, and it can take up a lot of time. Here are a few methods you can use to catch them.

Search Google with Your Post Titles

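Searching Google for your post titles is the simplest method, but it is probably not worth the effort, especially if you write about a very popular topic, where many legitimate pages will share similar titles.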
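If you do want to try it, wrapping the title in quotes asks Google for the exact phrase, and the -site: operator drops results from your own domain. The Python sketch below only builds the search URLs for you to open in a browser; it does not fetch results. The function name and example.com domain are hypothetical.

```python
from urllib.parse import urlencode

def title_search_url(post_title: str, own_domain: str) -> str:
    """Build a Google search URL for an exact-match title, excluding your own site."""
    # Quotes request the exact phrase; -site: removes results from your domain.
    query = f'"{post_title}" -site:{own_domain}'
    return "https://www.google.com/search?" + urlencode({"q": query})

if __name__ == "__main__":
    for title in ["My Original Post", "Another Popular Guide"]:
        print(title_search_url(title, "example.com"))
```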

Trackbacks

If you add internal links to your posts, you will receive a trackback whenever a site steals your content. In effect, the scraper tells you that they are scraping your content. If you use Akismet, many of these trackbacks will show up in the spam folder. Again, this only works if you have internal links in your posts.
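Before acting on a suspicious trackback, you may want to confirm that the source page really is a copy of your post rather than a legitimate mention. One rough way, sketched below in Python, is to compare the word sequences of the two texts with difflib. The function names are hypothetical, and any cut-off you choose (for example, treating scores above 0.8 as copies) is an assumption to tune, not a standard.

```python
import difflib
import re

def _words(text: str) -> list[str]:
    # Lowercase and keep only word characters so punctuation and case matter less.
    return re.findall(r"\w+", text.lower())

def copy_ratio(original: str, suspect: str) -> float:
    """Return a 0..1 similarity score between your post text and a suspect page's text."""
    # autojunk=False stops difflib from ignoring frequent words in long posts.
    matcher = difflib.SequenceMatcher(None, _words(original), _words(suspect), autojunk=False)
    return matcher.ratio()

if __name__ == "__main__":
    print(copy_ratio("Hello world, this is my post.", "hello WORLD this is my post"))
```

A score near 1.0 means the suspect text is essentially your post; a quote or short excerpt will score much lower because most of your words are missing.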

Webmaster Tools

If you use Google Webmaster Tools, you are probably aware of the “Links to your site” page, which you will find under “Traffic”. Chances are your scrapers will be among the top linking sites there. They will have hundreds, if not thousands, of links to your pages (assuming you use internal links).
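If you copy the linking page URLs out of that report into a list, a few lines of Python can rank the domains by how many of your links they carry. This sketch assumes a plain list of URLs, one per entry; the export format and the function name are assumptions, not part of Webmaster Tools itself.

```python
from collections import Counter
from urllib.parse import urlparse

def top_linking_domains(linking_urls, n=10):
    """Count linking pages per domain and return the n busiest domains."""
    counts = Counter()
    for url in linking_urls:
        host = urlparse(url.strip()).hostname
        if host:
            # Treat www.example.com and example.com as the same site.
            counts[host.removeprefix("www.")] += 1
    return counts.most_common(n)

if __name__ == "__main__":
    sample = [
        "http://scraper.example/copied-post-1/",
        "https://www.scraper.example/copied-post-2/",
        "http://blog.example.org/a-real-mention/",
    ]
    print(top_linking_domains(sample))
```

A domain with an unusually high count relative to its size is a good candidate to check by hand.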

FeedBurner Uncommon Uses

If you have set up FeedBurner for your WordPress blog, you can check its “Uncommon Uses” report. In the Analyze tab under Feed Stats, you will see “Uncommon Uses”, which lists the sites where your feed content is showing up.

How to Deal with Content Scrapers

There are a few approaches people take when dealing with content scrapers: the Do Nothing approach, the Kill Them All approach, and the Take Advantage of Them approach.

The Do Nothing Approach

This is the easiest approach you can take. The most popular bloggers usually recommend it because fighting scrapers takes A LOT of time. This approach simply says: “instead of fighting them, spend your time producing even more quality content and having fun”. Obviously, well-known blogs like Smashing Magazine, CSS-Tricks, Problogger, and others do not have to worry about scrapers, because they are authority sites in Google’s eyes.

During the Panda update, we know that some good sites were flagged as scrapers because Google thought the scrapers’ copies were the original content. So in our opinion, this approach is not always the best.

Kill them all Approach

This is the opposite of the Do Nothing approach. In this approach, you contact the scraper and ask them to take the content down. If they refuse or do not reply to your requests, you file a DMCA (Digital Millennium Copyright Act) complaint with their host. In our experience, the majority of scraping websites do not have a contact form. If they do, use it. If they do not, you need to do a Whois lookup.

You can find the contact info under the administrative contact; the administrative and technical contacts are usually the same. The Whois record also shows the domain registrar. Most well-known web hosting companies and domain registrars have DMCA forms or email addresses. The nameservers often reveal the host: for example, if a scraper’s nameservers belong to HostGator, the site is probably hosted there, and HostGator has a form for DMCA complaints. When the nameserver is something like ns1.theirdomain.com, you have to dig deeper by doing reverse IP lookups and searching for the IPs.
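If you run Whois lookups from a terminal, you can paste the raw output into a small helper that pulls out the fields you need for a takedown. This Python sketch is hypothetical: field labels differ between registries, so the keys it looks for ("Registrar", "Name Server", "nserver", anything ending in "Email") are assumptions based on common formats, and you should still read the full record.

```python
def parse_whois(raw: str) -> dict:
    """Pull registrar, name servers and contact emails out of raw whois text."""
    info = {"registrar": None, "name_servers": [], "emails": []}
    for line in raw.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # Not a "Key: value" line.
        key, value = key.strip().lower(), value.strip()
        if not value:
            continue
        if key == "registrar" and info["registrar"] is None:
            info["registrar"] = value
        elif key in ("name server", "nserver"):
            # Name servers often reveal the hosting company.
            info["name_servers"].append(value.lower())
        elif key.endswith("email") and value not in info["emails"]:
            # Includes admin, tech and registrar abuse contacts.
            info["emails"].append(value)
    return info

# Hypothetical whois output for illustration only.
SAMPLE_WHOIS = """Domain Name: SCRAPER-EXAMPLE.COM
Registrar: Example Registrar, Inc.
Registrar Abuse Contact Email: abuse@registrar.example
Name Server: NS1.HOSTGATOR.COM
Name Server: NS2.HOSTGATOR.COM
Admin Email: owner@scraper.example"""

if __name__ == "__main__":
    print(parse_whois(SAMPLE_WHOIS))
```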

You can also use a third-party service such as DMCA.com for takedowns.
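If you write the complaint yourself, it helps to keep a reusable template. The Python sketch below assembles a plain-text request from a few hypothetical fields; the wording is illustrative only and follows the statements a DMCA notice typically contains (the original work, the infringing copy, a good-faith statement, an accuracy statement under penalty of perjury, and your signature and contact details). Many hosts provide their own form, in which case use theirs.

```python
from textwrap import dedent

def dmca_notice(your_name, your_email, original_url, infringing_url):
    """Assemble a plain-text takedown request from a few hypothetical fields."""
    return dedent(f"""\
        To the DMCA agent,

        Original work: {original_url}
        Infringing copy: {infringing_url}

        I have a good faith belief that the use of this material is not
        authorized by the copyright owner, its agent, or the law.
        The information in this notice is accurate, and under penalty of
        perjury, I am the owner, or authorized to act on behalf of the owner,
        of the copyright that is allegedly infringed.

        Signed: {your_name}
        Contact: {your_email}
        """)

if __name__ == "__main__":
    print(dmca_notice("Jane Blogger", "jane@example.com",
                      "https://example.com/my-post/",
                      "https://scraper.example/my-post-copy/"))
```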

Jeff Starr, in his article, suggests that you block the bad guys’ IPs. Check your server logs for their IP address, and then block it with something like this in your root .htaccess file:

.htaccess
# Placeholder IP; Apache 2.2 syntax (on 2.4 this needs mod_access_compat)
Deny from 203.0.113.45
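To find the IP address to put in that rule, one approach is to count which clients request your feed most often. The Python sketch below assumes your host gives you an Apache access log in the common or combined format and that your feed lives under /feed (the WordPress default with pretty permalinks); the function name and sample lines are hypothetical.

```python
import re
from collections import Counter

# Captures the client IP and request path from an Apache common/combined log line.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[A-Z]+ (\S+)')

def top_feed_clients(log_lines, feed_prefix="/feed", n=5):
    """Count feed requests per client IP, busiest first."""
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and match.group(2).startswith(feed_prefix):
            hits[match.group(1)] += 1
    return hits.most_common(n)

if __name__ == "__main__":
    sample = [
        '203.0.113.45 - - [10/Oct/2023:13:55:36 +0000] "GET /feed/ HTTP/1.1" 200 512 "-" "Bot"',
        '203.0.113.45 - - [10/Oct/2023:14:55:36 +0000] "GET /feed/ HTTP/1.1" 200 512 "-" "Bot"',
        '198.51.100.7 - - [10/Oct/2023:13:56:00 +0000] "GET /about/ HTTP/1.1" 200 900 "-" "Reader"',
    ]
    print(top_feed_clients(sample))
```

Keep in mind that legitimate feed readers also poll your feed often, so check an IP (for example with a reverse lookup) before blocking it. To run this on a real file, pass the open file object, since iterating over it yields one line at a time.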

You can also redirect them to a dummy feed by doing something like this:

.htaccess
# Placeholder IP; replace with the scraper's address
RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^203\.0\.113\.45$
RewriteRule .* http://dummyfeed.com/feed [R,L]

You can get really creative here, as Jeff suggests. Send them really large text feeds full of Lorem Ipsum. Send them some disgusting images of bad things. Or send them right back to their own server, causing an infinite loop that can crash their site.
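If you want to host your own Lorem Ipsum feed to redirect scrapers to, the Python sketch below generates one as an RSS 2.0 document with the standard library. The channel title, the example.com link, and the item counts are hypothetical; you would save the output as a static file on a server you control and point the redirect at it.

```python
import xml.etree.ElementTree as ET

LOREM = ("Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do "
         "eiusmod tempor incididunt ut labore et dolore magna aliqua. ")

def dummy_feed(num_items=50, repeats=200):
    """Return an RSS 2.0 document whose items contain nothing but Lorem Ipsum."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Dummy Feed"
    ET.SubElement(channel, "link").text = "http://example.com/"
    ET.SubElement(channel, "description").text = "Nothing to see here"
    for i in range(num_items):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = f"Lorem Ipsum {i + 1}"
        # Repeating the paragraph makes each item as large as you like.
        ET.SubElement(item, "description").text = LOREM * repeats
    return ET.tostring(rss, encoding="unicode")

if __name__ == "__main__":
    with open("dummy-feed.xml", "w", encoding="utf-8") as fh:
        fh.write(dummy_feed())
```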

The last approach, and the one we take, is to take advantage of them.

How to Take Advantage of Content Scrapers

This is our approach to dealing with content scrapers, and it works quite well. It helps our SEO and helps us make a few extra bucks. The majority of scrapers use your RSS feed to steal your content, so here are some of the things you can do:

  • Internal Linking – You need to interlink the CRAP out of your posts. With the internal linking feature in WordPress 3.1, it is easier than ever. First, internal links in your articles help increase pageviews and reduce the bounce rate on your own site. Secondly, they get you backlinks from the people who are stealing your content. Lastly, they let you steal the scrapers’ audience. If you are a talented blogger, you understand the art of internal linking: place your links on interesting keywords and make them tempting to click. If you do that, the scraper’s audience will click on them too. Just like that, you take a visitor from their site and bring them back to where they should have been in the first place.
  • Auto Link Keywords with Affiliate Links – A few plugins, like Ninja Affiliate and SEO Smart Links, automatically replace assigned keywords with affiliate links. For example: HostGator, StudioPress, MaxCDN, Gravity Forms << these will all be auto-replaced with affiliate links when this post goes live.
  • Get Creative with RSS Footer – You can use either the RSS Footer plugin or the WordPress SEO by Yoast plugin to add custom items to your RSS footer. You can add just about anything you want here. We know some people who like to promote their own products to their RSS readers, so they add banners. Guess what: those banners now appear on the scrapers’ websites as well. In our case, we always add a little disclaimer at the bottom of the posts in our RSS feed. This gets us a backlink to the original article from the scraper’s site, which tells Google and other search engines that we are the authority. It also lets the scraper’s users know that the site is stealing our content. If you are good with code, you can go totally nuts, for example by adding related posts just for your RSS readers, and a bunch of other stuff. Check out our guide to completely manipulating your WordPress RSS feed.
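The RSS footer trick boils down to a simple string transformation: every feed item gets an attribution line with links back to the original post and your site, so any scraper that republishes the item also republishes those links. The plugins above do this inside WordPress; the Python sketch below only illustrates the idea, and its function name and footer wording are hypothetical.

```python
from html import escape

def add_feed_footer(content_html, post_title, post_url, site_name, site_url):
    """Append an attribution line linking back to the original post and site."""
    # Escape values so titles like "Tips & Tricks" produce valid HTML.
    footer = (
        f'<p>The post <a href="{escape(post_url)}">{escape(post_title)}</a> '
        f'appeared first on <a href="{escape(site_url)}">{escape(site_name)}</a>.</p>'
    )
    return content_html + footer

if __name__ == "__main__":
    print(add_feed_footer("<p>Body of the post.</p>", "Tips & Tricks",
                          "https://example.com/tips/", "Example Blog",
                          "https://example.com/"))
```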

How You Can Reduce Blog Content Scraping and Possibly Prevent It

If you take our approach of heavy internal linking, affiliate links, RSS banners and so on, chances are you will reduce content scraping considerably. If you follow Jeff Starr’s suggestion of redirecting content scrapers, that will stop those scrapers too. Beyond what we have shared above, there are a few other tricks you can use.

Full vs. Summary RSS Feed

There has long been a debate in the blogging community about whether to offer a full RSS feed or a summary RSS feed. We will not go into much detail about that debate; however, one of the pros of a summary-only feed is that it discourages content scraping, because scrapers only get an excerpt instead of the whole post. You can change this setting in your WordPress admin panel under Settings » Reading: set “For each article in a feed, show” to Summary.

Note: We use a full feed because we care more about our RSS readers than about the spammers.
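To see what a scraper actually receives from a summary feed, the Python sketch below mimics the idea of an excerpt: strip the markup and keep only the first few words. It is an illustration, not WordPress’s own excerpt code; the 55-word default mirrors WordPress’s default excerpt length, while the function name and the " [...]" marker are assumptions.

```python
from html.parser import HTMLParser

class _TextCollector(HTMLParser):
    """Collects text nodes while ignoring the tags around them."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def feed_summary(content_html, max_words=55):
    """Strip markup and keep only the first max_words words, like an excerpt."""
    parser = _TextCollector()
    parser.feed(content_html)
    parser.close()
    # Join text nodes with spaces so words from adjacent blocks do not merge.
    words = " ".join(parser.parts).split()
    if len(words) <= max_words:
        return " ".join(words)
    return " ".join(words[:max_words]) + " [...]"

if __name__ == "__main__":
    print(feed_summary("<p>One <b>two</b> three four</p>", max_words=2))
```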

Trackback SPAM

Trackbacks and pingbacks definitely had great uses; however, they are now constantly being abused. Many themes display trackbacks and pingbacks under or among the comments. This gives spammers an incentive to scrape your site and send trackbacks: if you mistakenly approve one, they get a backlink and a mention from your site.

Is Content Scraping Ever Good?

Yes, it can be. If you are making money from the scraper’s site, or you see a lot of traffic coming from it, then sure, it can be. In most cases, however, it is not, and you should always try to get your content taken down. But as your blog grows, you will realize that it is almost impossible to keep track of every content scraper. We still send out DMCA complaints, but we know there are tons of other sites stealing our content that we simply cannot keep up with.

We hope this article helps you deal with content scrapers.