Stop Scrapers From Hijacking Your Web Pages
This article is for innocent parties whose website or keyword rank has been hijacked, and it explains what you can do about it. Website URLs can be hijacked with 302 redirects, meta refresh redirects, nofollow meta tags, deceptive redirects, scraper directories and scam artists who use your content. This article shows you how to recognize when someone is hijacking your web page URL and/or its content, and how to stop scrapers.
NOTICE: The 302 website redirect problem isn't such an issue any more because search engines now "usually" know when someone has stolen your content. But just in case they get it wrong, this article may help you figure out how to stop it.
About 302 Website Redirects
If you are redirected every time you surf the internet on Windows, then it is probably not a 302 redirect but Malware, AdWare or Trojan Horses.
Lori, Thanks to your information about web site "hijacking" we were able to determine that "hijacking" was the cause of a 7-month long problem we've had with our Google rankings! Now we can take the appropriate measures to solve the problem!
Adam Berman, Latin Love Search.
URL hijacking with a 302 redirect tells the search engine that the page's information has moved temporarily to the hijacker's web site and that the information now belongs to the hijacker. Scrapers "scrape" content off your site and put it on their site to steal your keyword rank.
Once a website is hijacked with a malicious 302 redirect, or its content is scraped, the ranking value of the victim's page may quickly drop in Google, because Google often wrongly attributes the victim's content to the hijacker or scraper website. The problem needs to be removed before it destroys the web site completely.
Some 302 redirects are merely tracking URLs and are not a deliberate attempt to hijack your web page, so please read all of the following before taking action.
Often what people think are problems from a 302 redirect are just issues with their own code causing indexing problems, or something they are doing has recently been outlawed by Google and is causing a keyword ranking problem.
What Benefit is there in Hijacking or Scraping A Web Site?
The reason someone hijacks or scrapes your content is to scam you out of your traffic or your keyword ranking.
The 302 redirect hijacker gets the benefit of the ranking that you have worked so hard for. You may not only lose the rank your website has gained in the search engine, you may also be penalized for various reasons depending on the method the hijacker uses.
A scraper is a website that copies your whole website, or just one or more pages, and puts the copy on its own site because its owners are too lazy to write their own content. Google devalues copies, but when it mistakes your original for the copy, the scraper's website may be able to outrank your site for your own content. More info here: Stop Stolen Content.
See more information on How to Find a Hijacker
How to Stop Scrapers or Tell if your Website has been Hijacked:
1. ID Redirects
2. Meta Refreshes
3. 302 Redirects
5. Your Title in Their Title
6. Bad Links = Split Domain
If the link includes another site's URL and then your domain on the end, with some code in between, then your site may be redirected. Sometimes they use an ID number such as ID=4125 instead of the URL of your website, and their program maps that number to your URL, so if you see ID in the URL, click on it and see what happens. It may move so fast you can't see it, in which case you need to run the link through a server header checker (explained below).
Here is an example of one of these redirected URLs. The code before and after your URL will contain various forms of redirecting (bogus URLs are used below, with http removed, so this page validates):
HijackersWebSiteURLgoeshere/id-codecausingRedirect goes here.?site=
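As a rough illustration of the pattern described above, the following Python sketch flags links that live on someone else's host but carry your domain, or a bare numeric ID parameter, in the path or query string. The function name and the example addresses are hypothetical, and a flagged link is only a candidate for closer inspection, since many such links are harmless tracking URLs.

```python
from urllib.parse import urlsplit, parse_qsl, unquote

def looks_like_redirect_link(link, my_domain):
    """Return True if `link` is hosted elsewhere but embeds `my_domain`
    (given without "www.") or a numeric ID parameter."""
    # Tolerate links written without a scheme, as in the example above.
    parts = urlsplit(link if "://" in link else "http://" + link)
    host = (parts.hostname or "").lower()
    my_domain = my_domain.lower()
    if host == my_domain or host.endswith("." + my_domain):
        return False  # the link points straight at our own site
    tail = unquote(parts.path + "?" + parts.query).lower()
    if my_domain in tail:
        return True  # our address is embedded after their address
    # An opaque ID parameter (e.g. ID=4125) may stand in for our URL.
    return any(key.lower() == "id" and value.isdigit()
               for key, value in parse_qsl(parts.query))
```

Any link this flags still needs to be run through a server header checker before you draw conclusions.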
Sometimes the URL will look innocent but will go to the dishonest site and there will be code on that site that will redirect to your site with a Meta Refresh Tag (automatically redirecting the browser to your site). The Meta Refresh is a favorite of spammers so search engines may ban sites that use meta refreshes. To see if a site is using a Meta Refresh, view the code by clicking on view/source in your browser menu and see if there are any meta refresh tags in the code at the top of that page. Look for a meta redirect tag set for "0" seconds, that is redirecting to your site, similar to the following:
<meta http-equiv="refresh" content="0; url=http://www.YourOwnDomainNameHere">
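Instead of reading the page source by eye, the same check can be sketched in Python with the standard library's HTML parser. The class and function names here are made up for illustration, and the sketch only finds refresh tags present in the raw HTML, not redirects hidden by other means.

```python
from html.parser import HTMLParser

class MetaRefreshFinder(HTMLParser):
    """Collects every <meta http-equiv="refresh"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.refreshes = []  # list of (delay_text, target_url)

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        # The content attribute looks like "0; url=http://..."
        delay, _, rest = attr.get("content", "").partition(";")
        target = rest.strip()
        if target.lower().startswith("url="):
            target = target[4:].strip(" '\"")
        self.refreshes.append((delay.strip(), target))

def find_meta_refreshes(html_source):
    """Return the (delay, target) pairs found in `html_source`."""
    finder = MetaRefreshFinder()
    finder.feed(html_source)
    return finder.refreshes
```

A result with a delay of "0" and a target on your own domain matches the tag shown above.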
However there are ways of hiding a redirect so that the general viewer can't see it, as follows:
A hijacker may also set up 302 redirects in an .htaccess file telling the server to redirect to your site. A 302 redirect tells the search engines that their site, or page, has moved temporarily to your site and that the content of your site should be credited to their site, thus stealing your ranking. Many directories used to use 302 redirects on the links in their directory to track clicks, resulting in the same problem. However, most directories now use a search engine friendly 302 redirect so be sure to check all such links in a server header checker (see below).
You can't identify a 302 redirect just by viewing the URL; you need to copy the link into a server header checker. If the results show a 302 redirect, then the directory may be stealing your page rank (you can check this by searching for the main keywords on the page and seeing whether the other site ranks higher than your site). If the server header checker shows a 200 code, then the page is probably OK (if the check produces an error, make sure there are no breaks in the link).
One way to avoid unknowingly submitting to a site like this is to ALWAYS check a directory's links with a server header checker before submitting your link to their site. Otherwise the listing may cause your site irreparable harm instead of helping your page rank.
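For readers who would rather script the header check than use an online tool, here is a minimal Python sketch of what a server header checker does. It sends one HEAD request without following redirects and reports the status code and Location header. The function names and the wording of the messages are assumptions for illustration, and some servers answer HEAD requests differently from normal page views.

```python
import http.client
from urllib.parse import urlsplit

def fetch_status(url, timeout=10):
    """Send one HEAD request; http.client never follows redirects.
    Returns (status_code, Location header or None)."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()

def describe(status, location, my_domain):
    """Turn a status code and Location header into a short verdict."""
    target_host = (urlsplit(location).hostname or "") if location else ""
    points_at_us = (target_host == my_domain
                    or target_host.endswith("." + my_domain))
    if status == 200:
        return "200: page served directly, probably OK"
    if status == 302 and points_at_us:
        return "302: temporary redirect to your site, worth investigating"
    if status in (301, 302, 303, 307, 308):
        return f"{status}: redirect to {location}"
    return f"{status}: not a redirect"
```

Typical use would be `describe(*fetch_status(link), "yourdomain.example")`, where the domain is a placeholder for your own.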
Another method that dishonest webmasters use to benefit their own site and provide no benefit to your own site is to copy your pages and then install code that draws your whole web page (including images and working links) into a frame so your whole web page is displayed on their site. This is dishonest because they are using your design and content and displaying it as their own without your permission.
The reason they do it is to provide content for their site because they are too lazy to write their own articles. Search engines can't see it and thus you'll probably never know it's there. This also steals traffic from your site because people often see what they wanted on the scraper's web site without even visiting your web site. You also get no benefit of the ranking from that link because the link is inside a frame on their web site which most search engines can't read.
Following is a sample of this kind of url (http was removed and bogus URLs also removed so this page will validate):
www.YourWebSiteURLGoesHere.comandYourIDnumbergoes hereID=____, etc.
This recently happened to one of my web pages. I had updated the page, which brought it to #1 in Google for its main keywords, and within 2 days the page appeared on someone else's web site inside a frame. I found proof of this in my traffic monitor's referral URLs, quickly set up a "pop out of frame" script for that page and many others on my site, and wrote to the owner of the site asking them to remove the link.
Often scraper directories (that you may have submitted to yourself) will take your web page, if it's ranking well in the search engines, and set up a separate web page, often called "more info" or "details", with your business name and/or your URL in their title, along with the most important text on your page that brings ranking to your site. If your site is new or has very little ranking, this will enable them to outrank your website for your own business name.
If you have submitted your site to search engines as www.YourDomain.com and someone links to your site with just YourDomain.com, i.e., without the WWW in front, Google will think you have two sites under the same domain and may penalize your site with a duplicate content penalty. This is known as a Split Domain. You can prevent it by doing the following:
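The "pop out of frame" script mentioned above runs in the visitor's browser. A different, server-side approach to the same goal is to send response headers that ask browsers not to display the page inside another site's frame. The sketch below is not the script used on this site; it assumes your pages are served by a Python WSGI application, and the wrapper name is hypothetical.

```python
def forbid_framing(app):
    """Wrap a WSGI app so every response asks browsers not to show
    the page inside a frame on another site."""
    def wrapped(environ, start_response):
        def patched_start(status, headers, exc_info=None):
            headers = list(headers)
            # Older header, widely supported.
            headers.append(("X-Frame-Options", "SAMEORIGIN"))
            # Newer equivalent from Content Security Policy.
            headers.append(("Content-Security-Policy",
                            "frame-ancestors 'self'"))
            return start_response(status, headers, exc_info)
        return app(environ, patched_start)
    return wrapped
```

Headers like these only affect browsers that honor them, and they do not stop a scraper from copying the page text outright.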
- Add a canonical tag with the full address of that individual page, and do this on every page on your site (every URL needs to be unique to that page).
- Set up a 301 redirect in your .htaccess file so all traffic to YourOwnDomain.com will go to www.YourOwnDomain.com, or to whichever one has the most ranking.
- Change all relative URLs on your site to full URLs (include the full address on all internal links) EXCEPT for the home page (do not list index.html, just list the domain for all links to the home page or Google will think you have two home pages). See merging example.com and example.com/index.html
Hijackers are even starting to link to sites with malformed addresses, such as http/ (i.e., with the colon and two slashes left out) or www.domain.com followed by a space on the end. The space will show up in your site: command search as www.domain.com%20. So check every link coming into your site carefully for other new techniques designed to put you out of business.
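The rules in the list above can be summarized in a short Python sketch that computes where a 301 redirect should send a request: the www form of the host, with /index.html collapsed to the bare home page. The host name is a placeholder and the function name is hypothetical. On an Apache server the actual redirect would normally be written as .htaccess rules rather than Python.

```python
from urllib.parse import urlsplit, urlunsplit

# Placeholder: whichever form of your domain has the most ranking.
CANONICAL_HOST = "www.yourdomain.example"

def canonical_redirect(url):
    """Return the address a 301 should point to, or None when `url`
    is already canonical or belongs to some other site."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    bare_host = CANONICAL_HOST[4:] if CANONICAL_HOST.startswith("www.") else CANONICAL_HOST
    if host not in (CANONICAL_HOST, bare_host):
        return None  # not one of our host names
    path = parts.path or "/"
    if path == "/index.html":
        path = "/"  # avoid having two home pages
    target = urlunsplit((parts.scheme, CANONICAL_HOST, path,
                         parts.query, parts.fragment))
    return None if target == url else target
```

The same function can double as a check that the canonical tag on each page matches the address it is served from.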
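If you export your inbound links from a traffic or backlink report, the two malformations described above can be spotted with a few lines of Python. This is only a sketch: the function name and messages are invented for illustration, and it checks for these two tricks and nothing else.

```python
def link_problems(href):
    """Return a list of known malformations found in an inbound link."""
    problems = []
    lowered = href.lower()
    # "http/www.domain.com" instead of "http://www.domain.com"
    if lowered.startswith(("http/", "https/")):
        problems.append("scheme is missing the colon and slashes")
    # A trailing space, either literal or encoded as %20
    if href != href.strip() or lowered.rstrip("/").endswith("%20"):
        problems.append("trailing space after the address")
    return problems
```

An empty list means neither trick was found, not that the link is safe.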
A 302 Redirect Could be a Google Bug
Some of the above examples have been the result of a Google bug and not necessarily deliberate attempts to hijack your site or your ranking, because Google is following these redirects and attributing the resulting ranking to the offending site and then penalizing your site for duplicate content.
A very good step-by-step explanation of what happens with one of these redirected URLs can be found in Webmaster World's discussion on this matter in the Your Site is at risk from Hijackers thread. See post # 157 on page 11 of this thread for a very good explanation of other steps to take not listed here.
Google Bug via shared IP Address or Shared Hosting
Some of these Google bugs are a result of two sites that are both on the same shared IP address, i.e., one site sets up a 302 redirecting a page on their site to another page on their site, and your site is affected because it has the same IP address and Google thinks your site's content has moved to the other site. Sometimes Google will list the other site's title in your title if it's on the same IP address. I haven't seen this occur for several years, so Google may have solved this bug. If you experience this problem, the best way to avoid it is to get a dedicated IP address (usually only $1.00 more a month on a good host). Don't confuse this with a dedicated server, which is hundreds of dollars more per month.
All of the above 302 hijacking examples have one result: stealing your traffic or ranking, getting your web site penalized, and dropping it out of sight in Google, which translates into lost traffic and income for your business. Until Google fixes this 302 redirect bug completely, there are measures you can take to eliminate the problem.
© 11-17-04 - updated 12-12-2020
All rights Reserved
More Information on Website Hijacking
Other methods of hijacking websites
Dupe content checker - 302's - Page Jacking - Meta Refreshes
Webmaster World's Forum, Google News, discussing these hijackings.
Anti-Comment-Spam Tag Exploits. Set up Recip-checkers to look for rel=nofollow?
Lost in Google - No Title, No Description for your site as a result of 302 redirects, etc and what you can do about it.
Xencraft has a very good article on Preventing Web Site Hijacking.
Millions of Pages Google Hijacked via Open Directory feed