Today’s post is a little more technical than usual, but I’m sure some of our technical readers will get a lot of value out of it.
In the world of SEO, everybody is trying to move ahead and gain an advantage. Because of this, some webmasters are willing to use unethical tactics. Sometimes they manipulate their own websites to get an advantage. Sometimes, however, they manipulate other people’s websites as well.
One of these manipulation techniques works through the query strings of your website. Let me explain:
If your website uses a content management system, it probably relies on query strings to decide which page to serve. Depending on how your server and CMS are configured, this can sometimes be exploited.
In this case the exploitation happens in the following manner:
1. The spammer creates a link to a non-existent page on your website of the form: http://www.yourwebsite.com/?referrer=www.thespammerssite.com
2. The spammer hopes that your CMS is configured in such a way that when Google visits this URL, it sends the spider on to the spammer’s website with a 301 redirect.
3. If that works, you end up passing link juice, or link value, to the spammer’s website.
How do we prevent this from happening?
Well, if you are using an IIS server, the solution might be a little more complex to find. Here I am going to explain the solution for an Apache server. The techniques explained here shouldn’t be used by beginners, because a mistake in your .htaccess file can render your website unavailable. I also always recommend making a backup of the files you are going to change before starting.
First, open the .htaccess file in the root directory of your website. If the file does not already contain a RewriteEngine On line, add one. Then add the following lines below it:
RewriteCond %{QUERY_STRING} referrer=
RewriteRule (.*) https://www.yoursite.com/$1? [R=301,L]
The first line is the condition. It means: if the query string of the request contains the text “referrer=”, go ahead and apply the RewriteRule that follows.
In other words, any request whose query string includes “referrer=” meets the condition for applying the rewrite.
The second line is, of course, the rewrite rule itself. It starts with the keyword RewriteRule, followed by a space. The first parameter is a pair of parentheses with a dot and an asterisk inside.
In regular expressions, the dot matches any single character. The * means repeat the preceding element (in this case the dot) zero or more times. Together they match any sequence of characters. In this particular example, that means the pattern matches the entire requested path. Note that RewriteRule tests its pattern against the path only; the query string is checked by the RewriteCond line.
After that we have another space. The second parameter is the target of the rewrite, that is, the URL to redirect to. In this case:
https://www.yoursite.com/$1?
The https://www.yoursite.com part sends the visitor to your own domain name, and /$1 adds the requested path. The dollar sign followed by the number one acts as a placeholder for whatever the expression in the parentheses matched. For example, if the request was /this-and-that?referrer=bad-actor.com, then $1 would hold the path this-and-that. In an .htaccess file the leading slash is not part of the match, which is why the rule supplies its own slash before $1.
The $1 refers to what we matched in the first parameter, because we wrapped it in parentheses. The question mark (?) at the end eliminates any query string.
The final parameter is the set of flags for this rewrite rule. The first flag, R=301, issues a redirect with HTTP status code 301, which means a permanent redirect. The second flag is an “L”, which means this is the last rule that we want to apply to the request.
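Putting the pieces together, here is a minimal sketch of how the whole block might look in an .htaccess file. It assumes that mod_rewrite is enabled on your server and that the file sits in your site's root directory; www.yoursite.com is the same placeholder domain used above. The commented-out last line is an optional alternative and not part of the original two lines.

```apache
# Assumes mod_rewrite is enabled and this file is in the site's root directory.
# www.yoursite.com is a placeholder; replace it with your own domain.
RewriteEngine On

# Condition: the query string contains "referrer=" anywhere.
RewriteCond %{QUERY_STRING} referrer=

# Rule: capture the requested path and redirect to the same path on your own
# domain. The trailing "?" drops the original query string.
RewriteRule (.*) https://www.yoursite.com/$1? [R=301,L]

# On Apache 2.4 or later, the QSD flag is an alternative to the trailing "?":
# RewriteRule (.*) https://www.yoursite.com/$1 [R=301,L,QSD]
```

Because the redirected URL no longer carries a query string, the condition does not match on the second request, so the rule should not loop.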
And that’s it.
Of course, in this case we are only dealing with the referrer query string, so keep an eye on your log files to identify which other query strings spammers are trying to use.
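If your logs turn up other parameter names, one condition can cover several of them. The sketch below is only an illustration: "ref" and "redirect" are hypothetical names picked for the example, not ones taken from real logs, so swap in whatever you actually find. It also anchors the match to the start of a parameter, which is a bit stricter than the plain "referrer=" test above.

```apache
RewriteEngine On

# "referrer" comes from this post; "ref" and "redirect" are hypothetical
# examples. Replace them with the parameter names found in your own logs.
# (^|&) matches only at the start of the query string or right after an "&",
# so a parameter such as "myreferrer=" does not trigger the rule.
# [NC] makes the comparison case-insensitive.
RewriteCond %{QUERY_STRING} (^|&)(referrer|ref|redirect)= [NC]

# Same rule as before: redirect to the same path without any query string.
RewriteRule (.*) https://www.yoursite.com/$1? [R=301,L]
```

Before adding a name to the list, check that your own site or your CMS does not use that parameter for something legitimate, because the rule removes the entire query string from any request that matches.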