how to use Apache’s mod_rewrite to counter blog content piracy

I was eyeballing which sites refer traffic to this one when I noticed www.domain.example.com among them. The rest of the traffic comes from Google Search or from visitors landing directly on the main page or the feed. I copied and pasted the site name into Firefox and found the full content of a post I had written earlier this morning, filed on that site as if it were its very own! The only attribution to this site, the original author, is an ‘original post’ link at the bottom of the copied post.

A quick Google search turned up many discussions of this kind of content piracy, so it is not as unusual as I initially thought. Since www.domain.example.com is a Google AdSense publisher, it is violating the recently effective AdSense policy changes. I reported it to Google AdSense by email, as instructed on its blog.

Meanwhile, since I am not sure how long it will take Google AdSense, or even Google Search, to come to our rescue, I want to do something on my end. This site runs on the Apache web server, which has rather powerful URL rewriting capabilities through mod_rewrite. I gave it a spin, and here are the rules I came up with:

<IfModule mod_rewrite.c>
RewriteEngine On

# Forbid (403) image requests referred from the pirate site.
# Dots are escaped so they match literal dots in the host name.
RewriteCond %{HTTP_REFERER} www\.domain\.example\.com [NC]
RewriteRule \.(png|gif|jpeg|jpg|bmp)$ - [F,L,NC]
</IfModule>

Basically, the rules make the server respond with a 403 (Forbidden) status code whenever an image is requested with www.domain.example.com as the referrer. In other words, my site now refuses to serve images that are to be displayed on the content pirate’s pages, and the browser shows whatever it normally shows for a missing image on a regular web page. An interesting idea would be to create a this-is-stolen-goods banner and serve it instead of the requested image. I doubt it would look very pretty on the content pirate’s site.
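To try the banner idea, here is a minimal, untested sketch meant to be used in place of the 403 rule. It assumes a placeholder image at /images/stolen-content.png (a hypothetical path; create your own banner first) and internally rewrites matching image requests to that file. The extra RewriteCond stops the banner itself from being rewritten again, since it is also a .png and is requested with the same pirate referrer.

```apache
<IfModule mod_rewrite.c>
RewriteEngine On

# /images/stolen-content.png is a hypothetical path; create the image first.
# Match image requests referred from the pirate site...
RewriteCond %{HTTP_REFERER} www\.domain\.example\.com [NC]
# ...but never rewrite the banner itself, or the rule would loop.
RewriteCond %{REQUEST_URI} !^/images/stolen-content\.png$
# Serve the banner instead of the requested image (internal rewrite).
RewriteRule \.(png|gif|jpeg|jpg|bmp)$ /images/stolen-content.png [L,NC]
</IfModule>
```

One caveat: with an internal rewrite the banner is served under the original image URLs, so a visitor's browser may cache it and later show it on your own pages too. Using an external redirect instead ([R=302,L] on the RewriteRule) sends the browser to the banner's own URL, at the cost of an extra request.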

Note that the rules above apply only to image requests, because this particular pirate site copied all the text content but left the image links intact, still pointing at my server. An innocent visitor’s browser therefore requests only those images from my site, so image requests are the only opportunity I get to say no.

A side effect of denying access only to images is that the ‘original post’ link on the pirate site still works and brings you genuine traffic. The images you created and attached to your posts also keep working on your own site. Well, at least that is the case for this site, which runs WordPress 2.1.
