

Redirect non-www to www (and vice-versa) via htaccess

This is probably old news for a lot of folks, but it’s handy nonetheless and I often have to look up these .htaccess strings. Note that Apache mod_rewrite is required for any of this to work.

This (and its comments) is based on the Drupal default .htaccess file [license as of 9/30/11: GPL], because the Drupal .htaccess file explains it very well. Note that lines starting with # are comments; uncomment them to make them take effect.

Also, this should NOT be used with a WordPress install (it will result in a redirect loop). Instead, make the change in Dashboard > Settings > General.

RewriteEngine On
# If your site can be accessed both with and without the 'www.' prefix, you
# can use one of the following settings to redirect users to your preferred
# URL, either WITH or WITHOUT the 'www.' prefix. Choose ONLY one option:
#
# To redirect all users to access the site WITH the 'www.' prefix,
# (http://example.com/... will be redirected to http://www.example.com/...)
# uncomment the following:
# RewriteCond %{HTTP_HOST} !^www\. [NC]
# RewriteRule ^ http://www.%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
#
# To redirect all users to access the site WITHOUT the 'www.' prefix,
# (http://www.example.com/... will be redirected to http://example.com/...)
# uncomment the following:
# RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
# RewriteRule ^ http://%1%{REQUEST_URI} [L,R=301]
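
Both options above hard-code http://. If your site also answers over HTTPS and you’re on Apache 2.4 or later (where the %{REQUEST_SCHEME} variable is available), you can preserve whichever scheme the visitor used. A sketch of the WITH-'www.' variant:

RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^ %{REQUEST_SCHEME}://www.%{HTTP_HOST}%{REQUEST_URI} [L,R=301]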

Questions, comments, or feedback? Please leave a comment below. Thank you!




How to block requests by referrer using .htaccess

There may be times you want to stop site visitors from clicking through to your site from a link on another site. This is called blocking the “referrer”, and the same technique is used to prevent image hot-linking.

A “referrer” is another site that is linking to yours; when a user clicks that link, the referring site’s address is sent along with the request. In a basic referrer block, you block the traffic by specifying which domains (referrers) may not send you traffic.

For example, the following code will block any visitors who arrive by clicking links at example.com, and will also block your pages or images when they are embedded in example.com’s pages (iframes, hot-linked images, etc.).

RewriteEngine On
# Next line may be required, uncomment it if you're having trouble
# Options +FollowSymlinks
RewriteCond %{HTTP_REFERER} example\.com [NC]
RewriteRule .* - [F]

Note that this does not stop someone from simply typing your website address into their browser; it specifically stops traffic that originates from the other domain.
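
If you need to block several referring domains at once, you can chain conditions with the [OR] flag. A sketch, where both domain names are placeholders:

RewriteEngine On
# Block traffic referred from either of these domains
RewriteCond %{HTTP_REFERER} example\.com [NC,OR]
RewriteCond %{HTTP_REFERER} another-example\.net [NC]
RewriteRule .* - [F]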

If you simply want to stop image hot-linking, you can block just the image file types rather than all traffic. The only line to change is the RewriteRule line. You have two choices: block the images completely, or serve an image that says you don’t allow hotlinks.

Option 1: Forbid the request

RewriteRule .*\.(jpe?g|gif|bmp|png)$ - [F]

Option 2: Redirect to something else

RewriteRule .*\.(jpe?g|gif|bmp|png)$ http://mysite.com/nohotlinks.jpg [L]
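
One caveat with Option 2: if nohotlinks.jpg is served from your own site, the redirected request can match the same rule and loop. A common workaround (sketched here with the same placeholder names) is to exclude the placeholder image from the rule:

RewriteCond %{HTTP_REFERER} example\.com [NC]
# Exclude the placeholder itself, or the redirect will loop
RewriteCond %{REQUEST_URI} !nohotlinks\.jpg$
RewriteRule .*\.(jpe?g|gif|bmp|png)$ http://mysite.com/nohotlinks.jpg [L]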

Feel free to adapt this to your needs.

Questions, comments, or feedback? Got another method that you think is better, or am I missing something? Please feel free to share it in the comments below. Thank you!


Block access to your website via proxies using .htaccess

If you’re a webmaster, you may want to block access to your site via proxies. While you could do this by blocking the proxy domains individually, there are thousands upon thousands of proxies, with more popping up all the time. Rather than block them individually, you can easily block requests carrying the HTTP headers that [properly behaving] proxies send.

The following code segment, which originally appeared at Perishable Press, gets added to your .htaccess file:

RewriteEngine on
RewriteCond %{HTTP:VIA}                 !^$ [OR]
RewriteCond %{HTTP:FORWARDED}           !^$ [OR]
RewriteCond %{HTTP:USERAGENT_VIA}       !^$ [OR]
RewriteCond %{HTTP:X_FORWARDED_FOR}     !^$ [OR]
RewriteCond %{HTTP:PROXY_CONNECTION}    !^$ [OR]
RewriteCond %{HTTP:XPROXY_CONNECTION}   !^$ [OR]
RewriteCond %{HTTP:HTTP_PC_REMOTE_ADDR} !^$ [OR]
RewriteCond %{HTTP:HTTP_CLIENT_IP}      !^$
RewriteRule ^(.*)$ - [F]

This sends visitors a “403 Forbidden” message. Period.

An interesting update to this: most anonymous proxies aren’t actually sending the headers that this filtering acts upon. You can test this by using a proxy to visit a site that lists your browser’s HTTP headers. If you don’t see any of the headers mentioned above, then this code isn’t going to filter that proxy. Unfortunately, proxy operators have realized that people won’t use their services if they can be easily blocked, and the proxies are getting smarter. That makes blocking them all the more difficult.

If you have a method for blocking proxy access to your site, or anything else to share on this subject, please feel free to share it in the comments below!


Redirect or deny site visitors based on IP using .htaccess

As a webmaster, there may be times when you want to deny one or more IP addresses or ranges from accessing your website, whether because of comment spammers, page scrapers, or various other reasons. For that, you’re welcome to adopt the following code snippets. If you’ve found them helpful, or have something to share, please do so in the comments below.

Deny by IP

Start by looking for the following two lines in your .htaccess file:

order allow,deny
allow from all

Add the following directives BETWEEN those lines. If you don’t have the above two lines, add them, placing the following lines between them.

# Throws a "403 Forbidden" for the matching IP ranges
# Matches 91.46.*.*
Deny from 91.46.
# Matches 10.4.5.6 exactly
Deny from 10.4.5.6
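
Note that Order/Allow/Deny is Apache 2.2 syntax. If your host runs Apache 2.4, those directives are deprecated in favor of mod_authz_core’s Require; a rough equivalent of the block above:

<RequireAll>
Require all granted
# As with Deny, a partial address matches the whole range
Require not ip 91.46
Require not ip 10.4.5.6
</RequireAll>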

Redirect by IP

Copy/paste the following code and add to your .htaccess file, changing the RewriteCond and RewriteRule lines to meet your needs. Read the comments.

# Permanently redirect based on IP range
<IfModule mod_rewrite.c>
RewriteEngine On
# Try uncommenting this line if it doesn't take
# Options +FollowSymlinks
# The following lines describe the IP patterns to match
# If 1 octet is given, it will match the entire class-A range
# If 2 octets are given, it will match the class-B range
# If 3 octets are given, it will match the class-C range
# If 4 octets are given, it will match the full address
# The NC flag makes the match case-insensitive; it's probably not required here.
# Matches 10.1.2.3 exactly
RewriteCond %{REMOTE_HOST} ^10\.1\.2\.3 [NC,OR]
# Matches 10.2.*.*
# The trailing period is needed to prevent partial-octet matching
# (i.e. matching 10.200.* or 10.20.* instead of just 10.2.*)
RewriteCond %{REMOTE_HOST} ^10\.2\. [NC,OR]
# The OR flag must be on every condition except the last
# It means, literally, "match this condition, or..."
# Matches 10.5.6.*
RewriteCond %{REMOTE_HOST} ^10\.5\.6\. [NC]
# Next line format: RewriteRule <regex to match> <destination> <Flags>
# by default, the next line matches all URLs
# and permanently (301) redirects to the destination.
# Use R=302 instead for a temporary redirect.
# L means this is the last rule to process when this condition is met (recommended)
RewriteRule .* http://example.com/destination.htm [R=301,L]
</IfModule>

Thomas indicates that there should be a trailing backslash-period after the 4th octet if that octet is less than 3 digits. In testing, I haven’t found it necessary (the rules behave as expected without it), but readers are encouraged to try it both ways. If you find that one way works and the other doesn’t, feel free to share in the comments.

Here’s a simpler, albeit somewhat less flexible method. This doesn’t use the Apache rewrite module, nor pattern matching, and may or may not fit your needs.

# 301 (permanently) redirect requests for /bar to http://new.example.com/foo/bar
Redirect 301 /bar http://new.example.com/foo/bar
# Also redirect the web root
Redirect 301 / http://new.example.com/
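
If you want pattern matching without pulling in mod_rewrite, mod_alias also provides RedirectMatch. A sketch, reusing the hypothetical URLs above:

# Regex-based: redirect /bar and anything under it, carrying the rest of the path along
RedirectMatch 301 ^/bar(/.*)?$ http://new.example.com/foo/bar$1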

Questions, comments, and feedback regarding these methods are welcome and appreciated. Have something to contribute, or have feedback about the above code? Please feel free to share in the comments below!



Bad robots

Part of being on a VPS is that bandwidth is limited. One of the things you have to watch for is bots, crawlers, and scrapers coming along and stealing your content and bandwidth.

Some of these bots are good and helpful, like the Google, Yahoo, and Bing crawlers. They index your site so it will appear in the search engines. Others, like the Yandex bot, crawl and index your pages for a Russian search engine. If you have an English-only site targeting US visitors, you might want to consider blocking the Yandex bot.

In my searches I also came across the Dotbot, which seems to crawl your pages just to get your response codes. I’m not sure what they do with the data, but in my opinion it’s better to block them.

So how does one block these bots? The Robots Exclusion Protocol states that a file, called robots.txt, can be put in your DocumentRoot with directives for bots to follow. For example, if your domain is example.com, your robots.txt should be at the following URL:

http://example.com/robots.txt

The robots.txt directives can tell bots which files they are allowed to index and which they are not. Well-behaved web robots will look at this file before attempting to crawl your site, and obey the directives within. The directives are applied per bot, based on the bot’s User-Agent string. A couple of examples:

Block the Dotbot robot from crawling any pages:

User-agent: dotbot
Disallow: /

Block all robots from crawling anything under the /foo/ directory:

User-agent: *
Disallow: /foo/

Google Webmaster Tools has an excellent tool for checking your robots.txt file (a Google account is required).

However, not all bots obey (or even look at) the robots.txt file. Those that don’t need special treatment in the .htaccess file, which I’ll describe in another post.
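
As a preview, the usual .htaccess approach is to match the offending bot’s User-Agent string with mod_rewrite. A minimal sketch (the bot names here are just illustrative):

RewriteEngine On
# Forbid requests whose User-Agent matches either bot name
RewriteCond %{HTTP_USER_AGENT} (dotbot|yandex) [NC]
RewriteRule .* - [F]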



Optimizing WordPress

So after my little fiasco with plug-ins and CPU throttling, I’ve been looking for ways to make WordPress at least a little lighter and faster. I’m not going to cover disabling plug-ins; instead, I’ll go over a few other approaches, starting with…

Disabling revisions:

Every time a post is edited or published, a new revision is created. These stick around in the database (they are never deleted automatically) and can not only grow the database, but also lengthen query times. So, per MyDigitalLife and the WordPress Codex, here’s the quick-and-dirty:

…simply add the following line of code to wp-config.php file located in the root or home directory of WordPress blog.

define('WP_POST_REVISIONS', false);

If you would rather limit the number of revisions to something small, say 2 for example, just use that number instead of false:

define('WP_POST_REVISIONS', 2);

It should be added somewhere before the require_once(ABSPATH . 'wp-settings.php'); line. That’s it. Revisions will no longer be created. If you want to delete previously created revisions, read on…

Deleting revisions:

So now that you’ve disabled revisions, how do you delete all the old cruft laying around? MyDigitalLife has the answer on this one too.

…and then issue the following [SQL] command (it’s also recommended to backup the database before performing the deletion SQL statements):

DELETE FROM wp_posts WHERE post_type = 'revision';

All revisions should now be deleted from the database.

Caching:

Caching is a hot button for sites that could potentially see high amounts of traffic (and we would all like to be in that category…). The caching plug-in that I use and recommend is WP Super Cache. The UI is easy enough to find your way around, though it does require editing the .htaccess file.

Database queries:

Shared hosting providers get real upset when applications and scripts perform excessive and unoptimized database queries. Heavy themes, excessive numbers of widgets, and badly-written plug-ins all contribute to this. Fortunately, a post on CravingTech points to an easy method to check the number of queries happening on a single page load.

You can insert this snippet of code in your theme’s footer.php file (or anywhere) to make it display the number of queries:

<?php echo $wpdb->num_queries; ?> <?php _e('queries'); ?>

After looking at the number of queries occurring on a page load, try changing themes, disabling plug-ins, and/or reducing the number of widgets on a page to reduce the query count. SQL Monitor looks like a useful plug-in for further examining SQL queries, but I haven’t used it, so I can’t comment on its usefulness (or lack thereof).

Also…

I’ve stumbled on some additional information while researching: apparently the “WordPress should correct invalidly nested XHTML automatically” setting (under Settings > Writing) can not only increase the load when a post is saved, but can also break some plug-ins. If you’re familiar enough with (X)HTML to correctly close tags yourself, you may be better off turning this setting off.

You can also find other settings for wp-config.php on the WordPress Codex page.

Are you a WordPress user? Do you have any experience optimizing your WordPress installation? Have you tried any of the methods listed above? Did they work for you? Have any methods other than the above that you’ve tried?

