Thursday, May 17, 2012

Hidden features of mod_rewrite


There seem to be a decent number of mod_rewrite threads floating around lately with a bit of confusion over how certain aspects of it work. As a result I've compiled a few notes on common functionality, and perhaps a few annoying nuances.



What other features / common issues have you run across using mod_rewrite?


Source: Tips4all

5 comments:

  1. Where to place mod_rewrite rules

    mod_rewrite rules may be placed within the httpd.conf file, or within the .htaccess file. if you have access to httpd.conf, placing rules there offers a performance benefit: httpd.conf is parsed once at server startup, whereas .htaccess files are read and parsed again on every request.

    Logging mod_rewrite requests

    logging may be enabled from within the httpd.conf file (including inside a <VirtualHost> block):

    # logs can't be enabled from .htaccess
    # loglevel > 2 is really spammy!
    RewriteLog /path/to/rewrite.log
    RewriteLogLevel 2
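
    note: RewriteLog and RewriteLogLevel were removed in Apache 2.4. there, the rough equivalent is a per-module LogLevel, with the rewrite messages going to the regular error log. a minimal sketch, assuming a 2.4 server (trace3 is just a level picked for illustration; higher trace levels are noisier):

    # Apache 2.4+: mod_rewrite messages are written to the ErrorLog
    LogLevel alert rewrite:trace3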


    Common use cases


    to funnel all requests to a single point:

    RewriteEngine on
    # ignore existing files
    RewriteCond %{REQUEST_FILENAME} !-f
    # ignore existing directories
    RewriteCond %{REQUEST_FILENAME} !-d
    # map requests to index.php and append as a query string
    RewriteRule ^(.*)$ index.php?query=$1
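
    a variant worth considering, as a sketch rather than a drop-in: when the substitution contains its own query string, the original query string is dropped unless [QSA] is given, and without [L] any later rules in the same file still run during this pass. index.php and the query parameter are simply the names used in the example above.

    RewriteEngine on
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    # QSA appends the original query string; L ends the current pass
    RewriteRule ^(.*)$ index.php?query=$1 [QSA,L]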

    handling 301/302 redirects:

    RewriteEngine on
    # 302 Redirect
    RewriteRule ^oldpage\.html$ /newpage.html [R=302]
    # 301 Redirect
    RewriteRule ^oldpage2\.html$ /newpage.html [R=301]


    note: external redirects are implicitly 302 redirects:

    # this rule:
    RewriteRule ^somepage\.html$ http://google.com
    # is equivalent to:
    RewriteRule ^somepage\.html$ http://google.com [R]
    # and:
    RewriteRule ^somepage\.html$ http://google.com [R=302]

    forcing SSL

    RewriteEngine on
    RewriteCond %{SERVER_PORT} ^80$
    RewriteRule ^(.*)$ https://mysite.com/$1 [R,L]
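
    an alternative sketch that does not hard-code the port or the host name. it assumes the server itself terminates SSL (behind an SSL-terminating proxy, %{HTTPS} may always read "off"), and it uses a 301 on the assumption that the redirect is meant to be permanent:

    RewriteEngine on
    # HTTPS is "on" for SSL/TLS connections and "off" otherwise
    RewriteCond %{HTTPS} off
    # reuse the requested host and path; works in .htaccess and httpd.conf
    RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]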

    common flag usage:


    [R] force a redirect (default 302)
    [R=301] force a 301 redirect
    [L] stop rewriting process (see note below in common pitfalls)
    [NC] case insensitive matches


    you can mix and match flags:

    RewriteRule ^olddir(.*)$ /newdir$1 [L,NC]



    Common pitfalls


    mixing mod_alias style redirects with mod_rewrite

    # Bad
    Redirect 302 /somepage.html http://mysite.com/otherpage.html
    RewriteEngine on
    RewriteRule ^(.*)$ index.php?query=$1

    # Good (use mod_rewrite for both)
    RewriteEngine on
    # 302 redirect, and stop processing
    RewriteRule ^somepage\.html$ /otherpage.html [R=302,L]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    # handle all other requests
    RewriteRule ^(.*)$ index.php?query=$1


    note: you can mix mod_alias with mod_rewrite, but it involves more work than just handling basic redirects as above.

    context affects syntax

    within .htaccess files, a leading slash is not used in the pattern:

    # given: GET /directory/file.html

    # .htaccess
    # result: /newdirectory/file.html
    RewriteRule ^directory(.*)$ /newdirectory$1

    # .htaccess
    # result: no match!
    RewriteRule ^/directory(.*)$ /newdirectory$1

    # httpd.conf
    # result: /newdirectory/file.html
    RewriteRule ^/directory(.*)$ /newdirectory$1

    # Works in both contexts:
    RewriteRule ^/?directory(.*)$ /newdirectory$1

    [L] is not last! (sometimes)

    within the .htaccess context, [L] will not force mod_rewrite to stop. it only ends the current pass: the rewritten URL is handed back as an internal redirect, and the rules are run again against it:

    # processing does not stop here
    RewriteRule ^dirA$ /dirB [L]
    # /dirC will be the final result
    RewriteRule ^dirB$ /dirC


    our rewrite log shows the details:

    rewrite 'dirA' -> '/dirB'
    internal redirect with /dirB [INTERNAL REDIRECT]
    rewrite 'dirB' -> '/dirC'
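
    on Apache 2.4 (the flag was introduced in 2.3.9) there is also an [END] flag, which ends the current pass like [L] and additionally prevents any further rewrite processing in the .htaccess context. a sketch of the same two rules, assuming such a server:

    # with END, a request for /dirA ends up at /dirB and stays there
    RewriteRule ^dirA$ /dirB [END]
    RewriteRule ^dirB$ /dirC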

  2. The deal with RewriteBase:

    You almost always need to set RewriteBase in a .htaccess file. If you don't, Apache guesses that your base is the physical disk path to your directory, which can produce wrong URLs when a rule uses a relative substitution. So start with this:

    RewriteBase /
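
    For a .htaccess file that lives below the site root, the base would be the URL path of that directory rather than /. A minimal sketch, assuming a hypothetical directory served at http://example.com/app/ (the names app, old and new.php are made up for illustration):

    # .htaccess inside the directory that is served as /app/
    RewriteEngine on
    RewriteBase /app/
    # relative substitution, resolved against RewriteBase: /app/old -> /app/new.php
    RewriteRule ^old$ new.php [L]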

  3. Other Pitfalls:

    1- Sometimes it's a good idea to disable MultiViews

    Options -MultiViews


    I'm not well versed in all of MultiViews' capabilities, but I know that it messes up my mod_rewrite rules when active, because one of its properties is to try and 'guess' an extension to a file that it thinks I'm looking for.

    I'll explain:
    Suppose you have 2 php files in your web dir, file1.php and file2.php and you add these conditions and rule to your .htaccess :

    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^(.*)$ file1.php/$1


    You assume that all urls that do not match a file or a directory will be grabbed by file1.php. Surprise! This rule is not being honored for the url http://myhost/file2/somepath. Instead you're taken inside file2.php.

    What's going on is that MultiViews automagically guessed that the url that you actually wanted was http://myhost/file2.php/somepath and gladly took you there.

    Now, you have no clue what just happened and you're at that point questioning everything that you thought you knew about mod_rewrite. You then start playing around with rules to try to make sense of the logic behind this new situation, but the more you're testing the less sense it makes.

    OK, in short: if you want mod_rewrite to work in a way that approximates logic, turning off MultiViews is a step in the right direction.

    2- Enable FollowSymLinks

    Options +FollowSymLinks


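    That one, I don't really know the details of, but I've seen it mentioned many times, so just do it. (For what it's worth, the mod_rewrite documentation says that in .htaccess context the rewrite engine requires FollowSymLinks, or SymLinksIfOwnerMatch, to be enabled.)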

  4. if you need to 'block' internal redirects / rewrites from happening in the .htaccess, take a look at the

    RewriteCond %{ENV:REDIRECT_STATUS} ^$


    condition, as discussed here.
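
    a sketch of how that condition might be used, reusing the dirA/dirB/dirC names from the [L] example in the first comment. REDIRECT_STATUS is empty on the original request and is set once an internal redirect has happened. note that a RewriteCond only applies to the RewriteRule that follows it, so it is repeated:

    RewriteEngine on
    # only rewrite the original request, not the internally redirected one
    RewriteCond %{ENV:REDIRECT_STATUS} ^$
    RewriteRule ^dirA$ /dirB [L]
    RewriteCond %{ENV:REDIRECT_STATUS} ^$
    RewriteRule ^dirB$ /dirC [L]
    # a request for /dirA now ends at /dirB instead of /dirC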

  5. An equality test between two values (here, a part of the URL and a part of a cookie) can be done as in the following example:

    RewriteCond %{REQUEST_URI} ^/(server0|server1).*$ [NC]
    # %1 is the string captured by the condition above
    # %1<>%{HTTP_COOKIE} concatenates that first match with a mod_rewrite variable -> "server0<>foo=bar;"
    # the second RewriteCond matches (.*) before the separator -> \1 is a back-reference to that (.*)
    # <> is used as a string separator/indicator; it can be replaced by any other character
    RewriteCond %1<>%{HTTP_COOKIE} !^(.*)<>.*stickysession=\1.*$ [NC]
    RewriteRule ^(.*)$ https://notmatch.domain.com/ [R=301,L]


    Dynamic Load Balancing:

    If you use mod_proxy to balance your system, it's possible to route requests to a dynamic range of worker servers.

    RewriteCond %{HTTP_COOKIE} ^.*stickysession=route\.server([0-9]{1,2}).*$ [NC]
    RewriteRule (.*) https://worker%1.internal.com/$1 [P,L]
