Remove special characters from a URL using redirects

Feb 17 2012 Published by admin under Apache, Redirects

A few CMSs do not strip special characters from a post title, so those characters end up in the post URL. You can remove them with redirects defined in an .htaccess file or in httpd.conf itself. A very basic rewrite looks like this:

RewriteEngine on
RewriteBase /redirects
RewriteRule ^(.*)['\-+](.*)$ /redirects/$1$2 [R=301,L]
# Or, for differently encoded characters, use the rule below instead.
# It removes three consecutive characters that are not letters, digits, hyphens or dots.
RewriteRule ^(.*)[^a-zA-Z0-9\-\.][^a-zA-Z0-9\-\.][^a-zA-Z0-9\-\.](.*)$ /redirects/$1$2 [R=301,L]

You can put any special character inside the brackets. Remember to escape characters that have a special meaning in a regular expression; otherwise the rule may not match as intended, or the syntax itself may be invalid and Apache will throw an error.
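To illustrate the escaping point, here is a minimal sketch with a hypothetical character set (it is not one of the rules in this post) that strips !, (, ), [, ] and spaces. It assumes the same /redirects directory as the example above. The square brackets are escaped so that they do not open or close the character class, and the whole pattern is quoted because Apache splits directive arguments on whitespace.

```apache
RewriteEngine on
RewriteBase /redirects
# Hypothetical character set, for illustration only: ! ( ) [ ] and space.
# \[ and \] are escaped so they are treated as literal brackets inside the class.
# The pattern is wrapped in quotes because it contains a literal space.
RewriteRule "^(.*)[!()\[\] ](.*)$" /redirects/$1$2 [R=301,L]
```

Without the quotes, the space would split the pattern into two arguments and Apache would reject the directive.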

The rule above redirects repeatedly until all of the specified special characters are removed from the URL. Take the following URL as an example:

http://yourdomain.com/redirects/t'e+s-+t.html

Here we have four special characters, and the redirect rule eliminates them one at a time. Because the first (.*) is greedy, the rightmost special character is removed first. You can see this in Firebug:

GET t%27e+s-+t.html http://localhost/redirects/t%27e+s-+t.html 301 Moved Permanently 
GET t%27e+s-t.html http://localhost/redirects/t%27e+s-t.html 301 Moved Permanently
GET t%27e+st.html http://localhost/redirects/t%27e+st.html 301 Moved Permanently
GET t%27est.html http://localhost/redirects/t%27est.html 301 Moved Permanently
GET test.html http://localhost/redirects/test.html 200 OK
5 requests

For a single special character this works fine, but with multiple special characters it keeps redirecting until all of them are removed. Long redirect chains are not considered good practice, since every hop costs an extra round trip. Instead, you can write a rule that removes several characters per redirect:

RewriteEngine on
RewriteBase /redirects
# Remove two special characters simultaneously.
RewriteRule ^(.*)['\-+](.*)['\-+](.*)$ /redirects/$1$2$3 [R=301,L]

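With this rule there are still three requests, two of which are redirects: the first removes two of the special characters and the second removes the remaining two. Note that this rule only matches URLs that contain at least two special characters, so a URL with an odd number of them still needs the single-character rule to remove the last one. To remove four special characters at once you can write: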

RewriteRule ^(.*)['\-+](.*)['\-+](.*)['\-+](.*)['\-+](.*)$ /redirects/$1$2$3$4$5 [R=301,L]

Now all four special characters are removed in one go.
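The rules shown in this post can also be stacked, so that a URL with any number of special characters is cleaned in as few hops as possible. The following is a minimal sketch rather than a definitive setup. It assumes the same /redirects directory and the same character set, and it assumes the [L] flag is added to each rule so that processing stops as soon as one rule has matched.

```apache
RewriteEngine on
RewriteBase /redirects
# Try the widest rule first: remove four special characters in one redirect.
RewriteRule ^(.*)['\-+](.*)['\-+](.*)['\-+](.*)['\-+](.*)$ /redirects/$1$2$3$4$5 [R=301,L]
# Fall back to removing two special characters in one redirect.
RewriteRule ^(.*)['\-+](.*)['\-+](.*)$ /redirects/$1$2$3 [R=301,L]
# Finally, remove a single leftover special character.
RewriteRule ^(.*)['\-+](.*)$ /redirects/$1$2 [R=301,L]
```

With this ordering, a URL containing five special characters should take two redirects (four removed, then one) instead of five, and a URL containing three should take two instead of three.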
