Feb 28

Jugglingdb has crashed… there’s rumoured to be a PHP attack going around at the moment, so I’m guessing it’s that. I thought I’d better capture my script files in case things go really belly up. It’s reminded me of two other items of note: (1) the idea of creating a graphical regexp editor, and (2) this regexp for recognising web addresses (not strictly URLs, since it allows www.blah without the http://, but requires http:// if the server name is not www):

[ Note: this is cut & pasted from some PHP code, so there are extra backslashes (\) in there! ]

<?php
// BitchASS regular expression for web addresses (like URIs, but includes those where the
// http:// is missed off).
// By steve@juggler.net
//
// Wrapped in a function so that the original `return` works; the name
// link_web_addresses() is a hypothetical stand-in, not the original's.
function link_web_addresses($outbuff)
{
    $preg = "("                                      // Opening 'quote' (PHP allows a ( ) pair as the delimiter)
        . "(?:(?:http://([\w_\-]+))|(www))"          // opening gambit - match "http://(anything)" or "www"
        . "("                                        // Capture remainder of address
        . "(?:\.[\w_\-]+)+"                          // remainder of domain NOTE excludes domainless machine names
        . "(?:/"                                     // Got slash?
        . "(?:(?:[\w_\-+\.\~]|%[0-9a-f][0-9a-f])+)"  // Directory or ~user or filename
        . "(?:/(?:[\w_\-+\.]|%[0-9a-f][0-9a-f])+)*"  // Any number of subdirs
        . "(?:#(?:[\w_\-+\.]|%[0-9a-f][0-9a-f])*)?"  // #anchor
        . "(?:\?\S*)?"                               // ?querystring: only terminated by space character
        . ")?"                                       // You might not have any of this.
        . ")"                                        // end capture
        . ")i";                                      // Closing 'quote'; i = ignore case

    // NOTE the replace expression: for http://something.example.com, $1 will contain 'something'
    // and $2 will be blank. For those beginning www, $1 will be blank and $2 will contain 'www'.
    // Either way $3 holds the rest (domain, path, #anchor, ?query), so $1$2$3 is the whole
    // address minus the http://.
    // Single quotes stop PHP from touching $0..$3 and let the HTML use plain double quotes.
    return preg_replace($preg, '<A href="http://$1$2$3" TARGET="_blank">$0</A>', $outbuff);
}
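To see what the NOTE about $1, $2 and $3 means in practice, here is a small sketch that reuses the same pattern (rewritten with single-quoted PHP strings so nothing gets mangled) and runs it over a few sample addresses with preg_match() and preg_replace(). The sample strings are made up for illustration; the group values in the comments follow from the pattern as written.

```php
<?php
// A sketch, not the original script: the pattern above rewritten with
// single-quoted strings, run over a few made-up sample addresses.
$preg = '('                                          // opening delimiter
      . '(?:(?:http://([\w_\-]+))|(www))'            // $1 = first host label after http://, or $2 = "www"
      . '('                                          // $3 = remainder of address
      . '(?:\.[\w_\-]+)+'                            // rest of domain
      . '(?:/'
      . '(?:(?:[\w_\-+\.\~]|%[0-9a-f][0-9a-f])+)'    // directory, ~user or filename
      . '(?:/(?:[\w_\-+\.]|%[0-9a-f][0-9a-f])+)*'    // subdirectories
      . '(?:#(?:[\w_\-+\.]|%[0-9a-f][0-9a-f])*)?'    // #anchor
      . '(?:\?\S*)?'                                 // ?querystring
      . ')?'
      . ')'
      . ')i';                                        // closing delimiter, ignore case

$samples = [
    'http://foo.example.com/docs/page.html', // expect $1 = "foo", $2 = ""
    'www.example.com',                       // expect $1 = "", $2 = "www"
    'ftp.example.com',                       // no http:// and not www: no match
];

foreach ($samples as $s) {
    if (preg_match($preg, $s, $m)) {
        // Groups that did not take part in the match (but precede one that did) come back as "".
        printf("%s => \$1=[%s] \$2=[%s] \$3=[%s]\n", $s, $m[1], $m[2], $m[3]);
    } else {
        echo "$s => no match\n";
    }
}

// Same replacement string as the post, in single quotes so PHP leaves $0..$3 alone.
$html = preg_replace(
    $preg,
    '<A href="http://$1$2$3" TARGET="_blank">$0</A>',
    'Try www.example.com or http://foo.example.com/docs/page.html now.'
);
echo $html, "\n";
```

The third sample shows the www rule in action: ftp.example.com is left alone because, without http://, only a www host is recognised. The first shows that $1 only picks up the first label of the host name (foo), which is why $3 has to be stuck back on after it.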

Please credit me (in the source code at least) if you use it, and please mail me if you improve it.



February 2002