info bigloo

13.3 An Extended Example

Here’s an extended example from Friedl that covers many of the features described above. The problem is to fashion a regexp that will match all and only IP addresses, or dotted quads, i.e., four numbers separated by three dots, with each number between 0 and 255. We will use the commenting mechanism to build the final regexp with clarity. First, a subregexp n0-255 that matches 0 through 255.

(define n0-255
  "(?x:
  \\d          ;  0 through   9
  | \\d\\d     ; 00 through  99
  | [01]\\d\\d ;000 through 199
  | 2[0-4]\\d  ;200 through 249
  | 25[0-5]    ;250 through 255
  )")

The first two alternates simply get all single- and double-digit numbers. Since 0-padding is allowed, we need to match both 1 and 01. We need to be careful when getting 3-digit numbers, since numbers above 255 must be excluded. So we fashion alternates to get 000 through 199, then 200 through 249, and finally 250 through 255.(9)
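As a quick check of the alternates, the sketch below anchors the subregexp at both ends and tries a few boundary values. The names n0-255-compact and n0-255? are hypothetical, introduced only for this illustration; the compact string is assumed to be equivalent to n0-255 with the comments and whitespace stripped.

```scheme
;; Hypothetical one-line form of n0-255, without the (?x: ...) comments.
(define n0-255-compact
  "(?:\\d|\\d\\d|[01]\\d\\d|2[0-4]\\d|25[0-5])")

;; Hypothetical helper: #t only if the whole string s matches one alternate.
(define (n0-255? s)
  (if (pregexp-match (string-append "^" n0-255-compact "$") s) #t #f))

(n0-255? "7")    ; ⇒ #t  (single digit)
(n0-255? "007")  ; ⇒ #t  (0-padded, via [01]\d\d)
(n0-255? "255")  ; ⇒ #t  (via 25[0-5])
(n0-255? "256")  ; ⇒ #f  (no alternate covers it)
(n0-255? "1234") ; ⇒ #f  (too many digits)

;; Unanchored, the first alternate succeeds on a prefix alone:
(pregexp-match n0-255-compact "256") ; ⇒ ("2")
```

The last call shows why the subregexp is only useful with something fixed on either side of it, such as the anchors here or the dots in a full address.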

An IP address is a string that consists of four n0-255s with three dots separating them.

(define ip-re1
  (string-append
    "^"        ;nothing before
    n0-255     ;the first n0-255,
    "(?x:"     ;then the subpattern of
    "\\."      ;a dot followed by
    n0-255     ;an n0-255,
    ")"        ;which is
    "{3}"      ;repeated exactly 3 times
    "$"        ;with nothing following
    ))

Let’s try it out.

(pregexp-match ip-re1 "1.2.3.4")        ⇒ ("1.2.3.4")
(pregexp-match ip-re1 "55.155.255.265") ⇒ #f

which is fine, except that we also have

(pregexp-match ip-re1 "0.00.000.00") ⇒ ("0.00.000.00")

All-zero sequences are not valid IP addresses! Lookahead to the rescue. Before starting to match ip-re1, we look ahead to ensure we don’t have all zeros. We could use positive lookahead to ensure there is a digit other than zero.

(define ip-re
  (string-append
    "(?=.*[1-9])" ;ensure there's a non-0 digit
    ip-re1))

Or we could use negative lookahead to ensure that what’s ahead isn’t composed of only zeros and dots.

(define ip-re
  (string-append
    "(?![0.]*$)" ;not just zeros and dots
                 ;(note: dot is not metachar inside [])
    ip-re1))
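To see what each lookahead contributes, it may help to run them without the rest of the pattern. This is a minimal sketch; the "^" added in front of the negative lookahead is an assumption made here so that the check is only attempted at the start of the string, as it effectively is once ip-re1 follows it.

```scheme
;; Positive lookahead: succeeds if a nonzero digit occurs somewhere ahead.
;; It consumes no characters, so the match is the empty span at 0.
(pregexp-match-positions "(?=.*[1-9])" "0.0.0.1") ; ⇒ ((0 . 0))
(pregexp-match "(?=.*[1-9])" "0.0.0.0")           ; ⇒ #f

;; Negative lookahead: fails if only zeros and dots remain up to the end.
(pregexp-match-positions "^(?![0.]*$)" "0.0.0.1") ; ⇒ ((0 . 0))
(pregexp-match "^(?![0.]*$)" "0.0.0.0")           ; ⇒ #f
```

Because neither lookahead consumes input, ip-re1 still begins matching at the first character of the string.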

With either definition, the regexp ip-re will match all and only valid IP addresses.

(pregexp-match ip-re "1.2.3.4") ⇒ ("1.2.3.4")
(pregexp-match ip-re "0.0.0.0") ⇒ #f
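A few more edge cases can be checked by wrapping the regexp in a predicate. The sketch below is self-contained: octet, ip-re-compact and ip-address? are hypothetical names, and the compact pattern is assumed to be equivalent to the negative-lookahead version of ip-re with the comments removed.

```scheme
;; Hypothetical compact copies of the patterns, so the sketch runs on its own.
(define octet "(?:\\d|\\d\\d|[01]\\d\\d|2[0-4]\\d|25[0-5])")

(define ip-re-compact
  (string-append
    "(?![0.]*$)"               ;not just zeros and dots
    "^" octet                  ;the first number
    "(?:\\." octet "){3}"      ;a dot and a number, 3 times
    "$"))

;; Hypothetical predicate: #t if s is a dotted quad that is not all zeros.
(define (ip-address? s)
  (if (pregexp-match ip-re-compact s) #t #f))

(ip-address? "1.2.3.4")   ; ⇒ #t
(ip-address? "0.0.0.1")   ; ⇒ #t  (one nonzero digit is enough)
(ip-address? "010.0.0.0") ; ⇒ #t  (0-padding is allowed)
(ip-address? "0.0.0.0")   ; ⇒ #f  (all zeros)
(ip-address? "1.2.3")     ; ⇒ #f  (only three numbers)
(ip-address? "1.2.3.4.5") ; ⇒ #f  (five numbers)
(ip-address? "256.1.1.1") ; ⇒ #f  (first number out of range)
```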


This document was generated on October 23, 2011 using texi2html 5.0.