Using "regular expressions"

ron

« on: February 24, 2017, 06:45:06 »
8th has PCRE regular expressions (aka "regex") built in. So if you're familiar with regex in Perl or other languages, you pretty much already know how to use them in 8th.

You can create a regex using the "/" lead-in character, like: /cat/. Alternatively, you can create one from a string, like: "cat" r:new. Due to the vagaries of parsing, one form may suit your situation better than the other. The second form lets you build the pattern string at run time and then create the regex from it. The first does not, because the pattern is fixed when the code is parsed.
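
The thread only shows 8th one-liners, so here is a rough analogue of the literal-versus-run-time distinction in Python's standard re module (Python, not 8th). The helper build_pattern is hypothetical. It assumes the pattern is assembled from a string that is only known at run time, which is the case the "cat" r:new form covers.

```python
import re

# Pattern fixed when the code is written, like the /cat/ literal in 8th.
literal = re.compile(r"cat")

def build_pattern(first_letters: str) -> re.Pattern:
    # Hypothetical helper: assemble a pattern at run time, the way you might
    # build a string and hand it to r:new in 8th. The result is a character
    # class of the allowed first letters followed by "at".
    return re.compile("[" + re.escape(first_letters) + "]at")

runtime = build_pattern("ch")
print(runtime.findall("cat in the hat, rat on the mat"))  # ['cat', 'hat']
```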

Once you have a regex, you use it to find matches. For instance, you might do: "cat in the hat" /.at/ tuck r:match .s cr. (The tuck keeps a copy of the regex on the stack, because before 17.02 r:match consumed it.) That will show the number "1" on TOS (top of stack), followed by the regex you created. The "1" means the regex matched exactly once at this point in the string. It does not mean that there is only one match in the whole string!

To get the text of the match, type drop 0 r:@ . cr (drop discards the match count, and 0 r:@ fetches the matched text at index 0). Now you'll see "cat" printed. To get the next match, type dup r:+match .s cr and see that it produces a "1" again. If you now print the match the same way (drop 0 r:@ . cr), you'll see "hat". If you repeat the process once more, r:+match will return null (in 17.02 and later, it returns 0), since there are no further matches.
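
As a rough analogue of that session in Python's re module (not 8th), the sketch below mimics r:match, 0 r:@ and r:+match on the same string. The helpers first_match and next_match_step_one are hypothetical, and next_match_step_one just follows the rule described in this post (resume one character past where the previous match began); it is not 8th's implementation.

```python
import re
from typing import Optional

def first_match(text: str, pattern: re.Pattern) -> Optional[re.Match]:
    # Rough analogue of r:match: find the first match in the string.
    return pattern.search(text)

def next_match_step_one(text: str, pattern: re.Pattern,
                        prev: re.Match) -> Optional[re.Match]:
    # Rough analogue of r:+match as described in the post: resume the search
    # one character past where the previous match began.
    return pattern.search(text, prev.start() + 1)

text = "cat in the hat"
pat = re.compile(r".at")

m = first_match(text, pat)
print(m.group(0))                          # cat  (like "drop 0 r:@ . cr")
m = next_match_step_one(text, pat, m)
print(m.group(0))                          # hat
print(next_match_step_one(text, pat, m))   # None: no further matches
```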

Unexpected things: you may think that if you match the string "1234567" against /\d\d\d/, your first match is "123" (true) and your second match is "456". The latter is not true, because r:+match resumes searching one character beyond the beginning of the previous match, so in this case it returns "234". In 17.02 and later there is also r:++match, which resumes from the end of the previous match and so gives the "456" you might have expected.
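
The difference between the two resume rules can be sketched in Python (again an analogue using the standard re module, not 8th). The hypothetical all_matches helper collects successive matches either r:+match style (resume one character after the start of the previous match) or r:++match style (resume at its end).

```python
import re
from typing import List

def all_matches(text: str, pattern: str, overlapping: bool) -> List[str]:
    """Collect successive matches of pattern in text using one resume rule."""
    regex = re.compile(pattern)
    found: List[str] = []
    pos = 0
    while pos <= len(text):
        m = regex.search(text, pos)
        if m is None:
            break
        found.append(m.group(0))
        if overlapping or m.end() == m.start():
            # r:+match style: resume one character past where the match began.
            # Empty matches also take this path so the loop always advances.
            pos = m.start() + 1
        else:
            # r:++match style: resume where the previous match ended.
            pos = m.end()
    return found

print(all_matches("1234567", r"\d\d\d", overlapping=True))   # ['123', '234', '345', '456', '567']
print(all_matches("1234567", r"\d\d\d", overlapping=False))  # ['123', '456']
```

For comparison, Python's own re.findall uses the non-overlapping rule, so re.findall(r"\d\d\d", "1234567") also returns ['123', '456'].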

Other changes in 17.02 and later: the "match" words no longer consume the regex, so they behave like most other words. Thus after r:match, the regex used will remain on the stack, just under the number of matches. The r:++match word is new. The match words return 0 instead of null if there is no match. The r:/ word now returns an array of all the matches in the string, which is more convenient than the behavior in 17.01 and earlier.

These are all breaking changes if you are using regex, so please be aware of them when updating to 17.02 and later!

Dirt Meister

« Reply #1 on: February 25, 2017, 00:08:19 »
Cool! Love the new changes!

d.k

« Reply #2 on: February 27, 2017, 19:34:38 »
Me too! :)

Stefen8

« Reply #3 on: March 13, 2017, 10:26:15 »
Perfect! I love it. Thanks for that  8)

ron

« Reply #4 on: March 13, 2017, 10:27:18 »
You're welcome

RichAMead

« Reply #5 on: March 13, 2017, 16:00:36 »
Can you explain the change to 'r:/' a bit more? It was already returning an array of the matches (or null), or, to be specific, the matched capturing groups. Are you saying that it will now return matches as if the 'g' (global) option were set by default? If so, why was this change made? Was it not possible to add that option to the regex and get the same result?

ron

« Reply #6 on: March 13, 2017, 16:14:23 »
The older version did not work unless there were capturing subexpressions, and it returned only the first match.
The new version works whether or not the regex has capturing groups, and it returns all matches, which is probably what you would expect from a "split"-style word.

In the older version: "abcdef" /.../ r:/ gives [].

In the new version it gives ["abc","bcd","cde","def"]
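
To make the old versus new r:/ behavior concrete, here is a rough Python analogue (standard re module, not 8th). Both helpers are hypothetical and only follow the description in this reply: the old one returns the capturing groups of the first match, and the new one collects every match, resuming one character after the start of the previous match, which is what produces the overlapping result above.

```python
import re
from typing import List, Optional, Tuple

def split_like_old(text: str, pattern: str) -> Optional[Tuple[str, ...]]:
    # Pre-17.02 behavior as described: only the capturing groups of the
    # first match (None stands in for 8th's null when nothing matches).
    m = re.search(pattern, text)
    return m.groups() if m else None

def split_like_new(text: str, pattern: str) -> List[str]:
    # 17.02+ behavior as described: every match, resuming one character
    # past the start of the previous match, so matches may overlap.
    regex = re.compile(pattern)
    found: List[str] = []
    pos = 0
    while pos <= len(text):
        m = regex.search(text, pos)
        if m is None:
            break
        found.append(m.group(0))
        pos = m.start() + 1
    return found

print(split_like_old("abcdef", r"..."))  # ()
print(split_like_new("abcdef", r"..."))  # ['abc', 'bcd', 'cde', 'def']
```

The overlap is also why this is not quite the same as Perl's g modifier: in Perl, "abcdef" =~ /.../g yields only "abc" and "def".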

RichAMead

« Reply #7 on: March 17, 2017, 03:01:05 »
Very nice summary.  Thanks!