Processing Regular Expressions

Richard Smith

Abstract

In so-called “scripting languages” (Perl, Ruby, Awk and Tcl, to name a few), the ability to search text using “regular expressions” is a cornerstone of the power and flexibility that these languages deliver. Although APL is currently used mostly to process numeric data, it has most of the characteristics of a good scripting language, and many current and future APL applications could benefit from regular expression support that is tightly integrated with the language.
The support for regular expressions in Perl inspired Philip Hazel to create the Perl Compatible Regular Expression library known as PCRE, which has been incorporated into many open-source applications. APL vendors and toolsmiths have previously implemented system or library functions that interface to PCRE and other “regex engines”. However, a typical usage pattern is to call a function to process each “match” of the regular expression within an input document, which suggests that an operator might be a more appropriate model. This paper will discuss the design decisions which led ultimately to ⎕RX, Dyalog’s first “system operator”. ⎕RX can search using PCRE and modify the text either by using a simple transformation syntax (similar to that used by the Unix utility “sed”) or by using an APL function to express the transformation.
The paper will illustrate ⎕RX in use through a number of examples. In doing so it will consider character classes (such as “an alphabetic character”) which may be used in regular expressions, and invite further discussion on whether additional classes could or should be included specifically to support searching APL source code.
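
As a rough analogy for the two transformation styles described in the abstract, the following sketch uses Python's standard `re` module rather than ⎕RX itself. It is not ⎕RX syntax. The sample text and the names `bracketed`, `double`, `doubled` and `words` are assumptions made purely for illustration.

```python
import re

# Hypothetical sample input, chosen only for illustration.
text = "total = 12 + 30"

# Style 1: a sed-like transformation string.
# \1 in the replacement refers to the first capture group of the match.
bracketed = re.sub(r"(\d+)", r"<\1>", text)

# Style 2: a function invoked once per match.
# It receives the match object and returns the replacement text.
def double(match: re.Match) -> str:
    return str(2 * int(match.group(1)))

doubled = re.sub(r"(\d+)", double, text)

# A character class: [A-Za-z]+ matches runs of ASCII alphabetic characters.
words = re.findall(r"[A-Za-z]+", text)

print(bracketed)
print(doubled)
print(words)
```

The function-per-match style is the pattern that, according to the abstract, motivated modelling the facility as an operator rather than a function.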
