Regular Expressions
The Text Cleanup function can work with Regular Expressions, known as RegEx. ClipMate uses a RegEx library written by Andrey V. Sorokin, and his website describes RegEx in detail. This page is not a full RegEx reference, but shows some simple, useful examples.

Introduction

Regular Expressions are a widely used way of specifying patterns of text to search for. Special metacharacters let you specify, for instance, that the string you are looking for occurs at the beginning or end of a line, or contains n repetitions of a certain character.

Metacharacters

Metacharacters are special characters that are the essence of Regular Expressions. The different types of metacharacters are described below.

Metacharacters - line separators

^     start of line
$     end of line
\A    start of text
\Z    end of text
.     any character in line

Examples:

^foobar     matches the string 'foobar' only if it is at the beginning of a line
foobar$     matches the string 'foobar' only if it is at the end of a line
^foobar$    matches the string 'foobar' only if it is the only string in the line
foob.r      matches strings like 'foobar', 'foobbr', 'foob1r' and so on

Metacharacters - predefined classes

\w    an alphanumeric character (including "_")
\W    a non-alphanumeric character
\d    a numeric character
\D    a non-numeric character
\s    any whitespace character (same as [ \t\n\r\f])
\S    a non-whitespace character

You may use \w, \d and \s within custom character classes.
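To try the line separators outside ClipMate, here is a minimal sketch in Python. It assumes Python's standard `re` module as a stand-in for the library ClipMate uses; for simple patterns like these the two should behave similarly, but the engines are not identical. The helper `matching_lines` and the sample text are invented for illustration. Each line is searched on its own, so `^` and `$` act as start and end of line, while `\A` and `\Z` look at the whole text.

```python
import re

# Sketch: Python's re module standing in for ClipMate's RegEx engine.
# Each line is searched separately, so ^ and $ act as start/end of line.
sample = "foobar at the start\nends with foobar\nfoobar\nfoob1r and foobbr"


def matching_lines(pattern: str, text: str) -> list[str]:
    """Illustrative helper: every line of text containing a match for pattern."""
    return [line for line in text.splitlines() if re.search(pattern, line)]


starts = matching_lines(r"^foobar", sample)    # 'foobar' at the beginning of a line
ends = matching_lines(r"foobar$", sample)      # 'foobar' at the end of a line
alone = matching_lines(r"^foobar$", sample)    # 'foobar' is the whole line
dots = re.findall(r"foob.r", "foobar foobbr foob1r foobr")  # '.' = any one character

# \A and \Z anchor to the whole text, not to individual lines.
text_start = re.search(r"\Afoobar", sample) is not None   # text begins with 'foobar'
second_line = re.search(r"\Aends", sample) is not None    # 'ends' only starts line 2
text_end = re.search(r"foobbr\Z", sample) is not None     # text ends with 'foobbr'

print(starts)       # ['foobar at the start', 'foobar']
print(ends)         # ['ends with foobar', 'foobar']
print(alone)        # ['foobar']
print(dots)         # ['foobar', 'foobbr', 'foob1r']
print(text_start, second_line, text_end)  # True False True
```

'foobr' is not found by foob.r because the period must consume exactly one character between 'b' and 'r'.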
Examples:

foob\dr        matches strings like 'foob1r', 'foob6r' and so on, but not 'foobar', 'foobbr' and so on
foob[\w\s]r    matches strings like 'foobar', 'foob r', 'foobbr', 'foob1r' and so on, but not 'foob=r'

Metacharacters - word boundaries

\b    match a word boundary
\B    match a position that is not a word boundary

A word boundary (\b) is a position between two characters that has a \w on one side and a \W on the other (in either order). The imaginary characters before the beginning and after the end of the string count as \W.

Metacharacters - iterators

Any item of a regular expression may be followed by another type of metacharacter: an iterator. Iterators specify the number of occurrences of the preceding character, metacharacter or subexpression.

*        zero or more ("greedy"), similar to {0,}
+        one or more ("greedy"), similar to {1,}
?        zero or one ("greedy"), similar to {0,1}
{n}      exactly n times ("greedy")
{n,}     at least n times ("greedy")
{n,m}    at least n but not more than m times ("greedy")
*?       zero or more ("non-greedy"), similar to {0,}?
+?       one or more ("non-greedy"), similar to {1,}?
??       zero or one ("non-greedy"), similar to {0,1}?
{n}?     exactly n times ("non-greedy")
{n,}?    at least n times ("non-greedy")
{n,m}?   at least n but not more than m times ("non-greedy")

In the form {n,m}, the numbers in curly brackets give the minimum (n) and maximum (m) number of times to match the item. The form {n} is equivalent to {n,n} and matches exactly n times; the form {n,} matches n or more times. There is no limit to the size of n or m, but large numbers use more memory and slow down regular expression execution. A curly bracket in any other context is treated as a regular character.
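The predefined-class and word-boundary examples can be checked the same way. This sketch again assumes Python's `re` module as a stand-in for ClipMate's engine (Python's `\w` and `\d` are Unicode-aware, which makes no difference for the plain ASCII strings used here). `re.fullmatch` tests whether an entire candidate string matches; the candidate strings and the sentence are invented for illustration.

```python
import re

# Sketch: Python's re module standing in for ClipMate's RegEx engine.
# fullmatch() requires the whole candidate string to match the pattern.
candidates = ["foobar", "foob r", "foobbr", "foob1r", "foob6r", "foob=r"]

digit_only = [s for s in candidates if re.fullmatch(r"foob\dr", s)]
word_or_space = [s for s in candidates if re.fullmatch(r"foob[\w\s]r", s)]

# \b sits between a \w character and a \W character (or the edge of the text).
sentence = "cat concat cat_food cat."
whole_word_cat = re.findall(r"\bcat\b", sentence)  # 'cat' as a separate word
inner_cat = re.findall(r"\Bcat", sentence)         # 'cat' preceded by a word character

print(digit_only)      # ['foob1r', 'foob6r']
print(word_or_space)   # ['foobar', 'foob r', 'foobbr', 'foob1r', 'foob6r']
print(whole_word_cat)  # ['cat', 'cat']
print(inner_cat)       # ['cat']
```

The 'cat' inside 'cat_food' is not a whole word, because '_' counts as a \w character, so there is no boundary after it.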
Examples:

foob.*r       matches strings like 'foobar', 'foobalkjdflkj9r' and 'foobr'
foob.+r       matches strings like 'foobar' and 'foobalkjdflkj9r', but not 'foobr'
foob.?r       matches strings like 'foobar', 'foobbr' and 'foobr', but not 'foobalkj9r'
fooba{2}r     matches the string 'foobaar'
fooba{2,}r    matches strings like 'foobaar', 'foobaaar', 'foobaaaar' etc.
fooba{2,3}r   matches strings like 'foobaar' or 'foobaaar', but not 'foobaaaar'

A note on "greediness": a greedy iterator takes as many characters as possible, a non-greedy one as few as possible. For example, applied to the string 'abbbbc', 'b+' returns 'bbbb', 'b+?' returns 'b', 'b*?' returns the empty string, 'b{2,3}?' returns 'bb' and 'b{2,3}' returns 'bbb'. A plain 'b*' also returns the empty string here, because zero b's at the very start, before the 'a', already satisfy it.

Hexadecimal

You can use hexadecimal character codes (\xNN) to replace any character with any other characters. For example, to replace every tab (x09) with a carriage return and line feed (x0D x0A), use:

Find: \x09
Replace: \x0D\x0A

More examples:

hello          matches the string 'hello'
\^FooBarPtr    matches '^FooBarPtr' (the backslash makes ^ a literal character)
^HELLO         matches the string 'HELLO' at the beginning of a line
GOODBYE$       matches the string 'GOODBYE' at the end of a line
^HELLO$        matches the string 'HELLO' if it is the only string in the line
H.+O           matches strings like 'HELLO' and 'HI HO'
FOOB.R         matches strings like 'FOOBAR', 'FOOBBR', 'FOOB1R', etc.
=$             matches any line ending with the '=' sign
IMAGES\.NAME   matches the literal text 'IMAGES.NAME', for example when you want to exclude it. The period must be escaped with a backslash, because an unescaped period matches any character.
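The iterator examples and the greediness note above can be reproduced with a short sketch. It assumes Python's `re` module as a stand-in for ClipMate's engine; the word list and the helpers `full` and `first` are hypothetical, made up only for this illustration. `re.fullmatch` checks a whole word, while `re.search` returns the leftmost match, which is what the 'abbbbc' greediness examples describe.

```python
import re

# Sketch: Python's re module standing in for ClipMate's RegEx engine.
words = ["foobr", "foobar", "foobaar", "foobaaar", "foobaaaar", "foobalkj9r"]


def full(pattern: str) -> list[str]:
    """Illustrative helper: words that the pattern matches from start to end."""
    return [w for w in words if re.fullmatch(pattern, w)]


star = full(r"foob.*r")            # any number of characters, including none
plus = full(r"foob.+r")            # at least one character
optional = full(r"foob.?r")        # zero or one character
at_least_two = full(r"fooba{2,}r") # two or more 'a'
two_or_three = full(r"fooba{2,3}r")


def first(pattern: str) -> str:
    """Illustrative helper: leftmost match of pattern in 'abbbbc'."""
    return re.search(pattern, "abbbbc").group()


greedy = [first(p) for p in (r"b+", r"b{2,3}")]
non_greedy = [first(p) for p in (r"b+?", r"b*?", r"b{2,3}?")]

print(star)          # all six words
print(plus)          # ['foobar', 'foobaar', 'foobaaar', 'foobaaaar', 'foobalkj9r']
print(optional)      # ['foobr', 'foobar']
print(at_least_two)  # ['foobaar', 'foobaaar', 'foobaaaar']
print(two_or_three)  # ['foobaar', 'foobaaar']
print(greedy)        # ['bbbb', 'bbb']
print(non_greedy)    # ['b', '', 'bb']
```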
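The hexadecimal Find/Replace and the escaping examples can be sketched with Python's `re.sub` and `re.search`, assuming Python as a stand-in for ClipMate's Text Cleanup. One difference is worth knowing: in ClipMate the Replace field takes \x0D\x0A as shown above, but Python's replacement templates reject \xNN escapes, so the sketch passes the two real characters through an ordinary Python string literal instead. The sample strings are invented for illustration.

```python
import re

# Sketch: Python's re module standing in for ClipMate's Find/Replace.

# Find \x09 (tab); replace with CR + LF. Python's re.sub does not accept
# \xNN escapes in the replacement template, so the Python string literal
# "\x0d\x0a" supplies the two real characters instead.
tabbed = "name\tsize\tdate"
crlf = re.sub(r"\x09", "\x0d\x0a", tabbed)

# A backslash turns a metacharacter back into a literal character.
names = ["IMAGES.NAME", "IMAGES_NAME", "IMAGESXNAME", "OTHER.NAME"]
loose = [n for n in names if re.search(r"IMAGES.NAME", n)]      # '.' = any character
exact = [n for n in names if re.search(r"IMAGES\.NAME", n)]     # literal period only
kept = [n for n in names if not re.search(r"IMAGES\.NAME", n)]  # exclude that name

caret = re.search(r"\^FooBarPtr", "call ^FooBarPtr now").group()  # literal '^'

print(repr(crlf))  # 'name\r\nsize\r\ndate'
print(loose)       # ['IMAGES.NAME', 'IMAGES_NAME', 'IMAGESXNAME']
print(exact)       # ['IMAGES.NAME']
print(kept)        # ['IMAGES_NAME', 'IMAGESXNAME', 'OTHER.NAME']
print(caret)       # ^FooBarPtr
```

The `loose` result shows why the period needs escaping: without the backslash, 'IMAGES_NAME' and 'IMAGESXNAME' would be caught as well.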