Regex regular expressions
Latest revision as of 19:58, 14 May 2020
General expressions
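- + : one or more occurrences of the preceding item
- \ : escape (protect) character; quotes the character that follows it
- /[abc]+/ : matches a, bb, ccc (any run of one or more a, b or c characters)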
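A quick way to try the + quantifier and the [abc]+ example from a shell is grep with -o, which prints only the matched parts, one per line. This is a sketch that assumes GNU grep; the input string is simply the example text from the list.

```bash
# '+' repeats the preceding item (here the bracket expression [abc]) one or more times.
# -o prints each match on its own line; -E selects extended syntax.
# In GNU grep basic syntax the same pattern would be written '[abc]\+'.
matches=$(echo "a bb ccc" | grep -oE '[abc]+')
echo "$matches"
# a
# bb
# ccc
```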
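grep and regular expressions (http://tldp.org/LDP/Bash-Beginners-Guide/html/sect_04_02.html)
grep understands three different versions of regular expression syntax:
- basic (BRE), the default
- extended (ERE), selected with -E, --extended-regexp
- Perl-compatible (PCRE), selected with -P, --perl-regexp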
In GNU grep there is no difference in available functionality between the basic and extended syntaxes. In other implementations, basic regular expressions are less powerful. The following description applies to extended regular expressions, and the differences for basic regular expressions are summarized afterwards. Perl-compatible regular expressions add further functionality and are documented in pcresyntax(3) and pcrepattern(3), but they work only if grep was built with PCRE support.

The fundamental building blocks are the regular expressions that match a single character. Most characters, including all letters and digits, are regular expressions that match themselves. Any meta-character with special meaning can be quoted by preceding it with a backslash. The period . matches any single character.
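The difference between an unquoted period and a backslash-quoted one can be checked with grep -c, which counts matching lines. This is a minimal sketch; the three input lines are hypothetical sample data.

```bash
# Hypothetical sample input: three lines
input=$'cat\nc.t\ncut'
# '.' matches any single character, so all three lines match
any=$(printf '%s\n' "$input" | grep -c 'c.t')
# '\.' is a quoted (literal) period, so only the line "c.t" matches
literal=$(printf '%s\n' "$input" | grep -c 'c\.t')
echo "$any $literal"   # 3 1
```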
- Basic vs Extended Regular Expressions
In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; use the backslashed versions \?, \+, \{, \|, \(, and \) instead.
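The same searches written once in basic and once in extended syntax show the backslashed forms in practice. This sketch assumes GNU grep, because \+ and \| in basic syntax are GNU extensions (the interval form \{n\} is standard POSIX). The sample lines are hypothetical.

```bash
# Hypothetical sample input
text=$'ab\nabbb\ncat\ndog\nac'
# "one or more" and alternation: backslashed in BRE, bare in ERE
bre=$(printf '%s\n' "$text" | grep 'ab\+\|dog')
ere=$(printf '%s\n' "$text" | grep -E 'ab+|dog')
# Interval, three consecutive b characters: \{3\} in BRE, {3} in ERE
bre_iv=$(printf '%s\n' "$text" | grep 'b\{3\}')
ere_iv=$(printf '%s\n' "$text" | grep -E 'b{3}')
printf '%s\n' "$bre"      # ab, abbb and dog, one per line
printf '%s\n' "$bre_iv"   # abbb
```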
In basic syntax some meta-characters need a backslash to get their special meaning, and some do not:
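# Characters to escape in grep BRE to get their special meaning:
#   \( \)  - parentheses (grouping)
# Characters that don't need escaping:
#   [ ]    - bracket expressions
# Note: a bare | is a literal pipe in BRE; use \| for alternation (GNU)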
Examples:
# Find a digit followed by the same digit, optionally separated by spaces (back-reference \1)
echo "1234 5678 9101 1234" | grep '\([0-9]\) *\1'    # basic BRE: parentheses must be escaped
echo "1234 5678 9101 1234" | grep -E '([0-9]) *\1'   # extended ERE: parentheses are not escaped
# Both print the matching line:
# 1234 5678 9101 1234
#              ^ ^   <- matched text: "1 1"
- Word boundaries (https://www.regular-expressions.info/wordboundaries.html)
The GNU extensions to POSIX regular expressions add support for the \b (word boundary) and \B (not a word boundary) anchors. GNU also has its own syntax for start-of-word and end-of-word boundaries: \< matches at the start of a word, like Tcl's \m, and \> matches at the end of a word, like Tcl's \M.
echo 'To eat this worlds due, by the grave and thee.' | grep '\bthe\b'
echo 'To eat this worlds due, by the grave and thee.' | grep '\<the\>'
# Both print the line; only the standalone word "the" matches ("thee" does not):
# To eat this worlds due, by the grave and thee.
#                            ^^^
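The word-boundary examples only use \b, \< and \>. Counting matches with grep -o shows how \B and \< behave on a word that contains "the" in the middle. This is a sketch that assumes GNU grep. The sample line is hypothetical, and wc -l output is wrapped in $(( )) so that padded counts also work.

```bash
line='the other theme'
# \bthe\b : "the" as a whole word -> only the first word
whole=$(( $(echo "$line" | grep -o '\bthe\b' | wc -l) ))
# \Bthe : "the" NOT preceded by a word boundary, i.e. inside a word ("other")
inner=$(( $(echo "$line" | grep -o '\Bthe' | wc -l) ))
# \<the : "the" at the start of a word ("the" and "theme")
start=$(( $(echo "$line" | grep -o '\<the' | wc -l) ))
echo "$whole $inner $start"   # 1 1 2
```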
Regular Expression Matching (REMATCH, http://molk.ch/tips/gnu/bash/rematch.html)
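Bash performs regular expression matching with the =~ operator inside [[ ]], as in the commands below. The right-hand side is a POSIX extended regular expression. If it matches the string, the matched part of the string is stored in BASH_REMATCH[0], and the text matched by each parenthesized subexpression is stored in BASH_REMATCH[1], BASH_REMATCH[2], and so on.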
# Syntax: [[ string =~ regexp ]]
[[ "abcde" =~ b.d ]]; echo "${BASH_REMATCH[@]}"   # -> bcd
[[ "/home/peter/index.html" =~ ^(.*)/(.*)\.(.*)$ ]]
echo "${BASH_REMATCH[@]}"   # -> /home/peter/index.html /home/peter index html
# Now:
# BASH_REMATCH[0]=/home/peter/index.html
# BASH_REMATCH[1]=/home/peter
# BASH_REMATCH[2]=index
# BASH_REMATCH[3]=html
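A common way to avoid quoting problems is to store the pattern in a variable and leave that variable unquoted after =~. Any quoted part of the right-hand side is matched as a literal string. This is a sketch; the user@host pattern and sample address are hypothetical.

```bash
re='^([^@]+)@(.+)$'          # hypothetical pattern: split "user@host"
s='peter@example.com'
if [[ $s =~ $re ]]; then     # $re must stay unquoted to be treated as a regex
  user=${BASH_REMATCH[1]}    # text matched by the 1st group
  host=${BASH_REMATCH[2]}    # text matched by the 2nd group
  echo "$user $host"         # peter example.com
fi
# A quoted pattern is matched literally, so '.' is not a wildcard here
if [[ 'abc' =~ "a.c" ]]; then quoted=match; else quoted=no-match; fi
echo "$quoted"               # no-match
```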
PCRE Perl Compatible Regular Expressions
This section also covers Notepad++.
In Notepad++ a regex like [\r\n]+ matches a CRLF line ending (and, because of the +, several consecutive line breaks).
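PCRE escapes such as \r and \d also work in grep when it is run with -P. This sketch assumes GNU grep built with PCRE support and uses hypothetical sample input. It counts lines with Windows (CRLF) endings and uses a lookahead, which basic and extended syntax do not have.

```bash
# Count lines that end in a carriage return, i.e. have CRLF line endings
crlf=$(printf 'one\r\ntwo\nthree\r\n' | grep -cP '\r$')
echo "$crlf"    # 2
# \d+ followed by a lookahead (?= USD): print the number only if " USD" follows it
num=$(echo 'total: 42 USD' | grep -oP '\d+(?= USD)')
echo "$num"     # 42
```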
Notepad++ | npp | npp++
Find, replace and remove matching lines. To remove every line that consists only of help, match help followed by its line ending \r?\n and replace it with nothing. The \r is optional in case the file doesn't have Windows (CRLF) line endings.
Find what: ^help\r?\n
Replace with: (leave empty)
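Outside Notepad++, the same clean-up can be done with grep -v, which keeps only the lines that do not match. This is a sketch that assumes bash (for $'...' quoting) and GNU grep, and the sample input is hypothetical. Because grep works line by line, the pattern anchors on $ instead of matching \n.

```bash
# Hypothetical sample: mixed CRLF and LF line endings
input=$'intro\r\nhelp\r\nbody\nhelp\nhelper\n'
# $'...' turns \r into a real carriage return and '?' makes it optional,
# so "help" lines with either CRLF or LF endings are removed
kept=$(printf '%s' "$input" | grep -vE $'^help\r?$')
printf '%s\n' "$kept"    # intro, body and helper are kept ("helper" is not an exact match)
```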
- Join 2 lines with colon
Find what: ^(.*)[\r\n]+
Replace with: '\1: '   (the single quotes only show the trailing space; do not type them)
Before:
Extensions
None
Cloud init
No
TAGS
Name
bastion-1 (All resources to be created)

After:
Extensions: None
Cloud init: No
TAGS: Name
bastion-1 (All resources to be created)
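The same pairwise join can also be done in a shell pipeline. This sketch uses awk instead of the Notepad++ regex. It pairs lines by position rather than by pattern, does not skip blank lines the way [\r\n]+ does, and assumes LF line endings. The input is the Before example.

```bash
input=$'Extensions\nNone\nCloud init\nNo\nTAGS\nName\nbastion-1 (All resources to be created)'
# Odd lines are remembered, even lines are printed as "previous: current";
# a leftover odd last line is printed unchanged at the end
joined=$(printf '%s\n' "$input" | awk 'NR % 2 { prev = $0; next } { print prev ": " $0 } END { if (NR % 2) print prev }')
printf '%s\n' "$joined"
```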
- Resources
- regex-remove-lines-containing (Stack Overflow): https://stackoverflow.com/questions/5876296/regex-remove-lines-containing
Forcepoint (legacy name Websense)
It accepts regular expressions only in a limited form, for example:
- (.) : same as the * wildcard; matches a string of any length, and the parentheses act as regex delimiters
References
- regex101.com (https://regex101.com/): dynamic regex builder