Regular expressions, commonly known as regex, are sequences of characters that describe a search pattern for text.
A regular expression lets you search for a specific string inside another string, replace one string with another, and split a string into multiple chunks. Complex expressions are built with metacharacters such as +, -, and ^; these look like arithmetic operators but have their own meaning inside a pattern.
By default, regular expressions are case sensitive.
Regular expressions are used almost everywhere in current application programming. Some advantages and uses of regular expressions are given below:
You can create complex search patterns by applying a few basic rules of regular expressions. The operators listed below, several of which (+, -, ^) look like arithmetic operators, are combined to build these patterns.
| Operator | Description |
|---|---|
| ^ | It indicates the start of string. |
| $ | It indicates the end of the string. |
| . | It denotes any single character (except a newline, by default). |
| () | It shows a group of expressions. |
| [] | It finds a range of characters, e.g., [abc] means a, b, or c. |
| [^] | It finds the characters which are not in range, e.g., [^xyz] means NOT x, y, or z. |
| - | It finds the range between the elements, e.g., [a-z] means a through z. |
| \| | It is the logical OR operator, used between elements, e.g., a\|b means either a or b. |
| ? | It indicates zero or one of the preceding character or element range. |
| * | It indicates zero or more of the preceding character or element range. |
| + | It indicates one or more of the preceding character or element range. |
| {n} | It denotes exactly n occurrences of the preceding character or range, e.g., n{3} means exactly 3 of n. |
| {n,} | It denotes at least n occurrences of the preceding character or range, e.g., n{2,} means 2 or more of n. |
| {n,m} | It indicates at least n, but not more than m occurrences, e.g., n{3,6} means 3 to 6 of n. |
| \ | It denotes the escape character. |
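
The operators above can be tried out with a short script. This is a minimal sketch, assuming the Perl-compatible `preg_match()` function and `/` as the pattern delimiter; the sample strings and the `$checks` array are made up for illustration. The third element of each entry is the result `preg_match()` is expected to return (1 for a match, 0 for no match).

```php
<?php
// Each entry: [pattern, subject, expected preg_match() result].
$checks = [
    ['/^cat/',      'catalog', 1], // ^ anchors the match at the start
    ['/cat$/',      'bobcat',  1], // $ anchors the match at the end
    ['/c.t/',       'cut',     1], // . stands for any single character
    ['/^(ab)+$/',   'ababab',  1], // () groups "ab" so + repeats the pair
    ['/^[abc]+$/',  'cab',     1], // [abc] accepts only a, b, or c
    ['/^[^xyz]+$/', 'lazy',    0], // [^xyz] rejects the "z" and "y"
    ['/^gr(a|e)y$/', 'grey',   1], // | chooses between alternatives
    ['/^colou?r$/', 'color',   1], // ? makes the "u" optional
    ['/^a{3}$/',    'aaaa',    0], // {3} means exactly three
    ['/^a{2,}$/',   'aaaa',    1], // {2,} means two or more
    ['/^a{2,3}$/',  'aaaa',    0], // {2,3} means two or three
    ['/^1\+1$/',    '1+1',     1], // \ escapes + so it is a literal plus
];

foreach ($checks as [$pattern, $subject, $expected]) {
    $result = preg_match($pattern, $subject);
    printf("%-14s %-8s => %d\n", $pattern, $subject, $result);
}
```
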
| Special Character | Description |
|---|---|
| \n | It indicates a new line. |
| \r | It indicates a carriage return. |
| \t | It represents a tab. |
| \v | It represents a vertical tab. |
| \f | It represents a form feed. |
| \xxx | It represents the character with octal code xxx. |
| \xhh | It denotes the character with hexadecimal code hh. |
PHP offers two sets of regular expression functions: POSIX-style functions (the ereg family) and Perl-compatible functions (the preg family). The POSIX-style functions were deprecated in PHP 5.3 and removed in PHP 7.0.
The structure of a POSIX regular expression is similar to that of a typical arithmetic expression: several operators and elements are combined to form more complex expressions.
The simplest regular expression is one that matches a single character inside a string. For example, "g" matches inside the strings "toggle" and "cage". The following concepts are used in POSIX regular expressions:
Brackets [] have a special meaning in regular expressions. They match any one character from the set or range listed inside them.
| Expression | Description |
|---|---|
| [0-9] | It matches any decimal digit 0 to 9. |
| [a-z] | It matches any lowercase character from a to z. |
| [A-Z] | It matches any uppercase character from A to Z. |
| [a-zA-Z] | It matches any letter, lowercase a to z or uppercase A to Z. (A range written as [a-Z] is out of order and is not valid.) |
The above ranges are commonly used. You can adjust the range to your needs, for example [0-6] to match any decimal digit from 0 to 6.
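
A small sketch of these bracket ranges, assuming `preg_match()` and `/` delimiters; the variable names and sample strings are illustrative only. The anchors `^` and `$` force the whole string to consist of characters from the range.

```php
<?php
$isDigit  = preg_match('/^[0-9]$/', '7');          // 1: a single decimal digit
$isLower  = preg_match('/^[a-z]+$/', 'regex');     // 1: only lowercase letters
$isUpper  = preg_match('/^[A-Z]+$/', 'Regex');     // 0: "egex" is not uppercase
$isLetter = preg_match('/^[a-zA-Z]+$/', 'Regex');  // 1: letters of either case
$isNarrow = preg_match('/^[0-6]$/', '7');          // 0: 7 is outside 0 to 6

var_dump($isDigit, $isLower, $isUpper, $isLetter, $isNarrow);
```
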
A special character can denote the frequency or position of a bracketed character sequence or a single character. Every special character has a specific meaning. The symbols +, *, ?, $, and {int range} all follow the character sequence they apply to, while ^ precedes it.
| Expression | Description |
|---|---|
| p+ | It matches any string that contains at least one p. |
| p* | It matches any string that contains zero or more p's. |
| p? | It matches any string that has zero or one p's. |
| p{N} | It matches any string that has a sequence of N p's. |
| p{2,3} | It matches any string that has a sequence of two or three p's. |
| p{2,} | It matches any string that has a sequence of at least two p's. |
| p$ | It matches any string that contains p at the end of it. |
| ^p | It matches any string that has p at the start of it. |
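
To see what each quantifier actually consumes, the sketch below returns the first matched substring. It assumes `preg_match()`; `firstMatch()` is a hypothetical helper written for this illustration, not a PHP built-in.

```php
<?php
// Hypothetical helper: returns the first matched substring, or null if
// the pattern does not match anywhere in the subject.
function firstMatch(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[0] : null;
}

var_dump(firstMatch('/p+/', 'apple'));        // "pp": one or more p's
var_dump(firstMatch('/p*/', 'banana'));       // "": zero p's still count as a match
var_dump(firstMatch('/p?/', 'pie'));          // "p": zero or one p
var_dump(firstMatch('/p{2}/', 'apple'));      // "pp": exactly two p's
var_dump(firstMatch('/p{2,3}/', 'apppple'));  // "ppp": at most three are taken
var_dump(firstMatch('/p{2,}/', 'apppple'));   // "pppp": two or more, as many as possible
var_dump(firstMatch('/p$/', 'soup'));         // "p": p at the end of the string
var_dump(firstMatch('/^p/', 'apple'));        // null: the string does not start with p
```

Note that `p*` and `p?` match every string, because an empty match satisfies them; they are normally combined with other elements or anchors.
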
PHP versions before 7.0 provide seven functions to search strings using POSIX-style regular expressions -
| Function | Description |
|---|---|
| ereg() | It searches for a pattern inside another string and returns a truthy value (the match length) if the pattern matches; otherwise it returns false. |
| ereg_replace() | It searches for a pattern inside another string and replaces the matching text with the replacement string. |
| eregi() | It searches for a pattern inside another string and returns the length of the matched string if found; otherwise it returns false. It is case insensitive. |
| eregi_replace() | This function works the same as ereg_replace(). The only difference is that its pattern search is case insensitive. |
| split() | The split() function divides a string into an array, using a regular expression as the delimiter. |
| spliti() | It is similar to split(), as it also divides a string into an array by a regular expression, but the match is case insensitive. |
| sql_regcase() | It creates and returns a valid regular expression that matches the given string case insensitively. |
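
Because the POSIX-style functions are not available in PHP 7.0 and later, code that used them is usually rewritten with the Perl-compatible functions. The following is a rough sketch of such a rewrite; the sample strings and variable names are illustrative, and the old calls are shown only in comments.

```php
<?php
$date = '2024-05-17';

// Was: ereg('^[0-9]{4}-[0-9]{2}-[0-9]{2}$', $date)
// The same pattern, wrapped in / delimiters:
$isDate = preg_match('/^[0-9]{4}-[0-9]{2}-[0-9]{2}$/', $date);

// Was: eregi('^php', 'PHP regex')
// Case insensitivity becomes the i modifier:
$startsWithPhp = preg_match('/^php/i', 'PHP regex');

// Was: ereg_replace('[0-9]+', '#', 'room 101, floor 3')
$masked = preg_replace('/[0-9]+/', '#', 'room 101, floor 3');

// Was: split('[,;]', 'a,b;c')
$parts = preg_split('/[,;]/', 'a,b;c');

var_dump($isDate, $startsWithPhp, $masked, $parts);
```
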
Perl-style regular expressions are very similar to POSIX ones. Most POSIX syntax, including the quantifiers introduced in the POSIX section, can also be used with the Perl-style regular expression functions. The main difference is that a Perl-style pattern must be enclosed in delimiters, such as /pattern/.
A metacharacter is an alphabetical character preceded by a backslash, which gives the combination a special meaning.
For example, the '\d' metacharacter can be used to search for large money sums: /([\d]+)000/. Here \d matches any numerical character, so [\d]+ matches a string of digits.
Below is the list of metacharacters that can be used in PERL Style Regular Expressions -
| Character | Description |
|---|---|
| . | It matches any single character (except a newline, by default). |
| \s | It matches a whitespace character such as a space, newline, or tab. |
| \S | It matches a non-whitespace character. |
| \d | It matches any digit from 0 to 9. |
| \D | It matches a non-digit character. |
| \w | It matches a word character: a-z, A-Z, 0-9, or _. |
| \W | It matches a non-word character. |
| [aeiou] | It matches any single character in the given set. |
| [^aeiou] | It matches any single character except those in the given set. |
| (foo\|baz\|bar) | It matches any of the alternatives specified. |
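
The metacharacters above can be exercised as follows. This is a minimal sketch using the `preg_*` functions; the sample text and variable names are invented for illustration.

```php
<?php
// Double quotes so that \t becomes a real tab character in the text.
$text = "Price: 25000 USD\tstock_id=A7";

// \d+ followed by "000": the "large money sum" pattern from the text.
preg_match('/(\d+)000/', $text, $sum);             // $sum[0] = "25000", $sum[1] = "25"

// \w+ collects runs of word characters (letters, digits, underscore).
$wordCount = preg_match_all('/\w+/', $text, $words);

// \s+ splits on any run of whitespace, including the tab.
$tokens = preg_split('/\s+/', $text);

// \D removes everything that is not a digit.
$digitsOnly = preg_replace('/\D/', '', $text);

// Alternation and character sets.
$isKnownWord = preg_match('/^(foo|baz|bar)$/', 'baz');
$hasVowel    = preg_match('/[aeiou]/', 'rhythm');

var_dump($sum, $words[0], $tokens, $digitsOnly, $isKnownWord, $hasVowel);
```
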
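Several modifiers are available that make working with regular expressions easier, for example by controlling case sensitivity or matching across multiple lines.
Below is the list of modifiers used in Perl-style regular expressions. Note that o, g, and cg are Perl modifiers that PHP's preg functions do not accept; for a global search in PHP, use preg_match_all() instead of g.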
| Character | Description |
|---|---|
| i | Makes case insensitive search |
| m | It specifies that if a string has a carriage return or newline characters, the $ and ^ operator will match against a newline boundary rather than a string boundary. |
| o | Evaluates the expression only once |
| s | It allows the use of .(dot) to match a newline character |
| x | This modifier allows us to use whitespace in expression for clarity. |
| g | It globally searches all matches. |
| cg | It allows the search to continue even after the global match fails. |
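
The effect of the i, m, s, and x modifiers can be compared side by side with and without each flag. This sketch assumes the `preg_*` functions; the sample text and variable names are illustrative.

```php
<?php
$text = "First line\nsecond LINE";

// i: ignore case. Without it only the lowercase "line" is found.
$withI    = preg_match_all('/line/i', $text);   // 2
$withoutI = preg_match_all('/line/', $text);    // 1

// m: ^ and $ also match at line boundaries inside the string.
$withM    = preg_match_all('/^\w+/m', $text, $lineStarts); // 2
$withoutM = preg_match_all('/^\w+/', $text);                // 1

// s: the dot also matches a newline, so the match can span both lines.
$withS    = preg_match('/First.*LINE/s', $text); // 1
$withoutS = preg_match('/First.*LINE/', $text);  // 0

// x: whitespace inside the pattern is ignored, which allows spacing for clarity.
$withX = preg_match('/ ^ (\d{3}) - (\d{4}) $ /x', '555-0199', $parts);

var_dump($withI, $withoutI, $lineStarts[0], $withS, $withoutS, $parts);
```
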
PHP currently provides the following functions to search strings using Perl-compatible regular expressions -
| Function | Description |
|---|---|
| preg_match() | This function searches for the pattern inside the string and returns 1 if the pattern matches, 0 if it does not, and false on error. |
| preg_match_all() | This function matches all the occurrences of the pattern in the string. |
| preg_replace() | The preg_replace() function is similar to the ereg_replace() function, except that it uses Perl-compatible regular expressions for the search pattern, and the replacement can refer to captured groups. |
| preg_split() | This function works like the split() function, except that it accepts a Perl-compatible regular expression as the pattern. It divides the string by that regular expression. |
| preg_grep() | The preg_grep() function searches all the elements of the input array and returns the elements that match the regexp (regular expression) pattern. |
| preg_quote() | It escapes the characters that have a special meaning in regular expressions. |
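
The functions in the table can be combined as in the sketch below. The input string, the deliberately simplified e-mail pattern, and the variable names are assumptions made for illustration; the pattern is not a complete e-mail validator.

```php
<?php
$log = 'alice@example.com, bob@test.org; carol';

// preg_match(): stops at the first match; $first[0] holds the matched text.
$hasEmail = preg_match('/\w+@\w+\.\w+/', $log, $first);

// preg_match_all(): returns the number of matches and collects them all.
$emailCount = preg_match_all('/\w+@\w+\.\w+/', $log, $all);

// preg_replace(): replaces every match of the pattern.
$hidden = preg_replace('/@\w+\.\w+/', '@***', $log);

// preg_split(): splits on a comma or semicolon plus optional whitespace.
$fields = preg_split('/[,;]\s*/', $log);

// preg_grep(): keeps only the array elements that match (keys are preserved).
$emails = preg_grep('/@/', $fields);

// preg_quote(): escapes special characters so user text is matched literally.
$literal = preg_quote('1.5+2');
$found   = preg_match('/' . preg_quote('1.5+2', '/') . '/', 'sum: 1.5+2');

var_dump($first[0], $all[0], $hidden, $fields, $emails, $literal, $found);
```

Passing the delimiter as the second argument of `preg_quote()` also escapes that delimiter, which matters when the quoted text may contain a `/`.
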
