Highlighting like a pro! A guide to regex

Preface

Pokemon Showdown has many powerful commands meant to improve your chat experience, and highlights are no exception. What many users don't know is that Morfent has done some work adding the ability to use Regular Expressions (commonly referred to as Regex) with the command. This thread is a tutorial to teach users with no knowledge of regex or programming how to use this tool to both remove clutter from their highlights list and potentially even make it more functional.

/highlight commands

First, for those of you who have never used highlights before, let me address what the commands are so you know how to actually implement the patterns taught in this tutorial.
  • /highlight add, [word] - Add the word [word] to the highlight list.
  • /highlight list - List all words that currently highlight you.
  • /highlight delete, [word] - Delete the word [word] from the highlight list.
  • /highlight delete - Clear the highlight list.
This is a verbatim copy of the help that is given by the client when you use /highlight, but what it fails to mention is that the words are also regex patterns.

Additionally, your own name is highlighted by default, and you may or may not want to have this feature turned on. It can be adjusted in the settings menu (gear icon).

How Highlights Work

Without getting too technical, I'd like to cover what the command actually does. Each time you receive a chat message, the client searches the message for the first letter of your word; if it finds that letter, the next letter of the message is checked against the next letter of your word, and so on until it has gone through the whole word. If every single letter matches, then the entire word is contained within the chat message and the message gets highlighted. It is important to understand this, because with regex we will discuss ways in which not every single letter has to match.

Now to actually dive into the regex: when you use " /highlight add, (test) " Showdown automatically adds a little bit of extra code for you in the form of " /(test)/i ". The two '/' characters mark the start and end of the regex pattern, and the 'i' on the end means the pattern is case-insensitive. Showdown will automatically set your highlights to lowercase, so case sensitivity never matters with them. Capitalizing a word will not change how the highlight works, but in this post I write the examples in all lowercase, as that is how they will be read anyway. Though the '//i' is what makes the pattern regex, it is added automatically, and using something like " /highlight add, /(test)/i " will cause unintended behavior. The pattern itself is the only thing you need to add.
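
For the curious, here is a minimal Python sketch of that wrapping. It is only an approximation: Showdown's client runs JavaScript regex, so Python's re module is a stand-in here, and the helper name is_highlighted is made up for this example. re.IGNORECASE plays the role of the trailing 'i'.

```python
import re

def is_highlighted(pattern: str, message: str) -> bool:
    # Hypothetical helper: look for the pattern anywhere in the message,
    # ignoring case, roughly like /pattern/i.
    return re.search(pattern, message, re.IGNORECASE) is not None

# The pattern alone is all you add; capitalization does not matter.
print(is_highlighted("(test)", "This is a TEST message"))
print(is_highlighted("(test)", "nothing to see here"))

# Typing the slashes yourself turns them into literal characters to look
# for (in this sketch at least), so an ordinary message no longer matches.
print(is_highlighted("/(test)/i", "This is a TEST message"))
```

The real client may wrap your pattern in more than this, so treat the sketch as a way to experiment with patterns rather than an exact copy of Showdown's behavior.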

Optional Cases (?)

The easiest part of regex to understand is also one of the most powerful. If a case is labeled as optional, it doesn't need to match for the highlight to happen; however, if it is there it will also highlight. You can specify a single optional character by following it with a question mark (?). For the sake of understanding, if you wanted to highlight both "color" and "colour" you could highlight the pattern " colou?r ". The u will now be an optional letter which will allow both spellings of the word to highlight.
To give an example of how this can be used with usernames, say your username is "SteelEdges" but sometimes people call you "Steel Edges"; simply highlighting " steel ?edges " will now allow your name to highlight with or without the space.
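
If you want to try the optional character outside of chat, a small Python sketch like this one works. It assumes Python's re module behaves like Showdown's regex for these simple patterns, and is_highlighted is a made-up helper name.

```python
import re

def is_highlighted(pattern: str, message: str) -> bool:
    # Hypothetical helper: case-insensitive search anywhere in the message.
    return re.search(pattern, message, re.IGNORECASE) is not None

# The 'u' may appear zero or one time, but not twice.
spelling = {m: is_highlighted("colou?r", m)
            for m in ["color", "colour", "colouur"]}

# The space may appear zero or one time, but not twice.
names = {m: is_highlighted("steel ?edges", m)
         for m in ["SteelEdges", "steel edges", "steel  edges"]}

print(spelling)
print(names)
```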

Single letters are nice, but what if you want to do a whole word? The syntax becomes a little different, but it's still the same general concept. You can indicate a group of letters by putting them in parentheses like I did with " (test) ", so all we have to do is follow the group with a question mark. If EpicNikolai wanted "Epic" to be optional, it could be achieved by highlighting " (epic)?nikolai ". Though this highlight is functional, it is not optimal and can make more complex highlights harder to understand. It is recommended to also include the '?:' prefix inside the group, which looks like this: " (?:epic)?nikolai ". The prefix tells regex that the parentheses are only there for grouping and that it doesn't need to remember what the group matched. This makes the intent of the group clearer and, from a computational standpoint, can run slightly faster.

There is one other way to create an optional case which gives some insight into how optional case actually works. The syntax looks like this: " d(rago){0,1}nite ". Instead of a question mark we see "{0,1}" following the group. What this means is the group will be checked for matching 0 or 1 times. If "rago" doesn't appear it matches, and if "rago" appears once it matches as well. This is a rather unclear way of writing it for a simple optional case, but it most directly shows the logical connection to what this pattern is doing.
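
Here is a short Python sketch comparing the three ways of writing an optional group. Python's re module stands in for Showdown's regex, and is_highlighted is a hypothetical helper. It also shows the one visible difference between a plain group and a '?:' group: the plain group remembers what it matched.

```python
import re

def is_highlighted(pattern: str, message: str) -> bool:
    # Hypothetical helper: case-insensitive search anywhere in the message.
    return re.search(pattern, message, re.IGNORECASE) is not None

messages = ["hi nikolai", "hi EpicNikolai", "hi nik"]
plain = [is_highlighted("(epic)?nikolai", m) for m in messages]
prefixed = [is_highlighted("(?:epic)?nikolai", m) for m in messages]

# A plain group stores ("captures") its text; a ?: group stores nothing.
captured = re.search("(epic)?nikolai", "epicnikolai").groups()
not_captured = re.search("(?:epic)?nikolai", "epicnikolai").groups()

# '?' and '{0,1}' give the same results on every name tried here.
names = ["dnite", "dragonite", "dragoragonite"]
question_mark = [is_highlighted("d(rago)?nite", n) for n in names]
curly_braces = [is_highlighted("d(rago){0,1}nite", n) for n in names]

print(plain, prefixed)
print(captured, not_captured)
print(question_mark, curly_braces)
```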

Multiple Possibilities (|)

This next section is for when you have multiple highlights that would be very similar, with maybe only a letter or word of difference. For example, "snuggle" and "huggle" are very similar words with almost identical spelling; rather than highlighting both of them, the pattern " (sn|h)uggle " can be used. This group has a '|' character in it, which functions as an OR operator. This means the value of the group (what's in the parentheses) can be either "sn" or "h", which gives us the equivalent of two different highlights.

There is no limit on the number of possibilities that can be strung together; each one simply needs to be separated with '|'. This can drastically save space in your highlights list if you have many alts with the same general theme, like XTheElegantShadowX. All his alts can be highlighted simply with " xthe(elegant|sleeping|afk|laddering)shadowx ".

When many possible options exist, adding them all manually would be impractical, but with regex we can check 6 cases at once with " (some|any)(body|one|1) help ".
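
To see where the 6 cases come from, this Python sketch builds every combination of the two groups and checks each one. As before, Python's re module stands in for Showdown's regex and is_highlighted is a made-up helper.

```python
import itertools
import re

def is_highlighted(pattern: str, message: str) -> bool:
    # Hypothetical helper: case-insensitive search anywhere in the message.
    return re.search(pattern, message, re.IGNORECASE) is not None

# 2 choices in the first group times 3 in the second gives 6 phrases.
phrases = [first + second + " help"
           for first, second in itertools.product(["some", "any"],
                                                  ["body", "one", "1"])]
all_match = all(is_highlighted("(some|any)(body|one|1) help", p)
                for p in phrases)

huggles = {w: is_highlighted("(sn|h)uggle", w)
           for w in ["snuggle", "huggle", "juggle"]}

print(phrases)
print(all_match, is_highlighted("(some|any)(body|one|1) help", "nobody help"))
print(huggles)
```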

NOT Gates (?!)

Up until now, everything has been possible by simply having multiple highlights, but NOT gates (called negative lookaheads in regex terminology) add functionality that is not possible without regex. The concept of a NOT gate is that if the group matches the text that comes next in the chat message, the line will not highlight, as the name might imply. This can be extremely useful in certain cases. For example, some users refer to me as "Solar"; however, that is a word that shows up rather frequently in Pokemon, meaning I would be highlighted in some cases where I don't want to be. To get around this I use the highlight " solar ?(?!power|beam|flare) ". A group with the '?!' prefix indicates a NOT gate, and as we learned in the last section, a group can be given multiple cases with an OR operator. What this highlight does is highlight chat messages that include "solar" as long as the letters that follow aren't "power", "beam", or "flare". Now almost every time I am highlighted by the word "solar" it is somebody trying to get my attention, not other users talking about Pokemon concepts. One caveat: because the space is optional, the pattern can skip it and run the check starting at the space itself, so the two-word spelling "solar power" still highlights. Moving the optional space inside the NOT gate, as in " solar(?! ?(?:power|beam|flare)) ", covers both spellings.
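
Here is a Python sketch of the NOT gate in action, with Python's re module standing in for Showdown's regex and is_highlighted as a made-up helper. It also tries a variant that is not from my highlight list, where the optional space sits inside the NOT gate. With the space outside, regex is free to skip the space and run the check at the space itself, so "solar power" written as two words still gets through.

```python
import re

def is_highlighted(pattern: str, message: str) -> bool:
    # Hypothetical helper: case-insensitive search anywhere in the message.
    return re.search(pattern, message, re.IGNORECASE) is not None

original = "solar ?(?!power|beam|flare)"
# Variant for comparison only: the optional space is part of the check.
space_inside = "solar(?! ?(?:power|beam|flare))"

messages = ["hey solar", "solar, you there?", "solarbeam is good",
            "solar power is good"]

original_results = [is_highlighted(original, m) for m in messages]
variant_results = [is_highlighted(space_inside, m) for m in messages]

print(original_results)
print(variant_results)
```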

Times Matched (* and +)

We're all (unfortunately) familiar with stretching, right? If not, stretching just means to put extra letters into words to supposedly make the person mentally read it for longer (like thiiiissssssssss). Regex gives us a way to account for this. A letter followed by a '+' character will match if it appears 1 or more times. This makes it very easy to account for stretching if it is suspected. " thi+s+ " will highlight "this", "thisss", and "thiiisssss".

The character '*' holds a similar function to '+', except it checks for a match 0 or more times. This can be used to do some very powerful things very simply. For example, every chat message that contains italics can be highlighted with " __.*__ ". I'll discuss this in greater detail later, but '.' is a special character that can be anything, so this highlight checks for any message that has two sets of underscores with any amount of text between them; or '__' then '0 or more of any character' then '__'. (This is a bit more complicated to do with bold, which I will address later.)
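
A quick Python sketch of '+' and '*' side by side. Python's re module stands in for Showdown's regex, and is_highlighted is a hypothetical helper.

```python
import re

def is_highlighted(pattern: str, message: str) -> bool:
    # Hypothetical helper: case-insensitive search anywhere in the message.
    return re.search(pattern, message, re.IGNORECASE) is not None

# '+' needs at least one copy, so "ths" (no 'i' at all) does not match.
stretched = {w: is_highlighted("thi+s+", w)
             for w in ["this", "thisss", "thiiisssss", "ths"]}

# '*' accepts zero copies, so even "____" with nothing inside matches.
italics = {m: is_highlighted("__.*__", m)
           for m in ["__wow__", "____", "__ not closed"]}

print(stretched)
print(italics)
```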

Finally, for other quantities, curly brace notation can be used. We did this before in the optional section and it's needed less often, so I won't go into too much detail here. If you follow a character or group with a set of curly braces, it will match if it appears between the left number and the right number of times. So " (fox){2,4} " will highlight "foxfox", "foxfoxfox", and "foxfoxfoxfox" but not "fox" or "foxfoxfoxfoxfox".
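
The counting can be sketched in Python like this. One assumption to flag: the sketch compares the pattern against the whole word with re.fullmatch, because how a longer run of "fox" behaves inside a real message depends on how the client anchors the pattern, which this sketch does not try to model.

```python
import re

def whole_word_matches(pattern: str, word: str) -> bool:
    # Hypothetical helper: the pattern must cover the entire word.
    return re.fullmatch(pattern, word, re.IGNORECASE) is not None

# Try one to five copies of "fox" against the {2,4} range.
counts = {n: whole_word_matches("(fox){2,4}", "fox" * n)
          for n in range(1, 6)}

print(counts)
```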

Special Characters and Escape Sequence

The most important of the special characters is '.' as it can be any character (for the purposes of what you can do with highlights); this means it can be a letter, number, space, non-alphanumeric character, and basically anything you can type in chat. There are a few others that are worth mentioning though. '\s' matches a regular space (technically any whitespace character, such as a tab), and some find it easier to tell when they are checking for spaces when using \s. If you want to highlight a non-ASCII character (something from the Unicode table), you'll need to use '\uxxxx' where the x's are the four-digit hexadecimal code of the Unicode character.
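
Here is a Python sketch of these three special characters. Python's re module happens to accept the same '\s' and '\uxxxx' spellings, which is why it can stand in for Showdown's regex here; is_highlighted and the sample words are made up for the example. Note that '\s' also accepts a tab, not only a plain space.

```python
import re

def is_highlighted(pattern: str, message: str) -> bool:
    # Hypothetical helper: case-insensitive search anywhere in the message.
    return re.search(pattern, message, re.IGNORECASE) is not None

# '.' stands for any one character, here the digit 0.
dot = is_highlighted("s.lar", "s0lar")

# '\s' matches a space or a tab, but something has to be there.
whitespace = [is_highlighted(r"steel\sedges", m)
              for m in ["steel edges", "steel\tedges", "steeledges"]]

# 00e9 is the Unicode code for an 'e' with an acute accent.
accent = is_highlighted(r"pok\u00e9mon", "I love Pok\u00e9mon")

print(dot, whitespace, accent)
```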

Now, for when you want to highlight a character that would actually be used as code in regex ( . $ ^ { } [ ] ( | ) * + ? / \ ) you need to precede these characters with a backslash '\'. This is called an "escape sequence", and while it can be messy looking, it lets you do things you would not be able to otherwise. For example, if you are in the RP room and you want out-of-character text to highlight (for some weird reason) you might think to try " ((.+)) ". However, because parentheses just indicate a group, this will not work and will instead highlight every single line of chat. Instead you want to use the parentheses themselves as characters and need to use " \(\(.+\)\) ". While this is hard to read, it would not be possible otherwise, and it can be used in perhaps more productive ways, like highlighting lines of chat with bold in them.
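
The difference between the escaped and unescaped versions shows up clearly in a Python sketch. Python's re module stands in for Showdown's regex and is_highlighted is a hypothetical helper. re.escape is a Python convenience that adds the backslashes for you; Showdown has no such command as far as I know, so in chat you type them by hand.

```python
import re

def is_highlighted(pattern: str, message: str) -> bool:
    # Hypothetical helper: case-insensitive search anywhere in the message.
    return re.search(pattern, message, re.IGNORECASE) is not None

messages = ["hello", "((ooc talk))"]

# Unescaped: the parentheses are just groups, so any text matches.
unescaped = [is_highlighted("((.+))", m) for m in messages]

# Escaped: the parentheses must literally appear in the message.
escaped = [is_highlighted(r"\(\(.+\)\)", m) for m in messages]

# Python can generate the escaped form of a literal piece of text.
auto_escaped = re.escape("((")

print(unescaped, escaped, auto_escaped)
```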

Square Bracket Notation

If you're still reading up to this point and are still with me I'm impressed! With just the knowledge above you can create basically any highlight you would ever need, but I'll cover this one last section just for those of you who are finding this interesting.

Square brackets are used to match one character out of a set and can be used to specify many possibilities at once. By default, something like " [aeiou] " is the same as " (a|e|i|o|u) ". Now, by adding '^' as the first character it acts like a NOT gate, so " [^aeiou] " will match any single character that isn't one of the letters in the square brackets.

Ranges can be used as well, which allows for even more flexibility. " [0-9]+ " for example highlights any number, and " [^a-z0-9]{5,} " can be used to highlight emoticons, as it checks for 5 or more non-alphanumeric characters in a row (leaving out the right number in the curly braces means there is no upper limit).
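
A Python sketch of square brackets, again with Python's re module standing in for Showdown's regex and is_highlighted as a made-up helper. re.findall is used to show exactly which pieces of a message a pattern picks out.

```python
import re

def is_highlighted(pattern: str, message: str) -> bool:
    # Hypothetical helper: case-insensitive search anywhere in the message.
    return re.search(pattern, message, re.IGNORECASE) is not None

# [aeiou] and (a|e|i|o|u) agree on every lowercase letter.
same = all(is_highlighted("[aeiou]", c) == is_highlighted("(a|e|i|o|u)", c)
           for c in "abcdefghijklmnopqrstuvwxyz")

# '^' flips the set: only the non-vowel in "audio" is picked out.
non_vowels = re.findall("[^aeiou]", "audio")

# A range with '+' picks out whole numbers.
numbers = re.findall("[0-9]+", "lvl 100 with 31 ivs")

# Five or more non-alphanumeric characters in a row (spaces count too).
emoticons = [is_highlighted("[^a-z0-9]{5,}", m)
             for m in ["hello >:-(((", "hello world"]]

print(same, non_vowels, numbers, emoticons)
```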

More Examples

For my name I use the highlight:
(so+l(?:a(r|w))?) ?(?!power|beam|flare)(?:is(?:f(o|au)x)?)?
It utilizes nearly every concept addressed, but it is literally the only highlight I need in regards to my username.

Using '.+' to check for phrases:
(some|any)(body|one|1).+(mak(e|ing)|draw(?:ing)?).+m(e|y)
This highlight checks whole sentences to try to figure out when somebody is making a request in the art room.

Morfent's name highlights:
m(?:[eo][rn]f(?!erno)|urr?f)[a-z]*
[a-z]*f(?:en|ne)t
The former matches all of the following, followed by any number of letters, and excludes monferno: morf, merf, monf, menf, murf, and murrf. The latter matches any number of letters followed by fent or fnet.
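
Both of these can be checked against the names listed above with a short Python sketch. Python's re module stands in for Showdown's regex, is_highlighted is a hypothetical helper, and "morfant" is a made-up misspelling added to show a miss.

```python
import re

def is_highlighted(pattern: str, message: str) -> bool:
    # Hypothetical helper: case-insensitive search anywhere in the message.
    return re.search(pattern, message, re.IGNORECASE) is not None

first = "m(?:[eo][rn]f(?!erno)|urr?f)[a-z]*"
second = "[a-z]*f(?:en|ne)t"

stems = ["morf", "merf", "monf", "menf", "murf", "murrf"]
first_hits = [is_highlighted(first, s) for s in stems]

# The NOT gate after the 'f' keeps monferno out.
monferno = is_highlighted(first, "monferno")

second_hits = {n: is_highlighted(second, n)
               for n in ["morfent", "morfnet", "morfant"]}

print(first_hits, monferno, second_hits)
```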

I could use some more here. If you have a cool regex highlight share it in this thread so I can add it as an example.

Troubleshooting

If everything is highlighting, check your highlights list for a trailing comma; you might be highlighting a space by accident. If that is not the case and you are using regex, check to make sure you aren't highlighting something like '.*', which matches every line, or something made up entirely of optional parts, like " (?:hello)? (?:world)? ".
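
If you'd rather test a suspicious pattern than eyeball it, a rough Python check looks like this. It is only a heuristic I am assuming is good enough: a pattern that can match an empty message or a lone space will match almost any line. Python's re module stands in for Showdown's regex, and matches_almost_everything is a made-up name.

```python
import re

def matches_almost_everything(pattern: str) -> bool:
    # Hypothetical check: try the pattern on an empty message and on a
    # single space; a hit on either means it fires on nearly every line.
    return any(re.search(pattern, probe, re.IGNORECASE) is not None
               for probe in ["", " "])

checks = {p: matches_almost_everything(p)
          for p in [".*", "(?:hello)? (?:world)?", " ", "colou?r"]}

print(checks)
```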

If you're having more specific problems, feel free to post it in this thread for help.
 

Morfent

Quoting the first post: "What many users don't know is that Morfent has done some work adding the ability to use Regular Expressions (commonly referred to as Regex) with the command."
Technically highlight has always used regular expressions to match patterns in chat, but adding them was sometimes problematic. What I've done is make it possible to actually use {n,m} quantifiers without the command splitting the highlight in half, and prevent regexes like (?>a***) from being added and breaking your entire list with a syntax error.

For another example, I match every variation and horrific fuckup of my name with two highlights: m(?:[eo][rn]f(?!erno)|urr?f)[a-z]* and [a-z]*f(?:en|ne)t. The former matches all of the following, followed by any number of letters, and excludes monferno: morf, merf, monf, menf, murf, and murrf. The latter matches any number of letters followed by fent or fnet.
 
