
Evaluating Lisp Forms in Regular Expressions

Jan 25 13
by mickey

One of the oft-repeated complaints about Emacs is its antiquated regular expression engine: that it cannot compete with Vim’s; that it’s GNU-style regex and not PCRE; and that you have to quote everything in triplicate and write your regular expression on carbon-copied paper or something. There is some truth to this, but its detractors overlook the features it adds that you won’t find in most other editors or regexp implementations.

I recently did a VimGolf challenge where I abused sort-regexp-fields so I could swap two different words. I decided to do it in the most obtuse way possible so I could demonstrate the flexibility of Emacs’s sort commands, and solve what would be a banal challenge in any editor — swapping exactly two elements, once, in a two-line file — using an “eclectic” feature.

But there’s a better way: a “scalable” way that works with an arbitrary number of elements to swap, and it uses... regular expressions!

Evaluating Forms

Due to the pervasive nature of Elisp in Emacs, you can invoke elisp from within a call to replace-regexp. That is to say, in the Replace with: portion of the call, you can tell Emacs to evaluate a form and return its output in lieu of (or in combination with) normal strings. The syntax is \,(FORM).

Let’s say you have the following file:

And you want to give everybody a raise; well, you don’t really want to give Lumbergh a raise, but ho hum, right?

The simplest method is to use [query-]replace-regexp with the following parameters.

Here we search for the salary; we store the dollar amount in a capturing group.

And here we replace the salary with the output from an elisp form that multiplies capturing group \1 by two. In this case the group is written as \#1, because we want Emacs to convert the matched text to a number first.

And now the output looks like:

Neat.
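If you want to see the same idea outside Emacs, here is a minimal Python sketch in which the replacement is computed by a function of the match rather than being a fixed string. The names and amounts are hypothetical sample data, and the Emacs search/replace pair quoted in the comment is an assumed reconstruction, not necessarily the exact strings used above.

```python
import re

# Hypothetical sample data; the names and amounts are illustrative only.
text = """Peter $40000
Samir $45000
Lumbergh $90000"""


def double_salary(match):
    # int(...) plays the role of \#1: the captured text becomes a number first.
    return "$" + str(int(match.group(1)) * 2)


# Assumed Emacs equivalent: search for \$\([0-9]+\), replace with $\,(* \#1 2)
result = re.sub(r"\$([0-9]+)", double_salary, text)
print(result)
```

The point is the same as with \,(FORM): the captured text is data, and any function can turn it into the replacement.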

Conditional Replace

Let’s refactor some random Java code into CamelCase:

Replace underscore and all lowercase characters, which we capture for use later.

Here I use an elisp form to capitalize each match from capturing group \1. The underscore is removed also, so create_os_specific_factory becomes createOsSpecificFactory, and so on.

And now it looks like this:

Unfortunately, the word Os should probably be written like this, OS. Let’s change the form so that certain words are treated differently.

Same as before.

This time I’ve used a cond form (basically a case statement) to check whether the capturing group is a member of the list ("os"). If it is, we call the function upcase which, you guessed it, uppercases \1; if that condition fails, we fall through to the second clause which, because its conditional is t, always succeeds, and therefore we call capitalize on \1. I could’ve used an if statement and saved a bit of typing, but cond is more flexible if you want more than one or two conditionals.

The end result is that we now uppercase OS and leave the rest capitalized.
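The conditional replacement can be sketched in Python as well. This is only an analogue of the cond form described above: the set UPPERCASE_WORDS and the helper names are made up for illustration, and the pattern assumes each underscore is followed by a run of lowercase letters.

```python
import re

# Hypothetical exception list: words that should be fully uppercased.
UPPERCASE_WORDS = {"os"}


def camel_part(match):
    word = match.group(1)
    # First clause: special-cased words are uppercased entirely.
    if word in UPPERCASE_WORDS:
        return word.upper()
    # Fallback clause (the "t" branch of the cond): capitalize everything else.
    return word.capitalize()


def to_camel_case(name):
    # The underscore is part of the match but not of the group, so it is dropped.
    return re.sub(r"_([a-z]+)", camel_part, name)


print(to_camel_case("create_os_specific_factory"))
```

Adding another special case is a matter of adding a word to the set, which is the same flexibility that cond gives you over a single if.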

Swapping Elements

Going back to what I said in the beginning about swapping text. It is perfectly possible, as I’m sure you can imagine now, to swap two items — you don’t even need a cond element for that!

Consider this trivial Python function.

Let’s say we want to swap x and y. That is, of course, a no-op, but there’s nothing stopping you from extending the example here and using it for something more meaningful, such as reversing > and <.

This is the simplest way of swapping two values. You only need one capturing group, because we only need to compare against one capturing group, which is why y isn’t captured.

Next, we test for existence; if \1 is not nil — that is to say, we actually have something IN the capturing group, which would only happen if we encounter x — we fall through to the THEN clause ("y") in the if form; if it is nil, we fall through to the ELSE clause ("x").

The result is what you would expect:

The variables are swapped. But let’s say you want to swap more than two elements: you’d need to nest if statements (ugly) or use cond (less ugly, but equally verbose).
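Here is a small Python sketch of the same one-group trick. The function body in source is a hypothetical stand-in for the Python function discussed above, and the \b word boundaries are my own addition so that letters inside longer identifiers are left alone.

```python
import re


def swap_xy(match):
    # group(1) is set only when "x" matched; otherwise the match was "y".
    return "y" if match.group(1) is not None else "x"


# Hypothetical input; any text containing the names x and y would do.
source = "def foo(x, y):\n    return x - y"

swapped = re.sub(r"\b(x)\b|\by\b", swap_xy, source)
print(swapped)
```

Because every match is replaced in a single pass, there is no need for a temporary placeholder as there would be with two successive plain replacements.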

More Cond Magic

Consider the string 1 2 3. To turn it into One Two Three you have two ways of doing it:

One, you can use N capturing groups like so: \(1\)\|...\|\(N\) and in a cond check for the existence of each capturing group to determine if it is the “matched” one. There’s no reason why you couldn’t use just one capturing group and then string match for each item, but it’s swings and roundabouts: you’re either grouping each match you care about in the search bit, or you’re checking for the existence of the elements you care about in a form in the replace bit.

Let’s go with the first option.

Look for, and capture, the three integers.

Using cond, if any of the three capturing groups are non-nil, the body of that conditional is returned.

Unsurprisingly, the result is One Two Three.

If you change the capturing groups to this \(1\|2\)\|\(3\) and if you then change the replacement string to \,(cond (\1 "One") (\2 "Two")), you end up with One One Two. So as you can see, it’s very easy to create rules that merge multiple different strings into one using cond.
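The "first non-nil group wins" logic of cond can be modelled in a few lines of Python. The helper replace_by_group is hypothetical and exists only for this sketch. Note that for the merged variant to produce One One Two, the 3 has to sit in a capturing group of its own so that the second clause has something to test.

```python
import re


def replace_by_group(pattern, words, text):
    """Replace each match with the word belonging to the first group that matched."""

    def pick(match):
        # Mirrors (cond (\1 ...) (\2 ...) ...): clauses are tried in order.
        for index, word in enumerate(words, start=1):
            if match.group(index) is not None:
                return word
        return match.group(0)  # no clause applied: leave the text alone

    return re.sub(pattern, pick, text)


# One capturing group per alternative.
print(replace_by_group(r"(1)|(2)|(3)", ["One", "Two", "Three"], "1 2 3"))

# Merging 1 and 2 into the same group maps both of them to "One".
print(replace_by_group(r"(1|2)|(3)", ["One", "Two"], "1 2 3"))
```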

More ideas?

I think I’ve amply demonstrated the power of mixing regular expressions and lisp. I know a few more tricks, but I’m always keen to find more, so if you have a novel application I’d love to hear about it.

There’s nothing stopping you from invoking elisp functions that don’t return anything useful; that is, you’re free to call functions just for their side effects, if that is what you want. For instance, if you want to open a list of files in a buffer, use \,(find-file-noselect \1). Another useful application is formatting strings using format. Check out the help file (C-h f format RET) for more information. And lastly, you can use Lisp lists to gather matches for use programmatically later: M-: (setq my-storage-variable '()) then \,(add-to-list 'my-storage-variable \1).
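As a rough illustration of the "side effects only" idea, this Python sketch collects matches into a list while running a replacement. The sample text and names are hypothetical. Like add-to-list, it prepends new items and skips duplicates; unlike the Emacs form, it deliberately returns the matched text so the input stays unchanged, which is a choice made for this sketch.

```python
import re

collected = []  # plays the role of my-storage-variable


def remember(match):
    name = match.group(1)
    # add-to-list semantics: skip duplicates, add new items at the front.
    if name not in collected:
        collected.insert(0, name)
    # Returning the whole match leaves the text as it was.
    return match.group(0)


# Hypothetical input text.
text = "open a.txt, then b.txt, then a.txt again"
unchanged = re.sub(r"(\w+\.txt)", remember, text)

print(collected)
```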

Know any useful regexp hacks? Post a comment!

Fun with Vimgolf 4: Transpositioning text with Tables

Jan 17 13
by mickey

Here’s another Vim challenge, and one you might actually encounter frequently in real life.

Transpose the original lines in separate columns, one for each line.

Link to challenge.

Simple, really; a transposition here, some alignment there… but can we do better than the good ole’ brute force approach the Vim guys will invariably use? Can we do it without a call to shell commands like column and paste? (Psst.. yes we can! It’s Emacs!)

Here’s the original data:

… Standard comma- and newline-delimited data, and we must turn it into this:

Looking into the challenge, I was already thinking “tables” — Emacs tables. Transposition is a common operation in tables (spreadsheets) and mathematics, and Emacs can do both very well indeed.

So here’s what I did, utilizing our old friend org-mode; or rather, one of its subsidiary libraries. You don’t have to be in org-mode for this trick to work. It echoes an earlier VimGolf challenge I did where I used its hierarchical “sort” function to sort an address book: Fun with Vimgolf 1: Alphabetize the Directory.

Command                                   Description
C-x h                                     Mark the whole buffer.
M-x org-table-convert-region              Converts a region into an org table. The defaults use comma and newline separators. Bonus.
M-x org-table-transpose-table-at-point    As the name implies, this will transpose our table as required by the VimGolf challenge.
C-x C-x                                   Exchange point and mark.
M-x replace-regexp                        Replace: ^| \||   With: RET   Replace the pipe at the beginning of the line and a whitespace, or any pipe, with nothing.
M-x delete-trailing-whitespace            Delete trailing whitespace. Could also be done with a more sophisticated regexp in the penultimate step.

Done right, it should look like this:

Six “keystrokes” (for an arbitrarily large definition of “keystroke”…), but done the Emacs way. The best score on VimGolf is currently 31 (characters). Very impressive, but would it work on a 20×20 or with variable-length rows? “Think Abstract,” the developer cried.

The result above should be identical to that of the VimGolf output, but done without the hyper-specialized and brittle solutions that (most?) of the VimGolfers employ.

I didn’t tweak the cell width (whitespacing between each word) for each column; that it aligns perfectly with VimGolf’s output is dumb luck. Maybe they generated the resultant output in Emacs? :)

Nevertheless, the solution is “typical Emacs” and would scale well to very large datasets, and you don’t have to worry about things like unusually long cells; uneven number of rows and columns; etc.
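To make the "think abstract" point concrete, here is a minimal Python sketch of the same transpose-and-align operation. It is not what org-mode does internally, and it does not try to reproduce org's exact spacing; the sample data is hypothetical. It does cope with rows of different lengths by padding the short ones.

```python
from itertools import zip_longest


def transpose_csv(text):
    """Turn each comma-separated line into a column, padded so columns line up."""
    rows = [line.split(",") for line in text.splitlines() if line.strip()]
    # Each original row becomes one output column, so its width is the
    # longest cell in that row.
    widths = [max(len(cell) for cell in row) for row in rows]
    # zip_longest pads short rows with "" so uneven input still transposes.
    transposed = zip_longest(*rows, fillvalue="")
    lines = []
    for output_row in transposed:
        padded = [cell.ljust(width) for cell, width in zip(output_row, widths)]
        lines.append(" ".join(padded).rstrip())
    return "\n".join(lines)


# Hypothetical input: two lines of comma-delimited data.
print(transpose_csv("a,bb,ccc\n1,22,333"))
```

Nothing in the function depends on the table being a particular size, which is the property the org-table approach has and the hand-tuned keystroke solutions lack.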

Fun with Vimgolf 3: Swapping Words by Sorting

Jan 14 13
by mickey

Jon over at Irreal’s been busy with VimGolf challenges, and I figured I’d throw in my two pieces of eight.

The “challenge” is a simple one. Take this text:

And turn it into

As you can see, a simple transposition between two words on a line is all that we need, and Jon’s come up with a solution that solves it in 8 keystrokes. Arguably something like his solution is what I would do were I to do it in real life.

I’m not sure if the Vim guys count typing out strings of text as one atomic operation or if they count each character; in Emacs, arguably, each character is in itself a command, as each keystroke will invoke self-insert-command, but it’s more fun to think of strings of text as a single keystroke, for simplicity’s sake, and to give ourselves a sporting chance against our Vim nemeses.

So here’s my solution. It only solves this particular challenge and nothing more. It relies on the good ole sort order: that C comes before S.

Command                                            Description
C-x h                                              Mark whole buffer.
M-x sort-regexp-fields                             Run sort-regexp-fields. See Sorting Text by Line, Field and Regexp in Emacs for more information.
Regexp specifying records to sort: \([a-z_]+\)$    We want each record -- that's the part of the line we want Emacs to use for sorting -- to be the last word on each line.
Regexp specifying key within record: \1            The key -- that's the part inside the capturing group from above -- we want to sort by is the entire capturing group.

And we're done. So how does it work? Well, we rely on the side effect that the word CHALLENGE_FOLDER is less than, lexicographically, SOLUTIONS_FOLDER, because C comes before S.

It boils down to this: sort-regexp-fields is pure magic. As my article on the subject talks about at length, you can tell Emacs to only sort by parts of a line -- the part that matches the regular expression -- and using that match, you can then tell Emacs how you want to sort that data. We tell Emacs to sort by the last word on each line and leave the rest untouched. Simple :)
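For the curious, the behaviour can be approximated in Python. This is a simplified model, not the Emacs implementation: the function below is a hypothetical re-creation that finds every record, sorts the records by their key, and writes them back into the original record positions, leaving everything between records untouched. The two-line input is made up; only the two folder names come from the challenge.

```python
import re


def sort_regexp_fields(text, record_pattern, key_group=1):
    """Sort the regexp-matched records of text in place, keyed on one group."""
    slots = list(re.finditer(record_pattern, text, flags=re.MULTILINE))
    ordered = sorted(slots, key=lambda m: m.group(key_group))
    pieces, last_end = [], 0
    for slot, record in zip(slots, ordered):
        pieces.append(text[last_end:slot.start()])  # untouched text before the slot
        pieces.append(record.group(0))              # the record that sorts into this slot
        last_end = slot.end()
    pieces.append(text[last_end:])
    return "".join(pieces)


# Hypothetical input: the last word on each line is the record (and the key).
before = "first = SOLUTIONS_FOLDER\nsecond = CHALLENGE_FOLDER"
after = sort_regexp_fields(before, r"([A-Za-z_]+)$")
print(after)
```

With only two records, sorting them is the same as swapping them whenever they start out in the "wrong" order, which is all the trick relies on.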

So how many keystrokes is that? Good question. I don't know: it depends on how you count it. Two if you count the commands only; four if you count the commands and the prompts; and many more if you count each character.

As always, these challenges are pointless (though fun!) but they do force you to think on your feet.