
Working with Coding Systems and Unicode in Emacs

Aug 9 12
by mickey

Dealing with Unicode in Emacs is a daily task for me. Unfortunately, I don’t have the luxury of sticking to just UTF-8 or iso-8859-1; my work involves juggling a lot of coding systems local to particular regions, so I need a flexible editor with the right defaults to cover my most common use-cases. Unsurprisingly, Emacs is more than capable of fulfilling that role.

Emacs has facilities in place for changing the coding system for a variety of things, such as processes, buffers and files. You can also force Emacs to invoke a command with a certain coding system, a concept I will get to in a moment.

The most important change (for me, anyway) is to force Emacs to default to UTF-8. It’s practically a standard, at least in the West, as it is dominant on the Web; is backward compatible with ASCII (every ASCII file is already valid UTF-8); and is flexible enough to represent any Unicode character, making it a world-readable format. But enough nattering about that. The biggest issue is convincing Emacs to treat files as UTF-8 by default, when no information in the file explicitly says it is.

I use the following code snippet to enforce UTF-8 as the default coding system for all files, comint processes and buffers. You’re free to replace utf-8 below with your own preferred coding system.
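Something along these lines does the job. Treat it as a minimal sketch, assuming a reasonably recent Emacs, and not necessarily the exact snippet I run; every function here is a standard Emacs one, but which of them you actually need depends on your setup.

```elisp
;; Prefer UTF-8 whenever Emacs has to guess a coding system.
(prefer-coding-system 'utf-8)
;; Default for new buffers, files and subprocess I/O.
(set-default-coding-systems 'utf-8)
;; What Emacs sends to, and expects from, a text terminal.
(set-terminal-coding-system 'utf-8)
(set-keyboard-coding-system 'utf-8)
;; Default coding system used when saving new buffers.
(setq-default buffer-file-coding-system 'utf-8)
```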

Once evaluated, Emacs will treat new files, buffers, processes, and so on as though they are UTF-8. Emacs will still use a different coding system if the file has a file-local variable like this -*- coding: euc-tw -*- near the top of the file. (See 48.2.4 Local Variables in Files in the Emacs manual.)

OK, so Emacs will default to UTF-8 for everything. That’s great, but not everything is in UTF-8; how do you deal with cases where it isn’t? How do you make an exception to the proverbial rule? Well, Emacs has got it covered. The command M-x universal-coding-system-argument, bound to the handy C-x RET c, prompts for the coding system you want to use and then for the command to run with it. That makes it possible to open files, start shells or run Emacs commands as though you were using a different coding system. Very, very useful. This command is a must-have if you have to deal with stuff encoded in strange coding systems.
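If you find yourself repeating the same exception, you can get a similar effect from Lisp by binding coding-system-for-read around the command yourself. This is only a sketch: the command name my-find-file-as-euc-tw is made up for illustration, and euc-tw is just the example coding system from earlier.

```elisp
(defun my-find-file-as-euc-tw (file)
  "Visit FILE, decoding its contents as euc-tw.
Hypothetical helper; swap in whatever coding system you need."
  (interactive "fFind file (euc-tw): ")
  ;; `coding-system-for-read' overrides Emacs's usual detection
  ;; for any read that happens while it is bound.
  (let ((coding-system-for-read 'euc-tw))
    (find-file file)))
```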

One problem with the universal coding system argument is that it only cares about Emacs’s settings, not those of your shell or system. That’s a problem, because tools like Python read the environment variable PYTHONIOENCODING to decide which encoding the interpreter uses for stdin, stdout and stderr.

I have written the following code that advises the universal-coding-system-argument function so it also, temporarily for just that command, sets a user-supplied list of environment variables to the coding system.
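A minimal sketch of that idea, written with the newer advice-add interface (Emacs 24.4 or later). The variable name universal-coding-system-env-list is the one used in this post; the advice function name is made up, and the approach assumes the command you pick is run while universal-coding-system-argument is still executing, which is how it behaves in the versions I know of.

```elisp
(defvar universal-coding-system-env-list '("PYTHONIOENCODING")
  "Environment variables `universal-coding-system-argument' should set.")

(defun my-universal-coding-system-env (orig-fn coding-system)
  "Call ORIG-FN with the listed variables set to CODING-SYSTEM.
Hypothetical around-advice for `universal-coding-system-argument'."
  ;; Bind a copy, so the changes vanish once the command has finished.
  (let ((process-environment (copy-sequence process-environment)))
    (when coding-system
      (dolist (var universal-coding-system-env-list)
        ;; Naively use the Emacs name of the coding system as the value.
        (setenv var (symbol-name coding-system))))
    (funcall orig-fn coding-system)))

(advice-add 'universal-coding-system-argument
            :around #'my-universal-coding-system-env)
```

To undo it, call advice-remove with the same two names.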

Insert the code into your init file and evaluate it, and now Emacs will also set the environment variables listed in universal-coding-system-env-list. One important thing to keep in mind is that Python and Emacs do not share a one-to-one correspondence of coding systems. There will probably be instances where an obscure coding system exists in one and not the other, or where the spelling or punctuation differs; the mapping of such names is left as an exercise to the reader.

Compiling and running scripts in Emacs

May 29 12
by mickey

I’ve talked about running shells and executing shell commands in Emacs before, but that’s mostly used for ad hoc commands or spelunking; what if you want to compile or run your scripts, code or unit tests?

There’s a command for that…

Not surprisingly, Emacs has its own compilation feature, complete with an error message parser that adds syntax highlighting and “go to next/prev error” so you can walk up and down a traceback in Python, or jump to a syntax error, warning or hint in GCC.

There are two ways of using the compilation feature in Emacs: the easiest is invoking M-x compile followed by the compile command you want Emacs to run. So, for Python, you can type M-x compile RET python myfile.py RET and a new *compilation* buffer will appear with the output of the command. Emacs will parse the output and look for certain patterns (stored in the variables compilation-error-regexp-alist and compilation-error-regexp-alist-alist) and then highlight the output in the compilation buffer with “hyperlinks” that jump to the file and line (and column, if the tool outputs it) where the error occurred.
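If your tool’s output isn’t recognized, you can teach Emacs a new pattern. Here is a rough sketch for an imaginary tool, mytool, that prints errors as path/to/file.ext:LINE: error: message; both the tool and its output format are assumptions made up for the example.

```elisp
(require 'compile)

;; Each entry is (SYMBOL REGEXP FILE-GROUP LINE-GROUP ...).
;; Group 1 captures the file name, group 2 the line number.
(add-to-list 'compilation-error-regexp-alist-alist
             '(mytool "^\\([^ \n:]+\\):\\([0-9]+\\): error:" 1 2))

;; Activate the new entry by listing its symbol.
(add-to-list 'compilation-error-regexp-alist 'mytool)
```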

One annoying thing about compile is its insistence on wanting to save every. single. unsaved buffer before continuing. It’s there to keep you from accidentally compiling a mix of newly saved and old, stale files, which would lead to unexpected behavior, and the only reason it’s still there — and why it insists on saving files unrelated to what you are compiling — is the complete lack of a proper, integrated project management suite in Emacs. But I’ll save that rant for later.

Anyway, if it offends you as much as it offends me, you can add this to your init file to shut it up:
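A sketch of the sort of thing I mean, using two standard variables from compile.el (the second one needs Emacs 24.1 or later):

```elisp
;; Never prompt about unsaved buffers before compiling.
(setq compilation-ask-about-save nil)
;; The predicate decides which buffers get saved; `ignore' always
;; returns nil, so nothing is saved on your behalf.
(setq compilation-save-buffers-predicate #'ignore)
```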

There is nothing stopping you from customizing compilation-save-buffers-predicate to take into account the files or directories so its save mechanism is cleverer, but I personally never bothered.
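If you do want to bother, a cleverer predicate might look something like this sketch, which only saves buffers visiting files under one directory. The variable my-compilation-root and its path are hypothetical; the predicate is called with no arguments, with each candidate buffer current.

```elisp
(defvar my-compilation-root (file-truename "~/src/myproject/")
  "Hypothetical project directory; only files below it get saved.")

(setq compilation-save-buffers-predicate
      (lambda ()
        ;; Some candidate buffers visit no file at all, so check first.
        (and buffer-file-name
             (string-prefix-p my-compilation-root
                              (file-truename buffer-file-name)))))
```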

If compilation is successful — and like most things in the UNIX world, this is governed by the exit code — the modeline and compilation buffer will say so; likewise, if an error occurred, this is also displayed.

You can jump to the next or previous error (next-error and previous-error) with M-g M-n and M-g M-p; they’re bound to more keys, but I think those are the easiest ones to use. Similarly, in the compilation buffer itself (only), you can go to the next or previous file (compilation-next-file and compilation-previous-file) with M-} and M-{, respectively, and RET jumps to the location of the error that point is on.

There is an undocumented convention in Emacs that commands like dired, grep, and compile can be rerun, reverted or redisplayed by typing g in the buffer.

By default the compilation buffer is just a dumb display and you cannot communicate with the background process. If you pass the universal argument (C-u M-x compile) you can: the buffer is switched to comint mode, and you can converse with the process as though you were running it in the shell.

Python Debugging with Compile

I work mostly in Python, and quite often I have to debug the script I am writing with pdb, the Python debugger. I use breakpoints a lot, and I have a command bound to F5 that imports pdb and calls pdb.set_trace() — a standard Python debugging idiom. What it also does is highlight the line in bright red so I don’t miss it. And if I call compile from a Python buffer, it checks if there is a breakpoint present in the file: if there is, it switches to the interactive comint mode automagically; if there isn’t, it defaults to the “dumb” display mode. Having it switch automagically is perhaps a slightly pointless flourish, but screw it I’m using Emacs, not vim :)

This is the code I use. Feel free to adapt it for other languages. Send me an e-mail if you do something neat with it.
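Roughly, it looks like the sketch below. All the my- names are made up for illustration, it uses advice-add (Emacs 24.4 or later), and it assumes your Python buffers use python-mode or something derived from it.

```elisp
(require 'python)
(require 'hi-lock)

(defvar my-python-breakpoint-string
  "import pdb; pdb.set_trace() ## DEBUG ##"
  "Line inserted by `my-python-insert-breakpoint'.")

(defun my-python-highlight-breakpoints ()
  "Highlight breakpoint lines in bright red."
  (highlight-lines-matching-regexp "## DEBUG ##\\s-*$" 'hi-red-b))
(add-hook 'python-mode-hook #'my-python-highlight-breakpoints)

(defun my-python-insert-breakpoint ()
  "Insert a pdb breakpoint above the current line."
  (interactive)
  (back-to-indentation)
  ;; `split-line' keeps the indentation, which matters in nested blocks.
  (split-line)
  (insert my-python-breakpoint-string))
(define-key python-mode-map (kbd "<f5>") #'my-python-insert-breakpoint)

(defun my-compile-comint-if-breakpoint (args)
  "Return ARGS for `compile', forcing COMINT if a breakpoint is present."
  (if (and (derived-mode-p 'python-mode)
           (save-excursion
             (save-match-data
               (goto-char (point-min))
               (re-search-forward
                (concat "^\\s-*"
                        (regexp-quote my-python-breakpoint-string)
                        "\\s-*$")
                nil t))))
      ;; `compile' takes (COMMAND &optional COMINT); set COMINT to t.
      (list (car args) t)
    args))
(advice-add 'compile :filter-args #'my-compile-comint-if-breakpoint)
```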

There’s a Minor Mode for that…

The compile workflow isn’t for everyone; some people want everything in one place: their shell. Well, good news then — you can have your cake and eat it. The minor mode M-x compilation-shell-minor-mode is designed for comint buffers of all sizes and works well with M-x shell and most modes that use comint. You get the same benefits offered by compile without altering your workflow. If you compile or run interpreted stuff in your Emacs shell you’ll feel like a modern-day Prometheus with this minor mode enabled!

Add this to your init file and the compilation minor mode will start when shell does.
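It is a one-liner: hook the minor mode into shell-mode.

```elisp
;; Turn on error parsing and hyperlinks in every M-x shell buffer.
(add-hook 'shell-mode-hook #'compilation-shell-minor-mode)
```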

Fun with Emacs Calc

Apr 25 12
by mickey

The Challenge

Jon over at Irreal’s Emacs blog posted an interesting solution to a challenge raised by Xah Lee:

How do you convert a string like this 37°26′36.42″N 06°15′14.28″W into a decimal answer like this 37.44345 -6.25396.

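First off, I wasn’t sure Xah’s example answer was entirely correct: my understanding of latitude and longitude is limited to what I can google, and if I type the original degrees, minutes and seconds into this tool by the U.S. FCC it returns 37.44345 6.25396, with no minus sign. As far as I can tell, the minus is simply the signed-decimal way of writing a longitude in the W hemisphere.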

Anyway, on with the challenge. Jon’s solution is very interesting, but it got me thinking: surely Emacs has the facility in place to do this already? It’s Emacs, right? Right.

Fun with Emacs Calc

Emacs is equipped with a really, really awesome and spiffy RPN calculator (HP calculator fans, rejoice) capable of manifold things like algebraic and symbolic manipulation; matrix manipulation; conversion routines and much, much more. It’s truly wonderful, if really complex, but it does come with a really nice Info manual (type C-h i g (calc) RET and check it out). It’s a shame so few people know about its potential, as it’s basically a much simpler version of Mathematica, or even Wolfram Alpha (arguably you’ll have as much trouble telling Calc what you want as you would Wolfram Alpha…)

Anyway, I figured the Emacs calculator would have a facility in place for converting Deg-Min-Sec to decimal form, and sure enough, it does.

To try it out, type C-x * * and the calculator will open. Two windows will appear: the calculator itself and the trail, a history of commands and actions. The first thing we need to do is switch the calculator to “HMS” mode so we can try it out. To do this, type m h in the calculator (the left) window and the modeline will change and say something like Calc: 12 Hms. The 12 is the current precision, in significant digits.

Next, type in the expression, replacing the Unicode symbols above with @ for degrees; ' for minutes; and " for seconds. If you typed it in correctly, it will appear in the calculator window.

All we have to do now is convert it. Calc can convert between a wide range of units and systems, but we only care about decimals. Type c d and Calc will convert it to a decimal number. If you entered 37@26'36.42" you should see 37.44345 appear in its place.

OK, so we know it can do it, but how do we weaponize it? It so happens that Calc comes with a neat, little (though underdocumented) function called calc-eval.

Entering IELM, M-x ielm, we can query the calculator in real time:
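For instance, a couple of expressions you might try at the prompt (a small sketch; calc-eval takes a string in Calc’s algebraic notation and hands back a string):

```elisp
;; Plain arithmetic: evaluates to the string "3".
(calc-eval "1+2")

;; An HMS form; Calc hands it back formatted in its own notation.
(calc-eval "37@ 26' 36.42\"")
```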

Yep. It does work. The hardest part about using it is mapping the “algebraic” notation used above with the indirect, keybinding-based input you use in the RPN calculator. Thankfully, the manual and (often) the trail will tell you the name of the function you are calling.

Let’s digress a little so I can show you how neat this calculator actually is: solving the elementary equation 2x+5=10. In the calculator, type m a to go to algebraic mode; next, type (2x + 5 = 10) — don’t forget the brackets — and it should appear as an equation. Finally, type a S to “solve for” a variable — and when prompted, answer x. The answer will appear in your calculator window. How awesome is that?
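The same thing can be asked through calc-eval. As far as I can tell from the manual, the algebraic name behind a S is solve, so a sketch of the call would be:

```elisp
;; Ask Calc to solve 2x + 5 = 10 for x, using algebraic notation.
(calc-eval "solve(2x + 5 = 10, x)")
```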

Back to the challenge. Calling the “convert to degrees” function is what we need to do, and the answer is hidden in the trail — it’s called deg.

Putting it all together, we get:
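In IELM terms, that means wrapping the HMS form from before in a call to deg, along these lines:

```elisp
;; Convert the HMS form to decimal degrees. With the calculator's
;; default settings this should come back as a string along the
;; lines of the 37.44345 we saw in the calculator window.
(calc-eval "deg(37@ 26' 36.42\")")
```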

That looks right. But the original challenge said that we had to take a string, like the one given above, and map that. So here’s my solution:
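A sketch of one way to do it. The function name my-hms-to-decimal is made up for this example, and note that it throws away the hemisphere letters, so it returns unsigned values and a W or S coordinate is not negated.

```elisp
(require 'calc)

(defun my-hms-to-decimal (hms-string)
  "Convert HMS-STRING to a list of decimal degrees.
HMS-STRING looks like \"37°26′36.42″N 06°15′14.28″W\".
Hemisphere letters are dropped, so the results are unsigned."
  ;; Split on the unit symbols, hemisphere letters and spaces, which
  ;; leaves a flat list: degrees, minutes, seconds, degrees, ...
  (let ((fields (split-string hms-string "[°′″NSEW ]" t))
        (result '()))
    (while fields
      (let ((degrees (pop fields))
            (minutes (pop fields))
            (seconds (pop fields)))
        ;; Let Calc do the conversion, then parse its string answer.
        (push (string-to-number
               (calc-eval (format "deg(%s@ %s' %s\")"
                                  degrees minutes seconds)))
              result)))
    (nreverse result)))
```

Evaluating (my-hms-to-decimal "37°26′36.42″N 06°15′14.28″W") gives back a list of two numbers, one per coordinate.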

Calling it from IELM yields the following answer:

Looks good to me. Job done, and a fun challenge.