Olé! Diacritics in Emacs

by mickey on October 13th, 2010

If you’ve ever had to write the word résumé or über or smörgåsbord and, like me, lacked the keyboard layout to do it, you’ve probably reached for an external program to type those characters. That’s completely unnecessary though, as Emacs has complete support for Unicode and has several input methods that make Emacs act like a bilingual keyboard but without the hassle of having to change your keyboard layout.

Emacs has three modes for inserting symbols and diacritical marks: Unicode Code Points, Character Composition and Multilingual Text Input.

NOTE: Some of the features in this article require Emacs 23+

Unicode Code Points

Because Emacs’s internal display and text engine now support Unicode, you can make Emacs insert any character you like. The only caveat is that your font may not support all of them. (This is certainly true of Consolas.)

To insert a code point, type C-x 8 RET and enter the Unicode name or the hexadecimal code point (type TAB twice to get a complete list of names).



☃ ❄

If you don’t see a snowman and a snowflake above, your browser or font lacks Unicode support.
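The same characters can also be inserted from Lisp, which is handy in a command or a snippet. This is a minimal sketch to evaluate in a scratch buffer. It assumes the snowman is U+2603 and the snowflake U+2744, and the last form assumes an Emacs new enough to provide char-from-name (Emacs 26 or later, which is newer than the Emacs 23 this article targets), hence the guard.

```elisp
;; Insert U+2603 SNOWMAN, a space, and U+2744 SNOWFLAKE at point.
(insert (string #x2603 ?\s #x2744))

;; The same characters written as escapes inside a string literal.
(insert "\u2603 \u2744")

;; Assumption: char-from-name only exists in newer Emacs versions,
;; so check for it before looking a character up by its Unicode name.
(when (fboundp 'char-from-name)
  (insert (char-from-name "SNOWMAN")))
```

Whether the result displays correctly still depends on your font, exactly as with C-x 8 RET.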

Character Composition

This is a simpler way of entering common accented characters and symbols. To write Olé you’d type: O l C-x 8 ' e.

To get a list of all available character compositions type C-x 8 C-h. And to get a list of all accented characters you can type C-x 8 ' C-h, and so on.

Multilingual Text Input

The Multilingual Text Input hooks into all typeable symbols and augments them so they act like a dead key. If you don’t know what that is, you’ve never used a typewriter or a keyboard layout with accented characters.

A dead key is a key like the acute accent (´) that attaches itself to the character typed immediately after the dead key. So to write Olé on a Spanish keyboard you’d type O l ´ e.

Emacs’s Multilingual Text Input mode does exactly the same. To enable it, type C-\ or M-x toggle-input-method and select an input method (I picked latin-1). Once the input method is toggled on, you should see an abbreviation of the method you picked appear in the left-hand corner of your modeline.

Now type a symbol like an apostrophe (') and Emacs will wait for the next character. Type a letter such as e, the two combine into é, and you’re done.

To change the input method again, type C-x RET C-\ or M-x set-input-method.
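If you settle on one method, you can preselect it in your .emacs so that C-\ toggles it without prompting. This is a sketch, not the only way to set it up. It assumes the method described above is the one Emacs registers under the name "latin-1-prefix", and the text-mode hook is just an illustrative choice.

```elisp
;; Assumption: "latin-1-prefix" is the full name of the latin-1 method
;; used in this article (accent first, then the letter).
(setq default-input-method "latin-1-prefix")

;; Optional: switch the method on automatically in text buffers.
(add-hook 'text-mode-hook
          (lambda () (set-input-method "latin-1-prefix")))
```

With only the first form in place nothing changes until you press C-\, so the method stays out of the way in buffers where you don’t want it.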

Although I used latin-1 you can use any number of input methods, even ones that aren’t tied to a language per se, like sgml or TeX.

The TeX input method in particular deserves a mention, because it lets you insert symbols the same way you would in a TeX document: by requiring the escape character first you can use the other symbols on your keyboard with impunity.

The “utility” input methods are useful indeed, and they show you just how powerful Emacs truly is: it even supports the International Phonetic Alphabet (IPA).
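To see what is on offer, you can ask Emacs itself. The forms below are a small sketch to evaluate in a scratch buffer. The exact list of names depends on your Emacs version and build, so no output is shown.

```elisp
;; Names of every registered input method, as a list of strings.
(mapcar #'car input-method-alist)

;; Describe one method, including the key sequences it understands.
;; Interactively this is the command describe-input-method (C-h I).
(describe-input-method "TeX")
```

M-x list-input-methods shows the same information as a browsable buffer.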

Thanks to Clinton Curry for suggesting the TeX input method

  1. Clinton Curry permalink

    Another input method worth considering is the TeX input method. To use it, call the function set-input-method (typically bound to C-x RET C-\) and choose TeX. Then to type “Olé” you would do what you usually do in TeX or LaTeX: type “Ol\'e”. Using C-\ to toggle the input method means that it gets out of your way when you don’t need it, but when you need the odd diacritic or mathematical symbol (ℏ or ⊕ anyone?) TeX mode is an easy way to get it. For more specialized symbols like SNOWFLAKE, you still have the other method (C-x 8 RET) available to you.

    • mickey permalink

      Clinton, that’s very cool. I didn’t know TeX was a valid input method. That’s very handy. I’ve updated the blog post :)

  2. Although Microsoft has (shockingly) done a bad job of making this knowledge available, the version of Consolas that ships with Windows 7 properly supports far more Unicode characters. The version of Consolas that ships with Windows Vista is broken – and, inexplicably, that’s still the version that Microsoft’s web site offers for download. I like Consolas, so I try to spread this knowledge when people talk about how Consolas is broken. A fix is available – you just have to copy over the font files from a Windows 7 machine. I wish Microsoft would put up the non-broken version for download.

    Also this was a useful post for me, because I end up needing to type accented characters in emacs from time to time. Thank you.

    • mickey permalink

      I didn’t know Consolas differed between Vista and 7; but saying that, it’s still nowhere near as complete as Arial is. It’s quite frustrating at times.

  3. brad clawsie permalink

    fun! i’ve been wanting to try out the eye candy of putting real lambdas in my haskell! thanks!

  4. zirpu permalink

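    there is also rfc1345.
    in your .emacs add:

    ;; load the quail input-method framework
    (require 'quail)
    ;; activate rfc1345 and make it the method that C-\ toggles
    (set-input-method 'rfc1345)

    then use C-\ to toggle your input-mode.

    &e' produces é
    &E' produces É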

  5. Casey permalink

    Thank you (and Clinton Curry) for this! The TeX input method is just what I needed, and so much nicer than the clunky way I was doing it before.

  6. I’m a new Emacs user and a Catalan speaker. Today I can write my language correctly in Emacs. Thank you!

    I’ve written some notes in Catalan about your post on my blog.


