Sunday, 6 November 2011

Fixing up XML well-formedness

You have a lot of text marked up as XML (most likely XHTML). You've edited it by hand, and as a result it's chock-full of errors. Mainly, it's missing closing-tags (or, when you meant to type a closing-tag, you actually typed an opening-tag).

It's easy to find the errors using xmllint. But if you want to fix them, that's a bit cumbersome.
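If you'd rather script the "find" step, here is a minimal sketch in Python that reports where the first well-formedness error sits. It is not how xmllint works internally, just the same kind of check done with the standard library's ElementTree parser. The function name `first_wellformedness_error` and the sample string are made up for illustration.

```python
import xml.etree.ElementTree as ET


def first_wellformedness_error(text):
    """Return (line, column, message) for the first error, or None if well-formed.

    Hypothetical helper for illustration; only the first error is reported,
    because the parser stops as soon as the document stops being well-formed.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position  # as reported by the underlying expat parser
        return line, column, str(err)
    return None


# A <b> that was never closed: the parser complains when it reaches </p>.
sample = "<p>one <b>two</p>"
print(first_wellformedness_error(sample))
```

Like any well-formedness check, this stops at the first problem, so you fix one error and run it again. Note that XHTML entities such as `&nbsp;` will also be reported as errors here, because this sketch does not load any DTD.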

I found Emacs worked very nicely. You don't have to use Emacs-specific keyboard shortcuts (much :); the current version has friendly menus and standard Gtk dialogues. Just remember not to try to use normal keyboard shortcuts like ctrl-s to save, or Alt to bring up menus. (If you're addicted to standard keyboard shortcuts, you can use F10 and the arrow keys - this is a less well known standard in GNOME and other CDE/CUA-compatibles.)

If you don't see an XML menu, you need to switch to XML mode manually.

M-x xml-mode

(M is for "meta", which by default is the Alt key. So press Alt-x, then type "xml-mode" and hit Enter.)

Then look in the XML menu. Note the "Next error" command - exactly what we want. Conveniently, if you look to the right, you'll see its keyboard shortcut: C-c C-n, i.e. Ctrl-c followed by Ctrl-n. You can keep Ctrl pressed down for both, or not, as you wish.

I used XML->Set Schema->Any Well-formed XML for a first pass that only checks for mismatched tags etc., without bothering about validity against the specific DTD.

You can quickly add missing close-tags by typing "<" and then C-Enter for auto-completion. (I was obscurely amused to see M-Tab listed as an alternative shortcut. If you don't get the joke, try it yourself in any popular desktop environment; it doesn't do any harm. I didn't realize it myself until I tried it.) You may also want Options->Line Wrapping for this buffer->Word Wrap.
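To see why completing a close-tag is usually the right fix, it helps to know which elements are still open at the point of the error. The following is a rough sketch in Python using the standard library's expat bindings. It is not what Emacs does internally, and the function name `open_elements_at_error` is an assumption made for illustration.

```python
import xml.parsers.expat


def open_elements_at_error(text):
    """Return (line, open_elements) at the first error, or None if well-formed.

    Hypothetical helper: open_elements lists the elements that were still
    open when the parser gave up, outermost first.
    """
    stack = []
    parser = xml.parsers.expat.ParserCreate()
    # Track nesting: push on every start tag, pop on every matching end tag.
    parser.StartElementHandler = lambda name, attrs: stack.append(name)
    parser.EndElementHandler = lambda name: stack.pop()
    try:
        parser.Parse(text, True)  # True: this is the final chunk of input
    except xml.parsers.expat.ExpatError as err:
        return err.lineno, list(stack)
    return None


# <b> is still open when </p> arrives, so </b> is the tag that is missing.
print(open_elements_at_error("<p>one <b>two</p>"))
```

The last name in the returned list is the innermost open element, which is the one a missing close-tag would most likely belong to.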

I originally tried xmlcopyeditor. In order to change an open-tag to a close-tag, you have to delete the open-tag and then create a fresh close tag. It's not that this takes too long - backspace/delete kills the whole tag at once, and then "
