Student of the Source: 2008

Monday, 14 July 2008

Reasons to wean myself off Gmail

I became addicted to Gmail soon after it was opened to the public. Web access, labels, conversation view, gigabytes of free storage and (more recently) IMAP access, yay! But I've come to the point where it's a bit too much magic and I need to look at moving elsewhere.

To import my email archive with accurate dates, I had to set up a box poll. Uploading over IMAP causes broken dates.
To use Thunderbird on multiple computers I have to repeatedly unlock a CAPTCHA. Google said they were working on it... at the end of March.
Spam and general filtering. I'd like to have as much control over spam filtering as possible. I'd like to be able to keep spam if I really want to. I definitely want to keep the spams I flag for training. And for general filtering, I'd like to at least be able to export my filters.
Google like to think they use standards so people can control their data at will. But they don't provide Sieve or any other way to manage filters. They're not leading in the IMAP standardization effort, in particular on the mobile side. And while they provide contact import and export, they didn't quite go as far as providing LDAP.
They don't support my favorite browser - KDE's Konqueror. Fortunately the "basic HTML" view is very usable, but it lacks support for e.g. changing settings and easily deleting lots of junk mail. This is not really graceful degradation, let alone progressive enhancement.

Call me impatient, but my general impression is that Google aren't quite set up for application support. In any case, they don't value my use cases.

Email is very important to me. I appreciate Google for showing the world and me how powerful webmail can be, but I just can't rely on them to get it right. I have a couple of family members I'd like to convert to email-stored-on-the-server (IMAP and/or webmail), but I can't in conscience recommend Google. Both of them are used to desktop email clients, and Problem #1 above negates the advantage of being able to access email from multiple different computers.

I've been paying for what looks like a loss leader test account at Tuffmail.com for a few years now. It's time for me to review it again and see if I can make the switch. It's a smallish, technically oriented service. My only concern is it looks like a one-man company - I don't know what would happen if the guy left the business.

Phew. Links are tiring - but I don't like to moan without any citations. I need to start using Markdown.

Friday, 20 June 2008

Git (Guilt) followup

Aha! The Guilt docs say

"The patch directory can also be placed under revision control, so you can have a separate history of changes made to your patches."

I assumed this meant "git-add .git/patches/master". That's not possible though as Git doesn't believe you should be able to shoot yourself in the foot by tracking files under .git/. But it does let you do "cd .git/patches/master; git-init", and create a second repo under .git/. I guess this is what the doc means by a "separate history".
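A minimal sketch of that second-repo arrangement, run in a throwaway directory so nothing real is touched. The file names (example.patch and a one-line series file) are placeholders standing in for a real queue, and the committer identity is made up for the example; it uses the modern "git init" spelling rather than "git-init".

```shell
#!/bin/sh
set -e

# Throwaway location for the outer (project) repository.
work=$(mktemp -d)
git init -q "$work/project"

# Stand-in for a Guilt queue on branch "master".
queue="$work/project/.git/patches/master"
mkdir -p "$queue"
cd "$queue"
: > example.patch                  # hypothetical, empty patch
printf 'example.patch\n' > series  # lists the patches in order

# Second repository, nested inside the first one's .git directory.
git init -q
git add series example.patch
git -c user.name="Example" -c user.email="example@example.invalid" \
    commit -q -m "Snapshot of the patch queue"

# The nested repo has its own history, separate from the project's.
git --no-pager log --oneline
```

The outer repository never sees these files, so the two histories stay independent.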

The advantage of this format is I can easily publish all my changes before I'm 100% happy with them. So I've now published the complete code for module-init-tools indexes, which shaves a whole second off my EeePC boot-time. It passes the testsuite, as well as my own "does it break my computer" test:

time grep MODALIAS /var/log/udev | cut -d = -f 2- | xargs -n1 /sbin/modprobe 2>&1|sort -o a
time grep MODALIAS /var/log/udev | cut -d = -f 2- | xargs -n1 ./modprobe 2>&1|sort -o b
diff a b

I just need to clean it up a little now and wait for the maintainer to get back in touch with me.

WARNING. It is not suitable as a drop-in replacement for module-init-tools. At least on Ubuntu/Debian, their module-init-tools package has been heavily customized with extra options, in particular -Q (--silent). Your Ubuntu will likely fail to boot if you build modprobe from scratch and install a binary which doesn't support the -Q option.

Git: is not so great for managing patch queues actually

I found myself doing a bit more work on module-init-tools than planned. The good news was that it had a testsuite. The bad news was that I had to fix the testsuite first, to get it working on my computer and then to get it to use valgrind as documented.

I also ended up fixing a number of memory leaks. These are unlikely to be important in module-init-tools, where all the programs are short-running commands. However, they make it much more difficult to use valgrind to check for leaks in my new code.

Plus, while I was grokking all this existing code it seemed a pity not to profile it and shave some time off the hotspots (again, using valgrind, also with wonderful profile graphics from kcachegrind).

As a result, I currently have 12 patches with just under 6000 lines. My first approach was to import all the changes into separate commits in Git - hence the GitHub account. This is, it turns out, the Wrong Thing To Do. It's very cumbersome to go back and correct a commit if you miss something, or as a response to feedback.

So I'm now using "guilt". This is much like Andrew Morton's "quilt" patch management system, except it lets you feel superior for using Git underneath and not having to type "quilt add" every time you edit a different file. I've already used it to reorder my patches and regroup them more logically, and it works great.

What I'm trying to say here is that all the history I published on GitHub is now obsolete, and I'm going to blow it away :-). Before I do that, hopefully I can work out what the murmurings about tracking your patches directory in Git mean - then I can publish my patch queue on GitHub.

But now I have to go collect my SDM open assessment feedback. I was hoping to meet with Niall at the same time but he's not replied. Oh, and maybe I should get some food in the house.

Tuesday, 17 June 2008

Boot times revisited

I've been fiddling with boot time optimisations for a while now, and it has finally borne fruit.

If you look at the GitHub sidebar on the ~~left~~ right you'll see "my-module-init-tools". I've just published all my baseline fixes.

Coming soon: the module index (already implemented and tested) that speeds up stock Ubuntu boot times by 3 whole seconds! This is on my EEE, remember - a 630MHz Celeron. But even my Core 2 Duo desktop takes a whole second faffing around reading these module configuration files!

The current maintainer Jon Masters is very interested, but apparently quite busy at the moment.

Next up: UDEV. I had one simple fix to speed rule execution (compute jump targets at load time, instead of scanning at run-time), but the main problem appears to be that it just forks too many processes. I spent ages squeezing the last drops out of string manipulation code etc, because the fork overhead isn't really clear on oprofile. I still don't have a good way to profile it. My planned fix is to move away from direct access to environment variables, use pthreads instead of multiple processes, and then only fork & set environment variables when udev needs to run a different program like modprobe or vol_id.

Altogether, we can expect a modest boot-time improvement for everyone. (Ignoring the few freaks who manage to run a functional system without udev).

Sorry, no convenient links - I've run out of time and need to sleep. You know where the search engine is. (No, it is so not extra effort. Use a proper GUI & browser with support for middle mouse button "paste URL", and default search, aka search from the address bar. Select a word or phrase on the page, middle click, and magic happens. I'm on Konqueror on X11).

Saturday, 10 May 2008

Reason #232 for bringing an EEE into your life

For the hypothetical readers of my previous post: the operation was a success :-). I now have an EEE running the vast majority of my favorite software (storage is a problem), based on Kubuntu (but with lots of packages I don't use thrown out). Installation was by an indirect route - I bootstrapped it using the eeeXubuntu install media, which worked great, and set me up with some hot-key scripts. (Unfortunately volume up/down isn't working right now). After switching to Kubuntu, I then upgraded to Hardy, which broke suspend somewhat.

Obviously there's a lot I've left out. Suffice it to say that I'm now using the EEE as a desktop replacement. Apart from storage, which is fixable, the only regrets I can think of are

1. Flash sucks so bad it can't manage full-screen video on a 630MHz Celeron (and no, I don't use compiz or similar effects). Mplayer can do it! Unfortunately Mplayer loses audio sync on flv files.
2. KDE4 isn't ready for prime-time, so I don't have a dashboard equivalent. It would be really handy to have a clock with the full date I could bring to the fore with a single keypress.

Though it does help that I can plug in a set of larger peripherals -

screen (xrandr goodness!)
keyboard (I bought a nice USB keyboard with this in mind to replace the battered and ancient incumbent)
speakers (quirk: the eee causes a background hum on the speakers, which is really quite loud - but it's much improved if you plug in an external monitor)

For completeness: I also built my own kernel - which has its upsides and downsides. (The webcam driver doesn't work for some reason, and obviously I miss out on a certain amount of Ubuntu / community support). Also, I'm now an even heavier user of hibernation than before, with uswsusp for compression - useful for both space and faster resume times. Unfortunately I had to hack Ubuntu to use uswsusp.

Now I can get to the point of today's post! That would be the magic of wake-on-RTC-alarm, transforming my laptop into an (internet) radio alarm clock. All low level at the moment. Though pm-utils has some support for it, and I assume HAL has been wired up to it, there's no alarm-clock GUI yet. So here's a set of commands I'm trying today:


sudo -s # gain privs for echo
echo 0 > /sys/class/rtc/rtc0/wakealarm # must clear any existing alarm first

# Test alarm - resume in next 4 minutes (hibernation takes a _long_ time.  TODO, non-blocker)
#t=`date +%s`; w=`expr $t + 240`; echo $w > /sys/class/rtc/rtc0/wakealarm

# Real usage - Alarm clock
w=`date -d "tomorrow 7:55" +%s`; echo $w > /sys/class/rtc/rtc0/wakealarm

pm-hibernate # Now hibernate
exit # drop privs

# Play BBC Radio 4.  Yes, there's no reason why mplayer shouldn't be able to play .ram files directly
mplayer `wget -O- http://www.bbc.co.uk/radio4/realplayer/media/fmg2.ram`

UPDATE: fixed date command so this actually works. Big caveat: this only seems to work on mains power - so unfortunately it's not a truly portable alarm clock. Probably not as eco-friendly as I was hoping for either.
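Since a wrong date string was what broke the first version, here is a small dry-run sketch that checks the computed wake time before anything is written to the RTC. The helper name alarm_epoch is made up for this example, and it assumes GNU date (for the -d option), as the commands above do. It only prints what it would write; it does not touch /sys.

```shell
#!/bin/sh
# Hypothetical helper: turn a human-readable time into seconds since the epoch.
# Relies on GNU date's -d option.
alarm_epoch() {
    date -d "$1" +%s
}

now=$(date +%s)
wake=$(alarm_epoch "tomorrow 7:55")
delta=$((wake - now))

# Refuse to continue if the alarm would be in the past.
if [ "$delta" -le 0 ]; then
    echo "alarm time is in the past, not setting it" >&2
    exit 1
fi

# Dry run only: report the value instead of writing it.
echo "would write $wake to /sys/class/rtc/rtc0/wakealarm ($((delta / 60)) minutes from now)"
```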

Saturday, 5 April 2008

EEE alternate OS installation plan

Constraints:
4G SSD
No optical drive (for install)

Requirements:
Should boot at similar speed to default OS (10-15s on 630MHz Celeron)
Reasonable wireless configuration tools

== Filesystem ==

Apparently reiserfs saves almost 200M in filesystem overhead. Plus it should store small files more compactly... but my calculations suggest journaling may be a bad idea, as confirmed by Asus's choice to avoid it (mounting as ext2):

Maximum write b/w: 10MB/s
Maximum reiser journal size: 127M (32749*4096)
SSD Write cycles: 100_000

Wear leveling will have very little effect. Typically wear leveling occurs in zones of around 4MB, and I assume this is contiguous (and so is the journal).

Now we should halve the bandwidth, since journal writes are accompanied by normal writes.

100_000 / (5MB/s / 127MB) = 100_000 * 25.4s = 2.54 mega-seconds = 29.4 days.

This is the absolute worst case - reiserfs only journals metadata, and it's not clear why one would generate 5MB/s of continuous metadata writes without any data writes. But it looks better avoided. ext2 for me.
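The same sum as a small script, so the inputs can be changed. It assumes the bandwidth and journal size are both in megabytes, and the function name journal_lifetime_days is made up for this sketch.

```shell
#!/bin/sh
# Worst-case lifetime of the journal region, using the figures above.
write_bw_mb_s=10     # maximum SSD write bandwidth, MB/s
journal_mb=127       # maximum reiserfs journal size, MB
write_cycles=100000  # rated write cycles

journal_lifetime_days() {
    awk -v bw="$1" -v j="$2" -v c="$3" 'BEGIN {
        journal_bw = bw / 2             # half the bandwidth goes to normal writes
        secs_per_pass = j / journal_bw  # time to overwrite the whole journal once
        printf "%.1f\n", c * secs_per_pass / 86400
    }'
}

days=$(journal_lifetime_days "$write_bw_mb_s" "$journal_mb" "$write_cycles")
echo "Worst-case journal lifetime: $days days"
```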

== Boot speed ==

Looks like Puppy linux might be good.
Alternatively, hibernation. Use swap file for flexibility (can resize). uswsusp can do compression for even more speed.

Thursday, 27 March 2008

Boot times (Ubuntu)

I've been playing with bootchart (which I was very impressed by). I thought I should share my findings about boot time.

The biggest improvement came from generating a boot profile (booting once with the "profile" option), so that readahead worked well on my system. I found compiling a custom kernel to work _without an initramfs_ also had a significant effect. But I also found a couple of other tweaks which can save a second or two:

console-setup

ckbcomp is run by setupcon each boot (twice) to generate a keymap for the console, using the X keymap files. This takes a reasonable amount of CPU; it can take almost a second to run during boot. setupcon had already saved the generated keymap - this is done when the console-setup package is installed/reconfigured (or when "/etc/init.d/console-setup restart" is run manually). Unfortunately it only used it if /usr wasn't (yet) mounted. It wasn't immediately obvious how to make it do the right thing, so I just hacked it to always read my saved keymap. (It's a hack because I commented out the code which would regenerate the keymap if the configuration changes)

hwclock

hwclock is invoked twice during the boot process, from hwclockfirst.sh and hwclock.sh. Unfortunately it waits for the next second boundary before it reads/sets the clock. In other words, whenever it's run it will take an average of half a second. Two observations here:

This really ought to be run in parallel with something, rather than blocking the entire boot process. I enabled concurrency=shell in /etc/rc, renumbered the boot script to run in parallel with keyboard-setup, and removed the ".sh" extension which also prevented hwclock.sh from being run in parallel with anything else.
When built for a 64-bit Intel system, hwclock does not trust RTC (hardware clock) interrupts. This means it has to _busy wait_ for the next second boundary, which consumes 100% CPU, slowing down the boot and generally looking stupid. I think it should trust the kernel - I patched it to use RTC interrupts and it worked just fine. This should be a low risk change: in the worst case it will time out and continue after waiting 5 seconds for an interrupt.

To be more specific: the kernel should return an error if it does not support RTC interrupts. However, on 64-bit Intel systems hwclock does not trust the kernel to do this, so it always has to busy-wait.

 /* Turn on update interrupts (one per second) */
#if defined(__alpha__) || defined(__sparc__) || defined(__x86_64__)
/* Not all alpha kernels reject RTC_UIE_ON, but probably they should. */
rc = -1;
errno = EINVAL;
#else
rc = ioctl(rtc_fd, RTC_UIE_ON, 0);
#endif
if (rc == -1 && (errno == ENOTTY || errno == EINVAL)) {
  /* This rtc device doesn't have interrupt functions.  This is typical
     on an Alpha, where the Hardware Clock interrupts are used by the
     kernel for the system clock, so aren't at the user's disposal.
     */
  if (debug)
   printf(_("%s does not have interrupt functions. "),
   rtc_dev_name);
  ret = busywait_for_rtc_clock_tick(rtc_fd);

Saturday, 5 January 2008

paranoia: "Use your CDROM drive to read audio tracks.... and have it actually work right!"

For some reason I use the latest testing version of cdparanoia (which hasn't been updated for a while now):

cdparanoia III release 10pre0 (August 29, 2006)

I think this is because the stable version, 9.8, wasn't being paranoid enough on recent versions of linux where my cdrom drive gets treated as SCSI, in some strange way. (It's a standard ATAPI drive). I was getting different results for rips of the same track, which is what paranoia is supposed to avoid.

I do cmdline ripping, no fancy tagging. I don't know whether the GUIs are paranoid enough, and it's nice to know what's actually going on. Recently I've been using "picard" to add tags afterwards.

---
DEV=/dev/cdrom

# cd -> audio_??.inf audio.cddb audio.cdindex
icedax -device "$DEV" --info-only --cddb 0

# cd -> track??.cdda.wav
cdparanoia --force-cdrom-device "$DEV" --batch --abort-on-skip

# *.wav -> *.flac (loop runs in the background)
for i in *.wav; do flac --best --replay-gain "$i"; done &

# *.wav -> *.mp3 (includes fast replaygain as standard; also in the background)
for i in *.wav; do lame --preset standard "$i"; done &

A New Year's resolution

To log my tech experiences.

This is for purely selfish reasons, of course. Next time I re-install, I'll have my hacks and recipes recorded.
