One More Thing About Those Radiohead WAVE Files

I was getting tired of analyzing WAVE files by hand with a hex editor, so I wrote a Perl script to do it. It parses the RIFF file and its chunks, making sure the WAVE header exists and is properly formed. Unexpectedly, I found another issue with the Radiohead The King Of Limbs WAVE files: the WAVE header is two bytes longer than it needs to be. The “fmt ” chunk, where the WAVE header is stored, is properly formed, so it’s not an error. From what I’m reading, a number of programs create WAVE files with this quirk (I’ve seen files like this from Sound Forge, and the sound recorder that comes with Windows XP apparently does this as well).

The chunk size is indicated correctly, so it’s not technically wrong, but some programs reportedly have problems with WAVE files created this way, as they assume that everything in the WAVE file is aligned on four-byte boundaries. The standard “fmt ” chunk is 24 bytes (an 8-byte chunk header plus the 16-byte WAVE header), and the RIFF header is 12 bytes, so the “data” chunk typically starts at byte offset 36, which is four-byte aligned. The two extra bytes in the header push the data chunk to byte offset 38, which is not four-byte aligned.
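The offset arithmetic above can be checked with a few lines of Python. This is only a sketch: `data_chunk_offset` is a made-up name, and it assumes the simplest layout, where “fmt ” is the only chunk between the RIFF header and the “data” chunk.

```python
RIFF_HEADER_SIZE = 12   # "RIFF" + 4-byte size + "WAVE"
CHUNK_HEADER_SIZE = 8   # 4-byte chunk ID + 4-byte chunk size


def data_chunk_offset(fmt_body_size):
    """Offset of the "data" chunk when "fmt " is the only chunk before it."""
    return RIFF_HEADER_SIZE + CHUNK_HEADER_SIZE + fmt_body_size


for fmt_body_size in (16, 18):
    offset = data_chunk_offset(fmt_body_size)
    # Columns: fmt body size, data offset, 4-byte aligned?, 2-byte aligned?
    print(fmt_body_size, offset, offset % 4 == 0, offset % 2 == 0)
```

With a 16-byte header the data chunk lands on offset 36; with 18 bytes it lands on 38, which is still an even offset but no longer a multiple of four.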

You can’t actually make any such assumptions about the internal format of a RIFF (and thus WAVE) file, so any program that relies on four-byte alignment is buggy. But the problem seems to be common enough that someone wrote a Windows program called FixWAV that trims the extra two bytes off the WAVE header to “fix” it. (My rewrapWAV Perl script does the same thing, by the way.)
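To make the trimming concrete, here is a rough Python sketch of the idea. It is not FixWAV or rewrapWAV, whose internals aren’t shown here; `trim_fmt_extension` is a hypothetical name, and the sketch assumes the whole file fits in memory and that odd-sized chunks are followed by a pad byte, as RIFF requires. It only trims an 18-byte “fmt ” body when the audio format is 1 (PCM) and the last two bytes are zero.

```python
import struct


def trim_fmt_extension(wav):
    """Return a copy of a WAVE file's bytes with a zero-length fmt extension removed."""
    if len(wav) < 12 or wav[:4] != b"RIFF" or wav[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    out = bytearray(wav[:12])
    pos = 12
    while pos + 8 <= len(wav):
        chunk_id, size = struct.unpack("<4sI", wav[pos:pos + 8])
        body = wav[pos + 8:pos + 8 + size]
        if chunk_id == b"fmt " and len(body) == 18:
            # First field is the audio format, last field is the two extra bytes.
            audio_format, extra_size = struct.unpack("<H14xH", body)
            if audio_format == 1 and extra_size == 0:
                body = body[:16]
        out += chunk_id + struct.pack("<I", len(body)) + body
        if len(body) % 2:
            out += b"\x00"  # RIFF pad byte after an odd-sized chunk
        pos += 8 + size + (size % 2)
    # The RIFF size field counts everything after the first 8 bytes.
    struct.pack_into("<I", out, 4, len(out) - 8)
    return bytes(out)


# A tiny in-memory file: 18-byte PCM fmt chunk followed by 8 bytes of silence.
fmt_body = struct.pack("<HHIIHHH", 1, 2, 44100, 176400, 4, 16, 0)
samples = b"\x00" * 8
chunks = (b"fmt " + struct.pack("<I", len(fmt_body)) + fmt_body +
          b"data" + struct.pack("<I", len(samples)) + samples)
original = b"RIFF" + struct.pack("<I", 4 + len(chunks)) + b"WAVE" + chunks
fixed = trim_fmt_extension(original)
print(len(original), len(fixed), fixed.find(b"data"))
```

Every other chunk is copied through untouched, so the only changes are the shorter “fmt ” chunk and the updated RIFF size.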

Why are these extra bytes added in some cases?

WAVE files are usually used to store PCM audio, but it is actually possible to store compressed audio. There is an “AudioFormat” field in the WAVE header which indicates the kind of audio being used. It’s almost always “1”, which indicates PCM audio, but if it is another value, there’s compression of some sort being used, and the header is extended to indicate the type of compression.

If AudioFormat is something other than “1”, the two bytes after the end of the traditional WAVE header hold the length of the extra compression information, and the information itself follows. These two bytes, called ExtraParamSize, are not expected to be present if AudioFormat is 1. Strictly speaking, though, it doesn’t hurt if they are there and set to zero, which means no bytes follow them. Any program that can generate compressed WAVE files might choose to add this field regardless of the AudioFormat.
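As an illustration of that layout, here is a small Python sketch that decodes a “fmt ” chunk body and treats ExtraParamSize as optional. The function name `parse_fmt` and the dictionary keys are my own choices for the example; the field order is the usual little-endian PCM header layout.

```python
import struct


def parse_fmt(body):
    """Decode the body of a "fmt " chunk (the bytes after the 8-byte chunk header)."""
    if len(body) < 16:
        raise ValueError("fmt chunk body is shorter than 16 bytes")
    (audio_format, channels, sample_rate,
     byte_rate, block_align, bits_per_sample) = struct.unpack("<HHIIHH", body[:16])
    info = {
        "audio_format": audio_format,
        "channels": channels,
        "sample_rate": sample_rate,
        "byte_rate": byte_rate,
        "block_align": block_align,
        "bits_per_sample": bits_per_sample,
        "extra_param_size": None,  # None means the field is absent
    }
    if len(body) >= 18:
        # Two bytes past the traditional header: length of any extra format data.
        (info["extra_param_size"],) = struct.unpack("<H", body[16:18])
    return info


# An 18-byte PCM header: 16-bit stereo at 44.1 kHz with ExtraParamSize set to zero.
body = struct.pack("<HHIIHHH", 1, 2, 44100, 176400, 4, 16, 0)
info = parse_fmt(body)
print(len(body), info["audio_format"], info["extra_param_size"])
```

Passing only the first 16 bytes gives the same fields with `extra_param_size` left as `None`, which is how a reader can tell “absent” apart from “present but zero”.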

The Radiohead WAVE files have these extra bytes.

My WAVE validation tool detects this in the file, as well as the extra JUNK and BWF chunks. Here’s an example of the output:

01 Bloom.wav: info: JUNK chunk found (28+8 bytes)
01 Bloom.wav: info: JUNK is not a standard WAVE chunk
01 Bloom.wav: info: fmt chunk found (18+8 bytes)
01 Bloom.wav: info: data chunk found (55507428+8 bytes)
01 Bloom.wav: info: bext chunk found (858+8 bytes)
01 Bloom.wav: info: bext is not a standard WAVE chunk
01 Bloom.wav: info: iXML chunk found (2280+8 bytes)
01 Bloom.wav: info: iXML is not a standard WAVE chunk
01 Bloom.wav: info: end of file
01 Bloom.wav: warning: WAVE header is larger (18 bytes) than normal (16 bytes)
01 Bloom.wav: info: audio format is 1 (PCM)
01 Bloom.wav: info: number of channels is 2
01 Bloom.wav: info: sampling rate is 44100
01 Bloom.wav: info: byte rate is 176400
01 Bloom.wav: info: block align is 4
01 Bloom.wav: info: bits per sample: 16
01 Bloom.wav: finished with 0 errors and 1 warnings
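
For anyone who wants to produce a similar chunk listing, here is a minimal Python sketch of the chunk walk. It is not the Perl validation tool that produced the output above; `iter_chunks` and `make_chunk` are hypothetical names, and the test file is built in memory rather than taken from the album. It moves from chunk to chunk using only the stored size, plus the one-byte pad that RIFF places after odd-sized chunks.

```python
import struct


def iter_chunks(wav):
    """Yield (chunk_id, body_size, offset) for each top-level chunk in a WAVE file."""
    if len(wav) < 12 or wav[:4] != b"RIFF" or wav[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(wav):
        chunk_id, size = struct.unpack("<4sI", wav[pos:pos + 8])
        yield chunk_id, size, pos
        # Always step by the stored size; odd-sized chunks carry one pad byte.
        pos += 8 + size + (size % 2)


def make_chunk(chunk_id, body):
    """Build one chunk for the demo file, padding odd-sized bodies."""
    pad = b"\x00" if len(body) % 2 else b""
    return chunk_id + struct.pack("<I", len(body)) + body + pad


# Demo file: a JUNK chunk, an 18-byte fmt chunk, and a short data chunk.
chunks = (make_chunk(b"JUNK", b"\x00" * 28) +
          make_chunk(b"fmt ", struct.pack("<HHIIHHH", 1, 2, 44100, 176400, 4, 16, 0)) +
          make_chunk(b"data", b"\x00" * 8))
wav = b"RIFF" + struct.pack("<I", 4 + len(chunks)) + b"WAVE" + chunks

for chunk_id, size, offset in iter_chunks(wav):
    name = chunk_id.decode("ascii").strip()
    print(f"{name} chunk found ({size}+8 bytes) at offset {offset}")
```

Because the walk never assumes a fixed header length, the leading JUNK chunk and the 18-byte “fmt ” chunk are handled the same way as any other chunk.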

5 Comments

  1. Lark
    Posted 28 February 2011 at 3:34 pm | Permalink

    http://thekingoflimbspart2.blogspot.com/

    The people at this site would have a field day reading into what you’ve said about the .wav files.

  2. Gil
    Posted 23 March 2011 at 11:48 am | Permalink

    “bext” and “iXML” chunks are part of the Broadcast WAV format. Many sound editing programs, especially ProTools, save files in this format.

    http://en.wikipedia.org/wiki/Broadcast_Wave_Format

  3. Posted 23 March 2011 at 12:50 pm | Permalink

    ProTools can save BWF files, but it certainly doesn’t have to. I explored that a bit in a previous post (http://blog.bangsplatpresents.com/?p=970) but didn’t link here. In this case, there’s no useful or interesting information in the BWF chunks, so I don’t think it was intentional. Not that it should matter: the files should still work for any proper WAVE player.

  4. SirNickity
    Posted 2 May 2011 at 4:03 pm | Permalink

    WAVE files are aligned on 2-byte boundaries, not 4 bytes. It is technically proper to write a single 0x00 after the end of a chunk if its size is odd.

    The SIZE field of the chunk does *NOT* include this byte; it is assumed that if chunk size is odd, the next chunk will start at fpos + 4 (ID) + 4 (SIZE) + SIZE + 1.

    This rule is so commonly broken in the wild that, if you’re writing code to read WAVE files and the next calculated chunk position is on an odd byte, it’s safer to check for 4 ASCII characters (the chunk ID) at that calculated odd offset, and if you don’t find the ID there, THEN check the following even offset (where it’s supposed to be.) You might reverse this order if you want to favor properly written files, just in case the last byte of the previous chunk passes the (32 <= char <= 126) test, though if the author used 0x00 on the padding byte like he should have, that won't be a concern.

    Furthermore, it's NOT safe to assume any common field is a known size. *ALWAYS* use the SIZE field (+4+4+[0|1]) to calculate the next chunk's offset. While most headers are 44 bytes long, it's your choice whether you want to successfully read "most" valid files, or "all" valid files. :-) It's perfectly legal to include a 0-byte extension to the format header, and just as easy to parse if you use the SIZE field instead of guessing how long you think it should be.

    (For a good format reference, see: http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html)

  5. Posted 5 May 2011 at 1:51 pm | Permalink

    Thanks for that link! I have never had the original Microsoft documentation, so I was unaware of the even byte requirement. Makes sense, since it is explicitly mimicking the IFF spec. I need to update my code.

    However, this is not what I’m describing. There are two extra bytes, so it’s 16-bit aligned either way. I still think my ExtraParamSize == 0 guess is correct.

    The way everyone does the fmt chunk (at least for PCM audio), it will always be an even number of bytes, and thus the chunk size will be correct with no need for padding. (This is not necessarily true for the data chunk, which for some formats could end up with an odd size, and require padding.)
