On my ever onward quest for new useless knowledge I wanted to know how to change a file's length and pad it with a specific byte value, on the command line.

Funnily enough, I know how to do this with JavaScript in the browser pretty easily (for me), but the challenge was using tools fit for the job (and I didn't really want to use a JS engine to do the job).

dd my old friend

dd is a command line tool that can generate files, useful in fact when you want to create dummy files (as I was 13 years ago… 😱).

To generate the dummy data, dd reads from /dev/zero (which gives you a stream of the null byte, 0x00).

Using dd I need to:

  • Copy the source file
  • Add X new bytes to the end of the source file until it reaches Y length
  • Ensure those new bytes are the ones I defined; the null byte isn't good enough

First task before I use dd is: create a stream of the byte I want, let's say 0xFC (252 in decimal).

A stream of bytes

There are actually two tools I could use here. The first (and the one I'll use) is /dev/zero. The alternative is yes with an argument (so yes 1 will repeat 1\n over and over), but with yes I'd need to strip the newline, then change the 1 character.

So, feed /dev/zero into tr and transform the zero byte into 0xFC. Of course, tr doesn't do hex, it does octal (base 8), and 0xFC is 374 in octal, so I need:

tr '\0' '\374' < /dev/zero
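If you'd rather not convert hex to octal in your head, printf can do it. This is a minimal sketch, assuming a shell whose printf accepts C-style hex constants (bash does, as do most sh implementations); the variable names are my own:

```shell
# The byte we want, written in hex.
byte=0xFC

# %o prints the value in octal, which is the form tr's \NNN escape expects.
octal=$(printf '%o' "$byte")

# Show the escape to hand to tr; prints \374
printf '\\%s\n' "$octal"
```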

Let's double check this is the right value, by looking at the first 16 bytes as hex:

tr '\0' '\374' < /dev/zero | head -c 16 | hexdump
0000000 c3 bc c3 bc c3 bc c3 bc c3 bc c3 bc c3 bc c3 bc
0000010

Urm… spot a problem? I've got 0xC3 0xBC repeating, which is not what I was after. From what I could gather, this is because tr translates characters rather than bytes, and the default locale it's using is UTF-8. In that locale \374 means the character U+00FC (ü), which UTF-8 encodes as the two bytes 0xC3 0xBC. I'm never quite going to understand UTF-8 fully (but these two do).

The solution is to switch the locale tr is using:

LC_CTYPE=C tr '\0' '\374' < /dev/zero | head -c 16 | hexdump
0000000 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
0000010
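The stray c3 bc pair can be reproduced without tr at all, which suggests it really is just the UTF-8 encoding of character 0xFC. A small sketch, assuming iconv is installed, with od used for a byte-by-byte dump:

```shell
# Emit the single byte 0xFC, treat it as the Latin-1 character ü (U+00FC),
# and re-encode that character as UTF-8.
utf8_hex=$(printf '\374' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1 | tr -d ' \n')

# One character in, two bytes out; prints c3bc
echo "$utf8_hex"
```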

That's better. Now let's pipe that to dd.

Generating fixed length files

My aim is to take a file of arbitrary length with existing data and pad it to 16K, where every padding byte must be 0xFC.

To create a 16K file (called output.bin) with dd using our byte, the line is:

$ LC_CTYPE=C tr '\0' '\374' < /dev/zero | dd of=output.bin bs=16k count=1

This means our blocksize bs= is 16k (dd supports shorthand for file sizes) and we want one block of 16k. Alternatively I could have bs=1k count=16 to make 16 blocks of 1k - hopefully you get the idea.
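One caveat: when dd reads from a pipe, a single read can return fewer bytes than bs, and dd still counts that partial block towards count, so the file can come out short. GNU dd has iflag=fullblock for this. A sketch that sidesteps the problem, assuming a head that supports -c (GNU and BSD both do), is below. It uses LC_ALL=C rather than LC_CTYPE=C because LC_ALL, if it's set in your environment, overrides LC_CTYPE:

```shell
# head -c reads exactly 16384 bytes from /dev/zero, so the pipeline is finite
# and nothing depends on how dd handles blocks.
head -c 16384 /dev/zero | LC_ALL=C tr '\0' '\374' > output.bin

# Confirm the size in bytes.
wc -c < output.bin
```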

Except this makes a whole new file, and I need to add to an existing one. This is a multipart task, with a calculator handy.

Instead of 16K of data, I need 16K - $currentFileLength. To get this, I need two things: expr (for the maths) and stat (to get the file size accurately).

To test this, I use the following command:

$ expr 16384 - $(stat -f "%Dz" input.bin)
15488
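stat -f "%Dz" is the BSD/macOS spelling; GNU stat wants -c %s instead. A sketch that avoids both, assuming wc and POSIX shell arithmetic, and using a hypothetical 896-byte sample.bin as a stand-in for input.bin (896 being the size implied by the result above):

```shell
# Stand-in input file: 896 bytes of anything will do for the arithmetic.
head -c 896 /dev/zero > sample.bin

target=16384
# wc -c reports the size on any platform; the arithmetic below tolerates
# the leading spaces some versions of wc print.
size=$(wc -c < sample.bin)
pad=$((target - size))

# prints 15488
echo "$pad"
```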

To generate 15,488 zero bytes, I'll use dd with a blocksize of 1 byte and a count of 15,488:

dd if=/dev/zero bs=1 count=$(expr 16384 - $(stat -f "%Dz" input.bin))

But these are still zero bytes, so streaming them through tr converts them to 0xFC:

dd if=/dev/zero bs=1 count=$(expr 16384 - $(stat -f "%Dz" input.bin)) \
 | LC_CTYPE=C tr '\0' '\374'

Now I have 15,488 bytes of 0xFC.

So now I need to take the contents of input.bin and the generated output from dd and concatenate the two blocks of content into a final file called output.bin.

That looks like this:

cat input.bin \
 <( dd if=/dev/zero bs=1 count=$(\
      expr 16384 - $(stat -f "%Dz" input.bin)
    ) | LC_CTYPE=C tr '\0' '\374'
 ) > output.bin

A bit of a mouthful, but it puts together all the parts: process substitution (the <( … ) syntax, which needs bash or zsh rather than plain sh) hands the output of dd and tr to cat as if it were a second file, and the concatenated result is written to output.bin. That file is exactly 16K, contains the 896 bytes of original input.bin data, and is followed by 0xFC for all the subsequent bytes.
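Wrapped up as a reusable function, the whole thing might look like the sketch below. pad_file and its arguments are my own invention, not a standard tool. It assumes bash (for local), swaps stat for wc -c and dd for head -c so it should run on both macOS and Linux, and refuses to run if the input is already bigger than the target:

```shell
# pad_file SRC DST TARGET OCTAL
# Copies SRC to DST, then appends bytes of value \OCTAL until DST is TARGET bytes long.
pad_file() {
  local src=$1 dst=$2 target=$3 octal=$4
  local size pad

  size=$(wc -c < "$src")
  pad=$((target - size))

  if [ "$pad" -lt 0 ]; then
    echo "pad_file: $src is larger than $target bytes" >&2
    return 1
  fi

  {
    cat "$src"
    # Skip the padding step entirely when there is nothing to add.
    if [ "$pad" -gt 0 ]; then
      head -c "$pad" /dev/zero | LC_ALL=C tr '\0' "\\$octal"
    fi
  } > "$dst"
}
```

Called as pad_file input.bin output.bin 16384 374, it should produce the same output.bin as the one-liner above. Don't point SRC and DST at the same file, since the redirect truncates DST before cat gets to read it.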

Ah yes, that old "command line rabbit hole".