On my ever onward quest for new useless knowledge I wanted to know how to change a file's length and pad it with a specific byte value, on the command line.
READER DISCOUNTSave $50 on terminal.training
I've published 38 videos for new developers, designers, UX, UI, product owners and anyone who needs to conquer the command line today.
dd my old friend
To generate the dummy data,
dd reads from
/dev/zero (which gives you a stream of the null byte,
dd I need to:
- Copy the source file
- Add X new bytes to the end of the source file until it reaches Y length
- Ensure those new bytes are the one I defined, the null zero byte isn't good enough
First task before I use
dd is: create a stream of the byte I want, let's say
0xFC (252 in decimal).
A stream of bytes
There's actually two tools I could use here. The first (and the one I'll use) is
/dev/zero the alternative is to use
yes with an argument (so
yes 1 will repeat
1\n over and over) - but with
yes I'd need to strip the new line, then change the
tr and transform the zero byte into
0xFC. Of course,
tr doesn't do hex, it does octal (base 8), so I need:
tr '\0' '\374' < /dev/zero
Let's double check this is the right value, by looking at the first 16 bytes as hex:
tr '\0' '\374' < /dev/zero | head -c 16 | hexdump 0000000 c3 bc c3 bc c3 bc c3 bc c3 bc c3 bc c3 bc c3 bc 0000010
Urm…spot a problem? I've got
0xC3BC as a repeating word which is not what I was after. From what I could gather, this is because
tr is used to translate strings, and the default language it's using is UTF8 which… the byte 0xFC is outside of a normal range, or, something - I'm never quite going to understand UTF8 fully (but these two do).
The solution is to switch the language
tr is using:
LC_CTYPE=C tr '\0' '\374' < /dev/zero | head -c 16 | hexdump 0000000 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc 0000010
That's better. Now let's pipe that to
Generating fixed length files
My aim is to take a file that has an arbitrary length with existing data, and pad it to 16K and any padded bytes must be
To create a 16K file (called
dd using our byte, the line is:
$ LC_CTYPE=C tr '\0' '\374' < /dev/zero | dd of=output.bin bs=16k count=1
This means our blocksize
bs= is 16k (
dd supports shorthand for file sizes) and we want one block of 16k. Alternatively I could have
bs=1k count=16 to make 16 blocks of 1k - hopefully you get the idea.
Except this makes a whole file, I need to add to a file. This is multipart task with a calculator handy.
Instead of 16K of data, I need
16K - $currentFileLength. To get this, I need two things:
expr (for the maths) and
stat (to get the file size accurately).
To test this, I use the following command:
$ expr 16384 - $(stat -f "%Dz" input.bin) 15488
To generate a zero byte for 15,488 bytes I'll use
dd to fill with a blocksize of 1 byte and for a count of 15,488 bytes:
dd if=/dev/zero bs=1 count=$(expr 16384 - $(stat -f "%Dz" input.bin))
But this is still zero bytes, so streaming through
tr converts to
dd if=/dev/zero bs=1 count=$(expr 16384 - $(stat -f "%Dz" input.bin)) \ | LC_CTYPE=C tr '\0' '\374'
Now I have 15,488 bytes of
So now I need to take the contents of
input.bin and the generated output from
dd and concatenate the two blocks of content into a final file called
That looks like this:
cat input.bin \ <( dd if=/dev/zero bs=1 count=$(\ expr 16384 - $(stat -f "%Dz" input.bin) ) | LC_CTYPE=C tr '\0' '\374' ) > output.bin
A bit of a mouthful, but it puts together all the parts, then uses stream redirection to add
dd to the output of
cat and the concatenated result is put in
output.bin, which is: exactly 16K large, contains 896 bytes of the original
input.bin data and then follows
0xFC for all the subsequent bytes.
Ah yes, that old "command line rabbit hole".