Solving Tom’s Data Onion

Photo by Maria Hochgesang on Unsplash

WARNING! Spoiler Alert!

I highly recommend trying to solve the puzzle on your own before continuing reading this article. Progressing through the layers is a lot of fun and further it includes some nice learnings. In the following lines I will describe the solution for each layer, so be warned that your puzzling experience may suffer from reading about a layer before solving it yourself first.

Disclaimer

The below described solutions are by far not perfect! Especially in the first few layers I’m doing lots of conversions (especially to String) that, as I discovered later, aren't necessary. But hey, I just kept it like it was because it solves the layers and it somehow also shows my progress about working with bytes and bitwise operations.

Layer 0

Instructions

Tom’s Data Onion consists of multiple layers. Solving one layer, which means decoding its payload, will output another text file to decode. It continues like that until you reach the mysterious core.

Solution

Adobe-flavoured ASCII85 is a binary-to-text encoding algorithm. It encodes 4 bytes of binary data to 5 ASCII characters, which then can be transmitted over communication protocols that are designed to transmit human-readable text instead of arbitrary binary data.

  • The start and end points of the payload are marked with <~ and ~>
  • White spaces and new lines must be ignored
  • 4 zero bytes would actually be encoded to !!!!! but are compressed to z as this appears quite often
  1. Get the ASCII values of the encoded characters
  2. Substract those values by 33 (this is necessary because the first of the 85 ASCII characters that is used is !, which is represented by the value 33 not 0)
  3. Calculate a 32 bit number with those values
  4. Translate this 32 bit number into 4 bytes and add them to the result

Layer 1

Instructions

When reaching layer 1 it is time to get into bitwise operations. Now every second bit of the (ASCII85 decoded) payload needs to be flipped and all bits in a byte have to be rotated one position to the right.

Solution

First of all I moved the functions I wrote in the previous layer to a utils file. As I couldn’t find out how to call a function from a separate file with the kotlin-main-kts artifact I started using kscript, which worked just nice.

Flip each second bit

To flip a bit you simply need to XOR it with 1. Therefore you need to XOR the byte with 11111111 first. Now all bits are flipped. As we just want to flip every second bit we need to mask the result to extract the bits we are interested in. This can be done using AND and setting the bits we are interested in to 1. Thus the flipped byte needs to be masked with 01010101 to get all even bits and the original byte needs to be masked with 10101010 to get all odd bits. Then the results can be combined with a simple OR. Below you can see an example for this using the input 00110110.

// Extract the flipped bits we are interested in 
00110110
XOR 11111111 //flip all bits
--------
11001001
AND 01010101 //mask
--------
01000001
// Extract the original bits we are interested in
00110110
AND 10101010 //mask
--------
00100010
// Combine them
01000001
OR 00100010
--------
01100011

Rotate one position to the right

To rotate a byte one position to the right first the last bit (that will be shifted out) needs to be extracted. This can be easily done with masking again. Afterwards the byte can be shifted to the right and combined with the last bit using OR. Below the above pictured example is continued.

// Extract the last bit
01100011
AND 00000001
--------
00000001
// Shift the byte one position to the right and combine with the last bit
01100011
SHR --------
00110001
OR 1
--------
10110001

Layer 2

Instructions

In this layer the author added a parity bit to each byte. Parity bits are used to check if bytes are corrupted. When the number of ones within the most significant seven bits of a byte is odd, the least significant bit (= the parity bit) should be set to 1, otherwise it should be 0. You can probably imagine the task now. It is to check the parity bit of each byte and to omit invalid bytes.

Solution

This layer was rather easy in my opinion. First all invalid bytes need to be filtered out by checking the parity bit. To get the parity bit I used masking again. So I masked the byte with 00000001 and stored it in a variable. Afterwards I created a counter for the number of ones and simply shifted the byte to the right and masked it with 00000001 again to get the (new) least significant bit. If it was one, the counter was increased by one. This step was repeated seven times. Well, checking if the number of ones is odd or even and filtering out invalid ones is rather self explanatory then. In the following code example you can see that applied.

Layer 3

Instructions

With layer 3 we start to dive into encryption. In this layer the payload has been encrypted using XOR and a 32 byte long cycling key. This means the input was XORed with the key to encrypt the payload. As the key is shorter than the input it was simply repeated all the time until the input was over. The encrypted payload is given, but neither the key nor the input are known before starting to implement the layer. The author just gives the hint that a XOR operation can be reversed without loosing any data. This means if A XOR B = C also the following two equations are true: A XOR C = B and B XOR C = A.

Solution

This is the layer I definitely spent the most time on because I wanted to get a “mathematical” solution where I could just calculate or maybe even brute force the key or the input. But some hours and lots of papers with dozens of zeros and ones on them later I started to believe that this won’t work. At this time I started discussing about this topic with a friend that is into encryption and he gave me the hint to look at the file and use the things I know about the expected result.

==[ Layer 4/6: Network Trafficù============================ Ž$en computers send data over a ·)twork like the internet, the d¸8a is broken up and placed with°" packets. As well as containin¾lthe data being sent, packets c¶"tain extra data like the desti·-tion address

Layer 4

Instructions

In this layer the data represents a stream of raw network data. To be precise, a series of IPv4 packets with a UDP (User Datagram Protocol) payload inside. To solve this layer those packets need to be parsed and the invalid ones should be omitted. A valid packet is defined like this:

  • Sent from any port of 10.1.1.10
  • Sent to port 42096 of 10.1.1.200
  • IPv4 and UDP header checksums are correct

Solution

As the size of the packets may vary I wrote a recursive function that gets, parses and validates the current packet and then calls itself again with the start index of the next packet. A packet consists of the IPv4 header (20 bytes), the UDP header (8 bytes) and the actual data that is sent with the packet. The length of this payload is defined in the IPv4 header.

  1. Extract the IPv4 and the UDP header
  2. Read the data size from the IPv4 header and extract the data
  3. Check if the packet is valid (see below for more details) and add the data to the result if yes
  4. Parse the next packet
  • A pseudo header containing
    - Source and Destination defined in the IPv4 header: 8 bytes
    - 0: 1 byte
    - Protocol: 1 byte (this is always 17, which stands for UDP)
    - Length defined in the UDP header: 2 bytes
  • The UDP header without the checksum
  • The full data without headers
  • A zero byte of padding if the total number of bytes should be odd

Layer 5

Instructions

The payload of this layer has been encrypted using the Advanced Encryption Standard, to be precise AES-256 in Counter Mode(CTR). The key and initialization vector (IV) to decrypt the payload are given in the beginning of the payload. Since the key is encrypted with a key encrypting key (KEK) and another IV there is one step that needs to be completed prior to decrypting the payload. This means the key needs to be unwrapped before the payload itself can be decrypted.

Solution

The first step to solve this layer is to extract the given values. This is pretty straight forward and you can see the code for that below.

Layer 6

Instructions

For me layer 6 was definitely the most intimidating layer when reading the instructions for the first time. To solve this layer it is required to build an emulator for the Tomtel Core i69 that translates machine code written for this machine to text. When I read through the instructions a bit more carefully the solution became quite clear and was by far not that scary anymore. The Tomtel has fixed sets of registers and instructions and also a fixed amount of memory, which you need to use to decrypt the payload.

Solution

The first step to solve this layer is to store all bytes of the encrypted payload in the memory of the Tomtel. Then I created a loop, that is running until the HALT instruction appears which means that the program completed. One rather important register is the pc register. It holds the program counter and tells us which command to read next. In the beginning it is set to 0. So the steps that are repeated in each iteration of the loop are the following:

  1. Parse the command from memory at the position where pc is pointing to
  2. Increase the value of the pc register by the size of the command (some commands take the next byte(s) as input and therefore are bigger than 1)
  3. Execute the just parsed command

The Core

After solving layer 6 you’ll reach the mysterious core. Tom Dalling (the author) says the following about it:

Version Information

This post describes the solution for version 1.1.1 of Tom’s Data Onion.

Resources

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Katharina Sick

Katharina Sick

Creative and detail-oriented software developer. Advanced from mobile to backend development and now getting into full stack solutions.