Internet Checksum Covert Channels
From The Math Club
This is adapted from a paper I wrote in 2001 entitled IP Checksum Covert Channels and Selected Hash Collision. The title and premise of the paper was slightly weak, as I presented it as a security risk, which it is not, but the fundamental methodology behind using the checksum function to embed a covert channel is solid.
The internet checksum (RFC1071) works on a set of 16-bit groupings (referred to as WORDs). An example would be a protocol header broken into two-octet chunks. Calculating the checksum is a very straightforward process, so if you are unfamiliar with it, please refer to the RFC. We will refer to any arbitrarily sized set of 16-bit WORDs as W. A basic IPv4 header would consist of 20 bytes or 10 words therefore W would contain 10 elements in this case.
The first thing we will want to perform on W is the simple summation of the elements in the set.
The sum of all the elements in W can be uniquely expressed as c times 65536 plus some value, m, between 0 and 65535, inclusive. This holds true due to the division theorem.
From here, the internet checksum can be calculated by the following function.
Which simply means the internet checksum, S, equals the 1's complement of (c + m) modulus 65536.
Lets say we want to embed a message, m, in W and we can choose an insignificant member of W to be a pivotal element, p, which will be dependant on our message. We can define a new set not containing this pivotal element.
This simply defines a new set, W-bar as the set that is like W without p in it. This will allow us to work with p, adjusting it to fit our selected secret message, m. To facilitate this, we allow p to occur as a dependant variable in the following.
We can now define m to be our 16-bit message that we would like to send over our covert channel.
Meaning a 16-bit p can be calculated and any arbitrary message may be
sent and a checksum collision will be guaranteed. We must only insure that c can always be
chosen to meet the required restraints of the inequality above since it is the only free variable. We will prove this by first defining j
which can be expressed in terms of k and r by the division theorem, therefore we can
express the p inequality in the following form:
To show it to hold true for any value of r and k, the case where r = 0 would be
which holds for c = k for any value of k. Otherwise, if r > 0 then assume c = k + 1 (there is no way that c > k + 1 otherwise p >= 65536).
Which holds any value of k and r and agrees with the prior restraints on r for this case, therefore values for c and p can generated for any arbitrarily selected value of our message m.
Example
What follows is a method for message generation, and an example dataset.
We will verify our example dataset by first including p into W-bar therefore re-creating W, and then checking to see if our message m comes out of the division theorem and then that our checksum matches.
An example of how this can be used in the IP header would be the following: Set up an
IP header with an additional 4 octets for IP options, set the first WORD of the options to
0 (end-of-options), and allow the second WORD to be p, which will be calculated later.
Allow W-bar to be the set of WORDs in the IP header, not including p. Allow for s to be the internet checksum, not yet calculated. Apply the method for message generation, selecting m to be the 16-bit message. Calculate p and s.
This method can be used for any protocol that uses the Internet checksum, including ICMP, UDP, TCP, as well as many others. The most interesting use though comes from the IP header, because the fact that upon forwarding the packet to the gateway, and along each intermediate router, the TTL is decremented, and the checksum is recalculated, therefore losing the immediate covert-channel checksum. The end destination, in order to retrieve the original checksum, must replace the TTL with the original TTL and calculate the sum in the normal fashion, and then retrieve m. An extension to this would be to use the IP ID field as a 32-bit ‘key’, which the target node must also replace in order to retrieve the message.
Source Code in C
I will be posting some source code in C soon...
