[ ... ]
So I decided to invent my own algorithm, and I just wanted anybody's opinion
on how secure this could be compared to the Win32 API version.
First, a C# program generates an array of 256 bytes, as such:
[ ... ]
This is run twice, so I have two arrays of 256 bytes, say key1 and key2.
These are then hardcoded into the C++ encryption algorithm.
Then, the C++ encryption algorithm goes as such:
It memcpys the _TCHAR array to a byte array, then loops round each byte of
this array.
For each byte, it gets the value of key1[n] (where n is the byte number),
and calls this 'b_current_indir' (the starting 'indirection level').
Then, it gets the value of key2[n] and calls this 'levels' - the number of
indirection levels.
Then, an inner loop runs 'levels' times - and on each loop the following
happens: the current byte of the data to be encrypted (dictated by the outer
loop) is XORed with key2[b_current_indir], and THEN, b_current_indir is
reassigned to take on the value of key2[b_current_indir].
[ ... ]
It seems to work to my eyes, but is it a pile of rubbish?
Yes, pretty much.
First you need to realize that repeated XORs have little real effect.
For the moment, let's assume that you XOR one particular input byte
with two other key bytes, calling the input A and the key bytes B and
C, giving: encrypted_byte = (A ^ B) ^ C. The associative property
says this is equivalent to encrypted_byte = A ^ (B ^ C).
If we chose to, we could do:
D = (B ^ C);
encrypted_byte = A ^ D;
Changing the number of times the inner loop executes only changes the
number of items on the right side of the first line that computes D,
meaning that whole inner loop is equivalent to XORing each input byte
with exactly ONE effective key byte (which is the result XORing
together all the key bytes you currently use).
Therefore, what you really have is just a slow and complex way of
producing the same output as XORing each input byte with one key
byte.
There are a number of ways of breaking this. Using a chosen
plaintext, the attack is downright simple: the attacker just gets you
to encrypt a bunch of zero bytes. Since the result of 0 XOR D is D,
this transmits your effective key. The result is that your attacker
gets the _effective_ key that even YOU don't know -- what you
transmit will look (to you) pretty much like random garbage (as you'd
expect for encrypted data) but is really the effective key.
The result of that is that the attacker can decrypt by XORing each
input byte with the effective key for that byte -- IOW, he can
actually decrypt your messages slightly FASTER than you can.
Using a known-plaintext attack is almost as easy: he simply XORs each
byte of the encrypted text with the original text that was encrypted
to (again) get the effective encryption key. Once again, he can
decrypt your messages faster than you can.
A ciphertext-only attack is a _little_ more difficult, but not much.
There are a couple of different typical methods. One is to look only
at the statistics of the encrypted text. You start separating the
text into two bins -- odd-numbered bytes in the first bin and even-
numbered bytes in the second. You then do three bins, four bins,
etc., until you find a length at which (almost by magic) there is a
substantial difference in the number of characters in each bin. At
that point, the attacker has just found your effective key length
(something you probably don't know).
From there his attack becomes fairly simple, based on the frequency
of various characters -- if you're using normal English input, the
space will typically be the most common character. Letters follow a
fairly consistent pattern -- 'e' the most common, 't' the second
most, etc. This will probably solve the encryption pretty easily,
but if he has any difficulty with it, he might also use digraphs and
trigraphs -- sets of two and three letters that do/don't occur
together (e.g. 'th' is common, 'q' is normally followed by 'u', 'jj'
is almost certainly a mistake, and so on).
The other basic approach to an attack requires two encrypted
messages. The attacker realizes (even if you don't) that each byte of
each message is A xor D, where A was the input byte and D and some
(effective) key byte. If he XORs together the bytes of the two
messages, he gets:
(A XOR D) XOR (A' XOR D)
Using the distributive property, this is:
(A xor A') XOR (D XOR D)
Since anything XOR'd with itself is zero, this is:
(A xor A') xor 0
XORing with zero is an identity operation, so this is equivalent to:
A xor A'
Now the attacker uses what are know as cribs -- he takes some guesses
at things that are likely to be in one message or the other. Just for
example, "the' is probably going to occur at least once in one of the
messages. Since he's now factored out the key, and just has the two
original messages XORd against each other, he'll simply take 'the'
and xor it with the combined message. At a place that one message
contained 'the' to start with, he'll get the plaintext of the other
message at the same spot as well. If he tries 'the' and gets 'qjv' as
the matching plaintext from the other message, he can easily guess
that he hasn't found a match. OTOH, if he gets 'his' there's a pretty
fair chance that he's just found a place that one of the original
messages contained 'the' and the other contained 'his'.
In a real attack, he might combine both approaches -- the first
approach is easy to automate, so he'd just let the computer crank
through that. Once he knows your effective key length, each match
with the crib solves more than just the few letters that match -- it
solves all the other characters where the same key byte would be
used.
With as simple of a scheme as you've devised, the cribs probably
wouldn't even come into use at all. The first approach would break
your cipher automatically in almost no time at all. Just for
comparison consider that years ago, WordPerfect would "encrypt"
documents with essentially this same scheme. On the 20 MHz 386 I used
back then, I could normally recover people's passwords for them
entirely automatically in about 10 seconds or so. On a current
machine, that would be reduced to a matter of milliseconds...