***********************************************
Some ideas to increase detection complexity
by Second Part To Hell
***********************************************
Index:
******
0) Introduction
1) Improving tau-obfuscation?
2) Reverse Engineering vs. Meta-Language in Body
3) Code Integration -> Code Merging
4) Overlapping Code for mutations
0) Introduction
Here you'll find a few small ideas and thoughts about making detection
of computer viruses harder. Thanks a lot to herm1t and hh86 for the
discussion and for asking the right questions.
1) Improving tau-obfuscation?
The idea of tau-obfuscation is to perform a time-intensive calculation
before decrypting/executing the virus code, with the result that
real-world AV emulators have to give up (as they can't spend too long
scanning a single file). This technique has already been covered by
Beaucamps & Filiol[1] and Z0MBiE[2].
A simple example:
encrypted_code=[ENCRYPTED CODE];
key=sum(factors(VERY_BIG_INTEGER_NUMBER));
eval(decrypt(encrypted_code, key));
* First question: What algorithm should be used?
Algorithms such as factorization need a lot of code, and could become
a source of detection themselves.
Z0MBiE used an RSA algorithm, which is smaller than factorization but
still big in terms of assembler instructions. And as it is asymmetric,
the virus has to carry both the encrypted code and the decryption key.
In MatLab.MicrophoneFever[3] I used the built-in complex mathematical
functions provided by MatLab, and thus reduced the code size. The
obvious disadvantage of this method is the dependence on mathematical
programs.
A simple solution is to use short Random Number Generators such as
LCG or XORSHIFT, which can be created with <10 assembler instructions.
With that method, the decryption key could be the n-th random number
starting from a given random seed. n can be adjusted such that it
takes xxx seconds to find the key.
To avoid X-Ray attacks, subsequent numbers can be combined to form
the whole key.
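The arithmetic behind the "n-th random number" idea can be sketched as
follows. This is only an illustration in Python (the text itself talks
about assembler); the shift constants 13/17/5 are Marsaglia's 32-bit
XORSHIFT variant, while the function names and the seed value are
assumptions made up for this sketch.

```python
MASK32 = 0xFFFFFFFF  # keep every intermediate value within 32 bits


def xorshift32_step(x):
    """One step of Marsaglia's 32-bit XORSHIFT (shifts 13, 17, 5)."""
    x ^= (x << 13) & MASK32
    x ^= x >> 17
    x ^= (x << 5) & MASK32
    return x & MASK32


def nth_output(seed, n):
    """Return the n-th number of the sequence started at a nonzero seed."""
    x = seed
    for _ in range(n):
        x = xorshift32_step(x)
    return x


if __name__ == "__main__":
    seed = 0x12345678  # hypothetical seed, for illustration only
    print(hex(nth_output(seed, 1000)))
```

There is no shortcut in this loop: the work grows linearly with n,
which is the property that makes n usable as a knob for the delay.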
* Second question: What about observant users?
Imagine the threshold tau is set to one minute. When an infected
program is executed, the user has to wait for one minute. Obviously
this will smell fishy.
The simplest solution would be to start the decryption engine as its
own process with the lowest priority. That way, whenever the CPU isn't
used, the engine continues to decrypt itself.
Advantage: the user won't notice anything, and the emulator still has
to invest a lot of time.
* Third question: After decryption - fully unprotected?
We could use partial decryption of the code:
Get 1st key with tau-obfuscation
Decrypt 1st part
Execute 1st part
Re-encrypt 1st part
Get 2nd key with tau-obfuscation
Decrypt 2nd part
Execute 2nd part
Re-encrypt 2nd part
...
Get n-th key with tau-obfuscation
Decrypt n-th part
Execute n-th part
Re-encrypt n-th part
The virus will never be fully decrypted in memory - it never loses
its shield.
* Fourth question: Suspicious single loop?
What if antivirus programs mark a short but long-running loop as
suspicious?
Simple: Instead of searching for one key after N loops of an RNG engine,
we can search for m keys after (N/m) loops each, and use each key to
decrypt one of the m parts of the virus body.
* Fifth question: Can I use it only for encryption?
We can use this technique for general obfuscation, not just
encryption.
Examples:
bignum=BIG_SPECIAL_NUMBER;
jmpvalue=sum(factors(bignum))%pow(2,32);
jmp dword[jmpvalue]
or
bignum=BIG_SPECIAL_NUMBER;
datavalue=sum(factors(bignum))%pow(2,32);
mov dword[eax], datavalue
We see using tau-obfuscation can be fun for us and pain for them. :)
[1] Philippe Beaucamps & Eric Filiol, "On the possibility of practically
obfuscating programs: Towards a unified perspective of code protection",
Journal in Computer Virology, April 2007.
[2] Z0MBiE, ""DELAYED CODE" technology (version 1.1)", 2000,
http://vxheavens.com/lib/vzo23.html
[3] SPTH, "Matlab.MicrophoneFever2", Valhalla Magazine, July 2011.
2) Reverse Engineering vs. Meta-Language in Body
Metamorphic viruses/worms need the information of their structure coded
in a metalanguage to work with it later (change it and write it back to
native code).
One way is to get it by reverse engineering (disassembling) the code.
- -
Biological organisms need the information of their structure coded in a
metalanguage to work with it later (due to the lack of a "copy
function").
They could also use a mechanism of reverse engineering the structures in
the cell to get this information.
They don't do this, because it's way too complicated. Instead, they save
the whole information within the cell in the form of the metalanguage
(DNA), and therefore they can start directly at this step.
For computer viruses, the meta-language structure must not appear in
plain text, and simple encryption is vulnerable to statistical
attacks.
Instead, one could write the zero-form at runtime to memory:
mov edi, Alloc_memory_for_metalanguage
mov dword[edi], 'AABBCCDD'
mov dword[edi+4], 'EEFFGGHH'
Advantage: This writing process is an excellent source of metamorphic
mutations. It increases the variability of the organism a lot, and by
that also increases the detection complexity.
For fun, we can add a simple encryption to the written memory:
mov edi, Alloc_memory_for_metalanguage
mov dword[edi], 'XXYYZZAA'
mov dword[edi+4], 'BBCCDDEE'
...
for(int i=0; i<Metalanguage_size; i++)
{
mov byte[edi+i], ((byte[edi+i]-'A'+3)%26)+'A';
}
...
Now - can an emulator kill us? No, just use tau-obfuscation :)
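The letter arithmetic of that toy encoding can be checked with a few
lines of Python. This is a sketch under the assumption that the
meta-language only uses the uppercase letters A-Z and that the stored
form is the zero-form shifted back by three positions (which is what
the strings 'XXYYZZAA'/'BBCCDDEE' versus 'AABBCCDD'/'EEFFGGHH' suggest);
the function name is hypothetical.

```python
def shift_letters(text, shift):
    """Rotate every uppercase letter A-Z by `shift` positions (mod 26)."""
    base = ord("A")
    return "".join(chr((ord(c) - base + shift) % 26 + base) for c in text)


if __name__ == "__main__":
    stored = "XXYYZZAA" + "BBCCDDEE"   # the shifted strings from the text
    print(shift_letters(stored, 3))    # rotating by +3 undoes the shift
```

Note that the index has to be taken relative to 'A' before the modulo;
applying %26 directly to the ASCII value would not give a letter back.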
PS: Conway's Game of Life is known to be Turing-complete. In 2010,
Andrew Wade wrote the first self-replicator in that "universe". The
self-replicator has its own structure stored in a dynamic tape (DNA)
and uses a glider-stream (biosynthesis?) to gain the information.
(http://conwaylife.com/forums/viewtopic.php?f=2&t=399)
3) Code Integration -> Code Merging
Code integration is certainly the most complex infection technique
for computer viruses so far. It was first used in ZMist by Z0MBiE for
Win32 executables in 2001[4][5], and later in 2007 by herm1t in his
Linux.Lacrimae[6][7].
The idea is to fully disassemble the host and virus, and integrate the
viruscode into the hostcode:
*************** #####################
* * ## ##
* H * ## jmp Vir1 ##
* * ## Host1: ##
* O * ## H ##
* * ## jmp Host2 ##
* S * ## Vir3: ##
* * ## R ##
* T * ## jmp Host1 ##
* * ## Host2: ##
*************** ## O ##
- - - > ## jmp Host3 ##
+++++++++++++++ ## Vir1: ##
+ + ## V ##
+ V + ## jmp Vir2 ##
+ + ## Host3: ##
+ I + ## S ##
+ + ## jmp Host4 ##
+ R + ## Vir2: ##
+ + ## I ##
+++++++++++++++ ## jmp Vir3 ##
## Host4: ##
## T ##
## ##
#####################
This is a successful technique. However, we can try to take it one
step further.
Rather than just inserting the virus between the hostcode, we can
actually use the hostcode as viruscode, by creating a second codeflow.
Let's say, we want to include a simple
invoke MessageBox, 0x0, VMSG1, VMSG2, 0x0
into a given hostcode:
[ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ]
include 'E:\Programme\FASM\INCLUDE\win32ax.inc'
.data
FileName db 'info.txt',0
hCreateFileFile dd 0x0
.code
start:
push 0x0
push FILE_ATTRIBUTE_NORMAL
push OPEN_ALWAYS
push 0x0
push 0x0
push (GENERIC_READ or GENERIC_WRITE)
push FileName
stdcall dword[CreateFileA]
mov dword[hCreateFileFile], eax
ret
.end start
[ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ]
To get enough instructions that we can use, we can expand the hostcode:
[ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ]
include 'E:\Programme\FASM\INCLUDE\win32ax.inc'
.data
FileName db 'info.txt',0
hCreateFileFile dd 0x0
.code
start:
push 0x0
mov eax, FILE_ATTRIBUTE_NORMAL
push eax
push OPEN_ALWAYS
push 0x0
push 0x0
mov eax, (GENERIC_READ or GENERIC_WRITE)
push eax
mov eax, FileName
push eax
mov eax, CreateFileA
stdcall dword[eax]
mov ebx, hCreateFileFile
mov dword[ebx], eax
ret
.end start
[ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ]
And now let's merge our MessageBox with this hostcode.
[ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ]
include 'E:\Programme\FASM\INCLUDE\win32ax.inc'
.data
FileName db 'info.txt',0
hCreateFileFile dd 0x0
VMSG1 db 'Hello',0
VMSG2 db 'VXers!',0
.code
start:
xor ecx, ecx ; Set ZF
jmp VirInstr1
HostInstr0:
push 0x0
mov eax, FILE_ATTRIBUTE_NORMAL
jnz HostInstr1
VirInstr3:
add eax, (VMSG1-FileName)
xor ecx, ecx ; Set ZF
jmp VirInstr4
HostInstr1:
VirInstr6:
push eax
jz VirInstr7
push OPEN_ALWAYS
VirInstr7:
push 0x0
jz VirInstr8
VirInstr1:
push 0x0
jz VirInstr2
mov eax, (GENERIC_READ or GENERIC_WRITE)
VirInstr4:
push eax
jz VirInstr5
VirInstr2:
mov eax, FileName
jz VirInstr3
push eax
jnz HostInstr4
VirInstr10:
inc ecx ; Clear ZF
jmp HostInstr0
HostInstr4:
mov eax, CreateFileA
VirInstr9:
stdcall dword[eax]
jz VirInstr10
jnz HostInstr2
VirInstr5:
add eax, (VMSG2-VMSG1)
xor ecx, ecx ; Set ZF
jmp VirInstr6
HostInstr2:
mov ebx, hCreateFileFile
jnz HostInstr3
VirInstr8:
add eax, (MessageBox-VMSG2)
xor ecx, ecx ; Set ZF
jmp VirInstr9
HostInstr3:
mov dword[ebx], eax
ret
.end start
[ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ]
We use the instructions given by the hostcode, and combine them with
conditional jumps. The only instructions that are not merged are some
re-adjustments of addresses (MessageBox, VMSG1, VMSG2) - in fact this
could be done by merging too, but the result would be more complex.
Besides making the code hard to recognize (even for the human eye), it
provides a lot of freedom which can be used for changes in every
generation: which instructions are expanded; which registers are used
for expansion; what the codeflow of the virus looks like; ...
In my opinion: Absolutely worth bringing to reality! :)
[4] Z0MBiE, "Automated reverse engineering: Mistfall engine.", 2000,
http://vxheavens.com/lib/vzo21.html
[5] Peter Ferrie & Péter Ször, "Zmist Opportunities", VirusBulletin Mar 2001,
http://vxheavens.com/lib/apf47.html
[6] herm1t, "Code integration on Linux: Cooking the PIE", EOF-DR-RRLF, 2008.
[7] Peter Ferrie, "Crimea river", VirusBulletin February 2008,
http://vxheavens.com/lib/apf12.html
4) Overlapping Code for mutations
Overlapping code consists of code segments that behave differently
depending on where execution enters them. For instance:
00402000 > $ B8 31C04040 MOV EAX,4040C031
What happens if we jump to 00402001?
00402001 > 31C0 XOR EAX,EAX
00402003 . 40 INC EAX
00402004 . 40 INC EAX
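The two readings of the same five bytes can be reproduced with a toy
decoder. This is a sketch, not a real disassembler: it only knows the
three encodings that appear in the listing above (B8 imm32, 31 C0 and
40), and the function and variable names are made up for illustration.

```python
CODE = bytes.fromhex("B831C04040")  # the bytes stored at 00402000


def decode(code, offset):
    """Linearly decode `code` starting at `offset` (toy opcode subset)."""
    out = []
    i = offset
    while i < len(code):
        op = code[i]
        if op == 0xB8 and i + 5 <= len(code):
            # B8 id: MOV EAX, imm32 (immediate stored little-endian)
            imm = int.from_bytes(code[i + 1:i + 5], "little")
            out.append(f"MOV EAX,{imm:08X}")
            i += 5
        elif op == 0x31 and i + 2 <= len(code) and code[i + 1] == 0xC0:
            # 31 C0: XOR EAX,EAX
            out.append("XOR EAX,EAX")
            i += 2
        elif op == 0x40:
            # 40: INC EAX
            out.append("INC EAX")
            i += 1
        else:
            # anything else is outside this toy subset
            out.append(f"DB {op:02X}")
            i += 1
    return out


if __name__ == "__main__":
    print(decode(CODE, 0))  # entry at 00402000
    print(decode(CODE, 1))  # entry at 00402001
```

Starting at offset 0 gives the single MOV; starting one byte later
gives the XOR followed by the two INCs, exactly as in the listing.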
This can be used in a vast variety of ways for obfuscation (in 1994,
Stormbringer wrote a virus that consists only of jump instructions,
using overlapping code[8]) or code protection[9].
Certainly, this can be used in mutation engines too, where it gives
additional variability.
Some examples:
Our code:
00402000 > $ 31C0 XOR EAX,EAX
00402002 . 40 INC EAX
00402003 . 40 INC EAX
Overlapped Code:
00402000 > $ 68 11204000 PUSH overlap_.00402011
00402005 . 68 0C204000 PUSH overlap_.0040200C
0040200A . 81F7 31C040C3 XOR EDI,C340C031
00402010 . C3 RETN
00402011 . 40 INC EAX
or
00402000 > $ B8 31C04040 MOV EAX,4040C031
00402005 . 3D 31C04040 CMP EAX,4040C031
0040200A .^74 F5 JE SHORT overlap_.00402001
or
00402000 > $ EB 02 JMP SHORT overlap_.00402004
00402002 . 81FE 31C04040 CMP ESI,4040C031
There are over 9.000 other ways to write down the original instructions
using overlapping code. One may consider this when planning the next
mutation engine.
[8] Stormbringer, "Jump", 40hex #14, 1994.
[9] Matthias Jacob & Mariusz H. Jakubowski & Ramarathnam Venkatesan, "Towards
Integral Binary Execution: Implementing Oblivious Hashing Using
Overlapped Instruction Encodings", 2007.
Second Part To Hell
July 2011