Ghidra Plug-in to Decode XOR-Obfuscated Strings

As a reverse engineer or malware analyst, it is important to be able to write scripts that automate parts of your workflow. One example is writing plug-ins for tools such as Ghidra that help deobfuscate the strings in a particular sample. In this example we will recover strings that have been obfuscated with the bitwise XOR operation. The techniques are fairly simple and serve as an introduction to writing Python plug-ins and string deobfuscation routines.

Code: https://github.com/comosedice2012/XOR-Decode-Strings-Ghidra-Plugin/blob/main/deobfuscate_ghidra_strings.py
Original sample and DLL: https://github.com/jstrosch/XOR-Decode-Strings-IDA-Plugin
Analysis on Youtube: https://youtu.be/un8I6dfuDVQ

Below is a sample of the obfuscated string pattern. The function called to deobfuscate the strings is FUN_10001210, and takes three arguments – the size of the string to decode, the key, and the obfuscated string, in that order.

Function FUN_10001210 allocates memory for the deobfuscated string using LocalAlloc, then decodes the string in a loop. The loop XORs each byte of the obfuscated string with the corresponding letter of the key. If the string is longer than the key, it uses modulo division to wrap back to the start of the key and continues until the string is fully deobfuscated.

Finally, the pointer to the allocated memory that contains the deobfuscated string is returned and assigned to a global variable. The default behavior for this plug-in is to add the deobfuscated string value as a comment next to the assignment.

The first step to writing a plug-in is becoming familiar with the methods the Ghidra API provides. There is less information in Google searches about the Ghidra API compared to the IDA API, so your best bet is to read the Ghidra help docs. This particular page shows some of the methods used in this Python script:
https://ghidra.re/ghidra_docs/api/ghidra/program/database/references/ReferenceDBManager.html

Let’s get an overview of the code. For this example we know the obfuscated strings are pushed onto the stack and used by a function that returns the deobfuscated string. Therefore, if we iterate through all the cross-references to this function, we can recover the three values that are pushed onto the stack before each call, which are the values the function needs to deobfuscate the strings.
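The walk back from a call site can be sketched in a few lines. This is a minimal illustration, not the plug-in's actual code: the helper name collect_pushed_args is hypothetical, and it assumes the three arguments are set up by three consecutive PUSH instructions directly before the call. The instruction lookup is passed in as a parameter so the logic can be read and tested outside Ghidra.

```python
def collect_pushed_args(call_addr, get_instruction_before, count=3):
    """Return the operands of the `count` PUSH instructions that
    immediately precede `call_addr`, nearest instruction first.

    `get_instruction_before` is any callable that maps an address to the
    preceding instruction (or None). Returns None if the expected
    PUSH/PUSH/PUSH pattern is not found.
    """
    operands = []
    cursor = call_addr
    for _ in range(count):
        inst = get_instruction_before(cursor)
        if inst is None or inst.getMnemonicString() != "PUSH":
            return None  # pattern does not match; skip this call site
        ops = inst.getOpObjects(0)  # objects making up operand 0
        if len(ops) == 0:
            return None
        operands.append(ops[0])
        cursor = inst.getAddress()  # step back one more instruction
    return operands

# Inside a Ghidra script (assumption, not exercised here), the flat API
# could supply the pieces, with 0x10001210 taken from the text above:
#   for ref in getReferencesTo(toAddr(0x10001210)):
#       args = collect_pushed_args(ref.getFromAddress(), getInstructionBefore)
```

Following the order described in this write-up, the returned list would hold the obfuscated string first, then the key, then the string size. Returning None on a mismatch lets the caller skip call sites that were compiled differently rather than decoding garbage.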

We get the address of each cross-reference, then go to the previous instruction and get the “obfuscated string”. We repeat this two more times to get the “key” and the “string size” values. We make use of two helper functions. The first, get_string(), is used to get the “obfuscated string” and the “key” strings. We do this by copying one byte at a time into an empty string, using the address and string size.
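A byte-by-byte copy of this kind might look like the sketch below. It is an approximation of the idea rather than the plug-in's own get_string(): the read_byte parameter is a hypothetical stand-in for whatever reads program memory, and the result is collected in a bytearray instead of a string so that non-printable bytes survive intact.

```python
def get_string(start, size, read_byte):
    """Copy `size` bytes beginning at offset `start`, one byte at a time.

    `read_byte` is any callable that returns the byte stored at a given
    offset. Values are masked with 0xFF because a Java-backed reader may
    hand back signed bytes (-128..127).
    """
    out = bytearray()
    for offset in range(size):
        out.append(read_byte(start + offset) & 0xFF)
    return bytes(out)

# Inside a Ghidra script (assumption, not exercised here), the reader
# could wrap the flat API:
#   read_byte = lambda off: getByte(toAddr(off))
```

Passing the string size explicitly matters here: the obfuscated data is not necessarily null-terminated, since an XOR result can legitimately contain zero bytes.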

The other function, decode(), XORs the obfuscated string with the key, and uses modulo math on the key index so that a key shorter than the string wraps back to its first character instead of running off the end of the key into other memory.
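The repeating-key XOR described here is short enough to sketch in full. This is a minimal version written for illustration, assuming both inputs are byte strings; the plug-in's own decode() may differ in its details.

```python
def decode(obfuscated, key):
    """XOR each byte of `obfuscated` with the key, repeating the key
    as often as needed."""
    key_bytes = bytearray(key)
    if len(key_bytes) == 0:
        raise ValueError("key must not be empty")
    out = bytearray()
    for i, value in enumerate(bytearray(obfuscated)):
        # i % len(key_bytes) cycles 0, 1, ..., len-1, 0, 1, ...
        out.append(value ^ key_bytes[i % len(key_bytes)])
    return bytes(out)

# Example with a made-up two-byte key: the five input bytes use key
# indices 0, 1, 0, 1, 0.
print(decode(b"\x23\x20\x27\x29\x24", b"KE"))
```

Because XOR is its own inverse, running decode() on plain text with the same key produces the obfuscated form, which is a convenient way to build test inputs.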

Lastly, we need to attach the decoded string to the disassembly as a comment, using the methods from the Ghidra API.
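One way to prepare the comment text is sketched below. The helper format_comment and the "Decoded:" prefix are hypothetical choices for this illustration; the escaping simply keeps any non-printable bytes readable in the listing. The Ghidra call itself is shown only as a comment, and it assumes the assignment to the global variable is the instruction directly after the call.

```python
def format_comment(decoded):
    """Render decoded bytes as printable comment text, escaping any
    byte outside the printable ASCII range as \\xNN."""
    parts = []
    for value in bytearray(decoded):
        if 32 <= value < 127:
            parts.append(chr(value))
        else:
            parts.append("\\x%02x" % value)
    return 'Decoded: "%s"' % "".join(parts)

# Inside a Ghidra script (assumption, not exercised here):
#   target = getInstructionAfter(call_addr).getAddress()
#   setEOLComment(target, format_comment(decoded))
```

An end-of-line comment is used in this sketch because it sits beside the instruction in the listing, which matches the "comment next to the assignment" behavior described earlier.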

This is what the disassembly looks like after using the new plug-in.

I read a great piece of advice regarding reverse engineering/malware analysis:
“If you find yourself doing the same task more than twice, find a way to automate it”.
Learning how to write scripts like this may take time initially, but it will pay off in the long run. Hopefully this demo will help you get your feet wet. Good luck!

Note:
This plug-in is not intended to decode all XOR-obfuscated strings. It serves as a starting point that you can adapt to the decoding logic you encounter in order to deobfuscate those strings.