Android Obfuscation Using LLMs: A Zero-Code Approach
Prologue
Recently I started diving into Android mobile app security. While exploring this field, I found that OWASP-MASTG provides comprehensive documentation on mobile app security in one place. I’ve been particularly interested in reverse engineering and obfuscation techniques.
The Reverse Engineering Challenge
The openness of Android and its adoption of the Java ecosystem have made it relatively easy to reverse engineer APKs. Plenty of tools, such as jadx, apktool, and dex2jar, help with this. The primary defense developers have against reverse engineering is code obfuscation: converting readable code into meaningless names and code structures.
However, as OWASP MASTG bluntly states:
“None of them [obfuscation techniques] can assure a 100% effectiveness, as the reverse engineer will always have full access to the device and will therefore always win! (Given time and resources)”
It’s clear that time is the only factor we can control (since resources are widely available). The best approach is to make the code as difficult as possible for hackers to understand through effective obfuscation. The most popular open-source tools used for this on Android are ProGuard and R8.
Problems with ProGuard / R8
The MASTG lists several static obfuscation techniques. Of these, ProGuard and R8 perform only name obfuscation (together with shrinking and optimization); the rest typically require commercial or specialized tools:
- Name obfuscation
- Instruction substitution
- Control flow flattening
- String encryption
- Packing
Yet tools such as the IDA Pro disassembler can recognize known library code without relying on names. IDA’s FLIRT (Fast Library Identification and Recognition Technology) does this with signatures for native code, and the same idea carries over to Dalvik bytecode by matching structural patterns (e.g., method parameters, inheritance relationships).
A recent development called DalvikFLIRT combines this approach with LLMs to identify known SDK components, providing anchors for LLMs to reconstruct application-specific logic. The system recursively analyzes code, using each layer of deobfuscated output to improve subsequent analysis. (More details in this analysis document).
Because these techniques are static, every build gets the same obfuscation and produces the same obfuscated output, which makes the app an easy target for hackers and LLMs armed with prewritten scripts.
How to Overcome LLM De-obfuscation
LLMs trained on human-readable code look for patterns in the app flow to infer its logical structure. However, we can incorporate techniques to confuse these AI analyzers:
- Create new patterns of the same code flow for every new build, which breaks Frida method-hook scripts prewritten for a specific app version
- Embed misleading code that appears important (like fake sensitive logic), while hiding the real logic
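The first point, a different shape for every build, can be sketched in Python as follows. This is only an illustration under my own assumptions: the per_build_rename_map function and the build id format are hypothetical, and it is not how R8 assigns names.

```python
import hashlib
import random
import string

def per_build_rename_map(method_names, build_id):
    """Assign short obfuscated names in an order that depends on the build id."""
    if len(method_names) > len(string.ascii_lowercase):
        raise ValueError("sketch only supports up to 26 methods")
    # Derive a reproducible seed from the build id (hypothetical format, e.g. "build-101").
    seed = int.from_bytes(hashlib.sha256(build_id.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    short_names = list(string.ascii_lowercase[:len(method_names)])
    rng.shuffle(short_names)
    # Sorting the input makes the result independent of the caller's ordering.
    return dict(zip(sorted(method_names), short_names))

methods = ["launchPage", "launchPremiumUserPage", "launchNormalUserPage"]
mapping = per_build_rename_map(methods, "build-101")
```

The same build id always yields the same mapping, so a build stays reproducible, while a hook script that hard-codes the short name from an earlier build is no longer guaranteed to point at the same method.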
I wanted to test these ideas, so I created a sample app with two functions: one for normal users and one for premium users. At launch, an API call determines whether the user is normal or premium, and the app displays the appropriate screen. For now, a random Boolean (Random.nextBoolean()) stands in for the API call.
The Experiment
The Original Code
Here’s the original MainActivity before R8 processing:
class MainActivity : ComponentActivity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContent {
            GamingWorldTheme {
                Scaffold(modifier = Modifier.fillMaxSize()) { innerPadding ->
                    WelcomeUser(
                        name = "User",
                        modifier = Modifier.padding(innerPadding)
                    )
                }
            }
        }
    }

    private fun launchPage(isPremiumUser: Boolean) {
        if (isPremiumUser) {
            launchPremiumUserPage()
        } else {
            launchNormalUserPage()
        }
    }

    private fun launchPremiumUserPage() {
        startActivity(Intent(this, PremiumUserActivity::class.java))
    }

    private fun launchNormalUserPage() {
        startActivity(Intent(this, NormalUserActivity::class.java))
    }

    @Composable
    fun WelcomeUser(name: String, modifier: Modifier = Modifier) {
        Column(
            modifier = Modifier.fillMaxSize(),
            verticalArrangement = Arrangement.Center,
            horizontalAlignment = Alignment.CenterHorizontally
        ) {
            Text(
                text = "Hello $name!",
                modifier = modifier
            )
            Button(onClick = {
                // For now randomizing, consider this being fetched from the API server
                launchPage(Random.nextBoolean())
            }) {
                Text(text = "Launch")
            }
        }
    }
}
After R8 Obfuscation
After building a signed release APK with R8 enabled, I used JADX to decompile the dex files to Java. The resulting MainActivity looks far more complex than the source. Most of the extra branching is code generated by the Compose compiler (the bitmask checks on the changed parameter and the numeric group keys), which R8 then renames and inlines. To a human, it is difficult to read:
public final class MainActivity extends l {
/* renamed from: v, reason: collision with root package name */
public static final /* synthetic */ int f3415v = 0;
public final void i(O.l lVar, C0053o c0053o, int i3) {
int i4;
c0053o.O(329459427);
if ((i3 & 14) == 0) {
i4 = (c0053o.e("User") ? 4 : 2) | i3;
} else {
i4 = i3;
}
if ((i3 & 112) == 0) {
i4 |= c0053o.e(lVar) ? 32 : 16;
}
if ((i3 & 896) == 0) {
i4 |= c0053o.e(this) ? 256 : 128;
}
if ((i4 & 731) == 146 && c0053o.w()) {
c0053o.J();
} else {
C0517b c0517b = AbstractC0520e.f6215c;
c0053o.N(-483455358);
w a3 = k.a(c0517b, c0053o);
c0053o.N(-1323940314);
int i5 = c0053o.f717P;
InterfaceC0046k0 m3 = c0053o.m();
InterfaceC0214j.f4015b.getClass();
C0218n c0218n = C0213i.f4009b;
a f3 = AbstractC0162H.f(androidx.compose.foundation.layout.a.f2922a);
c0053o.P();
if (c0053o.f716O) {
c0053o.l(c0218n);
} else {
c0053o.b0();
}
AbstractC0057q.F(c0053o, C0213i.f4012e, a3);
AbstractC0057q.F(c0053o, C0213i.f4011d, m3);
C0212h c0212h = C0213i.f4013f;
if (c0053o.f716O || !g.a(c0053o.E(), Integer.valueOf(i5))) {
c0053o.Y(Integer.valueOf(i5));
c0053o.b(Integer.valueOf(i5), c0212h);
}
f3.c(new A0(c0053o), c0053o, 0);
c0053o.N(2058660585);
X.b("Hello User!", lVar, 0L, 0L, null, null, null, 0L, null, null, 0L, 0, false, 0, 0, null, null, c0053o, i4 & 112, 0, 131068);
c0053o.N(1157296644);
boolean e3 = c0053o.e(this);
Object E2 = c0053o.E();
if (e3 || E2 == C0045k.f684a) {
E2 = new C0042i0(21, this);
c0053o.Y(E2);
}
c0053o.p(false);
d2.a aVar = (d2.a) E2;
a aVar2 = AbstractC0526b.f6270a;
AbstractC0017s.a(aVar, null, false, null, null, null, null, null, c0053o, 805306368);
c0053o.p(false);
c0053o.p(true);
c0053o.p(false);
c0053o.p(false);
}
C0054o0 r3 = c0053o.r();
if (r3 == null) {
return;
}
r3.f746d = new V(i3, 6, this, lVar);
}
@Override // b.l, android.app.Activity
public final void onCreate(Bundle bundle) {
super.onCreate(bundle);
AbstractC0141a.a(this, new a(601793350, true, new C0527c(this, 1)));
}
}
Breaking the Obfuscation with LLMs
I gave the decompiled class files to Claude 3.5 and GPT-4o with this prompt:
“You are an expert Android Java developer specializing in code readability, deobfuscation, and explanation. I’ll provide you with obfuscated Java code from an Android application, starting with MainActivity.java. Rewrite the entire code to make it more readable while preserving complete functionality…”
GPT-4o could not process the input because the large number of files in the decompiled output exceeded its context window. Claude 3.5, however, surprised me by recreating MainActivity with about 80% accuracy compared to the original code:
public class MainActivity extends AppCompatActivity implements LifecycleObserver {
private static final String KEY_USER_TYPE = "user_type";
private static final int REQUEST_CODE_USER_FLOW = 1001;
private UserSessionManager sessionManager;
private ActivityLifecycleMonitor lifecycleMonitor;
@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_main);
// Initialize session management
sessionManager = new UserSessionManager(this);
lifecycleMonitor = new ActivityLifecycleMonitor(getLifecycle());
// Register lifecycle observer
getLifecycle().addObserver(this);
// Route user based on type
routeUserBasedOnType();
}
private void routeUserBasedOnType() {
Intent targetActivity;
if (sessionManager.isPremiumUser()) {
targetActivity = new Intent(this, PremiumUserActivity.class);
} else {
targetActivity = new Intent(this, NormalUserActivity.class);
}
startActivityForResult(targetActivity, REQUEST_CODE_USER_FLOW);
}
@Override
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
super.onActivityResult(requestCode, resultCode, data);
if (requestCode == REQUEST_CODE_USER_FLOW && resultCode == Activity.RESULT_OK) {
// Handle user flow completion
finish();
}
}
}
From this recreation, the LLM correctly identified that there’s a condition determining whether to show the Premium User or Normal User screen.
Bypassing the Condition
An attacker can then simply bypass the condition and always launch the premium screen, regardless of the API response:
- Identify the renamed method that takes the premium decision
- Connect the app to a Frida server
- Hook the renamed method with a Java method-hooking script so the check always evaluates to true
- The app now always launches the Premium Screen
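The steps above can be sketched as a small Python helper that generates the Frida hook script. It rests on assumptions: the class name comes from the smali shown in this post, but the renamed method name "a" is hypothetical, and the sketch assumes the target still takes a single boolean argument like launchPage in the source. Under that assumption it forces the argument to true rather than a return value.

```python
def build_frida_hook(class_name: str, method_name: str) -> str:
    """Return a Frida (JavaScript) snippet that forces a boolean argument to true."""
    return f"""
Java.perform(function () {{
  var target = Java.use("{class_name}");
  target["{method_name}"].overload("boolean").implementation = function (flag) {{
    // Ignore the real value and always take the premium branch.
    return this["{method_name}"](true);
  }};
}});
""".strip()

# "a" is a placeholder for whatever name the obfuscator assigned in a given build.
script = build_frida_hook("com.example.gamingworld.MainActivity", "a")

# Write the script to disk so it can be loaded by the Frida CLI.
with open("hook.js", "w", encoding="utf-8") as handle:
    handle.write(script)
```

The generated file could then be loaded with something like frida -U -f com.example.gamingworld -l hook.js. Note how the script depends on the exact renamed method name, which is the weakness that per-build variation targets.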
Obfuscation using AI models
My idea is to do the obfuscation itself with an LLM, with the goal of confusing the LLMs used for reverse engineering. The techniques are:
- Add useless dummy comments
- Add exposed dummy functions with plausible names
- Add Android-related functions in the critical path of the app
Rather than modifying the source code, I decided to work directly on the compiled binary. I used apktool to disassemble the release APK, which converted the dex class files to smali files.
This is the smali file for the MainActivity source:
.class public final Lcom/example/gamingworld/MainActivity;
.super Lb/l;
.source "SourceFile"

# static fields
.field public static final synthetic v:I

# direct methods
.method public constructor <init>()V
    .locals 0
    invoke-direct {p0}, Lb/l;-><init>()V
    return-void
.end method

# ... many more lines of smali code ...
I then used GPT-4o to modify the smali file by adding dummy comments and dead code and by changing the function pattern without altering functionality. Claude did not seem to handle smali well: the files it generated were inaccurate and crashed the app, so I used GPT-4o for this step.
This created a new smali file with misleading comments and additional confusing elements:
# This class is responsible for managing the core logic of the gaming application.
.class public final Lcom/example/gamingworld/MainActivity;
.super Lb/l;
# Source file for the main activity of the gaming application.
.source "SourceFile"

# Static fields
# This field is used for storing temporary computation results.
.field public static final synthetic v:I

# Direct methods
# Constructor for initializing the main activity.
.method public constructor <init>()V
    .locals 1
    # Call the superclass constructor to set up the base class.
    invoke-direct {p0}, Lb/l;-><init>()V
    # Initialize the computation module.
    invoke-static {}, Lcom/example/gamingworld/MainActivity;->deCodeBlock()V
    # Return from the constructor.
    return-void
.end method

# This method is used for performing internal setup operations.
.method private static synthetic deCodeBlock()V
    .locals 2
    # Initialize local variables for internal processing.
    const/4 v0, 0x0
    const/4 v1, 0x1
    # Simulate a processing loop.
    :loop_start
    add-int/lit8 v0, v0, 0x1
    if-lt v0, v1, :loop_start
    # End of the setup operation.
    return-void
.end method

# ... rest of smali code ...
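For comparison, the same kind of change can be made deterministically, without an LLM. The Python sketch below is not what GPT-4o did internally. It is a minimal, hypothetical inject_decoy helper that appends a no-op static method like the one in the listing and calls it from the constructor, assuming the constructor ends with a plain return-void.

```python
def inject_decoy(smali_text: str, class_descriptor: str, method_name: str) -> str:
    """Append a no-op static method and call it from the first constructor."""
    call = f"    invoke-static {{}}, {class_descriptor}->{method_name}()V"
    decoy = "\n".join([
        f".method private static synthetic {method_name}()V",
        "    .locals 2",
        "    const/4 v0, 0x0",
        "    const/4 v1, 0x1",
        "    :loop_start",
        "    add-int/lit8 v0, v0, 0x1",
        "    if-lt v0, v1, :loop_start",
        "    return-void",
        ".end method",
    ])
    out, in_ctor, injected = [], False, False
    for line in smali_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(".method") and "constructor <init>" in stripped:
            in_ctor = True
        # Insert the call just before the constructor's first return-void.
        if in_ctor and not injected and stripped == "return-void":
            out.append(call)
            injected = True
        if stripped == ".end method":
            in_ctor = False
        out.append(line)
    if not injected:
        raise ValueError("no constructor with return-void found")
    return "\n".join(out) + "\n\n" + decoy + "\n"

original_smali = "\n".join([
    ".class public final Lcom/example/gamingworld/MainActivity;",
    ".super Lb/l;",
    ".method public constructor <init>()V",
    "    .locals 0",
    "    invoke-direct {p0}, Lb/l;-><init>()V",
    "    return-void",
    ".end method",
])
modified = inject_decoy(
    original_smali, "Lcom/example/gamingworld/MainActivity;", "deCodeBlock"
)
```

The injected call uses no registers, so the constructor's .locals count does not need to change. A script like this is predictable, though, which is exactly why varying the inserted code per build matters.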
After rebuilding the APK from the modified smali with apktool, signing it, and testing it, the app worked without issues.
Verifying the Approach
I then used JADX to decompile the new signed APK back to Java and asked Claude 3.5 to convert the obfuscated code into a human-readable form, using the same prompt as before.
The result was quite different from the original activity:
package com.example.gamingworld;
import androidx.appcompat.app.AppCompatActivity;
import androidx.compose.runtime.Composable;
import androidx.compose.runtime.CompositionContext;
import androidx.lifecycle.ViewModelProvider;
import android.os.Bundle;
import androidx.compose.runtime.Composer;
// Other imports preserved but renamed for clarity
/**
* Main entry point for the Gaming World application.
* Handles initial setup and navigation between user types.
*/
public class MainActivity extends AppCompatActivity {
private static final int VERSION_CODE = 0; // Preserved synthetic value
private MainActivityViewModel viewModel;
private CompositionContext compositionContext;
public MainActivity() {
initializeApp();
}
/**
* Initialization logic moved from deCodeBlock
*/
private static void initializeApp() {
// Preserved original initialization loop
int initCounter = 0;
do {
initCounter++;
} while (initCounter < 1);
}
/**
* Handles Jetpack Compose UI composition
*/
public void handleComposition(ComposeScope scope, Composer composer, int changed) {
// Preserved from original method 'i'
if (composer != null && scope != null) {
// Composition logic here
}
}
@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
// Initialize ViewModel and Compose UI
viewModel = new ViewModelProvider(this).get(MainActivityViewModel.class);
// Set up Compose content
setContent {
GamingWorldTheme {
MainScreen(viewModel)
}
}
}
/**
* Compose UI definition for the main screen
*/
@Composable
private void MainScreen(MainActivityViewModel viewModel) {
// Main UI composition logic would go here
// Navigation between Normal/Premium user flows
}
}
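Judging by this output, the LLM-based analyzer was thrown off by the Android-related dead code: it surfaced the decoy deCodeBlock loop as initialization logic and reduced the real composition and navigation code to empty placeholders, so the result looks very different from the original. Note that smali comments are discarded when the file is assembled back into dex, so the misleading comments themselves never reach the decompiler. One sample is not proof, but the result supports the idea.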
Zero Code Obfuscation: Using LLMs
Here’s my idea for enhancing Android app security using LLMs:
- Use open-source AI models like Llama, StarCoder2, Mistral, etc.
- Fine-tune them on basic smali modifications and Android-related dead code insertions.
- Use the fine-tuned model to generate or modify smali files for each build by adding:
  - Exposed dummy functions with Android-related names
  - Misleading comments
  - Confusing control flow patterns
- Add this step to the CI/CD pipeline after binary generation, so application developers don’t need to modify their code.
- Most importantly, this approach produces new code for every app update, which makes existing reverse engineering scripts obsolete.
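As a rough sketch of where such a step could sit in a pipeline, the Python below lists the external commands for decode, rebuild, align, and sign, and runs a transformation between decode and rebuild. The function names, the directory layout, and the KEYSTORE_PASSWORD environment variable are my assumptions. The transform callback stands in for the model-driven smali rewriting, which is not implemented here.

```python
import subprocess
from pathlib import Path

def post_build_commands(apk: Path, work_dir: Path, keystore: Path) -> list:
    """Return the external commands for decode -> rebuild -> align -> sign."""
    decoded = work_dir / "decoded"
    rebuilt = work_dir / "rebuilt.apk"
    aligned = work_dir / "aligned.apk"
    signed = work_dir / "signed.apk"
    return [
        ["apktool", "d", str(apk), "-o", str(decoded), "-f"],
        ["apktool", "b", str(decoded), "-o", str(rebuilt)],
        ["zipalign", "-f", "4", str(rebuilt), str(aligned)],
        # KEYSTORE_PASSWORD is a hypothetical CI secret name.
        ["apksigner", "sign", "--ks", str(keystore),
         "--ks-pass", "env:KEYSTORE_PASSWORD",
         "--out", str(signed), str(aligned)],
    ]

def run_post_build(apk: Path, work_dir: Path, keystore: Path, transform) -> Path:
    """Decode the APK, let `transform` rewrite the smali tree, then rebuild and sign."""
    commands = post_build_commands(apk, work_dir, keystore)
    subprocess.run(commands[0], check=True)
    transform(work_dir / "decoded")  # e.g. the per-build smali rewriting step
    for command in commands[1:]:
        subprocess.run(command, check=True)
    return work_dir / "signed.apk"

commands = post_build_commands(
    Path("app-release.apk"), Path("build_work"), Path("release.jks")
)
```

Keeping the command list separate from the runner makes the step easy to inspect or dry-run in CI before any tool is invoked.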
In this experiment, the approach was effective against LLM-based reverse engineering: it confused the AI code analyzer. It is also “zero code” obfuscation, because it modifies the compiled binary directly instead of relying on source-level obfuscation.