Android Obfuscation Using LLMs: A Zero-Code Approach

Sun, May 4, 2025 10-minute read

Prologue

Recently I started diving into Android app security. While exploring the field, I found that the OWASP MASTG (Mobile Application Security Testing Guide) collects comprehensive documentation on mobile app security in one place. I’ve been particularly interested in its coverage of reverse engineering and obfuscation techniques.

The Reverse Engineering Challenge

The openness of Android and its use of the Java ecosystem make it relatively easy to reverse engineer APKs, and freely available tools such as jadx, apktool, and dex2jar help with this. The primary defense developers have against reverse engineering is code obfuscation: transforming readable code into meaningless names and convoluted structures.

However, as OWASP MASTG bluntly states:

“None of them [obfuscation techniques] can assure a 100% effectiveness, as the reverse engineer will always have full access to the device and will therefore always win! (Given time and resources)”

Since attack tooling is widely available, TIME is the only factor we can really control. The best we can do is make the code as hard as possible for attackers to understand through effective obfuscation. The most widely used open-source obfuscators on Android are ProGuard and R8.

Problems with ProGuard / R8

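Obfuscators generally rely on several static techniques. Out of the box, R8 and ProGuard mainly provide name obfuscation (alongside shrinking and optimization); the other techniques are typically offered by commercial tools: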

Name obfuscation
Instruction substitution
Control flow flattening
String encryption
Packing

Yet disassemblers such as IDA Pro can identify known library code even after renaming, by matching signatures derived from the code itself instead of relying on names. IDA’s FLIRT (Fast Library Identification and Recognition Technology) does this with byte-pattern signatures, and the same idea extends to structural patterns such as method parameters and inheritance relationships.

A recent project called DalvikFLIRT combines this approach with LLMs: it identifies known SDK components, which then serve as anchors that help the LLM reconstruct application-specific logic. The system analyzes code recursively, using each layer of deobfuscated output to improve the next pass. (More details in this analysis document.)
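To make name-independent matching concrete, here is a toy Java sketch. It is not FLIRT’s actual algorithm (FLIRT works on byte patterns); it simply reduces a method to its parameter and return types, keeping platform types (java.*, android.*, primitives), which an app cannot rename, and replacing everything else with a placeholder. The class and type names in the usage note are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of name-independent matching. NOT IDA's FLIRT algorithm
// (which matches byte patterns); it only shows that renaming alone does not
// hide a method's structure.
final class ShapeMatcher {

    // Platform/JDK types and primitives survive obfuscation because the app
    // cannot rename them; any other type becomes a "?" placeholder.
    static String normalizeType(String type) {
        boolean platform = type.startsWith("java.")
                || type.startsWith("android.")
                || !type.contains(".");          // primitives such as "int", "boolean"
        return platform ? type : "?";
    }

    // Builds a signature from parameter and return types only. The class name
    // and method name are accepted but deliberately ignored.
    static String shape(String className, String methodName,
                        List<String> paramTypes, String returnType) {
        List<String> normalized = new ArrayList<>();
        for (String p : paramTypes) {
            normalized.add(normalizeType(p));
        }
        return "(" + String.join(",", normalized) + ")" + normalizeType(returnType);
    }
}
```

In this model, a hypothetical com.example.lib.TokenStore.saveToken(Context, String, Token) and its renamed form a.b.c(Context, String, a.d) both reduce to (android.content.Context,java.lang.String,?)boolean, so the renamed method still matches its known signature.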

Because these techniques are static, every build gets the same obfuscation and produces essentially the same obfuscated output, which makes it easy for attackers, and LLMs, to reuse previously prepared scripts.
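A simplified model of why static renaming is predictable: the sketch below assigns the names a, b, ..., z, aa, ab, ... in a fixed order. This is not R8’s real naming algorithm, only an illustration that a deterministic renamer maps the same input to the same output on every build.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of static name obfuscation (NOT R8's actual algorithm):
// identifiers are renamed in first-seen order to a, b, ..., z, aa, ab, ...
final class StaticRenamer {

    // Bijective base-26: 0 -> "a", 25 -> "z", 26 -> "aa", 27 -> "ab", ...
    static String shortName(int index) {
        StringBuilder sb = new StringBuilder();
        int n = index + 1;
        while (n > 0) {
            n--;
            sb.insert(0, (char) ('a' + n % 26));
            n /= 26;
        }
        return sb.toString();
    }

    // Nothing here depends on the build, so the same identifiers always
    // receive the same obfuscated names.
    static Map<String, String> rename(List<String> identifiers) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String id : identifiers) {
            if (!mapping.containsKey(id)) {
                mapping.put(id, shortName(mapping.size()));
            }
        }
        return mapping;
    }
}
```

Given launchPage, launchPremiumUserPage, and launchNormalUserPage, this model yields a, b, and c on every build, so a hook written against b keeps working until the code itself changes.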

How to Overcome LLM De-obfuscation

LLMs trained on human-readable code infer an app’s logical structure from recognizable patterns in its flow. We can use techniques that confuse these AI analyzers:

Create new patterns for the same code flow in every new build
- This breaks method-hook scripts prepared with Frida for specific app versions
Embed misleading code that appears important (such as fake sensitive logic) while hiding the real logic
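The first idea, applied only to identifiers, can be sketched by seeding the name generator with a per-build value. The seed source (for example a CI build number) and the six-letter name format are assumptions for illustration. This covers names only; varying the code flow itself goes further.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Sketch of per-build variation: identifiers get different obfuscated names
// in each build because the generator is seeded with a build-specific value
// (hypothetical here, e.g. a CI build number).
final class PerBuildRenamer {

    static Map<String, String> rename(List<String> identifiers, long buildSeed) {
        Random random = new Random(buildSeed);   // same seed -> same mapping
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String id : identifiers) {
            if (mapping.containsKey(id)) {
                continue;
            }
            String candidate;
            do {
                candidate = randomName(random, 6);
            } while (mapping.containsValue(candidate));   // keep names unique
            mapping.put(id, candidate);
        }
        return mapping;
    }

    // Random lowercase name of the given length.
    private static String randomName(Random random, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + random.nextInt(26)));
        }
        return sb.toString();
    }
}
```

Rebuilding with the same seed reproduces the mapping (useful for deobfuscating crash reports), while a new seed produces new names, so a hook script written for the previous build no longer finds its target.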

I wanted to test these ideas, so I created a sample app with two screens: one for normal users and one for premium users. At launch, an API call determines whether the user is normal or premium, and the app displays the appropriate screen. For now, instead of an API call, the app uses a random boolean from Random.nextBoolean().

The Experiment

The Original Code

Here’s the original MainActivity before R8 processing:

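package com.example.gamingworld

import android.content.Intent
import android.os.Bundle
import androidx.activity.ComponentActivity
import androidx.activity.compose.setContent
import androidx.compose.foundation.layout.Arrangement
import androidx.compose.foundation.layout.Column
import androidx.compose.foundation.layout.fillMaxSize
import androidx.compose.foundation.layout.padding
// Assumes the default Android Studio Compose template (Material 3, theme in ui.theme)
import androidx.compose.material3.Button
import androidx.compose.material3.Scaffold
import androidx.compose.material3.Text
import androidx.compose.runtime.Composable
import androidx.compose.ui.Alignment
import androidx.compose.ui.Modifier
import com.example.gamingworld.ui.theme.GamingWorldTheme
import kotlin.random.Random

class MainActivity : ComponentActivity() {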
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContent {
            GamingWorldTheme {
                Scaffold(modifier = Modifier.fillMaxSize()) { innerPadding ->
                    WelcomeUser(
                        name = "User",
                        modifier = Modifier.padding(innerPadding)
                    )
                }
            }
        }
    }

    private fun launchPage(isPremiumUser: Boolean) {
        if(isPremiumUser) {
            launchPremiumUserPage()
        } else {
            launchNormalUserPage()
        }
    }

    private fun launchPremiumUserPage() {
        startActivity(Intent(this, PremiumUserActivity::class.java))
    }

    private fun launchNormalUserPage() {
        startActivity(Intent(this, NormalUserActivity::class.java))
    }

    @Composable
    fun WelcomeUser(name: String, modifier: Modifier = Modifier) {
        Column(
            modifier = Modifier.fillMaxSize(),
            verticalArrangement = Arrangement.Center,
            horizontalAlignment = Alignment.CenterHorizontally) {
            Text(
                text = "Hello $name!",
                modifier = modifier
            )
            Button(onClick = {
                // For now randomizing, consider this being fetched from the API server
                launchPage(Random.nextBoolean())
            }) {
                Text(text = "Launch")
            }
        }
    }
}

After R8 Obfuscation


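After building a signed release APK with R8 enabled, I used JADX to decompile the dex files to Java. The resulting MainActivity looks very complex: R8 renamed classes, methods, and fields to short meaningless identifiers and inlined constants (note the literal "Hello User!"), and the change-tracking code that the Jetpack Compose compiler generates for every composable adds many extra conditions and magic numbers. To a human, it is hard to even read: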


public final class MainActivity extends l {

    /* renamed from: v, reason: collision with root package name */
    public static final /* synthetic */ int f3415v = 0;

    public final void i(O.l lVar, C0053o c0053o, int i3) {
        int i4;
        c0053o.O(329459427);
        if ((i3 & 14) == 0) {
            i4 = (c0053o.e("User") ? 4 : 2) | i3;
        } else {
            i4 = i3;
        }
        if ((i3 & 112) == 0) {
            i4 |= c0053o.e(lVar) ? 32 : 16;
        }
        if ((i3 & 896) == 0) {
            i4 |= c0053o.e(this) ? 256 : 128;
        }
        if ((i4 & 731) == 146 && c0053o.w()) {
            c0053o.J();
        } else {
            C0517b c0517b = AbstractC0520e.f6215c;
            c0053o.N(-483455358);
            w a3 = k.a(c0517b, c0053o);
            c0053o.N(-1323940314);
            int i5 = c0053o.f717P;
            InterfaceC0046k0 m3 = c0053o.m();
            InterfaceC0214j.f4015b.getClass();
            C0218n c0218n = C0213i.f4009b;
            a f3 = AbstractC0162H.f(androidx.compose.foundation.layout.a.f2922a);
            c0053o.P();
            if (c0053o.f716O) {
                c0053o.l(c0218n);
            } else {
                c0053o.b0();
            }
            AbstractC0057q.F(c0053o, C0213i.f4012e, a3);
            AbstractC0057q.F(c0053o, C0213i.f4011d, m3);
            C0212h c0212h = C0213i.f4013f;
            if (c0053o.f716O || !g.a(c0053o.E(), Integer.valueOf(i5))) {
                c0053o.Y(Integer.valueOf(i5));
                c0053o.b(Integer.valueOf(i5), c0212h);
            }
            f3.c(new A0(c0053o), c0053o, 0);
            c0053o.N(2058660585);
            X.b("Hello User!", lVar, 0L, 0L, null, null, null, 0L, null, null, 0L, 0, false, 0, 0, null, null, c0053o, i4 & 112, 0, 131068);
            c0053o.N(1157296644);
            boolean e3 = c0053o.e(this);
            Object E2 = c0053o.E();
            if (e3 || E2 == C0045k.f684a) {
                E2 = new C0042i0(21, this);
                c0053o.Y(E2);
            }
            c0053o.p(false);
            d2.a aVar = (d2.a) E2;
            a aVar2 = AbstractC0526b.f6270a;
            AbstractC0017s.a(aVar, null, false, null, null, null, null, null, c0053o, 805306368);
            c0053o.p(false);
            c0053o.p(true);
            c0053o.p(false);
            c0053o.p(false);
        }
        C0054o0 r3 = c0053o.r();
        if (r3 == null) {
            return;
        }
        r3.f746d = new V(i3, 6, this, lVar);
    }

    @Override // b.l, android.app.Activity
    public final void onCreate(Bundle bundle) {
        super.onCreate(bundle);
        AbstractC0141a.a(this, new a(601793350, true, new C0527c(this, 1)));
    }
}
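Much of the apparent noise in i(...) follows a regular layout. The constants in the (i3 & 14), (i3 & 112), (i3 & 896), and (i4 & 731) == 146 checks appear to come from the change-tracking bitmask ("$changed", here i3) that the Compose compiler adds to composables, rather than from R8. The sketch below is an assumption-based model, not Compose’s source: bit 0 is a force flag and each parameter gets a 3-bit slot starting at bit 1, with SAME = 0b001 and DIFFERENT = 0b010. This is an internal compiler detail that may differ between Compose versions.

```java
// Model of the $changed bit layout that reproduces the constants seen in the
// decompiled i(...) method. Assumption: bit 0 = force flag, then one 3-bit
// slot per parameter starting at bit 1.
final class ComposeChangedBits {

    static final int SAME = 0b001;        // argument equal to the previous composition
    static final int DIFFERENT = 0b010;   // argument changed

    // Mask selecting the whole 3-bit slot of parameter i (e.g. i3 & 14).
    static int slotMask(int i) {
        return 0b111 << (1 + 3 * i);
    }

    // Value placed in slot i, e.g. "changed(x) ? 4 : 2" for parameter 0.
    static int slotValue(int i, int state) {
        return state << (1 + 3 * i);
    }

    // Mask for the skip check: the force bit plus bits 0b101 of every slot.
    static int skipMask(int params) {
        int mask = 1;
        for (int i = 0; i < params; i++) {
            mask |= 0b101 << (1 + 3 * i);
        }
        return mask;
    }

    // Expected value for the skip check: force bit clear, every slot SAME.
    static int skipExpected(int params) {
        int expected = 0;
        for (int i = 0; i < params; i++) {
            expected |= SAME << (1 + 3 * i);
        }
        return expected;
    }
}
```

For three parameters the model gives slot masks 14, 112, and 896, slot values 4/2, 32/16, and 256/128, and the skip pair 731/146, matching the decompiled code.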


Breaking the Obfuscation with LLMs

I fed the decompiled Java files to Claude 3.5 and GPT-4o with this prompt:

“You are an expert Android Java developer specializing in code readability, deobfuscation, and explanation. I’ll provide you with obfuscated Java code from an Android application, starting with MainActivity.java. Rewrite the entire code to make it more readable while preserving complete functionality…”

GPT-4o could not fit all of the input into its context window because of the large number of files in the decompiled output. Claude 3.5, however, surprised me by recreating MainActivity with about 80% accuracy compared to the original code:

public class MainActivity extends AppCompatActivity implements LifecycleObserver {
    
    private static final String KEY_USER_TYPE = "user_type";
    private static final int REQUEST_CODE_USER_FLOW = 1001;
    
    private UserSessionManager sessionManager;
    private ActivityLifecycleMonitor lifecycleMonitor;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        
        // Initialize session management
        sessionManager = new UserSessionManager(this);
        lifecycleMonitor = new ActivityLifecycleMonitor(getLifecycle());
        
        // Register lifecycle observer
        getLifecycle().addObserver(this);
        
        // Route user based on type
        routeUserBasedOnType();
    }

    private void routeUserBasedOnType() {
        Intent targetActivity;
        if (sessionManager.isPremiumUser()) {
            targetActivity = new Intent(this, PremiumUserActivity.class);
        } else {
            targetActivity = new Intent(this, NormalUserActivity.class);
        }
        startActivityForResult(targetActivity, REQUEST_CODE_USER_FLOW);
    }

    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        super.onActivityResult(requestCode, resultCode, data);
        if (requestCode == REQUEST_CODE_USER_FLOW && resultCode == Activity.RESULT_OK) {
            // Handle user flow completion
            finish();
        }
    }
}

Although the recreation is not exact, the LLM correctly identified the condition that decides whether to show the Premium User or Normal User screen.

Bypassing the Condition

An attacker can then simply bypass the condition and always launch the premium screen, regardless of the API response:

Identify the renamed method that makes the premium decision
Run frida-server on the device and attach Frida to the app
Hook the renamed method with a Frida Java hooking script so the premium branch is always taken (for example, by forcing its result or argument to true)
The app now always launches the Premium screen
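Why a single decision point is fragile can be shown with a plain-Java model. This is not Frida code: a real hook is written in Frida’s JavaScript API (Java.use(...) and overriding the method’s implementation). Here the "hook" simply swaps the supplier behind a hypothetical premium check, which has the same effect on the branch.

```java
import java.util.function.BooleanSupplier;

// Plain-Java model of bypassing one decision point. The field stands in for
// the renamed premium check (hypothetical name and shape); replacing it plays
// the role of a Frida hook overriding the method's implementation.
final class HookModel {

    static BooleanSupplier premiumCheck = () -> false;   // "real" result

    static String launchPage() {
        return premiumCheck.getAsBoolean() ? "PremiumUserActivity" : "NormalUserActivity";
    }
}
```

Before the swap, launchPage() returns "NormalUserActivity"; after setting premiumCheck = () -> true, it returns "PremiumUserActivity" without any change at the call site.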

Obfuscation Using AI Models

My next idea was to use LLMs themselves to obfuscate the code, with the goal of confusing the LLMs that attackers use for reverse engineering. I tried the following techniques:

Add useless, misleading comments
Add dummy functions with meaningful-looking names
Add Android-related calls along the app’s critical paths
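Working on smali (the disassembled form of dex bytecode), the dummy-function technique could look like the following sketch: it emits a static method with an important-sounding name and comment but a no-op loop body. The name pool and the comment text are hypothetical. Before use, the output would still have to be placed inside a class, given a name that does not collide with existing methods, and called from a real code path so it appears relevant.

```java
import java.util.Random;

// Emits a smali method whose name and comment look security-relevant but
// whose body only runs a short no-op loop. Names/comments are hypothetical.
final class DummySmaliMethod {

    private static final String[] NAMES = {
        "verifyLicenseSignature", "checkPremiumEntitlement", "decryptSessionToken"
    };

    static String generate(Random random) {
        String name = NAMES[random.nextInt(NAMES.length)];
        return String.join("\n",
            "# Validates the user's entitlement before any premium screen is shown.",
            ".method private static " + name + "()V",
            "    .locals 2",
            "",
            "    const/4 v0, 0x0",
            "    const/4 v1, 0x3",
            "",
            "    :loop_start",
            "    add-int/lit8 v0, v0, 0x1",
            "    if-lt v0, v1, :loop_start",
            "",
            "    return-void",
            ".end method");
    }
}
```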

Rather than modifying the source code, I decided to work directly on the compiled binary. I used apktool to disassemble the release APK, which converts the dex files into smali, a human-readable assembly format for Dalvik bytecode.


This is the smali file for MainActivity:

.class public final Lcom/example/gamingworld/MainActivity;
.super Lb/l;
.source "SourceFile"

# static fields
.field public static final synthetic v:I

# direct methods
.method public constructor <init>()V
    .locals 0

    invoke-direct {p0}, Lb/l;-><init>()V

    return-void
.end method

# ... many more lines of smali code ...

I then used GPT-4o to modify the smali file, adding dummy comments and dead code and changing the method structure without altering functionality. I chose GPT-4o because Claude seemed less capable with smali: its edits produced invalid files that crashed the app.

This created a new smali file with misleading comments and additional confusing elements:

# This class is responsible for managing the core logic of the gaming application.
.class public final Lcom/example/gamingworld/MainActivity;
.super Lb/l; 
# Source file for the main activity of the gaming application.
.source "SourceFile"

# Static fields
# This field is used for storing temporary computation results.
.field public static final synthetic v:I

# Direct methods
# Constructor for initializing the main activity.
.method public constructor <init>()V
    .locals 1

    # Call the superclass constructor to set up the base class.
    invoke-direct {p0}, Lb/l;-><init>()V

    # Initialize the computation module.
    invoke-static {}, Lcom/example/gamingworld/MainActivity;->deCodeBlock()V

    # Return from the constructor.
    return-void
.end method

# This method is used for performing internal setup operations.
.method private static synthetic deCodeBlock()V
    .locals 2

    # Initialize local variables for internal processing.
    const/4 v0, 0x0
    const/4 v1, 0x1

    # Simulate a processing loop.
    :loop_start
    add-int/lit8 v0, v0, 0x1
    if-lt v0, v1, :loop_start

    # End of the setup operation.
    return-void
.end method

# ... rest of smali code ...


After rebuilding the APK from the modified smali with apktool, signing it, and testing it, I found the app still worked without issues.
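The decode, rebuild, and signing steps can be scripted, for example as a small Java helper for a CI job. This sketch assumes apktool, zipalign, and apksigner (from the Android SDK build-tools) are on the PATH; the file names and keystore are hypothetical. Note the order: when signing with apksigner, zipalign must run before signing, because modifying the APK afterwards invalidates the signature.

```java
import java.io.IOException;
import java.util.List;

// Sketch of the apktool -> zipalign -> apksigner steps as command lines.
// Assumes the tools are on the PATH; file names are hypothetical.
final class RebuildPipeline {

    // Step 1: disassemble the release APK into smali (-f overwrites outDir).
    static List<String> decodeCommand(String apk, String outDir) {
        return List.of("apktool", "d", "-f", apk, "-o", outDir);
    }

    // Steps 2-4, run after the smali files have been modified.
    static List<List<String>> rebuildCommands(String smaliDir, String keystore) {
        return List.of(
            // Reassemble the modified smali into an unsigned APK.
            List.of("apktool", "b", smaliDir, "-o", "unsigned.apk"),
            // Align first: with apksigner, alignment must happen before signing.
            List.of("zipalign", "-p", "-f", "4", "unsigned.apk", "aligned.apk"),
            // Sign with the release keystore (apksigner prompts for the password).
            List.of("apksigner", "sign", "--ks", keystore,
                    "--out", "app-release-obf.apk", "aligned.apk"));
    }

    // Runs each command in order and stops at the first failure.
    static void run(List<List<String>> commands) throws IOException, InterruptedException {
        for (List<String> cmd : commands) {
            Process process = new ProcessBuilder(cmd).inheritIO().start();
            if (process.waitFor() != 0) {
                throw new IllegalStateException("Step failed: " + String.join(" ", cmd));
            }
        }
    }
}
```

A CI job would call run(List.of(decodeCommand("app-release.apk", "smali-out"))), apply the smali modifications to smali-out, and then call run(rebuildCommands("smali-out", "release.jks")).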

Verifying the Approach

I then used JADX to decompile the newly signed APK back to Java and asked Claude 3.5, using the same prompt as before, to rewrite the obfuscated code in human-readable form.

The result was quite different from the original activity:


package com.example.gamingworld;

import androidx.appcompat.app.AppCompatActivity;
import androidx.compose.runtime.Composable;
import androidx.compose.runtime.CompositionContext;
import androidx.lifecycle.ViewModelProvider;
import android.os.Bundle;
import androidx.compose.runtime.Composer;
// Other imports preserved but renamed for clarity

/**
 * Main entry point for the Gaming World application.
 * Handles initial setup and navigation between user types.
 */
public class MainActivity extends AppCompatActivity {

    private static final int VERSION_CODE = 0; // Preserved synthetic value
    private MainActivityViewModel viewModel;
    private CompositionContext compositionContext;

    public MainActivity() {
        initializeApp();
    }

    /**
     * Initialization logic moved from deCodeBlock
     */
    private static void initializeApp() {
        // Preserved original initialization loop
        int initCounter = 0;
        do {
            initCounter++;
        } while (initCounter < 1);
    }

    /**
     * Handles Jetpack Compose UI composition
     */
    public void handleComposition(ComposeScope scope, Composer composer, int changed) {
        // Preserved from original method 'i'
        if (composer != null && scope != null) {
            // Composition logic here
        }
    }

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        
        // Initialize ViewModel and Compose UI
        viewModel = new ViewModelProvider(this).get(MainActivityViewModel.class);
        
        // Set up Compose content
        setContent {
            GamingWorldTheme {
                MainScreen(viewModel)
            }
        }
    }

    /**
     * Compose UI definition for the main screen
     */
    @Composable
    private void MainScreen(MainActivityViewModel viewModel) {
        // Main UI composition logic would go here
        // Navigation between Normal/Premium user flows
    }
}

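The LLM appears to have been misled by the comments and the Android-related dead code, and its output is very different from the original. It presents the dummy deCodeBlock loop as “initialization logic”, and the Premium/Normal routing condition it recovered last time is reduced to a placeholder comment. This supports the idea.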

Zero Code Obfuscation: Using LLMs

Here’s my idea for enhancing Android app security using LLMs:

Use open-source AI models like Llama, StarCoder2, Mistral, etc.
Fine-tune them on basic smali modifications and Android-related dead-code insertions.
Use the fine-tuned model to generate or modify smali files for each build, adding:
- Dummy Android-related methods with plausible, exposed names
- Misleading comments
- Confusing control flow patterns
Add this step to the CI/CD pipeline after the binary is built, so application developers don’t need to modify their code
The most important property of this approach is that it produces different code for every app update, making existing reverse-engineering scripts obsolete.
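As a minimal, rule-based stand-in for the proposed CI step (no model involved), the sketch below walks an apktool output directory and inserts a misleading comment before every .method line, choosing comments with a per-build seed so each release differs. The comment pool and the seed source are hypothetical; a fine-tuned model would replace the fixed pool with generated comments, dummy methods, and control-flow changes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Rule-based stand-in for the CI obfuscation step: adds a misleading comment
// before each ".method" line, chosen with a per-build seed (hypothetical).
final class SmaliCommentInjector {

    private static final String[] COMMENTS = {
        "# Performs integrity verification of the license payload.",
        "# Decrypts the cached session token before use.",
        "# Legacy entry point kept for backward compatibility."
    };

    // Returns a copy of the smali lines with one comment before each method.
    static List<String> inject(List<String> lines, Random random) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            if (line.startsWith(".method")) {
                out.add(COMMENTS[random.nextInt(COMMENTS.length)]);
            }
            out.add(line);
        }
        return out;
    }

    // Rewrites every .smali file under smaliRoot in place. Files are sorted so
    // the same seed always produces the same result.
    static void processTree(Path smaliRoot, long buildSeed) throws IOException {
        Random random = new Random(buildSeed);
        List<Path> files;
        try (Stream<Path> walk = Files.walk(smaliRoot)) {
            files = walk.filter(p -> p.toString().endsWith(".smali"))
                        .sorted()
                        .collect(Collectors.toList());
        }
        for (Path file : files) {
            Files.write(file, inject(Files.readAllLines(file), random));
        }
    }
}
```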

In my experiment, this approach was effective against LLM-based reverse engineering because it confused the AI code analyzer. It is also “zero-code” obfuscation: it modifies the compiled binary directly instead of relying on source-level obfuscation techniques.

I'm Vignesh