APK File Contents - In-Depth Explanation
You probably know that APK is the file extension of Android application installation files. APK is short for Android Package. Technically it is just a zip file, or more specifically a JAR file, with its extension set to .apk. That means you can rename an APK to .zip and open it with any compression utility. That is what I’m going to do today, and I will explain the different files inside an APK file.
Here I am using the APK file of WhatsApp.
So first, let’s rename the APK file to .zip.
Now extract the zip file using your favorite unzipping tool. Here I am using WinRAR.
You can see a bunch of different files and folders. Take a look at all of them and see if you can identify the purpose of each.
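If you would rather look inside without renaming anything, here is a minimal Python sketch that uses the standard zipfile module. It relies only on the fact that an APK is a zip archive. The function names are my own, and any path you pass in (for example "whatsapp.apk") is just a placeholder.

```python
import zipfile


def list_apk_entries(apk_path):
    """Return the names of all entries inside an APK (which is a zip archive)."""
    with zipfile.ZipFile(apk_path) as apk:
        return apk.namelist()


def top_level_items(names):
    """Collapse entry names to the top-level files and folders you see after extracting."""
    return sorted({name.split("/", 1)[0] for name in names})
```

Calling top_level_items(list_apk_entries("whatsapp.apk")) should give roughly the same list of files and folders that the unzipping tool shows.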
This is called reverse engineering: understanding how something is built or how it works by analyzing its behavior, looks etc. Software crackers do this to identify the licensing mechanism and produce patches to disable it. I reverse engineer everything all the time because I am curious about how that stuff works.
So let’s start with files.
AndroidManifest.xml contains the meta information about the app, such as the name of the app, the package name, its activities and services, the permissions required, the supported versions of Android etc. But inside the APK it is not a plain text file. It is binary (compiled) XML (eXtensible Markup Language, a relative of HTML). So you cannot simply open it and read the contents. Actually you can, but the contents will look as if they are encrypted. Try it anyway.
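To tell the binary form from plain text without opening an editor, you can peek at the first bytes. The sketch below assumes the usual Android binary XML header, which starts with chunk type 0x0003 followed by header size 0x0008 (both little-endian). The function name manifest_kind is my own, and this is only a quick check, not a parser.

```python
import struct

# First 4 bytes of Android binary XML: chunk type 0x0003, header size 0x0008.
BINARY_XML_MAGIC = struct.pack("<HH", 0x0003, 0x0008)


def manifest_kind(data):
    """Guess whether manifest bytes are binary XML, plain-text XML, or something else."""
    if data[:4] == BINARY_XML_MAGIC:
        return "binary"
    if data.lstrip()[:1] == b"<":
        return "text"
    return "unknown"
```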
classes.dex is the actual code of the app. “dex” is short for Dalvik Executable (remember the Dalvik VM from the previous chapter?). You already know that Android apps are coded in Java. The source code has the extension “.java”. When it is compiled it becomes “.class”. But on Android all these class files are further optimized and packed into a dex file so that they run easily in the Android runtime. Packing the code this way makes it harder to read at a glance, but it is not real protection: dex files can be decompiled, which is why developers often add an obfuscator such as ProGuard.
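A dex file is easy to recognize because it begins with a fixed 8-byte magic value: the text "dex", a newline, a three-digit format version and a zero byte. Here is a small sketch that reads that version. The function name dex_version is my own, and it looks only at the magic value, not at the rest of the header.

```python
import re

# Magic value at the start of a dex file: b"dex\n" + 3 digits + b"\x00".
DEX_MAGIC = re.compile(rb"dex\n(\d{3})\x00")


def dex_version(data):
    """Return the dex format version (e.g. "035") or None if this is not a dex file."""
    match = DEX_MAGIC.match(data[:8])
    return match.group(1).decode("ascii") if match else None
```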
resources.arsc holds the compiled resources. Resources are mostly the design parts of the app, such as layouts, strings, different values, images etc. Some of those resources, mainly simple values such as strings, are compiled and optimized into a single table when the app is built, because looking them up one by one would otherwise affect performance. This file is that table. The rest of the resources, which are not compiled into it, are kept in the “res” folder.
The assets folder contains raw files that the app components do not use directly as resources. They are accessed from the code using a facility called AssetManager. In WhatsApp’s assets folder you can see another folder called “fonts”. Inside it there are three TrueType font files, which are obviously Roboto. It is the font used in Material Design. So I think they bundle these fonts to keep the same font style on KitKat and previous versions.
Do you know about the different text styles we can apply to chat messages in WhatsApp? I’m pretty sure that WhatsApp uses these fonts to show those styles. By the way, these are just assumptions.
The lib folder contains the native libraries bundled with the app. In the previous chapter I mentioned that we can write some parts of an app in C or C++ instead of Java to improve performance. When we build the app into an APK file, these programs become native libraries with the extension “.so”. They are connected with the Java code using JNI (Java Native Interface). Software written in Java is known to be portable. But at this point the app loses that portability, because native libraries have to be compiled separately for each processor architecture.
You can see another folder inside lib named “armeabi-v7a”. This denotes that the libraries inside that folder are built for ARM v7 and above processors. But why is there only “armeabi-v7a” in this APK file when WhatsApp runs on many other processors too? This is a strategy used by many developers. You can upload different APKs for different architectures to the Play Store, and the Play Store will deliver the right APK for the phone. The advantage is that the APK need not contain the libraries for every architecture, so the APK file is smaller. If an APK is designed to work on all architectures then you will see more folders like this, such as “armeabi” for generic ARM processors, “arm64-v8a” for 64-bit ARM v8 and above processors, and “x86”, “x86_64”, “mips” etc. for the corresponding processor families.
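To see which architectures an APK ships native code for, it is enough to look at the folder names under lib. Here is a minimal sketch that works on a list of entry names (for example the output of zipfile's namelist()). The function name and the library name libfoo.so in the example are made up for illustration.

```python
def native_abis(entry_names):
    """Return the sorted architecture folder names found under lib/ in an APK."""
    abis = set()
    for name in entry_names:
        parts = name.split("/")
        # Native libraries live at lib/<abi>/<something>.so
        if len(parts) == 3 and parts[0] == "lib" and parts[2].endswith(".so"):
            abis.add(parts[1])
    return sorted(abis)
```

For an APK like the one discussed here, the result would contain only "armeabi-v7a".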
As I mentioned earlier, the APK format is an extension of the JAR format, which is used to package Java software. JAR stands for Java Archive. This was the file format of apps on old J2ME phones. As per the JAR standard, the archive should contain a META-INF folder.
In this folder you can see a MANIFEST.MF file. In an ordinary JAR this file stores meta information about the package, such as the class that is executed first, the package name etc. But in our case those details are already specified more effectively in AndroidManifest.xml, so they are not included here. Instead, this file stores the SHA-1 digest of each of the other files inside the APK. In simple words, in this context SHA-1 is a hashing algorithm used to detect whether a file has been modified. The digest is the output of the SHA-1 algorithm run on a file.
The picture shows the contents of MANIFEST.MF. You can see each file name and its corresponding SHA-1 digest. This file is generated when the APK is signed during the build, for example by Android Studio. It is used to verify that no changes have been made to the files.
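The check described above can be sketched in a few lines of Python. I am assuming the manifest uses "Name:" and "SHA1-Digest:" attributes with Base64-encoded digests, as in the file shown here. Newer APKs may use SHA-256 attributes instead, which this sketch ignores. The function names are my own.

```python
import base64
import hashlib
import zipfile


def parse_manifest_digests(manifest_text):
    """Map entry name -> Base64 SHA-1 digest, read from MANIFEST.MF text."""
    # Long lines are wrapped; a continuation line starts with a single space.
    unfolded = manifest_text.replace("\r\n", "\n").replace("\n ", "")
    digests = {}
    name = None
    for line in unfolded.split("\n"):
        if line.startswith("Name: "):
            name = line[len("Name: "):]
        elif line.startswith("SHA1-Digest: ") and name is not None:
            digests[name] = line[len("SHA1-Digest: "):]
        elif line == "":
            name = None  # a blank line ends the current section
    return digests


def modified_entries(apk_file):
    """Return the entry names whose current SHA-1 does not match MANIFEST.MF."""
    with zipfile.ZipFile(apk_file) as apk:
        manifest = apk.read("META-INF/MANIFEST.MF").decode("utf-8")
        bad = []
        for name, expected in parse_manifest_digests(manifest).items():
            digest = hashlib.sha1(apk.read(name)).digest()
            if base64.b64encode(digest).decode("ascii") != expected:
                bad.append(name)
        return bad
```

An empty result means every listed file still matches its recorded digest. It says nothing about who created the manifest, which is where the signature files come in.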
There is another file in the same folder named WHATSAPP.SF.
The picture above shows the contents of the SF file. It looks similar to MANIFEST.MF, but the digests in this file are computed over the entries of MANIFEST.MF (and over the manifest as a whole) rather than over the files themselves. The SF file is then digitally signed with the developer’s private key, and the signature, together with the developer’s certificate, is stored in the WHATSAPP.DSA file so that these values can be verified. DSA (Digital Signature Algorithm) is the name of the algorithm used for the signature. All this stuff comes under the topic of cryptography. It is a huge topic so I’m not going deep into it.
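One part of this chain is easy to check by hand. In a JAR-signed archive the SF file normally carries a "SHA1-Digest-Manifest" attribute holding the Base64 SHA-1 digest of the whole MANIFEST.MF file. The sketch below compares the two. It assumes that attribute name and it does not verify the signature in the .DSA file, which needs a proper cryptography library. The function name is my own.

```python
import base64
import hashlib


def sf_matches_manifest(sf_text, manifest_bytes):
    """Check the SHA1-Digest-Manifest value of a .SF file against MANIFEST.MF bytes."""
    expected = None
    for line in sf_text.splitlines():
        if line.startswith("SHA1-Digest-Manifest: "):
            expected = line.split(": ", 1)[1].strip()
            break
    if expected is None:
        return False  # attribute missing, nothing to compare against
    actual = base64.b64encode(hashlib.sha1(manifest_bytes).digest()).decode("ascii")
    return actual == expected
```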
This file is generated by the Java tools. In the 4th line you can see the version of Java used to build this WhatsApp APK file, which is Java 1.8 Update 92.
res is where the resources of the app that are not compiled into resources.arsc are stored. The XML files inside this folder are compiled to binary XML for performance. As you can see there are many sub folders inside it, and each folder contains a different type of resource. For example, anim contains the definitions of the animations used, color defines the colors used in the app, drawable contains images and drawable XML files, layout contains the XML files that define the structure of the app’s user interface, menu defines the menus in the app, raw contains various static files such as audio, text etc. and xml contains other arbitrary XML files. In the source project there is usually another folder named “values”, whose XML files contain simple values such as strings (strings.xml), integers (integers.xml), arrays (arrays.xml), dimension values (dimens.xml), styles (styles.xml) and colors (colors.xml). You normally will not find it in the APK because those values are the part that gets compiled into resources.arsc.
You might have noticed that there are many drawable folders and many layout folders. Their names differ only by a suffix. This is a technique for keeping the app compatible across different Android versions. In the previous chapter I mentioned API levels. They appear here as suffixes such as v12, v13 etc. A suffix means “this API level and above”: the app uses the layouts inside layout-v13 if the API level of the device is 13 (Honeycomb 3.2) or higher, unless a folder with a higher matching suffix exists. Similarly, on Lollipop 5.0 and later the app uses the contents of layout-v21. The folder without any suffix, “layout”, is common to all API levels.
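The version rule can be sketched as "take the highest vN that is not above the device's API level, otherwise fall back to the plain folder". The code below is a deliberately simplified model with names of my own. The real Android resource matching works per resource and weighs all qualifiers together, not just the version suffix.

```python
import re


def pick_versioned_dir(dir_names, base, api_level):
    """Pick the best '<base>' or '<base>-vN' folder for a device API level."""
    best_name, best_version = None, -1
    for name in dir_names:
        if name == base:
            version = 0  # the unqualified folder applies to every API level
        else:
            match = re.fullmatch(re.escape(base) + r"-v(\d+)", name)
            if not match:
                continue  # other qualifiers (land, hdpi, ...) are ignored here
            version = int(match.group(1))
        if version <= api_level and version > best_version:
            best_name, best_version = name, version
    return best_name
```

With the folders layout, layout-v13 and layout-v21, a device on API level 19 gets layout-v13 and a device on API level 10 gets the plain layout folder.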
There is another set of suffixes: ldpi (low density), mdpi (medium density), hdpi (high density), xhdpi (extra-high density), xxhdpi (extra-extra-high density), xxxhdpi (extra-extra-extra-high density) etc. This is also for compatibility. Here the suffixes match different screen densities (remember the Dots Per Inch (DPI) I mentioned in the previous chapter?). That means the developer can provide different images and layouts according to the pixel density of the phone screen.
There are more suffixes: small, normal, large, xlarge etc. for screen size, and land, port etc. for orientation. WhatsApp has used the land suffix for layouts. You will need these suffixes when you are developing an app whose design has to work on many Android versions and phones. The Android operating system itself provides many compatibility fallbacks. But if you are not satisfied with the default result, customize it with these configurations.
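To get a feel for why separate density folders are useful, here is a small sketch of the standard conversion from density-independent pixels (dp) to real pixels, px = dp × dpi / 160, using the nominal dpi of each density bucket. The dp unit is not covered in this chapter, so treat it as extra background. The names in the code are my own.

```python
# Nominal dots per inch for each density bucket (mdpi is the 160 dpi baseline).
DENSITY_DPI = {
    "ldpi": 120,
    "mdpi": 160,
    "hdpi": 240,
    "xhdpi": 320,
    "xxhdpi": 480,
    "xxxhdpi": 640,
}


def dp_to_px(dp, bucket):
    """Convert density-independent pixels to physical pixels for a density bucket."""
    return dp * DENSITY_DPI[bucket] / 160
```

So an icon that is 48 dp wide needs a 48 pixel image on an mdpi screen but a 96 pixel image on an xhdpi screen, which is why a sharp image is supplied per density.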
Modifying APK File
Now many of you might have the question: can I modify these files? Yes. But it is very difficult to modify the binary XML by hand, so it is better to use tools such as apktool or APK Studio to decode all the files inside the APK. But what about the digital signatures? Yes, that is important. The keys used to sign these files are secrets of the WhatsApp developers, so they won’t share them. The only way is to create your own certificate and sign the APK file yourself. That means you can decode, modify, repack and sign the APK file, and then install it on any device. This is what app modders do, and you can use the above tools to do it easily.
But there is still one problem. You cannot install the modified version while the original version is still on the phone. This is because when you update an app, the Android OS checks whether the signatures of the existing app and the updated APK file are the same. Since we modified the APK file and signed it with our own certificate, the signatures will not match and the installation will be aborted. So the only way is to uninstall the existing version and then install the modified version. You might have done this when you tried to install modded apps, haven’t you?
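The update rule can be pictured with a toy model: compare a fingerprint of the certificate that signed the installed app with that of the new APK. This is only an illustration with made-up function names and raw certificate bytes as input. It is not how the Android package manager is actually implemented.

```python
import hashlib


def cert_fingerprint(cert_bytes):
    """Return the SHA-256 fingerprint of a signing certificate as a hex string."""
    return hashlib.sha256(cert_bytes).hexdigest()


def can_install(new_cert, installed_cert=None):
    """Toy model: a fresh install always passes, an update needs the same signer."""
    if installed_cert is None:
        return True  # nothing installed yet, so there is nothing to compare
    return cert_fingerprint(new_cert) == cert_fingerprint(installed_cert)
```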
But do this only for educational purposes, to learn how things work. I do not recommend that you mod an app and share it, because distributing modified apps is generally illegal. It will most probably not put you in jail, but it discourages the developers. So show some respect to the devs and appreciate their hard work.
To be continued..