Large-Scale Analysis of Remote Code Injection Attacks in Android Apps

It is well known that insecure code update procedures in Android apps allow remote code injection attacks. However, apart from code, many other resources in Android apps have to be updated, such as temporary files, images, databases, and configurations (XML and JSON). The security of the update procedures for these resources is largely unknown. This paper investigates general conditions for remote code injection attacks on these resources. Based on these conditions, we design and implement a static detection tool that automatically identifies apps that meet them. We apply the detection tool to a large dataset comprising 9,054 apps drawn from three different sources: an official market, a third-party market, and preinstalled apps. As a result, 97 apps were found to be potentially vulnerable, with 53 confirmed as vulnerable to remote code injection attacks.


Introduction
Widespread adoption of mobile technologies and significant increases in the number of smartphone users have resulted in a significant demand for diverse applications ("apps" for short). As of February 2016, over 65 billion apps have been downloaded from Google Play (the leading Android app store, with 80.7% market share [1]) [2], which had more than two million apps available for download up to March 2016 [3]. Further, to compete effectively in the highly competitive app market, developers are constantly adding new features to their apps.
With demands to support a plethora of features, apps often rely on external servers to dynamically update their resources at runtime. We generally refer to this class of updates as dynamic resource update (DRU). For example, apps often utilize dynamic code loading (DCL) to load additional code resources (e.g., .dex, .jar, .so, and .apk) at runtime to improve app startup performance, code reuse, extensibility, and self-updating [4]. In such cases, the additional codes could be downloaded from an external server while the app is running. In addition, an advertising library (AdSDK) included in an app may fetch ad resources such as images and videos from its servers and display them to the user so that advertisers can freely update their ads whenever they desire while the app is running. As recent statistics and studies show that over 44% of apps in Google Play include at least one mobile advertising library [5] and 32.48% of apps contain DCL components [6], DRUs are obviously prevalent in today's Android app implementations.
Unfortunately, several studies have revealed that DRUs are susceptible to remote code injection attacks [4,6,7,8]. Apps that download external resources via an insecure protocol (such as HTTP) are vulnerable to man-in-the-middle (MITM) attacks. Consequently, it is possible for network attackers to modify or replace the DRU resources being downloaded. Falsina et al. [4] and Poeplau et al. [6] showed that if an app does not properly verify code resources downloaded via HTTP, an attacker can perform a remote code injection attack by injecting a malicious payload, which is then executed when the app loads it at runtime. Watson [7] and Welton [8] found that remote code injection attacks can be carried out on other resource update procedures. If an app does not sanitize the input to ZIP extraction, filenames containing path traversal sequences may cause files to be stored or extracted outside of the intended directory. This situation can then be exploited by attackers to overwrite arbitrary existing executables such as .so, .jar, and .dex files. Consequently, when attackers are able to modify ZIP archives being downloaded, they can perform remote code injection attacks.
From these studies [4,6,7,8], we observe that there are three conditions that must be met for remote code injection attacks to be successful in Android apps: no or bypassable validation checks, file overwrite vulnerabilities, and code trigger points. The first condition includes the case when (1) apps do not perform integrity or authenticity checks on downloaded DRU resources or (2) attackers are able to bypass such validation checks. The second condition indicates the case when the injected payload can overwrite executables. The third condition is met when there exists a code trigger point where the overwritten files are loaded and executed in the app's context. Remote code injection attacks are successful when these three conditions are met.
Apps based on secure communication protocols (such as HTTPS) are not vulnerable to remote code injection attacks as an MITM attack is not possible unless vulnerabilities exist in an app's SSL/TLS implementation, such as trusting all certificates, allowing all hostnames, trusting many CAs, and mixed-mode/no SSL [9]. However, as shown in recent studies, the use of HTTP and improper use of HTTPS are widespread problems in Android apps [9][10][11], resulting in remote code injection attacks still being a serious threat in today's Android apps.
The problem becomes complicated when apps maintain multiple connections and download multiple DRU resources, because, in this case, all DRU updates have to be implemented securely. For example, apps generally apply HTTPS or integrity checking only to sensitive communications such as login, posting, purchasing, and self-updating activities and critical procedures. Remote code injection attacks can also be accomplished via other DRU resources such as images (.jpeg, .gif, etc.) and configurations (.xml, .json, .txt, etc.). Listing 1 is an example of vulnerable configurations. App developers may not be cognizant of the security implications of all DRU resources downloaded and stored in the file system. For example, developers usually implement the theme updates for apps by simply downloading images via HTTP. However, if file overwrite vulnerabilities and code trigger points exist in the theme updates, and there are no validation checks in update procedures, remote code injection attacks can still be carried out. While remote code injection attacks against code resource updates such as self-updates are well known, attacks against other resource update activities and their impacts are still largely unknown. In this paper, we show that updates involving other resources, such as archives, images, and temporary files, can also be vulnerable to remote code injection attacks. To this end, we first investigate general conditions for remote code injection attacks in Android apps and then present a static detection tool that automatically finds apps that satisfy these conditions. Our automatic detection tool uses the application binary (.apk) as input to identify potentially vulnerable apps using static program analysis. It combines network-aware program slicing, data dependency analysis, and string analysis to provide a comprehensive analysis of each condition. Finally we perform a large-scale analysis to identify vulnerable apps in the wild.
Our main contributions can be summarized as follows:
(i) We investigate three conditions for successful remote code injection attacks in Android apps: (1) no or bypassable validation checks, (2) file overwrite vulnerabilities, and (3) code trigger points.
(ii) We present the design and implementation of the first static detection tool that automatically identifies apps that meet these three conditions. More specifically, the detection tool takes only a binary (.apk), extracts DRU-related codes, and identifies whether the codes meet these three conditions by leveraging heuristics, string analysis, and data dependency analysis.
(iii) We perform a large-scale analysis using three different types of datasets, comprising 4,718 apps from an official market (Google Play), 2,967 from a third-party market (Tencent Myapp), and 1,369 from preinstalled apps (system apps). Our analysis identified a total of 97 apps as being potentially vulnerable, and 53 apps are confirmed to be vulnerable to remote code injection attacks.
The remainder of this paper is organized as follows: Section 2 provides necessary background, and Section 3 presents our threat model. Section 4 analyzes the three conditions necessary for successful remote code injection attacks. Section 5 presents our heuristics-based static detection tool. Section 6 presents the results of our large-scale analysis. Section 7 discusses mitigations and limitations. Section 8 reviews related work. Finally, Section 9 concludes this paper.

Background
This section provides background on dynamic resource updates and remote code injection attacks in Android apps.

Dynamic Resource Update.
Android app developers often utilize external servers to dynamically update app resources while the app is running. As stated in the Introduction, throughout this work, we refer to this concept as dynamic resource update (DRU). DRUs are very commonly used for a variety of purposes.

Application Code Resource Update.
At various times, Android apps may need to download additional features (i.e., application code) from external servers at runtime. For example, certain commercial apps such as game apps, which are initially distributed free of charge from an official market (such as Google Play) with minimal features, may need to provide premium features to their users after purchase. In such cases, these game apps implement DRUs, in which the code serving premium features is downloaded from the external server and then loaded into the app's context at runtime to provide the related premium service.
Another example of an application code resource update is self-update. Self-update occurs when apps need to upgrade themselves (e.g., .apk or .dex) or import libraries (e.g., .so). In a self-update procedure, an app usually requests update information that specifies a download URL from the external server. Listing 1 shows an example of update information in a self-updating app. After receiving the update information, the app downloads the application code using the identified URL. Finally, the downloaded code is loaded and executed to complete the self-update. Unlike Google Play's update mechanism, which requires user confirmation, self-update is usually executed without any user interaction. With the help of DCL (defined in the Introduction), apps can load downloaded code resources during execution. (Note that Google currently specifies the content policy of Google Play in its "Privacy and Security" section: An app downloaded from Google Play may not modify, replace, or update itself using any method other than Google Play's update mechanism [12].) However, Poeplau et al. [6] found that this policy is not technically enforced, to the extent that a large number of apps on Google Play still load external code resources.

Advertisement Resource Update.
Recent statistics [5] show that advertising is prevalent in today's Android app implementations, where over 44% of apps in Google Play include at least one mobile ad library. Once developers build their apps with an AdSDK, it may fetch ad resources from its servers and display them to the users of the apps at runtime. In this manner, advertisers can freely update their ads whenever they wish to change the currently serving ads. (Note that ad resources are usually offered in the form of archives, such as ZIP or GZIP format, which may include a variety of compressed resources such as images, videos, HTML, and JavaScript.) Similarly, advertising libraries (such as .so files) can also be updated via external servers, as with application code updates. When an app starts, the library checks its version using REST APIs (HTTP request, and JSON or XML response). If a new version has been released, the library triggers its update logic to download the newly released library. After the library is successfully downloaded, it is loaded and executed in the context of the app.

Other Resources Updates.
Apps also often need to download other types of resources from external servers to use at runtime. These resources can include temporary files, images, databases, and configurations needed in the context of the app. These are stored in the app's data folder (/data/data/PACKAGE NAME) or an external storage (/mnt or /sdcard). For example, if an app needs to update a constant string, it may download an XML file from its server and save it to the data folder. The app then parses the constant strings from the XML file and adds them at runtime. (Note that whereas the Android framework defines certain types of application resources (Animation, Color State List, Drawable, Layout, Menu, String, Style, etc. [13]) that developers can provide in their resources directory (res/), the "resource" referred to in this paper includes not only those types defined in [13] but also a wide range of others (e.g., codes or archives) needed in app's context.)

Remote Code Injection Attacks.
As stated in Section 1, apps communicating with external servers via a plaintext protocol such as HTTP are vulnerable to MITM attacks (note that apps using encrypted communications such as HTTPS can also be vulnerable to MITM attacks if they mis-implement TLS/SSL). Once an MITM attack is possible, an attacker can perform remote code injection attacks by modifying or replacing the resources being downloaded. From the literature [4,6,7,8], we observe that remote code injection attacks can succeed under the following three conditions (CI, CII, and CIII).

CI: No or Bypassable Validation Checks. There are either no validation checks, or the validation checks can be bypassed by network attackers.
CII: File Overwrite Vulnerabilities. The injected payloads are stored in a specified location in accordance with the app's DRU implementation and can overwrite existing executables.
CIII: Code Trigger Points. The injected payload has to be executed in the context of the app when the app starts, or while it is running.
We investigate these three conditions in Section 4.
Example: A Remote Code Injection Attack against a Self-Updating App. Listing 1 shows an example of metadata in a self-updating app that is vulnerable to remote code injection attacks. In general, the self-updating app requests update information from the server via HTTP and receives metadata containing update information such as "url" and "mdvalue" in JSON format. In this case, the self-updating app meets all the aforementioned conditions for a successful remote code injection attack. First, because the app communicates with the server via a plaintext protocol (HTTP), attackers can bypass its validation check by modifying the integrity value ("mdvalue" in the metadata) provided by the server (CI). In addition, owing to the nature of the self-update mechanism, the downloaded .apk file is stored in the app's directory by replacing or overwriting the existing (legacy) executable (CII), and the downloaded .apk is loaded and executed in the context of the app (CIII).
In summary, with the aid of an MITM attack, if the app satisfies the three conditions above, attackers can easily perform a successful remote code injection attack by injecting their payload (.apk) with its correct hash value.

Threat Model
This section introduces the threat model used throughout this work.
We assume that apps are benign but potentially vulnerable to remote code injection attacks, that the external servers communicating with the apps are secure, so they cannot be compromised by an attacker, and also that users are benign and often connect to Wi-Fi networks consisting of unencrypted or untrusted access points (APs).

Ability of an Adversary.
We assume that attackers cannot access a user's device but can obtain the code of an app running on the user's device. The attackers cannot compromise or access the external server communicating with the app, but can install a rogue AP in a public or private area to lure the user into their Wi-Fi network [14]. Note that a rogue AP can easily be set up with the same SSID as a trusted AP and a stronger signal than the original one. Attackers can also compromise a vulnerable AP in order to mount MITM attacks. For the remote code injection attack, attackers can inject their payload into transactions served over HTTP.

Attack Scenario.
We consider the following attack scenario. We assume one of two cases: either the user initially connects to a Wi-Fi AP that is compromised or has been installed by an attacker, or the user connects to a legitimate Wi-Fi AP and the attacker then performs ARP spoofing or any other technique needed for an MITM attack, such as DNS spoofing. In both cases, the user's traffic passes through the attacker's device. When the user downloads DRU resources from an external server, the attacker carries out code injection by monitoring this transaction and replacing the resource being downloaded with the attacker's payload. When the injected payload is loaded and executed in the context of the app, the attacker gains remote access to the app's shell and can then perform a privilege escalation attack to gain access to a higher level, such as a root shell.
Note that our threat model is similar to prior works dealing with the security implications of dynamic code loading [4,6]. However, the difference from prior works is that we only focus on network attackers and corresponding attack scenarios in which attackers can remotely inject payloads. Code injection attacks from malicious apps running on the same device are out of the scope of this paper.

Conditions for Successful Remote Code Injection Attacks
In this section, we analyze the three conditions required for successful remote code injection attacks against Android apps using decompiled code snippets as examples.

No or Bypassable Validation Checks.
Apps usually ensure that downloaded resources have not been modified by using validation checks that verify the integrity of the downloaded resources. To this end, the server provides a hash value produced by a hash function (such as MD5 or SHA-256) along with the resource, and the app compares the provided hash value with a newly computed hash value for the downloaded resource. If the two values are the same, the app concludes that the downloaded resource is intact. Alternatively, servers can attach authenticity information (such as signatures) to the resource by digitally signing the hash value being distributed. This enables the app to check the downloaded resource by verifying its signature, and attackers can then tamper with the resource only if they are able to steal the server's signing key. A hash value match on its own, however, does not guarantee that the downloaded resources have not been tampered with by attackers. If the provided hash value is transmitted via a plaintext protocol, attackers can, with the aid of MITM attacks, bypass the validation check by simply replacing the hash value with that of their payload. Furthermore, and most importantly, developers often forget or do not recognize the importance of validation checks for downloaded resources [6]. For example, as discussed in Section 2, if a self-updating app does not verify the downloaded resource during the self-update procedure, attackers can successfully carry out a remote code injection attack by simply modifying the update information or by replacing the resources being downloaded.
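The difference between the two kinds of validation check can be illustrated with a minimal Java sketch. It is not taken from any analyzed app: the class ValidationCheckSketch, its method names, the sample byte strings, and the choice of SHA256withRSA are assumptions made for illustration. The hash check accepts an attacker's payload as soon as the attacker also replaces the hash, whereas the signature check against a key shipped with the app rejects it.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.MessageDigest;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

// Hypothetical illustration; names and algorithms are assumptions.
class ValidationCheckSketch {

    /** Upper-case hex MD5 digest, in the style of an "mdvalue" field. */
    static String md5Hex(byte[] data) {
        try {
            StringBuilder hex = new StringBuilder();
            for (byte b : MessageDigest.getInstance("MD5").digest(data)) {
                hex.append(String.format("%02X", b));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Hash-only check: resource and hash both arrive over the untrusted channel. */
    static boolean passesHashCheck(byte[] resource, String providedHash) {
        return md5Hex(resource).equalsIgnoreCase(providedHash);
    }

    /** Signature check against a public key assumed to be embedded in the app. */
    static boolean passesSignatureCheck(byte[] resource, byte[] signature, PublicKey pinnedKey) {
        try {
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(pinnedKey);
            verifier.update(resource);
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            return false; // malformed signature or unusable key: reject
        }
    }

    /** Server-side signing, included only to keep the sketch self-contained. */
    static byte[] sign(byte[] resource, PrivateKey key) {
        try {
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(key);
            signer.update(resource);
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static KeyPair newServerKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair server = newServerKeyPair();
        byte[] genuine = "genuine update".getBytes(StandardCharsets.UTF_8);
        byte[] payload = "attacker payload".getBytes(StandardCharsets.UTF_8);
        byte[] genuineSig = sign(genuine, server.getPrivate());

        // The MITM attacker swaps the resource and recomputes the hash in the metadata.
        System.out.println("hash check accepts payload: "
                + passesHashCheck(payload, md5Hex(payload)));
        // The attacker cannot produce a valid signature without the server's private key.
        System.out.println("signature check accepts payload: "
                + passesSignatureCheck(payload, genuineSig, server.getPublic()));
        System.out.println("signature check accepts genuine resource: "
                + passesSignatureCheck(genuine, genuineSig, server.getPublic()));
    }
}
```

The sketch generates a fresh key pair only so that it can run on its own; in a deployed app the public key would be fixed at build time.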
Even when a self-updating app verifies downloaded resources using a hash value provided by the corresponding server, this does not guarantee that the downloaded resources have not been modified by attackers. For example, in Listing 1, the self-updating app verifies the downloaded resource by examining whether the provided MD5 hash value ("mdvalue": "FF8AFDA9887E158F1FBF601489031AD1") is equal to the hash value computed by the app. In this case, if the hash value is transmitted via a plaintext protocol (HTTP), with the aid of MITM attacks, attackers can bypass the validation check by modifying the hash value ("mdvalue" in Listing 1). Therefore, remote code injection attacks can still be successful in cases where the validation checks are bypassed.

Another example in which the DRU resource is downloaded without a validation check is illustrated in Listing 2. The code snippet gives the details of an image download in Appnext SDK [15], a mobile monetization and app distribution platform. The SDK downloads image files in the form of ZIP archives from an external server with a fixed download link "http://www.appnext.com/android/images2.zip" (line (3) in Listing 2), which is hardcoded within the app. Then, it stores the downloaded ZIP archive in the "appnext" directory without any validation checks. In this case, if the DRU can be abused to overwrite the existing executables, attackers can successfully carry out a remote code injection attack. Note that DRUs such as image resource updates occur with very high frequency in today's app implementations.

(3) URL url = new URL("http://www.appnext.com/android/images2.zip");
(4) HttpURLConnection conn = (HttpURLConnection) url.openConnection();
(5) int length = conn.getContentLength();
(6) byte[] buf = new byte[length];
(7) ...
(8) DataInputStream dis = new DataInputStream(url.openStream());
(9) dis.readFully(buf);
(10) ...

File Overwrite Vulnerabilities.
After attackers inject their payloads by bypassing the validation checks for resources, the injected payloads are stored in a specified location in accordance with the app's DRU implementation, usually in the app's data directory (/data/data/PACKAGE NAME) or in external storage (such as an SD card). If the DRU that an attacker targets is an application code update, the injected code replaces the existing code resource (e.g., .dex, .jar, or .so) and is then loaded and executed when the app triggers the update logic. In such cases, the attacker only needs to inject the payload, without any further effort, to carry out a successful remote code injection attack.
On the other hand, there could be no DCLs in the app, which is common in the majority of apps. However, attackers can still successfully perform remote code injection attacks by means of file overwrite vulnerabilities. If there is an arbitrary write vulnerability in the app, attackers can exploit it to overwrite the existing executables such as .dex, .so, and .jar. Watson [7] and Welton [8] showed that an unsafe ZIP extraction can be used for arbitrary write vulnerability; thus, remote code injection attacks can be successful with the aid of file overwrite vulnerabilities.
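The path-resolution step behind an unsafe ZIP extraction can be sketched as follows. This is an illustration under stated assumptions rather than code from any analyzed app: the class ZipEntryPathSketch, the package name com.example.app, and the entry names are hypothetical. The sketch resolves entry names only and does not write any files.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical illustration of how a ZIP entry name is turned into a target path.
class ZipEntryPathSketch {

    /** Naive resolution: the entry name from the archive is trusted as-is. */
    static Path resolveUnsafe(Path destDir, String entryName) {
        return destDir.resolve(entryName).normalize();
    }

    /** Checked resolution: empty when the entry would land outside destDir. */
    static Optional<Path> resolveChecked(Path destDir, String entryName) {
        Path base = destDir.normalize();
        Path target = base.resolve(entryName).normalize();
        return target.startsWith(base) ? Optional.of(target) : Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed extraction directory; only the "appnext" name comes from the text.
        Path dest = Paths.get("/data/data/com.example.app/files/appnext");
        String[] entryNames = {"images/banner.png", "../../lib/libexample.so"};
        for (String name : entryNames) {
            System.out.println(name
                    + " -> unsafe: " + resolveUnsafe(dest, name)
                    + ", checked: " + resolveChecked(dest, name));
        }
    }
}
```

With the naive resolution, the second entry escapes the extraction directory and points at a library path elsewhere in the app's data directory, which is the kind of overwrite that CII requires.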
In this section, we further analyze other file overwrite vulnerabilities that can be used for remote code injection attacks. . . .
Unsafe Content-Disposition Implementation. Modern web browsers often utilize an HTTP header to forcefully download an external resource instead of rendering it on the browser. To forcefully download with the HTTP header, the server adds a Content-Disposition field that includes a filename parameter in the HTTP response header (line (3) in Listing 4) and, during the downloading of the external resource on the client side, the browser retrieves a filename from the HTTP response header and stores the downloaded resource with the provided filename.
Android developers often use the HTTP response header as metadata to retrieve information about the downloaded resource. For example, an app can retrieve the filename of the downloaded resource from the HTTP response header to display the resource name on a screen or save the downloaded resource into internal storage. To do this, the app first obtains the value of the Content-Disposition field using network APIs such as the org.apache.http.HttpResponse.getFirstHeader() and java.net.HttpURLConnection.getHeaderField() methods. The app then parses the filename using a dedicated API such as guessFileName() in the android.webkit.URLUtil class or a user-defined parser that implements a regular expression match. In this case, however, if the app does not properly parse the filename and simply uses it as the filename to create a file, an arbitrary overwrite vulnerability may exist. For example, when using regular expression matching with the pattern string attachment;\s*filename\s*=\s*"([^"]*)", if the matcher evaluates the string attachment; filename="./../../../target", it finds a match in "./../../../target", which contains path traversal information. Thus, as with unsafe ZIP extraction, attackers can overwrite arbitrary files by modifying the Content-Disposition field in the HTTP header.
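The regular expression match described above can be reproduced with java.util.regex. The following is a minimal sketch: the class ContentDispositionSketch and its sanitize() helper are hypothetical and only indicate one possible way to drop the directory part of a parsed filename.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of a user-defined Content-Disposition parser.
class ContentDispositionSketch {

    // The pattern string from the text, written as a Java string literal.
    private static final Pattern FILENAME =
            Pattern.compile("attachment;\\s*filename\\s*=\\s*\"([^\"]*)\"");

    /** Returns the raw filename parameter, or null when the header does not match. */
    static String parseFilename(String headerValue) {
        Matcher matcher = FILENAME.matcher(headerValue);
        return matcher.find() ? matcher.group(1) : null;
    }

    /** Keeps only the last path segment; returns null when nothing usable remains. */
    static String sanitize(String rawName) {
        if (rawName == null) {
            return null;
        }
        int cut = Math.max(rawName.lastIndexOf('/'), rawName.lastIndexOf('\\'));
        String base = rawName.substring(cut + 1);
        boolean unusable = base.isEmpty() || base.equals(".") || base.equals("..");
        return unusable ? null : base;
    }

    public static void main(String[] args) {
        String header = "attachment; filename=\"./../../../target\"";
        String raw = parseFilename(header);
        System.out.println("parsed:    " + raw);           // still contains ../ segments
        System.out.println("sanitized: " + sanitize(raw)); // directory part removed
    }
}
```

An app that passes the parsed value directly to a file constructor inherits the traversal segments, whereas the sanitized value stays inside the intended directory.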

Code Trigger Points.
To successfully carry out a remote code injection attack, the injected payload has to be executed in the context of the app when the app starts, or while it is running. Therefore, the attacker has to identify a code trigger point that loads the injected payload and executes it. A self-update is a good example of code containing a code trigger point, because the newly downloaded code is loaded and executed once the download completes.
In this subsection, we investigate possible code trigger points that can be used for remote code injection attacks.
Runtime Library. Android apps can include runtime libraries (such as .jar or .so files), which are loaded when the app starts or while the app is running by using the loadLibrary method (in the case of .so) or DexClassLoader (in the case of .jar). The Native Development Kit (NDK) allows developers to build their own C/C++ source code, or to take advantage of prebuilt libraries. Developers can utilize native libraries to improve the app's performance or to reuse their own or another developer's libraries. In addition, developers can load classes from .jar or .apk files to execute methods that are not contained in the application itself. These runtime libraries can be targeted as code trigger points. In Android, when an app uses native libraries that are built with the Android NDK, these libraries are stored in the /lib directory and have system privilege. Therefore, native libraries in /lib cannot be used as trigger points. However, when developers create libraries (including .so and .jar), mark them as writeable, and put them in the app's /assets directory, these libraries can be considered potential trigger points. After the app is installed, these libraries can be stored in the app's internal directory, such as /data/data/PACKAGE NAME/files. Attackers who know this path information can overwrite one of these libraries to execute their injected payload.
Multidex. The Android platform supports multidex to deal with the 64K reference limit, which caps the total number of methods that can be referenced within a single DEX file at 65,536, including Android framework methods, library methods, and user-defined methods [17]. To support multidex, an Android build tool constructs a primary dex (classes.dex) and other secondary dexes (e.g., classes2.dex and classes3.dex) as needed at build time and packages them into a .apk file for distribution. While installing the app, the secondary dexes are extracted into the /data/data/PACKAGE NAME/code_cache/secondary-dexes/ directory and loaded when the app starts. For example, the Runtastic [18] app, which contains a secondary dex (classes2.dex), extracts it into the corresponding directory and renames it as com.runtastic.android-1.apk.classes2.dex. Therefore, if attackers can overwrite this secondary dex, they can trigger their injected payload when the app starts.
Runtime.exec(). As with Java applications (an application cannot create an instance of the Runtime class, but can obtain an instance by invoking the getRuntime() method), Android apps can also get an instance of the Runtime class by invoking the getRuntime() method. Using the exec() method of the Runtime class, apps can execute arbitrary executables in a separate native process by simply providing the specified shell command as an argument. This operates similarly to the system() function on Linux, and thus Android app developers often utilize this method for easy implementation. However, the exec() method of the Runtime class can be used as a code trigger point to launch remote code injection attacks. If an attacker knows the argument to be passed to the exec() method and replaces the existing file that the argument points to, the attacker can execute the payload.
For example, Umeng PushSDK [19], one of China's popular push notification services, uses the exec() method to provide its push notification service through a shell command. Specifically, it creates an instance of Process by invoking Runtime.getRuntime().exec("sh") and redirects its data stream to DataInputStream/DataOutputStream instances so that the app can execute commands by writing to DataOutputStream or reading from DataInputStream. When the app starts, the push library checks whether a ServerDaemon file exists in the app's files directory (/data/data/APP PACKAGE NAME/files/), and if it exists, it executes the file through the exec() method with the specified arguments. In the case where the ServerDaemon file does not exist in the files folder, the library creates a new DaemonServer file and then executes it as well. Attackers can take advantage of this fact to identify the code trigger point; that is, they can execute their injected code by overwriting the target file passed as a parameter of the exec() method, such as the DaemonServer file in this example.

Automatic Detection
In order to detect apps that are potentially vulnerable to remote code injection attacks, we developed a static analysis tool (https://gitlab.com/zemis0ls0l/remote code injection attack) that automatically identifies code snippets that meet the three conditions described in Section 4. In this section, we outline the design and implementation of our static detection tool. Figure 1 illustrates the three main components of our detection tool: preprocessor, program slicer, and vulnerability checker. At a high level, our static detection tool takes a .apk file as input and converts it to the Jimple intermediate representation by means of Soot [20], a static analysis framework that provides Jimple and call graph analysis for both Java and Android. (Jimple is a popular intermediate language that uses at most three components per statement and is often used for bytecode optimization; Soot includes Dexpler [21], which converts Dalvik bytecode to Jimple.) Then, based on program slicing [22] with interesting points (i.e., APIs) and heuristics, the detection tool analyzes DRU-related code to identify code snippets that meet the three conditions. The output of the tool includes a set of information that can be used to identify whether the app is vulnerable to remote code injection attacks. Note that the detection tool operates on top of Jimple and does not require the source code of the app to be analyzed.

Preprocessor.
As with other state-of-the-art static analysis studies for Android apps (such as Bartel et al. [23], Geneiatakis et al. [24], Woodpecker [25], Chex [26], and Paddyfrog [27]), our detection tool first translates Dalvik bytecode to an intermediate representation and then, for a given .apk file, constructs an interprocedural control-flow graph (ICFG), also known as a super control-flow graph (sCFG), representing all possible execution paths of the app. Because the accuracy of static analysis relies on the control-flow graph, a precise ICFG needs to be constructed in order to improve the precision of the analysis. However, because Android apps, unlike Java applications, are framework-based as well as event-driven, generating the corresponding ICFG is challenging. For example, instead of a main method, Android apps contain many entry points that are implicitly called by the Android framework. In addition, the Android framework allows apps to register various types of callbacks, which are also invoked by the framework. This means that code contained in callback methods cannot be analyzed without recognizing such implicit edges, because an ICFG that misses them has no control-flow edge leading to the callback method.
In this work, to construct precise ICFGs, we leverage FlowDroid [28], a static taint analysis framework that provides flow- and context-sensitive interprocedural data flow analysis for Android apps. FlowDroid models the component lifecycle of the Android framework and incrementally reconstructs the control-flow graph as it discovers new callback methods. Unfortunately, FlowDroid does not identify thread-related classes in Android apps (including AsyncTask, Thread, and Runnable), which generate implicit control flows through callbacks. In Android, however, resource download tasks are generally implemented with threading classes because network operations cannot be run on the main thread (developers can use Thread or AsyncTask for short-running tasks and Service for long-running tasks). Thus, identifying thread-related edges is necessary for our detection tool, and we add extensions to support such threading classes. For example, to support the AsyncTask class, we identify all AsyncTask instances and augment the call graph by adding edges that connect to each instance. This is accomplished by replacing the invocation of execute() with calls to onPreExecute(), doInBackground(), and onPostExecute(). Finally, we reconstruct the ICFG by combining FlowDroid's incremental CFG construction with our extensions. The reconstructed ICFG is used in the subsequent bidirectional program slicing step.
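The AsyncTask extension described above can be illustrated with a toy call graph. The following is a minimal sketch, not the tool's implementation: it assumes the call graph is a plain map from caller names to callee names and that the set of AsyncTask subclasses is already known. The class name AsyncTaskEdgeExpander and the string-based method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: rewrites AsyncTask.execute() edges in a toy call graph. */
final class AsyncTaskEdgeExpander {
    // Lifecycle callbacks that the framework invokes after execute().
    private static final List<String> CALLBACKS =
            Arrays.asList("onPreExecute", "doInBackground", "onPostExecute");

    /**
     * Returns a copy of the call graph in which every edge to
     * "<Task>.execute" (for a known AsyncTask subclass) is replaced by
     * explicit edges to that task's lifecycle callbacks.
     */
    static Map<String, List<String>> expand(Map<String, List<String>> callGraph,
                                            Set<String> asyncTaskClasses) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : callGraph.entrySet()) {
            List<String> callees = new ArrayList<>();
            for (String callee : entry.getValue()) {
                int dot = callee.lastIndexOf('.');
                String owner = dot < 0 ? "" : callee.substring(0, dot);
                String method = callee.substring(dot + 1);
                if (method.equals("execute") && asyncTaskClasses.contains(owner)) {
                    for (String cb : CALLBACKS) {
                        callees.add(owner + "." + cb);
                    }
                } else {
                    callees.add(callee); // unrelated edge, kept as-is
                }
            }
            out.put(entry.getKey(), callees);
        }
        return out;
    }
}
```

Restricting the rewrite to known AsyncTask subclasses keeps unrelated methods that happen to be named execute(), such as HttpClient.execute(), untouched.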

Program Slicer.
To extract the Jimple slices, we implement a forward and backward slicing algorithm (see Appendix A), which is based on the network-aware program slicing approach proposed by Choi et al. [29,30]. The slicing algorithm works bidirectionally on the ICFG constructed in the previous step. It starts at predefined interesting points (network I/O APIs and their parameters), such as java.net.URL.openConnection() and org.apache.http.client.HttpClient.execute(), by adding the variables at these points to a worklist. It then walks forward and backward on the ICFG while analyzing the data dependency between the variables in the worklist and the variables in the current Jimple statement. For interprocedural forward/backward analysis, the algorithm keeps a call stack that records the current method (i.e., the caller) and its location. In this way, the program slicer provides a context-sensitive data flow analysis. The analysis proceeds recursively and terminates when the worklist is empty, when it reaches an entry point of the app (in the case of backward slicing), or when it encounters a sink API such as FileOutputStream.write() (in the case of forward slicing).
For example, in Listing 2, the program slicer starts at java.net.URL.openConnection() and walks forward and backward while analyzing the data dependency. The backward slicing terminates at line (3) in Listing 2, and the forward slicing terminates at line (18) in Listing 3. Note that, for readability, we used Java code instead of Jimple IR and listed only the results of the program slicing in the code snippets.
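The backward half of the worklist algorithm can be sketched on a toy program. This is a simplified illustration under strong assumptions: statements are straight-line (no branches or calls), each statement defines at most one variable, and the names BackwardSlicer and Stmt are hypothetical. The actual slicer walks the ICFG and keeps a call stack, which this sketch omits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical straight-line backward slicer driven by a variable worklist. */
final class BackwardSlicer {
    /** One three-address-style statement: def = f(uses...). */
    static final class Stmt {
        final String def;
        final List<String> uses;

        Stmt(String def, String... uses) {
            this.def = def;
            this.uses = Arrays.asList(uses);
        }
    }

    /**
     * Returns the indices (ascending) of the statements that the criterion
     * statement depends on, including the criterion itself.
     */
    static List<Integer> slice(List<Stmt> body, int criterion) {
        // Seed the worklist with the variables used at the interesting point.
        Set<String> worklist = new HashSet<>(body.get(criterion).uses);
        List<Integer> slice = new ArrayList<>();
        slice.add(criterion);
        for (int i = criterion - 1; i >= 0 && !worklist.isEmpty(); i--) {
            Stmt s = body.get(i);
            if (worklist.remove(s.def)) {   // statement defines a tracked variable
                worklist.addAll(s.uses);    // track what it was computed from
                slice.add(i);
            }
        }
        Collections.reverse(slice);
        return slice;
    }
}
```

For a body such as `host = ...; path = ...; unrelated = ...; url = concat(host, path); conn = openConnection(url)`, slicing from the last statement keeps every statement except the one defining `unrelated`.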
Once the program slicer has extracted slices, the interslice dependency analyzer identifies dependencies between the extracted slices. The goal of this analysis is to identify any dependencies between an HTTP response and an HTTP request. Figure 2 shows an example of how the interslice dependency analysis operates on request (requestA and requestB) and response (responseA and responseB) slices. In the figure, an app receives metadata (line (5) in requestA) and then parses it (line (2) in responseA) to obtain a resource download URL. Then, using the obtained URL, the app downloads and stores the resource (line (2) in requestB and line (3) in responseB, respectively). In this case, a dependency exists between responseA and requestB. To identify this, we leverage the taint-based approach proposed by Choi et al. [29,30], in which the dependency is determined by identifying the data flow from the source (line (1) in responseA) to the sink (line (1) in requestB).
Figure 2: URL building, string analysis, and interslice dependency analysis.
Finally, the program slicer records Jimple slices consisting of HTTP request/response and its dependencies for the next step, in which code snippets meeting the three conditions for vulnerability to remote code injection attacks are identified.
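The source-to-sink check behind the interslice dependency analysis can be approximated as graph reachability over data-flow edges. The sketch below assumes the flows have already been extracted into a map from each value to the values derived from it. The class name InterSliceDependency and the value labels are hypothetical, and the real analysis is a taint analysis over Jimple rather than a lookup over strings.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical reachability check between a response source and a request sink. */
final class InterSliceDependency {
    /** True if data originating at `source` can flow into `sink`. */
    static boolean reaches(Map<String, List<String>> flowsTo, String source, String sink) {
        Deque<String> work = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        work.add(source);
        while (!work.isEmpty()) {
            String value = work.poll();
            if (value.equals(sink)) {
                return true;
            }
            if (!seen.add(value)) {
                continue; // already expanded
            }
            for (String next : flowsTo.getOrDefault(value, Collections.<String>emptyList())) {
                work.add(next);
            }
        }
        return false;
    }
}
```

In the Figure 2 scenario, a chain such as response body, parsed metadata, download URL, request URL would make responseA a dependency of requestB.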

Vulnerability Checker.
After the program slicer extracts all the slices that affect network operations, as well as their dependencies (HTTP request and response), the vulnerability checker identifies code snippets that meet the three conditions for remote code injection attack vulnerability. The vulnerability checker achieves this by implementing the following heuristics. Note that the vulnerability checker utilizes FlowDroid's taint analysis, which provides flow- and context-sensitive interprocedural data flow analysis.

(1) Identifying No or Bypassable Validation Checks.
To identify instances where there are no validation checks, we look for uses of a message digest class, such as java.security.MessageDigest, in the given response slices. If no message digest method such as update() or digest() is invoked, we consider that no validation check exists. However, even if the app validates the resource using a provided hash value, the validation check can be bypassed when the hash value itself is transmitted via HTTP, as described in Section 4. Consequently, to identify bypassable validation checks, we utilize a URL builder that, given the HTTP request slices produced via backward slicing, generates the URL that feeds into the network I/O APIs. The URL builder models high-level Java and Android APIs such as append() and toString(), which are often used for string manipulation (we further describe the URL builder below). If the generated URL uses HTTP, we consider the validation check to be bypassable even if message digest functions are present in the slices.
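As a rough illustration of this heuristic, the decision can be written as a three-way classification. This sketch assumes the slice has been reduced to a list of fully qualified method names and that the URL builder has already produced a URL string. ValidationCheckHeuristic and its Result enum are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical classifier for condition CI (no or bypassable validation check). */
final class ValidationCheckHeuristic {
    enum Result { NO_CHECK, BYPASSABLE_CHECK, CHECK_PRESENT }

    private static final Set<String> DIGEST_CALLS = new HashSet<>(Arrays.asList(
            "java.security.MessageDigest.update",
            "java.security.MessageDigest.digest"));

    /**
     * @param invokedMethods methods invoked in the response slice
     * @param generatedUrl   URL reconstructed by the URL builder
     */
    static Result classify(List<String> invokedMethods, String generatedUrl) {
        boolean usesDigest = false;
        for (String method : invokedMethods) {
            if (DIGEST_CALLS.contains(method)) {
                usesDigest = true;
            }
        }
        if (!usesDigest) {
            return Result.NO_CHECK;
        }
        // A digest delivered over plain HTTP can be replaced by a MITM attacker.
        boolean plainHttp = generatedUrl.toLowerCase(Locale.ROOT).startsWith("http://");
        return plainHttp ? Result.BYPASSABLE_CHECK : Result.CHECK_PRESENT;
    }
}
```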
(2) Detecting File Overwrite Vulnerabilities. To identify an unsafe ZIP extraction in the HTTP response slices produced by forward slicing, we first find an instance of java.util.zip.ZipInputStream and file classes such as java.io.File. We then check whether the code validates the name of each entry before extracting it. To achieve this, the string analyzer assigns an initial string that contains path traversal information (e.g., "./../../../target") as the value of java.util.zip.ZipEntry.getName() and then tracks the slices, updating the string according to each manipulation (based on API models), until it encounters a file method. If the path traversal information of the initial string is still intact (i.e., it has not been filtered out) when the string is passed as a parameter to the file method, we consider the code to contain a file overwrite vulnerability; that is, it is an unsafe ZIP extraction.
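The probe-string idea can be sketched by replaying the modeled string operations on the traversal string and checking whether a ".." path segment survives. This is an illustration only: it assumes each modeled API is available as a function from string to string, and ZipEntryNameProbe is a hypothetical name.

```java
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical replay of modeled string operations on a path traversal probe. */
final class ZipEntryNameProbe {
    /** Probe string from the text, standing in for ZipEntry.getName(). */
    static final String PROBE = "./../../../target";

    /**
     * Applies the modeled operations in order and reports whether the value
     * that would reach the file API still contains a ".." segment.
     */
    static boolean isUnsafe(List<UnaryOperator<String>> modeledOps) {
        String value = PROBE;
        for (UnaryOperator<String> op : modeledOps) {
            value = op.apply(value);
        }
        for (String segment : value.split("[/\\\\]")) {
            if (segment.equals("..")) {
                return true; // traversal survived every manipulation
            }
        }
        return false;
    }
}
```

Code that merely prefixes a destination directory leaves the probe intact and is reported as unsafe, whereas code that reduces the entry name to its last path component is not.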
Similarly, to identify an unsafe Content-Disposition implementation, we first find a method invocation commonly used for parsing values in HTTP headers (either org.apache.http.HttpResponse.getFirstHeader() or java.net.HttpURLConnection.getHeaderField()) together with file write methods, and then check whether the code properly parses the filename from the Content-Disposition field. Either of two methods can be used to parse a filename from this field: string manipulation APIs or regular expression matching. The first method usually locates a certain string using indexOf() and extracts the filename using substring(). Thus, the approach used to find unsafe ZIP extractions can be reused, with the string analyzer modeling these string manipulation APIs. The second method parses the filename by matching a regular expression. To handle this method, we extract the constant string that is passed as a parameter to java.util.regex.Pattern.compile() and then evaluate this pattern against our test string ("./../../../../../target.so"). If the result of the match is the same as our initial test string, we consider it to be a file overwrite vulnerability as well.
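The regular expression case can be sketched as follows. The sketch makes two assumptions that are not stated in the text: the test string is embedded in a typical `attachment; filename="..."` header value, and the app's pattern exposes the filename as capture group 1. ContentDispositionRegexProbe is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical evaluation of an app's filename regex against the test string. */
final class ContentDispositionRegexProbe {
    /** Test string from the text. */
    static final String TEST_NAME = "./../../../../../target.so";

    /** True if the pattern extracts the traversal filename unchanged. */
    static boolean isUnsafe(String appRegex) {
        // Assumed header layout; real headers may differ.
        String header = "attachment; filename=\"" + TEST_NAME + "\"";
        Matcher matcher = Pattern.compile(appRegex).matcher(header);
        if (!matcher.find() || matcher.groupCount() < 1) {
            return false;
        }
        return TEST_NAME.equals(matcher.group(1));
    }
}
```

A pattern such as `filename="([^"]+)"` returns the test string verbatim and is reported as unsafe, while a pattern that captures only the characters after the last slash is not.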
(3) Identifying Trigger Points. To identify trigger points, we utilize three different properties of Android apps: multidex, runtime libraries, and Runtime.exec(). Identifying multidex is straightforward. An Android app stores its dex files (.dex) in the root directory of the .apk file, which is a ZIP archive. Thus, we can identify multidex by decompressing the .apk file and checking whether secondary dex files (such as classes2.dex) exist. Unlike multidex, identifying runtime libraries and Runtime.exec() requires string analysis. The string analyzer starts by detecting invocations of dalvik.system.DexClassLoader(), java.lang.System.loadLibrary(), and java.lang.Runtime.exec(). If such invocations are found, the string analyzer performs backward analysis on the parameter value of the method in order to build a constant string. Once the constant string is generated, the string analyzer compares it with known executables. In the case of a runtime library, we compare the constant string with the names of the executables extracted from the /assets directory inside the .apk file. If there is a match, we consider the app to contain a trigger point. In the case of Runtime.exec(), we filter out cases in which the generated constant string contains one of the executables located in the "/bin" and "/sbin" directories. If any constant strings remain after filtering, we consider them trigger points. In this way, we can identify the possible trigger points.
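The multidex check amounts to listing the entries of the .apk archive. A minimal sketch using only the standard java.util.zip API is shown here. The class name MultidexDetector and the exact file name pattern (classes2.dex, classes3.dex, and so on in the archive root) are assumptions of this sketch.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical multidex check that scans the root entries of an .apk stream. */
final class MultidexDetector {
    // Matches classes2.dex, classes3.dex, ..., classes10.dex, ... in the root.
    private static final Pattern SECONDARY_DEX =
            Pattern.compile("classes([2-9]|[1-9]\\d+)\\.dex");

    /** Returns true if the archive contains at least one secondary dex file. */
    static boolean hasSecondaryDex(InputStream apk) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(apk)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (SECONDARY_DEX.matcher(entry.getName()).matches()) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Because the pattern must match the whole entry name, files such as assets/classes2.dex in subdirectories are not counted.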

(4) Distinguishing Whether the Communication Protocol Is Secure. Even though an app may meet the three conditions identified for successful remote code injection attacks, if it uses the HTTPS protocol, it may not be vulnerable to remote code injection attacks. Thus, we further analyze the apps to identify whether they use HTTPS. To this end, we implemented a URL builder module that generates URLs by walking back from a URL initializer such as java.net.URL.URL() or org.apache.http.client.methods.HttpGet(). During the backward analysis, the URL builder models string manipulation APIs such as java.lang.StringBuilder.append(), with which constant strings are appended to build a full URL. In addition, the URL builder handles references to resource objects, such as Android.R, whose values are stored as user-defined files in the .apk (e.g., res/values/strings.xml). Once the full URL is generated, the URL builder distinguishes between a static URL, which is hardcoded in the code, and a dynamic URL, which comes from other network inputs. This is achieved by using the results of the interslice dependency analysis (using the interslice dependency analyzer in Section 4). In the case of a static URL, if the URL starts with "http://", we consider the app vulnerable to remote code injection attacks. In the case of a dynamic URL, however, because dynamically generated URLs cannot be determined by static analysis, we consider the app potentially vulnerable.
On the other hand, there are cases where HTTPS is not properly implemented, meaning that attackers can exploit the mis-implemented HTTPS to carry out MITM attacks. To deal with such cases, we use Mallodroid by Fahl et al. [9] to identify mis-implemented HTTPS, such as trusting all certificates or allowing all hostnames. If mis-implemented HTTPS is found, we also consider the app vulnerable to remote code injection attacks.
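The protocol rules in this step can be summarized as a small decision function. This is a sketch under assumptions: a dynamic URL is represented by null, the result of the HTTPS misuse check is passed in as a boolean, and dynamic URLs are always reported as potentially vulnerable. UrlVerdict and its enum are hypothetical names.

```java
import java.util.Locale;

/** Hypothetical summary of the static/dynamic URL and HTTPS rules. */
final class UrlVerdict {
    enum Verdict { VULNERABLE, POTENTIALLY_VULNERABLE, NOT_VULNERABLE }

    /**
     * @param staticUrl   hardcoded URL rebuilt by the URL builder, or null
     *                    when the URL comes from another network input
     * @param brokenHttps true if an HTTPS misuse (e.g., trusting all
     *                    certificates) was reported for the app
     */
    static Verdict classify(String staticUrl, boolean brokenHttps) {
        if (staticUrl == null) {
            return Verdict.POTENTIALLY_VULNERABLE; // dynamic URL, unknown statically
        }
        String lower = staticUrl.toLowerCase(Locale.ROOT);
        if (lower.startsWith("http://")) {
            return Verdict.VULNERABLE;
        }
        if (lower.startsWith("https://") && brokenHttps) {
            return Verdict.VULNERABLE; // MITM is possible despite HTTPS
        }
        return Verdict.NOT_VULNERABLE;
    }
}
```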

Large-Scale Analysis
To assess how prevalent apps vulnerable to remote code injection attacks are in the wild, we applied our detection tool to three different types of datasets. In this section, we first describe the three datasets analyzed and then present the results of our large-scale analysis for each dataset.

The Datasets.
To evaluate our detection tool and identify apps potentially vulnerable to remote code injection attacks, we collected three different datasets comprising thousands of apps. Table 1 gives a summary of the datasets. The first dataset is an official market dataset (Google Play), the second is a third-party market dataset (Tencent Myapp [31]), and the third is a system application dataset extracted from manufacturers' firmware images, including Samsung [32] and Huawei [33].
More details can be found in Appendix B.
Results.
As described in Section 5, even where an app has file overwrite vulnerabilities, it is not necessarily vulnerable to remote code injection attacks, because if the app uses HTTPS properly, attackers cannot perform MITM attacks to inject their payload. Therefore, whether the app is vulnerable to remote code injection attacks depends on the request protocol (i.e., the URL string); URL strings matching "http://(.*)" are vulnerable whereas those matching "https://(.*)" are not. In addition, if the URL comes from another HTTP transaction (in the case of dynamic URLs), we cannot determine whether the app is vulnerable, because the URL string cannot be determined via static analysis. For this reason, we divided the results into two groups: (1) potentially vulnerable apps and (2) flagged vulnerable apps. Potentially vulnerable apps contain a static HTTP URL or a dynamic URL, whereas flagged vulnerable apps contain only a static HTTP URL. Note that a mis-implemented HTTPS can also be vulnerable to MITM; flagged vulnerable apps therefore include apps with HTTPS mis-implementation as well.
Official Market (Google Play). Tables 2 and 3 show the results obtained for the Google Play market dataset. Using our detection tool, we analyzed 4,718 diverse apps from 28 categories. Table 2 shows the number of vulnerable apps that met two conditions, CI (no validation check or bypassable validation check) and CII (arbitrary overwriting [...]). Table 3 shows the number of apps that satisfied CIII (trigger point). In the table, 188 (3.9%) of the 4,718 apps contain runtime libraries, 631 (13.3%) contain secondary dex files, and 173 contain Runtime.exec(). Among them, 39 (= 0 + 17 + 22) apps contain file overwrite vulnerabilities (i.e., they meet all conditions, CI ∩ CII ∩ CIII). Finally, after removing duplicates among apps with multiple trigger points, we obtained 25 apps vulnerable to remote code injection attacks in the Google Play dataset. Notably, some of the vulnerable apps that we found are extremely popular, such as Opera browser, Pandora Radio (with more than 500,000,000 downloads), and CM Locker Repair Privacy Risks (with more than 100,000,000 downloads).
Third-Party Market (Tencent Myapp). Tables 4 and 5 show the results for the Tencent Myapp market dataset. We analyzed 2,967 apps (from 29 categories). As shown in Table 4, we found 82 apps (2.7%) that satisfied CI and CII, that is, apps containing no or bypassable validation checks and file overwrite vulnerabilities. More specifically, 72 apps (2.4%) contained unsafe ZIP extraction, and 10 (0.3%) contained an unsafe Content-Disposition implementation. This rate is almost twice that of the Google Play marketplace. After ruling out dynamic URLs, we identified 45 flagged vulnerable apps, 43 of which contained unsafe ZIP extraction and the remaining two of which contained an unsafe Content-Disposition implementation.
In addition, Table 5 shows the number of trigger points in the dataset. In the third-party dataset, 1,828 apps (61.6%) contain runtime libraries, which is a much higher proportion than in the Google Play dataset. The other trigger point rates were 440 apps (14.8%) for multidex and 368 apps (12.4%) for Runtime.exec(). The number of apps in which a trigger point, no or bypassable validation checks, and a file overwrite vulnerability were simultaneously present was 20 for runtime library, 6 for multidex, and 12 for Runtime.exec(). After removing duplicates among apps with multiple trigger points, we found 28 vulnerable apps, including extremely popular apps such as com.tencent.qqlive (1,200,000,000 downloads), com.baidu.BaiduMap (770,000,000 downloads), cn.kuwo.player (470,000,000 downloads), cn.eclicks.wzsearch (111,000,000 downloads), and com.og.danjiddz (32,370,000 downloads).
Preinstalled Apps. In Android, preinstalled apps are generally granted many more capabilities, with critical permissions, than a normal app. This means that if a preinstalled app is vulnerable to remote code injection attacks, attackers can have a much greater impact on the device. For example, an app holding the INSTALL_PACKAGES permission can install another .apk silently. Therefore, if an attacker successfully gains the target app's privilege by performing a remote code injection attack, the attacker could use the INSTALL_PACKAGES permission to install malware on the device without any difficulty. Table 6 shows the results for vulnerable apps in the system app dataset. In our study, two preinstalled apps that met two conditions (CI ∩ CII) for successful remote code injection attacks were found. The WildTangent Game app (http://www.wildtangent.com/), which is included in AT&T's Galaxy S6 (SM-G890A) model, contains an unsafe ZIP extraction and uses a dynamic [...] and one social library (QQ SDK). To measure how many apps can be affected by these vulnerable SDKs, we identified the proportion of vulnerable apps by referring to recent statistics [5]. As these libraries have around 0.91% market share overall (approximately 300,000 apps in Google Play), we can roughly estimate the proportion of vulnerable apps in the wild.

Dangerous Permissions.
Once an attacker carries out a remote code injection attack by exploiting the three conditions, the attacker obtains the target app's privilege and can perform sensitive operations (e.g., tasks that can cost money or access private user data). In our threat model, we assumed that after gaining the app's privilege the attacker can additionally perform a privilege escalation attack to obtain higher privilege (i.e., system or root). However, if the target device has no vulnerability that can be used for privilege escalation, the attacker is limited to operations allowed by the permissions specified in the app's manifest, such as reading contacts, sending SMS, or getting location information. In our evaluation, we further analyzed the flagged vulnerable apps to identify what operations attackers can conduct even without escalating local privilege. To do this, in accordance with the Android Developers Guide [34], we first categorized permissions into nine permission groups (because the vulnerable apps that we found did not have sensor permissions, we left the SENSORS permission group out of the graph) and then counted the number of permissions belonging to each group. Figure 3 shows the number of dangerous permissions associated with each group. From the results, we found that most of the vulnerable apps have at least the PHONE, LOCATION, or STORAGE permission. This means that once an attacker gains the app's privilege through a remote code injection attack, the attacker can attempt various attacks even without system-level privilege. For example, with the CALL_PHONE permission, the attacker may make premium phone calls in the background (an overbilling attack [35]). The attacker can also perform a race condition attack via a shared SD card directory to install malware when the app has the WRITE_EXTERNAL_STORAGE permission [6].
Manual Review. Finally, we manually reviewed the 97 apps that the static analysis tool reported as vulnerable (53 apps) or potentially vulnerable. Given these results, we first decompiled the apps and identified the DRU code points that the tool reported. Then, we verified whether the code snippets indeed met the aforementioned three conditions. We also tested whether the code snippets were reachable from the entry points. The more complex the app is, the more time-consuming and challenging such a review process becomes. For that reason, we also installed each app and ran it on an emulator to complement the static analysis. In this way, we took a best-effort approach that combines static and dynamic analysis. In the end, we confirmed that the reported apps are all vulnerable to remote code injection attacks.

Mitigations and Limitations
In this section, we discuss how remote code injection attacks in Android apps can be mitigated and the limitations of our static detection tool.

Mitigations.
Mitigation can be considered from two perspectives: the app developer level (filename sanitization and secure communication protocol) and the framework level (secure code execution).
Filename Sanitization. As described above, filenames containing path traversal information may cause files to be stored or extracted outside of the intended directory. Attackers can exploit this vulnerability to overwrite arbitrary existing executables. To defend against remote code injection attacks caused by file overwrite vulnerabilities, app developers should sanitize input filenames. For example, before storing external resources coming from the network, it is important to filter out any character sequences that should not be included in a filename, such as "./../". In addition, the CERT Division (http://www.cert.org/) recently updated its secure coding standards to show how to prevent arbitrary overwriting vulnerabilities caused by unsafe use of ZipInputStream. With a compliant code example, the CERT Oracle Coding Standard for Java [16] shows that directory traversal or path equivalence vulnerabilities can be eliminated by canonicalizing the path name and then validating the location before extraction. To prevent remote code injection attacks, developers should comply with this coding style when they need to implement ZIP archive downloads from external servers. Note that filename sanitization eliminates CII.
Secure Code Execution. If app developers employ secure APIs (such as SecureDexClassLoader [4]), which load and execute the downloaded executables in a secure manner, attackers would not be able to execute arbitrary code within the context of an app even after successfully injecting their payload. During secure code execution, the involved API retrieves the certificate of the developer that signed and published the given code and verifies the downloaded code, which is cryptographically signed, using the retrieved certificate. Naturally, to implement such secure APIs, all possible trigger points described in Section 4.3 should be considered. Note that secure code execution eliminates CIII.
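For the filename sanitization advice, the canonicalize-then-validate pattern can be sketched as follows. This is a minimal illustration in the spirit of the CERT guidance, not a copy of its compliant example. The class name SafeZipPath is hypothetical, and a complete extractor would also need to bound entry sizes and handle directory entries.

```java
import java.io.File;
import java.io.IOException;

/** Hypothetical helper that rejects ZIP entry names escaping the target directory. */
final class SafeZipPath {
    /**
     * Resolves entryName under destDir and returns the target file, or throws
     * if the canonical target lies outside destDir.
     */
    static File resolve(File destDir, String entryName) throws IOException {
        File target = new File(destDir, entryName);
        // Canonicalization collapses "." and ".." segments and resolves symlinks.
        String destPath = destDir.getCanonicalPath() + File.separator;
        String targetPath = target.getCanonicalPath();
        if (!targetPath.startsWith(destPath)) {
            throw new IOException("Entry escapes target directory: " + entryName);
        }
        return target;
    }
}
```

An extraction loop would call resolve() with each value of ZipEntry.getName() before opening any output stream, so that an entry such as "./../../../target" is rejected rather than written.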
Use of Secure Communication Protocol. An ideal solution for preventing remote code injection attacks is to use a secure communication protocol (such as HTTPS) to download external resources. However, applying HTTPS to all communications is virtually impossible because of performance issues and operational costs. Because DRUs (such as downloading image files) occur very frequently in Android apps nowadays, applying HTTPS to all DRUs may affect performance. Brian Jackson [36] showed that making many short requests over HTTPS is slower than over HTTP. In addition, servers that provide HTTPS require certificates to be issued and managed, so operational costs are higher than with HTTP. Furthermore, there may be vulnerabilities in SSL/TLS implementations [9,11]. App developers should, at the very least, apply HTTPS to their sensitive communications such as downloading of executables (.dex, .so, etc.), self-updates, or other critical procedures. Note that use of a secure communication protocol eliminates CI.

Limitations
Implicit Control Flows. Our detection tool is subject to the limitations of static flow analysis. Consequently, it does not identify all implicit control flows introduced by callbacks. This can potentially affect our program slicing component. If code snippets that download external resources are invoked from a callback that is not handled by the detection tool, we cannot locate the code snippets inside the ICFG, which is required for accurate program slicing. Although we have added extensions that support threading classes such as AsyncTask, Thread, and Runnable, this support is not complete. Discovering Android's implicit callbacks is an active area of research, and a number of studies [37][38][39] are currently devoted to addressing this issue.
Dynamic URLs. Even when DRU code snippets have been shown to contain file overwrite vulnerabilities and trigger points, our detection tool cannot definitively conclude whether a remote code injection attack can be accomplished by a network attacker. This is because of dynamic URLs. Unlike static URLs, which are hardcoded in apps, dynamic URLs are only present at runtime. For example, if an app requests update metadata from a server and retrieves its update URL from the received metadata, the possibility of an attacker launching an attack is contingent on the retrieved URL. That is, if the URL starts with "https://", the app is not vulnerable to remote code injection attacks, whereas if it starts with "http://", the app is vulnerable. Because of the limitation of static analysis, our detection tool cannot determine this, and hence we consider apps with dynamic URLs to be potentially vulnerable to remote code injection attacks. Note that, for the same reason, the detection tool cannot handle runtime libraries of dynamically loaded classes. We believe that these dynamic issues can be addressed by leveraging dynamic program analysis, as exemplified by Rocha et al. [40] and Sounthiraraj et al. [11].

Related Work
Code Injection Attacks in Android Apps. Poeplau et al. [6] showed that code-loading techniques are often implemented in a vulnerable manner that allows attackers to replace the legitimate codes with malicious codes. They categorized the code-loading techniques into five different groups: class loaders, package context, native code, APK installation, and Runtime.exec. Subsequently, they showed that attackers can abuse these techniques via insecure downloads or unprotected storage to execute their malicious codes. Their work is related to our study as it also covers remote code injection attacks, and some of the code-loading techniques they categorized can be considered as code trigger points. However, there are several differences between this work and ours: (1) They only focused on code resource (executables such as .apk, .so, and .dex) downloads, while we also focus on other resources such as ZIP archives and image files as well as executables.
(2) They only checked the existence of DCL component (i.e., they did not confirm if network attacks are feasible or not), while we checked if remote code injection is feasible using automatic static data flow analysis.
OS Update Attacks. Xing et al. [41] focused on the upgrading logic in the Android platform (specifically, the Package Management Service (PMS)) and found a new type of privilege escalation vulnerability, called Pileup, that occurs when the user upgrades the operating system on the device. By exploiting Pileup vulnerabilities, malicious apps can silently acquire system capabilities that are valid only in the new operating system after an upgrade and did not exist in the old one. Note that the threat model and the target (the OS) of that work are different from ours.

Script Injection Attacks in Android Apps. Several studies have been conducted on script injection attacks in Android apps [42][43][44][45][46]. Jin et al. [43] found a new class of script injection attacks on HTML5-based mobile applications. They identified many channels for script injections including barcode, SMS, file system, Contact, Wi-Fi, and NFC and showed that all these channels can be abused for script injection attacks such as Cross-Site Scripting (XSS). Hassanshahi et al. [42] presented web-to-app injection (W2AI) attacks, which allow malicious web attackers to inject malicious scripts by exploiting a web-to-app communication bridge, and showed that a successful attack can abuse WebView and Android native app interfaces. Zhang and Du [46] analyzed Android Clipboard and found that Clipboard data manipulation can lead to common script injection attacks, such as JavaScript injection and command injection.
Smith [45] analyzed injection attacks on Android OS and subsequently developed a detection tool based on taint analysis. The developed tool finds data flows in Android apps that lead from the input points to SQLite or OS Shell APIs.
The OS Shell APIs that load and execute injected scripts are the same as one type of our trigger points; however, by contrast, our work does not rely on data flows between input points and APIs. As described in Section 4, attackers can trigger injected codes using arbitrary overwriting vulnerabilities even without abusing the inputs. Therefore, the trigger points are independent of data flows. Hence, we only identify whether the trigger points are reachable or not.
Taint analysis tracks information flows to reveal unintended information leakage. TaintDroid [47] monitors an app's behavior in real time and performs dynamic taint analysis to detect privacy-sensitive information leaks in Android. While it is more accurate than static analysis, achieving high code coverage is a significant challenge. Dynamic analysis can also be fooled by malicious apps that act benign when they recognize that they are being analyzed [47,48].
To overcome this challenge, many studies employ static analysis [25,26,28,48,49,51]. These studies commonly reconstruct interprocedural control-flow graphs (ICFGs) by modeling the Android app's lifecycle. By analyzing the ICFG and data dependencies, they identify whether a path exists from a source to a sink (usually network I/O APIs). In this work, we leverage FlowDroid [28], a static taint analysis tool, to reconstruct the ICFG. CryptoLint [49] detects misuse of cryptographic libraries via static program slicing. SMV-Hunter [11] identifies Android apps that fail to properly validate SSL certificates based on a combination of static and dynamic analysis.

Conclusion
Android apps often rely on external servers to dynamically update a variety of resources, such as executables, images, and temporary files, at runtime. However, these dynamic resource updates can be vulnerable to remote code injection attacks. For example, apps that download code resources (such as .so and .jar) can be abused by network attackers attempting to replace or modify the code being downloaded.
While remote code injection attacks against such code resource updates are known, attacks against other resource updates and their impact are still largely unknown. As we have shown in this paper, when an app contains file overwrite vulnerabilities in the dynamic resource update and also contains possible trigger points, attackers can still carry out remote code injection attacks by exploiting these vulnerabilities.
In this work, to identify these kinds of threats, we first investigated three conditions for successful remote code injection attacks: no validation checks or bypassable validation checks, file overwriting vulnerabilities, and trigger points. We then developed a static detection tool that automatically identifies these conditions based on heuristics, string analysis, and data dependency analysis. Finally, we applied the detection tool to a large dataset comprising 9,054 apps, consisting of official market (Google Play), third-party market (Tencent Myapp), and preinstalled apps (system apps). Consequently, we discovered a total of 53 vulnerable apps comprising 25 official market apps and 28 third-party apps (including popular apps and libraries). Our results can provide a lower bound on the number of vulnerable apps in the wild.