Caching Eliminates the Wireless Bottleneck in Video-Aware Wireless Networks

Cellular data traffic almost doubles every year, greatly straining network capacity. The main driver for this development is wireless video. Traditional methods for capacity increase (like using more spectrum and increasing base station density) are very costly, and do not exploit the unique features of video, in particular a high degree of {\em asynchronous content reuse}. In this paper we give an overview of our work that proposed and detailed a new transmission paradigm exploiting content reuse, and the fact that storage is the fastest-increasing quantity in modern hardware. Our network structure uses caching in helper stations (femto-caching) and/or devices, combined with highly spectrally efficient short-range communications to deliver video files. For femto-caching, we develop optimum storage schemes and dynamic streaming policies that optimize video quality. For caching on devices, combined with device-to-device communications, we show that communications within {\em clusters} of mobile stations should be used; the cluster size can be adjusted to optimize the tradeoff between frequency reuse and the probability that a device finds a desired file cached by another device in the same cluster. We show that in many situations the network throughput increases linearly with the number of users, and that D2D communications also is superior in providing a better tradeoff between throughput and outage than traditional base-station centric systems. Simulation results with realistic numbers of users and channel conditions show that network throughput (possibly with outage constraints) can be increased by two orders of magnitude compared to conventional schemes.


I. INTRODUCTION
Demand for video content over wireless networks has grown significantly in recent years and shows no sign of letting up. According to the Cisco Visual Networking Index mobile forecast for 2012-2017 [1], mobile video data is expected to grow at a compound annual growth rate of 75 percent to 7.4 exabyes (one million gigabytes) by 2017. By this time, it is expected to be 66.5 percent of global mobile traffic data (11.2 exabytes), up from 51 percent in 2012 (see Fig. 1). We expect both broadcast and on-demand services will continue to expand, including traditional services like streaming TV content (e.g., sporting events) and newer services like video Twitter, video blogging, cloud-based live video broadcasting, and mobile-tomobile video conferencing and sharing. Meanwhile, hardware platforms (smart phones, tablets, notebooks, television/set-top boxes, in-vehicle infotainment systems) continue to push the envelope in performance and graphical quality. More capable processors, better performing graphics, increased storage capacities, and larger displays make devices more powerful and intelligent than ever before. And with this increase in device capability comes a corresponding increase in demand for high-quality video data; for example, increasing demand for high-definition (HD) and 3D data types. The implications of these trends for future wireless networks are significant. While continued evolution in spectral efficiency is to be expected, the maturity of MIMO, air interfaces using OFDM/OFMDA, and Shannon capacity-approaching codes mean that such spectral efficiency improvements will not deliver the increased capacity needed to support future demand for video data. Additional measures like the brute force expansion of wireless infrastructure (number of cells) and the licensing of more spectrum, while clearly addressing the problem of network capacity, may be prohibitively expensive, require significant time to implement, or be infeasible due to prior spectrum allocations which are not easily modified.
Recognizing these challenges, Intel and several industry partners jointly developed a program to explore nonincremental, systems-level solutions through university research. Known as Video-aware Wireless Networks or simply VAWN, the program considers various approaches to enabling a higher capacity in future wireless networks, and in enabling a higher quality of user experience for video and video-based services delivered over wireless networks to intelligent mobile devices. Broad strategies explored in the program include unconventional optimizations in video transport within the network, optimizations in video processing to reduce network transmission requirements and improve user experience, and novel network architectures better suited to address future capacity and quality of service challenges specific to video.
The approach taken by the group at the University of Southern California (including several of the authors), exploits a unique feature of wireless video, namely the high degree of (asynchronous) content reuse. Based on the fact that storage is cheap and ubiquitous in today's wireless devices, this group developed a new network structure that is based on replacing backhaul by caching. This approach, first proposed in [3], and expounded and refined in a series of papers [4]- [17]), is at the center of the present overview.
A first approach for exploiting asynchronous content reuse, termed Femto-Caching, uses dedicated "helper nodes" that can cache popular files and serve requests from wireless users by enabling localized wireless communication. Such helper nodes are similar to femto-BSs, but they have two key differences: they have added a large storage, 1 while they do not have or need a high-speed backhaul. An even higher density of caching can be achieved by using devices themselves as video caches -in other words, using devices such as tablets and laptops (which nowadays have ample storage) as mobile helper stations [5]. The simplest way of using this storage would have each user cache the most popular files. However, this approach is not efficient because many users are interested in similar files, and thus the same videos will be duplicated on a large number of devices. On the other hand, the cache on each device is too small to cache a reasonably large number of files. Thus, it is preferable that the devices "pool" their caching resources, so that different devices cache different files, and then exchange them, when the occasion arises, through shortrange, highly spectrally efficient, device-to-device (D2D) communications. If a requesting device does not find the file in its neighborhood (or in its own cache), it obtains the file in the traditional manner from the base station (the base station can also control any occuring D2D communications).
The remainder of the paper is organized as follows: in Section II, we describe video coding and video streaming techniques, as well as content reuse and viewing habits. The principle of the new network structure is described in Sec. III. The placement of files in helper nodes and devices is discussed in Sec. IV. Fundamental results about throughput and outage in networks with helper stations and D2D communications are described in Secs. V and VI, respectively. Conclusions in Sec. VII round off the paper.

A. Video Streaming and Quality Management
Wireless channels are inherently dynamic and time varying depending on a number of factors: (i) movement of device (walking, driving), (ii) changes in the reflectors in the environment (people moving, objects moving), (iii) changes in location (insider, outside), (iv) changes in selected wireless network (WiFi, cellular), and (v) changes in the amount of traffic using the network (i.e., congestion). For data and webbased applications, some latency due to changes in available network capacity, while annoying, can be tolerated. However, for video-based applications (especially interactive video conferencing, but also -depending on buffering capability -for video playback), simply treating data communications as latency tolerant is not sufficient. In order to maintain an acceptable quality of experience (QoE), it is necessary to adapt the rate of the streamed video using techniques that take into account such factors as the type of video being streamed (fast motion, complex scenes, interactive), the available capacity of the network, time variations in network and channel state, client device information (screen size, etc.) and playback buffer state. This section will describe some mechanisms for achieving this dynamic adaptation and the role of emerging standards. Fig. 2 below shows a simplified view of an end-to-end system, including a video server on the left, an end rendering device on the right, and a network lying in between. (Note that video streaming applications are the focus here.) Labels are included that identify potential opportunities for managing video traffic in intelligent ways. To accommodate different devices and to support multiple streaming rates, multiple copies (formats, bitrates) of the video content is stored on the server. Alternatively, the video can be transcoded on the fly. The decision of whether to transcode or store multiple copies depends on cost, complexity, and performance tradeoffs, and must take into account the facts of memory and compute resources in the underlying system. It may also depend on the popularity of the content and where the content is stored within the data center or network.
The availability of multiple video streaming rates makes possible dynamic adaptation during a streaming session in response to changes in wireless channel state. Today, multiple copies of the same video provide a range of bitrates to a client device which can choose among them. To improve user playback experience, however, as well as to improve the efficiency of data storage and transport, we believe QoE will be important in the future. Measures of QoE may take into account the quality of the displayed video (resolution, compression artifacts), re-buffering events, and lost packets. QoE metrics provide an alternative to throughput-based approaches which rely on the often mistaken assumption that higher bitrates mean higher quality. A key challenge here, however, is effectively estimating video quality independent of bitrate. Fortunately, a great deal of progress has been made recently by researchers estimating video quality based on both device and content characteristics (see [18]- [20]). This creates new opportunities for optimizing the end-to-end system when tighter coordination between the video server, network, and end devices can be realized.
Enhancements to emerging standards are helping to promote QoE-based optimization within end-to-end systems. In particular, standards supporting Dynamic Adaptive Streaming over HTTP (DASH) are being developed by the MPEG and 3GPP standards bodies (see [21]- [32]). Two recent additions to these standards are (1) the inclusion of QoE feedback metrics from the device to the network, and (2) support for providing QoE metrics along with video content that is sent to a device. (In some cases, video QoE metrics can also be computed directly by the end device.) These additions are important because they enable better system-wide optimization of video transport based on the end user QoE. For example, the device can decide which future segments to request based on the current status of its playback buffer and known quality levels of upcoming segments. This supports a more intelligent balancing of playback quality and re-buffering risk. The network can also make more informed decisions on how to allocate available bandwidth across multiple competing video flows by optimizing the quality jointly across all of them. Using rate-distortion information (a measure of video quality) and playback buffer state for each flow, for instance, a network scheduler can implement QoE-based resource allocation as an alternative to standard proportionally fair throughput schemes.

B. Content Reuse
Wireless Video distinguishes itself from other wireless content through its strong content reuse, i.e., the same content is seen by a large number of peoples. However, in contrast to TV, the bulk of wireless video traffic is due to asynchronous video on demand, where users request video files from some cloud-based server at arbitrary times. As indicated in Sec. I, the use of caching enables to exploit content overlap, even in the presence asynchronism of requests. In other words, a few popular videos (YouTube clips, sports highlights, and movies) account for a considerable percentage of video traffic on the Internet, even though they are viewed at different times by different people. Numerous experimental studies have indicated that Zipf distributions are good models for the measured popularity of video files [33], [34]. Under this model, the frequency of the i-th popular file, denoted by P r (f ), is inversely proportional to its rank: (1) The Zipf exponent γ r characterizes the distribution by controlling the relative popularity of files. Larger γ r exponents correspond to higher content reuse, i.e., the first few popular files account for the majority of requests. Here, m is the size of the library of files that are of interest to the set of considered users (note that the library size can be a function of the number of considered users n; we assume in the following m increase like n α , where α ≥ 0). A further important property of the library is that it changes only on a fairly slow timescale (several days or weeks); it can furthermore be shaped by content providers, e.g., through pricing policies, or other means.
Note, however, some caveats concerning the general applicability of the work in the remainder of the paper. It applies principally to a setting where a content library of relatively large files (e.g., movies and TV shows) is refreshed relatively slowly (e.g., on a daily basis), and where the number of users consuming such a library is significantly larger than the number of items in the library. This may apply to a possible future implementation of movie services, while collections of short videos (like YouTube) show wider ranges of interests. In short, this paper reflects a set of results and approaches that are relevant in the case where the caching phase (placement of content in the caches) occurs with a clear time-scale separation with respect to the delivery phase (the process of delivering video packets for streaming to the users), and where the size of the content library is moderate with respect to the users' population.

III. NETWORK STRUCTURE A. Helper Stations and File Requests
We first consider the network structure with helper stations. The wireless network consists of multiple helper stations H, talking to multiple users U; a central base station may be present to serve users that cannot find the files they want in the helper stations. An example network is shown in Fig. 3. Each user requests a video file from a library F of possible files. We denote the set of helpers in the vicinity of user u as N (u). Similarly, N (h) denotes the set of users in the vicinity of helper h. The helpers may not have access to the whole video library, because of backhaul constraints or caching constraints. In general, we denote by H(f ) the set of helpers that contain file f ∈ F. Hence, user u requesting file f u can only download video chunks from helpers in the set N (u) ∩ H(f u ). In Section V, we consider the problem of devising a dynamic scheduling scheme such that helpers feed the video files sequentially (chunk by chunk) to the requesting users. Given the high density of helpers, any user is typically in the range of multiple helpers. Hence, in order to cope with user-helper association, load balancing and inter-cell interference, an efficient video streaming policy is described in Section V which allows the users to dynamically select the helper node to download from, and determine adaptively the video quality level of the download.

B. Device-to-Device (D2D) Caching Networks
When users also have the ability of prefetching (video) files, instead of requesting the files from the base station or the helpers, we allow users make requests from other users and get served via high-spectral-efficiency D2D links (see Fig. 4). If the D2D links are not available for some users (see Section VI-B), then these unserved users are treated as in outage and  in practice, they can be simply served by the base station or the helpers. To make the network model tractable, we consider the transmission of the video files instead of streaming, and neglect the issue of rate adaptation. In addition, we consider a simple gird structure, which is formed by n user nodes U = {1, . . . , n} placed on a regular grid on the unit square, with minimum distance 1/ √ n. (see Fig. 5(a); we will replace this grid structure by the uniform distribution of the nodes when mentioned specifically.). Let each user u ∈ U request a file f ∈ F = {1, . . . , m} in an i.i.d. manner, according to a given request probability mass function P r (f ), which is assumed to be a Zipf distribution given by (1) with parameter 0 < γ r < 1 [35]. Moreover, we let each user cache M files. The BS keeps track of which devices can communicate with each other, and which files are cached on each device. Such BScontrolled D2D communication is more efficient (and more acceptable to spectrum owners if the communications occur in a licensed band) than traditional uncoordinated peer-to-peer communications.
Communications between nodes follow the protocol model [36]: 2 namely, transmission between user nodes u and v is possible if their distance d(u, v) is less than or equal to some fixed transmission range r, and if there is no other 2 In the simulations of Section VI-D, we relax the protocol model constraint and take interference into consideration by treating it like noise. √ n . b) An example of single-cell layout and the interference avoidance TDMA scheme. In this figure, each square represents a cluster. The gray squares represent the concurrent transmitting clusters. The red area is the disk where the protocol model imposes no other concurrent transmission. r is the worst case transmission range and ∆ is the interference parameter. We assume a common r for all the transmitter-receiver pairs. In this particular example, the TDMA parameter is K = 9.
active transmitter within distance (1 + ∆)r from destination v, where ∆ > 0 is the interference control parameter. Successful transmissions can take place at rate C r bit/s/Hz, which is a non-increasing function of the transmission range r [9]. In this model, we do not consider power control (which would allow different transmit powers, and thus transmission ranges), for each user. Moreover, we treat r as a design parameter that can be set as a function of m and n. 3 All communications are assumed to be single-hop (see also Section VI). These model assumptions allow for a sharp analytical characterization of the throughput scaling law including the leading constants. In Section VI, we will see that the schemes designed by this simple model yields promising performance also in realistic channel propagation and interference conditions. For many of our derivations, we furthermore subdivide the cell into equal-sized, disjoint groups of users that we call "clusters" of size (radius) r, with g c nodes in it. To further simplify the mathematical model, we assume that only nodes that are part of the same cluster can communicate with each other. If a user can find the requested file inside the cluster, we say there is one potential link in this cluster; when at least one link is scheduled, we say that the cluster is "active". We use an interference avoidance scheme, such that at most one link can be active in each cluster on one time-frequency resource.

IV. FILE PLACEMENT
The proposed system operates in two steps: (i) file placement (caching) and (ii) delivery. These two processes happen on different timescales: the cache content needs to change only on a timescale of days, weeks, or months, i.e., much slower than the actual delivery to the users. Thus, caches could be filled either through a very slow backhaul, or through cellular connection at night time, when the spectral resources are not required for other purposes.

A. File placement in helper stations
We start out with the case where complete files are stored in the helper stations. If the distance between helpers is large, and each MS can connect only to a single helper, each helper should cache the most popular files, in sequence of popularity, until its cache is full. However, when each MS can communicate with multiple helpers, the question on how to best assign files to different helpers becomes a more complicated. Consider the case in Figure 6. Users U 1 and U 2 would prefer helper H 1 to cache the M most popular files since this minimizes their expected downloading time. Similarly, user U 4 would prefer that helper H 2 also caches the M most popular files. However U 3 would prefer H 1 to cache the M most popular files and H 2 the second M most popular (or the opposite), thus creating a distributed cache of size 2M for user U 3 . Thus we can see that in the distributed caching problem, the individual objectives of different users may be in conflict, and we need sophisticated algorithms to find an optimum assignment.
Let us assume for the moment that 1) the network topology is known; 2) the long-term average link rates are known; 3) the user demand distribution (file popularity) is known. However, the actual demands are not known beforehand, so that caching placement must be done only based on the statistics of the user requests. Our goal is to minimize the average download time. We distinguish further between uncoded and coded caching In the uncoded case, video-encoded files are cached directly (with the possibility of storing the same file in multiple locations). In the coded case, we consider placing coded chunks of the files on different helper stations, such that obtaining any sufficiently large number of these chunks allows reconstruction of the original video file (e.g., using the scheme in [37]).
In [9] we showed that the uncoded-placement problem is NP-complete. However, it can be formulated as the maximization of a monotone submodular function over matroid constraints, for which a simple greedy strategy achieves at least 1 2 of the optimum value. For the coded case, the optimum cache placement can be formulated as a convex optimization problem, for which optimum solutions can be found through efficient algorithms. In general, the optimum value of delay obtained with the coded optimization is better than the un-coded optimization because any placement matrix with integer entries is a feasible solution to the coded problem. In this sense, the coded optimization is a convex relaxation of the uncoded problem.
We conclude this section by mentioning that the conditions under which we derived the optimum caching are rarely fulfilled in practice. While the user demand distribution P r (f ) may be well estimated and predicted, the network topology is typically time-varying with dynamics comparable or faster than the file transmission, therefore reconfiguring the caches at this time scale is definitely not practical. However, further computer experiments have also shown that the cache distribution obtained when the mobile stations are in "typical" distances from the helpers also provides good performances for various other realizations of random placement of nodes. Furthermore, distributed random caching turns out to be "good enough" as we shall see in Sec. VI. Hence, comparing optimal placement with random caching yields useful insight on the potential performance gap lost by a decentralized approach. Interestingly, in any reasonable network configuration it turns out that such a gap is very small.

B. File placement for D2D communications
Also for D2D communications, the question of which files should be cached by which user are essential. Building on the protocol model explained in Sec. III.B, a critical question for each user is whether the file it is interested in can be found within the communication radius r from its current location. In other words, in order to enable D2D communication it is not sufficient that the distance between two users be less than r; users should also find their desired files in the cache of another device with which they can communicate. The decision of what to store can be taken in a centralized or distributed way, called deterministic and random In deterministic caching a central control (typically the BS) orders the devices to cache specific files. Similar to the situation in femtocaching, we assume that the location of the caching nodes, and the demand distribution, is known. Finding the optimal deterministic file assignment for the general case follows the same principles as for femtocaching outlined above. A simplification occurs when the devices are grouped into clusters such that only communication within the cluster is possible (for more details see Sec. VI). In this case the deterministic caching algorithm is greatly simplified: the devices in the cluster should simply cache the most popular files in a disjoint manner, i.e., no file should be cached twice in the cluster. Deterministic caching is only feasible if the location of the nodes and the Channel State Information (CSI) is known a priori, and remains constant between the filling of the cache and the actual file transmission; thus it applies only if the caching nodes are fixed wireless devices. It is also useful for providing upper performance bounds for other caching strategies. In random caching, each device randomly and independently caches a set of files according to a common probability mass function. In our earlier papers, we assumed that the caching distribution is also a Zipf distribution, though with a parameter γ c that is different from γ r , and which has to be optimized for a particular γ r and r. Since the Zipf distribution is characterized by a single parameter, this description gives important intuitive insights about how concentrated the caching distribution should be.
In [16], we found that the optimal caching distribution P * c that maximizes the probability that any user finds its requested file inside its own cluster is given (for a node arrangement on a rectangular grid as described above) by where ν =

V. ADAPTIVE STREAMING FROM HELPER STATIONS
We now turn to the delivery phase, in particular for the femtocaching (helper station). We furthermore concentrate on the case that the video files are streamed, i.e., that replay at the receiver starts before the complete file has been transmitted. Such streaming is widely used for standard video-on-demand systems, using protocols such as Microsoft Smooth Streaming (Silverlight), Apple HTTP Live Streaming, and 3GPP Dynamic Adaptive Streaming over HTTP (DASH). We have adapted such on-demand streaming to our caching architectures, in particular the network setup with helper stations. Dividing each video stream into chunks, we solve the problem of "which user should get a video "chunk", at what quality, from which helper station".

A. Problem formulation
We represent a video file as a sequence of chunks of equal duration. Each chunk may contain a different number of source-encoded bits, due to variable bit-rate (VBR) coding (see Sec. II), and the same video file is encoded at different quality levels, such that lower quality levels correspond to fewer encoded bits. These quantities can vary across video files, and even for the same video they can vary across both chunks and quality levels. For example, the same compression level may produce a different user quality index as well as a different bit requirement from one chunk to the next, depending on if the video chunk is showing a constant blue sky or a busy city street.
In our system, the requested chunks are queued at the helpers, and each helper h maintains a queue Q hu pointing at each of the users u in its vicinity. We pose the Network Utility Maximization(NUM) problem of maximizing a concave and component wise non-decreasing network utility function φ u (·) of the users' long-term average quality indices D u , subject to stability of the queues Q hu at all the helpers. The concavity of the network utility function imposes some desired notion of fairness between the users. The problem formulation is given as: We solve this problem in [11] using the Lyapunov Drift Plus Penalty approach and obtain a policy that decomposes naturally into two distinct operations that can be implemented in a decentralized fashion: 1) Congestion control; 2) Transmission scheduling.

B. Congestion Control
Congestion control decisions are made at each streaming user, which decides from which helper to request the next chunk and at which quality index this shall be downloaded. For every time slot t, each u ∈ U chooses the helper in its neighborhood having the shortest queue, i.e., Then, it determines the quality level m u (t) of the requested chunk at time t as: where B fu (m, t) and D fu (m, t) are the size in bits and the quality index (could be some subjective measure of video quality; for example SSIM) respectively of chunk t at quality level m. Θ u (t) is a virtual queue introduced to solve the NUM problem. Notice that the streaming of the video file f u may be handled by different helpers across the streaming session, but each individual chunk is entirely downloaded from a single helper. Notice also that in order to compute the above quantities, each user needs to know only local information formed by the queue backlog Q hu (t) and the locally computed virtual queue value Θ u (t). This scheme is reminiscent of the current adaptive streaming technology for video on demand systems, referred to as DASH (Dynamic Adaptive Streaming over HTTP) [27], [38], where the client (user) progressively fetches a video file by downloading successive chunks, and makes adaptive decisions on the quality level based on its current knowledge of the congestion of the underlying serverclient connection. Our policy generalizes DASH by allowing the client to dynamically select the least backlogged server, for each chunk.

C. Transmission Scheduling
At time slot t, the general transmission scheduling consists of maximizing the weighted sum rate of the transmission rates achievable at scheduling slot t. Namely, the network of helpers must solve the Max-Weighted Sum Rate (MWSR) problem: where R(t) is the region of achievable rates supported by the network at time t and µ hu (t) is the scheduled rate from helper h to user u in time slot t. We particularize the above general MWSR problem to a simple physical layer system.

Macro-Diversity:
In this physical layer system, referred to as "macro-diversity", the users can decode multiple data streams from multiple helpers if they are scheduled with nonzero rate on the same slot. In this case, the rate region R(t) is given by the Cartesian product of the following orthogonal access regions where C hu (t) is the peak rate from helper h to user u in time slot t. In the macro-diversity system, the general MWSR problem (4) decomposes into individual problems, to be solved in a decentralized way at each helper node. The solution is given by each helper h independently choosing the user u * h (t) given by with rate vector given by µ hu * h (t) (t) = C hu * h (t) (t) and µ hu (t) = 0 for all u = u * h (t). Notice that here, unlike conventional cellular systems, we do not assign a fixed set of users to each helper. In contrast, the helper-user association is dynamic, and results from the transmission scheduling decision. Notice also that, despite the fact that each helper h is allowed to serve its queues with rates µ hu (t) satisfying (5), the proposed policy allocates the whole t-th downlink slot to a single user u * ∈ N (h), served at its own peak-rate C hu * (t).

D. Algorithm Performance
It can be shown that the time average utility achieved by the proposed policy comes within O( 1 V ) of the utility of a genieaided T -slot look ahead policy for any arbitrary sample path with a O(V ) tradeoff in time averaged backlog. Thus, the scheme provably achieves optimality of the network utility function under dynamic and arbitrarily changing network conditions; details of the proof can be found in [11].

E. Pre-buffering and Re-buffering Chunks
The NUM problem formulation (3) does not take into account the possibility of stall events, i.e., chunks that are not delivered within their playback deadline. This simplification has the advantage of yielding the simple and decentralized scheduling policy described in the previous sections. However, in order to make such policy useful in practice we have to force the system to work in the smooth streaming regime, i.e., in the regime where the stall events have small probability. This can be done by adaptively determining the pre-buffering time T u for each user u on the basis of an estimate of the largest delay of queues {Q hu : h ∈ N (u)}.
We define the size of the playback buffer Ψ t as the number of playable chunks in the buffer not yet played. Without loss of generality, assume that the streaming session starts at t = 1. Then, Ψ t is recursively given by the updating equation: 4 where |a t | is the number of chunks that are completely downloaded in slot t. Let A k denote the time slot in which chunk k arrives at the user and let W k denote the delay with which chunk k is delivered. Note that the longest period during which Ψ t is not incremented is given by the maximum delay to deliver chunks. Thus, each user u needs to adaptively estimate W k in order to choose T u . In the proposed method, at each time t = 1, 2, . . ., user u calculates the maximum observed delay E t in a sliding window of size ∆, by letting: Finally, user u starts its playback when Ψ t crosses the level ξE t , i.e., T u = min{t : Ψ t ≥ ξE t }. where ξ is a tuning parameter. If a stall event occurs at time t, i.e., Ψ t = 0 for t > T u , the algorithm enters a re-buffering phase in which the same algorithm presented above is employed again to determine the new instant t + T u + 1 at which playback is restarted.

F. Extensions
In [12], we consider extensions and improvements of our work. In Sections V-C and V-B, we treated the case of singleantenna base stations and, starting from a network utility maximization (NUM) formulation, we devised a "push" scheduling policy, where users place requests to sequential video chunks to possibly different base stations with adaptive video quality, and base stations schedule their downlink transmissions in order to stabilize their transmission queues. In [12], we consider a "pull" strategy, where every user maintains a request queue, such that users keep track of the video chunks that are effectively delivered. The pull scheme allows to download the chunks in the playback order without skipping or missing them. In addition, motivated by the recent/forthcoming progress in small cell networks (e.g., in wave-2 of the recent IEEE 802.11ac standard), we extend our dynamic streaming approach to the case of base stations capable of multiuser MIMO downlink, i.e., serving multiple users on the same time-frequency slot by spatial multiplexing. By exploiting the "channel hardening" effect of high dimensional MIMO channels, we devise a low complexity user selection scheme to solve the underlying max-weighted rate scheduling (4), which can be easily implemented and runs independently at each base station.

G. Preliminary Implementation
As observed in V-C and V-B, users place their chunk requests from the helpers having the shortest queue pointing at them. Then, transmission scheduling decisions are made by each helper, which maximizes at each scheduling decision time its downlink weighted sum rate where the weights are provided by the queue lengths. The scheme can be implemented in a decentralized manner, as long as each user knows the lengths of the queues of its serving helpers, and each helper knows the individual downlink rate supported to each served user. Queue lengths and link rates represent rather standard protocol overhead information in any suitable wireless scheduling scheme. We have also implemented a version of such scheme on a testbed formed by Android smartphones and tablets, using standard WiFi MAC/PHY [10].

VI. PERFORMANCE OF D2D CACHING NETWORKS
We now turn to D2D networks, i.e., architectures where the devices themselves act as caches. In contrast to our analysis of femtocaching, we consider here only the transmission of video files (i.e., no streaming), and also neglect the issue of video rate adaptation (these are topics of ongoing research). In this section, we first outline the principle and intuitive insights. We then discuss the fundamental scaling laws, both for the sum throughput in the cell (disregarding any fairness considerations), and for the tradeoff between throughput and outage. Combining D2D transmission with coding and multicasting is also discussed.

A. Principle and mathematical model
As outlined in Sec. III.B, we consider a network where each device can cache a fixed number M video files, and send them -upon request -to other devices nearby. If a device cannot obtain a file through D2D communications, it can obtain it from a macro cellular base station (BS) through conventional cellular transmission.
Consider a setup in which clustering is used (see Sec. III.B), and assume furthermore deterministic caching. The main performance factor that can be influenced by the system designer is the cluster size; this is regulated through the transmit power (we assume that it is the same for all users in a cell, but can be optimized as a function of user density, library size, and size of the caches). Increasing cluster size increases the probability for finding the desired file in the cluster, while it decreases the frequency reuse.
There are a number of different criteria for optimizing the system parameters. One obvious candidate is the total network throughput. It is maximized by maximizing the number of active clusters. In [14], we showed that for deterministic caching, the expected throughput can be computed as where P CV C (k) is the probability that the requested file is in the Common Virtual Cache (the union of all caches in the cluster), i.e., among the k most popular files. P r[K = k], the probability that there are k users in a cluster, is deterministic for the rectangular grid arrangement, and for random node placement.

B. Theoretical Scaling Laws analysis
We now turn to scaling laws, i.e., determine how the capacity scales up as more and more users are introduced into the network. We are dealing with "dense" networks, such that the user density increases, while the area covered by a cell remains the same. As mentioned in Section IV-B, for the achievable caching scheme, we consider a simple "decentralized" random caching strategy, where each user caches M files chosen independently on the library F with probability P * c (f ) given by (2). We furthermore deal again with the "clustered" case, i.e., the network is divided into clusters of equal size g c (m) A system admission control scheme decides whether to serve potential links or ignore them. The served potential links in the same cluster are scheduled with equal probability (or, equivalently, in round robin), such that all admitted user requests have the same average throughput E[T u ] = T min (see [16] for formal definitions.), for all users u, where expectation is with respect to the random user requests, random caching, and the link scheduling policy (which may be randomized or deterministic, as a special case). To avoid interference between clusters, we use a time-frequency reuse scheme [39,Ch. 17] with parameter K as shown in Fig. 5(b). In particular, we can pick K = √ 2(1 + ∆) + 1 2 , where ∆ is the interference parameter defined in the protocol model.
In [13], [40] we established lower and upper bounds for the throughput of D2D communications (this was done under the assumption of random node distribution and caching according to a Zipf distribution). The main conclusion from the scaling law is that for highly concentrated demand distribution, γ r > 1, the throughput scales linearly with the number of users, or equivalently the per-user throughput remains constant as the user density increases; the number of users in a cluster also stays constant. For heavy-tailed demand distributions, the throughput of the system increases only sub linearly, as the clusters have to become larger (in terms of number of nodes in the cluster), to be able to find requested files within the caches of the cluster members.
In [16] we sharpened the bounds and extended them to the case of throughput -outage tradeoff. Qualitatively (for formal definition see [16]), we say that a user is in outage if the user cannot be served in the D2D network. This can be caused by: (i) the file requested by the user cannot be found in the user's own cluster, (ii) that the system admission control decides to ignore the request. We define the outage probability p o as the average fraction of users in outage. At this point, we can define the throughput-outage tradeoff as follows: Definition 1: (Throughput-Outage Tradeoff) For a given network and request probability mass function {P r (f ) : f ∈ F}, an outage-throughput pair (p, t) is achievable if there exists a cache placement scheme and an admission control and transmission scheduling policy with outage probability p o ≤ p and minimum per-user average throughput T min ≥ t. The outage-throughput achievable region T (P r , n, m) is the closure of all achievable outage-throughput pairs (p, t). In particular, we let T * (p) = sup{t : (p, t) ∈ T (P r , n, m)}. ♦ Notice that T * (p) is the result of the optimization problem: where the maximization is with respect to the cache placement and transmission policies. Hence, it is immediate to see that T * (p) is non-decreasing in p.
The following results are proved in [16] and yield scaling law of the optimal throughput-outage tradeoff under the clustering transmission scheme defined above.
Although the results of [16] are more general, here we focus on the most relevant regime of the scaling of the file library size with the number of users, referred to as "small library size" in [16]. Namely, we assume that lim n→∞ m α n = 0, where α = 1−γr 2−γr . Since γ r ∈ (0, 1), we have α < 1/2. This means that the library size m can grow even faster than quadratically with the number of users n. In practice, however, the most interesting case is where m is sublinear with respect to n (see [16] for justifications.). Remarkably, any scaling of m versus n slower than n 1/α is captured by the following result: Theorem 1: Assume lim n→∞ m α n = 0. Then, the throughput-outage tradeoff achievable by one-hop D2D network with random caching and clustering behaves as: where a(γ r ), A, B, D are some constant depending on γ r and M , which can be found in [16], and where ρ 1 and ρ 2 are positive parameters satisfying ρ 1 ≥ γ r and ρ 2 ≥ The dominant term in (10) can accurately capture the system performance even in the finite-dimensional case shown by simulations in Fig. 7. Further, also in [16], we can show that the achievable throughput-outage trade-off given by (10) is order optimal. When M n ≥ m (the whole library can be cached in the network.), for arbitrarily small outage probability, by using (10), the per user throughput scales as T * (p) = Θ M m . This means that the per-user throughput is independent of the number of users (or in other words, the network throughput increases linearly with the number of users, as already indicated above. Furthermore, the throughput grows linearly with M . This can be very attractive since, for example, in order to double the throughput, instead of increasing the bandwidth or power, we can just double the (cheap) storage capacity per user.
Interestingly, our result shown by (10) coincides the achievable throughput by using the subpacketized caching and coded multicasting algorithms in [15], [41]. However, in realistic channel assumptions, the result is shown in Section VI-D.

C. Coded caching and multicasting
From the previous analysis of the D2D caching network, one important property of the proposed scheme is that in both the caching phase and the delivery phase, an uncoded approach is applied . The gain of the throughput is mainly obtained by spatial reuse (TDMA). At this point, a natural question to ask is whether coded multicasting for D2D transmissions can provide an additional gain, or whether the coding gain and the spatial reuse gain can accumulate. In [15], we designed a subpacketized caching and a network-coded delivery scheme for the D2D caching networks. The schemes are best to be explained by the example shown in Fig. 8, where we assume no spatial reuse can be used, or only one transmission per time-frequency slot is allowed but the transmission range can cover the whole network. This scheme can be generalized to Fig. 8. Illustration of the example of 3 users, 3 files and M = 2, achieving 1/2 transmissions in term of file. We divide each file into 6 packets (e.g. A is divided into A 1 , · · · , A 6 .) We let user 1 requests A; user 2 requests B and user 3 requests C. The cached packets are shown in the rectangles under each user. For the delivery phase, user 1 transmits B 3 ⊕ C 1 ; user 2 transmits A 5 ⊕ C 2 and user 3 transmits A 6 ⊕ B 4 . The normalized number of transmissions is 3 · 1 6 = 1 2 , which is also information theoretically optimal for this network [15]. any n, m, M . Without using spatial reuse, for zero outage, the achievable normalized number of transmissions such that every user can successfully decode is m M 1 − M m , 5 which is surprisingly almost the same as the result shown in [41], where instead of D2D communications, one central server (base station) which has access to all the files multicasts coded packets. In addition, it also has the same scaling law as the throughput by using our previously proposed decentralized caching and uncoded delivery scheme. 6 Moreover, it can be shown that there is no further gain when spatial reuse is also exploited. In another word, the gains of spatial reuse and coding cannot accumulate. Intuitively, because if spatial reuse is not allowed, a complicated caching scheme can be designed such that one transmission can be useful for as many users as possible. While if we reduce transmission range and perform our scheme in one cluster as shown in Fig. 5(b), then the number of users benefitted by one transmission is reduced but the D2D transmissions can operate simultaneously at a higher rate. Moreover, the complexity of caching subpactization and coding can also be reduced. Hence, the benefit of coding depends on the actual physical layer throughput (bits/s/Hz) and the caching/coding complexity rather than throughput scaling laws.

D. Simulation Results
To see the difference between the performance of the proposed D2D caching network and the state-of-the-art schemes for video streaming, we need to consider the realistic propagation and interference channel mode instead of the protocol model. One reason is that as mentioned in Section VI-B, for small outage probability, the throughput of the proposed D2D scheme has the same scaling laws as the coded multicasting scheme in [41]. The state-of-the-art schemes that will be compared with are conventional unicasting, harmonic broadcasting and coded multicasting, whose details can be found in [17]. In the following, for practice considerations, the proposed uncoded D2D scheme discussed in Section VI-B is used for simulations.
For simulations, we considered a network of size 600m × 600m, where we relax the grid structure of the users' distribution and let n = 10000 users distributed uniformly. The file library has size m = 300 (e.g., 300 popular movies and TV shows to be refreshed on a daily basis at off-peak times by the cellular network). The storage capacity in each user is M = 20 and the parameter for the Zipf distribution is γ r = 0.4 [35]. We considered a regular patterns of buildings of size 50m × 50m, separated by streets of widths 10m [17], with indoor, outdoor, indoor-to-outdoor and outdoor-to-indoor pathloss and shadowing models taken from [42], assuming that D2D links operate at 2.4GHz (WiFi-direct). We assumed a channel bandwidth of 20 MHz in order to provide throughput in bit/s. All the details of the simulation parameters, including the pathloss and shadowing models, can be found in [17]. The simulation 5 We normalize the number of transmissions by the file size, which is assumed to be same for all the files. 6 Notice that the reciprocal of the number of transmissions is proportional to the throughput under our protocol model assumption.
results of the throughput-outage tradeoff for different schemes are given in Fig. 9. We observe that in this realistic propagation scenario the D2D single-hop caching network can provide both large throughput, sufficient for streaming video at standard definition quality, and low outage probability. Also, the D2D caching scheme significantly outperforms the other schemes in the regime of low outage probability. This performance gain is particularly impressive with respect to conventional unicasting and harmonic broadcasting from the base station, which are representative of the current technology. We also note the distinct performance advantages compared to coded multicasting -despite the fact that the two schemes have the same scaling laws. The main reason for this development is that the capacity of multicasting is limited by the "weakest link" between BS and the various MSs, while for the D2D transmission scheme, short distance transmission (which usually has high SNR, shallow fading, and thus high capacity) determine the overall performance.
It is also worthwhile to notice that the scheduling scheme used in the simulations is based on the clustering structure and the interference avoidance (TDMA) discussed in Section VI-B without using any advanced interference management scheme such as FlashLinQ [43] and ITLinQ [44], which may provide an even higher gain in terms of throughput for the D2D caching networks.

VII. CONCLUSIONS
As user demand for video data continues to increase sharply in cellular networks, new approaches are needed to dramatically expand network capacity. This paper has provided an overview of an approach explored by the University of Southern California as part of the industry-sponsored research program, Video Aware Wireless Networks (VAWN). The approach exploits a key feature of wireless video, namely the high degree of (asynchronous) content reuse across users. To exploit this feature, we propose replacing expensive backhaul infrastructure with inexpensive caching capabilities. This can be realized in two ways: the use of Femto-Caching or dedicated helper nodes that cache popular files and serve nearby user requests, and the use of user devices themselves to cache and exchange files using device-to-device (D2D) communications. Simulations with realistic settings show that even for relatively low-density deployment of helper stations, throughput can be increased by a factor five. D2D networks allow in many situations a throughput increase that is linear with the number of users (thus making the per-user throughput independent of the number of users). Simulations in realistic propagation channels, storage capacity settings, video popularity distributions, and user densities show that (for constant outage), the throughput can be two orders of magnitude or more higher than the state-of-the-art multicast systems.
A key issue in our caching approach is that of file placement. In the helper node approach, we show that the problem of minimizing average file downloading time in the uncoded placement case (video-encoded files are cached directly on help nodes) is NP-complete, but can be reformulated and is solvable as a monotone submodular function over matroid constraints. For the coded case (coded chunks of files are placed on different helper stations), optimum cache placement can be formulated and is solvable as a convex optimization problem. Also for the D2D approach, the question of which files to cache is key. Two approaches are deterministic caching in which a BS instructs devices which files to cache (i.e., the most popular and in a disjoint manner), and random caching in which each device randomly caches a set of files according to a probability mass function. It is remarkable that the simple random caching is not only optimum from a scaling-law point of view, but also in numerical simulations provides throughputs that are close to the deterministic caching (which is ideal but difficult to realize for time-varying topologies).
An important area of future work is that of predicting user requests. The effectiveness of caching schemes depends not only on the degree of content reuse, but on our ability to understand and predict request behavior across clusters of users. Furthermore, the approach is predicated on a "timescale decomposition", namely that request distributions change much more slowly (over days or weeks) than the time it takes to stream a video (minutes to a couple of hours). For femto-caching, it is noteworthy that the type of users (and thus the requests) within range of a helper station might change over the course of a day; more research on how such spatio-temporal aspects can be predicted and accommodated is required. Similarly, the impact of social networks on user preferences could be exploited.
In the D2D sphere, research on new approaches for incentivizing users to participate in cooperative caching schemes is needed. Both helper node and D2D caching schemes would benefit from research into multi-hop cache retrieval schemes and PHY schemes that better exploit advances in wireless communication technology (e.g., multiuser MIMO). In the D2D area, we are/will be investigating how to optimize neighbor discovery, estimating channel conditions and then using the information to make scheduling optimizations, and transmission schemes closely tuned to existing communications standards like WiFi Direct.