SESSION: Invited talks
Performance implications of flash and storage class memories

Naresh M. Patel

Pages: 1-2

doi: 10.1145/2254756.2254758
|
The storage industry has seen incredible growth in data storage needs by both consumers and enterprises. Long-term technology trends mean that the data deluge will continue well into the future. These trends include the big-data trend (driven by data mining analytics, high-bandwidth needs, and large content repositories), server virtualization, cloud storage, and Flash. We will cover how Flash and storage class memories (SCM) interact with some of these major trends from a performance perspective.

High-performance computing in mobile services

Zhen Liu

Pages: 3-4

doi: 10.1145/2254756.2254759
|
With the ever increasing popularity of smart phones, mobile services have been evolving rapidly to allow users to enjoy localized and personalized experiences. Users can discover local information and keep connected with family and friends on the go, and ultimately experience the convergence of cyber space and the physical world, where digital technologies are interwoven into day-to-day life. A pivotal component of such cyber-physical convergence is contextual intelligence. The extraction and dissemination of contextual information around users is the key for cyber capabilities to be applied to physical activities and for the cyber world to better reflect physical reality. In this talk, we address some issues arising from context-based mobile services. In particular, we discuss how mobility impacts contextual relevancy and personalization in mobile services. The relevancy and timeliness of contextual information are not only essential for these services to deliver great user experiences, but also place significant computational pressure on the service infrastructure, which must process continuous data streams in real time and disseminate relevant data to a large number of mobile users. This talk will explore the challenges and opportunities for high-performance computing in mobile services. Based on key findings from large-scale mobile measurement data, the talk will analyze the tradeoffs of different computing architectures, present case studies of scalable system design and implementation for personalized mobile services, and conclude with open challenges for the broad research community in performance measurement and modeling.

SESSION: Scheduling and load-balancing
Delay tails in MapReduce scheduling

Jian Tan, Xiaoqiao Meng, Li Zhang

Pages: 5-16

doi: 10.1145/2254756.2254761
|
MapReduce/Hadoop production clusters exhibit heavy-tailed characteristics for job processing times. These phenomena result from both the workload features and the adopted scheduling algorithms. Analytically understanding the delays under different schedulers for MapReduce can facilitate the design and deployment of large Hadoop clusters. The map and reduce tasks of a MapReduce job differ fundamentally yet depend tightly on each other, which complicates the analysis. This dependence also leads to an interesting starvation problem with the widely used Fair Scheduler due to its greedy approach to launching reduce tasks. To address this issue, we design and implement Coupling Scheduler, which gradually launches reduce tasks depending on map task progress. Real experiments demonstrate improvements to job response times by up to an order of magnitude.
Based on extensive measurements and source code investigations, we propose analytical models for the default FIFO and Fair Schedulers as well as our implemented Coupling Scheduler. For a class of heavy-tailed map service time distributions, i.e., regularly varying of index -α, we derive the distribution tail of the job processing delay under each of the three schedulers. The default FIFO Scheduler causes the delay to be regularly varying of index -α+1. Interestingly, we discover a criticality phenomenon for Fair Scheduler, under which the delay can change from regularly varying of index -α to -α+1, depending on the maximum number of reduce tasks of a job. Other more subtle behaviors also exist. In contrast, the delay distribution tail under Coupling Scheduler can be one order lower than under Fair Scheduler in some conditions, implying better performance.

Optimal queue-size scaling in switched networks

Devavrat Shah, Neil Walton, Yuan Zhong

Pages: 17-28

doi: 10.1145/2254756.2254762
|
We consider a switched (queueing) network in which there are constraints on which queues may be served simultaneously; such networks have been used to effectively model input-queued switches and wireless networks. The scheduling policy for such a network specifies which queues to serve at any point in time, based on the current state or past history of the system. In the main result of this paper, we provide a new class of online scheduling policies that achieve optimal average queue-size scaling for a class of switched networks including input-queued switches. In particular, it establishes the validity of a conjecture about optimal queue-size scaling for input-queued switches.

Minimizing slowdown in heterogeneous size-aware dispatching systems

Esa Hyytiä, Samuli Aalto, Aleksi Penttinen

Pages: 29-40

doi: 10.1145/2254756.2254763
|
We consider a system of parallel queues where tasks are assigned (dispatched) to one of the available servers upon arrival. The dispatching decision is based on full state information, i.e., on the sizes of the new and existing jobs. We are interested in minimizing the so-called mean slowdown criterion, the mean of the sojourn time divided by the processing time. Assuming no new jobs arrive, the shortest-processing-time-product (SPTP) schedule is known to minimize the slowdown of the existing jobs. The main contribution of this paper is three-fold: 1) to show the optimality of SPTP with respect to slowdown in a single-server queue under Poisson arrivals; 2) to derive the so-called size-aware value functions for M/G/1-FIFO/LIFO/SPTP with general holding costs, of which the slowdown criterion is a special case; and 3) to utilize the value functions to derive efficient dispatching policies that minimize the mean slowdown in a heterogeneous server system. The derived policies offer significantly better performance than, e.g., the size-aware task assignment with equal load (SITA-E) and least-work-left (LWL) policies.

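The slowdown criterion in the abstract above is simple to compute directly. The sketch below is an illustrative toy (invented jobs, plain FIFO service rather than the paper's SPTP or dispatching policies): it accumulates sojourn time divided by processing time at a single server.

```python
# Illustrative toy (invented jobs, FIFO service): mean slowdown, i.e., the
# mean of sojourn time divided by processing time, at a single server.

def mean_slowdown_fifo(jobs):
    """jobs: list of (arrival_time, size) pairs."""
    slowdowns = []
    free_at = 0.0                      # time at which the server next idles
    for arrival, size in sorted(jobs):
        start = max(arrival, free_at)  # wait for jobs ahead in the queue
        free_at = start + size
        sojourn = free_at - arrival    # waiting time plus service time
        slowdowns.append(sojourn / size)
    return sum(slowdowns) / len(slowdowns)

# A short job arriving behind a long one suffers a large slowdown.
jobs = [(0.0, 4.0), (1.0, 1.0), (2.0, 2.0)]
mean_sd = mean_slowdown_fifo(jobs)
```

Small jobs stuck behind a large one receive extreme slowdowns, which is exactly the effect size-aware dispatching tries to avoid.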
Bipartite graph structures for efficient balancing of heterogeneous loads

Mathieu Leconte, Marc Lelarge, Laurent Massoulié

Pages: 41-52

doi: 10.1145/2254756.2254764
|
This paper considers large scale distributed content service platforms, such as peer-to-peer video-on-demand systems. Such systems feature two basic resources, namely storage and bandwidth. Their efficiency critically depends on two factors: (i) content replication within servers, and (ii) how incoming service requests are matched to servers holding requested content. To inform the corresponding design choices, we make the following contributions. We first show that, for underloaded systems, so-called proportional content placement with a simple greedy strategy for matching requests to servers ensures full system efficiency provided storage size grows logarithmically with the system size. However, for constant storage size, this strategy undergoes a phase transition with severe loss of efficiency as system load approaches criticality.
To better understand the role of the matching strategy in this performance degradation, we characterize the asymptotic system efficiency under an optimal matching policy. Our analysis shows that, in contrast to greedy matching, optimal matching incurs an inefficiency that is exponentially small in the server storage size, even at critical system loads. It further allows a characterization of the content replication policies that minimize the inefficiency. These optimal policies, which differ markedly from proportional placement, have a simple structure which makes them implementable in practice.
On the methodological side, our analysis of matching performance uses the theory of local weak limits of random graphs, and highlights a novel characterization of matching numbers in bipartite graphs, both of which may be of independent interest.

SESSION: Characterization
Workload analysis of a large-scale key-value store

Berk Atikoglu, Yuehai Xu, Eitan Frachtenberg, Song Jiang, Mike Paleczny

Pages: 53-64

doi: 10.1145/2254756.2254766
|
Key-value stores are a vital component in many scale-out enterprises, including social networks, online retail, and risk analysis. Accordingly, they are receiving increased attention from the research community in an effort to improve their performance, scalability, reliability, cost, and power consumption. To be effective, such efforts require a detailed understanding of realistic key-value workloads. And yet little is known about these workloads outside of the companies that operate them. This paper aims to address this gap.
To this end, we have collected detailed traces from Facebook's Memcached deployment, arguably the world's largest. The traces capture over 284 billion requests from five different Memcached use cases over several days. We analyze the workloads from multiple angles, including: request composition, size, and rate; cache efficacy; temporal patterns; and application use cases. We also propose a simple model of the most representative trace to enable the generation of more realistic synthetic workloads by the community.
Our analysis details many characteristics of the caching workload. It also reveals a number of surprises: a GET/SET ratio of 30:1, higher than commonly assumed in the literature; some applications of Memcached behave more like persistent storage than a cache; strong locality metrics, such as keys accessed many millions of times a day, do not always suffice for a high hit rate; and there is still room for efficiency and hit-rate improvements in Memcached's implementation. Toward the last point, we make several suggestions that address the exposed deficiencies.

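The headline metrics in the abstract above (GET/SET ratio, hit rate) can be computed from any request trace. A minimal sketch, with an invented toy trace and a small LRU cache standing in for Memcached (none of this is the Facebook data):

```python
# Illustrative sketch (invented toy trace, not the Facebook traces): compute
# the GET/SET ratio and the hit rate of a small LRU cache from a request log.

from collections import OrderedDict

def analyze(trace, capacity):
    """trace: iterable of (op, key) with op in {'GET', 'SET'}."""
    cache = OrderedDict()                  # keys in LRU order, oldest first
    gets = sets = hits = 0
    for op, key in trace:
        if op == 'GET':
            gets += 1
            if key in cache:
                hits += 1
                cache.move_to_end(key)     # refresh recency on a hit
        else:                              # SET fills or updates the cache
            sets += 1
            cache[key] = True
            cache.move_to_end(key)
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used key
    return gets / sets, hits / gets

trace = [('SET', 'a'), ('GET', 'a'), ('GET', 'a'), ('GET', 'b'), ('GET', 'a')]
get_set_ratio, hit_rate = analyze(trace, capacity=2)
```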
A first look at cellular machine-to-machine traffic: large scale measurement and characterization

Muhammad Zubair Shafiq, Lusheng Ji, Alex X. Liu, Jeffrey Pang, Jia Wang

Pages: 65-76

doi: 10.1145/2254756.2254767
|
Cellular network based Machine-to-Machine (M2M) communication is fast becoming a market-changing force for a wide spectrum of businesses and applications such as telematics, smart metering, point-of-sale terminals, and home security and automation systems. In this paper, we aim to answer the following important question: Does traffic generated by M2M devices impose new requirements and challenges for cellular network design and management? To answer this question, we take a first look at the characteristics of M2M traffic and compare it with traditional smartphone traffic. We have conducted our measurement analysis using a week-long traffic trace collected from a tier-1 cellular network in the United States. We characterize M2M traffic from a wide range of perspectives, including temporal dynamics, device mobility, application usage, and network performance.
Our experimental results show that M2M traffic exhibits significantly different patterns from smartphone traffic in multiple aspects. For instance, M2M devices have a much larger ratio of uplink to downlink traffic volume, their traffic typically exhibits different diurnal patterns, they are more likely to generate synchronized traffic resulting in bursty aggregate volumes, and they are less mobile than smartphones. On the other hand, we also find that M2M devices generally compete with smartphones for network resources in co-located geographical regions. These and other findings suggest that better protocol design, more careful spectrum allocation, and modified pricing schemes may be needed to accommodate the rise of M2M devices.

Bundling practice in BitTorrent: what, how, and why

Jinyoung Han, Seungbae Kim, Taejoong Chung, Ted Taekyoung Kwon, Hyun-chul Kim, Yanghee Choi

Pages: 77-88

doi: 10.1145/2254756.2254768
|
We conduct comprehensive measurements on the current practice of content bundling to understand the structural patterns of torrents and the participant behaviors of swarms on one of the largest BitTorrent portals: The Pirate Bay. From datasets of 120K torrents and 14.8M peers, we investigate what constitutes torrents and how users participate in swarms from the perspective of bundling, across different content categories: Movie, TV, Porn, Music, Application, Game, and E-book. In particular, we focus on: (1) how prevalent content bundling is, (2) how and what files are bundled into torrents, (3) what motivates publishers to bundle files, and (4) how peers access the bundled files. We find that over 72% of BitTorrent torrents contain multiple files, which indicates that bundling is widely used for file sharing. We reveal that profit-driven BitTorrent publishers, who promote their own web sites for financial gains like advertising, tend to prefer bundling. We also observe that most files (94%) in a bundled torrent are selected by users, and that bundled torrents are on average more popular than single-file (non-bundled) ones. Overall, there are notable differences in the structural patterns of torrents and swarm characteristics (i) across different content categories and (ii) between single-file and bundled torrents.

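The basic measurement behind the 72% figure above is a per-torrent file count. A minimal sketch over hypothetical metadata (the records below are invented, not Pirate Bay data):

```python
# Illustrative sketch (invented metadata, not the Pirate Bay dataset):
# a torrent counts as "bundled" when it contains more than one file.

torrents = [
    {'category': 'Movie',  'files': ['film.mkv']},
    {'category': 'TV',     'files': ['e01.avi', 'e02.avi', 'info.nfo']},
    {'category': 'Music',  'files': ['t01.mp3', 't02.mp3']},
    {'category': 'E-book', 'files': ['book.pdf']},
]

def bundled_fraction(ts):
    """Fraction of torrents holding more than one file."""
    return sum(len(t['files']) > 1 for t in ts) / len(ts)

fraction = bundled_fraction(torrents)
```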
SESSION: Networking
Energy-efficient congestion control

Lingwen Gan, Anwar Walid, Steven Low

Pages: 89-100

doi: 10.1145/2254756.2254770
|
Various link bandwidth adjustment mechanisms are being developed to save network energy. However, their interaction with congestion control can significantly reduce network throughput and is not well understood. We first put forward a framework to study this interaction, and then propose an easily implementable dynamic bandwidth adjustment (DBA) mechanism for the links. In DBA, each link updates its bandwidth according to an integral control law to match its average buffer size with a target buffer size. We prove that DBA reduces link bandwidth without sacrificing throughput---DBA only turns off excess bandwidth---in the presence of congestion control. Preliminary ns2 simulations confirm this result.

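The integral control law described in the abstract above can be sketched in a few lines. Everything below is an assumption for illustration (the gain, the leaky toy queue, and all constants); it is not the paper's model or its parameters:

```python
# Illustrative sketch of an integral control law in the spirit of DBA.
# The gain, the leaky toy queue, and every constant below are assumptions
# for illustration, not the paper's model or parameters.

def dba_step(bandwidth, avg_buffer, target_buffer, gain=0.1, floor=0.0):
    """One update: raise bandwidth when the average buffer sits above the
    target, lower it (saving energy) when it sits below."""
    return max(bandwidth + gain * (avg_buffer - target_buffer), floor)

# Toy closed loop: a leaky fluid queue fed at a constant rate.
bw, buf = 5.0, 20.0
target, arrival = 10.0, 8.0
for _ in range(300):
    bw = dba_step(bw, buf, target)
    buf = max(0.8 * buf + arrival - bw, 0.0)  # part of the backlog drains each step
```

The loop settles with the buffer at its target and bandwidth just high enough to hold it there, which is the sense in which DBA "only turns off excess bandwidth".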
Uniform approximation of the distribution for the number of retransmissions of bounded documents

Predrag R. Jelenkovic, Evangelia D. Skiani

Pages: 101-112

doi: 10.1145/2254756.2254771
|
Retransmission-based failure recovery represents a primary approach in existing communication networks, on all protocol layers, that guarantees data delivery in the presence of channel failures. Contrary to the traditional belief that the number of retransmissions is geometrically distributed, a new phenomenon was discovered recently, which shows that retransmissions can cause long(-tailed) delays and instabilities even if all traffic and network characteristics are light-tailed, e.g., exponential or Gaussian. Since the preceding finding holds under the assumption that data sizes have infinite support, in this paper we investigate the practically important case of bounded data units, 0 ≤ L_b ≤ b. To this end, we provide an explicit and uniform characterization of the entire body of the retransmission distribution P[N_b > n], in both n and b. This rigorous approximation clearly demonstrates the previously observed transition from a power law distribution in the main body to an exponential tail. The accuracy of our approximation is validated with a number of simulation experiments. Furthermore, the results highlight the importance of wisely determining the size of data units in order to accommodate the performance needs of retransmission-based systems. From a broader perspective, this study applies to any other system, e.g., computing, where restart mechanisms are employed after a job processing failure.

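The body-to-tail transition described above can be seen numerically in a toy version of the problem. Here we assume, purely for illustration and not as the paper's exact setup, that an attempt to send a document of size L succeeds with probability exp(-L), so P[N > n | L] = (1 - exp(-L))^n, and we average over a bounded document size:

```python
# Illustrative toy model (our assumptions, not the paper's exact setup):
# each attempt to send a document of size L succeeds with probability
# exp(-L), so P[N > n | L] = (1 - exp(-L))**n. Averaging over a document
# size L ~ Exp(1) truncated to [0, b] exhibits the transition from a
# power-law body to an exponential (geometric) tail.

import math
import random

def tail_prob(n, b, samples=20000, seed=1):
    """Monte Carlo estimate of P[N_b > n] under the toy model."""
    rng = random.Random(seed)
    total = 0.0
    scale = 1.0 - math.exp(-b)
    for _ in range(samples):
        L = -math.log(1.0 - rng.random() * scale)  # inverse-CDF sampling
        total += (1.0 - math.exp(-L)) ** n
    return total / samples

body = tail_prob(10, b=5.0)      # power-law regime
tail = tail_prob(1000, b=5.0)    # far tail, dominated by L close to b
```

For moderate n the estimate follows a power law; once n exceeds roughly e^b, the bound on L takes over and the tail decays geometrically.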
Fluid limit of an asynchronous optical packet switch with shared per link full range wavelength conversion

Benny Van Houdt, Luca Bortolussi

Pages: 113-124

doi: 10.1145/2254756.2254772
|
We consider an asynchronous all-optical packet switch (OPS) where each link consists of N wavelength channels and a pool of C ≤ N full range tunable wavelength converters. Under the assumption of Poisson arrivals with rate λ (per wavelength channel) and exponential packet lengths, we determine a simple closed-form expression for the limit of the loss probabilities Ploss(N) as N tends to infinity (while the load and the conversion ratio σ = C/N remain fixed). More specifically, for σ ≤ λ² the loss probability tends to (λ² - σ)/(λ(1+λ)), while for σ > λ² the loss tends to zero. We also prove an insensitivity result when the exponential packet lengths are replaced by certain classes of phase-type distributions. A key feature of the dynamical system (i.e., set of ODEs) that describes the limit behavior of this OPS is that its right-hand side is discontinuous. To prove convergence, we therefore had to generalize existing results to the setting of piecewise-smooth dynamical systems.

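The closed-form limit quoted above is easy to evaluate; the sketch below transcribes it directly (reading the denominator as λ(1+λ) is our parenthesization of the abstract's formula):

```python
# The closed-form limiting loss probability from the abstract, for
# per-channel load lam and conversion ratio sigma = C/N. Reading the
# denominator as lam * (1 + lam) is our parenthesization.

def limiting_loss(lam, sigma):
    if sigma >= lam ** 2:
        return 0.0                # enough converters: loss vanishes
    return (lam ** 2 - sigma) / (lam * (1 + lam))
```

Setting sigma = 0 recovers lam/(1+lam), the blocking probability of a single wavelength channel in isolation, which is a reassuring sanity check.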
Towards optimal error-estimating codes through the lens of Fisher information analysis

Nan Hua, Ashwin Lall, Baochun Li, Jun Xu

Pages: 125-136

doi: 10.1145/2254756.2254773
|
Error estimating coding (EEC) has recently been established as an important tool to estimate bit error rates in the transmission of packets over wireless links, with a number of potential applications in wireless networks. In this paper, we present an in-depth study of error estimating codes through the lens of Fisher information analysis and find that the original EEC estimator fails to exploit the information contained in its code to the fullest extent. Motivated by this discovery, we design a new estimator for the original EEC algorithm, which significantly improves the estimation accuracy and is empirically very close to the Cramer-Rao bound. Following this path, we generalize the EEC algorithm to a new family of algorithms called gEEC (generalized EEC). These algorithms can be tuned to hold 25-35% more information with the same overhead, and hence deliver even better estimation accuracy---close to optimal, as evidenced by the Cramer-Rao bound. Our theoretical analysis and assertions are supported by extensive experimental evaluation.

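For context on the Cramer-Rao benchmark mentioned above: if a receiver could observe errors on n known bits directly, the Fisher information for the bit-error rate p and the resulting variance floor follow the standard Bernoulli formulas. This is a generic illustration, not the gEEC construction itself:

```python
# Generic illustration (not the gEEC construction): Fisher information and
# Cramer-Rao bound for estimating a bit-error rate p from n directly
# observed bits, each flipped independently with probability p.

def fisher_information(p, n):
    return n / (p * (1.0 - p))

def cramer_rao_bound(p, n):
    """Variance floor for any unbiased estimator of p."""
    return 1.0 / fisher_information(p, n)

floor = cramer_rao_bound(0.1, 1000)   # equals p * (1 - p) / n
```

EEC-style codes are judged by how closely their estimators approach this floor while transmitting far fewer than n redundant bits.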
SESSION: Pricing
How well can congestion pricing neutralize denial of service attacks?

Ashish Vulimiri, Gul A. Agha, Philip Brighten Godfrey, Karthik Lakshminarayanan

Pages: 137-150

doi: 10.1145/2254756.2254775
|
Denial of service protection mechanisms usually require classifying malicious traffic, which can be difficult. Another approach is to price scarce resources. However, while congestion pricing has been suggested as a way to combat DoS attacks, it has not been shown quantitatively how much damage a malicious player could cause to the utility of benign participants. In this paper, we quantify the protection that congestion pricing affords against DoS attacks, even for powerful attackers that can control their packets' routes. Specifically, we model the limits on the resources available to the attackers in three different ways and, in each case, quantify the maximum amount of damage they can cause as a function of their resource bounds. In addition, we show that congestion pricing is provably superior to fair queueing in attack resilience.

Pricing cloud bandwidth reservations under demand uncertainty

Di Niu, Chen Feng, Baochun Li

Pages: 151-162

doi: 10.1145/2254756.2254776
|
In a public cloud, bandwidth is traditionally priced in a pay-as-you-go model. Reflecting the recent trend of augmenting cloud computing with bandwidth guarantees, we consider a novel model of cloud bandwidth allocation and pricing when explicit bandwidth reservation is enabled. We argue that a tenant's utility depends not only on its bandwidth usage, but more importantly on the portion of its demand that is satisfied with a performance guarantee. Our objective is to determine the optimal policy for pricing cloud bandwidth reservations, in order to maximize social welfare, i.e., the sum of the expected profits that can be made by all tenants and the cloud provider, even in the presence of demand uncertainty. The problem turns out to be a large-scale network optimization problem with a coupled objective function. We propose two new distributed solutions --- based on chaotic equation updates and cutting-plane methods --- that prove to be more efficient than existing solutions based on consistency pricing and subgradient methods. In addition, we address the practical challenge of forecasting demand statistics, required by our optimization problem as input. We propose a factor model for near-future demand prediction, and test it on a real-world video workload dataset. All included, we have designed a fully computerized trading environment for cloud bandwidth reservations, which operates effectively at a granularity as fine as ten minutes in our trace-driven simulations.

SESSION: Green data centers
Temperature management in data centers: why some (might) like it hot

Nosayba El-Sayed, Ioan A. Stefanovici, George Amvrosiadis, Andy A. Hwang, Bianca Schroeder

Pages: 163-174

doi: 10.1145/2254756.2254778
|
The energy consumed by data centers is starting to make up a significant fraction of the world's energy consumption and carbon emissions. A large fraction of the consumed energy is spent on data center cooling, which has motivated a large body of work on temperature management in data centers. Interestingly, a key aspect of temperature management has not been well understood: controlling the setpoint temperature at which to run a data center's cooling system. Most data centers set their thermostat based on (conservative) suggestions by manufacturers, as there is limited understanding of how higher temperatures will affect the system. At the same time, studies suggest that increasing the temperature setpoint by just one degree could save 2-5% of the energy consumption. This paper provides a multi-faceted study of temperature management in data centers. We use a large collection of field data from different production environments to study the impact of temperature on hardware reliability, including the reliability of the storage subsystem, the memory subsystem, and server reliability as a whole. We also use an experimental testbed based on a thermal chamber and a large array of benchmarks to study two other potential issues with higher data center temperatures: the effect on server performance and power. Based on our findings, we make recommendations for temperature management in data centers that create the potential for saving energy, while limiting negative effects on system reliability and performance.

Renewable and cooling aware workload management for sustainable data centers

Zhenhua Liu, Yuan Chen, Cullen Bash, Adam Wierman, Daniel Gmach, Zhikui Wang, Manish Marwah, Chris Hyser

Pages: 175-186

doi: 10.1145/2254756.2254779
|
Recently, the demand for data center computing has surged, increasing the total energy footprint of data centers worldwide. Data centers typically comprise three subsystems: IT equipment provides services to customers; power infrastructure supports the IT and cooling equipment; and the cooling infrastructure removes heat generated by these subsystems. This work presents a novel approach to model the energy flows in a data center and optimize its operation. Traditionally, supply-side constraints such as energy or cooling availability were treated independently from IT workload management. This work reduces electricity cost and environmental impact using a holistic approach that integrates renewable supply, dynamic pricing, and cooling supply including chiller and outside air cooling, with IT workload planning to improve the overall sustainability of data center operations. Specifically, we first predict renewable energy as well as IT demand. Then we use these predictions to generate an IT workload management plan that schedules IT workload and allocates IT resources within a data center according to time varying power supply and cooling efficiency. We have implemented and evaluated our approach using traces from real data centers and production systems. The results demonstrate that our approach can reduce both the recurring power costs and the use of non-renewable energy by as much as 60% compared to existing techniques, while still meeting service level agreements.

Energy storage in datacenters: what, where, and how much?

Di Wang, Chuangang Ren, Anand Sivasubramaniam, Bhuvan Urgaonkar, Hosam Fathy

Pages: 187-198

doi: 10.1145/2254756.2254780
|
Energy storage - in the form of UPS units - in a datacenter has been primarily used to fail over to diesel generators upon power outages. There has been recent interest in using these Energy Storage Devices (ESDs) for demand response (DR), either to shift peak demand away from high-tariff periods or to shave demand, allowing aggressive under-provisioning of the power infrastructure. All such prior work has considered only a single, specific type of ESD (typically rechargeable lead-acid batteries), and has employed it at only a single level of the power delivery network. Continuing technological advances have provided a plethora of competitive ESD options, ranging from ultra-capacitors to different kinds of batteries, flywheels, and even compressed-air storage. These ESDs offer very different trade-offs between their power and energy costs, densities, lifetimes, and energy efficiency, among other factors, suggesting that hybrid combinations of them may allow more effective DR than a single technology. Furthermore, ESDs can be placed at different, and possibly multiple, levels of the power delivery hierarchy, with different associated trade-offs. To our knowledge, no prior work has studied the extensive design space involving multiple ESD technology provisioning and placement options. This paper intends to fill this critical void by presenting a theoretical framework for capturing important characteristics of different ESD technologies, the trade-offs of placing them at different levels of the power hierarchy, and quantifying the resulting cost-benefit trade-offs as a function of workload properties.

SESSION: Epidemics
Rumor centrality: a universal source detector

Devavrat Shah, Tauhid Zaman

Pages: 199-210

doi: 10.1145/2254756.2254782
|
We consider the problem of detecting the source of a rumor (information diffusion) in a network based on observations about which set of nodes possess the rumor. In a recent work [10], this question was introduced and studied. The authors proposed rumor centrality as an estimator for detecting the source, and established it to be the maximum likelihood estimator with respect to the popular Susceptible-Infected (SI) model with exponential spreading times for regular trees. They showed that as the size of the infected graph increases, for a line (2-regular tree) graph the probability of source detection goes to 0, while for d-regular trees with d ≥ 3 the probability of detection, say α_d, remains bounded away from 0 and is less than 1/2. Their results, however, stop short of providing insights for heterogeneous settings such as irregular trees or the SI model with non-exponential spreading times.
This paper overcomes this limitation and establishes the effectiveness of rumor centrality for source detection for generic random trees and the SI model with a generic spreading time distribution. The key result is an interesting connection between a multi-type continuous time branching process (an equivalent representation of a generalized Polya's urn, cf. [1]) and the effectiveness of rumor centrality. Through this, it is possible to quantify the detection probability precisely. As a consequence, we recover all the results of [10] as a special case and, more importantly, obtain a variety of results establishing the universality of rumor centrality in the context of tree-like graphs and the SI model with a generic spreading time distribution.
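For trees, rumor centrality has a clean combinatorial form: R(v) = n!/∏_u T_u^v, where T_u^v is the size of the subtree rooted at u when the tree is rooted at v, and R(v) counts the infection orderings that could have started at v. A minimal sketch follows (the message-passing version computes all scores in O(n); this one recomputes subtree sizes per candidate root for clarity):

```python
import math

def rumor_centrality(adj, n):
    """Rumor centrality R(v) = n! / prod_u T_u^v for every node v of a tree.

    adj: dict mapping node -> list of neighbours, nodes labelled 0..n-1.
    T_u^v is the size of the subtree rooted at u when the tree is rooted
    at v; R(v) counts the infection orderings that start at v.
    """
    scores = {}
    for v in range(n):
        size = [1] * n
        parent = {v: None}
        order, stack = [], [v]
        while stack:                      # iterative DFS from candidate root v
            u = stack.pop()
            order.append(u)
            for w in adj[u]:
                if w != parent[u]:
                    parent[w] = u
                    stack.append(w)
        for u in reversed(order):         # accumulate subtree sizes bottom-up
            if parent[u] is not None:
                size[parent[u]] += size[u]
        scores[v] = math.factorial(n) // math.prod(size)  # exact: R(v) is an integer
    return scores

# Star on 4 nodes: the centre is the most plausible source.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(rumor_centrality(star, 4))   # {0: 6, 1: 2, 2: 2, 3: 2}
```

On the 4-node star the centre scores 6 (the three leaves may be infected in any order) versus 2 for a leaf, so the estimator picks the centre.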
|
Learning the graph of epidemic cascades |
|
Praneeth Netrapalli, Sujay Sanghavi |
|
Pages: 211-222 |
|
doi>10.1145/2254756.2254783 |
|
We consider the problem of finding the graph on which an epidemic spreads, given only the times when each node gets infected. While this is a problem of central importance in several contexts -- offline and online social networks, e-commerce, epidemiology -- there has been very little work, analytical or empirical, on finding the graph. Clearly, it is impossible to do so from just one epidemic; our interest is in learning the graph from a small number of independent epidemics.
For the classic and popular "independent cascade" epidemics, we analytically establish sufficient conditions on the number of epidemics for both the global maximum-likelihood (ML) estimator and a natural greedy algorithm to succeed with high probability. Both results are based on a key observation: the global graph learning problem decouples into n local problems -- one for each node. For a node of degree d, we show that its neighborhood can be reliably found once it has been infected O(d² log n) times (for ML on general graphs) or O(d log n) times (for greedy on trees). We also provide a corresponding information-theoretic lower bound of Ω(d log n); thus our bounds are essentially tight.
Furthermore, if we are given side-information in the form of a super-graph of the actual graph (as is often the case), then the number of epidemic samples required -- in all cases -- becomes independent of the network size n.
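The greedy step can be pictured as set cover over cascades. The sketch below is an illustrative reconstruction, not the paper's exact algorithm: in a discrete-time independent cascade, a node infected at step t must have been infected by some neighbour infected at step t-1, so we greedily add the candidate parent that explains the most still-unexplained cascades.

```python
def greedy_neighbors(cascades, node, candidates):
    """Greedy, set-cover-style estimate of `node`'s neighbours (a sketch).

    cascades: list of dicts mapping node -> infection time step
              (nodes absent from a dict were never infected).
    A cascade in which `node` was infected at step t is 'explained'
    by a chosen neighbour infected at step t-1.
    """
    to_explain = [c for c in cascades if node in c and c[node] > 0]
    chosen = set()
    while to_explain:
        best, best_cov = None, []
        for u in candidates:
            if u in chosen or u == node:
                continue
            cov = [c for c in to_explain if c.get(u) == c[node] - 1]
            if len(cov) > len(best_cov):
                best, best_cov = u, cov
        if best is None:        # nothing left can explain the remainder
            break
        chosen.add(best)
        to_explain = [c for c in to_explain if c not in best_cov]
    return chosen

# Node 2's true parents are 0 and 1; node 3 never precedes an infection of 2.
cascades = [{0: 0, 2: 1}, {1: 0, 2: 1}, {0: 0, 1: 1, 2: 2}]
print(sorted(greedy_neighbors(cascades, 2, [0, 1, 3])))   # [0, 1]
```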
|
Network forensics: random infection vs spreading epidemic |
|
Chris Milling, Constantine Caramanis, Shie Mannor, Sanjay Shakkottai |
|
Pages: 223-234 |
|
doi>10.1145/2254756.2254784 |
|
Computer (and human) networks have long had to contend with spreading viruses. Effectively controlling or curbing an outbreak requires understanding the dynamics of the spread. A virus that spreads by taking advantage of physical links or user-acquaintance links on a social network can grow explosively if it spreads beyond a critical radius. On the other hand, random infections (those that do not take advantage of network structure) have very different propagation characteristics. If too many machines (or humans) are infected, network structure becomes essentially irrelevant, and the different spreading modes appear identical. When can we distinguish between these mechanisms of infection? Further, how can this be done efficiently? This paper studies these two questions. We provide sufficient conditions, for different graph topologies, under which it is possible to distinguish between a random infection model and a spreading epidemic model with probability of misclassification going to zero. We further provide efficient algorithms that are guaranteed to work in different regimes.
|
SESSION: Memory and storage |
|
|
|
What is a good buffer cache replacement scheme for mobile flash storage? |
|
Hyojun Kim, Moonkyung Ryu, Umakishore Ramachandran |
|
Pages: 235-246 |
|
doi>10.1145/2254756.2254786 |
|
Smartphones are becoming ubiquitous and powerful. The Achilles' heel that limits performance in such devices is the storage. Low-end flash memory is the storage technology of choice in such devices due to energy, size, and cost considerations. In this paper, we take a critical look at the performance of flash on smartphones for mobile applications. Specifically, we ask whether the state-of-the-art buffer cache replacement schemes proposed thus far (both flash-agnostic and flash-aware) are the right ones for mobile flash storage. To answer this question, we first expose the limitations of current buffer cache performance evaluation methods and propose a novel evaluation framework that is a hybrid between trace-driven simulation and real implementation of such schemes inside an operating system. This evaluation reveals some unexpected and surprising insights into the performance of buffer management schemes that contradict conventional wisdom. Armed with this knowledge, we propose a new buffer cache replacement scheme called SpatialClock.
Using our evaluation framework, we show the superior performance of SpatialClock relative to the state of the art for mobile flash storage.
|
Versatile refresh: low complexity refresh scheduling for high-throughput multi-banked eDRAM |
|
Mohammad Alizadeh, Adel Javanmard, Shang-Tse Chuang, Sundar Iyer, Yi Lu |
|
Pages: 247-258 |
|
doi>10.1145/2254756.2254787 |
|
Multi-banked embedded DRAM (eDRAM) has become increasingly popular in high-performance systems. However, the data retention problem of eDRAM is exacerbated by the larger number of banks and the high-performance environment in which it is deployed: the data retention time of each memory cell decreases while the number of cells to be refreshed increases. To address this, multi-bank designs offer a concurrent refresh mode, in which idle banks can be refreshed concurrently with read and write operations. However, conventional techniques such as periodically scheduling refreshes---with priority given to refreshes in case of conflicts with reads or writes---have variable performance, increase read latency, and can perform poorly under worst-case memory access patterns. We propose a novel refresh scheduling algorithm that has low complexity, produces near-optimal throughput with universal guarantees, and is tolerant of bursty memory access patterns. The central idea is to decouple the scheduler into two simple-to-implement modules: one determines which cell to refresh next, and the other determines when to force an idle cycle in all banks. We derive necessary and sufficient conditions to guarantee data integrity for all access patterns, for any given number of banks, rows per bank, read/write ports, and data retention time. Our analysis shows that there is a tradeoff between refresh overhead and burst tolerance and characterizes this tradeoff precisely. The algorithm is shown to be near-optimal and achieves, for instance, a 76.6% reduction in worst-case refresh overhead over the periodic refresh algorithm for a 250MHz eDRAM with 10us retention time and 16 banks, each with 128 rows. Simulations with Apex-Map synthetic benchmarks and switch lookup table traffic show that our Versatile Refresh (VR) algorithm can almost completely hide the refresh overhead for memory accesses with moderate-to-high multiplexing across memory banks.
|
SESSION: System performance |
|
|
|
Does lean imply green?: a study of the power performance implications of Java runtime bloat |
|
Suparna Bhattacharya, Karthick Rajamani, K. Gopinath, Manish Gupta |
|
Pages: 259-270 |
|
doi>10.1145/2254756.2254789 |
|
The presence of software bloat in large flexible software systems can hurt energy efficiency. However, identifying and mitigating bloat is fairly effort intensive. To enable such efforts to be directed where there is a substantial potential for energy savings, we investigate the impact of bloat on power consumption under different situations. We conduct the first systematic experimental study of the joint power-performance implications of bloat across a range of hardware and software configurations on modern server platforms. The study employs controlled experiments to expose different effects of a common type of Java runtime bloat, excess temporary objects, in the context of the SPECPower_ssj2008 workload. We introduce the notion of equi-performance power reduction to characterize the impact, in addition to peak power comparisons. The results show a wide variation in energy savings from bloat reduction across these configurations. Energy efficiency benefits at peak performance tend to be most pronounced when bloat affects a performance bottleneck and non-bloated resources have low energy-proportionality. Equi-performance power savings are highest when bloated resources have a high degree of energy proportionality.
We develop an analytical model that establishes a general relation between resource pressure caused by bloat and its energy efficiency impact under different conditions of resource bottlenecks and energy proportionality. Applying the model to different "what-if" scenarios, we predict the impact of bloat reduction and corroborate these predictions with empirical observations. Our work shows that the prevalent software-only view of bloat is inadequate for assessing its power-performance impact and instead provides a full systems approach for reasoning about its implications.
|
D-factor: a quantitative model of application slow-down in multi-resource shared systems |
|
Seung-Hwan Lim, Jae-Seok Huh, Youngjae Kim, Galen M. Shipman, Chita R. Das |
|
Pages: 271-282 |
|
doi>10.1145/2254756.2254790 |
|
Scheduling multiple jobs onto a platform enhances system utilization by sharing resources. The benefits of higher resource utilization include reduced cost to construct, operate, and maintain a system, often including energy consumption. Maximizing these benefits while satisfying performance limits comes at a price -- resource contention among jobs increases job completion time. In this paper, we analyze the slow-downs of jobs due to contention for multiple resources in a system, referred to as the dilation factor. We observe that multiple-resource contention creates non-linear dilation factors of jobs. From this observation, we establish a general quantitative model for the dilation factors of jobs in multi-resource systems. A job is characterized by vector-valued loading statistics, and the dilation factors of a job set are given by a quadratic function of their loading vectors. We demonstrate how to systematically characterize a job, maintain the data structure needed to calculate the dilation factor (the loading matrix), and calculate the dilation factor of each job. We validated the accuracy of the model with multiple processes running on a native Linux server, on virtualized servers, and with multiple MapReduce workloads co-scheduled in a cluster. Evaluation with measured data shows that the D-factor model has an error margin of less than 16%. We also show that the model can be integrated with an existing on-line scheduler to minimize the makespan of workloads.
|
ADP: automated diagnosis of performance pathologies using hardware events |
|
Wucherl Yoo, Kevin Larson, Lee Baugh, Sangkyum Kim, Roy H. Campbell |
|
Pages: 283-294 |
|
doi>10.1145/2254756.2254791 |
|
Performance characterization of applications' hardware behavior is essential for making the best use of available hardware resources. Modern architectures offer access to many hardware events that are capable of providing information to reveal architectural performance bottlenecks throughout the core and memory hierarchy. These events can provide programmers with unique and powerful insights into the causes of the resource bottlenecks in their applications. However, interpreting these events has been a significant challenge. We present an automated system that uses machine learning to identify an application's performance problems. Our system provides programmers with insights about the performance of their applications while shielding them from the onerous task of digesting hardware events. It uses a decision-tree ensemble algorithm, random forests, trained on our micro-benchmarks to fingerprint the performance problems. Our system divides a profiled application into functions and automatically classifies each function by its dominant hardware resource bottlenecks. Using the classifications of the hotspot functions, we were able to achieve an average speedup of 1.73 on three applications in the PARSEC benchmark suite. Our system provides programmers with a guideline of where, what, and how to fix the detected performance problems in applications, which would otherwise have required considerable architectural knowledge.
|
Providing fairness on shared-memory multiprocessors via process scheduling |
|
Di Xu, Chenggang Wu, Pen-Chung Yew, Jianjun Li, Zhenjiang Wang |
|
Pages: 295-306 |
|
doi>10.1145/2254756.2254792 |
|
Competition for shared memory resources on multiprocessors is the dominant cause of application slowdown and makes performance vary unpredictably. It exacerbates the need for Quality of Service (QoS) on such systems. In this paper, we propose a fair-progress process scheduling (FPS) policy to improve system fairness. Its strategy is to force equally-weighted applications to experience the same amount of slowdown when they run concurrently. The basic approach is to monitor the progress of all applications at runtime. When we find that an application has suffered more slowdown and accumulated less effective work than others, we allocate it more CPU time to restore parity. Our policy also allows different weights for different threads, and provides an effective and robust tuner that allows the OS to freely make tradeoffs between system fairness and higher throughput. Evaluation results show that FPS can significantly improve system fairness, by an average of 53.5% and 65.0% on a 4-core processor with private caches and a 4-core processor with a shared cache, respectively. The penalty is about 1.1% and 1.6% of system throughput. For memory-intensive workloads, FPS also improves system fairness by an average of 45.2% and 21.1% on 4-core and 8-core systems, respectively, at the expense of a throughput loss of about 2%.
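The feedback loop behind fair progress can be sketched in a few lines. This is a toy model only: the paper's FPS lives inside the OS scheduler and estimates slowdown online, whereas here the per-quantum slowdown factors are handed in as assumed inputs.

```python
def fair_progress_schedule(slowdowns, quanta):
    """Toy fair-progress scheduler: each quantum goes to the application
    that has made the least slowdown-normalized progress so far.

    slowdowns: dict app -> slowdown factor while running under contention
               (normalized progress per quantum = 1/slowdown).
    Returns each app's accumulated normalized progress.
    """
    progress = {app: 0.0 for app in slowdowns}
    for _ in range(quanta):
        lagging = min(progress, key=progress.get)    # most-behind app runs
        progress[lagging] += 1.0 / slowdowns[lagging]
    return progress

# An app slowed down 2x receives twice the CPU time, so progress equalizes.
print(fair_progress_schedule({"x": 1.0, "y": 2.0}, 300))
# {'x': 100.0, 'y': 100.0}
```

Here "y" ends up with two thirds of the quanta, which is exactly the equal-slowdown outcome the policy aims for.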
|
SESSION: Graphs |
|
|
|
Characterizing continuous time random walks on time varying graphs |
|
Daniel Figueiredo, Philippe Nain, Bruno Ribeiro, Edmundo de Souza e Silva, Don Towsley |
|
Pages: 307-318 |
|
doi>10.1145/2254756.2254794 |
|
In this paper we study the behavior of a continuous time random walk (CTRW) on a stationary and ergodic time varying dynamic graph. We establish conditions under which the CTRW is a stationary and ergodic process. In general, the stationary distribution of the walker depends on the walker rate and is difficult to characterize. However, we characterize the stationary distribution in the following cases: i) the walker rate is significantly larger or smaller than the rate at which the graph changes (time-scale separation), ii) the walker rate is proportional to the degree of the node it resides on (coupled dynamics), and iii) the degrees of nodes belonging to the same connected component are identical (structural constraints). We provide examples that illustrate our theoretical findings.
|
Beyond random walk and metropolis-hastings samplers: why you should not backtrack for unbiased graph sampling |
|
Chul-Ho Lee, Xin Xu, Do Young Eun |
|
Pages: 319-330 |
|
doi>10.1145/2254756.2254795 |
|
Graph sampling via crawling has been actively considered as a generic and important tool for collecting uniform node samples so as to consistently estimate and uncover various characteristics of complex networks. The so-called simple random walk with re-weighting (SRW-rw) and the Metropolis-Hastings (MH) algorithm have been popular in the literature for such unbiased graph sampling. However, an unavoidable downside of their core random walks -- slow diffusion over the space -- can cause poor estimation accuracy. In this paper, we propose the non-backtracking random walk with re-weighting (NBRW-rw) and the MH algorithm with delayed acceptance (MHDA), which are theoretically guaranteed to achieve, at almost no additional cost, not only unbiased graph sampling but also higher efficiency (smaller asymptotic variance of the resulting unbiased estimators) than the SRW-rw and the MH algorithm, respectively. In particular, a remarkable feature of the MHDA is its applicability to any non-uniform node sampling, like the MH algorithm, while ensuring better sampling efficiency than the MH algorithm. We also provide simulation results to confirm our theoretical findings.
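A minimal sketch of NBRW-rw on an undirected graph: never step back to the node just left (unless it is the only neighbour), and cancel the walk's degree bias by weighting each visit by 1/degree, as in SRW-rw. Node visits still occur in proportion to degree, so the re-weighted estimates converge to uniform.

```python
import random

def nbrw_rw(adj, start, steps, rng=None):
    """Non-backtracking random walk with re-weighting (a sketch).

    adj: dict node -> list of neighbours (undirected, connected,
    non-bipartite so the asymptotics apply). Visits occur in proportion
    to degree, so weighting each visit by 1/degree yields an
    asymptotically uniform node-sampling estimate.
    """
    rng = rng or random.Random()
    weights = {v: 0.0 for v in adj}
    prev, cur = None, start
    for _ in range(steps):
        weights[cur] += 1.0 / len(adj[cur])              # de-bias by 1/degree
        choices = adj[cur]
        if prev is not None and len(choices) > 1:
            choices = [w for w in choices if w != prev]  # no backtracking
        prev, cur = cur, rng.choice(choices)
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}

# Triangle plus a pendant node: degrees 3, 2, 2, 1, yet the re-weighted
# estimates all approach the uniform value 0.25.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
est = nbrw_rw(adj, 0, 200_000, random.Random(1))
```

The degree-1 exception (backtracking allowed at node 3) matches the usual NBRW convention and keeps the walk well defined.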
|
Clustered embedding of massive social networks |
|
Han Hee Song, Berkant Savas, Tae Won Cho, Vacha Dave, Zhengdong Lu, Inderjit S. Dhillon, Yin Zhang, Lili Qiu |
|
Pages: 331-342 |
|
doi>10.1145/2254756.2254796 |
|
The explosive growth of social networks has created numerous exciting research opportunities. A central concept in the analysis of social networks is a proximity measure, which captures the closeness or similarity between nodes in the network. Despite much research on proximity measures, there is a lack of techniques to efficiently and accurately compute proximity measures for large-scale social networks. In this paper, we embed the original massive social graph into a much smaller graph, using a novel dimensionality reduction technique termed Clustered Spectral Graph Embedding. We show that the embedded graph captures the essential clustering and spectral structure of the original graph and allows a wide range of analyses to be performed on massive social graphs. Applying the clustered embedding to proximity measurement of social networks, we develop accurate, scalable, and flexible solutions to three important social network analysis tasks: proximity estimation, missing link inference, and link prediction. We demonstrate the effectiveness of our solutions in the context of large real-world social network datasets: Flickr, LiveJournal, and MySpace, with up to 2 million nodes and 90 million links.
|
SESSION: Sampling and ranking |
|
|
|
Don't let the negatives bring you down: sampling from streams of signed updates |
|
Edith Cohen, Graham Cormode, Nick Duffield |
|
Pages: 343-354 |
|
doi>10.1145/2254756.2254798 |
|
Random sampling has been proven time and time again to be a powerful tool for working with large data. Queries over the full dataset are replaced by approximate queries over the smaller (and hence easier to store and manipulate) sample. The sample constitutes a flexible summary that supports a wide class of queries. But in many applications, datasets are modified with time, and it is desirable to update samples without requiring access to the full underlying datasets. In this paper, we introduce and analyze novel techniques for sampling over dynamic data, modeled as a stream of modifications to weights associated with each key.
While sampling schemes designed for stream applications can often readily accommodate positive updates to the dataset, much less is known for the case of negative updates, where weights are reduced or items deleted altogether. We primarily consider the turnstile model of streams, and extend classic schemes to incorporate negative updates. Perhaps surprisingly, the modifications needed to handle negative updates turn out to be natural and seamless extensions of the well-known positive-update-only algorithms. We show that they produce unbiased estimators, and we relate their performance to the behavior of corresponding algorithms on insert-only streams with different parameters. A careful analysis is needed to account for the fact that sampling choices for one key now depend on the choices made for other keys.
In practice, our solutions turn out to be efficient and accurate. Compared to recent algorithms for L_p sampling, which can be applied to this problem, they are significantly more reliable and dramatically faster.
|
Efficient rank aggregation using partial data |
|
Ammar Ammar, Devavrat Shah |
|
Pages: 355-366 |
|
doi>10.1145/2254756.2254799 |
|
The need to rank items based on user input arises in many practical applications such as elections, group decision making and recommendation systems. The primary challenge in such scenarios is to decide on a global ranking based on partial preferences provided by users. The standard approach to address this challenge is to ask users to provide explicit numerical ratings (cardinal information) of a subset of the items. The main appeal of such an approach is the ease of aggregation. However, the rating scale as well as the individual ratings are often arbitrary and may not be consistent from one user to another. A more natural alternative to numerical ratings requires users to compare pairs of items (ordinal information). On the one hand, such comparisons provide an "absolute" indicator of the user's preference. On the other hand, it is often hard to combine or aggregate these comparisons to obtain a consistent global ranking.
In this work, we provide a tractable framework for utilizing comparison data as well as first-order marginal information (see Section 2) for the purpose of ranking. We treat the available information as partial samples from an unknown distribution over permutations. We then reduce ranking problems of interest to performing inference on this distribution. Specifically, we consider the problems of (a) finding an aggregate ranking of n items, (b) learning the mode of the distribution, and (c) identifying the top k items. For many of these problems, we provide efficient algorithms to infer the ranking directly from the data without the need to estimate the underlying distribution. In other cases, we use the Principle of Maximum Entropy to devise a concise parameterization of a distribution consistent with observations using only O(n²) parameters, where n is the number of items in question. We propose a distributed, iterative algorithm for estimating the parameters of the distribution. We establish the correctness of the algorithm and identify its rate of convergence explicitly.
|
Fair sampling across network flow measurements |
|
Nick Duffield |
|
Pages: 367-378 |
|
doi>10.1145/2254756.2254800 |
|
Sampling is crucial for controlling the resource consumption of internet traffic flow measurements. Routers use Packet Sampled NetFlow, and completed flow records are sampled in the measurement infrastructure. Recent research, motivated by the need of service providers to accurately measure both small and large traffic subpopulations, has focused on distributing a packet sampling budget amongst subpopulations. But the long timescales of hardware development and lower bandwidth costs motivate post-measurement analysis of complete flow records at collectors instead. Sampling in collector databases then manages data volumes, yielding general-purpose summaries that are rapidly queried to trigger drill-down analysis on a time-limited window of full data. These summaries are sufficiently small to be archived. This paper addresses the problem of distributing a sampling budget over subpopulations of flow records. Estimation accuracy goals are met by fairly sharing the budget, and we establish a correspondence between the type of accuracy goal and the flavor of fair sharing used. A streaming Max-Min Fair Sampling algorithm fairly shares the sampling budget across subpopulations, with sampling as a mechanism to deallocate budget. This provides timely samples and is robust against uncertainties in configuration and demand. We illustrate using flow records from an access router of a large ISP, where rates over interface traffic subpopulations vary over several orders of magnitude. We detail an implementation whose computational cost is no worse than that of subpopulation-oblivious sampling.
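Max-min fair sharing is the classic water-filling computation. The sketch below does the offline allocation across subpopulation demands; the paper's contribution is a streaming algorithm, so treat this function as the reference allocation it targets rather than the implementation.

```python
def max_min_fair(demands, budget):
    """Water-filling: max-min fair division of `budget` across demands.

    demands: dict subpopulation -> number of records it could supply.
    Subpopulations below the fair share keep all their records; the
    leftover budget is re-divided among the rest until shares stabilize.
    """
    alloc, active = {}, dict(demands)
    while active:
        share = budget / len(active)
        small = [k for k, d in active.items() if d <= share]
        if not small:                   # everyone left is capped by the share
            for k in active:
                alloc[k] = share
            break
        for k in small:                 # small demands are fully satisfied
            alloc[k] = active.pop(k)
            budget -= alloc[k]
    return alloc

print(max_min_fair({'a': 10, 'b': 40, 'c': 100, 'd': 200}, 120))
# 'a' keeps its 10 records; 'b', 'c', 'd' split the remaining 110 evenly.
```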
|
POSTER SESSION: Poster session |
|
|
|
TCAM-based NFA implementation |
|
Kunyang Peng, Qunfeng Dong |
|
Pages: 379-380 |
|
doi>10.1145/2254756.2254802 |
|
Regular expression matching, as the core packet inspection engine of network systems, has long been striving to be both fast in matching speed (like a DFA) and scalable in storage space (like an NFA). Recently, ternary content addressable memory (TCAM) has been investigated as a promising way out, by implementing DFAs in TCAM for regular expression matching. In this paper, we present the first method for implementing an NFA using TCAM. Through proper TCAM encoding, our method matches each input byte with one single TCAM lookup --- operating at precisely the same speed as a DFA, while using a number of TCAM entries that can be close to the NFA size. These properties make our method an important step along a new path --- TCAM-based NFA implementation --- towards the long-standing goal of fast and scalable regular expression matching.
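The essence of the speed claim is that the entire set of active NFA states advances once per input byte. TCAM specifics aside, the set-at-a-time idea looks like this in software, with the active set as a bitmask (an analogue only: the TCAM encoding performs the whole set transition in a single hardware lookup rather than the bit loop below):

```python
def nfa_run(trans, accept_mask, start_state, data):
    """Simulate an NFA keeping the active state set as one bitmask.

    trans: dict (state, symbol) -> bitmask of successor states.
    The whole active set advances once per input symbol, mirroring the
    one-lookup-per-byte goal of the TCAM scheme.
    """
    active = 1 << start_state
    for sym in data:
        nxt = 0
        rest = active
        while rest:                            # visit each set bit
            s = (rest & -rest).bit_length() - 1
            nxt |= trans.get((s, sym), 0)
            rest &= rest - 1
        active = nxt
    return bool(active & accept_mask)

# Toy NFA for "contains 'ab'" over alphabet {a, b, x}: state 0 loops on
# everything and guesses the match start on 'a'; state 1 needs 'b';
# state 2 is accepting and absorbing.
trans = {(0, 'a'): 0b011, (0, 'b'): 0b001, (0, 'x'): 0b001,
         (1, 'b'): 0b100,
         (2, 'a'): 0b100, (2, 'b'): 0b100, (2, 'x'): 0b100}
print(nfa_run(trans, 0b100, 0, "xabx"))   # True
```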
|
Stable and efficient pricing for inter-domain traffic forwarding |
|
Elliot Anshelevich, Ameya Hate, Koushik Kar, Michael Usher |
|
Pages: 381-382 |
|
doi>10.1145/2254756.2254803 |
|
We address the question of strategic pricing of inter-domain traffic forwarding services provided by ISPs, which is also closely coupled with the question of how ISPs route their traffic towards their neighboring ISPs. Posing this question as a non-cooperative game between neighboring ISPs, we study the properties of this pricing game in terms of the existence and efficiency of the equilibrium. We observe that for "well-provisioned" ISPs, Nash equilibrium prices exist and they result in flows that maximize the overall network utility (generalized end-to-end throughput). For general ISP topologies, equilibrium prices may not exist; however, simulations on a large number of realistic topologies show that best-response based simple price update solutions converge to stable and efficient prices and flows for most topologies.
|
Measuring and characterizing home networks |
|
Lucas DiCioccio, Renata Teixeira, Catherine Rosenberg |
|
Pages: 383-384 |
|
doi>10.1145/2254756.2254804 |
|
This paper presents the design and evaluation of HomeNet Profiler, a tool that runs on an end-system in the home to collect data from home networks. HomeNet Profiler collects a wide range of measurements, including the set of devices, the set of services (with UPnP and Zeroconf), and the characteristics of the WiFi environment. Since the release of HomeNet Profiler in April 2011, we have collected data from over 2,400 distinct homes in 46 different countries.
|
Comparing metro-area cellular and WiFi performance: extended abstract |
|
Joel Sommers, Paul Barford |
|
Pages: 385-386 |
|
doi>10.1145/2254756.2254805 |
|
Full text: PDF |
|
Cellular and 802.11 WiFi offer two compelling connectivity options for mobile users. The goal of our work is to better understand the performance characteristics of these technologies in diverse environments and conditions. To that end, we compare and contrast cellular and WiFi performance using crowd-sourced data from speedtest.net. We consider spatio-temporal performance aspects (e.g., upload and download throughput and latency) using over 3 million user-initiated tests from 15 different metro areas, collected over 15 weeks. In these preliminary results, we find that WiFi performance generally exceeds cellular performance, and that observed characteristics are highly variable across different locations and times of day. We also observe diverse performance characteristics resulting from the rollout of new cell access technologies and service differences among local providers.
|
Towards a statistical characterization of the competitiveness of oblivious routing |
|
Gábor Németh, Gábor Rétvári |
|
Pages: 387-388 |
|
doi>10.1145/2254756.2254806 |
|
Full text: PDF |
|
Oblivious routing asks for a static routing that serves arbitrary user demands with minimal performance penalty. Performance is measured in terms of the competitive ratio, the ratio of the maximum congestion to the best possible congestion. In this paper, we take the first steps towards extending this worst-case characterization to a more revealing statistical one. We define new performance metrics and present numerical evaluations showing that, in statistical terms, oblivious routing is not as competitive as the worst-case performance characterizations would suggest.
|
Range tomography |
|
Sajjad Zarifzadeh, Madhwaraj G K, Constantine Dovrolis |
|
Pages: 389-390 |
|
doi>10.1145/2254756.2254807 |
|
Full text: PDF |
|
|
|
A scalable architecture for maintaining packet latency measurements |
|
Myungjin Lee, Nick Duffield, Ramana Rao Kompella |
|
Pages: 391-392 |
|
doi>10.1145/2254756.2254808 |
|
Full text: PDF |
|
Latency has become an important metric for network monitoring since the emergence of new latency-sensitive applications (e.g., algorithmic trading and high-performance computing). In this paper, to provide latency measurements at both fine (e.g., per-packet) and flexible (e.g., flow-subset) levels of granularity, we propose an architecture called MAPLE that essentially stores packet-level latencies in routers and allows network operators to query the latency of arbitrary traffic sub-populations. MAPLE is built using a scalable data structure called SVBF with small storage needs.
|
Modeling randomness in network traffic |
|
Markus Laner, Philipp Svoboda, Markus Rupp |
|
Pages: 393-394 |
|
doi>10.1145/2254756.2254809 |
|
Full text: PDF |
|
A continuing challenge in the field of network traffic modeling is to map recorded traffic onto the parameters of random processes, in order to enable simulations of the respective traffic. A key element thereof is a convenient model which is simple yet captures the most relevant statistics.
This work aims to find such a model which, more precisely, enables the generation of multiple random processes with arbitrary but jointly characterized distributions, auto-correlation functions and cross-correlations. Hence, we present the definition of a novel class of models, the derivation of a respective closed-form analytical representation, and its application to real network traffic.
Our modeling approach comprises: (i) generating statistically dependent Gaussian random processes, (ii) introducing auto-correlation to each process with a linear filter, and (iii) transforming the samples by real-valued polynomial functions in order to shape their distributions. This particular structure allows the parameter-fitting problem to be split into three independent parts, each of which is solvable by standard methods. Therefore, it is simple and straightforward to fit the model to measurement data.
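The three-step construction described in this abstract is easy to prototype. The sketch below is our own illustration of the same pipeline, with arbitrary correlation, filter, and polynomial coefficients chosen for demonstration only: correlated Gaussian sources, an AR(1) linear filter for auto-correlation, and a sample-wise polynomial to shape the marginal distribution.

```python
import math
import random

random.seed(1)

def correlated_gaussians(n, rho):
    """(i) Two statistically dependent white Gaussian processes with
    cross-correlation rho, built from independent standard normals."""
    xs, ys = [], []
    for _ in range(n):
        a, b = random.gauss(0, 1), random.gauss(0, 1)
        xs.append(a)
        ys.append(rho * a + math.sqrt(1 - rho * rho) * b)
    return xs, ys

def ar1_filter(xs, phi):
    """(ii) Introduce auto-correlation with a first-order linear (AR(1))
    filter; the innovation is scaled so the output keeps unit variance."""
    out, prev, scale = [], 0.0, math.sqrt(1 - phi * phi)
    for x in xs:
        prev = phi * prev + scale * x
        out.append(prev)
    return out

def poly_shape(xs, coeffs):
    """(iii) Sample-wise polynomial transform shaping the distribution."""
    return [sum(c * x ** k for k, c in enumerate(coeffs)) for x in xs]

def corr(a, b):
    """Sample correlation coefficient."""
    n, ma, mb = len(a), sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a))
    vb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb)

g1, g2 = correlated_gaussians(5000, rho=0.8)
cross = corr(g1, g2)                     # should be near 0.8
f1 = ar1_filter(g1, phi=0.9)
lag1 = corr(f1[:-1], f1[1:])             # should be near 0.9
t1 = poly_shape(f1, [1.0, 0.5, 0.25])    # hypothetical polynomial coefficients
```

Because the three stages are independent, each can be fitted separately to measured cross-correlations, auto-correlations, and marginals, which is the point the abstract makes.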
|
Performance evaluation of the random replacement policy for networks of caches |
|
Massimo Gallo, Bruno Kauffmann, Luca Muscariello, Alain Simonian, Christian Tanguy |
|
Pages: 395-396 |
|
doi>10.1145/2254756.2254810 |
|
Full text: PDF |
|
Caching is a key component of Content Distribution Networks and new Information-Centric Network architectures. In this paper, we address performance issues of caching networks running the RND replacement policy. We first prove that when the popularity distribution follows a general power law with decay exponent α > 1, the miss probability is asymptotic to O(C^(1-α)) for large cache size C. We further evaluate networks of caches under the RND policy for homogeneous tree topologies and extend the analysis to tandem cache networks where caches employ either LRU or RND policies.
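As a sanity check on this kind of power-law scaling, a single random-replacement cache under Zipf demand is straightforward to simulate. The sketch below is our own illustration, not the paper's model; the item count, request count, and α = 1.5 are arbitrary choices.

```python
import random

random.seed(7)

def zipf_sampler(n_items, alpha):
    """Zipf popularity: P(rank k) proportional to k^(-alpha)."""
    weights = [k ** (-alpha) for k in range(1, n_items + 1)]
    total = sum(weights)
    cdf, acc = [], 0.0
    for w in weights:
        acc += w / total
        cdf.append(acc)
    def sample():
        u, lo, hi = random.random(), 0, n_items - 1
        while lo < hi:                      # binary search in the CDF
            mid = (lo + hi) // 2
            if cdf[mid] < u:
                lo = mid + 1
            else:
                hi = mid
        return lo
    return sample

def rnd_miss_ratio(cache_size, n_requests=200_000, n_items=2000, alpha=1.5):
    """RND policy: on a miss with a full cache, evict a uniformly
    random resident item."""
    sample = zipf_sampler(n_items, alpha)
    cache, resident, misses = [], set(), 0
    for _ in range(n_requests):
        item = sample()
        if item not in resident:
            misses += 1
            if len(cache) >= cache_size:
                victim = random.randrange(len(cache))
                resident.discard(cache[victim])
                cache[victim] = item
            else:
                cache.append(item)
            resident.add(item)
    return misses / n_requests

m_small, m_large = rnd_miss_ratio(50), rnd_miss_ratio(200)
```

With α = 1.5 the C^(1-α) asymptotic predicts the miss ratio should roughly halve when the cache grows fourfold, which the simulation approximately reproduces.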
|
Saving on cooling: the thermal scheduling problem |
|
Koyel Mukherjee, Samir Khuller, Amol Deshpande |
|
Pages: 397-398 |
|
doi>10.1145/2254756.2254811 |
|
Full text: PDF |
|
|
|
Congestion control meets medium access: throughput, delay, and complexity |
|
Shreeshankar Bodas, Devavrat Shah, Damon Wischik |
|
Pages: 399-400 |
|
doi>10.1145/2254756.2254812 |
|
Full text: PDF |
|
This paper looks at the problem of designing a medium access algorithm for wireless networks with the objective of providing high throughput and low delay to users, while requiring only a modest computational effort at the transmitters and receivers. Additive inter-user interference at the receivers is an important physical-layer characteristic of wireless networks. Today's Wi-Fi networks are based upon an abstraction of the physical layer in which inter-user interference is treated as noise, leading to the 'collision' model in which users are required to coordinate their transmissions through Carrier Sensing Multiple Access (CSMA)-based schemes to avoid interference. This, in turn, leads to an inherent performance trade-off [1]: it is impossible to obtain high throughput and low delay by means of a low-complexity medium access algorithm (unless P=NP). As the main result, we establish that this trade-off is primarily due to treating interference as noise in the current wireless architecture. Concretely, we develop a simple medium access algorithm that allows simultaneous transmissions of users to the same receiver by performing joint decoding at receivers, over time. For a receiver to be able to decode multiple transmissions quickly enough, we develop appropriate congestion control in which each transmitter maintains a "window" of undecoded transmitted data that is adjusted based upon "feedback" from the receiver. In summary, this provides an efficient, low-complexity "online" code operating at a varying rate, and the system as a whole experiences only a small amount of delay (including decoding time) while operating at high throughput.
|
Optimized cloud placement of virtual clusters using biased importance sampling |
|
Asser N. Tantawi |
|
Pages: 401-402 |
|
doi>10.1145/2254756.2254813 |
|
Full text: PDF |
|
We introduce an algorithm for the placement of constrained, networked virtual clusters in the cloud that is based on importance sampling (also known as the cross-entropy method). Rather than using a straightforward implementation of this technique, which proved inefficient, we considerably enhance the method by biasing the sampling process to incorporate the communication needs and other constraints of placement requests, yielding an efficient algorithm that is linear in the size of the cloud. We investigate the quality of the results of using our algorithm on a simulated cloud.
|
Power and energy containers for multicore servers |
|
Kai Shen, Arrvindh Shriraman, Sandhya Dwarkadas, Xiao Zhang |
|
Pages: 403-404 |
|
doi>10.1145/2254756.2254814 |
|
Full text: PDF |
|
Power capping and energy efficiency are critical concerns in server systems, particularly when serving dynamic workloads on resource-sharing multicores. We present a new operating system facility (power and energy containers) that accounts for and controls the power/energy usage of individual fine-grained server requests. This facility is enabled by novel techniques for multicore power attribution to concurrent tasks, measurement/modeling alignment to enhance predictability, and request power accounting and control.
|
Characterizing the impact of the workload on the value of dynamic resizing in data centers |
|
Kai Wang, Minghong Lin, Florin Ciucu, Adam Wierman, Chuang Lin |
|
Pages: 405-406 |
|
doi>10.1145/2254756.2254815 |
|
Full text: PDF |
|
Energy consumption imposes a significant cost on data centers, yet much of that energy is used to maintain excess service capacity during periods of predictably low load. As a result, there has recently been interest in developing designs that allow the service capacity to be dynamically resized to match the current workload. However, there is still much debate about the value of such approaches in real settings. In this paper, we show that the value of dynamic resizing is highly dependent on the statistics of the workload process. In particular, both slow time-scale non-stationarities of the workload (e.g., the peak-to-mean ratio) and fast time-scale stochasticity (e.g., the burstiness of arrivals) play key roles. To illustrate the impact of these factors, we combine optimization-based modeling of the slow time-scale with stochastic modeling of the fast time-scale. Within this framework, we provide both analytic and numerical results characterizing when dynamic resizing does (and does not) provide benefits.
|
Provisioning for large scale cloud computing services |
|
Yue Tan, Yingdong Lu, Cathy H. Xia |
|
Pages: 407-408 |
|
doi>10.1145/2254756.2254816 |
|
Full text: PDF |
|
Resource provisioning, the task of planning sufficient amounts of resources to meet service level agreements, has become an important management task in emerging cloud computing services. In this paper, we present a stochastic modeling approach to guide the resource provisioning task for future service clouds as the demand grows large. We focus on on-demand services and consider service availability as the key quality of service constraint. A specific scenario under consideration is when resources can be measured in base instances. We develop an asymptotic provisioning methodology that utilizes tight performance bounds for the Erlang loss system to determine the minimum capacity levels that meet the service availability requirements. We show that our provisioning solutions are not only asymptotically exact but also provide better QoS guarantees at all load conditions.
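The Erlang loss model underlying this style of provisioning is classical, so a minimal sketch is easy to give. This is our own illustration of the baseline exact computation, not the authors' asymptotic bounds: compute the Erlang B blocking probability by its standard numerically stable recursion and search for the smallest capacity meeting an availability target.

```python
def erlang_b(capacity, load):
    """Blocking probability of an Erlang loss (M/G/c/c) system with
    `capacity` servers and `load` offered Erlangs, via the standard
    recursion B(0) = 1, B(n) = a*B(n-1) / (n + a*B(n-1))."""
    b = 1.0
    for n in range(1, capacity + 1):
        b = load * b / (n + load * b)
    return b

def min_capacity(load, target_blocking):
    """Smallest number of base instances whose blocking probability
    (a request finding no free instance) meets the availability target."""
    c = 1
    while erlang_b(c, load) > target_blocking:
        c += 1
    return c

# Example: offered load of 100 Erlangs with a 1% blocking target.
c = min_capacity(100.0, 0.01)
```

A linear search suffices here because blocking is monotone decreasing in capacity; for very large systems a bisection or the paper's asymptotic bounds would be used instead.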
|
Distributed wide-area traffic management for cloud services |
|
Srinivas Narayana, Joe Wenjie Jiang, Jennifer Rexford, Mung Chiang |
|
Pages: 409-410 |
|
doi>10.1145/2254756.2254817 |
|
Full text: PDF |
|
The performance of interactive cloud services depends heavily on which data centers handle client requests and which wide-area paths carry traffic. While making these decisions, cloud service providers also need to weigh operational considerations like electricity and bandwidth costs and the balance of server load across replicas. We argue that selecting data centers and network routes independently, as is common in today's services, can lead to much lower performance or higher costs than a coordinated decision. However, fine-grained joint control of two large distributed systems---e.g., DNS-based replica mapping and data center multi-homed routing---can be administratively challenging. In this paper, we introduce the design of a system that jointly optimizes replica mapping and multi-homed routing, while retaining the functional separation that exists between them today. We show how to construct a provably optimal distributed solution implemented through local computations and message exchanges between the mapping and routing systems.
|
On the efficacy of fine-grained traffic splitting protocols in data center networks |
|
Advait Abhay Dixit, Pawan Prakash, Ramana Rao Kompella, Charlie Hu |
|
Pages: 411-412 |
|
doi>10.1145/2254756.2254818 |
|
Full text: PDF |
|
Current multipath routing techniques split traffic at a per-flow level because, according to conventional wisdom, forwarding packets of a TCP flow along different paths leads to packet reordering, which is detrimental to TCP. In this paper, we revisit this "myth" in the context of cloud data center networks, which have regular topologies such as multi-rooted trees. We argue that due to the symmetry of the multiple equal-cost paths in such networks, simply spraying the packets of a given flow among all equal-cost paths leads to balanced queues across the paths and, consequently, little packet reordering. Using a testbed comprising NetFPGA switches, we show how cloud applications benefit from better network utilization in data centers.
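The load-balancing intuition behind packet spraying is easy to see in a toy sketch. This is our own illustration, ignoring queueing dynamics and packet sizes: round-robin spraying over four equal-cost paths yields perfectly balanced per-path packet counts, whereas per-flow hashing can leave paths unevenly loaded when flow hashes cluster.

```python
import random

random.seed(3)

N_PATHS = 4
N_FLOWS = 32
PKTS_PER_FLOW = 100

def queue_lengths(assign_path):
    """Count packets sent down each equal-cost path; with equal link
    rates, queue buildup is proportional to packets enqueued."""
    q = [0] * N_PATHS
    for flow in range(N_FLOWS):
        for pkt in range(PKTS_PER_FLOW):
            q[assign_path(flow, pkt)] += 1
    return q

# Per-flow ECMP: every packet of a flow follows one pseudo-randomly
# hashed path (modeled here by a random path per flow).
flow_hash = {f: random.randrange(N_PATHS) for f in range(N_FLOWS)}
q_ecmp = queue_lengths(lambda flow, pkt: flow_hash[flow])

# Per-packet spraying: each flow deals its packets round-robin.
q_spray = queue_lengths(lambda flow, pkt: pkt % N_PATHS)

imbalance_ecmp = max(q_ecmp) - min(q_ecmp)
imbalance_spray = max(q_spray) - min(q_spray)
```

With balanced queues across symmetric paths, packets of one flow see nearly equal delays, which is why spraying causes little reordering in such topologies.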
|
Content-aware traffic engineering |
|
Benjamin Frank, Ingmar Poese, Georgios Smaragdakis, Steve Uhlig, Anja Feldmann |
|
Pages: 413-414 |
|
doi>10.1145/2254756.2254819 |
|
Full text: PDF |
|
Recent studies show that a large fraction of Internet traffic originates from Content Providers (CPs) such as content distribution networks and hyper-giants. To cope with the increasing demand for content, CPs deploy massively distributed server infrastructures. Thus, content is available in many network locations and can be downloaded by traversing different paths in a network. Despite this prominent server-location and path diversity, the decisions on how CPs map users to servers and how ISPs perform traffic engineering are made independently. This leads to a lose-lose situation, as CPs are aware of neither the network bottlenecks nor the location of end-users, and ISPs struggle to cope with rapid traffic shifts caused by the dynamic CP server selection process.
In this paper we propose and evaluate Content-aware Traffic Engineering (CaTE), which dynamically adapts the traffic demand for content hosted by CPs by utilizing ISP network information and end-user locations during the server selection process. This leads to a win-win situation, because CPs are able to enhance their end-user-to-server mapping and ISPs gain the ability to partially influence the traffic demands in their networks. Indeed, our results using traces from a Tier-1 ISP show that a number of network metrics improve when utilizing CaTE.
|
Understanding performance anomalies of SSDs and their impact in enterprise application environment |
|
Jian Hu, Hong Jiang, Prakash Manden |
|
Pages: 415-416 |
|
doi>10.1145/2254756.2254820 |
|
Full text: PDF |
|
SSDs are known to have the erase-before-write and out-of-place update properties. When the number of invalidated pages exceeds a given threshold, a process referred to as garbage collection (GC) is triggered to erase blocks after the valid pages in these blocks are copied elsewhere. GC significantly degrades both the performance and the lifetime of an SSD because of the read-write-erase operation sequence.
In this paper, we conduct intensive experiments on a 120GB Intel 320 SATA SSD and a 320GB Fusion-io ioDrive PCI-E SSD to show and analyze the following important performance issues and anomalies.
First, the commonly accepted knowledge that performance drops sharply as more data is written is not always true. This is because GC efficiency, a more important factor affecting SSD performance, has not been carefully considered; it is defined as the percentage of invalid pages in a GC-erased block. It is possible to avoid the performance degradation by managing the addressable LBA range.
Second, estimating the residual lifetime of an SSD is a very challenging problem because it involves several interdependent and mutually interacting factors such as the FTL, GC, wear leveling, and workload characteristics. We develop an analytical model to estimate the residual lifetime of a given SSD.
Third, high random-read performance is widely accepted as one of the advantages of SSDs. We show that this is not true if the GC efficiency is low.
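The role of GC efficiency can be made concrete with a standard back-of-the-envelope relation (not taken from the paper): if the victim block a GC pass erases contains a fraction e of invalid pages, the remaining 1 - e valid pages must be copied before the erase, so on average each host page written costs about 1/e physical page writes.

```python
def write_amplification(gc_efficiency):
    """Approximate steady-state write amplification given GC efficiency,
    i.e. the fraction of invalid pages in a GC-erased block. Each erase
    frees e*P pages net but copies (1-e)*P valid pages, so per host page
    the flash absorbs (1-e)/e copies plus the page itself: 1/e total."""
    if not 0.0 < gc_efficiency <= 1.0:
        raise ValueError("GC efficiency must be in (0, 1]")
    return 1.0 / gc_efficiency

# High GC efficiency (mostly-invalid victim blocks): little amplification.
wa_good = write_amplification(0.9)   # about 1.11 physical writes per host write
# Low GC efficiency: each host write drags along many valid-page copies.
wa_bad = write_amplification(0.2)    # 5 physical writes per host write
```

This simple relation is why both performance and residual lifetime hinge on GC efficiency: low efficiency multiplies internal write traffic and wear.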
|
Classifying internet one-way traffic |
|
Eduard Glatz, Xenofontas Dimitropoulos |
|
Pages: 417-418 |
|
doi>10.1145/2254756.2254821 |
|
Full text: PDF |
|
In this work we analyze a massive data-set capturing 5.23 petabytes of traffic to shed light on the composition of one-way traffic towards a large network, based on a novel one-way traffic classifier. We find that one-way traffic makes up a very large fraction of all traffic in terms of flows, that it can be primarily attributed to malicious causes, and that it has declined since 2004 because of a relative decrease in scan traffic. In addition, we show how our classifier is useful for detecting network outages.
|
Fast cost efficient designs by building upon the Plackett and Burman method
|
Manish Arora, Feng Wang, Bob Rychlik, Dean Tullsen |
|
Pages: 419-420 |
|
doi>10.1145/2254756.2254822 |
|
Full text: PDF |
|
CPU processor design involves a large set of increasingly complex design decisions, and simulating all possible designs is typically not feasible. Sensitivity analysis, a commonly used technique, can be dependent on the starting point of the design and does not necessarily account for the cost of each parameter. This work proposes a method to simultaneously analyze multiple parameters with a small number of experiments by leveraging the Plackett and Burman (P&B) analysis method. It builds upon the technique in two specific ways: it allows a parameter to take multiple values, and it replaces the unit-less impact factor with cost-proportional values.
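To make the screening idea concrete: Plackett-Burman designs are orthogonal two-level designs, and for run counts that are powers of two they coincide with Hadamard matrices obtainable by the Sylvester construction. The sketch below is our own illustration with a hypothetical response function, not the paper's extension; it estimates the main effects of 7 parameters from only 8 simulation runs.

```python
def sylvester_hadamard(order):
    """Hadamard matrix of power-of-two order via the Sylvester
    construction: H_{2n} = [[H_n, H_n], [H_n, -H_n]]."""
    assert order > 0 and order & (order - 1) == 0, "order must be a power of two"
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

def main_effects(design, responses):
    """Effect of each factor: mean response at level +1 minus at -1.
    Orthogonal columns make each estimate independent of the others."""
    n = len(design)
    return [sum(design[r][f] * responses[r] for r in range(n)) * 2.0 / n
            for f in range(len(design[0]))]

H = sylvester_hadamard(8)
design = [row[1:] for row in H]  # drop the all-ones column: 8 runs, 7 factors
# Hypothetical simulator response: only factors 0 and 3 actually matter.
responses = [10 + 2 * row[0] - 1 * row[3] for row in design]
effects = main_effects(design, responses)  # approximately [4, 0, 0, -2, 0, 0, 0]
```

The eight runs correctly single out the two influential parameters, which is the screening property the abstract builds on; the paper's contributions (multi-valued parameters and cost-proportional impact factors) go beyond this basic setup.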
|
Multi-hop network tomography: path reconstruction and per-hop arrival time estimation from partial information |
|
Matthias Keller, Jan Beutel, Lothar Thiele |
|
Pages: 421-422 |
|
doi>10.1145/2254756.2254823 |
|
Full text: PDF |
|
In the context of low-power wireless sensor networks, this paper presents multi-hop network tomography (MNT), a novel, non-intrusive algorithm for reconstructing the path, the per-hop arrival order, and the per-hop arrival time of individual packets at runtime. While explicitly transmitting this information over the radio would negatively impact the performance of the system under investigation, information is instead reconstructed after packets have been received at the sink.
|
Smartphones vs. laptops: comparing web browsing behavior and the implications for caching |
|
Ioannis Papapanagiotou, Erich M. Nahum, Vasileios Pappas |
|
Pages: 423-424 |
|
doi>10.1145/2254756.2254824 |
|
Full text: PDF |
|
In this work we present the differences and similarities of web browsing behavior on the most common mobile platforms. We devise a novel Operating System (OS) fingerprinting methodology to distinguish different types of wireless devices (smartphones vs. laptops) as well as operating system instances (iOS, Android, BlackBerry, etc.). We show that most of the multimedia content on smartphone devices is delivered via Range Requests, and that a large portion of video transfers are aborted. We also show that laptop devices have more intelligent browser caching capabilities. We investigate the impact of an additional browser cache and demonstrate that a 10MB browser cache able to handle partial downloads would be enough to capture the majority of the savings on smartphones. Finally, we show that caching policies need to be amended to attain the maximum possible savings in proxy caches. With these optimizations, the emulated proxy cache provides 10%-20% in bandwidth savings.
|
TUTORIAL SESSION: Tutorials |
|
|
|
Micro and macro views of discrete-state Markov models and their application to efficient simulation with phase-type distributions
|
Philipp Reinecke, Miklós Telek, Katinka Wolter |
|
Pages: 425-426 |
|
doi>10.1145/2254756.2254826 |
|
Full text: PDF |
|
|
|
POTRA: a framework for building power models for next generation multicore architectures |
|
Ramon Bertran, Marc Gonzàlez, Xavier Martorell, Nacho Navarro, Eduard Ayguadé |
|
Pages: 427-428 |
|
doi>10.1145/2254756.2254827 |
|
Full text: PDF |
|
|
|
Basic theory and some applications of martingales |
|
Richard A. Hayden |
|
Pages: 429-430 |
|
doi>10.1145/2254756.2254828 |
|
Full text: PDF |
|
This tutorial surveys the fundamental results of the theory of martingales from the perspective of the performance engineer. We will present the fundamental results and illustrate their power through simple and elegant proofs of important and well-known results in performance analysis. The remainder of the tutorial will introduce the martingale functional central limit theorem and semi-martingale decomposition methodology for the characterization and proof of heavy-traffic limit results for Markovian queueing systems.
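A canonical example of the "simple and elegant proofs" such a tutorial covers (a standard textbook illustration, not taken from the abstract) is the gambler's-ruin computation via the optional stopping theorem:

```latex
Let $S_n = X_1 + \cdots + X_n$ with i.i.d.\ $X_i = \pm 1$, each with
probability $1/2$, so $(S_n)$ is a martingale. Start at $0$ and stop at
$T = \inf\{\, n : S_n = -a \text{ or } S_n = b \,\}$ for integers $a, b > 0$.
Optional stopping gives
\[
  0 = \mathbb{E}[S_0] = \mathbb{E}[S_T]
    = -a\,\Pr[S_T = -a] + b\,\Pr[S_T = b],
\]
and combined with $\Pr[S_T = -a] + \Pr[S_T = b] = 1$ this yields
\[
  \Pr[S_T = b] = \frac{a}{a+b}, \qquad \Pr[S_T = -a] = \frac{b}{a+b}.
\]
```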
|
Applications of machine learning to performance evaluation |
|
Edmundo de Souza e Silva, Daniel Sadoc Menasche |
|
Pages: 431-432 |
|
doi>10.1145/2254756.2254829 |
|
Full text: PDF |
|
|
|
Introduction to network experiments using the GENI cyberinfrastructure |
|
Jay Aikat, Kevin Jeffay |
|
Pages: 433-434 |
|
doi>10.1145/2254756.2254830 |
|
Full text: PDF |
|
In this tutorial, we will introduce the SIGMETRICS/Performance community to the vast testbeds, tools and resources openly available through the GENI (Global Environment for Network Innovations) project. We will present details about the distributed computing resources available on GENI for researchers interested in simulation as well as measurement-based performance evaluation experiments. We will demonstrate simple experiments on GENI, and leave attendees with information on how to run experiments for research and education using GENI resources.