| Shark: SQL and Rich Analytics at Scale |
26 Nov 2012 |
15 pages |
| Authors:
Reynold Xin; Josh Rosen; Matei Zaharia; Michael J Franklin; Scott Shenker; Ion Stoica; CALIFORNIA UNIV BERKELEY DEPT OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE
|
 | Shark is a new data analysis system that marries query processing with complex analytics on large clusters. It leverages a novel distributed memory abstraction to provide a unified engine that can run SQL queries and sophisticated analytics functions (e.g., iterative machine learning) at scale, and efficiently recovers from failures mid-query. This allows Shark to run SQL queries up to 100x faster than Apache Hive, and machine learning programs up to ... |
|
| A Million Cancer Genome Warehouse |
20 Nov 2012 |
62 pages |
| Authors:
David Haussler; David A Patterson; Mark Diekhans; Armando Fox; Michael Jordan; Anthony D Joseph; Singer Ma; Benedict Paten; Scott Shenker; Taylor Sittler; Ion Stoica; CALIFORNIA UNIV BERKELEY DEPT OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE
|
 | This white paper discusses the motivation and issues surrounding the development of a repository and associated computational infrastructure to house and process a million genomes to help battle cancer, which we call the Million Cancer Genome Warehouse. It is proposed as an example of an information commons and a computing system that will bring about precision medicine, coupling established clinical pathological indexes with state-of-the-art molecular profiling to create diagnostic, prognostic, ... |
|
| Cake: Enabling High-level SLOs on Shared Storage Systems |
07 Nov 2012 |
18 pages |
| Authors:
Andrew Wang; Shivaram Venkataraman; Sara Alspaugh; Randy H Katz; Ion Stoica; CALIFORNIA UNIV BERKELEY DEPT OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE
|
 | Cake is a coordinated, multi-resource scheduler for shared distributed storage environments with the goal of achieving both high throughput and bounded latency. Cake uses a two-level scheduling scheme to enforce high-level service-level objectives (SLOs). First-level schedulers control consumption of resources such as disk and CPU. These schedulers (1) provide mechanisms for differentiated scheduling, (2) split large requests into smaller chunks, and (3) limit the number of outstanding device requests, which ... |
|
| Multi-Resource Fair Queueing for Packet Processing |
19 Jun 2012 |
35 pages |
| Authors:
Ali Ghodsi; Vyas Sekar; Matei Zaharia; Ion Stoica; CALIFORNIA UNIV BERKELEY DEPT OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE
|
 | Middleboxes are ubiquitous in today's networks and perform a variety of important functions, including IDS, VPN, firewalling, and WAN optimization. These functions differ vastly in their requirements for hardware resources (e.g., CPU cycles and memory bandwidth). Thus, depending on the functions they go through, different flows can consume different amounts of a middlebox's resources. While there is much literature on weighted fair sharing of link bandwidth to isolate flows, it ... |
|
| Hypervisors as a Foothold for Personal Computer Security: An Agenda for the Research Community |
13 Jan 2012 |
8 pages |
| Authors:
Matei Zaharia; Sachin Katti; Chris Grier; Vern Paxson; Scott Shenker; Ion Stoica; Dawn Song; CALIFORNIA UNIV BERKELEY DEPT OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE
|
 | The purpose of this paper is to propose the creation of a security-enhancing hypervisor for PCs as a collaborative agenda for the research community. This agenda is not necessarily about answering fundamentally new research questions. Rather, it is a call to action about a rare chance for the community to have substantial impact. If researchers demonstrate compelling near-term benefits from a modest security layer, then OS vendors may adopt such ... |
|
| Probabilistically Bounded Staleness for Practical Partial Quorums |
03 Jan 2012 |
15 pages |
| Authors:
Peter Bailis; Shivaram Venkataraman; Joseph M Hellerstein; Michael Franklin; Ion Stoica; CALIFORNIA UNIV BERKELEY DEPT OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE
|
 | Modern storage systems employing quorum replication are often configured to use partial, non-strict quorums. These systems wait only for a subset of their replicas to respond to a request before returning an answer, without guaranteeing that read and write replica sets intersect. While these partial quorum mechanisms provide only basic eventual consistency guarantees, with no limit to the recency of data returned, these configurations are frequently good enough for practitioners ... |
|
| Exploring Congestion Control |
May 2002 |
24 pages |
| Authors:
Aditya Akella; Srinivasan Seshan; Scott Shenker; Ion Stoica; CARNEGIE-MELLON UNIV PITTSBURGH PA SCHOOL OF COMPUTER SCIENCE
|
 | From the early days of modern congestion control, ushered in by the development of TCP's and DECbit's congestion control algorithm and by the pioneering theoretical analysis of Chiu and Jain, there has been widespread agreement that linear additive-increase-multiplicative-decrease (AIMD) control algorithms should be used. However, the early congestion control design decisions were made in a context where loss recovery was fairly primitive (e.g., TCP Reno) and often timed out when more ... |
|