8th Parallel Data Storage Workshop (PDSW13)
Held in conjunction with SC13
General Chairs: John Bent, EMC and Robert Ross, Argonne National Laboratory
Monday, November 18, 2013
The proceedings of the 8th PDSW are now online in the ACM Digital Library.
WORKSHOP ABSTRACT
Peta- and exascale computing infrastructures make unprecedented demands on storage capacity, performance, concurrency, reliability, availability, and manageability. This one-day workshop focuses on the data storage and management problems and emerging solutions found in peta- and exascale scientific computing environments, with special attention to issues in which community collaboration can be crucial for problem identification, workload capture, solution interoperability, standards with community buy-in, and shared tools. Addressing storage media ranging from tape, HDD, and SSD, to new media like NVRAM, the workshop seeks contributions on relevant topics, including but not limited to:
- performance and benchmarking
- failure tolerance problems and solutions
- APIs for high performance features
- parallel file systems
- high bandwidth storage architectures
- support for high velocity or complex data
- metadata intensive workloads
- autonomics for HPC storage
- virtualization for storage systems
- archival storage advances
- resource management innovations
- incorporation of emerging storage technologies
AGENDA
8:55am - 9:00am | Welcome – Dean Hildebrand, IBM and Karsten Schwan, Georgia Tech |
9:00am – 10:00am | Keynote Speaker: Nisha Talagala, Fusion-io, "The All-Flash Datacenter" | Abstract & Speaker Bio | Slides |
10:00am - 10:30am | POSTER SESSION 1 - List of participants and links to posters |
10:30am – 12:00pm | SESSION 1: Dealing the Cards | Chair: Meghan McClelland, Xyratex |
Efficient Transactions for Parallel Data Movement
Jay Lofstead, Sandia National Laboratories*; Jai Dayal, Georgia Institute of Technology; Ivo Jimenez, University of California, Santa Cruz; Carlos Maltzahn, University of California, Santa Cruz
Paper | Slides
Asynchronous Object Storage with QoS for Scientific and Commercial Big Data
Michael J. Brim, ORNL; David A. Dillow, ORNL; Sarp Oral, ORNL; Bradley W. Settlemyer, ORNL*; Feiyi Wang, ORNL
Paper | Slides
Performance and Scalability Evaluation of the Ceph Parallel File System
Feiyi Wang, Oak Ridge National Laboratory*; Mark Nelson, Inktank Inc.; Sarp Oral, Oak Ridge National Laboratory; Scott Atchley, Oak Ridge National Laboratory; Sage Weil, Inktank Inc.; Brad Settlemyer, Oak Ridge National Laboratory; Blake Caldwell, Oak Ridge National Laboratory; Jason Hill, Oak Ridge National Laboratory
Paper | Slides
12:00pm - 1:30pm | Lunch (not provided) |
1:30pm – 2:30pm | SESSION 2: Shuffling the Deck | Chair: John Bent, EMC |
Structuring PLFS for Extensibility
Chuck Cranor, Carnegie Mellon University; Milo Polte, WibiData; Garth Gibson, Carnegie Mellon University*
Paper | Slides
SDS: A Framework for Scientific Data Services
Bin Dong, Lawrence Berkeley National Laboratory, USA; Surendra Byna, Lawrence Berkeley National Laboratory, USA; Kesheng Wu, Lawrence Berkeley National Laboratory, USA
Paper | Slides
2:30pm – 3:00pm | Poster Presentations |
3:00pm - 3:30pm | POSTER SESSION 2 - List of participants and links to posters |
3:30pm – 5:00pm | SESSION 3: Playing with a Full Deck | Chair: Carlos Maltzahn, UCSC |
Predicting Intermediate Storage Performance for Workflow Applications
Lauro Beltrao Costa, University of British Columbia*; Samer Al-Kiswany, University of British Columbia; Abmar Barros, Universidade Federal de Campina Grande; Hao Yang, University of British Columbia; Matei Ripeanu, University of British Columbia
Paper | Slides
Active Data: A Data-Centric Approach to Data Life-Cycle Management
Anthony Simonet, INRIA/University of Lyon*; Gilles Fedak, INRIA/University of Lyon; Matei Ripeanu, University of British Columbia; Samer Al-Kiswany, University of British Columbia
Paper | Slides
Fourier-Assisted Machine Learning of Hard Disk Drive Access Time Models
Adam Crume, University of California, Santa Cruz*; Carlos Maltzahn, University of California, Santa Cruz; Lee Ward, Sandia National Laboratories; Thomas Kroeger, Sandia National Laboratories; Matthew Curry, Sandia National Laboratories; Ron Oldfield, Sandia National Laboratories
Paper | Slides
5:00pm - 5:30pm | Short Announcements |
SPECIAL NOTE: PRObE PRESENTATIONS
PRObE presentations, demos, and posters will be in the NMC booth (#1732) on the exhibitor floor at SC13 on Tuesday and Wednesday, 1:30 - 2:15 pm MST.
PROGRAM COMMITTEE:
Ahmed Amer, Santa Clara University
John Bent, EMC (General Chair)
Randal Burns, Johns Hopkins University
Andreas Dilger, Intel
Fred Douglis, EMC
Garth Gibson, Carnegie Mellon University and Panasas Inc.
Dean Hildebrand, IBM (PC Chair)
Peter Honeyman, University of Michigan
Song Jiang, Wayne State University
Sanjay Kumar, Intel
Carlos Maltzahn, University of California, Santa Cruz
Meghan Wingate McClelland, Xyratex
Ron Oldfield, Sandia National Laboratories
Narasimha Reddy, Texas A&M University
Robert Ross, Argonne National Laboratory (General Chair)
Karsten Schwan, Georgia Tech (PC Chair)
Keith A. Smith, NetApp
Yuan Tian, Oak Ridge National Laboratory
STEERING COMMITTEE:
John Bent, EMC
Scott Brandt, University of California, Santa Cruz
Evan J. Felix, Pacific Northwest National Laboratory
Garth A. Gibson, Carnegie Mellon University and Panasas Inc.
Gary Grider, Los Alamos National Laboratory
Peter Honeyman, University of Michigan
Bill Kramer, National Center for Supercomputing Applications, University of Illinois Urbana-Champaign
Darrell Long, University of California, Santa Cruz
Carlos Maltzahn, University of California, Santa Cruz
Rob Ross, Argonne National Laboratory
Philip C. Roth, Oak Ridge National Laboratory
John Shalf, National Energy Research Scientific Computing Center, Lawrence Berkeley National Laboratory
Lee Ward, Sandia National Laboratories
CALL FOR PROPOSALS: Availability of 1000 Nodes for Systems Research Experiments
NSF's PRObE (www.nmc-probe.org) operates four clusters to support systems research at scale. The largest is Kodiak (https://www.nmc-probe.org/wiki/Machines:Kodiak), which is 1000 nodes (two-core x86, 8 GB DRAM, two 1 TB disks, 1 GbE and 8 Gbps IB) donated by Los Alamos National Laboratory.
Today Kodiak is hosting researchers from Georgia Tech, Carnegie Mellon, and Los Alamos. Princeton researchers have published results from Kodiak at the most recent NSDI (Wyatt Lloyd, "Stronger Semantics for Low-Latency Geo-Replicated Storage," NSDI 2013). Researchers from U Central Florida, UT Austin, Georgia Tech, and Carnegie Mellon are using the PRObE staging clusters.
PRObE resources are intended for (infrastructure) systems researchers committed to public release of their research results, typically publishing in distributed systems (e.g., OSDI or SOSP), cloud computing (e.g., SoCC), supercomputing (e.g., SC or HPDC), storage (e.g., FAST), or networking (e.g., NSDI) venues.
PRObE resources are managed by Emulab (www.emulab.org), a cluster manager for allocating physical nodes that has been in use for systems research for over a decade (Brian White, "An Experimental Environment for Distributed Systems and Networks," OSDI 2002). Users start by porting and demonstrating their code on a 100-node staging cluster such as Denali, built from the same equipment donation from Los Alamos. With demonstrated success on a staging cluster and a compelling research goal, Kodiak can be requested and allocated, possibly exclusively, for hours to days.
To start using PRObE resources:
- visit www.nmc-probe.org to learn about the resources
- visit portal.nmc-probe.org to request a PRObE-specific Emulab account
- have a research leader or faculty member get an account and define a project on portal.nmc-probe.org
- use the Portal to get onto Denali: allocate a single-node experiment, log in to that node to customize and resave the OS image for your project, then launch a multi-node experiment to demonstrate your system at sub-100-node scale
- use https://www.nmc-probe.org/request/ to request a large allocation on Kodiak (this is a HotCRP paper-review web site, where your "paper" is a short justification of your research, your preparedness for using Kodiak, and your credentials and appropriateness for using NSF resources)
- PRObE managers will review, approve and schedule your use of large allocations of Kodiak time
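Emulab experiments like the Denali step above are defined with NS-2-style Tcl files. The sketch below shows a minimal two-node experiment; the node names, link parameters, and OS image identifier are illustrative assumptions, so check the Emulab documentation for the images and link types actually available on PRObE clusters:

```tcl
# Minimal Emulab experiment definition (NS-2 / OTcl syntax).
set ns [new Simulator]
source tb_compat.tcl          ;# Emulab testbed extensions

# Request two physical nodes (names are arbitrary).
set n0 [$ns node]
set n1 [$ns node]

# Connect them with a 1 Gbps, zero-delay link.
set link0 [$ns duplex-link $n0 $n1 1000Mb 0ms DropTail]

# Load an OS image on each node; the image name here is an
# assumption -- substitute your project's customized image.
tb-set-node-os $n0 UBUNTU12-64-STD
tb-set-node-os $n1 UBUNTU12-64-STD

$ns rtproto Static
$ns run
```

Once this runs cleanly at small scale, the same file can be scaled up (e.g., by generating the node list in a loop) for the multi-node demonstration that a Kodiak request expects.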
In a matter of weeks, another style of large PRObE resource will come online. Susitna is 34 nodes of 64-core x86 processors, for a total of more than 2000 x86 cores. Susitna also has NVIDIA-donated K20 GPU coprocessors with 2496 CUDA cores each, for a total of 84,864 CUDA cores. With 128 GB DRAM, a hard disk, and an SSD in each node, Susitna nodes are interconnected by 40 Gbps Ethernet, 40 Gbps InfiniBand, and 1 Gbps Ethernet.
NSF PRObE resources will be available for at least the next two years.
All users of PRObE resources are obligated to publish their results, either in conferences or on their web sites, and to acknowledge the NSF PRObE resources used in these publications.
See also our PRObE introduction article in USENIX ;login:, vol. 38, no. 3, June 2013 (www.cs.cmu.edu/~garth/papers/07_gibson_036-039_final.pdf).