Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems

PDSW-DISCS 2017:

2nd Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems

held in conjunction with SC17

Monday, November 13, 2017
Denver, CO

Program Co-Chairs:

Kathryn Mohror, Lawrence Livermore National Laboratory

Brent Welch, Google

General Chair:

Google


keynote speaker

PDSW-DISCS17 is proud to announce that Denis Serenyi, Google, will be our keynote speaker. He will be discussing "From GFS to Colossus: Cluster-Level Storage @ Google". Please see details here.

agenda

The proceedings of the 2nd PDSW-DISCS are now online in the ACM Digital Library.

8:50am – 9:00am	Welcome & Introduction
9:00am – 10:00am	Keynote Speaker - Denis Serenyi, Google From GFS to Colossus: Cluster-Level Storage @ Google Slides
10:00am – 10:30am	Break
10:30am – 12:00pm	SESSION 1: Improving Storage System Performance Chair: Suren Byna, Lawrence Berkeley National Laboratory
	EMPRESS—Extensible Metadata PRovider for Extreme-scale Scientific Simulations Margaret Lawson (Sandia National Laboratories and Dartmouth College) Jay Lofstead (Sandia National Laboratories) Scott Levy (Sandia National Laboratories) Patrick Widener (Sandia National Laboratories) Craig Ulmer (Sandia National Laboratories) Shyamali Mukherjee (Sandia National Laboratories) Gary Templet (Sandia National Laboratories) Todd Kordenbrock (DXC Technology) Paper \| Slides
	Taming Metadata Storms in Parallel Filesystems with MetaFS Tim Shaffer (University of Notre Dame) Douglas Thain (University of Notre Dame) Paper \| Slides
	Architecting HBM as a High Bandwidth, High Capacity, Self-Managed Last-Level Cache Tyler Stocksdale (North Carolina State University) Mu-Tien Chang (Samsung) Hongzhong Zheng (Samsung Semiconductor Inc.) Frank Mueller (NCSU) Paper \| Slides
11:45am – 12:00pm	WIP SESSION 1
	Direct-FUSE: Removing the Middleman for High-Performance FUSE File System Support Yue Zhu Teng Wang Kathryn Mohror Adam Moody Kento Sato Muhib Khan Weikuan Yu Abstract \| Slides
	Accelerating the Data Deduplication Performance with GPU in Hybrid Storage Systems Prince Hamandawana Awais Khan Changgyu Lee Sungyong Park Youngjae Kim Abstract \| Slides
	NUMA-Aware Thread and Resource Scheduling for Terabit Data Movement Taeuk Kim Awais Khan Youngjae Kim Sungyong Park Scott Atchley Abstract \| Slides
12:00pm – 1:30pm	Lunch (not provided)
1:30pm – 3:00pm	SESSION 2: Scalability of Storage Systems Chair: Carlos Maltzahn, University of California, Santa Cruz
	Optimized Scatter/Gather Data Operations for Parallel Storage Latchesar Ionkov (Los Alamos National Laboratory) Carlos Maltzahn (University of California, Santa Cruz) Michael Lang (Los Alamos National Laboratory) Paper \| Slides
	Software-Defined Storage for Fast Trajectory Queries using a DeltaFS Indexed Massive Directory Qing Zheng (Carnegie Mellon University) George Amvrosiadis (Carnegie Mellon University) Saurabh Kadekodi (Carnegie Mellon University) Garth A. Gibson (Carnegie Mellon University) Charles D. Cranor (Carnegie Mellon University) Bradley W. Settlemyer (Los Alamos National Laboratory) Gary Grider (Los Alamos National Laboratory) Fan Guo (Los Alamos National Laboratory) Paper \| Slides
	CoSS: Proposing a Contract-Based Storage System for HPC Matthieu Dorier (Argonne National Laboratory) Matthieu Dreher (Argonne National Laboratory) Tom Peterka (Argonne National Laboratory) Robert Ross (Argonne National Laboratory) Paper \| Slides
2:45pm – 3:00pm	WIP SESSION 2
	mpiFileUtils: A Parallel and Distributed Toolset for Managing Large Datasets Danielle Sikich Giuseppe Di Natale Matthew Legendre Adam Moody Abstract \| Slides
	Resource Requirement Specification for Novel Data-aware and Workflow-enabled HPC Job Schedulers Emmanouil Farsarakis Iakovos Panourgias Adrian Jackson Juan F. R. Herrera Michele Weiland Mark Parsons Abstract \| Slides
	A Study of NVRAM Performance Variability under Concurrent I/O Accesses Anthony Kougkas Hariharan Devarajan Xian-He Sun Abstract \| Slides
3:00pm – 3:30pm	Break
3:30pm – 5:10pm	SESSION 3: Understanding I/O Performance Chair: Elsa Gonsiorowski, Lawrence Livermore National Laboratory
	Diving into Petascale Production File Systems through Large Scale Profiling and Analysis Feiyi Wang (Oak Ridge National Laboratory) Hyogi Sim (Oak Ridge National Laboratory) Cameron Harr (Lawrence Livermore National Laboratory) Sarp Oral (Oak Ridge National Laboratory) Paper \| Slides
	Performance Analysis of Emerging Data Analytics and HPC Workloads Christopher Daley (Lawrence Berkeley National Laboratory) Sudip Dosanjh (Lawrence Berkeley National Laboratory) Prabhat (Lawrence Berkeley National Laboratory) Nicholas Wright (Lawrence Berkeley National Laboratory) Paper \| Slides
	Toward Scalable Monitoring on Large-Scale Storage for Software Defined Cyberinfrastructure Arnab K. Paul (Virginia Tech) Ryan Chard (Argonne National Laboratory) Kyle Chard (University of Chicago) Steven Tuecke (University of Chicago) Ali R. Butt (Virginia Tech) Ian Foster (Argonne National Laboratory, University of Chicago) Paper \| Slides
	UMAMI: A Recipe for Generating Meaningful Metrics through Holistic I/O Performance Analysis Glenn Lockwood (Lawrence Berkeley National Laboratory) Shane Snyder (Argonne National Laboratory) Wucherl Yoo (Lawrence Berkeley National Laboratory) Kevin Harms (Argonne National Laboratory) Zachary Nault (Argonne National Laboratory) Suren Byna (Lawrence Berkeley National Laboratory) Philip Carns (Argonne National Laboratory) Nicholas Wright (Lawrence Berkeley National Laboratory) Paper \| Slides
5:10pm – 5:25pm	Break
5:25pm – 6:00pm	WIP SESSION 3 Chair: Jay Lofstead, Sandia National Laboratories
	Exploring Use-cases for Non-Volatile Memories in support of HPC Resilience Onkar Patil Saurabh Hukerikar Frank Mueller Christian Engelmann Abstract \| Slides
	Evaluating Performance of Burst Buffer Models for Real-World Application Workloads in HPC Systems Harsh Khetawat Frank Mueller Christopher Zimmer Abstract \| Slides
	Towards Structure-Aware Earth System Data Management Jakob Lüttgau Julian Kunkel Bryan N. Lawrence Abstract \| Slides
	I/O Mini-apps, Compression, and I/O Libraries for Physics-based Simulations Sean Ziegeler Scot Breitenfeld Jose Renteria Jordan Henderson Abstract \| Slides
	Compiler-Assisted Scientific Workflow Optimization Hadia Ahmed Peter Pirkelbauer Purushotham Bangalore Anthony Skjellum Abstract \| Slides
	Micro-Storage Services for Open Ethernet Drive Hariharan Devarajan Anthony Kougkas Xian-He Sun Abstract \| Slides
	Comprehensive Burst Buffer Evaluation Eugen Betke Julian Kunkel Abstract \| Slides
	Virtualized Big Data: Reproducing Simulation Output on Demand Salvatore Di Girolamo Pirmin Schmid Thomas Schulthess Torsten Hoefler Abstract \| Slides
	Establishing the IO-500 Benchmark Julian Kunkel John Bent Jay Lofstead George S. Markomanolis Abstract \| Slides


WORKSHOP ABSTRACT

(Find the complete proposal outlining the merger between PDSW and DISCS here.)

We are pleased to announce that the second Joint International Workshop on Parallel Data Storage and Data Intensive Scalable Computing Systems (PDSW-DISCS’17) will be hosted at SC17: The International Conference for High Performance Computing, Networking, Storage and Analysis. The objective of this one-day joint workshop is to combine two overlapping communities and to better promote and stimulate researchers’ interactions to address some of the most critical challenges for scientific data storage, management, devices, and processing infrastructure for both traditional compute-intensive simulations and data-intensive high performance computing solutions. Special attention will be given to issues in which community collaboration can be crucial for problem identification, workload capture, solution interoperability, standards with community buy-in, and shared tools.

Many scientific problem domains continue to be extremely data intensive. Traditional high performance computing (HPC) systems and the programming models for using them such as MPI were designed from a compute-centric perspective with an emphasis on achieving high floating point computation rates. But processing, memory, and storage technologies have not kept pace and there is a widening performance gap between computation and the data management infrastructure. Hence data management has become the performance bottleneck for a significant number of applications targeting HPC systems. Concurrently, there are increasing challenges in meeting the growing demand for analyzing experimental and observational data. In many cases, this is leading new communities to look towards HPC platforms. In addition, the broader computing space has seen a revolution in new tools and frameworks to support Big Data analysis and machine learning.

There is a growing need for convergence between these two worlds. Consequently, the U.S. Office of Management and Budget has informed the U.S. Department of Energy that new machines beyond the first exascale machines must address both the traditional simulation workloads and data-intensive applications. This coming convergence prompts integrating these two workshops into a single entity to address the common challenges.

The scope of the joint PDSW-DISCS workshop is summarized as:

Scalable storage architectures, archival storage, storage virtualization, emerging storage devices and techniques
Performance benchmarking, resource management, and workload studies from production systems including both traditional HPC and data-intensive workloads
Programmability, APIs, and fault tolerance of storage systems
Parallel file systems, metadata management, and complex data management, object and key-value storage, and other emerging data storage/retrieval techniques
Programming models and frameworks for data intensive computing including extensions to traditional and nontraditional programming models, asynchronous multi-task programming models, or to data intensive programming models
Techniques for data integrity, availability and reliability especially
Productivity tools for data intensive computing, data mining and knowledge discovery
Application or optimization of emerging “big data” frameworks towards scientific computing and analysis
Techniques and architectures to enable cloud and container-based models for scientific computing and analysis
Techniques for integrating compute into a complex memory and storage hierarchy facilitating in situ and in transit data processing
Data filtering/compressing/reduction techniques that maintain sufficient scientific validity for large scale compute-intensive workloads
Tools and techniques for managing data movement among compute and data intensive components both solely within the computational infrastructure as well as incorporating the memory/storage hierarchy

CALL FOR PAPERS

CALL FOR PAPERS POSTER - download, print, and hang one up at your office / department

The Parallel Data Storage Workshop holds a peer-reviewed competitive process for selecting short papers. Submit a previously unpublished short paper of up to 5 pages (excluding references), set in a font of at least 10 points, as a PDF file as instructed on the workshop web site. Submitted papers will be reviewed under the supervision of the workshop program committee. Submissions should indicate authors and affiliations. Final papers must not be longer than 5 pages (excluding references). Selected papers and associated talk slides will be made available on the workshop web site; the papers will also be published in the digital library of the IEEE or ACM.

SUBMISSIONS

Paper Submissions: NOW CLOSED

* Submissions must be in the IEEE format (see http://www.ieee.org/conferences_events/conferences/publishing/templates.html).

Paper Submission Details:

The PDSW-DISCS Workshop holds a peer-reviewed competitive process for selecting short papers. Submit a previously unpublished short paper of up to 5 pages (excluding references), set in a font of at least 10 points, as a PDF file as instructed on the workshop web site. Submitted papers will be reviewed under the supervision of the workshop program committee. Submissions should indicate authors and affiliations. Papers must not be longer than 5 pages (excluding references). Selected papers and associated talk slides will be made available on the workshop web site; the papers will also be published in the digital libraries of the IEEE and ACM.

Work-in-progress (WIP) Submissions

WIP Submissions: NOW CLOSED

There will also be a WIP session at the workshop, where presenters give brief 5-minute talks on their ongoing work: fresh problems and solutions that may not yet be mature or complete enough for a paper submission. A 1-page abstract is required.

ATTENDING THE WORKSHOP

Please be aware that all attendees of the workshop, both speakers and participants, will have to pay the SC17 registration fee. Workshops are no longer included as part of the technical program registration. With a paid Technical Program registration, workshop fees are $50 for Members/Non-Members and $25 for Students. A workshop-only fee is available for $200 for Members/Non-Members and $100 for Students.

PROGRAM COMMITTEE:

Kathryn Mohror, Lawrence Livermore National Laboratory, Program Co-Chair
Brent Welch, Google, Program Co-Chair
Janine Bennett, Sandia National Laboratories
Angela Demke Brown, University of Toronto
Suren Byna, Lawrence Berkeley National Laboratory
Shane Canon, Lawrence Berkeley National Laboratory
Raghunath Raja Chandrasekar, Amazon Web Services
Yong Chen, Texas Tech University
Toni Cortes, Universitat Politècnica de Catalunya
Garth Gibson, Carnegie Mellon
Elsa Gonsiorowski, Lawrence Livermore National Laboratory
Bingsheng He, National University of Singapore
Shadi Ibrahim, Inria
Dries Kimpe, KCG
Jay Lofstead, Sandia National Laboratories
Xiaosong Ma, Qatar Computing Research Institute
Carlos Maltzahn, University of California, Santa Cruz
Suzanne McIntosh, New York University
Sangmi Pallickara, Colorado State University
Rob Ross, Argonne National Laboratory
Philip C. Roth, Oak Ridge National Laboratory
Kento Sato, Lawrence Livermore National Laboratory

STEERING COMMITTEE:

John Bent, Cray
Ali R. Butt, Virginia Tech
Shane Canon, Lawrence Berkeley National Laboratory
Yong Chen, Texas Tech University
Evan J. Felix, Pacific Northwest National Laboratory
Garth A. Gibson, Carnegie Mellon University
William D. Gropp, University of Illinois at Urbana-Champaign
Gary Grider, Los Alamos National Laboratory
Dean Hildebrand, Google
Dries Kimpe, KCG, USA
Jay Lofstead, Sandia National Laboratories
Darrell Long, University of California, Santa Cruz
Xiaosong Ma, Qatar Computing Research Institute, Qatar
Carlos Maltzahn, University of California, Santa Cruz
Robert Ross, Argonne National Laboratory
Philip C. Roth, Oak Ridge National Laboratory
John Shalf, NERSC, Lawrence Berkeley National Laboratory
Xian-He Sun, Illinois Institute of Technology
Rajeev Thakur, Argonne National Laboratory
Lee Ward, Sandia National Laboratories

