Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems

PDSW-DISCS 2016:

1st Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems

held in conjunction with SC16

Monday, November 14, 2016
Salt Lake City, UT

Time: 9:00 am - 6:00 pm

Salt Palace Convention Center
LOCATION: Room 155-C
SC Workshop page

Program Co-Chairs:

Shane Canon, Lawrence Berkeley National Laboratory

Dean Hildebrand, IBM

General Co-Chairs:

Garth A. Gibson, Carnegie Mellon University

Yong Chen, Texas Tech University


KEYNOTE SPEAKER

PDSW-DISCS'16 is proud to announce that Ion Stoica of UC Berkeley will be our keynote speaker. His talk is titled "Trends and Challenges in Big Data Processing."

AGENDA

The proceedings of the 1st PDSW-DISCS are now online in the IEEE Digital Library.

8:55am – 9:00am	Welcome & Introduction
9:00am – 10:00am	Keynote Speaker: Dr. Ion Stoica, UC Berkeley. Trends and Challenges in Big Data Processing. Slides
10:00am – 10:30am	Morning Break
10:30am – 12:00pm	SESSION 1: I/O Insight (Chair: Jay Lofstead, Sandia National Laboratories)
	Scientific Workflows at DataWarp-Speed: Accelerated Data-Intensive Science using NERSC's Burst Buffer. *Andrey Ovsyannikov (Lawrence Berkeley National Laboratory), Melissa Romanus (Rutgers University), Brian Van Straalen (Lawrence Berkeley National Laboratory), Gunther H. Weber (Lawrence Berkeley National Laboratory; University of California, Davis), David Trebotich (Lawrence Berkeley National Laboratory). Paper | Slides
	Parallel I/O Characterisation Based on Server-Side Performance Counters. *Salem El Sayed (Jülich Supercomputing Centre), Matthias Bolten (Institut für Mathematik), Dirk Pleiter (Jülich Supercomputing Centre), Wolfgang Frings (Jülich Supercomputing Centre). Paper | Slides
	Replicating HPC I/O Workloads with Proxy Applications. *James Dickson (University of Warwick), Steven Wright (University of Warwick), Satheesh Maheswaran (UK Atomic Weapons Establishment), Andy Herdman (UK Atomic Weapons Establishment), Mark C. Miller (Lawrence Livermore National Laboratory), Stephen Jarvis (University of Warwick). Paper | Slides
11:45am – 12:00pm	WIP SESSION 1
	Use of a New I/O Stack for Extreme-scale Systems in Scientific Applications. Michael Breitenfeld, Quincey Koziol, Neil Fortner, *Jerome Soumagne, Mohamad Chaarawi. Abstract | Slides
	Towards Optimizing Large-Scale Data Transfers with End-to-End Integrity Verification. Si Liu, Eun-Sung Jung, *Rajkumar Kettimuthu, Xian-He Sun. Abstract | Slides
	MarFS Metadata Scaling. Brett Kettering, *David Bonnie, Gary Grider, Hsing-Bung Chen, William Vining, Jeffrey Inman. Abstract | Slides
12:00pm – 1:30pm	Lunch (not provided)
1:30pm – 3:00pm	SESSION 2: Data Insight (Chair: John Bent, Seagate Government Solutions)
	Can Non-Volatile Memory Benefit MapReduce Applications on HPC Clusters? *Md. Wasi-ur-Rahman (Ohio State University), Nusrat Sharmin Islam (Ohio State University), Xiaoyi Lu (Ohio State University), Dhabaleswar K. (DK) Panda (Ohio State University). Paper | Slides
	FatMan vs. LittleBoy: Scaling up Linear Algebraic Operations in Scale-out Data Platforms. *Luna Xu (Virginia Tech), Seung-Hwan Lim (Oak Ridge National Laboratory), Ali R. Butt (Virginia Tech), Sreenivas R. Sukumar (Oak Ridge National Laboratory), Ramakrishnan Kannan. Paper | Slides
	Klimatic: A Virtual Data Lake for Harvesting and Distribution of Geospatial Data. *Tyler J. Skluzacek (University of Chicago), Kyle Chard (University of Chicago), Ian Foster (University of Chicago, Argonne National Laboratory). Paper | Slides
2:45pm – 3:00pm	WIP SESSION 2
	Interference-aware Scheduling for Data-processing Frameworks in Container-based Clusters. *Miguel Xavier, Cesar De Rose. Abstract | Slides
	Towards A Scalable, Resilient, and Efficient Data Service for Exascale Computing. Michael Brim, *Tonglin Li, Sarp Oral, Geoffroy Vallee, Feiyi Wang, Scott Atchley. Abstract | Slides
	Mero: Co-Designing an Object Store for Extreme Scale. Nikita Danilov, Nathan Rutman, *Sai Narasimhamurthy, John Bent. Abstract | Slides
3:00pm – 3:30pm	Afternoon Break
3:30pm – 6:00pm	SESSION 3: Performance and Testing Insight (Chair: Carlos Maltzahn, University of California, Santa Cruz)
	Get out of the Way! Applying Compression to Internal Data Structures. *Rob Latham (Argonne National Laboratory), Matthieu Dorier (Argonne National Laboratory), Rob Ross (Argonne National Laboratory). Paper | Slides
	Towards Energy Efficient Data Management in HPC: The Open Ethernet Drive Approach. *Anthony Kougkas (Illinois Institute of Technology), Anthony Fleck (Illinois Institute of Technology), Xian-He Sun (Illinois Institute of Technology). Paper | Slides
	A Generic Framework for Testing Parallel File Systems. Jinrui Cao (New Mexico State University), Simeng Wang (New Mexico State University), Dong Dai (Texas Tech University), *Mai Zheng (New Mexico State University), Yong Chen (Texas Tech University). Paper | Slides
	A Bloom Filter Based Scalable Data Integrity Check Tool for Large-scale Dataset. *Sisi Xiong (University of Tennessee Knoxville), Feiyi Wang (Oak Ridge National Laboratory), Qing Cao (University of Tennessee Knoxville). Paper | Slides
5:10pm – 6:00pm	WIP SESSION 3 / ANNOUNCEMENTS (Chair: Sarp Oral, Oak Ridge National Laboratory)
	Exploring Opportunities for Job-temporal File Systems with ADA-FS. Sebastian Oeste, *Michael Kluge, Mehmet Soysal, Achim Streit, Marc-André Vef, André Brinkmann. Abstract | Slides
	Time Taken to Write Data in Parallel on Lustre Follows Extreme Statistics. *Richard Henwood, N. W. Watkins, S. C. Chapman, R. McLay. Abstract | Slides
	Containerizing Byte-Addressable NVM. *Ellis Giles. Abstract | Slides
	Middleware for Earth System Data. *Julian Kunkel, Jakob Luettgau, Bryan Lawrence, Jens Jensen, Giuseppe Congiu, John Readey. Abstract | Slides
	Popper: Practical Reproducible Evaluation of Systems. *Ivo Jimenez, Michael Sevilla, Noah Watkins, Carlos Maltzahn, Jay Lofstead, Kathryn Mohror, Remzi Arpaci-Dusseau, Andrea Arpaci-Dusseau. Abstract | Slides
	Partially-Decompressible Dictionary Based Compression Format for All Flash Array. *Yosuke Oyama, Hiroki Ohtsuji, Jun Kato, Kosuke Suzuki, Mitsuru Sato, Eiji Yoshida. Abstract | Slides
	Toward an Architecture for mHealth Web Data Choreography. *Deger Cenk Erdil, Saranya Radhakrishnan. Abstract | Slides
	Implementation, Evaluation and Analysis of Block index for ADIOS. *Tzuhsien Wu, Jerry Chou, Norbert Podhorszki, Yuan Tian, Junmin Gu, Kesheng Wu. Abstract | Slides
	MPI-IO In-Memory Storage with the Kove XPD. *Julian Kunkel, Eugen Betke. Abstract | Slides

* = speaker

WORKSHOP ABSTRACT


We are pleased to announce that the first Joint International Workshop on Parallel Data Storage and Data Intensive Scalable Computing Systems (PDSW-DISCS’16) will be hosted at SC16: The International Conference for High Performance Computing, Networking, Storage and Analysis. The objective of this one-day joint workshop is to combine two overlapping communities and to better promote and stimulate interaction among researchers addressing some of the most critical challenges for scientific data storage, management, devices, and processing infrastructure, for both traditional compute-intensive simulations and data-intensive high performance computing solutions. Special attention will be given to issues in which community collaboration can be crucial for problem identification, workload capture, solution interoperability, standards with community buy-in, and shared tools.

Many scientific problem domains continue to be extremely data intensive. Traditional high performance computing (HPC) systems, and the programming models for using them such as MPI, were designed from a compute-centric perspective with an emphasis on achieving high floating point computation rates. But processing, memory, and storage technologies have not kept pace, and there is a widening performance gap between computation and the data management infrastructure. Hence, data management has become the performance bottleneck for a significant number of applications targeting HPC systems. Concurrently, there are increasing challenges in meeting the growing demand for analyzing experimental and observational data, which in many cases is leading new communities to look towards HPC platforms. In addition, the broader computing space has seen a revolution in new tools and frameworks to support Big Data analysis and machine learning.

There is a growing need for convergence between these two worlds. Consequently, the U.S. Office of Management and Budget has informed the U.S. Department of Energy that new machines beyond the first exascale machines must address both traditional simulation workloads and data-intensive applications. This coming convergence motivates integrating the two workshops into a single event that addresses their common challenges.

The scope of the proposed joint PDSW-DISCS workshop is summarized as follows:

Scalable storage architectures, archival storage, storage virtualization, emerging storage devices and techniques
Performance benchmarking, resource management, and workload studies from production systems including both traditional HPC and data-intensive workloads
Programmability, APIs, and fault tolerance of storage systems
Parallel file systems, metadata management, complex data management, object and key-value storage, and other emerging data storage/retrieval techniques
Programming models and frameworks for data-intensive computing, including extensions to traditional and nontraditional programming models, asynchronous multi-task programming models, or data-intensive programming models
Techniques for data integrity, availability and reliability especially
Productivity tools for data intensive computing, data mining and knowledge discovery
Application or optimization of emerging “big data” frameworks towards scientific computing and analysis
Techniques and architectures to enable cloud and container-based models for scientific computing and analysis
Techniques for integrating compute into a complex memory and storage hierarchy facilitating in situ and in transit data processing
Data filtering/compression/reduction techniques that maintain sufficient scientific validity for large-scale compute-intensive workloads
Tools and techniques for managing data movement among compute- and data-intensive components, both solely within the computational infrastructure and incorporating the memory/storage hierarchy

CALL FOR PAPERS


The PDSW-DISCS Workshop holds a peer-reviewed, competitive process for selecting short papers. Submit a previously unpublished short paper of up to 5 pages (not including references), in no smaller than 10-point font, as a PDF file as instructed on the workshop website. Submitted papers will be reviewed under the supervision of the workshop program committee. Submissions should indicate authors and affiliations. Final papers must not be longer than 5 pages (excluding references). Selected papers and associated talk slides will be made available on the workshop website; the papers will also be published in the digital library of the IEEE or ACM.


SUBMISSIONS

Paper Submissions: NOW CLOSED

DEADLINE EXTENDED! Paper (in PDF format) due Sunday, Sept. 11, 2016, 11:59 PM AoE
Notification: Friday, Sept. 30, 2016
Camera ready and copyright forms due: Friday, Oct. 7, 2016
Slides due before workshop: Sunday, Nov. 13, 2016, 5:00 pm PST
* Submissions must be in the IEEE format (see http://www.ieee.org/conferences_events/conferences/publishing/templates.html).

Paper Submission Details:

The PDSW-DISCS Workshop holds a peer-reviewed, competitive process for selecting short papers. Submit a previously unpublished short paper of up to 5 pages (not including references), in no smaller than 10-point font, as a PDF file as instructed on the workshop website. Submitted papers will be reviewed under the supervision of the workshop program committee. Submissions should indicate authors and affiliations. Papers must not be longer than 5 pages (excluding references). Selected papers and associated talk slides will be made available on the workshop website; the papers will also be published in the digital libraries of the IEEE and ACM.

Work-in-progress (WIP) Submissions

WIP Submissions: NOW CLOSED

The workshop will also include a WIP session, where presenters give brief 5-minute talks on ongoing work with fresh problems or solutions that may not yet be mature or complete enough for a paper submission. A 1-page abstract is required.

WIP Submission Deadline: Tuesday, Nov. 1, 2016
WIP Notification: Monday, Nov. 7, 2016

ATTENDING THE WORKSHOP

Please be aware that all workshop attendees, both speakers and participants, must pay the SC16 registration fee. Workshops are no longer included as part of the technical program registration. With a paid Technical Program registration, workshop fees are $50 for Members/Non-Members and $25 for Students. A workshop-only fee is available for $200 for Members/Non-Members and $100 for Students.

To attend the workshop, please register through the Supercomputing '16 registration page. Registration opens in July.

PROGRAM COMMITTEE:

Shane Canon, Lawrence Berkeley National Lab, Program Co-Chair
Dean Hildebrand, IBM Research, Program Co-Chair
Jialin Liu, Lawrence Berkeley National Lab, Publications Chair
Gabriel Antoniu, INRIA
John Bent, Seagate Government Solutions
André Brinkmann, Universität Mainz
Ali R. Butt, Virginia Tech
Pietro Cicotti, SDSC
Toni Cortes, Universitat Politècnica de Catalunya
Andreas Dilger, Intel
Shuibing He, Wuhan University
Quincey Koziol, Lawrence Berkeley National Laboratory
Julian Kunkel, DKRZ
John Leidel, Texas Tech University
Jay Lofstead, Sandia National Laboratories
Carlos Maltzahn, University of California, Santa Cruz
Suzanne McIntosh, NYU
Sarp Oral, Oak Ridge National Laboratory
Ioan Raicu, Illinois Institute of Technology
Robert Ross, Argonne National Laboratory
Frank Schmuck, IBM Research
Douglas Thain, University of Notre Dame
Brent Welch, Google
Meghan Wingate McClelland, Seagate
Seung Woo Son, University of Massachusetts-Lowell
Ming Zhao, Arizona State University

STEERING COMMITTEE:

John Bent, Seagate Government Solutions
Ali R. Butt, Virginia Tech
Yong Chen, Texas Tech University, General Co-Chair
Evan J. Felix, Pacific Northwest National Laboratory
Garth A. Gibson, Carnegie Mellon University, General Co-Chair
William D. Gropp, University of Illinois at Urbana-Champaign
Gary Grider, Los Alamos National Laboratory
Dean Hildebrand, IBM Research
Dries Kimpe, KCG, USA
Jay Lofstead, Sandia National Laboratories
Darrell Long, University of California, Santa Cruz
Xiaosong Ma, Qatar Computing Research Institute, Qatar
Carlos Maltzahn, University of California, Santa Cruz
Robert Ross, Argonne National Laboratory
Philip C. Roth, Oak Ridge National Laboratory
John Shalf, National Energy Research Scientific Computing Center, Lawrence Berkeley National Laboratory
Xian-He Sun, Illinois Institute of Technology
Rajeev Thakur, Argonne National Laboratory
Lee Ward, Sandia National Laboratories
