pdsw 2023:

8th International Parallel Data Systems Workshop


HELD IN CONJUNCTION WITH SC23: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS

In cooperation with IEEE Computer Society & THE ASSOCIATION FOR COMPUTING MACHINERY


DATE: SUNDAY, November 12, 2023
Colorado Convention Center
DENVER, CO

Time: 2:00 PM - 5:30 PM (MST)
ROOM 607
SC Workshop page


 

Program Co-Chairs:

Microsoft, USA


The Ohio State University, USA

Reproducibility Co-Chairs:


DePaul University, USA


Lawrence Berkeley National Laboratory, USA
General Chair:

Hong Kong Baptist University, China

Publicity Chair:

EPFL, Switzerland

Web & Publications Chair:

Carnegie Mellon University, USA

PDSW23 Reproducibility Addendum (updated Aug 29, 2023)
NEW SUBMISSION DEADLINE: AUG 6, 2023
- closed


Invited speaker:


Gokul Soundararajan, AWS

AWS - Running Amazon Redshift at Scale
Amazon Redshift is a high-performance, secure, scalable, and highly available managed data-warehouse service. In this talk, we explore practical aspects of running Amazon Redshift at scale.


agenda


You may also view the official agenda on the SC workshop page for the latest information and abstracts for each of the talks.

2:00pm - 2:05pm PDSW'23 - Organizers' Welcome & Introduction

INVITED TALK:
2:05pm - 2:45pm Invited Speaker: Gokul Soundararajan, AWS
AWS - Running Amazon Redshift at Scale
Slides
MAIN SESSION:
2:45pm - 2:50pm [WiP] Toward Standardized, Open Object-Based Computational Storage For Large-Scale Scientific Data Analytics
Qing Zheng, Los Alamos National Laboratory
Jason Lee, Los Alamos National Laboratory
Dominic A. Manno, Los Alamos National Laboratory
Gary Grider, Los Alamos National Laboratory
Abstract | Slides

2:50pm - 2:55pm [WiP] DAOS as HPC Storage: Exploring Interfaces
Adrian Jackson, University of Edinburgh
Nicolau Manubens Gil, European Centre for Medium-Range Weather Forecasts
Abstract | Slides

2:55pm - 3:00pm [WiP] Towards a Peer-to-Peer Data Distribution Layer for Efficient and Collaborative Resource Optimization of Distributed Dataflow Applications
Dominik Scheinert, Technische Universität Berlin
Soeren Becker, Technische Universität Berlin
Jonathan Will, Technische Universität Berlin
Luis Englaender, Technische Universität Berlin
Lauritz Thamsen, University of Glasgow
Abstract | Slides

3:00pm - 3:30pm Afternoon Break

3:30pm - 3:48pm Enhancing Metadata Transfer Efficiency: Unlocking the Potential of DAOS in the ADIOS Context
Ranjan Sarpangala Venkatesh, Georgia Institute of Technology
Greg Eisenhauer, Georgia Institute of Technology
Scott Klasky, Oak Ridge National Laboratory
Ada Gavrilovska, Georgia Institute of Technology
Paper | Slides

3:48pm - 4:06pm IOMax: Maximizing Out-of-Core I/O Analysis Performance on HPC Systems
Izzet Yildirim, Illinois Institute of Technology
Hariharan Devarajan, Lawrence Livermore National Laboratory
Anthony Kougkas, Illinois Institute of Technology
Xian-He Sun, Illinois Institute of Technology
Kathryn Mohror, Lawrence Livermore National Laboratory
Paper | Slides

4:06pm - 4:11pm [WiP] Domain-aware Performant AI-based Compression
Boyuan Zhang, Indiana University
Luanzheng Guo, Pacific Northwest National Laboratory
Nathan R. Tallent, Pacific Northwest National Laboratory
Jan Strube, Pacific Northwest National Laboratory
Dingwen Tao, Indiana University
Abstract | Slides

4:11pm - 4:29pm The I/O Trace Initiative: Building a Collaborative I/O Archive to Advance HPC
Nafiseh Moti, Johannes Gutenberg University Mainz
André Brinkmann, Johannes Gutenberg University Mainz
Marc-André Vef, Johannes Gutenberg University Mainz
Philippe Deniel, CEA
Jesus Carretero, Universidad Carlos III de Madrid
Philip Carns, Argonne National Laboratory
Jean-Thomas Acquaviva, DataDirect Networks
Reza Salkhordeh, Johannes Gutenberg University Mainz
Paper | Slides

4:29pm - 4:47pm GrIOt: Graph-based Modeling of HPC Application I/O Call Stacks for Predictive Prefetch
Louis-Marie Nicolas, Atos BDS R&D Data Management
Salim Mimouni, Atos BDS R&D Data Management
Philippe Couvée, Atos BDS R&D Data Management
Jalil Boukhobza, ENSTA Bretagne, Lab-STICC, CNRS
Paper | Slides

4:47pm - 4:52pm [WiP] Advancing Automated I/O Analysis with Multi-Perspective Views
Izzet Yildirim, Illinois Institute of Technology
Hariharan Devarajan, Lawrence Livermore National Laboratory
Anthony Kougkas, Illinois Institute of Technology
Xian-He Sun, Illinois Institute of Technology
Kathryn Mohror, Lawrence Livermore National Laboratory
Abstract | Slides

4:52pm - 5:10pm PoliMOR: A Policy Engine "Made-to-Order" for Automated and Scalable Data Management in Lustre
Anjus George, Oak Ridge National Laboratory
Christopher D. Brumgard, Oak Ridge National Laboratory
Rick Mohr, Oak Ridge National Laboratory
Ketan Maheshwari, Oak Ridge National Laboratory
James Simmons, Oak Ridge National Laboratory
Sarp Oral, Oak Ridge National Laboratory
Jesse Hanley, Oak Ridge National Laboratory
Paper | Slides

5:10pm - 5:15pm [WiP] Accelerate Stage-out in Single Shared Files from Node-local Burst-buffers
Kohei Sugihara, University of Tsukuba
Osamu Tatebe, University of Tsukuba
Abstract | Slides

5:15pm - 5:20pm [WiP] DAOS Project Update
Mohamad Chaarawi, Intel Corporation
Michael Hennecke, Intel Corporation
Abstract | Slides

5:20pm - 5:25pm [WiP] Compression of Scientific Simulation Data by Stochastic Basis Expansion - Example on Multiple Computer Systems
Kohei Fujita, University of Tokyo
Tsuyoshi Ichimura, University of Tokyo
Maddegedara Lalith, University of Tokyo
Muneo Hori, Japan Agency for Marine-Earth Science and Technology
Abstract | Slides

5:25pm - 5:30pm PDSW23 – Closing Remarks
Slides

WORKSHOP ABSTRACT


We are pleased to announce the 8th International Parallel Data Systems Workshop (PDSW’23). PDSW'23 will be hosted in conjunction with SC23: The International Conference for High Performance Computing, Networking, Storage and Analysis, in Denver, CO.

Efficient data storage and data management are crucial to scientific productivity in both traditional simulation-oriented HPC environments and Big Data analysis environments. The challenge is further exacerbated by the growing volume of experimental and observational data, the widening gap between the performance of computational hardware and storage hardware, and the emergence of new data-driven algorithms in machine learning. The goal of this workshop is to facilitate research that addresses the most critical challenges in scientific data storage and data processing. PDSW will continue to build on the successful tradition established by its predecessor workshops: the Petascale Data Storage Workshop (PDSW, 2006-2015) and the Data Intensive Scalable Computing Systems (DISCS, 2012-2015) workshop. These workshops were successfully combined in 2016, and the resulting joint workshop has attracted up to 38 full paper submissions and 140 attendees per year from 2016 to 2022.

We encourage the community to submit original manuscripts that:

  • introduce and evaluate novel algorithms or architectures,
  • inform the community of important scientific case studies or workloads, or
  • validate the reproducibility of previously published work.

Special attention will be given to issues in which community collaboration is crucial for problem identification, workload capture, solution interoperability, standardization, and shared tools. We also strongly encourage papers to share complete experimental environment information (software version numbers, benchmark configurations, etc.) to facilitate collaboration.

Topics of interest include the following:

  • Large-scale data caching architectures
  • Scalable architectures for distributed data storage, archival, and virtualization
  • The application of new data processing models and algorithms towards computing and analysis
  • Performance benchmarking, resource management, and workload studies
  • Enabling cloud and container-based models for scientific data analysis
  • Techniques for data integrity, availability, reliability, and fault tolerance
  • Programming models and big data frameworks for data intensive computing
  • Hybrid cloud/on-premise data processing
  • Cloud-specific data storage and transit costs and opportunities
  • Programmability of storage systems
  • Data filtering, compression, and reduction techniques
  • Data and metadata indexing and querying
  • Parallel file systems, metadata management, and complex data management
  • Integrating computation into the memory and storage hierarchy to facilitate in-situ and in-transit data processing
  • Alternative data storage models, including object stores and key-value stores
  • Productivity tools for data intensive computing, data mining, and knowledge discovery
  • Tools and techniques for managing data movement among compute and data intensive components
  • Cross-cloud data management
  • Storage system optimization and data analytics with machine learning
  • Innovative techniques and performance evaluation for new memory and storage systems


CALL FOR PAPERS

 

Call for papers available now (pdf).

SUBMISSION DEADLINE EXTENSION: NOW DUE AUG 6, 2023


Regular paper SUBMISSIONS

All papers will be evaluated by a competitive peer review process under the supervision of the workshop program committee. Selected papers and associated talk slides will be made available on the workshop web site. The papers will also be published in the SC23 Workshop Proceedings.

Authors of regular papers are strongly encouraged to submit Artifact Description (AD) Appendices that can help to reproduce and validate their experimental results. While the inclusion of the AD Appendices is optional for PDSW’23, submissions that are accompanied by AD Appendices will be given favorable consideration for the PDSW Best Paper award.

PDSW’23 follows the SC23 Reproducibility Initiative. Artifact Description (AD) Appendices for PDSW'23 submissions will use the SC23 format. The AD should include a field for one or more links to data repositories (zenodo, figshare, etc.) and code repositories (github, gitlab, bitbucket, etc.). For artifacts placed in the code repository, we encourage authors to follow the PDSW 2023 Reproducibility Addendum on how to structure the artifact, as this will make it easier for the reviewing committee and for future readers of the paper. For PDSW 2023, we WILL NOT be taking applications for badges or awarding them, due to time constraints. We will still provide reviews and feedback on the ADs, but Artifact Evaluation (AE) will not be reviewed.

Submit a previously unpublished paper as a PDF file, and indicate authors and affiliations. Papers may be up to 5 pages long in a font no smaller than 10 point, not including references and optional reproducibility appendices. Submission site: https://submissions.supercomputing.org/

Submissions due: July 30th, 2023, 11:59 PM AoE (original deadline; extended to Aug 6th, 2023). Papers must use the ACM conference paper template available at: https://www.acm.org/publications/proceedings-template

Deadlines - Regular Papers and Reproducibility Study Papers


Submissions due:
Aug 6th, 2023, 11:59 PM AoE - DEADLINE EXTENDED
Submissions website: https://submissions.supercomputing.org/
Notification: Sep. 8, 2023
Badge applications due: Sep 15, 2023
Artifact freeze: Sep 22, 2023
Camera ready files due:
Sep. 29, 2023, 11:59 PM AoE
Copyright forms due: TBD
Slides due before workshop: TBD


Work In Progress (WIP) Session


There will be a WIP session where presenters give brief 5-minute talks on their ongoing work, with fresh problems and solutions. WIP content is typically material that may not be mature or complete enough for a full paper submission, and it will not be included in the proceedings. A one-page abstract is required. Submission site: https://submissions.supercomputing.org/

Submissions due: Sept 15th, 2023, 11:59PM AoE
WIP Notification: On or before Sept 23rd, 2023



Workshop Registration

Registration opens July 12, 2023. To help you prepare, further details are available on registration pricing and on policies affecting registration changes and cancellations.


PROGRAM COMMITTEE:

 

  • Jean Luca Bez, Lawrence Berkeley National Laboratory
  • Jalil Boukhobza, National Institute of Advanced Technologies of Brittany (ENSTA Bretagne)
  • Wei Der Chien, University of Edinburgh
  • Dong Dai, University of North Carolina, Charlotte
  • Qian Gong, Oak Ridge National Laboratory
  • Luanzheng Guo, Pacific Northwest National Laboratory
  • Shadi Ibrahim, INRIA 
  • Tanzima Islam, Texas State University
  • Anthony Kougkas, Illinois Institute of Technology
  • Quincey Koziol, Amazon Web Services
  • Michael Kuhn, Otto von Guericke University Magdeburg, Germany
  • Wei-keng Liao, Northwestern University
  • Johann Lombardi, Intel Corporation
  • Xiaoyi Lu, University of California, Merced
  • Preeti Malakar, Indian Institute of Technology (IIT), Kanpur
  • Sarah M. Neuwirth, Goethe University Frankfurt, Juelich Supercomputing Centre (JSC)
  • Line Pouchard, Brookhaven National Laboratory
  • M. Mustafa Rafique, Rochester Institute of Technology
  • Woong Shin, Oak Ridge National Laboratory
  • Masahiro Tanaka, Microsoft Corporation
  • Osamu Tatebe, University of Tsukuba
  • Chen Wang, Lawrence Livermore National Laboratory
  • Qing Zheng, Los Alamos National Laboratory

STEERING COMMITTEE:

  • John Bent, Cray
  • Ali R. Butt, Virginia Tech
  • Philip Carns, Argonne National Laboratory
  • Shane Canon, Lawrence Berkeley National Laboratory
  • Raghunath Raja Chandrasekar, Amazon Web Services
  • Yong Chen, Texas Tech University
  • Evan J. Felix, Pacific Northwest National Laboratory
  • Gary Grider, Los Alamos National Laboratory
  • William D. Gropp, University of Illinois at Urbana-Champaign
  • Dean Hildebrand, Google
  • Shadi Ibrahim, Inria, France
  • Dries Kimpe, KCG, USA
  • Glenn Lockwood, Lawrence Berkeley National Laboratory
  • Jay Lofstead, Sandia National Laboratories
  • Xiaosong Ma, Qatar Computing Research Institute, Qatar
  • Carlos Maltzahn, University of California, Santa Cruz
  • Suzanne McIntosh, New York University
  • Kathryn Mohror, Lawrence Livermore National Laboratory
  • Robert Ross, Argonne National Laboratory
  • Philip C. Roth, Oak Ridge National Laboratory
  • Kento Sato, RIKEN, Japan
  • John Shalf, NERSC, Lawrence Berkeley National Laboratory
  • Xian-He Sun, Illinois Institute of Technology
  • Rajeev Thakur, Argonne National Laboratory
  • Lee Ward, Sandia National Laboratories
  • Brent Welch, Google
  • Amelie Chi Zhou, Hong Kong Baptist University, China