International Parallel Data Systems Workshop

pdsw 2024:

9th International
Parallel Data Systems Workshop

HELD IN CONJUNCTION WITH SC24: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS

In cooperation with IEEE Computer Society & THE ASSOCIATION FOR COMPUTING MACHINERY

DATE: Sunday, November 17, 2024
Georgia World Congress Center
Atlanta, GA

Time: 9:00 AM - 5:30 PM (EST)
ROOM B309
SC Workshop page

Program Co-Chairs:

The Ohio State University, USA

Illinois Institute of Technology, USA

Reproducibility Co-Chairs:

Lawrence Berkeley National Laboratory, USA

RWTH Aachen University, Germany

General Chair:

Microsoft, USA

Publicity Chair:

Oak Ridge National Laboratory, USA

Web & Publications Chair:

Carnegie Mellon University, USA

PDSW24 Reproducibility Addendum
SUBMISSION DEADLINE EXTENDED: AUG 9, 2024 - final deadline

Invited speaker: Dr. İlkay Altintaş, University of California, San Diego

agenda

Additional agenda information, slides, and abstracts will be posted here as they become available. The official agenda, with the latest information and an abstract for each talk, will also be available on the SC workshop page at a future date.

9am-9:10am	PDSW 2024 Welcome Bing Xie, Microsoft Slides
INVITED TALK:
9:10am- 10am	Invited Speaker: Bridging the Data Gaps in Computing for Science, Education and Society Dr. İlkay Altintaş, University of California, San Diego Slides
MAIN SESSION:
10am- 10:30am	Morning Break
10:30am- 11am	Fault-Tolerant Deep Learning Cache with Hash Ring for Load Balancing in HPC Systems Seoyeong Lee, Sogang University Awais Khan, Oak Ridge National Laboratory (ORNL) Yoochan Kim, Sogang University, South Korea Junghwan Park, Sogang University, South Korea Soon Hwang, Sogang University, South Korea Jae-Kook Lee, Korea Inst of Science and Technology Information (KISTI) Taeyoung Hong, Korea Inst of Science and Technology Information (KISTI) Chris Zimmer, Oak Ridge National Laboratory (ORNL) Youngjae Kim, Sogang University, South Korea Paper \| Slides
11am- 11:30am	MOSAIC: Detection and Categorization of I/O Patterns in HPC Applications Théo Jolivel, French Institute for Research in Computer Science and Automation (INRIA) François Tessier, INRIA Julien Monniot, INRIA Guillaume Pallez, INRIA Paper \| Slides
11:30am- 12pm	Exploring DAOS Interfaces and Performance Nicolau Manubens Gil, European Centre for Medium-Range Weather Forecasts (ECMWF) Johann Lombardi, DAOS Foundation Simon Smart, ECMWF Emanuele Danovaro, ECMWF Tiago Quintino, ECMWF Dean Hildebrand, Google Cloud Adrian Jackson, EPCC, The University of Edinburgh Paper \| Slides
12pm- 12:05pm	[WiP] Scalable RPC Layer Towards Millions of IOPS per Server Hiroki Ohtsuji, Fujitsu Limited Munenori Maeda, Fujitsu Limited Reika Kinoshita, Fujitsu Limited Masahiro Miwa, Fujitsu Limited Osamu Tatebe, University of Tsukuba Abstract \| Slides
12:05pm- 12:10pm	[WiP] Reducing I/O Bottleneck for Pretraining AI Foundation Models for Climate Gabriele Padovani, University of Trento, Italy Awais Khan, Oak Ridge National Laboratory, USA Sandro Fiore, University of Trento, Italy Valentine Anantharaj, Oak Ridge National Laboratory, USA Abstract \| Slides
12:10pm- 12:15pm	[WiP] BULKI - Binary Unified Layout for Key-value Interchange Wei Zhang, Lawrence Berkeley National Laboratory Houjun Tang, Lawrence Berkeley National Laboratory Suren Byna, The Ohio State University Abstract \| Slides
12:15pm- 12:20pm	[WiP] Distributed, Resilient and In-Memory Storage of Key-Value Data for HPC Rüdiger Nather, University of Kassel, Germany Mia Reitz, University of Kassel, Germany Claudia Fohry, University of Kassel, Germany Abstract \| Slides
12:20pm- 12:25pm	[WiP] A Global In-Memory Cache and Computation Tier for DAOS John Byrne, Hewlett Packard Enterprise, HPE Clarete Crasta, Hewlett Packard Enterprise, HPE Abhishek Dwaraki, Hewlett Packard Enterprise, HPE David Emberson, Hewlett Packard Enterprise, HPE Harumi Kuno, Hewlett Packard Enterprise, HPE Sekwon Lee, Hewlett Packard Enterprise, HPE Sharad Singhal, Hewlett Packard Enterprise, HPE Ramya Ahobala Rao, Hewlett Packard Enterprise, HPE Shreyas Vinayaka Basri K S, Hewlett Packard Enterprise, HPE Amitha C, Chinmay Ghosh, Hewlett Packard Enterprise, HPE Rishi Kesh Rajak, Hewlett Packard Enterprise, HPE Sriram Ravishankar, Hewlett Packard Enterprise, HPE Porno Shome, Hewlett Packard Enterprise, HPE Lance Evans, Hewlett Packard Enterprise, HPE Sherin George, Hewlett Packard Enterprise, HPE Kevan Rehm, Hewlett Packard Enterprise, HPE Myungjun (MJ) Son, Hewlett Packard Enterprise, HPE Taeklim Kim, Hewlett Packard Enterprise, HPE Shiyue (Jason) Hou, Hewlett Packard Enterprise, HPE Abstract \| Slides
12:25pm- 12:30pm	[WiP] Are Streaming Engines and Vector Databases Integrated Well? Yeonwoo Jeong, Sogang University, Republic of Korea Sungyong Park, Sogang University, Republic of Korea Abstract \| Slides
12:30pm- 2pm	Lunch Break
2pm-2:30pm	Initial Experiences With DAOS Object Storage on Aurora Rob Latham, Argonne National Laboratory Robert Ross, Argonne National Laboratory Philip Carns, Argonne National Laboratory Shane Snyder, Argonne National Laboratory Kevin Harms, Argonne National Laboratory Kaushik Velusamy, Argonne National Laboratory Paul Coffman, Argonne National Laboratory Gordon McPheeters, Argonne National Laboratory Paper \| Slides
2:30pm-3pm	Understanding and Predicting Cross-Application I/O Interference in HPC Storage Systems Chris Egersdoerfer, University of Delaware Hasanur Rashid, University of Delaware Dong Dai, University of Delaware Bo Fang, Pacific Northwest National Laboratory (PNNL) Nathan Tallent, Pacific Northwest National Laboratory (PNNL) Paper \| Slides
3pm-3:30pm	Afternoon Break
3:30pm-4pm	Copper: Cooperative Caching Layer for Scalable Data Loading in Exascale Supercomputers Noah Lewis, Ohio State University Kevin Harms, Argonne National Laboratory Kaushik Velusamy, Argonne National Laboratory Huihuo Zheng, Argonne National Laboratory Paper \| Slides
4pm-4:05pm	[WiP] Jarvis: Towards a Shared, User-Friendly, and Reproducible, I/O Infrastructure Jaime Cernuda, Luke Logan, Illinois Institute of Technology Noah Lewis, Illinois Institute of Technology Suren Byna, The Ohio State University Xian-He Sun, The Ohio State University Anthony Kougkas, Illinois Institute of Technology Abstract \| Slides
4:05pm- 4:10pm	[WiP] DAOS Project Update - One Year in the DAOS Foundation Michael Hennecke, Intel Corporation Johann Lombardi, DAOS Foundation Abstract \| Slides
4:10pm- 4:15pm	[WiP] Improving SQL Query Execution of Distributed Query Engines on Object-Based Computational Storage through Multi-Layere... Soon Hwang, Sogang University, Republic of Korea Junhyeok Park, Sogang University, Republic of Korea Junghyun Ryu, Sogang University, Republic of Korea Jungahn Park, Memory System Research, SK hynix Inc. Jeongjin Lee, Memory System Research, SK hynix Inc. Jungki Noh, Memory System Research, SK hynix Inc. Soonyeal Yang, Memory System Research, SK hynix Inc. Woosuk Chung, Memory System Research, SK hynix Inc. Youngjae Kim, Sogang University, Republic of Korea Abstract \| Slides
4:15pm- 4:20pm	[WiP] Lustre for Grace Hopper: Current Status Report Sohei Koyama, DataDirect Networks, Japan Shuichi Ihara, DataDirect Networks, Japan Abstract \| Slides
4:20pm- 4:25pm	[WiP] Exploring the Proactive Data Containers Runtime System in VAST - A Case Study Jean Luca Bez (Lawrence Berkeley National Laboratory) Suren Byna (The Ohio State University) Abstract \| Slides
4:25pm- 4:30pm	[WiP] Silent Errors to Scientific Applications: Impacts of PFS Metadata Corruptions Dong Dai (University of Delaware) Mai Zheng (Iowa State University) Bo Fang (Pacific Northwest National Laboratory (PNNL)) Abstract \| Slides
4:30pm- 4:35pm	[WiP] When Stream Processing Engine Meets Log-structured Merge-tree as State Store Kyuli Park, Sogang University, Republic of Korea Sungyong Park, Sogang University, Republic of Korea Abstract \| Slides
4:35pm - 5:30pm	Panel: Data, Data Everywhere Moderator: Kathryn Mohror, Lawrence Livermore Lab Panelists: Laura Biven, Jefferson Lab Eli Dart, Lawrence Berkeley National Laboratory Sarp Oral, Oak Ridge National Laboratory Manish Parashar, University of Utah Adam Thompson, NVIDIA

WORKSHOP ABSTRACT

We are excited to announce the 9th International Parallel Data Systems Workshop (PDSW’24), to be held in conjunction with SC24: The International Conference for High Performance Computing, Networking, Storage, and Analysis, in Atlanta, GA. PDSW’24 builds upon the rich legacy of its predecessor workshops, the Petascale Data Storage Workshop (PDSW, 2006–2015) and the Data Intensive Scalable Computing Systems (DISCS, 2012–2015) workshop. Since their successful merger in 2016, the joint workshop has drawn an average of 200 attendees annually.

The increasing importance of efficient data storage and management continues to drive scientific productivity across traditional simulation-based HPC environments and emerging Cloud, AI/ML, and Big Data analysis frameworks. Challenges are compounded by the rapidly expanding volumes of experimental and observational data, the growing disparity between computational and storage hardware performance, and the rise of novel data-driven algorithms in machine learning. This workshop aims to advance research and development by addressing the most pressing challenges in large-scale data storage and processing.

We invite the community to contribute original research manuscripts that introduce and evaluate novel algorithms or architectures, share significant scientific case studies or workloads, or assess the reproducibility of previously published work. We emphasize the importance of community collaboration for problem identification, workload capture, solution interoperability, standardization, and shared tools. Authors are encouraged to provide comprehensive experimental environment details (software versions, benchmark configurations, etc.) to promote transparency and facilitate collaborative progress.

Topics of Interest:

Scalable Architectures: Distributed data storage, archival, and virtualization.
New Data Processing Models and Algorithms: Application of innovative data processing models and algorithms for parallel computing and analysis.
Performance Analysis: Benchmarking, resource management, and workload studies.
Cloud and Container-Based Models: Enabling cloud and container-based frameworks for large-scale data analysis.
Storage Technologies: Adaptation to emerging hardware and computing models.
Data Integrity: Techniques to ensure data integrity, availability, reliability, and fault tolerance.
Programming Models and Frameworks: Big data solutions for data-intensive computing.
Hybrid Cloud Data Processing: Integration of hybrid cloud and on-premise data processing.
Cloud-Specific Opportunities: Data storage and transit opportunities specific to cloud computing.
Storage System Programmability: Enhancing programmability in storage systems.
Data Reduction Techniques: Filtering, compression, and reduction techniques for large-scale data.
File and Metadata Management: Parallel file systems, metadata management at scale.
In-Situ and In-Transit Processing: Integrating computation into the memory and storage hierarchy for in-situ and in-transit data processing.
Alternative Storage Models: Object stores, key-value stores, and other data storage models.
Productivity Tools: Tools for data-intensive computing, data mining, and knowledge discovery.
Data Movement: Managing data movement between compute and data-intensive components.
Cross-Cloud Data Management: Efficient data management across different cloud environments.
AI-enhanced Systems: Storage system optimization and data analytics using machine learning.
New Memory and Storage Systems: Innovative techniques and performance evaluation for new memory and storage systems.

CALL FOR PAPERS

Call for papers available now (pdf).

Regular paper SUBMISSIONS

All submissions to PDSW’24 will undergo a rigorous double-anonymous peer review process overseen by the workshop program committee. Successful submissions will be published in the SC24 Workshop Proceedings and featured on the workshop website alongside associated talk slides.

Template and Submission

Submit a full paper of up to 6 pages in length, excluding references and AD/AE appendices.
The Artifact Description (AD) Appendix is mandatory; the Artifact Evaluation (AE) Appendix is optional.
- AD due: Aug 16th, 2024, 11:59 PM AoE - DEADLINE EXTENDED
- Submissions with AD and AE Appendix will be considered favorably for the PDSW Best Paper award.
Papers must adhere to the IEEE proceedings template. Download it here.
EXTENDED FINAL DEADLINE - Submit your papers by Aug 9th, 2024, 11:59 PM AoE at https://submissions.supercomputing.org/

Reproducibility Initiative

Aligned with the SC24 Reproducibility Initiative, we encourage detailed and structured artifact descriptions (AD) using the SC24 format. The AD should include a field for one or more links to data (Zenodo, Figshare, etc.) and code (GitHub, GitLab, Bitbucket, etc.) repositories. For artifacts that will be placed in a code repository, we encourage authors to follow the PDSW 2024 Reproducibility Addendum on how to structure the artifact, as this will make it easier for the reviewing committee and future readers of the paper to use.

Deadlines - Regular Papers and Reproducibility Study Papers

Submissions website: https://submissions.supercomputing.org/

Submissions due: EXTENDED DEADLINE - Aug 9th, 2024, 11:59 PM AoE
AD due: EXTENDED DEADLINE - Aug 16th, 2024, 11:59 PM AoE
Paper Notification: Sep 6th, 2024, 11:59 PM AoE
Camera ready due: Sep 27th, 2024, 11:59 PM AoE
Final AD/AE due: Oct 15, 2024, 11:59 PM AoE

Copyright forms due: TBD
Slides due before workshop: TBD

Work In Progress (WIP) Session

The WIP session will showcase brief 5-minute presentations on ongoing work that may not yet be ready for a full paper submission. WIP papers will not be included in the proceedings. A one-page abstract is required for participation.

Submissions due: Sept 13th, 2024, 11:59 PM AoE
WIP Notification: On or before Sept 21st, 2024

Workshop Registration

Registration opens July 10, 2024. To help you prepare, further details are available on registration pricing and on the policies affecting registration changes and cancellations.

PDSW 24 Committee Members:

Technical Committee

Jalil Boukhobza, University of Western Brittany, France
Wei Der Chen, The University of Edinburgh
Dong Dai, University of North Carolina at Charlotte
Hariharan Devarajan, Lawrence Livermore National Lab
Andreas Dilger, Whamcloud
Kira Duwe, EPFL, Switzerland
Qian Gong, Oak Ridge National Laboratory
Kaushik Velusamy, Argonne National Laboratory
Youngjae Kim, Sogang University
Johann Lombardi, DAOS
Xiaoyi Lu, University of California, Merced
Preeti Malakar, Indian Institute of Technology, Kanpur
Qizhong Mao, Bytedance Inc
Sarah Neuwirth, Habilitation Candidate at Goethe University
Joao Paulo, INESC TEC
M. Mustafa Rafique, Rochester Institute of Technology
Woong Shin, Oak Ridge National Laboratory
Masahiro Tanaka, Microsoft
Osamu Tatebe, University of Tsukuba
Lipeng Wan, Georgia State University
Wei Zhang, Lawrence Berkeley National Laboratory
Qing Zheng, Los Alamos National Lab
Mai Zheng, Iowa State University

Steering Committee

John Bent, Cray
Ali R. Butt, Virginia Tech
Philip Carns, Argonne National Laboratory
Shane Canon, Lawrence Berkeley National Laboratory
Raghunath Raja Chandrasekar, Amazon Web Services
Yong Chen, Texas Tech University
Evan J. Felix, Pacific Northwest National Laboratory
Gary Grider, Los Alamos National Laboratory
William D. Gropp, University of Illinois at Urbana-Champaign
Dean Hildebrand, Google
Shadi Ibrahim, Inria, France
Dries Kimpe, KCG, USA
Glenn Lockwood, Lawrence Berkeley National Laboratory
Jay Lofstead, Sandia National Laboratories
Xiaosong Ma, Qatar Computing Research Institute, Qatar
Carlos Maltzahn, University of California, Santa Cruz
Suzanne McIntosh, New York University
Kathryn Mohror, Lawrence Livermore National Laboratory
Robert Ross, Argonne National Laboratory
Philip C. Roth, Oak Ridge National Laboratory
Kento Sato, RIKEN, Japan
John Shalf, NERSC, Lawrence Berkeley National Laboratory
Xian-He Sun, Illinois Institute of Technology
Rajeev Thakur, Argonne National Laboratory
Lee Ward, Sandia National Laboratories
Brent Welch, Google
Amelie Chi Zhou, Hong Kong Baptist University, China
