GPUZIP v2.0 Reproducibility Dataset
This dataset provides all the necessary materials to reproduce the results presented in the GPUZIP v2.0 article. It is organized into folders, each containing a README.md.txt file that describes its contents and explains how to interpret the files.
Note: This dataset is organized as a directory structure. For easier navigation, set the "View type" to "Tree" before exploring the dataset through this web application.
Types of Files
The repository contains the following file types:
- .md.txt: Markdown-formatted README files. For best readability, open them in a Markdown viewer such as VSCode; any plain-text reader (e.g., Notepad, cat, vi, nano) also works.
- .*.zipfile: ZIP-compressed files (normally given the .zip extension). A file named <name>.<extension>.zipfile (e.g., large-mod.su.zipfile) unzips to its original file (e.g., large-mod.su). Throughout the documentation, files are always referenced by their uncompressed extensions (e.g., .su), so to avoid confusion we recommend unzipping all .zipfile files before exploring the repository. Hint: see the scripts below for unzipping all files at once.
- .xlsx: Excel files. Compatible with LibreOffice, Google Sheets, and Numbers.
- .par: Configuration files for proprietary RTM runs. Readable with any text editor.
- .hdr: Header files for velocity models. Refer to Datasets/HowToReadDatasetFiles.md.txt for details.
- .bin: Raw binary data files containing velocity models in float format. See Datasets/HowToReadDatasetFiles.md.txt for parsing instructions.
- .data: Binary data files, similar to .bin.
- .su: Seismic Unix files containing seismic traces. Refer to Datasets/HowToReadDatasetFiles.md.txt for details.
- .png, .jpg, .jpeg, .gif: Rendered visuals of velocity models or diagrams.
- .qdrep: Nsight Systems profiling files. Compatible with Nsight Systems 2024.01.1.
Root Directory Contents
Datasets/
Contains input datasets, including velocity models, seismic traces, and configurations. Detailed information is provided in Datasets/HowToReadDatasetFiles.md.txt.
DataWarmUp/
Holds results from compressor calibration experiments, including raw data, logs, and the compiled .xlsx summaries. Experiments were conducted with two shots. See DataWarmUp/README.md.txt for more information.
GeometryScript/
Utility script for rendering the shot distributions in the datasets, useful for visualizing the experimental setups.
NSight/
This folder contains a subset of Nsight profiling files for the Marmousi3D dataset, covering all compressors and a cache size of two across all checkpointing algorithms. If needed, contact the authors for additional profiling data.
Quality/
Contains the quality-assessment results for all shots (Section 7.6). See Quality/README.md.txt.
TimeBreakdown/
Complete results for Section 7.4 of the GPUZIP v2.0 article. This folder includes detailed breakdowns of two-shot experiments. See TimeBreakdown/README.md.txt for details.
SpeedupAndMemory.xlsx
Data used to generate Figure 6 and Table 4 (Sections 7.2 and 7.1) of the article.
Extra: Util for Unzipping All Files
The scripts below recursively find every .zipfile under the current directory, extract it into the folder that contains it, and delete the archive once extraction succeeds. Run them from the repository root.
Windows (.bat)
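@echo off
rem Recursively extract every *.zipfile next to itself, deleting each archive on success.
rem Uses tar.exe, bundled with Windows 10 (version 1803 or later) and Windows 11, which detects
rem the ZIP format from the file contents. Expand-Archive accepts only files ending in .zip.
rem The -C folder ends with "." so the quoted path does not end in a backslash.
for /r %%f in (*.zipfile) do (
    echo Decompressing: %%f
    tar -xf "%%f" -C "%%~dpf."
    if not errorlevel 1 (
        echo Decompressed successfully: %%f
        del "%%f"
    ) else (
        echo Failed to decompress: %%f
    )
)
echo All zip files processed.
pause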
Shell script (MacOS, Linux, Unix)
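#!/bin/bash
# Recursively extract every *.zipfile next to itself, deleting each archive on success.
# -print0 with read -d '' keeps file names containing spaces or newlines intact.
find . -type f -name "*.zipfile" -print0 | while IFS= read -r -d '' zipfile; do
    echo "Decompressing: $zipfile"
    if unzip -o "$zipfile" -d "$(dirname "$zipfile")"; then
        echo "Successfully decompressed: $zipfile"
        rm "$zipfile"
    else
        echo "Failed to decompress: $zipfile"
    fi
done
echo "All zip files processed."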
How Do I Read .bin, .data, and .su Files?
See: Datasets/HowToReadDatasetFiles.md.txt
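As a quick sanity check before following that guide, the sketch below reads the samples-per-trace field of the first trace header in a .su file and derives the trace count from the file size. It assumes the standard Seismic Unix layout (a 240-byte trace header with ns stored as an unsigned 16-bit integer at byte offset 114, followed by ns 4-byte float samples), the same ns in every trace, and a file written in the byte order of the machine running the script. su_info is a hypothetical helper, not part of the dataset.

```shell
#!/bin/sh
# su_info FILE (hypothetical helper): report samples per trace and trace count of a Seismic Unix file.
# Assumptions: standard 240-byte SU trace headers, ns stored as an unsigned 16-bit
# integer at byte offset 114, 4-byte float samples, the same ns in every trace,
# and a file written in the byte order of the machine running this script.
su_info() {
    f="$1"
    # Read the 2-byte ns field of the first trace header in native byte order.
    ns=$(od -A n -t u2 -j 114 -N 2 "$f" | tr -d ' ')
    # Total file size in bytes (tr strips the padding some wc implementations add).
    size=$(wc -c < "$f" | tr -d ' ')
    trace_bytes=$((240 + 4 * $ns))
    # A remainder usually means the byte-order or layout assumptions do not hold.
    if [ $((size % trace_bytes)) -ne 0 ]; then
        echo "warning: size is not a multiple of $trace_bytes bytes; check byte order" >&2
    fi
    echo "ns=$ns traces=$((size / trace_bytes))"
}
```

For example, su_info large-mod.su prints a line of the form ns=<samples> traces=<count>. If the warning appears, the file was probably written with the opposite byte order, and the guide above should be followed instead.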
How Do I Read .par and .hdr Files?
See: Datasets/HowToReadDatasetFiles.md.txt
How Do I Interpret the Log Files?
To analyze cache hits, misses, and memory consumption, use the decom-*.txt logs in the TimeBreakdown folder. Each metric is identified by a marker that can be searched for in the logs:
- Cache Hits: Search for "RET_HIT".
- Cache Misses: Search for "RET_MIS".
- Prefetched Items: Search for "===> Prefetching:".
- Prefetch Action Vector (PAV): Search for "PAV:".
- Memory Consumption: Search for "[MEM_TRACK]".
- Checkpoint Pool Size: Search for "Checkpoint Pool Size".
Each log file concludes with a summary from Nsight.
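The markers above can be tallied with standard grep and awk. The sketch below is one way to do it under stated assumptions: summarize_log is a hypothetical helper, it assumes each cache hit, miss, and prefetch event is logged on its own line, and it prints the last [MEM_TRACK] line verbatim rather than parsing it, because that line's format is not documented here. It does not interpret the PAV, Checkpoint Pool Size, or Nsight summary entries.

```shell
#!/bin/sh
# summarize_log FILE (hypothetical helper): count log lines that contain each marker.
# Assumption: every cache hit, miss, and prefetch event is logged on its own line,
# so grep -c (which counts matching lines, not occurrences) equals the event count.
summarize_log() {
    log="$1"
    hits=$(grep -c -F 'RET_HIT' "$log")
    misses=$(grep -c -F 'RET_MIS' "$log")
    prefetch=$(grep -c -F '===> Prefetching:' "$log")
    echo "$log: hits=$hits misses=$misses prefetch=$prefetch"
    # Hit rate over all retrievals; skipped when the log has neither marker.
    awk -v h="$hits" -v m="$misses" 'BEGIN { if (h + m > 0) printf "hit_rate=%.3f\n", h / (h + m) }'
    # Last memory-tracking line, printed verbatim because its format is not documented here.
    grep -F '[MEM_TRACK]' "$log" | tail -n 1
}

# Summarize every decom-*.txt log under the directory given as $1 (default: TimeBreakdown).
root="${1:-TimeBreakdown}"
if [ -d "$root" ]; then
    find "$root" -type f -name 'decom-*.txt' | while IFS= read -r log; do
        summarize_log "$log"
    done
fi
```

Saved as a script, it can be run from the repository root after unzipping, optionally passing a different root folder as the first argument.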