swarbricklab/dvc_tools

DVC Tools

Convenient tools for working with DVC in HPC environments with shared external caches and SSH remotes.

Installation

pip install git+ssh://git@github.com/swarbricklab/dvc_tools.git

Quick Start

# Create a new DVC project
mkdir my-analysis && cd my-analysis
dt init my-analysis

# Or clone an existing project
dt clone git@github.com:myorg/existing-project.git

# Check configuration
dt doctor

Commands

The dt command provides subcommands for managing DVC projects:

dt init       # Initialize a new DVC project with cache and remote
dt clone      # Clone an existing DVC project with local configuration
dt add        # Add files to DVC tracking via compute node
dt fetch      # Fetch import files into cache from local sources
dt pull       # Pull DVC-tracked files, handling imports automatically
dt push       # Push files to all configured remotes
dt import     # Import data from other repositories using local caches
dt mv         # Move/rename files, preserving import metadata
dt cache      # Manage external shared caches
dt remote     # Manage remote storage
dt config     # View and modify configuration settings
dt doctor     # Diagnose common setup issues

See the Command Reference for full documentation, or use dt <command> --help.

Architecture

On HPC systems, dt supports the following storage layout:

  • Workspaces on fast scratch storage (e.g., /scratch/${PROJECT}/${USER}/)
  • Shared caches on scratch for team collaboration (e.g., /scratch/${PROJECT}/dvc/cache/)
  • Remotes on persistent storage (e.g., /g/data/${PROJECT}/dvc/)
  • SSH access to remotes from external systems
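
The layout above can be approximated by hand with plain DVC commands. The sketch below is illustrative only: it is not taken from dt's source, and whether dt runs these exact commands is an assumption. The helper names (layout_paths, configure_dvc_layout), the remote name hpc, the example project code ab12, the choice of symlink cache links, and the use of --local configuration are all hypothetical.

```shell
#!/bin/sh
# Sketch only: reproduces the storage layout described above with stock DVC.
# Function names, the remote name "hpc", and the defaults are hypothetical.

# Print the three locations for a project code, user name, and workspace name.
layout_paths() {
    project="$1"; user="$2"; name="$3"
    printf 'workspace=%s\n' "/scratch/${project}/${user}/${name}"
    printf 'cache=%s\n' "/scratch/${project}/dvc/cache"
    printf 'remote=%s\n' "/g/data/${project}/dvc"
}

# Point an already-initialised DVC repository at a shared cache and a remote.
# Run from inside the repository; --local keeps site-specific paths out of Git.
configure_dvc_layout() {
    cache_dir="$1"; remote_dir="$2"
    dvc cache dir --local "$cache_dir" &&
    dvc config --local cache.shared group &&     # group-writable cache files
    dvc config --local cache.type symlink &&     # assumed link type, adjust as needed
    dvc remote add --local --default --force hpc "$remote_dir"
    # From an external system the same remote would be reached over SSH,
    # i.e. an ssh:// URL on your site's login host in place of the bare path.
}

# Show the paths for the current environment (falls back to example values).
layout_paths "${PROJECT:-ab12}" "${USER:-alice}" my-analysis
```

To apply it, call configure_dvc_layout with the cache and remote paths printed by layout_paths from inside the workspace. Keeping the settings in the local config is a design choice for this sketch, so that paths specific to one HPC site are not committed to the shared repository.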

Documentation

External Resources
