swarbricklab/ega-pload

EGA Upload

1. Samplesheet

Check if config/samplesheet.csv exists. If not, create one with columns: fastq_id,assay,seq_run,path.

Each row maps a FASTQ capture directory to its assay type and sequencing run.

fastq_id,assay,seq_run,path
ITM_trial5,HTO,210910_A00152_0466_BHH7CNDSX2,data/chromium/fastq/fastq_path/210910_A00152_0466_BHH7CNDSX2/HTO/ITM_trial5
ITM_trial5,VDJ-B,210910_A00152_0466_BHH7CNDSX2,data/chromium/fastq/fastq_path/210910_A00152_0466_BHH7CNDSX2/VDJ-B/ITM_trial5
ITM_trial5,VDJ,210910_A00152_0466_BHH7CNDSX2,data/chromium/fastq/fastq_path/210910_A00152_0466_BHH7CNDSX2/VDJ/ITM_trial5
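
Before submitting jobs, it can help to sanity-check the samplesheet. The function below is a minimal sketch, not part of this repository: `check_samplesheet` is a hypothetical helper name. It assumes the header must match the four columns exactly and that each `path` is a directory resolvable from the current working directory.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with this repo): sanity-check a samplesheet.
# Assumes plain comma-separated values with no quoted or embedded commas.
check_samplesheet() {
    local sheet="$1" expected="fastq_id,assay,seq_run,path"
    local header="" status=0 fastq_id assay seq_run path

    if [ ! -f "$sheet" ]; then
        echo "missing samplesheet: $sheet" >&2
        return 1
    fi

    # The first line must be exactly the expected header (tolerate CRLF).
    IFS= read -r header < "$sheet" || true
    header="${header%$'\r'}"
    if [ "$header" != "$expected" ]; then
        echo "unexpected header: $header" >&2
        return 1
    fi

    # Every non-empty data row needs four fields and an existing directory.
    while IFS=, read -r fastq_id assay seq_run path || [ -n "$fastq_id" ]; do
        path="${path%$'\r'}"
        if [ -z "$fastq_id$assay$seq_run$path" ]; then
            continue    # skip blank lines
        fi
        if [ -z "$fastq_id" ] || [ -z "$assay" ] || [ -z "$seq_run" ] || [ -z "$path" ]; then
            echo "incomplete row: $fastq_id,$assay,$seq_run,$path" >&2
            status=1
        elif [ ! -d "$path" ]; then
            echo "directory not found: $path" >&2
            status=1
        fi
    done < <(tail -n +2 "$sheet")

    return "$status"
}
```

Calling `check_samplesheet config/samplesheet.csv` from the project root returns a non-zero status and prints each problem to stderr if any row fails a check.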

2. Concatenate FASTQs

Per-lane FASTQs are merged into single files per sample/read-type using two scripts:

  • merge_lanes.sh — merges lanes for a single capture directory
  • merge_lanes_batch.sh — reads a samplesheet and submits one PBS job per row

Samplesheet

The batch script reads config/samplesheet.csv, which has the columns fastq_id,assay,seq_run,path (see step 1).

Usage

# Dry run
bash /scratch/a56/vl2560/scripts/merge_lanes_batch.sh config/samplesheet.csv -n

# Submit jobs
bash /scratch/a56/vl2560/scripts/merge_lanes_batch.sh config/samplesheet.csv

Output is written to <common_prefix>/merged/ as flat files named <seq_run>_<fastq_id>_<assay>_<S_part>_merged_R{1,2}_001.fastq.gz.
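
For readers without access to the scripts, the sketch below illustrates the general idea of lane merging. It is not the actual `merge_lanes.sh`, which may work differently. `merge_lanes_sketch` is a hypothetical name, and the sketch assumes Illumina-style input names (`<sample>_S<n>_L00<lane>_R{1,2}_001.fastq.gz`), a single sample per capture directory, and that `<S_part>` is the `S<n>` token. It relies on the fact that concatenated gzip files form a valid multi-member gzip stream.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of lane merging; the real merge_lanes.sh may differ.
# Usage: merge_lanes_sketch <capture_dir> <out_dir> <seq_run>_<fastq_id>_<assay>
merge_lanes_sketch() {
    local in_dir="$1" out_dir="$2" prefix="$3"
    local read_type first s_part
    local -a lanes

    mkdir -p "$out_dir"
    for read_type in R1 R2; do
        # Glob expansion is sorted, so L001 precedes L002, and so on.
        lanes=( "$in_dir"/*_S[0-9]*_L[0-9][0-9][0-9]_"${read_type}"_001.fastq.gz )

        # An unmatched glob stays literal, so confirm the first entry exists.
        if [ ! -e "${lanes[0]:-}" ]; then
            echo "no ${read_type} lane files in $in_dir" >&2
            return 1
        fi

        # Assumption: <S_part> is the sample-number token, e.g. S1.
        first="$(basename "${lanes[0]}")"
        s_part="$(printf '%s\n' "$first" | sed -E 's/.*_(S[0-9]+)_L[0-9]{3}_.*/\1/')"

        # Concatenating gzip members yields a valid gzip file.
        cat "${lanes[@]}" > "$out_dir/${prefix}_${s_part}_merged_${read_type}_001.fastq.gz"
    done
}
```

R1 and R2 are merged in the same lane order, which keeps read pairs aligned across the two output files.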

3. Encrypt files

From the merged output directory, encrypt all .fastq.gz files using ega-cryptor.

cd data/chromium/fastq/fastq_path/merged
qx --env ega-upload --mem 64GB --cpus 48 --runtime 8h -- ega-cryptor -i . -o . -f

This produces an encrypted .gpg file and .md5 checksum files alongside each .fastq.gz.
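
Since the upload step depends on every file having been encrypted, a quick completeness check can be useful. This is a minimal sketch with a hypothetical helper name, `check_encrypted`. It assumes the outputs are named by appending `.gpg` and `.md5` to the full input file name (for example `sample.fastq.gz.gpg`), so adjust the suffixes if your ega-cryptor version names them differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper: confirm each .fastq.gz has non-empty .gpg and .md5 outputs.
# Assumption: outputs are named <input>.gpg and <input>.md5.
check_encrypted() {
    local dir="${1:-.}" fq suffix missing=0

    for fq in "$dir"/*.fastq.gz; do
        # An unmatched glob stays literal; skip it.
        if [ ! -e "$fq" ]; then
            continue
        fi
        for suffix in gpg md5; do
            if [ ! -s "$fq.$suffix" ]; then
                echo "missing or empty: $fq.$suffix" >&2
                missing=1
            fi
        done
    done

    return "$missing"
}
```

Run `check_encrypted .` from the merged directory. A non-zero exit status means at least one expected output is absent or empty.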

4. Upload to EGA

TODO

5. Register metadata

TODO
