Check if config/samplesheet.csv exists. If not, create one with columns: fastq_id,assay,seq_run,path.
Each row maps a FASTQ capture directory to its assay type and sequencing run.
fastq_id,assay,seq_run,path
ITM_trial5,HTO,210910_A00152_0466_BHH7CNDSX2,data/chromium/fastq/fastq_path/210910_A00152_0466_BHH7CNDSX2/HTO/ITM_trial5
ITM_trial5,VDJ-B,210910_A00152_0466_BHH7CNDSX2,data/chromium/fastq/fastq_path/210910_A00152_0466_BHH7CNDSX2/VDJ-B/ITM_trial5
ITM_trial5,VDJ,210910_A00152_0466_BHH7CNDSX2,data/chromium/fastq/fastq_path/210910_A00152_0466_BHH7CNDSX2/VDJ/ITM_trial5
Per-lane FASTQs are merged into single files per sample/read-type using two scripts:
merge_lanes.sh: merges lanes for a single capture directory
merge_lanes_batch.sh: reads a samplesheet and submits one PBS job per row
merge_lanes_batch.sh takes config/samplesheet.csv as input, with columns: fastq_id,assay,seq_run,path
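Before submitting anything, it can help to sanity-check the samplesheet. The following is a minimal sketch, not part of the pipeline scripts: validate_samplesheet is a hypothetical helper that checks the file exists, that the header matches the four expected columns, and that every path is an existing directory. It assumes fields contain no embedded commas or quotes.

```bash
#!/usr/bin/env bash
# Sketch: sanity-check a samplesheet before submitting merge jobs.
# validate_samplesheet is a hypothetical helper, not one of the pipeline scripts.
validate_samplesheet() {
    local sheet="$1" expected="fastq_id,assay,seq_run,path"
    local header line_no=1 errors=0 fastq_id assay seq_run path

    [[ -f "$sheet" ]] || { echo "missing samplesheet: $sheet" >&2; return 1; }

    # Compare the first line against the expected header (tolerating CRLF).
    IFS= read -r header < "$sheet"
    header="${header%$'\r'}"
    if [[ "$header" != "$expected" ]]; then
        echo "unexpected header: $header" >&2
        return 1
    fi

    # Check every data row: four non-empty fields, path must be a directory.
    while IFS=, read -r fastq_id assay seq_run path || [[ -n "$fastq_id" ]]; do
        line_no=$((line_no + 1))
        path="${path%$'\r'}"
        [[ -z "$fastq_id$assay$seq_run$path" ]] && continue   # skip blank lines
        if [[ -z "$fastq_id" || -z "$assay" || -z "$seq_run" || -z "$path" ]]; then
            echo "line $line_no: empty field" >&2
            errors=$((errors + 1))
        elif [[ ! -d "$path" ]]; then
            echo "line $line_no: not a directory: $path" >&2
            errors=$((errors + 1))
        fi
    done < <(tail -n +2 "$sheet")

    (( errors == 0 ))
}
```

After sourcing the function, `validate_samplesheet config/samplesheet.csv` returns non-zero and prints the offending lines to stderr if anything looks wrong.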
# Dry run
bash /scratch/a56/vl2560/scripts/merge_lanes_batch.sh config/samplesheet.csv -n
# Submit jobs
bash /scratch/a56/vl2560/scripts/merge_lanes_batch.sh config/samplesheet.csv
Output is written to <common_prefix>/merged/ as flat files named <seq_run>_<fastq_id>_<assay>_<S_part>_merged_R{1,2}_001.fastq.gz.
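To illustrate the naming scheme, here is a rough sketch of what a lane merge for one capture directory could look like. It is not the actual merge_lanes.sh: merge_lanes_sketch and its arguments are hypothetical, and it assumes bcl2fastq-style input names (<name>_S<n>_L<lane>_R<1|2>_001.fastq.gz) with a single sample per directory.

```bash
#!/usr/bin/env bash
# Sketch only: NOT the actual merge_lanes.sh.
# Assumes bcl2fastq-style names: <name>_S<n>_L<lane>_R<1|2>_001.fastq.gz
# and one sample per capture directory.
merge_lanes_sketch() {
    local in_dir="$1" out_dir="$2" seq_run="$3" fastq_id="$4" assay="$5"
    local read first s_part out
    local -a lanes

    mkdir -p "$out_dir"
    for read in R1 R2; do
        # Glob expansion is sorted, so lanes are concatenated as L001, L002, ...
        lanes=("$in_dir"/*_L[0-9][0-9][0-9]_"${read}"_001.fastq.gz)
        [[ -e "${lanes[0]}" ]] || { echo "no $read files in $in_dir" >&2; return 1; }

        # Pull the S-part (e.g. S1) out of the first file name.
        first="$(basename "${lanes[0]}")"
        s_part="$(sed -E 's/.*_(S[0-9]+)_L[0-9]{3}_.*/\1/' <<< "$first")"

        out="$out_dir/${seq_run}_${fastq_id}_${assay}_${s_part}_merged_${read}_001.fastq.gz"
        # Concatenated gzip members form a valid gzip stream, so the
        # per-lane files can be joined without recompressing.
        cat "${lanes[@]}" > "$out"
    done
}
```

For the first samplesheet row above, a call would take the capture directory, the merged/ output directory, and the seq_run, fastq_id and assay values from that row, in that order.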
From the merged output directory, encrypt all .fastq.gz files using ega-cryptor.
cd data/chromium/fastq/fastq_path/merged
qx --env ega-upload --mem 64GB --cpus 48 --runtime 8h -- ega-cryptor -i . -o . -f
Produces .gpg (encrypted) and .md5 (checksums) files alongside each .fastq.gz.
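Once the job finishes, a quick completeness check can catch FASTQs that were skipped. This is a minimal sketch under an assumption: that the outputs are named <file>.gpg and <file>.md5 next to each <file>.fastq.gz. check_encrypted is a hypothetical helper, and it only tests that the files exist and are non-empty; it does not verify the checksums themselves.

```bash
#!/usr/bin/env bash
# Sketch: list FASTQs that lack an encrypted copy or a checksum file.
# Assumes outputs are named <file>.gpg and <file>.md5 next to each <file>.
check_encrypted() {
    local dir="${1:-.}" fq ext missing=0

    for fq in "$dir"/*.fastq.gz; do
        [[ -e "$fq" ]] || continue        # glob matched nothing
        for ext in gpg md5; do
            if [[ ! -s "$fq.$ext" ]]; then
                echo "missing or empty: $fq.$ext" >&2
                missing=$((missing + 1))
            fi
        done
    done

    (( missing == 0 ))
}
```

Run from the merged directory as `check_encrypted .`; a non-zero exit status means at least one expected file is absent.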
TODO
TODO