
vg giraffe will take bad tail alignments that probably should've been softclipped #4905

@faithokamoto


Run with the current master branch:

PROJ_DIR=/private/groups/patenlab/fokamoto/giraffe-loops
GRAPH=$PROJ_DIR/graph/hprc-v2.1-mc-chm13-eval-sampled16o_fragmentlinked-for-real-r10y2025-HG002-full
ALN=$PROJ_DIR/alignments/sim_hifi_HG002_xvn_1m

NAME=S11_17926
ONE=$ALN.${NAME}
REALIGN=$ALN.${NAME}.realigned

vg giraffe --gbz-name $GRAPH.gbz -b hifi -G $ONE.gam --threads 1 --show-work \
    --dist-name $GRAPH.dist --minimizer-name $GRAPH.normal.longread.withzip.min \
    --zipcode-name $GRAPH.normal.longread.zipcodes > $REALIGN.new.gam 2> $REALIGN.new.log

The alignment produced is hilariously awful (the space is intentional; it marks off the left-hand part): 1537X1M5843X18M1X2M1X1M1X2M1X1M1X2M1X 2431M1I266M1X2128M1D690M@130045518- score -926
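To put numbers on how much of the read that left-hand part covers, here is a minimal bash sketch that tallies the operations in the CIGAR-style strings quoted above. `tally_cigar` is a hypothetical helper, not part of vg. It assumes only the M, X, I and D operations that appear in this issue, with M, X and I consuming read bases and D consuming only graph bases.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of vg): sum the length of each operation in a
# CIGAR-style string such as "1537X1M5843X18M".
# Assumption: only the M, X, I and D operations seen in this issue occur.
tally_cigar() {
    local cigar="$1"
    local re='^([0-9]+)([MXID])(.*)$'
    local m=0 x=0 i=0 d=0 n op
    # Peel off one "<length><op>" token at a time.
    while [[ $cigar =~ $re ]]; do
        n=${BASH_REMATCH[1]}
        op=${BASH_REMATCH[2]}
        cigar=${BASH_REMATCH[3]}
        case $op in
            M) m=$((m + 10#$n)) ;;
            X) x=$((x + 10#$n)) ;;
            I) i=$((i + 10#$n)) ;;
            D) d=$((d + 10#$n)) ;;
        esac
    done
    # Anything left over means an operation this sketch does not handle.
    if [[ -n $cigar ]]; then
        echo "unparsed remainder: $cigar" >&2
        return 1
    fi
    # M, X and I consume read bases; D consumes only graph bases.
    echo "M=$m X=$x I=$i D=$d read_bases=$((m + x + i))"
}

# Left-hand (tail) part and right-hand part of the master-branch alignment.
tally_cigar '1537X1M5843X18M1X2M1X1M1X2M1X1M1X2M1X'
tally_cigar '2431M1I266M1X2128M1D690M'
```

Run as-is, it reports 27 matches and 7386 mismatches over 7413 read bases for the left-hand part. That is exactly the length of the `7413I` softclip in the heuristic-new-dist-index alignment. The right-hand part covers 5517 read bases, so both alignments span 12930 read bases in total.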

The chain has a left tail of length 7432, which is almost entirely eaten up by that left-hand part of the CIGAR. It feels like it should have been softclipped. Are we really aligning >7k bases of substitutions instead of just admitting that this part is hard?
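As a back-of-the-envelope check on whether a softclip should win here, the sketch below scores the left-hand part as a gapless tail and compares that with simply clipping it. It assumes vg's default linear scoring (match 1, mismatch 4, full-length bonus 5). The `hifi` preset may use different parameters, and vg's tail aligner is a dynamic-programming search over the graph, not this formula. Its numbers will therefore not match the scores in this issue. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumed scoring: vg's default linear parameters. The hifi preset may differ.
MATCH=1
MISMATCH=4
FULL_LENGTH_BONUS=5

# Hypothetical helper: score of a gapless tail with the given match and
# mismatch counts, including the bonus for reaching the end of the read.
aligned_tail_score() {
    local matches=$1 mismatches=$2
    echo $(( matches * MATCH - mismatches * MISMATCH + FULL_LENGTH_BONUS ))
}

# Hypothetical helper: a softclipped tail contributes nothing and
# forfeits the full-length bonus.
softclip_tail_score() {
    echo 0
}

# Counts from the left-hand part of the master-branch CIGAR:
# 27 matched and 7386 mismatched read bases, no gaps.
aligned=$(aligned_tail_score 27 7386)
clipped=$(softclip_tail_score)
echo "aligned tail: $aligned, softclipped tail: $clipped"
if (( clipped > aligned )); then
    echo "softclip wins by $(( clipped - aligned ))"
fi
```

Under these assumptions the aligned tail scores -29512, so the softclip wins by 29512. More generally, a gapless tail only beats a softclip when matches + 5 > 4 × mismatches, i.e. at roughly 80% identity or better; the left-hand part here is at about 0.4% identity (27 of 7413 bases).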

Aligning with https://github.com/vgteam/vg/tree/heuristic-new-dist-index produced a much better-looking alignment, shifted a few bp over and with that tail softclipped: 7413I 2431M1I266M1X2128M1D690M@130020003- score 5400. So this is probably a highly repetitive region (which would explain the small shift), and once the chain anchor moves over a tad, the tail aligner suddenly stops taking the softclip.

Should this be a softclip? If so, what happened?

vg version v1.74.1-39-g361e2bec6 "Petrie"
Compiled with g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 on Linux
Linked against libstd++ 20230528
Using HTSlib headers 101990, library 1.19.1-29-g3cfe8769
Built by fokamoto@mustard
