`piawka`

Calculate SNP-based population statistics over groups of samples in VCF files with:

indexable BED output
correct handling of missing data
support for polyploid variant calls
higher data yield due to per-group ALT-agnostic SNP retrieval
a broad selection of statistics, extensible with modules
convenient helper tools for making genomic windows, filtering and summarizing the results
the power of GNU AWK: no installation, competitive speed, low memory footprint, and multiprocessing 👀

Warning

piawka is under development. At this stage, breaking changes are possible. If something does not seem to work well, check for a newer version and do not hesitate to file an issue!

Installation

conda install -c bioconda piawka

Alternatively, make sure the following programs are available on the command line, then clone the repo:

gawk>=v5.2.0
tabix
bgzip

git clone https://github.com/novikovalab/piawka.git
export PATH="$( realpath ./piawka ):${PATH}"
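Before relying on the manual installation, it can help to confirm that the three required programs are actually on `PATH`. The snippet below is a minimal sketch; the helper name `missing_tools` is made up for this example and is not part of piawka.

```shell
# Report which of the given commands are missing from PATH.
# Prints one missing name per line; returns non-zero if any are missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            status=1
        fi
    done
    return "$status"
}

# The three programs the manual installation relies on.
if missing="$(missing_tools gawk tabix bgzip)"; then
    echo "all required tools found"
else
    echo "missing: $(printf '%s' "$missing" | tr '\n' ' ')"
fi

# gawk reports its version on the first line of --version output;
# compare it against the 5.2.0 minimum listed above.
if command -v gawk >/dev/null 2>&1; then
    gawk --version | head -n 1
fi
```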

Usage

Docs are available at https://novikovalab.github.io/piawka.

Input and output

Mandatory (for piawka calc):

VCF file -- bgzipped and tabixed

Optional:

groups file -- 2-column TSV with sample ID and group ID (may include relevant samples only)
regions/targets file -- BED file to restrict/split output by regions
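As an illustration of the groups file layout, the sketch below writes a small two-column TSV and checks its shape with awk. The sample IDs and the helper name `check_groups` are invented for this example; the group IDs are borrowed from the example output in this README.

```shell
# Write a small groups file: sample ID, a tab, then group ID.
# Sample IDs are made up; group IDs come from the README's example output.
groups_file="$(mktemp)"
printf '%s\t%s\n' \
    sample_A LE_2n \
    sample_B LE_2n \
    sample_C PUWS_4n \
    sample_D PUWS_4n > "$groups_file"

# Check that every line has exactly two non-empty tab-separated fields,
# then print the number of samples per group.
check_groups() {
    awk -F '\t' '
        NF != 2 || $1 == "" || $2 == "" {
            printf "line %d: expected 2 columns\n", NR
            bad = 1
            next
        }
        { n[$2]++ }
        END {
            if (bad) exit 1
            for (g in n) printf "%s\t%d\n", g, n[g]
        }
    ' "$1"
}

check_groups "$groups_file" | sort
```

A plain VCF can be prepared for piawka with `bgzip calls.vcf` followed by `tabix -p vcf calls.vcf.gz`, where `calls.vcf` is a placeholder file name.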

Output is a BED file:

$ cd piawka/examples
$ piawka calc -v alyrata_scaff_1_10000k-10500k.vcf.gz -b genes.bed -g groups.tsv -s pi,dxy
#chr        start     end       locus      pop1              pop2              stat    value        numerator  denominator
scaffold_1  10035093  10035276  AL5G20950  CESiberia_2n      LE_2n             dxy     0.0071137    460        64664
scaffold_1  10035093  10035276  AL5G20950  PUWS_4n           .                 pi      0.00588993   640        108660
scaffold_1  10035093  10035276  AL5G20950  LE_2n             PUWS_4n           dxy     0.00881262   1102       125048
scaffold_1  10035093  10035276  AL5G20950  LE_2n             .                 pi      0.00772461   1078       139554
...
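In the four rows shown above, the `value` column matches `numerator` divided by `denominator`. The sketch below recomputes that ratio with plain awk as a sanity check. It assumes whitespace-separated columns in the order of the header line; the helper name `recheck_values` is made up for this example.

```shell
# Four rows copied from the example output above.
calc_out="$(mktemp)"
cat > "$calc_out" <<'EOF'
#chr        start     end       locus      pop1              pop2              stat    value        numerator  denominator
scaffold_1  10035093  10035276  AL5G20950  CESiberia_2n      LE_2n             dxy     0.0071137    460        64664
scaffold_1  10035093  10035276  AL5G20950  PUWS_4n           .                 pi      0.00588993   640        108660
scaffold_1  10035093  10035276  AL5G20950  LE_2n             PUWS_4n           dxy     0.00881262   1102       125048
scaffold_1  10035093  10035276  AL5G20950  LE_2n             .                 pi      0.00772461   1078       139554
EOF

# For each data row, recompute numerator/denominator and compare with column 8.
recheck_values() {
    awk '
        /^#/ { next }                        # skip the header line
        {
            ratio = $9 / $10
            diff = ratio - $8
            if (diff < 0) diff = -diff
            verdict = (diff < 1e-6) ? "ok" : "MISMATCH"
            printf "%s\t%s\t%s\t%.6g\t%s\n", $4, $5, $7, ratio, verdict
        }
    ' "$1"
}

recheck_values "$calc_out"
```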

Subcommands

piawka calc: calculate various population statistics from a VCF file
piawka dist: convert calc output to PHYLIP or NEXUS distance matrix
piawka filt: filter piawka output using AWK expressions
piawka list: show all statistics available for calculation
piawka sum: summarize stats from calc output across regions
piawka win: prepare genomic windows from various sources
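Because calc output carries `numerator` and `denominator` columns, one plausible way to combine several regions is a ratio of sums rather than a mean of per-region values. The sketch below does this with plain awk. It is not the `piawka sum` implementation, whose exact behaviour is not described here, and the second data row (`locus_B`) is invented for illustration.

```shell
# Two pi rows for the same group. The first is copied from the example output;
# the second (locus_B) is invented purely for illustration.
two_loci="$(mktemp)"
cat > "$two_loci" <<'EOF'
#chr        start     end       locus      pop1   pop2  stat  value       numerator  denominator
scaffold_1  10035093  10035276  AL5G20950  LE_2n  .     pi    0.00772461  1078       139554
scaffold_1  10040000  10041000  locus_B    LE_2n  .     pi    0.005       500        100000
EOF

# Sum numerators and denominators per (pop1, pop2, stat), then divide once.
ratio_of_sums() {
    awk '
        /^#/ { next }
        {
            key = $5 "\t" $6 "\t" $7
            num[key] += $9
            den[key] += $10
        }
        END {
            for (key in num)
                if (den[key] > 0) printf "%s\t%.6g\n", key, num[key] / den[key]
        }
    ' "$1"
}

ratio_of_sums "$two_loci"
```

A ratio of sums weights each region by the amount of data behind it, so a short region with few comparisons does not count as much as a long one.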

Statistics

Within groups:

lines: number of lines used in calculation
miss: share of missing genotype calls
pi: expected heterozygosity = nucleotide diversity
maf: minor allele frequency
daf: alternative ("derived") allele frequency
tajima: Tajima's D
tajimalike: Tajima's D interpolated for missing genotypes (experimental)
theta_w: Watterson's theta
theta_low: Theta estimator based on sites with 0<allele_freq<0.33
theta_mid: Theta estimator based on sites with 0.33<=allele_freq<0.66
theta_high: Theta estimator based on sites with allele_freq>=0.66
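To make "pi: expected heterozygosity" concrete, here is a worked example for a single biallelic site using the textbook definition: the number of differing allele pairs divided by the number of all allele pairs. This is a sketch of the general idea, not piawka's internal code; the helper name `site_pi` is made up. Counting only called alleles in `n` is how missing genotypes drop out of both the numerator and the denominator in this formulation.

```shell
# Per-site pi from allele counts: differing pairs over all pairs.
# Arguments: n = number of called alleles in the group, k = number of ALT alleles.
site_pi() {
    awk -v n="$1" -v k="$2" 'BEGIN {
        if (n < 2) { print "NA"; exit }     # fewer than two alleles: no pairs to compare
        differing = k * (n - k)             # pairs made of one REF and one ALT allele
        pairs     = n * (n - 1) / 2         # all unordered pairs of alleles
        printf "%d\t%d\t%.6g\n", differing, pairs, differing / pairs
    }'
}

site_pi 4 1     # e.g. four called alleles, one of them ALT
site_pi 6 3     # six called alleles, three of them ALT
```

The three printed columns are the per-site numerator, the per-site denominator, and their ratio.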

Between groups (pairwise):

afd: average allele frequency difference
dxy: absolute nucleotide divergence
fst: fixation index, Hudson's estimator
fstwc: fixation index, Weir & Cockerham's estimator
rho: Ronfort's rho
nei: Nei's D standard genetic distance
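For the pairwise statistics, a single biallelic site gives a small worked example of dxy and afd under their usual definitions: dxy is the share of between-group allele pairs that differ, and afd is the absolute difference in ALT allele frequency. This is an illustrative sketch, not piawka's internal code; the helper name `site_between` is made up.

```shell
# Per-site dxy and afd from allele counts in two groups.
# Arguments: n1 k1 n2 k2 = called alleles and ALT alleles in group 1 and group 2.
site_between() {
    awk -v n1="$1" -v k1="$2" -v n2="$3" -v k2="$4" 'BEGIN {
        if (n1 < 1 || n2 < 1) { print "NA"; exit }    # a group with no calls cannot be compared
        differing = k1 * (n2 - k2) + (n1 - k1) * k2   # ALT-vs-REF pairs across the two groups
        pairs     = n1 * n2                           # all between-group allele pairs
        afd = k1 / n1 - k2 / n2                       # difference in ALT allele frequency
        if (afd < 0) afd = -afd
        printf "dxy=%.6g\tafd=%.6g\n", differing / pairs, afd
    }'
}

site_between 4 1 4 2
```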

Citation

piawka was first described in https://doi.org/10.1093/molbev/msaf153, which is also the source of the test data.
