Arguments
Below is the full list of arguments accepted by pixy. You can also see the
canonical help text at any time with pixy --help.
Required
- --stats [pi|dxy|fst|watterson_theta|tajima_d]
Which statistics to calculate from the VCF. Provide one or more, separated by spaces. For example, --stats pi fst will compute π and FST.
- --vcf [path/to/vcf.vcf.gz]
Path to the input VCF. The VCF must be bgzipped and indexed with tabix (.tbi) or with bcftools index (.csi).
- --populations [path/to/populations_file.txt]
Path to a headerless, tab-separated populations file with columns
[SampleID Population]. See Companion files for the exact format.
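As an illustration of the populations file layout described above, here is a minimal Python sketch that writes a headerless, tab-separated file with one sample per line. The sample IDs, population labels, file name, and the helper `write_populations_file` are all hypothetical; in practice the sample IDs would presumably need to match those in your VCF.

```python
import csv
import tempfile
from pathlib import Path


def write_populations_file(path, sample_to_population):
    """Write a headerless, tab-separated file: SampleID <TAB> Population."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for sample_id, population in sample_to_population.items():
            writer.writerow([sample_id, population])


# Hypothetical sample IDs and population labels, for illustration only.
samples = {"sample_A1": "popA", "sample_A2": "popA", "sample_B1": "popB"}

# Write to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "populations_example.txt"
    write_populations_file(out, samples)
    print(out.read_text(), end="")
```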
In addition, one of
- --window_size [integer]
Window size, in base pairs, over which to calculate statistics. Window coordinates are determined automatically across the chromosomes selected.
- --bed_file [path/to/regions.bed]
Path to a headerless BED file containing custom regions (chrom, chromStart, chromEnd) over which to compute the statistics. Useful when windows must match a specific genomic feature set (genes, introns, etc.) and may be heterogeneous in size.
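To make the custom-regions option concrete, the following sketch writes a small headerless BED file with regions of different sizes. The chromosome name, coordinates, and the helper `write_bed_file` are hypothetical, and the sketch makes no assumption about how pixy interprets the start and end coordinates beyond the three-column layout named above.

```python
import tempfile
from pathlib import Path


def write_bed_file(path, regions):
    """Write headerless BED rows: chrom <TAB> chromStart <TAB> chromEnd."""
    with open(path, "w") as handle:
        for chrom, start, end in regions:
            if end < start:
                raise ValueError(f"reversed region: {chrom}:{start}-{end}")
            handle.write(f"{chrom}\t{start}\t{end}\n")


# Hypothetical regions of unequal length, e.g. three genes on chromosome X.
regions = [("X", 1, 5000), ("X", 5001, 12000), ("X", 20000, 20750)]

# Write to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "regions.bed"
    write_bed_file(out, regions)
    print(out.read_text(), end="")
```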
Optional
- --n_cores [integer]
Number of CPUs to use for parallel processing (default: 1).
- --output_folder [path/to/output/folder]
Folder where output will be written. Defaults to the current working directory.
- --output_prefix [prefix]
Prefix for output file(s). For example, --output_prefix run1 produces [output_folder]/run1_pi.txt and so on. Defaults to pixy.
- --chromosomes ['list,of,chromosomes']
A single-quoted, comma-separated list of chromosomes (e.g. 'X,1,2'). Defaults to all chromosomes in the VCF.
- --interval_start [integer]
Start position of an interval to restrict the analysis to. Only valid when computing over a single chromosome. Defaults to position 1.
- --interval_end [integer]
End position of an interval to restrict the analysis to. Only valid when computing over a single chromosome. Defaults to the last position on the chromosome.
- --sites_file [path/to/sites_file.txt]
Path to a headerless, tab-separated file with columns [CHROM POS] defining the sites over which statistics should be calculated. Can be combined with --window_size and --bed_file.
- --chunk_size [integer]
Approximate number of sites to read from the VCF at a time (default: 100000). Smaller values reduce memory use; larger values reduce I/O overhead.
- --fst_type [wc|hudson]
FST estimator to use: wc (Weir & Cockerham 1984) or hudson (Hudson 1992 / Bhatia et al. 2013). Defaults to wc.
- --include_multiallelic_snps
Include sites with more than two alleles in the calculation. Disabled by default because biallelic-only mode is slightly faster. Added in pixy 2.0.
- --bypass_invariant_check
Skip the check that ensures invariant sites are present in the VCF. Disabled by default. Use with extreme caution. Without invariant sites, estimates of π and dxy are systematically biased and will be wrong unless your data are simulated.
- --silent
Suppress all console output.
- --version
Print the pixy version number and exit.
- --citation
Print the pixy citation and exit.
- --help
Print the full help message and exit.
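Some of the constraints listed above can be checked before launching a run. The sketch below assembles a pixy argument list in Python and enforces two of them: that one of --window_size or --bed_file is given (read here as exactly one, which is an assumption), and that --interval_start / --interval_end are only used with a single chromosome. The helper `build_pixy_command` is hypothetical and not part of pixy; as a simplification it requires the single chromosome to be named explicitly, and it only builds the command without running it.

```python
import shlex


def build_pixy_command(stats, vcf, populations, window_size=None, bed_file=None,
                       chromosomes=None, interval_start=None, interval_end=None,
                       n_cores=1):
    """Return a pixy command as a list of arguments (hypothetical helper)."""
    # Assumption: exactly one of --window_size / --bed_file must be supplied.
    if (window_size is None) == (bed_file is None):
        raise ValueError("supply exactly one of window_size or bed_file")
    # Intervals are only valid for a single chromosome; this sketch requires
    # that chromosome to be named explicitly.
    if interval_start is not None or interval_end is not None:
        if chromosomes is None or len(chromosomes) != 1:
            raise ValueError("intervals require exactly one chromosome")

    cmd = ["pixy", "--stats", *stats, "--vcf", vcf, "--populations", populations]
    if window_size is not None:
        cmd += ["--window_size", str(window_size)]
    else:
        cmd += ["--bed_file", bed_file]
    if chromosomes:
        # Shell quoting is not needed inside an argument list.
        cmd += ["--chromosomes", ",".join(chromosomes)]
    if interval_start is not None:
        cmd += ["--interval_start", str(interval_start)]
    if interval_end is not None:
        cmd += ["--interval_end", str(interval_end)]
    cmd += ["--n_cores", str(n_cores)]
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_pixy_command(["pi", "fst"], "input.vcf.gz", "populations.txt",
                         window_size=10000, n_cores=4)
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once pixy is installed and the input files exist.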
Example
A typical multi-statistic run including the new Watterson's θ and Tajima's D estimators:
pixy --stats pi fst dxy watterson_theta tajima_d \
--vcf data/vcf/ag1000/chrX_36Ag_allsites.vcf.gz \
--populations Ag1000_sampleIDs_popfile.txt \
--window_size 10000 \
--n_cores 4 \
--output_folder output \
--output_prefix pixy_output