FINEMAP



FINEMAP-ing articles

- Refining fine-mapping: effect sizes and regional heritability. bioRxiv. (2018).
- Prospects of fine-mapping trait-associated genomic regions by using summary statistics from genome-wide association studies. Am. J. Hum. Genet. (2017).
- FINEMAP: Efficient variable selection using summary data from genome-wide association studies. Bioinformatics 32, 1493-1501 (2016).

FINEMAP is a program for identifying causal SNPs, and for estimating their effect sizes and heritability contribution, in genomic regions associated with complex traits and disease. FINEMAP is computationally efficient because it uses summary statistics from genome-wide association studies, and robust because it applies a shotgun stochastic search algorithm (Hans et al., 2007). It produces accurate results in a fraction of the processing time of existing approaches. This makes it well suited to analyzing the growing amounts of data produced by genome-wide association studies and by emerging sequencing and biobank projects.
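To make the idea of a shotgun stochastic search concrete, here is a simplified toy version in Python. It is a sketch under stated assumptions, not FINEMAP's implementation. Configurations are sets of SNP indices. Neighbours are obtained by adding, deleting or swapping one SNP. Every neighbour is scored and remembered, and the search moves to a neighbour drawn with probability proportional to its unnormalised score. The scoring function `toy_log_score` is hypothetical and stands in for the posterior score that FINEMAP computes from z-scores and SNP correlations.

```python
import math
import random


def neighbours(config, n_snps, max_causal):
    """Configurations reachable by adding, deleting or swapping one SNP."""
    outside = [j for j in range(n_snps) if j not in config]
    result = set()
    if len(config) < max_causal:
        for j in outside:
            result.add(config | {j})            # add one SNP
    for i in config:
        if len(config) > 1:
            result.add(config - {i})            # delete one SNP (keep at least one)
        for j in outside:
            result.add((config - {i}) | {j})    # swap SNP i for SNP j
    return result


def shotgun_search(log_score, n_snps, max_causal, n_iterations, seed=0):
    """Simplified shotgun stochastic search; returns {configuration: log score}."""
    rng = random.Random(seed)
    current = frozenset([rng.randrange(n_snps)])
    visited = {current: log_score(current)}
    for _ in range(n_iterations):
        nbrs = list(neighbours(current, n_snps, max_causal))
        for c in nbrs:
            if c not in visited:                # score each configuration once
                visited[c] = log_score(c)
        top = max(visited[c] for c in nbrs)
        weights = [math.exp(visited[c] - top) for c in nbrs]  # numerically stable
        current = rng.choices(nbrs, weights=weights, k=1)[0]
    return visited


def toy_log_score(config):
    # Hypothetical score: SNPs 2 and 7 are "causal", any other SNP costs 2.
    return 3.0 * len(config & {2, 7}) - 2.0 * len(config - {2, 7})


visited = shotgun_search(toy_log_score, n_snps=10, max_causal=3, n_iterations=200)
best = max(visited, key=visited.get)
print(sorted(best))
```

Every scored neighbour is kept, not only the states the chain walks through. As a result, the search accumulates many high-scoring configurations without enumerating all of them.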

Download

(license)

Command-line arguments

Argument    Description    Notes
--cond    Fine-mapping with stepwise conditional search    Subprogram
--config    Evaluate a single causal configuration without performing shotgun stochastic search    Subprogram
--corr-config    Option to set the posterior probability of a causal configuration to zero if it includes a pair of SNPs whose absolute correlation is above this threshold    Default is 0.95
--corr-group    Option to set the absolute-correlation threshold above which a pair of SNPs is grouped    Default is 0.99
--dataset    Option to specify a delimiter-separated list of datasets to fine-map, as given in the master file (e.g. 1,2 or 1|2)    All datasets are processed by default
--flip-beta    Option to read a column 'flip' in the Z file with binary indicators specifying whether the direction of the estimated SNP effect sizes needs to be flipped to match the SNP correlations    With --cond, --config and --sss
--group-snps    Option to group SNPs on the basis of their correlations    With --cond and --sss
--help    Command-line help
--in-files    Master file (see below)    With --cond, --config and --sss
--log    Option to write output to the log files specified in column 'log' of the master file    No log files are written by default
--n-causal-snps    Option to set the maximum number of allowed causal SNPs    Default is 5
--n-configs-top    Option to set the number of top causal configurations to be saved    Default is 50000
--n-convergence    Option to set the number of iterations for which the added probability mass must stay below the threshold set by --prob-tol before the shotgun stochastic search is terminated    Default is 1000
--n-iterations    Option to set the maximum number of iterations before the shotgun stochastic search is terminated    Default is 100000
--prior-k    Option to use the prior probabilities for the number of causal SNPs given in the K files listed in the master file (see below)    By default, SNPs are assumed to be causal with probability 1 / (# of SNPs in the genomic region)
--prior-k0    Option to set the prior probability that there is no causal SNP in the genomic region; used only when computing posterior probabilities for the number of causal SNPs, not during fine-mapping itself    Default is 0.0
--prior-std    Option to specify a comma-separated list of prior standard deviations of effect sizes    Default is 0.05
--prob-tol    Option to set the tolerance at which the probability mass added over --n-convergence iterations is considered small enough to terminate the shotgun stochastic search    Default is 0.001
--rsids    Option to specify a comma-separated list of SNP identifiers matching the rsid column in the Z files (see below)    With --config
--sss    Fine-mapping with shotgun stochastic search    Subprogram
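Two of the options above describe simple rules that can be sketched in a few lines of Python. This is one reading of the descriptions of --corr-config and of --n-convergence, --prob-tol and --n-iterations, not FINEMAP's code. In particular, how FINEMAP measures the "added probability mass" is an assumption here: it is modelled as a list of per-iteration increments.

```python
def configuration_allowed(config, ld, corr_config=0.95):
    """False if any pair of SNPs in `config` has |r| above corr_config."""
    snps = sorted(config)
    for a in range(len(snps)):
        for b in range(a + 1, len(snps)):
            if abs(ld[snps[a]][snps[b]]) > corr_config:
                return False    # such a configuration gets posterior probability zero
    return True


def search_should_stop(added_mass, n_convergence=1000, prob_tol=0.001,
                       n_iterations=100000):
    """Stop at n_iterations, or once the mass added over the last
    n_convergence iterations is below prob_tol.

    added_mass[t] is the normalised probability mass newly discovered at
    iteration t (an assumed bookkeeping scheme).
    """
    if len(added_mass) >= n_iterations:
        return True
    if len(added_mass) < n_convergence:
        return False
    return sum(added_mass[-n_convergence:]) < prob_tol


ld = [[1.0, 0.97, 0.10],
      [0.97, 1.0, 0.20],
      [0.10, 0.20, 1.0]]
print(configuration_allowed({0, 1}, ld))   # SNPs 0 and 1 have |r| = 0.97
print(configuration_allowed({0, 2}, ld))
```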

Input

(1) Master file

The master file is a semicolon-separated text file that must not contain any spaces. Its first line holds the mandatory column names, and each following line describes one dataset.
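A minimal Python sketch of reading a master file and applying a --dataset selection. It assumes two things: datasets are numbered 1, 2, ... in the order of their lines below the header, and the --dataset list uses ',' or '|' as its delimiter. The column names in the usage example are placeholders, not the official list of mandatory columns.

```python
import re


def read_master(text):
    """Parse a master file: header line plus one dataset per line, ';'-separated."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if any(" " in ln for ln in lines):
        raise ValueError("the master file must not contain spaces")
    header = lines[0].split(";")
    datasets = {}
    for number, line in enumerate(lines[1:], start=1):   # assumed 1-based numbering
        fields = line.split(";")
        if len(fields) != len(header):
            raise ValueError(f"dataset {number}: expected {len(header)} fields")
        datasets[number] = dict(zip(header, fields))
    return datasets


def select_datasets(datasets, spec=None):
    """Apply a --dataset style list such as '1,2' or '1|2' (all datasets by default)."""
    if spec is None:
        return datasets
    numbers = [int(tok) for tok in re.split(r"[,|]", spec) if tok]
    return {n: datasets[n] for n in numbers}


# Placeholder column names, for illustration only.
master = ("z;ld;snp;config;log\n"
          "data1.z;data1.ld;data1.snp;data1.config;data1.log\n"
          "data2.z;data2.ld;data2.snp;data2.config;data2.log\n")
chosen = select_datasets(read_master(master), "2")
print(chosen[2]["z"])
```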

(2) Z file

The dataset.z file is a space-delimited text file containing the GWAS summary statistics, one SNP per line. Its header line contains the mandatory column names, which must appear in a fixed order.
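The Z file carries per-SNP effect estimates from which z-scores can be derived. The sketch below computes the conventional Wald statistic z = beta / se and, when requested, applies the sign flip described for --flip-beta. Only 'rsid' and 'flip' are column names taken from this page. 'beta' and 'se' are assumed names, and the minimal header in the example omits the other mandatory columns.

```python
def read_z(text, flip_beta=False):
    """Map rsid -> z = beta / se from a space-delimited Z file with a header.

    'beta' and 'se' are assumed column names. With flip_beta, rows whose
    'flip' value is 1 have the sign of beta reversed before z is computed.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    col = {name: k for k, name in enumerate(rows[0])}
    z = {}
    for fields in rows[1:]:
        beta = float(fields[col["beta"]])
        se = float(fields[col["se"]])
        if flip_beta and fields[col["flip"]] == "1":
            beta = -beta
        z[fields[col["rsid"]]] = beta / se
    return z


# Minimal illustrative file: a real Z file has further mandatory columns.
example = "rsid beta se flip\nrs1 0.2 0.05 0\nrs2 -0.1 0.04 1\n"
print(read_z(example))
print(read_z(example, flip_beta=True))
```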

(3) LD file

The dataset.ld file is a space-delimited text file and contains the SNP correlation matrix (Pearson's correlation).
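The following Python sketch shows what the LD file holds. It computes a Pearson correlation matrix from allele dosages and writes it as space-delimited text. It assumes a plain list of dosage lists, one list per SNP, all in the same sample order. It also assumes that rows and columns follow the SNP order of the Z file, which is an assumption of this sketch. A monomorphic SNP would cause a division by zero and must be removed first.

```python
import math


def pearson_ld(dosages):
    """Pearson correlation matrix between SNPs (one dosage list per SNP)."""
    n = len(dosages[0])
    centred = []
    for g in dosages:
        mean = sum(g) / n
        centred.append([x - mean for x in g])
    norms = [math.sqrt(sum(x * x for x in c)) for c in centred]
    m = len(dosages)
    ld = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i, m):
            cov = sum(a * b for a, b in zip(centred[i], centred[j]))
            ld[i][j] = ld[j][i] = cov / (norms[i] * norms[j])   # symmetric matrix
    return ld


def format_ld(ld, digits=4):
    """Render the matrix as space-delimited text, one row per line."""
    return "\n".join(" ".join(f"{r:.{digits}f}" for r in row) for row in ld) + "\n"


dosages = [[0, 1, 2, 1, 0],
           [0, 1, 2, 1, 0],   # identical to the first SNP
           [2, 1, 0, 1, 2],   # mirror image of the first SNP
           [0, 0, 1, 2, 2]]
print(format_ld(pearson_ld(dosages)), end="")
```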

(4) BGEN, BGI, SAMPLE and INCL file

These are Oxford file formats, described in their respective format specifications (BGEN, BGI and SAMPLE). The dataset.incl file is a text file that restricts the estimation of SNP correlations to the genotype data of a subset of the samples in dataset.sample. It contains one sample ID per line.
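The INCL file only selects individuals. Here is a minimal Python sketch, assuming the sample IDs from dataset.sample have already been read into a list. Parsing SAMPLE and BGEN files is out of scope here. `included_indices` and `subset_dosages` are hypothetical helpers, and the sample IDs are made up.

```python
def included_indices(sample_ids, incl_text):
    """Positions (in .sample order) of the sample IDs listed in a .incl file."""
    keep = {line.strip() for line in incl_text.splitlines() if line.strip()}
    return [k for k, sid in enumerate(sample_ids) if sid in keep]


def subset_dosages(dosages, indices):
    """Restrict every SNP's dosage list to the selected individuals."""
    return [[snp[k] for k in indices] for snp in dosages]


samples = ["id1", "id2", "id3", "id4"]          # hypothetical sample IDs
keep = included_indices(samples, "id2\nid4\n")
print(keep, subset_dosages([[0, 1, 2, 1], [2, 2, 0, 1]], keep))
```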

(5) Optional K file

By default, FINEMAP assumes that each SNP is causal with prior probability 1 / (# of SNPs in the genomic region). Alternatively, prior probabilities for the number of causal SNPs in the genomic region can be specified in a dataset.k file (used with --prior-k). This is a space-delimited text file containing the prior probabilities p_k = Pr(# of causal SNPs = k) for k = 1, ..., K, where K is the number of entries in the dataset.k file. The prior probabilities must be non-negative and are normalized to sum to one.
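The normalisation of a K file, and one way to view the default prior, can be sketched as follows. `read_k` mirrors the rule stated above: entries must be non-negative and are rescaled to sum to one. `default_k_prior` is an illustration only. It assumes SNPs are causal independently with probability 1 / m, so the number of causal SNPs is binomial, truncated to 1..K (K playing the role of --n-causal-snps) and renormalised. FINEMAP's exact default may differ.

```python
import math


def read_k(text):
    """Normalise the space-delimited prior weights p_1..p_K of a K file."""
    weights = [float(tok) for tok in text.split()]
    if any(w < 0 for w in weights):
        raise ValueError("prior probabilities must be non-negative")
    total = sum(weights)
    return [w / total for w in weights]          # p_k for k = 1..K


def default_k_prior(n_snps, max_causal):
    """Illustrative prior on k = 1..max_causal when each SNP is causal
    independently with probability 1 / n_snps (binomial, truncated, renormalised)."""
    q = 1.0 / n_snps
    w = [math.comb(n_snps, k) * q ** k * (1 - q) ** (n_snps - k)
         for k in range(1, max_causal + 1)]
    total = sum(w)
    return [x / total for x in w]


print(read_k("2 1 1"))
print(default_k_prior(50, 5))
```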

Output

(1) SNP file

The dataset.snp file is a space-delimited text file with one SNP per line. It contains the GWAS summary statistics and the model-averaged posterior summaries for each SNP.
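Model-averaged posterior summaries combine the posterior probabilities of all causal configurations. The best-known example is a SNP's posterior inclusion probability (PIP): the total posterior probability of the configurations that contain it. The Python sketch assumes the configuration probabilities are already available as a dictionary. The probabilities below are hypothetical, and the SNP file's column names are not reproduced.

```python
def posterior_inclusion(config_probs):
    """PIP of each SNP: total posterior probability of configurations containing it."""
    total = sum(config_probs.values())          # normalise in case weights are unnormalised
    pip = {}
    for config, prob in config_probs.items():
        for snp in config:
            pip[snp] = pip.get(snp, 0.0) + prob / total
    return pip


# Hypothetical posterior probabilities of three configurations.
configs = {frozenset({"rs1", "rs2"}): 0.6,
           frozenset({"rs1"}): 0.3,
           frozenset({"rs2", "rs3"}): 0.1}
print(posterior_inclusion(configs))
```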

(2) CONFIG file

The dataset.config file is a space-delimited text file with one causal configuration per line. It contains the posterior summaries for each configuration.
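Two summaries can be read off a set of configuration posteriors: the top configurations retained under --n-configs-top, and the posterior distribution of the number of causal SNPs. The Python sketch assumes configurations are given as frozensets with (possibly unnormalised) probabilities. It ignores the k = 0 case that --prior-k0 governs, and the probabilities are hypothetical.

```python
def top_configs(config_probs, n_top=50000):
    """The n_top most probable configurations, best first (cf. --n-configs-top)."""
    ranked = sorted(config_probs.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n_top]


def posterior_n_causal(config_probs):
    """Posterior of k causal SNPs: summed probability of configurations of size k."""
    total = sum(config_probs.values())
    post = {}
    for config, prob in config_probs.items():
        post[len(config)] = post.get(len(config), 0.0) + prob / total
    return dict(sorted(post.items()))


configs = {frozenset({"rs1", "rs2"}): 0.6,
           frozenset({"rs1"}): 0.3,
           frozenset({"rs2", "rs3"}): 0.1}
print(top_configs(configs, n_top=2))
print(posterior_n_causal(configs))
```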

(3) CRED file

The dataset.cred file is a space-delimited text file. It contains the 95% credible sets for each causal signal conditional on other causal signals in the genomic region together with conditional posterior inclusion probabilities for each variant. More detailed information TBA.
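A 95% credible set is commonly built greedily. Variants are sorted by (conditional) posterior inclusion probability and added until their cumulative probability reaches 0.95. The Python sketch below implements that usual construction for a single signal. It is assumed, not confirmed, to match how the CRED file is produced, and the input probabilities are hypothetical.

```python
def credible_set(cond_pip, level=0.95):
    """Smallest set of variants whose conditional PIPs sum to at least `level`."""
    ranked = sorted(cond_pip.items(), key=lambda item: item[1], reverse=True)
    chosen, mass = [], 0.0
    for snp, p in ranked:
        chosen.append(snp)
        mass += p
        if mass >= level:        # enough probability mass collected
            break
    return chosen, mass


signal = {"rs1": 0.70, "rs2": 0.20, "rs3": 0.06, "rs4": 0.04}
print(credible_set(signal))
```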

(4) LOG file

The dataset.log file, which is written only when --log is specified, contains additional information about the analysis.

(5) DOSE file

The dataset.dose file is a binary file containing allele dosage data.

Fine-mapping example

A quantitative phenotype was simulated from genotype data on 50 SNPs and 5363 individuals using a linear model with 2 causal SNPs. Single-SNP testing was performed to obtain z-scores, and SNP correlations were computed from the GWAS genotype data.
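The kind of simulation described above can be mimicked in a few lines of Python. This sketch is not the procedure used for the example data. It draws independent genotypes, so there is no LD. The allele-frequency range, effect size and causal SNP indices are hypothetical. Marginal z-scores come from single-SNP least-squares regression, z = b / se(b).

```python
import math
import random


def simulate_z_scores(n_ind=5363, n_snps=50, causal=(10, 29), beta=0.3, seed=1):
    """Simulate dosages and a quantitative trait, then compute marginal z-scores."""
    rng = random.Random(seed)
    mafs = [rng.uniform(0.05, 0.5) for _ in range(n_snps)]
    # Independent Binomial(2, maf) dosages per individual (no LD).
    geno = [[sum(rng.random() < f for _ in range(2)) for _ in range(n_ind)]
            for f in mafs]
    y = [sum(beta * geno[j][i] for j in causal) + rng.gauss(0.0, 1.0)
         for i in range(n_ind)]
    ybar = sum(y) / n_ind
    z = []
    for g in geno:
        gbar = sum(g) / n_ind
        sxx = sum((x - gbar) ** 2 for x in g)
        sxy = sum((x - gbar) * (yi - ybar) for x, yi in zip(g, y))
        b = sxy / sxx                                        # OLS slope
        rss = sum((yi - ybar - b * (x - gbar)) ** 2 for x, yi in zip(g, y))
        se = math.sqrt(rss / (n_ind - 2) / sxx)              # standard error of b
        z.append(b / se)
    return z


z = simulate_z_scores()
print(sorted(range(len(z)), key=lambda j: -abs(z[j]))[:5])
```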

Single causal configuration example

The same data as in the fine-mapping example above are used. Without performing a shotgun stochastic search, information about a single causal configuration can be obtained by specifying its SNP identifiers with --rsids, using either the macOS or the Linux (x86_64) binary:

./finemap_v1.3_MacOSX --config --in-files example/data --dataset 1 --rsids rs30,rs11
./finemap_v1.3_x86_64 --config --in-files example/data --dataset 1 --rsids rs30,rs11

References

Benner, C. et al. FINEMAP: Efficient variable selection using summary data from genome-wide association studies. Bioinformatics 32, 1493-1501 (2016).
Hans, C. et al. Shotgun stochastic search for "large p" regression. J. Am. Stat. Assoc. 102, 507-516 (2007).

Acknowledgements

Matti Pirinen contributed to the design and implementation of FINEMAP.