Christian Benner


I am a PhD student at the Institute for Molecular Medicine Finland (FIMM) and the University of Helsinki under the supervision of Matti Pirinen and Samuli Ripatti. I have a background in mathematical statistics and a keen interest in computational statistical genetics and genetic epidemiology.


FINEMAP-ing articles

- Refining fine-mapping: effect sizes and regional heritability. bioRxiv. (2018).
- Prospects of fine-mapping trait-associated genomic regions by using summary statistics from genome-wide association studies. Am. J. Hum. Genet. (2017).
- FINEMAP: Efficient variable selection using summary data from genome-wide association studies. Bioinformatics 32, 1493-1501 (2016).

Talks

14/10/2018 27th annual meeting of the International Genetic Epidemiology Society (IGES)
16/10/2018 San Diego, USA
Efficient fine-mapping to identify causal genetic variants and quantify their contribution to complex phenotypes
 
26/08/2018 Joint conference of the International Society for Clinical Biostatistics (ISCB) and Statistical Society of Australia (SSA)
30/08/2018 Melbourne, Australia
Efficient variable selection to identify causal genetic variants and quantify their contribution to complex phenotypes
 
21/09/2017 MRC Biostatistics Unit
Cambridge, UK
Prospects of fine-mapping trait-associated genomic regions using summary statistics from genome-wide association studies
 
14/09/2017 University of Oxford Big Data Institute
Li Ka Shing Centre for Health Information and Discovery
Prospects of fine-mapping trait-associated genomic regions using summary statistics from genome-wide association studies
 
21/06/2017 Montreal Heart Institute and McGill University
22/06/2017 Montréal, Québec, Canada
Efficiency and accuracy of fine-mapping using GWAS summary data
 
27/05/2017 50th annual meeting of the European Society of Human Genetics (ESHG)
30/05/2017 Copenhagen, Denmark
Prospects of fine-mapping trait-associated genomic regions using summary statistics from genome-wide association studies
 
16/03/2017 Erasmus University Medical Center
Rotterdam, The Netherlands
Prospects of fine-mapping trait-associated genomic regions using summary statistics from genome-wide association studies
 
18/10/2016 66th annual meeting of the American Society of Human Genetics (ASHG)
22/10/2016 Vancouver, Canada
Efficiency and accuracy of fine-mapping using GWAS summary data
 
28/07/2016 Wellcome Trust Sanger Institute
Hinxton, Cambridge, UK
Efficiency and accuracy of fine-mapping using GWAS summary data
 
21/05/2016 49th annual meeting of the European Society of Human Genetics (ESHG)
24/05/2016 Barcelona, Spain
FINEMAP: Ultrafast high-resolution fine-mapping using summary data from genome-wide association studies
 
03/04/2016 13th International Congress of Human Genetics (ICHG)
07/04/2016 Kyoto, Japan
FINEMAP: Ultrafast high-resolution fine-mapping using summary data from genome-wide association studies
 
19/11/2015 Broad Institute of MIT and Harvard
Boston, USA
Efficient fine-mapping using summary data from genome-wide association studies
 
04/10/2015 24th annual meeting of the International Genetic Epidemiology Society (IGES)
06/10/2015 Baltimore, USA
Mixed models for time-to-event outcomes with large-scale population cohorts and genome-wide data
 
09/09/2015 Wellcome Trust Center of Human Genetics
Oxford, UK
Efficient variable selection among thousands of correlated genetic variants using summary data from genome-wide association studies
 
20/08/2015 Statistical days - Big data in biological and medical research
21/08/2015 Helsinki, Finland
Efficient fine-mapping of thousands of correlated genetic variants using summary data from genome-wide association studies
 
28/08/2014 23rd annual meeting of the International Genetic Epidemiology Society (IGES)
30/08/2014 Vienna, Austria
Mixed modeling for time-to-event outcomes with large-scale population cohorts and genome-wide data (Roger Williams award for the best presentation by a PhD student)

FINEMAP


FINEMAP is a program for fine-mapping in genomic regions associated with complex traits and disease. FINEMAP is computationally efficient by using summary statistics from genome-wide association studies and robust by applying a shotgun stochastic search algorithm (Hans et al., 2007). It produces accurate results in a fraction of the processing time of existing approaches. It is therefore the ideal tool for analyzing growing amounts of data produced in genome-wide association studies and emerging sequencing or biobank projects.

Command-line arguments

--cond    Fine-mapping with stepwise conditional search    Subprogram
--config    Evaluate a single causal configuration without performing shotgun stochastic search    Subprogram
--corr-config    Option to set the posterior probability of a causal configuration to zero if it includes a pair of SNPs with absolute correlation above this threshold    Default is 0.95
--corr-group    Option to set the threshold for grouping a pair of SNPs with absolute correlation above this threshold    Default is 0.99
--dataset    Option to specify a delimiter-separated list of datasets for fine-mapping as given in the master file (e.g. 1,2 or 1|2)    All datasets are processed by default
--flip-beta    Option to read a column 'flip' in the Z file with binary indicators specifying if the direction of the estimated SNP effect sizes needs to be flipped to match SNP correlations    With --cond, --config and --sss
--group-snps    Option to group SNPs on the basis of their correlations    With --cond and --sss
--help    Command-line help
--in-files    Master file (see below)    With --cond, --config and --sss
--log    Option to write output to log files specified in column 'log' in the master file    No log files are written by default
--n-causal-snps    Option to set the maximum number of allowed causal SNPs    Default is 5
--n-configs-top    Option to set the number of top causal configurations to be saved    Default is 50000
--n-convergence    Option to set the number of iterations that the added probability mass is required to be below the specified threshold (--prob-tol) before the shotgun stochastic search is terminated    Default is 1000
--n-iterations    Option to set the maximum number of iterations before the shotgun stochastic search is terminated    Default is 100000
--prior-k    Option to use prior probabilities for the number of causal SNPs as specified in K files (see below) in the master file    SNPs are by default assumed to be causal with probability 1 / (# of SNPs in the genomic region)
--prior-k0    Option to set the prior probability that there is no causal SNP in the genomic region. Only used when computing posterior probabilities for the number of causal SNPs but not during fine-mapping itself    Default is 0.0
--prior-std    Option to specify a comma-separated list of prior standard deviations of effect sizes    Default is 0.05
--prob-tol    Option to set the tolerance at which the added probability mass (over --n-convergence iterations) is considered small enough to terminate the shotgun stochastic search    Default is 0.001
--rsids    Option to specify a comma-separated list of SNP identifiers corresponding with the rsid column in Z files (see below)    With --config
--sss    Fine-mapping with shotgun stochastic search    Subprogram
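
The options --n-iterations, --n-convergence and --prob-tol together describe a stopping rule for the shotgun stochastic search. The Python sketch below is one plausible reading of those descriptions, not FINEMAP's actual implementation. The function name, the added_mass input and the choice to sum the added mass over a sliding window of --n-convergence iterations are all assumptions made for illustration.

```python
from collections import deque


def iterations_until_stop(added_mass, n_convergence=1000, prob_tol=0.001,
                          n_iterations=100000):
    """Return the iteration at which the search would stop (hypothetical helper).

    added_mass: iterable with the posterior probability mass newly added in
    each iteration (an assumed input, not something FINEMAP writes out).
    """
    window = deque(maxlen=n_convergence)  # the most recent n_convergence values
    done = 0
    for done, mass in enumerate(added_mass, start=1):
        window.append(mass)
        # Assumed rule: stop once the window is full and its total is small.
        if len(window) == n_convergence and sum(window) < prob_tol:
            return done
        # Hard cap on the number of iterations.
        if done >= n_iterations:
            return done
    return done


# Toy run: mass is added in the first three iterations only.
stop = iterations_until_stop([0.5, 0.3, 0.1] + [0.0] * 10, n_convergence=5)
```

In this toy run the search stops at iteration 8, the first iteration whose last five added masses sum to less than the tolerance.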

Input

(1) Master file

The master file is a semicolon-separated text file and contains no spaces. It contains the following mandatory column names and one dataset per line.

(2) Z file

The dataset.z file is a space-delimited text file and contains the GWAS summary statistics, one SNP per line. It contains the mandatory column names in the following order.

(3) LD file

The dataset.ld file is a space-delimited text file and contains the SNP correlation matrix (Pearson's correlation).
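
Before running a fine-mapping analysis it can help to sanity-check the LD file. The following Python sketch parses a space-delimited correlation matrix and checks that it is square, symmetric, has a unit diagonal and has entries in [-1, 1]. The function name parse_ld and the example matrix are hypothetical, and the checks are generic properties of a Pearson correlation matrix rather than documented FINEMAP requirements.

```python
def parse_ld(text, tol=1e-8):
    """Parse a space-delimited correlation matrix and run basic checks."""
    rows = [[float(x) for x in line.split()]
            for line in text.splitlines() if line.strip()]
    m = len(rows)
    # Check the shape first so that the symmetry check cannot go out of range.
    for i, row in enumerate(rows):
        if len(row) != m:
            raise ValueError(f"row {i + 1} has {len(row)} entries, expected {m}")
    for i in range(m):
        if abs(rows[i][i] - 1.0) > tol:
            raise ValueError(f"diagonal entry {i + 1} is not 1")
        for j in range(m):
            if abs(rows[i][j]) > 1.0 + tol:
                raise ValueError(f"entry ({i + 1}, {j + 1}) is outside [-1, 1]")
            if abs(rows[i][j] - rows[j][i]) > tol:
                raise ValueError(f"entries ({i + 1}, {j + 1}) and "
                                 f"({j + 1}, {i + 1}) differ")
    return rows


# Made-up 3 x 3 example, not taken from the FINEMAP example data.
example = "1.0 0.8 -0.1\n0.8 1.0 0.05\n-0.1 0.05 1.0\n"
ld = parse_ld(example)
```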

(4) BGEN, BGI, SAMPLE and INCL file

BGEN, BGI and SAMPLE are Oxford file formats. The dataset.incl file is a text file to restrict estimation of SNP correlations to genotype data from a subset of samples in dataset.sample. It contains one sample ID per line.

(5) Optional K file

By default, FINEMAP assumes that SNPs are causal with prior probability 1 / (# of SNPs in the genomic region). As an alternative, it is possible to specify prior probabilities for the number of causal SNPs in the genomic region by using a dataset.k file. This is a space-delimited text file and contains the prior probabilities pk = Pr(# of causal SNPs is k) for k = 1,...,K, where K is the number of entries in the dataset.k file. The prior probabilities must be non-negative and will be normalized to sum to one.
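
The normalization of the K file entries can be illustrated with a short Python sketch. The function name read_k_prior and the example weights are hypothetical. The sketch only mirrors the rules stated above (non-negative entries, normalized to sum to one) and is not FINEMAP's own parsing code.

```python
def read_k_prior(text):
    """Turn the entries of a K file into normalized prior probabilities.

    Returns a dict mapping k (number of causal SNPs) to Pr(# of causal SNPs is k).
    """
    weights = [float(x) for x in text.split()]
    if not weights:
        raise ValueError("the K file contains no entries")
    if any(w < 0 for w in weights):
        raise ValueError("prior probabilities must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one entry must be positive")
    # The k-th entry is the prior weight for exactly k causal SNPs.
    return {k: w / total for k, w in enumerate(weights, start=1)}


# Unnormalized example weights for k = 1, 2, 3 (made up for illustration).
prior = read_k_prior("6 3 1")
```

With the weights 6, 3 and 1, the normalized prior probabilities for k = 1, 2, 3 are 0.6, 0.3 and 0.1.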

Output

(1) SNP file

The dataset.snp file is a space-delimited text file. It contains the GWAS summary statistics and model-averaged posterior summaries for each SNP, one SNP per line.

(2) CONFIG file

The dataset.config file is a space-delimited text file. It contains the posterior summaries for each causal configuration, one configuration per line.

(3) LOG file

The dataset.log file outputs additional information. It contains the following output.

(4) DOSE file

The dataset.dose file is a binary file with allele dosage data. A DOSE file contains the following information.

Fine-mapping example

Using genotype data with 50 SNPs and 5363 individuals, a quantitative phenotype was simulated using a linear model with 2 causal SNPs. Single-SNP testing was performed to obtain z-scores. SNP correlations were computed from GWAS genotype data.
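
The simulation setup described above can be sketched in Python as follows. This is a scaled-down illustration under stated assumptions, not the script that produced the example data: it uses 5 independent SNPs and 1,000 individuals instead of 50 correlated SNPs and 5363 individuals, and the effect size, allele frequency, seed and function name are all hypothetical choices.

```python
import math
import random


def simulate_z_scores(n=1000, m=5, causal=(0, 1), effect=0.5, maf=0.3, seed=1):
    """Simulate a quantitative phenotype and return single-SNP z-scores."""
    rng = random.Random(seed)
    # Genotypes: number of minor alleles (0, 1 or 2) per individual and SNP.
    geno = [[(rng.random() < maf) + (rng.random() < maf) for _ in range(m)]
            for _ in range(n)]
    # Linear model: sum of causal effects plus standard normal noise.
    pheno = [sum(effect * row[j] for j in causal) + rng.gauss(0.0, 1.0)
             for row in geno]
    y_mean = sum(pheno) / n
    z_scores = []
    for j in range(m):
        x = [row[j] for row in geno]
        x_mean = sum(x) / n
        sxx = sum((xi - x_mean) ** 2 for xi in x)
        sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, pheno))
        beta = sxy / sxx  # least-squares slope of the single-SNP regression
        rss = sum((yi - y_mean - beta * (xi - x_mean)) ** 2
                  for xi, yi in zip(x, pheno))
        se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
        z_scores.append(beta / se)
    return z_scores


z = simulate_z_scores()
```

The two causal SNPs (indices 0 and 1 in this sketch) are expected to have much larger z-scores than the remaining SNPs.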

Single causal configuration example

The same data as in the fine-mapping example above are used. Without having to perform shotgun stochastic search, information about a single causal configuration can be obtained by specifying SNP identifiers as follows:

./finemap_v1.3_MacOSX --config --in-files example/data --dataset 1 --rsids rs30,rs11
./finemap_v1.3_x86_64 --config --in-files example/data --dataset 1 --rsids rs30,rs11

References

Benner, C. et al. FINEMAP: Efficient variable selection using summary data from genome-wide association studies. Bioinformatics 32, 1493-1501 (2016).
Hans, D. et al. Shotgun stochastic search for "large p" regression. J Am Stat Assoc 102, 507-516 (2007).

Acknowledgements

Matti Pirinen contributed to the design and implementation of FINEMAP.

LDstore


LDstore is a computationally efficient program for estimating and storing Linkage Disequilibrium (LD) between variants (i.e. Pearson correlations). It combines some of the best features from RAREMETALWORKER and PLINK by implementing 1) parallel processing using OPENMP, 2) sparse estimation to achieve smaller file size, and 3) storing of the LD information with additional variant information in the same file to enable fast lookups of LD information. For instance, LDstore can generate LD information for 5,000 variants in less than 30 seconds on an off-the-shelf laptop and store the LD information using less than 100 megabytes. LDstore is therefore the ideal tool for sharing LD information in large-scale meta-analyses of genome-wide association studies and for on-the-fly computing within web portals.

Command-line arguments

--bcor    Name of the BCOR input/output file(s)    Always required
--bgen    Name of the BGEN input file    Requires --bcor
--bplink    Basename of the PLINK BED, BIM and FAM input files    Requires --bcor
--merge    Merge xx BCOR files (having file extensions .bcor_processNumber) into a common BCOR file where xx is the total number of parallel processes used during estimation of LD information    Requires --bcor
--samples    Name of the SNPTEST2 sample file when using BGEN input    Requires --bgen and --incl-samples
--meta    Extract variant information and store them in the specified text file    Requires --bcor
--matrix    Extract LD information in matrix format and store them in the specified text file    Requires --bcor
--table    Extract LD information in table format and store them in the specified text file    Requires --bcor
--incl-range    Specify a genomic range xx-yy to operate on where xx and yy are the start and end coordinate in base pairs    Default is genomic range of the input file
--incl-samples    Include only samples in the estimation of LD information whose sample ID (ID_1 with BGEN input and IID with PLINK input) lies in the specified text file    Requires --samples with BGEN input
--incl-variants    Extract LD information for variants given in the specified text file. The specified file has 5 columns with a header: RSID, position, chromosome, A_allele and B_allele    Requires --matrix or --table
--ld-thold    LD information for two variants is only stored or extracted if their absolute Pearson correlation is above this threshold    Requires --bgen, --bplink or --table. Default is 0.001
--ld-n-samples-avail-prop    LD information for two variants is only stored or extracted if the proportion of all samples with genotype data for the two variants is above this threshold    Requires --bgen, --bplink or --table. Default is 0.1
--n-variants-chunk    Number of variants processed together    Requires --bgen or --bplink. Default is 1000
--variant-window-size    LD information for two variants A and B is computed if B is xx base pairs downstream of A    Requires --bgen or --bplink. Default is 5 megabase pairs
--accuracy    LD information is stored using either low, medium or high accuracy    Requires --bgen or --bplink. Default is medium
--n-threads    Specify the number of parallel processes during estimation of LD information    Requires --bgen or --bplink. Default is max number of CPU cores available
--help    Command-line help

Input

LDstore supports BGEN files and PLINK BED, BIM and FAM files as input.

Output

LDstore writes LD information between variants (i.e. Pearson correlations) into a binary file format called BCOR. The BCOR format reduces data storage requirements and enables fast lookups of LD information. To speed up computation of LD information, LDstore uses OPENMP for parallel processing and sparse estimation via a window approach because LD between two variants decreases with their physical distance. This means that LDstore creates multiple BCOR files (having file extensions .bcor_processNumber) that can be merged into a common BCOR file.
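
The idea behind sparse estimation with a window can be sketched in Python. This is a minimal illustration of the general technique, assuming variants sorted by position. It is not LDstore's implementation and says nothing about the BCOR format. The function names, the toy data and the exact window rule (pairs at most `window` base pairs apart) are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long lists, or None if undefined."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # undefined for a monomorphic variant
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def sparse_ld(positions, genotypes, window, threshold):
    """Correlations for variant pairs within the window and above the threshold.

    positions must be sorted in ascending order; genotypes[i] holds the allele
    counts of variant i across samples.
    """
    pairs = {}
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if positions[j] - positions[i] > window:
                break  # sorted positions: all later variants are further away
            r = pearson(genotypes[i], genotypes[j])
            if r is not None and abs(r) > threshold:
                pairs[(i, j)] = r
    return pairs


# Toy data: three variants, the third one far away from the first two.
positions = [100, 200, 10_000]
genotypes = [[0, 1, 2, 1, 0, 2], [0, 1, 2, 1, 0, 1], [2, 0, 1, 1, 2, 0]]
pairs = sparse_ld(positions, genotypes, window=1000, threshold=0.001)
```

Only the pair of the first two variants is kept here, because the third variant lies outside the window of both. Skipping distant pairs is what keeps the stored output small.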

Examples

Genotype data on 50 SNPs and 5,000 samples is provided in BGEN and PLINK files. The second variant is monomorphic and there are 500 samples with missing genotype data at the third variant.

Estimation of LD information

Merging of multiple BCOR files

Although only a single thread was used to generate LD information, the --merge option can still be used:

./ldstore \
--bcor example/data_bgen.bcor \
--merge 1

./ldstore \
--bcor example/data_plink.bcor \
--merge 1

Note that the value after --bcor is the same value that was specified after --bcor during estimation of LD information. LDstore searches for xx BCOR files (having file extensions .bcor_processNumber), where xx is the total number of parallel processes used during estimation of LD information, and merges them into a common file called data_bgen.bcor or data_plink.bcor.

Extraction of variant information

Variant information can be extracted and stored in a text file as follows:

./ldstore \
--bcor example/data_bgen.bcor \
--meta example/data_bgen.meta

The header and the first 3 variant lines in the file data_bgen.meta are:

index RSID position chromosome A_allele B_allele A_allele_freq B_allele_freq
1 rs1 1 01 A G 0.2267 0.7733
2 rs2 2 01 A G 1.0000 0.0000
3 rs3 3 01 A G 0.4239 0.4761
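
A meta file like the one above is easy to post-process. The following Python sketch parses the lines shown and flags monomorphic variants. The function name read_meta is hypothetical, and the criterion for a monomorphic variant (one allele frequency equal to zero) is an assumption for illustration.

```python
META_TEXT = """\
index RSID position chromosome A_allele B_allele A_allele_freq B_allele_freq
1 rs1 1 01 A G 0.2267 0.7733
2 rs2 2 01 A G 1.0000 0.0000
3 rs3 3 01 A G 0.4239 0.4761
"""


def read_meta(text):
    """Parse space-delimited variant information into a list of dicts."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    records = []
    for line in lines[1:]:
        rec = dict(zip(header, line.split()))
        rec["position"] = int(rec["position"])
        for key in ("A_allele_freq", "B_allele_freq"):
            rec[key] = float(rec[key])
        records.append(rec)
    return records


records = read_meta(META_TEXT)
# A variant is treated as monomorphic if one of its alleles is never observed.
monomorphic = [r["RSID"] for r in records
               if min(r["A_allele_freq"], r["B_allele_freq"]) == 0.0]
```

For the lines shown, only rs2 is flagged, in line with the description of the example data.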

Extraction of LD information

LDstore implements several ways to extract LD information from BCOR files. Below are a few examples.

References

Benner, C. et al. Prospects of fine-mapping trait-associated genomic regions by using summary statistics from genome-wide association studies. Am. J. Hum. Genet. (2017).

Acknowledgements

Past

Employment

since Teacher of (computational) statistics courses
03/2013 University of Helsinki, Finland
 
06/2012 Research assistant of Jukka Corander
11/2012 University of Helsinki, Finland

Education

since Doctor of Philosophy in Statistical genetics
04/2013 Institute for Molecular Medicine Finland (FIMM) and University of Helsinki, Finland
Supervisors: Matti Pirinen and Samuli Ripatti
PhD project:
Statistical genetics and method development
 
09/2011 Master of Science in Bayesian Statistics and Decision Analysis
03/2013 University of Helsinki, Finland
Supervisors: Jukka Corander and Petri Koistinen
MSc thesis:
Bayesian confirmatory factor analysis for detection of differential gene expression (eximia cum laude approbatur)
 
10/2007 Bachelor of Science in Statistics
03/2011 University of Applied Sciences Magdeburg-Stendal, Germany
Supervisors: Petra Weber-Kurth and Andreas Felgenhauer
BSc thesis:
Bayesian variable selection for DNA copy number data about esophageal adenocarcinoma (excellent)