
ImprintCapASM is an R package for SNP-phased allele-specific methylation (ASM) analysis across the 41 known human imprinted differentially methylated regions (DMRs). It is designed for clinical diagnostic workflows that profile imprint disorder cases — including Beckwith-Wiedemann syndrome (BWS), Silver-Russell syndrome (SRS), Prader-Willi syndrome (PWS), Angelman syndrome (AS), and related conditions — from bisulfite sequencing data produced by targeted capture panels.
The package provides three core functions that form a sequential pipeline:

1. prepare_cpg_snp_input(): links CpG methylation values to nearby heterozygous SNPs; produces a per-sample Excel table and a BED file.
2. extract_bam_regions(): extracts and sorts a BAM subset covering the SNP windows for each sample.
3. ASM(): reads the extracted BAM, assigns each read to a parental allele, and computes allele-specific methylation statistics; returns three output tables and a line-plot PDF.

For processing multiple samples together (the standard diagnostic use case), use run_pipeline(). Call it once for the control cohort and once for the patient cohort: each call runs the full three-step pipeline for that batch. Controls and patients are always run independently using their respective filter_cpgs reference files.
Genomic imprinting is an epigenetic phenomenon whereby a subset of genes is expressed in a parent-of-origin dependent manner, regulated by differentially methylated regions (DMRs). Loss or gain of methylation at these DMRs underlies a class of rare congenital disorders collectively known as imprinting disorders. Accurate diagnosis requires quantifying the methylation of each parental allele separately, which standard bisulfite sequencing can do only when methylation calls are phased to nearby heterozygous SNPs.
ImprintCapASM implements a SNP-phasing strategy: heterozygous SNPs detected in bisulfite sequencing reads are used to assign each read to a parental allele (REF or ALT), and CpG methylation values on each allele are computed and compared. Deviation from the expected allele-specific methylation pattern at a given DMR indicates a potential imprinting disorder.
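The phasing step has to account for bisulfite conversion, because a converted base can mimic a SNP allele. The sketch below illustrates the general idea only; it is not taken from the package source. The helper name `assign_allele()` is hypothetical, and it assumes a simplified model in which "Forward" reads show C-to-T conversion and "Reverse" reads show G-to-A conversion relative to the reference.

```r
# Hypothetical helper (not part of ImprintCapASM): classify the base a read
# shows at a heterozygous SNP as "REF", "ALT", or NA when it is ambiguous.
assign_allele <- function(base, ref, alt, strand = c("Forward", "Reverse")) {
  strand <- match.arg(strand)
  # Assumed conversion model: C reads as T on Forward, G reads as A on Reverse
  conv_from <- if (strand == "Forward") "C" else "G"
  conv_to   <- if (strand == "Forward") "T" else "A"

  # A SNP whose two alleles differ only by the conversion cannot be phased
  if (setequal(c(ref, alt), c(conv_from, conv_to))) return(NA_character_)

  # The observed base may be the converted form of another base
  candidates <- if (base == conv_to) c(conv_to, conv_from) else base

  if (ref %in% candidates && !(alt %in% candidates)) return("REF")
  if (alt %in% candidates && !(ref %in% candidates)) return("ALT")
  NA_character_
}

assign_allele("A", ref = "G", alt = "A", strand = "Forward")  # "ALT"
assign_allele("A", ref = "G", alt = "A", strand = "Reverse")  # NA: G/A is ambiguous here
assign_allele("T", ref = "C", alt = "A", strand = "Forward")  # "REF": T may be a converted C
```

Under this model a G/A SNP is informative only on Forward reads and a C/T SNP only on Reverse reads, which is why allele assignment is tracked per read rather than per site.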
# Released version
install.packages("ImprintCapASM")

# Development version from GitHub
# install.packages("remotes")
remotes::install_github("19-saha/ImprintCapASM")

ImprintCapASM depends on several Bioconductor packages. Install them first if not already present:
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install(c(
  "BiocParallel",
  "Rsamtools",
  "GenomicRanges",
  "IRanges",
  "S4Vectors",
  "SummarizedExperiment",
  "VariantAnnotation"
))

filter_cpgs Reference Files

A key concept in ImprintCapASM is that controls and patients each have their own filter_cpgs reference file. These are not interchangeable:
| File | Used with | Contains | Purpose |
|---|---|---|---|
| inst/extdata/filter_cpgs_ctrl.xlsx | sample_type = "control" | Control_1, Control_2, … columns | Computes mean/SD methylation and CpG variance categories from the control cohort |
| inst/extdata/filter_cpgs_pat.xlsx | sample_type = "patient" | Patient_1, Patient_2, … columns | Computes mean/SD methylation and CpG variance categories from the patient cohort |
Both files share the same structure (chr,
5_location, 3_location, DMR, then
sample columns). The ASM() function auto-detects sample
columns by matching the pattern ^Control_ or
^Patient_ in the column names. Passing the wrong file to
the wrong sample_type will produce incorrect variance
categories and misleading plots.
Both files are used identically for CpG window definition in
prepare_cpg_snp_input() — what differs is the
cohort-specific methylation statistics computed during
ASM().
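The column detection and cohort statistics described above can be sketched in base R. This is a minimal illustration, not the package's internal code: the reference table is built inline rather than read from the Excel file, and the thresholds that turn mean/SD into the LOWvar, MSDvar, SDvar, and Mvar categories are not stated in this README, so they are left out.

```r
# Toy reference table with the documented layout (one CpG, three controls)
ref <- data.frame(
  chr = "chr11", `5_location` = 2016404, `3_location` = 2016406,
  DMR = "H19/IGF2", Control_1 = 82, Control_2 = 84, Control_3 = 81,
  check.names = FALSE
)

# Sample columns are the ones whose names start with Control_ or Patient_
sample_cols <- grep("^(Control|Patient)_", names(ref), value = TRUE)

# Per-CpG cohort mean and standard deviation across the sample columns
vals <- as.matrix(ref[, sample_cols, drop = FALSE])
ref$mean_methylation <- rowMeans(vals, na.rm = TRUE)
ref$sd_methylation   <- apply(vals, 1, sd, na.rm = TRUE)

ref[, c("DMR", "mean_methylation", "sd_methylation")]
```

Because the statistics come from whichever sample columns are present, a patient file passed with sample_type = "control" would silently yield patient-derived means and SDs, which is the mix-up the section warns about.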
Bisulfite sequencing run (targeted imprint capture panel)
│
├── bssnper SNP calling → sample.SNPs.out (VCFv4.3, plain text)
├── bssnper CG methylation → sample.CGmeth.txt (9-column TSV)
└── Picard MarkDuplicates → sample_markdup.bam + .bai
│
▼ [per sample, run separately for controls and patients]
┌──────────────────────────────────────────┐
│ prepare_cpg_snp_input() │
│ Input: sample.SNPs.out │ Filters heterozygous SNPs (GT=0/1),
│ sample.CGmeth.txt │ overlaps with CpG panel windows,
│ inst/extdata/filter_cpgs_ctrl.xlsx │ joins CpG methylation fractions
│ OR inst/extdata/filter_cpgs_pat.xlsx │
│ Output: cpg_snps_CG_{type}_{id}.xlsx │
│ cpg_snps_CG_{type}_{id}.bed │
└──────────────┬───────────────────────────┘
│
┌──────────────▼───────────────────────────┐
│ extract_bam_regions() │
│ Input: sample_markdup.bam │ Subsets BAM to SNP windows,
│ cpg_snps_CG_{type}_{id}.bed │ sorts and indexes the output
│ Output: {type}_{id}_wide.bam + .bai │
└──────────────┬───────────────────────────┘
│
[all samples of same type combined]
│
┌──────────────▼───────────────────────────┐
│ ASM() │
│ Input: cpg_snps_CG_{type}_{id}.xlsx │ Bisulfite-aware allele assignment,
│ {type}_{id}_wide.bam │ per-read methylation scoring,
│ inst/extdata/filter_cpgs_ctrl.xlsx │ CpG variance classification using
│ OR inst/extdata/filter_cpgs_pat.xlsx │ cohort-matched reference
│ Output: asm_{type}_{id}.xlsx │
│ snp_cpg_{type}_{id}.xlsx │
│ meth_summary_{type}_{id}.xlsx │
│ lineplot_{type}_{id}.pdf │
└──────────────────────────────────────────┘
sample.SNPs.out (bssnper VCFv4.3)

Produced by BS-Snper. Plain-text VCF; no bgzip or tabix index required. prepare_cpg_snp_input() reads the GT FORMAT field and retains only heterozygous SNPs (GT == "0/1") with sufficient depth:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
chr11 2016400 . G A 85 PASS DP=28;AD=15,13; GT:DP:AD:ADF:ADR:BSD:BSQ:ALFR 0/1:28:15,13:...
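As a rough illustration of the heterozygous-SNP filter, the base-R sketch below parses a plain-text VCF and keeps GT == "0/1" records. It is an assumption-laden stand-in for what prepare_cpg_snp_input() does internally: the second record is invented for contrast, the FORMAT string is shortened to GT:DP:AD, and `min_depth` is a hypothetical threshold, since the README does not state the actual depth cutoff.

```r
# Toy VCF text modelled on the excerpt above (second record is invented)
vcf_lines <- c(
  "##fileformat=VCFv4.3",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE",
  "chr11\t2016400\t.\tG\tA\t85\tPASS\tDP=28;AD=15,13;\tGT:DP:AD\t0/1:28:15,13",
  "chr11\t2016950\t.\tC\tT\t90\tPASS\tDP=30;AD=0,30;\tGT:DP:AD\t1/1:30:0,30"
)

# Drop "##" meta lines, keep the "#CHROM" header; read every column as text
# so that allele columns such as "T" are never coerced to logical
body <- vcf_lines[!grepl("^##", vcf_lines)]
vcf <- read.delim(text = body, header = TRUE, comment.char = "",
                  check.names = FALSE, colClasses = "character")
names(vcf)[1] <- "CHROM"
vcf$POS <- as.integer(vcf$POS)

# Look up a FORMAT key in each record's colon-separated sample field
fmt_keys <- strsplit(vcf$FORMAT, ":", fixed = TRUE)
smp_vals <- strsplit(vcf$SAMPLE, ":", fixed = TRUE)
get_field <- function(key) {
  mapply(function(k, v) v[match(key, k)], fmt_keys, smp_vals)
}
vcf$GT <- get_field("GT")
vcf$DP <- as.integer(get_field("DP"))

min_depth <- 10L  # hypothetical cutoff, not documented by the package
het <- vcf[vcf$GT == "0/1" & vcf$DP >= min_depth, ]
het[, c("CHROM", "POS", "REF", "ALT", "GT", "DP")]
```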
sample.CGmeth.txt (bssnper CG output)

Tab-delimited, 9 columns with a #CHROM header. prepare_cpg_snp_input() merges Watson and Crick strand methylation and coverage internally:
#CHROM POS CONTEXT Watson-METH Watson-COVERAGE Watson-QUAL Crick-METH Crick-COVERAGE Crick-QUAL
chr11 2016405 CG 155 169 33 365 494 33
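The strand merge can be worked through on the example row. The sketch below assumes the merge is a plain sum of the Watson and Crick counts, which matches the total_meth, total_cov, and meth_frac definitions in the output column table; handling of missing strand values is not covered.

```r
# The example row from the excerpt, read with the original column names
cg <- read.delim(
  text = paste(
    "#CHROM\tPOS\tCONTEXT\tWatson-METH\tWatson-COVERAGE\tWatson-QUAL\tCrick-METH\tCrick-COVERAGE\tCrick-QUAL",
    "chr11\t2016405\tCG\t155\t169\t33\t365\t494\t33",
    sep = "\n"
  ),
  check.names = FALSE, comment.char = ""
)

# Sum methylated counts and coverage over both strands
cg$total_meth <- cg[["Watson-METH"]] + cg[["Crick-METH"]]          # 155 + 365 = 520
cg$total_cov  <- cg[["Watson-COVERAGE"]] + cg[["Crick-COVERAGE"]]  # 169 + 494 = 663
cg$meth_frac  <- cg$total_meth / cg$total_cov                      # 520 / 663, about 0.784

cg[, c("POS", "total_meth", "total_cov", "meth_frac")]
```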
inst/extdata/filter_cpgs_ctrl.xlsx and inst/extdata/filter_cpgs_pat.xlsx

Two separate reference Excel files, one for controls and one for patients. Both share the same column structure: genomic coordinates and DMR name, followed by per-sample methylation percentages. The ASM() function detects sample columns automatically by matching ^Control_ or ^Patient_ column name prefixes:
chr 5_location 3_location DMR Control_1 Control_2 Control_3 ...
chr11 2016404 2016406 H19/IGF2 82 84 81 ...
chr 5_location 3_location DMR Patient_1 Patient_2 Patient_3 ...
chr11 2016404 2016406 H19/IGF2 45 83 80 ...
sample_markdup.bam + .bam.bai

Duplicate-marked, coordinate-sorted BAM produced by Picard MarkDuplicates. The .bai index is expected alongside the BAM; if it is missing, extract_bam_regions() creates it automatically via Rsamtools::indexBam().
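To check for the index up front rather than rely on the automatic fallback, something along these lines may help. `ensure_bam_index()` is a hypothetical helper, not a package function; it assumes the index is named either `<file>.bam.bai` or `<file>.bai` and only calls Rsamtools::indexBam() when neither exists.

```r
# Hypothetical helper: return the path of an existing BAM index, or build one
ensure_bam_index <- function(bam_file) {
  stopifnot(file.exists(bam_file))

  # Two common naming conventions for BAM indexes
  candidates <- c(paste0(bam_file, ".bai"), sub("\\.bam$", ".bai", bam_file))
  found <- candidates[file.exists(candidates)]
  if (length(found) > 0) return(found[1])

  # No index found: create one (the BAM must be coordinate-sorted)
  if (!requireNamespace("Rsamtools", quietly = TRUE)) {
    stop("Rsamtools is needed to index ", bam_file)
  }
  unname(Rsamtools::indexBam(bam_file))
}

# Usage (path taken from the project layout in this README):
# ensure_bam_index("controls/bams/CTRL_01_markdup.bam")
```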
Organise your project with controls and patients in separate folders
so that run_pipeline() can glob files cleanly:
project/
├── controls/
│ ├── snps/
│ │ ├── CTRL_01.SNPs.out
│ │ ├── CTRL_02.SNPs.out
│ │ └── ...
│ ├── meth/
│ │ ├── CTRL_01.CGmeth.txt
│ │ ├── CTRL_02.CGmeth.txt
│ │ └── ...
│ ├── bams/
│ │ ├── CTRL_01_markdup.bam
│ │ ├── CTRL_01_markdup.bam.bai
│ │ └── ...
│ └── output/
│
├── patients/
│ ├── snps/
│ ├── meth/
│ ├── bams/
│ └── output/
│
├── inst/extdata/filter_cpgs_ctrl.xlsx ← control reference panel
└── inst/extdata/filter_cpgs_pat.xlsx ← patient reference panel
library(ImprintCapASM)
# Step 1
prepare_cpg_snp_input(
snp_file = "controls/snps/CTRL_01.SNPs.out",
meth_file = "controls/meth/CTRL_01.CGmeth.txt",
cpg_ref_file = "inst/extdata/filter_cpgs_ctrl.xlsx",
sample_type = "control"
)
# Writes: cpg_snps_CG_control_CTRL_01.xlsx
# cpg_snps_CG_control_CTRL_01.bed
# Step 2
extract_bam_regions(
bam_file = "controls/bams/CTRL_01_markdup.bam",
bed_file = "cpg_snps_CG_control_CTRL_01.bed",
output_dir = "controls/output/",
sample_type = "control"
)
# Writes: controls/output/control_CTRL_01_wide.bam + .bai
# Step 3
ASM(
cpg_snp_file = "cpg_snps_CG_control_CTRL_01.xlsx",
sam_file = "controls/output/control_CTRL_01_wide.bam",
filter_cpgs_file = "inst/extdata/filter_cpgs_ctrl.xlsx",
sample_type = "control"
)
# Writes: asm_control_CTRL_01.xlsx
# snp_cpg_control_CTRL_01.xlsx
# meth_summary_control_CTRL_01.xlsx
# lineplot_control_CTRL_01.pdf

# Step 1
prepare_cpg_snp_input(
snp_file = "patients/snps/PAT_01.SNPs.out",
meth_file = "patients/meth/PAT_01.CGmeth.txt",
cpg_ref_file = "inst/extdata/filter_cpgs_pat.xlsx",
sample_type = "patient"
)
# Step 2
extract_bam_regions(
bam_file = "patients/bams/PAT_01_markdup.bam",
bed_file = "cpg_snps_CG_patient_PAT_01.bed",
output_dir = "patients/output/",
sample_type = "patient"
)
# Step 3
ASM(
cpg_snp_file = "cpg_snps_CG_patient_PAT_01.xlsx",
sam_file = "patients/output/patient_PAT_01_wide.bam",
filter_cpgs_file = "inst/extdata/filter_cpgs_pat.xlsx", # <-- patient reference
sample_type = "patient"
)

run_pipeline()

run_pipeline() processes all samples in a given folder in batch. Controls and patients are always run as separate calls with their respective reference files:
library(ImprintCapASM)
# --- Run all controls ---
run_pipeline(
snp_dir = "controls/snps/",
meth_dir = "controls/meth/",
bam_dir = "controls/bams/",
filter_cpgs_file = "inst/extdata/filter_cpgs_ctrl.xlsx",
output_dir = "controls/output/",
sample_type = "control"
)
# --- Run all patients (separate call, separate reference file) ---
run_pipeline(
snp_dir = "patients/snps/",
meth_dir = "patients/meth/",
bam_dir = "patients/bams/",
filter_cpgs_file = "inst/extdata/filter_cpgs_pat.xlsx",
output_dir = "patients/output/",
sample_type = "patient"
)

run_pipeline() automatically matches files across snp_dir, meth_dir, and bam_dir by sample ID, iterates Steps 1 and 2 per sample, then calls ASM() on the combined output for that cohort.
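How the matching by sample ID might work can be sketched as follows. This is a guess at the approach, not the package's implementation: `match_samples()` is a hypothetical helper, and it assumes the sample ID is the file name with the `.SNPs.out`, `.CGmeth.txt`, or `_markdup.bam` suffix removed, as in the project layout shown earlier.

```r
# Hypothetical helper: pair up input files that share a sample ID
match_samples <- function(snp_files, meth_files, bam_files) {
  id_from <- function(paths, suffix) sub(suffix, "", basename(paths))

  snp  <- setNames(snp_files,  id_from(snp_files,  "\\.SNPs\\.out$"))
  meth <- setNames(meth_files, id_from(meth_files, "\\.CGmeth\\.txt$"))
  bam  <- setNames(bam_files,  id_from(bam_files,  "_markdup\\.bam$"))

  # Keep only samples that have all three inputs
  ids <- Reduce(intersect, list(names(snp), names(meth), names(bam)))

  data.frame(
    sample_id = ids,
    snp_file  = unname(snp[ids]),
    meth_file = unname(meth[ids]),
    bam_file  = unname(bam[ids]),
    stringsAsFactors = FALSE
  )
}

matched <- match_samples(
  snp_files  = c("controls/snps/CTRL_01.SNPs.out", "controls/snps/CTRL_02.SNPs.out"),
  meth_files = c("controls/meth/CTRL_02.CGmeth.txt", "controls/meth/CTRL_01.CGmeth.txt"),
  bam_files  = "controls/bams/CTRL_01_markdup.bam"
)
matched  # one row: CTRL_01 (CTRL_02 has no BAM and is dropped)
```

In a real project the three vectors would come from list.files() on snp_dir, meth_dir, and bam_dir. Comparing the matched IDs against the directory listings is a quick way to spot samples with a missing input before a batch run.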
The package ships with minimal example files covering two chr11 DMRs (H19/IGF2 and KCNQ1OT1):
library(ImprintCapASM)
snp_file <- system.file("extdata", "example_snp.vcf", package = "ImprintCapASM")
meth_file <- system.file("extdata", "example_cgmeth.txt", package = "ImprintCapASM")
cpg_ref_file <- system.file("extdata", "example_filter_cpgs.xlsx", package = "ImprintCapASM")
bam_file <- system.file("extdata", "example.bam", package = "ImprintCapASM")
# Step 1
prepare_cpg_snp_input(
snp_file = snp_file,
meth_file = meth_file,
cpg_ref_file = cpg_ref_file,
sample_type = "control"
)
# Step 2
extract_bam_regions(
bam_file = bam_file,
bed_file = list.files(tempdir(), pattern = "\\.bed$", full.names = TRUE)[1],
output_dir = tempdir(),
sample_type = "control"
)
# Step 3
ASM(
cpg_snp_file = list.files(tempdir(), pattern = "cpg_snps.*\\.xlsx$", full.names = TRUE)[1],
sam_file = list.files(tempdir(), pattern = "_wide\\.bam$", full.names = TRUE)[1],
filter_cpgs_file = cpg_ref_file,
sample_type = "control"
)

ASM() writes three Excel files and one PDF per run.
asm_{type}_{sample_id}.xlsx: Read-level allele-methylation table

One row per read–CpG combination. The most granular output.

| Column | Description |
|---|---|
| sample_id | Sample identifier (derived from BAM filename) |
| sample_type | "control" or "patient" |
| id | Read name |
| read_sequence | Raw read sequence |
| read_start | Leftmost mapping position of the read |
| flag | SAM FLAG value |
| flag_context | Human-readable FLAG interpretation |
| combined_tags | Concatenated SAM optional tags |
| strand | "Forward" or "Reverse" |
| chr | Chromosome |
| DMR | Imprinted DMR name (e.g. H19/IGF2) |
| snp_pos | Genomic position of the phasing SNP |
| cpg_pos | Genomic position of the CpG |
| allele_type | "REF" or "ALT" (parental allele assignment) |
| ref_allele | Reference base at the SNP |
| alt_allele | Alternative base at the SNP |
| assignment_note | Bisulfite-aware logic used for allele assignment |
| n_methylated | 1 if the CpG is methylated on this read, else 0 |
| n_unmethylated | 1 if the CpG is unmethylated on this read, else 0 |
| meth_frac | Same as n_methylated (numeric; used for summaries) |
| Padded_Sequence | Read sequence left-padded for DMR alignment visualisation |
| mean_methylation | Cohort mean methylation for this CpG (from filter_cpgs) |
| sd_methylation | Cohort SD for this CpG (from filter_cpgs) |
| Category | CpG variance class: LOWvar, MSDvar, SDvar, or Mvar |
snp_cpg_{type}_{sample_id}.xlsx: Per SNP–CpG pair summary

One row per unique (SNP position, CpG position) combination, with allele-stratified read counts and methylation fractions.

| Column | Description |
|---|---|
| snp_pos | Genomic position of the phasing SNP |
| cpg_pos | Genomic position of the CpG |
| sample_id | Sample identifier |
| chr | Chromosome |
| DMR | Imprinted DMR name |
| ref_allele | Reference base at the SNP |
| alt_allele | Alternative base at the SNP |
| REF_m | Methylated read count on the REF allele |
| REF_um | Unmethylated read count on the REF allele |
| ALT_m | Methylated read count on the ALT allele |
| ALT_um | Unmethylated read count on the ALT allele |
| REF_tot | Total reads assigned to the REF allele |
| ALT_tot | Total reads assigned to the ALT allele |
| MI | Combined methylation index across both alleles |
| REF_f | REF allele methylation fraction (0–1, rounded to 3 dp) |
| ALT_f | ALT allele methylation fraction (0–1, rounded to 3 dp) |
| ref_alt_ratio | REF/ALT read ratio (balance check; expected ≈ 1.0) |
| mean_methylation | Cohort mean methylation for this CpG (from filter_cpgs) |
| sd_methylation | Cohort SD for this CpG (from filter_cpgs) |
| Category | CpG variance class: LOWvar, MSDvar, SDvar, or Mvar |
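The derived columns in this table follow from the four read counts. The worked example below uses made-up counts for one SNP–CpG pair. The formulas for REF_f, ALT_f, and ref_alt_ratio follow the column descriptions; the formula for MI (all methylated reads over all reads) is an assumption, because the README only calls it a "combined methylation index".

```r
# Illustrative counts for one SNP-CpG pair (not real data)
pair <- data.frame(REF_m = 14, REF_um = 1, ALT_m = 2, ALT_um = 13)

# Reads per allele
pair$REF_tot <- pair$REF_m + pair$REF_um                 # 15
pair$ALT_tot <- pair$ALT_m + pair$ALT_um                 # 15

# Allele-specific methylation fractions, rounded to 3 dp
pair$REF_f <- round(pair$REF_m / pair$REF_tot, 3)        # 0.933
pair$ALT_f <- round(pair$ALT_m / pair$ALT_tot, 3)        # 0.133

# Assumed definition of the combined methylation index
pair$MI <- round((pair$REF_m + pair$ALT_m) /
                 (pair$REF_tot + pair$ALT_tot), 3)       # 0.533

# Allelic balance check: values far from 1 suggest biased allele recovery
pair$ref_alt_ratio <- pair$REF_tot / pair$ALT_tot        # 1

pair
```

With these counts the two alleles sit near opposite ends of the scale while MI stays close to 0.5, which is the signature of a normally imprinted DMR.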
meth_summary_{type}_{sample_id}.xlsx: Per allele methylation summary

One row per (sample, SNP position, CpG position, allele type) combination.

| Column | Description |
|---|---|
| sample_id | Sample identifier |
| snp_pos | Genomic position of the phasing SNP |
| cpg_pos | Genomic position of the CpG |
| DMR | Imprinted DMR name |
| allele_type | "REF" or "ALT" |
| total_reads | Total reads for this allele at this CpG |
| methylated | Methylated read count |
| unmethylated | Unmethylated read count |
| meth_frac | Methylation fraction (methylated / total_reads, rounded to 3 dp) |
| mean_methylation | Cohort mean methylation for this CpG (from filter_cpgs) |
| sd_methylation | Cohort SD for this CpG (from filter_cpgs) |
| Category | CpG variance class: LOWvar, MSDvar, SDvar, or Mvar |
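This summary is an aggregation of the read-level table, and the relationship can be reproduced with base R. The sketch below uses five invented read-level rows and is meant only to show how the per-allele counts and meth_frac relate to n_methylated and n_unmethylated; it is not the package's own aggregation code.

```r
# Five illustrative read-level rows for one SNP-CpG pair (not real data)
reads <- data.frame(
  sample_id    = "CTRL_01",
  snp_pos      = 2016400,
  cpg_pos      = 2016405,
  allele_type  = c("REF", "REF", "REF", "ALT", "ALT"),
  n_methylated = c(1, 1, 0, 0, 0),
  stringsAsFactors = FALSE
)
reads$n_unmethylated <- 1 - reads$n_methylated

# Sum the per-read indicators within each (sample, SNP, CpG, allele) group
keys <- c("sample_id", "snp_pos", "cpg_pos", "allele_type")
summ <- aggregate(reads[c("n_methylated", "n_unmethylated")],
                  by = reads[keys], FUN = sum)
names(summ)[names(summ) == "n_methylated"]   <- "methylated"
names(summ)[names(summ) == "n_unmethylated"] <- "unmethylated"

summ$total_reads <- summ$methylated + summ$unmethylated
summ$meth_frac   <- round(summ$methylated / summ$total_reads, 3)

summ  # REF: 2 of 3 reads methylated (0.667); ALT: 0 of 2 (0)
```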
lineplot_{type}_{sample_id}.pdf: DMR methylation line plots

One page per DMR. Each plot shows REF_f and ALT_f (REF and ALT allele methylation fractions) across all CpG positions within the DMR, faceted by SNP. Points are shaped by CpG Category. Expected pattern for a normally imprinted DMR: one allele near 100% methylation, the other near 0%.
prepare_cpg_snp_input() output: cpg_snps_CG_{type}_{id}.xlsx

| Column | Description |
|---|---|
| chr | Chromosome |
| pos | CpG position |
| context | Always "CG" |
| total_meth | Watson + Crick methylated read count |
| total_cov | Watson + Crick total coverage |
| meth_frac | total_meth / total_cov |
| DMR | Imprinted DMR name |
| snp_pos | Position of the linked heterozygous SNP |
| REF | Reference allele at the SNP |
| ALT | Alternative allele at the SNP |
| GT | Genotype (always "0/1"; heterozygous only) |
| AD | Allelic depth string (e.g. "15,13") |
| DP | Total SNP read depth |
| ref_depth | REF allele read depth |
| alt_depth | ALT allele read depth |
| total_depth | ref_depth + alt_depth |
| sample_id | Sample identifier |
The package covers the 41 canonical human imprinted DMRs on GRCh38, including:
| DMR | Chromosome | Associated disorder |
|---|---|---|
| H19/IGF2 | chr11p15.5 | BWS (hypermeth) / SRS (hypometh) |
| KCNQ1OT1 | chr11p15.5 | BWS (hypometh) |
| SNRPN | chr15q11-q13 | PWS / AS |
| MEG3/DLK1 | chr14q32 | Temple syndrome / Kagami-Ogata syndrome |
| PLAGL1 | chr6q24 | Transient neonatal diabetes mellitus |
| GRB10 | chr7p12 | SRS |
| DIRAS3 | chr1p31 | — |
| PPIEL | chr1p36 | — |
| …and 33 more | | |
samtools on the system PATH: required at runtime by extract_bam_regions() (calls samtools view, samtools sort, samtools index).

If you use ImprintCapASM in your research, please cite:
Saha S. et al. (2026). ImprintCapASM: SNP-phased allele-specific methylation analysis for imprint disorder diagnostics. R package version 0.1.0. https://CRAN.R-project.org/package=ImprintCapASM
MIT © Subham Saha
Bug reports and feature requests are welcome via GitHub
Issues. Pull requests should be submitted against the
dev branch.