Welcome to ClientVPS Mirrors

README

ImprintCapASM

R-CMD-check CRAN status License: MIT

Overview

ImprintCapASM is an R package for SNP-phased allele-specific methylation (ASM) analysis across the 41 known human imprinted differentially methylated regions (DMRs). It is designed for clinical diagnostic workflows that profile imprint disorder cases — including Beckwith-Wiedemann syndrome (BWS), Silver-Russell syndrome (SRS), Prader-Willi syndrome (PWS), Angelman syndrome (AS), and related conditions — from bisulfite sequencing data produced by targeted capture panels.

The package provides three core functions that form a sequential pipeline:

  1. prepare_cpg_snp_input() — Links CpG methylation values to nearby heterozygous SNPs; produces a per-sample Excel table and a BED file
  2. extract_bam_regions() — Extracts and sorts a BAM subset covering the SNP windows for each sample
  3. ASM() — Reads the extracted BAM, assigns each read to a parental allele, and computes allele-specific methylation statistics; returns three output tables and a line-plot PDF

For processing multiple samples together — the standard diagnostic use case — use run_pipeline(), which runs the full three-step pipeline for all control samples as a batch, and separately for all patient samples as a batch. Controls and patients are always run independently using their respective filter_cpgs reference files.


Background

Genomic imprinting is an epigenetic phenomenon whereby a subset of genes are expressed in a parent-of-origin dependent manner, regulated by differentially methylated regions (DMRs). Loss or gain of methylation at these DMRs underlies a class of rare congenital disorders collectively known as imprinting disorders. Accurate diagnosis requires quantifying the methylation of each parental allele separately — a task that standard bisulfite sequencing alone cannot achieve without phasing methylation data to nearby heterozygous SNPs.

ImprintCapASM implements a SNP-phasing strategy: heterozygous SNPs detected in bisulfite sequencing reads are used to assign each read to a parental allele (REF or ALT), and CpG methylation values on each allele are computed and compared. Deviation from the expected allele-specific methylation pattern at a given DMR indicates a potential imprinting disorder.


Installation

From CRAN (stable release)

install.packages("ImprintCapASM")

From GitHub (development version)

# install.packages("remotes")
remotes::install_github("19-saha/ImprintCapASM")

Bioconductor dependencies

ImprintCapASM depends on several Bioconductor packages. Install them first if not already present:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install(c(
    "BiocParallel",
    "Rsamtools",
    "GenomicRanges",
    "IRanges",
    "S4Vectors",
    "SummarizedExperiment",
    "VariantAnnotation"
))

The Two filter_cpgs Reference Files

A key concept in ImprintCapASM is that controls and patients each have their own filter_cpgs reference file. These are not interchangeable:

File Used with Contains Purpose
inst/extdata/filter_cpgs_ctrl.xlsx sample_type = "control" Control_1, Control_2, … columns Computes mean/SD methylation and CpG variance categories from the control cohort
inst/extdata/filter_cpgs_pat.xlsx sample_type = "patient" Patient_1, Patient_2, … columns Computes mean/SD methylation and CpG variance categories from the patient cohort

Both files share the same structure (chr, 5_location, 3_location, DMR, then sample columns). The ASM() function auto-detects sample columns by matching the pattern ^Control_ or ^Patient_ in the column names. Passing the wrong file to the wrong sample_type will produce incorrect variance categories and misleading plots.

Both files are used identically for CpG window definition in prepare_cpg_snp_input() — what differs is the cohort-specific methylation statistics computed during ASM().


Pipeline Overview

Bisulfite sequencing run (targeted imprint capture panel)
        │
        ├── bssnper SNP calling    →  sample.SNPs.out      (VCFv4.3, plain text)
        ├── bssnper CG methylation →  sample.CGmeth.txt    (9-column TSV)
        └── Picard MarkDuplicates  →  sample_markdup.bam + .bai
                │
                ▼  [per sample, run separately for controls and patients]
        ┌──────────────────────────────────────────┐
        │   prepare_cpg_snp_input()                │
        │   Input:  sample.SNPs.out                │  Filters heterozygous SNPs (GT=0/1),
        │           sample.CGmeth.txt              │  overlaps with CpG panel windows,
        │           inst/extdata/filter_cpgs_ctrl.xlsx          │  joins CpG methylation fractions
        │             OR inst/extdata/filter_cpgs_pat.xlsx      │
        │   Output: cpg_snps_CG_{type}_{id}.xlsx   │
        │           cpg_snps_CG_{type}_{id}.bed    │
        └──────────────┬───────────────────────────┘
                       │
        ┌──────────────▼───────────────────────────┐
        │   extract_bam_regions()                  │
        │   Input:  sample_markdup.bam             │  Subsets BAM to SNP windows,
        │           cpg_snps_CG_{type}_{id}.bed    │  sorts and indexes the output
        │   Output: {type}_{id}_wide.bam + .bai    │
        └──────────────┬───────────────────────────┘
                       │
             [all samples of same type combined]
                       │
        ┌──────────────▼───────────────────────────┐
        │   ASM()                                  │
        │   Input:  cpg_snps_CG_{type}_{id}.xlsx   │  Bisulfite-aware allele assignment,
        │           {type}_{id}_wide.bam           │  per-read methylation scoring,
        │           inst/extdata/filter_cpgs_ctrl.xlsx          │  CpG variance classification using
        │             OR inst/extdata/filter_cpgs_pat.xlsx      │  cohort-matched reference
        │   Output: asm_{type}_{id}.xlsx           │
        │           snp_cpg_{type}_{id}.xlsx       │
        │           meth_summary_{type}_{id}.xlsx  │
        │           lineplot_{type}_{id}.pdf       │
        └──────────────────────────────────────────┘

Input File Formats

1. SNP file — sample.SNPs.out (bssnper VCFv4.3)

Produced by BS-Snper. Plain-text VCF — no bgzip or tabix index required. The function reads the GT FORMAT field and retains only heterozygous SNPs (GT == "0/1") with sufficient depth:

#CHROM  POS     ID  REF ALT QUAL  FILTER  INFO             FORMAT                                 SAMPLE
chr11   2016400 .   G   A   85    PASS    DP=28;AD=15,13;  GT:DP:AD:ADF:ADR:BSD:BSQ:ALFR          0/1:28:15,13:...

2. Methylation file — sample.CGmeth.txt (bssnper CG output)

Tab-delimited, 9 columns with a #CHROM header. Watson and Crick strand methylation and coverage are merged internally by the function:

#CHROM  POS       CONTEXT  Watson-METH  Watson-COVERAGE  Watson-QUAL  Crick-METH  Crick-COVERAGE  Crick-QUAL
chr11   2016405   CG       155          169              33           365         494             33

3. CpG panel reference files — inst/extdata/filter_cpgs_ctrl.xlsx and inst/extdata/filter_cpgs_pat.xlsx

Two separate reference Excel files — one for controls, one for patients. Both share the same column structure: genomic coordinates and DMR name, followed by per-sample methylation percentages. The ASM() function detects sample columns automatically by matching ^Control_ or ^Patient_ column name prefixes:

chr    5_location  3_location  DMR        Control_1  Control_2  Control_3  ...
chr11  2016404     2016406     H19/IGF2   82         84         81         ...
chr    5_location  3_location  DMR        Patient_1  Patient_2  Patient_3  ...
chr11  2016404     2016406     H19/IGF2   45         83         80         ...

4. BAM file — sample_markdup.bam + .bam.bai

Duplicate-marked, coordinate-sorted BAM produced by Picard MarkDuplicates. The .bai index must be present alongside the BAM. If the index is missing, extract_bam_regions() creates it automatically via Rsamtools::indexBam().


Organise your project with controls and patients in separate folders so that run_pipeline() can glob files cleanly:

project/
├── controls/
│   ├── snps/
│   │   ├── CTRL_01.SNPs.out
│   │   ├── CTRL_02.SNPs.out
│   │   └── ...
│   ├── meth/
│   │   ├── CTRL_01.CGmeth.txt
│   │   ├── CTRL_02.CGmeth.txt
│   │   └── ...
│   ├── bams/
│   │   ├── CTRL_01_markdup.bam
│   │   ├── CTRL_01_markdup.bam.bai
│   │   └── ...
│   └── output/
│
├── patients/
│   ├── snps/
│   ├── meth/
│   ├── bams/
│   └── output/
│
├── inst/extdata/filter_cpgs_ctrl.xlsx   ← control reference panel
└── inst/extdata/filter_cpgs_pat.xlsx    ← patient reference panel

Usage

Running a single control sample

library(ImprintCapASM)

# Step 1
prepare_cpg_snp_input(
    snp_file     = "controls/snps/CTRL_01.SNPs.out",
    meth_file    = "controls/meth/CTRL_01.CGmeth.txt",
    cpg_ref_file = "inst/extdata/filter_cpgs_ctrl.xlsx",
    sample_type  = "control"
)
# Writes: cpg_snps_CG_control_CTRL_01.xlsx
#         cpg_snps_CG_control_CTRL_01.bed

# Step 2
extract_bam_regions(
    bam_file    = "controls/bams/CTRL_01_markdup.bam",
    bed_file    = "cpg_snps_CG_control_CTRL_01.bed",
    output_dir  = "controls/output/",
    sample_type = "control"
)
# Writes: controls/output/control_CTRL_01_wide.bam + .bai

# Step 3
ASM(
    cpg_snp_file     = "cpg_snps_CG_control_CTRL_01.xlsx",
    sam_file         = "controls/output/control_CTRL_01_wide.bam",
    filter_cpgs_file = "inst/extdata/filter_cpgs_ctrl.xlsx",
    sample_type      = "control"
)
# Writes: asm_control_CTRL_01.xlsx
#         snp_cpg_control_CTRL_01.xlsx
#         meth_summary_control_CTRL_01.xlsx
#         lineplot_control_CTRL_01.pdf

Running a single patient sample

# Step 1
prepare_cpg_snp_input(
    snp_file     = "patients/snps/PAT_01.SNPs.out",
    meth_file    = "patients/meth/PAT_01.CGmeth.txt",
    cpg_ref_file = "inst/extdata/filter_cpgs_pat.xlsx",       
    sample_type  = "patient"
)

# Step 2
extract_bam_regions(
    bam_file    = "patients/bams/PAT_01_markdup.bam",
    bed_file    = "cpg_snps_CG_patient_PAT_01.bed",
    output_dir  = "patients/output/",
    sample_type = "patient"
)

# Step 3
ASM(
    cpg_snp_file     = "cpg_snps_CG_patient_PAT_01.xlsx",
    sam_file         = "patients/output/patient_PAT_01_wide.bam",
    filter_cpgs_file = "inst/extdata/filter_cpgs_pat.xlsx",   # <-- patient reference
    sample_type      = "patient"
)

Running a full cohort with run_pipeline()

run_pipeline() processes all samples in a given folder in batch. Controls and patients are always run as separate calls with their respective reference files:

library(ImprintCapASM)

# --- Run all controls ---
run_pipeline(
    snp_dir          = "controls/snps/",
    meth_dir         = "controls/meth/",
    bam_dir          = "controls/bams/",
    filter_cpgs_file = "inst/extdata/filter_cpgs_ctrl.xlsx",
    output_dir       = "controls/output/",
    sample_type      = "control"
)

# --- Run all patients (separate call, separate reference file) ---
run_pipeline(
    snp_dir          = "patients/snps/",
    meth_dir         = "patients/meth/",
    bam_dir          = "patients/bams/",
    filter_cpgs_file = "inst/extdata/filter_cpgs_pat.xlsx",
    output_dir       = "patients/output/",
    sample_type      = "patient"
)

run_pipeline() automatically matches files across snp_dir, meth_dir, and bam_dir by sample ID, iterates Steps 1 and 2 per sample, then calls ASM() on the combined output for that cohort.


Toy Example with Built-in Data

The package ships with minimal example files covering two chr11 DMRs (H19/IGF2 and KCNQ1OT1):

library(ImprintCapASM)

snp_file     <- system.file("extdata", "example_snp.vcf",         package = "ImprintCapASM")
meth_file    <- system.file("extdata", "example_cgmeth.txt",       package = "ImprintCapASM")
cpg_ref_file <- system.file("extdata", "example_filter_cpgs.xlsx", package = "ImprintCapASM")
bam_file     <- system.file("extdata", "example.bam",              package = "ImprintCapASM")

# Step 1
prepare_cpg_snp_input(
    snp_file     = snp_file,
    meth_file    = meth_file,
    cpg_ref_file = cpg_ref_file,
    sample_type  = "control"
)

# Step 2
extract_bam_regions(
    bam_file    = bam_file,
    bed_file    = list.files(tempdir(), pattern = "\\.bed$", full.names = TRUE)[1],
    output_dir  = tempdir(),
    sample_type = "control"
)

# Step 3
ASM(
    cpg_snp_file     = list.files(tempdir(), pattern = "cpg_snps.*\\.xlsx$", full.names = TRUE)[1],
    sam_file         = list.files(tempdir(), pattern = "_wide\\.bam$",        full.names = TRUE)[1],
    filter_cpgs_file = cpg_ref_file,
    sample_type      = "control"
)

Output Files and Column Descriptions

ASM() writes three Excel files and one PDF per run.


1. asm_{type}_{sample_id}.xlsx — Read-level allele-methylation table

One row per read–CpG combination. The most granular output.

Column Description
sample_id Sample identifier (derived from BAM filename)
sample_type "control" or "patient"
id Read name
read_sequence Raw read sequence
read_start Leftmost mapping position of the read
flag SAM FLAG value
flag_context Human-readable FLAG interpretation
combined_tags Concatenated SAM optional tags
strand "Forward" or "Reverse"
chr Chromosome
DMR Imprinted DMR name (e.g. H19/IGF2)
snp_pos Genomic position of the phasing SNP
cpg_pos Genomic position of the CpG
allele_type "REF" or "ALT" (parental allele assignment)
ref_allele Reference base at the SNP
alt_allele Alternative base at the SNP
assignment_note Bisulfite-aware logic used for allele assignment
n_methylated 1 if the CpG is methylated on this read, else 0
n_unmethylated 1 if the CpG is unmethylated on this read, else 0
meth_frac Same as n_methylated (numeric; used for summaries)
Padded_Sequence Read sequence left-padded for DMR alignment visualisation
mean_methylation Cohort mean methylation for this CpG (from filter_cpgs)
sd_methylation Cohort SD for this CpG (from filter_cpgs)
Category CpG variance class: LOWvar, MSDvar, SDvar, or Mvar

2. snp_cpg_{type}_{sample_id}.xlsx — Per SNP–CpG pair summary

One row per unique (SNP position, CpG position) combination, with allele-stratified read counts and methylation fractions.

Column Description
snp_pos Genomic position of the phasing SNP
cpg_pos Genomic position of the CpG
sample_id Sample identifier
chr Chromosome
DMR Imprinted DMR name
ref_allele Reference base at the SNP
alt_allele Alternative base at the SNP
REF_m Methylated read count on the REF allele
REF_um Unmethylated read count on the REF allele
ALT_m Methylated read count on the ALT allele
ALT_um Unmethylated read count on the ALT allele
REF_tot Total reads assigned to the REF allele
ALT_tot Total reads assigned to the ALT allele
MI Combined methylation index across both alleles
REF_f REF allele methylation fraction (0–1, rounded to 3 dp)
ALT_f ALT allele methylation fraction (0–1, rounded to 3 dp)
ref_alt_ratio REF/ALT read ratio (balance check; expected ≈ 1.0)
mean_methylation Cohort mean methylation for this CpG (from filter_cpgs)
sd_methylation Cohort SD for this CpG (from filter_cpgs)
Category CpG variance class: LOWvar, MSDvar, SDvar, or Mvar

3. meth_summary_{type}_{sample_id}.xlsx — Per allele methylation summary

One row per (sample, SNP position, CpG position, allele type) combination.

Column Description
sample_id Sample identifier
snp_pos Genomic position of the phasing SNP
cpg_pos Genomic position of the CpG
DMR Imprinted DMR name
allele_type "REF" or "ALT"
total_reads Total reads for this allele at this CpG
methylated Methylated read count
unmethylated Unmethylated read count
meth_frac Methylation fraction (methylated / total_reads, rounded to 3 dp)
mean_methylation Cohort mean methylation for this CpG (from filter_cpgs)
sd_methylation Cohort SD for this CpG (from filter_cpgs)
Category CpG variance class: LOWvar, MSDvar, SDvar, or Mvar

4. lineplot_{type}_{sample_id}.pdf — DMR methylation line plots

One page per DMR. Each plot shows REF_f and ALT_f (REF and ALT allele methylation fractions) across all CpG positions within the DMR, faceted by SNP. Points are shaped by CpG Category. Expected pattern for a normally imprinted DMR: one allele near 100% methylation, the other near 0%.


prepare_cpg_snp_input() output — cpg_snps_CG_{type}_{id}.xlsx

Column Description
chr Chromosome
pos CpG position
context Always "CG"
total_meth Watson + Crick methylated read count
total_cov Watson + Crick total coverage
meth_frac total_meth / total_cov
DMR Imprinted DMR name
snp_pos Position of the linked heterozygous SNP
REF Reference allele at the SNP
ALT Alternative allele at the SNP
GT Genotype (always "0/1" — heterozygous only)
AD Allelic depth string (e.g. "15,13")
DP Total SNP read depth
ref_depth REF allele read depth
alt_depth ALT allele read depth
total_depth ref_depth + alt_depth
sample_id Sample identifier

Supported Imprinted DMRs

The package covers the 41 canonical human imprinted DMRs on GRCh38, including:

DMR Chromosome Associated disorder
H19/IGF2 chr11p15.5 BWS (hypometh) / SRS (hypermeth)
KCNQ1OT1 chr11p15.5 BWS (hypometh)
SNRPN chr15q11-q13 PWS / AS
MEG3/DLK1 chr14q32 Temple syndrome / Kagami-Ogata syndrome
PLAGL1 chr6q24 Transient neonatal diabetes mellitus
GRB10 chr7p12 SRS
DIRAS3 chr1p31
PPIEL chr1p36
…and 33 more

System Requirements


Citation

If you use ImprintCapASM in your research, please cite:

Saha S. et al. (2026). ImprintCapASM: SNP-phased allele-specific methylation analysis for imprint disorder diagnostics. R package version 0.1.0. https://CRAN.R-project.org/package=ImprintCapASM


License

MIT © Subham Saha


Contributing

Bug reports and feature requests are welcome via GitHub Issues. Pull requests should be submitted against the dev branch.

Need a high-speed mirror for your open-source project?
Contact our mirror admin team at info@clientvps.com.

This archive is provided as a free public service to the community.
Proudly supported by infrastructure from VPSPulse , RxServers , BuyNumber , UnitVPS , OffshoreName and secure payment technology by ArionPay.