Perform Genomic Liftover
Abdullah Al Nahid
Department of Biochemistry and Molecular Biology, Shahjalal University of Science and Technology, Sylhet, BangladeshHervé Pagès
Fred Hutchinson Cancer Research Center, Seattle, WA, USAMichael I. Love
Biostatistics Department, University of North Carolina-Chapel Hill, Chapel Hill, NC, USAGenetics Department, University of North Carolina-Chapel Hill, Chapel Hill, NC, USASource:
vignettes/easylift.Rmd
easylift.Rmd
Abstract
An R/Bioconductor library to perform genomic liftover between
different genome assemblies with GRanges
and
chain
file. Source Code: https://github.com/nahid18/easylift
Getting Started
Installation
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("easylift")
Import
library("easylift")
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#>
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#>
#> IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#>
#> anyDuplicated, aperm, append, as.data.frame, basename, cbind,
#> colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
#> get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
#> match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
#> Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
#> table, tapply, union, unique, unsplit, which.max, which.min
#> Loading required package: S4Vectors
#>
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:utils':
#>
#> findMatches
#> The following objects are masked from 'package:base':
#>
#> expand.grid, I, unname
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb
#> Loading required package: BiocFileCache
#> Loading required package: dbplyr
Arguments
Create a GRanges
object, assign a genome to it, and
specify chain file
gr <- GRanges(
seqname = Rle(
c("chr1", "chr2"),
c(100000, 100000)
),
ranges = IRanges(
start = 1, end = 200000
)
)
# Here, "hg19" is the source genome
genome(gr) <- "hg19"
# Here, we use the `system.file()` function because the chain file is in the
# package (however if you need to point to any other file on your machine,
# just do 'chain <- "path/to/your/hg19ToHg38.over.chain.gz"'):
chain <- system.file("extdata", "hg19ToHg38.over.chain.gz", package = "easylift")
gr
#> GRanges object with 200000 ranges and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] chr1 1-200000 *
#> [2] chr1 1-200000 *
#> [3] chr1 1-200000 *
#> [4] chr1 1-200000 *
#> [5] chr1 1-200000 *
#> ... ... ... ...
#> [199996] chr2 1-200000 *
#> [199997] chr2 1-200000 *
#> [199998] chr2 1-200000 *
#> [199999] chr2 1-200000 *
#> [200000] chr2 1-200000 *
#> -------
#> seqinfo: 2 sequences from hg19 genome; no seqlengths
Run
Call easylift
with GRanges
object, target
genome and the chain file.
# Here, "hg38" is the target genome
easylift(gr, "hg38", chain)
#> GRanges object with 300000 ranges and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] chr1 10001-177376 *
#> [2] chr19 242824-242864 *
#> [3] chr1 10001-177376 *
#> [4] chr19 242824-242864 *
#> [5] chr1 10001-177376 *
#> ... ... ... ...
#> [299996] chr2 10001-200000 *
#> [299997] chr2 10001-200000 *
#> [299998] chr2 10001-200000 *
#> [299999] chr2 10001-200000 *
#> [300000] chr2 10001-200000 *
#> -------
#> seqinfo: 25 sequences (1 circular) from hg38 genome
Run with BiocFileCache
To use BiocFileCache
for the chain file, add it to the
cache:
chain_file <- "/path/to/your/hg19ToHg38.over.chain.gz"
bfc <- BiocFileCache()
# Add chain file to cache if already not available
if (nrow(bfcquery(bfc, basename(chain_file))) == 0)
bfcadd(bfc, chain_file)
Then, use it in easylift
:
Citation
To cite package easylift
in publications use:
Al Nahid A, Pagès H, Love M (2023). easylift: An R package to perform genomic liftover. R package version 1.0.0, https://github.com/nahid18/easylift.
A BibTeX entry for LaTeX users is
@Manual{,
title = {easylift: An R package to perform genomic liftover},
author = {Abdullah Al Nahid, Hervé Pagès, Michael Love},
year = {2023},
note = {R package version 1.0.0},
url = {https://github.com/nahid18/easylift},
}
Please note that the easylift
was only made possible
thanks to many other R and bioinformatics software authors, which are
cited either in the vignettes and/or the paper(s) describing this
package.
Code of Conduct
Please note that the easylift
project is released with a
Contributor
Code of Conduct. By contributing to this project, you agree to abide
by its terms.
Session information
sessionInfo()
#> R version 4.3.2 (2023-10-31)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] easylift_1.1.0 BiocFileCache_2.8.0 dbplyr_2.4.0
#> [4] GenomicRanges_1.52.1 GenomeInfoDb_1.36.4 IRanges_2.34.1
#> [7] S4Vectors_0.38.2 BiocGenerics_0.46.0 BiocStyle_2.28.1
#>
#> loaded via a namespace (and not attached):
#> [1] SummarizedExperiment_1.30.2 rjson_0.2.21
#> [3] xfun_0.41 bslib_0.5.1
#> [5] lattice_0.21-9 Biobase_2.60.0
#> [7] vctrs_0.6.4 tools_4.3.2
#> [9] bitops_1.0-7 generics_0.1.3
#> [11] parallel_4.3.2 curl_5.1.0
#> [13] tibble_3.2.1 fansi_1.0.5
#> [15] RSQLite_2.3.2 blob_1.2.4
#> [17] pkgconfig_2.0.3 R.oo_1.25.0
#> [19] Matrix_1.6-1.1 desc_1.4.2
#> [21] lifecycle_1.0.3 GenomeInfoDbData_1.2.10
#> [23] compiler_4.3.2 stringr_1.5.0
#> [25] Rsamtools_2.16.0 Biostrings_2.68.1
#> [27] textshaping_0.3.7 codetools_0.2-19
#> [29] htmltools_0.5.6.1 sass_0.4.7
#> [31] RCurl_1.98-1.13 yaml_2.3.7
#> [33] crayon_1.5.2 pillar_1.9.0
#> [35] pkgdown_2.0.7 jquerylib_0.1.4
#> [37] R.utils_2.12.2 BiocParallel_1.34.2
#> [39] DelayedArray_0.26.7 cachem_1.0.8
#> [41] abind_1.4-5 tidyselect_1.2.0
#> [43] digest_0.6.33 stringi_1.7.12
#> [45] restfulr_0.0.15 dplyr_1.1.3
#> [47] purrr_1.0.2 bookdown_0.36
#> [49] grid_4.3.2 rprojroot_2.0.3
#> [51] fastmap_1.1.1 cli_3.6.1
#> [53] magrittr_2.0.3 S4Arrays_1.0.6
#> [55] XML_3.99-0.15 utf8_1.2.4
#> [57] filelock_1.0.2 bit64_4.0.5
#> [59] rmarkdown_2.25 XVector_0.40.0
#> [61] httr_1.4.7 matrixStats_1.0.0
#> [63] bit_4.0.5 ragg_1.2.6
#> [65] R.methodsS3_1.8.2 memoise_2.0.1
#> [67] evaluate_0.23 knitr_1.45
#> [69] BiocIO_1.10.0 rtracklayer_1.60.1
#> [71] rlang_1.1.1 glue_1.6.2
#> [73] DBI_1.1.3 BiocManager_1.30.22
#> [75] jsonlite_1.8.7 R6_2.5.1
#> [77] MatrixGenerics_1.12.3 GenomicAlignments_1.36.0
#> [79] systemfonts_1.0.5 fs_1.6.3
#> [81] zlibbioc_1.46.0