TCGAbiolinks: A Powerful Tool for Downloading TCGA Data
Published: 2019-06-18


http://bioconductor.org/packages/devel/bioc/vignettes/TCGAbiolinks/inst/doc/tcgaBiolinks.html#gdcquery:_searching_tcga_open-access_data

 

Example:

Updates

Recently, the TCGA data was moved from the DCC server to the National Cancer Institute (NCI) Genomic Data Commons (GDC) Data Portal. In this version of the package, we rewrote all the functions that accessed the old TCGA server so that they now use the GDC.

The GDC, which receives, processes, harmonizes, and distributes clinical, biospecimen, and genomic data from multiple cancer research programs, has data from the following programs:

  • The Cancer Genome Atlas (TCGA)
  • Therapeutically Applicable Research to Generate Effective Treatments (TARGET)
  • Cancer Genome Characterization Initiative (CGCI)

The main change is that GDC data are harmonized against the GRCh38 reference genome. However, not all data have been harmonized yet. The original TCGA data can still be accessed through the GDC Legacy Archive, which holds most of the data.

More information about the project can be found in 

The functions TCGAquery, TCGAdownload, TCGAPrepare, TCGAquery_maf and TCGAquery_clinical were replaced by GDCquery, GDCdownload, GDCprepare, GDCquery_maf and GDCquery_clinical, which can access both the GDC and the GDC Legacy Archive.

Note: Not all the examples in this vignette were updated.

Introduction

Motivation: The Cancer Genome Atlas (TCGA) provides an enormous collection of data sets, spanning not only a large number of cancers but also a large number of experimental platforms. Although the data can be accessed and downloaded from the database, there has been no single R package for analyzing the downloaded data directly.

TCGAbiolinks consists of three parts, or levels. First, it provides options to query and download relevant TCGA data from all current platforms, and to pre-process the data for commonly used bioinformatics packages in Bioconductor or CRAN. Second, it can integrate different data types and supports analyses across all platforms, such as differential expression, network inference and survival analysis, and it can then visualize the results. Third, we added a social level where researchers can find others with similar interests in the bioinformatics community, validate their results against the literature in PubMed, and retrieve questions and answers from sites such as support.bioconductor.org, biostars.org and Stack Overflow.

This document describes how to search, download and analyze TCGA data using the TCGAbiolinks package.

Installation

To install, use the code below.

source("https://bioconductor.org/biocLite.R")
biocLite("TCGAbiolinks")

For a Graphical User Interface, please see . The GUI is under review and will soon be available in the Bioconductor repository.

Citation

Please cite TCGAbiolinks package:

  • “TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data.” Nucleic Acids Research (2015): . (Colaprico, Antonio and Silva, Tiago C. and Olsen, Catharina and Garofano, Luciano and Cava, Claudia and Garolini, Davide and Sabedot, Thais S. and Malta, Tathiane M. and Pagnotta, Stefano M. and Castiglioni, Isabella and Ceccarelli, Michele and Bontempi, Gianluca and Noushmehr, Houtan 2016)

Related publications to this package:

  • “TCGA Workflow: Analyze cancer genomics and epigenomics data using Bioconductor packages”. F1000Research  (Silva, TC and Colaprico, A and Olsen, C and D’Angelo, F and Bontempi, G and Ceccarelli, M and Noushmehr, H 2016)

Also, if you have used ELMER analysis please cite:

  • Yao, L., Shen, H., Laird, P. W., Farnham, P. J., & Berman, B. P. “Inferring regulatory element landscapes and transcription factor networks from cancer methylomes.” Genome Biol 16 (2015): 105.
  • Yao, Lijing, Benjamin P. Berman, and Peggy J. Farnham. “Demystifying the secret mission of enhancers: linking distal regulatory elements to target genes.” Critical reviews in biochemistry and molecular biology 50.6 (2015): 550-573.
 

GDCquery: Searching TCGA open-access data

 

GDCquery: Searching GDC data for download

You can easily search GDC data using the GDCquery function.

Using the same kinds of filters as the TCGA portal, the function accepts the following arguments:

  • project: A list of valid projects (see table below)
  • data.category: A valid data category (see the list with getProjectSummary(project))
  • data.type: A data type to filter the files to download
  • sample.type: A sample type to filter the files to download (see table below)
  • workflow.type: GDC workflow type
  • barcode: A list of barcodes to filter the files to download
  • legacy: Search the GDC Legacy Archive instead of the harmonized data? Default: FALSE
  • platform: Experimental data platform (HumanMethylation450, AgilentG4502A_07, etc.). Used only for the legacy repository
  • file.type: A string to filter files based on their names. Used only for the legacy repository
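The barcode and sample.type arguments describe the same samples from two angles: in a TCGA aliquot barcode, the first two digits of the fourth dash-separated field encode the sample type (for example, 01 is Primary Solid Tumor and 11 is Solid Tissue Normal). The base-R sketch below is a hypothetical helper and not part of TCGAbiolinks; the names parse_tcga_barcode and filter_by_sample_type and the short code table are illustrative assumptions. It decodes that field and filters a vector of barcodes locally, which may help when deciding which barcodes to pass to GDCquery.

```r
# Hypothetical helpers, NOT part of TCGAbiolinks: decode TCGA barcodes locally.
# Assumption: this small subset of TCGA sample-type codes is enough for
# illustration; extend the table as needed.
sample_type_codes <- c(
  "01" = "Primary Solid Tumor",
  "02" = "Recurrent Solid Tumor",
  "06" = "Metastatic",
  "10" = "Blood Derived Normal",
  "11" = "Solid Tissue Normal"
)

# Split one barcode such as "TCGA-02-0047-01A-01D-0186-05" into its fields.
parse_tcga_barcode <- function(barcode) {
  fields <- strsplit(barcode, "-", fixed = TRUE)[[1]]
  if (length(fields) < 4) {
    stop("expected at least 4 dash-separated fields: ", barcode)
  }
  sample_code <- substr(fields[4], 1, 2)  # "01A" -> "01"
  list(
    project     = fields[1],                # "TCGA"
    tss         = fields[2],                # tissue source site
    participant = fields[3],                # participant (patient) ID
    sample_code = sample_code,
    vial        = substr(fields[4], 3, 3),  # "01A" -> "A"
    sample_type = unname(sample_type_codes[sample_code])  # NA if not in table
  )
}

# Keep only barcodes whose sample-type code maps to one of `types`.
filter_by_sample_type <- function(barcodes, types) {
  fourth <- vapply(strsplit(barcodes, "-", fixed = TRUE),
                   function(f) if (length(f) >= 4) f[4] else NA_character_,
                   character(1))
  codes <- substr(fourth, 1, 2)
  barcodes[unname(sample_type_codes[codes]) %in% types]
}

bc <- parse_tcga_barcode("TCGA-02-0047-01A-01D-0186-05")
print(bc$sample_type)

barcodes <- c("TCGA-02-0047-01A-01D-0186-05",  # code 01
              "TCGA-06-0211-02A-02R-2005-01")  # code 02
print(filter_by_sample_type(barcodes,
                            c("Primary Solid Tumor", "Solid Tissue Normal")))
```

Because the table lists only a few codes, any unlisted code decodes to NA and is dropped by the filter; the complete code table is maintained by the GDC.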

The next subsections will detail each of the search arguments. Below, we show some search examples:

library(TCGAbiolinks)

#---------------------------------------------------------------#
#  For available entries and combinations please see table below
#---------------------------------------------------------------#

# Gene expression aligned against hg38
query <- GDCquery(project = "TARGET-AML",
                  data.category = "Transcriptome Profiling",
                  data.type = "Gene Expression Quantification",
                  workflow.type = "HTSeq - Counts")

# All DNA methylation data for TCGA-GBM and TCGA-LGG
query.met <- GDCquery(project = c("TCGA-GBM", "TCGA-LGG"),
                      legacy = TRUE,
                      data.category = "DNA methylation",
                      platform = c("Illumina Human Methylation 450",
                                   "Illumina Human Methylation 27"))

# Using sample type to get only Primary Solid Tumor and Solid Tissue Normal samples
query.mirna <- GDCquery(project = "TCGA-ACC",
                        data.category = "Transcriptome Profiling",
                        data.type = "miRNA Expression Quantification",
                        sample.type = c("Primary solid Tumor", "Solid Tissue Normal"))

# Using the legacy archive to access hg19 data and filtering by barcode
query <- GDCquery(project = "TCGA-GBM",
                  data.category = "DNA methylation",
                  platform = "Illumina Human Methylation 27",
                  legacy = TRUE,
                  barcode = c("TCGA-02-0047-01A-01D-0186-05",
                              "TCGA-06-2559-01A-01D-0788-05"))

# Gene expression aligned against hg19
query.exp.hg19 <- GDCquery(project = "TCGA-GBM",
                           data.category = "Gene expression",
                           data.type = "Gene expression quantification",
                           platform = "Illumina HiSeq",
                           file.type = "normalized_results",
                           experimental.strategy = "RNA-Seq",
                           barcode = c("TCGA-14-0736-02A-01R-2005-01",
                                       "TCGA-06-0211-02A-02R-2005-01"),
                           legacy = TRUE)

# Searching idat files for DNA methylation
query <- GDCquery(project = "TCGA-OV",
                  data.category = "Raw microarray data",
                  data.type = "Raw intensities",
                  experimental.strategy = "Methylation array",
                  legacy = TRUE,
                  file.type = ".idat",
                  platform = "Illumina Human Methylation 450")

Reposted from: http://oysdx.baihongyu.com/
