Bioinformatics for Omics Data: Methods and Protocols (Guided Reading Edition) was written by invited expert researchers in the field to give readers a practical guide. The book presents a new research field, bioinformatics for omics data, which brings together and integrates disciplines such as molecular biology, applied informatics and statistics.
The book is highly detailed and is divided into three parts. The first part introduces basic analysis strategies, standardisation, and management guidelines for omics data, as well as basic statistics. The second part presents the specific analysis strategies for each type of data, organised by topic: genome, transcriptome, proteome, metabolome and so on. The last part uses examples such as the identification of disease-related biomarkers and targets to illustrate concrete applications of omics bioinformatics. The book follows the established style of the Springer Methods in Molecular Biology series: it is clearly written and easy to use, and each chapter includes an introduction to the topic, a list of the necessary materials, readily reproducible protocols, notes on troubleshooting, and advice on avoiding common pitfalls.
Authoritative yet accessible, the book can serve as an ideal guide for researchers from different professional backgrounds, and it gives readers a compelling picture of this research field.
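Bioinformatics for Omics Data (Methods and Protocols, Guided Reading Edition) gives a detailed introduction to bioinformatics for omics data from several angles. The book is divided into three parts. Part I introduces core analysis strategies, standard analysis specifications, data management guidelines, and the basic statistical methods used to analyse omics data. Part II introduces the bioinformatics analysis methods used for the different kinds of omics data (genome, transcriptome, proteome, metabolome and others), including basic concepts and experimental background as well as basic methods for raw data preprocessing and in-depth analysis. Part III presents worked examples of omics data analysis using bioinformatics, including concrete cases such as the identification of biomarkers and targets related to human disease. The book is by Mayer.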
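Contents
Preface v
Contributors ix
Part I Fundamentals of Omics Bioinformatics
Chapter 1 Omics Technologies, Data and Bioinformatics Principles 3
Chapter 2 Data Standards for Omics Data: Data Sharing and Reuse 31
Chapter 3 Omics Data Management and Annotation 71
Chapter 4 Data and Knowledge Management in Cross-Omics Research Projects 97
Chapter 5 Principles of Statistical Analysis for Omics Data 113
Chapter 6 Statistical Methods and Models for the Integrated Analysis of Omics Data at Different Levels 133
Chapter 7 Analysis of Time-Course Omics Datasets 153
Chapter 8 The Proper Use of "Omics" Terminology 173
Part II Common Types of Omics Data and Their Analysis Methods
Chapter 9 Computational Analysis of High-Throughput Sequencing Data 199
Chapter 10 Analysis of Single Nucleotide Polymorphisms in Case-Control Studies 219
Chapter 11 Bioinformatics Analysis of Copy Number Variation Data 235
Chapter 12 Processing Immunoprecipitation-Based Chip (ChIP-Chip) Data: From Raw Image Generation to Browsing of Analysis Results 251
Chapter 13 Global Mechanism Analysis and Disease Association Based on Gene Expression Profiles 269
Chapter 14 Bioinformatics Analysis of Transcriptome Data 299
Chapter 15 Bioinformatics Analysis of Qualitative and Quantitative Proteomics Data 331
Chapter 16 Bioinformatics Analysis of Mass Spectrometry-Based Metabolomics Data 351
Part III Applied Omics Bioinformatics
Chapter 17 Computational Analysis Workflows in Omics Data Processing 379
Chapter 18 Integration, Storage and Analysis Strategies for Omics Data 399
Chapter 19 Integration of Omics Data in Studies of Signaling Pathways, Interaction Network Construction and Functional Analysis 415
Chapter 20 Network Inference from Time-Dependent Omics Data 435
Chapter 21 Omics and Literature Mining 457
Chapter 22 Application of Omics and Bioinformatics in Clinical Data Processing 479
Chapter 23 Omics-Based Analysis of Pathological and Physiological Processes 499
Chapter 24 Data Mining Methods in Omics-Based Biomarker Discovery 511
Chapter 25 Integrated Bioinformatics Analysis for Cancer Target Identification 527
Chapter 26 Omics-Based Identification of Molecular Targets and Biomarkers 547
Index 573
(Translated by Luo Jingchu)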
Contents
Preface v
Contributors ix
PART I OMICS BIOINFORMATICS FUNDAMENTALS
1 Omics Technologies, Data and Bioinformatics Principles 3
Maria V. Schneider and Sandra Orchard
2 Data Standards for Omics Data: The Basis of Data Sharing and Reuse 31
Stephen A. Chervitz, Eric W. Deutsch, Dawn Field, Helen Parkinson, John Quackenbush, Philippe Rocca-Serra, Susanna-Assunta Sansone, Christian J. Stoeckert, Jr., Chris F. Taylor, Ronald Taylor, and Catherine A. Ball
3 Omics Data Management and Annotation 71
Arye Harel, Irina Dalah, Shmuel Pietrokovski, Marilyn Safran, and Doron Lancet
4 Data and Knowledge Management in Cross-Omics Research Projects 97
Martin Wiesinger, Martin Haiduk, Marco Behr, Henrique Lopes de Abreu Madeira, Gernot Glockler, Paul Perco, and Arno Lukas
5 Statistical Analysis Principles for Omics Data 113
Daniela Dunkler, Fatima Sanchez-Cabo, and Georg Heinze
6 Statistical Methods and Models for Bridging Omics Data Levels 133
Simon Rogers
7 Analysis of Time Course Omics Datasets 153
Martin G. Grigorov
8 The Use and Abuse of -Omes 173
Sonja J. Prohaska and Peter F. Stadler
PART II OMICS DATA AND ANALYSIS TRACKS
9 Computational Analysis of High Throughput Sequencing Data 199
Steve Hoffmann
10 Analysis of Single Nucleotide Polymorphisms in Case–Control Studies 219
Yonghong Li, Dov Shiffman, and Rainer Oberbauer
11 Bioinformatics for Copy Number Variation Data 235
Melissa Warden, Roger Pique-Regi, Antonio Ortega, and Shahab Asgharzadeh
12 Processing ChIP-Chip Data: From the Scanner to the Browser 251
Pierre Cauchy, Touati Benoukraf, and Pierre Ferrier
13 Insights Into Global Mechanisms and Disease by Gene Expression Profiling 269
Fatima Sanchez-Cabo, Johannes Rainer, Ana Dopazo, Zlatko Trajanoski, and Hubert Hackl
14 Bioinformatics for RNomics 299
Kristin Reiche, Katharina Schutt, Kerstin Boll, Friedemann Horn, and Jorg Hackermüller
15 Bioinformatics for Qualitative and Quantitative Proteomics 331
Chris Bielow, Clemens Gropl, Oliver Kohlbacher, and Knut Reinert
16 Bioinformatics for Mass Spectrometry-Based Metabolomics 351
David P. Enot, Bernd Haas, and Klaus M. Weinberger
PART III APPLIED OMICS BIOINFORMATICS
17 Computational Analysis Workflows for Omics Data Interpretation 379
Irmgard Mühlberger, Julia Wilflingseder, Andreas Bernthaler, Raul Fechete, Arno Lukas, and Paul Perco
18 Integration, Warehousing, and Analysis Strategies of Omics Data 399
Srinubabu Gedela
19 Integrating Omics Data for Signaling Pathways, Interactome Reconstruction, and Functional Analysis 415
Paolo Tieri, Alberto de la Fuente, Alberto Termanini, and Claudio Franceschi
20 Network Inference from Time-Dependent Omics Data 435
Paola Lecca, Thanh-Phuong Nguyen, Corrado Priami, and Paola Quaglia
21 Omics and Literature Mining 457
Vinod Kumar
22 Omics–Bioinformatics in the Context of Clinical Data 479
Gert Mayer, Georg Heinze, Harald Mischak, Merel E. Hellemons, Hiddo J. Lambers Heerspink, Stephan J.L. Bakker, Dick de Zeeuw, Martin Haiduk, Peter Rossing, and Rainer Oberbauer
23 Omics-Based Identification of Pathophysiological Processes 499
Hiroshi Tanaka and Soichi Ogishima
24 Data Mining Methods in Omics-Based Biomarker Discovery 511
Fan Zhang and Jake Y. Chen
25 Integrated Bioinformatics Analysis for Cancer Target Identification 527
Yongliang Yang, S. James Adelstein, and Amin I. Kassis
26 Omics-Based Molecular Target and Biomarker Identification 547
Zhang-Zhi Hu, Hongzhan Huang, Cathy H. Wu, Mira Jung, Anatoly Dritschilo, Anna T. Riegel, and Anton Wellstein
Index 573
Chapter 1
Omics Technologies, Data and Bioinformatics Principles
Maria V. Schneider and Sandra Orchard
Abstract
We provide an overview of the state of the art in Omics technologies, the types of omics data, and the bioinformatics resources relevant and related to Omics. We also illustrate the bioinformatics challenges of dealing with high-throughput data. This overview touches on several fundamental aspects of Omics and bioinformatics: data standardisation, data sharing, storing Omics data appropriately and exploring Omics data in bioinformatics. Though the principles and concepts presented are true for the various different technological fields, we concentrate on three main Omics fields, namely genomics, transcriptomics and proteomics. Finally, we address the integration of Omics data and provide several useful links for bioinformatics and Omics.
Key words: Omics, Bioinformatics, High-throughput, Genomics, Transcriptomics, Proteomics, Interactomics, Data integration, Omics databases, Omics tools
1. Introduction
The last decade has seen an explosion in the amount of biological data generated by an ever-increasing number of techniques enabling the simultaneous detection of a large number of alterations in molecular components (1). The Omics technologies utilise these high-throughput (HT) screening techniques to generate the large amounts of data required to enable a system level understanding of correlations and dependencies between molecular components.
Omics techniques are required to be high throughput because they need to analyse very large numbers of genes, gene expression, or proteins either in a single procedure or a combination of procedures. Computational analysis, i.e., the discipline now known as bioinformatics, is a key requirement for the study of the vast amounts of data generated. Omics requires the use of techniques that can handle extremely complex biological samples in large quantities (e.g. high throughput) with high sensitivity and specificity. Next generation analytical tools require improved robustness, flexibility and cost efficiency. All of these aspects are being continuously improved, potentially enabling institutes such as the Wellcome Trust Sanger Sequencing Centre (see Note 1) to generate thousands of millions of base pairs per day, rather than the current output of 100 million per day (http://www.yourgenome.org/sc/nt).
Bernd Mayer (ed.), Bioinformatics for Omics Data: Methods and Protocols, Methods in Molecular Biology, vol. 719, DOI 10.1007/978-1-61779-027-0_1, © Springer Science+Business Media, LLC 2011
However, all this data production makes sense only if one is equipped with the necessary analytical resources and tools to understand it. The evolution of the laboratory techniques therefore has to occur in parallel with a corresponding improvement in analytical methodology and tools to handle the data. The term Omics (a suffix signifying the measurement of the entire complement of a given level of biological molecules and information) encompasses a variety of new technologies that can help explain both normal and abnormal cell pathways, networks, and processes via the simultaneous monitoring of thousands of molecular components. Bioinformaticians use computers and statistics to perform extensive Omics-related research by searching biological databases and comparing gene sequences and proteins on a vast scale to identify sequences or proteins that differ between diseased and healthy tissues, or more generally between different phenotypes.
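As a rough illustration of what "identifying molecules that differ between diseased and healthy tissues" can look like computationally, the sketch below ranks a handful of genes by Welch's t-statistic. It is only a minimal example under stated assumptions: the gene names and expression values are hypothetical, the helper names `welch_t` and `rank_by_difference` are invented for this illustration, and the chapter itself does not prescribe this particular test.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples with unequal variances."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Hypothetical expression values (arbitrary units), not taken from any real study.
expression = {
    "gene_A": {"diseased": [8.1, 7.9, 8.4, 8.0], "healthy": [5.2, 5.0, 5.5, 5.1]},
    "gene_B": {"diseased": [6.0, 6.3, 5.8, 6.1], "healthy": [6.1, 5.9, 6.2, 6.0]},
    "gene_C": {"diseased": [3.0, 3.4, 2.9, 3.1], "healthy": [4.6, 4.9, 4.4, 4.7]},
}


def rank_by_difference(data):
    """Return (gene, t) pairs sorted by the absolute size of the t-statistic."""
    scores = {
        gene: welch_t(groups["diseased"], groups["healthy"])
        for gene, groups in data.items()
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


ranking = rank_by_difference(expression)
for gene, t in ranking:
    print(f"{gene}: t = {t:.2f}")
```

In a real Omics study thousands of molecules are tested at once, so a statistic like this would normally be followed by a multiple-testing correction; that step is omitted here for brevity.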
“Omics” spans an increasingly wide range of fields, which now range from genomics (the quantitative study of protein coding genes, regulatory elements and noncoding sequences), transcriptomics (RNA and gene expression), proteomics (e.g. focusing on protein abundance), and metabolomics (metabolites and metabolic networks) to advances in the era of post-genomic biology and medicine: pharmacogenomics (the quantitative study of how genetics affects a host response to drugs), physiomics (physiological dynamics and functions of whole organisms) and in other fields: nutrigenomics (a rapidly growing discipline that focuses on identifying the genetic factors that influence the body’s response to diet and studies how the bioactive constituents of food affect gene expression), phylogenomics (analysis involving genome data and evolutionary reconstructions, especially phylogenetics) and interactomics (molecular interaction networks). Though in the remainder of this chapter we concentrate on a few isolated examples of Omics technologies, much of what is said, for example about data standardisation, data sharing, storage and analysis requirements, is true for all of these different technological fields.
There are already large amounts of data generated by these technologies and this trend is increasing; for example, second and third generation sequencing technologies are leading to an exponential increase in the amount of sequencing data available. From a computational point of view, in order to address the complexity of these data, understand molecular regulation and gain the most from such a comprehensive set of information, knowledge discovery (the process of automatically searching large volumes of data for patterns) is a crucial step. This process of bioinformatics analysis includes: (1) data processing and molecule (e.g. protein) identification, (2) statistical data analysis, (3) pathway analysis, and (4) data modelling in a system wide context. In this chapter we will present some of these analytical methods and discuss ways in which data can be made accessible not only to the specialised bioinformatician but, in particular, to the research scientist.
2. Materials
There are a variety of definitions of the term HT; however, we can loosely apply this term to cases where automation is used to increase the throughput of an experimental procedure. HT technologies exploit robotics, optics, chemistry, biology and image analysis research. The explosion in data production in the public domain is a consequence of falling equipment prices, the opening of major national screening centres and new HT core facilities at universities and other academic institutes. Bioinformatics plays an essential role in HT technologies.
2.1. Genomics High-Throughput Technologies
High-Throughput Sequencing (HTS) technologies are used not only for traditional applications in genomics and metagenomics (see Note 2), but also for novel applications in the fields of transcriptomics, metatranscriptomics (see Note 3), epigenomics (see Note 4), and studies of genome variation (see Note 5). Next generation sequencing platforms allow the determination of the sequence data from amplified single DNA fragments and have been developed specifically to lend themselves to robotics and parallelisation. Current methods can directly sequence only relatively short (300–1,000 nucleotides long) DNA fragments in a single reaction. Short-read sequencing technologies dramatically reduce the sequencing cost. There were initial fears that the increase in quantity might result in a decrease in quality, and improvements in accuracy and read length are still being sought. However, despite this, these advances have significantly reduced the cost of several sequencing applications, such as resequencing individual genomes (2) and readout assays (e.g. ChIP-seq (3) and RNAseq (4)).
2.2. Transcriptomics High-Throughput Technologies
The transcriptome is the set of all messenger RNA (mRNA) molecules, or “transcripts”, produced in one or a population of cells. Several methods have been developed in order to gain expression information at high throughput level.
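To put the read lengths quoted for sequencing (300 to 1,000 nucleotides per reaction) into perspective, the following sketch estimates how many reads are needed to reach a given average sequencing depth, using the standard relation coverage = (number of reads × read length) / genome size. The genome size of 3,000,000,000 bases and the target depth of 30 are round-figure assumptions chosen only for illustration, and the function names are hypothetical; none of these values come from the chapter.

```python
def expected_coverage(num_reads, read_length, genome_size):
    """Average depth: total sequenced bases divided by genome size."""
    return num_reads * read_length / genome_size


def reads_for_coverage(target_coverage, read_length, genome_size):
    """Smallest whole number of reads giving at least the target average depth."""
    total_bases = target_coverage * genome_size
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-total_bases // read_length)


GENOME_SIZE = 3_000_000_000  # assumed round figure for a human-sized genome
TARGET_COVERAGE = 30         # assumed target average depth

for read_length in (300, 1000):
    n = reads_for_coverage(TARGET_COVERAGE, read_length, GENOME_SIZE)
    depth = expected_coverage(n, read_length, GENOME_SIZE)
    print(f"read length {read_length}: {n:,} reads for about {depth:.0f}x coverage")
```

The calculation ignores uneven coverage, sequencing errors and unmappable reads, so practical read counts would be higher; it is meant only to show why short reads make massive parallelisation necessary.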