Computational Text Analysis: For Functional Genomics and Bioinformatics

Soumya Raychaudhuri

Print publication date: 2006

Print ISBN-13: 9780198567400

Published to Oxford Scholarship Online: November 2020

DOI: 10.1093/oso/9780198567400.001.0001

PRINTED FROM OXFORD SCHOLARSHIP ONLINE (oxford.universitypressscholarship.com). (c) Copyright Oxford University Press, 2021. All Rights Reserved. An individual user may print out a PDF of a single chapter of a monograph in OSO for personal use. date: 16 June 2021

Text-Based Analysis of a Single Series of Gene Expression Measurements

5 Text-Based Analysis of a Single Series of Gene Expression Measurements

Oxford University Press

In this chapter we begin to address the analysis of gene expression data with the scientific literature. Here we describe methods for the analysis of a single experiment: one in which a single expression measurement has been made for each of many genes within the same organism. In Chapter 7 we will address the analysis of larger data sets with multiple expression measurements for each gene; the questions that arise in that setting are often more complex, and the use of scientific text there can be more useful. But focusing on a single series of expression measurements is an effective starting point for understanding the scientific literature and how it can be used with experimental data. The lessons here apply to a wide array of genomic assays besides gene arrays: these methods can be applied to any assay that assigns a single value to each gene. In addition, many investigators generate single-condition expression data sets, so these methods are widely applicable.

One of the great difficulties in analyzing a single expression series is that context is lacking. That is, we have a large set of isolated measurements. Each measurement corresponds to the log of the ratio of a single gene’s expression in an experimental condition to its expression in a control condition. These measurements represent a single snapshot of a cell’s physiologic status. One challenge is separating the physiologically important expression changes from random experimental and physiologic aberrations and fluctuations. Gene expression measurements are subject to a great deal of noise, and distinguishing true positives from genes that are not truly induced or repressed is difficult. Typically, investigators use their knowledge of biology to prioritize likely positives. In this chapter we argue that text-mining approaches can be used to help prioritize these genes instead.
Another equally important challenge is to discern broadly what biological functions are active in a given experiment.
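
As a rough illustration of the kind of data described above, the following sketch builds a single expression series in which each gene receives one value: the log of its expression in an experimental condition relative to a control. It is not taken from the chapter. The gene names and intensities are invented, the function name `log_ratio` is hypothetical, and base 2 is an assumption (the abstract says only "log").

```python
import math

def log_ratio(experimental: float, control: float) -> float:
    """Log of the expression ratio (base 2 is an assumption here).

    Positive values suggest induction, negative values repression.
    """
    if experimental <= 0 or control <= 0:
        raise ValueError("expression intensities must be positive")
    return math.log2(experimental / control)

# Hypothetical (experimental, control) intensities for three made-up genes.
intensities = {
    "geneA": (800.0, 200.0),
    "geneB": (150.0, 1200.0),
    "geneC": (410.0, 400.0),
}

# A single series: exactly one value per gene.
series = {gene: log_ratio(e, c) for gene, (e, c) in intensities.items()}

# Rank genes by the magnitude of their change. With only one value per
# gene, this ranking alone cannot separate real changes from noise.
ranked = sorted(series, key=lambda g: abs(series[g]), reverse=True)
```

A ranking like this is only a starting point: a small value such as the one for `geneC` may be noise or a real but modest change, and this is the ambiguity that outside knowledge, whether an investigator's or the literature's, is meant to help resolve.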

Keywords: binding proteins, genome sequence information, keywords assignment, matrices reference matrix (R), noise, phosphate metabolism study, text matrix (T), words expression value
