Jump to ContentJump to Main Navigation
Pattern Discovery in Biomolecular DataTools, Techniques, and Applications$
Users without a subscription are not able to see the full content.

Jason T. L. Wang, Bruce A. Shapiro, and Dennis Shasha

Print publication date: 1999

Print ISBN-13: 9780195119404

Published to Oxford Scholarship Online: November 2020

DOI: 10.1093/oso/9780195119404.001.0001

Show Summary Details
Page of

PRINTED FROM OXFORD SCHOLARSHIP ONLINE (oxford.universitypressscholarship.com). (c) Copyright Oxford University Press, 2021. All Rights Reserved. An individual user may print out a PDF of a single chapter of a monograph in OSO for personal use. date: 20 October 2021

Discovering Patterns in DNA Sequences by the Algorithmic Significance Method

Discovering Patterns in DNA Sequences by the Algorithmic Significance Method

(p.3) Chapter 1 Discovering Patterns in DNA Sequences by the Algorithmic Significance Method
Title Pages

Aleksandar Milosavljević

Oxford University Press

The parsimony method for reconstruction of evolutionary trees (Sober, 1988) and the minimal edit distance method for DNA sequence alignments (e.g., Waterman, 1984) are both based on the principle of Occam’s Razor (e.g., Losee, 1980; also known as the Parsimony principle). This principle favors the most concise theories among the multitudes that can possibly explain observed data. The conciseness may be measured by the number of postulated mutations within an evolutionary tree, by the number of edit operations that transform one DNA sequence into the other, or by another implicit or explicit criterion. A very general mathematical formulation of Occam’s Razor has been proposed via minimal length encoding by computer programs (for recent reviews, see Cover and Thomas, 1991; Li and Vitányi, 1993). Algorithmic significance is a general method for pattern discovery based on Occam’s Razor. The method measures parsimony in terms of encoding length, in bits, of the observed data. Patterns are defined as datasets that can be concisely encoded. The method is not limited to any particular class of patterns; the class of patterns is determined by specifying an encoding scheme. To illustrate the method, consider the following unusual discovery experiment: . . . 1. Pick a simple pseudorandom generator for digits from the set {0, 1, 2, 3}. 2. Pick a seed value for the generator and run it to obtain a sequence of 1000 digits; convert the digits to a DNA sequence by replacing all occurrences of digit 0 by letter A, 1 by G, 2 by C, and 3 by T. 3. Submit the sequence to a similarity search against a database containing a completely sequenced genome of a particular organism. . . . Assume that after an unspecified number of iterations of the three steps, with each iteration involving a different random generator or seed value or both, the search in the third step finally results in a genomic sequence highly similar to the query sequence. Does the genomic sequence contain a pattern? To argue for the presence of a pattern, one may directly apply the algorithmic significance method.

Keywords:   Algorithmic significance method, Clone, Data compression, Encoding, Gene, Hybridization, Information theory, Kolmogorov complexity, Low-complexity DNA sequence

Oxford Scholarship Online requires a subscription or purchase to access the full text of books within the service. Public users can however freely search the site and view the abstracts and keywords for each book and chapter.

Please, subscribe or login to access full text content.

If you think you should have access to this title, please contact your librarian.

To troubleshoot, please check our FAQs , and if you can't find the answer there, please contact us .