Jump to ContentJump to Main Navigation
Pattern Discovery in Biomolecular DataTools, Techniques, and Applications$
Users without a subscription are not able to see the full content.

Jason T. L. Wang, Bruce A. Shapiro, and Dennis Shasha

Print publication date: 1999

Print ISBN-13: 9780195119404

Published to Oxford Scholarship Online: November 2020

DOI: 10.1093/oso/9780195119404.001.0001

Show Summary Details
Page of

PRINTED FROM OXFORD SCHOLARSHIP ONLINE (oxford.universitypressscholarship.com). (c) Copyright Oxford University Press, 2021. All Rights Reserved. An individual user may print out a PDF of a single chapter of a monograph in OSO for personal use. date: 18 June 2021

A Framework for Biological Pattern Discovery on Networks of Workstations

A Framework for Biological Pattern Discovery on Networks of Workstations

(p.133) Chapter 8 A Framework for Biological Pattern Discovery on Networks of Workstations
Pattern Discovery in Biomolecular Data

Bin Li

Dennis Shasha

Oxford University Press

Biological pattern discovery problems are computationally expensive. A possible technique for reducing the time to perform pattern discovery is parallelization. Since each task in a biological pattern discovery application is usually time-consuming by itself, we might be able to use networks of workstations (NOWs) that communicate infrequently. Persistent Linda (PLinda) is a distributed parallel computing system that runs on NOWs and it automatically utilizes idle workstations (Anderson and Shasha, 1992; Jeong, 1996). This means that labs can do parallel pattern discovery without buying new hardware. We propose an acyclic directed graph structure, exploration dag (E-dag for short), to characterize computational models of biological pattern discovery applications. An E-dag can first be constructively formed from specifications of a pattern discovery problem; then an E-dag traversal is performed on the fly to solve the problem. When done in parallel, the process of E-dag construction and traversal efficiently solves pattern discovery problems. Parallel E-dag construction and traversal can be easily programmed in PLinda. Finding active motifs in sets of protein sequences and in multiple RNA secondary structures are two examples of biological pattern discovery. Before discussing the framework, we introduce these two applications and briefly describe their computational models. Consider a database of imaginary protein sequences D = {FFRR, MRRM, MTRM, DPKY, AVLG} and the query “Find the patterns P of the form *X* where P occurs in at least two sequences in D and the size of P |P| ≥ 2.” (X can be a segment of a sequence of any length, and * represents a variable length don’t care [VLDC].) The good patterns are *RR* (which occurs in FFRR and MRRM) and *RM* (which occurs in MRRM and MTRM). Pattern discovery in sets of sequences concerns finding commonly occurring subsequences (sometimes called motifs). The structures of the motifs we wish to discover are regular expressions of the form *S1 * S2 * ... where S1,S2,… are segments of a sequence, that is, subsequences made up of consecutive letters, and * represents a VLDC. In matching the expression *S1 * S2 * … with a sequence S, the VLDCs may substitute for zero or more letters in S.

Keywords:   Exploration dag (E-dag), Graph traversal, Http, Motif, Network of workstations (NOWs), Protein, RNA, World Wide Web (WWW)

Oxford Scholarship Online requires a subscription or purchase to access the full text of books within the service. Public users can however freely search the site and view the abstracts and keywords for each book and chapter.

Please, subscribe or login to access full text content.

If you think you should have access to this title, please contact your librarian.

To troubleshoot, please check our FAQs , and if you can't find the answer there, please contact us .