Pattern Discovery in Biomolecular Data: Tools, Techniques, and Applications

Jason T. L. Wang, Bruce A. Shapiro, and Dennis Shasha

Print publication date: 1999

Print ISBN-13: 9780195119404

Published to Oxford Scholarship Online: November 2020

DOI: 10.1093/oso/9780195119404.001.0001



Assembling Blocks

Chapter 2 Assembling Blocks

Jorja G. Henikoff

Oxford University Press

A block is an ungapped local multiple alignment of amino acid sequences from a group of related proteins. Ideally, the contiguous stretch of residues represented by a block is conserved for biological function. Blocks have depth (the number of sequences) and width (the number of aligned positions). There are currently several useful programs for finding blocks in a group of related sequences that I do not discuss in detail here. Among these, Motif (Smith et al., 1990) and Asset (Neuwald and Green, 1994) both align blocks on occurrences of certain types of patterns found in the sequences; Gibbs (Lawrence et al., 1993; Neuwald et al., 1995) and MEME (Bailey and Elkan, 1994) both look for statistically optimal local alignments; and Macaw (Schuler et al., 1991) and Somap (Parry-Smith and Attwood, 1992) both give the user assistance in finding blocks interactively. After candidate blocks are identified by a block-finding method, they can be evaluated and assembled into a set representing the protein group, resulting in a multiple alignment consisting of ungapped regions separated by unaligned regions of variable length. The block assembly process is the subject of this chapter. Both the Blocks (Henikoff and Henikoff, 1996a) and Prints (Attwood and Beck, 1994) databases consist of such sets of blocks and between them currently represent 1,163 different protein groups. These collections of blocks are more sensitive and efficient for classifying new sequences into known protein groups than are collections of individual sequences, as demonstrated by comprehensive evaluations (Henikoff and Henikoff, 1994b, 1997), by genomic studies (Green et al., 1993), and by individual studies (Posfai et al., 1988; Henikoff, 1992, 1993; Attwood and Findlay, 1993; Pietrokovski, 1994; Brown, 1995). 
Issues that must be addressed during block assembly include the number of blocks provided to the assembly module by the block finders, block width, the number of times a block occurs in each sequence (zero to many), overlap of blocks, and the order of multiple blocks within each sequence. Once these issues are decided, it is necessary to score individual competing blocks and then competing sets of blocks.
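
As a rough illustration of two of the issues listed above (overlap of blocks and the order of blocks within each sequence), the following Python sketch models a block by its width and its start offset in each sequence, and tests whether a list of blocks can be placed in the same left-to-right order, without overlap, in every sequence. The `Block` class, its field names, and the helpers `compatible` and `consistent_set` are hypothetical names chosen for this example; they are not taken from the chapter or from any block-finding program. The sketch also assumes each block occurs at most once per sequence, which is a simplification of the zero-to-many case mentioned above, and it does not attempt any scoring.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List


@dataclass(frozen=True)
class Block:
    """Ungapped local alignment: one segment of equal width per sequence."""
    name: str
    width: int                 # number of aligned positions
    starts: Dict[str, int]     # sequence id -> 0-based start offset (assumed layout)

    @property
    def depth(self) -> int:
        # Depth is the number of sequences represented in the block.
        return len(self.starts)


def compatible(a: Block, b: Block) -> bool:
    """True if a ends at or before the start of b in every sequence they share."""
    shared = a.starts.keys() & b.starts.keys()
    return bool(shared) and all(
        a.starts[s] + a.width <= b.starts[s] for s in shared
    )


def consistent_set(blocks: List[Block]) -> bool:
    """True if the blocks, in the given order, are pairwise compatible."""
    # combinations() preserves list order, so each pair is (earlier, later).
    return all(compatible(x, y) for x, y in combinations(blocks, 2))


# Illustrative data only; the offsets are made up for the example.
A = Block("A", 10, {"s1": 5, "s2": 7})
B = Block("B", 8, {"s1": 20, "s2": 30})
C = Block("C", 6, {"s1": 22, "s2": 40})

print(A.depth, A.width)            # depth and width of block A
print(consistent_set([A, B]))      # A precedes B in both sequences
print(consistent_set([A, B, C]))   # B and C overlap in s1
```

Checking every pair rather than only adjacent blocks matters when blocks cover different subsets of sequences, since compatibility is then not guaranteed to be transitive.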

Keywords:   Alignment, Block, Depth first search in graph, Gibbs, MEME, Macaw, Protein, SP-score, World Wide Web (WWW)
