Jump to ContentJump to Main Navigation
Pattern Discovery in Biomolecular DataTools, Techniques, and Applications$
Users without a subscription are not able to see the full content.

Jason T. L. Wang, Bruce A. Shapiro, and Dennis Shasha

Print publication date: 1999

Print ISBN-13: 9780195119404

Published to Oxford Scholarship Online: November 2020

DOI: 10.1093/oso/9780195119404.001.0001

Show Summary Details
Page of

PRINTED FROM OXFORD SCHOLARSHIP ONLINE (oxford.universitypressscholarship.com). (c) Copyright Oxford University Press, 2021. All Rights Reserved. An individual user may print out a PDF of a single chapter of a monograph in OSO for personal use. date: 05 December 2021

Discovering Concepts in Structural Data

Discovering Concepts in Structural Data

(p.145) Chapter 9 Discovering Concepts in Structural Data
Pattern Discovery in Biomolecular Data

Diane J. Cook

Lawrence B. Holder

Oxford University Press

The large amount of data collected today is quickly overwhelming researchers’ abilities to interpret the data and discover interesting patterns. In response to this problem, a number of researchers have developed techniques for discovering concepts in databases. These techniques work well for data expressed in a nonstructural, attribute-value representation and address issues of data relevance, missing data, noise and uncertainty, and utilization of domain knowledge (Fisher, 1987; Cheeseman and Stutz, 1996). However, recent data acquisition projects are collecting structural data describing the relationships among the data objects. Correspondingly, there exists a need for techniques to analyze and discover concepts in structural databases (Fayyad et al., 1996b). One method for discovering knowledge in structural data is the identification of common substructures. The goal is to find substructures capable of compressing the data and to identify conceptually interesting substructures that enhance the interpretation of the data. Substructure discovery is the process of identifying concepts describing interesting and repetitive substructures within structural data. Once discovered, the substructure concept can be used to simplify the data by replacing instances of the substructure with a pointer to the newly discovered concept. The discovered substructure concepts allow abstraction over detailed structure in the original data and provide new, relevant attributes for interpreting the data. Iteration of the substructure discovery and replacement process constructs a hierarchical description of the structural data in terms of the discovered substructures. This hierarchy provides varying levels of interpretation that can be accessed based on the goals of the data analysis. We describe a system called Subdue that discovers interesting substructures in structural data based on the minimum description length (MDL) principle. The Subdue system discovers substructures that compress the original data and represent structural concepts in the data. By replacing previously discovered substructures, multiple passes of Subdue produce a hierarchical description of the structural regularities in the data. Subdue uses a computationally bounded inexact graph match that identifies similar, but not identical, instances of a substructure and finds an approximate measure of closeness of two substructures when under computational constraints.

Keywords:   Chemical compound, Data mining, Encoding, Graph isomorphism, Knowledge-based technique, Metis graph partitioning package, Parallel and distributed processing, Subdue system

Oxford Scholarship Online requires a subscription or purchase to access the full text of books within the service. Public users can however freely search the site and view the abstracts and keywords for each book and chapter.

Please, subscribe or login to access full text content.

If you think you should have access to this title, please contact your librarian.

To troubleshoot, please check our FAQs , and if you can't find the answer there, please contact us .