Spoken Natural Language Dialog Systems: A Practical Approach

Ronnie W. Smith and D. Richard Hipp

Print publication date: 1995

Print ISBN-13: 9780195091878

Published to Oxford Scholarship Online: November 2020

DOI: 10.1093/oso/9780195091878.001.0001



Performance of the Speech Recognizer and Parser


Chapter:
(p.219) Chapter 8 Performance of the Speech Recognizer and Parser
Source:
Spoken Natural Language Dialog Systems
Author(s):

Ronnie W. Smith

D. Richard Hipp

Publisher:
Oxford University Press
DOI: 10.1093/oso/9780195091878.003.0010

The results of the experiments reported in the previous chapter were independently analyzed in order to measure the accuracy of the speech recognizer and parser. A summary of the key results of this analysis follows.

• On average, one out of every three words emitted by the speech recognizer was in error.
• On average, three out of every four sentences contained one or more speech recognition errors.
• In spite of the high speech recognition error rate, the meanings of the spoken utterances were correctly deduced 83% of the time.
• Finally, and perhaps most surprisingly, it was found that dialog expectation was helpful only as a tie-breaker in deducing the correct meaning of spoken utterances.

. . . The remainder of this chapter describes the analysis in detail. The performance measurements of the speech recognizer and parser were computed from transcripts of 2804 individual utterances taken from the second and third sessions of the 8 experimental subjects. No information from the pilot subjects or from the first session with each subject was used in this analysis. Information about each utterance was collected and converted to a standardized, machine-readable format. The information that was collected follows.

• The sequence of words actually spoken by the user. These were manually entered by the experimenters based on the audio recordings of the experiment.
• The sequence of words recognized by the speech recognizer. This information was recorded automatically during the experiments.
• The set of up to K minimum matching strings between elements of the hypothesis set and dialog expectations, together with an utterance cost and an expectation cost for each (see section 5.8.5).
• The final output of the parser.
• The text spoken by the dialog controller immediately prior to the user's utterance, and notes concerning the user's utterance which were entered by the person who transcribed the utterance from the audio tapes.
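The "one word in three" figure above is a word error rate of roughly 0.33. As a minimal sketch of how such a rate is typically computed from paired transcripts (reference words spoken vs. words recognized), the following uses Levenshtein alignment; the function name and the toy sentence pair are illustrative assumptions, not taken from the book's corpus or scoring software.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + insertions + deletions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] holds the edit distance between ref[:i] and hyp[:j].
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + sub,   # match or substitution
                           dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1)         # insertion
    return dp[len(ref)][len(hyp)] / len(ref)

# One deletion plus one insertion against a six-word reference: WER = 2/6.
print(word_error_rate("the capacitor is connected to ground",
                      "the capacitor connected to the ground"))
```

Note that a sentence-level error rate (the "three out of every four sentences" figure) counts a sentence as wrong if its WER is nonzero at all, which is why it can sit far above the word-level rate.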
This information was used to assist in manually judging the correctness of each parse. . . . After the above information was collected and carefully audited to remove errors, the following additional features of each utterance were computed through a combination of automatic and manual processing.

Keywords:   Audio tape transcription, English grammar, Hypothesis set, Language coverage, Minimum matching string, Non-trivial utterances, Perplexity, SPHINX speech recognizer, Speech signal bandwidth
