Automated Machine Learning for Business

Kai R. Larsen and Daniel S. Becker

Print publication date: 2021

Print ISBN-13: 9780190941659

Published to Oxford Scholarship Online: July 2021

DOI: 10.1093/oso/9780190941659.001.0001

Model Data

Chapter: (p.95) Section IV Model Data
Source: Automated Machine Learning for Business
Author(s): Kai R. Larsen, Daniel S. Becker
Publisher: Oxford University Press
DOI: 10.1093/oso/9780190941659.003.0004

After preparing your dataset, you should be quite familiar with the business problem, the subject matter, and the content of the dataset. This section is about modeling data: using data to train algorithms to create models that can be used to predict future events or understand past events. The section shows where data modeling fits in the overall machine learning pipeline. Traditionally, real-world data is stored in one or more databases or files. This data is extracted, and features and a target (T) are created and submitted to the “Model Data” stage (the topic of this section). Once this stage is complete, the resulting model is examined (Section V) and placed into production. With the model in the production system, current data generated from the real-world environment is fed into the system. In the example case of a diabetes patient, a new patient’s electronic health record information is entered into the system, and a database lookup retrieves additional data for feature creation.

Keywords:   machine learning pipelines, automated machine learning, AutoML, Driverless AI, features, descriptive statistics, LogLoss, cross validation, learning curves, model speed, machine learning blueprints, imputation, one-hot encoding, regression, classification, model evaluation, ROC, F1-score, confusion matrix, DataRobot
