Book Details

Foreign-language book

Japanese title (translated): Feature Extraction and Selection for Data Science: A Practical Approach to Predictive Models

Feature Engineering and Selection : A Practical Approach for Predictive Models

(Chapman & Hall/CRC Data Science)

Kuhn, Max; Johnson, Kjell

Chapman & Hall 2019/08
297 p. 25 cm   
Binding: Hardcover
Text language: ENG    Country of publication: GB
ISBN: 9781138079229
KCN: 1035826075
Kinokuniya Bookstore selected title
List price: ¥12,603 (excluding tax: ¥11,458)
Web sale price available

Prices may change due to exchange-rate fluctuations or at the publisher's discretion.

This item is ordered from the stock of a partner overseas publisher. We ask for your understanding if it is out of stock.

DDC: 005
KDC: F88 Databases
Related book lists: SB3072B Artificial Intelligence and Data Science 2019/2020
SB3104B Artificial Intelligence and Data Science 2020

Annotation

This book describes techniques for finding the best representations of predictors for modeling and for finding the best subset of predictors for improving model performance. A variety of example data sets are used to illustrate the techniques, along with R programs for reproducing the results.

Full Description

The process of developing predictive models includes many stages. Most resources focus on the modeling algorithms but neglect other critical aspects of the modeling process. This book describes techniques for finding the best representations of predictors for modeling and for finding the best subset of predictors for improving model performance. A variety of example data sets are used to illustrate the techniques, along with R programs for reproducing the results.

Table of Contents

1. Introduction
   A Simple Example; Important Concepts; A More Complex Example; Feature Selection; An Outline of the Book; Computing
2. Illustrative Example: Predicting Risk of Ischemic Stroke
   Splitting; Preprocessing; Exploration; Predictive Modeling Across Sets; Other Considerations; Computing
3. A Review of the Predictive Modeling Process
   Illustrative Example: OkCupid Profile Data; Measuring Performance; Data Splitting; Resampling; Tuning Parameters and Overfitting; Model Optimization and Tuning; Comparing Models Using the Training Set; Feature Engineering Without Overfitting; Summary; Computing
4. Exploratory Visualizations
   Introduction to the Chicago Train Ridership Data; Visualizations for Numeric Data: Exploring Train Ridership Data; Visualizations for Categorical Data: Exploring the OkCupid Data; Post Modeling Exploratory Visualizations; Summary; Computing
5. Encoding Categorical Predictors
   Creating Dummy Variables for Unordered Categories; Encoding Predictors with Many Categories; Approaches for Novel Categories; Supervised Encoding Methods; Encodings for Ordered Data; Creating Features from Text Data; Factors versus Dummy Variables in Tree-Based Models; Summary; Computing
6. Engineering Numeric Predictors
   1:1 Transformations; 1:Many Transformations; Many:Many Transformations; Summary; Computing
7. Detecting Interaction Effects
   Guiding Principles in the Search for Interactions; Practical Considerations; The Brute-Force Approach to Identifying Predictive Interactions; Approaches when Complete Enumeration is Practically Impossible; Other Potentially Useful Tools; Summary; Computing
8. Handling Missing Data
   Understanding the Nature and Severity of Missing Information; Models that are Resistant to Missing Values; Deletion of Data; Encoding Missingness; Imputation Methods; Special Cases; Summary; Computing
9. Working with Profile Data
   Illustrative Data: Pharmaceutical Manufacturing Monitoring; What are the Experimental Unit and the Unit of Prediction?; Reducing Background; Reducing Other Noise; Exploiting Correlation; Impacts of Data Processing on Modeling; Summary; Computing
10. Feature Selection Overview
   Goals of Feature Selection; Classes of Feature Selection Methodologies; Effect of Irrelevant Features; Overfitting to Predictors and External Validation; A Case Study; Next Steps; Computing
11. Greedy Search Methods
   Illustrative Data: Predicting Parkinson's Disease; Simple Filters; Recursive Feature Elimination; Stepwise Selection; Summary; Computing
12. Global Search Methods
   Naive Bayes Models; Simulated Annealing; Genetic Algorithms; Test Set Results; Summary; Computing