Data Preparation and Feature Engineering: Essential Skills for Data Engineers (数据准备和特征工程:数据工程师必知必会技能)
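- Price: 45 yuan
- Author: Qi Wei (齐伟)
- Publication date: 2020/3/1
- ISBN: 9787121382635
- Publisher: Publishing House of Electronics Industry (电子工业出版社)
- Chinese Library Classification: TP274; TP18
- Pages: 208
- Paper:
- Edition: 01
- Format: 16mo (16开)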
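This book gives a detailed introduction to data preparation and feature engineering, an indispensable stage of big data and artificial intelligence projects. Each section first presents the basic concepts concisely, then demonstrates how they are applied through practical cases and provides targeted exercise projects, bringing "knowledge, cases, and practice" together. Each section closes with an "Extended Exploration" part that guides readers into deeper and broader territory. The book is suitable both as a textbook for related university programs and as a reference for developers working in big data, artificial intelligence, and similar fields.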
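Qi Wei, who calls himself Lao Qi (老齐), lives in Suzhou. His online tutorials 《零基础学Python》 (Learn Python from Scratch) and 《零基础学Python(第2版)》 (second edition) drew a very strong response in the industry. He is happy to discuss technical questions with people from all backgrounds and can provide related technical services.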
Contents
Chapter 1 Perceiving Data
1.0 Understanding Data Science Projects
1.1 Data in Files
1.1.1 CSV Files
1.1.2 Excel Files
1.1.3 Image Files
1.2 Data in Databases
1.3 Data on Web Pages
1.4 Data from APIs
Chapter 2 Data Cleaning
2.0 Basic Concepts
2.1 Converting Data Types
2.2 Handling Duplicate Data
2.3 Handling Missing Data
2.3.1 Checking for Missing Data
2.3.2 Filling with Specified Values
2.3.3 Filling Based on Patterns
2.4 Handling Outliers
Chapter 3 Feature Transformation
3.0 Types of Features
3.1 Converting Features to Numeric Values
3.2 Feature Binarization
3.3 One-Hot Encoding
3.4 Data Transformation
3.5 Feature Discretization
3.5.1 Unsupervised Discretization
3.5.2 Supervised Discretization
3.6 Data Normalization
Chapter 4 Feature Selection
4.0 Overview of Feature Selection
4.1 Wrapper Methods
4.1.1 Sequential Feature Selection
4.1.2 Exhaustive Feature Selection
4.1.3 Recursive Feature Elimination
4.2 Filter Methods
4.3 Embedded Methods
Chapter 5 Feature Extraction
5.1 Unsupervised Feature Extraction
5.1.1 Principal Component Analysis
5.1.2 Factor Analysis
5.2 Supervised Feature Extraction
Appendix A Introduction to Jupyter
Appendix B Introduction to NumPy
Appendix C Introduction to Pandas
Appendix D Introduction to Matplotlib
Afterword