Machine learning class note

[Week1] Introduction/Linear regression with one variable/Linear algebra

Introduction

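Definitions of machine learning
- Arthur Samuel (1959): "Field of study that gives computers the ability to learn without being explicitly programmed."
- Tom Mitchell (1998): "A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E."
Main types of machine learning algorithms
- Supervised learning
- Unsupervised learning
- Others
  - Reinforcement learning
  - Recommender systems
Supervised learning
- The training data come with the correct answers (labels), and the algorithm learns to return the correct answer for new inputs.
- Regression problem: predict a continuous value.
- Classification problem: predict a discrete class (e.g. 0 vs 1), often via the estimated probability of the class.
Unsupervised learning
- Illustration: supervised vs. unsupervised
- Clustering problem
- Give the computer a set of unlabeled data with a set of features and ask it to group (cluster) the data.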

Linear regression with one variable

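To measure the gap between the values predicted by the regression and the actual values, we define a cost function that evaluates how well the model fits.

Cost function
- J(theta0, theta1): the cost function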
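The notes do not write J out. In the course it is the squared-error cost J(theta0, theta1) = 1/(2m) * sum over i of (h(x_i) - y_i)^2, with h(x) = theta0 + theta1*x. A minimal Python sketch of that formula follows; the function names and the toy data are made up for illustration.

```python
def hypothesis(theta0, theta1, x):
    """Univariate linear hypothesis h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """Squared-error cost J = 1/(2m) * sum((h(x_i) - y_i)^2)."""
    m = len(xs)
    total = sum((hypothesis(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys))
    return total / (2 * m)


# Toy data lying exactly on the line y = x (assumed, not from the course).
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

perfect = cost(0.0, 1.0, xs, ys)   # the line y = x fits exactly
worse = cost(0.0, 0.5, xs, ys)     # a flatter line fits worse, so J is larger
print(perfect, worse)
```

A lower J means the line is closer to the data; gradient descent, described next, searches for the theta0 and theta1 that make J as small as possible.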

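Minimizing the cost function (simplified case with only theta1)
Cost function with both theta0 and theta1
- In practice J is drawn as a 3D surface over theta0 and theta1 (figure).
- The 3D surface is hard to read, so it is usually flattened into a contour plot (right-hand figure): the horizontal axis is theta0, the vertical axis is theta1, and each contour line joins points with the same value of J. The center of the concentric contours is the minimum of the cost function.
Gradient descent: a method for minimizing the cost function
- Usually start from theta0 = 0, theta1 = 0.
- Keep changing theta0 and theta1 until the cost function reaches a minimum.
- Picture walking from the highest point toward the lowest point (figure): at every step, look for a direction that goes downhill and keep going until no step goes lower. Depending on the starting point you may end up in different local minima; the figure shows two of them (red arrows).
- Mathematical form
  - learning rate: how big the step we take downhill is
  - derivative: the slope of the tangent line to the cost curve at the current point
  - theta0 and theta1 are updated simultaneously: compute both new values from the old ones, assign them together, and only then re-evaluate the cost function
- How the update rule makes the cost function converge
  - Left of the minimum the derivative is negative, so the update increases theta and moves it to the right.
  - Right of the minimum the derivative is positive, so the update decreases theta and moves it to the left.
  - Repeating these steps converges to the minimum, which gives the theta with the lowest cost.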
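The update rule can be sketched in Python as follows. This is a minimal illustration for the squared-error cost of a univariate linear regression, not the course's own code; the function name, the learning rate, the iteration count and the toy data are all assumptions.

```python
def gradient_descent(xs, ys, alpha=0.1, iterations=2000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0  # usual starting point
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of J with respect to theta0 and theta1.
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: both new values are computed from the old ones.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


# Toy data generated from y = 1 + 2x (assumed for illustration).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
theta0, theta1 = gradient_descent(xs, ys)
print(theta0, theta1)  # close to 1 and 2
```

If alpha is too large the updates overshoot and J grows instead of shrinking; if it is too small convergence is slow.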

Linear algebra

Matrices and vectors

Matrix
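- Covers the basics of matrices: the dimension (rows * columns) and how elements are indexed
- Usually written with uppercase letters (A, B, C, …)
vector
- A matrix with dimension n*1
- Usually written with lowercase letters (a, b, c, …)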

Addition and scalar multiplication

Matrix addition
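- Only matrices with the same dimension can be added.
Scalar multiplication
- scalar: a real number
- Works element by element, like matrix addition: multiply every element of the matrix by the scalar. The result has the same dimension as the original matrix.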

Matrix vector multiplication

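How to compute
- To get y_i, multiply the elements of the i-th row of matrix A by the corresponding elements of vector x and add them up; the sum is the i-th element of y.
Example
Running linear regression as a matrix operation
- Turn the house sizes into a 4*2 matrix whose first column is all 1s.
- Turn the regression parameters theta0 and theta1 into a 2*1 matrix.
- Multiply the two; the result is h(x) for every house.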
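The three steps above can be sketched in plain Python. The house sizes and the parameters (theta0 = -40, theta1 = 0.25) are illustrative values assumed here, and `mat_vec` is a hypothetical helper, not a library function.

```python
def mat_vec(A, x):
    """Multiply matrix A (list of rows) by vector x: y_i = sum_j A[i][j] * x[j]."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


sizes = [2104, 1416, 1534, 852]   # assumed house sizes
X = [[1, s] for s in sizes]       # 4*2 matrix, first column all 1s
theta = [-40, 0.25]               # 2*1 parameter vector [theta0, theta1]

predictions = mat_vec(X, theta)   # h(x) for all four houses at once
print(predictions)
```

Each prediction is theta0 * 1 + theta1 * size, so one matrix-vector product replaces a loop over the houses.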

Matrix matrix multiplication

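How to compute
- The i-th column of matrix C is matrix A multiplied by the i-th column of matrix B.
  - Each of these products is a matrix-vector multiplication.
Example
- A 2*3 matrix times a 3*2 matrix gives a 2*2 matrix.
Using matrix multiplication to evaluate several linear regression hypotheses at once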

Matrix multiplication properties

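Properties of matrix multiplication
- not commutative: in general A*B and B*A differ, so the order of the operands matters
- associative: A*B*C = A*(B*C) = (A*B)*C
- identity matrix (I): 1s on the diagonal and 0s everywhere else; for any matrix A, A*I = I*A = A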

Inverse and transpose

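inverse: the matrix analogue of a reciprocal, e.g. for numbers 12*12^-1 = 1
square matrix: an m*m matrix. Only square matrices can have an inverse; if A is invertible, A*A^-1 = A^-1*A = I.
singular/degenerate: a matrix that has no inverse (e.g. A = [0 0;0 0])
transpose: swap the rows and columns of a matrix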

[Week2] Linear Regression with Multiple Variables

Multivariate linear regression

Multiple features

(figure)

Gradient descent for multiple variables

(figure)

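Gradient descent in practice I – feature scaling

Feature scaling puts the variables on a similar scale (similar to standardization), so that gradient descent runs more efficiently and reaches the optimum faster.
The goal is to bring every feature roughly into the range -1 <= x <= 1. One common way to do this is mean normalization: subtract the mean of the feature and divide by its range.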
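A minimal sketch of mean normalization for a single feature, assuming the range (max - min) as the divisor; the standard deviation is another common choice. The function name and the sample values are made up.

```python
def mean_normalize(values):
    """Return (v - mean) / (max - min) for every value of one feature."""
    mean = sum(values) / len(values)
    value_range = max(values) - min(values)  # assumes the values are not all equal
    return [(v - mean) / value_range for v in values]


sizes = [1000.0, 1500.0, 2000.0]   # assumed house sizes
scaled = mean_normalize(sizes)
print(scaled)
```

The mean and range come from the training set and should be reused unchanged when scaling new inputs at prediction time.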

Gradient descent in practice II – learning rate

(figure)

Features and polynomial regression

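Suppose we want to predict house prices and the features are the width and the depth of the lot. Multiplying the two gives a new feature, the lot area, and using the area may predict the price better.
Polynomial regression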

Computing parameters analytically

Normal equation

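Known in Chinese as 正規方程式.
The normal equation is another way, besides gradient descent, to minimize the cost function J. Take the m*n feature matrix, add a column x0 = 1 to form the design matrix X, and combine it with the vector of actual values y to solve directly for the theta that minimizes J(theta): theta = (X^T X)^-1 X^T y.
Comparison with gradient descent
- Gradient descent:
  - needs a learning rate alpha to be chosen
  - needs many iterations
  - works well even when the number of features is large
- Normal equation:
  - no need to choose alpha
  - no iterations
  - slow when the number of features is large, because the n*n matrix X^T X has to be inverted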
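With a single feature, X^T X is only 2*2, so the normal equation can be written out by hand. The sketch below assumes that one-feature case and uses the closed-form inverse of a 2*2 matrix; the function name and the toy data are hypothetical, and a real implementation would call a linear algebra routine instead.

```python
def normal_equation_1d(xs, ys):
    """Solve theta = (X^T X)^-1 X^T y for h(x) = theta0 + theta1 * x."""
    m = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # X^T X = [[m, sx], [sx, sxx]] and X^T y = [sy, sxy].
    det = m * sxx - sx * sx
    if det == 0:
        raise ValueError("X^T X is singular (non-invertible)")
    theta0 = (sxx * sy - sx * sxy) / det
    theta1 = (m * sxy - sx * sy) / det
    return theta0, theta1


# Toy data generated from y = 1 + 2x (assumed for illustration).
theta0, theta1 = normal_equation_1d([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(theta0, theta1)
```

No learning rate and no iterations are involved, which matches the comparison above; the cost is the matrix inversion, which grows quickly with the number of features.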

Normal equation noninvertibility

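When is the matrix non-invertible?
- Redundant features, i.e. features that are linearly dependent (for example the same area in square meters and in square inches). When solving with linear algebra, two perfectly correlated variables leave the matrix without an inverse.
  - Suggested fix: keep only one of the features.
- Fewer rows of data than features.
  - Suggested fix: delete some features, or use regularization.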

Octave tutorial

Basic operations

% 1. BASIC MATH
% the percent sign starts a comment
5+6      % ans = 11
3-2      % ans = 1
5*8      % ans = 40
1/2      % ans = 0.50000
2^6      % ans = 64
1 == 2   % ans = 0 (0 is false, 1 is true)
1 ~= 2   % ans = 1
1 && 0   % && means and, ans = 0
1 || 0   % || means or, ans = 1

% 2. VARIABLES
a = 3    % prints a = 3
a = 3;   % a trailing semicolon suppresses the printed output

a = pi;  % 3.1416
disp(sprintf('2 decimals: %0.2f', a))  % C-style formatted print: 2 decimals: 3.14

format long   % display more decimal places
format short  % back to the short display

% 3. VECTORS AND MATRICES
A = [1 2 3]
% A =
%    1   2   3
A = [1 2; 3 4; 5 6]  % a semicolon (;) starts a new row
% A =
%    1   2
%    3   4
%    5   6
v = 1:2:10  % start:step:end
% v = 1  3  5  7  9

% ones(x, y) / zeros(x, y): an x-by-y matrix of ones / zeros
ones(2, 2)
% ans =
%
%    1   1
%    1   1
% rand(x, y): an x-by-y matrix drawn from the uniform distribution on (0, 1)
% randn(x, y): an x-by-y matrix drawn from the standard normal distribution
% eye(x): the x*x identity matrix

Computing on data

% Largest value in each column / row of a matrix

C = [1 2 3;4 5 6;7 8 9];
max(C, [], 1)  % maximum of each column, ans = 7 8 9
max(C, [], 2)  % maximum of each row, ans =
%     3
%     6
%     9

% Sums
sum(C, 1)  % sum down each column, ans = 12   15   18

sum(C, 2)  % sum across each row, ans =
%     6
%     15
%     24

Vectorization

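There are two ways to compute the function below:
- Left: the unvectorized version, which uses a loop.
- Right: the vectorized version, which transposes the vector/matrix and takes the inner product directly.
Octave syntax

a = [1;2;3;4];
aa = [1 1; 1 2; 1 3; 1 4];
b = [4;5;6;7];

a * b    % error: operator *: nonconformant arguments (op1 is 4x1, op2 is 4x1)
aa * b   % error: operator *: nonconformant arguments (op1 is 4x2, op2 is 4x1)
a' * b   % inner product, ans = 60
a .* b   % element-wise product: the dot before the operator applies it to each pair of elements, so the result is a vector
% ans =
%
%     4
%    10
%    18
%    28

Vector inner product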

Programming Assignment: linear regression

[Week3]

Classification and representation

Classification

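What is a classification problem? It can be a binary problem (yes/no) or a problem with more than two classes. This section starts with simple binary examples:

Is an email spam or not?
Is a tumor benign or malignant?
Did a Titanic passenger survive or die?

In problems like these the quantity to predict (y) is 0 or 1: 0 stands for not spam / benign tumor / survived, and 1 stands for spam / malignant tumor / died, and so on.

How do we predict whether y is 0 or 1? One possible approach is to fit a linear regression, read its output as a probability, and predict y = 1 if the output is >= 0.5 and y = 0 otherwise, as in the figure below.

(figure)

The problem with this approach is that a single outlier easily throws it off. In the figure below a new observation appears at the far right. The refitted regression line (blue) becomes flatter, so the tumor size at which its output crosses 0.5 increases, and some tumors that should be predicted malignant are now predicted benign.

(figure)

We therefore need a different algorithm for this kind of problem, such as logistic regression.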

Hypothesis representation

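Logistic regression is built on the sigmoid function g (also called the logistic function), so that the hypothesis satisfies 0 <= h(x) <= 1. Mathematically, the linear regression term theta^T*x is passed through the sigmoid function: g is defined as g(z) = 1/(1+e^-z), and substituting z = theta^T*x gives h(x) = g(theta^T*x).

(figure)

How should the output of logistic regression be interpreted? It is the estimated probability that y = 1 for the input x, written as a conditional probability: h(x) = P(y=1 | x;theta). In the tumor example, it is the probability that a tumor of size x is malignant.

(figure)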
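The hypothesis above can be sketched in a few lines of Python. The function names are made up, and x is assumed to include the constant x0 = 1 as its first element.

```python
import math


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^-z), always between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def hypothesis(theta, x):
    """h(x) = g(theta^T x): the estimated probability that y = 1."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)


print(sigmoid(0))                         # exactly 0.5 at z = 0
print(hypothesis([0.0, 0.0], [1.0, 5.0])) # all-zero theta gives 0.5 for any x
```

For very negative z, `math.exp(-z)` can overflow; a production implementation would guard against that, which this sketch does not.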

Decision boundary

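This section asks: given an input x, how large must the predicted h(x) be before we decide y = 1 rather than y = 0? A common rule is to predict y = 1 if h(x) >= 0.5 and y = 0 if h(x) < 0.5. Because g(z) >= 0.5 exactly when z >= 0 (see the plot of g in the upper right of the figure below), this is the same as predicting y = 1 when theta^T*x >= 0.

(figure)

What is a decision boundary? Take h(x) = g(5 - x1 + 0*x2) and plot x1 against x2, as in the figure below. When x1 > 5 we have 5 - x1 < 0, so h(x) < 0.5: this is the y = 0 region. When x1 < 5 we have 5 - x1 > 0, so h(x) > 0.5: this is the y = 1 region. The straight line x1 = 5 is the decision boundary for the two variables x1 and x2.

(figure)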
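A small sketch of the 0.5 threshold rule, using theta = [5, -1, 0] from the example above. The function names are hypothetical.

```python
import math

THETA = [5.0, -1.0, 0.0]  # h(x) = g(5 - x1 + 0*x2), from the example


def h(x1, x2):
    """Estimated probability that y = 1 for the point (x1, x2)."""
    z = THETA[0] + THETA[1] * x1 + THETA[2] * x2
    return 1.0 / (1.0 + math.exp(-z))


def predict(x1, x2):
    """Predict 1 when h(x) >= 0.5, which happens exactly when x1 <= 5."""
    return 1 if h(x1, x2) >= 0.5 else 0


print(predict(3, 100))  # left of the line x1 = 5
print(predict(7, -2))   # right of the line x1 = 5
```

Because theta2 is 0, x2 has no effect on the prediction, which is why the boundary is a vertical line in the (x1, x2) plane.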

Logistic regression model

Cost function

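In logistic regression the cost of a single example is written Cost(h(x), y). Because this is a binary classification problem, it is defined separately for the two classes: Cost(h(x), y) = -log(h(x)) if y = 1, and Cost(h(x), y) = -log(1 - h(x)) if y = 0.

(figure)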

Simplified cost function and gradient descent

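The cost function from the previous section can be rewritten on a single line: Cost(h(x), y) = -y*log(h(x)) - (1-y)*log(1-h(x)). J(theta) is the average of this cost over the m training examples.

(figure)

The gradient descent update has the same form as for linear regression in the previous chapter; the only change is that h(x) is now the logistic regression hypothesis.

(figure)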
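A minimal sketch of the logistic cost J(theta) and one gradient descent step. The function names, the learning rate and the tiny data set are assumptions for illustration, and each row of X starts with x0 = 1.

```python
import math


def predict_proba(theta, x):
    """h(x) = 1 / (1 + e^(-theta^T x))."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))


def cost(theta, X, y):
    """J(theta) = -(1/m) * sum(y*log(h) + (1-y)*log(1-h))."""
    m = len(X)
    total = 0.0
    for xi, yi in zip(X, y):
        h = predict_proba(theta, xi)  # assumes 0 < h < 1 so the logs are finite
        total += -yi * math.log(h) - (1 - yi) * math.log(1 - h)
    return total / m


def gradient_step(theta, X, y, alpha):
    """theta_j := theta_j - alpha * (1/m) * sum((h(x) - y) * x_j), for all j at once."""
    m = len(X)
    errors = [predict_proba(theta, xi) - yi for xi, yi in zip(X, y)]
    return [t - alpha * sum(e * xi[j] for e, xi in zip(errors, X)) / m
            for j, t in enumerate(theta)]


X = [[1, 0], [1, 1], [1, 2], [1, 3]]   # assumed toy inputs
y = [0, 0, 1, 1]
theta = [0.0, 0.0]
before = cost(theta, X, y)             # log(2) when theta is all zeros
theta = gradient_step(theta, X, y, alpha=0.1)
after = cost(theta, X, y)
print(before, after)
```

The update line is identical in form to the linear regression one; only `predict_proba` differs.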

Advanced optimization

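Covers several algorithms other than gradient descent that can minimize the cost function (such as conjugate gradient, BFGS and L-BFGS), together with the Octave code for using them.

(figure)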

Multiclass classification

Multiclass classification: one vs all

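The earlier lectures covered binary classification. What do we do when the data have more than two classes (right side of the figure below)?

(figure)

A workable approach is one-vs-all: for each of the three classes, train a separate logistic regression classifier that separates that class from all the others. To classify a new x, evaluate every classifier and pick the class with the largest h(x).

(figure)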
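The prediction step of one-vs-all can be sketched as below. The three parameter vectors are made-up numbers rather than trained values, and the class labels and function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def one_vs_all_predict(classifiers, x):
    """Run every per-class classifier on x and return the label with the largest h(x)."""
    scores = {
        label: sigmoid(sum(t * xi for t, xi in zip(theta, x)))
        for label, theta in classifiers.items()
    }
    return max(scores, key=scores.get)


# One theta per class (assumed, not trained); x = [1, x1, x2].
classifiers = {
    "A": [0.0, 1.0, 0.0],    # responds to large x1
    "B": [0.0, 0.0, 1.0],    # responds to large x2
    "C": [0.0, -1.0, -1.0],  # responds to small x1 and x2
}

print(one_vs_all_predict(classifiers, [1.0, 3.0, 1.0]))
print(one_vs_all_predict(classifiers, [1.0, -2.0, -1.0]))
```

In a real setting each theta would come from training a binary logistic regression in which the examples of one class are relabeled 1 and all others 0.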

Solving the problem of overfitting

The problem of overfitting

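Overfitting means that, in order to make the regression function fit the data perfectly, too many features are introduced. Predictions on the training data become very accurate (the cost function approaches 0), but predictions on data outside the training set are very inaccurate (the model does not generalize). The figure below shows a linear regression example.

(figure)

Here is another example, for logistic regression.
(figure)

Ways to avoid overfitting

Reduce the number of features
- Use domain knowledge to decide which features to drop
- Use a model selection algorithm to decide
Regularization
- Keep all the features, but shrink the values of the parameters theta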

Cost function

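The idea is to add a heavy penalty on the parameters of the superfluous features to the cost function, so that minimizing it drives those parameters close to zero; this effectively avoids overfitting. In the figure below, the penalty terms for theta3 and theta4 are given a very large coefficient (1000).

(figure)

Building on this idea, a regularization term (lambda * the sum of theta_j^2) is added to the cost function so that the parameters are regularized. Note that if lambda is too large, minimizing the cost function pushes theta toward zero and causes underfitting: the model then fits even the training data poorly.

(figure)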
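A minimal sketch of the regularized cost for linear regression, J(theta) = 1/(2m) * [sum((h(x) - y)^2) + lambda * sum(theta_j^2)], assuming the course convention that theta0 is not penalized. The function name and the data are made up.

```python
def regularized_cost(theta, X, y, lam):
    """Squared-error cost plus the penalty lam * sum(theta_j^2) for j >= 1."""
    m = len(X)
    squared_errors = sum(
        (sum(t * xi for t, xi in zip(theta, row)) - yi) ** 2
        for row, yi in zip(X, y)
    )
    penalty = lam * sum(t ** 2 for t in theta[1:])  # theta[0] is left out
    return (squared_errors + penalty) / (2 * m)


X = [[1, 1], [1, 2]]   # assumed toy data on the line y = x
y = [1, 2]
theta = [0.0, 1.0]     # fits the data exactly

print(regularized_cost(theta, X, y, lam=0))  # no penalty
print(regularized_cost(theta, X, y, lam=4))  # same fit, but the penalty adds cost
```

With a large lam the penalty dominates, so the minimizer prefers small theta values even at the price of a worse fit, which is the underfitting effect described above.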

Regularized linear regression

Gradient descent

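Gradient descent itself was covered earlier. The new element here is the factor 1-(alpha*lambda)/m that multiplies theta_j in each update. Since (alpha*lambda)/m > 0, 1-(alpha*lambda)/m < 1, so every iteration shrinks theta_j a little before the usual gradient step is applied.

(figure)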
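The shrinking effect can be seen in a single update step. The sketch below assumes regularized linear regression with theta0 left unregularized; the function name and the numbers are illustrative.

```python
def regularized_step(theta, X, y, alpha, lam):
    """theta_j := theta_j * (1 - alpha*lam/m) - alpha * (1/m) * sum((h(x) - y) * x_j)."""
    m = len(X)
    errors = [sum(t * xi for t, xi in zip(theta, row)) - yi for row, yi in zip(X, y)]
    new_theta = []
    for j, t in enumerate(theta):
        grad = sum(e * row[j] for e, row in zip(errors, X)) / m
        shrink = 1.0 if j == 0 else 1.0 - alpha * lam / m  # theta0 is not shrunk
        new_theta.append(t * shrink - alpha * grad)
    return new_theta


X = [[1, 1], [1, 2]]   # assumed toy data on the line y = x
y = [1, 2]
theta = [0.0, 1.0]     # fits exactly, so the unregularized gradient is zero

new_theta = regularized_step(theta, X, y, alpha=0.1, lam=2)
print(new_theta)       # theta1 shrinks by the factor 1 - 0.1*2/2 = 0.9
```

Here the fit is already perfect, so the only change comes from the shrink factor, which isolates the effect of the regularization term.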

Normal equation

(figure)

Regularized logistic regression

(figure)

[Week4] Non-linear hypotheses

Motivations

Non-linear hypotheses

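Suppose we want to decide from the intensity values of pixel1 and pixel2 in a photo whether the object is a car. The classes are separated by a non-linear boundary and the number of features is very large. Handling this with logistic regression would need so many features that it leads to overfitting, so the next section introduces neural networks as a way to tackle the problem.
(figure)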

Neural networks

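Mostly case studies showing that sensory cortex can learn to process a different sense when it receives different input (for example, when the auditory cortex is cut off from the ear and rewired to receive visual input, it learns to see).

Reference: Roe et al., 1992

(figure)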

Neural networks

Model representation I & II

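A single logistic neuron model; the parameters theta are also called weights.
(figure)

A more complex model that combines several neurons into one predictor is called a neural network, as in the figure below.
(figure)

Two terms are defined here: the activation of unit i in layer j, and the matrix of weights that controls the mapping from layer j to layer j+1.
(figure)

A neural network is much like the logistic regression of the previous chapter. The difference is that the features in Layer 2 are combinations of the Layer 1 inputs that the network learns by itself, which is why the mapping is written with a matrix (the matrix of weights). The computation from input layer -> hidden layer -> output layer is called forward propagation.
(figure)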

Applications

Examples and intuitions I & II

Example 1

Introduces XOR: it is true when exactly one of the two inputs is true (0 vs 1) and false when the inputs are equal. In R:

p = c(0, 0, 1, 1)
q = c(0, 1, 0, 1)
xor(p, q)

# [1] FALSE  TRUE  TRUE FALSE

Next come the AND function and the OR function.

(figure)

Example 2
This example shows how to build the function x1 XNOR x2.
(figure)

Steps: combine AND, (NOT x1) AND (NOT x2), and OR

hidden unit a1, inputs: +1(bias), x1, x2  [computes x1 AND x2]
 > weight: -30, 20, 20
hidden unit a2, inputs: +1(bias), x1, x2  [computes (NOT x1) AND (NOT x2)]
 > weight: 10, -20, -20
output unit, inputs: +1(bias), a1, a2  [computes a1 OR a2]
 > weight: -10, 20, 20
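The XNOR network can be checked with a short forward propagation sketch in Python, using the weights listed above: two hidden units (AND, and NOT-AND-NOT) feeding an OR output unit. The function names are made up, and the final rounding to 0/1 is added here only to make the truth table easy to read.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuron(weights, inputs):
    """One logistic unit: weights[0] is the bias weight, the rest match the inputs."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return sigmoid(z)


def xnor(x1, x2):
    a1 = neuron([-30, 20, 20], [x1, x2])    # x1 AND x2
    a2 = neuron([10, -20, -20], [x1, x2])   # (NOT x1) AND (NOT x2)
    out = neuron([-10, 20, 20], [a1, a2])   # a1 OR a2
    return round(out)                       # close to 0 or 1, rounded for display


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xnor(x1, x2))
```

The output is 1 when the two inputs are equal and 0 otherwise, which is the negation of the XOR result shown in the R snippet.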

Multiclass classification

[Week5] Neural Networks: Learning

Cost function and backpropagation

Cost function

(figure)

L = total number of layers in the network
$s_l$ = number of units (not counting bias unit) in layer l
K = number of output units/classes
$y^{(i)}_k$: the k-th element of the output (label) vector of the i-th training example

Backpropagation algorithm

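Backpropagation is the method used to minimize the cost function of a neural network.

The computation starts from the $\delta$ (delta) of the output layer and works back toward the input layer.

(figure)

(This part is quite involved.)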

Backpropagation intuition

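(figure)

This figure illustrates forward propagation. Given a training example $(x^{(i)},y^{(i)})$, the first step computes the weighted sums of the inputs, $z^{(2)}_1$ and $z^{(2)}_2$, and applies the sigmoid function to them to get $a^{(2)}_1$ and $a^{(2)}_2$. The same computation is then repeated in the next layer, and so on until the final output layer. At the third layer, $z^{(3)}_1$ is the weighted combination of the three values drawn in three colors.

(figure)
Backward propagation carries out the same kind of computation in the opposite direction to forward propagation. As an intuition, the cost function can be thought of as (predicted value - actual value)^2.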

Backpropagation in practice

Implementation note: unrolling parameters

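(figure)

In a neural network, $\Theta$ (parameters) and $D$ (gradients) have to be unrolled from matrices into a vector before an optimization routine can be applied, as in the figure below.

(figure)

The commands below unroll the matrices into a vector.

(figure)

To turn the vector back into matrices, use the reshape command.

(figure)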

Gradient checking

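Implementations of backward propagation often contain subtle bugs; gradient checking is a way to detect them.

The method is shown in the figure below: evaluate the cost at $\theta$ plus and minus a small $\varepsilon$ and approximate the derivative by the two-sided difference $(J(\theta+\varepsilon) - J(\theta-\varepsilon)) / (2\varepsilon)$.
(figure)

The Octave code is shown below. The final step is to check that gradApprox is approximately equal to DVec (the values from backward propagation).
(figure)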
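A minimal Python sketch of the same check, using a simple two-parameter cost whose gradient is known in closed form instead of a real network. The names `numerical_gradient`, `example_cost` and `example_gradient` and the value of epsilon are assumptions; `grad_approx` plays the role of gradApprox and `analytic` the role of DVec.

```python
def numerical_gradient(cost, theta, epsilon=1e-4):
    """Two-sided difference (J(theta + eps) - J(theta - eps)) / (2 * eps), one parameter at a time."""
    grad = []
    for i in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[i] += epsilon
        minus[i] -= epsilon
        grad.append((cost(plus) - cost(minus)) / (2 * epsilon))
    return grad


def example_cost(theta):
    """Stand-in cost: J = theta0^2 + 3 * theta0 * theta1."""
    return theta[0] ** 2 + 3 * theta[0] * theta[1]


def example_gradient(theta):
    """Analytic gradient of example_cost."""
    return [2 * theta[0] + 3 * theta[1], 3 * theta[0]]


theta = [1.0, 2.0]
grad_approx = numerical_gradient(example_cost, theta)
analytic = example_gradient(theta)
print(grad_approx, analytic)  # the two should agree to several decimal places
```

The numerical gradient needs two cost evaluations per parameter, so it is only used to verify the implementation and is switched off for actual training.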

Random initialization

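(I did not understand the point of this section at first.)
In short: in a neural network, initializing all the parameters theta to 0 does not work, because every hidden unit then computes the same value and receives the same update during backpropagation, so the units never become different from one another. The network therefore has to be started with random $\theta$ values.
(figure)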
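A minimal sketch of random initialization: every weight is drawn uniformly from a small interval around zero. The function name, the interval half-width 0.12 and the matrix shape are assumptions for illustration.

```python
import random


def random_init(rows, cols, init_epsilon=0.12, seed=None):
    """Return a rows x cols weight matrix with entries uniform in [-init_epsilon, init_epsilon]."""
    rng = random.Random(seed)
    return [[rng.uniform(-init_epsilon, init_epsilon) for _ in range(cols)]
            for _ in range(rows)]


# Weights for a layer with 3 units fed by 3 inputs plus a bias (assumed sizes).
Theta1 = random_init(3, 4, seed=1)
for row in Theta1:
    print(row)
```

Because the rows differ, the hidden units start from different weights and receive different updates, which breaks the symmetry described above.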

putting it together

A summary of what was covered about neural networks

pick a network architecture
- number of input units
- number of output units
- number of hidden layers (a common default is 1; more hidden units are usually better, at a higher computational cost)

(figure)

start training
- randomly initialize weights
- forward propagation
- cost function
- backward propagation
- gradient checking
- gradient descent to minimize $J(\theta)$

Application of neural networks

Autonomous driving

Uses autonomous driving as an example application of a neural network.

[Week6] Advice for Applying Machine Learning

Evaluating a learning algorithm

Deciding what to try next

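Using the house price prediction problem from earlier, this section discusses what to do when a model performs poorly. Common options are:

Get more training data
Use fewer features, to avoid overfitting
Use additional features
Add polynomial features
Increase or decrease the regularization parameter

Which option fits which situation? Is it just gut feeling, or are there rules to follow? One approach is a machine learning diagnostic: a test that you can run to gain insight into what is/isn't working with a learning algorithm, and gain guidance as to how best to improve its performance.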

evaluating a hypothesis

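How to train and evaluate a model properly
- Split the data into a Train set and a Test set, usually in a 7:3 ratio, and assign the examples to the two sets at random.
Evaluating a linear regression model

The test error is the average of $(predicted\ y - actual\ y)^2$ over the test set.
(figure)

Evaluating a logistic regression model

The test error is the 0/1 misclassification error: the fraction of test examples that are predicted 0 but are actually 1, or predicted 1 but are actually 0.
(figure)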

model selection and train/validation/test sets

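To evaluate models accurately, all the data are split into train, validation and test sets in a 6:2:2 ratio. The extra validation set is used to compare the candidate models fitted on the training set and to pick the one that generalizes best; the selected model is then evaluated on the test set. A more advanced approach is k-fold cross validation; see my other post on that topic.

(figure)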
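A minimal sketch of a random 6:2:2 split in Python. The function name and the fixed seed are assumptions; any reproducible shuffle would do.

```python
import random


def split_dataset(examples, seed=0):
    """Shuffle the examples and split them 60% / 20% / 20% into train, validation and test."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 6 // 10
    n_val = n * 2 // 10
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]   # whatever remains, about 20%
    return train, validation, test


data = list(range(10))                  # stand-in for 10 training examples
train, validation, test = split_dataset(data)
print(len(train), len(validation), len(test))
```

Model parameters are fitted on `train`, the choice between candidate models is made on `validation`, and `test` is used only once for the final error estimate.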

Bias vs variance

Diagnosing bias vs variance

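How are bias and variance diagnosed? Start from the training error and the cross validation (cv) error and plot both against the degree of polynomial (d), as in the figure below. The training error (purple) keeps decreasing as d increases. The cv error (red) is high at d=1 because of underfitting, is lower at d=2, and rises again as d grows further because of overfitting: the model fits the training data ever more closely but fits data outside the training set poorly.

(figure)

The figure below shows how to tell bias from variance:

bias (underfit): train error and cv error are close to each other, and both are high
variance (overfit): train error is low but cv error is high

(figure)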

regularization and bias/variance

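The previous section covered the simpler case without a regularization term; now we add it to the discussion. When $\lambda$ is very large the model suffers from high bias (a large $\lambda$ pushes $\theta$ toward 0), while a very small $\lambda$ leads to overfitting.

(figure)

Accordingly, we can plot $J(\theta)$ for the cv error and the train error against $\lambda$, as in the figure below. The plot is useful for deciding how to set the regularization term: pick the $\lambda$ with the lowest cv error.

(figure)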

Learning curves

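High bias
Plotting the cv error and the train error against the training set size gives the learning curves. In the figure below, linear regression is used to fit the data shown on the right. As the number of training examples grows, the cv error keeps decreasing (a model trained on more data generalizes a little better) while the train error keeps increasing (a straight line cannot pass through many points at once), and the two converge at a high error level. So when a model has high bias, adding training examples does not improve its performance.

(figure)

High variance
In the high variance (overfitting) case, the train error also increases with the amount of data and the cv error also decreases with the amount of data, but a large gap remains between the cv error and the train error. This gap narrows as data are added, so when a model has high variance, adding training examples can improve its performance.
(figure)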

Deciding what to do next revisited

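This section first summarizes how the remedies from the previous sections improve a model.

(figure)

How does this apply to choosing the complexity of a neural network? The usual default is a single hidden layer: fit that model quickly, then keep experimenting. Check every experiment (for example, going from 1 hidden layer -> 2 hidden layers) against the cv error to see how the cv error changes.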

Building a spam classifier

Prioritizing what to work on

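What should we pay attention to when designing an ML system? The lecture uses a spam classifier as the example. For preparing the training data there are a few possible directions:

Collect examples of spam
Identify features of spam (such as keywords, misspellings, header and routing information, and so on)
Develop algorithms that detect these features (such as misspellings)

Once these are done, some error analysis (next section) can further help in building the ML system.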

Error analysis

Start with a simple algorithm, implement it quickly, and test it early on your cross validation data.
Plot learning curves to decide if more data, more features, etc. are likely to help.
Manually examine the errors on examples in the cross validation set and try to spot a trend where most of the errors were made.

(figure)

Handling skewed data

Error metrics for skewed classes

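Suppose we have an algorithm that classifies tumors as benign or malignant and reaches 99% accuracy (1% error). If we change the algorithm to always predict benign and the accuracy becomes 99.5% (0.5% error), we cannot claim the algorithm has improved: only 0.5% of the cases are malignant, so the data are heavily skewed, and other metrics are needed to judge how good the algorithm is. The metrics covered in class follow.

Precision and Recall

Precision and recall are commonly used to evaluate a classifier, as on the slide below. Precision is the fraction of predicted positives that are truly positive, TP / (TP + FP); recall is the fraction of actual positives that are detected, TP / (TP + FN).

(figure)

Trading off precision and recall

There is a trade-off between the precision and recall introduced in the previous section: raising the decision threshold increases precision and lowers recall, and lowering it does the opposite.

(figure)

How do we choose between different precision (P) and recall (R) values? Some options:
(figure)

Average: (P + R)/2
F1 Score: 2 * (P * R) / (P + R)

The F1 score is the most widely used. It works best when it is computed on the validation set and the threshold with the highest score is chosen.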
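A minimal Python sketch of the three metrics, assuming y = 1 marks the rare class (malignant). The function name and the small label lists are made up; returning 0.0 when a denominator is zero is a convention chosen here, not something from the course.

```python
def precision_recall_f1(actual, predicted):
    """Compute precision, recall and F1 for binary labels where 1 is the positive class."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


actual = [1, 1, 1, 0, 0, 0, 0, 0]
model = [1, 1, 0, 1, 0, 0, 0, 0]           # a classifier that makes some mistakes
always_benign = [0] * len(actual)          # always predicts the majority class

print(precision_recall_f1(actual, model))
print(precision_recall_f1(actual, always_benign))
```

The always-benign classifier gets a recall and an F1 score of 0 even though its accuracy is high on skewed data, which is exactly what accuracy alone fails to show.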

[Coursera] Machine learning – Andrew Ng@Stanford University
