Logistic Regression

Posted on 2019-04-17, updated on 2021-07-09, filed under machine learning


Logistic regression is a classification algorithm, most commonly used for binary classification.

Basic derivative formulas

Derivative of the logarithm function

$y = \log_{a} x \Rightarrow y' = \frac{1}{x \ln a} \Rightarrow (\ln x)' = \frac{1}{x}$

Derivative of the power function

$y = \frac{1}{x^{n}} \Rightarrow y' = -\frac{n}{x^{n + 1}}$

Derivative of the exponential function

$y = n^{x} \Rightarrow y' = n^{x} \ln n \Rightarrow (e^{x})' = e^{x}$
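The three formulas above can be sanity-checked numerically. The following is a minimal sketch that compares each closed form with a central finite difference; the helper `numeric_derivative` and the sample values of `a`, `n` and `x` are illustrative assumptions, not part of the original post.

```python
import math


def numeric_derivative(f, x, h=1e-6):
    """Central finite-difference approximation of f'(x) (hypothetical helper)."""
    return (f(x + h) - f(x - h)) / (2 * h)


# Illustrative sample values (assumptions chosen for this sketch)
a, n, x = 2.0, 3.0, 1.5

# (log_a x)' = 1 / (x ln a)
log_numeric = numeric_derivative(lambda t: math.log(t, a), x)
log_formula = 1 / (x * math.log(a))

# (1 / x^n)' = -n / x^(n + 1)
pow_numeric = numeric_derivative(lambda t: 1 / t ** n, x)
pow_formula = -n / x ** (n + 1)

# (n^x)' = n^x ln n
exp_numeric = numeric_derivative(lambda t: n ** t, x)
exp_formula = n ** x * math.log(n)

for name, numeric, formula in [
    ("log", log_numeric, log_formula),
    ("power", pow_numeric, pow_formula),
    ("exponential", exp_numeric, exp_formula),
]:
    print(name, numeric, formula)
```

Each pair of printed values should agree to several decimal places.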

The sigmoid function

The sigmoid function, also known as the S-shaped function or the logistic function, is defined as:

$g (z) = \frac{1}{1 + e^{- z}}$

$e^{- z}$ is often written as $\exp (- z)$. The derivative of $g (z)$ is:

$\frac{\partial}{\partial z} g (z) = \frac{- 1}{(1 + e^{- z})^{2}} \cdot (e^{- z})' = \frac{- e^{- z}}{(1 + e^{- z})^{2}} \cdot (- z)' = \frac{e^{- z}}{(1 + e^{- z})^{2}}$

$= \frac{1}{1 + e^{- z}} \cdot \frac{e^{- z}}{1 + e^{- z}} = \frac{1}{1 + e^{- z}} \cdot (1 - \frac{1}{1 + e^{- z}}) = g (z) \cdot (1 - g (z))$

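Its key properties are:

As the input grows above 0, the output approaches 1
As the input falls below 0, the output approaches 0
When the input equals 0, the output is 0.5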

import matplotlib.pyplot as plt
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-1 * x))


def draw_sigmoid():
    x = np.linspace(-10, 10)
    y = sigmoid(np.array(x))

    plt.plot(x, y)
    plt.show()


if __name__ == '__main__':
    draw_sigmoid()
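The identity $g'(z) = g(z) \cdot (1 - g(z))$ derived earlier can also be checked numerically. This is a small sketch under the assumption that a central finite difference is accurate enough for the comparison; `sigmoid_grad` is a name introduced here for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def sigmoid_grad(z):
    """Closed-form derivative g(z) * (1 - g(z)) (illustrative helper)."""
    g = sigmoid(z)
    return g * (1 - g)


z = np.linspace(-5, 5, 11)
h = 1e-6
# Central finite difference as an independent estimate of g'(z)
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
max_error = np.max(np.abs(numeric - sigmoid_grad(z)))
print("max abs difference:", max_error)

# Limiting behaviour: close to 0 for very negative inputs, 0.5 at 0, close to 1 for large inputs
print(sigmoid(np.array([-10.0, 0.0, 10.0])))
```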

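Negative log-likelihood cost function

The negative log-likelihood cost function is computed as follows:

$cost (h (x; θ), y) = \begin{cases} - \ln (h (x; θ)), & y = 1 \\ - \ln (1 - h (x; θ)), & y = 0 \end{cases}$

There are two cases:

When the label is $y = 1$: the cost is 0 if the prediction $h (x; θ)$ equals 1, and it grows as $h (x; θ)$ decreases
When the label is $y = 0$: the cost is 0 if the prediction $h (x; θ)$ equals 0, and it grows as $h (x; θ)$ increases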

import matplotlib.pyplot as plt
import numpy as np

def cost(x):
    y1 = -1 * np.log(x)
    y2 = -1 * np.log(1 - x)
    return y1, y2


def draw_cost():
    x = np.linspace(1e-3, 1 - 1e-3)  # stay strictly inside (0, 1) to avoid log(0)
    y1, y2 = cost(x)

    plt.plot(x, y1)
    plt.plot(x, y2)
    plt.show()


if __name__ == '__main__':
    draw_cost()

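Cross-entropy loss function

In binary classification the label $y$ is either 0 or 1. Merging the two cases of the negative log-likelihood cost function into a single expression gives the cross-entropy loss function:

$loss (h (x; θ), y) = - y \ln (h (x; θ)) - (1 - y) \ln (1 - h (x; θ))$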
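To see that the merged expression reproduces both cases of the piecewise cost, the two can be compared directly. The sketch below uses a few made-up prediction values; the function names `piecewise_cost` and `cross_entropy` are introduced here for illustration only.

```python
import numpy as np


def piecewise_cost(h, y):
    """Negative log-likelihood cost, written as two separate cases."""
    return -np.log(h) if y == 1 else -np.log(1 - h)


def cross_entropy(h, y):
    """Single-expression form: the inactive term is multiplied by zero."""
    return -y * np.log(h) - (1 - y) * np.log(1 - h)


# Made-up predictions strictly inside (0, 1)
for h in (0.1, 0.5, 0.9):
    for y in (0, 1):
        print(h, y, piecewise_cost(h, y), cross_entropy(h, y))
```

For every pair the two columns should match, because exactly one of $y$ and $1 - y$ is nonzero.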

Logistic regression

Logistic regression is linear regression followed by a sigmoid. The linear part is:

$z_{θ} (x) = θ^{T} \cdot x = θ_{0} \cdot x_{0} + θ_{1} \cdot x_{1} + \cdots + θ_{n} \cdot x_{n}$

The logistic regression model is:

$h (x; θ) = g (z_{θ} (x)) = g (θ^{T} \cdot x) = \frac{1}{1 + e^{- θ^{T} \cdot x}}$

Differentiating the model with respect to $θ_{i}$ gives:

$\frac{\partial}{\partial θ_{i}} h (x; θ) = \frac{\partial}{\partial θ_{i}} g (θ^{T} \cdot x) = g (θ^{T} \cdot x) \cdot (1 - g (θ^{T} \cdot x)) \cdot \frac{\partial}{\partial θ_{i}} (θ^{T} \cdot x)$

$= g (θ^{T} \cdot x) \cdot (1 - g (θ^{T} \cdot x)) \cdot x_{i} = h (x; θ) \cdot (1 - h (x; θ)) \cdot x_{i}$

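Logistic regression uses the sigmoid function for binary classification. It first applies the linear operation $θ^{T} \cdot x$ to the input, then feeds the result into the sigmoid, which squashes it into the range $(0, 1)$. The output is interpreted as the probability that the label is 1, that is $h (x; θ) = P (y = 1 | x; θ)$, and the probability that the label is 0 is correspondingly $P (y = 0 | x; θ) = 1 - h (x; θ)$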
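The following sketch turns this description into code: a linear score, a sigmoid, and a threshold at 0.5. The weights and inputs are made-up values, and `predict_proba` and `predict` are hypothetical helper names, not functions from the original post.

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def predict_proba(theta, X):
    """P(y = 1 | x; theta) for each row of X (the first column is the bias input 1)."""
    return sigmoid(X.dot(theta))


def predict(theta, X, threshold=0.5):
    """Class 1 when the probability exceeds the threshold, otherwise class 0."""
    return (predict_proba(theta, X) > threshold).astype(int)


# Made-up weights and samples, for illustration only
theta = np.array([[0.0], [1.0], [-1.0]])
X = np.array([
    [1.0, 2.0, 0.0],   # linear score  2
    [1.0, 0.0, 2.0],   # linear score -2
    [1.0, 1.0, 1.0],   # linear score  0
])

p1 = predict_proba(theta, X)   # P(y = 1 | x)
p0 = 1 - p1                    # P(y = 0 | x)
print(p1.ravel(), p0.ravel(), predict(theta, X).ravel())
```

A score of exactly 0 gives a probability of 0.5, which the strict `>` comparison assigns to class 0. This matches the `predictions > 0.5` rule used in the implementation later in the post.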

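Logistic regression uses the cross-entropy loss as its binary classification loss, averaged over the $N$ training samples:

$J (θ) = - \frac{1}{N} \sum_{j = 1}^{N} \left[ y_{j} \ln (h (x_{j}; θ)) + (1 - y_{j}) \ln (1 - h (x_{j}; θ)) \right]$

In matrix form:

$\Rightarrow J (θ) = - \frac{1}{N} \left( Y^{T} \cdot \ln (g (X \cdot θ)) + (1 - Y^{T}) \cdot \ln (1 - g (X \cdot θ)) \right)$

where $X$ has size $m \times (n + 1)$, $θ$ has size $(n + 1) \times 1$, $Y$ has size $m \times 1$, $m$ is the number of samples (the $N$ in the formula), and $n$ is the number of input features, so $θ$ holds $n + 1$ weights including the bias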
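The summation form and the matrix form should give the same number. Here is a minimal sketch that checks this on random data; the data, the seed and the function names `loss_sum` and `loss_matrix` are assumptions made for the illustration.

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def loss_sum(theta, X, Y):
    """J(theta) as an explicit sum over samples."""
    N = X.shape[0]
    total = 0.0
    for j in range(N):
        h = sigmoid(X[j].dot(theta))[0]
        y = Y[j, 0]
        total += y * np.log(h) + (1 - y) * np.log(1 - h)
    return -total / N


def loss_matrix(theta, X, Y):
    """J(theta) in matrix form."""
    N = X.shape[0]
    h = sigmoid(X.dot(theta))
    res = -(Y.T.dot(np.log(h)) + (1 - Y).T.dot(np.log(1 - h))) / N
    return res[0, 0]


rng = np.random.default_rng(0)
X = np.insert(rng.normal(size=(5, 3)), 0, 1.0, axis=1)  # m=5, n=3, plus bias column
Y = rng.integers(0, 2, size=(5, 1))                      # random 0/1 labels
theta = rng.normal(size=(4, 1))

print(loss_sum(theta, X, Y), loss_matrix(theta, X, Y))
```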

Test data

The experiments use the numeric version of the German credit data, which contains 24 variables and a two-class label - german.data-numeric

Gradient descent

The gradient is computed over the whole batch of training data. Differentiating the loss function gives:

$\frac{\partial}{\partial θ_{i}} J (θ) = - \frac{1}{N} \sum_{j = 1}^{N} \frac{\partial}{\partial θ_{i}} \left[ y_{j} \ln (h (x_{j}; θ)) + (1 - y_{j}) \ln (1 - h (x_{j}; θ)) \right]$

$= - \frac{1}{N} \sum_{j = 1}^{N} \left[ y_{j} \frac{1}{h (x_{j}; θ)} \cdot \frac{\partial h (x_{j}; θ)}{\partial θ_{i}} + (1 - y_{j}) \frac{1}{1 - h (x_{j}; θ)} \cdot \frac{\partial (1 - h (x_{j}; θ))}{\partial θ_{i}} \right]$

$= - \frac{1}{N} \sum_{j = 1}^{N} \left[ y_{j} \frac{1}{h (x_{j}; θ)} \cdot h (x_{j}; θ) \cdot (1 - h (x_{j}; θ)) \cdot x_{j, i} - (1 - y_{j}) \frac{1}{1 - h (x_{j}; θ)} \cdot h (x_{j}; θ) \cdot (1 - h (x_{j}; θ)) \cdot x_{j, i} \right]$

$= - \frac{1}{N} \sum_{j = 1}^{N} \left[ y_{j} \cdot (1 - h (x_{j}; θ)) \cdot x_{j, i} - (1 - y_{j}) \cdot h (x_{j}; θ) \cdot x_{j, i} \right]$

$= - \frac{1}{N} \sum_{j = 1}^{N} \left[ y_{j} \cdot x_{j, i} - h (x_{j}; θ) \cdot x_{j, i} \right] = \frac{1}{N} \sum_{j = 1}^{N} \left[ (h (x_{j}; θ) - y_{j}) \cdot x_{j, i} \right]$

where $x_{j, i}$ is the entry in row $j$ and column $i$ of $X$. In matrix form:

$\Rightarrow \frac{\partial}{\partial θ} J (θ) = \frac{1}{N} X^{T} \cdot (g (X \cdot θ) - Y)$

where $X$ has size $m \times (n + 1)$, $θ$ has size $(n + 1) \times 1$, $Y$ has size $m \times 1$, $m$ is the number of samples (the $N$ in the formula), and $n$ is the number of input features, so $θ$ holds $n + 1$ weights including the bias
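A common way to gain confidence in a derived gradient is a gradient check: compare the analytic expression with finite differences of the loss. The sketch below does this for the matrix-form gradient on random data; the data, the seed and the helper `numeric_gradient` are assumptions made for the illustration.

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def loss(theta, X, Y):
    """Cross-entropy loss J(theta) in matrix form."""
    N = X.shape[0]
    h = sigmoid(X.dot(theta))
    return (-(Y.T.dot(np.log(h)) + (1 - Y).T.dot(np.log(1 - h))) / N)[0, 0]


def gradient(theta, X, Y):
    """Analytic gradient: X^T (g(X theta) - Y) / N."""
    N = X.shape[0]
    return X.T.dot(sigmoid(X.dot(theta)) - Y) / N


def numeric_gradient(theta, X, Y, eps=1e-6):
    """Central finite differences, one coordinate of theta at a time."""
    grad = np.zeros_like(theta)
    for i in range(theta.shape[0]):
        step = np.zeros_like(theta)
        step[i, 0] = eps
        grad[i, 0] = (loss(theta + step, X, Y) - loss(theta - step, X, Y)) / (2 * eps)
    return grad


rng = np.random.default_rng(1)
X = np.insert(rng.normal(size=(8, 3)), 0, 1.0, axis=1)
Y = rng.integers(0, 2, size=(8, 1))
theta = rng.normal(size=(4, 1))

analytic = gradient(theta, X, Y)
numeric = numeric_gradient(theta, X, Y)
print(np.max(np.abs(analytic - numeric)))
```

A very small printed difference indicates that the derivation and the matrix form agree.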

Implementation

# -*- coding: utf-8 -*-

# @Time    : 19-4-18 9:22 AM
# @Author  : zj


import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

data_path = '../data/german.data-numeric'


def load_data(tsize=0.8, shuffle=True):
    data_list = pd.read_csv(data_path, header=None, sep=r'\s+')

    data_array = data_list.values
    height, width = data_array.shape[:2]
    data_x = data_array[:, :(width - 1)]
    data_y = data_array[:, (width - 1)]

    x_train, x_test, y_train, y_test = train_test_split(data_x, data_y, train_size=tsize, test_size=(1 - tsize),
                                                        shuffle=shuffle)

    y_train = np.atleast_2d(np.array(list(map(lambda x: 1 if x == 2 else 0, y_train)))).T
    y_test = np.atleast_2d(np.array(list(map(lambda x: 1 if x == 2 else 0, y_test)))).T

    return x_train, y_train, x_test, y_test


def init_weights(inputs):
    """
    Initialize the weights with samples from the uniform distribution on [0, 1)
    """
    return np.atleast_2d(np.random.uniform(size=inputs)).T


def sigmoid(x):
    return 1 / (1 + np.exp(-1 * x))


def logistic_regression(w, x):
    """
    w has size (n+1)x1
    x has size mx(n+1)
    """
    z = x.dot(w)
    return sigmoid(z)


def compute_loss(w, x, y):
    """
    w has size (n+1)x1
    x has size mx(n+1)
    y has size mx1
    """
    lr_value = logistic_regression(w, x)
    n = y.shape[0]
    res = -1.0 / n * (y.T.dot(np.log(lr_value)) + (1 - y.T).dot(np.log(1 - lr_value)))
    return res[0][0]


def compute_gradient(w, x, y):
    """
    Compute the gradient
    """
    n = y.shape[0]
    lr_value = logistic_regression(w, x)
    return 1.0 / n * x.T.dot(lr_value - y)


def compute_predict_accuracy(predictions, y):
    results = predictions > 0.5
    res = len(list(filter(lambda x: x[0] == x[1], np.dstack((results, y))[:, 0]))) / len(results)
    return res


def draw(res_list, title=None, xlabel=None, ylabel=None):
    if title is not None:
        plt.title(title)
    if xlabel is not None:
        plt.xlabel(xlabel)
    if ylabel is not None:
        plt.ylabel(ylabel)
    plt.plot(res_list)
    plt.show()


if __name__ == '__main__':
    # Load the training and test data
    # train_data, train_label, test_data, test_label = load_german_numeric(tsize=0.85, shuffle=False)
    train_data, train_label, test_data, test_label = load_data()

    # Compute the mean and standard deviation from the training data
    mu = np.mean(train_data, axis=0)
    std = np.std(train_data, axis=0)

    # Standardize the training and test data
    train_data = (train_data - mu) / std
    test_data = (test_data - mu) / std

    # Prepend the bias column of ones
    train_data = np.insert(train_data, 0, np.ones(train_data.shape[0]), axis=1)
    test_data = np.insert(test_data, 0, np.ones(test_data.shape[0]), axis=1)

    # Define the step size and the weights (the bias is part of w)
    lr = 0.01
    w = init_weights(train_data.shape[1])

    # Compute the objective/loss function and apply the gradient updates
    epoches = 20000

    loss_list = []
    accuracy_list = []
    loss = 0
    best_accuracy = 0
    best_w = None
    for i in range(epoches):
        loss += compute_loss(w, train_data, train_label)
        # Compute the gradient
        gradient = compute_gradient(w, train_data, train_label)
        # Update the weights
        tempW = w - lr * gradient
        w = tempW

        if i % 50 == 49:
            # Record once every 50 iterations
            # Compute the accuracy
            accuracy = compute_predict_accuracy(logistic_regression(w, train_data), train_label)
            accuracy_list.append(accuracy)

            if accuracy > best_accuracy:
                best_accuracy = accuracy
                best_w = w.copy()
            # Record the mean loss
            loss_list.append(loss / 50)
            loss = 0

    draw(loss_list, title='Loss (mean per 50 iterations)')
    draw(accuracy_list, title='Training accuracy (every 50 iterations)')
    print(max(accuracy_list))
    print(compute_predict_accuracy(logistic_regression(best_w, test_data), test_label))

The stochastic gradient descent implementation is as follows:

...
import warnings

warnings.filterwarnings('ignore')
...
...
def compute_loss(w, x, y, isBatch=True):
    """
    w has size (n+1)x1
    x has size mx(n+1)
    y has size mx1
    """
    lr_value = logistic_regression(w, x)
    if isBatch:
        n = y.shape[0]
        res = -1.0 / n * (y.T.dot(np.log(lr_value)) + (1 - y.T).dot(np.log(1 - lr_value)))
        return res[0][0]
    else:
        res = -1.0 * (y * (np.log(lr_value)) + (1 - y) * (np.log(1 - lr_value)))
        return res[0]


def compute_gradient(w, x, y, isBatch=True):
    """
    Compute the gradient
    """
    lr_value = logistic_regression(w, x)
    if isBatch:
        n = y.shape[0]
        return 1.0 / n * x.T.dot(lr_value - y)
    else:
        return np.atleast_2d(1.0 * x.T * (lr_value - y)).T
...
...
if __name__ == '__main__':
    ...
    ...
    # Compute the objective/loss function and apply the gradient updates
    epoches = 20
    num = train_label.shape[0]

    loss_list = []
    accuracy_list = []
    loss = 0
    best_accuracy = 0
    best_w = None
    for i in range(epoches):
        for j in range(num):
            loss += compute_loss(w, train_data[j], train_label[j], isBatch=False)
            # Compute the gradient
            gradient = compute_gradient(w, train_data[j], train_label[j], isBatch=False)
            # Update the weights
            tempW = w - lr * gradient
            w = tempW

            if j % 50 == 49:
                # Record once every 50 samples
                # Compute the accuracy
                accuracy = compute_predict_accuracy(logistic_regression(w, train_data), train_label)
                accuracy_list.append(accuracy)

                if accuracy > best_accuracy:
                    best_accuracy = accuracy
                    best_w = w.copy()
                # Record the mean loss
                loss_list.append(loss / 50)
                loss = 0

    draw(loss_list, title='Loss (mean per 50 samples)')
    draw(accuracy_list, title='Training accuracy (every 50 samples)')
    print(max(accuracy_list))
    print(compute_predict_accuracy(logistic_regression(best_w, test_data), test_label))

The mini-batch gradient descent implementation is as follows:

# Compute the objective/loss function and apply the gradient updates
epoches = 200
batch_size = 128
num = train_label.shape[0]

loss_list = []
accuracy_list = []
loss = 0
best_accuracy = 0
best_w = None
for i in range(epoches):
    # Step by batch_size so that consecutive batches do not overlap
    for j in range(0, num, batch_size):
        loss_list.append(compute_loss(w, train_data[j:j + batch_size], train_label[j:j + batch_size], isBatch=True))
        # Compute the gradient
        gradient = compute_gradient(w, train_data[j:j + batch_size], train_label[j:j + batch_size], isBatch=True)
        # Update the weights
        tempW = w - lr * gradient
        w = tempW

        # Record once per mini-batch
        # Compute the accuracy
        accuracy = compute_predict_accuracy(logistic_regression(w, train_data), train_label)
        accuracy_list.append(accuracy)

        if accuracy > best_accuracy:
            best_accuracy = accuracy
            best_w = w.copy()

draw(loss_list, title='Loss')
draw(accuracy_list, title='Training accuracy')
print(max(accuracy_list))
print(compute_predict_accuracy(logistic_regression(best_w, test_data), test_label))
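The mini-batch loop above walks through the samples in a fixed order. A common variation, which is not part of the original code, is to reshuffle the sample order at the start of every epoch. The sketch below shows one way to generate such batches; `iterate_minibatches` is a hypothetical helper, and the sample count of 800 is only an illustrative value.

```python
import numpy as np


def iterate_minibatches(num_samples, batch_size, rng):
    """Yield index arrays that cover every sample exactly once, in random order."""
    order = rng.permutation(num_samples)
    for start in range(0, num_samples, batch_size):
        # The last batch may be smaller than batch_size
        yield order[start:start + batch_size]


rng = np.random.default_rng(0)
batches = list(iterate_minibatches(num_samples=800, batch_size=128, rng=rng))
print([len(b) for b in batches])
```

Inside a training loop, each index array `idx` would select `train_data[idx]` and `train_label[idx]` for one gradient step.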

RuntimeWarning: divide by zero encountered in log

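A numerical precision problem can occur while computing the loss. When a score saturates to exactly 0 or 1, the logarithm receives 0:

data_loss = -1.0 / num_train * np.sum(y_batch * np.log(scores) + (1 - y_batch) * np.log(1 - scores))

The fix is to clamp the argument of the logarithm from below:

epsilon = 1e-5

scores = self.logistic_regression(X_batch)
data_loss = -1.0 / num_train * np.sum(y_batch * np.log(np.maximum(scores, epsilon)) + (1 - y_batch) * np.log(np.maximum(1 - scores, epsilon)))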
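Clamping is the fix used in this post. An alternative, sketched here as an assumption rather than the author's method, is to compute the loss from the linear scores $z = X \cdot θ$ directly, using the identities $- \ln (g (z)) = \ln (1 + e^{- z})$ and $- \ln (1 - g (z)) = \ln (1 + e^{z})$, which `np.logaddexp` evaluates without overflow. The function names below are illustrative.

```python
import numpy as np


def naive_loss(z, y):
    """Cross-entropy computed from sigmoid outputs; can hit log(0)."""
    h = 1 / (1 + np.exp(-z))
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))


def stable_loss(z, y):
    """Cross-entropy computed from the scores z with log(1 + e^x) = logaddexp(0, x)."""
    return np.mean(y * np.logaddexp(0, -z) + (1 - y) * np.logaddexp(0, z))


# Moderate scores: both versions agree
z = np.array([-2.0, 0.5, 3.0])
y = np.array([0.0, 1.0, 1.0])
print(naive_loss(z, y), stable_loss(z, y))

# Saturated score with the opposite label: the sigmoid rounds to exactly 1.0
z_big = np.array([40.0])
y_big = np.array([0.0])
with np.errstate(divide='ignore'):
    naive_big = naive_loss(z_big, y_big)
stable_big = stable_loss(z_big, y_big)
print(naive_big, stable_big)
```

In the saturated case the naive version evaluates `log(0)` while the score-based version stays finite.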

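Summary

Binary classification with logistic regression requires attention to two points:

Label conversion (map the two class labels to the numeric values 0/1)
Prediction (a single predicted value is enough to decide the class)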
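Both points can be illustrated in a few lines. This is a minimal sketch with made-up labels and probabilities; it assumes the 1/2 label coding that `load_data` converts (2 becomes 1, everything else becomes 0) and the 0.5 threshold used in `compute_predict_accuracy`.

```python
import numpy as np

# Made-up raw labels using the 1/2 coding
raw_labels = np.array([1, 2, 2, 1, 1])
# Label conversion: 2 -> 1, anything else -> 0, as a column vector
y = np.where(raw_labels == 2, 1, 0).reshape(-1, 1)

# Made-up model outputs P(y = 1 | x); one value per sample decides the class
probabilities = np.array([[0.2], [0.7], [0.4], [0.1], [0.9]])
predictions = (probabilities > 0.5).astype(int)

# Fraction of samples whose predicted class matches the label
accuracy = np.mean(predictions == y)
print(y.ravel(), predictions.ravel(), accuracy)
```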
