Basics of Linear Algebra for Machine Learning — Study Notes

Name: Hu Chengcheng  Student ID: 2101210578  Topic: Study report (notes) on Basics of Linear Algebra for Machine Learning


This report covers Basics of Linear Algebra for Machine Learning by Jason Brownlee, which explains the mathematics underlying machine learning and the applications of linear algebra within it. The book has 19 chapters (excluding the appendices), each fairly short, organized into five parts: Foundations, NumPy, Matrices, Factorization, and Statistics.

  • Part 1: Foundations. Discover what linear algebra is and how it relates to the field of machine learning.
  • Part 2: NumPy. Discover NumPy tutorials that show how to create, index, slice, and reshape NumPy arrays, the main data structure used in machine learning and the basis for the linear algebra examples in this book.
  • Part 3: Matrices. Discover the key structures for holding and manipulating data in linear algebra: vectors, matrices, and tensors.
  • Part 4: Factorization. Discover a suite of methods for decomposing a matrix into its constituent elements, making numerical operations more efficient and more stable.
  • Part 5: Statistics. Discover statistics in the notation of linear algebra, along with its application in principal component analysis and linear regression.

This report works through one of the more foundational books on the recommended reading list in depth, recording notes on the key points, with the aim of building a solid foundation for machine learning and deep learning.

Foundations

1. Introduction to Linear Algebra

Linear algebra is a field of mathematics that is universally agreed to be a prerequisite for a deeper understanding of machine learning. Although linear algebra is a large field with many esoteric theories and findings, the core tools and notation taken from the field are practical for machine learning practitioners. With a solid foundation in linear algebra, it becomes possible to focus on just the parts that are relevant to machine learning.

2. Linear Algebra and Machine Learning

In this chapter the author examines the reasons people give for not studying linear algebra, alongside the reasons why you should.

3. Examples of Linear Algebra in Machine Learning

This chapter surveys concrete applications of linear algebra within machine learning, including:

  • Dataset and Data Files
  • Images and Photographs
  • One Hot Encoding
  • Linear Regression
  • Regularization
  • Principal Component Analysis
  • Singular-Value Decomposition
  • Latent Semantic Analysis
  • Recommender Systems
  • Deep Learning

NumPy

4. Introduction to NumPy Arrays

Arrays are the main data structure used in machine learning. In Python, arrays from the NumPy library, called N-dimensional arrays or ndarrays, are used as the primary data structure for representing data.

NumPy N-dimensional Array

  • Basic NumPy usage:
# create array from a list
from numpy import array
l = [1.0, 2.0, 3.0]
a = array(l)
# display array
print(a)
# display array shape
print(a.shape)
# display array data type
print(a.dtype)
[ 1. 2. 3.]
(3,)
float64

Functions to Create Arrays

  • The empty() function creates a new array of the specified shape. The argument to the function is an array or tuple that specifies the length of each dimension of the array to create; the contents of the created array are not initialized.
# create empty array
from numpy import empty
a = empty([3,3])
print(a)
  • zeros() creates an all-zeros array of the given shape
# create zero array
from numpy import zeros
a = zeros([3,5])
print(a)
  • ones() creates an all-ones array of the given shape
# create one array
from numpy import ones
a = ones([5])
print(a)

Combining Arrays

  • Vertical Stack: stack arrays vertically (row-wise)
# create array with vstack
from numpy import array
from numpy import vstack
# create first array
a1 = array([1,2,3])
print(a1)
# create second array
a2 = array([4,5,6])
print(a2)
# vertical stack
a3 = vstack((a1, a2))
print(a3)
print(a3.shape)
[1 2 3]
[4 5 6]
[[1 2 3]
[4 5 6]]
(2, 3)
  • Horizontal Stack: stack arrays horizontally (column-wise)
# create array with hstack
from numpy import array
from numpy import hstack
# create first array
a1 = array([1,2,3])
print(a1)
# create second array
a2 = array([4,5,6])
print(a2)
# create horizontal stack
a3 = hstack((a1, a2))
print(a3)
print(a3.shape)
[1 2 3]
[4 5 6]
[1 2 3 4 5 6]
(6,)

5. Index, Slice and Reshape NumPy Arrays

From List to Arrays

  • array(): converts a Python list into a NumPy array
# create two-dimensional array
from numpy import array
# list of data
data = [[11, 22],
[33, 44],
[55, 66]]
# array of data
data = array(data)
print(data)
print(type(data))

Array Indexing

  • Indexing into an array:
# index two-dimensional array
from numpy import array
# define array
data = array([
[11, 22],
[33, 44],
[55, 66]])
# index data
print(data[0,0])

Note: a two-dimensional array can be indexed in two equivalent ways: data[0,0] and data[0][0].
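
As a quick sketch, both styles (plus negative indexing, an additional NumPy feature included here for illustration) can be verified on the same data array:

# two equivalent indexing styles, plus negative indexing
from numpy import array
data = array([
[11, 22],
[33, 44],
[55, 66]])
print(data[0, 0])    # 11, single bracket with a tuple of indices
print(data[0][0])    # 11, chained indexing returns the row, then the element
print(data[-1, -1])  # 66, negative indices count back from the end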

Array Slicing

  • data[from:to]: array slicing; the example below splits rows into train and test partitions
# split train and test data
from numpy import array
# define array
data = array([
[11, 22, 33],
[44, 55, 66],
[77, 88, 99]])
# separate data
split = 2
train,test = data[:split,:],data[split:,:]
print(train)
print(test)

Array Reshaping

  • The shape attribute
# shape of a two-dimensional array
from numpy import array
# list of data
data = [[11, 22],
[33, 44],
[55, 66]]
# array of data
data = array(data)
print(data.shape)
  • Reshape
# reshape 2D array to 3D
from numpy import array
# list of data
data = [[11, 22],
[33, 44],
[55, 66]]
# array of data
data = array(data)
print(data.shape)
# reshape
data = data.reshape((data.shape[0], data.shape[1], 1))
print(data.shape)

6. NumPy Array Broadcasting

Arrays with different sizes cannot, in general, be added, subtracted, or used in arithmetic. A way to overcome this is to duplicate the smaller array so that it has the same dimensionality and size as the larger array. This is called array broadcasting; it is available in NumPy when performing array arithmetic and can greatly reduce and simplify your code.

Scalar and One-Dimensional Array

# broadcast scalar to one-dimensional array
from numpy import array
# define array
a = array([1, 2, 3])
print(a)
# define scalar
b = 2
print(b)
# broadcast
c = a + b
print(c)
[1 2 3]
2
[3 4 5]

One-Dimensional and Two-Dimensional Arrays

# broadcast one-dimensional array to two-dimensional array
from numpy import array
# define two-dimensional array
A = array([
[1, 2, 3],
[1, 2, 3]])
print(A)
# define one-dimensional array
b = array([1, 2, 3])
print(b)
# broadcast
C = A + b
print(C)
[[1 2 3]
[1 2 3]]
[1 2 3]
[[2 4 6]
[2 4 6]]

Matrices

7. Vectors and Vector Arithmetic

Vectors are a foundational element of linear algebra. Vectors are used throughout the field of machine learning to describe algorithms and processes, such as the target variable (y) when training an algorithm.

Defining a Vector

# create a vector
from numpy import array
# define vector
v = array([1, 2, 3])
print(v)

Vector Arithmetic

The two vectors involved must have the same dimensions or satisfy the broadcasting rules.

  • Addition
  • Subtraction
  • Multiplication: note this is element-wise multiplication of corresponding entries, not to be confused with the dot product
# vector multiplication
from numpy import array
# define first vector
a = array([1, 2, 3])
print(a)
# define second vector
b = array([1, 2, 3])
print(b)
# multiply vectors
c = a * b
print(c)
[1 2 3]
[1 2 3]
[1 4 9]
  • Division (a sketch of addition, subtraction, and division follows below)
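
A minimal sketch of the remaining element-wise operations, following the same pattern as the multiplication example above:

# element-wise vector addition, subtraction and division
from numpy import array
a = array([1, 2, 3])
b = array([1, 2, 3])
print(a + b)  # [2 4 6]
print(a - b)  # [0 0 0]
print(a / b)  # [1. 1. 1.]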

Vector Dot Product

  • The vector dot product, also known as the inner product
# vector dot product
from numpy import array
# define first vector
a = array([1, 2, 3])
print(a)
# define second vector
b = array([1, 2, 3])
print(b)
# multiply vectors
c = a.dot(b)
print(c)
[1 2 3]
[1 2 3]
14

Vector-Scalar Multiplication

# vector-scalar multiplication
from numpy import array
# define vector
a = array([1, 2, 3])
print(a)
# define scalar
s = 0.5
print(s)
# multiplication
c = s * a
print(c)

8. Vector Norms

Calculating the length or magnitude of vectors is often required, either directly as a regularization method in machine learning or as part of broader vector and matrix operations. Three norms are covered (their definitions are given after this list):

  • L1 Norm
  • L2 Norm
  • Max Norm
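
For reference, the three norms can be written as follows (standard definitions, added here for completeness):

$$
\|v\|_{1}=\sum_{i=1}^{n}\left|v_{i}\right|, \qquad \|v\|_{2}=\sqrt{\sum_{i=1}^{n} v_{i}^{2}}, \qquad \|v\|_{\infty}=\max_{i}\left|v_{i}\right|
$$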

Vector L1 Norm

# vector L1 norm
from numpy import array
from numpy.linalg import norm
# define vector
a = array([1, 2, 3])
print(a)
# calculate norm
l1 = norm(a, 1)
print(l1)
[1 2 3]
6.0

Vector L2 Norm

# vector L2 norm
from numpy import array
from numpy.linalg import norm
# define vector
a = array([1, 2, 3])
print(a)
# calculate norm
l2 = norm(a)
print(l2)
[1 2 3]
3.74165738677

Vector Max Norm

# vector max norm
from math import inf
from numpy import array
from numpy.linalg import norm
# define vector
a = array([1, 2, 3])
print(a)
# calculate norm
maxnorm = norm(a, inf)
print(maxnorm)
[1 2 3]
3.0
  • This chapter mainly shows how to compute common vector norms with NumPy

9. Matrices and Matrix Arithmetic

Matrices are a foundational element of linear algebra. Matrices are used throughout the field of machine learning to describe algorithms and processes, such as the input data variable (X) when training an algorithm.

Defining a Matrix

# create matrix
from numpy import array
A = array([[1, 2, 3], [4, 5, 6]])
print(A)

Matrix Arithmetic

Matrix arithmetic works the same way as vector arithmetic, extended from one dimension to two or more; a small sketch follows.
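
Below is a minimal sketch of these element-wise operations, including the Hadamard product and scalar multiplication; the matrices are my own illustration:

# element-wise matrix arithmetic
from numpy import array
A = array([
[1, 2, 3],
[4, 5, 6]])
B = array([
[1, 2, 3],
[4, 5, 6]])
print(A + B)    # element-wise addition
print(A - B)    # element-wise subtraction
print(A * B)    # Hadamard product (element-wise), not matrix multiplication
print(0.5 * A)  # matrix-scalar multiplication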

Matrix-Matrix Multiplication

  • Matrix multiplication (the matrix dot product)
# matrix dot product
from numpy import array
# define first matrix
A = array([
[1, 2],
[3, 4],
[5, 6]])
print(A)
# define second matrix
B = array([
[1, 2],
[3, 4]])
print(B)
# multiply matrices
C = A.dot(B)
print(C)
# multiply matrices with @ operator
D = A @ B
print(D)

Note: the @ operator (Python 3.5+) also performs matrix multiplication.

Matrix-Scalar Multiplication

Works the same as vector-scalar multiplication (see the sketch under Matrix Arithmetic above).

10. Types of Matrices

  • Square Matrix
  • Symmetric Matrix
  • Triangular Matrix
  • Diagonal Matrix
  • Identity Matrix
  • Orthogonal Matrix

Square Matrix

  • Square matrix: a matrix where the number of rows (n) equals the number of columns (m).

Symmetric Matrix

  • Symmetric matrix: a matrix that is equal to its own transpose (a quick check follows below)
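
A minimal sketch of a symmetry check in NumPy (the example matrix is my own, not from the book):

# check that a matrix equals its own transpose
from numpy import array
from numpy import array_equal
M = array([
[1, 2, 3],
[2, 1, 2],
[3, 2, 1]])
print(array_equal(M, M.T))  # True: M is symmetric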

Triangular Matrix

  • Triangular matrices: lower (tril) and upper (triu)
# triangular matrices
from numpy import array
from numpy import tril
from numpy import triu
# define square matrix
M = array([
[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
print(M)
# lower triangular matrix
lower = tril(M)
print(lower)
# upper triangular matrix
upper = triu(M)
print(upper)
[[1 2 3]
[1 2 3]
[1 2 3]]

[[1 0 0]
[1 2 0]
[1 2 3]]

[[1 2 3]
[0 2 3]
[0 0 3]]

Diagonal Matrix

  • Diagonal matrix
# diagonal matrix
from numpy import array
from numpy import diag
# define square matrix
M = array([
[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
print(M)
# extract diagonal vector
d = diag(M)
print(d)
# create diagonal matrix from vector
D = diag(d)
print(D)
[[1 2 3]
[1 2 3]
[1 2 3]]

[1 2 3]

[[1 0 0]
[0 2 0]
[0 0 3]]

Identity Matrix

  • Identity matrix
# identity matrix
from numpy import identity
I = identity(3)
print(I)
[[ 1. 0. 0.]
[ 0. 1. 0.]
[ 0. 0. 1.]]

Orthogonal Matrix

  • Orthogonal matrix: two vectors are orthogonal when their dot product equals zero; an orthogonal matrix is a square matrix whose transpose equals its inverse
# orthogonal matrix
from numpy import array
from numpy.linalg import inv
# define orthogonal matrix
Q = array([
[1, 0],
[0, -1]])
print(Q)
# inverse equivalence
V = inv(Q)
print(Q.T)
print(V)
# identity equivalence
I = Q.dot(Q.T)
print(I)
[[ 1 0]
[ 0 -1]]

[[ 1 0]
[ 0 -1]]

[[ 1. 0.]
[-0. -1.]]

[[1 0]
[0 1]]

11. Matrix Operations

  • Transpose
  • Inverse
  • Trace
  • Determinant
  • Rank

Transpose

# transpose matrix
from numpy import array
# define matrix
A = array([
[1, 2],
[3, 4],
[5, 6]])
print(A)
# calculate transpose
C = A.T
print(C)

Inverse

# invert matrix
from numpy import array
from numpy.linalg import inv
# define matrix
A = array([
[1.0, 2.0],
[3.0, 4.0]])
print(A)
# invert matrix
B = inv(A)
print(B)
# multiply A and B
I = A.dot(B)
print(I)

Trace

# matrix trace
from numpy import array
from numpy import trace
# define matrix
A = array([
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
print(A)
# calculate trace
B = trace(A)
print(B)
[[1 2 3]
[4 5 6]
[7 8 9]]

15

Determinant

# matrix determinant
from numpy import array
from numpy.linalg import det
# define matrix
A = array([
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
print(A)
# calculate determinant
B = det(A)
print(B)
[[1 2 3]
[4 5 6]
[7 8 9]]

-9.51619735393e-16

Note: the determinant is effectively zero (up to floating-point error) because the rows of this matrix are linearly dependent, i.e. the matrix is singular.

Rank

# vector rank
from numpy import array
from numpy.linalg import matrix_rank
# rank
v1 = array([1,2,3])
print(v1)
vr1 = matrix_rank(v1)
print(vr1)
# zero rank
v2 = array([0,0,0,0,0])
print(v2)
vr2 = matrix_rank(v2)
print(vr2)
[1 2 3]
1

[0 0 0 0 0]
0

12. Sparse Matrices

A matrix comprised mostly of zero values is called a sparse matrix, as distinct from a dense matrix, where most values are nonzero. Large sparse matrices are common in general, and especially in applied machine learning: in data that contains counts, in data encodings that map categories to counts, and even in whole subfields of machine learning such as natural language processing. It is computationally expensive to represent and work with sparse matrices as if they were dense; performance can be greatly improved by using representations and operations that specifically handle matrix sparsity.

Sparse Matrix

The sparsity of a matrix can be quantified with a score: the number of zero values in the matrix divided by the total number of elements in the matrix.

In other words, a sparse matrix is one in which nonzero elements make up only a small fraction of the entries.

Problems with Sparsity

  • Space Complexity: very large matrices require a lot of memory, and some of the very large matrices we wish to work with are sparse.

  • Time Complexity: assuming a very large sparse matrix fits into memory, we will want to perform operations on it. Simply put, if the matrix mostly contains zero values, i.e. no data, performing operations on it may take a long time, and most of the computation will involve adding or multiplying zeros.

Sparse Matrices in Machine Learning

  • Data:

    • Whether or not a user has watched a movie in a movie catalog.
    • Whether or not a user has purchased a product in a product catalog.
    • The count of listens of a song in a song catalog.
  • Data Preparation

    • One hot encoding, used to represent categorical data as sparse binary vectors.
    • Count encoding, used to represent the frequency of words in a document's vocabulary.
    • TF-IDF encoding, used to represent normalized word frequency scores in a vocabulary.
  • Areas of Study

    • Natural language processing for working with documents of text.
    • Recommender systems for working with product usage within a catalog.
    • Computer vision when working with images that contain many black pixels.

Working with Sparse Matrices

  • Ways of storing sparse matrices (a small construction sketch follows this list):

    • Dictionary of Keys. A dictionary maps row and column indices to values.

    • List of Lists. Each row of the matrix is stored as a list, with each sublist containing the column index and the value.

    • Coordinate List. A list of tuples is stored, with each tuple containing the row index, column index, and value.

    • Compressed Sparse Row. The sparse matrix is represented using three one-dimensional arrays for the nonzero values, the extents of the rows, and the column indexes.

    • Compressed Sparse Column. The same as the Compressed Sparse Row method, except the column indexes are compressed and read first, before the row indexes.
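
As a small illustration of two of these formats, the sketch below builds a matrix in coordinate (COO) form with scipy.sparse and converts it to CSR; the data is my own example:

# build a sparse matrix in COO format, then convert it to CSR
from scipy.sparse import coo_matrix
# row indices, column indices and values of the nonzero entries
rows = [0, 0, 1, 1, 2]
cols = [0, 3, 2, 5, 3]
vals = [1, 1, 2, 1, 2]
S = coo_matrix((vals, (rows, cols)), shape=(3, 6))
print(S.tocsr())  # compressed sparse row representation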

Sparse Matrices in Python

# sparse matrix
from numpy import array
from scipy.sparse import csr_matrix
# create dense matrix
A = array([
[1, 0, 0, 1, 0, 0],
[0, 0, 2, 0, 0, 1],
[0, 0, 0, 2, 0, 0]])
print(A)
# convert to sparse matrix (CSR method)
S = csr_matrix(A)
print(S)
# reconstruct dense matrix
B = S.todense()
print(B)
[[1 0 0 1 0 0]
[0 0 2 0 0 1]
[0 0 0 2 0 0]]

(0, 0) 1
(0, 3) 1
(1, 2) 2
(1, 5) 1
(2, 3) 2

[[1 0 0 1 0 0]
[0 0 2 0 0 1]
[0 0 0 2 0 0]]

$$\text{sparsity} = 1.0 - \text{count\_nonzero}(A) \,/\, A.\text{size}$$

# sparsity calculation
from numpy import array
from numpy import count_nonzero
# create dense matrix
A = array([
[1, 0, 0, 1, 0, 0],
[0, 0, 2, 0, 0, 1],
[0, 0, 0, 2, 0, 0]])
print(A)
# calculate sparsity
sparsity = 1.0 - count_nonzero(A) / A.size
print(sparsity)
[[1 0 0 1 0 0]
[0 0 2 0 0 1]
[0 0 0 2 0 0]]

0.7222222222222222

13. Tensors and Tensor Arithmetic

In deep learning there is a great deal of discussion around tensors as the cornerstone data structure. Tensor even appears in the name of Google's flagship machine learning library: TensorFlow. Tensors are a type of data structure used in linear algebra, and like vectors and matrices, you can calculate arithmetic operations with tensors.

What are Tensors

A vector is a one-dimensional or first-order tensor, and a matrix is a two-dimensional or second-order tensor.

Tensor: a multidimensional array that generalizes vectors and matrices to an arbitrary number of dimensions.

# create tensor
from numpy import array
T = array([
[[1,2,3], [4,5,6], [7,8,9]],
[[11,12,13], [14,15,16], [17,18,19]],
[[21,22,23], [24,25,26], [27,28,29]]])
print(T.shape)
print(T)

Tensor Arithmetic

The four arithmetic operations work element-wise, just as for vectors and matrices; a small sketch follows.
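
A minimal sketch of element-wise tensor arithmetic, reusing the 3 × 3 × 3 tensor defined above:

# element-wise tensor addition and Hadamard product
from numpy import array
A = array([
[[1,2,3], [4,5,6], [7,8,9]],
[[11,12,13], [14,15,16], [17,18,19]],
[[21,22,23], [24,25,26], [27,28,29]]])
B = A  # a second tensor of the same shape
print(A + B)  # element-wise addition
print(A * B)  # element-wise (Hadamard) product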

Tensor Product

# tensor product
from numpy import array
from numpy import tensordot
# define first vector
A = array([1,2])
# define second vector
B = array([3,4])
# calculate tensor product
C = tensordot(A, B, axes=0)
print(C)
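
With axes=0, tensordot computes the outer product of the two vectors, so C here is the 2 × 2 matrix [[3, 4], [6, 8]].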

Factorization

14. Matrix Decompositions

Many complex matrix operations cannot be solved efficiently or with stability given the limited precision of computers. Matrix decompositions are methods that reduce a matrix into constituent parts, making it easier to calculate more complex matrix operations. Matrix decomposition methods, also called matrix factorization methods, are a foundation of linear algebra in computers, even for basic operations such as solving systems of linear equations, calculating the inverse, and calculating the determinant of a matrix.

What is a Matrix Decomposition

A matrix decomposition is a way of reducing a matrix into its constituent parts. It is an approach that can simplify more complex matrix operations, which can then be performed on the decomposed matrix rather than on the original matrix itself. A common analogy for matrix decomposition is the factoring of numbers, such as factoring 10 into 2 × 5.

LU Decomposition

The LU decomposition applies to square matrices and decomposes a matrix into L (lower triangular) and U (upper triangular) components.
$$
A = L · U
$$
The LU decomposition is found using an iterative numerical process and can fail for matrices that cannot be decomposed, or that cannot be decomposed easily. A variation of this decomposition that is numerically more stable to solve in practice is called the LUP decomposition, or LU decomposition with partial pivoting.
$$
A = P · L · U
$$
For a detailed introduction to the LUP decomposition, see the blog post 线性代数之PLU分解.

# LU decomposition
from numpy import array
from scipy.linalg import lu
# define a square matrix
A = array([
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
print(A)
# factorize
P, L, U = lu(A)
print(P)
print(L)
print(U)
# reconstruct
B = P.dot(L).dot(U)
print(B)
[[1 2 3]
[4 5 6]
[7 8 9]]

[[ 0. 1. 0.]
[ 0. 0. 1.]
[ 1. 0. 0.]]

[[ 1. 0. 0. ]
[ 0.14285714 1. 0. ]
[ 0.57142857 0.5 1. ]]

[[ 7.00000000e+00 8.00000000e+00 9.00000000e+00]
[ 0.00000000e+00 8.57142857e-01 1.71428571e+00]
[ 0.00000000e+00 0.00000000e+00 -1.58603289e-16]]

[[ 1. 2. 3.]
[ 4. 5. 6.]
[ 7. 8. 9.]]

QR Decomposition

The QR decomposition applies to n × m matrices (not limited to square matrices) and decomposes a matrix into Q and R components.
$$
A = Q · R
$$

# QR decomposition
from numpy import array
from numpy.linalg import qr
# define rectangular matrix
A = array([
[1, 2],
[3, 4],
[5, 6]])
print(A)
# factorize
Q, R = qr(A, 'complete')
print(Q)
print(R)
# reconstruct
B = Q.dot(R)
print(B)
[[1 2]
[3 4]
[5 6]]

[[-0.16903085 0.89708523 0.40824829]
[-0.50709255 0.27602622 -0.81649658]
[-0.84515425 -0.34503278 0.40824829]]

[[-5.91607978 -7.43735744]
[ 0. 0.82807867]
[ 0. 0. ]]

[[ 1. 2.]
[ 3. 4.]
[ 5. 6.]]

Cholesky Decomposition

The Cholesky decomposition applies to square symmetric matrices whose eigenvalues are all greater than zero, so-called positive definite matrices.
$$
A = L · L^T
$$

# Cholesky decomposition
from numpy import array
from numpy.linalg import cholesky
# define symmetrical matrix
A = array([
[2, 1, 1],
[1, 2, 1],
[1, 1, 2]])
print(A)
# factorize
L = cholesky(A)
print(L)
# reconstruct
B = L.dot(L.T)
print(B)
[[2 1 1]
[1 2 1]
[1 1 2]]

[[ 1.41421356 0. 0. ]
[ 0.70710678 1.22474487 0. ]
[ 0.70710678 0.40824829 1.15470054]]

[[ 2. 1. 1.]
[ 1. 2. 1.]
[ 1. 1. 2.]]

15. Eigendecomposition

Matrix decompositions are a useful tool for reducing a matrix to its constituent parts in order to simplify a range of more complex operations. Perhaps the most used type of matrix decomposition is the eigendecomposition, which decomposes a matrix into eigenvectors and eigenvalues. This decomposition also plays a role in methods used in machine learning, such as principal component analysis (PCA).

Eigendecomposition of a Matrix

$$
Av = λv
$$

$$
A = QΛQ^{-1}
$$

where v and λ in the first equation are an eigenvector of A and its eigenvalue, Q is the matrix whose columns are the eigenvectors, and Λ is the diagonal matrix of eigenvalues (the reconstruction code below uses inv(Q) accordingly).

Eigenvectors and Eigenvalues

Eigenvectors are unit vectors: their length or magnitude equals 1.0. They are often referred to as right vectors, i.e. column vectors (as opposed to row vectors or left vectors). Eigenvalues are coefficients applied to eigenvectors that give the vectors their length or magnitude; for example, a negative eigenvalue may reverse the direction of the eigenvector as part of scaling it. A matrix that has only positive eigenvalues is called a positive definite matrix, whereas if the eigenvalues are all negative, it is called a negative definite matrix.

Calculation of Eigendecomposition

  • eig(): computes the eigendecomposition, returning the eigenvalues and a matrix of eigenvectors
# eigendecomposition
from numpy import array
from numpy.linalg import eig
# define matrix
A = array([
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
print(A)
# factorize
values, vectors = eig(A)
print(values)
print(vectors)
[[1 2 3]
[4 5 6]
[7 8 9]]

[ 1.61168440e+01 -1.11684397e+00 -9.75918483e-16]

[[-0.23197069 -0.78583024 0.40824829]
[-0.52532209 -0.08675134 -0.81649658]
[-0.8186735 0.61232756 0.40824829]]

Confirm an Eigenvector and Eigenvalue

# confirm eigenvector
from numpy import array
from numpy.linalg import eig
# define matrix
A = array([
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
# factorize
values, vectors = eig(A)
# confirm first eigenvector
B = A.dot(vectors[:, 0])
print(B)
C = vectors[:, 0] * values[0]
print(C)
[ -3.73863537 -8.46653421 -13.19443305]
[ -3.73863537 -8.46653421 -13.19443305]

Reconstruct Matrix

# reconstruct matrix
from numpy import diag
from numpy.linalg import inv
from numpy import array
from numpy.linalg import eig
# define matrix
A = array([
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
print(A)
# factorize
values, vectors = eig(A)
# create matrix from eigenvectors
Q = vectors
# create inverse of eigenvectors matrix
R = inv(Q)
# create diagonal matrix from eigenvalues
L = diag(values)
# reconstruct the original matrix
B = Q.dot(L).dot(R)
print(B)
[[1 2 3]
[4 5 6]
[7 8 9]]

[[ 1. 2. 3.]
[ 4. 5. 6.]
[ 7. 8. 9.]]

16. Singular Value Decomposition

Matrix factorization, also known as matrix decomposition, describes a given matrix using its constituent elements. Perhaps the best known and most widely used matrix factorization method is the Singular Value Decomposition (SVD). Every matrix has an SVD, which makes it more stable than other methods, such as the eigendecomposition. As such, it is often used in a wide array of applications including compression, denoising, and data reduction.

SVD reference: the blog post 强大的矩阵奇异值分解(SVD).

What is the Singular-Value Decomposition

$$
A = U · Σ · V^T
$$

Calculate Singular-Value Decomposition

# singular-value decomposition
from numpy import array
from scipy.linalg import svd
# define a matrix
A = array([
[1, 2],
[3, 4],
[5, 6]])
print(A)
# factorize
U, s, V = svd(A)
print(U)
print(s)
print(V)
[[1 2]
[3 4]
[5 6]]

[[-0.2298477 0.88346102 0.40824829]
[-0.52474482 0.24078249 -0.81649658]
[-0.81964194 -0.40189603 0.40824829]]

[ 9.52551809 0.51430058]

[[-0.61962948 -0.78489445]
[-0.78489445 0.61962948]]

Reconstruct Matrix

# reconstruct rectangular matrix from svd
from numpy import array
from numpy import diag
from numpy import zeros
from scipy.linalg import svd
# define matrix
A = array([
[1, 2],
[3, 4],
[5, 6]])
print(A)
# factorize
U, s, V = svd(A)
# create m x n Sigma matrix
Sigma = zeros((A.shape[0], A.shape[1]))
# populate Sigma with n x n diagonal matrix
Sigma[:A.shape[1], :A.shape[1]] = diag(s)
# reconstruct matrix
B = U.dot(Sigma.dot(V))
print(B)
[[1 2]
[3 4]
[5 6]]

[[ 1. 2.]
[ 3. 4.]
[ 5. 6.]]

Pseudoinverse

The pseudoinverse is a generalization of the matrix inverse from square matrices to rectangular matrices whose numbers of rows and columns are not equal. It is also called the Moore-Penrose inverse, after the two independent discoverers of the method, or the generalized inverse.
$$
A^+ = V · D^+ · U^T
$$

# pseudoinverse
from numpy import array
from numpy.linalg import pinv
# define matrix
A = array([
[0.1, 0.2],
[0.3, 0.4],
[0.5, 0.6],
[0.7, 0.8]])
print(A)
# calculate pseudoinverse
B = pinv(A)
print(B)
[[ 0.1 0.2]
[ 0.3 0.4]
[ 0.5 0.6]
[ 0.7 0.8]]

[[ -1.00000000e+01 -5.00000000e+00 9.04289323e-15 5.00000000e+00]
[ 8.50000000e+00 4.50000000e+00 5.00000000e-01 -3.50000000e+00]]
# pseudoinverse via svd
from numpy import array
from numpy.linalg import svd
from numpy import zeros
from numpy import diag
# define matrix
A = array([
[0.1, 0.2],
[0.3, 0.4],
[0.5, 0.6],
[0.7, 0.8]])
print(A)
# factorize
U, s, V = svd(A)
# reciprocals of s
d = 1.0 / s
# create m x n D matrix
D = zeros(A.shape)
# populate D with n x n diagonal matrix
D[:A.shape[1], :A.shape[1]] = diag(d)
# calculate pseudoinverse
B = V.T.dot(D.T).dot(U.T)
print(B)
[[ 0.1 0.2]
[ 0.3 0.4]
[ 0.5 0.6]
[ 0.7 0.8]]

[[ -1.00000000e+01 -5.00000000e+00 9.04831765e-15 5.00000000e+00]
[ 8.50000000e+00 4.50000000e+00 5.00000000e-01 -3.50000000e+00]]

Dimensionality Reduction

  • TruncatedSVD: truncated singular value decomposition for dimensionality reduction
# svd data reduction in scikit-learn
from numpy import array
from sklearn.decomposition import TruncatedSVD
# define matrix
A = array([
[1,2,3,4,5,6,7,8,9,10],
[11,12,13,14,15,16,17,18,19,20],
[21,22,23,24,25,26,27,28,29,30]])
print(A)
# create transform
svd = TruncatedSVD(n_components=2)
# fit transform
svd.fit(A)
# apply transform
result = svd.transform(A)
print(result)
[[ 1 2 3 4 5 6 7 8 9 10]
[11 12 13 14 15 16 17 18 19 20]
[21 22 23 24 25 26 27 28 29 30]]

[[ 18.52157747 6.47697214]
[ 49.81310011 1.91182038]
[ 81.10462276 -2.65333138]]

Statistics

17. Introduction to Multivariate Statistics

Fundamental statistics are useful tools in applied machine learning for better understanding your data. They are also the tools that provide the foundation for more advanced linear algebra operations and machine learning methods, such as the covariance matrix and principal component analysis. As such, it is important to have a strong grip on fundamental statistics in the context of linear algebra notation.

Expected Value and Mean

$$
E[X]=\sum_{i=1}^{n} x_{i} \times p_{i}
$$

# matrix means
from numpy import array
from numpy import mean
# define matrix
M = array([
[1,2,3,4,5,6],
[1,2,3,4,5,6]])
print(M)
# column means
col_mean = mean(M, axis=0)
print(col_mean)
# row means
row_mean = mean(M, axis=1)
print(row_mean)
[[1 2 3 4 5 6]
[1 2 3 4 5 6]]

[ 1. 2. 3. 4. 5. 6.]

[ 3.5 3.5]

$$
\bar{x}=\frac{1}{n} \times \sum_{i=1}^{n} x_{i}
$$

Variance and Standard Deviation

$$
\operatorname{Var}[X]=\sum_{i=1}^{n} p\left(x_{i}\right) \times\left(x_{i}-E[X]\right)^{2}
$$

# matrix variances
from numpy import array
from numpy import var
# define matrix
M = array([
[1,2,3,4,5,6],
[1,2,3,4,5,6]])
print(M)
# column variances
col_var = var(M, ddof=1, axis=0)
print(col_var)
# row variances
row_var = var(M, ddof=1, axis=1)
print(row_var)
[[1 2 3 4 5 6]
[1 2 3 4 5 6]]

[ 0. 0. 0. 0. 0. 0.]

[ 3.5 3.5]

$$
s=\sqrt{\sigma^{2}}
$$

# matrix standard deviation
from numpy import array
from numpy import std
# define matrix
M = array([
[1,2,3,4,5,6],
[1,2,3,4,5,6]])
print(M)
# column standard deviations
col_std = std(M, ddof=1, axis=0)
print(col_std)
# row standard deviations
row_std = std(M, ddof=1, axis=1)
print(row_std)

Covariance and Correlation

  • Covariance (note: NumPy's cov() computes the sample covariance with denominator n − 1, which is why the result below is −7.5)

$$
\operatorname{cov}(X, Y)=\frac{1}{n} \times \sum(x-E[X]) \times(y-E[Y])
$$

# vector covariance
from numpy import array
from numpy import cov
# define first vector
x = array([1,2,3,4,5,6,7,8,9])
print(x)
# define second vector
y = array([9,8,7,6,5,4,3,2,1])
print(y)
# calculate covariance
Sigma = cov(x,y)[0,1]
print(Sigma)
[1 2 3 4 5 6 7 8 9]
[9 8 7 6 5 4 3 2 1]

-7.5
  • Correlation coefficient (Pearson's)

$$
r=\frac{\operatorname{cov}(X, Y)}{s_{X} \times s_{Y}}
$$

# vector correlation
from numpy import array
from numpy import corrcoef
# define first vector
x = array([1,2,3,4,5,6,7,8,9])
print(x)
# define second vector
y = array([9,8,7,6,5,4,3,2,1])
print(y)
# calculate correlation
corr = corrcoef(x,y)[0,1]
print(corr)
[1 2 3 4 5 6 7 8 9]
[9 8 7 6 5 4 3 2 1]

-1.0

Covariance Matrix

$$
\begin{gathered}
\Sigma=E[(X-E[X])(Y-E[Y])] \\
\Sigma_{i, j}=\operatorname{cov}\left(X_{i}, X_{j}\right)
\end{gathered}
$$

# covariance matrix
from numpy import array
from numpy import cov
# define matrix of observations
X = array([
[1, 5, 8],
[3, 5, 11],
[2, 4, 9],
[3, 6, 10],
[1, 5, 10]])
print(X)
# calculate covariance matrix
Sigma = cov(X.T)
print(Sigma)
[[ 1 5 8]
[ 3 5 11]
[ 2 4 9]
[ 3 6 10]
[ 1 5 10]]

[[ 1. 0.25 0.75]
[ 0.25 0.5 0.25]
[ 0.75 0.25 1.3 ]]

18. Principal Component Analysis

An important machine learning method for dimensionality reduction is called principal component analysis (PCA). It uses simple matrix operations from linear algebra and statistics to calculate a projection of the original data into the same number of dimensions or fewer.

PCA reference: the blog post 主成分分析(PCA)原理详解.

What is Principal Component Analysis

$$
M=\operatorname{mean}(A)
$$
$$
C=A-M
$$
$$
V=\operatorname{cov}(C)
$$
$$
\text{values},\ \text{vectors}=\operatorname{eig}(V)
$$

$$
B=\operatorname{select}(\text{values},\ \text{vectors})
$$
$$
P=B^{T} \cdot A
$$

Calculate Principal Component Analysis

# principal component analysis
from numpy import array
from numpy import mean
from numpy import cov
from numpy.linalg import eig
# define matrix
A = array([
[1, 2],
[3, 4],
[5, 6]])
print(A)
# column means
M = mean(A.T, axis=1)
# center columns by subtracting column means
C = A - M
# calculate covariance matrix of centered matrix
V = cov(C.T)
# factorize covariance matrix
values, vectors = eig(V)
print(vectors)
print(values)
# project data
P = vectors.T.dot(C.T)
print(P.T)
[[1 2]
[3 4]
[5 6]]

[[ 0.70710678 -0.70710678]
[ 0.70710678 0.70710678]]

[8. 0.]

[[-2.82842712 0. ]
[ 0. 0. ]
[ 2.82842712 0. ]]

Principal Component Analysis in scikit-learn

# principal component analysis with scikit-learn
from numpy import array
from sklearn.decomposition import PCA
# define matrix
A = array([
[1, 2],
[3, 4],
[5, 6]])
print(A)
# create the transform
pca = PCA(2)
# fit transform
pca.fit(A)
# access values and vectors
print(pca.components_)
print(pca.explained_variance_)
# transform data
B = pca.transform(A)
print(B)

19. Linear Regression

Linear regression is a method for modeling the relationship between one or more independent variables and a dependent variable. It is a staple of statistics and is often considered a good introductory machine learning method. It is also a method that can be reformulated using matrix notation and solved using matrix operations.

What is Linear Regression

$$
y=b_{0}+\left(b_{1} \times x_{1}\right)+\left(b_{2} \times x_{2}\right)+\cdots
$$

Matrix Formulation of Linear Regression

$$
b=\left(X^{T} \cdot X\right)^{-1} \cdot X^{T} \cdot y
$$

Linear Regression Dataset

# linear regression dataset
from numpy import array
from matplotlib import pyplot
# define dataset
data = array([
[0.05, 0.12],
[0.18, 0.22],
[0.31, 0.35],
[0.42, 0.38],
[0.5, 0.49]])
print(data)
# split into inputs and outputs
X, y = data[:,0], data[:,1]
X = X.reshape((len(X), 1))
# scatter plot
pyplot.scatter(X, y)
pyplot.show()

Solve via Inverse

$$
b=\left(X^{T} \cdot X\right)^{-1} \cdot X^{T} \cdot y
$$

b = inv(X.T.dot(X)).dot(X.T).dot(y)
# direct solution to linear least squares
from numpy import array
from numpy.linalg import inv
from matplotlib import pyplot
# define dataset
data = array([
[0.05, 0.12],
[0.18, 0.22],
[0.31, 0.35],
[0.42, 0.38],
[0.5, 0.49]])
# split into inputs and outputs
X, y = data[:,0], data[:,1]
X = X.reshape((len(X), 1))
# linear least squares
b = inv(X.T.dot(X)).dot(X.T).dot(y)
print(b)
# predict using coefficients
yhat = X.dot(b)
# plot data and predictions
pyplot.scatter(X, y)
pyplot.plot(X, yhat, color='red')
pyplot.show()

Solve via QR Decomposition

$$
b=R^{-1} \cdot Q^{T} \cdot y
$$

# QR decomposition
Q, R = qr(X)
b = inv(R).dot(Q.T).dot(y)
# QR decomposition solution to linear least squares
from numpy import array
from numpy.linalg import inv
from numpy.linalg import qr
from matplotlib import pyplot
# define dataset
data = array([
[0.05, 0.12],
[0.18, 0.22],
[0.31, 0.35],
[0.42, 0.38],
[0.5, 0.49]])
# split into inputs and outputs
X, y = data[:,0], data[:,1]
X = X.reshape((len(X), 1))
# factorize
Q, R = qr(X)
b = inv(R).dot(Q.T).dot(y)
print(b)
# predict using coefficients
yhat = X.dot(b)
# plot data and predictions
pyplot.scatter(X, y)
pyplot.plot(X, yhat, color='red')
pyplot.show()
[ 1.00233226]

Solve via SVD and Pseudoinverse

# SVD solution via pseudoinverse to linear least squares
from numpy import array
from numpy.linalg import pinv
from matplotlib import pyplot
# define dataset
data = array([
[0.05, 0.12],
[0.18, 0.22],
[0.31, 0.35],
[0.42, 0.38],
[0.5, 0.49]])
# split into inputs and outputs
X, y = data[:,0], data[:,1]
X = X.reshape((len(X), 1))
# calculate coefficients
b = pinv(X).dot(y)
print(b)
# predict using coefficients
yhat = X.dot(b)
# plot data and predictions
pyplot.scatter(X, y)
pyplot.plot(X, yhat, color='red')
pyplot.show()

Solve via Convenience Function

Solving linear least squares via the pseudoinverse computed with the SVD is the de facto standard, because it is stable and works with most datasets. NumPy provides a convenience function named lstsq() that solves the linear least squares problem using the SVD approach. The function takes the X matrix and y vector as input and returns the b coefficients together with the residuals, the rank of the given X matrix, and the singular values. The example below demonstrates the lstsq() function on the test dataset.

# least squares via convenience function
from numpy import array
from numpy.linalg import lstsq
from matplotlib import pyplot
# define dataset
data = array([
[0.05, 0.12],
[0.18, 0.22],
[0.31, 0.35],
[0.42, 0.38],
[0.5, 0.49]])
# split into inputs and outputs
X, y = data[:,0], data[:,1]
X = X.reshape((len(X), 1))
# calculate coefficients
b, residuals, rank, s = lstsq(X, y)
print(b)
# predict using coefficients
yhat = X.dot(b)
# plot data and predictions
pyplot.scatter(X, y)
pyplot.plot(X, yhat, color='red')
pyplot.show()
[ 1.00233226]

Summary

The book focuses on the linear algebra foundations of machine learning: understanding the mathematics that machine learning is built on, taught through programming with Python's NumPy, SciPy, and scikit-learn packages. It gives a good grounding in machine learning fundamentals and a deeper mathematical understanding for the machine learning implementations that come later.

Studying this book gave me a deep understanding of how mathematical principles underpin the foundations of machine learning, and of how to turn mathematical theory into practical application, which will be of great help in my future study of algorithms.

