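PyTorch¶
- Facebook reimplemented Torch, which was originally written in Lua, as a Python library and released it in 2017
- Torch started out as a scientific computing library, similar to NumPy
- It later grew into a deep learning framework that supports tensor operations on GPUs and the construction of dynamic neural networks
- It is designed to be Pythonic, and offers flexibility along with accelerated computation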
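PyTorch module structure¶
PyTorch components¶
- torch: the main namespace; contains tensors and a wide range of mathematical functions
- torch.autograd: library that provides automatic differentiation
- torch.nn: library of data structures and layers for building neural networks
- torch.multiprocessing: library that provides parallel processing
- torch.optim: parameter optimization algorithms, centered on SGD (Stochastic Gradient Descent)
- torch.utils: utilities for data handling and similar tasks
- torch.onnx: ONNX (Open Neural Network Exchange) support, used to share models between different frameworks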
Tensors¶
- PyTorch uses the tensor as its basic structure for representing data
- A tensor is a container for data, usually numeric data
- Similar to NumPy's ndarray
- Computation can be accelerated on a GPU
import torch
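Tensor initialization and data types¶
Uninitialized tensor (torch.empty allocates memory without initializing it; the cell below instead builds a tensor from explicit values)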
t = torch.FloatTensor([0., 1., 2., 3., 4., 5., 6.])
print(t)
tensor([0., 1., 2., 3., 4., 5., 6.])
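Randomly initialized tensor
Tensor filled with zeros, with data type (dtype) long
Tensor initialized from user-provided values
x = 10
2 x 4 tensor of type double, filled with ones
Tensor with the same size as x, of type float, filled with random values
Computing the size of a tensor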
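The captions above describe several ways of creating tensors. The following is a minimal sketch of calls that match those descriptions; the shapes and values are assumptions chosen only for illustration and are not taken from the original notebook.

```python
import torch

# Uninitialized tensor: values are whatever happened to be in memory
x = torch.empty(4, 2)

# Randomly initialized tensor, uniform on [0, 1)
x = torch.rand(4, 2)

# Tensor of zeros with dtype long (int64)
x = torch.zeros(4, 2, dtype=torch.long)

# Tensor built from user-provided values
x = torch.tensor([3, 2.3])

# 2 x 4 tensor of ones with dtype double (new_ones keeps x's device)
x = x.new_ones(2, 4, dtype=torch.double)

# Same size as x, dtype float, filled with standard normal samples
y = torch.randn_like(x, dtype=torch.float)

# Size of a tensor
print(x.size())   # torch.Size([2, 4])
print(y.dtype)    # torch.float32
```

torch.Size is a tuple subclass, so it can be unpacked or compared with an ordinary tuple.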
Data Types¶
Data type | dtype | CPU tensor | GPU tensor |
---|---|---|---|
32-bit floating point | torch.float32 or torch.float | torch.FloatTensor | torch.cuda.FloatTensor |
64-bit floating point | torch.float64 or torch.double | torch.DoubleTensor | torch.cuda.DoubleTensor |
16-bit floating point | torch.float16 or torch.half | torch.HalfTensor | torch.cuda.HalfTensor |
8-bit integer (unsigned) | torch.uint8 | torch.ByteTensor | torch.cuda.ByteTensor |
8-bit integer (signed) | torch.int8 | torch.CharTensor | torch.cuda.CharTensor |
16-bit integer (signed) | torch.int16 or torch.short | torch.ShortTensor | torch.cuda.ShortTensor |
32-bit integer (signed) | torch.int32 or torch.int | torch.IntTensor | torch.cuda.IntTensor |
64-bit integer (signed) | torch.int64 or torch.long | torch.LongTensor | torch.cuda.LongTensor |
CUDA Tensors¶
The .to method can move a tensor to any device (CPU or GPU)
x = torch.randn(1)
print(x)
print(x.item())
print(x.dtype)
tensor([-0.5172])
-0.5171635746955872
torch.float32
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)
y = torch.ones_like(x, device = device)
print(y)
x = x.to(device)
print(x)
z = x + y
print(z)
print(z.to('cpu', torch.double))
cpu
Multi-dimensional tensor representation¶
0D Tensor (Scalar)
- A tensor holding a single number
- Has no axes, so its shape is empty
t0 = torch.tensor(0)
print(t0.ndim)
print(t0.shape)
print(t0)
0
torch.Size([])
tensor(0)
1D Tensor (Vector)
- A tensor that stores values much like a list
- Has one axis
t1 = torch.tensor([1, 2, 3])
print(t1.ndim)
print(t1.shape)
print(t1)
1
torch.Size([3])
tensor([1, 2, 3])
2D Tensor (Matrix)
- Shaped like a matrix, with two axes
- Typical numeric and statistical datasets fall into this category
- Usually laid out as (samples, features)
t2 = torch.tensor([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
print(t2.ndim)
print(t2.shape)
print(t2)
2
torch.Size([3, 3])
tensor([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
3D Tensor
- Shaped like a cube, with three axes
- Covers sequence data and time series data that include a time axis
- Examples include stock price datasets and disease incidence over time
- Usually laid out as (samples, timesteps, features)
4D Tensor
- Four axes
- Color image data is the typical example (grayscale image data can fit in a 3D tensor)
- Usually laid out as (samples, height, width, channels); PyTorch layers such as nn.Conv2d expect the channels-first order (samples, channels, height, width)
5D Tensor
- Five axes
- Video data is the typical example
- Usually laid out as (samples, frames, height, width, channels)
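The 3D, 4D, and 5D cases can be sketched the same way as the lower-dimensional examples. The sizes below are hypothetical, and the image and video tensors use the channels-first order that PyTorch convolution layers expect.

```python
import torch

# Hypothetical sizes chosen only for illustration
series = torch.zeros(32, 10, 8)          # (samples, timesteps, features)
images = torch.zeros(32, 3, 28, 28)      # (samples, channels, height, width)
videos = torch.zeros(4, 16, 3, 28, 28)   # (samples, frames, channels, height, width)

for name, t in [("series", series), ("images", images), ("videos", videos)]:
    print(name, t.ndim, tuple(t.shape))
```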
Tensor operations¶
- Tensors support mathematical operations, trigonometric functions, bitwise operations, comparisons, aggregations, and more
import math
a = torch.rand(1, 2) * 2 - 1
print(a)
print(torch.abs(a))
print(torch.ceil(a))
print(torch.floor(a))
print(torch.clamp(a, -0.5, 0.5))
tensor([[-0.7389, -0.0339]])
tensor([[0.7389, 0.0339]])
tensor([[-0., -0.]])
tensor([[-1., -1.]])
tensor([[-0.5000, -0.0339]])
print(a)
print(torch.min(a))
print(torch.max(a))
print(torch.mean(a))
print(torch.std(a))
print(torch.prod(a))
print(torch.unique(torch.tensor([1, 2, 3, 1, 2, 3])))
tensor([[-0.7389, -0.0339]])
tensor(-0.7389)
tensor(-0.0339)
tensor(-0.3864)
tensor(0.4986)
tensor(0.0250)
tensor([1, 2, 3])
When given the dim argument, max and min also return argmax and argmin
- argmax: index of the maximum value
- argmin: index of the minimum value
x = torch.rand(2, 2)
print(x)
print(x.max(dim=0))
print(x.max(dim=1))
tensor([[0.7870, 0.3448],
[0.5940, 0.6871]])
torch.return_types.max(
values=tensor([0.7870, 0.6871]),
indices=tensor([0, 1]))
torch.return_types.max(
values=tensor([0.7870, 0.6871]),
indices=tensor([0, 1]))
print(x)
print(x.min(dim=0))
print(x.min(dim=1))
tensor([[0.7870, 0.3448],
[0.5940, 0.6871]])
torch.return_types.min(
values=tensor([0.5940, 0.3448]),
indices=tensor([1, 0]))
torch.return_types.min(
values=tensor([0.3448, 0.5940]),
indices=tensor([1, 0]))
x = torch.rand(2, 2)
print(x)
y = torch.rand(2, 2)
print(y)
tensor([[0.9778, 0.2322],
[0.9288, 0.9593]])
tensor([[0.4161, 0.8909],
[0.9978, 0.6027]])
torch.add: addition
print(x + y)
print(torch.add(x, y))
tensor([[1.3939, 1.1232],
[1.9266, 1.5620]])
tensor([[1.3939, 1.1232],
[1.9266, 1.5620]])
Passing the result tensor as an argument
result = torch.empty(2, 2)
torch.add(x, y, out=result)
print(result)
tensor([[1.3939, 1.1232],
[1.9266, 1.5620]])
In-place operations
- Operations that modify a tensor's values in place have a trailing underscore (_) in their name
- Examples: x.copy_(y), x.t_()
print(x)
print(y)
y.add_(x)
print(y)
tensor([[0.9778, 0.2322],
[0.9288, 0.9593]])
tensor([[0.4161, 0.8909],
[0.9978, 0.6027]])
tensor([[1.3939, 1.1232],
[1.9266, 1.5620]])
torch.sub: subtraction
print(x)
print(y)
print(x - y)
print(torch.sub(x, y))
print(x.sub(y))
tensor([[0.9778, 0.2322],
[0.9288, 0.9593]])
tensor([[1.3939, 1.1232],
[1.9266, 1.5620]])
tensor([[-0.4161, -0.8909],
[-0.9978, -0.6027]])
tensor([[-0.4161, -0.8909],
[-0.9978, -0.6027]])
tensor([[-0.4161, -0.8909],
[-0.9978, -0.6027]])
torch.mul: element-wise multiplication
print(x)
print(y)
print(x * y)
print(torch.mul(x, y))
print(x.mul(y))
tensor([[0.9778, 0.2322],
[0.9288, 0.9593]])
tensor([[1.3939, 1.1232],
[1.9266, 1.5620]])
tensor([[1.3629, 0.2608],
[1.7893, 1.4984]])
tensor([[1.3629, 0.2608],
[1.7893, 1.4984]])
tensor([[1.3629, 0.2608],
[1.7893, 1.4984]])
torch.div: division
print(x)
print(y)
print(x / y)
print(torch.div(x, y))
print(x.div(y))
tensor([[0.9778, 0.2322],
[0.9288, 0.9593]])
tensor([[1.3939, 1.1232],
[1.9266, 1.5620]])
tensor([[0.7015, 0.2068],
[0.4821, 0.6142]])
tensor([[0.7015, 0.2068],
[0.4821, 0.6142]])
tensor([[0.7015, 0.2068],
[0.4821, 0.6142]])
torch.mm: matrix multiplication (matrix product)
print(x)
print(y)
print(torch.matmul(x, y))
z = torch.mm(x, y)
print(z)
print(torch.svd(z))
tensor([[0.9778, 0.2322],
[0.9288, 0.9593]])
tensor([[1.3939, 1.1232],
[1.9266, 1.5620]])
tensor([[1.8103, 1.4609],
[3.1428, 2.5416]])
tensor([[1.8103, 1.4609],
[3.1428, 2.5416]])
torch.return_types.svd(
U=tensor([[-0.4988, -0.8667],
[-0.8667, 0.4988]]),
S=tensor([4.6635e+00, 2.0650e-03]),
V=tensor([[-0.7777, -0.6286],
[-0.6286, 0.7777]]))
Tensor manipulations¶
Indexing: NumPy-style indexing is supported
x = torch.Tensor([[1, 2],
[3, 4]])
print(x)
print(x[0, 0])
print(x[0, 1])
print(x[1, 0])
print(x[1, 1])
print(x[:, 0])
print(x[:, 1])
print(x[0, :])
print(x[1, :])
tensor([[1., 2.],
[3., 4.]])
tensor(1.)
tensor(2.)
tensor(3.)
tensor(4.)
tensor([1., 3.])
tensor([2., 4.])
tensor([1., 2.])
tensor([3., 4.])
view: changes the size or shape of a tensor
- The number of elements must stay the same before and after the change
- A dimension given as -1 is inferred from the other dimensions
x = torch.randn(4, 5)
print(x)
y = x.view(20)
print(y)
z = x.view(5, -1)
print(z)
tensor([[ 1.3183, 0.4399, 0.4315, 0.7125, -0.4232],
[ 0.3619, -1.0817, -0.5947, -0.5867, 0.4631],
[-2.2517, 0.8459, 1.5883, -0.3597, -0.1833],
[-1.4298, -0.5841, -1.0301, -0.6693, 0.1959]])
tensor([ 1.3183, 0.4399, 0.4315, 0.7125, -0.4232, 0.3619, -1.0817, -0.5947,
-0.5867, 0.4631, -2.2517, 0.8459, 1.5883, -0.3597, -0.1833, -1.4298,
-0.5841, -1.0301, -0.6693, 0.1959])
tensor([[ 1.3183, 0.4399, 0.4315, 0.7125],
[-0.4232, 0.3619, -1.0817, -0.5947],
[-0.5867, 0.4631, -2.2517, 0.8459],
[ 1.5883, -0.3597, -0.1833, -1.4298],
[-0.5841, -1.0301, -0.6693, 0.1959]])
item: returns the value as a Python number when the tensor holds exactly one element
x = torch.rand(1)
print(x)
print(x.item())
print(x.dtype)
tensor([0.5269])
0.5268673896789551
torch.float32
item() can only be used when the tensor holds exactly one scalar value
tensor([0.6151, 0.8299])
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[24], line 3
1 x = torch.rand(2)
2 print(x)
----> 3 print(x.item())
4 print(x.dtype)
RuntimeError: a Tensor with 2 elements cannot be converted to Scalar
squeeze: reduces dimensions by removing axes of size 1
tensor = torch.rand(1, 3, 3)
print(tensor)
print(tensor.shape)
tensor([[[0.9809, 0.5227, 0.9596],
[0.2491, 0.3174, 0.0162],
[0.4459, 0.2969, 0.0172]]])
torch.Size([1, 3, 3])
t = tensor.squeeze()
print(t)
print(t.shape)
tensor([[0.9809, 0.5227, 0.9596],
[0.2491, 0.3174, 0.0162],
[0.4459, 0.2969, 0.0172]])
torch.Size([3, 3])
unsqueeze: adds a dimension of size 1 at the given position
t = torch.rand(3, 3)
print(t)
print(t.shape)
tensor([[0.7453, 0.8929, 0.4483],
[0.2229, 0.6966, 0.1367],
[0.0719, 0.9958, 0.3260]])
torch.Size([3, 3])
tensor = t.unsqueeze(dim=0)
print(tensor)
print(tensor.shape)
tensor([[[0.7453, 0.8929, 0.4483],
[0.2229, 0.6966, 0.1367],
[0.0719, 0.9958, 0.3260]]])
torch.Size([1, 3, 3])
tensor = t.unsqueeze(dim=2)
print(tensor)
print(tensor.shape)
tensor([[[0.7453],
[0.8929],
[0.4483]],
[[0.2229],
[0.6966],
[0.1367]],
[[0.0719],
[0.9958],
[0.3260]]])
torch.Size([3, 3, 1])
stack: joins tensors along a new dimension
x = torch.FloatTensor([1, 4])
print(x)
y = torch.FloatTensor([2, 5])
print(y)
z = torch.FloatTensor([3, 6])
print(z)
print(torch.stack([x, y, z]))
tensor([1., 4.])
tensor([2., 5.])
tensor([3., 6.])
tensor([[1., 4.],
[2., 5.],
[3., 6.]])
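cat: concatenates tensors
- Similar to NumPy's stack, but the dim to join along must already exist (in that respect it is closer to numpy.concatenate)
- The tensors are joined by extending that dimension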
a = torch.randn(1, 3, 3)
print(a)
b = torch.randn(1, 3, 3)
print(b)
c = torch.cat((a, b), dim=0)
print(c)
print(c.size())
tensor([[[ 0.9720, 0.5302, 0.3912],
[-0.3124, -0.6052, 0.6506],
[ 2.0312, 0.1410, -0.0298]]])
tensor([[[-1.1560, -0.2555, 1.0128],
[ 0.7819, 0.3953, 1.7806],
[-1.0710, 0.2716, -1.2741]]])
tensor([[[ 0.9720, 0.5302, 0.3912],
[-0.3124, -0.6052, 0.6506],
[ 2.0312, 0.1410, -0.0298]],
[[-1.1560, -0.2555, 1.0128],
[ 0.7819, 0.3953, 1.7806],
[-1.0710, 0.2716, -1.2741]]])
torch.Size([2, 3, 3])
c = torch.cat((a, b), dim=1)
print(c)
print(c.size())
tensor([[[ 0.9720, 0.5302, 0.3912],
[-0.3124, -0.6052, 0.6506],
[ 2.0312, 0.1410, -0.0298],
[-1.1560, -0.2555, 1.0128],
[ 0.7819, 0.3953, 1.7806],
[-1.0710, 0.2716, -1.2741]]])
torch.Size([1, 6, 3])
c = torch.cat((a, b), dim=2)
print(c)
print(c.size())
tensor([[[ 0.9720, 0.5302, 0.3912, -1.1560, -0.2555, 1.0128],
[-0.3124, -0.6052, 0.6506, 0.7819, 0.3953, 1.7806],
[ 2.0312, 0.1410, -0.0298, -1.0710, 0.2716, -1.2741]]])
torch.Size([1, 3, 6])
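The difference between stack and cat is easiest to see in the resulting shapes. A small sketch, using tensors of zeros and ones chosen only for illustration:

```python
import torch

a = torch.zeros(3, 3)
b = torch.ones(3, 3)

stacked = torch.stack([a, b], dim=0)   # a new leading axis is created
joined = torch.cat([a, b], dim=0)      # the existing axis 0 grows

print(stacked.shape)  # torch.Size([2, 3, 3])
print(joined.shape)   # torch.Size([6, 3])

# stack is equivalent to unsqueezing each input and then concatenating
same = torch.equal(stacked, torch.cat([a.unsqueeze(0), b.unsqueeze(0)], dim=0))
print(same)  # True
```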
chunk: splits a tensor into several pieces; the argument is the number of pieces
tensor = torch.rand(3, 6)
print(tensor)
t1, t2, t3 = torch.chunk(tensor, 3, dim=1)
print(t1)
print(t2)
print(t3)
tensor([[0.9582, 0.5981, 0.6283, 0.6279, 0.8309, 0.7105],
[0.9168, 0.6787, 0.3635, 0.7603, 0.0968, 0.7248],
[0.5706, 0.6225, 0.7284, 0.8518, 0.3439, 0.8861]])
tensor([[0.9582, 0.5981],
[0.9168, 0.6787],
[0.5706, 0.6225]])
tensor([[0.6283, 0.6279],
[0.3635, 0.7603],
[0.7284, 0.8518]])
tensor([[0.8309, 0.7105],
[0.0968, 0.7248],
[0.3439, 0.8861]])
split: serves the same purpose as chunk, but the argument is the size of each piece rather than the number of pieces
tensor = torch.rand(3, 6)
t1, t2 = torch.split(tensor, 3, dim=1)
print(tensor)
print(t1)
print(t2)
tensor([[0.6629, 0.7750, 0.7590, 0.0481, 0.2992, 0.2832],
[0.4073, 0.6713, 0.6036, 0.0977, 0.4953, 0.0680],
[0.4750, 0.0259, 0.5710, 0.3239, 0.5928, 0.6802]])
tensor([[0.6629, 0.7750, 0.7590],
[0.4073, 0.6713, 0.6036],
[0.4750, 0.0259, 0.5710]])
tensor([[0.0481, 0.2992, 0.2832],
[0.0977, 0.4953, 0.0680],
[0.3239, 0.5928, 0.6802]])
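To make the contrast concrete, the sketch below applies both functions to the same tensor with the same integer argument. The input built with torch.arange is an assumption for illustration.

```python
import torch

t = torch.arange(18).reshape(3, 6)

# chunk: ask for 3 pieces along dim=1, so each piece has 2 columns
chunks = torch.chunk(t, 3, dim=1)
print([tuple(c.shape) for c in chunks])   # [(3, 2), (3, 2), (3, 2)]

# split: ask for pieces of width 3 along dim=1, which gives 2 pieces
parts = torch.split(t, 3, dim=1)
print([tuple(p.shape) for p in parts])    # [(3, 3), (3, 3)]

# split also accepts a list of sizes that must sum to the dimension length
uneven = torch.split(t, [1, 5], dim=1)
print([tuple(u.shape) for u in uneven])   # [(3, 1), (3, 5)]
```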
torch ↔ numpy
- A Torch Tensor can be converted to a NumPy array, and a NumPy array to a Tensor
- numpy()
- from_numpy()
- If the Tensor is on the CPU, it shares memory with the NumPy array, so changing one also changes the other
a = torch.ones(7)
print(a)
tensor([1., 1., 1., 1., 1., 1., 1.])
b = a.numpy()
print(b)
[1. 1. 1. 1. 1. 1. 1.]
a.add_(1)
print(a)
print(b)
tensor([2., 2., 2., 2., 2., 2., 2.])
[2. 2. 2. 2. 2. 2. 2.]
import numpy as np
a = np.ones(7)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)
print(b)
[2. 2. 2. 2. 2. 2. 2.]
tensor([2., 2., 2., 2., 2., 2., 2.], dtype=torch.float64)
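Autograd (automatic differentiation)¶
- The torch.autograd package provides automatic differentiation for all operations on Tensors
- This means that backpropagation is defined by how the code is written and run
- Derivatives are computed automatically for backprop
- Setting the requires_grad attribute to True starts tracking every operation performed on that tensor
- To stop tracking history, call .detach() to separate the tensor from the recorded computation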
a = torch.randn(3, 3)
a = a * 3
print(a)
print(a.requires_grad)
tensor([[-4.2719, 1.3259, -0.3379],
[ 6.9943, 2.0913, 1.3107],
[ 2.7305, 0.7880, -2.9714]])
False
requires_grad_(...) changes the requires_grad flag of an existing tensor in place
grad_fn: stores a reference to the function that produced the tensor, which backprop uses to compute the gradient
a.requires_grad_(True)
print(a.requires_grad)
b = (a * a).sum()
print(b)
print(b.grad_fn)
True
tensor(92.0392, grad_fn=<SumBackward0>)
<SumBackward0 object at 0x0000019C1347E770>
Gradient¶
x = torch.ones(3, 3, requires_grad=True)
print(x)
tensor([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]], requires_grad=True)
y = x + 5
print(y)
tensor([[6., 6., 6.],
[6., 6., 6.],
[6., 6., 6.]], grad_fn=<AddBackward0>)
z = y * y
out = z.mean()
print(z, out)
tensor([[36., 36., 36.],
[36., 36., 36.],
[36., 36., 36.]], grad_fn=<MulBackward0>) tensor(36., grad_fn=<MeanBackward0>)
Once the computation is finished, calling .backward() runs backpropagation automatically, and the gradients are accumulated in the .grad attribute
print(out)
out.backward()
tensor(36., grad_fn=<MeanBackward0>)
grad: stores the gradient of the output with respect to this tensor, propagated back through the operations the data passed through
print(x)
print(x.grad)
tensor([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]], requires_grad=True)
tensor([[1.3333, 1.3333, 1.3333],
[1.3333, 1.3333, 1.3333],
[1.3333, 1.3333, 1.3333]])
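The value 1.3333 can be checked by hand. With out = mean((x + 5)^2) over 9 elements, the derivative with respect to each element is 2 * (x + 5) / 9, which is 12 / 9 at x = 1. The sketch below compares that analytic expression with what autograd computes; the variable name expected is introduced here for illustration.

```python
import torch

x = torch.ones(3, 3, requires_grad=True)
out = ((x + 5) ** 2).mean()
out.backward()

# Analytic gradient: d(out)/dx_ij = 2 * (x_ij + 5) / 9
expected = 2 * (x.detach() + 5) / x.numel()
print(torch.allclose(x.grad, expected))   # True
print(expected[0, 0].item())
```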
x = torch.randn(3, requires_grad=True)
y = x * 2
while y.data.norm() < 1000:
    y = y * 2
print(y)
tensor([ 924.5911, -436.0419, 840.3173], grad_fn=<MulBackward0>)
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)
print(x.grad)
tensor([1.0240e+02, 1.0240e+03, 1.0240e-01])
with torch.no_grad() disables gradient tracking
Wrapping a code block in with torch.no_grad() stops history from being recorded. This is useful when evaluating a model whose trainable parameters have requires_grad=True but for which no gradients are needed
print(x.requires_grad)
print((x ** 2).requires_grad)
with torch.no_grad():
    print((x ** 2).requires_grad)
True
True
False
detach(): returns a new Tensor with the same content but with requires_grad set to False
print(x.requires_grad)
y = x.detach()
print(y.requires_grad)
print(x.eq(y).all())
True
False
tensor(True)
a = torch.ones(2, 2)
print(a)
tensor([[1., 1.],
[1., 1.]])
a = torch.ones(2, 2, requires_grad=True)
print(a)
tensor([[1., 1.],
[1., 1.]], requires_grad=True)
print(a.data)
print(a.grad)
print(a.grad_fn)
tensor([[1., 1.],
[1., 1.]])
None
None
$b = a + 2$
b = a + 2
print(b)
tensor([[3., 3.],
[3., 3.]], grad_fn=<AddBackward0>)
$c = b^2$
c = b ** 2
print(c)
tensor([[9., 9.],
[9., 9.]], grad_fn=<PowBackward0>)
out = c.sum()
print(out)
tensor(36., grad_fn=<SumBackward0>)
print(out)
out.backward()
tensor(36., grad_fn=<SumBackward0>)
a's grad_fn is None because a was created directly by the user (a leaf tensor) rather than produced by an operation
print(a.data)
print(a.grad)
print(a.grad_fn)
tensor([[1., 1.],
[1., 1.]])
tensor([[6., 6.],
[6., 6.]])
None
print(b.data)
print(b.grad)
print(b.grad_fn)
tensor([[3., 3.],
[3., 3.]])
None
<AddBackward0 object at 0x0000019C7DCD14B0>
b.grad is None because b is not a leaf tensor; .grad is only populated for a non-leaf tensor if .retain_grad() is called on it before backward()
print(out.data)
print(out.grad)
print(out.grad_fn)
tensor(36.)
None
<SumBackward0 object at 0x0000019C7DCD2110>
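The outputs above show .grad as None for the intermediate tensors. A minimal sketch of how the gradient of a non-leaf tensor can be kept, reusing the same computation as above:

```python
import torch

a = torch.ones(2, 2, requires_grad=True)
b = a + 2
b.retain_grad()          # ask autograd to keep the gradient of this non-leaf tensor
out = (b ** 2).sum()
out.backward()

print(a.is_leaf, b.is_leaf)   # True False
print(b.grad)                 # d(out)/db = 2 * b, which is 6 everywhere
```

Without the retain_grad() call, only the leaf tensor a would have its .grad populated.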
Data preparation¶
PyTorch provides Dataset and DataLoader in torch.utils.data for preparing data
- Many ready-made datasets are available as Dataset classes (MNIST, FashionMNIST, CIFAR10, ...)
- Vision Dataset: https://pytorch.org/vision/stable/datasets.html
- Text Dataset: https://pytorch.org/text/stable/datasets.html
- Audio Dataset: https://pytorch.org/audio/stable/datasets.html
- Arguments such as batch_size, train, and transform, passed to the DataLoader and the Dataset, determine how the data is loaded
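Besides the ready-made datasets, a Dataset can be written by hand by implementing __len__ and __getitem__. The sketch below is one way this might look; ToyDataset and its random data are hypothetical and exist only to show how a Dataset plugs into a DataLoader.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ToyDataset(Dataset):
    """Hypothetical dataset of random 4-feature samples with binary labels."""

    def __init__(self, n_samples=100, n_features=4):
        self.x = torch.randn(n_samples, n_features)
        self.y = torch.randint(0, 2, (n_samples,))

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]


loader = DataLoader(ToyDataset(), batch_size=8, shuffle=True)
features, labels = next(iter(loader))
print(features.shape, labels.shape)   # torch.Size([8, 4]) torch.Size([8])
```

The DataLoader collates individual (sample, label) pairs into batched tensors, which is why the leading dimension equals batch_size.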
import torch
import numpy as np
from torch.utils.data import Dataset, DataLoader
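torchvision is the package that collects the datasets provided for PyTorch
- transforms: the module used for preprocessing (https://pytorch.org/docs/stable/torchvision/transforms.html)
- Preprocessing that is not covered by the classes in transforms is usually implemented as a separate custom class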
import torchvision.transforms as transforms
from torchvision import datasets
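The transform passed to the dataset can be defined in advance, and Compose applies the preprocessing steps in the order they appear in the list
ToTensor() is needed because torchvision datasets return PIL Images, which must be converted to Tensors before further processing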
mnist_transform = transforms.Compose([transforms.ToTensor(),
transforms.Normalize(mean=(0.5,),std=(1.0,))])
trainset = datasets.MNIST(root='/content/',
train=True, download=True,
transform=mnist_transform)
testset = datasets.MNIST(root='/content/',
train=False, download=True,
transform=mnist_transform)
DataLoader wraps the whole dataset and, during training, yields the data in batches of batch_size samples
train_loader = DataLoader(trainset, batch_size=8, shuffle=True, num_workers=2)
testloader = DataLoader(testset, batch_size=8, shuffle=False, num_workers=2)
dataiter = iter(train_loader)
images, labels = next(dataiter)
images.shape, labels.shape
(torch.Size([8, 1, 28, 28]), torch.Size([8]))
torch_image = torch.squeeze(images[0])
torch_image.shape
torch.Size([28, 28])
import matplotlib.pyplot as plt
figure = plt.figure(figsize=(12, 6))
cols, rows = 4, 2
for i in range(1, cols * rows + 1):
    sample_idx = torch.randint(len(trainset), size=(1,)).item()
    img, label = trainset[sample_idx]
    figure.add_subplot(rows, cols, i)
    plt.title(label)
    plt.axis('off')
    plt.imshow(img.squeeze(), cmap='gray')
plt.show()
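Building neural networks¶
- Layer: the core data structure of a neural network; takes one or more tensors as input and produces one or more tensors as output
- Module: composed of one or more layers
- Model: composed of one or more modules
torch.nn package¶
Used mainly for layers whose weights and biases are created automatically inside the layer (the weight values are not declared by hand)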
import torch
import matplotlib.pyplot as plt
import torch.nn as nn
import numpy as np
nn.Linear layer example
input = torch.randn(128, 20)
print(input)
m = nn.Linear(20, 30)
print(m)
output = m(input)
print(output)
print(output.size())
tensor([[ 0.1611, -0.1770, 0.9294, ..., -1.3216, 1.1687, 0.7508],
[ 0.0091, 0.2195, -0.1656, ..., 1.5742, -0.1580, -0.6254],
[-0.0739, 0.4897, -0.3660, ..., 2.4579, 0.1372, 0.7276],
...,
[-0.3231, 1.0198, -1.8128, ..., -0.3414, -0.1527, 0.0857],
[ 0.2109, -1.0355, 0.2443, ..., 1.2963, -1.0275, 0.9258],
[-2.5506, 0.8304, 1.2549, ..., -0.7072, 1.2520, -0.8628]])
Linear(in_features=20, out_features=30, bias=True)
tensor([[ 0.9625, -0.8421, -0.2202, ..., -0.6258, -0.4394, -0.1915],
[-0.9019, 0.1840, -0.6796, ..., 1.5266, -0.2464, -0.8951],
[-0.2721, -0.2364, 0.2282, ..., 0.3653, 0.5424, -1.0521],
...,
[-0.1131, -0.7157, -0.3249, ..., -0.2302, 0.4510, -0.3833],
[ 0.9199, -0.7164, 0.2842, ..., -1.1771, 0.0924, 0.3473],
[-0.6566, 1.4463, 0.9695, ..., 0.8163, 0.6002, 0.5698]],
grad_fn=<AddmmBackward0>)
torch.Size([128, 30])
nn.Conv2d layer example
input = torch.rand(20, 16, 50, 100)
print(input.size())
torch.Size([20, 16, 50, 100])
m = nn.Conv2d(16, 33, 3, stride=2)
print(m)
m = nn.Conv2d(16, 33, (3, 5), stride=(2, 2), padding=(4,2))
print(m)
m = nn.Conv2d(16, 33, (3, 5), stride=(2, 2), padding=(4,2), dilation=(3, 1))
print(m)
Conv2d(16, 33, kernel_size=(3, 3), stride=(2, 2))
Conv2d(16, 33, kernel_size=(3, 5), stride=(2, 2), padding=(4, 2))
Conv2d(16, 33, kernel_size=(3, 5), stride=(2, 2), padding=(4, 2), dilation=(3, 1))
output = m(input)
print(output.size())
torch.Size([20, 33, 26, 50])
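The output size [20, 33, 26, 50] follows from the shape formula given in the nn.Conv2d documentation. The helper conv_out below is a hypothetical name introduced for this sketch; it applies that formula to one spatial axis and compares the result with an actual forward pass.

```python
import torch
import torch.nn as nn


def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    """Output length along one spatial axis, per the nn.Conv2d shape formula."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


h = conv_out(50, kernel=3, stride=2, padding=4, dilation=3)
w = conv_out(100, kernel=5, stride=2, padding=2, dilation=1)
print(h, w)   # 26 50

m = nn.Conv2d(16, 33, (3, 5), stride=(2, 2), padding=(4, 2), dilation=(3, 1))
out = m(torch.rand(20, 16, 50, 100))
print(tuple(out.shape))   # (20, 33, 26, 50)
```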
Convolution layers¶
nn.Conv2d example
- in_channels: number of input channels
- out_channels: number of output channels
- kernel_size: size of the kernel (filter)
nn.Conv2d(in_channels=1, out_channels=20, kernel_size=5, stride=1)
Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1))
layer = nn.Conv2d(1, 20, 5, 1).to(torch.device('cpu'))
layer
Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1))
Inspecting the weight
weight = layer.weight
weight.shape
torch.Size([20, 1, 5, 5])
The weight must be detached with detach() before it can be converted with numpy()
weight = weight.detach()
weight = weight.numpy()
weight.shape
(20, 1, 5, 5)
plt.imshow(weight[0, 0, :, :], 'jet')
plt.colorbar()
plt.show()
print(images.shape)
print(images[0].size())
input_image = torch.squeeze(images[0])
print(input_image.size())
torch.Size([8, 1, 28, 28])
torch.Size([1, 28, 28])
torch.Size([28, 28])
input_data = torch.unsqueeze(images[0], dim=0)
print(input_data.size())
output_data = layer(input_data)
output = output_data.detach()
output_arr = output.numpy()
output_arr.shape
torch.Size([1, 1, 28, 28])
(1, 20, 24, 24)
plt.figure(figsize=(15, 30))
plt.subplot(131)
plt.title("input")
plt.imshow(input_image, 'gray')
plt.subplot(132)
plt.title("Weight")
plt.imshow(weight[0, 0, :, :], 'jet')
plt.subplot(133)
plt.title("Output")
plt.imshow(output_arr[0, 0, :, :], 'gray')
plt.show()
Pooling layers¶
F.max_pool2d
- stride
- kernel_size
torch.nn.MaxPool2d is also widely used
import torch.nn.functional as F
pool = F.max_pool2d(output, 2, 2)
pool.shape
torch.Size([1, 20, 12, 12])
- A MaxPool layer has no weights, and its input here was already detached, so the result can be converted with numpy() directly
pool_arr = pool.numpy()
pool_arr.shape
(1, 20, 12, 12)
plt.figure(figsize=(10, 15))
plt.subplot(121)
plt.title("Input")
plt.imshow(input_image, 'gray')
plt.subplot(122)
plt.title("output")
plt.imshow(pool_arr[0, 0, :, :], 'gray')
plt.show()
Linear layers¶
nn.Linear works on flat feature vectors, so the image has to be flattened with .view() first
flatten = input_image.view(1, 28 * 28)
flatten.shape
torch.Size([1, 784])
lin = nn.Linear(784, 10)(flatten)
lin.shape
torch.Size([1, 10])
lin
tensor([[-0.5786, 0.0164, -0.2729, 0.3160, 0.2966, -0.1504, -0.1433, 0.4315,
0.3447, -0.4311]], grad_fn=<AddmmBackward0>)
plt.imshow(lin.detach().numpy(), 'jet')
plt.colorbar()
plt.show()
Non-linear activations¶
Activation functions such as F.softmax
with torch.no_grad():
    flatten = input_image.view(1, 28 * 28)
    lin = nn.Linear(784, 10)(flatten)
    softmax = F.softmax(lin, dim=1)
softmax
tensor([[0.0863, 0.0940, 0.0719, 0.1549, 0.0652, 0.0969, 0.0874, 0.0771, 0.1595,
0.1067]])
np.sum(softmax.numpy())
1.0
F.relu
- Applies the ReLU function
- Also available as the nn.ReLU module
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
inputs = torch.randn(4, 3, 28, 28).to(device)
inputs.shape
torch.Size([4, 3, 28, 28])
layer = nn.Conv2d(3, 20, 5, 1).to(device)
output = F.relu(layer(inputs))
output.shape
torch.Size([4, 20, 24, 24])
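Types of neural networks¶
Defining a model¶
Defining a class that inherits from nn.Module¶
- Define a class that inherits from nn.Module
- __init__(): defines the modules, activation functions, and other components the model uses
- forward(): defines the computation the model performs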
class Model(nn.Module):
    def __init__(self, inputs):
        super(Model, self).__init__()
        self.layer = nn.Linear(inputs, 1)
        self.activation = nn.Sigmoid()

    def forward(self, x):
        x = self.layer(x)
        x = self.activation(x)
        return x
model = Model(1)
print(list(model.children()))
print(list(model.modules()))
[Linear(in_features=1, out_features=1, bias=True), Sigmoid()]
[Model(
(layer): Linear(in_features=1, out_features=1, bias=True)
(activation): Sigmoid()
), Linear(in_features=1, out_features=1, bias=True), Sigmoid()]
Defining a network with nn.Sequential¶
- An nn.Sequential object runs the modules it contains in order
- The network blocks used by the model can be defined with nn.Sequential in __init__()
- This keeps the computation in forward() readable
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=64, kernel_size=5),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2)
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(in_channels=64, out_channels=30, kernel_size=5),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2)
        )
        self.layer3 = nn.Sequential(
            nn.Linear(in_features=30*5*5, out_features=10, bias=True),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = x.view(x.shape[0], -1)
        x = self.layer3(x)
        return x
model = Model()
print(list(model.children()))
print(list(model.modules()))
[Sequential(
(0): Conv2d(3, 64, kernel_size=(5, 5), stride=(1, 1))
(1): ReLU(inplace=True)
(2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
), Sequential(
(0): Conv2d(64, 30, kernel_size=(5, 5), stride=(1, 1))
(1): ReLU(inplace=True)
(2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
), Sequential(
(0): Linear(in_features=750, out_features=10, bias=True)
(1): ReLU(inplace=True)
)]
[Model(
(layer1): Sequential(
(0): Conv2d(3, 64, kernel_size=(5, 5), stride=(1, 1))
(1): ReLU(inplace=True)
(2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
(layer2): Sequential(
(0): Conv2d(64, 30, kernel_size=(5, 5), stride=(1, 1))
(1): ReLU(inplace=True)
(2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
(layer3): Sequential(
(0): Linear(in_features=750, out_features=10, bias=True)
(1): ReLU(inplace=True)
)
), Sequential(
(0): Conv2d(3, 64, kernel_size=(5, 5), stride=(1, 1))
(1): ReLU(inplace=True)
(2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
), Conv2d(3, 64, kernel_size=(5, 5), stride=(1, 1)), ReLU(inplace=True), MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False), Sequential(
(0): Conv2d(64, 30, kernel_size=(5, 5), stride=(1, 1))
(1): ReLU(inplace=True)
(2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
), Conv2d(64, 30, kernel_size=(5, 5), stride=(1, 1)), ReLU(inplace=True), MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False), Sequential(
(0): Linear(in_features=750, out_features=10, bias=True)
(1): ReLU(inplace=True)
), Linear(in_features=750, out_features=10, bias=True), ReLU(inplace=True)]
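PyTorch pretrained models¶
Model parameters¶
Loss function¶
- Measures the error between predicted and actual values
- An indicator of how well training is progressing
- The quantity minimized while the model is trained; a measure of success on the given problem
- The learnable parameters are adjusted based on the value of the loss function
- In optimization theory, the function to be minimized
- A differentiable function is used
- Main loss functions in PyTorch
- torch.nn.BCELoss: used for binary classification
- torch.nn.CrossEntropyLoss: used for multi-class classification
- torch.nn.MSELoss: used for regression models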
criterion = nn.MSELoss()
criterion = nn.CrossEntropyLoss()
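The three losses listed above expect differently shaped inputs, which is a common source of errors. The sketch below calls each one on small hand-written tensors; all values are hypothetical and chosen only to show the expected shapes and dtypes.

```python
import torch
import torch.nn as nn

# Regression: predictions and targets have the same shape
mse = nn.MSELoss()
pred = torch.tensor([2.5, 0.0, 2.0])
target = torch.tensor([3.0, -0.5, 2.0])
mse_loss = mse(pred, target)          # mean of squared errors
print(mse_loss)

# Multi-class classification: raw logits of shape (batch, classes)
# and integer class indices of shape (batch,)
ce = nn.CrossEntropyLoss()
logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 0.3, 3.0]])
labels = torch.tensor([0, 2])
ce_loss = ce(logits, labels)
print(ce_loss)

# Binary classification: BCELoss expects probabilities in [0, 1]
bce = nn.BCELoss()
probs = torch.sigmoid(torch.tensor([0.8, -1.2]))
bce_loss = bce(probs, torch.tensor([1.0, 0.0]))
print(bce_loss)
```

Note that CrossEntropyLoss takes unnormalized logits and applies log-softmax internally, while BCELoss takes values that have already passed through a sigmoid.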
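Optimizer¶
- Decides how the model is updated based on the loss function (implements a particular variant of stochastic gradient descent)
- The optimizer updates the parameters it was given when step() is called
- All optimizers build on the torch.optim.Optimizer(params, defaults) base class
- zero_grad() clears the gradients of the parameters held by the optimizer (zeroing them or, in recent versions, setting them to None by default)
- torch.optim.lr_scheduler adjusts the learning rate as epochs progress
- Main optimizers in PyTorch: optim.Adadelta, optim.Adagrad, optim.Adam, optim.RMSprop, optim.SGD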
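The zero_grad(), backward(), and step() calls described above usually appear together in a training loop. The following is a minimal sketch on a hypothetical toy problem (fitting y = 2x + 1 with one linear layer); the data, learning rate, and step count are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical toy problem: fit y = 2x + 1 with a single linear layer
x = torch.linspace(-1, 1, 64).unsqueeze(1)
y = 2 * x + 1

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

losses = []
for step in range(200):
    optimizer.zero_grad()            # clear gradients from the previous step
    loss = criterion(model(x), y)    # forward pass
    loss.backward()                  # compute gradients
    optimizer.step()                 # update parameters
    losses.append(loss.item())

print(losses[0], losses[-1])
```

Calling zero_grad() at the start of each iteration matters because .grad accumulates across backward() calls.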
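Learning rate scheduler¶
- Adjusts the learning rate during training according to specific conditions
- For example, decaying the learning rate after a certain number of iterations, or lowering it when approaching the global minimum
- Types of learning rate schedulers in PyTorch
- optim.lr_scheduler.LambdaLR: scales the initial learning rate by the value returned by a lambda function
- optim.lr_scheduler.StepLR: decays the learning rate by a factor of gamma every fixed number of steps
- optim.lr_scheduler.MultiStepLR: similar to StepLR, but decays by gamma only at the specified epochs rather than at fixed intervals
- optim.lr_scheduler.ExponentialLR: multiplies the previous learning rate by gamma every epoch
- optim.lr_scheduler.CosineAnnealingLR: varies the learning rate along a cosine curve, so it can both rise and fall
- optim.lr_scheduler.ReduceLROnPlateau: changes the learning rate dynamically depending on whether training is still improving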
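As an illustration of how a scheduler is driven, the sketch below attaches StepLR to an SGD optimizer and records the learning rate over five epochs. The model, step_size, and gamma values are assumptions chosen so the decay is easy to see.

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(1, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.1)

lrs = []
for epoch in range(5):
    lrs.append(scheduler.get_last_lr()[0])   # learning rate used in this epoch
    optimizer.step()     # normally preceded by the forward and backward passes
    scheduler.step()     # advance the schedule once per epoch

print(lrs)   # approximately [0.1, 0.1, 0.01, 0.01, 0.001]
```

scheduler.step() is called after optimizer.step(), once per epoch, which is the order recent PyTorch versions expect.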
Metrics¶
- Used to monitor the training and test phases of a model
!pip install torchmetrics
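Libraries such as torchmetrics package metrics as reusable objects. To show what such a metric computes, the sketch below calculates classification accuracy by hand in plain PyTorch, without using the torchmetrics API; the logits and labels are hypothetical.

```python
import torch

# Hypothetical logits for 4 samples and 3 classes
logits = torch.tensor([[2.0, 0.1, 0.3],
                       [0.2, 1.5, 0.1],
                       [0.3, 0.2, 0.9],
                       [1.1, 0.4, 0.2]])
labels = torch.tensor([0, 1, 2, 1])

preds = logits.argmax(dim=1)                      # predicted class per sample
accuracy = (preds == labels).float().mean().item()
print(preds.tolist(), accuracy)   # [0, 1, 2, 0] 0.75
```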
import torchmetrics
preds = torch.rand(10, 5).softmax(dim=-1)
target = torch.randint(5, (10, ))
print(preds, target)
acc = torchmetrics.functional.accuracy(preds, target, task="multiclass", num_classes=5)
print(acc)
tensor([[0.1885, 0.2010, 0.1557, 0.1867, 0.2681],
[0.1426, 0.1439, 0.2550, 0.3109, 0.1475],
[0.1143, 0.1913, 0.2336, 0.2436, 0.2172],
[0.1712, 0.1424, 0.1882, 0.2578, 0.2404],
[0.1272, 0.2463, 0.2691, 0.1844, 0.1730],
[0.1754, 0.1645, 0.1275, 0.2694, 0.2633],
[0.1431, 0.2439, 0.2712, 0.1571, 0.1847],
[0.2170, 0.2167, 0.1969, 0.2300, 0.1394],
[0.2083, 0.2762, 0.1354, 0.2309, 0.1492],
[0.2302, 0.2255, 0.2538, 0.1488, 0.1417]]) tensor([3, 0, 1, 4, 2, 3, 3, 2, 0, 2])
tensor(0.3000)
metric = torchmetrics.Accuracy(task="multiclass", num_classes=5)
n_batches = 10
for i in range(n_batches):
    preds = torch.rand(10, 5).softmax(dim=1)
    target = torch.randint(5, (10, ))
    # Calling the metric object returns the accuracy of the current batch
    # and also accumulates state for a later metric.compute()
    acc = metric(preds, target)
    print(acc)
tensor(0.1000)
tensor(0.1000)
tensor(0.1000)
tensor(0.1000)
tensor(0.2000)
tensor(0.1000)
tensor(0.3000)
tensor(0.1000)
tensor(0.4000)
tensor(0.3000)
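To make clear what multiclass accuracy measures, here is a minimal plain-PyTorch sketch: take the argmax of each prediction row, compare it with the target, and accumulate the counts across batches. The helper `count_correct` and the two hand-written batches are hypothetical and only stand in for the random data above. This is not how torchmetrics is implemented internally.

```python
import torch

def count_correct(preds, target):
    """Return (number of correct predictions, batch size) for one batch."""
    predicted = preds.argmax(dim=-1)  # class with the highest score in each row
    return (predicted == target).sum().item(), target.numel()

# Two small hand-written batches (illustrative data, not from the notebook)
batches = [
    (torch.tensor([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]), torch.tensor([0, 1])),
    (torch.tensor([[0.3, 0.3, 0.4], [0.6, 0.3, 0.1]]), torch.tensor([0, 2])),
]

correct, total = 0, 0
batch_accs = []
for preds, target in batches:
    c, n = count_correct(preds, target)
    batch_accs.append(c / n)   # per-batch accuracy
    correct += c               # running totals for the overall accuracy
    total += n

overall_acc = correct / total
print(batch_accs, overall_acc)
```

The per-batch values can swing widely on small batches, so the accumulated value is usually the one worth reporting at the end of an epoch.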
Linear Regression Model¶
Data Generation¶
import numpy as np
import matplotlib.pyplot as plt
# 200 points with y = x plus Gaussian noise
X = torch.randn(200, 1) * 10
y = X + 3 * torch.randn(200, 1)
plt.scatter(X.numpy(), y.numpy())
plt.ylabel('y')
plt.xlabel('x')
plt.grid()
plt.show()
Model Definition and Parameters¶
class LinearRegressionModel(nn.Module):
    def __init__(self):
        super(LinearRegressionModel, self).__init__()
        self.linear = nn.Linear(1, 1)
    def forward(self, x):
        pred = self.linear(x)
        return pred
model = LinearRegressionModel()
print(model)
print(list(model.parameters()))
LinearRegressionModel(
(linear): Linear(in_features=1, out_features=1, bias=True)
)
[Parameter containing:
tensor([[0.7913]], requires_grad=True), Parameter containing:
tensor([-0.3978], requires_grad=True)]
w, b = model.parameters()
w1, b1 = w[0][0].item(), b[0].item()
x1 = np.array([-30, 30])
y1 = w1 * x1 + b1
plt.plot(x1, y1, 'r')
plt.scatter(X, y)
plt.grid()
plt.show()
Loss Function and Optimizer¶
import torch.optim as optim
import matplotlib.pyplot as plt
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.001)
Model Training¶
epochs = 100
losses = []
for epoch in range(epochs):
    optimizer.zero_grad()
    y_pred = model(X)
    loss = criterion(y_pred, y)
    losses.append(loss.item())
    loss.backward()
    optimizer.step()
plt.plot(range(epochs), losses)
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.show()
w1, b1 = w[0][0].item(), b[0].item()
x1 = np.array([-30, 30])
y1 = w1 * x1 + b1
plt.plot(x1, y1, 'r')
plt.scatter(X, y)
plt.grid()
plt.show()
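The steps above can be condensed into one self-contained sketch without plotting. It uses the same data recipe, loss, and optimizer settings, but swaps the `LinearRegressionModel` class for a bare `nn.Linear(1, 1)` and fixes a random seed; both are assumptions made for brevity and reproducibility.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # fixed seed, an assumption added for reproducibility

# Same synthetic data recipe as above: y = x + noise
X = torch.randn(200, 1) * 10
y = X + 3 * torch.randn(200, 1)

model = nn.Linear(1, 1)  # stands in for LinearRegressionModel
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.001)

losses = []
for epoch in range(100):
    optimizer.zero_grad()           # clear gradients from the previous step
    loss = criterion(model(X), y)   # full-batch mean squared error
    losses.append(loss.item())
    loss.backward()                 # compute gradients
    optimizer.step()                # update weight and bias

w = model.weight.item()
b = model.bias.item()
print(f"w={w:.3f}, b={b:.3f}, first loss={losses[0]:.2f}, last loss={losses[-1]:.2f}")
```

Since the data were generated with slope 1, the learned weight should end up close to 1, which is what the red line in the final plot reflects.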
FashionMNIST Classification Model¶
GPU setup
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device
device(type='cuda')
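Defining `device` does not move anything by itself. The model and every batch have to be sent to it with `.to(device)`. The following is a minimal sketch of that pattern; the `nn.Linear(4, 2)` model and the random batch are placeholders, not the network or data used in this post.

```python
import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(4, 2).to(device)     # parameters now live on `device`
batch = torch.randn(8, 4).to(device)   # inputs must be on the same device
out = model(batch)
print(out.device, out.shape)
```

In a training loop the same idea would apply to both the inputs and the labels of each batch before the forward pass.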
Data Loading¶
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5, ), (0.5, ))])
trainset = datasets.FashionMNIST(root='/content/',
                                 train=True, download=True,
                                 transform=transform)
testset = datasets.FashionMNIST(root='/content/',
                                train=False, download=True,
                                transform=transform)
train_loader = DataLoader(trainset, batch_size=128, shuffle=True, num_workers=2)
test_loader = DataLoader(testset, batch_size=128, shuffle=False, num_workers=2)
images, labels = next(iter(train_loader))
images.shape, labels.shape
(torch.Size([128, 1, 28, 28]), torch.Size([128]))
labels_map = {
    0: 'T-Shirt',
    1: 'Trouser',
    2: 'Pullover',
    3: 'Dress',
    4: 'Coat',
    5: 'Sandal',
    6: 'Shirt',
    7: 'Sneaker',
    8: 'Bag',
    9: 'Ankle Boot'
}
figure = plt.figure(figsize=(12, 12))
cols, rows = 4, 4
for i in range(1, cols * rows + 1):
    image = images[i].squeeze()
    label_idx = labels[i].item()
    label = labels_map[label_idx]
    figure.add_subplot(rows, cols, i)
    plt.title(label)
    plt.axis('off')
    plt.imshow(image, cmap='gray')
plt.show()
Model Definition and Parameters¶
class NeuralNet(nn.Module):
    def __init__(self):
        super(NeuralNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
    def num_flat_features(self, x):
        size = x.size()[1:]
        num_features = 1
        for s in size:
            num_features *= s
        return num_features
net = NeuralNet()
print(net)
NeuralNet(
(conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
(conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
(fc1): Linear(in_features=400, out_features=120, bias=True)
(fc2): Linear(in_features=120, out_features=84, bias=True)
(fc3): Linear(in_features=84, out_features=10, bias=True)
)
params = list(net.parameters())
print(len(params))
print(params[0].size())
10
torch.Size([6, 1, 3, 3])
input = torch.randn(1, 1, 28, 28)
out = net(input)
print(out)
tensor([[-0.0871, 0.1913, -0.0433, -0.0459, 0.0379, 0.0232, 0.0585, -0.0060,
-0.0409, 0.0344]], grad_fn=<AddmmBackward0>)
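The `16 * 5 * 5` input size of `fc1` follows from how each layer shrinks a 28 x 28 image: a 3 x 3 convolution without padding removes 2 pixels per side length, and 2 x 2 max pooling halves it (rounding down). The sketch below traces those shapes using standalone layers with the same settings as `NeuralNet`. The `shapes` list is only a helper introduced for this illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(1, 6, 3)    # same settings as NeuralNet.conv1
conv2 = nn.Conv2d(6, 16, 3)   # same settings as NeuralNet.conv2

x = torch.randn(1, 1, 28, 28)
shapes = [tuple(x.shape)]

x = F.relu(conv1(x))
shapes.append(tuple(x.shape))   # 28 -> 26 (3x3 kernel, no padding)
x = F.max_pool2d(x, 2)
shapes.append(tuple(x.shape))   # 26 -> 13
x = F.relu(conv2(x))
shapes.append(tuple(x.shape))   # 13 -> 11
x = F.max_pool2d(x, 2)
shapes.append(tuple(x.shape))   # 11 -> 5 (floor division)

flat_features = x.view(1, -1).shape[1]
for s in shapes:
    print(s)
print(flat_features)
```

The final flattened size is 16 channels times 5 x 5 positions, which matches `in_features=400` in the printed model.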
Loss Function and Optimizer¶
import torch.optim as optim
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
Model Training¶
Check the number of batches
total_batch = len(train_loader)
print(total_batch)
469
for epoch in range(10):
    running_loss = 0.0
    for i, data in enumerate(train_loader, 0):
        inputs, labels = data
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        if i % 100 == 99:
            # Note: the sum over 100 batches is divided by 2000, so the printed value
            # is 1/20 of the mean batch loss; use running_loss / 100 for the mean
            print('Epoch : {}, Iter: {}, Loss: {}'.format(epoch+1, i+1, running_loss/2000))
            running_loss = 0.0
Epoch : 1, Iter: 100, Loss: 0.11502504658699035
Epoch : 1, Iter: 200, Loss: 0.11446980822086335
Epoch : 1, Iter: 300, Loss: 0.11365943157672882
Epoch : 1, Iter: 400, Loss: 0.11183640277385712
Epoch : 2, Iter: 100, Loss: 0.09142237794399262
Epoch : 2, Iter: 200, Loss: 0.05995461705327034
Epoch : 2, Iter: 300, Loss: 0.04482015699148178
Epoch : 2, Iter: 400, Loss: 0.040373206377029416
Epoch : 3, Iter: 100, Loss: 0.03683351635932922
Epoch : 3, Iter: 200, Loss: 0.03587702712416649
Epoch : 3, Iter: 300, Loss: 0.03434920717775822
Epoch : 3, Iter: 400, Loss: 0.0334646110534668
Epoch : 4, Iter: 100, Loss: 0.032616506576538085
Epoch : 4, Iter: 200, Loss: 0.03125044773519039
Epoch : 4, Iter: 300, Loss: 0.032371528938412664
Epoch : 4, Iter: 400, Loss: 0.030714029729366304
Epoch : 5, Iter: 100, Loss: 0.03034444074332714
Epoch : 5, Iter: 200, Loss: 0.029525935858488082
Epoch : 5, Iter: 300, Loss: 0.02892155006527901
Epoch : 5, Iter: 400, Loss: 0.028165019646286964
Epoch : 6, Iter: 100, Loss: 0.028564947932958603
Epoch : 6, Iter: 200, Loss: 0.027898967817425728
Epoch : 6, Iter: 300, Loss: 0.02741909073293209
Epoch : 6, Iter: 400, Loss: 0.0271149540245533
Epoch : 7, Iter: 100, Loss: 0.026726744800806047
Epoch : 7, Iter: 200, Loss: 0.026773720502853395
Epoch : 7, Iter: 300, Loss: 0.026580355644226075
Epoch : 7, Iter: 400, Loss: 0.02655362620949745
Epoch : 8, Iter: 100, Loss: 0.026225275576114655
Epoch : 8, Iter: 200, Loss: 0.025761234283447267
Epoch : 8, Iter: 300, Loss: 0.024994379952549935
Epoch : 8, Iter: 400, Loss: 0.024353671863675118
Epoch : 9, Iter: 100, Loss: 0.02474271248281002
Epoch : 9, Iter: 200, Loss: 0.02464385850727558
Epoch : 9, Iter: 300, Loss: 0.02411820262670517
Epoch : 9, Iter: 400, Loss: 0.024146224185824395
Epoch : 10, Iter: 100, Loss: 0.02376581420004368
Epoch : 10, Iter: 200, Loss: 0.023817880272865296
Epoch : 10, Iter: 300, Loss: 0.023328649133443832
Epoch : 10, Iter: 400, Loss: 0.02291729202866554
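For comparison, the sketch below runs the same loop structure on a small synthetic dataset and divides the running loss by the number of batches in each reporting window, so the logged value is the mean batch loss. The random tensors, the single-layer model, and the name `log_every` are all assumptions standing in for FashionMNIST and `NeuralNet`.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in for FashionMNIST (hypothetical data): 256 images, 10 classes
images = torch.randn(256, 1, 28, 28)
labels = torch.randint(10, (256,))
loader = DataLoader(TensorDataset(images, labels), batch_size=32, shuffle=True)

# Small placeholder model, not the NeuralNet defined above
net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

log_every = 4       # report after this many batches
mean_losses = []
for epoch in range(2):
    running_loss = 0.0
    for i, (inputs, targets) in enumerate(loader):
        optimizer.zero_grad()
        loss = criterion(net(inputs), targets)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        if i % log_every == log_every - 1:
            mean_losses.append(running_loss / log_every)  # divide by the window size
            running_loss = 0.0
print(mean_losses)
```

With 8 batches per epoch and a window of 4, this records two values per epoch, each directly comparable to a single batch's cross-entropy loss.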
Saving and Loading the Model¶
torch.save
: saves net.state_dict() to a file
torch.load
: reads the saved state back from the file; load_state_dict then loads it into the model
PATH = './fashion_mnist.pth'
torch.save(net.state_dict(),PATH)
net = NeuralNet()
net.load_state_dict(torch.load(PATH))
<All keys matched successfully>
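A quick way to confirm that a save and load round trip preserves the weights is to compare the two state dicts. This sketch uses a small `nn.Linear` model and a temporary directory. The file name `model.pth` is hypothetical and unrelated to the path used above.

```python
import os
import tempfile
import torch
import torch.nn as nn

net = nn.Linear(3, 2)   # placeholder model

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'model.pth')        # hypothetical file name
    torch.save(net.state_dict(), path)           # save only the parameters
    restored = nn.Linear(3, 2)                   # fresh model with the same architecture
    restored.load_state_dict(torch.load(path))   # copy the saved parameters in

# Every tensor in the restored state dict should equal the original
same = all(
    torch.equal(a, b)
    for a, b in zip(net.state_dict().values(), restored.state_dict().values())
)
print(same)
```

Because only the state dict is stored, the receiving model must be constructed with the same architecture before `load_state_dict` is called.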
Model Testing¶
def imshow(image):
    image = image / 2 + 0.5  # undo Normalize((0.5, ), (0.5, ))
    npimg = image.numpy()
    fig = plt.figure(figsize=(16, 8))
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()
import torchvision
dataiter = iter(test_loader)
images, labels = next(dataiter)
imshow(torchvision.utils.make_grid(images[:6]))
outputs = net(images)
_, predicted = torch.max(outputs, 1)
print(predicted)
tensor([9, 2, 1, 1, 6, 1, 2, 6, 5, 7, 4, 5, 5, 3, 4, 1, 2, 6, 8, 0, 2, 7, 7, 5,
1, 2, 6, 0, 9, 4, 8, 8, 3, 3, 8, 0, 7, 5, 7, 9, 0, 1, 0, 9, 6, 7, 2, 1,
2, 6, 6, 2, 5, 8, 4, 2, 8, 6, 8, 0, 7, 7, 8, 5, 1, 1, 0, 4, 7, 8, 7, 0,
2, 6, 4, 3, 1, 2, 8, 4, 1, 8, 5, 9, 5, 0, 3, 2, 0, 2, 5, 3, 6, 7, 1, 8,
0, 1, 4, 2, 3, 4, 7, 6, 7, 8, 5, 9, 9, 4, 2, 5, 7, 0, 5, 2, 8, 4, 7, 8,
0, 0, 9, 9, 3, 0, 8, 4])
print(''.join('{}. '.format(labels_map[int(predicted[j].numpy())]) for j in range(6)))
Ankle Boot. Pullover. Trouser. Trouser. Shirt. Trouser.
correct = 0
total = 0
with torch.no_grad():
    for data in test_loader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print(100 * correct / total)
81.81