Example of a Layer

  • ์ด์ œ ์›๋ž˜ ๋ฐฐ์› ๋˜ ์‹ ๊ฒฝ๋ง์˜ ๋…ธ๋“œ์—์„œ ํ•˜๋Š” ํ™œ๋™์œผ๋กœ ๋Œ์•„์˜ค์ž.
  • ๊ฒฐ๊ตญ ์ด ํ•„ํ„ฐ๋Š” ๊ฐ€์ค‘์น˜๋“ค์˜ ๋ชจ์ž„์ด๊ณ ,
  • ์›๋ž˜ ๋ฐ์ดํ„ฐ์—์„œ ์ด ๊ฐ€์ค‘์น˜๋ฅผ ๊ณฑํ•œ๋‹ค์Œ์— ๋”ํ•˜๋Š” ํ–‰์œ„๋Š” ํ•œ ๋…ธ๋“œ์— ์ด ๋ฐ์ดํ„ฐ๋“ค์ด ๋“ค์–ด๊ฐ€๋Š” ๊ฒƒ์ด๋‹ค.
  • ํ•˜๋‚˜์˜ ๋…ธ๋“œ์— ์—ฐ๊ฒฐ๋œ ๋งŽ์€ ๋…ธ๋“œ๋“ค์— ๊ฐ€์ค‘์น˜๋ฅผ ๊ณฑํ•˜๊ณ  ๋‹ค ๋”ํ•œ๋‹ค์Œ์— ์šฐ๋ฆฌ๋Š” ๋ญ˜ํ–ˆ์—ˆ์ง€?
  • Activation Function์—๋‹ค๊ฐ€ ์ด ๊ฐ’์„ ๋„ฃ๊ณ  ์ถœ๋ ฅ๊ฐ’์„ ์–ป์—ˆ๋‹ค.
  • ์ด๊ฑธ ์‹œ๊ฐํ™” ํ•ด๋ณด๋ฉด!

  • ์ด๋ ‡๊ฒŒ ๋œ๋‹ค.
  • ์›๋ž˜ ๊ฐ๊ฐ์˜ ๋…ธ๋“œ์— ๋Œ€ํ•ด Bias ํ•ญ์ด ์ƒ๊ธด๋‹ค๊ณ  ์•Œ์•˜๋˜ ๊ฒƒ๊ณผ ๋‹ฌ๋ฆฌ
  • ๊ตฌํ˜„์„ ์œ„ํ•ด์„œ๋Š” ์ถœ๋ ฅ์— ๋Œ€ํ•ด ๊ฐ™์€ Bias๋ฅผ ๋”ํ•ด์ฃผ๋Š” ๊ฒƒ์ด ํšจ์œจ์ ์ด๋‹ค.
  • ์ด ์—ฐ์‚ฐ์˜ ๊ฒฐ๊ณผ๋กœ ์šฐ๋ฆฌ๋Š” 4 x 4 x 2 ๋ฅผ ์–ป์„ ์ˆ˜ ์žˆ๋‹ค.
  • ๊ทธ๋ฆฌ๊ณ  ์ด ๊ฒƒ์ด Convolution network์˜ ํ•œ Layer๊ฐ€ ๋œ๋‹ค.
  • ์ด๊ฑธ ํ–‰๋ ฌ์˜ ํ˜•ํƒœ๋กœ ์ •๋ฆฌํ•ด ๋ณด๋ฉด,
Z^{[1]} = W^{[1]} a^{[0]} + b^{[1]}\\ a^{[1]} = g(Z^{[1]})\\ a^{[0]} = X \;(\text{input})\\ g = \text{ReLU}

Let's look at it once more as a picture.

  • In the end, the only thing that differs from full connectivity is how many units each node is connected to!
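The equations above can be sketched in NumPy. This is a minimal, loop-based sketch (not an efficient implementation), assuming a hypothetical 6 x 6 x 3 input and two 3 x 3 x 3 filters with stride 1 and no padding, which yields the 4 x 4 x 2 output described above:

```python
import numpy as np

def conv_forward(a_prev, W, b):
    """One conv layer: valid convolution (stride 1, no padding) + ReLU.
    a_prev: (H, W, C_in), W: (f, f, C_in, n_filters), b: (n_filters,)."""
    H, Wd, _ = a_prev.shape
    f, n_f = W.shape[0], W.shape[3]
    out_h, out_w = H - f + 1, Wd - f + 1
    Z = np.zeros((out_h, out_w, n_f))
    for i in range(out_h):
        for j in range(out_w):
            patch = a_prev[i:i + f, j:j + f, :]  # 3-D slice of the input
            for k in range(n_f):
                # one weighted sum per filter, plus that filter's single shared bias
                Z[i, j, k] = np.sum(patch * W[:, :, :, k]) + b[k]
    return np.maximum(Z, 0)  # g = ReLU

a0 = np.random.randn(6, 6, 3)      # hypothetical 6 x 6 x 3 input (a^[0])
W1 = np.random.randn(3, 3, 3, 2)   # two 3 x 3 x 3 filters (W^[1])
b1 = np.zeros(2)                   # one bias per filter (b^[1])
a1 = conv_forward(a0, W1, b1)
print(a1.shape)                    # (4, 4, 2)
```

Note how `b[k]` is a single scalar added at every spatial position of filter `k`'s output, matching the shared-bias point above.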

Example

  • If we have ten 3 x 3 x 3 filters, how many parameters are there?
  • One filter contains 27 weights, and one bias is applied per filter,
  • so each filter contributes 28 parameters.
  • With 10 such filters, the total number of parameters we need to tune is 280.
  • This is useful because, in a fully connected network, every node is connected to every other node,
  • so the number of parameters grows explosively as the number of inputs grows.
  • That made it hard to find the optimal values of all those parameters,
  • but now only a limited set of parameters is applied to each node,
  • i.e. the count is bounded by the filter size, so the problem of the parameter count
  • blowing up with the input size is solved.
  • That is, in the example above, even if the input is a 1000 x 1000 pixel image, the features I want to detect
  • are still captured by 10 filters, so they can be detected by tuning only 280 parameters!
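The count above can be written as a one-line formula. A small sketch, with an illustrative helper name of my own choosing:

```python
def conv_param_count(f, c_in, n_filters):
    """Parameters of one conv layer: f*f*c_in weights per filter, plus one bias each."""
    return (f * f * c_in + 1) * n_filters

# Ten 3 x 3 x 3 filters: (27 + 1) * 10
print(conv_param_count(3, 3, 10))  # 280, regardless of the input image size
```

The input image size never appears in the formula, which is exactly the point made above.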

Notation

When we are at the l-th convolution layer:

<Input>\\ n_H^{[l-1]} \times n_W^{[l-1]} \times n_C^{[l-1]}\\ \text{Because it is the activation of the previous layer, we use } l-1.\\ \text{The input is } a^{[l-1]} \text{, so:}\\ a^{[l-1]} : n_H^{[l-1]} \times n_W^{[l-1]} \times n_C^{[l-1]}
  • Here a collectively refers to the nodes of the previous layer.
<Filter>\\ f^{[l]} = \text{filter size}\\ p^{[l]} = \text{padding}\\ s^{[l]} = \text{stride}\\ \text{Each filter is } f^{[l]} \times f^{[l]} \times n_C^{[l-1]}\\ n_C^{[l]} = \text{number of filters}\\ \text{A filter is a matrix of weights, so:}\\ \text{Number of weights} = f^{[l]} \times f^{[l]} \times n_C^{[l-1]} \times n_C^{[l]}
  • ๊ฐ ํ•„ํ„ฐ์˜ ์ฑ„๋„์ˆ˜์™€, ์ „์ฒด ์ฑ„๋„ ๊ฐœ์ˆ˜๋ฅผ ์ž˜ ๋ณด์ž.
  • ๊ฐ€์ค‘์น˜ ์ด ๊ฐœ์ˆ˜์™€ ํ•„ํ„ฐ์™€์˜ ๊ด€๊ณ„๋„ ์ž˜ ๋ณด์ž.
<Bias>\\ \text{Every filter has one bias, so:}\\ \text{Number of biases} = n_C^{[l]}

<Output>\\ n_H^{[l]} \times n_W^{[l]} \times n_C^{[l]}\\ n_H^{[l]} = \left\lfloor \frac{n_H^{[l-1]} + 2p^{[l]} - f^{[l]}}{s^{[l]}} \right\rfloor + 1\\ n_W^{[l]} = \left\lfloor \frac{n_W^{[l-1]} + 2p^{[l]} - f^{[l]}}{s^{[l]}} \right\rfloor + 1\\ \text{The output is } a^{[l]} \text{, so:}\\ a^{[l]} : n_H^{[l]} \times n_W^{[l]} \times n_C^{[l]}
  • The key point here is that the number of channels of each filter must equal the number of channels of the previous layer's activation (the input).
  • And the number of filters determines the number of channels of the output!
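The output-shape formulas above can be checked with a small helper (the function name is my own, for illustration):

```python
import math

def conv_output_shape(n_h, n_w, n_c_prev, f, p, s, n_filters):
    """Shape of a^[l] given a^[l-1]'s shape and the layer's hyperparameters.
    Each filter must have n_c_prev channels; n_filters sets the output channels."""
    out_h = math.floor((n_h + 2 * p - f) / s) + 1
    out_w = math.floor((n_w + 2 * p - f) / s) + 1
    return out_h, out_w, n_filters

# The earlier example: 6 x 6 x 3 input, two 3 x 3 x 3 filters, p=0, s=1
print(conv_output_shape(6, 6, 3, f=3, p=0, s=1, n_filters=2))  # (4, 4, 2)
```

Note that `n_c_prev` constrains the filters' depth but does not appear in the output shape; only `n_filters` does.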

Example of Simple Convolutional Network

Construction

์ „์ฒด์ ์ธ ๊ตฌ์กฐ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

graph LR
A[X-input] --> |Conv1|B[Layer 1]
B --> |Conv2|C[Layer 2]
C --> |FullyConnected|D[Layer 3]
D --> |SoftMax|E[Prediction]

In the end, what we should focus on in a convolutional network is which hyperparameters are used to carry out this process.

์ด๊ฒƒ์ด ๊ถ๊ทน์ ์œผ๋กœ ๊ตฌ์กฐ๋ฅผ ๋ณ€ํ™”์‹œํ‚ค๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค.

Also, while the initial image has only the 3 RGB channels, the number of channels tends to grow as we pass through more layers.

์ง๊ด€์ ์œผ๋กœ๋Š” ์ฒ˜์Œ์—๋Š” ๊ฐ„๋‹จํ•œ ํ•˜์œ„ํŠน์„ฑ(์ˆ˜์ง์„ , ์ˆ˜ํ‰์„ ) ๋“ฑ๋งŒ ํƒ์ƒ‰ํ•˜๋‹ค๊ฐ€ ์ด๊ฒƒ๋“ค์˜ ์กฐํ•ฉ์œผ๋กœ ๋งŒ๋“ค์–ด์ง€๋Š” ์ˆ˜๋งŽ์€ ํŠน์„ฑ๋“ค์ด ์žˆ๊ฒŒ๋˜๋ฏ€๋กœ ์ ์  ๋งŽ์€ ํ•„ํ„ฐ(ํŠน์„ฑ)๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๋“ฏํ•˜๋‹ค.

What Are Parameters and Hyperparameters

ํŒŒ๋ผ๋ฏธํ„ฐ๋Š” ๊ฐ€์ค‘์น˜๋ฅผ ์–˜๊ธฐํ•œ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ๋Š” ํ•„ํ„ฐ์˜ ํฌ๊ธฐ, ์ŠคํŠธ๋ผ์ด๋“œ, ํŒจ๋”ฉ๊ณผ ๊ฐ™์€ ๊ฐ’์„ ์˜๋ฏธํ•œ๋‹ค.

That is, parameters are the variables we ultimately want to find, while hyperparameters are the variables needed along the way.

ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ์˜ ์˜ˆ์‹œ๋กœ๋Š” Gradient Deacent ์˜ learning rate๊ฐ€ ์žˆ๊ฒ ๋‹ค.