Setting initial weights

For a single unit y = w·x, ∂y/∂w = x and ∂y/∂x = w, so if x or w is 0, the corresponding gradient is 0 and learning stalls!

And if all the weights start at 0, the network cannot learn at all!
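A minimal NumPy sketch of the all-zeros problem, assuming a tiny 3-4-1 network with tanh hidden units, a squared-error loss, and random toy data (the architecture and data are illustrative, not from the original notes). The gradients are computed by hand with the chain rule.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (illustrative): 8 samples with 3 features and a scalar target.
x = rng.normal(size=(8, 3))
y = rng.normal(size=(8, 1))

# Every weight and bias starts at exactly 0.
W1, b1 = np.zeros((3, 4)), np.zeros(4)
W2, b2 = np.zeros((4, 1)), np.zeros(1)

# Forward pass: tanh hidden layer, linear output, mean squared error.
h = np.tanh(x @ W1 + b1)          # tanh(0) = 0 for every hidden unit
y_hat = h @ W2 + b2
loss = np.mean((y_hat - y) ** 2)

# Backward pass (manual chain rule).
d_out = 2 * (y_hat - y) / len(x)  # dL/dy_hat
dW2 = h.T @ d_out                 # zero, because h is all zeros
dh = d_out @ W2.T                 # zero, because W2 is all zeros
dW1 = x.T @ (dh * (1 - h ** 2))   # zero, because dh is all zeros

print("dW1 all zero:", np.allclose(dW1, 0))
print("dW2 all zero:", np.allclose(dW2, 0))
```

In this setup only the output bias b2 receives a nonzero gradient; W1, b1 and W2 stay at 0 after every update, so the hidden layer never learns anything.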

Restricted Boltzmann Machine (RBM)

So how should we initialize the weights?

One early answer was the RBM, and the idea behind it is simple.

Suppose we have a neural network with 3 inputs and 4 outputs. Training it as an RBM comes down to two passes:

the forward pass, which encodes the input into the 4 outputs (hidden units), and the backward pass, which uses the same weights to reconstruct the 3 inputs from those outputs. This backward pass is a reconstruction, not backpropagation of gradients.

What we want, then, are the weights and biases that make the reconstructed input as close as possible to the original input.

Using those learned weights and biases as the network's initial values is the RBM approach to initialization.
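A minimal NumPy sketch of the forward/backward idea, assuming binary (Bernoulli) units, 3 visible and 4 hidden units to match the example, and one-step contrastive divergence (CD-1), the usual way RBMs are trained. The toy data, learning rate, epoch count and helper names (`reconstruct`, `recon_error`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy binary data with 3 visible units (two repeating patterns).
data = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 1]], dtype=float)

n_visible, n_hidden = 3, 4
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
b_v = np.zeros(n_visible)   # visible (input) bias
b_h = np.zeros(n_hidden)    # hidden bias

def reconstruct(v):
    """Forward pass (encode), then backward pass (decode) with the same W."""
    h_prob = sigmoid(v @ W + b_h)
    return sigmoid(h_prob @ W.T + b_v)

def recon_error(v):
    return np.mean((v - reconstruct(v)) ** 2)

error_before = recon_error(data)

lr, n = 0.1, len(data)
for epoch in range(3000):
    # Forward pass: hidden probabilities and a binary sample from them.
    h0_prob = sigmoid(data @ W + b_h)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    # Backward pass: reconstruct the visible units, then re-encode them.
    v1_prob = sigmoid(h0 @ W.T + b_v)
    h1_prob = sigmoid(v1_prob @ W + b_h)
    # CD-1 update: data statistics minus reconstruction statistics.
    W += lr * (data.T @ h0_prob - v1_prob.T @ h1_prob) / n
    b_v += lr * np.mean(data - v1_prob, axis=0)
    b_h += lr * np.mean(h0_prob - h1_prob, axis=0)

error_after = recon_error(data)
print(f"reconstruction error: {error_before:.3f} -> {error_after:.3f}")
```

After training, W and b_h are the values that would be copied into the 3-to-4 layer of the real network as its starting point.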

RBM procedure

Pre-training

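To initialize the whole network, we apply the RBM to only two layers at a time. Working up through the network pair by pair, the weights tuned for each pair of layers become that pair's initial weights; the whole network is then fine-tuned with ordinary backpropagation.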
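A minimal sketch of this greedy, two-layers-at-a-time pre-training, assuming binary RBMs trained with CD-1 and a hypothetical 6-4-3 network on random toy inputs. The helper name `train_rbm`, the layer sizes and all hyperparameters are illustrative, not from the original notes.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_rbm(v_data, n_hidden, lr=0.1, epochs=500):
    """Hypothetical helper: train one RBM with CD-1, return (W, hidden bias)."""
    n_visible = v_data.shape[1]
    W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
    b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)
    n = len(v_data)
    for _ in range(epochs):
        h0_prob = sigmoid(v_data @ W + b_h)
        h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
        v1_prob = sigmoid(h0 @ W.T + b_v)
        h1_prob = sigmoid(v1_prob @ W + b_h)
        W += lr * (v_data.T @ h0_prob - v1_prob.T @ h1_prob) / n
        b_v += lr * np.mean(v_data - v1_prob, axis=0)
        b_h += lr * np.mean(h0_prob - h1_prob, axis=0)
    return W, b_h

# Hypothetical network: 6 inputs -> 4 hidden -> 3 hidden.
layer_sizes = [6, 4, 3]
x = (rng.random((100, 6)) < 0.5).astype(float)   # toy binary inputs

init_params = []
layer_input = x
for n_hidden in layer_sizes[1:]:
    # Pre-train only this pair of layers (visible = current layer's input).
    W, b_h = train_rbm(layer_input, n_hidden)
    init_params.append((W, b_h))
    # The hidden activations become the input for the next pair.
    layer_input = sigmoid(layer_input @ W + b_h)

for i, (W, b) in enumerate(init_params):
    print(f"layer {i}: W {W.shape}, b {b.shape}")
```

Each `(W, b_h)` pair would then be loaded into the matching layer of the feedforward network before fine-tuning. Passing hidden probabilities, rather than binary samples, to the next RBM is one common choice and is assumed here.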

Xavier initialization

You don't actually need all that!

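It's enough not to make the initial weights too small or too large: too small and the signal shrinks toward 0 layer by layer, too large and the units saturate. Scaling the random weights by the number of inputs keeps each layer's output in a sensible range.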

Number of inputs = fan_in, number of outputs = fan_out

import numpy as np
W = np.random.randn(fan_in, fan_out)/np.sqrt(fan_in)
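A small experiment comparing "too small", "too large" and Xavier scaling, assuming 10 tanh layers of width 500 and standard-normal inputs (depth, width, and the 0.01 and 1.0 scales are illustrative choices). The 1/sqrt(fan_in) rule above is the simplified form; Glorot and Bengio's original uses variance 2/(fan_in + fan_out).

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_tanh(x, scale_fn, n_layers=10, width=500):
    """Push x through n_layers tanh layers; return the last activations."""
    h = x
    for _ in range(n_layers):
        fan_in, fan_out = h.shape[1], width
        W = rng.standard_normal((fan_in, fan_out)) * scale_fn(fan_in)
        h = np.tanh(h @ W)
    return h

x = rng.standard_normal((1000, 500))

h_small = forward_tanh(x, lambda fan_in: 0.01)                     # too small
h_large = forward_tanh(x, lambda fan_in: 1.0)                      # too large
h_xavier = forward_tanh(x, lambda fan_in: 1.0 / np.sqrt(fan_in))   # Xavier

for name, h in [("small", h_small), ("large", h_large), ("xavier", h_xavier)]:
    saturated = np.mean(np.abs(h) > 0.99)   # fraction of units stuck near +-1
    print(f"{name:>6}: std={h.std():.2e}, saturated fraction={saturated:.2f}")
```

With 0.01 the activations collapse toward 0 (so gradients vanish too), with 1.0 most tanh units sit near ±1 where their derivative is almost 0, and with Xavier scaling the last layer still carries a usable, unsaturated signal.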

He initialization

W = np.random.randn(fan_in, fan_out)/np.sqrt(fan_in/2)

This one tends to work well, especially with ReLU activations.
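A quick check of why He initialization suits ReLU, assuming 10 ReLU layers of width 500 and standard-normal inputs (illustrative sizes). ReLU zeroes out roughly half of its inputs, so with Xavier scaling the mean squared activation roughly halves at every layer; dividing by sqrt(fan_in/2) doubles the weight variance to compensate.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def forward_relu(x, scale_fn, n_layers=10, width=500):
    """Push x through n_layers ReLU layers; return the last activations."""
    h = x
    for _ in range(n_layers):
        fan_in = h.shape[1]
        W = rng.standard_normal((fan_in, width)) * scale_fn(fan_in)
        h = relu(h @ W)
    return h

x = rng.standard_normal((1000, 500))

h_xavier = forward_relu(x, lambda fan_in: 1.0 / np.sqrt(fan_in))
h_he = forward_relu(x, lambda fan_in: 1.0 / np.sqrt(fan_in / 2))   # He

# Mean of h**2 measures how much signal survives the 10 layers.
print(f"Xavier + ReLU: mean(h^2) = {np.mean(h_xavier ** 2):.4f}")
print(f"He + ReLU:     mean(h^2) = {np.mean(h_he ** 2):.4f}")
```

With Xavier scaling the signal shrinks to roughly 0.5^10 of its starting size, while He scaling keeps it near 1.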