Cost Minimization Code

์ €๋ฒˆ๊ธ€์—์„œ X, Y์— ์ตœ์ ํ™”๋œ W, b๋ฅผ ์ฐพ์•˜๋‹ค๋ฉด, ์ด๋ฒˆ์—๋Š” ๊ณ ์ •๋œ X, Y ์— ๋Œ€ํ•ด ๋‹ค๋ฅธ W๊ฐ’์„ ๋„ฃ์–ด์ฃผ๋ฉด์„œ Cost์˜ ๋ณ€ํ™”๋ฅผ ์‚ดํŽด๋ณด์ž.

# Lab 3 Minimizing Cost
import tensorflow as tf
import matplotlib.pyplot as plt
 
# ๊ณ ์ •๋œ ๋ฐ์ดํ„ฐ ์…‹
X = [1, 2, 3]
Y = [1, 2, 3]
 
# W๊ฐ’์„ ๋ณ€ํ™”์‹œํ‚ค๋ฉฐ ๊ด€์ฐฐํ•  ๊ฒƒ์ด๋ฏ€๋กœ placeholder๋กœ ์„ ์–ธํ•ด์ค€๋‹ค.
W = tf.placeholder(tf.float32)
 
# Our hypothesis for linear model X * W
# Declare it as a simple linear function.
hypothesis = X * W
 
# cost/loss function
# Use MSE (mean squared error) as the cost function.
cost = tf.reduce_mean(tf.square(hypothesis - Y))
 
# Variables for plotting cost function
# To see how the cost changes with W, create empty lists to store values while running the graph.
W_history = []
cost_history = []
 
# Launch the graph in a session.
with tf.Session() as sess:
    # Sweep W from -3.0 to 4.9 in steps of 0.1.
    for i in range(-30, 50):
        curr_W = i * 0.1
        curr_cost = sess.run(cost, feed_dict={W: curr_W})
 
        W_history.append(curr_W)
        cost_history.append(curr_cost)
 
# Show the cost function
plt.plot(W_history, cost_history)
plt.xlabel("W")
plt.ylabel("cost")
plt.show()

W = 1 ์ผ๋•Œ, cost๊ฐ€ ์ตœ์†Œ๊ฐ€ ๋œ๋‹ค.

Gradient Descent

W๋ฅผ ๋‹ค์Œ๊ณผ ๊ฐ™์€ ๋ฐฉ์‹์œผ๋กœ ์—…๋ฐ์ดํŠธํ•˜๋Š” ๊ฒƒ์ด W์˜ ์ตœ์†Œ๋ฅผ ์ฐพ๋„๋ก ํ•œ๋‹ค.

We can replace the code we used earlier to minimize (the GradientDescentOptimizer.minimize call) with the following hand-written update.

# Minimize: Gradient Descent using derivative: W -= learning_rate * derivative
# W must now be a tf.Variable (so it can be assigned to); X, Y become placeholders fed below.
x_data, y_data = [1, 2, 3], [1, 2, 3]
X, Y = tf.placeholder(tf.float32), tf.placeholder(tf.float32)
W = tf.Variable(tf.random_normal([1]), name="weight")
cost = tf.reduce_mean(tf.square(X * W - Y))
learning_rate = 0.1
gradient = tf.reduce_mean((W * X - Y) * X)  # d(cost)/dW, without the factor of 2
descent = W - learning_rate * gradient
update = W.assign(descent)  # running this op performs one descent step

Then we need to run it in a session; each sess.run(update) writes the new value into W:

# Launch the graph in a session.
with tf.Session() as sess:
    # Initializes global variables in the graph.
    sess.run(tf.global_variables_initializer())
 
    for step in range(21):
        _, cost_val, W_val = sess.run(
            [update, cost, W], feed_dict={X: x_data, Y: y_data}
        )
        print(step, cost_val, W_val)
 
"""
0 6.8174477 [1.6446238]
1 1.9391857 [1.3437994]
2 0.5515905 [1.1833596]
3 0.15689684 [1.0977918]
4 0.044628453 [1.0521556]
5 0.012694317 [1.0278163]
6 0.003610816 [1.0148354]
7 0.0010270766 [1.0079122]
8 0.00029214387 [1.0042198]
9 8.309683e-05 [1.0022506]
10 2.363606e-05 [1.0012003]
11 6.723852e-06 [1.0006402]
12 1.912386e-06 [1.0003414]
13 5.439676e-07 [1.000182]
14 1.5459062e-07 [1.000097]
15 4.3941593e-08 [1.0000517]
16 1.2491266e-08 [1.0000275]
17 3.5321979e-09 [1.0000147]
18 9.998237e-10 [1.0000079]
19 2.8887825e-10 [1.0000042]
20 8.02487e-11 [1.0000023]
"""
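Note how the error shrinks geometrically: for this data the update works out to W' − 1 = (1 − 0.1 · 14/3)(W − 1) ≈ 0.533 (W − 1), and that ratio is visible between consecutive values of W in the log (e.g. 0.3438 / 0.6446 ≈ 0.533).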

Let's try giving it a different initial weight.

# Lab 3 Minimizing Cost
import tensorflow as tf
 
# tf Graph Input
X = [1, 2, 3]
Y = [1, 2, 3]
 
# Set wrong model weights
W = tf.Variable(5.0)
 
# Linear model
hypothesis = X * W
 
# cost/loss function
cost = tf.reduce_mean(tf.square(hypothesis - Y))
 
# Minimize: Gradient Descent Optimizer
train = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(cost)
 
# Launch the graph in a session.
with tf.Session() as sess:
    # Initializes global variables in the graph.
    sess.run(tf.global_variables_initializer())
 
    for step in range(101):
        _, W_val = sess.run([train, W])
        print(step, W_val)
 
 
"""
0 5.0
1 1.2666664
2 1.0177778
3 1.0011852
4 1.000079
...
97 1.0
98 1.0
99 1.0
100 1.0
"""
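Even starting from the "wrong" weight 5.0, W converges to 1 almost immediately. It also converges much faster than our manual update did: minimize() uses the true derivative, including the factor of 2 our hand-written gradient dropped, so each step shrinks the error W − 1 by a factor of 1 − 0.1 · 28/3 ≈ 0.067 rather than 0.533 (the very first step already jumps from 5.0 to 1.2667).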
 

Tensorflow์—๊ฒŒ Gradient ๊ตฌํ•˜๊ฒŒ ํ•ด๋ณด๊ธฐ

์œ„์˜ ์˜ˆ์ œ ์—์„œ ์šฐ๋ฆฌ๋Š”

MSE ํ•จ์ˆ˜์˜ Gradient ๋ฅผ ๊ตฌํ•ด, gradient๋ผ๋Š” ๋ณ€์ˆ˜์— ๋„ฃ์–ด์ฃผ์—ˆ๋‹ค.

gradient = tf.reduce_mean((W * X - Y) * X)

๊ทธ๋Ÿฐ๋ฐ, MSE๋งŒ ์ •์˜ํ•ด์ฃผ๊ณ , tensorflowํ•œํ…Œ ์ด๊ฑธ ์‹œํ‚ฌ ์ˆ˜ ์žˆ๋‹ค.

We can even stop at the gradient itself and apply additional operations of our own to that value before it is used:

# Lab 3 Minimizing Cost
# This is optional
import tensorflow as tf
 
# tf Graph Input
X = [1, 2, 3]
Y = [1, 2, 3]
 
# Set wrong model weights
W = tf.Variable(5.)
 
# Linear model
hypothesis = X * W
 
# Manual gradient
# ์œ„์—์„œ๋Š” 2๊ฐ€ ์—†์ง€๋งŒ, 
gradient = tf.reduce_mean((W * X - Y) * X) * 2
 
# cost/loss function
cost = tf.reduce_mean(tf.square(hypothesis - Y))
 
# Gradient Descent Optimizer
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
 
# Declaring the optimizer and then calling .minimize(cost) would run plain
# gradient descent, as before. Instead of that, calling
# .compute_gradients(cost) makes the optimizer compute the gradient of the
# cost — i.e., it returns the derivatives, as (gradient, variable) pairs.
# Get gradients
gvs = optimizer.compute_gradients(cost)
 
# If any extra processing is needed to adjust the model, this is the place to do it.
# Optional: modify gradient if necessary
# gvs = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gvs]
 
# Apply gradients
# The (possibly modified) gvs pairs can be handed back to the optimizer.
apply_gradients = optimizer.apply_gradients(gvs)
 
# Launch the graph in a session.
with tf.Session() as sess:
    # Initializes global variables in the graph.
    sess.run(tf.global_variables_initializer())
 
    for step in range(101):
        gradient_val, gvs_val, _ = sess.run([gradient, gvs, apply_gradients])
        print(step, gradient_val, gvs_val)
 
'''
0 37.333332 [(37.333336, 5.0)]
1 33.84889 [(33.84889, 4.6266665)]
2 30.689657 [(30.689657, 4.2881775)]
3 27.825289 [(27.825289, 3.981281)]
...
97 0.0027837753 [(0.0027837753, 1.0002983)]
98 0.0025234222 [(0.0025234222, 1.0002704)]
99 0.0022875469 [(0.0022875469, 1.0002451)]
100 0.0020739238 [(0.0020739238, 1.0002222)]
'''
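Two things are worth checking in this log: our hand-computed gradient (now including the factor of 2) and the value TensorFlow returns through compute_gradients agree up to float rounding (37.333332 vs 37.333336), and W again converges toward 1. If you actually need to modify the gradients before applying them, the commented-out line above is the usual pattern. A minimal sketch, assuming we want to cap every gradient component to [-1, 1] (clipped_gvs is a name introduced here for illustration; it replaces the apply_gradients line in the script above):

# Clip each gradient to [-1, 1] before handing the pairs back to the optimizer.
clipped_gvs = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gvs]
apply_gradients = optimizer.apply_gradients(clipped_gvs)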