numpy.loadtxt()

Note: every value in the csv file must have the same data type.
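A quick way to see this requirement is to load a small in-memory CSV (here `io.StringIO` stands in for the file on disk, and the sample rows are taken from the dataset below); every value is parsed with the single dtype passed to `loadtxt`:

```python
import io
import numpy as np

# Two rows standing in for data-01-test-score.csv.
csv_text = "73,80,75,152\n93,88,93,185\n"

# All columns are parsed with the one dtype given here (float32).
xy = np.loadtxt(io.StringIO(csv_text), delimiter=',', dtype=np.float32)

print(xy.dtype)   # float32
print(xy.shape)   # (2, 4)
```

If a column held strings, this call would fail; mixed-type files need `genfromtxt` or a structured dtype instead.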

# Lab 4 Multi-variable linear regression
import tensorflow as tf
import numpy as np
tf.set_random_seed(777)  # for reproducibility
 
xy = np.loadtxt('data-01-test-score.csv', delimiter=',', dtype=np.float32)
x_data = xy[:, 0:-1]
y_data = xy[:, [-1]]
  • Slicing์„ ๋ชจ๋ฅด๊ฒ ์–ด์š”!

    [3:5] => 3์ด์ƒ 5๋ฏธ๋งŒ (3, 4)
    	[:] => ์ „์ฒด ๋ฆฌ์ŠคํŠธ ๊ฐ’ ์ถœ๋ ฅ (0, 1, 2, 3, 4, 5)
    [:-1] => ๋์—์„œ ํ•˜๋‚˜ ๋บด๊ณ  ๋‹ค ๊ฐ€์ ธ์™€ (0, 1, 2, 3, 4)
    [2:4] = [8,9] => 2์ด์ƒ, 4๋ฏธ๋งŒ ์ž๋ฆฌ์— 8,9๋ฅผ ๋„ฃ์–ด์ค˜ (1, 8, 9, 4, 5)
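These rules can be checked directly in Python, and the same idea extends to the 2-D `x_data`/`y_data` split above (the small array here is made up for illustration); note that `xy[:, [-1]]` keeps the result 2-D, while `xy[:, -1]` would flatten it:

```python
import numpy as np

# 1-D list slicing.
nums = [0, 1, 2, 3, 4, 5]
print(nums[3:5])   # [3, 4]
print(nums[:-1])   # [0, 1, 2, 3, 4]
nums[2:4] = [8, 9]
print(nums)        # [0, 1, 8, 9, 4, 5]

# 2-D: take all rows, then slice the columns.
xy = np.array([[73., 80., 75., 152.],
               [93., 88., 93., 185.]], dtype=np.float32)
x_data = xy[:, 0:-1]   # every column except the last -> shape (2, 3)
y_data = xy[:, [-1]]   # last column, kept 2-D        -> shape (2, 1)
print(x_data.shape, y_data.shape)  # (2, 3) (2, 1)
```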

๋ฐ์ดํ„ฐ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ ์ฝ”๋“œ

# Lab 4 Multi-variable linear regression
import tensorflow as tf
import numpy as np
tf.set_random_seed(777)  # for reproducibility
 
xy = np.loadtxt('data-01-test-score.csv', delimiter=',', dtype=np.float32)
x_data = xy[:, 0:-1]
y_data = xy[:, [-1]]
 
# Make sure the shape and data are OK
print(x_data, "\nx_data shape:", x_data.shape)
print(y_data, "\ny_data shape:", y_data.shape)
[[ 73.  80.  75.]
 [ 93.  88.  93.]
 ...
 [ 76.  83.  71.]
 [ 96.  93.  95.]] 
x_data shape: (25, 3)
[[152.]
 [185.]
 ...
 [149.]
 [192.]] 
y_data shape: (25, 1)

Feeding values into the placeholders and printing the results

# Lab 4 Multi-variable linear regression
import tensorflow as tf
import numpy as np
tf.set_random_seed(777)  # for reproducibility
 
xy = np.loadtxt('data-01-test-score.csv', delimiter=',', dtype=np.float32)
x_data = xy[:, 0:-1]
y_data = xy[:, [-1]]
 
# Make sure the shape and data are OK
print(x_data, "\nx_data shape:", x_data.shape)
print(y_data, "\ny_data shape:", y_data.shape)
 
# data output
'''
[[ 73.  80.  75.]
 [ 93.  88.  93.]
 ...
 [ 76.  83.  71.]
 [ 96.  93.  95.]] 
x_data shape: (25, 3)
[[152.]
 [185.]
 ...
 [149.]
 [192.]] 
y_data shape: (25, 1)
'''
 
# placeholders for a tensor that will be always fed.
X = tf.placeholder(tf.float32, shape=[None, 3])
Y = tf.placeholder(tf.float32, shape=[None, 1])
 
W = tf.Variable(tf.random_normal([3, 1]), name='weight')
b = tf.Variable(tf.random_normal([1]), name='bias')
 
# Hypothesis
hypothesis = tf.matmul(X, W) + b
 
# Simplified cost/loss function
cost = tf.reduce_mean(tf.square(hypothesis - Y))
 
# Minimize
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1e-5)
train = optimizer.minimize(cost)
 
# Launch the graph in a session.
sess = tf.Session()
# Initializes global variables in the graph.
sess.run(tf.global_variables_initializer())
 
for step in range(2001):
    cost_val, hy_val, _ = sess.run([cost, hypothesis, train], 
                                   feed_dict={X: x_data, Y: y_data})
    if step % 10 == 0:
        print(step, "Cost:", cost_val, "\nPrediction:\n", hy_val)
 
# train output
'''
0 Cost: 21027.0 
Prediction:
 [[22.048063 ]
 [21.619772 ]
 ...
 [31.36112  ]
 [24.986364 ]]
10 Cost: 95.976326 
Prediction:
 [[157.11063 ]
 [183.99283 ]
 ...
 [167.48862 ]
 [193.25117 ]]
 1990 Cost: 24.863274 
Prediction:
 [[154.4393  ]
 [185.5584  ]
 ...
 [158.27443 ]
 [192.79778 ]]
2000 Cost: 24.722485 
Prediction:
 [[154.42894 ]
 [185.5586  ]
 ...
 [158.24257 ]
 [192.79166 ]]
'''
 
# Ask my score
print("Your score will be ", sess.run(hypothesis, 
                                      feed_dict={X: [[100, 70, 101]]}))
 
print("Other scores will be ", sess.run(hypothesis,
                                        feed_dict={X: [[60, 70, 110], [90, 100, 80]]}))
Your score will be  [[ 181.73277283]]
Other scores will be  [[ 145.86265564]
 [ 187.23129272]]
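The prediction itself is just the matrix product XW + b, so the shapes of these outputs follow directly. A minimal NumPy sketch (the weights here are made up, not the actual trained values from the run above):

```python
import numpy as np

# Hypothetical parameters, only for shape bookkeeping.
W = np.array([[0.6], [0.6], [0.6]], dtype=np.float32)  # shape (3, 1)
b = np.array([1.0], dtype=np.float32)                  # shape (1,)

# One query row -> one prediction: (1, 3) @ (3, 1) -> (1, 1).
X = np.array([[100., 70., 101.]], dtype=np.float32)
print((X @ W + b).shape)  # (1, 1)

# Two query rows -> one prediction per row: (2, 3) @ (3, 1) -> (2, 1).
X2 = np.array([[60., 70., 110.], [90., 100., 80.]], dtype=np.float32)
print((X2 @ W + b).shape)  # (2, 1)
```

This is why `feed_dict={X: [[100, 70, 101]]}` returns a nested `[[...]]` rather than a scalar.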

Queue Runners

๋ชจ๋ธ์„ Train ์‹œ์ผœ์•ผ ํ•˜๋Š” ๊ธฐ๋ณธ ๋ฐ์ดํ„ฐ๊ฐ€ ๋งŽ์„ ๋•Œ, ์ด ๋ชจ๋“  ๊ฐ’๋“ค์„ ๋ฉ”๋ชจ๋ฆฌ์— ์˜ฌ๋ ค๋†“๊ณ  ์ž‘์—…ํ•˜๋Š” ๊ฒƒ์€ ๋„ˆ๋ฌด ๋ถ€ํ•˜๊ฐ€ ํฌ๋‹ค.

๊ทธ๋ ‡๊ธฐ ๋•Œ๋ฌธ์— Tensorflow๋Š” Queue Runner๋ผ๋Š” ๊ฒƒ์„ ์ œ๊ณตํ•œ๋‹ค. ๊ธฐ๋ณธ์ ์ธ ์›๋ฆฌ๋Š” ์ด๋ ‡๋‹ค.

  1. Keep a list of all the files that need to be loaded.
  2. Decide how each file should be read (Reader1, Reader2, …).
  3. Decide how the data that was read should be decoded (Decoder).
  4. Run training in batch-sized chunks.
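As an analogy only (plain Python with the standard library, not the TensorFlow API), the four steps behave like a producer/consumer pipeline: a queue of filenames, a reader thread that decodes records into a second queue, and a consumer that pulls batch-sized chunks. The filenames and records below are hypothetical:

```python
import queue
import threading

# 1. A queue holding the files to load (hypothetical names).
filename_queue = queue.Queue()
for name in ['data-01.csv', 'data-02.csv']:
    filename_queue.put(name)

record_queue = queue.Queue()

def reader():
    # 2.-3. Read each file and decode its contents into records.
    # Stand-in for TextLineReader + decode_csv: pretend each file
    # yields two decoded rows.
    while not filename_queue.empty():
        filename_queue.get()
        for record in [[73., 80., 75., 152.], [93., 88., 93., 185.]]:
            record_queue.put(record)

t = threading.Thread(target=reader)
t.start()
t.join()

# 4. Pull records off the queue in batch-sized chunks.
batch = [record_queue.get() for _ in range(2)]
print(len(batch))  # 2
```

TensorFlow's queue runners do the same thing, but with the reading threads managed for you by `tf.train.start_queue_runners`.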
data-01-test-score.csv
73,80,75,152
93,88,93,185
89,91,90,180
96,98,100,196
73,66,70,142
53,46,55,101
69,74,77,149
47,56,60,115
87,79,90,175
79,70,88,164
69,70,73,141
70,65,74,141
93,95,91,184
79,80,73,152
70,73,78,148
93,89,96,192
78,75,68,147
81,90,93,183
88,92,86,177
78,83,77,159
82,86,90,177
86,82,89,175
78,83,85,175
76,83,71,149
96,93,95,192
# Lab 4 Multi-variable linear regression
# https://www.tensorflow.org/programmers_guide/reading_data
 
import tensorflow as tf
tf.set_random_seed(777)  # for reproducibility
 
# Use a Queue Runner.
# filename can be a list containing several files.
filename_queue = tf.train.string_input_producer(
    ['data-01-test-score.csv'], shuffle=False, name='filename_queue')
 
# Define how to read from filename_queue.
# In this case, read it as plain text, one line at a time.
reader = tf.TextLineReader()
key, value = reader.read(filename_queue)
 
# Default values, in case of empty columns. Also specifies the type of the
# decoded result.
# Then choose how to decode the text line (value) that the reader produced:
# here the line is parsed as a CSV record.
record_defaults = [[0.], [0.], [0.], [0.]]
xy = tf.decode_csv(value, record_defaults=record_defaults)
 
# batch ๋ผ๋Š” ๋‹จ์œ„๋กœ ์šฐ๋ฆฌ๋Š” ํ•™์Šต์„ ํ• ํ…๋ฐ,
# ๋ถˆ๋Ÿฌ์˜จ ๋ฐ์ดํ„ฐ๋ฅผ ์ด ๋‹จ์œ„๋กœ ๋ฐ”๊ฟ”์ค„ ํ•„์š”๊ฐ€ ์žˆ๋‹ค.
# decodeํ•œ ๋ฐ์ดํ„ฐ๋ฅผ X_data, Y_data๋กœ ๋‚˜๋ˆ ์„œ batch ํ•จ์ˆ˜์— ๋„ฃ๋„๋ก ํ•˜์ž.
# ๊ทธ๋ฆฌ๊ณ  ๊ทธ ๊ฒฐ๊ณผ๋ฅผ train์„ ๋ถ™์ธ ๊ฒƒ์— ๋ฐ›์ž.
# ๋ช…์‹ฌํ•  ๊ฒƒ์€, ์ด ์ž‘์—…์„ ํ•˜๋Š” ๋…ธ๋“œ๋ฅผ train_x_batch, train_y_batch๋ผ ๋ถ€๋ฅด๋Š” ๊ฒƒ์ด๋‹ค.
# ๋‚˜์ค‘์— ์‹คํ–‰ํ•ด์ค˜์•ผ ์ด ์ž‘์—…์„ ์‹ค์ œ๋กœ ์ˆ˜ํ–‰ํ•œ๋‹ค.
# ํ•œ๋ฒˆ ํŽŒํ”„์งˆ ํ• ๋•Œ ๋ช‡๊ฐœ์”ฉ ๊ฐ€์ ธ์˜ฌ๊นŒ? ๋ผ๋Š” ์–˜๊ธฐ๋กœ ๋ฐ›์•„๋“ค์ด๋ฉด ์ดํ•ด๊ฐ€ ์‰ฝ๋‹ค.
# collect batches of csv in
train_x_batch, train_y_batch = \
    tf.train.batch([xy[0:-1], xy[-1:]], batch_size=10)
 
# placeholders for a tensor that will be always fed.
X = tf.placeholder(tf.float32, shape=[None, 3])
Y = tf.placeholder(tf.float32, shape=[None, 1])
 
W = tf.Variable(tf.random_normal([3, 1]), name='weight')
b = tf.Variable(tf.random_normal([1]), name='bias')
 
# Hypothesis
hypothesis = tf.matmul(X, W) + b
 
# Simplified cost/loss function
cost = tf.reduce_mean(tf.square(hypothesis - Y))
 
# Minimize
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1e-5)
train = optimizer.minimize(cost)
 
# Launch the graph in a session.
sess = tf.Session()
# Initializes global variables in the graph.
sess.run(tf.global_variables_initializer())
 
# Code that must run before the Queue Runner starts.
# Start populating the filename queue.
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
 
for step in range(2001):
    # Run the batch nodes to actually pull the next batch.
    x_batch, y_batch = sess.run([train_x_batch, train_y_batch])
    cost_val, hy_val, _ = sess.run(
        [cost, hypothesis, train], feed_dict={X: x_batch, Y: y_batch})
    if step % 10 == 0:
        print(step, "Cost: ", cost_val, "\nPrediction:\n", hy_val)
 
# Cleanup code once training is done.
coord.request_stop()
coord.join(threads)
 
# Ask my score
print("Your score will be ",
      sess.run(hypothesis, feed_dict={X: [[100, 70, 101]]}))
 
print("Other scores will be ",
      sess.run(hypothesis, feed_dict={X: [[60, 70, 110], [90, 100, 80]]}))
...
 [172.44307]
 [173.06042]
 [164.73372]
 [158.24258]
 [192.79166]]
Your score will be  [[181.73277]]
Other scores will be  [[145.86266]
 [187.2313 ]]