Overview

The argument here is that the problems that arise in FCN, covered in the previous post, can be solved with dilated convolution.

Key idea

Dilated convolution keeps the number of parameters unchanged, still delivers the effect of pooling, and prevents the resolution from shrinking (three birds with one stone).

The original FCN handled the loss of resolution caused by pooling with skip connections. Here you can see an intent to deal with pooling itself, at the root. The reason we pool is to look at global features at multiple scales. That view, however, is the right one for classification.

For segmentation we ultimately need a dense prediction, which can be obtained through up-convolutions and multi-scale inputs. Up-convolutions are covered in the previous post. Multi-scale inputs, as the name suggests, means testing a single image at several scales.

Example of multi-scale inputs
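To make the idea of multi-scale inputs concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: `toy_score` is a hypothetical stand-in for a real network, the scales `(0.5, 1.0, 2.0)` are arbitrary, and averaging the per-scale score maps is one common way to combine them, not necessarily what any particular paper does.

```python
def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2D list of numbers."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def toy_score(img):
    """Hypothetical per-pixel model: the score is just the pixel value."""
    return [[float(v) for v in row] for row in img]


def multi_scale_predict(img, scales=(0.5, 1.0, 2.0), model=toy_score):
    """Run the model on several rescaled copies and average the score maps."""
    h, w = len(img), len(img[0])
    total = [[0.0] * w for _ in range(h)]
    for s in scales:
        sh, sw = max(1, round(h * s)), max(1, round(w * s))
        scores = model(resize_nearest(img, sh, sw))  # predict at this scale
        back = resize_nearest(scores, h, w)          # back to the input size
        for r in range(h):
            for c in range(w):
                total[r][c] += back[r][c] / len(scales)
    return total
```

The cost is one forward pass per scale, which is part of why a single-pass alternative is attractive.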

Dilated convolution starts from the question: do we need pooling in the first place?

Dilated Convolution

Dilated convolution inserts zeros between the weights inside a filter to forcibly enlarge its receptive field. In the figure above, blue is the input and green is the output; only the dark blue positions carry weights, and the remaining positions are filled with 0. The receptive field can be thought of as the region a filter sees at one time, and to capture the overall features of an image through a filter, the larger the receptive field the better. Simply making the filter larger, however, greatly increases the amount of computation and raises the risk of overfitting.
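The zero-insertion view can be checked with a short sketch in plain Python. The function names and the `rate` parameter are illustrative choices, not taken from the post. A k x k kernel dilated by `rate` covers (k - 1) * rate + 1 pixels per side, so a 3x3 kernel at rate 2 sees a 5x5 region while still holding 9 weights.

```python
def dilate_kernel(kernel, rate):
    """Insert (rate - 1) zeros between neighbouring taps of a square kernel."""
    k = len(kernel)
    size = (k - 1) * rate + 1                # effective kernel size
    out = [[0.0] * size for _ in range(size)]
    for i in range(k):
        for j in range(k):
            out[i * rate][j * rate] = kernel[i][j]
    return out


def conv2d_valid(image, kernel):
    """Plain 'valid' cross-correlation, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [sum(kernel[a][b] * image[r + a][c + b]
             for a in range(kh) for b in range(kw))
         for c in range(ow)]
        for r in range(oh)
    ]


def dilated_conv2d_valid(image, kernel, rate):
    """Same result as above, but only the k * k non-zero taps are visited."""
    k = len(kernel)
    span = (k - 1) * rate + 1
    oh, ow = len(image) - span + 1, len(image[0]) - span + 1
    return [
        [sum(kernel[a][b] * image[r + a * rate][c + b * rate]
             for a in range(k) for b in range(k))
         for c in range(ow)]
        for r in range(oh)
    ]


image = [[r * 7 + c for c in range(7)] for r in range(7)]
kernel = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
same = conv2d_valid(image, dilate_kernel(kernel, 2)) == dilated_conv2d_valid(image, kernel, 2)
print(same)
```

Both routes give the same output; the second one never multiplies by the inserted zeros, which is why the parameter count and the number of multiply-adds stay at those of the small kernel.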

A typical CNN therefore solves this by combining convolution with pooling. Pooling reduces the spatial dimension, and convolving again with a small filter then captures the overall features. But pooling loses some of the original information. Dilated convolution is meant to solve this: it can enlarge the receptive field without pooling, so little spatial dimension is lost, and because the inserted positions are all 0 they add no parameters and need not be computed, so it is also computationally efficient.
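The trade-off can be put into numbers with a small receptive-field calculation. This is a sketch under assumptions of my own: 'same' padding, a 32-pixel-wide input, 3x3 convolutions, 2x2 pooling with stride 2, and dilation rates 1, 2, 4 chosen for illustration.

```python
def trace(layers, size):
    """Track receptive field and feature-map width through a layer stack.

    Each layer is (kernel, stride, dilation). 'Same' padding is assumed,
    so only the stride changes the feature-map width.
    """
    rf, jump = 1, 1
    for kernel, stride, dilation in layers:
        rf += (kernel - 1) * dilation * jump  # growth measured in input pixels
        jump *= stride                        # spacing between output pixels
        size = -(-size // stride)             # ceiling division
    return rf, size


conv = (3, 1, 1)
pool = (2, 2, 1)
pooled_stack = [conv, pool, conv, pool, conv]
dilated_stack = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]

print(trace(pooled_stack, 32))   # (18, 8): wide view, width cut to a quarter
print(trace(dilated_stack, 32))  # (15, 32): similar view, full width kept
```

With three convolutions in each stack, the dilated version reaches a comparable receptive field while the feature map stays at the input resolution.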

Structure

Architecture using dilated convolution

The first figure is the architecture of VGG-16, a CNN for classification. It shows conv-pooling applied repeatedly, followed by fully connected layers that produce the final classification result. The figure below it shows an example of segmenting an image with dilated convolution. The output of this architecture is 28x28xN (where N is the number of classes to segment), and it is upsampled back to the original size. (Some spatial information is lost at this step.)

This architecture differs from the classification architecture first in that the dilated convolutions, drawn as diamonds, minimize the loss of spatial information. After two dilated convolutions, the resulting 28x28x4096 tensor goes through a 1x1 convolution that reduces the channel dimension. The result is 28x28xN, which is upsampled 8x to produce the final segmentation output. The 1x1 convolution is used so that spatial information is not lost, and it plays a role similar to the fully connected (FC) layer in classification. In classification, spatial information does not matter, which is why the features are flattened there. This was covered in detail in the previous post.
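The shape flow of this head can be sketched in plain Python. Several things here are assumptions for illustration only: the channel count is shrunk from 4096 to 4 to keep the example fast, N is set to 3, the weights are constants, and nearest-neighbour repetition stands in for whatever 8x upsampling the actual model uses.

```python
def conv1x1(feature_map, weights):
    """1x1 convolution: the same C_in -> C_out linear map at every pixel.

    feature_map is H x W x C_in (nested lists); weights is C_out x C_in.
    """
    return [
        [[sum(w * x for w, x in zip(out_w, pixel)) for out_w in weights]
         for pixel in row]
        for row in feature_map
    ]


def upsample_nearest(feature_map, factor):
    """Repeat every pixel `factor` times along both spatial axes."""
    out = []
    for row in feature_map:
        wide = [pixel for pixel in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


c_in, n_classes = 4, 3                      # stand-ins for 4096 and N
features = [[[1.0] * c_in for _ in range(28)] for _ in range(28)]
weights = [[0.5] * c_in for _ in range(n_classes)]

scores = conv1x1(features, weights)         # 28 x 28 x N
full = upsample_nearest(scores, 8)          # 224 x 224 x N
print(len(scores), len(scores[0]), len(scores[0][0]))
print(len(full), len(full[0]), len(full[0][0]))
```

Because the 1x1 convolution acts on each pixel independently, the 28x28 grid survives intact, unlike flattening before an FC layer. The 8x factor is consistent with a 224x224 input, since 224 / 8 = 28.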

Results

Comparison with and without dilated convolution

This figure shows the difference between pooling-conv followed by upsampling and dilated convolution (atrous convolution). As the figure shows, upsampling a feature map that has already lost spatial information gives a low-resolution result. In the dilated convolution case, by contrast, convolving with an enlarged receptive field minimizes the loss of information and yields a high-resolution output.
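A 1D toy version of this comparison is sketched below. It assumes that the downsampling step is plain subsampling by 2 (real pooling would differ), and the signal `x` and kernel `w` are arbitrary values chosen for illustration.

```python
def conv1d_valid(x, w, rate=1):
    """'Valid' 1D cross-correlation with an optional dilation rate."""
    span = (len(w) - 1) * rate + 1
    return [
        sum(w[k] * x[i + k * rate] for k in range(len(w)))
        for i in range(len(x) - span + 1)
    ]


x = [float(v * v % 7) for v in range(12)]
w = [1.0, -2.0, 1.0]

coarse = conv1d_valid(x[::2], w)      # subsample by 2, then convolve
dense = conv1d_valid(x, w, rate=2)    # atrous convolution on the full signal

print(len(coarse), len(dense))        # 4 8
print(dense[::2] == coarse)           # True
```

The dilated path reproduces every value of the subsample-then-convolve path at the even positions and also fills in the odd positions, so nothing has to be invented by upsampling afterwards.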

Reference