# [Flux] MNIST example with GPU, mini-batch, fix loss NaN

Published 2020-02-06 on julialang.kr.

Source URL: https://github.com/mrchaos/model-zoo/blob/master/vision/mnist/mlp_gpu_minibatch.jl

The mlp.jl source among the MNIST examples in the model zoo has a few issues, so I fixed them and applied mini-batching. Julia 1.3.1 and Flux 0.10.1 were used.

The biggest issue fixed here is that the loss function produced NaN, so training did not work properly. When the loss is defined as below and mini-batches are applied, NaN occurs frequently because each batch being computed is small:

```julia
loss(x,y) = crossentropy(m(x), y)
```

For example, in the case below, because the computed values are small, crossentropy returns NaN, so the weights are not updated, i.e. no learning happens.
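Concretely, the NaN comes from a `0 * log(0)` term inside the crossentropy sum: `log(0)` is `-Inf`, and `0 * -Inf` is `NaN` in IEEE arithmetic, which poisons the whole sum. A minimal hand-rolled crossentropy (an illustrative helper, not Flux's implementation) reproduces it:

```julia
# Minimal crossentropy sketch: -sum(y .* log.(ŷ)).
# When ŷ has an exact 0 where y is also 0, the term 0 * log(0)
# evaluates to 0 * -Inf = NaN and poisons the whole sum.
xent(ŷ, y) = -sum(y .* log.(ŷ))

xent([1.0, 1.0, 0.0], [1.0, 0.0, 0.0])   # NaN
```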
```julia
julia> crossentropy([1,1,0],[1,0,0])
NaN
```

To work around this, adding a very tiny value avoids the problem, and training then proceeds normally:

```julia
julia> const ϵ = 1.0f-10
julia> crossentropy([1,1,0] .+ ϵ, [1,0,0] .+ ϵ)
2.302585f-9
```

There is also a problem where onehotbatch and onecold do not work properly on the GPU, and a warning about slow GPU scalar operations keeps appearing.

To remove the GPU scalar slowdown warning and make onehotbatch and onecold usable on the GPU, I made the following changes:

- `CuArrays.allowscalar(false)` to forbid GPU scalar indexing
- `onehotbatch(labels,0:9) |> gpu` → `float.(onehotbatch(labels,0:9)) |> gpu`

Computing the accuracy also raised errors, so it was changed as shown below. The problem seems to occur when onecold operates on data in GPU memory; hopefully this will be improved in the future.
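For context, what onecold computes is just a column-wise argmax mapped back to the label set, which is why it is cheap to run on the CPU after copying the data over. A plain-Julia equivalent (an illustrative stand-in, not Flux's implementation):

```julia
# Column-wise argmax: maps each one-hot / probability column of A
# to the corresponding entry of `labels` (illustrative stand-in for
# Flux's onecold).
my_onecold(A, labels) = [labels[argmax(view(A, :, j))] for j in 1:size(A, 2)]

# Two one-hot columns encoding the digits 3 and 0 (labels 0:9):
Y = [i == j + 1 ? 1.0 : 0.0 for i in 1:10, j in [3, 0]]
my_onecold(Y, 0:9)   # [3, 0]
```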
Copying the GPU memory to the CPU before applying onecold runs without problems.

The accuracy version that raises an error:

```julia
accuracy(x,y) = mean(onecold(m(x)) .== onecold(y)) # raises an error
```

The fixed accuracy version:

```julia
accuracy(x,y) = mean(onecold(m(x)|>cpu) .== onecold(y|>cpu)) # works fine
```

Applying mini-batches saves GPU memory and improves the training results.

Below is the function that builds the mini-batch data:

```julia
function make_minibatch(imgs,labels,batch_size)
  #=
   reshape.(MNIST.images(),:) : [(784,),(784,),...,(784,)]  60,000 samples
   X : (784x60,000)
   Y : (10x60,000)
  =#
  X = hcat(float.(reshape.(imgs,:))...) |> gpu
  Y = float.(onehotbatch(labels,0:9)) |> gpu
  # Y = Float32.(onehotbatch(labels,0:9))

  data_set = [(X[:,i],Y[:,i]) for i in partition(1:length(labels),batch_size)]
  return data_set
end
```

For reference, when several GPUs are available, a specific one can be selected.
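Back on the data side: the index grouping in `make_minibatch` relies on `Base.Iterators.partition`, so a final, smaller batch falls out naturally when the dataset size is not a multiple of the batch size. A small CPU-only sketch with random data and hypothetical sizes:

```julia
using Base.Iterators: partition

X = rand(Float32, 784, 20)   # 20 fake flattened images
Y = rand(Float32, 10, 20)    # 20 fake label columns
batches = [(X[:, i], Y[:, i]) for i in partition(1:20, 8)]

length(batches)              # 3 batches
size(batches[end][1])        # (784, 4), the leftover batch of 4
```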
Setting it as below before using the GPU makes the computation run on the specified GPU.

```julia
# use 1st GPU : default
CUDAnative.device!(0)
# use 2nd GPU
#CUDAnative.device!(1)
```

Full source code:

```julia
#=
Julia version: 1.3.1
Flux version : 0.10.1
=#
__precompile__()
module MNIST_BATCH
using Flux
using Flux.Data.MNIST, Statistics
using Flux: onehotbatch, onecold, crossentropy, throttle
using Base.Iterators: repeated, partition

using CUDAnative
using CuArrays
CuArrays.allowscalar(false)

#=
Very important !!
ϵ is used to prevent loss NaN
=#
const ϵ = 1.0f-10

# Load training labels and images from Flux.Data.MNIST
@info("Loading data...")
#=
MNIST.images() : [(28x28),...,(28x28)] 60,000x28x28 training images
MNIST.labels() : 0 ~ 9 labels , 60,000x10 training labels
=#
train_imgs = MNIST.images()
train_labels = MNIST.labels()

# use 1st GPU : default
#CUDAnative.device!(0)
# use 2nd GPU
#CUDAnative.device!(1)

# Bundle images together with labels and group into mini-batches
function make_minibatch(imgs,labels,batch_size)
  #=
   reshape.(MNIST.images(),:) : [(784,),(784,),...,(784,)]  60,000 samples
   X : (784x60,000)
   Y : (10x60,000)
  =#
  X = hcat(float.(reshape.(imgs,:))...) |> gpu
  Y = float.(onehotbatch(labels,0:9)) |> gpu
  # Y = Float32.(onehotbatch(labels,0:9))

  data_set = [(X[:,i],Y[:,i]) for i in partition(1:length(labels),batch_size)]
  return data_set
end

@info("Making model...")
# Model
m = Chain(
  Dense(28^2,32,relu), # y1 = relu(W1*x + b1), y1 : (32x?), W1 : (32x784), b1 : (32,)
  Dense(32,10),        # y2 = W2*y1 + b2, y2 : (10x?), W2 : (10x32), b2 : (10,)
  softmax
) |> gpu
loss(x,y) = crossentropy(m(x) .+ ϵ, y .+ ϵ)
accuracy(x,y) = mean(onecold(m(x)|>cpu) .== onecold(y|>cpu))

batch_size = 500
train_dataset = make_minibatch(train_imgs,train_labels,batch_size)

opt = ADAM()

@info("Training model...")

epochs = 200
# used for plots
accs = Array{Float32}(undef,0)

dataset_len = length(train_dataset)
for i in 1:epochs
  for (idx,dataset) in enumerate(train_dataset)
    Flux.train!(loss,params(m),[dataset],opt)
    # Flux.train!(loss,params(m),[dataset],opt,cb = throttle(()->@show(loss(dataset...)),20))
    acc = accuracy(dataset...)
    if idx == dataset_len
      @info("Epoch# $(i)/$(epochs) - loss: $(loss(dataset...)), accuracy: $(acc)")
      push!(accs,acc)
    end
  end
end

# Test accuracy
tX = hcat(float.(reshape.(MNIST.images(:test),:))...) |> gpu
tY = float.(onehotbatch(MNIST.labels(:test),0:9)) |> gpu

println("Test loss: ", loss(tX,tY))
println("Test accuracy: ", accuracy(tX,tY))

end

using Plots; gr()
plot(MNIST_BATCH.accs)
```
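The ϵ trick above works, but a more standard fix is to compute the loss from raw logits with the log-sum-exp trick instead of taking `log` of `softmax` outputs; later Flux versions expose this as `logitcrossentropy`, paired with a model whose final `softmax` is removed. A plain-Julia sketch of the idea (an illustrative helper, not Flux's implementation):

```julia
# Crossentropy computed from raw logits z via the log-sum-exp trick:
# log.(softmax(z)) == z .- logsumexp(z), so log(0) is never evaluated.
function logit_xent(z::AbstractVector, y::AbstractVector)
    m = maximum(z)                      # shift for numerical stability
    lse = m + log(sum(exp.(z .- m)))    # logsumexp(z)
    return -sum(y .* (z .- lse))
end

logit_xent([1000.0, 0.0, 0.0], [1.0, 0.0, 0.0])   # finite, no NaN or Inf
```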