[Tip] Custom-building and installing TensorFlow for Julia

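When you install TensorFlow for Julia, especially the GPU version, the CUDA version installed on your server may not match the CUDA version the downloaded TensorFlow build expects. In that case you have to build TensorFlow yourself and then do some follow-up work.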
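Before deciding to rebuild, it helps to know which CUDA toolkit the server actually has. The helper below is a sketch, not part of the original procedure; it assumes `nvcc` is on `PATH` and prints a line such as `Cuda compilation tools, release 10.1, V10.1.243`. The `CUDA Version` shown by `nvidia-smi` is the highest version the driver supports, which is not necessarily the installed toolkit.

```bash
#!/usr/bin/env bash
# Sketch: print the CUDA toolkit release (e.g. "10.1") from `nvcc --version` output.
# Assumes the output contains "release X.Y," as nvcc prints it.
cuda_release_from_nvcc() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Usage (only runs if nvcc is installed):
if command -v nvcc >/dev/null 2>&1; then
  echo "Installed CUDA toolkit: $(nvcc --version | cuda_release_from_nvcc)"
fi
```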

First, install the Julia TensorFlow package and build it as the GPU version.

julia> ENV["TF_USE_GPU"] = "1"
pkg> add TensorFlow#master
pkg> build TensorFlow
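If you prefer to script this step rather than type it at the REPL, the same three actions can be run from a shell. This is a sketch assuming Julia 1.x and the standard `Pkg` API; the function name is hypothetical. Setting `TF_USE_GPU=1` in the environment has the same effect as `ENV["TF_USE_GPU"] = "1"` in the REPL.

```bash
#!/usr/bin/env bash
# Sketch: non-interactive version of the REPL steps (assumes Julia 1.x).
install_tensorflow_jl_gpu() {
  # The environment variable is visible to Julia as ENV["TF_USE_GPU"].
  TF_USE_GPU=1 julia -e '
    using Pkg
    Pkg.add(PackageSpec(name="TensorFlow", rev="master"))  # pkg> add TensorFlow#master
    Pkg.build("TensorFlow")                                 # pkg> build TensorFlow
  '
}

# Call it explicitly when ready:
# install_tensorflow_jl_gpu
```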

The built TensorFlow libraries are placed in a directory such as /home/shpark/.julia/packages/TensorFlow/JljDB/deps/usr/bin.

If the build succeeds, the following two files are created:

libtensorflow.so 
libtensorflow_framework.so 
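The `JljDB` part of the path is a package-specific slug, so it differs between machines. Below is a sketch for locating the directory and checking that both libraries are present. It assumes Julia 1.x, where `Base.find_package` returns the path of the package's `src/TensorFlow.jl` without loading the package; both helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: derive TensorFlow.jl's deps/usr/bin from the path of src/TensorFlow.jl.
# Helper names are hypothetical, not part of TensorFlow.jl.
tf_deps_bin_from_src() {
  # .../TensorFlow/<slug>/src/TensorFlow.jl -> .../TensorFlow/<slug>/deps/usr/bin
  echo "$(dirname "$(dirname "$1")")/deps/usr/bin"
}

# Succeeds only if both shared libraries exist in the given directory.
tf_libs_present() {
  local f
  for f in libtensorflow.so libtensorflow_framework.so; do
    if [ ! -e "$1/$f" ]; then
      echo "missing: $1/$f" >&2
      return 1
    fi
  done
}

# Usage (requires julia and an installed TensorFlow.jl):
# src=$(julia -e 'print(Base.find_package("TensorFlow"))')
# tf_libs_present "$(tf_deps_bin_from_src "$src")" && echo "both libraries found"
```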

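If running the code below in Julia raises an error, or TensorFlow looks for CUDA libraries of a different version than the one currently installed, delete the two .so libraries above and do a custom build.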

julia> using TensorFlow
julia> TensorFlow.Session()
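One way to see which CUDA libraries a prebuilt `libtensorflow.so` expects is to list the dependencies the dynamic loader cannot resolve. This sketch assumes the library links the CUDA libraries directly; if a build loads them with `dlopen` at run time they will not appear in `ldd` output, and the `Session()` log remains the real check. The helper name is hypothetical, and `libcublas.so.9.0` below is only an illustrative example.

```bash
#!/usr/bin/env bash
# Sketch: print the shared-library names that ldd reports as "not found".
missing_libs_from_ldd() {
  awk '/=> not found/ { print $1 }'
}

# Usage, from the TensorFlow.jl deps/usr/bin directory:
# ldd libtensorflow.so | missing_libs_from_ldd
# Output such as "libcublas.so.9.0" on a CUDA 10.1 machine would indicate the mismatch.
# In that case remove the prebuilt libraries before the custom build:
# rm -f libtensorflow.so libtensorflow_framework.so
```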

TensorFlow 1.10 to 1.12 are built with Bazel 0.18.

From 1.13 onward, use Bazel 0.26.1.
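The two rules above can be written as a small lookup so that a build script picks the Bazel version from the TensorFlow version. It encodes only what this post states (1.10 to 1.12 with Bazel 0.18, 1.13 and later with 0.26.1); the function name is hypothetical, and for other versions TensorFlow's official tested build configurations are worth checking.

```bash
#!/usr/bin/env bash
# Sketch: choose a Bazel version for a TensorFlow version, per the rules in this post.
bazel_for_tf() {
  local major minor re='^([0-9]+)\.([0-9]+)'
  if [[ $1 =~ $re ]]; then
    major=$((10#${BASH_REMATCH[1]}))
    minor=$((10#${BASH_REMATCH[2]}))
  else
    echo unknown
    return 1
  fi
  if (( major == 1 && minor >= 10 && minor <= 12 )); then
    echo 0.18
  elif (( major > 1 || (major == 1 && minor >= 13) )); then
    echo 0.26.1
  else
    echo unknown   # versions before 1.10 are not covered by this post
    return 1
  fi
}

# Usage:
# bazel_for_tf 1.13.1   # prints 0.26.1
```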

If a newer version of Bazel is already installed, remove it as follows:

bazel shutdown
rm -rf "$HOME/.cache/bazel"
sudo rm -rf /usr/local/bin/bazel /etc/bazelrc /usr/local/lib/bazel
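Before deleting anything, you may want to confirm which Bazel is installed. `bazel version` prints a `Build label:` line; the helper below (hypothetical name) extracts that value so a script can compare it with the version it needs.

```bash
#!/usr/bin/env bash
# Sketch: extract the version from `bazel version` output ("Build label: X.Y.Z").
bazel_build_label() {
  awk -F': ' '/^Build label:/ { print $2 }'
}

# Usage:
# current=$(bazel version 2>/dev/null | bazel_build_label)
# [ "$current" = "0.26.1" ] || echo "Bazel $current is installed"
```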

Download and install Bazel 0.26.1 (the build fails with newer versions):

wget https://releases.bazel.build/0.26.1/release/bazel-0.26.1-installer-linux-x86_64.sh
chmod +x bazel-0.26.1-installer-linux-x86_64.sh
./bazel-0.26.1-installer-linux-x86_64.sh --user
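With `--user`, the installer places the `bazel` binary in `$HOME/bin`, which is not always on `PATH`. A minimal sketch that adds it only once (the helper name is hypothetical), followed by a version check:

```bash
#!/usr/bin/env bash
# Sketch: prepend a directory to PATH unless it is already there.
add_to_path_once() {
  case ":$PATH:" in
    *":$1:"*) ;;              # already on PATH, nothing to do
    *) PATH="$1:$PATH" ;;
  esac
  export PATH
}

add_to_path_once "$HOME/bin"
# Should report "Build label: 0.26.1" after the install:
# bazel version
```

To keep this across logins, the line `export PATH="$HOME/bin:$PATH"` can go in `~/.bashrc`.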

Download TensorFlow 1.13.1.
The build also requires NCCL: download it from the NCCL download page and build it.
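As an alternative to downloading archives, both sources can be fetched with git and NCCL built from source. This is a sketch under stated assumptions, not the post's exact steps: GitHub is used for both projects, and the tags `v1.13.1` and `v2.5.6-1` are assumed to exist (the NCCL tag should match the `libnccl.so.2.5.6` copied in the next step). NCCL's Makefile writes its output to `build/`, which is where the copy commands start.

```bash
#!/usr/bin/env bash
# Sketch: fetch TensorFlow and NCCL sources and build NCCL.
# Assumptions: GitHub sources and the tags below; override via environment.
TF_TAG=${TF_TAG:-v1.13.1}
NCCL_TAG=${NCCL_TAG:-v2.5.6-1}
CUDA_HOME=${CUDA_HOME:-/usr/local/cuda}

# Echo the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

fetch_and_build_nccl() {
  run git clone --depth 1 --branch "$TF_TAG" https://github.com/tensorflow/tensorflow.git
  run git clone --depth 1 --branch "$NCCL_TAG" https://github.com/NVIDIA/nccl.git
  # NCCL puts headers and libraries under nccl/build/
  run make -C nccl -j"$(nproc)" src.build CUDA_HOME="$CUDA_HOME"
}

# Preview first, then run for real:
# DRY_RUN=1 fetch_and_build_nccl
# fetch_and_build_nccl
```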

After the build, copy the NCCL libraries as follows:

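# run from the NCCL source tree; the build output is in build/
cd build
sudo mkdir -p /usr/local/cuda/nccl/lib /usr/local/cuda/nccl/include
sudo cp *.txt /usr/local/cuda/nccl
# install headers and libraries into the system paths
sudo cp include/*.h /usr/include/
sudo cp lib/libnccl.so.2.5.6 lib/libnccl_static.a /usr/lib/x86_64-linux-gnu/
sudo ln -s /usr/include/nccl.h /usr/local/cuda/nccl/include/nccl.h
# versioned symlink chain: libnccl.so -> libnccl.so.2 -> libnccl.so.2.5.6
cd /usr/lib/x86_64-linux-gnu
sudo ln -s libnccl.so.2.5.6 libnccl.so.2
sudo ln -s libnccl.so.2 libnccl.so
# mirror every libnccl* file into /usr/local/cuda/nccl/lib as symlinks
for i in libnccl*; do sudo ln -s /usr/lib/x86_64-linux-gnu/$i /usr/local/cuda/nccl/lib/$i; done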
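After creating the links, it can be worth refreshing the linker cache and checking that no symlink under `/usr/local/cuda/nccl/lib` points to a missing file. A sketch using GNU `find -xtype l`, which matches symlinks whose target does not exist; the helper name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: list dangling symlinks directly inside a directory (GNU find).
broken_links() {
  find "$1" -maxdepth 1 -xtype l
}

# Usage:
# sudo ldconfig                          # refresh the shared-library cache
# ldconfig -p | grep libnccl             # libnccl.so.2 should be listed
# broken_links /usr/local/cuda/nccl/lib  # should print nothing
```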

Extract the downloaded TensorFlow source and change into the tensorflow directory.

./configure
Do you wish to build TensorFlow with CUDA support? [y/N]:y
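`./configure` can also take its answers from environment variables, which makes the step repeatable. This sketch assumes your checkout's `configure.py` reads `TF_NEED_CUDA`, `TF_CUDA_COMPUTE_CAPABILITIES`, `TF_NCCL_VERSION` and `NCCL_INSTALL_PATH` (names from 1.13-era `configure.py`; verify them in your checkout). `6.1` is the TITAN Xp compute capability reported in the log later in this post; adjust it for your GPU.

```bash
#!/usr/bin/env bash
# Sketch: answer ./configure's CUDA questions through environment variables.
# Variable names are assumptions from 1.13-era configure.py; check your checkout.
export TF_NEED_CUDA=1                        # "build TensorFlow with CUDA support?" -> y
export TF_CUDA_COMPUTE_CAPABILITIES=6.1      # TITAN Xp; change for other GPUs
export TF_NCCL_VERSION=2.5                   # the locally built NCCL
export NCCL_INSTALL_PATH=/usr/local/cuda/nccl

run_configure() {
  # Questions not covered above get their default answer (empty input).
  yes "" | ./configure
}

# From the tensorflow source directory:
# run_configure
```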

Build TensorFlow:

bazel build --config=opt --config=cuda  --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" //tensorflow:libtensorflow.so

After the build finishes, copy the libraries into the directory where the TensorFlow.jl package keeps its libraries (for example, /home/shpark/.julia/packages/TensorFlow/JljDB/deps/usr/bin):

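bazel shutdown
cd bazel-bin/tensorflow
# copy only the TensorFlow libraries (and their versioned symlinks)
cp -a libtensorflow*.so* /home/shpark/.julia/packages/TensorFlow/JljDB/deps/usr/bin
cd /home/shpark/.julia/packages/TensorFlow/JljDB/deps/usr/bin
# create the unversioned name for the framework library
ln -s libtensorflow_framework.so.2 libtensorflow_framework.so
ls -al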
lrwxrwxrwx  1 shpark shpark        28 Dec  4 02:33 libtensorflow_framework.so -> libtensorflow_framework.so.2*
lrwxrwxrwx  1 shpark shpark        32 Dec  4 02:26 libtensorflow_framework.so.2 -> libtensorflow_framework.so.2.0.0*
-r-xr-xr-x  1 shpark shpark  34342328 Dec  4 02:26 libtensorflow_framework.so.2.0.0*
-r-xr-xr-x  1 shpark shpark     22121 Dec  4 01:54 libtensorflow_framework.so.2.0.0-2.params*
lrwxrwxrwx  1 shpark shpark        18 Dec  4 02:27 libtensorflow.so -> libtensorflow.so.2*
lrwxrwxrwx  1 shpark shpark        22 Dec  4 02:27 libtensorflow.so.2 -> libtensorflow.so.2.0.0*
-r-xr-xr-x  1 shpark shpark 539673528 Dec  4 02:27 libtensorflow.so.2.0.0*
-r-xr-xr-x  1 shpark shpark    143213 Dec  4 01:55 libtensorflow.so.2.0.0-2.params*
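The copy-and-link step can also be wrapped in a function that copies only the `libtensorflow*` files and creates the unversioned `libtensorflow_framework.so` link when it is missing. This is a sketch; the function name is hypothetical, and the destination is whatever `deps/usr/bin` directory your TensorFlow.jl install uses.

```bash
#!/usr/bin/env bash
# Sketch: copy the built TensorFlow libraries into TensorFlow.jl's deps/usr/bin.
# $1 = bazel-bin/tensorflow directory, $2 = destination directory.
install_tf_libs() {
  local src=$1 dst=$2 f
  mkdir -p "$dst"
  for f in "$src"/libtensorflow.so* "$src"/libtensorflow_framework.so*; do
    # Skip unmatched glob patterns; -L also accepts symlinks.
    [ -e "$f" ] || [ -L "$f" ] || continue
    cp -af "$f" "$dst"/          # -a keeps symlinks as symlinks; -f replaces read-only files
  done
  # Same unversioned link the post creates by hand.
  if [ ! -e "$dst/libtensorflow_framework.so" ] && [ ! -L "$dst/libtensorflow_framework.so" ] \
     && [ -e "$dst/libtensorflow_framework.so.2" ]; then
    ln -s libtensorflow_framework.so.2 "$dst/libtensorflow_framework.so"
  fi
}

# Usage (example path from this post):
# install_tf_libs bazel-bin/tensorflow /home/shpark/.julia/packages/TensorFlow/JljDB/deps/usr/bin
```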

Run the following in Julia:

julia> using TensorFlow
julia> TensorFlow.Session()

If GPU information like the following is printed, the setup succeeded.

2019-12-04 02:38:29.094119: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2598210000 Hz
2019-12-04 02:38:29.096951: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x3569360 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2019-12-04 02:38:29.096994: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2019-12-04 02:38:29.100743: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2019-12-04 02:38:29.648171: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1551] Found device 0 with properties:
name: TITAN Xp major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:05:00.0
2019-12-04 02:38:29.649442: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1551] Found device 1 with properties:
name: TITAN Xp major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:06:00.0
2019-12-04 02:38:29.650699: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1551] Found device 2 with properties:
name: TITAN Xp major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:09:00.0
2019-12-04 02:38:29.651941: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1551] Found device 3 with properties:
name: TITAN Xp major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:0a:00.0
2019-12-04 02:38:29.652168: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2019-12-04 02:38:29.653514: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2019-12-04 02:38:29.654750: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2019-12-04 02:38:29.654994: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2019-12-04 02:38:29.656261: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2019-12-04 02:38:29.656903: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2019-12-04 02:38:29.659538: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-12-04 02:38:29.671761: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1679] Adding visible gpu devices: 0, 1, 2, 3
2019-12-04 02:38:29.671796: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2019-12-04 02:38:29.677085: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1092] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-04 02:38:29.677102: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1098]      0 1 2 3
2019-12-04 02:38:29.677111: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1111] 0:   N Y Y Y
2019-12-04 02:38:29.677117: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1111] 1:   Y N Y Y
2019-12-04 02:38:29.677124: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1111] 2:   Y Y N Y
2019-12-04 02:38:29.677130: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1111] 3:   Y Y Y N
2019-12-04 02:38:29.687338: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1237] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11436 MB memory) -> physical GPU (device: 0, name: TITAN Xp, pci bus id: 0000:05:00.0, compute capability: 6.1)
2019-12-04 02:38:29.690090: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1237] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 11439 MB memory) -> physical GPU (device: 1, name: TITAN Xp, pci bus id: 0000:06:00.0, compute capability: 6.1)
2019-12-04 02:38:29.692745: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1237] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 11439 MB memory) -> physical GPU (device: 2, name: TITAN Xp, pci bus id: 0000:09:00.0, compute capability: 6.1)
2019-12-04 02:38:29.695417: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1237] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 11439 MB memory) -> physical GPU (device: 3, name: TITAN Xp, pci bus id: 0000:0a:00.0, compute capability: 6.1)
Session(Ptr{Nothing} @0x00007ff5a081c070)
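TensorFlow writes this log to standard error, so the check can be scripted by capturing stderr and looking for the `Adding visible gpu devices` line. A sketch with a hypothetical helper name, assuming the log format shown above:

```bash
#!/usr/bin/env bash
# Sketch: print the GPU indices from TensorFlow's "Adding visible gpu devices" log line.
visible_gpus() {
  sed -n 's/.*Adding visible gpu devices: //p'
}

# Usage (TensorFlow logs to stderr, hence 2>&1):
# julia -e 'using TensorFlow; TensorFlow.Session()' 2>&1 | visible_gpus
# For the machine in this post the log line above yields: 0, 1, 2, 3
```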

You can also check with nvidia-smi: after running the code above, the julia process shows up holding memory on the GPUs.

$ nvidia-smi
Wed Dec  4 02:57:46 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN Xp            On   | 00000000:05:00.0 Off |                  N/A |
|  0%   21C    P8    15W / 250W |    155MiB / 12193MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN Xp            On   | 00000000:06:00.0 Off |                  N/A |
|  0%   21C    P8     8W / 250W |    155MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  TITAN Xp            On   | 00000000:09:00.0 Off |                  N/A |
|  0%   22C    P8     8W / 250W |    155MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  TITAN Xp            On   | 00000000:0A:00.0 Off |                  N/A |
|  0%   18C    P8     7W / 250W |    155MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     17252      C   julia                                        145MiB |
|    1     17252      C   julia                                        145MiB |
|    2     17252      C   julia                                        145MiB |
|    3     17252      C   julia                                        145MiB |
+-----------------------------------------------------------------------------+
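For a scripted version of this check, `nvidia-smi` can print compute processes as CSV with `--query-compute-apps`. Counting the rows whose process name ends in `julia` should give the number of GPUs the Julia session holds (4 in the table above). A sketch; the helper name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: count julia rows in `nvidia-smi --query-compute-apps` CSV output.
# Expected input lines: "<pid>, <process name>, <used memory>"
count_julia_gpu_rows() {
  awk -F', ' '$2 ~ /julia$/ { n++ } END { print n + 0 }'
}

# Usage:
# nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader \
#   | count_julia_gpu_rows
```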
