I am using TensorFlow version 0.12.1 and CUDA toolkit version 8.0:
lrwxrwxrwx 1 root root 19 May 28 17:27 cuda -> /usr/local/cuda-8.0
As documented here, I have downloaded and installed cuDNN. But when executing the following line from my Python script, I get the error messages mentioned in the title:
model.fit_generator(train_generator,
                    steps_per_epoch=len(train_samples),
                    validation_data=validation_generator,
                    validation_steps=len(validation_samples),
                    epochs=9)
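In Keras 2, `steps_per_epoch` and `validation_steps` count batches drawn from the generator, not individual samples, so passing `len(train_samples)` asks for one batch per sample. Keras also expects the generator to keep yielding for as long as training runs; a generator that simply finishes makes `next()` raise the `StopIteration` seen in the log below. The following is a minimal sketch of that pattern in plain Python. `batch_generator`, `steps_for`, and `batch_size` are hypothetical names, since the original `train_generator` is not shown.

```python
import math
import random


def batch_generator(samples, batch_size=32):
    """Yield batches of samples forever, reshuffling at the start of each pass.

    Hypothetical stand-in for the question's train_generator: a real version
    would load images and labels here and yield (X, y) NumPy arrays.
    """
    samples = list(samples)  # work on a copy so the caller's list is not shuffled
    while True:  # loop indefinitely so next() never raises StopIteration
        random.shuffle(samples)
        for offset in range(0, len(samples), batch_size):
            yield samples[offset:offset + batch_size]


def steps_for(samples, batch_size=32):
    """Number of batches needed to cover every sample once (one epoch)."""
    return math.ceil(len(samples) / batch_size)


# Usage with fit_generator (Keras 2 argument names, as in the question):
# model.fit_generator(batch_generator(train_samples, 32),
#                     steps_per_epoch=steps_for(train_samples, 32),
#                     validation_data=batch_generator(validation_samples, 32),
#                     validation_steps=steps_for(validation_samples, 32),
#                     epochs=9)
```

The design choice here is that the generator owns the "infinite" loop and `steps_for` tells Keras where one epoch ends, so the two numbers stay consistent for any batch size.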
The detailed error message is as follows:
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
Epoch 1/9
Exception in thread Thread-1:
Traceback (most recent call last):
File " lib/python3.5/threading.py", line 914, in _bootstrap_inner
self.run()
File " lib/python3.5/threading.py", line 862, in run
self._target(*self._args, **self._kwargs)
File " lib/python3.5/site-packages/keras/engine/training.py", line 612, in data_generator_task
generator_output = next(self._generator)
StopIteration
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GRID K520
major: 3 minor: 0 memoryClockRate (GHz) 0.797
pciBusID 0000:00:03.0
Total memory: 3.94GiB
Free memory: 3.91GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GRID K520, pci bus id: 0000:00:03.0)
Traceback (most recent call last):
File "model_new.py", line 82, in <module>
model.fit_generator(train_generator, steps_per_epoch= len(train_samples),validation_data=validation_generator, validation_steps=len(validation_samples),epochs=9)
File " lib/python3.5/site-packages/keras/legacy/interfaces.py", line 88, in wrapper
return func(*args, **kwargs)
File " lib/python3.5/site-packages/keras/models.py", line 1110, in fit_generator
initial_epoch=initial_epoch)
File " lib/python3.5/site-packages/keras/legacy/interfaces.py", line 88, in wrapper
return func(*args, **kwargs)
File " lib/python3.5/site-packages/keras/engine/training.py", line 1890, in fit_generator
class_weight=class_weight)
File " lib/python3.5/site-packages/keras/engine/training.py", line 1633, in train_on_batch
outputs = self.train_function(ins)
File " lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 2229, in __call__
feed_dict=feed_dict)
File " lib/python3.5/site-packages/tensorflow/python/client/session.py", line 766, in run
run_metadata_ptr)
File " lib/python3.5/site-packages/tensorflow/python/client/session.py", line 937, in _run
np_val = np.asarray(subfeed_val, dtype=subfeed_dtype)
File " lib/python3.5/site-packages/numpy/core/numeric.py", line 531, in asarray
return array(a, dtype, copy=False, order=order)
MemoryError
Any suggestion to resolve this error is appreciated.
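The last frames show NumPy failing inside `np.asarray` while `session.run` converts a fed batch into an array, so the `MemoryError` comes from a host (CPU RAM) allocation rather than from GPU memory. A rough way to see how large such an array gets is to multiply out its dimensions. The 160x320x3 image shape and the batch sizes below are hypothetical, not values taken from the script:

```python
def batch_bytes(batch_size, height, width, channels, itemsize=4):
    """Bytes needed for one dense batch of images.

    itemsize=4 corresponds to float32; use 1 for uint8 or 8 for float64.
    """
    return batch_size * height * width * channels * itemsize


GIB = 1024 ** 3

# Hypothetical 160x320x3 images stored as float32, fed 32 at a time.
per_batch = batch_bytes(32, 160, 320, 3)
print("32 images:   %.3f GiB" % (per_batch / GIB))

# Hypothetical case: 8000 such images fed as one array instead of small batches.
whole_set = batch_bytes(8000, 160, 320, 3)
print("8000 images: %.2f GiB" % (whole_set / GIB))
```

Each image of that assumed shape takes 614,400 bytes as float32, so memory grows linearly with the number of images held in one array, and doubles again if the data ends up as float64.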
EDIT: The issue is fatal.
uname -a
Linux ip-172-31-76-109 4.4.0-78-generic #99-Ubuntu SMP Thu Apr 27 15:29:09 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
sudo lshw -short
H/W path      Device  Class      Description
============================================
                      system     HVM domU
/0                    bus        Motherboard
/0/0                  memory     96KiB BIOS
/0/401                processor  Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
/0/402                processor  CPU
/0/403                processor  CPU
/0/404                processor  CPU
/0/405                processor  CPU
/0/406                processor  CPU
/0/407                processor  CPU
/0/408                processor  CPU
/0/1000               memory     15GiB System Memory
/0/1000/0             memory     15GiB DIMM RAM
/0/100                bridge     440FX - 82441FX PMC [Natoma]
/0/100/1              bridge     82371SB PIIX3 ISA [Natoma/Triton II]
/0/100/1.1            storage    82371SB PIIX3 IDE [Natoma/Triton II]
/0/100/1.3            bridge     82371AB/EB/MB PIIX4 ACPI
/0/100/2              display    GD 5446
/0/100/3              display    GK104GL [GRID K520]
/0/100/1f             generic    Xen Platform Device
/1            eth0    network    Ethernet interface
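`lshw` reports 15GiB of system memory; for a `MemoryError` what matters is how much of it is actually free while training runs. On Linux this can be read from `/proc/meminfo`. A minimal sketch follows; the `MemAvailable` field exists on kernels 3.14 and later, so it should be present on the 4.4 kernel shown above. `parse_meminfo` and `available_gib` are hypothetical helper names.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer value.

    Most fields are reported in kB (KiB); a few, such as HugePages_Total,
    are plain counts without a unit.
    """
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        parts = value.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info


def available_gib(path="/proc/meminfo"):
    """Return MemAvailable in GiB (Linux only)."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    return info["MemAvailable"] / (1024 ** 2)  # KiB -> GiB


# Usage, e.g. just before calling fit_generator:
# print("Available RAM: %.2f GiB" % available_gib())
```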
EDIT 2:
This is an EC2 instance in the Amazon cloud, and all the numa_node files hold the value -1:
:/sys$ find . -name numa_node -exec cat '{}' \;
find: ‘./fs/fuse/connections/39’: Permission denied
-1
-1
-1
-1
-1
-1
-1
find: ‘./kernel/debug’: Permission denied
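The same check can be scripted in Python. The sketch below lists every `numa_node` file under a sysfs-style directory tree that holds `-1` and can optionally overwrite them. Writing `0` is an assumption about what "updating the numa_node files" in EDIT 3 means; writing under `/sys` requires root, and since sysfs lives in memory, such a change does not survive a reboot. `find_negative_numa_nodes` and `set_numa_nodes` are hypothetical names.

```python
import os


def find_negative_numa_nodes(root="/sys"):
    """Return paths of numa_node files under root whose value is -1.

    Symlinks are not followed (like find's default), which avoids the
    cycles that sysfs contains.
    """
    hits = []
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        if "numa_node" not in filenames:
            continue
        path = os.path.join(dirpath, "numa_node")
        try:
            with open(path) as f:
                value = f.read().strip()
        except OSError:  # e.g. permission denied, as find reports above
            continue
        if value == "-1":
            hits.append(path)
    return hits


def set_numa_nodes(paths, node=0, dry_run=True):
    """Write `node` into each file; only prints the plan when dry_run is True."""
    for path in paths:
        if dry_run:
            print("would write %d to %s" % (node, path))
        else:
            with open(path, "w") as f:  # needs root for real sysfs files
                f.write("%d\n" % node)
```

For example, `set_numa_nodes(find_negative_numa_nodes())` only prints what it would change; passing `dry_run=False` (as root) applies it.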
EDIT 3: After updating the numa_node files, the NUMA-related message disappeared, but all the other errors listed above remain, and I again got a fatal error:
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
Epoch 1/9
Exception in thread Thread-1:
Traceback (most recent call last):
File " lib/python3.5/threading.py", line 914, in _bootstrap_inner
self.run()
File " lib/python3.5/threading.py", line 862, in run
self._target(*self._args, **self._kwargs)
File " lib/python3.5/site-packages/keras/engine/training.py", line 612, in data_generator_task
generator_output = next(self._generator)
StopIteration
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GRID K520
major: 3 minor: 0 memoryClockRate (GHz) 0.797
pciBusID 0000:00:03.0
Total memory: 3.94GiB
Free memory: 3.91GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GRID K520, pci bus id: 0000:00:03.0)
Traceback (most recent call last):
File "model_new.py", line 85, in <module>
model.fit_generator(train_generator, steps_per_epoch= len(train_samples),validation_data=validation_generator, validation_steps=len(validation_samples),epochs=9)
File " lib/python3.5/site-packages/keras/legacy/interfaces.py", line 88, in wrapper
return func(*args, **kwargs)
File " lib/python3.5/site-packages/keras/models.py", line 1110, in fit_generator
initial_epoch=initial_epoch)
File " lib/python3.5/site-packages/keras/legacy/interfaces.py", line 88, in wrapper
return func(*args, **kwargs)
File " lib/python3.5/site-packages/keras/engine/training.py", line 1890, in fit_generator
class_weight=class_weight)
File " lib/python3.5/site-packages/keras/engine/training.py", line 1633, in train_on_batch
outputs = self.train_function(ins)
File " lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 2229, in __call__
feed_dict=feed_dict)
File " lib/python3.5/site-packages/tensorflow/python/client/session.py", line 766, in run
run_metadata_ptr)
File " lib/python3.5/site-packages/tensorflow/python/client/session.py", line 937, in _run
np_val = np.asarray(subfeed_val, dtype=subfeed_dtype)
File " lib/python3.5/site-packages/numpy/core/numeric.py", line 531, in asarray
return array(a, dtype, copy=False, order=order)
MemoryError