TensorFlow runs faster on the CPU than on the GPU. How do I configure it correctly?

Question

JackBoner @JackBoner

I can't understand why the model trains 2-3 times faster on the CPU than on the GPU.

windows 10
tensorflow 1.13.1
keras 2.2.4
CUDA 10.1

Here is the model:

from keras import layers, models, regularizers

# Two small fully connected layers: 5 inputs -> 5 units -> 2 outputs
network = models.Sequential()
network.add(layers.Dense(5, activation='relu', input_shape=(5,), kernel_regularizer=regularizers.l2(0.05), activity_regularizer=regularizers.l1(0.01)))
network.add(layers.Dense(2, activation='relu', kernel_regularizer=regularizers.l2(0.01)))
network.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
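
For a sense of scale, here is a rough count of the trainable parameters in this model. It is a minimal sketch that assumes the usual Dense layout of one weight per input-unit pair plus one bias per unit; the helper name `dense_params` is made up for illustration and is not part of Keras.

```python
def dense_params(inputs, units):
    # One weight per (input, unit) pair plus one bias per unit.
    return inputs * units + units

# Layer sizes taken from the model in the question: 5 -> 5 -> 2.
layer_shapes = [(5, 5), (5, 2)]
total_params = sum(dense_params(i, u) for i, u in layer_shapes)
print(total_params)
```

With so few parameters, each training step involves very little arithmetic, so fixed per-step costs are likely to matter more than raw compute speed.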

Log:

Using TensorFlow backend.
2019-04-26 19:42:22.001733: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2019-04-26 19:42:22.242292: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.83
pciBusID: 0000:01:00.0
totalMemory: 8.00GiB freeMemory: 6.59GiB
2019-04-26 19:42:22.242786: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-04-26 19:42:22.858856: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-26 19:42:22.859063: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-04-26 19:42:22.859197: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-04-26 19:42:22.859446: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 6314 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 9401497665143581718
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 6620742943
locality {
  bus_id: 1
  links {
  }
}
incarnation: 3794371743575443843
physical_device_desc: "device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5"
]
2019-04-26 19:42:22.871318: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-04-26 19:42:22.871539: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-26 19:42:22.871806: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-04-26 19:42:22.871938: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-04-26 19:42:22.872124: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6314 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
2019-04-26 19:42:22.874432: I tensorflow/core/common_runtime/direct_session.cc:317] Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5

Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5
WARNING:tensorflow:From C:\Program Files\Miniconda\lib\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From C:\Program Files\Miniconda\lib\site-packages\tensorflow\python\ops\math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
2019-04-26 19:42:24.455242: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-04-26 19:42:24.455451: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-26 19:42:24.455650: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-04-26 19:42:24.455810: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-04-26 19:42:24.455997: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6314 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
2019-04-26 19:42:24.846946: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library cublas64_90.dll locally

The paths to the toolkit and CUPTI have been added to %PATH%:

CUDA\lib64
CUDA\include
CUDA\bin

During training the GPU is only about 10% loaded, yet almost all of its memory is in use.
Meanwhile the CPU is 60-70% loaded, as if training were running on it rather than on the GPU.

Where is the computation actually running? If it is on the GPU, why is it several times slower than on the CPU?

Asked more than three years ago

1 comment

The CPU time goes into reading data from the dataset storage and sending it to the GPU. If you feed the GPU small batches, its efficiency drops, and below a certain batch size using it becomes a net loss, while the CPU stays busy constantly talking to the GPU and the storage, shuttling data back and forth in tiny packets. How are you launching the training process?
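
The batch-size effect described here can be sketched with a toy cost model. It is only an illustration: it assumes each training step pays a fixed overhead (kernel launches, host-to-device transfers) plus a per-sample compute cost, and every number in it is a made-up placeholder, not a measurement of any real hardware. The function names are hypothetical as well.

```python
import math

def steps_per_epoch(num_samples, batch_size):
    # Number of batches needed to see every sample once.
    return math.ceil(num_samples / batch_size)

def epoch_time(num_samples, batch_size, overhead_per_step, time_per_sample):
    # Fixed cost paid once per step plus compute cost paid once per sample.
    steps = steps_per_epoch(num_samples, batch_size)
    return steps * overhead_per_step + num_samples * time_per_sample

# Hypothetical costs in seconds (assumptions for illustration only):
# the GPU has a higher per-step overhead but cheaper per-sample compute.
GPU = {"overhead_per_step": 1e-3, "time_per_sample": 1e-7}
CPU = {"overhead_per_step": 1e-4, "time_per_sample": 1e-6}

NUM_SAMPLES = 100_000
for batch_size in (32, 4096):
    gpu_time = epoch_time(NUM_SAMPLES, batch_size, **GPU)
    cpu_time = epoch_time(NUM_SAMPLES, batch_size, **CPU)
    print(batch_size, round(gpu_time, 4), round(cpu_time, 4))
```

Under these assumed costs the GPU loses at a batch size of 32 because per-step overhead dominates, and wins at 4096 because there are far fewer steps. In Keras the batch size is set with the `batch_size` argument of `fit`, so trying a much larger value is a cheap experiment.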

Answer 1 · 2019-09-02 19:22:17

You need to install tensorflow-gpu.
Then check that everything is OK:
# test.py
import tensorflow as tf

# Let GPU memory grow on demand instead of reserving it all up front
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

sess = tf.Session(config=config)
