TensorFlow
Creating a Docker image with the correct versions of TensorFlow
TensorFlow is very picky about the versions of CUDA and cuDNN. The ones your OS installs probably won't work, and they don't give you any instructions on how to install a specific version from a tarball. Luckily David does; building on that, I've made the following Dockerfile to run:
- TensorFlow 2.19.0
- CUDA 12.5
- cuDNN 9.3.0
FROM nvidia/cuda:12.5.1-cudnn-devel-ubuntu22.04
# Install Python and dependencies
RUN apt-get update && apt-get install -y \
    python3 python3-pip wget \
    && apt-get remove -y --allow-change-held-packages cudnn libcudnn9-dev-cuda-12 libcudnn9-cuda-12 \
    && apt autoremove -y \
    && rm -rf /var/lib/apt/lists/*
# Install TensorFlow (latest GPU-compatible)
RUN pip install tensorflow keras pandas jupyter notebook matplotlib scikit-learn
# Install correct cudnn version
RUN wget https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-x86_64/cudnn-linux-x86_64-9.3.0.75_cuda12-archive.tar.xz \
    && tar -Jxvf cudnn-linux-x86_64-9.3.0.75_cuda12-archive.tar.xz \
    && cp -P cudnn-linux-x86_64-9.3.0.75_cuda12-archive/lib/* /usr/local/cuda/lib64/ \
    && cp -P cudnn-linux-x86_64-9.3.0.75_cuda12-archive/include/* /usr/local/cuda/include/ \
    && chmod a+r /usr/local/cuda/include/cudnn*.h \
    && chmod a+r /usr/local/cuda/lib64/libcudnn* \
    && rm -rf /cudnn-linux-x86_64-9.3.0.75_cuda12-archive.tar.xz /cudnn-linux-x86_64-9.3.0.75_cuda12-archive
# Create working directory
WORKDIR /notebooks
# Expose Jupyter Notebook port
EXPOSE 8888
LABEL com.centurylinklabs.watchtower.enable=false
ENTRYPOINT []
CMD ["/usr/local/bin/jupyter", "notebook", "--ip=0.0.0.0", "--no-browser", "--allow-root"]
It works with the following docker-compose file:
---
services:
  jupyter:
    image: jupyter:latest
    container_name: jupyter-anomalia
    restart: unless-stopped
    ports:
      - "8888:8888"
    volumes:
      - notebooks:/notebooks
      - /etc/localtime:/etc/localtime:ro
    environment:
      - JUPYTER_ENABLE_LAB=yes
      - JUPYTER_TOKEN=your-super-secure-pass
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

volumes:
  notebooks:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /home/your-user/notebooks
Once you run docker compose up, you'll be able to access the notebook at http://your-ip:8888/?token=your-super-secure-pass
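To confirm that the container actually picked up the intended CUDA and cuDNN versions, a quick check from a notebook cell could look like the sketch below. `describe_gpu_stack` is a hypothetical helper name, not part of TensorFlow. It only formats what `tf.sysconfig.get_build_info()` and `tf.config.list_physical_devices("GPU")` report, and it assumes the `cuda_version` and `cudnn_version` keys may be missing on non-GPU builds.

```python
def describe_gpu_stack(build_info, gpus):
    """Summarize the CUDA/cuDNN versions TensorFlow was built against.

    `describe_gpu_stack` is a hypothetical helper, not a TensorFlow API.
    """
    # The keys are assumed to be absent on CPU-only builds, hence .get().
    cuda = build_info.get("cuda_version", "unknown")
    cudnn = build_info.get("cudnn_version", "unknown")
    return f"CUDA {cuda}, cuDNN {cudnn}, visible GPUs: {len(gpus)}"


try:
    import tensorflow as tf
except ImportError:
    tf = None  # keeps the helper importable where TensorFlow is not installed

if tf is not None:
    print("TensorFlow", tf.__version__)
    print(
        describe_gpu_stack(
            tf.sysconfig.get_build_info(),
            tf.config.list_physical_devices("GPU"),
        )
    )
```

If the GPU count comes back as 0, either the NVIDIA runtime is not reaching the container or TensorFlow could not load its CUDA libraries, so the container logs are the next place to look.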
Troubleshooting
tensorflow/compiler/mlir/tools/kernel_gen/tf_gpu_runtime_wrappers.cc:40] 'cuModuleLoadData(&module, data)' failed with 'CUDA_ERROR_ILLEGAL_ADDRESS'
This probably means you're trying to load more than fits in your GPU's memory.
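Before shrinking the model, it may be worth ruling out TensorFlow's default behaviour of reserving almost all GPU memory at startup, which can matter when another process shares the card. A minimal sketch, assuming it runs before any other TensorFlow GPU work in the process (`enable_memory_growth` is a hypothetical helper name; the `tf.config` calls are TensorFlow's own):

```python
def enable_memory_growth(tf_config):
    """Make TensorFlow allocate GPU memory on demand instead of all at once.

    `tf_config` is expected to be the `tf.config` module. Returns the number
    of GPUs that were configured. Hypothetical helper, not a TensorFlow API.
    """
    gpus = tf_config.list_physical_devices("GPU")
    for gpu in gpus:
        # Must be called before the GPU is initialized, otherwise
        # TensorFlow raises a RuntimeError.
        tf_config.experimental.set_memory_growth(gpu, True)
    return len(gpus)


try:
    import tensorflow as tf
except ImportError:
    tf = None  # keeps the helper importable where TensorFlow is not installed

if tf is not None:
    print("GPUs with memory growth enabled:", enable_memory_growth(tf.config))
```

If the error persists with memory growth enabled, the model itself is likely too large for the card.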
In my case I had to reduce the number of GRU units from 100 to 64:
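# Imports assumed; the original snippet did not show them
from keras.layers import GRU, Embedding, RepeatVector
from keras.models import Sequential

units = 64  # reduced from 100 so the model fits in GPU memory


def define_model(in_vocab_size, embedding_vec_length, max_text_length, out_timesteps, out_vocab_size):
    mt_model = Sequential()
    # Map token ids to dense vectors; mask_zero=True treats id 0 as padding
    mt_model.add(Embedding(in_vocab_size, embedding_vec_length, mask_zero=True))
    # Encoder: compress the input sequence into a single vector
    mt_model.add(GRU(units))
    # Repeat that vector once per output timestep
    mt_model.add(RepeatVector(out_timesteps))
    # Decoder: emit one hidden state per output timestep
    mt_model.add(GRU(units, return_sequences=True))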