Added tensorflow-cpu docker image #15
Open
sabetAI wants to merge 6 commits into mlbench:develop from sabetAI:develop
Commits
bc51db3
Added tensorflow-cpu docker image
fc81fcb
Created Docker base image for running tensorflow with mpi support
f7b0aff
Fixed Dockerfile git fetch error for tf-mpi fork repo
f4c149d
Delete .Dockerfile.swp
90adbfc
Removed fetch from pull request, was merged and unnecessary
66da667
Merge branch 'develop' of https://github.com/sabetAI/mlbench-benchmar…
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,89 @@ | ||
FROM tensorflow/tensorflow:latest-py3 as mlbench-worker-base-cpu | ||
# TODO: reduce size and complexity of image. | ||
|
||
RUN apt-get update && apt-get install -y --no-install-recommends \ | ||
gcc \ | ||
make \ | ||
libc-dev \ | ||
musl-dev \ | ||
openssh-server \ | ||
g++ \ | ||
git \ | ||
curl \ | ||
sudo \ | ||
iproute2 | ||
|
||
# -------------------- SSH -------------------- | ||
RUN cat /etc/ssh/ssh_config | grep -v StrictHostKeyChecking > /etc/ssh/ssh_config.new && \ | ||
echo " StrictHostKeyChecking no" >> /etc/ssh/ssh_config.new && \ | ||
mv /etc/ssh/ssh_config.new /etc/ssh/ssh_config | ||
|
||
ARG SSH_USER=root | ||
ENV SSH_USER=$SSH_USER | ||
RUN mkdir -p /ssh-key/$SSH_USER && chown -R $SSH_USER:$SSH_USER /ssh-key/$SSH_USER | ||
RUN mkdir -p /.sshd/host_keys && \ | ||
chown -R $SSH_USER:$SSH_USER /.sshd/host_keys && chmod 700 /.sshd/host_keys | ||
RUN mkdir -p /.sshd/user_keys/$SSH_USER && \ | ||
chown -R $SSH_USER:$SSH_USER /.sshd/user_keys/$SSH_USER && chmod 700 /.sshd/user_keys/$SSH_USER | ||
VOLUME /ssh-key/$SSH_USER | ||
|
||
# -------------------- Conda environment -------------------- | ||
RUN curl -o ~/miniconda.sh https://repo.continuum.io/miniconda/Miniconda3-4.5.4-Linux-x86_64.sh && \ | ||
sh ~/miniconda.sh -b -p /conda && rm ~/miniconda.sh | ||
ENV PATH /conda/bin:$PATH | ||
ENV LD_LIBRARY_PATH /conda/lib:$LD_LIBRARY_PATH | ||
|
||
# TODO: Packages in the anaconda channel can be outdated; switch to conda-forge if possible. | ||
RUN conda install -y -c anaconda numpy pyyaml scipy mkl setuptools cmake cffi mkl-include typing \ | ||
&& conda install -y -c mingfeima mkldnn \ | ||
&& conda install -y -c soumith magma-cuda90 \ | ||
&& conda install -y -c conda-forge python-lmdb opencv numpy \ | ||
&& conda clean --all -y | ||
|
||
# -------------------- Open MPI -------------------- | ||
RUN mkdir /.openmpi/ | ||
RUN apt-get update && apt-get install -y --no-install-recommends wget \ | ||
&& wget https://www.open-mpi.org/software/ompi/v3.0/downloads/openmpi-3.0.0.tar.gz\ | ||
&& gunzip -c openmpi-3.0.0.tar.gz | tar xf - \ | ||
&& cd openmpi-3.0.0 \ | ||
&& ./configure --prefix=/.openmpi/ \ | ||
&& make all install \ | ||
&& rm /openmpi-3.0.0.tar.gz \ | ||
&& rm -rf /openmpi-3.0.0 \ | ||
&& apt-get remove -y wget | ||
|
||
ENV PATH /.openmpi/bin:$PATH | ||
ENV LD_LIBRARY_PATH /.openmpi/lib:$LD_LIBRARY_PATH | ||
|
||
RUN mv /.openmpi/bin/mpirun /.openmpi/bin/mpirun.real && \ | ||
echo '#!/bin/bash' > /.openmpi/bin/mpirun && \ | ||
echo "/.openmpi/bin/mpirun.real" '--allow-run-as-root "$@"' >> /.openmpi/bin/mpirun && \ | ||
chmod a+x /.openmpi/bin/mpirun | ||
|
||
# Configure OpenMPI to use good defaults: | ||
# --bind-to none --map-by slot --mca btl_tcp_if_exclude lo,docker0 | ||
RUN echo "hwloc_base_binding_policy = none" >> /.openmpi/etc/openmpi-mca-params.conf && \ | ||
echo "rmaps_base_mapping_policy = slot" >> /.openmpi/etc/openmpi-mca-params.conf && \ | ||
echo "btl_tcp_if_exclude = lo,docker0" >> /.openmpi/etc/openmpi-mca-params.conf | ||
|
||
# configure the path. | ||
RUN echo export 'PATH=/conda/bin:/.openmpi/bin:$PATH' >> ~/.bashrc | ||
RUN echo export 'LD_LIBRARY_PATH=/.openmpi/lib:$LD_LIBRARY_PATH' >> ~/.bashrc | ||
|
||
RUN conda install -y -c conda-forge mpi4py | ||
# -------------------- TensorFlow Related -------------------- | ||
RUN conda install -y tensorflow | ||
|
||
# -------------------- Others -------------------- | ||
RUN echo "orte_keep_fqdn_hostnames=t" >> /.openmpi/etc/openmpi-mca-params.conf | ||
|
||
ADD ./entrypoint.sh /usr/local/bin/ | ||
RUN chmod a+x /usr/local/bin/entrypoint.sh | ||
|
||
# Copy your application code to the container (make sure you create a .dockerignore file if any large files or directories should be excluded) | ||
RUN mkdir /app/ | ||
WORKDIR /app/ | ||
|
||
EXPOSE 22 | ||
ENTRYPOINT ["/usr/local/bin/entrypoint.sh"] | ||
CMD ["/usr/sbin/sshd","-eD", "-f", "/.sshd/user_keys/root/sshd_config"] |
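The `mpirun` wrapper step in this Dockerfile renames the real binary and writes a small script that always adds `--allow-run-as-root` before forwarding the caller's arguments. The following is a minimal sketch of that pattern outside Docker. It assumes a stand-in `mpirun` script in a temporary directory (not Open MPI itself), so only the argument forwarding is exercised.

```shell
#!/bin/sh
# Sketch of the wrapper pattern, using a stand-in "mpirun" in a temp dir.
set -eu

demo_dir=$(mktemp -d)

# Stand-in for the real launcher (assumption): prints each argument on its own line.
cat > "$demo_dir/mpirun" << 'EOF'
#!/bin/sh
for arg in "$@"; do printf '[%s]\n' "$arg"; done
EOF
chmod a+x "$demo_dir/mpirun"

# Same steps as the Dockerfile: rename the real binary, then write a wrapper
# that prepends --allow-run-as-root and forwards all arguments unchanged.
mv "$demo_dir/mpirun" "$demo_dir/mpirun.real"
cat > "$demo_dir/mpirun" << EOF
#!/bin/sh
exec "$demo_dir/mpirun.real" --allow-run-as-root "\$@"
EOF
chmod a+x "$demo_dir/mpirun"

# Arguments containing spaces survive because the wrapper uses "$@".
wrapper_output=$("$demo_dir/mpirun" -np 2 "hello world")
printf '%s\n' "$wrapper_output"
rm -rf "$demo_dir"
```

Using `exec` in the wrapper is a design choice of this sketch: it replaces the wrapper process with the real launcher so that signals and the exit status reach the caller directly. The Dockerfile above calls `mpirun.real` without `exec`, which also works.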
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,103 @@ | ||
#!/bin/sh | ||
|
||
PERMIT_ROOT_LOGIN=yes | ||
MY_NAME=root | ||
|
||
ssh-keygen -f /.sshd/host_keys/host_rsa_key -C '' -N '' -t rsa | ||
ssh-keygen -f /.sshd/host_keys/host_dsa_key -C '' -N '' -t dsa | ||
|
||
create_ssh_key() { | ||
user=$1 | ||
mkdir -p /.sshd/user_keys/$user | ||
chmod 700 /.sshd/user_keys/$user | ||
chown $user:$user /.sshd/user_keys/$user | ||
if ! [ -z "$(ls -A /ssh-key/$user)" ]; then | ||
cp /ssh-key/$user/* /.sshd/user_keys/$user/ | ||
chmod 600 /.sshd/user_keys/$user/* | ||
chown $user:$user /.sshd/user_keys/$user/* | ||
fi | ||
} | ||
|
||
create_ssh_key $MY_NAME | ||
|
||
# generating sshd_config | ||
cat << EOT > /.sshd/user_keys/$MY_NAME/sshd_config | ||
# Package generated configuration file | ||
# See the sshd_config(5) manpage for details | ||
# What ports, IPs and protocols we listen for | ||
Port 22 | ||
# Use these options to restrict which interfaces/protocols sshd will bind to | ||
#ListenAddress :: | ||
#ListenAddress 0.0.0.0 | ||
Protocol 2 | ||
PidFile /.sshd/user_keys/$MY_NAME/sshd.pid | ||
# HostKeys for protocol version 2 | ||
HostKey /.sshd/host_keys/host_rsa_key | ||
HostKey /.sshd/host_keys/host_dsa_key | ||
# Privilege separation is turned off | ||
UsePrivilegeSeparation no | ||
# Lifetime and size of ephemeral version 1 server key | ||
KeyRegenerationInterval 3600 | ||
ServerKeyBits 768 | ||
# Logging | ||
SyslogFacility AUTH | ||
LogLevel INFO | ||
# Authentication: | ||
LoginGraceTime 120 | ||
PermitRootLogin $PERMIT_ROOT_LOGIN | ||
StrictModes yes | ||
RSAAuthentication yes | ||
PubkeyAuthentication yes | ||
AuthorizedKeysFile /.sshd/user_keys/%u/authorized_keys | ||
# Don't read the user's ~/.rhosts and ~/.shosts files | ||
IgnoreRhosts yes | ||
# For this to work you will also need host keys in /etc/ssh_known_hosts | ||
RhostsRSAAuthentication no | ||
# similar for protocol version 2 | ||
HostbasedAuthentication no | ||
# Uncomment if you don't trust ~/.ssh/known_hosts for RhostsRSAAuthentication | ||
#IgnoreUserKnownHosts yes | ||
# To enable empty passwords, change to yes (NOT RECOMMENDED) | ||
PermitEmptyPasswords no | ||
# Change to yes to enable challenge-response passwords (beware issues with | ||
# some PAM modules and threads) | ||
ChallengeResponseAuthentication no | ||
X11Forwarding yes | ||
X11DisplayOffset 10 | ||
PrintMotd no | ||
PrintLastLog yes | ||
TCPKeepAlive yes | ||
#UseLogin no | ||
# Allow client to pass locale environment variables | ||
AcceptEnv LANG LC_* | ||
Subsystem sftp /usr/lib/openssh/sftp-server | ||
# Set this to 'yes' to enable PAM authentication, account processing, | ||
# and session processing. If this is enabled, PAM authentication will | ||
# be allowed through the ChallengeResponseAuthentication and | ||
# PasswordAuthentication. Depending on your PAM configuration, | ||
# PAM authentication via ChallengeResponseAuthentication may bypass | ||
# the setting of "PermitRootLogin without-password". | ||
# If you just want the PAM account and session checks to run without | ||
# PAM authentication, then enable this but set PasswordAuthentication | ||
# and ChallengeResponseAuthentication to 'no'. | ||
UsePAM no | ||
# we need this to set various variables (LD_LIBRARY_PATH etc.) for users | ||
# since sshd wipes all previously set environment variables when opening | ||
# a new session | ||
PermitUserEnvironment yes | ||
EOT | ||
|
||
#cat << EOT > /$MY_NAME/.ssh/config | ||
cat << EOT > /etc/ssh/ssh_config | ||
StrictHostKeyChecking no | ||
IdentityFile /.sshd/user_keys/$MY_NAME/id_rsa | ||
Port 22 | ||
UserKnownHostsFile=/dev/null | ||
EOT | ||
|
||
#prepare run dir | ||
if [ ! -d "/var/run/sshd" ]; then | ||
mkdir -p /var/run/sshd | ||
fi | ||
# EOT | ||
exec "$@" |
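The entrypoint script writes `sshd_config` through a here-document with an unquoted `EOT` delimiter, so `$MY_NAME` and `$PERMIT_ROOT_LOGIN` are expanded by the shell when the file is written, while sshd tokens such as `%u` pass through untouched. The sketch below illustrates that behaviour on a three-line excerpt. The temporary output directory and the `literal` file are assumptions made for the demonstration and are not part of the PR.

```shell
#!/bin/sh
set -eu

MY_NAME=root
PERMIT_ROOT_LOGIN=yes
out_dir=$(mktemp -d)   # stand-in for /.sshd/user_keys/$MY_NAME (assumption)

# Unquoted delimiter: $MY_NAME and $PERMIT_ROOT_LOGIN are expanded now,
# while %u is left alone for sshd to substitute at login time.
cat << EOT > "$out_dir/sshd_config"
PidFile /.sshd/user_keys/$MY_NAME/sshd.pid
PermitRootLogin $PERMIT_ROOT_LOGIN
AuthorizedKeysFile /.sshd/user_keys/%u/authorized_keys
EOT

# Quoted delimiter, for contrast: the text is copied literally.
cat << 'EOT' > "$out_dir/literal"
PidFile /.sshd/user_keys/$MY_NAME/sshd.pid
EOT

expanded_line=$(head -n 1 "$out_dir/sshd_config")
literal_line=$(head -n 1 "$out_dir/literal")
printf '%s\n%s\n' "$expanded_line" "$literal_line"
rm -rf "$out_dir"
```

If a generated config ever needs a literal `$` in it, quoting the delimiter (or escaping the `$`) is the usual way to keep the shell from expanding it.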
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,126 @@ | ||
FROM nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04 as mlbench-worker-base | ||
# TODO: reduce size and complexity of image. | ||
|
||
RUN apt-get update && apt-get install -y --no-install-recommends \ | ||
gcc \ | ||
make \ | ||
libc-dev \ | ||
musl-dev \ | ||
openssh-server \ | ||
g++ \ | ||
git \ | ||
curl \ | ||
sudo \ | ||
iproute2 | ||
|
||
# -------------------- SSH -------------------- | ||
RUN cat /etc/ssh/ssh_config | grep -v StrictHostKeyChecking > /etc/ssh/ssh_config.new && \ | ||
echo " StrictHostKeyChecking no" >> /etc/ssh/ssh_config.new && \ | ||
mv /etc/ssh/ssh_config.new /etc/ssh/ssh_config | ||
|
||
ARG SSH_USER=root | ||
ENV SSH_USER=$SSH_USER | ||
RUN mkdir -p /ssh-key/$SSH_USER && chown -R $SSH_USER:$SSH_USER /ssh-key/$SSH_USER | ||
RUN mkdir -p /.sshd/host_keys && \ | ||
chown -R $SSH_USER:$SSH_USER /.sshd/host_keys && chmod 700 /.sshd/host_keys | ||
RUN mkdir -p /.sshd/user_keys/$SSH_USER && \ | ||
chown -R $SSH_USER:$SSH_USER /.sshd/user_keys/$SSH_USER && chmod 700 /.sshd/user_keys/$SSH_USER | ||
VOLUME /ssh-key/$SSH_USER | ||
|
||
# -------------------- CUDA Dependency -------------------- | ||
RUN echo "deb http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64 /" > /etc/apt/sources.list.d/nvidia-ml.list | ||
RUN apt-get update && apt-get install -y --no-install-recommends --allow-downgrades \ | ||
--allow-change-held-packages \ | ||
libnccl2=2.0.5-3+cuda9.0 \ | ||
libnccl-dev=2.0.5-3+cuda9.0 &&\ | ||
rm -rf /var/lib/apt/lists/* | ||
|
||
# -------------------- Conda environment -------------------- | ||
RUN curl -o ~/miniconda.sh https://repo.continuum.io/miniconda/Miniconda3-4.5.4-Linux-x86_64.sh && \ | ||
sh ~/miniconda.sh -b -p /conda && rm ~/miniconda.sh | ||
ENV PATH /conda/bin:$PATH | ||
ENV LD_LIBRARY_PATH /conda/lib:$LD_LIBRARY_PATH | ||
|
||
# TODO: Packages in the anaconda channel can be outdated; switch to conda-forge if possible. | ||
RUN conda install -y -c anaconda numpy pyyaml scipy mkl setuptools cmake cffi mkl-include typing \ | ||
&& conda install -y -c mingfeima mkldnn \ | ||
&& conda install -y -c soumith magma-cuda90 \ | ||
&& conda install -y -c conda-forge python-lmdb opencv numpy \ | ||
&& conda clean --all -y | ||
|
||
# -------------------- Open MPI -------------------- | ||
RUN mkdir /.openmpi/ | ||
RUN apt-get update && apt-get install -y --no-install-recommends wget \ | ||
&& wget https://www.open-mpi.org/software/ompi/v3.0/downloads/openmpi-3.0.0.tar.gz\ | ||
&& gunzip -c openmpi-3.0.0.tar.gz | tar xf - \ | ||
&& cd openmpi-3.0.0 \ | ||
&& ./configure --prefix=/.openmpi/ --with-cuda\ | ||
&& make all install \ | ||
&& rm /openmpi-3.0.0.tar.gz \ | ||
&& rm -rf /openmpi-3.0.0 \ | ||
&& apt-get remove -y wget | ||
|
||
ENV PATH /.openmpi/bin:$PATH | ||
ENV LD_LIBRARY_PATH /.openmpi/lib:$LD_LIBRARY_PATH | ||
|
||
RUN mv /.openmpi/bin/mpirun /.openmpi/bin/mpirun.real && \ | ||
echo '#!/bin/bash' > /.openmpi/bin/mpirun && \ | ||
echo "/.openmpi/bin/mpirun.real" '--allow-run-as-root "$@"' >> /.openmpi/bin/mpirun && \ | ||
chmod a+x /.openmpi/bin/mpirun | ||
|
||
# Configure OpenMPI to use good defaults: | ||
# --bind-to none --map-by slot --mca btl_tcp_if_exclude lo,docker0 | ||
RUN echo "hwloc_base_binding_policy = none" >> /.openmpi/etc/openmpi-mca-params.conf && \ | ||
echo "rmaps_base_mapping_policy = slot" >> /.openmpi/etc/openmpi-mca-params.conf && \ | ||
echo "btl_tcp_if_exclude = lo,docker0" >> /.openmpi/etc/openmpi-mca-params.conf | ||
|
||
# configure the path. | ||
RUN echo export 'PATH=/conda/bin:/.openmpi/bin:$PATH' >> ~/.bashrc | ||
RUN echo export 'LD_LIBRARY_PATH=/.openmpi/lib:$LD_LIBRARY_PATH' >> ~/.bashrc | ||
|
||
RUN conda install -y -c conda-forge mpi4py | ||
|
||
# -------------------- Build TensorFlow with MPI support -------------------- | ||
# source install instructions https://www.tensorflow.org/install/source | ||
# pip six numpy wheel setuptools already installed | ||
RUN pip install mock && \ | ||
pip install keras_applications==1.0.6 --no-deps && \ | ||
pip install keras_preprocessing==1.0.5 --no-deps | ||
|
||
# install bazel | ||
RUN echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list && \ | ||
curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add - && \ | ||
apt-get update && apt-get -y install bazel && \ | ||
apt-get install -y --only-upgrade bazel | ||
|
||
# clone tensorflow repo | ||
WORKDIR /tmp | ||
RUN git clone https://github.com/tensorflow/tensorflow.git | ||
|
||
# configure for CUDA 9.0 with mpi | ||
WORKDIR /tmp/tensorflow | ||
RUN echo '\n''\n'n'\n''\n'N'\n'y'\n'9'\n''\n''\n''\n'N'\n''\n''\n'N'\n''\n'y'\n''\n''\n'N'\n' | ./configure | ||
|
||
# build package with bazel | ||
RUN bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package && \ | ||
./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg | ||
|
||
# install with pip | ||
RUN TFLOW_PKG_VER=$(ls /tmp/tensorflow_pkg/) && \ | ||
pip install /tmp/tensorflow_pkg/$TFLOW_PKG_VER | ||
|
||
# -------------------- Others -------------------- | ||
RUN echo "orte_keep_fqdn_hostnames=t" >> /.openmpi/etc/openmpi-mca-params.conf | ||
|
||
ADD ./entrypoint.sh /usr/local/bin/ | ||
RUN chmod a+x /usr/local/bin/entrypoint.sh | ||
|
||
# Copy your application code to the container (make sure you create a .dockerignore file if any large files or directories should be excluded) | ||
RUN mkdir /app/ | ||
WORKDIR /app/ | ||
RUN git clone https://github.com/mlbench/mlbench-benchmarks.git | ||
RUN pip install -r /app/mlbench-benchmarks/tensorflow/imagerecognition/openmpi-cifar10-resnet20-all-reduce/requirements.txt | ||
|
||
EXPOSE 22 | ||
ENTRYPOINT ["/usr/local/bin/entrypoint.sh"] | ||
CMD ["/usr/sbin/sshd","-eD", "-f", "/.sshd/user_keys/root/sshd_config"] |
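The "install with pip" step in this Dockerfile takes the output of `ls /tmp/tensorflow_pkg/` as the wheel name, which only works while that directory holds exactly one entry. Below is a hedged sketch of a stricter selection. The helper name `find_single_wheel` and the demo file name are hypothetical and not part of the PR.

```shell
#!/bin/sh
set -eu

# Print the path of the single .whl file in a directory; fail otherwise.
# find_single_wheel is a hypothetical helper for illustration only.
find_single_wheel() {
    pkg_dir=$1
    set -- "$pkg_dir"/*.whl
    # If the glob matched nothing, $1 is the unexpanded pattern, not a file.
    if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
        echo "expected exactly one wheel in $pkg_dir" >&2
        return 1
    fi
    printf '%s\n' "$1"
}

# Demo with an empty placeholder file standing in for the built wheel.
demo_pkg_dir=$(mktemp -d)
touch "$demo_pkg_dir/tensorflow-0.0.0-demo.whl"
wheel_path=$(find_single_wheel "$demo_pkg_dir")
echo "would run: pip install $wheel_path"
rm -rf "$demo_pkg_dir"
```

In the image build, the same check could run against `/tmp/tensorflow_pkg` before calling `pip install`, so that a stray file or a failed package step stops the build with a clear message.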
Thanks a lot for the PR, it looks great. We should use a fixed-version base image so that updates to the TF image don't accidentally break our images.
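One way to catch floating base images early is a small check on the image reference used in `FROM`. The sketch below is only an illustration of the idea. The function name `is_pinned_image` is hypothetical, and treating every tag that starts with `latest` (such as `latest-py3`) as floating is an assumption of this sketch.

```shell
#!/bin/sh
set -eu

# Return 0 if an image reference is pinned to a fixed tag or digest,
# 1 if it has no tag or a tag starting with "latest".
# is_pinned_image is a hypothetical helper for illustration only.
is_pinned_image() {
    ref=$1
    case $ref in
        *@sha256:*) return 0 ;;   # digest reference
    esac
    # Drop the registry/repository path so a registry port is not read as a tag.
    name=${ref##*/}
    case $name in
        *:latest*) return 1 ;;    # e.g. latest, latest-py3
        *:*) return 0 ;;          # any other explicit tag
        *) return 1 ;;            # no tag means an implicit "latest"
    esac
}

# The two base images used in this PR.
for image in tensorflow/tensorflow:latest-py3 nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04; do
    if is_pinned_image "$image"; then
        echo "pinned:   $image"
    else
        echo "floating: $image"
    fi
done
```

With this check, the CPU Dockerfile's `tensorflow/tensorflow:latest-py3` is reported as floating, while the CUDA base image is reported as pinned.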