Installing and running on Google Cloud #74

@saganatt

Image: Deep Learning Image: TensorFlow 2.4, m66 CUDA 110

Install from scratch

  1. On first launch, the instance asks whether you want to install the Nvidia drivers; choose 'yes'. If it prints errors, wait a while, then try:
sudo /opt/deeplearning/install-driver.sh
  2. (Optional?) Stop the Jupyter notebook service (powered by conda): sudo service jupyter stop
  3. Install pip packages: pip install setuptools wheel virtualenv
  4. Install the aliBuild prerequisites for Ubuntu: link. Note: install default-libmysqlclient-dev instead of libmysqlclient-dev.
  5. If you see ldconfig errors at the end of the installation, the CUDA libraries are not symlinked correctly. Run:
sudo ln -sf /usr/local/cuda/lib64/libcudnn.so.8.0.5 /usr/local/cuda/lib64/libcudnn.so.8
sudo ln -sf /usr/local/cuda/lib64/libcudnn.so.8 /usr/local/cuda/lib64/libcudnn.so

Do the same for: libcudnn_{adv,cnn,ops}_{infer,train}.so, /lib/libnvinfer.so, /lib/libnvinfer_plugin.so, /lib/libnvonnxparser.so, /lib/libmyelin.so, /lib/libnvparsers.so.

To check whether the files are properly symlinked, run ls -lha /usr/local/cuda/lib64/libcudnn*. Running sudo ldconfig -v afterwards rescans the library directories and prints any remaining problems.
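The symlink pairs above follow one pattern per library (real file, major-version link, unversioned link), so they can be scripted. A minimal sketch, assuming you know each library's full version; relink_lib is a hypothetical helper name, and only the libcudnn version 8.0.5 comes from the commands above. Unlike those commands, it uses relative link targets.

```shell
# relink_lib DIR NAME FULL_VERSION   (hypothetical helper, not part of any tool)
# Creates NAME.so.<major> -> NAME.so.<FULL_VERSION> and NAME.so -> NAME.so.<major>
# inside DIR, mirroring the two "ln -sf" commands above.
relink_lib() {
    local dir="$1" name="$2" version="$3"
    local major="${version%%.*}"            # "8.0.5" -> "8"
    local real="$dir/$name.so.$version"

    if [ ! -e "$real" ]; then
        echo "skip: $real not found" >&2
        return 1
    fi
    # Relative targets: the links stay valid even if DIR is moved.
    ln -sf "$name.so.$version" "$dir/$name.so.$major"
    ln -sf "$name.so.$major" "$dir/$name.so"
}

# Example call (needs root for /usr/local/cuda/lib64):
# relink_lib /usr/local/cuda/lib64 libcudnn 8.0.5
```

Writing to /usr/local/cuda/lib64 needs root, so on the instance this would go into a script run with sudo. The versions of the other libraries are not listed in this issue; check them with ls before calling the helper.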

  6. Additional dependencies: sudo apt-get install libssl-dev libpython3.7 tcl environment-modules
  7. Install aliBuild: sudo pip install alibuild --upgrade. Then add the following lines to your .bashrc:
export ALIBUILD_WORK_DIR="/home/jupyter/alice/sw"
eval "`alienv shell-helper`"

Note: I install under /home/jupyter, where the additional large disk is mounted. I move all Jupyter files to /home/jupyter/jupyter (sudo mkdir /home/jupyter/jupyter, then sudo chown -R jupyter:jupyter /home/jupyter/jupyter, then edit the Jupyter paths in /home/jupyter/jupyter/.jupyter/jupyter_notebook_config.py).
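The two .bashrc lines for aliBuild are needed both here and again before running the analysis, so it may help to add them in a way that is safe to repeat. A minimal sketch; append_once is a hypothetical helper name, and it only appends a line when that exact line is not already in the file.

```shell
# append_once FILE LINE   (hypothetical helper)
# Appends LINE to FILE unless an identical line is already present.
append_once() {
    local file="$1" line="$2"
    touch "$file"
    # -x: match the whole line, -F: treat LINE as a fixed string, not a regex
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage with the lines from this issue:
# append_once ~/.bashrc 'export ALIBUILD_WORK_DIR="/home/jupyter/alice/sw"'
# append_once ~/.bashrc 'eval "`alienv shell-helper`"'
```

Single quotes keep the backticks literal, so alienv shell-helper is evaluated when .bashrc is sourced, not when the line is appended.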

  8. Install ROOT and AliPhysics:
screen # Keeps the build running if the SSH connection drops
sudo mkdir -p /home/jupyter/alice
sudo chown -R <user>:<user> /home/jupyter/alice/ # Replace with your username
cd /home/jupyter/alice
aliBuild init AliPhysics@master
aliBuild build AliPhysics --defaults user-next-root6 --force-unknown-architecture
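Because the build runs for a long time, it can be worth checking that the tools from the earlier steps are on PATH before starting it. A small sketch, assuming a fresh shell after the .bashrc change; require_cmds is a hypothetical helper name.

```shell
# require_cmds CMD...   (hypothetical helper)
# Reports every command that is not on PATH; returns 1 if any is missing.
require_cmds() {
    local missing=0 cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage before the build:
# require_cmds screen aliBuild alienv && echo "ready to build"
```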

Copy the data

Edit ~/.ssh/config to add a tunnel to aliceml.
NOTE: Create bias / nobias subdirectories and copy only the 90-17-17 or 180-33-33 data. The 180-x-x files take a very long time to transfer!

sudo mkdir /home/jupyter/data
sudo chown -R <user>:<user> /home/jupyter/data
cd /home/jupyter/data
mkdir bias; mkdir nobias
rsync -vaP <user>@aliceml:<path_to_data> <target_path>
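Since the transfers are long and go through a tunnel, a dropped connection is likely. rsync -P keeps partially transferred files, so rerunning the same command resumes the copy. A sketch of a retry wrapper around it; retry is a hypothetical helper, and the attempt count and delay in the usage line are arbitrary.

```shell
# retry ATTEMPTS DELAY_SECONDS CMD...   (hypothetical helper)
# Reruns CMD until it succeeds or ATTEMPTS is reached.
retry() {
    local attempts="$1" delay="$2" n=1
    shift 2
    until "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempts: $*" >&2
            return 1
        fi
        echo "attempt $n failed, retrying in ${delay}s" >&2
        sleep "$delay"
        n=$((n + 1))
    done
}

# Usage (placeholders as in the command above):
# retry 5 30 rsync -vaP <user>@aliceml:<path_to_data> <target_path>
```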

Run the analysis

Currently, the installation (the steps above) is ready for use on instance 4.

  1. If you haven't done so already during the installation, run pip install setuptools wheel virtualenv and add the following lines to your .bashrc:
export ALIBUILD_WORK_DIR="/home/jupyter/alice/sw"
eval "`alienv shell-helper`"
  2. Get TPCwithDNN (I cloned it to my home directory): git clone https://github.com/AliceO2Group/TPCwithDNN.git
  3. Modify load.sh:
    • check the Python path with which python and correct the path on line 38
    • change the Python version (to 3.7) on line 97
    • correct the ALICE_ROOT path on line 91
  4. Remove the '> 3.7' Python version requirement from setup.py
  5. Install the package:
alienv enter AliPhysics/latest
source load.sh
pip install -e .
  6. Install the remaining Python packages: pip install root_pandas root_numpy
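The load.sh edits above need two values: the interpreter path and its major.minor version. A small sketch that prints both in one go; python_info is a hypothetical helper name, and it assumes the interpreter answers --version with a line like "Python 3.7.12".

```shell
# python_info [INTERPRETER]   (hypothetical helper)
# Prints "<path> <major.minor>" for the given interpreter (default: python).
python_info() {
    local py ver
    py="$(command -v "${1:-python}")" || {
        echo "interpreter not found: ${1:-python}" >&2
        return 1
    }
    ver="$("$py" --version 2>&1)"   # e.g. "Python 3.7.12"
    ver="${ver#Python }"            # strip the "Python " prefix
    printf '%s %s\n' "$py" "${ver%.*}"   # drop the patch level
}

# Usage inside the AliPhysics environment:
# python_info
```

The first field is what which python reports, and the second is the version to put into load.sh.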

Check results via X11 forward

sudo apt-get install qpdfview eog
To get X11 forwarding, you need to install the Google Cloud SDK (which provides the gcloud command) on your local machine. During the installation you can specify your default project and zone.

curl -O https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-sdk-333.0.0-linux-x86_64.tar.gz
tar -xvf google-cloud-sdk-333.0.0-linux-x86_64.tar.gz
cd google-cloud-sdk
./install.sh
source ~/.bashrc

Then ssh with gcloud:

gcloud init
gcloud compute ssh --ssh-flag="-Y" --zone <zone> --project <project_name> <username>@<cloud_instance_name>
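Once logged in, a quick way to see whether the -Y flag took effect is to check DISPLAY on the instance, since ssh sets it when X11 forwarding is active. A minimal sketch; x11_ready is a hypothetical helper name.

```shell
# x11_ready   (hypothetical helper)
# Succeeds if DISPLAY is set in the current session, fails otherwise.
x11_ready() {
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set; reconnect with --ssh-flag=\"-Y\"" >&2
        return 1
    fi
    echo "X11 forwarding active on DISPLAY=$DISPLAY"
}

# Usage on the instance (plot.png is a placeholder file name):
# x11_ready && eog plot.png
```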

If some commands complain about the locale, set it with sudo dpkg-reconfigure locales and select the matching locale with Space (not Enter!).

Shut down when execution has finished

Add sudo shutdown -h now at the end of your run *.sh script.
Note: do not pipe into it. With ... | shutdown -h now the shell starts both commands at the same time, so the machine shuts down immediately!
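An alternative to editing the run script is a wrapper that runs the job to completion and only then calls the shutdown command. A sketch under assumptions: run_then_shutdown and SHUTDOWN_CMD are hypothetical names, and the default is a dry run so the sketch is safe to try.

```shell
# run_then_shutdown CMD...   (hypothetical helper)
# Runs CMD, waits for it to finish, then runs the shutdown command.
# SHUTDOWN_CMD defaults to a dry run; on the instance set
#   SHUTDOWN_CMD='sudo shutdown -h now'
run_then_shutdown() {
    local status=0
    local shutdown_cmd="${SHUTDOWN_CMD:-echo dry-run: sudo shutdown -h now}"
    "$@" || status=$?     # sequential, not a pipe: the job finishes first
    $shutdown_cmd         # unquoted on purpose so it splits into words
    return "$status"
}

# Usage (run.sh stands for your run script):
# SHUTDOWN_CMD='sudo shutdown -h now' run_then_shutdown ./run.sh
```

The wrapper shuts down even when the job fails, which avoids leaving an idle instance running; the job's exit status is still returned for logging.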
