Python¶
Introduction¶
Python is a free, open-source language for computing and graphics, used heavily in the AI/ML space.
Availability¶
Python is available on all clusters in all queues (partitions) through Python modules, Anaconda modules, or Singularity containers.
Interface¶
There are two types of environments in which Python can be used on ARC resources:
Graphical interface via OnDemand using Jupyter
Command-line interface: you can start Python from the command line after loading the required software module.
Note
Larger computations should be submitted as jobs, via a traditional job submission script.
Managing environments¶
The power of Python comes from extending the base functionality with Python packages. Managing and configuring your local Python environment is best accomplished through a combination of a package manager (pip or conda) and an environment manager such as Anaconda (or miniconda or micromamba). Creating conda environments lets you activate an environment for later use. You can have several environments, each with different software dependencies, and activate the one of interest at run time. Commonly, you will create a conda environment, install software into it via conda/pip, and then activate it for use. For example:
module load Anaconda3/2020.11
conda create -n mypy3 python=3.8 pip
source activate mypy3
conda install ipykernel
pip install plotly kaleido
Activating the environment with source activate ensures that later conda or pip installs go into the environment's location. For a fuller discussion and examples, please see the Anaconda documentation:
https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html
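Beyond creating and activating an environment, a few other conda commands are useful day to day. A short sketch, reusing the mypy3 environment name from the example above:

```shell
# list all environments you have created
conda env list

# list the packages installed in the active environment
conda list

# leave the active environment
conda deactivate

# delete an environment you no longer need
conda remove -n mypy3 --all
```

These are standard conda commands; see the linked documentation for the full reference.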
Running without environments¶
If you prefer to use python without an environment, you will need to set the PYTHONUSERBASE
environment variable to a location you can write to. For example:
#load a python module
module reset; module load Python/3.8.6-GCCcore-10.2.0
#give python a directory where it can install/load personalized packages
#you may want to make this more specific to cluster/node type/python version
export PYTHONUSERBASE=$HOME/python3
#install a package (--user tells python to install to the location
#specified by PYTHONUSERBASE)
pip install --user plotly
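To confirm where Python will place and find these personalized packages, you can inspect the user base from within Python. A minimal sketch, using a hypothetical /tmp/python3 prefix in place of $HOME/python3:

```python
import os
import site

# Point PYTHONUSERBASE at a writable prefix (hypothetical path for
# illustration; on ARC you would use something like $HOME/python3).
os.environ["PYTHONUSERBASE"] = "/tmp/python3"

# Reset site's cached values so it re-reads the environment variable.
site.USER_BASE = None
site.USER_SITE = None

# pip install --user places packages under this prefix, and Python
# searches the matching site-packages directory at import time.
print(site.getuserbase())          # /tmp/python3
print(site.getusersitepackages())  # e.g. /tmp/python3/lib/python3.X/site-packages
```

If an import fails after a --user install, checking these two paths is a quick way to verify that PYTHONUSERBASE was set before both the install and the run.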
Command line running of Python scripts¶
First, we need both a Python script and (likely) the conda environment set up. The environment for this example was shown above as mypy3.
## violins.py
import plotly.express as px
# using the tips dataset
df = px.data.tips()
# plotting the violin chart
fig = px.violin(df, x="day", y="total_bill")
fig.write_image("fig1.jpeg")
Second, we need a shell script to submit to the Slurm scheduler. The script needs to specify the required compute resources, load the required software, and finally run the actual script.
#!/bin/bash
### python.sh
###########################################################################
## environment & variable setup
####### job customization
#SBATCH -N 1
#SBATCH -n 16
#SBATCH -t 1:00:00
#SBATCH -p normal_q
#SBATCH -A <your account>
####### end of job customization
# end of environment & variable setup
###########################################################################
#### add modules:
module load Anaconda3/2020.11
module list
#end of add modules
###########################################################################
###print script to keep a record of what is done
cat python.sh
echo "python code"
cat violins.py
###########################################################################
echo start load env and run python
source activate mypy3
python violins.py
exit;
Finally, to run both the batch script and python, we type:
sbatch python.sh
This will output a job number. You will have two output files:
fig1.jpeg
slurm-JOBID.log
The Slurm log contains any output you would have seen had you typed python violins.py at the command line.
Parallel Computing in Python¶
Coming soon-ish. In the meantime, an mpi4py example is provided as part of ARC’s examples repository.
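Until then, the standard library's multiprocessing module covers simple single-node parallelism. A minimal sketch (not ARC-specific), spreading a function across worker processes:

```python
from multiprocessing import Pool

def square(x):
    """A stand-in for a more expensive per-item computation."""
    return x * x

if __name__ == "__main__":
    # Pool forks worker processes and splits the input among them.
    # On a cluster, match the process count to the cores you requested
    # (e.g. the -n value in your Slurm script).
    with Pool(processes=4) as pool:
        results = pool.map(square, range(8))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

For computation that spans multiple nodes, MPI (e.g. via mpi4py) is the usual route, as the linked example shows.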