Running on Azure Batch

Azure Batch is a cloud-based service that allows you to run large-scale parallel and high-performance computing applications. This section provides an overview of how to set up and run SCHISM on Azure Batch.

Set Up Azure

Download Azure Utilities

Azure is a paid service separate from SCHISM, BayDeltaSCHISM, and CA-DWR. Once you have set up your Azure Batch and storage accounts within your Azure resource group, you’ll need to download a few things:

  • Azure CLI
    • download and install on your machine to pass jobs (simulations) to Azure Batch

    • installation is typically a single command-line step, after which you’ll need to relaunch your terminal

  • AzCopy
    • download and install on your machine to upload content to your storage container

  • Azure Batch Explorer (desktop app)
    • this isn’t technically necessary, but is quite helpful in managing ongoing simulations

  • Azure Storage Explorer (desktop app)
    • again, not technically necessary, but quite useful

Setup can be done via az commands. Here we set up a batch account with associated storage.

Login

Log in with your Azure credentials in a console with admin access.

az login --use-device-code

Copy the code from the console, then go to https://microsoft.com/devicelogin and paste it into the browser window to verify your credentials.

You may get a “certificate verify failed” message. If you are on a government network, you may have firewall settings that restrict all SSL traffic. You’ll either need to talk to your IT personnel or use a personal network.

To test that your login worked, use:

az account show

This will show your account info.
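If your credentials have access to more than one subscription, make sure the subscription that will hold these resources is the active one before proceeding (the subscription name below is a placeholder):

az account list --output table
az account set --subscription "<subscription_name_or_id>"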

Create a Resource Group

See the Azure docs for details. To use the commands below, substitute your own values for the placeholders in <angle brackets>.

az group create --name <resource_group_name> --location <location_name>

az storage account create --resource-group <resource_group_name> --name <storage_account_name> --location <location_name> --sku Standard_LRS

az batch account create --name <batch_account_name> --storage-account <storage_account_name> --resource-group <resource_group_name> --location <location_name>

For --location, you can use any location, but eastus and eastus2 are what DWR uses. The storage account name typically ends in “sa”, and the batch account name typically ends in “batch”, for ease of navigating resources on Azure.
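As a concrete illustration with hypothetical names following that convention (note that storage and batch account names may only contain lowercase letters and numbers), the commands might look like:

az group create --name dmsmodeling --location eastus2

az storage account create --resource-group dmsmodeling --name dmsmodelingsa --location eastus2 --sku Standard_LRS

az batch account create --name dmsmodelingbatch --storage-account dmsmodelingsa --resource-group dmsmodeling --location eastus2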

You can also create the batch account and associated storage account as explained in the Azure Batch documentation.

You may encounter some errors about regional quotas. You’ll want your resource group, storage account, and batch account in the same region/location.
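Before settling on a region, you can check its Batch quotas from the command line:

az batch location quotas show --location <location_name> --output table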

Download azure_dms_batch

Git clone this project:

git clone https://github.com/CADWRDeltaModeling/azure_dms_batch.git

Navigate to the azure_dms_batch folder created above, and use the environment.yml file with conda to create an environment called azure:

conda env create -f environment.yml

or if you’re not using conda:

pip install -r requirements.txt

Stay in the azure_dms_batch repo folder and then install using:

conda activate azure
pip install --no-deps -e .
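To confirm the editable install landed in the azure environment, you can list it; filtering on “dms” here is just a convenience, since the exact distribution name comes from the repo’s setup metadata:

pip list | grep -i dms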

For more information on the azure_dms_batch package, see the README.md file.

Upload Applications

Azure Batch requires the setup and installation to happen via zip files called application packages. You should upload these packages with the version names specified in the template you plan to use. Here we will refer to the alma87_mvapich2 template.

The app-packages/batch_app_package_and_upload.sh script can be used to upload the packages, which you will do in the following steps.

SCHISM

For SCHISM, you’ll need to either compile and zip the executables yourself or download the relevant schism_with_deps_*.zip file from the releases page. For the HelloSCHISM and BayDeltaSCHISM tutorials, we’ll refer to the latest SCHISM release, schism_with_deps_5.11.1_alma8.7hpc_v4_mvapich2.zip. You’ll also need the latest alma release, nfs_alma8.7.

Save the .zip files to your local azure_dms_batch repository under azure_dms_batch/app-packages.
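If you’re working from a terminal, you can fetch each asset directly once you’ve copied its link from the releases page (the URL below is a placeholder for that copied link):

cd azure_dms_batch
wget -P app-packages "<release_zip_url>"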

Now use app-packages/batch_app_package_and_upload.sh to upload the packages from the command line:

export MY_BATCH_ACCOUNT="<batchaccountname>"
export MY_RG="<resourcegroupname>"

cd azure_dms_batch/app-packages

source batch_app_package_and_upload.sh

package_and_upload_telegraf "telegraf" $MY_BATCH_ACCOUNT $MY_RG
package_and_upload_app schism_with_deps 5.11.1_alma8.7hpc_v4_mvapich2 schism_with_deps_5.11.1_alma8.7hpc_v4_mvapich2.zip  $MY_BATCH_ACCOUNT $MY_RG
package_and_upload_app nfs alma8.7 nfs_alma8.7.zip  $MY_BATCH_ACCOUNT $MY_RG
package_and_upload_batch_setup "../schism_scripts/" $MY_BATCH_ACCOUNT $MY_RG

Python Packages

For Python packages like schimpy and BayDeltaSCHISM’s bdschism, you can also use the batch_app_package_and_upload.sh script to upload the packages to your batch account.

export MY_BATCH_ACCOUNT="<batchaccountname>"
export MY_RG="<resourcegroupname>"

cd azure_dms_batch/app-packages

source batch_app_package_and_upload.sh

package_and_upload_schimpy $MY_BATCH_ACCOUNT $MY_RG
package_and_upload_bdschism $MY_BATCH_ACCOUNT $MY_RG

Each of the above utilities names the package with today’s date, uploads it, and sets it as the default version.

You can check the versions of packages, and which one is the default, by going to the online Azure Portal and navigating to your batch account. Once in your batch account, navigate to Features > Applications. There you’ll see the application IDs as well as the default version in use.

If you want to upload a development or updated version of any of these packages, you can use a similar package_and_upload approach and then use the online portal to specify the default version.
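You can also inspect and change the default version from the CLI; the application name below is illustrative:

az batch application list --resource-group <resource_group_name> --name <batch_account_name> --output table
az batch application set --resource-group <resource_group_name> --name <batch_account_name> --application-name schimpy --default-version <version>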

File Transfer Commands

azcopy commands

The basic syntax of azcopy to copy from local to Azure is:

azcopy copy "<local_directory>" "<azure_storage_account/blob_container>/?<sas_link>"

and from Azure to local:

azcopy copy "<azure_storage_account/blob_container>/?<sas_link>" "<local_directory>"

But at the Delta Modeling Section we most often use something like:

export AZLINK="https://<storage_account>.blob.core.windows.net/"
export sas="<sas_link>"

azcopy copy "<local_directory>" "${AZLINK}<blob_storage_container>/?${sas}" --exclude-regex="outputs/.*nc" --recursive --preserve-symlinks --dry-run

where:

  • local_directory
    • the local path to the simulation directory you’re uploading

  • storage_account
    • name of your Storage Account through Azure (not the same as the Batch Account name)

  • blob_storage_container
    • folder path to your blob storage container

    • this will look like a folder path (e.g. project_name/simulations/)

  • sas_link
    • SAS permissions key (generated each day for security)
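The same pattern works in reverse to pull results back down, for example to fetch only the station output files from a run. Here the simulation path and the regex are illustrative, not fixed names:

azcopy copy "${AZLINK}<blob_storage_container>/<simulation_dir>/outputs/?${sas}" "<local_directory>" --include-regex="staout.*" --recursive --dry-run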

azcopy options

These are some of the most frequently used azcopy flag options:

  • --dry-run
    • this is useful to test your command before running

    • this flag prints a list of which files will be copied where without actually uploading/downloading anything

  • --recursive
    • this will copy all files in all subdirectories

  • --preserve-symlinks
    • any symbolic links will be preserved in the upload to the blob container

  • --include-regex
    • use a Regular Expression to limit which files are included in the upload

    • ex: --include-regex="suisun_(2|3|7)/.*;baseline_6/.*"
      • this would upload all folder contents of
        • suisun_2/

        • suisun_3/

        • suisun_7/

        • baseline_6/

      • The .* string signifies “all contents”

  • --exclude-regex
    • use a Regular Expression to determine which files are excluded in the upload

    • this is particularly useful for things like outputs *.nc files and sflux *.nc files which are very large and costly to upload

    • ex: --exclude-regex="outputs.*/.*nc;sflux/.*nc"
      • this would exclude any files that end in “nc” that are found in the sflux, outputs, or outputs* folders
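Putting these flags together, a typical workflow is to preview the transfer with --dry-run and then, once the listed files look right, re-run the same command without it. The regex values here just repeat the examples above:

azcopy copy "<local_directory>" "${AZLINK}<blob_storage_container>/?${sas}" --include-regex="suisun_(2|3|7)/.*;baseline_6/.*" --exclude-regex="outputs.*/.*nc;sflux/.*nc" --recursive --preserve-symlinks --dry-run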

References

  • Python SDK Setup

  • BlobStorage Python Example

  • Azure Batch Python API

  • Azure Batch Python Samples

  • Azure Batch Shipyard

MPI specific

  • Azure Batch MPI

  • Cluster configuration options

Intel MPI

  • Azure settings for Intel MPI

  • Intel MPI Pre-requisites