
Submitting Jobs at the PARAM HIMALAYA Facility

  • In general, all computational work can be run either interactively or in batch mode.

  • Interactive Jobs:

    You can run an interactive job as follows.

    $ srun --nodes=1 --ntasks-per-node=1 --time=01:00:00 --pty bash -i


    The above command asks for a single core for one hour with the default amount of memory. If you exceed the time or memory limit, the job will be aborted. Note: PARAM Himalaya is NOT meant for running interactive jobs. However, an interactive session is useful for quickly verifying that a job runs correctly before submitting a large batch job (for example, one with a large iteration count), and it can also be used for small jobs. Keep in mind that other users share the node, so it is prudent not to inconvenience them by running large jobs interactively.
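
    If a quick test needs more resources, they can be requested explicitly. The commands below are sketches: the partition names standard and gpu are the ones mentioned in this document, while the core, memory, GPU, and time values are only illustrative.

    $ srun --partition=standard --nodes=1 --ntasks-per-node=4 --mem=8G --time=00:30:00 --pty bash -i
    $ srun --partition=gpu --nodes=1 --ntasks-per-node=4 --gres=gpu:1 --time=00:30:00 --pty bash -i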


  • Batch Jobs:

    Batch jobs are ideal for computational tasks that require significant processing time and are resource-intensive. SLURM (Simple Linux Utility for Resource Management) is a workload manager that provides a framework for job queues, allocation of compute nodes, and the starting and execution of jobs.


    Sample batch job scripts for common job types are given below. After preparing a script, submit it with the sbatch command (see the submission example after the sample scripts).

    1. Script for a Sequential Job

       #!/bin/bash
       # Number of nodes
       #SBATCH -N 1
       # Number of cores per node
       #SBATCH --ntasks-per-node=1
       # Name of the error file
       #SBATCH --error=job.%J.err
       # Name of the output file
       #SBATCH --output=job.%J.out
       # Time required to execute the program
       #SBATCH --time=01:00:00
       # Queue name. standard is the default partition; if you do not specify a
       # partition, the job is submitted to the default one. Other partitions
       # you can specify are hm and gpu.
       #SBATCH --partition=standard

       # Load the module
       module load compiler/intel/2018.2.199

       # Run the job in the directory from which it was submitted
       cd $SLURM_SUBMIT_DIR
       # Name (full path) of the executable
       /home/cdac/a.out


    2. Script for a Parallel OpenMP Job

       #!/bin/bash
       # Number of nodes
       #SBATCH -N 1
       # Number of cores per node
       #SBATCH --ntasks-per-node=48
       # Name of the error file
       #SBATCH --error=job.%J.err
       # Name of the output file
       #SBATCH --output=job.%J.out
       # Time required to execute the program
       #SBATCH --time=01:00:00
       # Queue name. standard is the default partition; if you do not specify a
       # partition, the job is submitted to the default one. Other partitions
       # you can specify are hm and gpu.
       #SBATCH --partition=standard

       # Load the module
       module load compiler/intel/2018.4

       # Run the job in the directory from which it was submitted
       cd $SLURM_SUBMIT_DIR

       # Set the number of OpenMP threads as required. If the total number of
       # threads per node exceeds 48, multiple threads will share core(s) and
       # performance may degrade.
       export OMP_NUM_THREADS=48

       # Name (full path) of the executable
       /home/cdac/a.out


    3. Script for a Parallel MPI (Message Passing Interface) Job

       #!/bin/bash
       # Number of nodes
       #SBATCH -N 16
       # Number of cores per node
       #SBATCH --ntasks-per-node=48
       # Time required to execute the program
       #SBATCH --time=06:50:20
       # Name of the application
       #SBATCH --job-name=lammps
       # Name of the error file
       #SBATCH --error=job.%J.err_16_node_48
       # Name of the output file
       #SBATCH --output=job.%J.out_16_node_48
       # Partition or queue name
       #SBATCH --partition=standard

       # Load the module
       module load compiler/intel/2018.4

       # Intel MPI specific settings
       export I_MPI_FALLBACK=disable
       export I_MPI_FABRICS=shm:dapl
       # Level of MPI verbosity
       export I_MPI_DEBUG=9

       # Run the job in the directory from which it was submitted, or cd to the
       # directory that contains the input files, e.g.
       # cd /home/manjuv/LAMMPS_2018COMPILER/lammps-22Aug18/bench
       cd $SLURM_SUBMIT_DIR

       # Command to run LAMMPS in parallel
       time mpiexec.hydra -n $SLURM_NTASKS -genv OMP_NUM_THREADS 1 \
           /home/manjuv/LAMMPS_2018COMPILER/lammps-22Aug18/src/lmp_intel_cpu_intelmpi -in in.lj


    4. Script for a Hybrid Parallel Job (MPI + OpenMP)

       #!/bin/sh
       # Number of nodes
       #SBATCH -N 16
       # Number of cores per node
       #SBATCH --ntasks-per-node=48
       # Time required to execute the program
       #SBATCH --time=06:50:20
       # Name of the application
       #SBATCH --job-name=lammps
       # Name of the error file
       #SBATCH --error=job.%J.err_16_node_48
       # Name of the output file
       #SBATCH --output=job.%J.out_16_node_48
       # Partition or queue name
       #SBATCH --partition=standard

       # Run the job in the directory from which it was submitted
       cd $SLURM_SUBMIT_DIR

       # Load the module
       module load compiler/intel/2018.2.199

       # Intel MPI specific settings
       export I_MPI_FALLBACK=disable
       export I_MPI_FABRICS=shm:dapl
       # Level of MPI verbosity
       export I_MPI_DEBUG=9

       # Number of OpenMP threads per MPI rank. The total number of MPI ranks is
       # then (total cores) / (threads per rank) = (16 nodes x 48 cores) / 24 = 32.
       export OMP_NUM_THREADS=24

       # Command to run LAMMPS in parallel
       time mpiexec.hydra -n 32 lammps.exe -in in.lj
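
    Once a script (say job.sh) is ready, it can be submitted and monitored with the standard SLURM commands below; the script name and job ID shown are only illustrative.

    $ sbatch job.sh              # submit the batch script; SLURM replies with the job ID
    $ squeue -u $USER            # list your pending and running jobs
    $ scontrol show job <jobid>  # show detailed information about a specific job
    $ scancel <jobid>            # cancel a job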


    The table below lists the resource flags commonly used in SLURM job scripts:


    Resource Flags for Job Submission

    Resource          Flag Syntax                     Description
    ----------------  ------------------------------  ------------------------------------------------------------------------
    Partition         --partition=partition_name      Partition (queue) in which the job runs.
    Time              --time=01:00:00                 Time limit for the job (HH:MM:SS).
    Nodes             --nodes=2                       Number of compute nodes for the job.
    CPUs/Cores        --ntasks-per-node=8             Number of CPU cores (tasks) per compute node.
    Resource Feature  --gres=gpu:2                    Request GPUs on the compute nodes (here, 2 GPUs).
    Account           --account=group-slurmaccount    User's group or account for resource allocation.
    Job Name          --job-name="lammps"             Name of the job (e.g., "lammps").
    Output File       --output=lammps.out             File in which the job's standard output (stdout) is stored.
    Access            --exclusive                     Exclusive node access; the allocation cannot share nodes with other jobs.

    For more details on resource flags, refer to the PARAM Himalaya user manual.
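
    As an illustration of how the flags above can be combined, the following is a minimal sketch of a GPU batch script. The gpu partition name is the one mentioned in this document; the job name, resource counts, module, and executable are placeholders to be replaced with your own (check the PARAM Himalaya manual for the exact account and module requirements).

       #!/bin/bash
       #SBATCH --job-name=gpu-test
       #SBATCH --partition=gpu
       #SBATCH --nodes=1
       #SBATCH --ntasks-per-node=8
       #SBATCH --gres=gpu:2
       #SBATCH --time=02:00:00
       #SBATCH --output=gpu-test.out
       #SBATCH --error=gpu-test.err
       ##SBATCH --account=<your_account>    # uncomment if your allocation requires an account

       # Load the module(s) required by your application (placeholder)
       # module load <your_module>

       # Run the job in the directory from which it was submitted
       cd $SLURM_SUBMIT_DIR

       # Placeholder executable
       ./your_gpu_application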