Friday, August 28, 2020

How to install Mellanox OFED IB Drivers in CentOS Linux 7 (unattended)

Title: Unattended Mellanox Infiniband OFED Installation - Tutorial

OS: CentOS Linux release 7.8.2003 (Core)

Kernel Version: 3.10.0-1127.13.1.el7.x86_64

Compiler: gcc v4.8.5

Software: Mellanox OFED

Software Version: 5.0-2.1.8.0

User Account: Use ‘sudo’ (recommended)

Reboot Required: Yes (recommended)

·        Create a directory for the OFED sources, for example ‘/tools/apps/sources/mellanox/ofed/5.0-2.1.8.0’ (adjust the path to suit your environment).

o   mkdir -p /tools/apps/sources/mellanox/ofed/5.0-2.1.8.0

o   export MLNX_ROOT_DIR=/tools/apps/sources/mellanox/ofed/5.0-2.1.8.0 

·        Download the latest Mellanox OFED .tgz file for your OS distribution and architecture from the Mellanox website (https://www.mellanox.com/products/infiniband-drivers/linux/mlnx_ofed) and place it in the directory created in the previous step. In my case the downloaded file is ‘MLNX_OFED_LINUX-5.0-2.1.8.0-rhel7.8-x86_64.tgz’.

·        Navigate into the directory and extract the tarball.

o   cd ${MLNX_ROOT_DIR}

o   tar xvfz MLNX_OFED_LINUX-5.0-2.1.8.0-rhel7.8-x86_64.tgz 

·        Navigate into ‘MLNX_OFED_LINUX-5.0-2.1.8.0-rhel7.8-x86_64’ directory

o   export MLNX_SOURCE=MLNX_OFED_LINUX-5.0-2.1.8.0-rhel7.8-x86_64

o   cd ${MLNX_ROOT_DIR}/${MLNX_SOURCE} 

(If your kernel version is not listed in the ‘.supported_kernels’ file, you will have to add support for your kernel. To do this, follow the steps in the ‘Troubleshooting’ section below.)
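The kernel check described above can be scripted. The following is a minimal sketch, assuming that ‘.supported_kernels’ lists one kernel version per line; the function name ‘kernel_is_supported’ is hypothetical and used only for illustration.

```shell
# Hypothetical helper: succeed if the given kernel version appears as a
# whole line in the given list file.
# Assumption: '.supported_kernels' holds one kernel version per line.
kernel_is_supported() {
    local list_file="$1"
    # Default to the running kernel when no version is passed.
    local kernel="${2:-$(uname -r)}"
    # -F: fixed string, -x: match the whole line, -q: quiet (exit code only)
    grep -Fxq -- "$kernel" "$list_file"
}

# Example usage (run from inside ${MLNX_ROOT_DIR}/${MLNX_SOURCE}):
#   if kernel_is_supported .supported_kernels; then
#       echo "Kernel $(uname -r) is supported"
#   else
#       echo "Kernel $(uname -r) is not listed; see Troubleshooting"
#   fi
```

Matching the whole line avoids a false positive when one kernel version is a prefix of another.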

·        The Mellanox package ships with sample configuration files that can be used to install Mellanox OFED unattended. They are located in the ‘${MLNX_ROOT_DIR}/${MLNX_SOURCE}/docs/conf’ directory: ofed-basic.conf, ofed-hpc.conf and ofed-all.conf.

·        Install the necessary packages required for Mellanox OFED installations

o   yum install tcl tk 

·        To install the Mellanox OFED stack, run the following script with the flags needed for an unattended installation. Choose the configuration file that matches your requirements.

o   ./mlnxofedinstall -c docs/conf/ofed-hpc.conf --force

·        Once the above command finishes, the installation is complete. Restart the system so that the device drivers are loaded on startup.
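After the reboot it can be worth confirming that the expected OFED version is actually installed. The sketch below assumes that ‘ofed_info -s’ prints a single line of the form ‘MLNX_OFED_LINUX-5.0-2.1.8.0:’; the helper name ‘ofed_version_matches’ is hypothetical.

```shell
# Hypothetical helper: compare the installed OFED version with an expected one.
# Assumption: 'ofed_info -s' prints one line such as
# 'MLNX_OFED_LINUX-5.0-2.1.8.0:' (trailing colon included).
ofed_version_matches() {
    local expected="$1"
    local installed
    # Fail early if ofed_info is missing or returns an error.
    installed=$(ofed_info -s 2>/dev/null) || return 1
    # Strip the product prefix and the trailing colon, if present.
    installed=${installed#MLNX_OFED_LINUX-}
    installed=${installed%:}
    [ "$installed" = "$expected" ]
}

# Example usage:
#   ofed_version_matches 5.0-2.1.8.0 && echo "OFED 5.0-2.1.8.0 is installed"
```

A check like this fits well at the end of a provisioning script, where a silent installation failure would otherwise go unnoticed until the first job runs.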

How to start infiniband related services?

The installation provides two main services:

·        openibd: Infiniband related drivers

·        opensmd: Subnet manager

An instance of a ‘Subnet Manager’ is required on an infiniband fabric (network) for machines to communicate. This service can either be started on Infiniband switches (subject to having the ‘Subnet Manager’ support on the switches) or on servers connected to the same infiniband fabric.

The ‘openibd’ service needs to be running on all servers, whereas the ‘opensmd’ service only needs to run on a single server/switch, or on multiple core servers/switches.

·        Start and enable ‘openibd’ service

o   systemctl start openibd

o   systemctl enable openibd 

·        Start and enable ‘opensmd’ service

o   systemctl start opensmd

o   systemctl enable opensmd
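Once both services are running, a port on the node should reach the ‘Active’ state, which only happens after a subnet manager has configured it. The sketch below parses ‘ibstat’ output for such a port; it assumes ‘ibstat’ is available from the installed stack and reports port state on lines of the form ‘State: Active’. The helper name ‘has_active_port’ is hypothetical.

```shell
# Hypothetical helper: read 'ibstat' output on stdin and succeed if at least
# one port reports 'State: Active'.
# Assumption: the logical port state is printed on its own line as
# 'State: <value>'; 'Physical state:' lines are deliberately not matched.
has_active_port() {
    grep -Eq '^[[:space:]]*State:[[:space:]]*Active[[:space:]]*$'
}

# Example usage on an installed node:
#   systemctl is-active --quiet openibd && echo "openibd is running"
#   ibstat | has_active_port && echo "at least one port is Active"
```

If the port stays in ‘Initializing’, the drivers are loaded but no subnet manager is reachable on the fabric.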

Troubleshooting:

Adding Additional Kernel Support

Sometimes a downloaded .tgz file doesn’t come with the RPMs which support the latest kernel. To add support for a different kernel follow these steps:

o   Navigate into the Mellanox OFED source directory and run these commands

      •   cd MLNX_OFED_LINUX-5.0-2.1.8.0-rhel7.8-x86_64
      •   yum install python-devel
      •   ./mlnx_add_kernel_support.sh -m . --make-tgz

The above command re-compiles the Mellanox OFED stack for the running kernel and creates a ‘tgz’ file in the ‘/tmp’ directory. This tarball can then be used to install the OFED stack on machines running that specific kernel.
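The exact name of the generated tarball is not given here, so the sketch below simply picks the most recently modified ‘MLNX_OFED_LINUX-*.tgz’ file in a directory. The helper name ‘newest_ofed_tarball’ is hypothetical, and the file name pattern is an assumption based on the name of the downloaded tarball.

```shell
# Hypothetical helper: print the most recently modified MLNX_OFED tarball
# in a directory (default: /tmp), or fail if none exists.
# Assumption: the rebuilt tarball keeps the 'MLNX_OFED_LINUX-' name prefix.
newest_ofed_tarball() {
    local dir="${1:-/tmp}"
    local newest=""
    local f
    for f in "$dir"/MLNX_OFED_LINUX-*.tgz; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$f" ] || continue
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest="$f"
        fi
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Example usage:
#   tarball=$(newest_ofed_tarball /tmp) && tar xvfz "$tarball" -C "${MLNX_ROOT_DIR}"
```

The extracted directory can then be installed with the same ‘mlnxofedinstall’ command as in the main steps.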

 

Wednesday, August 26, 2020

Welcome to my blog 'howtohpc'!

I have been thinking for a long time that I need to give something back to this world. Since I have a solid background in High Performance Computing (HPC), I will be posting “how-tos” on this subject here in my blog to make life easier for beginners and professionals alike. This will include compilation, deployment, configuration and troubleshooting of different software, and much more.

If you want to find out more about me please have a look at my personal profile here.

Please feel free to follow me on LinkedIn, Instagram, Twitter and Facebook.

Personal Profile

I'm sure this is general rubbish but you may find the below listed details somewhat useful 👱

  • Education: Master’s Degree in Computer Sciences
  • Nationality: British
  • HPC Experience: 14+ years
  • OS: Redhat, CentOS, Fedora, SuSe, Debian, Ubuntu and MS Windows
  • Resource Managers: SLURM, LSF, GE, MOAB and Torque
  • Cluster Management Tools: xCAT, IBM Platform HPC & Bright Cluster Manager
  • Software: Ansys Fluent, Ansys RSM, StarCCM+, DL_POLY, Telemac, OpenFOAM and Cadence
  • Benchmarking: HPCC, Linpack (HPL), Intel IMB, b_eff, Gromacs, CASTEP
  • MPIs: OpenMPI, Intel MPI, IBM Platform MPI, Mvapich, Mvapich2
  • Compilers & Libraries: gcc, icc, ifort, icpc, Intel MKL, ACML, BLAS
  • Programming: C, C++, Bash scripting, Visual Basic
  • Applications: DHCP, HTTPS, Bind, Firewalls, LDAP, SAMBA, IPMI, Active Directory
  • Networks: Ethernet, Infiniband Fabrics (Mellanox & Intel OPA)
  • Switch Configurations: Design Topologies, VLANs, LACP, VLAG
  • Accreditations: RHCSA & RHCE

