Usage guide

Once the master board is initialized and one or more slave nodes are configured, you can begin using the cluster. You can run scripts in parallel, view node metrics and network status, and manage the system remotely from the master.

Cluster execution

The system includes Slurm, a workload manager that provides a job queue through which users and processes schedule execution on the cluster. Some of the functionality it offers is explained below:

For more information on Slurm and how to use it, see its documentation: https://slurm.schedmd.com/documentation.html.
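As an illustration of how work is typically queued with Slurm, the following minimal Python sketch generates a batch script. The helper name build_sbatch_script, the job name and the node counts are hypothetical and chosen only for this example; the #SBATCH options and the srun command are standard Slurm.

```python
def build_sbatch_script(job_name, nodes, tasks_per_node, command):
    """Return the text of a Slurm batch script that runs `command`.

    Illustrative helper, not part of this system's tooling.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        # %x expands to the job name and %j to the job ID.
        "#SBATCH --output=%x-%j.out",
        "",
        # srun launches the command once per allocated task.
        f"srun {command}",
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical job: print the hostname of two nodes.
    print(build_sbatch_script("hello", nodes=2, tasks_per_node=1,
                              command="hostname"))
```

Saving the generated text to a file such as hello.sh and submitting it with "sbatch hello.sh" places the job in the queue. "squeue" lists queued and running jobs, and "sinfo" shows the state of the nodes.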

System monitoring

The system includes a monitoring tool, Ganglia, which is installed and configured during system initialization so that you can monitor the cluster in detail. To view the web page that shows your cluster's metrics, open a browser on the master board.

If you want to know more about the possibilities offered by this tool, you can take a look at its website: http://ganglia.sourceforge.net/.
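Besides the web page, Ganglia's gmond daemon can be queried directly: by default it sends an XML dump of its current metrics to any client that connects to its TCP port, 8649. The sketch below assumes that default has not been changed on your cluster. The function names, the host name "node1" and the sample XML are hypothetical and only illustrate the general shape of the data.

```python
import socket
import xml.etree.ElementTree as ET

# Hand-written sample in the general shape of gmond output (not real data).
SAMPLE_XML = b"""<GANGLIA_XML VERSION="3.7.2" SOURCE="gmond">
  <CLUSTER NAME="example">
    <HOST NAME="node1" IP="10.0.0.2">
      <METRIC NAME="load_one" VAL="0.12" TYPE="float" UNITS=" "/>
      <METRIC NAME="cpu_num" VAL="4" TYPE="uint16" UNITS="CPUs"/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""


def fetch_gmond_xml(host="localhost", port=8649, timeout=5.0):
    """Read the XML dump gmond sends on connect (assumes default port)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(4096)
            if not data:  # gmond closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks)


def parse_metric(xml_data, metric_name):
    """Return {host name: value string} for one metric."""
    root = ET.fromstring(xml_data)
    values = {}
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            if metric.get("NAME") == metric_name:
                values[host.get("NAME")] = metric.get("VAL")
    return values


if __name__ == "__main__":
    # Parses the embedded sample; no network access is needed.
    print(parse_metric(SAMPLE_XML, "load_one"))
```

On a live cluster, parse_metric(fetch_gmond_xml(), "load_one") would return the same kind of dictionary for the real nodes, provided gmond accepts connections from the machine running the script.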

Remote administration

The system includes scripts for cluster management and maintenance. They run commands on all cluster nodes remotely from the master board, so you can check the status of services or run update or restart commands, for example, quickly and centrally, without connecting to each node one by one.

Two scripts are provided for this task: 'global_execute_seq' and 'global_execute_par'. Each runs the command, or sequence of commands joined by pipes, passed as a parameter on all nodes of the cluster.
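The difference between running a command on the nodes one after another and running it on all of them at once can be sketched in Python as follows. This is not the implementation of the two scripts. It assumes, from their names, that one works sequentially and the other in parallel, and that the master can reach each node over SSH without a password prompt. The function names and node names are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_run(node, command):
    """Run `command` on `node` over ssh; return (exit code, stdout).

    The command is passed as one string, so pipes inside it are
    interpreted by the shell on the remote node.
    """
    result = subprocess.run(
        ["ssh", node, command], capture_output=True, text=True
    )
    return result.returncode, result.stdout


def execute_seq(nodes, command, runner=ssh_run):
    """Run the command on each node in turn; return {node: result}."""
    return {node: runner(node, command) for node in nodes}


def execute_par(nodes, command, runner=ssh_run):
    """Run the command on all nodes at once, one thread per node."""
    if not nodes:
        return {}
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        # map() yields results in the same order as `nodes`.
        results = pool.map(lambda node: runner(node, command), nodes)
        return dict(zip(nodes, results))
```

For example, execute_par(["node1", "node2"], "uptime | cut -d, -f1") would return one (exit code, output) pair per node. The sequential variant keeps output in a predictable order and makes failures easier to trace, while the parallel variant finishes sooner on larger clusters.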

Some issues to consider: