Slurm User and Administrator Guide for Cray Systems Natively

User Guide
Cray Specific Features
Administrator Guide
Cray System Setup
High Availability

User Guide

This document describes the unique features of Slurm running natively on Cray XC computers, that is, without Cray's Application Level Placement Scheduler (ALPS). You should be familiar with Slurm's mode of operation on Linux clusters before studying the differences in Cray system operation described in this document. When running Slurm in native mode, a Cray system functions very much like a Linux cluster.

Slurm is designed to operate as a workload manager on Cray XC systems (Cascade) without the use of ALPS. In addition to providing the same look and feel as a regular Linux cluster, this also enables functionality such as:

Ability to run multiple jobs per node
Ability to check the status of running jobs with sstat
Full accounting support for job steps
Ability to run multiple jobs/steps in background from the same session

Cray Specific Features

Network Performance Counters

To access Cray's Network Performance Counters (NPC), request them with the --network option of sbatch, salloc, or srun. There are two types of counters: system and blade.

For the system option (--network=system), only one job at a time can use the system counters. Only the nodes requested are marked in use for the job allocation. If the job does not fill the entire system, the remaining nodes cannot be used by other jobs using NPC; while idle, their state appears as PerfCnts. These nodes remain available to other jobs that do not use NPC.

For the blade option (--network=blade), only the nodes requested are marked in use for the job allocation. If the job does not fill the entire blade(s) allocated to it, those blades cannot be used by other jobs using NPC; while idle, their state appears as PerfCnts. These nodes remain available to other jobs that do not use NPC.

Core Specialization

Core specialization reserves a number of the cores allocated to a job for system operations. The application does not use these cores, but the job is still charged for their allocation. To use it, set CoreSpecPlugin=core_spec/cray_aries.

Administrator Guide

Many new plugins were added to support Cray systems without ALPS. Configure them in your slurm.conf in addition to your normal configuration.

BurstBuffer

Set BurstBufferType=burst_buffer/datawarp to use. The burst buffer capability on Cray systems is also known by the name DataWarp. For more information, see the Slurm Burst Buffer Guide.

CoreSpec

Set CoreSpecPlugin=core_spec/cray_aries to use.

JobSubmit

Set JobSubmitPlugins=job_submit/cray_aries to use. This plugin is primarily used to set a gres=craynetwork value, which limits the number of applications that can run on a node at once. For a node without MICs, that number is at most 4; for nodes with MICs, it drops to 2. The craynetwork GRES must be configured in your slurm.conf to ensure proper functionality. For example:

    ...
    GresTypes=craynetwork
    NodeName=nid000[00-10] Gres=craynetwork:4 # node without MIC
    NodeName=nid000[11-20] Gres=craynetwork:2 # node with MIC
    ...

Power

Set PowerPlugin=power/cray_aries to use. PowerParameters is also typically configured. For more information, see the Slurm Power Management Guide.

Proctrack

Set ProctrackType=proctrack/cray_aries to use.

Select

Set SelectType=select/cray_aries to use. This is a layered plugin, which means it enhances a lower-layer select plugin. By default, it is layered on top of the select/linear plugin. It can instead be layered on top of the select/cons_res plugin by setting SelectTypeParameters=other_cons_res; doing so allows you to run multiple jobs on a Cray node, just as on a normal Linux cluster. Use additional SelectTypeParameters to identify the resources to allocate (e.g., cores, sockets, memory). See the slurm.conf man page for details.
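
As an illustration, a node-sharing configuration that layers select/cray_aries on top of select/cons_res might look like the slurm.conf excerpt below. The choice of CR_Core_Memory (treating cores and memory as consumable resources) is an assumption made for this sketch; substitute the resource options your site actually needs.

```conf
# Layer select/cray_aries on top of select/cons_res instead of the
# default select/linear, so several jobs can share one Cray node.
SelectType=select/cray_aries
# other_cons_res selects the cons_res layer; CR_Core_Memory is an
# illustrative choice that allocates cores and memory per job.
SelectTypeParameters=other_cons_res,CR_Core_Memory
```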

SlurmctldPort, SlurmdPort, SrunPortRange

Realm-Specific IP Addressing (RSIP) will automatically try to interact with anything opened on ports 8192 to 60000. Configure SlurmctldPort, SlurmdPort, and SrunPortRange to use ports above 60000. In the case of SrunPortRange, making 1000 or more ports available is recommended.
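
A minimal sketch of the corresponding slurm.conf lines follows. The specific port numbers are illustrative assumptions: any values above 60000 that are free on your system will do. The srun range shown spans 2001 ports, which satisfies the recommendation of 1000 or more.

```conf
# Keep Slurm traffic out of the RSIP range (ports 8192-60000).
# Port numbers are illustrative; pick values that are free at your site.
SlurmctldPort=60817
SlurmdPort=60818
# 61000-63000 inclusive provides 2001 ports for srun.
SrunPortRange=61000-63000
```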

Switch

Set SwitchType=switch/cray_aries to use.

Task

Set TaskPlugin=cray_aries,cgroup to use. The task/cgroup plugin is required alongside task/cray_aries. You may also add the task/affinity plugin if desired (i.e., TaskPlugin=cray_aries,affinity,cgroup). Note that plugins are used in the order they appear in the comma-separated list, and that task/cray_aries must be listed before task/cgroup because of internal dependencies between the two plugins.
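
Taken together, the plugin settings described in this guide might appear in slurm.conf roughly as follows. This is a sketch rather than a complete configuration: the burst buffer and power lines apply only if you use those features, the NodeName lines with their Gres=craynetwork counts are not repeated here, and PowerParameters is omitted because its values are site-specific.

```conf
# Native-mode Cray plugin settings (illustrative slurm.conf excerpt).
# Only if you use DataWarp burst buffers:
BurstBufferType=burst_buffer/datawarp
# Only if you use power management (also set PowerParameters):
PowerPlugin=power/cray_aries
CoreSpecPlugin=core_spec/cray_aries
# job_submit/cray_aries relies on the craynetwork GRES; each NodeName
# line also needs Gres=craynetwork:4 (no MIC) or Gres=craynetwork:2 (MIC).
JobSubmitPlugins=job_submit/cray_aries
GresTypes=craynetwork
ProctrackType=proctrack/cray_aries
# Layered on select/linear unless SelectTypeParameters=other_cons_res.
SelectType=select/cray_aries
SwitchType=switch/cray_aries
# cray_aries must precede cgroup; affinity is optional.
TaskPlugin=cray_aries,affinity,cgroup
```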

Cray System Setup

Some Slurm plugins (burst_buffer/datawarp and power/cray_aries) parse JSON-format data. These plugins are designed to use the JSON-C library for this purpose. See the JSON-C installation instructions for details.

Some services on the system must be configured to run correctly with Slurm. The commands below restart each service, grouped by the node it runs on. Consider configuring these restarts to happen automatically.

boot node

WLM_DETECT_ACTIVE=SLURM /etc/init.d/aeld restart

sdb node

WLM_DETECT_ACTIVE=SLURM /etc/init.d/ncmd restart
WLM_DETECT_ACTIVE=SLURM /etc/init.d/apptermd restart

As with Linux clusters, you need to start slurmd on each of your compute nodes. If you use munge authentication (recommended), you also need munge installed and munged running on each compute node. See the quick start guide for more information. Apart from the differences listed in this document, the quick start guide can be used to set up your Cray system to run Slurm natively.

On larger systems, you may wish to set the PMI_MMAP_SYNC_WAIT_TIME environment variable in your users' profiles to a larger value than the default (180 seconds) to prevent PMI from falsely detecting job launch failures.

High Availability

A backup controller can be set up inside or outside the Cray. However, when the backup is within the Cray, both the primary and the backup controllers go down when the Cray is rebooted. It is best to set up the backup controller on a node external to the Cray so that the controller can still receive new jobs while the Cray is down. When the backup is configured on an external node, specify the no_backup_scheduling option in SchedulerParameters in slurm.conf. This allows new jobs to be submitted while the Cray is down but prevents the backup controller from starting any new jobs.
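
A sketch of such a setup in slurm.conf follows. It assumes a Slurm release that lists controllers with SlurmctldHost (the first entry is the primary, later entries are backups); the host names cray-sdb and ext-ctl are hypothetical placeholders.

```conf
# Primary controller on a node inside the Cray (hypothetical host name).
SlurmctldHost=cray-sdb
# Backup controller on a node outside the Cray (hypothetical host name),
# so job submission keeps working while the Cray is rebooted.
SlurmctldHost=ext-ctl
# The backup accepts submissions but does not start new jobs.
# Append to any existing SchedulerParameters (comma-separated).
SchedulerParameters=no_backup_scheduling
```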

Last modified 7 March 2019