Profiling Using HDF5 User Guide

Overview
Administration
Profiling Jobs
HDF5
Data Structure

Overview

The acct_gather_profile/hdf5 plugin allows Slurm to coordinate collecting data on jobs it runs on a cluster that is more detailed than is practical to include in its database. The data comes from periodically sampling various performance data either collected by Slurm, the operating system, or component software. The plugin will record the data from each source as a Time Series and also accumulate totals for each statistic for the job.
acct_gather_profile / hdf5プラグインを使用すると、Slurmは、クラスターで実行するジョブに関するデータの収集を調整して、データベースに含めるのに実際的ではない詳細な情報を得ることができます。データは、Slurm、オペレーティングシステム、またはコンポーネントソフトウェアによって収集されたさまざまなパフォーマンスデータを定期的にサンプリングしたものです。プラグインは、各ソースからのデータを時系列として記録し、ジョブの各統計の合計も累積します。

Time Series are energy data collected by an acct_gather_energy plugin, I/O data from a network interface collected by an acct_gather_interconnect plugin, I/O data from parallel file systems such as Lustre collected by an acct_gather_filesystem plugin, and task performance data such as local disk I/O, cpu consumption, and memory use from a jobacct_gather plugin. Data from other sources may be added in the future.
時系列は、acct_gather_energyプラグインによって収集されたエネルギーデータ、acct_gather_interconnectプラグインによって収集されたネットワークインターフェイスからのI / Oデータ、acct_gather_filesystemプラグインによって収集されたLusterなどの並列ファイルシステムからのI / Oデータ、およびローカルディスクなどのタスクパフォーマンスデータです。 jobacct_gatherプラグインからのI / O、CPU消費、メモリ使用量。他のソースからのデータは将来追加される可能性があります。

The data is collected into a file on a shared file system for each step on each allocated node of a job and then merged into an HDF5 file. Individual files on a shared file system was chosen because it is possible that the data is voluminous so solutions that pass data to the Slurm control daemon via RPC may not scale to very large clusters or jobs with many allocated nodes.
データは、ジョブに割り当てられた各ノードのステップごとに、共有ファイルシステム上のファイルに収集され、HDF5ファイルにマージされます。RPCを介してSlurm制御デーモンにデータを渡すソリューションが非常に大規模なクラスターまたは多くの割り当てられたノードを持つジョブに拡張されない可能性があるため、データが膨大になる可能性があるため、共有ファイルシステム上の個々のファイルが選択されました。

A separate Slurm Profile Accounting Plugin API (AcctGatherProfileType) documents how to write other Profile Accounting plugins.
別のSlurmプロファイルアカウンティングプラグインAPI（AcctGatherProfileType）は、他のプロファイルアカウンティングプラグインの記述方法を文書化しています。

Administration

Shared File System

The HDF5 Profile Plugin requires a common shared file system on all the compute nodes. While a job is running, the plugin writes a file into this file system for each step of the job on each node. When the job ends, the merge process is launched and the node-step files are combined into one HDF5 file for the job.
HDF5プロファイルプラグインでは、すべての計算ノードに共通の共有ファイルシステムが必要です。ジョブの実行中、プラグインは各ノードのジョブの各ステップでこのファイルシステムにファイルを書き込みます。ジョブが終了すると、マージプロセスが開始され、ノードステップファイルがジョブの1つのHDF5ファイルに結合されます。

The root of the directory structure is declared in the ProfileHDF5Dir option in the acct_gather.conf file. The directory will be created by Slurm if it doesn't exist. Each user will have their own directory created in the ProfileHDF5Dir which contains the HDF5 files. All the directories and files are created by the SlurmdUser which is usually root. The user specific directories, as well as the files inside, are chowned to the user running the job so they can access the files. Since user root is usually creating these files/directories a root squashed file system will not work for the ProfileHDF5Dir.
ディレクトリ構造のルートは、acct_gather.confファイルのProfileHDF5Dirオプションで宣言されています。ディレクトリが存在しない場合、Slurmによって作成されます。各ユーザーは、HDF5ファイルを含むProfileHDF5Dirに独自のディレクトリを作成します。すべてのディレクトリとファイルは、通常ルートであるSlurmdUserによって作成されます。ユーザー固有のディレクトリと内部のファイルは、ジョブを実行しているユーザーに割り当てられ、ファイルにアクセスできます。ユーザーrootは通常これらのファイル/ディレクトリを作成しているため、ルートスカッシュファイルシステムはProfileHDF5Dirでは機能しません。

Each user that creates a profile will have a subdirectory in the profile directory that has read/write permission only for the user.
プロファイルを作成する各ユーザーは、そのユーザーに対してのみ読み取り/書き込み権限を持つプロファイルディレクトリにサブディレクトリを持ちます。

Configuration parameters

The profile plugin is enabled in the slurm.conf file and it is internally configured in the acct_gather.conf file.
プロファイルプラグインはslurm.confファイルで有効にされ、acct_gather.confファイルで内部的に設定されます。

slurm.conf parameters

This enables the HDF5 plugin:
これにより、HDF5プラグインが有効になります。

AcctGatherProfileType = acct_gather_profile/hdf5

This sets the sampling frequency for data types:
これにより、データ型のサンプリング頻度が設定されます。

JobAcctGatherFrequency = <seconds>

acct_gather.conf parameters

These parameters are directly used by the HDF5 Profile Plugin.
これらのパラメーターは、HDF5プロファイルプラグインによって直接使用されます。

ProfileHDF5Dir = <path>
ProfileHDF5Default = [options]

Time Series Control Parameters

Other plugins add time series data to the HDF5 collection. They typically have a default polling frequency specified in slurm.conf in the JobAcctGatherFrequency parameter. The polling frequency can be overridden using the --acctg-freq srun parameter. They are both of the form task=sec,energy=sec,filesystem=sec,network=sec.
他のプラグインは、時系列データをHDF5コレクションに追加します。通常、デフォルトのポーリング頻度は、JobAcctGatherFrequencyパラメーターのslurm.confで指定されています。--acctg-freq srunパラメータを使用して、ポーリング頻度を上書きできます。どちらもtask = sec、energy = sec、filesystem = sec、network = secの形式です。

The IPMI energy plugin also needs the EnergyIPMIFrequency value set in the acct_gather.conf file. This sets the rate at which the plugin samples the external sensors. This value should be the same as the energy=sec in either JobAcctGatherFrequency or --acctg-freq.
IPMIエネルギープラグインには、acct_gather.confファイルで設定されたEnergyIPMIFrequency値も必要です。これにより、プラグインが外部センサーをサンプリングするレートが設定されます。この値は、JobAcctGatherFrequencyまたは--acctg-freqのエネルギー=秒と同じである必要があります。

Note that the IPMI and profile sampling are not synchronous. The profile sample simply takes the last available IPMI sample value. If the profile energy sample is more frequent than the IPMI sample rate, the IPMI value will be repeated. If the profile energy sample is greater than the IPMI rate, IPMI values will be lost.
IPMIとプロファイルのサンプリングは同期していないことに注意してください。プロファイルサンプルは、最後に使用可能なIPMIサンプル値を取得するだけです。プロファイルエネルギーサンプルがIPMIサンプルレートより頻繁である場合、IPMI値が繰り返されます。プロファイルエネルギーサンプルがIPMIレートよりも大きい場合、IPMI値は失われます。

Also note that smallest effective IPMI (EnergyIPMIFrequency) sample rate for 2013 era Intel processors is 3 seconds.
また、2013年のインテルプロセッサーの最小有効IPMI（EnergyIPMIFrequency）サンプルレートは3秒です。

Profiling Jobs

Data Collection

The --profile option on salloc|sbatch|srun controls whether data is collected and what type of data is collected. If --profile is not specified no data collected unless the ProfileHDF5Default option is used in acct_gather.conf. --profile on the command line overrides any value specified in the configuration file.
salloc | sbatch | srunの--profileオプションは、データを収集するかどうか、および収集するデータのタイプを制御します。--profileが指定されていない場合、acct_gather.confでProfileHDF5Defaultオプションが使用されていない限り、データは収集されません。コマンドラインの--profileは、構成ファイルで指定された値を上書きします。

enables detailed data collection by the acct_gather_profile plugin. Detailed data are typically time-series that are stored in a HDF5 file for the job.
acct_gather_profileプラグインによる詳細なデータ収集を有効にします。詳細データは通常、ジョブのHDF5ファイルに保存される時系列です。

All: All data types are collected. (Cannot be combined with other values.)
すべてのデータ型が収集されます。（他の値と組み合わせることはできません。）
None: No data types are collected. This is the default. (Cannot be combined with other values.)
データタイプは収集されません。これがデフォルトです。（他の値と組み合わせることはできません。）
Energy: Energy data is collected.
エネルギーデータが収集されます。
Filesystem: Filesystem data is collected. Currently only Lustre filesystem is supported.
ファイルシステムのデータが収集されます。現在、Lustreファイルシステムのみがサポートされています。
Network: Network (InfiniBand) data is collected.
ネットワーク（InfiniBand）データが収集されます。
Task: Task (I/O, Memory, ...) data is collected.
タスク（I / O、メモリ、...）データが収集されます。

Data Consolidation

The node-step files are merged into one HDF5 file for the job using the sh5util.
ノードステップファイルは、sh5utilを使用するジョブの1つのHDF5ファイルにマージされます。

If the job is started with sbatch, the command line may added to the normal launch script, For example:
ジョブがsbatchで開始された場合、コマンドラインは通常の起動スクリプトに追加されます。次に例を示します。

sbatch -n1 -d$SLURM_JOB_ID --wrap="sh5util -j $SLURM_JOB_ID"

Data Extraction

The sh5util program can also be used to extract specific data from the HDF5 file and write it in comma separated value (csv) form for importation into other analysis tools such as spreadsheets.
sh5utilプログラムを使用して、HDF5ファイルから特定のデータを抽出し、それをコンマ区切り値（csv）形式で書き込んで、スプレッドシートなどの他の分析ツールにインポートすることもできます。

HDF5

HDF5 is a well known structured data set that allows heterogeneous but related data to be stored in one file. (.i.e. sections for energy statistics, network I/O, Task data, etc.) Its internal structure resembles a file system with groups being similar to directories and data sets being similar to files. It also allows attributes to be attached to groups to store application defined properties.
HDF5はよく知られた構造化データセットであり、異種の関連データを1つのファイルに格納できます。（つまり、エネルギー統計、ネットワークI / O、タスクデータなどのセクション）。その内部構造はファイルシステムに似ており、グループはディレクトリに、データセットはファイルに似ています。また、属性をグループに添付して、アプリケーション定義のプロパティを格納することもできます。

There are commodity programs, notably HDFView, for viewing and manipulating these files.
これらのファイルを表示および操作するための汎用プログラム、特にHDFViewがあります。

Below is a screen shot from HDFView expanding the job tree and showing the attributes for a specific task.
以下は、HDFViewのジョブツリーを展開し、特定のタスクの属性を示すスクリーンショットです。

Data Structure

In the job file, there will be a group for each step of the job. Within each step, there will be a group for nodes, and a group for tasks.
ジョブファイルには、ジョブのステップごとにグループがあります。各ステップには、ノードのグループとタスクのグループがあります。

The nodes group will have a group for each node in the step allocation. For each node group, there is a sub-group for Time Series and another for Totals.
ノードグループには、ステップ割り当ての各ノードのグループがあります。各ノードグループには、時系列のサブグループと合計のサブグループがあります。
- The Time Series group contains a group/dataset containing the time series for each collector.
  時系列グループには、各コレクターの時系列を含むグループ/データセットが含まれています。
- The Totals group contains a group/dataset that has corresponding Minimum, Average, Maximum, and Sum Total for each item in the time series.
  合計グループには、時系列の各アイテムに対応する最小、平均、最大、および合計のあるグループ/データセットが含まれます。
The Tasks group will only contain a subgroup for each task. It primarily contains an attribute stating the node on which the task was executed. This set of groups is essentially a cross reference table.
タスクグループには、各タスクのサブグループのみが含まれます。これには主に、タスクが実行されたノードを示す属性が含まれています。このグループのセットは、基本的に相互参照表です。

Energy Data

AcctGatherEnergyType=acct_gather_energy/ipmi

is required in slurm.conf to collect energy data. Appropriately set energy=freq in either JobAcctGatherFrequency in slurm.conf or in --acctg-freq on the command line. Also appropriately set EnergyIPMIFrequency in acct_gather.conf.
エネルギーデータを収集するには、slurm.confでAcctGatherEnergyType = acct_gather_energy / ipmiisが必要です。slurm.confのJobAcctGatherFrequencyまたはコマンドラインの--acctg-freqで、energy = freqを適切に設定します。また、acct_gather.confでEnergyIPMIFrequencyを適切に設定します。

Each data sample in the Energe Time Series contains the following data items.
Energe時系列の各データサンプルには、次のデータ項目が含まれています。

Date Time: Time of day at which the data sample was taken. This can be used to correlate activity with other sources such as logs.
データサンプルが取得された時刻。これは、アクティビティをログなどの他のソースと関連付けるために使用できます。
Time: Elapsed time since the beginning of the step.
ステップの開始からの経過時間。
Power: Power consumption during the interval.
インターバル中の電力消費。
CPU Frequency: CPU Frequency at time of sample in kilohertz.
サンプリング時のCPU周波数（キロヘルツ）。

Filesystem Data

AcctGatherFilesystemType=acct_gather_filesystem/lustre

is required in slurm.conf to collect task data. Appropriately set Filesystem=freq in either JobAcctGatherFrequency in slurm.conf or in --acctg-freq on the command line.
タスクデータを収集するには、slurm.confでAcctGatherFilesystemType = acct_gather_filesystem / lustreisが必要です。slurm.confのJobAcctGatherFrequencyまたはコマンドラインの--acctg-freqでFilesystem = freqを適切に設定します。

Each data sample in the Filesystem Time Series contains the following data items.
ファイルシステム時系列の各データサンプルには、次のデータ項目が含まれています。

Date Time: Time of day at which the data sample was taken. This can be used to correlate activity with other sources such as logs.
データサンプルが取得された時刻。これは、アクティビティをログなどの他のソースと関連付けるために使用できます。
Time: Elapsed time since the beginning of the step.
ステップの開始からの経過時間。
Reads: Number of read operations.
読み取り操作の数。
Megabytes Read: Number of megabytes read.
読み取られたメガバイト数。
Writes: Number of write operations.
書き込み操作の数。
Megabytes Write: Number of megabytes written.
書き込まれたメガバイト数。

Network (Infiniband Data)

JobAcctInfinibandType=acct_gather_interconnect/ofed

is required in slurm.conf to collect task data. Appropriately set network=freq in either JobAcctGatherFrequency in slurm.conf or in --acctg-freq on the command line.
タスクデータを収集するには、slurm.confにJobAcctInfinibandType = acct_gather_interconnect / ofedisが必要です。slurm.confのJobAcctGatherFrequencyまたはコマンドラインの--acctg-freqでnetwork = freqを適切に設定します。

Each data sample in the Network Time Series contains the following data items.
ネットワーク時系列の各データサンプルには、次のデータ項目が含まれています。

Date Time: Time of day at which the data sample was taken. This can be used to correlate activity with other sources such as logs.
データサンプルが取得された時刻。これは、アクティビティをログなどの他のソースと関連付けるために使用できます。
Time: Elapsed time since the beginning of the step.
ステップの開始からの経過時間。
Packets In: Number of packets coming in.
入ってくるパケットの数。
Megabytes Read: Number of megabytes coming in through the interface.
インターフェースから入ってくるメガバイト数。
Packets Out: Number of packets going out.
発信するパケットの数。
Megabytes Write: Number of megabytes going out through the interface.
インターフェースを介して出て行くメガバイト数。

Task Data

JobAcctGatherType=jobacct_gather/linux

is required in slurm.conf to collect task data. Appropriately set task=freq in either JobAcctGatherFrequency in slurm.conf or in --acctg-freq on the command line.
タスクデータを収集するには、slurm.confにJobAcctGatherType = jobacct_gather / linuxが必要です。slurm.confのJobAcctGatherFrequencyまたはコマンドラインの--acctg-freqでtask = freqを適切に設定します。

Each data sample in the Task Time Series contains the following data items.
タスク時系列の各データサンプルには、次のデータアイテムが含まれています。

Date Time: Time of day at which the data sample was taken. This can be used to correlate activity with other sources such as logs.
データサンプルが取得された時刻。これは、アクティビティをログなどの他のソースと関連付けるために使用できます。
Time: Elapsed time since the beginning of the step.
ステップの開始からの経過時間。
CPU Frequency: CPU Frequency at time of sample.
サンプル時のCPU周波数。
CPU Time: Seconds of CPU time used during the sample.
サンプル中に使用されたCPU時間の秒。
CPU Utilization: CPU Utilization during the interval.
インターバル中のCPU使用率。
RSS: Value of RSS at time of sample.
サンプル時のRSSの値。
VM Size: Value of VM Size at time of sample.
サンプル時のVMサイズの値。
Pages: Pages used in sample.
サンプルで使用されているページ。
Read Megabytes: Number of megabytes read from local disk.
ローカルディスクから読み取られたメガバイト数。
Write Megabytes: Number of megabytes written to local disk.
ローカルディスクに書き込まれたメガバイト数。

Last modified 30 January 2020

Slurm Workload Manager