Tuning Lustre: Difference between revisions
Revision as of 16:50, 24 August 2017
Lustre Filesystem
Lustre is a high-performance distributed filesystem that gives Compute Canada users high bandwidth for input/output operations. There are, however, some caveats to consider in order to achieve the best performance.
Stripe Count and Stripe Size
For each file or directory, it is possible to change the stripe size and stripe count parameters. Stripe size is the size of the contiguous block of data written to one disk before the filesystem moves on to the next disk. Stripe count is the number of disks across which the data are spread.
The current value of those parameters for a given file or directory can be displayed with the command:
[name@server ~]$ lfs getstripe path/to/file
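Those parameters can also be changed for a given directory with the command below. Recent Lustre releases spell the stripe size option -S (or --stripe-size); older releases also accepted a lowercase -s.
[name@server ~]$ lfs setstripe -c count -S size /path/to/dir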
For example, if count=8 and size=4m, then each file created in that directory will be spread across 8 disks, with its data laid out in blocks of 4 MB as the file grows.
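To make this example concrete, here is a minimal sketch of the arithmetic. It assumes the usual round-robin layout, in which consecutive stripe-sized blocks go to consecutive disks, and it assumes that "4m" means 4 MiB. The function name stripe_target is hypothetical; the code does not talk to Lustre at all.

```python
MIB = 1024 * 1024  # assumption: "4m" is read as 4 MiB


def stripe_target(offset, stripe_size=4 * MIB, stripe_count=8):
    """Return (disk index, offset inside that disk's object) for a byte offset."""
    chunk = offset // stripe_size        # which stripe-sized block of the file
    disk = chunk % stripe_count          # blocks are dealt out round-robin
    full_rounds = chunk // stripe_count  # complete passes over all the disks
    return disk, full_rounds * stripe_size + offset % stripe_size


for off in (0, 4 * MIB, 31 * MIB, 32 * MIB):
    print(off // MIB, "MiB ->", stripe_target(off))
```

With these values, the first 32 MiB of a file touch each of the 8 disks exactly once, and the block starting at 32 MiB lands on the first disk again.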
It is not possible to change the stripe count or the stripe size of an existing file. To change those parameters, the file must be copied (not moved) to a directory with different parameters. To create a file with specific parameters without changing the parameters of its directory, run lfs setstripe on the name of the file to be created; the file will be created empty, with the given parameters.
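The copy-based approach can be scripted. The sketch below pre-creates the destination with lfs setstripe and then copies the data into it; it assumes that writing into the pre-created file keeps the layout it was given. The function name restripe_by_copy and its run parameter (a hook for substituting the command runner) are hypothetical.

```python
import shutil
import subprocess


def restripe_by_copy(src, dst, stripe_count, run=subprocess.run):
    """Create dst with a new stripe count, then copy the contents of src into it."""
    # Pre-create dst as an empty file carrying the requested stripe count.
    run(["lfs", "setstripe", "-c", str(stripe_count), dst], check=True)
    # copyfile writes into the existing dst rather than replacing it
    # (assumption: the layout set above is therefore preserved).
    shutil.copyfile(src, dst)
    return dst
```

Once the copy has been verified, the original file can be removed and the new one renamed into place.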
Increasing the stripe count may improve performance, but it also makes the file more susceptible to hardware failures, since the file depends on every disk that holds one of its stripes.
When a parallel program needs to read a small file (< 1 MB), such as a configuration file, it is best to put this file on a single disk (stripe count=1), to read it with the master rank only, and to send its content to the other ranks using MPI_Bcast or MPI_Scatter.
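The read-once-then-broadcast pattern might look like the following sketch. It is written against a communicator object offering Get_rank() and bcast(obj, root=...), which is the interface mpi4py provides; the function name load_on_all_ranks is hypothetical.

```python
def load_on_all_ranks(comm, path, root=0):
    """Read path on the root rank only and return its bytes on every rank."""
    data = None
    if comm.Get_rank() == root:
        # Only the root rank touches the filesystem.
        with open(path, "rb") as f:
            data = f.read()
    # Every rank calls bcast; non-root ranks receive the root's data.
    return comm.bcast(data, root=root)
```

Assuming mpi4py is installed, the communicator would typically be MPI.COMM_WORLD, obtained with "from mpi4py import MPI", and the script would be launched through the MPI launcher of the cluster.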
When working with large files, it is usually best to use a stripe count as large as the number of MPI ranks. The stripe size should match the size of the buffer that each rank reads or writes at a time. For example, if each rank reads 1 MB of data at a time, the ideal stripe size will likely be 1 MB. If you don't know what size to use, your best bet is to keep the default value, which has been optimized for large files. Note that you must never use a stripe size that is not a multiple of 1 MB.
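These rules of thumb can be written down as a small helper. This is only a sketch: the name suggest_stripe and the n_disks parameter (the number of disks available for striping, which must be known in advance) are hypothetical, 1 MB is treated as 1 MiB, and rounding the buffer size up to the next multiple is one possible choice among others.

```python
MIB = 1024 * 1024  # assumption: "1 MB" is treated as 1 MiB


def suggest_stripe(n_ranks, buffer_bytes, n_disks):
    """Suggest (stripe_count, stripe_size_bytes) from the rules of thumb above."""
    # One stripe per rank, but never more stripes than there are disks.
    count = max(1, min(n_ranks, n_disks))
    # Round the per-rank buffer up to a whole number of MiB (at least 1 MiB),
    # so the stripe size is always a multiple of 1 MiB.
    size = max(1, -(-buffer_bytes // MIB)) * MIB
    return count, size


print(suggest_stripe(16, 1 * MIB, 64))
print(suggest_stripe(128, 1536 * 1024, 32))
```

The first call keeps one stripe per rank with a 1 MiB stripe size; the second is capped at 32 stripes and rounds the 1.5 MiB buffer up to 2 MiB.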
In general, you want to reduce the number of open/close operations on the filesystem. It is therefore best to gather all data in a single file rather than writing many small files. It is also best to open the file once at the beginning of the program and close it once at the end, rather than opening and closing it each time you want to add new data.
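As an illustration of the open-once pattern, the sketch below appends many records to one file with a single open/close pair. The function name write_records is hypothetical.

```python
def write_records(path, records):
    """Append all records to one file using a single open/close pair."""
    written = 0
    with open(path, "ab") as f:  # the file is opened once
        for rec in records:
            f.write(rec)         # many writes, no reopening
            written += len(rec)
    # Leaving the with block closes the file once.
    return written
```

The pattern to avoid is calling open() inside the loop, which issues one open and one close on the filesystem for every record.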
See also
- http://www.nics.tennessee.edu/io-tips : explanations of Lustre
- http://www.nics.tennessee.edu/I-O-Best-Practices : advice for obtaining better performance
- Tools and examples for Archiving and compressing files