Scratch purging policy: Difference between revisions

From Alliance Doc
(Marked this version for translation)
No edit summary
Line 3: Line 3:
<translate>
<translate>
=Overview= <!--T:1-->
=Overview= <!--T:1-->
The scratch filesystem on Compute Canada clusters is intended as temporary, fast storage for data being used during job execution.  Data needed for long term storage and reference should be kept in either [[Project layout|<tt>/project</tt>]] or other archival storage areas. In order to ensure adequate space on scratch older files are periodically deleted according to the policy outlined in this page. The threshold for purging is 60 days, which is a little more than twice the maximum duration of a job on the cluster.
The scratch filesystem on Compute Canada clusters is intended as temporary, fast storage for data being used during job execution.  Data needed for long term storage and reference should be kept in either [[Project layout|<tt>/project</tt>]] or other archival storage areas. In order to ensure adequate space on scratch, older files are periodically deleted according to the policy outlined in this page. The threshold for purging is 60 days, which is a little more than twice the maximum duration of a job on the cluster.


=Expiration procedure= <!--T:2-->
=Expiration procedure= <!--T:2-->
Line 11: Line 11:


<!--T:4-->
<!--T:4-->
On the 12th of the month, a final notification e-mail will be sent with an updated assessment of candidate files for expiration on the 15th, giving you 72 hours to make arrangements for moving these files. At the end of day on the 15th any remaining files on the scratch filesystem for which both the <tt>ctime</tt> and the <tt>atime</tt> are older than 60 days will be deleted.
On the 12th of the month, a final notification e-mail will be sent with an updated assessment of candidate files for expiration on the 15th, giving you 72 hours to make arrangements for moving these files. At the end of day on the 15th, any remaining files on the scratch filesystem for which both the <tt>ctime</tt> and the <tt>atime</tt> are older than 60 days will be deleted.


<!--T:5-->
<!--T:5-->
Note that simplying copying or using the <tt>rsync</tt> command to displace your files will update the <tt>atime</tt> for the original data on scratch, making them ineligible for deletion. Once you have put the data in another location please delete the original files and directories in scratch, rather than depending on the automatic purging.
Note that simply copying or using the <tt>rsync</tt> command to displace your files will update the <tt>atime</tt> for the original data on scratch, making them ineligible for deletion. Once you have put the data in another location please delete the original files and directories in scratch instead of depending on the automatic purging.


= How do I check the age of a file? = <!--T:6-->
= How do I check the age of a file? = <!--T:6-->
Line 24: Line 24:
while the <tt>atime</tt> can be obtained with the command  
while the <tt>atime</tt> can be obtained with the command  
{{Command|ls -lu <filename>}}  
{{Command|ls -lu <filename>}}  
We do not use the modify time (<tt>mtime</tt>) of the file because it can be modified by the user or other programs to display incorrect information.  
We do not use the modify time (<tt>mtime</tt>) of the file because it can be modified by the user or by other programs to display incorrect information.  


<!--T:7-->
<!--T:7-->
Ordinarily simple use of the <tt>atime</tt> property would be sufficient, as it is updated by the system in sync with the <tt>ctime</tt>. However userspace programs are able to alter <tt>atime</tt>, potentially to times in the past, which could result in early expiration of a file. The use of <tt>ctime</tt> as a fallback guards against this undesirable behaviour.
Ordinarily, simple use of the <tt>atime</tt> property would be sufficient, as it is updated by the system in sync with the <tt>ctime</tt>. However, userspace programs are able to alter <tt>atime</tt>, potentially to times in the past, which could result in early expiration of a file. The use of <tt>ctime</tt> as a fallback guards against this undesirable behaviour.


=Abuse= <!--T:8-->
=Abuse= <!--T:8-->
This method of tracking file age does allow for potential abuse the system by periodically running a recursive <tt>touch</tt> command on your files, preventing them from being flagged for expiration. Compute Canada staff have methods for detecting this and similar tactics to circumvent the purging policy. Users who employ such techniques will be contacted and asked to modify their behaviour, in particular to move the "retouched" data from scratch to a more appropriate location.
This method of tracking file age does allow for potential abuse by periodically running a recursive <tt>touch</tt> command on your files to prevent them from being flagged for expiration. Compute Canada staff have methods for detecting this and similar tactics to circumvent the purging policy. Users who employ such techniques will be contacted and asked to modify their behaviour, in particular to move the "retouched" data from scratch to a more appropriate location.


</translate>
</translate>

Revision as of 19:36, 1 March 2018


Overview

The scratch filesystem on Compute Canada clusters is intended as temporary, fast storage for data being used during job execution. Data needed for long-term storage and reference should be kept in either /project or other archival storage areas. To ensure adequate space on scratch, older files are periodically deleted according to the policy outlined on this page. The threshold for purging is 60 days, which is a little more than twice the maximum duration of a job on the cluster.

Expiration procedure

The scratch filesystem is checked at the end of the month for files which will be candidates for expiry on the 15th of the following month. On the first day of the month, a notification e-mail is sent to every user who has at least one file which is a candidate for purging; the e-mail contains the location of a file which lists all the candidates for purging. You will thus have two weeks to make arrangements to move data to your project space or some other location if you wish to save the data in question.

On the 12th of the month, a final notification e-mail will be sent with an updated assessment of candidate files for expiration on the 15th, giving you 72 hours to make arrangements for moving these files. At the end of the day on the 15th, any remaining files on the scratch filesystem for which both the ctime and the atime are older than 60 days will be deleted.
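
The deletion rule above (both the ctime and the atime older than 60 days) can be sketched in Python as follows. This is only an illustration of the rule as stated on this page, not the tool the cluster staff actually run; the function names and the `now` parameter are assumptions introduced for the example.

```python
import os
import time

PURGE_THRESHOLD_DAYS = 60   # threshold stated in this policy
SECONDS_PER_DAY = 86400


def is_purge_candidate(path, now=None, threshold_days=PURGE_THRESHOLD_DAYS):
    """Return True if BOTH the atime and the ctime of path are older than the threshold.

    `now` (hypothetical parameter) lets you evaluate the rule at another
    moment, for example the date of the next purge.
    """
    now = time.time() if now is None else now
    cutoff = now - threshold_days * SECONDS_PER_DAY
    st = os.lstat(path)  # lstat: look at the entry itself, do not follow symlinks
    return st.st_atime < cutoff and st.st_ctime < cutoff


def list_purge_candidates(root, now=None):
    """Walk the tree under root and yield the files that match the rule."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if is_purge_candidate(path, now=now):
                    yield path
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
```

Calling `list(list_purge_candidates("/path/to/your/scratch"))` with your own scratch directory gives a rough preview; the list named in the notification e-mail remains the authoritative one.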

Note that simply copying your files, or using the rsync command to transfer them, will update the atime of the original data on scratch, making the originals ineligible for deletion. Once you have put the data in another location, please delete the original files and directories in scratch instead of depending on the automatic purging.
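
One way to follow this advice is to copy a directory, confirm that the copy matches, and only then delete the original from scratch. The sketch below uses only the Python standard library; `relocate_tree` is a hypothetical helper name, and the source and destination paths are whatever you choose.

```python
import filecmp
import shutil
from pathlib import Path


def _trees_match(cmp):
    """Recursively check that a filecmp.dircmp result reports no differences."""
    if cmp.left_only or cmp.right_only or cmp.diff_files or cmp.funny_files:
        return False
    return all(_trees_match(sub) for sub in cmp.subdirs.values())


def relocate_tree(src, dst):
    """Copy src to dst, verify the copy, then remove src."""
    src, dst = Path(src), Path(dst)
    # copytree raises FileExistsError if dst already exists, so nothing is overwritten.
    shutil.copytree(src, dst)
    # dircmp compares file type, size and mtime first and reads the contents
    # only when those differ; ignore=[] disables its default ignore list.
    if not _trees_match(filecmp.dircmp(str(src), str(dst), ignore=[])):
        raise RuntimeError(f"copy of {src} to {dst} does not match; original kept")
    shutil.rmtree(src)  # delete the original rather than waiting for the purge
    return dst
```

If the verification fails, the original is left in place so that nothing is lost.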

How do I check the age of a file?

We define a file's age as the most recent of:

  • the access time (atime) and
  • the change time (ctime).

You can find the ctime of a file using

[name@server ~]$ ls -lc <filename>

while the atime can be obtained with the command

[name@server ~]$ ls -lu <filename>

We do not use the modify time (mtime) of the file because it can be modified by the user or by other programs to display incorrect information.
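
The same two timestamps can also be read from a script. The following minimal sketch computes a file's age as defined above, that is, measured from the most recent of the atime and the ctime; the function names are assumptions made for this example.

```python
import os
import time
from datetime import datetime

SECONDS_PER_DAY = 86400


def file_age_days(path, now=None):
    """Age in days, measured from the most recent of atime and ctime."""
    now = time.time() if now is None else now
    st = os.stat(path)
    # On Linux, st_ctime is the inode change time (the value shown by ls -lc).
    newest = max(st.st_atime, st.st_ctime)
    return (now - newest) / SECONDS_PER_DAY


def describe(path):
    """Return the two timestamps in readable form together with the age."""
    st = os.stat(path)
    return {
        "atime": datetime.fromtimestamp(st.st_atime).isoformat(timespec="seconds"),
        "ctime": datetime.fromtimestamp(st.st_ctime).isoformat(timespec="seconds"),
        "age_days": round(file_age_days(path), 1),
    }
```

A file whose `age_days` exceeds 60 satisfies the condition for deletion described in the expiration procedure.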

Ordinarily, simple use of the atime property would be sufficient, as it is updated by the system in sync with the ctime. However, userspace programs are able to alter atime, potentially to times in the past, which could result in early expiration of a file. The use of ctime as a fallback guards against this undesirable behaviour.
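
The short experiment below illustrates why the ctime is a useful fallback. It uses `os.utime` to push a file's atime 90 days into the past, as any userspace program could; the operating system records the change by setting the ctime to the current time, so the file still counts as recent. `backdate_atime` is a hypothetical helper written for this illustration.

```python
import os

SECONDS_PER_DAY = 86400


def backdate_atime(path, days):
    """Set the atime of path `days` into the past, leaving the mtime unchanged."""
    st = os.stat(path)
    past = st.st_atime - days * SECONDS_PER_DAY
    os.utime(path, (past, st.st_mtime))  # arguments are (atime, mtime)
    return os.stat(path)                 # fresh stat result after the change
```

After `backdate_atime(path, 90)`, the atime alone would make the file look expired, but because the ctime is recent the file is not older than 60 days under the rule used here.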

Abuse

This method of tracking file age does allow for potential abuse by periodically running a recursive touch command on your files to prevent them from being flagged for expiration. Compute Canada staff have methods for detecting this and similar tactics to circumvent the purging policy. Users who employ such techniques will be contacted and asked to modify their behaviour, in particular to move the "retouched" data from scratch to a more appropriate location.