Diskusage Explorer


Latest revision as of 19:45, 16 January 2024


Content of folders

Warning: This tool is currently only available on Béluga and Narval.

You can get a breakdown by folder of how the disk space is being consumed in your /home, /scratch and /project spaces. That information is currently updated once a day and is stored in an SQLite database for fast access.

Here is how to explore your disk consumption, using the /project space def-professor as the example directory to investigate.

ncurses user interface

Choose a /project space you have access to and want to analyze; for the purpose of this discussion we will use def-professor.

[name@server ~]$ diskusage_explorer /project/def-professor

This command loads a browser that shows the resources consumed by all files under any directory tree.

Figure: Navigating your project space with duc's ncurses tool

Type c to toggle between consumed disk space and the number of files, q or <esc> to quit and h for help.

If you are only interested in a subdirectory of this /project space and do not want to navigate the whole tree in the ncurses user interface, use:

[name@server ~]$ diskusage_explorer /project/def-professor/subdirectory/

A complete manual page is available with the man duc command.

Graphical user interface

We recommend the standard text-based ncurses mode on our cluster login nodes, but diskusage_explorer also includes a graphical user interface (GUI).

Note that when the login node is especially busy, or if you have an especially large number of files in your /project space, the GUI can be slow and choppy. For a better experience, see the section below on running the tool on your own machine.

First, make sure that you are connected to the cluster in such a way that SSH can correctly display GUI applications, for example with X11 forwarding enabled (ssh -Y). You can then start the graphical interface with the command:

[name@server ~]$ duc gui -d /project/.duc_databases/def-professor.sqlite  /project/def-professor

You can navigate the folders with the mouse and still type c to toggle between the size of the files and the number of files.
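
One way to tell whether the current session can display GUI applications is to look at the DISPLAY environment variable, which SSH sets when X11 forwarding is active. The following is a minimal sketch under that assumption; the function name choose_duc_mode is hypothetical and is not part of diskusage_explorer or duc. It only prints the command it would suggest, falling back to the text-based duc ui mode when no display is available.

```shell
# Hypothetical helper (not part of duc): choose the duc front end.
# Prints "gui" when an X display is available, "ui" (ncurses) otherwise.
choose_duc_mode() {
    if [ -n "${DISPLAY:-}" ]; then
        echo gui
    else
        echo ui
    fi
}

group=def-professor   # example project space used on this page

# Print the suggested command instead of running it.
echo "duc $(choose_duc_mode) -d /project/.duc_databases/${group}.sqlite /project/${group}"
```

Printing the command rather than executing it keeps the sketch safe to try in any session; remove the echo once the suggested command looks right.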

Figure: Navigating your project space with duc's GUI tool

Browse faster on your own machine

First, install the duc software, which diskusage_explorer relies on, on your local machine (see http://duc.zevv.nl/#download). Then, still on your local machine, download the SQLite file from your cluster and run duc:

rsync -v --progress username@beluga.calculcanada.ca:/project/.duc_databases/def-professor.sqlite  .
duc gui -d ./def-professor.sqlite  /project/def-professor

This immediately leads to a smoother and more satisfying browsing experience.
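
The two commands above differ only in the user name, the cluster host and the project name. As a sketch, assuming the database for a project space is always found at /project/.duc_databases/<project>.sqlite as in the example, the helper functions below (hypothetical names, not part of duc) print both commands for review instead of running them.

```shell
# Hypothetical helpers (not part of duc).

# Path of the duc database for a project space, assuming the layout
# shown on this page: /project/.duc_databases/<project>.sqlite
duc_db_path() {
    printf '/project/.duc_databases/%s.sqlite\n' "$1"
}

# Print the download and browse commands for a user, host and project.
# Arguments: $1 = cluster user name, $2 = cluster host, $3 = project name
print_local_browse_commands() {
    user=$1
    host=$2
    group=$3
    db=$(duc_db_path "$group")
    printf 'rsync -v --progress %s@%s:%s .\n' "$user" "$host" "$db"
    printf 'duc gui -d ./%s.sqlite /project/%s\n' "$group" "$group"
}

print_local_browse_commands username beluga.calculcanada.ca def-professor
```

Run on your local machine, this prints the same rsync and duc gui commands as in the example, which you can then copy and execute once the names are correct.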

Space and file count usage per user on Cedar

On Cedar, any member of a group can run diskusage_report with the options --per_user and --all_users to get a breakdown per user. The first option displays only heavy users, that is, the members of the group who have the most files and/or occupy the most space. When both options are used, the command gives the breakdown for all members of the group. This helps to identify the users within a group who have a large number of files and/or a large amount of data, so that they can be asked to manage their data better, for example by reducing their file count.

In the following example, user user01 runs the command and gets the following output:

[user01@cedar1 ~]$ diskusage_report --per_user --all_users
                             Description                Space           # of files
                     /home (user user01)             109k/50G              12/500k
                  /scratch (user user01)             4000/20T              1/1000k
                 /project (group user01)              0/2048k               0/1025
          /project (group def-professor)            9434G/10T            497k/500k

Breakdown for project def-professor (Last update: 2023-05-02 01:03:10)
           User      File count                 Size             Location
-------------------------------------------------------------------------
         user01           28313             4.00 GiB              On disk
         user02           11926             3.74 GiB              On disk
         user03           14507          6121.03 GiB              On disk
         user04            4010           377.86 GiB              On disk
         user05          125929           262.75 GiB              On disk
         user06          201099            60.51 GiB              On disk
         user07           84806          1721.33 GiB              On disk
         user08           26516           947.23 GiB              On disk
          Total          497106          9510.43 GiB              On disk

Breakdown for nearline def-professor (Last update: 2023-05-02 01:01:30)
           User      File count                 Size             Location
-------------------------------------------------------------------------
         user03               5          1197.90 GiB     On disk and tape
          Total               5          1197.90 GiB     On disk and tape

This group has 8 users and the above output shows clearly that at least 4 of them have a large number of files for a small amount of data:

           User      File count                 Size             Location
-------------------------------------------------------------------------
         user01           28313             4.00 GiB              On disk
         user02           11926             3.74 GiB              On disk
         user05          125929           262.75 GiB              On disk
         user06          201099            60.51 GiB              On disk
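
The same selection can be approximated automatically by computing the average file size per user. The following is only a sketch: the function name flag_small_files is hypothetical, the threshold is an illustrative value rather than a site policy, and the code assumes that every row has the layout shown above with sizes reported in GiB.

```shell
# Hypothetical helper (not part of diskusage_report).
# Reads a per-user breakdown on standard input and prints the users
# whose average file size is below a threshold.
# Argument: $1 = threshold in MiB per file (illustrative value)
flag_small_files() {
    awk -v limit="$1" '
        # Keep only user rows: numeric file count, size given in GiB.
        $1 != "Total" && $2 ~ /^[0-9]+$/ && $4 == "GiB" && $2 > 0 {
            avg = ($3 * 1024) / $2      # average size in MiB per file
            if (avg < limit)
                printf "%s %.2f\n", $1, avg
        }'
}
```

For instance, piping the report through the function with `diskusage_report --per_user --all_users | flag_small_files 5` would, for the breakdown shown above, list user01, user02, user05 and user06, each followed by their average file size in MiB.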