Arbutus object storage

Introduction

Object storage is a service that manages data as objects. This is different from other storage architectures where data is managed in a file hierarchy. Objects can be created, replaced, or deleted, but unlike traditional storage, they cannot be edited in place. Object storage has become popular due to its ability to handle large files and large numbers of files, and due to the prevalence of compatible tools.

Unlike other storage types, a unit of data or object is managed as a whole, and the information within it cannot be modified in place. Objects are stored in containers in the object store. The containers are stored in a way that makes them easier and often faster to access than in a traditional filesystem.

The best use of object storage is to store and export items which do not need hierarchical naming; are accessed mostly as a whole and mostly read-only; and have simplified access-control rules. We recommend using it with software or platforms that are designed to work with data living in an object store.

All Arbutus projects are allocated a default 1TB of object storage. If more is required, you can request the additional 9TB available through our Rapid Access Service. More than 10TB must be requested and allocated under the annual Resource Allocation Competition.

Unlike a cluster computing environment, management of a project's object storage containers is self-service. This includes operations such as backups because the object store itself is not backed up. For more information about differences between object storage and other cloud storage types, see Cloud storage options.

We offer access to the OpenStack Object Store via two different protocols: Swift or Amazon Simple Storage Service (S3).

These protocols are very similar and in most situations you can use whichever you like. You don't have to commit to one, as object storage containers and objects created with Swift or S3 can be accessed using both protocols. There are a few key differences in the context of the Arbutus Object Store.

Swift is the default and is simpler since you do not have to manage credentials yourself; access is governed by your Arbutus account. However, Swift does not replicate all of the functionality of S3. The main difference is that if you want to manage your object storage containers with access policies, you must use S3, as Swift does not support access policies. You can also create and manage your own keys with S3, which could be useful if, for example, you want to create a read-only user account for a specific application. See the OpenStack S3/Swift compatibility list for more details.

Establishing access to your Arbutus Object Store

In order to manage your Arbutus Object Store, you will need your own storage access ID and secret key. To generate these, use the OpenStack command line client:

openstack ec2 credentials create
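
The command prints a table that includes an access field (your access ID) and a secret field (your secret key). If you have generated credentials before, you can recall them instead of creating a new pair; this is a sketch assuming the OpenStack client is already configured for your Arbutus project:

# Show previously generated EC2-style credentials (access ID and secret key)
openstack ec2 credentials list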

Accessing your Arbutus Object Store

Setting access policies cannot be done via a web browser; it must be done with a Swift- or S3-compatible client. There are two ways to access your data containers:

  1. If your data container policies are set to private (the default), object storage is accessible via an S3-compatible client (e.g. s3cmd).
  2. If your object storage policies are set to public (not the default), object storage is accessible from a browser via an HTTPS endpoint:

https://object-arbutus.cloud.computecanada.ca:443/DATA_CONTAINER/FILENAME
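
For example, if a container named def-myname-test were public and contained an object results.csv, anyone could download it with a plain HTTPS request; the container and object names here are hypothetical:

curl -O https://object-arbutus.cloud.computecanada.ca/def-myname-test/results.csv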

Managing your Arbutus Object Store

The recommended way to manage buckets and objects in the Arbutus Object Store is with the s3cmd tool, which is available for Linux. Our documentation provides specific instructions on configuring and managing access with the s3cmd client. Other S3-compatible clients that work with the Arbutus Object Store can also be used.
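
As a minimal sketch, an s3cmd configuration file for the Arbutus endpoint looks like the following; it is typically saved as ~/.s3cfg (or passed to s3cmd with the -c option), and the redacted keys must be replaced with the access and secret values generated with the OpenStack command line client. You are free to add further s3cmd options to fit your use case.

[default]
access_key = <redacted>
secret_key = <redacted>
host_base = object-arbutus.cloud.computecanada.ca
host_bucket = object-arbutus.cloud.computecanada.ca
use_https = True
check_ssl_certificate = True
check_ssl_hostname = True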

In addition, we can perform certain management tasks for our object storage using the Containers section under the Object Store tab in the Arbutus OpenStack Dashboard.

This interface refers to data containers, which are also known as buckets in other object storage systems.

Using the dashboard, we can create new data containers, upload files, and create directories. Alternatively, we can also create data containers using S3-compatible clients.
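
For illustration, creating a data container, uploading a file into it, and listing its contents with s3cmd looks like this (the container and file names are hypothetical):

# Create a uniquely named data container (bucket)
s3cmd mb s3://def-myname-test

# Upload a local file as an object in the container
s3cmd put results.csv s3://def-myname-test/results.csv

# List your containers, then the objects inside one of them
s3cmd ls
s3cmd ls s3://def-myname-test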

Please note that data containers are owned by the user who creates them and cannot be manipulated by others.
Therefore, you are responsible for managing your data containers and their contents within your cloud project.

If you create a new container as Public, anyone on the internet can read its contents by simply navigating to

https://object-arbutus.cloud.computecanada.ca/<YOUR CONTAINER NAME HERE>/<YOUR OBJECT NAME HERE>

with your container and object names inserted in place.

It's important to keep in mind that each data container on the Arbutus Object Store must have a unique name across all users. To ensure uniqueness, we may want to prefix our data container names with our project name to avoid conflicts with other users. One useful rule of thumb is to refrain from using generic names like test for data containers. Instead, consider using more specific and unique names like def-myname-test.

To make a data container accessible to the public, we can change its policy to allow public access. This can come in handy if we need to share files with a wider audience. We can manage container policies using JSON files, allowing us to specify various access controls for our containers and objects.
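
With s3cmd, for example, a container can be made world-readable, or returned to private, by changing its ACL (again using a hypothetical container name):

# Allow anyone on the internet to read objects in the container
s3cmd setacl s3://def-myname-test --acl-public

# Make the container private again
s3cmd setacl s3://def-myname-test --acl-private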

Managing data container (bucket) policies for your Arbutus Object Store

Attention: Be careful with policies because an ill-conceived policy can lock you out of your data container.

Currently, Arbutus Object Storage supports only a subset of the AWS specification for data container policies. The following example shows how to create, apply, and view a policy. The first step is to create a policy JSON file:

{
    "Version": "2012-10-17",
    "Id": "S3PolicyId1",
    "Statement": [
        {
            "Sid": "IPAllow",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::testbucket",
                "arn:aws:s3:::testbucket/*"
            ],
            "Condition": {
                "NotIpAddress": {
                    "aws:SourceIp": "206.12.0.0/16",
                    "aws:SourceIp": "142.104.0.0/16"
                }
            }
        }
    ]
}

This example denies access except from the specified source IP address ranges, given in Classless Inter-Domain Routing (CIDR) notation. Here, access to s3://testbucket is limited to the public IP address range (206.12.0.0/16) used by the Arbutus cloud and the public IP address range (142.104.0.0/16) used by the University of Victoria.

Once you have your policy file, you can implement that policy on the data container:

s3cmd setpolicy testbucket.policy s3://testbucket

To view the policy you can use the following command:

s3cmd info s3://testbucket
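
If a policy turns out to be too restrictive, or you simply no longer need it, it can be removed from the data container with:

s3cmd delpolicy s3://testbucket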

Policy subset

Currently, we support only the following actions:

  • s3:AbortMultipartUpload
  • s3:CreateBucket
  • s3:DeleteBucketPolicy
  • s3:DeleteBucket
  • s3:DeleteBucketWebsite
  • s3:DeleteObject
  • s3:DeleteObjectVersion
  • s3:DeleteReplicationConfiguration
  • s3:GetAccelerateConfiguration
  • s3:GetBucketAcl
  • s3:GetBucketCORS
  • s3:GetBucketLocation
  • s3:GetBucketLogging
  • s3:GetBucketNotification
  • s3:GetBucketPolicy
  • s3:GetBucketRequestPayment
  • s3:GetBucketTagging
  • s3:GetBucketVersioning
  • s3:GetBucketWebsite
  • s3:GetLifecycleConfiguration
  • s3:GetObjectAcl
  • s3:GetObject
  • s3:GetObjectTorrent
  • s3:GetObjectVersionAcl
  • s3:GetObjectVersion
  • s3:GetObjectVersionTorrent
  • s3:GetReplicationConfiguration
  • s3:IPAddress
  • s3:NotIpAddress
  • s3:ListAllMyBuckets
  • s3:ListBucketMultipartUploads
  • s3:ListBucket
  • s3:ListBucketVersions
  • s3:ListMultipartUploadParts
  • s3:PutAccelerateConfiguration
  • s3:PutBucketAcl
  • s3:PutBucketCORS
  • s3:PutBucketLogging
  • s3:PutBucketNotification
  • s3:PutBucketPolicy
  • s3:PutBucketRequestPayment
  • s3:PutBucketTagging
  • s3:PutBucketVersioning
  • s3:PutBucketWebsite
  • s3:PutLifecycleConfiguration
  • s3:PutObjectAcl
  • s3:PutObject
  • s3:PutObjectVersionAcl
  • s3:PutReplicationConfiguration
  • s3:RestoreObject
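
Within this subset, a policy can also grant access to specific users rather than restricting by IP address. The following is only a sketch: the project name (def-myname) and user name (collaborator1) are placeholders to be replaced with your own Arbutus project and the user you want to authorize, and the grant here is read-only (list and download):

{
    "Version": "2012-10-17",
    "Id": "S3PolicyId2",
    "Statement": [
        {
            "Sid": "ReadOnlyForCollaborator",
            "Effect": "Allow",
            "Principal": {"AWS": ["arn:aws:iam::def-myname:user/collaborator1"]},
            "Action": [
                "s3:ListBucket",
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::testbucket",
                "arn:aws:s3:::testbucket/*"
            ]
        }
    ]
}

As before, apply the policy with s3cmd setpolicy and verify it with s3cmd info.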