Arbutus object storage

Revision as of 20:39, 17 February 2023

Introduction

Object storage is a service that manages data as objects. This is different from other storage architectures where data is managed in a file hierarchy. Objects can be created, replaced, or deleted, but unlike traditional storage, they cannot be edited in place. Object storage has become popular due to its ability to handle large files and large numbers of files, and due to the prevalence of compatible tools.

Each object is managed as a single unit. Objects are stored in containers in the object store. The containers are stored in a way that makes them easier and often faster to access than in a traditional filesystem.

The best use of object storage is to store and export items which do not need hierarchical naming; are accessed mostly as a whole and mostly read-only; and have simplified access-control rules. We recommend using it with software or platforms that are designed to work with data living in an object store.

All Arbutus projects are allocated 1 TB of object storage by default. If more is required, you can request up to an additional 9 TB through our Rapid Access Service. More than 10 TB must be requested and allocated under the annual Resource Allocation Competition.

Unlike in a cluster computing environment, system administration of a project's containers, including operations like backups, is the responsibility of the user. For more information about differences between object storage and other cloud storage types, see Cloud storage options.

We offer access to the OpenStack Object Store via two different protocols: Swift or Amazon Simple Storage Service (S3).

These protocols are very similar and in most situations you can use whichever you like. You don't have to commit to one, as object storage containers and objects created with Swift or S3 can be accessed using both protocols. There are a few key differences in the context of the Arbutus Object Store.

Swift is the default and is simpler, since you do not have to manage credentials yourself; access is governed by your Arbutus account. However, Swift does not replicate all the functionality of S3. Most importantly, if you want to manage your object storage containers using access policies, you must use S3, as Swift does not support access policies. With S3 you can also create and manage your own keys, which is useful if, for example, you want to create a read-only user for a specific application. A full list of Swift/S3 compatibility can be found here:

https://docs.openstack.org/swift/latest/s3_compat.html

Accessing and managing your object store

You can manage your object storage using the Object Store tab for your project at https://arbutus.cloud.computecanada.ca/. This interface refers to buckets as containers (not to be confused with containers based on namespace functionality of the Linux kernel). You can create containers (AKA buckets) in this interface, upload files, and create directories. Containers can also be created using S3-compatible CLI clients. Please note that if you create a new container as Public, any object placed within this container can be freely accessed (read-only) by anyone on the Internet simply by navigating to https://object-arbutus.cloud.computecanada.ca/<YOUR CONTAINER NAME HERE>/<YOUR OBJECT NAME HERE> with your container and object names inserted in place.
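
The public URL pattern described above can be sketched as a small Python helper. This is only an illustration: the function name `public_url` is hypothetical, and percent-encoding of special characters in names is an assumption about how such URLs are normally written, not something specific to Arbutus.

```python
from urllib.parse import quote

# Endpoint taken from the paragraph above.
ENDPOINT = "https://object-arbutus.cloud.computecanada.ca"


def public_url(container: str, object_name: str) -> str:
    """Return the URL at which an object in a Public container can be read."""
    # Percent-encode characters that are unsafe in a URL. "/" is kept in the
    # object name so that directory-like prefixes stay readable.
    return f"{ENDPOINT}/{quote(container, safe='')}/{quote(object_name, safe='/')}"


if __name__ == "__main__":
    print(public_url("def-myname-test", "results/run 1.csv"))
```

Anyone who has such a URL can read the object, so it is worth checking which containers are Public before sharing links built this way.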

You can also use OpenStack Command Line Clients.

To generate your own S3 access ID and secret key for the S3 protocol, use the OpenStack command line client:

openstack ec2 credentials create

The s3cmd tool, which is available in Linux, is the preferred way to access our S3 gateway; however, there are other tools that will also work.

Users are responsible for operations inside the tenant. As such, creating and managing buckets is up to the user.

General information

  • Buckets are owned by the user who creates them, and no other user can manipulate them.
  • You can make a bucket accessible to the world, which then gives you a URL to share that will serve content from it.
  • Container names must be unique across all users in the Object Store, so you may benefit by prefixing each bucket with your project name to maintain uniqueness. In other words, don't bother trying to create a container named test, but def-myname-test is probably OK.
  • Container policies are managed via JSON files.

Connection details and s3cmd configuration

Object storage is accessible via an HTTPS endpoint:

object-arbutus.cloud.computecanada.ca:443

The following is an example of a minimal s3cmd configuration file. You will need these values, but are free to explore additional s3cmd configuration options to fit your use case. Note that in the example the keys are redacted and you will need to replace them with your provided key values:

[default]
access_key = <redacted>
check_ssl_certificate = True
check_ssl_hostname = True
host_base = object-arbutus.cloud.computecanada.ca
host_bucket = object-arbutus.cloud.computecanada.ca
secret_key = <redacted>
use_https = True
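
If you prefer to generate this file from a script, the following Python sketch writes the same minimal settings using only the standard library. The function name `write_s3cfg` and the choice to restrict the file to owner-only permissions are assumptions made for illustration; the setting names and values are the ones shown in the example above.

```python
import configparser
import os
from pathlib import Path

# Host name taken from the connection details above.
ARBUTUS_HOST = "object-arbutus.cloud.computecanada.ca"


def write_s3cfg(path, access_key, secret_key, host=ARBUTUS_HOST):
    """Write a minimal s3cmd configuration file and return its path."""
    # interpolation=None so that a "%" inside a key is stored literally.
    config = configparser.ConfigParser(interpolation=None)
    config["default"] = {
        "access_key": access_key,
        "check_ssl_certificate": "True",
        "check_ssl_hostname": "True",
        "host_base": host,
        "host_bucket": host,
        "secret_key": secret_key,
        "use_https": "True",
    }
    target = Path(path).expanduser()
    with open(target, "w") as handle:
        config.write(handle)
    # The file contains a secret key, so make it readable by the owner only.
    os.chmod(target, 0o600)
    return target
```

For example, `write_s3cfg("~/.s3cfg", access, secret)` would write to the location that s3cmd normally reads by default; a file stored elsewhere can be selected with s3cmd's `-c` option.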

Using s3cmd's --configure feature is described here.

Example operations on a bucket

  • Make a bucket public so that it is Web accessible:

    s3cmd setacl s3://testbucket --acl-public

  • Make the bucket private again:

    s3cmd setacl s3://testbucket --acl-private

  • View the configuration of a bucket:

    s3cmd info s3://testbucket

Bucket policies

Attention

Be careful with policies because an ill-conceived policy can lock you out of your bucket.



Currently, Arbutus Object Storage only implements a subset of Amazon's specification for bucket policies. The following example shows how to create, apply, and view a bucket's policy. The first step is to create a policy JSON file:

{
    "Version": "2012-10-17",
    "Id": "S3PolicyId1",
    "Statement": [
        {
            "Sid": "IPAllow",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::testbucket",
                "arn:aws:s3:::testbucket/*"
            ],
            "Condition": {
                "NotIpAddress": {
                    "aws:SourceIp": [
                        "206.12.0.0/16",
                        "142.104.0.0/16"
                    ]
                }
            }
        }
    ]
}

This example denies access except from the specified source IP address ranges in Classless Inter-Domain Routing (CIDR) notation. In this example the s3://testbucket is limited to the public IP address range (206.12.0.0/16) used by the Arbutus cloud and the public IP address range (142.104.0.0/16) used by the University of Victoria.
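
A policy of this shape can also be generated from a script, which makes it easier to catch a mistyped address range before the policy is applied. The sketch below is an illustration only: the function name `ip_allowlist_policy` is hypothetical, and it lists all ranges under a single `aws:SourceIp` key as a JSON array, following Amazon's policy syntax, because a JSON object should not repeat a key. Whether the Arbutus gateway accepts every such construct is an assumption you should verify on a test bucket.

```python
import ipaddress
import json


def ip_allowlist_policy(bucket, cidr_ranges):
    """Build a policy that denies all S3 actions from outside cidr_ranges."""
    # ip_network() raises ValueError for malformed CIDR notation.
    ranges = [str(ipaddress.ip_network(cidr)) for cidr in cidr_ranges]
    if not ranges:
        # Refuse to build a policy with no permitted range at all.
        raise ValueError("at least one CIDR range is required")
    return {
        "Version": "2012-10-17",
        "Id": "S3PolicyId1",
        "Statement": [
            {
                "Sid": "IPAllow",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                "Condition": {"NotIpAddress": {"aws:SourceIp": ranges}},
            }
        ],
    }


if __name__ == "__main__":
    policy = ip_allowlist_policy("testbucket", ["206.12.0.0/16", "142.104.0.0/16"])
    print(json.dumps(policy, indent=4))
```

The printed JSON can be saved to a file such as testbucket.policy. Make sure the ranges include the address you connect from, since a Deny rule that excludes it would lock you out of the bucket.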

Once you have your policy file, you can implement that policy on the bucket:

s3cmd setpolicy testbucket.policy s3://testbucket

To view the policy you can use the following command:

s3cmd info s3://testbucket