
File Transfer

1 - Rclone on Rivanna

Using Rclone to upload and download from cloud services

Using the Rclone Module on Rivanna

Rclone is a command-line tool for uploading files to and downloading files from cloud storage services such as Google Drive. Its initial setup requires a web browser, which you can run on the computer you use to log in to Rivanna.

Set Up Rclone on Rivanna

First, load the newer version of the module explicitly; otherwise, Rivanna loads an older, incompatible version by default. Then start a new rclone configuration and enter the following inputs:

$ module load rclone/1.61.1
$ rclone config
n/s/q> n
name> gdrive
Storage> drive

A client ID is required to create a remote that interfaces with Google Drive. Follow the instructions at https://rclone.org/drive/#making-your-own-client-id to create a client ID and client secret, then enter those values at the prompts on Rivanna:

client_id> myCoolID..
client_secret> verySecretClientSecret..
scope> 2 # read only
service_account_file> # just press enter
Edit advanced config?
y) Yes
n) No (default)
y/n> n
Use web browser to automatically authenticate rclone with remote?
y/n> n

Install Rclone on the Client Computer

If the computer you use to log in to Rivanna runs Windows and has Chocolatey installed, install Rclone from a Git Bash window opened as administrator:

$ choco install rclone -y

On Linux or macOS, use:

$ sudo -v ; curl https://rclone.org/install.sh | sudo bash

Then open a new terminal (Git Bash on Windows), paste the rclone authorize command that rclone config printed on Rivanna, and follow the instructions.

Rclone Authentication

In the web browser, Google warns that it has not verified this app. This is expected for a client ID you created yourself, so click Advanced, then Go to rclone, then Continue.

When Rclone prints the config token, remove all newline characters before pasting it into Rivanna. Otherwise, each line is read as the answer to a separate prompt, which breaks the setup. One way to do this is to paste the token into an application such as Notepad and join it onto a single line.

This is bad:

sjgnkajdfnkj
fdnskjafnkad
asdfnasjkffd

This is good:

sjgnkajdfnkjfdnskjafnkadasdfnasjkffd
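If you prefer the terminal to Notepad, the line breaks can be stripped with tr. The function name join_token is hypothetical; this is a minimal sketch that reads the pasted token on standard input and prints it as one line.

```shell
#!/usr/bin/env bash
# join_token (hypothetical name): read a pasted rclone config token on
# standard input and print it as a single line.
join_token() {
  # Delete line feeds, plus carriage returns that Windows terminals may add.
  tr -d '\r\n'
  # End the output with one newline so the shell prompt starts on a new line.
  echo
}
```

For example, run join_token, paste the token, press Ctrl-D, and copy the single line it prints.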

Paste the fixed token into Rivanna.

config_token> myCoolCodeThatHasNoNewLineCharacters
Configure this as a Shared Drive (Team Drive)?

y) Yes
n) No (default)
y/n> n
Keep this "gdrive" remote?
y) Yes this is OK (default)
y/e/d> y
q) Quit config
e/n/d/r/c/s/q> q
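Before copying anything, you may want to confirm that the remote was saved and can reach Google Drive. The helper check_remote is a hypothetical name; this sketch uses rclone listremotes and rclone lsd, and assumes the remote is called gdrive as in the configuration above.

```shell
#!/usr/bin/env bash
# check_remote (hypothetical name): confirm an rclone remote exists and can list folders.
check_remote() {
  local remote="${1:-gdrive}"
  # rclone listremotes prints one configured remote per line, e.g. "gdrive:".
  if ! rclone listremotes | grep -qxF "${remote}:"; then
    echo "error: remote '${remote}' is not configured" >&2
    return 1
  fi
  # List top-level folders shared with you (same scope as the copy example).
  rclone lsd --drive-shared-with-me "${remote}:"
}

# check_remote gdrive
```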

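The following example copies a folder shared with you on Google Drive to your scratch directory. The --drive-shared-with-me flag restricts rclone to files that have been shared with you, and -P shows progress during the transfer.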

$ rclone copy --drive-shared-with-me gdrive:Colab\ Datasets/EarthquakeDec2020  /scratch/$USER/EarthquakeDec2020 -P
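A typo in the remote path can copy the wrong folder into /scratch, so a dry run first can be useful. This sketch, using a hypothetical copy_shared function, runs the same rclone copy command with --dry-run, asks for confirmation, and only then copies. The paths are the example values from the command above; quoting the remote path replaces the backslash-escaped space.

```shell
#!/usr/bin/env bash
# Example values from the command above; adjust to your own folder.
REMOTE_PATH='gdrive:Colab Datasets/EarthquakeDec2020'
DEST="/scratch/$USER/EarthquakeDec2020"

# copy_shared (hypothetical name): preview a copy, then run it if confirmed.
copy_shared() {
  local src="$1" dest="$2" answer
  # Pass 1: report what would be copied without transferring anything.
  rclone copy --drive-shared-with-me --dry-run "$src" "$dest" || return 1
  # Ask before the real copy; anything other than "y" skips it.
  read -r -p "Proceed with the copy? [y/N] " answer
  [ "$answer" = "y" ] || return 0
  # Pass 2: perform the copy, showing progress (-P).
  rclone copy --drive-shared-with-me -P "$src" "$dest"
}

# copy_shared "$REMOTE_PATH" "$DEST"
```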

2 - Globus

File transfer with Globus

Getting the CosmoFlow data via the Globus command line

Data Directory

This section shows how to transfer data with the Globus command-line tools.

In this example, the data directory is:

export DATA=/project/bii_dsc_community/$USER/cosmoflow/data

Globus Setup on Rivanna

On Rivanna, the Globus command-line tools are available as a module. Before running globus login, visit https://www.globus.org/ and log in with your UVA credentials. Then load the module and log in with the following commands:

module load globus_cli
globus login

The globus login command prints a link that is unique to each user. Open it in a web browser and sign in with your UVA credentials. The website then displays an authorization code, which you paste back into the command line to complete the login.

After you run globus login, your console should look similar to the following block.

NOTE: This link was generated for the example login; each user receives a different link.

-bash-4.2$globus login
Please authenticate with Globus here:
------------------------------------
https://auth.globus.org/v2/oauth2/authorize?client_id=affbecb5-5f93-404e-b342-957af296dea0&redirect_uri=https%3A%2F%2Fauth.globus.org%2Fv2%2Fweb%2Fauth-code&scope=openid+profile+email+urn%3Aglobus%3Aauth%3Ascope%3Aauth.globus.org%3Aview_identity_set+urn%3Aglobus%3Aauth%3Ascope%3Atransfer.api.globus.org%3Aall&state=_default&response_type=code&access_type=offline&prompt=login
------------------------------------

Enter the resulting Authorization Code here:

Follow the URL and enter the authorization code to log in.
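If you reconnect to Rivanna later, your earlier Globus login may still be valid. The sketch below (ensure_globus_login is a hypothetical name) runs globus login only when globus whoami fails. It assumes the globus_cli module is already loaded and that globus whoami exits with a non-zero status when you are not logged in.

```shell
#!/usr/bin/env bash
# ensure_globus_login (hypothetical name): log in only if needed.
ensure_globus_login() {
  local who
  # Assumption: `globus whoami` exits non-zero when no valid login exists.
  if who="$(globus whoami 2>/dev/null)"; then
    echo "Already logged in as $who"
  else
    # --no-local-server prints a URL and prompts for the authorization code,
    # matching the login flow shown above.
    globus login --no-local-server
  fi
}
```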

Next, confirm that you signed in properly by looking up your identity, then search for the source endpoint of the data you want to transfer. In this example, the endpoint is named CosmoFlow benchmark data cosmoUniverse_2019_02_4parE. Note that the dataset to be downloaded is 1.7 TB, so make sure the destination system has enough space. The first command below looks up your identity; the second searches for endpoints whose names match the text in single quotes.

globus get-identities -v 'youremail@gmailprobably.com'
globus endpoint search 'CosmoFlow benchmark data cosmoUniverse_2019_02_4parE'

Each Globus endpoint has a unique endpoint ID. Here, the source endpoint ID is:

  • d0b1b73a-efd3-11e9-993f-0a8c187e8c12

Store this ID in a variable SRC_ENDPOINT so you can use it without retyping it, and set SRC_DIR to the directory containing the files to transfer:

export SRC_ENDPOINT=d0b1b73a-efd3-11e9-993f-0a8c187e8c12
export SRC_DIR=/~/
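A partially pasted endpoint ID usually only shows up as an error later, when globus ls or globus transfer runs. The hypothetical helpers below check that an ID has the usual 8-4-4-4-12 hexadecimal UUID shape before exporting it. This is a format check only; it does not confirm that the endpoint exists.

```shell
#!/usr/bin/env bash
# is_endpoint_id (hypothetical name): succeed if $1 has the UUID shape
# used by the endpoint IDs in this guide.
is_endpoint_id() {
  [[ "$1" =~ ^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$ ]]
}

# set_endpoint (hypothetical name): export variable $1 as $2 only if $2 looks valid.
set_endpoint() {
  local var="$1" id="$2"
  if ! is_endpoint_id "$id"; then
    echo "error: '$id' does not look like an endpoint ID" >&2
    return 1
  fi
  export "$var=$id"
}

set_endpoint SRC_ENDPOINT d0b1b73a-efd3-11e9-993f-0a8c187e8c12
```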

To verify that you are looking at the right endpoint, list its files with globus ls:

globus ls $SRC_ENDPOINT

Destination Endpoint Setup

Rivanna HPC provides a dedicated endpoint for data transfers into the /project, /home, and /scratch directories. This destination endpoint is named UVA Standard Security Storage.

Repeat the steps above for this endpoint and set the variables, including DEST_DIR with the path you want to write to:

globus endpoint search 'UVA Standard Security Storage'
export DEST_ENDPOINT=e6b338df-213b-4d31-b02c-1bc2c628ca07
export DEST_DIR=/dtn/landings/users/${USER:0:1}/${USER:0:2}/$USER/project/bii_dsc_community/$USER/cosmoflow/

NOTE: The path cannot start at the Rivanna root level; instead, it must begin with a per-user prefix, built as follows.

The path must start with /dtn/landings/users/, followed by three components derived from your computing ID: its first character, its first two characters, and the full ID. For example, if your computing ID is abc5xy, the path begins with /dtn/landings/users/a/ab/abc5xy. From that point you are effectively at the root level of Rivanna and can reach /home, /project, or /scratch as you normally would.
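The naming rule above can be computed instead of typed by hand. This is a minimal sketch with a hypothetical function dtn_landing_path, assuming bash (for the ${id:0:1} substring expansion) and that your computing ID is the value of $USER on Rivanna.

```shell
#!/usr/bin/env bash
# dtn_landing_path (hypothetical name): print the landing prefix for a computing ID:
# /dtn/landings/users/<first char>/<first two chars>/<full ID>
dtn_landing_path() {
  local id="${1:-$USER}"
  printf '/dtn/landings/users/%s/%s/%s\n' "${id:0:1}" "${id:0:2}" "$id"
}

# Example destination: the CosmoFlow directory under /project.
DEST_DIR="$(dtn_landing_path)/project/bii_dsc_community/$USER/cosmoflow/"
export DEST_DIR
```

For the computing ID abc5xy, dtn_landing_path abc5xy prints /dtn/landings/users/a/ab/abc5xy.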


Note: To find the path with the Globus web interface instead, follow these steps to get the value for DEST_DIR:

  • Sign in to the Globus web app.
  • Open File Manager from the menu on the left side of the screen.
  • In the Collection box at the top of the screen, search for UVA Standard Security Storage.
  • Select the destination endpoint.
  • Navigate to the exact directory you want to write to.
  • Copy the path from the box immediately below the Collection box.
  • Assign this value to the DEST_DIR variable set above, replacing the example path.

Initiate the Transfer

Finally, start the transfer:

globus transfer $SRC_ENDPOINT:$SRC_DIR $DEST_ENDPOINT:$DEST_DIR
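When scripting the transfer, it can help to check that all four variables are set and to keep the task ID for monitoring. This sketch uses a hypothetical start_transfer function. It adds --recursive because SRC_DIR is a directory, --label to name the task, and --jmespath with --format unix to print only the task ID; these are globus-cli options, but check globus transfer --help for the version the globus_cli module loads.

```shell
#!/usr/bin/env bash
# start_transfer (hypothetical name): validate variables, submit the transfer,
# and print only the new task's ID.
start_transfer() {
  local v
  for v in SRC_ENDPOINT SRC_DIR DEST_ENDPOINT DEST_DIR; do
    # ${!v} expands the variable whose name is stored in v.
    if [ -z "${!v}" ]; then
      echo "error: $v is not set" >&2
      return 1
    fi
  done
  globus transfer --recursive --label "cosmoflow_download" \
    --jmespath 'task_id' --format unix \
    "$SRC_ENDPOINT:$SRC_DIR" "$DEST_ENDPOINT:$DEST_DIR"
}

# TASK_ID="$(start_transfer)" && echo "Submitted task $TASK_ID"
```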

NOTE: Your first transfer may fail because the Globus CLI needs your consent before it can access the collection, separately from any permission granted through the web tool. In that case, the CLI prints the exact command to run. For example, the terminal printed the following:

-bash-4.2$globus transfer $SRC_ENDPOINT:$SRC_DIR $DEST_ENDPOINT:$DEST_DIR
The collection you are trying to access data on requires you to grant
consent for the Globus CLI to access it.  message: Missing required
data_access consent

Please run

  globus session consent 'urn:globus:auth:scope:transfer.api.globus.org:all[*https://auth.globus.org/scopes/e6b338df-213b-4d31-b02c-1bc2c628ca07/data_access]'

to login with the required scopes

Running this command starts a sign-in flow like the one for globus login: the CLI prints a URL, you sign in, and you paste the resulting authorization code back into the terminal.
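If you script the whole workflow, you may prefer to grant consent before the first transfer rather than waiting for the error. This sketch rebuilds the scope string from the error message above for a given collection ID. The names consent_scope and grant_consent are hypothetical, and the scope format is copied from that one message, so the command the CLI prints for you remains authoritative.

```shell
#!/usr/bin/env bash
# consent_scope (hypothetical name): build the data_access consent scope
# for a collection ID, using the format from the error message above.
consent_scope() {
  local collection_id="$1"
  printf 'urn:globus:auth:scope:transfer.api.globus.org:all[*https://auth.globus.org/scopes/%s/data_access]\n' "$collection_id"
}

# grant_consent (hypothetical name): request that consent interactively.
grant_consent() {
  globus session consent "$(consent_scope "$1")"
}

# grant_consent "$DEST_ENDPOINT"
```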

After granting consent, rerun the globus transfer command described above.

Managing Tasks

To monitor the status of active transfers, use

globus task list

Alternatively, you can check the status of transfers in the Globus web app.
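For a 1.7 TB dataset the transfer can run for a long time, so you may want a script that waits for it. This sketch, with a hypothetical wait_for_task function, calls globus task show and then globus task wait with --timeout and --polling-interval. TASK_ID is assumed to hold the ID printed when the transfer was submitted.

```shell
#!/usr/bin/env bash
# wait_for_task (hypothetical name): show a task, then block until it
# finishes or the timeout (in seconds, default 3600) expires.
wait_for_task() {
  local task_id="$1" timeout="${2:-3600}"
  globus task show "$task_id" || return 1
  # Check status every 60 s; a non-zero exit means the task had not
  # finished when the timeout expired.
  globus task wait "$task_id" --timeout "$timeout" --polling-interval 60
}

# wait_for_task "$TASK_ID" 86400
```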
