File Transfer
- 1: Rclone on Rivanna
- 2: Globus
1 - Rclone on Rivanna
Using the Rclone Module on Rivanna
Rclone is a command-line tool for uploading to and downloading from cloud services such as Google Drive. The initial setup requires a web browser, which can be the browser on the computer you use to log in to Rivanna.
Setup Rclone on Rivanna
First, load the newer version of the module; otherwise, Rivanna loads an incompatible, older version by default. Then, initialize a new Rclone configuration and enter the following inputs:
$ module load rclone/1.61.1
$ rclone config
n/s/q> n
name> gdrive
Storage> drive
A client ID is required for Rclone to interface with Google Drive. Follow the instructions at https://rclone.org/drive/#making-your-own-client-id to create a client ID and secret, and then enter the values on Rivanna.
client_id> myCoolID..
client_secret> verySecretClientSecret..
scope> 2 # read only
service_account_file> # just press enter
Edit advanced config?
y) Yes
n) No (default)
y/n> n
Use web browser to automatically authenticate rclone with remote?
y/n> n
Install Rclone on Client Computer
If the computer used to log on to Rivanna is running Windows, and the computer has Chocolatey, then download Rclone using an administrative Git Bash instance with
$ choco install rclone -y
Otherwise, for Linux and macOS, use
$ sudo -v ; curl https://rclone.org/install.sh | sudo bash
Then open a new terminal (Git Bash on Windows), paste the command that Rclone printed on Rivanna, and follow the instructions.
Rclone Authentication
In the web browser, click Advanced when Google warns that it has not verified this app; the warning is expected and safe to bypass here. Then click Go to rclone, then Continue.
When Rclone gives the config token, make sure it contains no newline characters. One way is to paste the token into an editor such as Notepad and manually join it into a single line. Otherwise, the token will be split across successive prompts, breaking the setup.
This is bad:
sjgnkajdfnkj
fdnskjafnkad
asdfnasjkffd
This is good:
sjgnkajdfnkjfdnskjafnkadasdfnasjkffd
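As an alternative to editing the token by hand, the line breaks can be removed in the terminal. The following is a minimal sketch; the function name join_token and the file token.txt are hypothetical and not part of Rclone.

```shell
# Read a token on standard input and print it with every
# carriage return and line feed removed.
join_token() {
  tr -d '\r\n'
}

# Example use (token.txt is a hypothetical file holding the pasted token):
#   join_token < token.txt
```

The result is printed without a trailing newline, so it can be copied directly into the config_token prompt.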
Paste the fixed token into Rivanna.
config_token> myCoolCodeThatHasNoNewLineCharacters
Configure this as a Shared Drive (Team Drive)?
y) Yes
n) No (default)
y/n> n
Keep this "gdrive" remote?
y) Yes this is OK (default)
y/e/d> y
q) Quit config
e/n/d/r/c/s/q> q
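Before copying data, it can help to confirm that the remote was saved. The sketch below relies on rclone listremotes, which prints one configured remote per line with a trailing colon; the helper name remote_exists is hypothetical.

```shell
# Succeed only if a remote with the given name is present in the
# rclone configuration. `rclone listremotes` prints lines like "gdrive:".
remote_exists() {
  rclone listremotes | grep -qxF "$1:"
}

# Example use on Rivanna after `module load rclone/1.61.1`:
#   remote_exists gdrive && rclone lsd gdrive:
```

If the check succeeds, rclone lsd gdrive: lists the top-level directories of the remote, which is a quick way to see that authentication worked.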
An example command to use Rclone is as follows. The flag --drive-shared-with-me restricts the scope to only shared files, and -P shows progress during the copy.
$ rclone copy --drive-shared-with-me gdrive:Colab\ Datasets/EarthquakeDec2020 /scratch/$USER/EarthquakeDec2020 -P
2 - Globus
Getting the CosmoFlow Data via the Globus Command Line
Data Directory
This section shows how to transfer data with the Globus command-line tools.
The example uses the following data directory:
export DATA=/project/bii_dsc_community/$USER/cosmoflow/data
Globus Set Up on Rivanna
The Globus file transfer command-line tools can be loaded on Rivanna through the module command, as shown below. However, before executing globus login, please visit https://www.globus.org/ and log in using your UVA credentials.
module load globus_cli
globus login
The globus login command will output a link, unique to each user, that you should paste into a web browser and sign in to using your UVA credentials. Afterward, the website will present you with a unique sign-in key that you need to paste back into the command line to verify your login.
After executing globus login, your console should look like the following block.
NOTE: This link was generated for the example login; each user will have a different link.
-bash-4.2$ globus login
Please authenticate with Globus here:
------------------------------------
https://auth.globus.org/v2/oauth2/authorize?client_id=affbecb5-5f93-404e-b342-957af296dea0&redirect_uri=https%3A%2F%2Fauth.globus.org%2Fv2%2Fweb%2Fauth-code&scope=openid+profile+email+urn%3Aglobus%3Aauth%3Ascope%3Aauth.globus.org%3Aview_identity_set+urn%3Aglobus%3Aauth%3Ascope%3Atransfer.api.globus.org%3Aall&state=_default&response_type=code&access_type=offline&prompt=login
------------------------------------
Enter the resulting Authorization Code here:
Follow the URL and enter the authorization code to log in successfully.
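A script can check for a stored login before doing anything else. This is a small sketch that assumes globus whoami exits with a non-zero status when no login is stored; the helper name globus_logged_in is hypothetical.

```shell
# Succeed only if the Globus CLI reports a logged-in identity.
# Assumption: `globus whoami` returns non-zero when not logged in.
globus_logged_in() {
  globus whoami >/dev/null 2>&1
}

# Example use:
#   globus_logged_in || globus login
```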
Source Endpoint Search
First, verify that you were able to sign in properly by checking your identity, and then search for the source endpoint of the data you want to transfer. In this example, the endpoint is named CosmoFlow benchmark data cosmoUniverse_2019_02_4parE. Please note that the data to be downloaded is 1.7 TB in size. Make sure that the system on which you download it has enough space. The following commands verify your sign-in identity and then search for the endpoint named within the single quotation marks.
globus get-identities -v 'youremail@gmailprobably.com'
globus endpoint search 'CosmoFlow benchmark data cosmoUniverse_2019_02_4parE'
Each Globus endpoint has a unique endpoint ID. In this case our source endpoint ID is:
d0b1b73a-efd3-11e9-993f-0a8c187e8c12
Set up a variable SRC_ENDPOINT so you can use the endpoint more easily without retyping it. Also set a variable SRC_DIR to indicate the directory with the files to be transferred.
export SRC_ENDPOINT=d0b1b73a-efd3-11e9-993f-0a8c187e8c12
export SRC_DIR=/~/
You can look at the files in the Globus endpoint using globus ls to verify that you are looking at the right endpoint.
globus ls $SRC_ENDPOINT
Destination Endpoint Set Up
Rivanna HPC provides a dedicated endpoint for data transfers into the /project, /home, or /scratch directories. This destination endpoint is named UVA Standard Security Storage.
Repeat the steps above for this endpoint and set the variables DEST_ENDPOINT and DEST_DIR, where DEST_DIR holds the path to write to.
globus endpoint search 'UVA Standard Security Storage'
export DEST_ENDPOINT=e6b338df-213b-4d31-b02c-1bc2c628ca07
export DEST_DIR=/dtn/landings/users/u/uj/$USER/project/bii_dsc_community/uja2wd/cosmoflow/
NOTE: The path cannot start at the root level of Rivanna. Instead, a few steps are needed to find the specific path to write to.
To begin, the path must start with /dtn/landings/users/ followed by a sequence derived from the user's computing ID. For example, if your computing ID is abc5xy, the next three path components are /a/ab/abc5xy (first character, first two characters, full computing ID). From that point the user is effectively at the root level of Rivanna and can access /home, /project, or /scratch as usual.
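The path rule described above can be written as a small shell function. This is a sketch of that rule only; the function name dtn_user_root is hypothetical, and it assumes the computing ID is passed as the first argument.

```shell
# Print the DTN landing path for a computing ID:
#   /dtn/landings/users/<first char>/<first two chars>/<computing id>
dtn_user_root() {
  id="$1"
  first="$(printf '%s' "$id" | cut -c1)"
  first_two="$(printf '%s' "$id" | cut -c1-2)"
  printf '/dtn/landings/users/%s/%s/%s\n' "$first" "$first_two" "$id"
}

# Example use, assuming $USER holds your computing ID:
#   export DEST_DIR="$(dtn_user_root "$USER")/project/bii_dsc_community/$USER/cosmoflow/"
```

For the computing ID abc5xy this prints /dtn/landings/users/a/ab/abc5xy, matching the example in the text.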
Note: To find the path with the web interface of Globus instead, follow the steps below to determine the value of the DEST_DIR variable.
- First sign in to the web interface of Globus
- Locate file manager on the left side of the screen
- In the collections box at the top of the screen, begin to search for UVA Standard Security Storage
- Select our destination endpoint
- Use the GUI tool to select exactly where you wish to write to
- Copy the path from the box immediately below collections
- Write this value to the DEST_DIR variable created above (I have included my path to where I wish to write to)
Initiate the Transfer
Finally, execute the transfer:
globus transfer $SRC_ENDPOINT:$SRC_DIR $DEST_ENDPOINT:$DEST_DIR
NOTE: Your first transfer may fail because Globus needs your permission to initiate transfers via the CLI instead of via the web tool. In my case, the terminal printed the following command to run:
-bash-4.2$ globus transfer $SRC_ENDPOINT:$SRC_DIR $DEST_ENDPOINT:$DEST_DIR
The collection you are trying to access data on requires you to grant
consent for the Globus CLI to access it. message: Missing required
data_access consent
Please run
globus session consent 'urn:globus:auth:scope:transfer.api.globus.org:all[*https://auth.globus.org/scopes/e6b338df-213b-4d31-b02c-1bc2c628ca07/data_access]'
to login with the required scopes
Running this command starts a sign-in and verification flow similar to globus login: the CLI outputs a URL to follow, the user signs in, and then pastes the verification code back into the terminal.
After granting consent, remember to re-initiate the transfer with the globus transfer command as previously described.
Managing Tasks
To monitor the status of active transfers, use
globus task list
Alternatively, use the web tool to verify transfers.
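For scripted use, the task ID can be captured at submission time and passed to globus task wait, which blocks until the task finishes. The sketch below uses the Globus CLI options --jmespath and --format unix to extract the task ID; the function name submit_and_wait and the 60-second polling interval are illustrative choices, not values from this guide.

```shell
# Submit a transfer and block until the task finishes.
# All arguments are passed unchanged to `globus transfer`.
submit_and_wait() {
  # Capture only the task ID from the submission result.
  task_id="$(globus transfer "$@" --jmespath 'task_id' --format unix)" || return 1
  echo "Submitted task $task_id"
  # Poll the task status every 60 seconds until it completes.
  globus task wait "$task_id" --polling-interval 60
}

# Example use with the variables defined earlier:
#   submit_and_wait "$SRC_ENDPOINT:$SRC_DIR" "$DEST_ENDPOINT:$DEST_DIR"
```

Because the data set is large, it may be preferable to run this inside a persistent session, or to submit the transfer and check on it later with globus task list.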
References:
- Globus Data Transfer, Rivanna HPC https://www.rc.virginia.edu/userinfo/globus/