Documentation
This is a placeholder page that shows you how to use this template site.
This section is where the user documentation for your project lives - all the
information your users need to understand and successfully use your project.
For large documentation sets we recommend adding content under the headings in
this section, though if some or all of them don’t apply to your project feel
free to remove them or add your own. You can see an example of a smaller Docsy
documentation site in the Docsy User Guide, which
lives in the Docsy theme
repo if you’d like to
copy its docs section.
Other content such as marketing material, case studies, and community updates
should live in the About and Community pages.
Find out how to use the Docsy theme in the Docsy User
Guide. You can learn more about how to organize your
documentation (and how we organized this site) in Organizing Your
Content.
1 - Overview
Here’s where your user finds out if your project is for them.
This is a placeholder page that shows you how to use this template site.
The Overview is where your users find out about your project. Depending on the size of your docset, you can have a separate overview page (like this one) or put your overview contents in the Documentation landing page (like in the Docsy User Guide).
Try answering these questions for your user in this page:
What is it?
Introduce your project, including what it does or lets you do, why you would use it, and its primary goal (and how it achieves it). This should be similar to your README description, though you can go into a little more detail here if you want.
Why do I want it?
Help your user know if your project will help them. Useful information can include:
-
What is it good for?: What types of problems does your project solve? What are the benefits of using it?
-
What is it not good for?: For example, point out situations that might intuitively seem suited for your project, but aren’t for some reason. Also mention known limitations, scaling issues, or anything else that might let your users know if the project is not for them.
-
What is it not yet good for?: Highlight any useful features that are coming soon.
Where should I go next?
Give your users next steps from the Overview. For example:
2 - Education
Education activities.
A list of education activities.
2.1 - Diversity, Equity and Inclusion (DEI) Plan
Diversity, Equity and Inclusion (DEI) Plan
This page contains the Diversity, Equity and Inclusion (DEI) Plan.
Over its first two hundred years, the University of Virginia has been
at the forefront of producing global citizen leaders, groundbreaking
research and scholarship, and world-class patient care. As the
University enters its third century, we continuously seek to define
and redefine ourselves, motivated by our shared passion for discovery,
innovation, community, service, and social justice. One of the
University’s top priorities is to create a living, learning, and work
environment that supports—and challenges—our academic
community. To achieve its mission, the University must be a place in
which all faculty, students, and staff are active participants in its
work, where those groups historically excluded from participation in
University life are present in numbers that prevent isolation of the
spirit and of the mind, and where each individual is conscious of how
they contribute to the creation and dissemination of knowledge that
enhances the well-being of our community, our state, our nation, and
the world.
Listed below are specific details about how the University of Virginia
(UVA) is engaging in a meaningful way with respect to DEI, as well as
the project plans to ensure that historically underrepresented groups
are a collaborative part of our project.
Inclusion of persons from groups underrepresented in STEM as PI, co-PI, and/or senior personnel
-
UVA seeks to hire persons from underrepresented groups and will seek
representative applicants for the senior personnel in this effort.
-
UVA’s diversity action plan includes action steps and metrics
to provide training and support to create processes that mitigate
bias and are more equitable and inclusive. In 2021, the University
created an Inclusive Excellence Planning Committee to assess
inclusivity, which recommended offering regular professional
development opportunities to all provost office units (including the
Biocomplexity Institute), as well as leadership development
opportunities around diversity, equity and inclusion. UVA has a goal
of attracting and retaining greater numbers of individuals from
underrepresented populations. UVA’s Racial Equity Task Force (RETF)
aligned initiatives with the University's Inclusive Excellence
framework and the University's 2030 Strategic Plan to orient and
embed the initiatives in the institutional operations.
-
Mentoring and support programs to increase retention include the
Division for Diversity, Equity and Inclusion at UVA, which
provides outreach, resources, and support for inclusive hiring
practices for departments and faculty.
Inclusion of persons from groups underrepresented in STEM as student researchers or post-doctoral researchers
-
We will seek representative applicants for the postdoctoral position
in this effort.
-
The Office of Diversity Programs at UVA cultivates a rich, inclusive
and supportive learning environment, where the identities,
perspectives, and views of all graduate students and postdoctoral
scholars are affirmed, so that they can thrive with dignity and pride
at the University of Virginia. The Office of Diversity Programs is
part of the Office of Graduate and Postdoctoral Affairs (OGPA)
and serves as a resource hub for information to connect students,
staff, and faculty to opportunities by offering recruitment and
retention, developmental training, and community engagement and
outreach programs.
Enhancement/collaboration with existing diversity programs at your home organization and/or nearby organizations
The University of Virginia’s Racial Equity Task Force (RETF) aligned initiatives with the University's Inclusive Excellence framework and the University's 2030 Strategic Plan to orient and embed the initiatives in the institutional operations. The Division for Diversity, Equity and Inclusion at the University of Virginia offers training on education and prevention related to discrimination and harassment, sexual misconduct, and mandated reporting.
University efforts located in or benefiting underserved communities
- The Center for Community Partnerships at University of Virginia
serves as both a front door to the University for community members
and a collaborative space for UVA’s existing community efforts. The
center is anchored by UVA’s Equity Center, and is staffed by members
of the University’s offices for Diversity, Equity, and Inclusion;
Academic Outreach; and Community Partnerships at UVA Health.
Implementation of evidence-based, diversity-focused education programs
The project senior personnel and student researchers will participate in the offerings and training provided by the University of Virginia.
-
The Division for Diversity, Equity and Inclusion office at the
University of Virginia offers training on education and prevention
related to discrimination and harassment, sexual misconduct,
mandated reporting, UVA System’s Equity and Title IX policies and
procedures, and more.
-
UVA requires employees to complete training in regard to its
Preventing and Addressing Discrimination, Harassment and
Retaliation (PADHR) and Title IX and Sexual Misconduct (Title IX)
policies and processes. New employees receive the training and are
required to complete it every two years. A separate training is
provided for employees in supervisory positions. These trainings
are aimed at ensuring every employee understands their rights and
responsibilities with respect to discrimination, harassment,
retaliation and sexual misconduct, understands how to identify,
address and report potential prohibited conduct, how to intervene
as bystanders, and the process followed in response to a
complaint. Students also receive similar training with regard to
sexual misconduct.
-
UVA requires all employees to complete a “DEI and Cultural
Competence” learning module and offers more than 80 additional
DEI-related learning modules.
Soliciting bids for supplies, services and equipment from minority-owned, woman-owned and veteran-owned businesses
-
The University is committed to: (1) enhancing successful business
relationships between the University and small, minority-owned,
women-owned, and otherwise disadvantaged business enterprises
(S/M/W/DBEs) through the implementation of programs that require a
consistent and high level of participation from all University schools
and departments; and (2) actively seeking goods and services from
small, minority-owned, and women-owned businesses in support of
the Commonwealth’s small, minority-owned, and women-owned (SWaM)
program.
-
The University of Virginia increases business opportunities for
small, minority-owned, and women-owned business enterprises
(S/M/WBE’s) through its Supplier Diversity Program.
3 - Examples
A number of example projects
We collect here a number of example projects.
Do you have any example applications or code for your users in your repo
or elsewhere? Link to your examples here.
MLCommons
Earthquake Forecasting
Cloudmask
Uno
OSMI
CosmoFlow
4 - Concepts
Important Concepts related to Foundation Models
This page contains important concepts related to foundation models.
Foundation Model
Definition from Stanford:
“any model that is trained on broad data (generally using
self-supervision at scale) that can be adapted (e.g., fine-tuned) to
a wide range of downstream tasks”. https://arxiv.org/abs/2108.07258
Wikipedia link:
Answer from ChatGPT 3.5:
“A foundation model typically refers to a pre-trained machine
learning model that serves as the basis or starting point for more
specialized or task-specific models. These models are trained on
large and diverse datasets to learn general patterns and
representations of the data. Once trained, they can be fine-tuned or
adapted for specific tasks or domains with smaller, task-specific
datasets.
In the context of natural language processing (NLP), models like
OpenAI’s GPT (Generative Pre-trained Transformer) can be considered
foundation models. These models are trained on massive amounts of
text data to understand the structure and relationships within
language. Users can then fine-tune these models for specific tasks
such as text classification, language translation, summarization,
and more.
The advantage of using foundation models lies in their ability to
capture general knowledge and language understanding, which can be
beneficial for a wide range of applications. Fine-tuning allows
developers to leverage the pre-learned representations for specific
tasks without the need to train a model from scratch, saving
computational resources and time.”
Generative AI
TBD
Neural Network
TBD
5 - References
References
A list of useful references.
5.1 - Foundation models
A list of useful references about foundation models.
This page contains pointers to useful references.
References
-
References about Science Foundation models (external to site)
-
AI for Science Foundation Models,
Geoffrey Fox, Biocomplexity Institute and Computer Science Department, University of Virginia, April 18, 2024.
-
Some comments on Multi Modal Foundation Models,
Geoffrey Fox, Argonne National Laboratory, April 5, 2024.
-
Calorimeter Research and Report Teasers:
-
At this stage, we are evaluating different approaches in a uniform way and will present results and code later in a GitHub repository
-
Compendium: Calorimeter Surrogate Research, Geoffrey Fox
University of Virginia July 21, 2023.
-
Evaluation of Calorimeter Surrogate Models, V1,
Farzana Yasmin Ahmad, Vanamala Venkataswamy, Geoffrey Fox, Apr. 2024.
-
Correlation Teaser,
Farzana Yasmin Ahmad, December 2023.
-
Other Surrogates
6.1 - Getting Started with the bii_dsc_community
Getting Started
Contributing your tutorial and experiences
Please contribute to infomall.org, as your experiences
will help others. Please remember that technology evolves fast,
and we like to stay up to date by improving the information.
Each page has an edit feature that you can use to propose changes.
The changes will be reviewed by Gregor and are not automatically posted
online.
Once a change is accepted, the website is republished and the
updates become visible. Send an e-mail to Gregor for urgent updates.
Activating your account
Send an e-mail to Gregor with the following content:
Subject: Activate my account
Body: (fill in lastname and firstname. Do not use all caps)
Firstname:
Lastname:
e-mail:
github.com:
* [ ] Please add me to the `discord`
* [ ] Please add me to the unix groups:
* [ ] `biocomplexity`
* [ ] `nssac_students`
* [ ] `bii_dsc_community`
Preparing your computer for research
See the documentation at
Using Docker on your computer
To isolate your computer from changes and to develop portable code, we recommend
using Docker images. This is especially the case when using GPUs on your computer,
as containers are nowadays the default distribution mechanism for NVIDIA software
for research.
Using Singularity on your computer
As Rivanna uses Singularity, it is also beneficial to use Singularity on
your local computer, as you can use it to create images for Rivanna. However,
note that due to the transfer speeds to Rivanna, this experience may be limited.
For that reason, we recommend you visit our
[Tutorial on Singularity on Rivanna](https://infomall.org/uva/docs/tutorial/singularity/)
Getting an account on Rivanna
Please read
Do not make your account insecure. Rivanna’s documentation contains
a statement that we do NOT RECOMMEND FOLLOWING, as it is not
best security practice and can almost always be handled differently.
The statement on the official UVA Rivanna Web Site states:
"Sometimes you will need to enable passwordless ssh.
We allow passwordless ssh to frontend nodes from UVA
IP addresses. Key authentication works by matching two
halves of an encrypted keypair. The “public” key is
placed within your home directory on the remote server
and the “private” key is kept safely on your own workstation.
You should treat private keys as securely as you would any password."
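Instead, protect your key with a password and load it into an ssh agent, so that you only type the password once per session: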
eval `ssh-agent`
ssh-add
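The two commands above start a new agent every time they are run. A minimal sketch for bash that reuses a running agent and only asks for the password when the agent holds no keys yet; it assumes your key is in one of the default locations that ssh-add picks up (such as ~/.ssh/id_ed25519 or ~/.ssh/id_rsa), and the function names need_agent and start_agent_if_needed are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: reuse a running ssh-agent if there is one, otherwise start a new one,
# and only ask for the key password when the agent holds no keys yet.

# Succeeds (returns 0) when no usable agent socket is available.
need_agent() {
  # A running agent exports SSH_AUTH_SOCK; -S checks that it points to a socket.
  [[ -z "${SSH_AUTH_SOCK:-}" || ! -S "${SSH_AUTH_SOCK:-}" ]]
}

start_agent_if_needed() {
  if need_agent; then
    # ssh-agent -s prints Bourne shell commands that set SSH_AUTH_SOCK and SSH_AGENT_PID.
    eval "$(ssh-agent -s)" > /dev/null
  fi
  # ssh-add -l fails when the agent holds no identities; only then add the default keys.
  ssh-add -l > /dev/null 2>&1 || ssh-add
}
```

Run start_agent_if_needed in your terminal before using ssh; calling it a second time in the same session does not ask for the password again.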
Using Python
When using Anaconda, be careful, as it takes over your Python installation and may
provide inconsistent libraries when you do more complex work.
Evaluate whether you need Anaconda or not. In many cases it is best to just use vanilla
Python and pip.
You can also switch between Anaconda and regular Python. For that, DO NOT USE
conda init
Remove or comment out the Anaconda lines in your .bashrc or .zshrc files.
If you are a conda expert, give us some tips and tutorials on this topic.
Always check that you are using the correct version of Python with
which python
python --version
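If your project needs a minimum Python version, this check can be scripted. A small sketch, assuming bash and a sort command that supports -V (GNU coreutils does); it looks for python3 first and then python, and version_ge and check_python are hypothetical helper names. The minimum version you pass is up to your project:

```bash
#!/usr/bin/env bash
# Sketch: check that the python found first on PATH is at least a minimum version.

# version_ge A B succeeds when version A >= version B (needs sort -V).
version_ge() {
  # After a version sort the smaller version comes first,
  # so A >= B exactly when B is the first line.
  [[ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" == "$2" ]]
}

# check_python MIN prints the interpreter in use and succeeds if it is >= MIN.
check_python() {
  local min="$1" exe current
  exe="$(command -v python3 || command -v python)" || {
    echo "no python found on PATH" >&2
    return 1
  }
  # "Python 3.11.4" -> "3.11.4" (python 2 prints its version to stderr, hence 2>&1)
  current="$("$exe" --version 2>&1 | awk '{print $2}')"
  echo "using $exe (version $current)"
  version_ge "$current" "$min"
}
```

For example, check_python 3.8 prints which interpreter is used and fails if it is older than 3.8. A version sort is used instead of a plain string comparison because 3.10 must count as newer than 3.9.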
Please keep in mind: in university classes, some teachers may give
you convenient but insufficient instructions on how to use Python. They are
typically designed to make Python easy to use for a specific class,
not necessarily for research.
Also keep in mind that you may have Python versions that do not work properly
on your computer if you attended classes some years back. You will likely
need to update your Python. Often it is good to uninstall your previous version
and reinstall.
If you need multiple Python versions, for example because teacher A wants version X and
teacher B wants version Y, this is possible. Just use Python virtual environments,
containers, or virtual machines. Which one you choose is up to you.
Using Rivanna
Read
Using Singularity on Rivanna
Read
Using Docker on Rivanna via Singularity
Rivanna does not document this, but we do on infomall.org.
I will go into this in the tutorial. If you have already created a passwordless key,
please recreate it with a password.
Onramping Tutorial with Gregor
If you need help assessing your computer for research, you can optionally
send the following information to me.
E-mail to gregor@virginia.edu:
os:
size ram:
size hdd/ssd:
free space on hdd/ssd
date purchased:
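Most of these values can be collected with a few commands. A sketch, assuming bash on Linux, macOS, or (probably) Git Bash on Windows; it reports the size and free space of the file system holding your home directory, which may differ from the whole disk, and the purchase date still has to be filled in by hand. The function name machine_info is hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: print the machine facts requested above.
# "date purchased" cannot be detected and has to be filled in by hand.

machine_info() {
  local os ram_gb home="${HOME:-/}"
  os="$(uname -s) $(uname -r)"
  if [[ -r /proc/meminfo ]]; then
    # Linux: MemTotal is given in kB.
    ram_gb=$(awk '/^MemTotal:/ {printf "%.1f", $2 / 1024 / 1024}' /proc/meminfo)
  else
    # macOS: hw.memsize is given in bytes.
    ram_gb=$(sysctl -n hw.memsize | awk '{printf "%.1f", $1 / 1024 / 1024 / 1024}')
  fi
  echo "os: $os"
  echo "size ram: ${ram_gb} GB"
  # Size and free space of the file system that holds the home directory.
  # -P keeps each file system on one line so the columns stay in place.
  df -Ph "$home" | awk 'NR == 2 {print "size hdd/ssd: " $2; print "free space on hdd/ssd: " $4}'
  echo "date purchased: (fill in by hand)"
}

machine_info
```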
We observed that when using Chrome, PyCharm, and Zoom, you may need lots of
memory. Shut down all other applications. We recommend 16 GB of RAM these days.
However, many students have 8 GB, which may slow things down in
some cases as you may hit the memory limit.
For example, when Gregor runs Chrome and
PyCharm, he uses up 8.1 GB of RAM, so an 8 GB machine would
slow down. However, your RAM usage may vary depending on your plugins,
software versions, and operating system.
- Please make sure to have some free space on your computer's HDD/SSD, and send me how much
free space you have.
- If you use Windows, please install Git Bash before the meeting.
- If you use Windows, I recommend Chocolatey, but be careful what you install.
- Make sure you know how to use the UVA VPN.
- Set up an ssh key with ssh-keygen and protect it with a password. WRITE THE PASSWORD DOWN.
- Set up ~/.ssh/config as
- Upload your public ssh key to GitHub.
Make sure you employ a backup strategy, for example on an external HDD, Google Drive, or
something similar. I have seen too many computer HDDs break, and backups are standard best
practice. We can discuss this in the meeting.
If anything is unclear or you have questions, let me know. We will also go through
the ssh key setup
if you do not understand it.
Editor
- Use PyCharm (best) on your local computer; alternatively, use VS Code.
- Learn a command-line editor for Rivanna: emacs is best; alternatives are nano, pico, and vim.
Cloudmesh is useful
You will see that cloudmesh has many features that you will find useful.
We focus here on a number of libraries useful for Rivanna.
Please create a venv; how to do this depends on your OS (on Windows with
Git Bash, the activation script is ~/ENV3/Scripts/activate instead of ~/ENV3/bin/activate).
Name it ~/ENV3 (if you use conda, do it in whatever fashion conda does; as I do
not use conda, you can help us write documentation about it).
Create and activate it, then install the libraries:
python -m venv ~/ENV3
source ~/ENV3/bin/activate
pip install pip -U
pip install cloudmesh-common
pip install cloudmesh-sbatch
pip install cloudmesh-rivanna
cms help
On Rivanna, create the venv in the project directory instead:
python -m venv /project/bii_dsc_community/$USER/ENV3
source /project/bii_dsc_community/$USER/ENV3/bin/activate
pip install pip -U
pip install cloudmesh-common
pip install cloudmesh-sbatch
pip install cloudmesh-rivanna
pip install cloudmesh-gpu
cms help
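Before running pip install, it helps to confirm that the intended environment is active, as otherwise the packages land in a different Python. A minimal sketch that relies on the VIRTUAL_ENV variable set by the venv activate script; expect_venv is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Sketch: make sure the expected virtual environment is active before pip install.
# The venv activate script exports VIRTUAL_ENV with the environment's directory.

# expect_venv DIR succeeds only if DIR is the active virtual environment.
expect_venv() {
  local want="$1"
  if [[ -z "${VIRTUAL_ENV:-}" ]]; then
    echo "no virtual environment is active; run: source $want/bin/activate" >&2
    return 1
  fi
  if [[ "$VIRTUAL_ENV" != "$want" ]]; then
    echo "active environment is $VIRTUAL_ENV, expected $want" >&2
    return 1
  fi
  echo "ok: using $VIRTUAL_ENV"
}
```

For example, expect_venv "$HOME/ENV3" && pip install cloudmesh-common on your computer, or expect_venv "/project/bii_dsc_community/$USER/ENV3" on Rivanna.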
Make sure you are in Gregor's Discord.
In the future, learn how to use the cloudmesh StopWatch so you can conveniently augment
your code with timers.
Gregor von Laszewski
laszewski@gmail.com
6.2 - Checklist
Checklist
6.3 - Cloudbank
Cloudbank
6.3.1 - Using Cloudbank
Using Cloudbank
- WARNING: ANY USE OF CLOUDBANK REQUIRES THAT YOU HAVE A DEEP
UNDERSTANDING OF HOW CHARGES ARE INCURRED. WASTEFUL USE OF RESOURCES, WHICH
COST REAL $, WILL RESULT IN AN IMMEDIATE TERMINATION OF YOUR
ACCOUNT. Any cost above $100 over the semester must be preapproved
and accompanied by a detailed cost breakdown.
- Students need to create an account on https://cloudbank.org with
their university e-mail. Please follow the instructions.
- If multiple students need an account, all students must have completed step 1 before we proceed.
- All names and emails as entered in Cloudbank must be forwarded to Gregor and Bud.
Activating your Cloudbank account with our project
- Gregor will activate your Cloudbank account in the Cloudbank
project. You do not have to do anything but wait. Gregor will
notify Bud.
- Bud will add you to AWS via Cloudbank. He will try to limit your
spending amount to $100 initially. You do not have to do
anything but wait.
- Bud will notify you and Gregor once this is done.
- This process is done on a best-effort basis, and as we have never done it before, it
may take some time to complete. We anticipate about a week.
What to do while you wait
- Find out which services you need on AWS
- Find out what they cost. As we have to pay for them in real $, this needs to
be estimated precisely.
- If it exceeds $100, you need to figure out why and estimate the total
cost over the entire semester, as we do not have unlimited funds.
- Make sure you understand running costs and startup costs for a
service. Note: starting a service every second will cost a lot!
- You must not place keys and certs in your
git repositories. Figure out where to place them instead.
- If you need ssh and use Windows, use Git Bash.
- Improve this tutorial and make a pull request with your improvement.
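The cost estimate can be kept as simple arithmetic: hourly price times hours per day times days, plus any cost incurred per start. A sketch, assuming per-hour pricing; semester_cost and over_budget are hypothetical helper names, and the rates in the example are made-up numbers, not AWS prices, so look up the real price for your service and region:

```bash
#!/usr/bin/env bash
# Sketch: rough semester cost of a service that is billed per hour.
# All rates are inputs; look up real prices for your service and region.

# semester_cost RATE_PER_HOUR HOURS_PER_DAY DAYS [COST_PER_START STARTS]
semester_cost() {
  local rate="$1" hours="$2" days="$3" start_cost="${4:-0}" starts="${5:-0}"
  # bash has no floating point arithmetic, so awk does the math.
  awk -v r="$rate" -v h="$hours" -v d="$days" -v s="$start_cost" -v n="$starts" \
    'BEGIN { printf "%.2f\n", r * h * d + s * n }'
}

# over_budget COST [LIMIT] succeeds when COST is above LIMIT (default: 100 $).
over_budget() {
  awk -v c="$1" -v l="${2:-100}" 'BEGIN { exit !(c > l) }'
}
```

For example, semester_cost 0.50 4 100 estimates a hypothetical $0.50/hour service running 4 hours a day for 100 days and prints 200.00; over_budget 200 then succeeds, meaning the cost needs preapproval and a detailed breakdown.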
Pitfalls
- Not reading the documentation
- Not understanding ssh-keygen, ssh-agent, ssh-add
- On Windows: just using PuTTY and not understanding Git Bash
- Placing any key in your project code instead of reading it from a secure file system.
- Not understanding what 1-4 is about.
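To avoid placing keys in project code, a quick check before committing can help. A rough heuristic sketch, assuming bash with grep and find; it flags files containing a PEM or OpenSSH private key header and files with common key or certificate names (*.pem, id_rsa, id_ed25519). It is not a complete secret scanner, and find_secrets and check_repo are hypothetical names:

```bash
#!/usr/bin/env bash
# Sketch: look for private keys before committing, so they never end up in git.
# This is a heuristic, not a complete secret scanner.

# find_secrets DIR lists files that look like private keys.
find_secrets() {
  local dir="${1:-.}"
  # Files containing a PEM or OpenSSH private key header, skipping .git.
  grep -rlE --exclude-dir=.git -e '-----BEGIN ([A-Z]+ )?PRIVATE KEY-----' "$dir"
  # Files with typical key or certificate names, whatever their content.
  find "$dir" -path "$dir/.git" -prune -o -type f \
    \( -name '*.pem' -o -name 'id_rsa' -o -name 'id_ed25519' \) -print
}

# check_repo DIR fails and lists the files if anything suspicious is found.
check_repo() {
  local hits
  hits="$(find_secrets "${1:-.}" | sort -u)"
  if [[ -n "$hits" ]]; then
    echo "possible keys found, do not commit:" >&2
    echo "$hits" >&2
    return 1
  fi
  echo "no private keys found"
}
```

Run check_repo from the top of your repository before git add, and keep the keys themselves outside the repository, for example in ~/.ssh.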
Alternatives
- Try to use alternatives whenever you can
- Spark is available on Rivanna
- Hadoop development can be done on Windows, Mac, Linux (Search in
google how to set it up). Development is often easier on your
laptop. Once you complete it use cloudbank.
- MPI is available on Rivanna
- TensorFlow and PyTorch are available on Rivanna
- Jupyter notebooks can be run on Rivanna or Google Colab
Honor Policy
- The project you do must be part of an approved project.
- You must not use cloudbank and its clouds for personal projects.
- You must not use cloudbank and its clouds for bitcoin mining.
- If publications or reports are written based on usage of our
cloudbank account, you must include this in an acknowledgement
statement and notify us with a copy of the report. Please note that
the report may be made publicly available.
6.4 - File Transfer
File Transfer
6.4.1 - Rclone on Rivanna
Using Rclone to upload and download from cloud services
Using the Rclone Module on Rivanna
Rclone is a useful tool to upload and download from cloud
services such as Google Drive by using the commandline.
However, a web browser is required for initial setup,
which can be done from the computer that logs into Rivanna.
Setup Rclone on Rivanna
First, load the newer version of the module; otherwise, Rivanna
loads an incompatible, older version by default. Then, initialize
a new rclone configuration and enter the following inputs:
$ module load rclone/1.61.1
$ rclone config
n/s/q> n
name> gdrive
Storage> drive
A client ID is required to create a remote that interfaces
with Google Drive. Follow the instructions at
https://rclone.org/drive/#making-your-own-client-id to create
a client ID and then enter the values on Rivanna.
client_id> myCoolID..
client_secret> verySecretClientSecret..
scope> 2 # read only
service_account_file> # just press enter
Edit advanced config?
y) Yes
n) No (default)
y/n> n
Use web browser to automatically authenticate rclone with remote?
y/n> n
Install Rclone on Client Computer
If the computer used to log on to Rivanna is running Windows,
and the computer has Chocolatey, then download Rclone using an
administrative Git Bash instance with
$ choco install rclone -y
Otherwise, for Linux and macOS, use
$ sudo -v ; curl https://rclone.org/install.sh | sudo bash
Then open a new terminal (Git Bash on Windows), paste the command that
rclone config printed on Rivanna (it starts with rclone authorize), and follow the instructions.
Rclone Authentication
In the web browser, click Advanced when Google says that it
has not verified this app; this is safe and expected, as the app is the
client ID you created yourself. Then click Go to rclone, then Continue.
When Rclone gives the config token, ensure that all new line
characters are removed. This can be done by pasting the code
into an application such as Notepad and manually ensuring that
all characters are on the same line. Otherwise, the code will
be split across new prompts, breaking the setup.
This is bad:
sjgnkajdfnkj
fdnskjafnkad
asdfnasjkffd
This is good:
sjgnkajdfnkjfdnskjafnkadasdfnasjkffd
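Instead of fixing the line breaks by hand in an application such as Notepad, you can remove them with tr. A sketch, assuming you saved the copied token into a file (token.txt is a hypothetical name); carriage returns are removed as well, because copies made on Windows may contain them:

```bash
#!/usr/bin/env bash
# Sketch: join a copied rclone config token into a single line.

# join_token reads the token on stdin and writes it without line breaks.
join_token() {
  # Delete LF and CR characters (copies made on Windows may contain CR LF).
  tr -d '\r\n'
}
```

join_token < token.txt prints the token on one line. Delete the file afterwards, as the token grants access to your Google Drive.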
Paste the fixed token into Rivanna.
config_token> myCoolCodeThatHasNoNewLineCharacters
Configure this as a Shared Drive (Team Drive)?
y) Yes
n) No (default)
y/n> n
Keep this "gdrive" remote?
y) Yes this is OK (default)
y/e/d> y
q) Quit config
e/n/d/r/c/s/q> q
An example command to use Rclone is as follows.
The flag --drive-shared-with-me restricts the scope to
files shared with you.
$ rclone copy --drive-shared-with-me gdrive:Colab\ Datasets/EarthquakeDec2020 /scratch/$USER/EarthquakeDec2020 -P
6.4.2 - Globus
File transfer with Globus
Getting the Cosmoflow data via globus commandline
Data Directory
We will showcase how to transfer data via globus
commandline tools.
In our example we will use the data directory as
export DATA=/project/bii_dsc_community/$USER/cosmoflow/data
Globus Set Up on Rivanna
Rivanna allows you to load the Globus file transfer command line tools
with the module command as follows. However, prior to
executing globus login, please visit https://www.globus.org/ and log
in using your UVA credentials.
module load globus_cli
globus login
The globus login command will output a unique link per user that you
should paste into a web browser, where you sign in with your UVA
credentials. Afterwards, the website will present you with a unique
sign-in key that you will need to paste back into the command line to
verify your login.
After executing globus login, your console should look like the
following block.
NOTE: this is a unique link generated for the example login;
each user will have a different link.
-bash-4.2$globus login
Please authenticate with Globus here:
------------------------------------
https://auth.globus.org/v2/oauth2/authorize?client_id=affbecb5-5f93-404e-b342-957af296dea0&redirect_uri=https%3A%2F%2Fauth.globus.org%2Fv2%2Fweb%2Fauth-code&scope=openid+profile+email+urn%3Aglobus%3Aauth%3Ascope%3Aauth.globus.org%3Aview_identity_set+urn%3Aglobus%3Aauth%3Ascope%3Atransfer.api.globus.org%3Aall&state=_default&response_type=code&access_type=offline&prompt=login
------------------------------------
Enter the resulting Authorization Code here:
Follow the url and input the authorization code to login successfully.
Source Endpoint Search
First, verify that you were able to sign in properly and check your
identity, and then search for the source endpoint of the data you want
to transfer. In this example, our endpoint is named CosmoFlow benchmark data cosmoUniverse_2019_02_4parE.
Please note that the data to be downloaded is 1.7 TB large. Make sure that
the system to which you download it has enough space. The following commands
will verify your sign-in identity and then search for the endpoint named
within the single quotation marks.
globus get-identities -v 'youremail@gmailprobably.com'
globus endpoint search 'CosmoFlow benchmark data cosmoUniverse_2019_02_4parE'
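Because the data set is about 1.7 TB, it is worth checking the free space before starting the transfer. A sketch, assuming bash and a POSIX df; has_free_space is a hypothetical name. Note that df reports the free space of the whole file system, which can be larger than any quota that applies to you:

```bash
#!/usr/bin/env bash
# Sketch: check the free space of a target directory before a large download.

# has_free_space DIR NEEDED_GB succeeds if DIR's file system has at least NEEDED_GB free.
has_free_space() {
  local dir="$1" need_gb="$2" avail_kb
  # -P: one line per file system, -k: sizes in 1024-byte blocks; column 4 is "available".
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 {print $4}')
  [[ -n "$avail_kb" ]] || return 1
  echo "available: $((avail_kb / 1024 / 1024)) GB, needed: ${need_gb} GB"
  (( avail_kb >= need_gb * 1024 * 1024 ))
}
```

For example, after mkdir -p "$DATA", has_free_space "$DATA" 1700 fails if less than roughly 1700 GB are available.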
Each globus endpoint has a unique endpoint ID. In this case our source endpoint ID is:
d0b1b73a-efd3-11e9-993f-0a8c187e8c12
Set up a variable SRC_ENDPOINT so you can use the endpoint more easily
without retyping its ID. Also set a variable SRC_DIR to indicate the
directory with the files to be transferred.
export SRC_ENDPOINT=d0b1b73a-efd3-11e9-993f-0a8c187e8c12
export SRC_DIR=/~/
You can look at the files in the Globus endpoint using globus ls to
verify that you are looking at the right endpoint.
Destination Endpoint Set Up
Rivanna HPC has set up a special endpoint for data transfers into the
/project, /home, or /scratch directories. The name of this
destination endpoint is UVA Standard Security Storage.
Repeat the above steps with this endpoint and set up the variables,
including a path variable with the desired path to write to.
globus endpoint search 'UVA Standard Security Storage'
export DEST_ENDPOINT=e6b338df-213b-4d31-b02c-1bc2c628ca07
export DEST_DIR=/dtn/landings/users/u/uj/$USER/project/bii_dsc_community/uja2wd/cosmoflow/
NOTE: We cannot set the path to start at the root level of Rivanna;
instead, we need to follow a few steps to find the specific path
to write to.
To begin, the path must start with /dtn/landings/users/,
followed by a sequence that depends on the user's computing ID.
As an example, if your computing ID is abc5xy, the next three components are
/a/ab/abc5xy
(first character, first two characters, computing ID). At this point, the
user is essentially at the root level of Rivanna and can access
/home, /project, or /scratch as they normally would.
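The path rule above can be written as a small bash function. A sketch; dtn_path is a hypothetical name, and it only builds the string, so compare the result with what the Globus web interface shows before starting a large transfer:

```bash
#!/usr/bin/env bash
# Sketch: build the Globus DTN landing path for a computing ID and a Rivanna path.

# dtn_path COMPUTING_ID RIVANNA_PATH
dtn_path() {
  local id="$1" rpath="$2"
  # /dtn/landings/users/<first char>/<first two chars>/<computing ID><Rivanna path>
  echo "/dtn/landings/users/${id:0:1}/${id:0:2}/${id}${rpath}"
}
```

For example, export DEST_DIR=$(dtn_path "$USER" "/project/bii_dsc_community/$USER/cosmoflow/") builds the destination directory for your own computing ID.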
Note: If you want to use the Globus web interface to find the path instead,
follow the steps below to find the desired value of your path variable.
- First, sign in to the Globus web interface
- Locate file manager on the left side of the screen
- In the collections box at the top of the screen, begin to search
for UVA Standard Security Storage
- Select our destination endpoint
- Use the GUI tool to select exactly where you wish to write to
- Copy the path from the box immediately below collections
- Write this value to the DEST_DIR variable created above (I have
included my path to where I wish to write to)
Initiate the Transfer
Finally, execute the transfer
globus transfer $SRC_ENDPOINT:$SRC_DIR $DEST_ENDPOINT:$DEST_DIR
NOTE: Your first transfer may fail because you need to give
Globus permission to initiate transfers via the CLI instead of via the
web tool. My terminal gave me the following unique command:
-bash-4.2$globus transfer $SRC_ENDPOINT:$SRC_DIR $DEST_ENDPOINT:$DEST_DIR
The collection you are trying to access data on requires you to grant
consent for the Globus CLI to access it. message: Missing required
data_access consent
Please run
globus session consent 'urn:globus:auth:scope:transfer.api.globus.org:all[*https://auth.globus.org/scopes/e6b338df-213b-4d31-b02c-1bc2c628ca07/data_access]'
to login with the required scopes
After running this command, you go through the same sign-in and
verification as with globus login: the CLI will output a URL to follow,
you sign in, and you return a verification code.
After fixing this, remember to re-initiate the transfer with the
globus transfer command as previously described.
Managing Tasks
To monitor the status of active transfers, use
or similarly you can use the web tool to verify transfers.
References:
- Globus Data Transfer, Rivanna HPC https://www.rc.virginia.edu/userinfo/globus/
6.5.1 - Facilities Statement
Facilities statement
Computing Environments at UVA
Research Computing (UVA-RC) serves as the principal center for computational resources and associated expertise at the University of Virginia (UVA). Each year UVA-RC provides services to over 433 active PIs that sponsor more than 2463 unique users from 14 different schools/organizations at the University, maintaining a breadth of systems to support the computational and data intensive research of UVA’s researchers.
High Performance Computing
UVA-RC’s High Performance Computing (HPC) systems are designed with high speed networks, high performance storage, GPUs, and large amounts of memory in order to support modern compute and memory intensive programs. UVA-RC’s HPC systems comprise over 614 compute nodes, with a total of 20476 X86 64-bit compute cores and 240 TB total RAM. Scheduled using Slurm, these resources can support over 1.5 PFLOP of peak CPU performance. HPC nodes are equipped with between 375 GB and 1 TB of RAM to support applications that require small and large amounts of memory, and 49 nodes include various configurations of the NVIDIA general purpose GPU accelerators (RTX2080, RTX3090, A6000, V100 and A100), from 4- to 10-way.
UVA-RC also acquires and maintains capability systems focused on providing novel environments. This includes an 18-node DGX BasePOD system with 8x A100 GPUs per node. The BasePOD provides a shared memory space across all GPUs in the system, allowing the system to work collectively on models with memory needs larger than what can be held in a single node.
Interactive Computing and Scientific Visualization
UVA-RC supports specialized interfaces (i.e., Open OnDemand, FastX) and hardware for remote visualization and interactive computing. Interactive HPC systems allow real-time user inputs in order to facilitate code development, real-time data exploration, and visualizations. Interactive HPC systems are used when data are too large to download to a desktop or laptop, software is difficult or impossible to install on a personal machine, or specialized hardware resources (e.g., GPUs) are needed to visualize large data sets.
Expertise
UVA-RC aggregates expertise to provide consulting and collaboration services to researchers addressing all levels of the Research Computing technology stack.
UVA-RC’s user support staff provide basic support and general onboarding through the helpdesk and regularly scheduled tutorials. Senior support staff have advanced degrees in relevant research domains such as biology, imaging, physics, computer science, and materials science, enabling in-depth collaboration on complex projects. For projects that require significant application development work, UVA-RC maintains a Solutions & DevOps team capable of rapid iteration while leveraging non-traditional HPC technologies. Lastly, UVA-RC’s Infrastructure Services team enables projects that may require custom hardware or configurations outside of the standard images. Beyond their availability for direct project support, together these teams provide the R&D and operations expertise needed to ensure that UVA-RC is providing a modern research computing ecosystem for UVA researchers.
Cloud Computing
Ivy is a secure computing environment for researchers consisting of virtual machines (Linux and Windows) backed by a total of 45 nodes and 2048 cores. Researchers can use Ivy to process and store sensitive data with the confidence that the environment is secure and meets HIPAA, FERPA, or CUI requirements.
For standard security projects, UVA-RC supports microservices in a clustered orchestration environment that leverages Kubernetes to automate the deployment and management of many containers in an easy and scalable manner. This cluster has 876 cores and 4.9TB of memory allocated to running containerized services, including one node with 4 x A100 GPUs. It also has over 300TB of cluster storage and can attach to UVA-RC’s broader storage offerings.
ACCORD
The ACCORD project (NSF Award: #1919667) offers flexible web-based interfaces for sensitive and highly sensitive data in a system focused on supporting cross-institutional access and collaboration. The ACCORD platform consists of 8 nodes in a Kubernetes cluster, for a total of 320 cores and ~3.2TB of memory. Cluster storage is approximately 1PB of IBM Spectrum storage (GPFS).
Researchers from non-UVA institutions can be brought into the ACCORD system through a memorandum of understanding between the researcher’s institution and UVA, security training for the researcher, and a posture-checking client installed on the researcher’s laptop/desktop.
Data Storage
All researchers on UVA-RC’s systems have access to a high-performance parallel storage platform. This system provides 8 PB (petabytes) of storage with sustained read and write speeds of up to 10 GB/s. The integrity of the data is protected by daily snapshots. UVA-RC also supports a 3 PB second-tier storage solution, designed to address the growing need for resources that support data-intensive research by offering a lower-cost, scalable solution. The system is tightly integrated with other UVA-RC storage and computing resources in order to support a wide variety of research data life cycles and data analysis workflows.
Data Centers, Network Connectivity, and Office Facilities
UVA-RC enables interdisciplinary research through its robust data center facilities with over 1.5 MW of IT capacity to support leading edge computational and data storage systems. UVA-RC’s equipment occupies a data center near campus, connected to the 10 Gbps campus network. Dedicated 10 and 100 Gbps links to our regional optical network and Internet2 give our researchers the network capacity and capability needed to collaborate with researchers from around the world. A Globus data transfer node enables data access and transfers to transcend institutional credentials. Located in the Ivy Translational Research Building of the Fontaine Research Park, UVA-RC’s offices (2,877 sq. ft) are a short shuttle ride away from the central UVA grounds.
6.5.2 - Rivanna
Rivanna
graph TB
subgraph Getting-Started
b1(UVA Account)
b2(email to Gregor about groups)
b3(groups available)
b4(access to singularity build)
b1 --> b2 --> b3 --> b4
end
subgraph Windows
a1(gitbash)
a2(wsl)
a3[an <b>important</b> <a href='http://google.com'>link</a>]
end
Rivanna is the University of Virginia’s High-Performance Computing
(HPC) system. As a centralized resource, it has many software packages
available. Currently, Rivanna has 603 nodes with
over 20476 cores and 8 PB of various storage. Rivanna has multiple
nodes equipped with GPUs, including RTX2080, RTX3090, K80, P100, V100,
A100-40GB, and A100-80GB.
Communication
We have a team Discord server, uva-bii-community, at
https://discord.gg/uFKJ5TUv
Please subscribe if you work on Rivanna and are part of the
bii_dsc_community group.
Rivanna at UVA
The official Web page for Rivanna is located at
If you need support, you can contact the staff through the ticket system
at
Before you use Rivanna, it is important that you attend a seminar, which
is given upon request every Wednesday. To sign up, use the link:
Please note that in this introduction we provide you with
additional information that may make the use of Rivanna easier. We
encourage you to add to this information and share your tips.
Getting Permissions to use Rivanna
To use Rivanna you need special authorization. If you
work with a faculty member, you will need to be added to a special
group (or several groups) to be able to access it. The faculty member will
know which group it is. Membership is managed by the faculty member via the
group management portal.
Please do not use the previous link yourself; instead,
communicate with your faculty member first.
- Note: For BII work conducted with Geoffrey Fox or Gregor von
Laszewski, please contact Gregor at laszewski@gmail.com
TODO: IS THIS THE CASE?
Once you are added to the group, you will receive an invitation email
to set up a password for the research computing support portal.
If you do not receive such an email, please visit the support portal at
TBD
This password is also the password that you will use to log into the
system.
END TODO IS THIS THE CASE
After your account is set up, you can try to log in through the
Web-based access.
Please test it to make sure you already have the proper access.
However, we will typically not use the online portal but instead use
the more advanced batch system, as it provides significant advantages
when you manage multiple jobs on Rivanna.
Accessing an HPC Computer via command line
If you need to use X11 on Rivanna, you can find documentation in
the Rivanna documentation.
If you need to run Jupyter notebooks directly on Rivanna, please
consult the Rivanna documentation.
VPN (required)
You can access Rivanna via ssh only through the VPN. UVA requires you to use
the VPN to access any computer on campus. The VPN is offered by IT
services but is officially only supported for
Mac and Windows.
However, if you have a Linux machine, you can follow the
VPN install instructions for Linux.
If you have issues installing it, attend an online support session
with the Rivanna staff.
Access via the Web Browser
Rivanna can be accessed right from the Web browser. Although this may
be helpful for those whose systems cannot run a proper terminal, it
cannot leverage the features of your own desktop or laptop, such as
advanced editors or keeping the file system of your machine in sync
with the HPC file system.
Therefore, practical experience shows that you benefit from using a
terminal and your own computer for software development.
Additional documentation by the Rivanna system staff is provided at
Access Rivanna from macOS and Linux
To access Rivanna from macOS, use the terminal and connect to it with
ssh. We will provide an in-depth configuration tutorial on this
later on. We use the same programs on Linux and Windows, so we
only have to provide one set of documentation that is uniform across
platforms.
Please remember to start the ssh agent and add your key in your
terminal with
$ eval "$(ssh-agent)"
$ ssh-add
Access Rivanna from Windows
To access Rivanna from Windows, you can use PuTTY or
MobaXterm.
However, a possibly better choice that has become available recently is
Git Bash. Git Bash is trivial to
install; however, you need to read the configuration options
carefully. Let us know which options you chose so we can add
them here.
To simplify the setup of a Windows computer for research we have
prepared a separate
It addresses the installation of Git Bash, Python, PyCharm (much better
than VSCode), and other useful tools such as Chocolatey.
With Git Bash, you get a bash terminal that works the same as a Linux
bash terminal and is similar to the zsh terminal on a Mac.
Set up the connection (mac/Linux)
The first thing to do when trying to connect to Rivanna is to create
an ssh key if you have not yet done so.
To do this use the command
Please make sure you use a passphrase when generating the key. Do not
skip the passphrase by just pressing ENTER; instead, use a real,
hard-to-guess passphrase, as this is best practice and complies with
security policies. You can always use ssh-agent and ssh-add so that
you do not have to repeatedly enter your passphrase.
The ssh-keygen program will generate a public-private key pair; for an
RSA key these are the files ~/.ssh/id_rsa.pub (public key) and
~/.ssh/id_rsa (private key). Never share the private key with anyone.
Next, we need to add the public key to the file ~/.ssh/authorized_keys
on Rivanna. The easiest way to do this is to use the program
ssh-copy-id:
ssh-copy-id username@rivanna.hpc.virginia.edu
When ssh-copy-id asks for it, use your password. Your username is
your UVA computing ID. Now you should be ready to connect with
ssh username@rivanna.hpc.virginia.edu
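For illustration, the following sketch shows roughly what ssh-copy-id does on the receiving side: it appends your public key to ~/.ssh/authorized_keys (unless the key is already there) and sets the restrictive permissions that sshd expects. The function name add_authorized_key is hypothetical, and it takes the target home directory as an argument so that you can try it safely on a temporary directory; for Rivanna itself, simply use ssh-copy-id.

```bash
# Hypothetical helper: append a public key to <home>/.ssh/authorized_keys,
# mimicking what ssh-copy-id does on the remote machine.
add_authorized_key() {
    pubkey_file=$1      # e.g. ~/.ssh/id_rsa.pub
    target_home=$2      # home directory on the machine that receives the key
    ssh_dir="$target_home/.ssh"
    auth_file="$ssh_dir/authorized_keys"

    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"        # sshd may ignore keys in group/world-writable dirs
    touch "$auth_file"
    chmod 600 "$auth_file"

    key=$(cat "$pubkey_file")
    # Append the key only if this exact line is not present yet
    if ! grep -qxF -- "$key" "$auth_file"; then
        printf '%s\n' "$key" >> "$auth_file"
    fi
}
```

For example, add_authorized_key ~/.ssh/id_rsa.pub /tmp/testhome lets you inspect the result in /tmp/testhome/.ssh; calling it twice does not duplicate the key.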
Commandline editor
Sometimes it is necessary to edit files on Rivanna. For this, we
recommend that you learn a command line editor. There are lots of
debates on which one is better. When I was young I used vi, but found
it too cumbersome. So I spent one day learning emacs, which is just
great and all you need to learn. You can install it on Linux,
Mac, and Windows. This way you have one editor with very advanced
features that is easy to learn.
If you do not have a day to familiarize yourself with editors such
as emacs, vim, or vi, you can use editors such as nano and pico.
The best command line editor is emacs. It is extremely easy to learn when using
just the basics. An added advantage is that many of its key bindings
(e.g., CTRL-a, CTRL-e, CTRL-k) also work in the bash terminal.
| Keys | Action |
|------|--------|
| CTRL-x CTRL-s | Save in emacs |
| CTRL-x CTRL-c | Leave emacs |
| CTRL-g | Cancel the current command if something goes wrong |
| CTRL-a | Go to beginning of line |
| CTRL-e | Go to end of line |
| CTRL-k | Delete from cursor to end of line |
| cursor keys | Just work ;-) |
PyCharm
The best editor for Python development is PyCharm. Install it on
your desktop. The education version is free.
VSCode
An inferior editor for Python development is VSCode. It can be
configured to use the
Remote-SSH plugin.
Moving data from your desktop to Rivanna
To copy a directory, use scp.
If only parts of your files have changed, use rsync, which transfers only the differences.
To mount Rivanna’s file system onto your computer, use sshfs (SSH via FUSE).
This allows you, for example, to use PyCharm to directly edit files on Rivanna.
However, developers often use GitHub instead: they push the code to a git
repository and then run git pull on Rivanna to get it. This has the
advantage that you can use PyCharm on your local system while
synchronizing the code onto Rivanna via git.
Often, however, scp and rsync may just be sufficient.
Example Config file
Replace abc2de with your computing ID and place the following
on your computer in ~/.ssh/config:
ServerAliveInterval 60

Host rivanna
  User abc2de
  HostName rivanna.hpc.virginia.edu
  IdentityFile ~/.ssh/id_rsa

Host b1
  User abc2de
  HostName biihead1.bii.virginia.edu
  IdentityFile ~/.ssh/id_rsa

Host b2
  User abc2de
  HostName biihead2.bii.virginia.edu
  IdentityFile ~/.ssh/id_rsa
Adding it allows you to just ssh to the machines with
ssh rivanna
ssh b1
ssh b2
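If you prefer not to type these entries by hand, the following sketch prints the same entries for a given computing ID. The function name rivanna_ssh_config is hypothetical; review its output before copying it into ~/.ssh/config.

```bash
# Hypothetical helper: print ~/.ssh/config entries for Rivanna and the BII
# head nodes for the computing ID given as the first argument.
rivanna_ssh_config() {
    uid=$1
    printf 'ServerAliveInterval 60\n'
    # alias:hostname pairs taken from the example config above
    for entry in rivanna:rivanna.hpc.virginia.edu \
                 b1:biihead1.bii.virginia.edu \
                 b2:biihead2.bii.virginia.edu; do
        alias_name=${entry%%:*}     # text before the first colon
        host_name=${entry#*:}       # text after the first colon
        printf '\nHost %s\n' "$alias_name"
        printf '  User %s\n' "$uid"
        printf '  HostName %s\n' "$host_name"
        printf '  IdentityFile ~/.ssh/id_rsa\n'
    done
}
```

For example, rivanna_ssh_config abc2de > /tmp/rivanna_config writes the entries to a file you can inspect. Be careful when appending to an existing config: in ssh_config, an option such as ServerAliveInterval that follows a Host line applies only to that Host block, so the global line belongs at the top of the file.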
Rivanna’s filesystem
The file systems on Rivanna have some restrictions that are set by
system-wide policies that you need to review.
You can also see your quota with
hdquota
We distinguish
- home directory: /home/<uvaid> or ~
- scratch directory: /scratch/<uvaid>
- project directory: /project/bii_dsc_community/projectname/<uvaid>
In your home directory, you will find system directories and files such as
~/.ssh, ~/.bashrc, and ~/.zshrc.
The differences between the file systems are explained at
Dealing with limited space under HOME
As we conduct research, you may find that the file space in your home
directory is insufficient. This is especially the case when using
conda. Therefore, we recommend that you create symbolic links from
your home directory to a location where you have more space. This is
typically somewhere under /project and /scratch.
We describe next how to relocate some of the directories to /project
and /scratch.
In ~/.bashrc, add the following lines, which define shortcuts and
environment variables for your project and scratch directories.
$ vi ~/.bashrc
PS1="\w \$"
alias project='cd /project/bii_dsc_community/$USER'
export PROJECT="/project/bii_dsc_community/$USER"
alias scratch='cd /scratch/$USER'
export SCRATCH="/scratch/$USER"
At the end of the .bashrc file use
or alternatively
so that you always change directly into your project directory instead of your home directory.
Your home directory has a quota of only 50 GB. Installing everything in
your home directory will exceed this quota and cause problems with any
subsequent execution. It is therefore better to move conda and all other
package installation directories to $PROJECT.
First, explore what is in your home directory and how much space it
consumes with the following commands.
$ cd $HOME
$ ls -lisa
$ du -h .
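To see at a glance which entries in your home directory are the largest candidates for relocation, you can sort the per-entry usage. This sketch uses only du -sk, sort, and head; the function name largest_entries is hypothetical, and it deliberately includes hidden entries such as .conda and .local.

```bash
# Hypothetical helper: list the N largest entries (files and directories,
# including hidden ones) directly below a directory, with sizes in KB.
largest_entries() {
    dir=${1:-$HOME}
    count=${2:-10}
    # .[!.]* matches hidden entries but not . and ..; a pattern that matches
    # nothing is passed literally to du, whose error message is discarded.
    du -sk "$dir"/* "$dir"/.[!.]* 2>/dev/null | sort -rn | head -n "$count"
}
```

For example, largest_entries ~ 5 shows the five largest entries in your home directory, biggest first.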
From this list, select the directories that you want to move (those that
you have not already moved).
Let us assume you want to move the directories .local,
.vscode-server, and .conda. It is important to make
sure that .conda and .local are moved, as they may include lots of
files and you may quickly run out of space. Hence, do the
following next.
$ cd $PROJECT
$ mv ~/.local .
$ mv ~/.vscode-server .
$ mv ~/.conda .
Then create symbolic links in your home directory that point to the moved folders.
$ cd $PROJECT
$ ln -s $PROJECT/.local ~/.local
$ ln -s $PROJECT/.vscode-server ~/.vscode-server
$ ln -s $PROJECT/.conda ~/.conda
Check all symbolic links in your home directory:
$ ls -lisa ~
20407358289 4 lrwxrwxrwx 1 $USER users 40 May 5 10:58 .local -> /project/bii_dsc_community/djy8hg/.local
20407358290 4 lrwxrwxrwx 1 $USER users 48 May 5 10:58 .vscode-server -> /project/bii_dsc_community/djy8hg/.vscode-server
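The move-and-link steps above can be wrapped into a small function so that each directory is handled the same way and is never moved twice. This is a minimal sketch, assuming PROJECT is set as in your ~/.bashrc; the function name relocate_to_project is hypothetical. Make sure no program (e.g., a running VS Code server) is using a directory while you move it.

```bash
# Hypothetical helper: move a directory from $HOME to $PROJECT and leave a
# symbolic link behind, e.g.  relocate_to_project .conda
relocate_to_project() {
    name=$1
    src="$HOME/$name"
    dst="$PROJECT/$name"

    if [ -L "$src" ]; then
        echo "$src is already a symbolic link, skipping"
        return 0
    fi
    if [ -e "$dst" ]; then
        echo "$dst already exists, refusing to overwrite" >&2
        return 1
    fi
    mkdir -p "$PROJECT"
    if [ -e "$src" ]; then
        mv "$src" "$dst"        # move the existing data
    else
        mkdir -p "$dst"         # nothing to move yet; create an empty target
    fi
    ln -s "$dst" "$src"         # $HOME/<name> -> $PROJECT/<name>
}
```

For example, relocate_to_project .local, relocate_to_project .vscode-server, and relocate_to_project .conda reproduce the manual steps; running them again only prints that the links already exist.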
Singularity Cache
If you use Singularity to build images, you need to set the
Singularity cache directory. By default, the cache is
created in your home directory, which is often far too small even for our
small projects. Thus, you need to set it as follows:
mkdir -p /scratch/$USER/.singularity/cache
export SINGULARITY_CACHEDIR=/scratch/$USER/.singularity/cache
Python
If you use a Python venv, do not place it in your home directory but under
/project or /scratch.
module load python3.8
python -m venv $SCRATCH/ENV3
source $SCRATCH/ENV3/bin/activate
If this succeeds, you can also place the source line in your .bashrc
file.
If you use conda and Python, we also recommend that you create a
venv from the conda Python, so that you have a copy of it in ENV3; if
something goes wrong, it is easy to recreate it from your default
Python. If you follow this approach, please help us improve the instructions here.
On your computer, activate ENV3 and install the following packages to enable the cloudmesh commands:
pip install pip -U
pip install cloudmesh-common
pip install cloudmesh-rivanna
pip install cloudmesh-sbatch
pip install cloudmesh-vpn
On Rivanna, activate ENV3 and install the following packages, which also include the GPU monitor (cloudmesh-gpu):
pip install pip -U
pip install cloudmesh-common
pip install cloudmesh-gpu
pip install cloudmesh-rivanna
pip install cloudmesh-sbatch
Note: Please send me a mail to laszewski@gmail.com
if any requirements are missing as I may not yet have included
all of them in the pip package.
Once you have activated ENV3, the cloudmesh rivanna command shows you combinations
of SBATCH flags that you can use.
To see them, type in
To log into a specific node, you can say (let us assume you would like to log into a K80 node)
Please be reminded that interactive login is only allowed for debugging; all
jobs must be submitted through sbatch.
To get the directives template to use that GPU, use
cloudmesh sbatch
Cloudmesh-sbatch is a super cool extension to sbatch that allows you to automatically
run parameter studies by creating permutations of experiment parameters.
At this time we are creating some sample applications, but you can also arrange
a 30-minute meeting with Gregor to set it up for your application with his help.
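To illustrate the idea of a parameter study (this is not the cloudmesh-sbatch interface, whose options are described in its own documentation), the following sketch writes one batch script per combination of two experiment parameters, using the a100 directives from the SLURM Batch Parameters section. The parameter names (epochs, batch size), the script naming scheme, and train.py are hypothetical.

```bash
# Hypothetical sketch: generate one Slurm script per combination of the
# values in EPOCHS and BATCH_SIZES (their Cartesian product).
generate_jobs() {
    outdir=$1
    mkdir -p "$outdir"
    for epochs in $EPOCHS; do
        for bs in $BATCH_SIZES; do
            script="$outdir/job_e${epochs}_b${bs}.slurm"
            # single quotes keep the shebang literal
            printf '%s\n' '#!/bin/bash' > "$script"
            cat >> "$script" <<EOF
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=12:00:00
#SBATCH --partition=bii-gpu
#SBATCH --account=bii_dsc_community
#SBATCH --gres=gpu:a100:1
#SBATCH --job-name=e${epochs}_b${bs}
#SBATCH --output=%u-%j.out
#SBATCH --error=%u-%j.err
python train.py --epochs ${epochs} --batch-size ${bs}
EOF
            echo "$script"
        done
    done
}

# Hypothetical parameter values: 2 x 3 = 6 experiments
EPOCHS="10 20"
BATCH_SIZES="32 64 128"
```

After generate_jobs jobs, you could submit all experiments with for f in jobs/*.slurm; do sbatch "$f"; done. Generating plain scripts first has the advantage that you can inspect each one before anything is submitted.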
See also:
cloudmesh vpn command
Cloudmesh has a simple command line vpn command that you can use to switch
the VPN for UVA on and off (and other VPNs; we can add that feature ;-)).
cms vpn connect
... do your work in vpn such as working on rivanna
cms vpn disconnect
... work on your regular network
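If you only need the VPN for a single command, a small wrapper can connect, run the command, and disconnect again even if the command fails. This is a sketch built only on the cms vpn connect and cms vpn disconnect commands shown above; the function name with_uva_vpn is hypothetical.

```bash
# Hypothetical helper: run a command with the UVA VPN switched on, switch it
# off again afterwards, and return the command's own exit status.
with_uva_vpn() {
    cms vpn connect || return 1
    "$@" && status=0 || status=$?   # remember success or failure of the command
    cms vpn disconnect
    return "$status"
}
```

For example, with_uva_vpn ssh rivanna hdquota checks your quota and leaves you on your regular network afterwards.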
Load modules
Modules are preconfigured packages that allow you to load specific
software into your environment without needing to
install it from source.
To find out more about a particular package such as cmake, you can use
the command
module spider cmake # check whether cmake is available and details
Load the needed module (you can add version info). Note that some
modules depend on other modules (clang/10.0.1 depends on gcc/9.2.0),
so gcc needs to be loaded first.
# module load gcc/9.2.0 clang/10.0.1
module load clanggcc
module load cmake/3.23.3 git/2.4.1 ninja/1.10.2-py3.8 llvm cuda/11.4.2
module list # check currently loaded modules
module purge # unload (clean) all modules
Request GPUs to use interactively
The -A option specifies the allocation (Slurm account) that is charged for the job, for example bii_dsc_community.
ijob -c number_of_cpus \
-A group_name \
-p queue_name \
--gres=gpu:gpu_model:number_of_gpus \
--time=day-hours:minutes:seconds
An example to request 1 CPU with 1 A100 GPU for 10 minutes in the gpu partition is
ijob -c 1 -A bii_dsc_community -p gpu --gres=gpu:a100:1 --time=0-00:10:00
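The --time value uses Slurm's days-hours:minutes:seconds notation, the same notation as the time limits in the partition table. The following sketch converts such a value to seconds so that you can compare a request against a partition limit before submitting. It only handles the D-HH:MM:SS and HH:MM:SS forms (Slurm accepts more formats), and the function names are hypothetical.

```bash
# Hypothetical helper: convert a Slurm time of the form D-HH:MM:SS or
# HH:MM:SS to seconds (other Slurm time formats are not handled).
slurm_time_to_seconds() {
    spec=$1
    days=0
    rest=$spec
    case $rest in
        *-*) days=${rest%%-*}; rest=${rest#*-} ;;   # split off the day count
    esac
    old_ifs=$IFS
    IFS=:
    set -- $rest                # split HH:MM:SS into $1 $2 $3
    IFS=$old_ifs
    if [ $# -ne 3 ]; then
        echo "unsupported time format: $spec" >&2
        return 1
    fi
    # expr parses fields such as "08" as decimal; it exits with 1 when the
    # result is 0, which is treated as success here
    expr "$days" \* 86400 + "$1" \* 3600 + "$2" \* 60 + "$3" || [ $? -eq 1 ]
}

# Hypothetical helper: succeed if the requested time fits within the limit.
fits_time_limit() {
    req=$(slurm_time_to_seconds "$1") || return 1
    lim=$(slurm_time_to_seconds "$2") || return 1
    [ "$req" -le "$lim" ]
}
```

For example, fits_time_limit 0-00:10:00 1:00:00 succeeds because a 10-minute request fits into the 1-hour limit of the dev partition.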
Rivanna has different partitions with different resource availability
and charging rates. dev is free but limited to 1 hour for each
session/allocation, and no GPU is available. To list the different
partitions, use qlist.
Last checked July 28th; note that these values may change.
| Queue (partition) | Total Cores | Free Cores | Jobs Running | Jobs Pending | Time Limit | SU Charge |
|---|---|---|---|---|---|---|
| bii | 4640 | 3331 | 31 | 15 | 7-00:00:00 | 1 |
| standard | 4080 | 496 | 1209 | 5670 | 7-00:00:00 | 1 |
| dev | 160 | 86 | | 5 | 1:00:00 | 0 |
| parallel | 4880 | 1594 | 21 | 3 | 3-00:00:00 | 1 |
| instructional | 480 | 280 | | 16 | 3-00:00:00 | 1 |
| largemem | 144 | 80 | 2 | 1 | 4-00:00:00 | 1 |
| gpu | 1876 | 1066 | 99 | 210 | 3-00:00:00 | 3 |
| bii-gpu | 608 | 542 | 18 | 1 | 3-00:00:00 | 1 |
| bii-largemem | 288 | 224 | | | 7-00:00:00 | 1 |
To list the limits, use the command qlimits.
Last checked July 28th; note that these values may change.
| Queue (partition) | Maximum Submit | Maximum Cores(GPU)/User | Minimum Cores/Job | Maximum Mem/Node (MB) | Maximum Mem/Core (MB) | Default Mem/Core (MB) | Maximum Nodes/Job | Minimum Nodes/Job |
|---|---|---|---|---|---|---|---|---|
| bii | 10000 | cpu=400 | | 354000+ | | 9400 | 112 | |
| standard | 10000 | cpu=1000 | | 384000+ | | 9000 | 1 | |
| dev | 10000 | cpu=16 | | 384000 | 9000 | 6000 | 2 | |
| parallel | 2000 | cpu=1500 | 4 | 384000 | 9600 | 9000 | 50 | 2 |
| instructional | 2000 | cpu=20 | | 384000 | | 6000 | 5 | |
| largemem | 2000 | cpu=32 | | 1500000 | 64000 | 60000 | 2 | |
| gpu | 10000 | gres/gpu=32 | | 128000+ | 32000 | 6000 | 4 | |
| bii-gpu | 10000 | | | 384000+ | | 9400 | 12 | |
| bii-largemem | 10000 | | | 1500000 | | 31000 | 2 | |
Linux commands for HPC
Many useful commands can be found in Gregor’s book at
The following additional commands are quite useful on HPC systems
| command | description |
|---|---|
| allocations | check available accounts and balances |
| hdquota | check how much storage you have used |
| du -h --max-depth=1 | check which directory uses the most space |
| qlist | list the queues |
| qlimits | print the limits of the queues |
SLURM Batch Parameters
We present next a number of default parameters for using a variety of
GPUs on Rivanna. Please note that you may need to adapt some
parameters, such as cores or memory, to your
application.
Running on v100
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=12:00:00
#SBATCH --partition=bii-gpu
#SBATCH --account=bii_dsc_community
#SBATCH --gres=gpu:v100:1
#SBATCH --job-name=MYNAME
#SBATCH --output=%u-%j.out
#SBATCH --error=%u-%j.err
Running on a100-40GB
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=12:00:00
#SBATCH --partition=bii-gpu
#SBATCH --account=bii_dsc_community
#SBATCH --gres=gpu:a100:1
#SBATCH --job-name=MYNAME
#SBATCH --output=%u-%j.out
#SBATCH --error=%u-%j.err
Running on special fox node a100-80GB
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=12:00:00
#SBATCH --partition=bii-gpu
#SBATCH --account=bii_dsc_community
#SBATCH --gres=gpu:a100:1
#SBATCH --job-name=MYNAME
#SBATCH --output=%u-%j.out
#SBATCH --error=%u-%j.err
#SBATCH --reservation=bi_fox_dgx
#SBATCH --constraint=a100_80gb
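The three directive sets above differ only in the GPU type and, for the 80 GB node, the reservation and constraint lines, so one small function can produce all of them. This sketch writes a complete batch script with exactly those directives; the function name write_gpu_job, the type labels v100/a100-40/a100-80, and the script body (nvidia-smi followed by a hypothetical python myscript.py) are assumptions to adapt.

```bash
# Hypothetical helper: write a Slurm batch script for one of the GPU
# configurations listed above (v100, a100-40, a100-80).
write_gpu_job() {
    gpu=$1
    script=$2
    extra=""
    case $gpu in
        v100)    gres=v100 ;;
        a100-40) gres=a100 ;;
        a100-80) gres=a100
                 # the 80 GB node additionally needs reservation and constraint
                 extra="#SBATCH --reservation=bi_fox_dgx
#SBATCH --constraint=a100_80gb" ;;
        *) echo "unknown GPU type: $gpu" >&2; return 1 ;;
    esac
    {
        echo '#!/bin/bash'
        echo '#SBATCH --nodes=1'
        echo '#SBATCH --ntasks=1'
        echo '#SBATCH --time=12:00:00'
        echo '#SBATCH --partition=bii-gpu'
        echo '#SBATCH --account=bii_dsc_community'
        echo "#SBATCH --gres=gpu:${gres}:1"
        echo '#SBATCH --job-name=MYNAME'
        echo '#SBATCH --output=%u-%j.out'
        echo '#SBATCH --error=%u-%j.err'
        if [ -n "$extra" ]; then echo "$extra"; fi
        echo 'nvidia-smi          # show the GPU assigned to the job'
        echo 'python myscript.py  # hypothetical workload'
    } > "$script"
}
```

For example, write_gpu_job v100 v100.slurm && sbatch v100.slurm submits a V100 job; an unknown type such as k80 is rejected instead of silently producing a wrong script.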
Some suggestions
When compiling large projects, you may need to make sure you have
enough time and memory to conduct such compiles. This can best be
achieved by using an interactive node, possibly from the large memory
partition.
References
Help Support
When requesting help from Gregor or anyone else, make sure to completely specify the issue; many things cannot be solved if you are not
clear about the issue and where it is occurring. Include:
- The issue you are encountering.
- Where it is occurring.
- What you have done to try to resolve the issue.
A good example is:
I ran the application xyz, from url xyz on Rivanna. I placed code in
the directory /project/…. or I placed the data in /project/… The
download worked and I placed about 600GB. However when I uncompress
the data with the command xyz I get the error xyz. What should we do now?
6.5.3 - Rivanna Pod
Rivanna
This documentation is so far only useful for beta testers. In this
group we have
The Rivanna documentation for the BasePOD is available at
https://www.rc.virginia.edu/userinfo/rivanna/basepod/
Introducing the NVIDIA DGX BasePOD
Rivanna contains a BasePOD with
- 10 DGX A100 nodes
- 8 A100 GPU devices (per node)
- 2 TB local node memory (per node)
- 80 GB GPU memory (per GPU device)
The following Advanced Features have now been enabled on the BasePOD:
- NVLink for fast multi-GPU communication
- GPUDirect RDMA Peer Memory for fast multi-node multi-GPU communication
- GPUDirect Storage with 200 TB IBM ESS3200 (NVMe) SpectrumScale storage array
What this means to you is that the POD is ideal for the following scenarios:
- The job needs multiple GPUs and/or even multiple nodes.
- The job (can be single- or multi-GPU) is I/O intensive.
- The job (can be single- or multi-GPU) requires more than 40 GB GPU
memory. (We have 12 A100 nodes in total, 10 of which are the POD and
2 are regular with 40 GB GPU memory per device.)
Detailed specs can be found in the official document (Chapter 3.1):
Accessing the POD
Allocation
A single job can request up to 4 nodes with 32
GPUs. Before running multi-node jobs, please make sure your code scales
well to 8 GPUs on a single node.
Slurm script
Please include the following lines:
#SBATCH -p gpu
#SBATCH --gres=gpu:a100:X # replace X with the number of GPUs per node
#SBATCH -C gpupod
Open OnDemand
In the Optional: Slurm Option field, write:
-C gpupod
Interactive login
Interactive login to the nodes should be VERY limited; for most activities,
you need to use the batch queue. In case you need to look at things
interactively, you can use our cloudmesh program to do so.
Make sure to have the VPN enabled and cloudmesh-rivanna installed via pip.
cms rivanna login a100-pod
This logs you into a node. The time is set by default to 30 minutes.
Please log out immediately after you are done with your interactive
work.
Usage examples
Deep learning
We will be migrating toward NVIDIA’s NGC containers for deep learning
frameworks such as PyTorch and TensorFlow, as they have been heavily
optimized to achieve excellent multi-GPU performance. These containers
have not yet been installed as modules but can be accessed under
/share/resources/containers/singularity:
- pytorch_23.03-py3.sif
- tensorflow_23.03-tf1-py3.sif
- tensorflow_23.03-tf2-py3.sif
(NGC has its own versioning scheme. The PyTorch, TensorFlow 1, and TensorFlow 2
versions are 2.0.0, 1.15.5, and 2.11.0, respectively.)
The singularity command is of the form:
singularity run --nv /path/to/sif python /path/to/python/script
Warning: Distributed training is not automatic! Your code must be
parallelizable. If you are not familiar with this concept, please
visit:
- TF distributed training: https://www.tensorflow.org/guide/distributed_training
- PyTorch DDP: https://pytorch.org/docs/stable/notes/ddp.html
MPI codes
Please check the manual for your code regarding the relationship
between the number of MPI ranks and the number of GPUs. For
computational chemistry codes (e.g. VASP, QuantumEspresso, LAMMPS) the
two are often equal, e.g.
#SBATCH --gres=gpu:a100:8
#SBATCH --ntasks-per-node=8
If you are building your own code, please load the modules nvhpc and
cuda, which provide NVIDIA compilers and CUDA libraries. The compute
capability of the POD A100 is 8.0.
For documentation and demos, refer to the Resources section at the
bottom of this page: https://developer.nvidia.com/hpc-sdk
We will be updating our website documentation gradually in the near
future as we iron out some operational specifics. GPU-enabled modules
are now marked with a (g) in the output of the module avail command, as shown
below:
TODO: output from module avail to be included
6.5.4 - Rivanna and Singularity
Singularity.
Singularity
Singularity is a container runtime that implements a unique security model to
mitigate privilege escalation risks and provides a platform to capture a complete
application environment into a single file (SIF).
Singularity is often used in HPC centers.
The University of Virginia has granted us special permission to create
Singularity images on Rivanna. We discuss here how to build and run
Singularity images.
Access
In order to access Singularity and build images, you must be in the
following groups:
- biocomplexity
- nssac_students
- bii_dsc_community
To find out if you are, ssh into Rivanna and issue the command
If any of the groups is missing, please send Gregor an e-mail at
laszewski@gmail.com.
Singularity cache
Before you can build images, you need to set the Singularity cache directory. By default, the cache
is created in your home directory, which is often far too small even for our small projects. Thus, you need to
set it as follows:
rivanna>
mkdir -p /scratch/$USER/.singularity/cache
export SINGULARITY_CACHEDIR=/scratch/$USER/.singularity/cache
Please remember that scratch is not permanent. If you would like a more permanent location, you can alternatively use
rivanna>
mkdir -p /project/bii_dsc_community/$USER/.singularity/cache
export SINGULARITY_CACHEDIR=/project/bii_dsc_community/$USER/.singularity/cache
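To avoid retyping these lines, you could add a small function to your ~/.bashrc that creates the cache directory below a base directory of your choice and exports SINGULARITY_CACHEDIR. This is a sketch; the function name set_singularity_cache is hypothetical, and it defaults to the scratch location shown above.

```bash
# Hypothetical helper: create a Singularity cache below the given base
# directory (default: /scratch/$USER) and export SINGULARITY_CACHEDIR.
set_singularity_cache() {
    base=${1:-/scratch/$USER}
    cache="$base/.singularity/cache"
    mkdir -p "$cache" || return 1
    export SINGULARITY_CACHEDIR="$cache"
    echo "SINGULARITY_CACHEDIR=$SINGULARITY_CACHEDIR"
}
```

Use set_singularity_cache for the scratch location, or set_singularity_cache /project/bii_dsc_community/$USER for the more permanent project location.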
build.def
To build an image, you will need a build definition file.
We show next an example of a simple build.def file that
internally uses an
NVIDIA NGC PyTorch container.
Bootstrap: docker
From: nvcr.io/nvidia/pytorch:23.02-py3
Next you can follow the steps that are detailed in
https://docs.sylabs.io/guides/3.7/user-guide/definition_files.html#sections
However, for Rivanna we MUST create the image as discussed next.
Creating the Singularity Image
In order to create a Singularity container from the
build.def file, please log in to either of the following special nodes
on Rivanna:
biihead1.bii.virginia.edu
biihead2.bii.virginia.edu
For example:
ssh $USER@biihead1.bii.virginia.edu
where $USER is your computing ID on Rivanna.
Now that you are logged in to the special node, you can create the
singularity image with the following command:
sudo /opt/singularity/3.7.1/bin/singularity build output_image.sif build.def
Note: It is important that you type in exactly this command. If you modify
the name output_image.sif or build.def, the command will not work and you will
receive an authorization error.
If you need to give the image a better name, please use the mv
command.
If you also need to use a different name than build.def,
the following Makefile is very useful. We assume you use myimage.def
and myimage.sif. Include it into a Makefile such as:
BUILD=myimage.def
IMAGE=myimage.sif

.PHONY: image clean

# Note: recipe lines must start with a TAB character
image:
	cp ${BUILD} build.def
	sudo /opt/singularity/3.7.1/bin/singularity build output_image.sif build.def
	cp output_image.sif ${IMAGE}
	$(MAKE) clean

clean:
	rm -rf build.def output_image.sif
Having such a Makefile
will allow you to use the command
and the image myimage.sif
will be created. With make clean, you will
delete the temporary files build.def
and output_image.sif.
Create a singularity image for tensorflow
TODO
Work with Singularity container
Now that you have an image, you can use it while using the
documentation provided at
https://www.rc.virginia.edu/userinfo/rivanna/software/containers/
Run GPU images
To use NVIDIA GPUs with Singularity, the --nv
flag is needed.
singularity exec --nv output_image.sif python myscript.py
TODO: THE NEXT PARAGRAPH IS WRONG
Since Python is defined as the default command to be executed, and
singularity passes the argument(s) after the image name,
i.e. myscript.py, to the Python interpreter, the above singularity
command is equivalent to
singularity run --nv output_image.sif myscript.py
Run Images Interactively
ijob -A mygroup -p gpu --gres=gpu -c 1
module purge
module load singularity
singularity shell --nv output_image.sif
Singularity Filesystem on Rivanna
The following paths are exposed to the container by default
- /tmp
- /proc
- /sys
- /dev
- /home
- /scratch
- /nv
- /project
Adding Custom Bind Paths
For example, the following command binds the /scratch/$USER directory
into the container, while the -c (--contain) option keeps the other user
directories provided by the host from being mounted:
singularity run -c -B /scratch/$USER output_image.sif
To add the /home directory on the host as /rivanna/home inside the container:
singularity run -c -B /home:/rivanna/home output_image.sif
FAQ
Adding singularity to slurm scripts
TBD
Running on v100
TBD
Running on a100-40GB
TBD
Running on a100-80GB
TBD
Running on special fox node a100-80GB
TBD
6.6 - Web Sites
Web Sites
6.6.1 - Create infomall.org
Description on how to create infomall.org
We assume you have Hugo and cloudmesh-vpn installed.
You need to have Python 3.
python -m venv ~/ENV3
source ~/ENV3/bin/activate # if windows in gitbash source ~/ENV3/Scripts/activate
pip install cloudmesh-vpn -U
cms help
Creating a draft
To create a new version of the code from the repository, use
git clone git@github.com:DSC-SPIDAL/infomall-org-uva.git
cd infomall-org-uva
make serve
To view the content, say
Publish
The Web site is currently published by Gregor as follows. No one else may publish it.
cms vpn info # make sure the VPN is set to UVA
cms vpn connect # only needed if vpn is off
make huge
make rsync
cms vpn disconnect #optional to make sure vpn is off
6.7.1 - Windows for Research
We describe how to set up your Windows computer for research so that it has similar features as a Linux Machine.
Windows is a fundamentally different operating system from Linux and
macOS. Many coding environments are adapted to Linux, so Windows users
must properly configure their machine to prepare it for a project.
This is of special importance when working in environments supporting
distributed cyberinfrastructure, where in many cases Linux is required.
Setting Up the Python Environment
Often you need a specific version of Python. If in doubt, please
install the newest one. At the time of writing this document, it
is Python 3.10.5.
Please download and install it from python.org.
We recommend that you uninstall Anaconda if you used it before
and use the version from python.org instead.
Development is easier when using a native
Python installation instead of anaconda/conda.
To uninstall anaconda, press the Windows key
and type “Add or remove programs”. Then, press
Enter and search for conda
in the “Search this
list” box. Remove everything related to anaconda.
Note that anaconda may have set some environment
variables or added configuration scripts to your .bashrc
files in case you use gitbash. Please, remove them and make sure your
Python version from python.org works as expected.
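After uninstalling, you can look for conda leftovers with a small script. This is a minimal sketch, assuming that conda installations contain a conda-meta directory and that conda sets variables such as CONDA_PREFIX, CONDA_DEFAULT_ENV and CONDA_EXE. The function names are hypothetical, and the script only reports problems. It does not change any file.

```python
import os
import sys
from pathlib import Path


def conda_lines(text):
    """Return (line_number, line) pairs that mention conda or anaconda."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if "conda" in line.lower()
    ]


def is_conda_prefix(prefix):
    """Conda installations contain a 'conda-meta' directory."""
    return (Path(prefix) / "conda-meta").is_dir()


if __name__ == "__main__":
    # sys.base_prefix points at the base installation, even inside a venv
    if is_conda_prefix(sys.base_prefix):
        print("The running Python comes from conda:", sys.base_prefix)
    for name in ("CONDA_PREFIX", "CONDA_DEFAULT_ENV", "CONDA_EXE"):
        if name in os.environ:
            print(f"Environment variable {name} is still set")
    bashrc = Path.home() / ".bashrc"
    if bashrc.exists():
        for number, line in conda_lines(bashrc.read_text(errors="replace")):
            print(f"{bashrc}:{number}: {line}")
```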
To code in Python, we recommend using PyCharm rather than VSCode.
PyCharm and Git Bash can be installed with the instructions found in
Install.
Installation can be simplified by using Chocolatey, which can install:
- Git Bash
- PyCharm
- emacs
- Docker
Before installing Docker, however, you have to enable the appropriate
hypervisor settings at boot time (in the BIOS). Please let us know how
you set them on your machine so we can add this information here.
You will likely have to research it for your hardware.
Please note that you MUST uninstall VirtualBox before you install Docker.
Old versions of VirtualBox are incompatible with Docker, and it is easier
to uninstall VirtualBox and reinstall it afterwards.
Next, we summarize the installations using Chocolatey.
We recommend that you read the entire section before installing anything,
especially before installing Docker on a computer that is not brand new.
Install Chocolatey
To install chocolatey, follow the tutorial
at https://github.com/cybertraining-dsc/cybertraining-dsc.github.io/blob/main/content/en/docs/tutorial/reu/chocolatey/index.md
Install Git Bash
Git Bash must be installed with a specific configuration, because
differences between Windows and other operating systems can cause
errors at runtime if it is not configured properly. If Git Bash is
already installed, uninstall it: press the Windows key, type Add or
remove programs, and press Enter. Then locate and uninstall Git Bash.
To install Git Bash with Chocolatey, issue the following command in a
PowerShell opened with Run as Administrator:
$ choco install git.install --params "/GitAndUnixToolsOnPath /Editor:Nano /PseudoConsoleSupport /NoAutoCrlf" -y
The /NoAutoCrlf option makes Git check out files as-is, so they are not
converted to Windows (CRLF) line endings. Files with such line endings
can cause programming bugs, for example in shell scripts run on Linux.
Programmers should commit files as-is to avoid line-ending conflicts.
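To find files that already contain Windows line endings, you can scan them for CRLF byte pairs. This is a minimal sketch. The function names are hypothetical, and the optional --fix flag rewrites the files in place, so run it only on text files that are under version control. Git can also report line endings per file with git ls-files --eol.

```python
import sys
from pathlib import Path


def has_crlf(data: bytes) -> bool:
    """True if the bytes contain Windows (CRLF) line endings."""
    return b"\r\n" in data


def to_lf(data: bytes) -> bytes:
    """Convert CRLF line endings to LF, leaving everything else unchanged."""
    return data.replace(b"\r\n", b"\n")


if __name__ == "__main__":
    # Usage: python crlf.py FILE... [--fix]
    fix = "--fix" in sys.argv[1:]
    for name in (arg for arg in sys.argv[1:] if arg != "--fix"):
        path = Path(name)
        data = path.read_bytes()
        if has_crlf(data):
            print(f"{path}: CRLF line endings")
            if fix:
                path.write_bytes(to_lf(data))
```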
Install PyCharm, emacs, and Docker
Uninstall PyCharm Community Edition if it is already installed on the
computer: press the Windows key, type Add or remove programs, and press
Enter. Then locate and uninstall PyCharm.
The following command installs PyCharm Professional, emacs, and Docker
Desktop. You must have Chocolatey installed to run it:
$ choco install pycharm emacs docker-desktop -y
PyCharm is advantageous over other IDEs such
as VSCode because students receive the professional
version of PyCharm for free. Furthermore, PyCharm
offers robust features such as Refactor and Inspect
Code.
A guide to activating PyCharm with a free professional
license is available at https://youtu.be/QPESX-VBnEU
Set hard wrap
Press Ctrl + Alt + S in PyCharm and expand the Editor menu on the
left-hand side. Then click Code Style, enter 79 in the Hard wrap at:
box, and check the Wrap on typing checkbox.
This wraps the text in files uniformly at 79 columns, which matches
the maximum line length recommended by PEP 8.
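To check existing files against the 79-column limit, a few lines of Python are enough. This is a minimal sketch, and the function name long_lines is hypothetical. Linters such as pycodestyle report the same problem as E501.

```python
import sys


def long_lines(text, limit=79):
    """Return (line_number, length) for every line longer than limit characters."""
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]


if __name__ == "__main__":
    # Usage: python long_lines.py FILE...
    for name in sys.argv[1:]:
        with open(name, encoding="utf-8") as handle:
            for number, length in long_lines(handle.read()):
                print(f"{name}:{number}: {length} > 79 characters")
```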
Recipe lines in a Makefile must start with a tab character, not spaces.
To change what the Tab key does in a Makefile, open a Makefile in
PyCharm and click Tab in the bottom-right corner of the PyCharm window.
If you cannot find the Tab button, open the View menu, go to
Appearance, and make sure Status Bar is checked.
After clicking the Tab button in the bottom-right corner, click
Configure Indents for Makefile... and set the tab size to 4.
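An editor that silently inserts spaces produces Makefiles that make rejects. The following sketch flags recipe lines indented with spaces. It is a simple heuristic, not a Makefile parser. It treats a line such as "target: prerequisites" as the start of a rule and ends the recipe at the first blank or unindented line. The function name is hypothetical. It also ignores GNU make's .RECIPEPREFIX setting.

```python
import re
import sys

# "target: prerequisites" starts a rule; ":=" is an assignment, not a rule
RULE = re.compile(r"^[^\s#:=][^:=]*:(?!=)")


def space_indented_recipes(text):
    """Return line numbers of recipe lines that start with spaces, not a tab."""
    problems = []
    in_rule = False
    for number, line in enumerate(text.splitlines(), start=1):
        if RULE.match(line):
            in_rule = True
        elif not line.strip():
            in_rule = False  # heuristic: a blank line ends the recipe
        elif line.startswith("\t"):
            pass  # correctly indented recipe line
        elif line.startswith(" ") and in_rule:
            problems.append(number)
        else:
            in_rule = False
    return problems


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "Makefile"
    with open(path, encoding="utf-8") as handle:
        for number in space_indented_recipes(handle.read()):
            print(f"{path}:{number}: recipe line indented with spaces, use a tab")
```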
If PyCharm does not render your Makefile correctly, right-click the
Makefile in your open file tabs and click Override File Type. If you
cannot find Makefile in the list, install the Makefile Language plugin
for PyCharm.
Preparing for Virtualization
Docker
To enable virtualization for Docker on Windows machines, some
preparations are needed. First, if you have VirtualBox installed, we
suggest that you uninstall it and reinstall it later if necessary. Some
older versions of VirtualBox do not work alongside other virtualization
software such as the Windows Subsystem for Linux (WSL).
Next, change the BIOS settings to enable virtualization. To do this,
search for Advanced startup in the Windows search bar and click Restart
now. Then click Troubleshoot, Advanced options, and UEFI Firmware
Settings to get into the BIOS. NOTE: These instructions are not
exhaustive, because computer brands and hardware differ vastly. The main
objective is to get into the BIOS and look for any Virtualization or
Hyper-V options in its configuration.
For example, Lenovo laptops have a Configure tab in the BIOS, and the
virtualization settings must be enabled under that menu.
Then exit the BIOS, saving your changes.
Better documentation on enabling virtualization, recommended by Docker
and created by Berkeley, is located at https://bce.berkeley.edu/enabling-virtualization-in-your-pc-bios.html
Lastly, check the Windows features with Turn Windows features on or off.
For Docker, Hyper-V and Containers must be enabled.
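After rebooting, you can check whether the firmware setting took effect. The Windows systeminfo command prints a Hyper-V Requirements section. On many systems this section includes a "Virtualization Enabled In Firmware" line. When a hypervisor is already running, it shows a note instead. The sketch below assumes English output, because the labels are localized. The function name is hypothetical.

```python
import subprocess


def virtualization_status(systeminfo_text):
    """Interpret the Hyper-V section of the Windows 'systeminfo' output.

    Returns True or False when the firmware line is present, and None when
    it is absent (for example because a hypervisor is already running).
    """
    for line in systeminfo_text.splitlines():
        if "Virtualization Enabled In Firmware" in line:
            return line.split(":", 1)[1].strip().lower() == "yes"
    return None


if __name__ == "__main__":
    output = subprocess.run(["systeminfo"], capture_output=True, text=True).stdout
    status = virtualization_status(output)
    if status is None:
        print("No firmware line found; a hypervisor may already be running")
    else:
        print("Virtualization enabled in firmware:", status)
```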
WSL
WSL (Windows Subsystem for Linux) lets you run a Linux environment
directly on Windows. WSL 2 is typically used rather than WSL 1. To install
it, type this into an administrator PowerShell:
PS> wsl --install
To install a particular distribution, use wsl --install -d <DistroName>
instead. The available distributions can be listed with
PS> wsl --list --online
After WSL is installed, it can be accessed by typing wsl in
PowerShell. More documentation can be found in the official Microsoft
documentation.
Directories in WSL
WSL makes your Windows drives available inside the Linux environment
under /mnt, so the C: drive is /mnt/c. For example, your Windows home
directory, typically C:\Users\USERNAME (abbreviated to ~ in Git Bash),
is /mnt/c/Users/USERNAME/ in WSL. So to change directories to the
Desktop in WSL, use this command:
$ cd /mnt/c/Users/USERNAME/Desktop
where USERNAME is to be replaced with the name of the user.
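The path translation follows a simple rule: lower-case the drive letter, put it under /mnt, and turn backslashes into slashes. The following sketch assumes WSL's default /mnt mount root and handles only absolute drive-letter paths. The function name is hypothetical. Inside WSL, the wslpath utility performs this conversion for you.

```python
from pathlib import PurePosixPath, PureWindowsPath


def windows_to_wsl(path, mount_root="/mnt"):
    """Translate a Windows path such as C:\\Users\\NAME to /mnt/c/Users/NAME."""
    win = PureWindowsPath(path)
    # Only absolute paths with a drive letter (e.g. C:\...) can be mapped
    if len(win.drive) != 2 or not win.root:
        raise ValueError(f"not an absolute drive-letter path: {path!r}")
    drive = win.drive[0].lower()
    return str(PurePosixPath(mount_root, drive, *win.parts[1:]))


if __name__ == "__main__":
    print(windows_to_wsl(r"C:\Users\USERNAME\Desktop"))
```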
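bashrc
The following ~/.bashrc starts ssh-agent when needed, loads your ssh key,
activates the ENV3 virtual environment, and changes to the ~/cm directory: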
env=~/.ssh/agent.env
agent_load_env () { test -f "$env" && . "$env" >| /dev/null ; }
agent_start () {
(umask 077; ssh-agent >| "$env")
. "$env" >| /dev/null ; }
agent_load_env
# agent_run_state: 0=agent running w/ key; 1=agent w/o key; 2=agent not running
agent_run_state=$(ssh-add -l >| /dev/null 2>&1; echo $?)
if [ ! "$SSH_AUTH_SOCK" ] || [ $agent_run_state = 2 ]; then
agent_start
ssh-add
elif [ "$SSH_AUTH_SOCK" ] && [ $agent_run_state = 1 ]; then
ssh-add
fi
unset env
source ~/ENV3/Scripts/activate
cd ~/cm
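The if/elif block in this .bashrc decides what to do from two inputs: whether SSH_AUTH_SOCK is set, and the exit status of ssh-add -l. The sketch below mirrors that decision in Python so the three cases are easy to see. The function name is hypothetical, and the script only reports what the .bashrc would do. It does not start an agent.

```python
import os
import subprocess


def agent_action(auth_sock_set, run_state):
    """Mirror the decision made in the .bashrc.

    run_state is the exit status of 'ssh-add -l':
    0 = agent running with a key, 1 = agent without a key,
    2 = agent not reachable.
    """
    if not auth_sock_set or run_state == 2:
        return "start agent and add key"
    if run_state == 1:
        return "add key"
    return "nothing to do"


if __name__ == "__main__":
    state = subprocess.run(["ssh-add", "-l"], capture_output=True).returncode
    print(agent_action(bool(os.environ.get("SSH_AUTH_SOCK")), state))
```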
6.7.2 - Windows Git Bash
We describe how to set up Git Bash on Windows, as it not only comes with a Git client but also provides a bash terminal directly on Windows. Since bash is used on many Linux machines, including most supercomputers, you do not have to learn two different terminal environments.
Git Bash is the terminal of choice for the Windows operating
system. However, it must be properly configured for an optimal
Python development experience; for example, Pseudo Console Support
must be enabled.
First, uninstall Git Bash if it is already installed.
If you installed Git with choco, run choco uninstall git and
choco uninstall git.install.
If you did not install Git with choco and instead used the installer
wizard from the Git website, press the Windows key, search for Add or
remove programs, search for Git, click on it, click Uninstall, and
complete the uninstallation wizard.
If you do not have Chocolatey, follow the tutorial at
https://chocolatey.org/install.
Then install Git Bash by executing the following choco command in a
PowerShell opened with Run as Administrator:
$ choco install git.install --params "/GitAndUnixToolsOnPath /Editor:Nano /PseudoConsoleSupport /NoAutoCrlf" -y
For good measure, execute the following in Git Bash to enforce
LF line endings:
$ git config --global core.autocrlf false
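To confirm the setting, you can read it back with git config --get. This is a minimal sketch, and the function names are hypothetical. An unset global value is not reported as disabled, because a system-level Git configuration may still enable conversion.

```python
import subprocess

# Spellings that git accepts for boolean false
FALSE_VALUES = {"false", "no", "off", "0"}


def is_autocrlf_disabled(value):
    """True if a core.autocrlf value disables line-ending conversion."""
    return value is not None and value.strip().lower() in FALSE_VALUES


def global_autocrlf():
    """Return the global core.autocrlf value, or None if it is not set."""
    result = subprocess.run(
        ["git", "config", "--global", "--get", "core.autocrlf"],
        capture_output=True,
        text=True,
    )
    # 'git config --get' exits with status 1 when the key is not set
    return result.stdout if result.returncode == 0 else None


if __name__ == "__main__":
    value = global_autocrlf()
    print("core.autocrlf =", value.strip() if value else "(not set globally)")
    print("conversion disabled:", is_autocrlf_disabled(value))
```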
Also, generate an ssh key:
$ ssh-keygen
# press Enter to save it to the default location
# enter a strong, memorable passphrase and confirm it
If you do not yet have the ENV3 Python virtual environment or the ~/cm
directory, create them with these commands:
$ python -m venv ~/ENV3
$ mkdir ~/cm
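The same two steps can be done from Python with the standard venv module, which is convenient in setup scripts that must not fail when the directories already exist. This is a minimal sketch. The function name ensure_env is hypothetical. It treats a directory that contains pyvenv.cfg as an existing virtual environment.

```python
import venv
from pathlib import Path


def ensure_env(env_dir, work_dir, with_pip=True):
    """Create the virtual environment and work directory if they are missing.

    Returns the list of paths that were created.
    """
    created = []
    env_dir, work_dir = Path(env_dir), Path(work_dir)
    if not (env_dir / "pyvenv.cfg").exists():
        venv.create(env_dir, with_pip=with_pip)
        created.append(env_dir)
    if not work_dir.is_dir():
        work_dir.mkdir(parents=True)
        created.append(work_dir)
    return created


if __name__ == "__main__":
    home = Path.home()
    for path in ensure_env(home / "ENV3", home / "cm"):
        print("created", path)
```

On Windows, the resulting environment is activated with source ~/ENV3/Scripts/activate, as in the .bashrc below.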
The following is also an ideal ~/.bashrc file for cloudmesh development
on Windows. To create it, run nano ~/.bashrc in Git Bash, copy the text
below, and paste it with the keyboard shortcut Shift + Insert. Then press
Ctrl + X, then y, then Enter to save the file and exit nano. Then restart
Git Bash. A warning about the bash profile is expected the first time you
relaunch Git Bash after creating this file.
env=~/.ssh/agent.env
agent_load_env () { test -f "$env" && . "$env" >| /dev/null ; }
agent_start () {
(umask 077; ssh-agent >| "$env")
. "$env" >| /dev/null ; }
agent_load_env
# agent_run_state: 0=agent running w/ key; 1=agent w/o key; 2=agent not running
agent_run_state=$(ssh-add -l >| /dev/null 2>&1; echo $?)
if [ ! "$SSH_AUTH_SOCK" ] || [ $agent_run_state = 2 ]; then
agent_start
ssh-add
elif [ "$SSH_AUTH_SOCK" ] && [ $agent_run_state = 1 ]; then
ssh-add
fi
unset env
source ~/ENV3/Scripts/activate
cd ~/cm
Troubleshooting
If a message saying that the package is already installed appears when
you try to install Git Bash with choco, such as
git.install v2.33.0.2 already installed.
Use --force to reinstall, specify a version to install, or try upgrade.
then try choco uninstall git. Then rerun the choco install command
listed above.
If that does not work, consider using the --force parameter mentioned
in the warning message.
6.8 - Docker
Cybertraining Links
Docker drivers images from NVIDIA
Install GPU drivers in a docker image
NVIDIA GPU drivers can be installed into Docker images.
As the software may change frequently, we recommend that you consult
the NVIDIA documentation.
An example to add to a Debian-based Dockerfile to install the GPU drivers (this may be incomplete, so check the instructions):
# Assumes curl and gnupg (needed by apt-key) are already installed in the base image
RUN curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \
    apt-key add - && \
    distribution=$(. /etc/os-release;echo $ID$VERSION_ID) && \
    curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \
    tee /etc/apt/sources.list.d/nvidia-container-runtime.list
RUN apt-get update && \
    apt-get install -y nvidia-container-runtime
6.9 - Cybertraining
Cybertraining Links
A large number of tutorials and modules are available in our
cybertraining educational activities.
Cybertraining
The main links to our cybertraining material are:
6.10 - Raspberry Pi Cluster
Links
The main web page for this is at https://piplanet.org
additional tutorials and resources are