
UF HPC Center Policies
The HPC Center facilities are user-financed resources operated with University of Florida support for research computing by UF faculty and graduate students. UF faculty members are encouraged to invest in the HPC Center to ensure priority access for the members of their research group and their research associates. The facilities are available to all UF faculty members and their research associates, although priority is given to faculty who have contributed research funds to support the Center.
The HPC Center is primarily a batch execution environment in which jobs are submitted through the "Torque/PBS" resource manager and scheduled by the Moab scheduler under policies formulated by the HPC Committee and implemented by the HPC Center Director and Staff. An overview of these policies is presented below.
Scheduling Policies
The policies for resource allocation are described in detail in the management and governance documents of the HPC Center. This page summarizes the principles.
- There are two clusters called Phase II (operational Jan 2006) and Phase III (operational Jan 2009).
- Faculty members (or groups of faculty members) who have invested in one or more of the HPC clusters and infrastructure are given priority access to a share of the cluster which is proportional to their investment. These are the "investor" groups. Investments are measured in NCUs (Normalized Computing Units). Each NCU represents one core in a functional environment that includes RAM, disk storage, network access, power, cooling, and management.
- Faculty who have not (as yet) contributed to the HPC Center are classified as non-investors.
- Users sponsored by investors will be assigned to their respective investor group. Non-investor faculty and users they sponsor will be assigned a group based on their departmental affiliation.
- All jobs submitted by investor groups will be considered for execution before any jobs submitted by non-investor groups. Once all eligible investor jobs have been scheduled, the scheduler will consider and start non-investor jobs if resources are still available.
- Short jobs (12 hours or less) are given special consideration which may, based on available resources, allow them to run even if normal resource limits might be exceeded. That is to say that both investors and non-investors may exceed their typical resource limits for short periods of time.
- Long jobs are generally discouraged because they lengthen turn-around time and can lead to frustration among users. Nonetheless, long jobs can be run, but the resources available for them are limited. Investor jobs are considered "long" if they exceed seven days (168 hours) of requested wall-clock time. Non-investor jobs are considered "long" if they exceed three days (72 hours). Although more resources are allotted for investor long jobs than for non-investor long jobs, all long jobs are restricted to roughly ten percent of available resources. Non-investor jobs are limited to fourteen days (336 hours) or less. Jobs longer than thirty-one days (744 hours) are not allowed and will be rejected by the queueing system. Note that the investor-priority rule above (investor jobs are considered before non-investor jobs) applies to all jobs, so investor long jobs will be considered before non-investor long jobs.
- Disk storage is intended to support running jobs and time-limited projects. Using HPC Center storage for long-term or archival purposes is not permitted. It is also essential to understand that no user data is backed up. All file systems are considered volatile, regardless of /home and /scratch nomenclature, and users are urged to keep their own copy of important data on non-HPC Center media.
- Passwords must be reset (by changing your password) every six months. Any account whose password has not been reset within six months of expiring (meaning the password is at least one year old) will be removed from the system. All data associated with the account (/home and /scratch files) will also be removed.
These allocation policies are in place to ensure that investors have access to the resources that they have helped purchase and on which they rely for their research programs. They are also intended to provide availability to non-investors so that all research faculty can benefit from the resources of the HPC Center. We invite and encourage feedback from investors and non-investors alike. It is our intent to be as flexible and helpful as possible to all users of the HPC Center facilities. If you are an investor and we are not meeting your needs, please contact the HPC Center director, staff, or the Chair of the HPC Committee as soon as possible so that we may address the issue.
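The wall-clock limits and the investor-first ordering described above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the Moab configuration itself: the function names, the group labels "investor" and "non-investor", and the category strings are hypothetical. The policy does not say how a non-investor request above 336 hours is handled, so the sketch reports it as "over_limit" rather than assuming it is rejected.

```python
# Thresholds taken from the policy text, all in hours of requested wall-clock time.
SHORT_MAX_HOURS = 12
LONG_THRESHOLD_HOURS = {"investor": 168, "non-investor": 72}
NON_INVESTOR_MAX_HOURS = 336
ABSOLUTE_MAX_HOURS = 744


def classify_job(group_type, walltime_hours):
    """Classify a requested walltime (hypothetical helper, not part of Moab)."""
    if group_type not in LONG_THRESHOLD_HOURS:
        raise ValueError(f"unknown group type: {group_type!r}")
    if walltime_hours <= 0:
        raise ValueError("walltime must be positive")
    if walltime_hours > ABSOLUTE_MAX_HOURS:
        return "rejected"  # stated in the policy: rejected by the queueing system
    if group_type == "non-investor" and walltime_hours > NON_INVESTOR_MAX_HOURS:
        return "over_limit"  # assumption: enforcement mechanism is not stated
    if walltime_hours > LONG_THRESHOLD_HOURS[group_type]:
        return "long"
    if walltime_hours <= SHORT_MAX_HOURS:
        return "short"
    return "normal"


def consideration_order(jobs):
    """Order (group_type, job_name) pairs so investor jobs come first.

    sorted() is stable, so submission order is preserved within each class.
    The real scheduler weighs more factors than this.
    """
    return sorted(jobs, key=lambda job: job[0] != "investor")
```

For example, `classify_job("non-investor", 100)` falls in the "long" category, while the same request from an investor group is "normal" because it is under 168 hours.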
Account Policies
Account Expiration
Account passwords expire after six months. If, after six months from your
password expiring, you have not reset it, your account will be removed from
the system.
Account passwords can be reset from the Account Creation page.
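To make the two six-month windows concrete, the following Python sketch computes when a password expires and when the account becomes subject to removal. It is an illustration only: the helper names are hypothetical, "six months" is interpreted as six calendar months, and treating the deadline day itself as already expired is an assumption the policy does not spell out.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Add calendar months, clamping to the last day of shorter months."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def password_deadlines(last_changed):
    """Return (expiry date, account removal date) for a password change date."""
    expires = add_months(last_changed, 6)   # password expires after six months
    removal = add_months(expires, 6)        # six more months without a reset
    return expires, removal


def account_status(last_changed, today):
    """Hypothetical status labels; boundary days count as past the deadline."""
    expires, removal = password_deadlines(last_changed)
    if today >= removal:
        return "subject to removal"
    if today >= expires:
        return "password expired"
    return "active"
```

Under these assumptions, a password changed on 15 January 2010 expires on 15 July 2010, and the account becomes subject to removal on 15 January 2011 if the password is still unchanged.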
Interactive Jobs on Submit
No interactive jobs are allowed on Submit.
The host "submit" (submit.hpc.ufl.edu) is the login host and primary
gateway to the cluster. It provides a software environment essentially
identical to that found on the cluster nodes and is intended as a platform
for:
- submitting jobs to the batch queues
- compiling and linking executables (i.e. building code)
- managing files
Because of the importance of this host to a large number of users, it must not be used for testing or running user programs under any circumstances. There are other facilities for this (see below). Users found running jobs interactively on submit will lose access to the cluster for thirty (30) days. If a second offense occurs, the user's account will be permanently disabled.
Personal or Class-Related Work
The cluster is strictly for faculty-sponsored research and is not to be used
by students for class assignments.
Data is NOT backed up
No data - not even home directories - is backed up. Please copy any data
which you cannot afford to lose to your group's or department's storage on
a regular basis. We make every effort to maintain the consistency and
integrity of the cluster's file systems. However, it is inevitable that
we will, at some point, lose data. Don't let it be yours.
Job Input/Output
The home directory file system (/home) is not designed to withstand the
load of hundreds of jobs performing I/O simultaneously and will become
unresponsive under such circumstances. The scratch file systems
(/scratch/ufhpc and /scratch/crn) are designed and intended for this
purpose. Therefore, all job I/O should take place on the scratch file systems.
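One way to follow this rule from inside a job is to create a per-job working directory under scratch and do all reads and writes there. The Python sketch below assumes a `<scratch>/<user>/<job id>` layout, which is a hypothetical convention and not something this page prescribes. `PBS_JOBID` is the environment variable Torque sets inside a running job; the fallback names are placeholders.

```python
import os
from pathlib import Path

# Scratch file system named in the policy; /scratch/crn is the other one.
SCRATCH_ROOT = Path("/scratch/ufhpc")


def make_job_workdir(scratch_root=SCRATCH_ROOT, user=None, job_id=None):
    """Create and return a per-job directory under scratch.

    The <root>/<user>/<job_id> layout is an assumption for illustration.
    """
    user = user or os.environ.get("USER", "unknown")
    job_id = job_id or os.environ.get("PBS_JOBID", "interactive")
    workdir = Path(scratch_root) / user / job_id
    workdir.mkdir(parents=True, exist_ok=True)
    return workdir


def write_result(workdir, name, text):
    """Write job output inside the scratch working directory, never in /home."""
    target = Path(workdir) / name
    target.write_text(text)
    return target
```

A job would call `make_job_workdir()` once at startup, write its intermediate and output files there, and copy only the results worth keeping to the group's own storage when it finishes.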
Archiving of Data
The data storage facilities of the HPC Center are not to be used as a
repository or archive for old data or data from other sites. The storage
on the cluster exists for the sole purpose of supporting current projects
and calculations. Once you have the results you need and the data has been
analyzed, it should be deleted from the cluster. If you need to save
copies, you must do so on your own or your department's systems.
Code Development
If you need to develop code (i.e., the standard edit, build,
test cycle), there are several machines on which you can do so and run
short interactive tests. They are:
- test04 (8 Cores, 16 GB RAM)
- test05 (8 Cores, 16 GB RAM)
