Hyak is part of an integrated, scalable scientific supercomputing infrastructure operated by UW-IT and part of the University of Washington's cyberinfrastructure plan to support world-class research in every department. That plan includes a high-performance research network, the Hyak compute infrastructure (HPC clusters), and scientific support services for research workflows. The network supports fast data transfers among these systems and between them and the rest of campus or the Internet. Hyak is an excellent option for research groups that need a fast, convenient, flexible, and cost-effective alternative to operating their own computing platforms. Any faculty member or PI can e-mail help@uw.edu to request a "Welcome to Hyak Tutorial" held at their lab or department.
School of Medicine Hyak sponsorship
The School of Medicine (SoM) has gathered all departmental Hyak activity under one sponsorship umbrella, administered by the Office of Research & Graduate Education (RGE). The SoM Sponsorship of Hyak consolidated existing departmental sponsorships as well as all existing self-sponsorship agreements within the School, effective January 1, 2024, providing a total Slot Allocation of 100 slots to SoM. This agreement superseded and terminated all previous agreements.
Inquiries & Slice Purchase
Visit the Hyak service catalog page for general information, and follow the Pricing link in the Hyak User Documentation to see the costs. Use the "Buy Slices" email link to start a direct discussion with the UW-IT Customer Agreements team, who will handle all of the Hyak setup.
The first stages of the process happen between the PI and UW-IT, and begin with the PI asking about placing an order. PIs who already know what they want can go straight to the order form, but the best place for new users to begin is the Pricing page in the Hyak User Docs. The UW-IT Service Catalog is another reasonable starting point: while its order form is less useful for newcomers than the Pricing page, it offers general information and links to further details (especially under User Docs). At the bottom of the Service Catalog page is an email link that puts PIs in contact with UW-IT's Customer Agreements team.
In the Service Catalog, there is a form to fill out to order a Slice (aka Node), which is ultimately what starts the formal process. Submitting this form sends an order confirmation letter (via DocuSign) first to the Sponsor, to authorize the use of Slots, and then to the PI for signature to charge their grant budget for the hardware.
Sponsorship rates differ depending on the total number of slots held by a Sponsor, and the scale of the SoM sponsorship allows considerable savings on per-slot costs. Pooling all Hyak memberships in SoM results in a Tier 1 membership, with annual Slot Fees of $800 rather than the previous Tier 2 rate of $1,000 or the self-sponsored rate of $1,750. There are no further membership charges: slot purchases are tax exempt, as permitted by state law, and are exempt from Facilities and Administrative (F&A) costs (indirect costs), as permitted by UW F&A policies. The price is exactly equal to the quote, with no UW or UW-IT markup.
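For a rough sense of what the tiers mean in dollars, here is a minimal arithmetic sketch using the per-slot rates quoted above; the slot count is a hypothetical example, not a real allocation.

```python
# Annual slot-fee comparison using the rates quoted above.
# The slot count is a hypothetical example only.
TIER_1_RATE = 800        # SoM sponsorship (Tier 1) rate, per slot per year
TIER_2_RATE = 1_000      # previous departmental (Tier 2) rate
SELF_SPONSORED_RATE = 1_750

slots = 4  # hypothetical PI holding four slots

tier1_cost = slots * TIER_1_RATE
tier2_cost = slots * TIER_2_RATE
self_cost = slots * SELF_SPONSORED_RATE

print(f"Tier 1 (SoM):   ${tier1_cost:,}/year")
print(f"Tier 2:         ${tier2_cost:,}/year  (+${tier2_cost - tier1_cost:,})")
print(f"Self-sponsored: ${self_cost:,}/year  (+${self_cost - tier1_cost:,})")
```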
Note that billing is not based on usage; it is based on how many slots a user occupies in Hyak. RGE bills users for having a node installed in Hyak, not for how much they use it. By installing a node into Hyak, the user is obligated to pay the associated fee for the right to use it (the Slot Fees or, later, the Extended Access Slot Fees).
There is no subsidization in the SoM Hyak sponsorship. The User pays 100% of the total $800 per year per slot. RGE recovers the $800 Sponsorship costs in their entirety from the PI who is always the signer on the Slice Purchase Agreement. Repayment to RGE for Sponsorship costs cannot be from a federal grant. Departmental RCR funding is the most straightforward choice for cost recovery.
For the annual fee, the user receives full support of the hardware and software. Sponsorship fees fund service support expenses, the staff who support the systems and the overall service, and other expenses related to the sustained operation of the Hyak service. A Sponsor's Hyak investment supports a specific number of "Slots," which may be occupied by equipment funded by End Users in their administrative unit. PIs are responsible for purchasing and setting up their own hardware separately; individual SoM users buy the equipment (the individual nodes used in Hyak) through UW-IT's process.
SoM Hyak end users must have an official UW faculty appointment – regular, adjunct or affiliate – in order to purchase Hyak slices (i.e., dedicated Hyak resource access). PIs sign the agreement and are responsible for paying Slot Fees to RGE. Note: Hyak Slot Fees cannot be charged to a federally sponsored project.
- The PI signs the Slice Purchase Agreement (providing the Worktag of their non-federal research grant to pay for the Slice) because they are going to repay RGE for the Sponsorship costs of any Slots purchased, on an annual basis.
- RGE co-signs the Purchase Agreement, to approve the use of the SoM Slot(s) for four years.
- Purchase of a Slice – between the PI and the UW-IT Hyak team. This happens on an ad hoc basis, any time a PI wants to buy in.
- Annual Recovery of Sponsorship Costs – between SoM (RGE) and PIs. RGE collects Sponsorship fees from the PIs, based on the number of slots they occupied in the prior 12 months.
Slices will be operated and maintained for four years, after which they are abandoned in place by the PI with a residual value of zero dollars. After that, PIs remain eligible for ongoing access to Hyak for a fee, as long as resources are available in the Hyak system to support such "Extended Access".
The cost of supporting older nodes rises over time, including both hardware repair and labor costs. In addition, the "lost opportunity" cost of using limited data center resources increases significantly with older equipment, because older equipment accomplishes less computing for the same power consumption as newer equipment. This increasing lost-opportunity cost is reflected in an annual increase in Extended Access Fees. The fee for Extended Access will cover the costs associated with providing Extended Access, as well as proportionate support of the entire Hyak High Performance and Data Ecosystem (Slot Fee). The fee will increase annually, with relatively small increases for the first two years and significantly larger increases for the following two years.
If a SoM user is charged Extended Access Slot Fees (i.e., for the right to continue using equipment in Hyak that UW-IT has determined to be subject to Extended Access fees), the individual SoM user pays those fees directly to UW-IT. UW-IT then credits SoM's Tier 1 sponsorship fees for the portion of the Extended Access Slot Fee that is already covered by the payments SoM makes to UW-IT under its Tier 1 sponsorship.
- Archive file service
The Lolo Archive is a file-based repository appropriate for data which you rarely access but for which you want to ensure long-term safekeeping and fast, convenient retrieval. Use of the archive service is separate from Hyak, and additional charges apply.
- Checkpoint queue / scheduling
Checkpoint (sometimes referred to as "backfill") scheduling allows other jobs to use reserved job slots, as long as those jobs do not delay the start of another job. Checkpointed jobs, together with processor reservation, allow large parallel jobs to run, which helps to maximize resource utilization.
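To make the backfill rule concrete, below is a minimal, simplified sketch (not the actual Hyak scheduler): a waiting job is allowed to use idle slots only if it fits and is expected to finish before the reserved start time of the blocked high-priority job. All names and numbers are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    slots_needed: int
    est_runtime_hours: float

def can_backfill(job: Job, idle_slots: int, now: float, reserved_start: float) -> bool:
    """A candidate job may run in the checkpoint/backfill window only if it
    fits in the currently idle slots AND is expected to finish before the
    reserved start time of the blocked high-priority job."""
    fits = job.slots_needed <= idle_slots
    finishes_in_time = now + job.est_runtime_hours <= reserved_start
    return fits and finishes_in_time

# Example: 8 idle slots; the next big job is reserved to start 3 hours from now.
small_job = Job("small-analysis", slots_needed=4, est_runtime_hours=2.0)
big_job = Job("long-simulation", slots_needed=6, est_runtime_hours=12.0)
print(can_backfill(small_job, idle_slots=8, now=0.0, reserved_start=3.0))  # True
print(can_backfill(big_job, idle_slots=8, now=0.0, reserved_start=3.0))    # False
```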
- Compute Node
See “Slice.”
- Extended Access
Any PI who owns a Slice in Hyak that has been retired is entitled to Extended Access to Hyak, under the same basic operating terms, with immediate access to a node of computing power equivalent to their previously owned Slice or Slices, and equivalent priority access to the Checkpoint Queue. Extended Access requires a monthly Access Fee, and is offered under the terms and conditions outlined in the Hyak Node Retirement and Extended Access Policy (link above).
- Fair-share
Fair-share scheduling is a scheduling algorithm for computer operating systems in which CPU usage is equally distributed among system users or groups, as opposed to equal distribution among processes. See: https://en.wikipedia.org/wiki/Fair-share_scheduling
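The sketch below is a toy illustration of the fair-share idea only, assuming a simple exponential penalty for a group that exceeds its nominal share of recent usage; it is not the actual formula used on Hyak, and all numbers are illustrative.

```python
def fair_share_priority(group_usage: float, total_usage: float,
                        group_share: float = 0.25) -> float:
    """Return a priority in (0, 1]: 1.0 means the group has consumed none of
    recent usage; the value halves each time the group consumes another full
    share's worth of the system."""
    if total_usage == 0:
        return 1.0
    observed = group_usage / total_usage      # fraction of recent usage by this group
    return 2 ** (-observed / group_share)

# Two groups with equal nominal shares but very different recent usage:
print(fair_share_priority(group_usage=10, total_usage=100))  # light user -> higher priority
print(fair_share_priority(group_usage=60, total_usage=100))  # heavy user -> lower priority
```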
- Federation
In computer system design, federation describes the combination of two or more otherwise autonomous computer systems into one interoperating system. The participating systems are connected by a computer network and may be geographically distributed.
In the case of the Hyak supercomputer system, federation specifically refers to the practice of sizing each system around the capacity of a large, high-performance network switch, then connecting two or more of these subsystems together for the convenience of the users.
All compute slices in all subsystems share login and data mover slices, as well as a common scheduler and data storage. Strong-scaling jobs are limited to a single subsystem, while weak-scaling jobs may be placed anywhere within the overall federated system. This design approach results in the best overall performance for the workloads typically observed within Hyak. Cost is similar to more complex and poorer-performing multi-stage network designs. System complexity is reduced, and overall reliability is increased by simplifying cabling and network management.
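A minimal placement sketch under the rule above, with hypothetical subsystem sizes (this is not the real Hyak scheduler logic): strong-scaling jobs must fit entirely within one subsystem, while weak-scaling jobs may be spread across the federation.

```python
from typing import Optional

def place_job(slices_needed: int, strong_scaling: bool,
              free_per_subsystem: list[int]) -> Optional[list[int]]:
    """Return a per-subsystem allocation, or None if the job cannot be placed."""
    if strong_scaling:
        # Tightly coupled job: must land entirely inside a single subsystem.
        for i, free in enumerate(free_per_subsystem):
            if free >= slices_needed:
                alloc = [0] * len(free_per_subsystem)
                alloc[i] = slices_needed
                return alloc
        return None
    # Weak-scaling job: may be spread across the federated subsystems.
    alloc, remaining = [], slices_needed
    for free in free_per_subsystem:
        take = min(free, remaining)
        alloc.append(take)
        remaining -= take
    return alloc if remaining == 0 else None

# Hypothetical federation of three subsystems with 10, 4, and 6 free slices:
print(place_job(8, strong_scaling=True, free_per_subsystem=[10, 4, 6]))    # [8, 0, 0]
print(place_job(12, strong_scaling=True, free_per_subsystem=[10, 4, 6]))   # None
print(place_job(12, strong_scaling=False, free_per_subsystem=[10, 4, 6]))  # [10, 2, 0]
```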
- GPU Slice
A GPU Slice is a Hyak Slice that comprises one or more GPU processors and dedicated memory, may also include some dedicated cores of an HPC processor, and is the smallest unit of purchase for GPU capability. GPU Slices offer a high degree of parallel computing and are especially suited for machine learning, image processing, and matrix/tensor-based calculations. A GPU Slice requires one Hyak Slot. See also: Slice.
- GP-GPU
General-purpose computing on graphics processing units (GP-GPU) is the use of a graphics processing unit (GPU), which typically handles computation for computer graphics, to perform computation in applications traditionally handled by the central processing unit (CPU). The use of multiple video cards in one computer, or large numbers of graphics chips, further parallelizes the already parallel nature of graphics processing. In addition, even a single GPU-CPU framework provides advantages that multiple CPUs on their own do not offer, due to the specialization in each chip.
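As a small, concrete illustration of GP-GPU computing, the sketch below offloads a matrix multiplication (a general-purpose numerical task, not graphics rendering) to a GPU. It assumes PyTorch is installed and a CUDA-capable device is visible; neither is implied by the glossary entry itself.

```python
import torch

# Run a general-purpose numerical task (matrix multiplication) on the GPU
# rather than the CPU.  Falls back to the CPU if no CUDA device is present.
device = "cuda" if torch.cuda.is_available() else "cpu"

a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)
c = a @ b  # executed on the GPU when device == "cuda"

print(device, c.shape)
```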
- High Speed Research Network
UW’s High Speed Research Network (HSRN) and Science DMZ have been in operation since June 2010. The current HSRN connects the high-performance compute and storage clusters Hyak, Lolo, and Kopah to each other, and provides a 10 Gbps connection to campus. The Science DMZ feature of the HSRN enables high-throughput data transfers to endpoints outside the UW networks. Lolo acts as the campus Data Transfer Node (DTN), connecting at 10 Gbps both inside and outside the campus security perimeter for optimal access by both on-campus researchers and their external collaborators.
- Kopah ‘Condo’ Storage service (formerly, Lolo Collaboration storage service)
Kopah Condo storage is a petabyte-scale object storage service that will be available in Fall 2023. It will be accessed using cloud-native (object storage) protocols (AWS S3) as well as standard POSIX access protocols. It will be connected to Hyak with a high-bandwidth connection, as well as to the campus research network, to provide a relatively low-latency connection to research labs. It will also have a high-bandwidth (100 mb) connection to the Internet for sharing data with other researchers and with cloud platforms (AWS, Azure, GCP), so it can also serve as a hybrid storage service accessible from campus or the cloud. Like Hyak, the Kopah service will be operated in a ‘condo’ model: PIs can purchase – typically on a grant budget – individual hard disk drives (13 TB) or storage nodes (321 TB) as a one-time cost, and then pay a monthly operating fee for some number of years through an MOU commitment. The monthly cost is expected to be a small fraction of cloud storage costs (under 10%). Another option will be to pay for this storage on a monthly basis, with no up-front disk purchase, exactly as you would pay for cloud storage.
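Because Kopah exposes an S3-compatible interface, scripted access looks like ordinary S3 usage. The sketch below assumes boto3; the endpoint URL, bucket name, and credential profile are placeholders, not documented Kopah values, so substitute whatever is issued for your allocation.

```python
import boto3

# Hypothetical endpoint, bucket, and credential profile: placeholders only.
# Only the S3 protocol itself is implied by the glossary entry above.
session = boto3.Session(profile_name="kopah")
s3 = session.client("s3", endpoint_url="https://kopah.example.uw.edu")

# Upload a file, then list objects under a prefix.
s3.upload_file("results.csv", "my-lab-bucket", "project1/results.csv")
for obj in s3.list_objects_v2(Bucket="my-lab-bucket", Prefix="project1/").get("Contents", []):
    print(obj["Key"], obj["Size"])
```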
- Many-core
Manycore processors are specialist multi-core processors designed for a high degree of parallel processing, containing a large number of simpler, independent processor cores (e.g. 10s, 100s, or 1,000s). Manycore processors are used extensively in embedded computers and high-performance computing. As of November 2021, the world’s fastest supercomputer (as ranked by the TOP500 list), Fugaku, obtains its performance from 7,630,848 compute cores.
- Node
See Slice.
- Science DMZ
The Science DMZ is a portion of the network, built at or near the campus or laboratory’s local network perimeter, that is designed such that the equipment, configuration, and security policies are optimized for high-performance scientific applications rather than for general-purpose business systems or “enterprise” computing. Hyak, Lolo, and Kopah all reside within the UW Science DMZ.
- Scratch file service
Each Hyak group has an eponymous shared scratch directory under /gscratch which is shared among all Hyak slices. If your group is hyak-mygroup, your directory would be /gscratch/mygroup. You can determine your group by looking at the output from the groups command. Supplemental scratch storage is available in one (1) TB increments, billed monthly. MOX.hyak also provides at least 100 TB of “scrubbed” scratch data storage for users who require substantial, but very short-term, working data storage. Any data copied into the scrubbed scratch storage space will be automatically removed within a short period — typically less than two weeks.
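For users who prefer to locate their scratch directory from a script rather than the shell, here is a small Python sketch following the naming convention described above; the handling of the hyak- prefix is an assumption based on that convention, and the group name mygroup is the same placeholder used in the entry.

```python
import grp
import os

# List the caller's Unix groups (the equivalent of the `groups` command) and
# build the conventional /gscratch path, dropping the "hyak-" prefix as the
# entry above describes (e.g. group hyak-mygroup -> /gscratch/mygroup).
for gid in os.getgroups():
    name = grp.getgrgid(gid).gr_name
    scratch = "/gscratch/" + name.removeprefix("hyak-")
    if os.path.isdir(scratch):
        print(f"group {name}: scratch directory {scratch}")
```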
- Self-sponsored Slice
Slices may be placed outside any Sponsor’s allotted slot capacity, in which case they are considered Self-sponsored Slices. Self-sponsored Slices require payment of Annual Slot Fees for each such Slice. Self-sponsored Slot Fees are higher than the standard Annual Slot Fee associated with sponsored slots, and are prorated on a monthly basis to the actual deployment period. This allows PIs whose department is not part of a Sponsor’s organization to participate in Hyak.
- Slice
A Slice (formerly known as a Node, or Compute Node) represents the smallest unit of purchase in the Hyak system. Two types of Slices can be placed in Hyak: High Performance Computing (HPC) and Graphics Processing Unit (GPU). Each Slice requires one Slot.
- Slot
The Hyak supercomputer follows the “condominium” deployment model, in which otherwise independent research groups deploy compute equipment within a shared infrastructure. We define a slot as the infrastructure capacity required to support a single standard Compute Slice.
For the purpose of calculating Hyak capacity, any equipment occupying the chassis capacity required for a standard slice will also be counted as occupying a Hyak “slot”, even if it requires no network support. Likewise, any equipment requiring a connection to the system’s high-performance network will also be counted as occupying one Hyak slot for each network port used.
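Read as arithmetic, the two counting rules above might be sketched as follows. How the chassis-based and port-based counts combine for a single piece of equipment is an assumption here (the larger of the two is taken, so a standard slice with one port still counts as one slot), not stated policy.

```python
# Sketch of the slot-counting rules above.  Combining the two counts as a
# maximum is an assumption, not stated policy.
def slots_occupied(chassis_units: int, hpc_network_ports: int) -> int:
    """chassis_units: standard-slice chassis positions the equipment fills;
    hpc_network_ports: high-performance network ports it uses."""
    return max(chassis_units, hpc_network_ports)

print(slots_occupied(1, 1))  # standard compute slice -> 1 slot
print(slots_occupied(1, 0))  # gear with no HPC network connection -> 1 slot
print(slots_occupied(1, 2))  # dual-ported equipment -> 2 slots
```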
- Sponsored Slice
A Slice can be deployed in a slot within a Sponsor’s allotted capacity in the condominium model (their Slot capacity), in which case it is considered a Sponsored Slice. Alternatively, Slices may be deployed outside of any Sponsor’s allotted capacity, as Self-sponsored Slices.