# Hardware Requirements
Montreal Virtual Machine Configuration
Logical Deployment Diagram
In the figure above, you can see the logical deployment diagram proposed by Calgary IT. The Montreal and Toronto sites may differ in many of these details, but the parts of the diagram covering the two servers (connectivity, user access, Compute Canada) should be very similar.
Server Configuration
Server Model: Dell PowerEdge R750.
Virtual Machine Image or ISO: Ubuntu 18.04, 20.04, 22.04 (preferred, see install link for other viable options).
Hypervisor Compatibility: Ensure that the VM image is compatible with the hypervisor software that will be used to run it (e.g., VMware, VirtualBox, Hyper-V, KVM).
VM Hardware Configuration:
CPU and RAM Allocation: 2 x Intel® Xeon® Gold processors with 8 cores each per server, i.e. 16 cores per server and 32 cores across the two servers (stated as equivalent to 128 vCPUs for virtualization, which implies about 4 vCPUs per physical core). 12 x 16 GB RAM modules per server, providing 192 GB of RAM per server (384 GB in total).
Storage: Each server is equipped with 2 M.2 SSDs, 480GB each, configured in RAID 1. This RAID 1 setup is intended for the server’s operating system and VM storage needs.
Network Configuration: Document the network-related setup steps. For example, setting up the self-hosted GitLab instance involves creating SSH keys.
Disk Configuration: Define the virtual hard disk size and type (e.g., VMDK, VHD). The storage location is the storage server.
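The core and memory totals quoted above can be checked with a few lines of arithmetic. This is only a sketch: it assumes the "2 x 8 cores" and "12 x 16GB" figures are per server and that there are two servers, which is how the 32-core and 384GB totals work out. The constant and function names are illustrative.

```python
# Figures copied from the hardware description; the per-server
# interpretation is an assumption that matches the stated totals.
SERVERS = 2
SOCKETS_PER_SERVER = 2
CORES_PER_SOCKET = 8
DIMMS_PER_SERVER = 12
DIMM_SIZE_GB = 16
STATED_VCPUS = 128  # figure quoted in the text, not derived here


def hardware_totals() -> dict:
    """Return per-server and combined core/RAM totals."""
    cores_per_server = SOCKETS_PER_SERVER * CORES_PER_SOCKET
    total_cores = cores_per_server * SERVERS
    ram_per_server_gb = DIMMS_PER_SERVER * DIMM_SIZE_GB
    total_ram_gb = ram_per_server_gb * SERVERS
    return {
        "cores_per_server": cores_per_server,
        "total_cores": total_cores,
        "ram_per_server_gb": ram_per_server_gb,
        "total_ram_gb": total_ram_gb,
        # Ratio implied by the quoted vCPU figure.
        "vcpus_per_core": STATED_VCPUS / total_cores,
    }
```

Calling `hardware_totals()` gives the per-server and combined figures in one place, which makes it easy to redo the numbers if the final quote changes the processor or memory configuration.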
Operating System Installation: If the VM doesn’t include an operating system, provide instructions or scripts for installing the OS from a standard image or ISO file.
Software and Application Configuration: Ensure that all required software and applications are pre-installed and properly configured within the VM. Document any software licenses or keys.
ZFS File System Server (see notes): A robust ZFS file system server is required for efficient data management and storage.
Ubuntu: The stable and widely recognized Ubuntu Linux distribution is recommended as the operating system for hosting GitLab.
Git, GitLab, git-annex, gitlab-runner.
Docker Compatibility: Compatibility with container frameworks such as Docker is essential for deploying and managing GitLab in a containerized environment (example images used: DataLad, HeuDiConv).
Apptainer (Singularity).
User Accounts and Permissions:
Full control over setting up user accounts and groups within the VM.
Define user roles and permissions to ensure proper access control.
Document login credentials or SSH keys for user access.
Security Configuration (for local IT):
Configure firewall settings and security policies.
Apply necessary security patches and updates.
Set up antivirus and intrusion detection software, if applicable.
Network Configuration:
Document IP addresses.
Ensure that network ports required for the VM’s services or applications are open and properly configured.
Hostname and DNS Configuration: Set a meaningful hostname for the VM and configure DNS resolution if needed.
Scripts and Automation: If any post-deployment scripts or automation are required to set up the VM, provide these scripts along with instructions for their execution.
Networking Considerations: The VMs need to communicate with other systems. Determine whether each VM needs an assigned static IP address or DNS name, and document any required network configurations, such as routing or port forwarding.
Backup and Restore Procedures: Document backup and restore procedures to ensure data integrity and recovery options.
Data is backed up on the storage server, and certain datasets will be shared and backed up on Digital Research Alliance of Canada (formerly Compute Canada) clusters.
File Sharing and Permissions: If the VM includes shared files or folders, configure file sharing and set appropriate permissions for access.
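One way to verify the network items above (open ports, reachable hostname) is a small reachability check run from a client machine. The sketch below uses only the Python standard library. The function names are illustrative, and the host and port numbers in the usage note are assumptions (standard SSH and HTTPS ports), not values from the actual deployment.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_services(host: str, ports: dict) -> dict:
    """Map each service name to whether its TCP port is reachable on host."""
    return {name: port_is_open(host, port) for name, port in ports.items()}
```

For example, `check_services("gitlab.example.org", {"ssh": 22, "https": 443})` would report which of the two ports accept connections; the hostname here is a placeholder. A `False` result only says the TCP connection failed, so a firewall rule, a stopped service, or a DNS problem all look the same and need to be told apart separately.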
Notes
ZFS is a bonus. It helps with the daily backups to Compute Canada because its journaling feature lets us easily index changes. However, we could easily work around its absence.
The number of CPUs and memory is based on the type of pipelines and the volume of data we’re planning to generate every week.
There is no need for storage on the Compute server. The plan was to mount space from the storage server.
If it works standalone, there’s no need for 100 TB! 10 TB is plenty.
The original ZFS solution was meant to scale for the needs of imaging researchers, beyond the data they acquire themselves. This would host samples like HCP, UK Biobank, etc. This will take several PB over time.
The ZFS solution was meant to scale over time, funded through user fees. If we scrap that system, Vincent and Élodie need to have a credible and cheap alternative.
With the proposed solution, costs would have been roughly 10k for 50 TB. This is with redundant and resilient local storage and two copies offsite constantly being maintained. So roughly 300 TB of actual orchestrated storage, adhering to best practices for long-term data maintenance.
These numbers are very approximate, by the way. This will depend on actual up-to-date quotes and likely be cheaper than what I list here in practice.
Also, ideally, the Compute server and storage would be connected at 10 Gb/second.
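To give a sense of what the 10 Gb/second link means in practice, the following sketch estimates how long it takes to move a dataset between the compute and storage servers. It assumes decimal units (1 TB = 10^12 bytes, 1 Gb = 10^9 bits) and an ideal link by default; the `efficiency` parameter is a hypothetical knob for protocol and disk overhead, not a measured value.

```python
def transfer_time_hours(size_tb: float, link_gbps: float,
                        efficiency: float = 1.0) -> float:
    """Estimate hours needed to move size_tb terabytes over a link_gbps link.

    efficiency is the assumed fraction of the nominal bandwidth that is
    actually achieved (1.0 means the full line rate).
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be > 0 and efficiency in (0, 1]")
    bits = size_tb * 1e12 * 8                       # terabytes -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600
```

Under these assumptions, `transfer_time_hours(10, 10)` comes to a little over two hours for the 10 TB mentioned in the notes, while the same transfer over a 1 Gb/second link would take about ten times longer. Real throughput will usually be lower than the line rate.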