Mirror Definition
- Date proposed: 2023-11-20
- RFC MR: https://gitlab.archlinux.org/archlinux/rfcs/-/merge_requests/29
Summary
Before this proposal, a large amount of the work related to mirrors was manual labor, with potential for errors and mistakes. This RFC outlines definitions and guidelines for a new workflow and new processes for registering, maintaining and decommissioning mirrors.
Motivation
Mirrors are a well-known concept on an abstract level, as they are one of the foundations of most Linux distributions. However, the Arch Linux instructions and guidelines surrounding mirrors are manual and prone to errors and delays.
Having users report mirror details in a free-text format, and then manually copying and entering this information directly into a database, is a tedious process.
The current method of tracking issues via e-mail is not ideal either: it has historically led to long lead times and cumbersome interactions, and it does not provide an easy way to automate certain tasks.
This RFC aims to rectify these things by defining a new method of registering, updating, maintaining and decommissioning mirrors via GitLab. Its main focus is a proposed TOML format for each mirror definition, which is then consumed in various steps to produce a single source of truth for mirrors. A benefit of moving to GitLab is that Arch Linux can stop publicly revealing mirror administrators' e-mail addresses while retaining the ability to contact them, since GitLab contact information can be used instead. This helps Arch Linux comply with regulatory demands.
These changes would allow for creating central tooling and pipelines optimizing the workflow around the mirror workload.
This RFC covers the following points:
- Defining mirror types
- Defining a new-mirror versioned spec
- Defining a new proposed way of managing mirrors
What will happen with existing mirrors
Existing Arch Linux mirror metadata in archweb will be transformed and migrated to the GitLab repository by the Arch Linux mirror-list administrators. Mirror owners will be able to create a GitLab account, with which they can open issues and create merge requests to alter their published mirror information. The option to handle mirror-related support tasks via e-mail will remain.
The transition period will be long, and improvements can still be made to the whole process along the way.
Timeline
The expected outcome is that none of the changes outlined in this RFC will be noticed by end users or by the services provided under archlinux.org. However, some of the changes will be noticeable to the Arch Linux mirror-list administrators and to the mirror administrators themselves during this process. Most notably, creating GitLab accounts will take some coordination, as will setting up tooling to manage the new mirror spec and database synchronization.
Mirror Specification
A mirror specification has been created and is hereby proposed as the standard going forward for new mirror submissions, for updating published mirror information, and for decommissioning mirrors. The specification is a living entity, but it is versioned and should aim to be backwards compatible, although no such restriction is enforced at this time.
The specification aims to define what a mirror is. It must be machine readable while also being easy for humans to read. TOML is proposed as an alternative to JSON for individual mirror entries, as it supports comments and fulfills both requirements of being human and machine readable.
Specific mirror specification versions will be deprecated and later discontinued as new versions are created. Each new version aims to be backwards compatible, but this is not a requirement. Wherever possible, new mirror specification versions will automatically migrate old mirror entries to the newer version.
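As an illustration of automatic migration between specification versions, the following is a minimal sketch in Python. It is not part of the proposed specification: the `spec_version` key, the `active` flag and the change between version 1 and version 2 are all hypothetical and chosen only to show the mechanism.

```python
# Hypothetical: the newest specification version known to the tooling.
CURRENT_VERSION = 2


def migrate_v1_to_v2(entry: dict) -> dict:
    """Hypothetical change: version 2 adds an 'active' flag, defaulting to True."""
    migrated = dict(entry)  # copy, so the original entry is left untouched
    migrated.setdefault("active", True)
    migrated["spec_version"] = 2
    return migrated


# One migration step per old version; each step upgrades by exactly one version.
MIGRATIONS = {1: migrate_v1_to_v2}


def migrate(entry: dict) -> dict:
    """Apply migration steps until the entry reaches CURRENT_VERSION."""
    while entry["spec_version"] < CURRENT_VERSION:
        step = MIGRATIONS.get(entry["spec_version"])
        if step is None:
            raise ValueError(f"no migration from version {entry['spec_version']}")
        entry = step(entry)
    return entry


old_entry = {"spec_version": 1, "name": "mirror.example.org",
             "urls": ["https://mirror.example.org/archlinux/"]}
new_entry = migrate(old_entry)
```

Chaining single-version steps means a discontinued version only needs one function to reach its successor, and entries that cannot be migrated fail loudly instead of being silently dropped.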
The individual mirror entries in TOML format can then be combined into a single source of truth in another format, which Arch Linux back-ends and services can consume. This RFC proposes JSON as the format for the single source of truth (the combined mirror entries), as it is a standard adopted by many libraries and languages. The initial layout of this JSON file is proposed to follow the format of MirrorZ v1.7 or higher.
Managing Mirrors
This proposal aims to move away from managing mirrors via GitLab support tickets and archweb, and instead use a format that is intended for automation and reduces human error. The proposed repository for managing mirrors is the mirror project repository.
The proposed workflow is:
- Create a GitLab account.
- Fork the mirror project.
- Create a new mirror definition file in the appropriate mirror type/region. This file should contain all the necessary information for the new mirror.
- Submit a new merge request, summarizing the changes/addition surrounding the mirror information.
- Verify that all automatic tests and checks for the merge request pass.
- Mirror list administrators verify and sign off on the merge request, and then merge it.
- Automatic tasks will create a source of truth with the newly submitted data in the root of the mirror project in the proposed format.
- The source of truth is then parsed by other Arch Linux projects, such as archweb.
This will automate the process to such a degree that mirror list administrators only have to sign off on new mirrors. This should introduce no more and no less work than is already being done by both mirror administrators as well as mirror list administrators.
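The automatic checks in the workflow above could include a validation step for each submitted mirror definition. The following is a minimal sketch of such a check, not the behavior of any existing tool: the required keys and the allowed URL schemes are assumptions, and the entry is expected to have been parsed from TOML into a dictionary already.

```python
from urllib.parse import urlparse

# Hypothetical required keys with their expected types.
REQUIRED_KEYS = {"spec_version": int, "name": str, "country": str, "urls": list}
# Hypothetical set of URL schemes a mirror may be served over.
ALLOWED_SCHEMES = {"http", "https", "rsync"}


def validate_entry(entry: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for key, expected_type in REQUIRED_KEYS.items():
        if key not in entry:
            errors.append(f"missing key: {key}")
        elif not isinstance(entry[key], expected_type):
            errors.append(f"{key} must be of type {expected_type.__name__}")
    urls = entry.get("urls")
    for url in urls if isinstance(urls, list) else []:
        scheme = urlparse(url).scheme if isinstance(url, str) else ""
        if scheme not in ALLOWED_SCHEMES:
            errors.append(f"unsupported URL: {url!r}")
    return errors


good = {"spec_version": 1, "name": "mirror.example.org", "country": "SE",
        "urls": ["https://mirror.example.org/archlinux/"]}
bad = {"name": "arch.example.net", "country": "DE",
       "urls": ["ftp://arch.example.net/"]}
print(validate_entry(good))
print(validate_entry(bad))
```

Returning every problem at once, rather than stopping at the first, lets a mirror administrator fix a merge request in a single round trip.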
Mirror decommission & deactivation
Mirrors may be decommissioned or deactivated for several reasons:
- The mirror is unreachable or unable to fulfill its service as a mirror.
- Voluntary withdrawal by the mirror owner.
- Malicious behavior, such as attempting to serve malicious files, or domain hijacking.
- Failure to follow specifications for a prolonged amount of time, even after given grace periods.
Decommissioning is the last step after deactivation, as there would be grace periods and ongoing communication between the involved parties.
The workflow for decommissioning a mirror is quite similar to that for creating a new mirror. The steps are:
- Fork the mirror project.
- Delete the mirror definition file for the mirror in question.
- Submit a new merge request.
- Mirror list administrators sign off on the change and merge the merge request.
- Automatic tasks create an updated source of truth in the root of the mirror project in the proposed format.
- The source of truth is then parsed by other Arch Linux projects, such as archweb.
Deactivation is similar, but instead of deleting the mirror definition, the mirror's active status is changed by toggling the visibility flag.
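The difference between the two operations can be sketched as follows: a decommissioned mirror has no definition file at all, while a deactivated mirror keeps its file and is only filtered out when the published list is generated. The key name `active`, and the choice to treat entries without the flag as active, are assumptions for illustration only.

```python
def published_mirrors(entries: list[dict]) -> list[dict]:
    """Keep only entries that should appear in the published mirror list."""
    # Entries without the flag are treated as active (an assumption).
    return [entry for entry in entries if entry.get("active", True)]


entries = [
    {"name": "mirror.example.org", "active": True},
    {"name": "arch.example.net", "active": False},  # deactivated, file kept
    {"name": "legacy.example.com"},  # no flag set, treated as active
]
visible = published_mirrors(entries)
```

Because the deactivated entry still exists in the repository, reactivating it is a one-line change in a merge request rather than a full resubmission.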
Arch Linux commitment
Assuming the proposal is approved, Arch Linux commits to improving the experience of managing mirrors such that it feels modern, fast and reliable. Having to create a GitLab account might come across as demanding compared to the traditional way of communicating via e-mail. However, it also helps Arch Linux address privacy concerns and better organize mirrors and the tasks around them. Arch Linux thus commits to improving privacy and reliability in the communication between Arch Linux mirror-list administrators and mirror administrators.
Mirror Tooling
To facilitate the changes outlined in this RFC, tooling would not only be beneficial but crucial for managing things going forward.
This RFC proposes a tool called mirrorctl, whose goal is to validate and handle the proposed mirror definition format. The tooling can also be used to automate tasks, and its future scope may be outlined in separate RFCs.
Drawbacks
The main drawback is that mirror administrators would require a GitLab account. At the time of this proposal, the normal account registration workflow is not enabled in GitLab, so initial account creation needs to be coordinated manually via e-mail.
This is, however, judged to be a small inconvenience compared to the benefits it would produce, mainly in terms of privacy and regulatory demands.
The RFC proposes no changes to the mirror layout, so existing mirrors can remain in operation as is, provided the mirror list administrators migrate the existing mirror information to the proposed format.
Suggested future projects
- Create a service that keeps the mirror-listing up to date based on the source of truth (with parameters)
- Create new RFCs to improve:
  - Mirror requirements
  - Improved Tier model
  - Mirror-listing security and integrity
  - mirrorctl
Alternatives Considered
Alternatives were not discussed at the Arch Summit 2023, as the conclusion there was that this proposal is a good step forward.