0032 Arch Linux Ports

Summary

Introduce "Arch Linux Ports" as a testbed for unofficial architectures until they are integrated into the main Arch Linux repositories. This testbed is meant to provide infrastructure and community-based support for architectures until they are fully supported by the main distribution.

Motivation

Arch Linux has been ported to different CPU architectures over the years. However, over the past decade we have not done enough to develop and foster expertise when it comes to this topic. Several separate projects exist today for the sole purpose of porting Arch Linux to e.g. ARM, LoongArch and RISC-V [1].

The following table serves as an overview of various external architecture ports and their up-to-date ratio in comparison to Arch Linux at the end of January 2024 [2].

architecture  [core] packages  ratio (%)  Outdated  Missing  [extra] packages  ratio (%)  Outdated  Missing
x86_64        265              100        0         0        13591             100        0         0
x86_64_v2     214              80.75      0         51       6998              51.49      39        6554
x86_64_v3     218              82.26      0         47       6999              51.50      36        6556
x86_64_v4     205              77.36      0         60       6425              47.27      31        7135
i486          112              42.26      143       10       3435              25.27      5926      4230
i686          154              58.11      102       9        5878              43.25      5985      1728
pentium4      155              58.49      102       8        5783              42.55      5985      1823
aarch64       242              91.32      10        13       11092             81.61      302       2197
armv7h        241              90.94      10        14       10709             78.79      484       2398
riscv64       252              95.09      5         8        11836             87.09      1223      532
loong64       142              53.58      117       6        7121              52.39      3423      3047
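
In every row of the table, the up-to-date, outdated and missing counts add up to the x86_64 total, and the ratio matches the share of up-to-date packages. The following is a minimal sketch of that arithmetic, assuming this is how the ratio is derived; the function name is illustrative and not part of any existing tooling.

```python
def up_to_date_ratio(up_to_date: int, outdated: int, missing: int) -> float:
    """Return the percentage of official packages a port ships at the current version."""
    total = up_to_date + outdated + missing
    if total == 0:
        raise ValueError("repository has no packages")
    return round(100 * up_to_date / total, 2)


# [core] and [extra] counts of the riscv64 row in the table
core_ratio = up_to_date_ratio(252, 5, 8)
extra_ratio = up_to_date_ratio(11836, 1223, 532)
print(core_ratio, extra_ratio)
```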

Anyone creating a new Arch Linux port today has to set up infrastructure similar to archweb, dbscripts, part of devtools, source hosting for PKGBUILDs, etc. This infrastructure should instead be shared by building a generic platform in a joint effort. This helps Arch Linux grow its community as well as its architecture expertise. It engages interested people by allowing them to apply their knowledge and work hand in hand with relevant actors in the existing Arch Linux teams.

Arch Linux lacks a streamlined process that outlines how to bootstrap a new architecture and what supporting one entails. Ports offer a way for a new architecture to be tried and worked on, without directly having to deliver full support for it. Previous RFCs have aimed at a direct integration for new architectures. This approach would have package maintainers facing new architecture-specific bugs and challenges, which may not be ideal, especially when it comes to still young or less common architectures.

Instead, this RFC outlines a data-driven process through which we can estimate how much extra maintenance effort is required, how mature the architecture is, and how much interest there is for it in the community. We acknowledge the Debian Ports project as a role model for this RFC, which has been covering a similar use case for many years.

In the following sections we elaborate on specific technological changes and procedures that are required for

  • introducing a new port
  • maintaining a port until it is either promoted to an official architecture or dropped
  • providing and using source repositories of packages
  • providing and using binary package repositories
  • extending official build tooling
  • providing signatures for a new architecture
  • providing installation media
  • providing virtual machine images
  • providing distribution facing and end-user facing documentation for a Port
  • extending existing infrastructure to support additional architectures while reusing as much existing technology as possible
  • mirror administration
  • integrating with existing platforms
  • archiving of packages

Specification

Maintainers

The maintainers of a port are Arch Linux package maintainers as defined in RFC0007. Any Arch Linux package maintainer may take part in a port project.

Since the merge of binary package repositories and the move to per pkgbase git repositories as package source repositories (see RFC0014) all package maintainers have access to all package source repositories. However, only Developers can push packages built from those sources to the official [core] repository (also [core-staging] and [core-testing]).

In case external volunteers are interested in starting a port, they need to be onboarded as package maintainers to the Arch Linux team first.

Port maintainers may gain access to dedicated build infrastructure and are strongly encouraged to help maintain and improve it.

Ports

The ports project relies on a separate repository structure that is shared among all ports.

Maintaining a port is a collective effort and requires work on related tooling, infrastructure, as well as packaging.

Architectures of bitness lower than 64 as well as legacy architectures are not considered for porting. When devices of a port do not (or cannot) support a boot flow following the Boot Loader Specification, documentation and (if applicable) packages for the boot process must be provided.

Ports are considered unsupported for as long as they are not accepted as an officially supported architecture by the distribution. Therefore, all port maintainers are also able to push packages to a port's [core] repository (as well as [core-staging] and [core-testing]).

Packages for a port are built using the default package build tooling of the Arch Linux distribution (pkgctl at the time of writing) and port maintainers are expected to help extend the tooling for their port's use-case.

Names of ports must follow the currently allowed character set for file names. As an example, x86_64-v3 is not allowed in a file name, but x86_64_v3 is. For recognizability it is suggested to follow this scheme also when referring to the ports more generally.
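
One likely reason for this restriction is that package file names conventionally follow the pkgname-pkgver-pkgrel-arch layout, with hyphens as field separators. The sketch below is an illustration under that assumption (the helper name and the example package versions are hypothetical); it shows how a hyphen in the architecture string shifts every field.

```python
def split_package_filename(filename: str) -> tuple[str, str, str, str]:
    """Split 'pkgname-pkgver-pkgrel-arch.pkg.tar.*' into its four fields."""
    stem = filename.split(".pkg.tar")[0]
    # pkgname may itself contain hyphens, so the fields are split from the right.
    name, version, release, arch = stem.rsplit("-", 3)
    return name, version, release, arch


# An underscore keeps the architecture in a single field.
good = split_package_filename("zlib-1.3.1-2-x86_64_v3.pkg.tar.zst")
# A hyphen in the architecture shifts all fields by one position.
bad = split_package_filename("zlib-1.3.1-2-x86_64-v3.pkg.tar.zst")
print(good)
print(bad)
```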

Introducing a new port

New ports are added by proposing them in an RFC. At least two package maintainers have to lead a port to ensure it will be developed longer term.

Once an introduction RFC is accepted, the work on a port may start. Initial work includes additions of port specific integration (e.g. custom pacman.conf, makepkg.conf, etc.) to the official build tooling.

Port maintainers are expected to work together with relevant Arch Linux teams to enable the setup of any required infrastructure. Required changes to existing tooling should be proposed and if possible also implemented by port maintainers, to ease the workload on the existing teams.

As part of the porting work, port maintainers rely on the official architecture independent packages of the distribution. Any architecture independent packages that are identified as architecture dependent by a port (this may include e.g. header-only packages for which the upstream build system provides different sets of files for differing architectures) must be adapted to become architecture dependent.

Source repositories

The package source repositories for all packages are also used for ports.

Tickets that are specific to a port are labeled accordingly to provide a better overview and allow people working on those specific ports to filter for them.

ports::aarch64
ports::loong64
ports::riscv64
ports::x86_64_v2
ports::x86_64_v3
ports::x86_64_v4

Ports are built from the same tags of a package's default branch as the supported architectures.

The build tooling matches the tag of a package source repository against the specific binary package version in the target repositories (ports, as well as supported architecture) to keep track of the current version of a given architecture.
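
The matching described above can be pictured as a comparison of two mappings, one from package base to the latest source tag and one from package base to the version found in a target repository. The following is a minimal sketch, assuming plain string equality is sufficient; real tooling would need proper version comparison and tag normalization, and all names and versions here are hypothetical.

```python
def port_status(source_tags: dict[str, str], port_repo: dict[str, str]) -> dict[str, str]:
    """Classify each package base as up-to-date, outdated or missing in a port."""
    status = {}
    for pkgbase, tag in source_tags.items():
        built = port_repo.get(pkgbase)
        if built is None:
            status[pkgbase] = "missing"
        elif built == tag:
            status[pkgbase] = "up-to-date"
        else:
            status[pkgbase] = "outdated"
    return status


# Hypothetical tags of the source repositories and versions in a port repository
tags = {"zlib": "1:1.3.1-2", "bash": "5.2.026-2", "rust": "1:1.76.0-1"}
riscv64_core = {"zlib": "1:1.3.1-2", "bash": "5.2.021-1"}
status = port_status(tags, riscv64_core)
print(status)
```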

As long as the architecture is not an officially supported architecture, the PKGBUILD arch array is not updated to contain the port architecture string. The build tooling should allow building for a specific architecture, while ignoring the particular architectures set for a given PKGBUILD.

If a port cannot be built using the latest version of a given package source repository, port maintainers are advised to try and provide patches. If additional patches are sufficient, the port version may be built using an updated pkgrel. Alternatively, sub-versions (e.g. pkgrel=1.1) may be used if it is not feasible to rebuild the supported architecture(s).
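
The sub-version scheme relies on 1.1 sorting after 1 and before 2, so that a port-only rebuild neither collides with the current release nor outranks the next regular one. The sketch below illustrates this ordering with a simplified numeric comparison; it is not pacman's actual version comparison, and the helper name is hypothetical.

```python
def pkgrel_key(pkgrel: str) -> tuple[int, ...]:
    """Turn a pkgrel such as '1.1' into a tuple that sorts numerically."""
    return tuple(int(part) for part in pkgrel.split("."))


# A port rebuild (1.1) sits between two regular releases (1 and 2).
ordered = sorted(["2", "1.1", "1", "1.10", "1.9"], key=pkgrel_key)
print(ordered)
```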

In case a port is unable to upgrade (e.g. due to issues for which upstream has not yet found a solution, or due to switching to a language unsupported by the architecture), a port has to maintain compatibility with whichever is the last working version of a given project. This may entail providing further updates to pkgrel for rebuilds going forward. For these updates to pkgrel a separate branch named after the port's architecture must be used in the dedicated ports/ namespace (e.g. ports/aarch64, ports/loong64, ports/riscv64, ports/x86_64_v2, ports/x86_64_v3 or ports/x86_64_v4). This branch is kept indefinitely. If the upstream project releases a new version for this older version branch of its software, a new package source repository tracking this particular version should be created (e.g. project -> project1). In case the port is able to return to the default branch of the package source repository, updates to the package again follow the normal workflow.

When using the separate branch in the ports/ namespace, great care has to be taken. Introducing an epoch should only be done after weighing all possible options, as doing so has implications for the default branch later on. If a port returns to using the default branch again after having introduced an epoch for the namespaced ports/ branch, the default branch must introduce an epoch as well.

Patches should be applied conditionally if they are specific to a port architecture, and unconditionally if they have been merged upstream. For architecture specific patches, dedicated source arrays (e.g. source_riscv64=()) are to be used.
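
The effect of a dedicated source array can be modelled as the generic sources plus the entries for the architecture being built. The following sketch represents a PKGBUILD as a plain dictionary purely for illustration; the URL and patch file name are hypothetical.

```python
def effective_sources(pkgbuild: dict[str, list[str]], arch: str) -> list[str]:
    """Return the generic sources followed by those specific to one architecture."""
    return pkgbuild.get("source", []) + pkgbuild.get(f"source_{arch}", [])


# Hypothetical PKGBUILD data with one riscv64-only patch
pkgbuild = {
    "source": ["https://example.org/foo-1.0.tar.gz"],
    "source_riscv64": ["fix-riscv64-atomics.patch"],
}
riscv64_sources = effective_sources(pkgbuild, "riscv64")
x86_64_sources = effective_sources(pkgbuild, "x86_64")
print(riscv64_sources)
print(x86_64_sources)
```

With this arrangement the patch is only fetched and applied when building for riscv64, while all other architectures see an unchanged source list.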

Port maintainers should attempt to provide generic adaptations for PKGBUILDs if there are architecture specific issues to be solved. Package maintainers should apply these adaptations, so that ports can be kept as close to the current supported architecture as possible.

Binary repositories

Each port maintains its own binary package repository. Port repositories are distributed alongside the official Arch Linux repositories on the mirrors in a namespaced directory structure:

/ports/pkg/aarch64/$repo/
/ports/pkg/loong64/$repo/
/ports/pkg/riscv64/$repo/
/ports/pkg/x86_64_v2/$repo/
/ports/pkg/x86_64_v3/$repo/
/ports/pkg/x86_64_v4/$repo/

Installation media

Ports should create generic installation media that work on a broad set of devices. At the time of writing, this can be achieved using archiso's releng profile as a base. This does not necessitate covering installation scenarios for special purpose hardware (e.g. devices only able to boot from an SD card), as those require dedicated bootstrap artifacts to be created.

Port installation media is distributed alongside the official Arch Linux binary artifacts in a namespaced directory structure:

/ports/install/aarch64/
/ports/install/loong64/
/ports/install/riscv64/
/ports/install/x86_64_v2/
/ports/install/x86_64_v3/
/ports/install/x86_64_v4/

Additionally, it is strongly recommended for port maintainers to work on integration with other software that helps with the installation of Arch Linux, such as archinstall.

Virtual machine images

Ports should create virtual machine images for their specific architecture. At the time of writing this can be achieved by adapting the arch-boxes project.

Port virtual machine images are distributed alongside the official Arch Linux binary artifacts in a namespaced directory structure:

/ports/vm/aarch64/
/ports/vm/loong64/
/ports/vm/riscv64/
/ports/vm/x86_64_v2/
/ports/vm/x86_64_v3/
/ports/vm/x86_64_v4/
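
The three namespaced layouts in this document (pkg, install and vm) follow one pattern, which tooling could derive from the artifact kind and the port name. The following is a small sketch of such a helper; the function name and its error handling are assumptions and do not describe existing tooling.

```python
from pathlib import PurePosixPath
from typing import Optional

# Artifact kinds taken from the directory listings in this document
KINDS = {"pkg", "install", "vm"}


def port_path(kind: str, arch: str, repo: Optional[str] = None) -> PurePosixPath:
    """Build the mirror-relative directory for a port artifact."""
    if kind not in KINDS:
        raise ValueError(f"unknown artifact kind: {kind}")
    path = PurePosixPath("/ports") / kind / arch
    if kind == "pkg":
        # Only binary repositories carry a $repo component.
        if repo is None:
            raise ValueError("binary repositories need a repository name")
        path = path / repo
    return path


print(port_path("pkg", "riscv64", "core"))
print(port_path("install", "aarch64"))
print(port_path("vm", "x86_64_v3"))
```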

Promoting a port

Promoting a port to an officially supported architecture is done by proposing a port to be promoted in an RFC. A port to be promoted should have an average 90% up-to-date ratio over the previous six months. As such, a promotion RFC should include numbers on the up-to-date ratio vs. the official repositories and outline how many people are actively working on architecture specific issues and packaging. The estimated workload on maintainers of the existing, officially supported architectures needs to be highlighted and the effective difference between the long-lived port branches vs. the default branches must be evaluated.
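
The 90% criterion can be checked mechanically once ratios are recorded over time. The sketch below assumes one up-to-date ratio sample per month and an arithmetic mean; this RFC does not specify the sampling interval or the kind of average, so both are assumptions, as are the function name and the sample values.

```python
from statistics import mean


def meets_promotion_threshold(
    monthly_ratios: list[float], threshold: float = 90.0, months: int = 6
) -> bool:
    """Check whether the mean of the most recent monthly ratios reaches the threshold."""
    if len(monthly_ratios) < months:
        return False  # not enough history to judge
    return mean(monthly_ratios[-months:]) >= threshold


# Hypothetical monthly up-to-date ratios, oldest first
steady = [95.09, 94.0, 93.5, 92.0, 91.0, 90.5]
dipping = [91.0, 91.0, 91.0, 91.0, 91.0, 80.0]
print(meets_promotion_threshold(steady), meets_promotion_threshold(dipping))
```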

After a promotion RFC is accepted, work starts on adapting the default branches of all package repositories. Any changes required to package the ported architecture in the official repositories are applied to the default branch. When all default branches are in sync, the architecture is bootstrapped into the official repositories.

If a promotion RFC is not accepted, it may be re-evaluated at a later point in time, if any major issues that led to it not being accepted have been worked out.

Support

For Arch Linux staff the ability to provide support for a port depends on the availability of hardware and virtual machines to test and reproduce on.

If a port introduces hardware that requires special integration (e.g. a custom bootloader setup for a single board computer), basic user-facing documentation in the Arch Wiki about this device must be provided by relevant package maintainers (e.g. stating which bootloader package to use). Furthermore, any non-trivial packaging steps for such a device must be clearly documented, so that other packagers may share the workload of maintaining such a device.

Closing an existing port

Closing a port as abandoned is done by proposing a port to be closed in an RFC. A port should be closed if no one is willing to lead the project for six months or more, if it has not seen sufficient activity for six months or more, or if the up-to-date ratio of existing ported packages falls below 50% for six months or more. Re-opening a closed port should go through the same procedure as introducing a new port.
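
The up-to-date criterion for closing a port can be sketched in the same spirit as a check over recorded ratios. This assumes one sample per month and reads "below 50% for six months or more" as every one of the last six samples being below the threshold; both are interpretations rather than definitions from this RFC, and the names are hypothetical.

```python
def below_closing_threshold(
    monthly_ratios: list[float], threshold: float = 50.0, months: int = 6
) -> bool:
    """Check whether the ratio stayed below the threshold for the last `months` samples."""
    recent = monthly_ratios[-months:]
    return len(recent) >= months and all(ratio < threshold for ratio in recent)


# Hypothetical monthly up-to-date ratios, oldest first
stalled = [55.0, 49.0, 48.5, 47.0, 46.0, 45.5, 44.0]
recovering = [49.0, 48.0, 47.0, 46.0, 45.0, 52.0]
print(below_closing_threshold(stalled), below_closing_threshold(recovering))
```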

If a port has been closed, its long-lived branches may be kept for up to two years, in case volunteers are willing to put in the effort to re-introduce it. After this grace period has passed, the long-lived branches should be removed.

Infrastructure and required work

The following subsections highlight work that has to be done for the ports project to succeed.

The introduction of automated build infrastructure is out of scope for this RFC. However, creating ports serves as a learning experience for identifying common issues that need addressing in automated builds and as a stepping stone towards a centralized build infrastructure. As such, port maintainers should work towards a common infrastructure and not rely on single-purpose build infrastructure.

Archweb integration

Archweb should be extended to make multi-architecture packages more organized, easy to search for and filter.

The website should clearly indicate which architectures are unofficial ports and which are officially supported.

Providing progress statistics and graphs would be ideal.

AUR integration

All port architectures alongside the officially supported ones should be allowed in the arch array of PKGBUILDs found in the AUR.

Ideally, the AUR web interface / API is extended to allow filtering packages by architecture.

Keyring

The existing archlinux-keyring will be reused for ports as the packagers remain in the same trust group (Arch Linux Package Maintainers).

Package guidelines

Existing package guidelines should be adapted to become more generic and architecture agnostic. This ensures that new architectures can reuse existing package build sources easily.

Port maintainers should help extend the distribution's package guidelines wherever friction prevents a straightforward adoption of existing guidelines.

Package repository management

The central package repository server is used to manage the repositories of supported architectures. For this, the host stages all available packages in a package pool directory and symlinks those needed by a specific repository (e.g. core or extra) to its respective package repository directory. Package repositories expose a specific set of packages via their repository sync databases to the end user. Unneeded packages (those no longer mentioned in a repository sync database and therefore no longer in any package repository) are removed from the package repository and package pool on a regular basis.

The handling of port repositories will work in an analogous manner: A dedicated package pool for each port architecture is created on the central package repository server, that will contain packages specific to that architecture. All architecture independent packages (i.e. those of type any) are reused from the package pool of the supported architectures.

The cleanup algorithm of the supported architectures package pool is adjusted to also consider package use in port package repositories.

The contents and handling of a port's package repository rely on two package pool directories. Appropriate changes to the existing or any upcoming package repository management tooling need to be made to reflect this behavior.
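
The adjusted cleanup can be thought of as a set difference: a file in the shared pool is unneeded only if no sync database, official or port, still references it. The following is a minimal sketch under that reading; the function, the database keys and the package file names are hypothetical.

```python
def unneeded_packages(pool: set[str], repo_databases: dict[str, set[str]]) -> set[str]:
    """Return pool files that no repository sync database references any more."""
    referenced = set().union(*repo_databases.values())
    return pool - referenced


# Hypothetical pool of architecture independent packages
pool = {
    "foo-1.0-1-any.pkg.tar.zst",
    "foo-1.1-1-any.pkg.tar.zst",
    "bar-2.0-1-any.pkg.tar.zst",
}
official = {"x86_64/extra": {"foo-1.1-1-any.pkg.tar.zst"}}
# The port still lags behind and references the older foo package.
with_ports = {**official, "ports/riscv64/extra": {"foo-1.0-1-any.pkg.tar.zst"}}

print(sorted(unneeded_packages(pool, official)))
print(sorted(unneeded_packages(pool, with_ports)))
```

Considering only the official databases would delete the older foo package that the port still depends on, which is why the cleanup has to take port repositories into account.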

Archival

Arch Linux has a strong commitment towards reproducible builds. As such, a package archive is also required for ports to be able to reproduce packages that were built against packages which are no longer in the current repositories.

Any new port should therefore be archived using established archival procedures.

Debug packages

Debug symbols are a useful and necessary tool to help upstream developers and distribution package maintainers debug issues with built packages. Due to the size of debug packages and because they are mostly needed on demand, the focus of providing them should lie on debuginfod integration.

Whether the widespread hosting of dedicated debug package repositories is wanted and is feasible should be evaluated in a separate RFC.

Package build tooling

Arch Linux's official package build tooling devtools must be adapted to make dealing with ports easier and more streamlined.

As an example: Rebasing the arch array of PKGBUILDs every time the canonical package changes is cumbersome. A possible solution to this problem is to modify makepkg to accept options via makepkg.conf as well (e.g. to override -A/--ignorearch) or to modify pkgctl in such a way that passing in -A/--ignorearch or a specific architecture becomes possible.

A baseline for compiler flags for each port must be established. The compile flags of a port should stay as close as possible to the ones used by the officially supported architecture(s).

The package lint integration for namcap must be extended to support a port's specific requirements (e.g. verifying SIMD and CPU extension support in binaries).

Existing architecture independent packages and signatures are reused in a port. Edge-cases around these packages such as the following need to be solved:

  • adding existing packages from the official package pool or maybe even the package archive to a package repository without building them
  • file collision prevention in case a package needs to be modified and rebuilt due to architecture specific changes (e.g. by using a pkgrel of 1.1 for a port, while the pkgrel for the supported architectures remains at 1)

Whether the integration of QEMU for virtualized builds or direct cross-compilation is worth the effort should be evaluated and if feasible integrated. Virtualization with QEMU's user mode faces many challenges such as:

  • use of the unsafe "C" flag for binfmt
  • various issues such as /proc parsing, futex related hangs in Python, Go and Rust
  • incomplete support for vfork

Support for cross-compilation appears to be even more complex as it would mean rethinking larger portions of our current source handling (e.g. defining a separate sysroot for the target architecture and passing in required environment variables etc.). As such this integration should be evaluated and concluded upon in a separate RFC.

Mirrors

Mirror maintainers are strongly encouraged to also host the ports package pools and package repositories.

The additional space requirements for a port currently range between 33 GiB (total size of the architecture dependent packages in the riscv64 port) and 61 GiB (total size of the architecture dependent packages in the official x86_64 repositories).

The space requirements for also hosting debug packages of a port are currently at 72 GiB (total size of debug packages for the official repositories).

Mirror maintainers may opt out of hosting ports (e.g. due to size limitations on their hardware). To exclude a port from the mirror's syncing operations, the port specific package pool and package repository directory structure need to be ignored. To exclude all ports, all port package pool and package repository directory structures have to be ignored.
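
For mirrors that sync with rsync, opting out of individual ports could be expressed as exclude patterns for the namespaced repository directories. The sketch below only covers the /ports/pkg/ layout given in this document; the location of the port specific package pools is not specified here, so a pattern for them would have to be added once it is known. The function name is hypothetical.

```python
def rsync_excludes(excluded_ports: list[str]) -> list[str]:
    """Build rsync --exclude arguments for the repository directories of the given ports."""
    return [f"--exclude=/ports/pkg/{arch}/" for arch in sorted(excluded_ports)]


# Example: a mirror that does not want to carry loong64 and riscv64
excludes = rsync_excludes(["riscv64", "loong64"])
print(" ".join(excludes))
```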

Mirrorlist

Arch Linux's mirrorlist management, as well as user-facing tools have to be adapted to support filtering by architecture and allow generation of port specific mirrorlists.

Documentation

Ports must maintain documentation that is aimed at their maintainers and outlines how to run the port and how to package for it. If the distribution handbook (RFC0021) is accepted, this documentation should be added there, otherwise a dedicated location has to be found for it. This maintainer-aimed documentation should also work towards outlining a generalized process for bootstrapping Arch Linux on new architectures.

User-facing documentation on how to install and use a port should be added to the relevant articles in the ArchWiki.

Drawbacks

This proposal does not concern itself with the details of build automation and its orchestration. The authors believe that a dedicated RFC is better suited to discuss this topic and outline solutions. Furthermore, the authors are convinced that aiming for active port development will raise the need and motivation to work on automation topics and attract external expertise.

When a port is accepted as an official architecture, the distribution model may have to be adapted accordingly. In the current model with only one architecture, upgrades of packages and entire ecosystems can be handled with relative ease. Adding further supported architectures in the future will require Arch Linux to establish rules around how to deal with architectures that are blocking others from being upgraded. The authors of this RFC recommend that a step-wise upgrade scenario be established in a separate RFC.

The size of the main package pool directory will increase depending on how many ports diverge, and by how much, in their use of architecture independent packages compared to the officially supported repositories. This may lead to an increase in size required for mirrors on top of the additional size requirements for architecture dependent port directory structures.

Unresolved Questions

Alternatives Considered

The past has shown that if expertise and domain knowledge cannot be kept within Arch Linux, alternative projects such as Arch Linux ARM are created. If these projects are not merged back into the default distribution at some point and are undermaintained, they go defunct over time.

With RFC0002 we have tried to establish x86_64_v3 as a new CPU sub-architecture. It became clear afterwards that our efforts for automation and build infrastructure were not yet sufficient to deal with this and that our documentation of the process lags behind. This, in addition to the fear of introducing incalculable overhead for packagers, led to the effort stalling for a long period of time.

Separate entity

A packaging approach separate from the currently proposed one has been evaluated, in which a separate entity is created for maintaining the ports to lower the barrier to entry for contributors. This led to the evaluation of the following points for providing access to non package maintainers:

  • defining one or several new contributor roles for our platforms
  • figuring out how to do very fine grained access management on our package source repositories (are those new contributors allowed to merge things on some or all branches? are they allowed to tag releases? etc.)
  • creating a separate keyring to more clearly separate trust
  • creating a separate host for ports repositories to separate access rights
  • syncing parts of the official repositories (or rather their package pools) onto the ports repositories host (mostly to deduplicate the any packages)
  • syncing everything from the ports repository host back onto the main repository host

This approach has been discarded, as it would introduce more managerial and technological overhead. Relying on an established set of roles and security model appeared to be a more desirable outcome.

Separate branch model

A separate branch model has been evaluated for managing the package sources. It is outlined below for posterity. It has been discarded due to the technological overhead in comparison to a single integration branch model.

A long-lived branch per port is created for each package repository and kept in sync with the default branch.

The port's architecture string is added to the PKGBUILD arch array on the dedicated branch, which requires the port maintainers to rebase onto latest changes on the default branch.

If a specific package is only required by a port that is not yet an official architecture, the port must maintain the default branch and the port specific branch for as long as the port exists.

The port branches are prefixed to not collide:

ports/aarch64
ports/loong64
ports/riscv64
ports/x86_64_v2
ports/x86_64_v3
ports/x86_64_v4

Tags for packaging purposes are created with a prefix, so that they do not collide with tags for official packages or other ports:

ports/aarch64/$epoch-$pkgver-$pkgrel
ports/loong64/$epoch-$pkgver-$pkgrel
ports/riscv64/$epoch-$pkgver-$pkgrel
ports/x86_64_v2/$epoch-$pkgver-$pkgrel
ports/x86_64_v3/$epoch-$pkgver-$pkgrel
ports/x86_64_v4/$epoch-$pkgver-$pkgrel

Port maintainers should attempt to provide generic adaptations for PKGBUILDs if there are architecture specific issues to be solved. Package maintainers should apply these adaptations, so that the diff for the long-lived port branches can be kept as small as possible.

Footnotes

[1] For the sake of simplicity, architecture overlays (e.g. x86_64_v2, etc.) are summarized as ports as well. Their source and binary repository handling is equivalent to that of a port.
[2] The low up-to-date ratio for the x86_64_v2, x86_64_v3 and x86_64_v4 architecture overlays is in large parts due to them reusing the type any packages of the official Arch Linux repositories.