0016 SPDX License Identifiers

Use SPDX license identifiers in PKGBUILDs

Summary

Change to using SPDX license identifiers in the PKGBUILD license value for packages in all repositories.

Motivation

The license identifiers we currently use have become inadequate for the wide range of open-source licenses in use today, and they no longer accurately describe the licenses they are meant to represent. When our identifier scheme was devised, there was no commonly accepted standard for license identifiers, so it was necessary to develop one that would work for Arch's use case.

In recent years, however, this has changed. SPDX (Software Package Data Exchange) was created under the Linux Foundation as a well-defined standard for communicating software bill of materials (SBOM) information. At the time of writing, it is formally recognized as the international open standard for license compliance (among other things) as ISO/IEC 5962:2021. It provides a concrete framework for identifying licenses of all kinds, including their exceptions and modifications. Additionally, its syntax supports licenses that are not included in the official SPDX license list.

Several other distributions have already switched to SPDX license identifiers, including openSUSE (https://en.opensuse.org/openSUSE:Packaging_guidelines#Licensing) and Fedora (https://fedoraproject.org/wiki/Changes/SPDX_Licenses_Phase_1).

Specification

The new license identifiers will follow SPDX license expressions, as defined in the specification at https://spdx.github.io/spdx-spec/v2.3/SPDX-license-expressions/. The list of license identifiers is available at https://spdx.org/licenses/. Packages under a single license will use just that license's identifier from the SPDX list, such as Apache-2.0 or GPL-2.0-or-later.
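As an illustration of the single-license case, a tool could first check whether a license value is one bare identifier. The sketch below is hypothetical: check_single_license is an illustrative name, KNOWN_LICENSES is a tiny hand-picked excerpt of the SPDX list rather than the real list, and the allowed characters (letters, digits, "-" and ".") follow the idstring rule of the SPDX expression syntax. Comparison is exact here for simplicity.

```python
import re

# idstring rule from the SPDX expression syntax: letters, digits, "-" and ".".
IDSTRING_RE = re.compile(r"[A-Za-z0-9.\-]+")

# Hypothetical excerpt of the SPDX license list; a real tool would load the
# full list instead of hard-coding a few entries.
KNOWN_LICENSES = {"Apache-2.0", "BSD-3-Clause", "GPL-2.0-or-later", "MIT"}


def check_single_license(value: str) -> str:
    """Classify a license value as a bare SPDX identifier or not."""
    if IDSTRING_RE.fullmatch(value) is None:
        # Contains spaces, operators, "+", parentheses, etc.
        return "not-an-identifier"
    if value in KNOWN_LICENSES:
        return "listed"
    # Well-formed, but absent from this (partial) excerpt of the list.
    return "unlisted"


for value in ("Apache-2.0", "Apache", "GPL-2.0-or-later OR MIT"):
    print(value, "->", check_single_license(value))
```

Note that the old Arch identifier Apache is well-formed but is not an SPDX identifier, which is exactly the kind of value the conversion is meant to eliminate.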

Packages under multiple licenses or with special license properties will use composite license expressions, also defined in the SPDX spec. The following operators can be used for composite license expressions:

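  • If the license terms state that the current version of the license or any later version may be used, a + should be appended to the license identifier, such as CDDL-1.0+. Licenses that have dedicated -or-later identifiers in the SPDX list, such as GPL-2.0-or-later, should use those instead.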
  • If there is a choice between two or more licenses, the OR operator should be used, such as LGPL-2.1-only OR MIT; further alternatives are chained the same way, for example GPL-2.0-or-later OR LGPL-2.1-or-later OR MPL-2.0.
  • If the package contains components that are under different licenses, the AND operator should be used, such as LGPL-2.1-only AND MIT.
  • If the project's license carries one of the exceptions listed at https://spdx.org/licenses/exceptions-index.html, the exception should be applied using the WITH operator; for example, GPL-2.0-or-later WITH Classpath-exception-2.0.
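To make the operators above concrete, here is a minimal lexer sketch. It is not an official SPDX tool: classify_tokens and its token kinds are hypothetical names, and it only recognizes the upper-case spellings AND, OR and WITH. It shows how "+" stays attached to its identifier and how the operand after WITH names an exception rather than a license.

```python
import re

# A token is a single parenthesis, or a run of characters that are neither
# whitespace nor parentheses (identifiers, operators, or "identifier+").
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")
OPERATORS = {"AND", "OR", "WITH"}


def classify_tokens(expr: str) -> list[tuple[str, str]]:
    """Split an SPDX-style expression into (kind, text) pairs."""
    result = []
    prev = None
    for tok in TOKEN_RE.findall(expr):
        if tok in ("(", ")"):
            kind = "paren"
        elif tok in OPERATORS:
            kind = "operator"
        elif prev == "WITH":
            kind = "exception"  # the operand after WITH names an exception
        elif tok.endswith("+"):
            kind = "or-later"   # license identifier followed by "+"
        else:
            kind = "license"
        result.append((kind, tok))
        prev = tok
    return result


expr = "CDDL-1.0+ OR (GPL-2.0-or-later WITH Classpath-exception-2.0 AND MIT)"
for kind, text in classify_tokens(expr):
    print(f"{kind:10} {text}")
```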

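These operators have a defined precedence, from tightest-binding to loosest: +, WITH, AND, OR. For example, LGPL-2.1-only OR BSD-3-Clause AND MIT is read as LGPL-2.1-only OR (BSD-3-Clause AND MIT). Parentheses can be used to override this precedence, as in algebraic expressions.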
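The precedence rules can be sketched as a small recursive-descent parser, with one function per precedence level: OR is parsed loosest, then AND, then WITH, and "identifier+" is kept as a single leaf so "+" binds tightest. This is an illustrative sketch, not a complete implementation of the SPDX grammar; Parser and parse_expression are hypothetical names, only upper-case operators are accepted, and identifiers are not checked against the license list.

```python
import re

TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")
OPERATORS = {"AND", "OR", "WITH"}


class Parser:
    """Parse an expression into nested (operator, left, right) tuples."""

    def __init__(self, expr: str):
        self.tokens = TOKEN_RE.findall(expr)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self) -> str:
        tok = self.peek()
        if tok is None:
            raise ValueError("unexpected end of expression")
        self.pos += 1
        return tok

    def parse(self):
        node = self.parse_or()
        if self.peek() is not None:
            raise ValueError(f"unexpected token {self.peek()!r}")
        return node

    def parse_or(self):  # loosest: a OR b OR ...
        node = self.parse_and()
        while self.peek() == "OR":
            self.take()
            node = ("OR", node, self.parse_and())
        return node

    def parse_and(self):  # a AND b AND ...
        node = self.parse_with()
        while self.peek() == "AND":
            self.take()
            node = ("AND", node, self.parse_with())
        return node

    def parse_with(self):  # identifier WITH exception
        node = self.parse_atom()
        if self.peek() == "WITH":
            if not isinstance(node, str):
                raise ValueError("WITH must follow a single license identifier")
            self.take()
            exception = self.take()
            if exception in OPERATORS or exception in ("(", ")"):
                raise ValueError(f"expected an exception, got {exception!r}")
            node = ("WITH", node, exception)
        return node

    def parse_atom(self):  # identifier, identifier+, or ( expression )
        tok = self.take()
        if tok == "(":
            node = self.parse_or()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return node
        if tok in OPERATORS or tok == ")":
            raise ValueError(f"unexpected token {tok!r}")
        return tok


def parse_expression(expr: str):
    return Parser(expr).parse()


print(parse_expression("LGPL-2.1-only OR BSD-3-Clause AND MIT"))
# → ('OR', 'LGPL-2.1-only', ('AND', 'BSD-3-Clause', 'MIT'))
```

Adding parentheses, as in (LGPL-2.1-only OR BSD-3-Clause) AND MIT, makes AND the top-level operator instead.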

For clarification on any of the information stated here, see the official specification. Some examples of trivial and non-trivial license conversions follow.

Package | Arch identifier                | SPDX expression
openssl | ('Apache')                     | ('Apache-2.0')
firefox | ('GPL' 'LGPL' 'MPL')           | ('GPL-2.0-or-later OR LGPL-2.1-or-later OR MPL-2.0')
tmux    | ('BSD')                        | ('BSD-3-Clause AND BSD-2-Clause')
gcc*    | ('GPL3' 'LGPL' 'FDL' 'custom') | ('GPL-2.0-or-later AND LGPL-2.1-or-later AND GPL-3.0-or-later WITH GCC-exception-3.1 AND LGPL-3.0-or-later AND GFDL-1.3-no-invariants-or-later AND (NCSA OR MIT) AND Apache-2.0 WITH LLVM-exception')

* This was the best approximation of a conversion to SPDX that I could come up with for GCC.

All existing PKGBUILDs will be converted to this identifier specification, and the documentation will be updated in tandem so that new PKGBUILDs follow it. Additionally, a linter will be added to the package checking process to tell the package builder whether the PKGBUILD's license value is a valid SPDX expression.
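A linter of this kind might look roughly like the sketch below, which models a PKGBUILD license array as a Python list of strings. Everything here is hypothetical (the linting tool is still an open question): lint_license_array is an illustrative name, the known-license and known-exception sets are tiny excerpts, and LicenseRef- identifiers are accepted on the assumption that the LicenseRef- option for custom licenses is chosen. It checks identifiers and parenthesis balance only, not the full expression grammar.

```python
import re

TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")
OPERATORS = {"AND", "OR", "WITH"}

# Hypothetical excerpts of the SPDX license and exception lists.
KNOWN_LICENSES = {"Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "GPL-2.0-or-later",
                  "LGPL-2.1-or-later", "MIT", "MPL-2.0"}
KNOWN_EXCEPTIONS = {"Classpath-exception-2.0", "GCC-exception-3.1", "LLVM-exception"}


def lint_license_array(licenses: list[str]) -> list[str]:
    """Return human-readable problems found in a PKGBUILD-style license array."""
    problems = []
    for entry in licenses:
        depth = 0
        prev = None
        for tok in TOKEN_RE.findall(entry):
            if tok == "(":
                depth += 1
            elif tok == ")":
                depth -= 1
                if depth < 0:
                    problems.append(f"{entry!r}: unmatched ')'")
                    break
            elif tok in OPERATORS:
                pass
            elif prev == "WITH":
                if tok not in KNOWN_EXCEPTIONS:
                    problems.append(f"{entry!r}: unknown exception {tok!r}")
            else:
                license_id = tok.removesuffix("+")  # "id+" means "id or later"
                if not (license_id in KNOWN_LICENSES
                        or license_id.startswith("LicenseRef-")):
                    problems.append(f"{entry!r}: unknown license {tok!r}")
            prev = tok
        else:
            # Only reached if the loop did not break on an unmatched ')'.
            if depth > 0:
                problems.append(f"{entry!r}: unmatched '('")
    return problems


print(lint_license_array(["BSD", "(MIT AND Apache-2.0"]))
```

A production linter would also need to reject structurally invalid input such as MIT AND with a missing operand, which this sketch deliberately does not attempt.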

The licenses package will be revised to source its license texts from the official SPDX license list data files at https://github.com/spdx/license-list-data, which provides .txt versions of all SPDX-standardized license texts. All new standardized license text files will be installed in /usr/share/licenses/spdx/.
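To show how a package's expression could be related to the installed texts, here is a sketch that lists the files under /usr/share/licenses/spdx/ that an expression refers to. The file naming is an assumption, not something this RFC specifies: it supposes one <identifier>.txt per license, and that exception texts are installed the same way. license_text_paths is a hypothetical name, and LicenseRef- identifiers are skipped because custom licenses are not part of the SPDX list.

```python
import re
from pathlib import PurePosixPath

TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")
OPERATORS = {"AND", "OR", "WITH"}

# Directory named in the RFC; the "<identifier>.txt" layout is assumed.
SPDX_LICENSE_DIR = PurePosixPath("/usr/share/licenses/spdx")


def license_text_paths(expression: str) -> list[str]:
    """List the (assumed) license text files referenced by an expression."""
    paths = []
    for tok in TOKEN_RE.findall(expression):
        if tok in ("(", ")") or tok in OPERATORS:
            continue
        name = tok.removesuffix("+")  # "CDDL-1.0+" uses the CDDL-1.0 text
        if name.startswith("LicenseRef-"):
            continue  # custom licenses are not part of the SPDX list
        path = str(SPDX_LICENSE_DIR / f"{name}.txt")
        if path not in paths:  # keep first-seen order, drop duplicates
            paths.append(path)
    return paths


for path in license_text_paths("GPL-3.0-or-later WITH GCC-exception-3.1 AND (NCSA OR MIT)"):
    print(path)
```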

Drawbacks

  • This system will require significant effort to implement. All packages must be converted to the new system, documentation must be updated, and tooling should be updated to verify the new license expressions. This change cannot happen overnight, so there will be a lengthy transition period.

Unresolved Questions

  • What linting solution will be used? Fedora has developed its own, tailored to its needs, as has the Linux kernel. SPDX also provides official tools that should be evaluated for potential use here.
  • How will custom licenses be specified in the license key? The specification defines a LicenseRef- prefix for custom license identifiers, though some have suggested that keeping the existing custom: notation may be preferable.

Alternatives Considered

  • Update our list of common licenses to be more accurate to the licenses they represent.
  • Add additional licenses to the list of common licenses.