vanitasvitae/openpgp-notes

mirror of https://codeberg.org/openpgp/notes.git synced 2025-09-10 11:49:40 +02:00

Heiko Schaefer 9c4a00ea0f

ch4: merging: process feedback from paul in #121

2023-11-22 18:47:58 +01:00

(certificates_chapter)=

Certificates

OpenPGP fundamentally hinges on the concept of "OpenPGP certificates," also known as "OpenPGP keys." These certificates are complex data structures essential for identity verification, data encryption, and digital signatures. Understanding their structure and function is pivotal to effectively applying the OpenPGP standard.

Terminology: Understanding "keys"

The term "(cryptographic) keys" is central to grasping the concept of OpenPGP certificates. However, it can refer to different entities, making it a potentially confusing term. Let's clarify those differences.

Public vs. private keys

The term "key," without additional context, can refer to either public or private asymmetric key material. Additionally, symmetric keys may be used in OpenPGP to encrypt private key material, adding a layer of security and complexity.

Layers of keys in OpenPGP

In OpenPGP, the term "key" may refer to three distinct layers, each serving a unique purpose:

1. A (bare) "cryptographic key" comprises the private and/or public parameters forming a key. For instance, in the case of an RSA private key, the key consists of the exponent d along with the prime numbers p and q.
2. An OpenPGP component key is either an "OpenPGP primary key" or an "OpenPGP subkey." It is a building block of an OpenPGP certificate, consisting of a cryptographic keypair coupled with some invariant metadata, such as key creation time.
3. An "OpenPGP certificate" (or "OpenPGP key") consists of several component keys, identity components, and other elements. These certificates are dynamic, evolving over time as components are added, expire, or are marked as invalid.

The following section will delve into the OpenPGP-specific layers (2 and 3) to provide a clearer understanding of their roles within OpenPGP certificates.

For a discussion of private key material in OpenPGP, see the chapter {ref}private_key_chapter. Bindings that connect the components of a certificate are discussed in our chapter {ref}component_signatures_chapter. For much more detail on the internal (packet) structure of certificates and keys refer to our chapter {ref}zoom_certificates. Additionally, managing certificates, and understanding their authentication and trust models are vital topics. While this document briefly touches upon these aspects, they are integral to working proficiently with OpenPGP.

Structure of OpenPGP certificates

An OpenPGP certificate (or "OpenPGP key") is a collection of an arbitrary number of elements¹:

- Component keys
- Identity components
- Additional metadata, including connections between the certificate's components

This documentation collectively refers to component keys and identity components as "the components of a certificate."


Typical components in an OpenPGP certificate

Every element in an OpenPGP certificate revolves around a central component: the OpenPGP primary key. The primary key acts as a personal CA (Certification Authority) for the certificate's owner, enabling cryptographic statements regarding subkeys, identities, expiration, revocation, and more.

OpenPGP certificates tend to have a long lifespan, with the potential for modifications (typically by their owner) over time. Components may be added or invalidated throughout a certificate's lifetime.

Component keys

An OpenPGP certificate usually contains multiple component keys. Component keys serve in one of two roles: either as an "OpenPGP primary key" or as an "OpenPGP subkey."

OpenPGP component keys logically consist of an asymmetric cryptographic keypair and a creation timestamp. Once created, these attributes of a component key remain fixed (for ECDH keys, two additional parameters are part of a component key's constitutive data²).


An OpenPGP component key

In OpenPGP, component keys containing private key material also include metadata specifying the password protection scheme. This is another facet of metadata, akin to the aforementioned creation timestamp and additional parameters for certain algorithms. However, this discussion focuses on OpenPGP certificates, in which the component keys contain only the public part of their cryptographic key data. For information on private keys in OpenPGP, see {numref}private_key_chapter.

Fingerprint

Each OpenPGP component key possesses an OpenPGP fingerprint. This fingerprint is derived from the public key material, the creation timestamp, and, when relevant, the ECDH parameters.


Every OpenPGP component key is identifiable by a fingerprint. Although it's technically possible for different keys to share a fingerprint, cryptographic mechanisms make it exceedingly difficult, if not practically impossible with current technology, to find keys that share a fingerprint.

The fingerprint of our example OpenPGP component key is C0A5 8384 A438 E5A1 4F73 7124 26A4 D45D BAEE F4A3 9E6B 30B0 9D55 13F9 78AC CA94³.
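To illustrate how a fingerprint depends on both the public key material and the creation timestamp, here is a minimal Python sketch of a version 6 fingerprint calculation. It assumes the framing that, to our understanding, the version 6 format prescribes (SHA2-256 over the octet 0x9B, a four-octet length, and the public key packet body). The key material is a dummy placeholder, not a real key, and the function names are hypothetical.

```python
import hashlib
import struct


def v6_fingerprint(creation_time: int, algorithm_id: int, key_material: bytes) -> bytes:
    """SHA2-256 over a framed version 6 public key packet body (sketch)."""
    body = (
        bytes([6])                                # key version
        + struct.pack(">I", creation_time)        # creation timestamp, 4 octets
        + bytes([algorithm_id])                   # public-key algorithm ID
        + struct.pack(">I", len(key_material))    # octet count of the key material
        + key_material                            # algorithm-specific public data
    )
    framed = b"\x9b" + struct.pack(">I", len(body)) + body
    return hashlib.sha256(framed).digest()


def format_fingerprint(fingerprint: bytes) -> str:
    """Group the hexadecimal digits into blocks of four for display."""
    hexed = fingerprint.hex().upper()
    return " ".join(hexed[i:i + 4] for i in range(0, len(hexed), 4))


# Placeholder values: 32 dummy octets standing in for an Ed25519 public key
# (algorithm ID 27), and two creation times that differ by one second.
dummy_material = bytes(range(32))
fp_a = v6_fingerprint(1_700_000_000, 27, dummy_material)
fp_b = v6_fingerprint(1_700_000_001, 27, dummy_material)

print(format_fingerprint(fp_a))
print("same fingerprint:", fp_a == fp_b)
```

Because the creation timestamp is part of the hashed data, the same cryptographic key with a different creation time yields a different fingerprint.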

Primary key

The OpenPGP primary key is a component key that serves a distinct, central role in an OpenPGP certificate:

- Its fingerprint acts as an identifier for the entire OpenPGP certificate.
- It facilitates lifecycle operations, such as adding or invalidating subkeys or identities within a certificate.

Note: In the RFC, the OpenPGP primary key is occasionally referred to as "top-level key." Informally, it has also been termed the "master key."

Subkeys

Modern OpenPGP certificates typically include several subkeys in addition to the primary key, although these subkeys are optional.

While subkeys have the same structural attributes as the primary key, they fulfill different roles. Subkeys are cryptographically linked with the primary key, a relationship further discussed in {numref}binding_subkeys.

Figure "Certificate with subkeys": OpenPGP certificates can contain multiple subkeys. (Diagram depicting three component keys. The primary key is positioned at the top, designated for certification. Below it, connected by arrows, are two subkeys labeled as "for encryption" and "for signing," respectively.)

Defining operational capabilities with key flags

Each component key has a set of "key flags" that delineate the operations a key can perform.

Commonly used key flags include:

- Certification: enables issuing third-party certifications
- Signing: allows the key to sign data
- Encryption: allows the key to encrypt data
- Authentication: primarily used for OpenPGP authentication

Distinct component keys handle specific operations. Only the primary key can be used for certification, although it can have additional capabilities. Subkeys can be used for signing, encryption, and authentication but cannot have the certification capability. It is considered good practice, however, to [use separate keys for each capability](https://www.ietf.org/archive/id/draft-ietf-openpgp-crypto-refresh-12.html#section-10.1.5-7). 

Notably, in many algorithms, encryption and signing-related functionalities (i.e., certification, signing, authentication) are mutually exclusive, because the algorithms only support one of those two families of operations[^key-flag-sharing].
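The capability rules described above can be sketched as a small Python model. The numeric values are the ones we understand the standard to assign in the key flags subpacket. Note that the standard distinguishes two encryption flags (communications and storage), which this chapter groups under "Encryption." The `flags_allowed` helper is hypothetical and only encodes the rule stated in the text.

```python
from enum import IntFlag


class KeyFlags(IntFlag):
    CERTIFY = 0x01
    SIGN = 0x02
    ENCRYPT_COMMUNICATIONS = 0x04
    ENCRYPT_STORAGE = 0x08
    AUTHENTICATE = 0x20


def flags_allowed(flags: KeyFlags, is_primary: bool) -> bool:
    """Apply the rule from the text: only the primary key may certify."""
    if not is_primary and KeyFlags.CERTIFY in flags:
        return False
    return True


primary_flags = KeyFlags.CERTIFY
encryption_subkey_flags = KeyFlags.ENCRYPT_COMMUNICATIONS | KeyFlags.ENCRYPT_STORAGE

print(flags_allowed(primary_flags, is_primary=True))
print(flags_allowed(encryption_subkey_flags, is_primary=False))
print(flags_allowed(KeyFlags.CERTIFY | KeyFlags.SIGN, is_primary=False))
```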

Component key metadata, including key flags

The key flags for a component key are not stored within the component key directly.

Instead, key flags, along with other metadata about that component key, such as the key expiration time, are stored using mechanisms that bind components into an OpenPGP certificate:

- For the primary key, its key flags and other metadata can be defined in two ways: they can be linked with the Primary User ID or through a direct key signature.
- For subkeys, the key flags and other metadata are set using the mechanism that binds the subkey to the certificate, specifically through the primary key. Further details on binding subkeys are below.

TODO: Write a section about algorithm preference/feature signaling.

(identity_components)=

Identity components

Identity components in an OpenPGP certificate are used by the certificate holder to state that they are known by a certain identifier (like a name, or an email address).

User IDs in OpenPGP certificates

OpenPGP certificates can contain multiple User IDs. Each User ID associates the certificate with an identity.


OpenPGP certificates can contain any number of User IDs

This image could be visually improved! The new image should have an alt tag

A typical User ID identity is a UTF-8-encoded string composed of a name and an email address. By convention, User IDs align with the format described in RFC 2822 as a name-addr.

For further conventions on User IDs, refer to the document draft-dkg-openpgp-userid-conventions-00, dated 25 August 2023.
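As a small illustration of the name-addr convention, the following Python sketch splits a conventional User ID into its name and email address using the standard library's `email.utils.parseaddr`. The User ID shown is a made-up example, and `split_user_id` is a hypothetical helper. Real-world User IDs are free-form UTF-8 strings and do not always follow this convention, so the result should be treated as best-effort.

```python
from email.utils import parseaddr


def split_user_id(user_id: str) -> tuple:
    """Split a conventional 'Name <address>' User ID into (name, address)."""
    name, address = parseaddr(user_id)
    return name, address


name, address = split_user_id("Alice Adams <alice@example.org>")
print(name)
print(address)
```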

One proposed variant for encoding identities in User ID is to use "split User IDs".

Heiko, please clarify what the value is of this proposal or remove it.

(primary_user_id)=

Implications of the Primary User ID

Within a certificate, a specific User ID is designated as the Primary User ID.

Each User ID carries associated preference settings, such as preferred encryption algorithms, which are detailed in {numref}zooming_in_user_id. The preferences associated with the Primary User ID take precedence by default.

TODO: The crypto-refresh appears to suggest that the direct key signature should hold the default preferences. We might need to write a more nuanced text here about how the direct key signature and the Primary User ID interact in v6, and mention the differences from v4.

TODO: The Primary User ID can also specify metadata about the primary key.

User attributes in OpenPGP

While user attributes are similar to User IDs, they are less commonly used.

Currently, the OpenPGP standard prescribes only one format to be stored in user attributes: an image. Typically, this image represents the key owner, although it is not required.

Linking the components

To form an OpenPGP certificate, individual components are interconnected by the certificate holder using their OpenPGP software. Within OpenPGP, this process is termed "binding," as in "a subkey is bound to the primary key." These bindings are realized using cryptographic signatures. An in-depth discussion of this topic can be found in {ref}component_signatures_chapter.

In very abstract terms, the primary key of a certificate acts as a root of trust or "certification authority." It is responsible for:

- issuing signatures that express the certificate holder's intent to use specific subkeys or identity components;
- conducting other lifecycle operations, including setting expiration dates and marking components as invalidated or "revoked."

Because components are bound using digital signatures, recipients of an OpenPGP certificate only need to validate the authenticity of the primary key in order to use the certificate for their communication partner. Traditionally, this is done by manually verifying the fingerprint of the primary key. Once the validity of the primary key is confirmed, the validity of the remaining components can be automatically assessed by the user's OpenPGP software. Generally, components are valid parts of a certificate if there is a statement signed by the certificate's primary key endorsing this validity.

Revocations

TODO: This section needs to be written.

Third-party (identity) certifications

TODO: This section needs to be written.

Third-party identity certifications have historically played a pivotal role in the OpenPGP ecosystem.

Security considerations

While a convenience for consumers, indiscriminately accepting and integrating third-party identity certifications comes with significant risks.

Without any restrictions in place, malicious entities can flood a certificate with excessive certifications. Called "certificate flooding," this form of digital vandalism grossly expands the certificate size, making the certificate cumbersome and impractical for users.

It also opens the door to potential denial-of-service attacks, rendering the certificate non-functional or significantly impeding its operation.

The popular SKS keyserver network experienced certificate flooding firsthand, causing it to shut down operations in 2019.

Improved mechanisms in OpenPGP v6

TODO: This section needs to be written.

Advanced topics

(append-only)=

Certificates are effectively append-only data structures

OpenPGP certificates act as append-only data structures in practice. By this, we mean that packets associated with a certificate cannot be "recalled" once they have been published. Third parties (such as other users or keyservers) may keep and/or distribute copies of those packets.

While it is not possible to "remove" elements once they have been publicly associated with an OpenPGP certificate, it is possible to invalidate them by adding new metadata to the certificate. This new metadata could set an expiration time on a component, or explicitly revoke that component. In both cases, no packets are removed from the certificate.

Semantically, invalidation resembles removal of a component: the component is no longer a valid element of the certificate, at least from some point in time onward. Implementations that handle the certificate may omit the invalid component in their representation.

We have to distinguish the "packet level" information about a certificate from an application-level view of that certificate. The two may differ.
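The difference between the packet-level and the application-level view can be sketched with a toy model in Python. This is a deliberately simplified illustration: the `Packet` and `Certificate` classes are hypothetical, packets are reduced to a kind and a target label, and no cryptography is involved.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Packet:
    kind: str    # "subkey" or "revocation" in this toy model
    target: str  # label of the component the packet is about


@dataclass
class Certificate:
    packets: List[Packet] = field(default_factory=list)

    def append(self, packet: Packet) -> None:
        """Packets are only ever added, never removed."""
        self.packets.append(packet)

    def valid_subkeys(self) -> List[str]:
        """Application-level view: subkeys without a revocation."""
        revoked = {p.target for p in self.packets if p.kind == "revocation"}
        return [
            p.target
            for p in self.packets
            if p.kind == "subkey" and p.target not in revoked
        ]


cert = Certificate()
cert.append(Packet("subkey", "enc-2022"))
cert.append(Packet("subkey", "enc-2023"))
cert.append(Packet("revocation", "enc-2022"))  # invalidate by adding a packet

print(len(cert.packets))     # packet-level view still holds every packet
print(cert.valid_subkeys())  # application-level view omits the revoked subkey
```

Revoking the older subkey grows the packet list, while the application-level view shrinks.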

Reasoning about append-only properties in a distributed system

OpenPGP is a thoroughly distributed system. Users can obtain and transmit information about their own certificates, as well as those of other users, using a broad range of mechanisms. These mechanisms include keyservers, manual handling, WKD, and Autocrypt.

Users' OpenPGP software may obtain different views of a particular certificate over time. This software has to reconcile and store a combined version of the possibly disparate elements it obtains from different sources.

In practice, this means that various OpenPGP users may have differing views of any given certificate. For various reasons, not all users will be in possession of a fully up-to-date and complete version of a certificate.

There are various potential problems associated with this fact: Users may not be aware that a component has been invalidated by the certificate holder. Revocations may not have been propagated to some third party. For example, a user may not be aware that the certificate holder has rotated their encryption subkey to a new one, and no longer wants to receive messages encrypted to the previous encryption subkey.

One mechanism that addresses a part of this issue is expiration: By setting their certificates to expire after an appropriate interval, certificate holders can force their communication partners to refresh their certificate, e.g. from a keyserver⁴.

Good practices, like setting appropriate expiration times, can mitigate the complexity of the inherently distributed nature of certificates.
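A minimal Python sketch of how client software might decide that a certificate copy has expired and should be refreshed. It assumes, as we understand the standard, that key expiration is expressed as a number of seconds after the creation time, with an absent or zero value meaning "no expiration." The function names and dates are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def expiration_time(created: datetime, validity_seconds: Optional[int]) -> Optional[datetime]:
    """Return the expiration time, or None if the key does not expire."""
    if not validity_seconds:
        return None
    return created + timedelta(seconds=validity_seconds)


def needs_refresh(created: datetime, validity_seconds: Optional[int], now: datetime) -> bool:
    """A local copy that has expired should be refreshed before use."""
    expires = expiration_time(created, validity_seconds)
    return expires is not None and now >= expires


created = datetime(2023, 1, 1, tzinfo=timezone.utc)
one_year = 365 * 24 * 60 * 60

print(needs_refresh(created, one_year, datetime(2023, 6, 1, tzinfo=timezone.utc)))
print(needs_refresh(created, one_year, datetime(2024, 6, 1, tzinfo=timezone.utc)))
```

If the certificate holder has extended the expiration in the meantime, the refreshed copy carries that update along with any revocations or new subkeys.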

However, such mitigations by definition cannot address all possible cases of outdated certificate information in a decentralized, asynchronous system such as OpenPGP. So a defensive approach is generally appropriate when reasoning about the view of certificates that different actors have.

When thinking about edge cases, it's useful to "assume the worst." For example:

- Recipients may not obtain updates to a certificate in a timely manner (this could happen for various reasons, including, but not limited to, interference by malicious actors).
- Data associated with a certificate may compound, and can become too large for convenient handling. If such a problem arises, then by definition, the certificate holder cannot address it: recall that the certificate holder cannot "recall" existing packets.

Differing "views" of a certificate exist

Another way to think about this discussion is that different OpenPGP users may have a different view of any certificate. There is a notional "canonical" version of the certificate, but we cannot assume that every user has exactly this copy. Besides propagation of elements that the certificate holder has linked to a certificate, third-party certifications are by design a distributed mechanism. A third-party certification is issued by a third party, and may or may not be distributed widely by them, or by the certificate holder. Not distributing third-party certifications widely is a workflow that may be entirely appropriate for some use cases.

As a general tendency, it is desirable for OpenPGP users to have the most complete possible view of all certificates that they interact with.

However, there are contexts in which implementations may prefer to handle only a subset of the elements of a certificate. We discuss this in the section {ref}cert-mini.

Merging

As described above, OpenPGP certificates are effectively append-only data structures. As part of the practical realization of this fact, OpenPGP software needs to merge different copies of a certificate.

For example, Bob's OpenPGP system may have a local copy of Alice's certificate, and obtain a different version of Alice's certificate from a keyserver. The goal of the implementation is to add new information about Alice's certificate, if any, to the local copy. Alice may have added a new identity, replaced a subkey with a replacement subkey, or revoked some components of her certificate. Or, Alice may have revoked her certificate, signaling that she doesn't want communication partners to use that certificate anymore. All of these updates could be crucial for Bob to be aware of.

Merging two versions of a certificate involves making decisions about which packets should be kept. The versions of the certificate will typically contain some packets that are identical. No duplicates of the exact same packet should be stored in the merged version of the certificate. Additionally, if the newly obtained copy contains packets that are in fact entirely unrelated to the certificate, those should not be retained (a third party may have included unrelated packets, either by mistake, or with malicious intent).
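The merge rules described above can be sketched in Python. This is a simplified model under stated assumptions: packets are represented as opaque byte strings, and `belongs_to_certificate` is a hypothetical predicate standing in for the real checks an implementation would perform (for example, verifying that a signature packet relates to one of the certificate's components).

```python
from typing import Callable, Iterable, List


def merge_packets(
    local: Iterable[bytes],
    fetched: Iterable[bytes],
    belongs_to_certificate: Callable[[bytes], bool],
) -> List[bytes]:
    """Merge two copies of a certificate without duplicates.

    Packets from the local copy are kept. Packets from the fetched copy are
    added only if they are new and related to the certificate.
    """
    merged: List[bytes] = []
    seen = set()
    for packet in local:
        if packet not in seen:
            merged.append(packet)
            seen.add(packet)
    for packet in fetched:
        if packet in seen:
            continue  # exact duplicate, store only once
        if not belongs_to_certificate(packet):
            continue  # unrelated packet, possibly included by mistake or malice
        merged.append(packet)
        seen.add(packet)
    return merged


# Toy data: packets "belong" to Alice's certificate if they carry her label.
local_copy = [b"alice:primary", b"alice:uid", b"alice:subkey-1"]
fetched_copy = [b"alice:primary", b"alice:subkey-2", b"mallory:junk", b"alice:revocation"]

result = merge_packets(local_copy, fetched_copy, lambda p: p.startswith(b"alice:"))
print(result)
```

The sketch appends new packets at the end. A real implementation would also place each packet at the appropriate position in the certificate's packet sequence.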

Handling unauthenticated information

For information that is related to the certificate, but not bound to it by a self-signature, there is no generally correct approach. The receiving implementation must resolve these cases, possibly in a context-specific manner. Such cases include:

- Third-party certifications. These could be valuable information, where a third party attests that the association of an identity to a certificate is valid. On the other hand, they could also be a type of spam.
- Subpackets in the unhashed area of a signature packet. Again, these could contain information that is useful to the recipient. However, the data could also be either useless, or even misleading/harmful.

(cert-mini)=

Certificate minimization

Certificate minimization is the practice of presenting a partial view of a certificate by filtering out some of its components.

Filtering out some elements of a certificate can have different benefits:

- For some workflows it's clear that the full certificate is not required. For example, email clients only need encryption, signing and certification component keys. They don't need authentication subkeys, which are used for SSH connections.
- In some contexts, data can be added to certificates by third parties, e.g. by adding third-party User ID certifications on some key servers. In the worst case this can lead to "certificate flooding" which inflates the target certificate to a point where consumer software rejects the certificate completely. Filtering out elements can mitigate this.
- Sometimes, a certificate organically grows so big that the user software has problems handling it.

Elements that can be omitted as part of a minimization process

There are different types of elements that can be omitted during minimization:

- Subkeys (along with signatures on those subkeys)
- Identity components (along with both their self-signatures and third-party signatures)
- Signatures, by themselves:
  - Self-signatures that have been superseded by newer self-signatures for the same purpose
  - Third-party certifications

Minimization in applications

Hagrid, which runs keys.openpgp.org

The hagrid keyserver software doesn't publish the identity components in certificates by default. This is a central aspect of the privacy policy of the service. Certificates can be uploaded to the service by third parties, which is useful. However, identifying information is only distributed by the service on an explicit opt-in basis.

Separately, third-party certifications are currently filtered out by the service, to avoid flooding attacks.

GnuPG

GnuPG strips some signatures on key import.

In addition, GnuPG offers two explicit methods for certificate minimization, described in the GnuPG manual as:

clean: Compact (by removing all signatures except the selfsig) any user ID that is no longer usable (e.g. revoked, or expired). Then, remove any signatures that are not usable by the trust calculations. Specifically, this removes any signature that does not validate, any signature that is superseded by a later signature, revoked signatures, and signatures issued by keys that are not present on the keyring.
minimize: Make the key as small as possible. This removes all signatures from each user ID except for the most recent self-signature.

clean removes third-party signatures by certificates that are not present in the current keyring, as well as other stale data. minimize removes superseded signatures that are not needed at the point when the command is executed.
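To make the effect of the minimize behavior quoted above more concrete, here is a toy Python model. It is a sketch of the described outcome (keep only the most recent self-signature per User ID), not GnuPG's implementation. The `Signature` class, its fields, and the issuer labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Signature:
    user_id: str
    issuer: str   # label of the issuing certificate
    created: int  # creation time as a Unix timestamp


def minimize(signatures: Iterable[Signature], primary: str) -> List[Signature]:
    """Keep only the most recent self-signature for each User ID."""
    latest: Dict[str, Signature] = {}
    for sig in signatures:
        if sig.issuer != primary:
            continue  # third-party signature, dropped
        current = latest.get(sig.user_id)
        if current is None or sig.created > current.created:
            latest[sig.user_id] = sig
    return list(latest.values())


signatures = [
    Signature("Alice <alice@example.org>", "alice", 1_600_000_000),
    Signature("Alice <alice@example.org>", "alice", 1_700_000_000),
    Signature("Alice <alice@example.org>", "bob", 1_650_000_000),
]

print(minimize(signatures, primary="alice"))
```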

Limitations that can result from stripping historical self-signatures

Some implementations, such as Sequoia, prefer to rely on the full historical set of self-signatures to construct a view of the certificate over time. This way, signatures can be verified at different reference times. In this model, removing superseded self-signatures can cause problems with the validation of historical signatures.

An example of the tension between minimization and nuanced verification of the temporal validity of signatures can be seen in the case of rpm-sequoia. To handle the limited availability of historical self-signatures on certificates in the wild, the rpm-sequoia implementation was adjusted to accept self-signatures that predate the existing self-signature for the signing key.

Autocrypt

The Autocrypt Level 1 specification defines a specific minimal format for OpenPGP certificates that are distributed by the Autocrypt mechanism.

Autocrypt/WKD minimization

Email clients depend only on a limited subset of the components of certificates. Thus, it's possible to use a smaller view of that certificate, which is easier to transfer by mail user-agents.

For example, the following command drops any subkey that is not usable at the time of export. Additionally, all authentication subkeys are stripped, since they do not have any use for email:

gpg --export-options export-minimal,export-clean,no-export-attributes \
    --export-filter keep-uid=mbox=wiktor@metacode.biz \
    --export-filter 'drop-subkey=expired -t || revoked -t || usage =~ a' \
    --export wiktor@metacode.biz

At the time of writing, the resulting filtered exported certificate comprises 3771 bytes. This is significantly smaller than the full certificate, which comprises 152322 bytes. The minimization made the certificate 40x smaller, which can be important in some contexts (e.g. when embedding the certificate in email headers).

Note that in some contexts it's not clear if minimization brings more benefit than harm. Consider the ProtonMail client, which fetches OpenPGP certificates via WKD automatically when composing a message. It needs only subkeys. But if the same key is fetched as part of automatic signature verification then stripping certifications and leaving only subkeys would prevent the client from performing Web of Trust calculations and authenticating the certificate.

Pitfalls of minimization

Disadvantages/risks of minimizing certificates:

- A minimized certificate does not present a full view of how the certificate (and the validity of its components) evolved over time.
- As other certificates are collected, third-party certifications that were previously unusable may become usable again. Dropping third-party certifications as a part of minimization prevents this mechanism.
- Removing component keys that the minimizing implementation can't use means that the receiver does not receive a copy of those, even if the receiver supports them.
- Refreshing certificates from key servers may inflate the certificate again, since OpenPGP certificates tend to act as append-only structures.
- Carelessly stripping all invalid components may make the certificate unusable. Some libraries, such as anonaddy-sequoia, strip unusable encryption subkeys. However, at least one subkey is retained, even if all encryption subkeys are unusable. Even though this may leave only an expired encryption subkey in the certificate, this presents a better UX for the end-user, who probably is still in possession of the private key for decryption.
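The "retain at least one encryption subkey" strategy from the last item can be sketched as follows. This Python model is an illustration only and is not taken from any library: the `EncryptionSubkey` class is hypothetical, and the choice of which unusable subkey to keep (here, the last one in the list) is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class EncryptionSubkey:
    label: str
    expires_at: Optional[int]  # Unix timestamp, None means no expiration
    revoked: bool = False


def is_usable(subkey: EncryptionSubkey, now: int) -> bool:
    if subkey.revoked:
        return False
    return subkey.expires_at is None or now < subkey.expires_at


def strip_unusable(subkeys: List[EncryptionSubkey], now: int) -> List[EncryptionSubkey]:
    """Drop unusable encryption subkeys, but never return an empty list
    if the input contained at least one subkey."""
    usable = [k for k in subkeys if is_usable(k, now)]
    if usable:
        return usable
    # Assumption: fall back to the last subkey in the list.
    return subkeys[-1:]


old = EncryptionSubkey("enc-2020", expires_at=100)
new = EncryptionSubkey("enc-2022", expires_at=200)

print(strip_unusable([old, new], now=150))  # only the usable subkey remains
print(strip_unusable([old, new], now=300))  # all expired, one is retained anyway
```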

Guidelines

- Don't minimize certificates unless you have a good reason to.
- When presenting a minimized certificate view, consider when that view needs to be updated. Ideally, minimized certificates are freshly generated, on demand (e.g. the Autocrypt header is constructed while an email is sent or composed) and the client merges all data collected.

Fingerprints and beyond: "Naming" certificates in user-facing contexts

Version 4

With OpenPGP version 4 certificates, user-facing software customarily used 20-byte fingerprints as identifiers for certificates, or alternatively the shortened 8-byte Key ID. Both were represented in hexadecimal format, sometimes with whitespace grouping the digits into blocks for easier readability.

For example, in workflows to accept a certificate for a communication partner, or during third-party certification of an identity, users were shown hexadecimal representations of a fingerprint, and asked to manually verify that the fingerprint corresponds to the expected certificate.

Version 6

The OpenPGP version 6 standard uses 32-byte fingerprints, but explicitly defines no format for displaying those fingerprints in a human-readable form. The standard strongly recommends against using version 6 fingerprints as identifiers in user-facing workflows.

Instead, "mechanical fingerprint transfer and comparison" should be preferred, wherever possible. The reasoning is that humans tend to be bad at comparing high-entropy data (in addition, many users are probably put off by being asked to compare long hexadecimal strings).

Use in APIs

However, both fingerprints and Key IDs may (and usually must) be used programmatically by software that handles OpenPGP data to address specific certificates. This is equally true for OpenPGP version 6.

Note that regardless of the OpenPGP version, software that relies on (8-byte) Key IDs should not assume that Key IDs are unique. It is trivial to generate collisions for Key IDs, so applications must be able to handle Key ID collisions gracefully.
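One way to handle Key ID collisions gracefully is to treat a Key ID lookup as returning a list of candidates rather than a single certificate. The following Python sketch assumes version 4 fingerprints, where the Key ID is the rightmost 64 bits of the fingerprint. The `CertIndex` class is hypothetical, and the second fingerprint below is fabricated purely to demonstrate a collision.

```python
from collections import defaultdict
from typing import Dict, List


def v4_key_id(fingerprint_hex: str) -> str:
    """Derive the 64-bit Key ID from a version 4 fingerprint."""
    fingerprint = bytes.fromhex(fingerprint_hex.replace(" ", ""))
    if len(fingerprint) != 20:
        raise ValueError("a version 4 fingerprint is 20 bytes long")
    return fingerprint[-8:].hex().upper()


class CertIndex:
    """Maps each Key ID to all fingerprints that share it."""

    def __init__(self) -> None:
        self._by_key_id: Dict[str, List[str]] = defaultdict(list)

    def add(self, fingerprint_hex: str) -> None:
        normalized = fingerprint_hex.replace(" ", "").upper()
        key_id = v4_key_id(normalized)
        if normalized not in self._by_key_id[key_id]:
            self._by_key_id[key_id].append(normalized)

    def lookup(self, key_id: str) -> List[str]:
        """Return every candidate; callers must disambiguate by other means."""
        return list(self._by_key_id.get(key_id.upper(), []))


index = CertIndex()
index.add("B3D2 7B09 FBA4 1235 2B41 8972 C8B8 6AC4 2455 4239")
index.add("0000 0000 0000 0000 0000 0000 C8B8 6AC4 2455 4239")  # fabricated collision

print(v4_key_id("B3D2 7B09 FBA4 1235 2B41 8972 C8B8 6AC4 2455 4239"))
print(index.lookup("C8B86AC424554239"))
```

With more than one candidate, software can, for example, try each candidate when verifying a signature and accept only the one for which verification succeeds.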

When are certificates valid?

- Full certificate: primary key revoked, key expired, or binding signature expired
- Subkey: revoked, key expired, or binding signature expired
- User ID: revoked, binding expired, ...

TODO: Write this section and link to chapter 9.

Best practices regarding Key Freshness

TODO:

- Expiry
- Subkey rotation

Wiktor suggests checking https://blogs.gentoo.org/mgorny/2018/08/13/openpgp-key-expiration-is-not-a-security-measure/ for important material.

TODO: Write this section.

(unbound_user_ids)=

Adding unbound User IDs to a certificate

TODO: References/links are missing.

Some OpenPGP subsystems may add User IDs to a certificate that are not bound to the primary key by the certificate's owner. This can be useful for storing local identity information (e.g., Sequoia's public store attaches "pet-names" to certificates in this way).

¹ In technical terms, the elements of an OpenPGP certificate are a collection of "packets." Each component key and identity component is internally represented as a packet. Another common type of packet is the "signature" packet, which connects the components of a certificate.
² For ECDH component keys, two additional algorithm parameters are integral to the component key's constitutive and immutable properties. Those parameters specify a hash function and a symmetric encryption algorithm.
³ In OpenPGP version 4, the rightmost 64 bits were sometimes used as a shorter identifier, called "Key ID." For example, an OpenPGP version 4 certificate with the fingerprint B3D2 7B09 FBA4 1235 2B41 8972 C8B8 6AC4 2455 4239 might be referenced by the 64-bit Key ID C8B8 6AC4 2455 4239 or formatted as 0xC8B86AC424554239.
Historically, even shorter 32-bit identifiers were used, like this: 2455 4239, or 0x24554239. Such identifiers still appear in very old documents about PGP. However, 32-bit identifiers have long been deemed unfit for purpose. At one point, 32-bit identifiers were called "short Key ID," while 64-bit identifiers were referred to as "long Key ID."
⁴ See, for example, here: "Expiration dates really serve two purposes: naturally eliminating unused keys, and enforcing periodical checks on the primary key."
