
<!--
SPDX-FileCopyrightText: 2023 The "Notes on OpenPGP" project
SPDX-License-Identifier: CC-BY-SA-4.0
-->

(signing_data)=
# Signatures over data

In OpenPGP, a *data signature* guarantees the authenticity and, implicitly, the integrity of certain data. Typical use cases include the authentication of software packages and emails.

"Authenticity" in this context means that the data signature was issued by the entity controlling the signing key material. However,
this alone does not establish whether the expected party actually controls the signer certificate. OpenPGP offers mechanisms for *strong authentication* that connect certificates to specific identities. These let a verifier confirm that the intended communication partner is indeed associated with the cryptographic identity behind the signature[^sign-auth].

[^sign-auth]: Other signing solutions, like [signify](https://flak.tedunangst.com/post/signify), focus on pure signing without strong authentication of the signer's identity.

Data signatures can only be issued by component keys with the *signing* [key flag](https://www.ietf.org/archive/id/draft-ietf-openpgp-crypto-refresh-12.html#name-key-flags).

Note that data signatures are distinct from {ref}`component_signatures_chapter`, which are used to attach metadata or subkeys to a certificate.

## Signature types

OpenPGP data signatures use one of two [signature types](signature_types):

- **Binary signature** (type ID `0x00`): This is the standard signature type for binary data and is typically used for files or data streams. Binary signatures are calculated over the data without any modifications or transformations.
- **Text signature** (type ID `0x01`): Used for textual data, such as email bodies. When calculating a text signature, the data is first normalized by converting line endings into a canonical form (`<CR><LF>`). This mitigates issues caused by platform-specific line-ending conventions, which is particularly important for detached signatures, where the line endings of the message file might be converted between signature creation and verification.
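
The line-ending normalization used for text signatures can be illustrated with a minimal sketch. The function name `canonicalize_text` is hypothetical, and treating a bare `<CR>` as a line ending is an assumption of this sketch, not something stated above:

```python
def canonicalize_text(data: bytes) -> bytes:
    """Convert LF, CR and CRLF line endings to the canonical CRLF form."""
    # First collapse every line-ending style to LF, then expand to CRLF.
    # Assumption: a bare CR counts as a line ending.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", b"\r\n")


unix_text = b"hello\nworld\n"
windows_text = b"hello\r\nworld\r\n"

# Both platform variants normalize to the same bytes, so they hash identically.
assert canonicalize_text(unix_text) == canonicalize_text(windows_text)
```

Because both variants yield the same canonical bytes, a text signature made on one platform can still verify after the file's line endings were converted on another.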

Data signatures are generated by hashing the message content along with the metadata in the signature packet, and calculating a cryptographic signature over that hash. The resulting cryptographic signature is stored in an OpenPGP signature packet.
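
As a rough sketch of what "hashing the message content along with the metadata" can look like, the following example follows the version 4 signature layout from RFC 4880, which is simpler than version 6 (version 6 signatures additionally hash a salt, which is omitted here). The function name, the algorithm choices, and the timestamp are illustrative assumptions. No cryptographic signature is calculated; only the hash digest that would be signed.

```python
import hashlib
import struct


def v4_signature_digest(message: bytes, sig_type: int, hashed_subpackets: bytes) -> bytes:
    """Hash a message followed by version 4 signature metadata (SHA-512)."""
    pubkey_algo = 22  # EdDSALegacy, an illustrative choice
    hash_algo = 10    # SHA-512

    # Signature packet fields that are covered by the hash.
    hashed_fields = (
        bytes([4, sig_type, pubkey_algo, hash_algo])
        + struct.pack(">H", len(hashed_subpackets))
        + hashed_subpackets
    )
    # Trailer: version, 0xFF, and the big-endian length of the hashed fields.
    trailer = bytes([4, 0xFF]) + struct.pack(">I", len(hashed_fields))

    h = hashlib.sha512()
    h.update(message)        # the message content comes first
    h.update(hashed_fields)  # then the signature metadata
    h.update(trailer)
    return h.digest()


# Hypothetical "signature creation time" subpacket: length 5, type 2, 4-octet timestamp.
creation_time = bytes([5, 2]) + struct.pack(">I", 1_700_000_000)

# Binary signature (type 0x00) over the message "hello world".
digest = v4_signature_digest(b"hello world", 0x00, creation_time)
```

Since the metadata is part of the hashed input, changing the signature type or the creation time produces a different digest, even for the same message.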

Data signatures can be applied in three forms, which the following section describes.

## Forms of OpenPGP data signatures

OpenPGP data signatures can be applied in three distinct forms[^sign-modes-gpg]:

- **Detached**: The OpenPGP signature exists as a separate entity, independent from the signed data.
- **Inline**: Both the original data and its corresponding OpenPGP signature are encapsulated within an OpenPGP container.
- **Cleartext signature**: A plaintext message and its OpenPGP signature coexist in a combined text format, preserving the readability of the original message.

[^sign-modes-gpg]: These three forms of signature application align with GnuPG's `--detach-sign`, `--sign`, and `--clearsign` command options.

### Detached signatures

A detached signature is produced by calculating an OpenPGP signature over the data intended for signing. The original data remains unchanged, and the OpenPGP signature is stored as a standalone file. A detached signature file can be distributed alongside the original data or independently of it. The authenticity and integrity of the original data file can be verified by using the detached signature file.

This signature format is especially useful for signing software releases and other files where it is imperative that the content remains unaltered during the signing process.

### Inline signatures

An inline signature joins the signed data and its corresponding data signature into a single OpenPGP message.

This method is commonly used for signing or encrypting emails. Most email software capable of handling OpenPGP communications uses inline signatures.

#### Structure

An inline-signed OpenPGP message consists of three segments:

1. **One-pass signature packets**: One or more of these packets precede the signed data and enable signature computation in one pass. See [One-Pass Signature Packet (Type ID 4)](https://www.ietf.org/archive/id/draft-ietf-openpgp-crypto-refresh-12.html#one-pass-sig) of the RFC.

2. **Literal data**: This is the original data (e.g., the body of a message) that the user wishes to encrypt or sign, without additional interpretation or conversion. See [Literal Data Packet (Type ID 11)](https://www.ietf.org/archive/id/draft-ietf-openpgp-crypto-refresh-12.html#lit) of the RFC.

3. **Data signature packets**: These contain the cryptographic signature corresponding to the original data.
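
The three-segment layout can be made visible by listing the packet type IDs in a message. The following is a minimal sketch that only handles new-format packet headers with one-octet lengths (bodies shorter than 192 octets). The helper names are hypothetical, and the packet bodies are placeholders, not valid packet contents:

```python
def new_format_packet(type_id: int, body: bytes) -> bytes:
    """Wrap a body in a new-format packet header (one-octet length only)."""
    if len(body) >= 192:
        raise ValueError("sketch only handles one-octet lengths")
    # Header octet: bits 7 and 6 set, low six bits carry the packet type ID.
    return bytes([0xC0 | type_id, len(body)]) + body


def packet_type_ids(message: bytes) -> list[int]:
    """Return the type IDs of the packets in a message, in order."""
    type_ids = []
    pos = 0
    while pos < len(message):
        header = message[pos]
        if header & 0xC0 != 0xC0:
            raise ValueError("sketch only handles new-format packet headers")
        length = message[pos + 1]
        if length >= 192:
            raise ValueError("sketch only handles one-octet lengths")
        type_ids.append(header & 0x3F)
        pos += 2 + length  # skip the header, the length octet and the body
    return type_ids


PACKET_NAMES = {4: "One-Pass Signature", 11: "Literal Data", 2: "Signature"}

# Placeholder bodies: these are not valid packet contents.
message = (
    new_format_packet(4, b"<ops>")
    + new_format_packet(11, b"<literal data>")
    + new_format_packet(2, b"<signature>")
)

for type_id in packet_type_ids(message):
    print(type_id, PACKET_NAMES[type_id])
```

The loop prints the packets in the order described above: a One-Pass Signature packet, the Literal Data packet, and finally the Signature packet.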


#### Creation

To produce an inline signature, the signer processes the entirety of the data by reading from an input file and writing into an output OpenPGP message file. As the data is processed, the signer simultaneously calculates a cryptographic signature. Once all data has been processed, the signer appends a data signature packet to the output OpenPGP message file. This allows the signer to produce the message in a single pass over the data.

For efficient verification, an application must know how to handle the literal data before reading it. This requirement is addressed by the One-Pass Signature packets located at the beginning of inline-signed messages. These packets include essential information such as the fingerprint of the signing key and the hash algorithm used for computing the signature's hash digest. This setup enables the verifier to process the data correctly and efficiently.

```{admonition} TODO
:class: warning

Is the signer keyid/fingerprint in the OPS important for the verifier to be able to verify the signature efficiently? Or is it (only?) there to be hashed and signed, along with the literal data?
```

#### Verification

Inline-signed messages enable efficient verification in *one pass*, structured as follows:

1. **Initiation with One-Pass Signature packets**: These packets begin the verification process. They include the signer's key ID/fingerprint, essential for identifying the appropriate public key for signature validation.

2. **Processing the literal data**: This step involves hashing the literal data, preparing it for signature verification.

3. **Verifying signature packets**: Located at the end of the message, these packets are checked against the previously calculated hash digest.
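
The first two steps can be sketched with Python's `hashlib`. The function name and the chunk size are assumptions made for illustration, and the third step is deliberately left out: a real verifier would next feed the signature packet's metadata into the same hasher and then check the cryptographic signature against the resulting digest, using the signer's public key.

```python
import hashlib
import io

# OpenPGP hash algorithm IDs mapped to hashlib names (a small subset).
HASH_ALGORITHMS = {8: "sha256", 10: "sha512"}


def hash_literal_data(ops_hash_algo_id: int, stream, chunk_size: int = 4096):
    """Hash a stream chunk by chunk, using the algorithm announced in the OPS packet."""
    # Step 1: the One-Pass Signature packet names the hash algorithm,
    # so the hasher can be set up before any literal data arrives.
    hasher = hashlib.new(HASH_ALGORITHMS[ops_hash_algo_id])
    # Step 2: feed the literal data through the hasher as it is read,
    # without buffering the whole message in memory.
    while chunk := stream.read(chunk_size):
        hasher.update(chunk)
    # Return the hasher (not a digest), so signature metadata can still be added.
    return hasher


data = b"hello world" * 1000
hasher = hash_literal_data(10, io.BytesIO(data), chunk_size=256)

# Streaming in chunks yields the same state as hashing the data in one go.
assert hasher.digest() == hashlib.sha512(data).digest()
```

Without the hash algorithm from the One-Pass Signature packet, the verifier would have to buffer the literal data, or read it twice, because the algorithm would only become known when the signature packet at the end of the message is reached.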

Note that the signer's public key, which is required for the final verification step, is not embedded in the message. Verifiers must obtain this key externally (e.g., from a key server) to verify the signature.

### Cleartext signatures

The *Cleartext Signature Framework* (CSF) is an OpenPGP mechanism that combines two goals:

- It leaves the message in clear text format, so that it can be viewed directly by a human in a program that knows nothing about OpenPGP.
- At the same time, it adds an OpenPGP signature that allows verification of that message by users whose software supports OpenPGP.

#### Example

In {numref}`cleartext` we inspect an example of a cleartext signature in detail. Here, we take a brief look at the same example to get a sense of what a cleartext signature looks like:

```text
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

hello world
-----BEGIN PGP SIGNATURE-----

wpgGARsKAAAAKQWCZT0vBCIhBtB7JOyRoU3SQKwtU+bIqeBUlJpBIi6nOFdu0Zyu
o9yZAAAAANqgIHAzoRTzu/7Zuxc8Izf4r3/qSCmBfDqWzTXqmVtsSBSHACka3qbN
eehqu8H6S0UK8V7yHbpVhExu9Hu72jWEzU/B0h9MR5gDhJPoWurx8YfyXBDsRS4y
r13/eqMN8kfCDw==
=Ks9w
-----END PGP SIGNATURE-----
```

The cleartext signature consists of two blocks, which contain the message and a signature, respectively. In this case, the message consists of the text "hello world".

Notice that this message is readable by a human reader, without requiring additional software tools, as long as the reader understands which elements to ignore.

The message is followed by a block that contains an ASCII-armored OpenPGP signature for the message. Using this signature, OpenPGP software can verify the authenticity of the message in the first block.
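
To illustrate the two-block layout, here is a minimal sketch that splits the example above into its header lines, the message text, and the armored signature. The function name is hypothetical, the sketch does not verify anything, and it only undoes the dash-escaping that the RFC defines for message lines starting with `-`:

```python
def split_cleartext_message(text: str):
    """Split a cleartext signed message into (headers, message, armored signature)."""
    lines = text.splitlines()
    if lines[0] != "-----BEGIN PGP SIGNED MESSAGE-----":
        raise ValueError("not a cleartext signed message")

    # Header lines such as "Hash: SHA512" run until the first empty line.
    blank = lines.index("")
    headers = lines[1:blank]

    sig_start = lines.index("-----BEGIN PGP SIGNATURE-----")
    # Undo dash-escaping: message lines starting with "-" carry a "- " prefix.
    message_lines = [
        line[2:] if line.startswith("- ") else line
        for line in lines[blank + 1:sig_start]
    ]
    return headers, "\n".join(message_lines), "\n".join(lines[sig_start:])


example = """-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

hello world
-----BEGIN PGP SIGNATURE-----

wpgGARsKAAAAKQWCZT0vBCIhBtB7JOyRoU3SQKwtU+bIqeBUlJpBIi6nOFdu0Zyu
o9yZAAAAANqgIHAzoRTzu/7Zuxc8Izf4r3/qSCmBfDqWzTXqmVtsSBSHACka3qbN
eehqu8H6S0UK8V7yHbpVhExu9Hu72jWEzU/B0h9MR5gDhJPoWurx8YfyXBDsRS4y
r13/eqMN8kfCDw==
=Ks9w
-----END PGP SIGNATURE-----
"""

headers, message, signature = split_cleartext_message(example)
assert headers == ["Hash: SHA512"]
assert message == "hello world"
```

The `signature` value holds the ASCII-armored block, which OpenPGP software would decode and check against the message text.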

#### Use-case

One use-case for cleartext signatures is asking someone to sign a piece of data. The person who is asked to sign can easily inspect the data with simple command-line tools, such as `cat`, and confirm that they agree with what they are asked to sign.

```{admonition} TODO
:class: warning

(Ask David for details:)

We use this for example to verify User ID and primary key of Arch Linux packagers before signing the User IDs on their keys with the main signing keys and to verify the data claims when introducing new packagers (i.e. already established packagers vouch for the data of a new packager).
```

#### Text transformations for cleartext signatures

```{admonition} TODO
:class: warning

explain text transformations for cleartext signatures (LF->CRLF and additional escaping)
```

#### Pitfalls

Cleartext signatures are popular and have useful applications.

At the same time, they are considered a "legacy method"[^csf-gnupg] by some.

[^csf-gnupg]: https://lists.gnupg.org/pipermail/gnupg-devel/2023-November/035428.html

The RFC points out a number of specific [pitfalls of cleartext signatures](https://www.ietf.org/archive/id/draft-ietf-openpgp-crypto-refresh-12.html#name-issues-with-the-cleartext-s), and how to avoid them. It advises that in many cases, the inline and detached signature forms are preferable.

## Advanced topics

### Nesting of one-pass signatures

```{admonition} TODO
:class: warning

Write
```