+++
class = "post"
date = "2022-12-02T00:00:00-05:00"
tags = ["support", "floss", "debian", "packaging"]
title = "Building a Debian Package 101"
description = "It's not as confusing or complicated as you think"
type = "post"
weight = 1
draft = false
+++
One of the most oft-repeated reasons I've heard for software not being packaged for Debian and its derivatives is that Debian packaging is complicated. Now, the thing is, it can be. If you look at [the manual](https://www.debian.org/doc/manuals/maint-guide/index.en.html) or a reasonably complicated program from the Debian repositories, it sure seems like it is. But I'm here today to show you that it can be easy with the right guide!
My target audience for this post is anyone who has software they want to build, but who currently thinks that making a `.deb` is too complex, difficult, or not worth the effort. Hopefully, by the end of this post, you'll understand exactly how to do it and be able to implement your own Debian package in under 30 minutes.
If that sounds good, read on!
For simplicity's sake, I assume you're doing all this on a Debian system, or one of its derivatives like Ubuntu. Note that things like cross-architecture building are well outside our scope here, but such things are possible. Your package will match what you build it under, so if you want an Ubuntu 22.04 package, be sure to build it on an Ubuntu 22.04 system, etc.
## Prerequisites
Before starting, you'll need a few dependencies. First and foremost is anything you need to actually build your program; for a lot of things that's `build-essential` plus a few supplemental libraries, but it could include anything else.
Keep track of what build dependencies you need, because we'll need that list later on when creating the `control` file.
Next, install the `dpkg-dev`, `debhelper`, and `devscripts` packages, which provide the main Debian packaging tools and some helper programs. You might also want `quilt` if you plan to make package-specific patches to the code, but I don't cover `quilt` here.
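A quick way to confirm the tooling is in place is to check for one command from each package. Here's a minimal sketch; the `check_tools` function name is my own invention, and it relies on `dpkg-buildpackage` coming from `dpkg-dev`, `dh` from `debhelper`, and `dch` from `devscripts`:

```shell
#!/bin/sh
# check_tools: report which of the given commands are on the PATH.
# Returns the number of missing commands (0 means everything was found).
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# dpkg-buildpackage ships in dpkg-dev, dh in debhelper, dch in devscripts
check_tools dpkg-buildpackage dh dch || echo "Install the missing packages with apt first"
```

If anything is reported as missing, install the corresponding package before continuing.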
## The Basics: Creating your initial `debian/` folder
Start with your source code in a directory, Git repo, etc. To start you'll want all your code in the root level, so that you can build it right from there. This helps keep the complexity down.
Our first step is to build a basic, boilerplate `debian/` folder, which is a sub-directory at the root of the source code repository that provides the Debian packaging instructions. So run `mkdir debian` and continue.
Within that `debian` folder are a few key files that every build needs. I'll go through each one in turn, explaining what it does and how to write one. At the end, you'll be able to run `dpkg-buildpackage` to get your binary package.
## Boilerplate files (`compat`, `source/format`, and `source/options`)
These files define some basic configuration for the build system. Given how simple and boilerplate they are, I've collected all 3 under this heading.
`compat` defines the Debian packaging compatibility version, i.e. what version of `debhelper` the package supports. What version you support depends on how old the releases of Debian you want to support are, but `8` or `9` are good baselines.
The next two entries are under the sub-directory `source` within the `debian` folder.
`source/format` defines the package layout format, and is normally just `1.0` with no other content in the file.
`source/options` defines some additional options that will be passed to `dpkg-source` when it builds your package. There are two main categories of entries here that I have used in my packages, though there are many more:
* `tar-ignore='<pattern>'`: One or more entries will define file patterns (shell-style globs, passed on to `tar --exclude`) to ignore when creating the source tar archive. It's usually a good practice to ignore things like `.git*`, `*.deb`, and any temporary files or directories your build might produce.
* `extend-diff-ignore='<pattern>'`: One or more entries will define file patterns (Perl regular expressions) to ignore when creating the diff of your source code. Generally you want to ignore any binary files in your source tree.
A good, safe default would be something like:
```
tar-ignore='*.deb'
tar-ignore='.git*'
extend-diff-ignore='.git*'
```
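If you'd rather script this step, the three boilerplate files can be created in one go. This is a minimal sketch, meant to be run from the root of your source tree; the compat level of `9` is just one of the baselines mentioned above, so adjust it to taste:

```shell
#!/bin/sh
set -e
# Create debian/ and its source/ sub-directory
mkdir -p debian/source
# debhelper compatibility level (assumption: 9; pick what your targets support)
printf '9\n' > debian/compat
# Source package format
printf '1.0\n' > debian/source/format
# Options passed to dpkg-source
cat > debian/source/options <<'EOF'
tar-ignore='*.deb'
tar-ignore='.git*'
extend-diff-ignore='.git*'
EOF
# Show what was created
find debian -type f | sort
```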
## The `copyright` file
The `copyright` file defines the copyright information for your package. Usually, for simple programs, this will just match your project's license.
The file is structured as follows:
```
Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
```
This line defines the copyright format. The 1.0 format specified here is usually sufficient. This is a link to the full manual of the contents of the `copyright` file, so for more advanced situations it is worth the read.
```
Upstream-Name: myprogram
```
This line specifies the upstream name of the program. It should match your program's name and the name of the source package.
```
Source: https://github.com/aperson/myproject
```
This line provides a link to your source. It can be any URL you want, but you should provide something here.
Next is a blank line, followed by one or more blocks:
```
Files: *
```
This line defines what file(s) this copyright entry belongs to. For a simple project all under one license, this can just be `*`. The `*` block should always be the first block, with any more specific blocks defined after it, since the last block that matches a file is the one that applies. If not `*`, this should be the relative path to the file(s) under the source repository.
```
Copyright: 2022 A. Person <aperson@email.tld>
```
This line defines the copyright year and name of the copyright owner (including email address in angle brackets). This is probably you unless you're packaging up someone else's code. While this email doesn't have to be valid, it should be in case a user wants to reach you about a copyright question, and will be shown in the information about the package.
```
License: GPL-3
```
This line, and subsequent lines prefixed with a single space, define the actual license of the files. The license name should be one of those found under `/usr/share/common-licenses` (e.g. `GPL-3`, or `Apache-2.0`). The subsequent lines should include the short version of the license text, i.e. what you would put at the top of your source files (not the full license text). Within this block, paragraph breaks should be delineated with `.` characters. At the bottom it's usually best to reference the aforementioned directory as a source of license text as these contain the full version of each license.
### A complete example
Here is a complete example of a `copyright` file for a GPL v3 program:
```
Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
Upstream-Name: myprogram
Source: https://github.com/aperson/myproject

Files: *
Copyright: 2022 A. Person <aperson@email.tld>
License: GPL-3
 This package is free software; you can redistribute it and/or modify
 it under the terms of the GNU General Public License as published by
 the Free Software Foundation, version 3.
 .
 This package is distributed in the hope that it will be useful,
 but WITHOUT ANY WARRANTY; without even the implied warranty of
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 GNU General Public License for more details.
 .
 You should have received a copy of the GNU General Public License
 along with this program. If not, see <https://www.gnu.org/licenses/>.
 .
 On Debian systems, the complete text of the GNU General
 Public License version 3 can be found in "/usr/share/common-licenses/GPL-3".
```
## The `control` file
Now we're getting into the meat of the package. The `control` file defines your package, both the source component and the binary component(s). There are many available options here, but I'll provide only the most basic ones needed to build a functional package.
### The "source" package section
These entries define the source package information. The entries are structured as follows:
```
Source: myprogram
```
This line defines the name of the source package, and will usually match the name of the program.
```
Section: misc
```
This line defines the section of the repository that your application goes into. What you put here is pretty arbitrary unless you want your package to be included in the official Debian repositories, so go with `misc`.
```
Priority: optional
```
This line defines the priority of the package. Like the above entry, this only really matters if you're making an official package, so go with `optional`.
```
Maintainer: A. Person <aperson@email.tld>
```
This line defines who maintains the package (and thus, who to reach out to if help is needed by an end user). This uses the same format as the person entry from `copyright` above, and this format will be used again later as well.
```
Build-Depends: debhelper (>= 8),
               libssl-dev,
               somebuilddep
```
These lines define any build dependencies your package requires, i.e. what you installed in the very first section. You can safely exclude `dpkg-dev` (as this is implied), as well as `build-essential` (for the same reason), but include here any specific development libraries, additional programs, etc. that you might need to build the program. Note too the first line, which should usually be `debhelper` at `>=` the version you specified in `compat` above. This entry also demonstrates how to define specific version(s) of dependencies; `>=`/`<=` (greater than/less than or equal) are the most common, to specify minimum dependency versions, though other comparisons are possible in more advanced cases.
Entries in this list can be placed on one line, comma separated, or on separate lines as shown here. The final entry should not have a comma after it.
```
Standards-Version: 3.9.4
```
The version of the package standards that the package uses. I usually use `3.9.4` as a baseline for my own packages; the latest version as of writing is `4.6.1`.
```
Homepage: https://myproject.org
```
This line defines a URL to the homepage of your project.
### The "binary" package section(s)
These entries define the output binary package information. There should be one block for each binary package you produce from the single source package, though for a simple project there is a 1-to-1 relationship here. The entries are structured as follows:
```
Package: myprogram
```
This line defines the name of the package, usually the name of your program.
```
Architecture: any
```
This line defines the architecture that the package will support. For simple packaging, this should be `any` (the program can be built against any architecture that Debian supports) or `all` (for native cross-platform packages like Python code or documentation).
```
Depends: mypackagedependency (>= 1.0),
         someotherdependency,
         afinaldependency
Recommends: asoftdependency
```
These lines define any package relationships that the final package will have, formatted like the `Build-Depends` entry in the "source" section above. These are optional: if your program doesn't depend on any other (binary) packages at runtime, just leave them out, but usually you'll depend on *something*.
The `Depends` entries are strict: the package will refuse to install if any of these are missing (when using `dpkg --install`), and will pull them in automatically when using the package manager (e.g. `apt install`). Use this for any hard dependencies the program has.
The `Recommends` entries are malleable: the package will still install if these are missing, but this relationship exists to define anything that might be "nice to have" alongside your program. By default, `apt` *et al* will install recommended packages alongside yours, but users can skip them (e.g. with `apt install --no-install-recommends`) and your package must still work without them.
```
Description: The one-line description of your program for 'apt search'
 Some additional lines that will describe the program in more depth.
 .
 You may have multiple paragraphs here with . delimiters.
```
These lines provide a description of your package so users know what they're installing. The first line (along with the `Description:` label) is a short version that will be shown as output when running `apt search` and the like. Any additional lines provide more detail for use with `apt info`.
### A complete example
Here is a complete example of a basic `control` file for a simple program:
```
Source: myprogram
Section: misc
Priority: optional
Maintainer: A. Person <aperson@email.tld>
Build-Depends: debhelper (>= 8),
               libssl-dev,
               somebuilddep
Standards-Version: 3.9.4
Homepage: https://myproject.org

Package: myprogram
Architecture: any
Depends: mypackagedependency (>= 1.0),
         someotherdependency,
         afinaldependency
Recommends: asoftdependency
Description: The one-line description of your program for 'apt search'
 Some additional lines that will describe the program in more depth.
 .
 You may have multiple paragraphs here with . delimiters.
```
## The `changelog` file
The changelog file defines the current, and any past, versions of your package, along with a (generally brief) changelog, as the name implies.
This file is important when releasing new versions: whatever entry is at the top of this file is the "current version" of your program, and will determine the version of the output package. Thus you will have to add a new entry to the top of this file each time you release a new version of your package.
It is required to have at least one entry here (to define the current version of the package), but also good practice to keep older versions in descending order for as long as feasible, so people can compare what changed between various versions of your program.
The entries are structured as follows:
```
myprogram (1.0-1) unstable; urgency=medium
```
The first line defines the values for the changelog entry, and is in a very specific format.
First is the name of the program, which *must* match the `Source:` entry in the `control` file.
Next is the version of the package enclosed in parentheses. This should be the real version of the program that you are building. The first part (before the `-`) defines the "upstream" version, so in this case, we're building version `1.0` of the program, corresponding to a hypothetical Git tag of `v1.0`. The second part (after the `-`) is the version of the *package*. This can be used to define multiple versions of the package that use the same underlying upstream version; unless you're doing complicated stuff involving delegating packaging, just set this to `-1` or `-0`, or leave it out altogether.
Next is the code-name of the release of the package; just set this to `unstable`. Note the semicolon after this.
Finally is the "urgency" of the package. This is used by `apt` to determine how "important" the update is, but can be pretty arbitrary. I usually use `urgency=medium` as a safe default.
```
  * Here is a changelog entry
  * Here is another changelog entry
```
The next section, separated from the first line by an extra newline, contains individual changelog entries. You must provide at least one explaining what's changed, but you can specify several as shown here. Each entry must be prefixed by two spaces then an asterisk (`*`) character before starting the entry. Standard formatting is to capitalize the first letter, keep it short and sweet, and end without a full stop (`.`); if you're using Git and [are writing good Git commit messages](https://cbea.ms/git-commit/), you can just use your Git commit titles here! What you put in each line is up to you, and you can include any metadata or information you might want. Finally note the trailing newline before the final line.
```
 -- A. Person <aperson@email.tld>  Fri, 02 Dec 2022 14:28:01 -0500
```
The final line of the changelog entry specifies who wrote the entry, again in a very specific format. The line begins with a single space followed by two dashes (`--`) then another space, followed by the author in the standard name + email format (I did say it would come up again!), then two spaces, and finally an RFC Email date (i.e. the output of `date --rfc-email`) defining when the entry was written.
### A complete example
Here is a complete example of a single `changelog` file entry for version `1.0` of our simple program:
```
myprogram (1.0-1) unstable; urgency=medium

  * Here is a changelog entry
  * Here is another changelog entry

 -- A. Person <aperson@email.tld>  Fri, 02 Dec 2022 14:28:01 -0500
```
If we were to add version `1.1` of the program in the future, we would add it to the top, and the file would thus look like this (note the extra line between entries):
```
myprogram (1.1-1) unstable; urgency=medium

  * This is a newer version after fixing a bug (GitHub #123)

 -- A. Person <aperson@email.tld>  Sat, 03 Dec 2022 18:28:01 -0500

myprogram (1.0-1) unstable; urgency=medium

  * Here is a changelog entry
  * Here is another changelog entry

 -- A. Person <aperson@email.tld>  Fri, 02 Dec 2022 14:28:01 -0500
```
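Since the format is so rigid, prepending a new entry is easy to script. Here's a minimal sketch under a few assumptions: the name, version, maintainer, and message values are placeholders for illustration, and the name must match the `Source:` entry in your `control` file. It builds the timestamp with a plain `date` format string rather than `--rfc-email` so that it also works with non-GNU `date`:

```shell
#!/bin/sh
set -e
# Hypothetical values for the new release; adjust to your project
name="myprogram"
version="1.1-1"
maintainer="A. Person <aperson@email.tld>"
message="This is a newer version after fixing a bug (GitHub #123)"
# Timestamp in the same layout as 'date --rfc-email';
# LC_ALL=C keeps the day and month names in English
stamp="$(LC_ALL=C date '+%a, %d %b %Y %H:%M:%S %z')"

mkdir -p debian
touch debian/changelog
# Write the new entry first, then the existing entries below it
{
    printf '%s (%s) unstable; urgency=medium\n\n' "$name" "$version"
    printf '  * %s\n\n' "$message"
    printf ' -- %s  %s\n\n' "$maintainer" "$stamp"
    cat debian/changelog
} > debian/changelog.new
mv debian/changelog.new debian/changelog
head -n 5 debian/changelog
```

Writing to a temporary file and moving it into place means a failure part-way through never leaves you with a half-written `changelog`.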
### The `dch` helper program
The `devscripts` package provides a helper program to assist in automating changelog entries, named `dch`. In my experience, you have to change so much of the generated content (or set so many environment variables) that it isn't worthwhile, but it is something to consider if you do a lot of packaging.
## The `rules` file
The `rules` file is a `make` script that defines how to build your package. This is the part that usually trips a lot of people up, because this file can get very complicated. However, for most simple programs using standard build tools, `dh` - the Debian build helper - automates a lot of the grunt work for you, and this file can thus be very simple.
The file is structured as follows; note that this is `make` format, so indentations *must* be a tab (`\t`) character, *not* spaces, and the file *must* be executable to work:
```
#!/usr/bin/make -f
```
The first line is a shebang line defining that this is a `make` script with the `-f` option.
```
export DH_VERBOSE = 1
```
This line sets verbosity when building the package, useful for troubleshooting.
```
MY_FILE := binary.out
```
This line defines a variable that can be used later in the script. I show this example here only to specify the format (note the `:=`); a simple program likely won't need any variables.
```
%:
	dh $@
```
This section defines the basic rules for the build. The `%:` heading is "any stage"; there are about two dozen stages in a normal package build that can be defined, and `%` is the "wildcard" for all of them.
Next, the tab-indented line(s) specifies what commands happen during this stage. Note that each line here is executed in its own shell context, so if you were to e.g. `cd`, that would get lost on the next line. In this basic example though, all we do is pass all of the arguments for the stage on to the `dh` program.
And that's it! Really! If your program uses `./configure && make && make install` style installation, or `cmake`, or is a properly-formatted Python module, or really any "standard" build type, this is all you need to do. `dh` takes care of it all, automatically determining how to build the program, putting it in the right places, and giving you a package out the other side.
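Because the tab and the executable bit are the two things that most often go wrong, here's a minimal sketch that writes this basic `rules` file from the shell and gets both right. It assumes you run it from the root of your source tree:

```shell
#!/bin/sh
set -e
mkdir -p debian
# printf expands \t into a real tab, which is what make requires for recipe
# lines; %% is a literal percent sign
printf '#!/usr/bin/make -f\n%%:\n\tdh $@\n' > debian/rules
# debian/rules must be executable
chmod +x debian/rules
cat debian/rules
```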
### Overriding build stages
Now, of course, you can do some more advanced things in this file as well. Any stage can be overridden by using an `override_dh_<stage>` section, which replaces the `dh_<stage>` command that `dh` would normally run with whatever you specify; if you still want the default behaviour as well, call `dh_<stage>` yourself inside the override. For example, let's say that `make clean` doesn't actually clean up all of our artifacts, so we want to define some custom cleanup that will happen as well. We can override the default `dh_auto_clean` step with the following to achieve this:
```
override_dh_auto_clean:
	rm -f artifacts/out/$(MY_FILE)
	dh_auto_clean
```
Note here that we also use the variable we defined above as an example; variable references in `make` are conventionally written with normal brackets (i.e. `$(MY_FILE)`), though `make` also accepts curly braces (i.e. `${MY_FILE}`) like in Bash.
Another common example is overriding `dh_auto_configure` to run a `./configure` script with special options. For example:
```
override_dh_auto_configure:
	./configure --my-option-1 --my-option-2 \
		--newlined-option
```
Note that this example doesn't call `dh_auto_configure`, so the default configure step will not be executed at all. You can use this for completely manual control of a build stage if appropriate.
You have a lot of flexibility here, which is why `rules` files seem so complex. But don't be scared: start simple, see if it works, and only override if you find you *really* need it.
### Handling the pesky shell context
As mentioned above, each line runs in its own shell context. This is mostly relevant if you're moving around directories. So for example, this will *not* do what you expect:
```
override_dh_auto_clean:
	cd artifacts/out/
	rm -f $(MY_FILE)
	cd ../..
	dh_auto_clean
```
Because that first `cd` runs in its own shell context, the next line (`rm -f $(MY_FILE)`) is actually relative to the base directory, *not* the `artifacts/out/` directory! You can work around this by putting everything on one line like so:
```
override_dh_auto_clean:
	cd artifacts/out/ && rm -f $(MY_FILE)
	dh_auto_clean
```
And since context is discarded, you don't even need to worry about the `cd ../..` part; you will always be back at the root of the repository on the next line.
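You can see the same effect outside of a build: `make` hands each recipe line to its own shell, which `sh -c` mimics. This is only an illustration of the behaviour described above, using a scratch `artifacts/out` directory created in the current directory:

```shell
#!/bin/sh
set -e
root="$(pwd -P)"
mkdir -p artifacts/out

# Two separate shells, like two separate recipe lines: the cd is forgotten
sh -c 'cd artifacts/out'
separate="$(sh -c 'pwd -P')"

# One shell for both commands, like a single "cd ... && ..." recipe line
chained="$(sh -c 'cd artifacts/out && pwd -P')"

echo "separate lines ran in: $separate"
echo "chained line ran in:   $chained"
```

The first path is the directory you started in, while the second ends in `artifacts/out`.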
### In-built variables
One final note is a special variable that can be used, `$(CURDIR)`. This variable is a full path to the current directory (usually the root of the repository) and can be used for commands that need a full path, for example:
```
override_dh_auto_clean:
	cd $(CURDIR)/artifacts/out/ && rm -f $(MY_FILE)
	dh_auto_clean
```
There are several other in-built variables that you can use as well, but for simplicity, I won't cover them here.
### A complete example
Here is a complete example of the basic `rules` file, with some comments:
```
#!/usr/bin/make -f

# Be verbose during the build
export DH_VERBOSE = 1

# This variable contains a pesky file that 'make clean' won't remove
MY_FILE := binary.out

# Main debhelper entry
%:
	dh $@

# Override dh_auto_clean to clean up MY_FILE, then run the default cleanup
override_dh_auto_clean:
	cd $(CURDIR)/artifacts/out/ && rm -f $(MY_FILE)
	dh_auto_clean
```
## Installing files manually with `install`
Sometimes, and in fact quite often, you will have some static files that will need to be manually installed into the package, i.e. that your build process doesn't take care of automatically. For example, you might have a systemd service unit file called `myprogram.service` that needs to be installed.
These custom files can be defined in the `install` file, which tells the package build to add the files to the resulting package after the build is completed.
Each line in the file is structured as one or more sources and then a destination directory, much like a `cp` command that copies files into a directory.
The source is always relative to the root of the repository, while the destination is always relative to `/` on the target system. So using our `myprogram.service` example, we might put that file in `debian/conf/` and then have an entry in `install` like so:
```
debian/conf/myprogram.service lib/systemd/system/
```
This will ensure that `myprogram.service` is installed to `/lib/systemd/system/myprogram.service`. The destination is always treated as a directory, so you don't need the trailing `/` (though adding it makes it clear); `dh_install` keeps the original filename and cannot rename a file as it installs it.
### `install` shenanigans: a build-less package
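This file also allows shenanigans if you want to create a "source" package that doesn't actually do any "building", just moves files around. `dh` still has to run so that the `install` file is processed and the `.deb` is assembled, but you could for example have a `rules` file whose build step does nothing:
```
#!/usr/bin/make -f
%:
	dh $@
override_dh_auto_build:
	/bin/true
```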
And then use `install` to just copy a bunch of files into place (files can't be renamed on the way, so the script is already named `myprogram` in the source tree):
```
src/myprogram usr/bin/
```
This can be useful for things like pure documentation or a collection of scripts that are entirely static.
### An `install` per package
While not explicitly covered here, `control` lets you make multiple binary packages out of one source package. It can thus be useful to have separate `install` lists for each binary package. To do this, you simply start the filename with the name of the binary package (i.e. what is defined in `Package:` in the `control` file) followed by `.install`. For example, you could have `myprogram.install` and `myprogram-docs.install` which install different sets of files.
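As a rough sketch of what that looks like on disk, the following creates one `install` list per binary package. The `myprogram-docs` package and the `docs/manual.html` file are hypothetical, made up purely for illustration; each name before `.install` has to match a `Package:` entry in your `control` file:

```shell
#!/bin/sh
set -e
mkdir -p debian
# Files for the main binary package (Package: myprogram)
cat > debian/myprogram.install <<'EOF'
debian/conf/myprogram.service lib/systemd/system/
EOF
# Files for a hypothetical second binary package (Package: myprogram-docs)
cat > debian/myprogram-docs.install <<'EOF'
docs/manual.html usr/share/doc/myprogram-docs/
EOF
ls debian/*.install
```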
## The `conffiles` file
Sometimes, you might have configuration files shipped with your program that you want users to be able to edit themselves, and that won't be (automatically) overwritten by a new version of your package. You can handle this with the `conffiles` file.
By default, `debhelper` will treat any file your package installs under `/etc` as a `conffile`, so you don't need to explicitly define these. Thus, if your program follows the Linux Filesystem Hierarchy Standard and keeps its configuration under `/etc`, you don't need this file.
However, if you have configuration files elsewhere on the system, you should define them in this file, one file per line.
The `conffiles` of a program are treated specially during a package removal. `apt remove` will not remove them by default, in order to preserve the configuration of a package; you must use `apt purge` to remove any defined `conffiles`, so keep this in mind if you want to define them.
## Controlling installation and removal with maintainer scripts
When your package is installed on a user system, it can often be useful to do "things" to the system. A canonical example would be creating a service user and enabling our example `myprogram.service` unit on install, then deleting the service unit and user when the package is removed.
There are 4 types of maintainer scripts that can be specified. Each script is a `/bin/sh` script (starting with a `#!/bin/sh` shebang) which can then do arbitrary things to the system. They do not need to be executable in the source repository, but will be once installed by the package.
A script that exits with a non-zero status is a fatal error, and will terminate the configuration (and, for `pre` scripts, the remaining installation) of the package. Note that `set -o errexit` is not enabled for you: Debian Policy expects you to add `set -e` near the top of each script so that any failing step aborts it, so be careful to explicitly "catch" expected errors with `||` as needed. Note too that the scripts run as `root`, so be very careful here!
* `preinst` runs during package installation, before the actual files of the program are installed. You can use it to check the sanity of the system or other similar tasks, though this file is likely the least-used.
* `postinst` runs during package installation, after the actual files of the program are installed. This is the most common maintainer script, often used to configure services, add users, `chown` directories, etc.
* `prerm` runs during package removal, before the actual files of the program are removed. This is the second most common maintainer script, often used to de-configure services, remove users, remove created directories, etc.
* `postrm` runs during package removal, after the actual files of the program are removed. Really, anything that goes in `prerm` could likely also go in `postrm`, but where you put tasks depends on the specifics of your program.
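One detail worth knowing about all four scripts: `dpkg` passes each one an action name as its first argument. For `postinst`, that's `configure` on a normal install or upgrade, or one of the `abort-*` actions when `dpkg` is unwinding a failed operation. Here's a minimal sketch of branching on it; the `handle_postinst` wrapper and the messages are my own, and exist only so the logic can be tried by hand. In a real `postinst`, the `case` would sit at the top level and switch on `$1`:

```shell
#!/bin/sh
# Branch on the action name that dpkg passes as the first argument
handle_postinst() {
    case "$1" in
        configure)
            # Normal install or upgrade: do the real setup work here
            echo "setting up"
            ;;
        abort-upgrade|abort-remove|abort-deconfigure)
            # dpkg is rolling back a failed operation
            echo "nothing to undo"
            ;;
        *)
            echo "postinst called with unknown argument '$1'" >&2
            return 1
            ;;
    esac
}

handle_postinst configure
```

The other scripts receive different action names (for example, `prerm` gets `remove` or `upgrade`), so it's worth checking which action you're handling before doing anything destructive.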
In very simple programs, you might not need any of these scripts, or might only need one or two of them. For our example we'll only need `postinst` and `prerm` to handle our service and user.
Thus we would have a `postinst` as follows:
```
#!/bin/sh
# Abort on any unhandled error
set -e
# Create the user (only if missing, so upgrades don't fail) and set their home
# to /var/lib/myprogram, shell /usr/sbin/nologin to prevent login
if ! getent passwd myprogram >/dev/null; then
    useradd \
        --no-user-group \
        --create-home \
        --home-dir /var/lib/myprogram \
        --shell /usr/sbin/nologin \
        --gid daemon \
        --system \
        myprogram
fi
# Enable and start the service
systemctl enable --now myprogram.service
# Explicitly exit 0
exit 0
```
And a `prerm` as follows:
```
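#!/bin/sh
# Abort on any unhandled error
set -e
# Only clean up on a real removal; 'prerm' also runs (with "upgrade") during upgrades
if [ "$1" = "remove" ]; then
    # Disable and stop the service
    systemctl disable --now myprogram.service || true
    # Remove the user
    userdel myprogram || true
    # Clean up the data directory (don't worry about program files, 'dpkg' handles that!)
    rm -rf /var/lib/myprogram
fi
# Explicitly exit 0
exit 0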
```
### Maintainer scripts per package
Like the `install` file above, these maintainer scripts can be defined per-binary-package, using the same `<package name>.<script>` format, if your package requires it.
### Don't do sketchy things in maintainer scripts!
Finally, I want to stress that you should not do sketchy things in maintainer scripts. 2 years ago, the Raspberry Pi Foundation [abused their maintainer scripts in a critical package](https://github.com/RPi-Distro/raspberrypi-sys-mods/commit/655cad5aee6457b94fc2336b1ff3c1104ccb4351) [to install a completely unrelated repository for Microsoft VS Code](https://www.reddit.com/r/linux/comments/lbu0t1/microsoft_repo_installed_on_all_raspberry_pis/) [without any obvious traces in the usual Debian places](https://hothardware.com/news/raspberry-pi-microsoft-repository-phones-home-added-pi-os) (i.e. anywhere visible with `dpkg -L`/`apt-file search`/etc.).
DO NOT do this, EVER. Maintainer scripts are NOT for adding files to the system; that's what `install` and the build process are for, which allow the files installed by packages to be tracked by the `dpkg` system. You could perhaps make a case for modifying files in maintainer scripts, but adding new files or trying to do anything "trixy" is verboten, and certainly do not do what the RPF did. Abuse of maintainer scripts like this not only destroys user trust, but it actively hides changes to the system from the package manager, and prevents these entries from being managed and modified in the future by new package versions. It's a horrible practice all around. Use maintainer scripts only to do the bare minimum tasks needed to ensure your package will work and to clean up after it, nothing more.
## Building your package
Now that you've prepared your `debian` folder and package configuration, it's time to actually build your new package! In the root of your source repository, run the following command:
```
dpkg-buildpackage
```
This will build the package for you. You should get 5 files out of the build, one level higher than your current directory (i.e. at `../`):
* `myprogram_1.0-1_amd64.deb`: The actual binary package. The version and architecture are auto-populated based on the build.
* `myprogram_1.0-1_amd64.buildinfo`: A file containing information on the build, including checksums, dependencies, environment, etc.
* `myprogram_1.0-1_amd64.changes`: A file containing information about the package including changelog, checksums, and the description.
* `myprogram_1.0-1.dsc`: The Debian source package information.
* `myprogram_1.0-1.tar.gz`: An archive of the source for use with the `.dsc` file.
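The names in this list follow a fixed pattern built from the first line of your `changelog` plus the build architecture, so you can predict them before building. Here's a minimal sketch; the `expected_artifacts` helper is my own, it assumes the version has no epoch (no `N:` prefix), and the `amd64` in the usage line is just an example architecture:

```shell
#!/bin/sh
# expected_artifacts CHANGELOG ARCH
# Print the file names a build should produce, based on the first line of
# the changelog, e.g. "myprogram (1.0-1) unstable; urgency=medium".
# Assumes the version has no epoch (no "N:" prefix).
expected_artifacts() {
    header="$(head -n 1 "$1")"
    # Everything before the first space is the source package name
    name="${header%% *}"
    # The version is whatever sits between the first pair of parentheses
    version="$(printf '%s\n' "$header" | sed 's/^[^(]*(\([^)]*\)).*$/\1/')"
    for suffix in "_$2.deb" "_$2.buildinfo" "_$2.changes" ".dsc" ".tar.gz"; do
        printf '%s_%s%s\n' "$name" "$version" "$suffix"
    done
}

# Example usage from the root of the source tree (assumption: an amd64 build)
if [ -f debian/changelog ]; then
    expected_artifacts debian/changelog amd64
fi
```

On a Debian system, `dpkg --print-architecture` will tell you the architecture to pass in.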
You can then install your `.deb` or add it to a repository manager like `reprepro`.
Happy building!