Welcome to metasnap.d.n. This service is an addition to snapshot.d.o and is meant to answer questions like:
For tools like debrebuild, debbisect or debootsnap we need to know which snapshot.d.o timestamp of a given distribution contains which package versions. snapshot.d.o cannot provide this functionality because it doesn't associate GPG-signed Release files and Packages files with the binary packages it offers for download. The first_seen attribute from the snapshot.d.o API indicates the timestamp a package file was first seen anywhere in Debian and is not useful beyond the unstable/sid distribution. Even when we limit ourselves to unstable/sid, the first_seen attribute doesn't help in situations where we want to find a minimal set of timestamps providing multiple packages. For this functionality we would need a last_seen property, which would still not be sufficient because it would not be possible to provide it per suite unless Packages files are parsed. Lastly, the first_seen attribute is based on files as they appear in /pool, and files in there might never appear in any Packages file (example: ocaml-odoc 1.5.0-1) or they appear in /pool before being in the Packages file.
metasnap investigates the Packages files for all known snapshot timestamps, all suites and all architectures. It then records, for each suite, each architecture and each binary package, which version of it was seen in which snapshot.d.o timestamps in that suite.
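The bookkeeping described above can be pictured as collapsing per-snapshot observations into timestamp ranges. The following Python sketch is hypothetical and is not taken from the metasnap code base. It assumes the snapshot timestamps are already sorted chronologically and that you know at which of them one (suite, architecture, package, version) combination was present. Whenever that version is missing from a snapshot, the current range is closed, and a later reappearance opens a new one.

```python
def observations_to_ranges(timestamps, seen):
    """Collapse per-snapshot observations into (first_seen, last_seen) ranges.

    timestamps: all snapshot timestamps, in chronological order.
    seen: set of timestamps at which the version was present (hypothetical input).
    A snapshot where the version is absent closes the current range.
    """
    ranges = []
    start = prev = None
    for ts in timestamps:
        if ts in seen:
            if start is None:
                start = ts  # a new range begins here
            prev = ts  # extend the current range
        elif start is not None:
            ranges.append((start, prev))  # gap: close the open range
            start = None
    if start is not None:
        ranges.append((start, prev))  # range still open at the last snapshot
    return ranges
```

This also shows how one version can end up with several ranges in the same suite if it disappears and later reappears.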
The data is provided by a Python cgi script with read-only access to a git repository and a sqlite database. The order of arguments matters. Timestamps are given in the same format as snapshot.d.o uses: %Y%m%dT%H%M%SZ. The only valid component names are main, contrib and non-free. On success, HTTP status 200 will be returned. If you inserted invalid data, status 500 will be returned. If the requested package(s) cannot be found, HTTP 404 will be returned. If multiple packages cannot be found, the response body will contain a list of packages that could not be found. For other bad requests, error 400 is returned with the explanation following the HTTP error code. The cgi script supports the following functionalities:
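Clients usually need to convert between the %Y%m%dT%H%M%SZ timestamp strings and real date objects. Below is a minimal sketch using only the standard library. The helper names are illustrative. It assumes, as the trailing "Z" suggests, that the timestamps are in UTC.

```python
from datetime import datetime, timezone

# Timestamp format used by snapshot.d.o and by this API.
SNAPSHOT_TS_FORMAT = "%Y%m%dT%H%M%SZ"


def parse_snapshot_ts(ts):
    """Parse e.g. '20090825T163333Z' into a timezone-aware datetime.

    Assumption: the trailing 'Z' means the time is given in UTC.
    """
    return datetime.strptime(ts, SNAPSHOT_TS_FORMAT).replace(tzinfo=timezone.utc)


def format_snapshot_ts(dt):
    """Render a timezone-aware datetime in the snapshot.d.o format."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return dt.astimezone(timezone.utc).strftime(SNAPSHOT_TS_FORMAT)
```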
Returns a list of all timestamps from snapshot.d.o, one per line. The value <R> controls the archive the timestamps were recorded for. Possible values are "debian", "debian-archive", "debian-backports", "debian-debug", "debian-ports", "debian-security" and "debian-volatile".
Example: cgi-bin/api?timestamps=debian
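A common follow-up is to pick the newest snapshot taken at or before a given moment. Below is a sketch that fetches the list and searches it with bisect. The function names are hypothetical, and the base URL is the one used in the curl example on this page.

```python
import bisect
import urllib.request
from typing import List, Optional

API = "https://metasnap.debian.net/cgi-bin/api"


def fetch_timestamps(archive="debian"):
    """Download all snapshot timestamps of one archive (one per line)."""
    with urllib.request.urlopen(f"{API}?timestamps={archive}") as resp:
        return resp.read().decode("utf-8").split()


def latest_at_or_before(timestamps: List[str], when: str) -> Optional[str]:
    """Return the newest timestamp that is <= `when`, or None if there is none.

    Fixed-width %Y%m%dT%H%M%SZ strings sort chronologically, so plain
    string comparison is enough and no datetime parsing is needed.
    """
    ordered = sorted(timestamps)
    i = bisect.bisect_right(ordered, when)
    return ordered[i - 1] if i else None
```

Usage: latest_at_or_before(fetch_timestamps("debian"), "20200101T000000Z").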
Use this interface if you have a timestamp T and you want to know which version a certain package P with architecture A in suite S and component C had at that time. The result is one or more versions, one per line. This is equivalent to downloading and parsing http://snapshot.debian.org/archive/debian/<T>/dists/<S>/<C>/binary-<A>/Packages.xz. The advantage is that only a few bytes need to be retrieved, instead of downloading and parsing several megabytes of data and thus putting unnecessary load on snapshot.d.o.
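To make the comparison concrete, here is a sketch of the client-side work this query saves you: scanning the stanzas of a Packages.xz you already downloaded. The function names and the local file path are hypothetical. The parser is deliberately minimal and only reads single-line fields.

```python
import lzma


def versions_in_packages(text, package):
    """Return every Version of `package` listed in Packages file text.

    Stanzas are separated by blank lines and fields look like 'Name: value'.
    Continuation lines (starting with whitespace) are skipped, which is fine
    here because Package and Version are single-line fields.
    """
    versions = []
    for stanza in text.split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            if line and not line[0].isspace() and ":" in line:
                key, _, value = line.partition(":")
                fields[key] = value.strip()
        if fields.get("Package") == package and "Version" in fields:
            versions.append(fields["Version"])
    return versions


def versions_in_packages_xz(path, package):
    """Same lookup on a downloaded, xz-compressed Packages.xz file."""
    with lzma.open(path, "rt", encoding="utf-8") as fh:
        return versions_in_packages(fh.read(), package)
```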
Like 4.2 above, but returns all binary packages with their versions for the given timestamp, suite, component and architecture, one package per line, with the package name and the version separated by a space.
Example: cgi-bin/api?archive=debian&timestamp=20090825T163333Z&suite=unstable&comp=main&arch=hurd-i386
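The response of this query is easy to turn into a lookup table. Below is a minimal sketch with a hypothetical helper name. It keeps a list per package name as a precaution, because the text above does not say whether a name can appear more than once.

```python
def parse_package_versions(body):
    """Map package name -> list of versions from 'name version' lines."""
    result = {}
    for line in body.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        name, version = fields  # exactly two columns are expected
        result.setdefault(name, []).append(version)
    return result
```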
Given a package name, returns a plain file with six space separated entries per line. The first is the architecture, the second the version, the third is the suite name, the fourth the component name, the fifth the first timestamp this version was seen in this suite and the sixth the last timestamp the package was seen in it.
Example: cgi-bin/api?archive=debian&pkg=base-files
Given a package name and an architecture, returns a plain file with five space separated entries per line. The first is the version, the second the suite name, the third the component name, the fourth the first timestamp this version was seen in this suite and the fifth the last timestamp the package was seen in it.
Example: cgi-bin/api?archive=debian&pkg=base-files&arch=amd64
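The column layouts of 4.4 and 4.5 map naturally onto named tuples. The sketch below is illustrative only: the type and field names are made up, but the column order follows the descriptions above. It also shows one way to summarize which versions each suite has carried.

```python
from collections import namedtuple

# Column layouts as described for query 4.4 (pkg) and 4.5 (pkg and arch).
PkgRow = namedtuple("PkgRow", "arch version suite comp first_seen last_seen")
PkgArchRow = namedtuple("PkgArchRow", "version suite comp first_seen last_seen")


def parse_rows(body, row_type):
    """Turn a plain-text API response into a list of row_type tuples."""
    rows = []
    for line in body.splitlines():
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        if len(fields) != len(row_type._fields):
            raise ValueError(f"unexpected number of columns in {line!r}")
        rows.append(row_type(*fields))
    return rows


def versions_by_suite(rows):
    """Collect the set of versions seen in each suite."""
    by_suite = {}
    for row in rows:
        by_suite.setdefault(row.suite, set()).add(row.version)
    return by_suite
```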
Use this query if all you have is a package name, architecture and version tuple and you want to know which suites and components contained that package for what range of timestamps. The result of this query is a plain file with four space separated entries per line. The first is the suite name, the second the component name, the third the first timestamp this version was seen in this suite and the fourth the last timestamp the package was seen in it.
Example: cgi-bin/api?archive=debian&pkg=libcamel1.2-dev&arch=powerpc&ver=2.27.90-1
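Because the order of arguments matters, it helps to build query URLs from an explicit sequence of pairs rather than assembling strings by hand. Here is a sketch using urllib.parse.urlencode. The helper name is hypothetical, and the base URL is the one from the curl example on this page.

```python
from urllib.parse import urlencode

API = "https://metasnap.debian.net/cgi-bin/api"


def build_query(*params):
    """Build an API URL from (name, value) pairs, keeping their order.

    urlencode percent-encodes reserved characters such as the ':' of an epoch.
    """
    return f"{API}?{urlencode(params)}"
```

For example, build_query(("archive", "debian"), ("pkg", "libcamel1.2-dev"), ("arch", "powerpc"), ("ver", "2.27.90-1")) reproduces the example above. Appending ("suite", ...) and ("comp", ...) in that order gives the 4.7 and 4.8 variants.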
Same as 4.6 above but the output doesn't contain the first column with the suite name, as the request is restricted to a specific suite. As a result the output only contains the component, the first timestamp and the last timestamp per line, separated by a space.
Example: cgi-bin/api?archive=debian&pkg=libcamel1.2-dev&arch=powerpc&ver=2.27.90-1&suite=rc-buggy
Same as 4.6 and 4.7 above, but the output doesn't contain the first and second columns with the suite and component name, as the request is restricted to a specific suite and component. As a result, the output only contains the first timestamp and the last timestamp per line, separated by a space.
Example: cgi-bin/api?archive=debian&pkg=libcamel1.2-dev&arch=powerpc&ver=2.27.90-1&suite=rc-buggy&comp=main
Even though a given three-tuple of package name, version and architecture should not appear in the same suite multiple times, this still sometimes happens. This is why the result of API 4.4, 4.5, 4.6, 4.7 and 4.8 can contain multiple rows for the same suite.
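Since several rows may exist for the same suite, a client should keep every range rather than overwrite earlier ones. Below is a sketch for the 4.6 output format. It assumes that each row stands for a contiguous stretch of snapshots in which the version was present. The helper names are hypothetical.

```python
def ranges_by_suite(body):
    """Group 4.6 output ('suite comp first last' per line) by (suite, comp).

    Several rows for the same suite are kept as separate ranges.
    """
    grouped = {}
    for line in body.splitlines():
        fields = line.split()
        if not fields:
            continue
        suite, comp, first, last = fields
        grouped.setdefault((suite, comp), []).append((first, last))
    return grouped


def contains_at(ranges, ts):
    """True if timestamp ts lies inside any (first, last) range, inclusive.

    String comparison is safe because the fixed-width timestamp format
    sorts chronologically.
    """
    return any(first <= ts <= last for first, last in ranges)
```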
After pasting a buildinfo file into the textarea below and clicking the button, client-side JavaScript will use this API function and compute a minimal set of timestamps containing all package versions referenced by the buildinfo file. The script assumes that all packages were at some point part of Debian unstable main.
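The form's own helper is client-side JavaScript. The following is a separate Python sketch, not that script, of the first step: extracting the pinned build dependencies from a buildinfo file in the name=version notation used by the pkgs argument. It reads the Installed-Build-Depends field with its "name (= version)" entries. Accepting an optional ":arch" qualifier on the name is an assumption.

```python
import re

# "name (= version)" with an optional ":arch" qualifier (assumption).
ENTRY = re.compile(r"^\s*([a-z0-9][a-z0-9+.-]*)(?::([a-z0-9-]+))?\s*\(=\s*([^)\s]+)\)")


def build_depends_pins(buildinfo):
    """Return 'name=version' (or 'name:arch=version') items from a buildinfo."""
    pins = []
    in_field = False
    for line in buildinfo.splitlines():
        if line.startswith("Installed-Build-Depends:"):
            in_field = True
            line = line.split(":", 1)[1]  # the value may start on this line
        elif in_field and not line[:1].isspace():
            break  # a new field starts, so the dependency list is over
        if not in_field:
            continue
        for part in line.split(","):
            match = ENTRY.match(part)
            if match:
                name, arch, version = match.groups()
                pins.append(f"{name}:{arch}={version}" if arch else f"{name}={version}")
    return pins
```

Joining the result with commas gives a value for the pkgs argument.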
If you need to find a snapshot timestamp providing multiple packages at once, you can use this query instead of making multiple queries to the API endpoints above. You supply a list of binary package names, a default architecture, a suite and a component, and the result will be a minimal list of timestamps which include all the requested packages in their specific versions. Each line in the results lists the architecture and the snapshot timestamp, separated by a space. The pkgs argument lists packages separated by a comma. Versions are appended to package names after an equal sign. If the architecture of a package differs from the default value given by the arch argument, it can be appended directly after the package name and before the version, separated by a colon. This mimics the way that a package, version, architecture three-tuple is understood by the apt command line.
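Below is a sketch of assembling the pkgs argument and the query string from (name, arch, version) triples. An architecture is written only when it differs from the default. The helper names are hypothetical. Placing the optional suite and comp arguments after arch is an assumption, since no example for 4.9 is given. The `safe=","` argument keeps the separating commas literal, matching the example shown for 4.10.

```python
from urllib.parse import urlencode


def pkgs_argument(pins, default_arch):
    """pins: (name, arch, version) triples -> comma separated apt-style items."""
    items = []
    for name, arch, version in pins:
        if arch == default_arch:
            items.append(f"{name}={version}")
        else:
            items.append(f"{name}:{arch}={version}")  # explicit architecture
    return ",".join(items)


def query_string(archive, pins, default_arch, suite=None, comp=None):
    """Build the query string; suite/comp placement is an assumption."""
    params = [
        ("archive", archive),
        ("pkgs", pkgs_argument(pins, default_arch)),
        ("arch", default_arch),
    ]
    if suite:
        params.append(("suite", suite))
    if comp:
        params.append(("comp", comp))
    return urlencode(params, safe=",")
```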
Similar to 4.9 above, but as you do not specify the suite and component, the result will include all timestamp ranges containing the requested version across all suites and components. So instead of issuing query 4.6 multiple times, you can retrieve the same data in a single query. You then have to perform the computations to find a good set of timestamps that contain all your versions by yourself. Each line of the output contains seven values, separated by a space: The package name, the architecture, the version, the suite, the component, the first timestamp that the package was seen in that suite and lastly, the last timestamp the package was seen in it.
Example: cgi-bin/api?archive=debian&pkgs=dpkg%3D1.18.18,diffutils%3D1%3A3.5-1&arch=amd64
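One way to do this computation yourself is classic greedy interval stabbing: sort the ranges by their last timestamp and pick that last timestamp whenever the previously chosen one does not fall inside the current range. The sketch below does this per architecture. It is not the algorithm the server uses for 4.9. It rests on stated simplifications: you restrict the data to one suite and component (here unstable/main), and each package has exactly one range there. Later duplicate rows are ignored, and packages without a row in that suite are skipped.

```python
from collections import defaultdict


def parse_410(body):
    """Rows: name arch version suite comp first_seen last_seen."""
    return [tuple(line.split()) for line in body.splitlines() if line.strip()]


def minimal_timestamps(rows, suite="unstable", comp="main"):
    """Greedy interval stabbing per architecture; returns (arch, timestamp)."""
    ranges = {}
    for name, arch, version, s, c, first, last in rows:
        if s == suite and c == comp:
            ranges.setdefault((name, arch), (first, last))  # keep first row only
    by_arch = defaultdict(list)
    for (name, arch), rng in ranges.items():
        by_arch[arch].append(rng)
    result = []
    for arch, intervals in sorted(by_arch.items()):
        chosen = None
        for first, last in sorted(intervals, key=lambda r: r[1]):
            if chosen is None or chosen < first:
                chosen = last  # a last_seen value is always a real snapshot
                result.append((arch, chosen))
    return result
```

For single intervals this greedy choice is optimal. Once a package has several candidate ranges, it is only a heuristic.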
Large package lists can easily go beyond the size limit of a GET request (32768 bytes), so an RFC 2388 multipart/form-data POST interface is provided for these situations. Required form fields are "archive" and "pkgs". Optional fields are "arch" (the default architecture), "suite" (limit to a single suite) and "comp" (limit to a single component). The value of the "archive" field can be "debian-backports", "debian-debug", "debian-ports", "debian-security" or "debian-volatile". The "pkgs" field has the same format as the "pkgs" argument in 4.9 and 4.10, but entries may also be separated by whitespace or newlines instead of a comma. Without the optional "arch" field, all packages must declare their architecture explicitly after the package name and before the version, separated by a colon. Without the optional "suite" and "comp" fields, the result will be a text file with seven space separated entries per line: the package name, the architecture, the version, the suite, the component, the first timestamp the package was seen in the respective suite and the last timestamp the package was seen in it. If the "suite" and/or "comp" fields are used, the respective columns are omitted from the returned data.
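Besides curl, the POST interface can be used from Python without third-party packages by encoding the form fields as multipart/form-data by hand. Below is a minimal sketch. The helper names are hypothetical, only plain text fields are handled, and the endpoint URL is the one from the curl example on this page.

```python
import urllib.request
import uuid

API = "https://metasnap.debian.net/cgi-bin/api"


def encode_multipart(fields):
    """Encode plain text form fields as multipart/form-data."""
    boundary = uuid.uuid4().hex  # random boundary, unlikely to occur in values
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n'
            "\r\n"
            f"{value}\r\n"
        )
    parts.append(f"--{boundary}--\r\n")
    body = "".join(parts).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def post_pkgs(archive, pkgs, arch=None, suite=None, comp=None):
    """POST a package list (one item per line) and return the response text."""
    fields = {"archive": archive, "pkgs": "\n".join(pkgs)}
    for key, value in (("arch", arch), ("suite", suite), ("comp", comp)):
        if value is not None:
            fields[key] = value
    body, content_type = encode_multipart(fields)
    req = urllib.request.Request(
        API, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```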
You can also fill the form using the following curl command:
curl -F 'buildinfo=<-' https://metasnap.debian.net/cgi-bin/api < foo.buildinfo
Archive | git repo | sqlite db |
debian | 1.2 GB | 3.5 GB |
debian-backports | 11 MB | 11 MB |
debian-debug | 896 MB | 337 MB |
debian-ports | 2.3 GB | 1.3 GB |
debian-security | 51 MB | 60 MB |
debian-volatile | 8.3 MB | 6.1 MB |
The code that downloads Packages files from snapshot.d.o, puts them into a git repository and turns that into an sqlite database with ranges of timestamps per package, version and architecture can be found here:
https://salsa.debian.org/metasnap-team/metasnap
The work is licensed under MIT/Expat.
Report bugs in the salsa issue tracker.
Johannes Schauer Marin Rodrigues <josch [at] debian [dot] org>