Fully reproducible builds are important because they bridge the gap between auditable open source and convenient binary artifacts. Technologies like TUF and Binary Transparency provide accountability for what binaries are shipped to users, but that's of limited utility if there is no way (short of reverse engineering) of proving that the binary is in fact the result of compiling the intended source.
That's why the Debian project is putting tremendous effort into making packages reproducible. The good news is that Go builds are reproducible by default.
Prerequisites
There are a few common-sense requirements.
- Of course, the builds must be reproducible in the weaker sense: that means the source code must match perfectly.
- This includes dependencies, so the project has to vendor them strictly. This is important beyond binary reproducibility: you don't want "version 1.3" of a piece of software to mean different things depending on when it was built.
- The compiler version must be the same.
- GOPATH and GOROOT must match (#16860), annoyingly.
- Note: the default GOROOT, the one that the compiler will use if the environment variable is not set, must also match, since it will be copied into binaries (#17943). You can only change that by recompiling the toolchain in the right directory.
- With cgo, here be dragons (#15405, #19964, #9206): reproducible builds have been possible since Go 1.7, but the result depends on the C linker.
Interestingly, the build host architecture does not matter. In other words, builds stay reproducible when cross-compiling.
Reproducing rclone
I picked rclone for this exercise because it's a self-contained Go binary that vendors dependencies and offers binary installs.
Here are the binaries we will try to reproduce.
bfe0d7e041b4020001b6c48ff170e727243855cbb447f96d983e05b04c090ea8 rclone-v1.36-windows-386/rclone.exe
71827d554c5d860d302ec76d79dcd8433fe63065eac5df4d81b4d2bbefc760b3 rclone-v1.36-linux-amd64/rclone
61ab593c6a007e54c63e64ff2b6ee66dba77c40e12d8ca6b81cf50e8272f43b3 rclone-v1.36-openbsd-amd64/rclone
Detecting parameters
To start, we need to figure out the GOPATH and GOROOT values they were built with. This is easy to do by using debug/gosym and the debug information to query the file paths of known functions. (PE support is... left as an exercise to the reader.)
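The gosym.go script itself is not reproduced in this post, so what follows is only a minimal sketch of how such a lookup could be written with debug/elf and debug/gosym. It assumes an ELF binary with a `.gopclntab` section, and the choice of `main.main` and `runtime.GOROOT` as the "known functions" is an assumption (the latter lives in runtime/extern.go but may be absent from binaries that never call it).

```go
package main

import (
	"debug/elf"
	"debug/gosym"
	"fmt"
	"os"
)

// sourcePaths returns the source file recorded for each named function in the
// ELF binary at binPath, using the Go line table stored in .gopclntab.
func sourcePaths(binPath string, funcs ...string) ([]string, error) {
	exe, err := elf.Open(binPath)
	if err != nil {
		return nil, err
	}
	defer exe.Close()

	pclnSec := exe.Section(".gopclntab")
	textSec := exe.Section(".text")
	if pclnSec == nil || textSec == nil {
		return nil, fmt.Errorf("%s: missing .gopclntab or .text section", binPath)
	}
	pclnData, err := pclnSec.Data()
	if err != nil {
		return nil, err
	}

	// The old-style symbol table is optional: Go 1.2+ line tables are enough.
	var symData []byte
	if s := exe.Section(".gosymtab"); s != nil {
		symData, _ = s.Data()
	}

	table, err := gosym.NewTable(symData, gosym.NewLineTable(pclnData, textSec.Addr))
	if err != nil {
		return nil, err
	}

	var paths []string
	for _, name := range funcs {
		fn := table.LookupFunc(name)
		if fn == nil {
			return nil, fmt.Errorf("%s: function %s not found", binPath, name)
		}
		file, _, _ := table.PCToLine(fn.Entry)
		paths = append(paths, file)
	}
	return paths, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: gosym <elf-binary>")
		return
	}
	// main.main is compiled from GOPATH, runtime.GOROOT from GOROOT.
	paths, err := sourcePaths(os.Args[1], "main.main", "runtime.GOROOT")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, p := range paths {
		fmt.Println(p)
	}
}
```

A PE or Mach-O variant would need debug/pe or debug/macho in place of debug/elf, with a different way of locating the line table.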
$ go run gosym.go rclone-v1.36-linux-amd64/rclone
/home/ncw/go/src/github.com/ncw/rclone/rclone.go
/opt/go/go1.8/src/runtime/extern.go
So the GOPATH is /home/ncw/go and the GOROOT is /opt/go/go1.8.
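Going from those two file paths to the two roots is just a matter of cutting at the `/src/<package>/` boundary. A small sketch of that step, where the helper name `rootFromPath` is made up for illustration:

```go
package main

import (
	"fmt"
	"strings"
)

// rootFromPath strips the "/src/<pkg>/..." tail from a source file path,
// leaving the GOPATH or GOROOT the file was compiled under.
// It reports false if the path does not contain that package directory.
func rootFromPath(filePath, pkg string) (string, bool) {
	marker := "/src/" + pkg + "/"
	i := strings.LastIndex(filePath, marker)
	if i < 0 {
		return "", false
	}
	return filePath[:i], true
}

func main() {
	gopath, _ := rootFromPath("/home/ncw/go/src/github.com/ncw/rclone/rclone.go", "github.com/ncw/rclone")
	goroot, _ := rootFromPath("/opt/go/go1.8/src/runtime/extern.go", "runtime")
	fmt.Println("GOPATH:", gopath) // GOPATH: /home/ncw/go
	fmt.Println("GOROOT:", goroot) // GOROOT: /opt/go/go1.8
}
```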
For the compiler version I don't have a good solution (one that will work even if DWARF is stripped), so I'll give you a bad one that relies on the global variable backing runtime.Version().
$ egrep -a -o 'go[0-9\.]+' rclone-v1.36-linux-amd64/rclone
go.
go1.8
go1.8
Yes, it's literally strings.
You're also on your own for the compiler's default GOROOT, but strings will bring it up.
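The same brute-force version scan can be done from Go if egrep is not at hand. This is a sketch, not a robust detector: the pattern is slightly stricter than the egrep one (it requires at least one dotted component, so the stray `go.` match disappears), and it will still pick up any unrelated string that happens to look like a version.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
)

// goVersionRE matches strings such as "go1.8" or "go1.8.3".
var goVersionRE = regexp.MustCompile(`go[0-9]+(\.[0-9]+)+`)

// findGoVersions returns the distinct version-like strings found in data,
// in order of first appearance.
func findGoVersions(data []byte) []string {
	seen := make(map[string]bool)
	var out []string
	for _, m := range goVersionRE.FindAll(data, -1) {
		v := string(m)
		if !seen[v] {
			seen[v] = true
			out = append(out, v)
		}
	}
	return out
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: goversion <binary>")
		return
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, v := range findGoVersions(data) {
		fmt.Println(v)
	}
}
```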
Finally, you might have to look at the project docs to find out what flags they use. rclone uses -s, -X and CGO_ENABLED=0.
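The -X flag matters for reproduction because it rewrites a string variable at link time, so the value ends up in the binary's bytes. A toy illustration, using a hypothetical `main.version` variable rather than rclone's real one (github.com/ncw/rclone/fs.Version, as seen in the build command later in this post):

```go
package main

import "fmt"

// version is a hypothetical stand-in for a project's version variable.
// Building with
//   go build -ldflags "-X main.version=v1.36"
// replaces this default at link time without touching the source.
var version = "unset"

// banner returns the string a user would see in a version report.
func banner() string {
	return "example " + version
}

func main() {
	fmt.Println(banner())
}
```

Two builds of identical source with different -X values produce different binaries, which is why the exact value has to be recovered from the project's build scripts.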
Reproducing it
Since the host architecture does not matter but the environment does, we'll use Docker to do our build.
FROM debian:jessie
RUN apt-get update && apt-get install -y unzip wget tar ca-certificates git build-essential
RUN wget https://storage.googleapis.com/golang/go1.8.linux-amd64.tar.gz
RUN tar xvf go1.8.linux-amd64.tar.gz
RUN mkdir -p /opt/go && cp -r go /opt/go/go1.8
RUN cd /opt/go/go1.8/src && GOROOT_BOOTSTRAP=/go ./make.bash
ENV PATH "/opt/go/go1.8/bin:$PATH"
RUN mkdir -p /home/ncw/go/src/github.com/ncw/
RUN cd /home/ncw/go/src/github.com/ncw && git clone https://github.com/ncw/rclone
RUN cd /home/ncw/go/src/github.com/ncw/rclone && git checkout v1.36
ENV GOPATH /home/ncw/go
ENTRYPOINT ["go"]
$ docker run -it --rm -v $(pwd):$(pwd) -w $(pwd) -e CGO_ENABLED=0 4f6d1bc86d5e \
build --ldflags "-s -X github.com/ncw/rclone/fs.Version=v1.36" \
-o rclone-v1.36-linux-amd64/rclone.ours github.com/ncw/rclone
To cross-compile, I just added the GOOS and GOARCH environment variables with docker run -e.
Debugging
Reproducing someone else's build is not always easy. And indeed, my rclone build mismatched.
The first thing to look at is the Build ID. The Build ID is a hash of the filenames of the compiled files, plus the version of the compiler (and other things in zversion.go, like the default GOROOT). See pkg.go.
You can read it with readelf -x .note.go.buildid or by extracting it from the text section.
If the build ID does not match, the first thing to compare is the set of source paths of all symbols, again with gosym. Here's a slight patch to the gosym.go script we used above:
for _, fu := range table.Funcs {
	path, _, _ := table.PCToLine(fu.Entry)
	fmt.Println(path)
}
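With the path lists of both binaries in hand, what you want is the set difference. A small sketch of that comparison; the helper name `onlyIn` and the `/usr/local/go` path in the example are hypothetical:

```go
package main

import (
	"fmt"
	"sort"
)

// onlyIn returns the distinct paths that appear in a but not in b, sorted.
func onlyIn(a, b []string) []string {
	inB := make(map[string]bool, len(b))
	for _, p := range b {
		inB[p] = true
	}
	seen := make(map[string]bool)
	var out []string
	for _, p := range a {
		if !inB[p] && !seen[p] {
			seen[p] = true
			out = append(out, p)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	theirs := []string{
		"/opt/go/go1.8/src/runtime/extern.go",
		"/home/ncw/go/src/github.com/ncw/rclone/rclone.go",
	}
	ours := []string{
		"/usr/local/go/src/runtime/extern.go", // hypothetical wrong GOROOT
		"/home/ncw/go/src/github.com/ncw/rclone/rclone.go",
	}
	fmt.Println("only in theirs:", onlyIn(theirs, ours))
	fmt.Println("only in ours:  ", onlyIn(ours, theirs))
}
```

A mismatch that affects only standard library paths points at GOROOT; one that affects only project paths points at GOPATH.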
If the build ID matches, then you're looking at compiler flags.
Failing all that, strings and vbindiff are your friends.
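Before opening a visual diff it can help to know where the two files first diverge, so you know which offset to jump to. A minimal sketch (the function name is made up for illustration):

```go
package main

import (
	"fmt"
	"os"
)

// firstDiff returns the offset of the first byte at which a and b differ.
// If one is a prefix of the other it returns the shorter length, and it
// returns -1 when the two are identical.
func firstDiff(a, b []byte) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		if a[i] != b[i] {
			return i
		}
	}
	if len(a) != len(b) {
		return n
	}
	return -1
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: firstdiff <file-a> <file-b>")
		return
	}
	a, errA := os.ReadFile(os.Args[1])
	b, errB := os.ReadFile(os.Args[2])
	if errA != nil || errB != nil {
		fmt.Fprintln(os.Stderr, "read error:", errA, errB)
		return
	}
	if off := firstDiff(a, b); off < 0 {
		fmt.Println("identical")
	} else {
		fmt.Printf("first difference at offset 0x%x\n", off)
	}
}
```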
What got me with rclone was not rebuilding the compiler in the new location to get the right default GOROOT. That is the make.bash step of the Dockerfile. If you enjoy debugging, here's the tootstorm on Mastodon.
Result
bfe0d7e041b4020001b6c48ff170e727243855cbb447f96d983e05b04c090ea8 rclone-v1.36-windows-386/rclone.exe
bfe0d7e041b4020001b6c48ff170e727243855cbb447f96d983e05b04c090ea8 rclone-v1.36-windows-386/rclone.ours
71827d554c5d860d302ec76d79dcd8433fe63065eac5df4d81b4d2bbefc760b3 rclone-v1.36-linux-amd64/rclone
71827d554c5d860d302ec76d79dcd8433fe63065eac5df4d81b4d2bbefc760b3 rclone-v1.36-linux-amd64/rclone.ours
61ab593c6a007e54c63e64ff2b6ee66dba77c40e12d8ca6b81cf50e8272f43b3 rclone-v1.36-openbsd-amd64/rclone
61ab593c6a007e54c63e64ff2b6ee66dba77c40e12d8ca6b81cf50e8272f43b3 rclone-v1.36-openbsd-amd64/rclone.ours
So good news, rclone is not backdoored!
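The final check is nothing more than comparing digests of the published file and the rebuilt one. For completeness, a sketch of doing that in Go, assuming the 64-character hex strings above are SHA-256 digests:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
)

// digest returns the hex-encoded SHA-256 of data.
func digest(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: samehash <published> <rebuilt>")
		return
	}
	theirs, errA := os.ReadFile(os.Args[1])
	ours, errB := os.ReadFile(os.Args[2])
	if errA != nil || errB != nil {
		fmt.Fprintln(os.Stderr, "read error:", errA, errB)
		return
	}
	a, b := digest(theirs), digest(ours)
	fmt.Println(a, os.Args[1])
	fmt.Println(b, os.Args[2])
	if a == b {
		fmt.Println("match")
	} else {
		fmt.Println("MISMATCH")
	}
}
```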
If you enjoy these exercises, you can follow me on Twitter or Mastodon.