Back when crypto/tls
was slow and net/http
young, the general wisdom was to always put Go servers behind a reverse proxy like NGINX. That’s not necessary anymore!
At Cloudflare we recently experimented with exposing pure Go services to the hostile wide area network. With the Go 1.8 release, net/http
and crypto/tls
proved to be stable, performant and flexible.
However, the defaults are tuned for local services. In this articles we’ll see how to tune and harden a Go server for Internet exposure.
crypto/tls
You’re not running an insecure HTTP server on the Internet in 2016. So you need crypto/tls
. The good news is that it’s now really fast (as you’ve seen in a previous article on this blog), and its security track record so far is excellent.
The default settings resemble the Intermediate recommended configuration of the Mozilla guidelines. However, you should still set PreferServerCipherSuites
to ensure safer and faster cipher suites are preferred, and CurvePreferences
to avoid unoptimized curves: a client using CurveP384
would cause up to a second of CPU to be consumed on our machines.
&tls.Config{
// Causes servers to use Go's default ciphersuite preferences,
// which are tuned to avoid attacks. Does nothing on clients.
PreferServerCipherSuites: true,
// Only use curves which have assembly implementations
CurvePreferences: []tls.CurveID{
tls.CurveP256,
tls.X25519, // Go 1.8 only
},
}
If you can take the compatibility loss of the Modern configuration, you should then also set MinVersion
and CipherSuites
.
MinVersion: tls.VersionTLS12,
CipherSuites: []uint16{
tls.TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,
tls.TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,
tls.TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305, // Go 1.8 only
tls.TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305, // Go 1.8 only
tls.TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,
tls.TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,
// Best disabled, as they don't provide Forward Secrecy,
// but might be necessary for some clients
// tls.TLS_RSA_WITH_AES_256_GCM_SHA384,
// tls.TLS_RSA_WITH_AES_128_GCM_SHA256,
},
Be aware that the Go implementation of the CBC cipher suites (the ones we disabled in Modern mode above) is vulnerable to the Lucky13 attack, even if partial countermeasures were merged in 1.8.
Final caveat, all these recommendations apply only to the amd64 architecture, for which fast, constant time implementations of the crypto primitives (AES-GCM, ChaCha20-Poly1305, P256) are available. Other architectures are probably not fit for production use.
Since this server will be exposed to the Internet, it will need a publicly trusted certificate. You can get one easily and for free thanks to Let’s Encrypt and the golang.org/x/crypto/acme/autocert
package’s GetCertificate
function.
Don’t forget to redirect HTTP page loads to HTTPS, and consider HSTS if your clients are browsers.
srv := &http.Server{
ReadTimeout: 5 * time.Second,
WriteTimeout: 5 * time.Second,
Handler: http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
w.Header().Set("Connection", "close")
url := "https://" + req.Host + req.URL.String()
http.Redirect(w, req, url, http.StatusMovedPermanently)
}),
}
go func() { log.Fatal(srv.ListenAndServe()) }()
You can use the SSL Labs test to check that everything is configured correctly.
net/http
net/http
is a mature HTTP/1.1 and HTTP/2 stack. You probably know how (and have opinions about how) to use the Handler side of it, so that’s not what we’ll talk about. We will instead talk about the Server side and what goes on behind the scenes.
Timeouts
Timeouts are possibly the most dangerous edge case to overlook. Your service might get away with it on a controlled network, but it will not survive on the open Internet, especially (but not only) if maliciously attacked.
Applying timeouts is a matter of resource control. Even if goroutines are cheap, file descriptors are always limited. A connection that is stuck, not making progress or is maliciously stalling should not be allowed to consume them.
A server that ran out of file descriptors will fail to accept new connections with errors like
http: Accept error: accept tcp [::]:80: accept: too many open files; retrying in 1s
A zero/default http.Server
, like the one used by the package-level helpers http.ListenAndServe
and http.ListenAndServeTLS
, comes with no timeouts. You don’t want that.
There are three main timeouts exposed in http.Server
: ReadTimeout
, WriteTimeout
and IdleTimeout
. You set them by explicitly using a Server:
srv := &http.Server{
ReadTimeout: 5 * time.Second,
WriteTimeout: 10 * time.Second,
IdleTimeout: 120 * time.Second,
TLSConfig: tlsConfig,
Handler: serveMux,
}
log.Println(srv.ListenAndServeTLS("", ""))
ReadTimeout
covers the time from when the connection is accepted to when the request body is fully read (if you do read the body, otherwise to the end of the headers). It’s implemented in net/http
by calling SetReadDeadline
immediately after Accept.
The problem with a ReadTimeout
is that it doesn’t allow a server to give the client more time to stream the body of a request based on the path or the content. Go 1.8 introduces ReadHeaderTimeout
, which only covers up to the request headers. However, there’s still no clear way to do reads with timeouts from a Handler. Different designs are being discussed in issue #16100.
WriteTimeout
normally covers the time from the end of the request header read to the end of the response write (a.k.a. the lifetime of the ServeHTTP), by calling SetWriteDeadline
at the end of readRequest.
However, when the connection is over HTTPS, SetWriteDeadline
is called immediately after Accept so that it also covers the packets written as part of the TLS handshake. Annoyingly, this means that (in that case only) WriteTimeout
ends up including the header read and the first byte wait.
Similarly to ReadTimeout
, WriteTimeout
is absolute, with no way to manipulate it from a Handler (#16100).
Finally, Go 1.8 introduces IdleTimeout
which limits server-side the amount of time a Keep-Alive connection will be kept idle before being reused. Before Go 1.8, the ReadTimeout
would start ticking again immediately after a request completed, making it very hostile to Keep-Alive connections: the idle time would consume time the client should have been allowed to send the request, causing unexpected timeouts also for fast clients.
You should set Read
, Write
and Idle
timeouts when dealing with untrusted clients and/or networks, so that a client can’t hold up a connection by being slow to write or read.
For detailed background on HTTP/1.1 timeouts (up to Go 1.7) read my post on the Cloudflare blog.
HTTP/2
HTTP/2 is enabled automatically on any Go 1.6+ server if:
- the request is served over TLS/HTTPS
Server.TLSNextProto
isnil
(setting it to an empty map is how you disable HTTP/2)Server.TLSConfig
is set andListenAndServeTLS
is used orServe
is used andtls.Config.NextProtos
includes"h2"
(like[]string{"h2", "http/1.1"}
, sinceServe
is called too late to auto-modify the TLS Config)
HTTP/2 has a slightly different meaning since the same connection can be serving different requests at the same time, however, they are abstracted to the same set of Server timeouts in Go.
Sadly, ReadTimeout
breaks HTTP/2 connections in Go 1.7. Instead of being reset for each request it’s set once at the beginning of the connection and never reset, breaking all HTTP/2 connections after the ReadTimeout
duration. It’s fixed in 1.8.
Between this and the inclusion of idle time in ReadTimeout
, my recommendation is to upgrade to 1.8 as soon as possible.
TCP Keep-Alives
If you use ListenAndServe
(as opposed to passing a net.Listener
to Serve
, which offers zero protection by default) a TCP Keep-Alive period of three minutes will be set automatically. That will help with clients that disappear completely off the face of the earth leaving a connection open forever, but I’ve learned not to trust that, and to set timeouts anyway.
To begin with, three minutes might be too high, which you can solve by implementing your own tcpKeepAliveListener
.
More importantly, a Keep-Alive only makes sure that the client is still responding, but does not place an upper limit on how long the connection can be held. A single malicious client can just open as many connections as your server has file descriptors, hold them half-way through the headers, respond to the rare keep-alives, and effectively take down your service.
Finally, in my experience connections tend to leak anyway until timeouts are in place.
ServeMux
Package level functions like http.Handle[Func]
(and maybe your web framework) register handlers on the global http.DefaultServeMux
which is used if Server.Handler
is nil. You should avoid that.
Any package you import, directly or through other dependencies, has access to http.DefaultServeMux
and might register routes you don’t expect.
For example, if any package somewhere in the tree imports net/http/pprof
clients will be able to get CPU profiles for your application. You can still use net/http/pprof
by registering its handlers manually.
Instead, instantiate an http.ServeMux
yourself, register handlers on it, and set it as Server.Handler
. Or set whatever your web framework exposes as Server.Handler
.
Logging
net/http
does a number of things before yielding control to your handlers: Accept
s the connections, runs the TLS Handshake, …
If any of these go wrong a line is written directly to Server.ErrorLog
. Some of these, like timeouts and connection resets, are expected on the open Internet. It’s not clean, but you can intercept most of those and turn them into metrics by matching them with regexes from the Logger Writer, thanks to this guarantee:
Each logging operation makes a single call to the Writer’s Write method.
To abort from inside a Handler without logging a stack trace you can either panic(nil)
or in Go 1.8 panic(http.ErrAbortHandler)
.
Metrics
A metric you’ll want to monitor is the number of open file descriptors. Prometheus does that by using the proc
filesystem.
If you need to investigate a leak, you can use the Server.ConnState
hook to get more detailed metrics of what stage the connections are in. However, note that there is no way to keep a correct count of StateActive
connections without keeping state, so you’ll need to maintain a map[net.Conn]ConnState
.
Conclusion
The days of needing NGINX in front of all Go services are gone, but you still need to take a few precautions on the open Internet, and probably want to upgrade to the shiny, new Go 1.8.
Happy serving!