21 Apr 2017Are you in the mood for a stroll inside Go’s type system ?
If you are already familiarized with it, this post can be funny
for you, or just plain stupid.
If you have no idea how types and interfaces are implemented on Go,
you may learn something, I sure did :-)
Since I worked with handwritten type systems in C,
like the one found in glib GObjects,
I’m always curious on how languages implement the
concept of type safety on a machine that actually only
has numbers. This curiosity has extended to how I could
bend Go’s type system to my own will.
Go’s instances do not carry type information on them,
so my only chance will involve using an interface{}. All type systems
that I’m aware off usually implement types as some sort of integer code,
which is used to check if the type corresponds to the one you are casting.
To begin the exploration I wanted to find how type assertions are
made, so I wrote this ridiculous code:
packagemainimport"fmt"funcmain(){varaintvarbinterface{}=ac:=b.(int)fmt.Println(c)}
Compiled it outputing it’s assembly:
go build -gcflags -S cast.go
Found that the assembly code corresponding to this:
Is (roughly) this:
0x002a 00042 (cast.go:7) LEAQ type.int(SB), AX
0x0031 00049 (cast.go:8) CMPQ AX, AX
0x0034 00052 (cast.go:8) JNE $0, 162
0x0036 00054 (cast.go:9) MOVQ $0, ""..autotmp_3+56(SP)
0x003f 00063 (cast.go:9) MOVQ $0, ""..autotmp_2+64(SP)
0x0048 00072 (cast.go:9) MOVQ $0, ""..autotmp_2+72(SP)
0x0051 00081 (cast.go:9) MOVQ AX, (SP)
0x0055 00085 (cast.go:9) LEAQ ""..autotmp_3+56(SP), AX
0x005a 00090 (cast.go:9) MOVQ AX, 8(SP)
0x005f 00095 (cast.go:9) PCDATA $0, $1
0x005f 00095 (cast.go:9) CALL runtime.convT2E(SB)
The runtime.convT2E call caught my attention, it was not hard to
find it on the iface.go
file on the golang source code.
It’s code (on the time of writing, Go 1.8.1):
// The conv and assert functions below do very similar things.// The convXXX functions are guaranteed by the compiler to succeed.// The assertXXX functions may fail (either panicking or returning false,// depending on whether they are 1-result or 2-result).// The convXXX functions succeed on a nil input, whereas the assertXXX// functions fail on a nil input.funcconvT2E(t*_type,elemunsafe.Pointer)(eeface){ifraceenabled{raceReadObjectPC(t,elem,getcallerpc(unsafe.Pointer(&t)),funcPC(convT2E))}ifmsanenabled{msanread(elem,t.size)}ifisDirectIface(t){// This case is implemented directly by the compiler.throw("direct convT2E")}x:=newobject(t)// TODO: We allocate a zeroed object only to overwrite it with// actual data. Figure out how to avoid zeroing. Also below in convT2I.typedmemmove(t,x,elem)e._type=te.data=xreturn}
There is the eface type, that has a _type field. Checking out what
would be a eface I found this:
typeifacestruct{tab*itabdataunsafe.Pointer}typeefacestruct{_type*_typedataunsafe.Pointer}
From what I read before onRuss Cox post about interfaces
I would guess that the iface is used when you are using
interfaces that have actually methods on it. That is why it has
an itab, the interface table, which is roughly equivalent to
a C++ vtable.
I will ignore the iface (althought it is interesting) since it does not
seem to be what I need to hack Go’s type system, there is more potential
on eface, which covers the special case of empty interfaces
(the equivalent of a void pointer in C).
On the post Russ Cox says that the empty interface is a special case that holds
only the type information + the data, there is no itable, since it makes
no sense at all (a interface{} has no methods).
The interface{} is just a way to transport runtime type information + data
on a generic way through your code and it seems to be the more promissing
way to hack types.
The type is:
type_typestruct{sizeuintptrptrdatauintptr// size of memory prefix holding all pointershashuint32tflagtflagalignuint8fieldalignuint8kinduint8alg*typeAlg// gcdata stores the GC type data for the garbage collector.// If the KindGCProg bit is set in kind, gcdata is a GC program.// Otherwise it is a ptrmask bitmap. See mbitmap.go for details.gcdata*bytestrnameOffptrToThistypeOff}
Lots of promissing fields to hack with, but actual type check is just
a direct pointer comparison:
ife._type!=t{panic(&TypeAssertionError{"",e._type.string(),t.string(),""})}
It seems easier to just find a way to get the eface struct and overwrite itstype pointer with the one I desire. This smells like a job to theunsafe package.
I still don’t have a good idea on how to get the _type, or how to manipulate
the eface type. My guess would be to just cast it as a pointer and do some
old school pointer manipulation, but I’m not sure yet.
One function that is a good candidate to give some directions
on how to do it is reflect.TypeOf:
funcTypeOf(iinterface{})Type{eface:=*(*emptyInterface)(unsafe.Pointer(&i))returntoType(eface.typ)}
Yeah, just cast the pointer to a eface pointer:
// emptyInterface is the header for an interface{} value.typeemptyInterfacestruct{typ*rtypewordunsafe.Pointer}
It seems that although the eface was private on the runtime
package it is copied here on the reflect package. Well, if the
reflect package can do it, so can I :-) (a little duplication is
better than a big dependency, right ?).
Before going on, I was curious about where the types are initialized.
It seems that there is just one unique pointer with all the type
information for each type. Thanks to vim-go
and go guru
for the invaluable help on analysing code and allowing me to
check all the referers to a type it has been
pretty easy to find this on runtime/symtab.go:
// moduledata records information about the layout of the executable// image. It is written by the linker. Any changes here must be// matched changes to the code in cmd/internal/ld/symtab.go:symtab.// moduledata is stored in read-only memory; none of the pointers here// are visible to the garbage collector.typemoduledatastruct{pclntable[]byteftab[]functabfiletab[]uint32findfunctabuintptrminpc,maxpcuintptrtext,etextuintptrnoptrdata,enoptrdatauintptrdata,edatauintptrbss,ebssuintptrnoptrbss,enoptrbssuintptrend,gcdata,gcbssuintptrtypes,etypesuintptrtextsectmap[]textsecttypelinks[]int32// offsets from typesitablinks[]*itabptab[]ptabEntrypluginpathstringpkghashes[]modulehashmodulenamestringmodulehashes[]modulehashgcdatamask,gcbssmaskbitvectortypemapmap[typeOff]*_type// offset to *_rtype in previous modulenext*moduledata}
A good candidate is the typemap field, checking out how
it is used I found this on runtime/type.go:
// typelinksinit scans the types from extra modules and builds the// moduledata typemap used to de-duplicate type pointers.functypelinksinit(){iffirstmoduledata.next==nil{return}typehash:=make(map[uint32][]*_type,len(firstmoduledata.typelinks))modules:=activeModules()prev:=modules[0]for_,md:=rangemodules[1:]{// Collect types from the previous module into typehash.collect:for_,tl:=rangeprev.typelinks{vart*_typeifprev.typemap==nil{t=(*_type)(unsafe.Pointer(prev.types+uintptr(tl)))}else{t=prev.typemap[typeOff(tl)]}// Add to typehash if not seen before.tlist:=typehash[t.hash]for_,tcur:=rangetlist{iftcur==t{continuecollect}}typehash[t.hash]=append(tlist,t)}ifmd.typemap==nil{// If any of this module's typelinks match a type from a// prior module, prefer that prior type by adding the offset// to this module's typemap.tm:=make(map[typeOff]*_type,len(md.typelinks))pinnedTypemaps=append(pinnedTypemaps,tm)md.typemap=tmfor_,tl:=rangemd.typelinks{t:=(*_type)(unsafe.Pointer(md.types+uintptr(tl)))for_,candidate:=rangetypehash[t.hash]{iftypesEqual(t,candidate){t=candidatebreak}}md.typemap[typeOff(tl)]=t}}prev=md}}
It seems that the typemap is initialized on the startup of the
process, with help of information collected by the linker, on
build time.
The typelinksinit function is used on the schedinit function
(from runtime/proc.go):
// The bootstrap sequence is://// call osinit// call schedinit// make & queue new G// call runtime·mstart//// The new G calls runtime·main.funcschedinit(){// raceinit must be the first call to race detector.// In particular, it must be done before mallocinit below calls racemapshadow._g_:=getg()ifraceenabled{_g_.racectx,raceprocctx0=raceinit()}sched.maxmcount=10000tracebackinit()moduledataverify()stackinit()mallocinit()mcommoninit(_g_.m)alginit()// maps must not be used before this callmodulesinit()// provides activeModulestypelinksinit()// uses maps, activeModulesitabsinit()// uses activeModulesmsigsave(_g_.m)initSigmask=_g_.m.sigmaskgoargs()goenvs()parsedebugvars()gcinit()sched.lastpoll=uint64(nanotime())procs:=ncpuifn,ok:=atoi32(gogetenv("GOMAXPROCS"));ok&&n>0{procs=n}ifprocs>_MaxGomaxprocs{procs=_MaxGomaxprocs}ifprocresize(procs)!=nil{throw("unknown runnable goroutine during bootstrap")}ifbuildVersion==""{// Condition should never trigger. This code just serves// to ensure runtime·buildVersion is kept in the resulting binary.buildVersion="unknown"}}
And schedinit, at least according to go guru, is not called anywhere.
The output of -gcflags -S also has no reference to this initialization.
Searching inside the runtime package:
(runtime)λ> grep -R schedinit .
./asm_amd64.s: CALL runtime·schedinit(SB)
./asm_mips64x.s: JAL runtime·schedinit(SB)
./asm_arm.s: BL runtime·schedinit(SB)
./proc.go:// call schedinit
./proc.go:func schedinit() {
./asm_s390x.s: BL runtime·schedinit(SB)
./traceback.go: // schedinit calls this function so that the variables are
./asm_ppc64x.s: BL runtime·schedinit(SB)
./asm_arm64.s: BL runtime·schedinit(SB)
./asm_386.s: CALL runtime·schedinit(SB)
./asm_mipsx.s: JAL runtime·schedinit(SB)
./asm_amd64p32.s: CALL runtime·schedinit(SB)
It seems like the bootstraping code for each supported platform is ASM code.
Lets take a look at the amd64 implementation:
CLD // convention is D is always left cleared
CALL runtime·check(SB)
MOVL 16(SP), AX // copy argc
MOVL AX, 0(SP)
MOVQ 24(SP), AX // copy argv
MOVQ AX, 8(SP)
CALL runtime·args(SB)
CALL runtime·osinit(SB)
CALL runtime·schedinit(SB)
// create a new goroutine to start program
MOVQ $runtime·mainPC(SB), AX // entry
PUSHQ AX
PUSHQ $0 // arg size
CALL runtime·newproc(SB)
POPQ AX
POPQ AX
The whole thing has more than 2000 lines, so I just copied the
part that confirms that schedinit is called before running
the actual code, and on schedinit the typelinksinit will be
called, that will initialize the types map.
Sorry, got pretty far from the objective, lets go back to the type system
hacking fun. Lets start the copying fun, just like the reflect package does,
to inspect details on different types:
packagemainimport("fmt""unsafe")// tflag values must be kept in sync with copies in:// cmd/compile/internal/gc/reflect.go// cmd/link/internal/ld/decodesym.go// runtime/type.gotypetflaguint8typetypeAlgstruct{// function for hashing objects of this type// (ptr to object, seed) -> hashhashfunc(unsafe.Pointer,uintptr)uintptr// function for comparing objects of this type// (ptr to object A, ptr to object B) -> ==?equalfunc(unsafe.Pointer,unsafe.Pointer)bool}typenameOffint32// offset to a nametypetypeOffint32// offset to an *rtypetypertypestruct{sizeuintptrptrdatauintptrhashuint32// hash of type; avoids computation in hash tablestflagtflag// extra type information flagsalignuint8// alignment of variable with this typefieldAlignuint8// alignment of struct field with this typekinduint8// enumeration for Calg*typeAlg// algorithm tablegcdata*byte// garbage collection datastrnameOff// string formptrToThistypeOff// type for pointer to this type, may be zero}typeefacestruct{typ*rtypewordunsafe.Pointer}func(eeface)String()string{returnfmt.Sprintf("type: %#v\n\ndataptr: %v",*e.typ,e.word)}funcgetEface(iinterface{})eface{return*(*eface)(unsafe.Pointer(&i))}funcmain(){varaintvarbintvarcstringvardfloat32varefloat64varfrtypevargefacefmt.Printf("a int:\n%s\n\n",getEface(a))fmt.Printf("b int:\n%s\n\n",getEface(b))fmt.Printf("c string:\n%s\n\n",getEface(c))fmt.Printf("d float32:\n%s\n\n",getEface(d))fmt.Printf("e float64:\n%s\n\n",getEface(e))fmt.Printf("f rtype:\n%s\n\n",getEface(f))fmt.Printf("g eface:\n%s\n\n",getEface(g))}
The output of running the code:
(typehack(git master))λ> go run inspectype.go
a int:
type: main.rtype{size:0x8, ptrdata:0x0, hash:0xf75371fa, tflag:0x7, align:0x8, fieldAlign:0x8, kind:0x82, alg:(*main.typeAlg)(0x4fb3d0), gcdata:(*uint8)(0x4b0eb8), str:843, ptrToThis:35392}
dataptr: 0xc42000a2f0
b int:
type: main.rtype{size:0x8, ptrdata:0x0, hash:0xf75371fa, tflag:0x7, align:0x8, fieldAlign:0x8, kind:0x82, alg:(*main.typeAlg)(0x4fb3d0), gcdata:(*uint8)(0x4b0eb8), str:843, ptrToThis:35392}
dataptr: 0xc42000a390
c string:
type: main.rtype{size:0x10, ptrdata:0x8, hash:0xe0ff5cb4, tflag:0x7, align:0x8, fieldAlign:0x8, kind:0x18, alg:(*main.typeAlg)(0x4fb3f0), gcdata:(*uint8)(0x4b0eb8), str:5274, ptrToThis:44480}
dataptr: 0xc42000a400
d float32:
type: main.rtype{size:0x4, ptrdata:0x0, hash:0xb0c23ed3, tflag:0x7, align:0x4, fieldAlign:0x4, kind:0x8d, alg:(*main.typeAlg)(0x4fb420), gcdata:(*uint8)(0x4b0eb8), str:6791, ptrToThis:34880}
dataptr: 0xc42000a478
e float64:
type: main.rtype{size:0x8, ptrdata:0x0, hash:0x2ea27ffb, tflag:0x7, align:0x8, fieldAlign:0x8, kind:0x8e, alg:(*main.typeAlg)(0x4fb430), gcdata:(*uint8)(0x4b0eb8), str:6802, ptrToThis:34944}
dataptr: 0xc42000a4e8
f rtype:
type: main.rtype{size:0x30, ptrdata:0x28, hash:0x622c3ba0, tflag:0x7, align:0x8, fieldAlign:0x8, kind:0x19, alg:(*main.typeAlg)(0x482ca0), gcdata:(*uint8)(0x4b0ec9), str:11620, ptrToThis:35904}
dataptr: 0xc420014270
g eface:
type: main.rtype{size:0x10, ptrdata:0x10, hash:0x4358c73f, tflag:0x7, align:0x8, fieldAlign:0x8, kind:0x19, alg:(*main.typeAlg)(0x4fb3e0), gcdata:(*uint8)(0x4b0eba), str:11606, ptrToThis:74272}
dataptr: 0xc42000a5c0
Since this is already getting pretty extensive I won’t dive in every single
detail of the outputs. But we can observe some interesting things.
The hack seems to have worked perfectly, since the size of all types
makes sense. Alignment information also makes sense too.
And also there is the kind information. Like I said on the start,
type systems usually just use a number to differentiate on the types.
But comparing two different structs shows an interesting characteristic
from Go. Although rtype and eface are two different types, and
casting between the two types won’t work, they are of the same kind 0x19.
On the reflect package there is the Kind
type, which is a enumeration of all Go’s base types. There is some information
about that here. Every named/unnamed type
you define in Go will always have an underlying type, which will be one
of the types on the kind enumeration (that shows up on our type struct).
So in Go you can’t create types in the same sense of the native types that
comes with the language, you can’t create new kinds, at least AFAIK.
This is considerably confusing, because kind is a synonim of type :-),
but it is one of the two hardest challenges on programming, giving names
to things (the other ones is implementing caches :-)).
But the types you create work well enough, the compiler will help you, and
reflection will also work properly. Even with the same kind, different
types will have different rtype pointers associated with them, even different
size in the struct case, but it is an interesting detail that I tought it was
worth mentioning.
Well, now we can go back to hacking the type system.
There is a lot of ways to manipulate this type information, but the
more naive way that I can think of is to define a function that gets
an interface{} variable representing the value that will be casted
and another interface{} variable that will carry the type information
from where you want to cast to. The return is a new interface{}
that can be casted to the desired target type. Something like this:
funcMorph(valueinterface{},desiredtypeinterface{})interface{}
Well, in this case the lack of generics on Go obligates me
to use an interface{} and push the cast to the client, or develop
a function for every basic type, but types defined by the client
would require the client writing its own functions.
Let’s just let the client do some heavy lifting on this
case, Jersey’s style
(not that “The right thing” also does not have it’s place).
The final implementation can be found on morfus,
the most small and stupid Go library ever :-).
I say this because the final hack on the type system is so simple
that it makes me want to cry, Go is indeed terribly simple, nothing
to feed my ego here :-(. The whole magic:
packagemorfosimport"unsafe"typeefacestruct{Typeunsafe.PointerWordunsafe.Pointer}funcgeteface(i*interface{})*eface{return(*eface)(unsafe.Pointer(i))}// Morph will coerce the given value to the type stored on desiredtype// without copying or changing any data on value. The result will// be a merge of the data stored on value with the type stored on// desiredtype, basically a frankstein :-).//// The result value should be castable to the type of desiredtype.funcMorph(valueinterface{},desiredtypeinterface{})interface{}{valueeface:=geteface(&value)typeeface:=geteface(&desiredtype)valueeface.Type=typeeface.Typereturnvalue}
My very first passing test:
packagemorfos_testimport("testing""github.com/katcipis/morfos")funcTestStructsSameSize(t*testing.T){typeoriginalstruct{xintyint}typenotoriginalstruct{zintwint}orig:=original{x:100,y:200}_,ok:=interface{}(orig).(notoriginal)ifok{t.Fatal("casting should be invalid")}morphed:=morfos.Morph(orig,notoriginal{})morphedNotOriginal,ok:=morphed.(notoriginal)if!ok{t.Fatal("casting should be valid now")}iforig.x!=morphedNotOriginal.z{t.Fatalf("expected x[%d] == z[%d]",orig.x,morphedNotOriginal.z)}iforig.y!=morphedNotOriginal.w{t.Fatalf("expected y[%d] == w[%d]",orig.y,morphedNotOriginal.w)}}
This test is “safe” because both structs have the same size,
C programmers must be feeling butterflies on their bellies :-).
Although the hack is small, there is a lot of fun we can have
with it, but before we go on there is one single line of unsafeness
that is usually unknow to Go newcomers:
funcgeteface(i*interface{})*eface{return(*eface)(unsafe.Pointer(i))}
My feeling the first time I saw this was:
![Go or C]()
With this kind of casting, my hack could be written as:
packagemainimport("fmt""unsafe")typeastruct{aint}typebstruct{bint}funcmain(){x:=a{a:100}y:=*(*b)(unsafe.Pointer(&x))fmt.Printf("x %v\n",x)fmt.Printf("y %v\n",y)}
And get this:
It works, as can be read here
the unsafe.Pointer has special properties that allows it to be
cast just like you do in C:
A Pointer can be converted to a pointer value of any type.
You may be thinking, what was the point of all this then ?
Well, the objective was to hack the type system, which is to
make the runtime casting facility behave as I want,
my interest was to break this:
Make the “safe” cast behave unsafely, based on sheer curiosity on
how this can actually be safe (take a look under the hood).
And this has been achieved.
So lets forget that there is a VERY simpler way to force casts in Go
and have some fun with my useless hack (we should at least have some
fun, right ?).
In Go strings are immutable, or are they ?
funcTestMutatingString(t*testing.T){typestringStructstruct{strunsafe.Pointerlenint}varrawstr[5]byterawstr[0]='h'rawstr[1]='e'rawstr[2]='l'rawstr[3]='l'rawstr[4]='o'hi:=stringStruct{str:unsafe.Pointer(&rawstr),len:len(rawstr),}somestr:=""morphed:=morfos.Morph(hi,somestr)mutableStr:=morphed.(string)ifmutableStr!="hello"{t.Fatalf("expected hello, got: %s",mutableStr)}rawstr[0]='h'rawstr[1]='a'rawstr[2]='c'rawstr[3]='k'rawstr[4]='d'ifmutableStr!="hackd"{t.Fatalf("expected hackd, got: %s",mutableStr)}}
To do this I exploited the fact that Go’s strings are just structs
with a pointer to the actual byte array and a len, the string does not
need to be null terminated, thanks to the len field.
As expected this test pass. Without reassigning the mutableStr
variable at any moment I was able to make it represent a different
string, by changing its internal byte array.
Besides being fun, this hack is another example on how using theunsafe package will trully make your program unsafe. Seeing this
on the code:
y:=*(*b)(unsafe.Pointer(&x))
Will make all kind of alarms bell on your head, but this:
Well, if ok is true it is safe to use val, or is it ? :-)
Thanks to my friends/reviewers: