Showing posts with label Chrome. Show all posts

Sunday, 4 January 2009

Embedding values in C++ pointers

Short version:

Read, compile and run the following piece of C++ source code:

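#include <cstdint>
#include <iostream>

class Smi {
 public:
  // The integer *is* the "pointer": its bits are stored in the address.
  // std::intptr_t is used because on 64-bit platforms a pointer does not
  // fit in an int, and reinterpret_cast<int>(this) would not compile.
  static Smi* fromInt(int value) {
    return reinterpret_cast<Smi*>(static_cast<std::intptr_t>(value));
  }
  // Read the value back from the bits of "this"; nothing is dereferenced.
  int value() {
    return static_cast<int>(reinterpret_cast<std::intptr_t>(this));
  }
  void sayHello() {
    std::cout << "Hello, my value is "
              << this->value() << "."
              << std::endl;
  }
};

int main() {
  Smi* five = Smi::fromInt(5);
  Smi* seven = Smi::fromInt(7);
  // Formally undefined behaviour (there is no Smi object at these addresses),
  // but it works in practice: sayHello() is non-virtual and never
  // dereferences "this".
  five->sayHello();
  seven->sayHello();
  return 0;
}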


Notice that the only state class Smi has is the integer value "embedded" in the "this" pointer.

Lil' longer version:

C++ allows you to do pure magic. From time to time I see a piece of C++ code that makes me think: "Does this even compile?" A few days ago I discovered one of those "gems" in the source code of V8 (Google Chrome's JavaScript engine).

Let's begin with a quiz: what do you think the following code could be for?

reinterpret_cast<int>(this)


Now with some more context...

int Smi::value() {
return reinterpret_cast<int>(this) >> kSmiTagSize;
}


Ummmm..... :|
Well, let's unveil the mystery...

V8 models every entity representable in JavaScript (ECMAScript) with classes that all derive from class Object, as the comments in objects.h suggest:

//
// All object types in the V8 JavaScript are described in this file.
//
// Inheritance hierarchy:
// - Object
//   - Smi          (immediate small integer)
//   - Failure      (immediate for marking failed operation)
//   - HeapObject   (superclass for everything allocated in the heap)
//     - JSObject
//       - JSArray
//       - JSRegExp
//       - JSFunction
...


Every instance of these entities is allocated and managed by class Heap, V8's runtime memory manager. When Heap is asked to allocate an Object, it returns an Object*, but that pointer carries a hidden surprise, as the comments in objects.h reveal:


// Formats of Object*:
// Smi: [31 bit signed int] 0
// HeapObject: [32 bit direct pointer] (4 byte aligned) | 01
// Failure: [30 bit signed int] 11
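The three formats can be told apart just by masking the low bits of the word. Below is a minimal, portable sketch of that check, not V8's code: it assumes a std::uintptr_t stands in for Object*, takes the tag values from the comment above, and the names Tagged, Kind and KindOf are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for V8's Object*: the raw bits of a tagged word.
using Tagged = std::uintptr_t;

// Tag layout taken from the "Formats of Object*" comment:
//   Smi:        ...0   (1 tag bit)
//   HeapObject: ...01  (2 tag bits)
//   Failure:    ...11  (2 tag bits)
constexpr Tagged kSmiTagMask = 1;         // low bit only
constexpr Tagged kSmiTag = 0;
constexpr Tagged kHeapObjectTagMask = 3;  // low two bits
constexpr Tagged kHeapObjectTag = 1;
constexpr Tagged kFailureTag = 3;

enum class Kind { kSmi, kHeapObject, kFailure };

// Classify a word by inspecting only its tag bits; the rest is payload.
Kind KindOf(Tagged word) {
  if ((word & kSmiTagMask) == kSmiTag) return Kind::kSmi;
  if ((word & kHeapObjectTagMask) == kHeapObjectTag) return Kind::kHeapObject;
  // Once ...0 and ...01 are ruled out, only ...11 is left.
  assert((word & kHeapObjectTagMask) == kFailureTag);
  return Kind::kFailure;
}
```

Checking the Smi tag first is what makes a single tag bit enough for Smis: any word ending in 0 is an integer, and only words ending in 1 need the second bit to be inspected.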


These comments state three things (apart from the obvious one: what Heap returns as "pointers" to Object are not really pointers...):
  • The least significant bits of the "pointer" carry a "tag" indicating the kind of Object.

  • In the case of Smi* and Failure*, the remaining bits do not hold any kind of pointer but a numeric value (31 and 30 bits long, respectively). This is how a Smi* is created...

    Smi* Smi::FromInt(int value) {
    ASSERT(Smi::IsValid(value));
    return reinterpret_cast<Smi*>((value << kSmiTagSize) | kSmiTag);
    }

    ...and this is how to retrieve the value...

    int Smi::value() {
    return reinterpret_cast<int>(this) >> kSmiTagSize;
    }

    This way they avoid the "overhead" of storing both a pointer and a pointee, since both are one and the same: no heap allocation is needed for a small integer.

  • When the "pointer" "points" to a HeapObject instance, the 30 most significant bits carry an actual pointer to a HeapObject. Since heap objects are aligned to 4 bytes, the two least significant bits of their addresses are always zero, and that space is used for the tag. To illustrate this, here is the code that builds the tagged pointer from a true object address:

    HeapObject* HeapObject::FromAddress(Address address) {
    ASSERT_TAG_ALIGNED(address);
    return reinterpret_cast<HeapObject*>(address + kHeapObjectTag);
    }
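The Smi half of the trick can be reproduced in portable C++. This is a minimal sketch, not V8's code: it assumes std::uintptr_t as the word type (the V8 snippets above assume 32-bit pointers that fit in an int), and SmiFromInt/SmiValue are hypothetical free functions standing in for Smi::FromInt() and Smi::value(). The shift is done on an unsigned value so that encoding negative numbers is well defined.

```cpp
#include <cassert>
#include <cstdint>

using Tagged = std::uintptr_t;  // hypothetical stand-in for Object*

constexpr int kSmiTagSize = 1;
constexpr Tagged kSmiTag = 0;

// Smis keep 31 bits of payload, as stated in the objects.h comment.
constexpr std::intptr_t kSmiMin = -(std::intptr_t{1} << 30);
constexpr std::intptr_t kSmiMax = (std::intptr_t{1} << 30) - 1;

// Counterpart of Smi::FromInt: shift the value up and put the tag in bit 0.
Tagged SmiFromInt(int value) {
  assert(value >= kSmiMin && value <= kSmiMax);  // like Smi::IsValid
  // Shift as unsigned: left-shifting a negative signed value is undefined
  // behaviour before C++20.
  Tagged bits = static_cast<Tagged>(static_cast<std::intptr_t>(value));
  return (bits << kSmiTagSize) | kSmiTag;
}

// Counterpart of Smi::value: an arithmetic right shift drops the tag and
// restores the sign (implementation-defined before C++20, arithmetic on
// mainstream compilers and guaranteed since C++20).
int SmiValue(Tagged word) {
  return static_cast<int>(static_cast<std::intptr_t>(word) >> kSmiTagSize);
}
```

For example, SmiFromInt(5) yields the word 10 (binary 1010, tag bit 0), and SmiValue turns it back into 5; negative values survive the round trip thanks to the sign-preserving shift.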



The trick works as long as you don't try to dereference one of those Object*...
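To actually reach a HeapObject, the tag has to be subtracted before the address is used. The sketch below illustrates that idea on a real, suitably aligned C++ object; it is an assumption-laden illustration rather than V8's code, and the Box struct and the TagPointer/UntagPointer helpers are hypothetical. Converting a pointer to std::uintptr_t and back yields the original pointer, so untagging restores a valid address.

```cpp
#include <cassert>
#include <cstdint>

using Tagged = std::uintptr_t;  // hypothetical stand-in for Object*
constexpr Tagged kHeapObjectTag = 1;
constexpr Tagged kHeapObjectTagMask = 3;

// A hypothetical heap object; alignas(4) guarantees two zero low bits.
struct alignas(4) Box {
  int payload;
};

// Build the tagged word from a real object address.
Tagged TagPointer(Box* box) {
  auto address = reinterpret_cast<std::uintptr_t>(box);
  assert((address & kHeapObjectTagMask) == 0);
  return address + kHeapObjectTag;
}

// Strip the tag first: only the untagged value is a valid pointer.
Box* UntagPointer(Tagged word) {
  assert((word & kHeapObjectTagMask) == kHeapObjectTag);
  return reinterpret_cast<Box*>(word - kHeapObjectTag);
}
```

Dereferencing reinterpret_cast<Box*>(word) directly would read from a misaligned address one byte past the start of the object, which is exactly the mistake the trick forbids.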

Like some of the V8 native code generator stuff I blogged about some time ago, this "tagged" "pointer" trick is not new: it can also be found in StrongTalk and the Self VM (Smalltalk and Self virtual machines, respectively, that share creators with V8 :p).

Hope you enjoyed this curious trick!

Wednesday, 26 November 2008

Trace of Sun's JVM in V8



Short version:

I suspect a few files in V8's source code have Sun's JVM as their origin.


Lil' longer version:

Let me introduce our guests...

...V8 is the high-performance JavaScript engine Google developed for its web browser, Chrome. One of the reasons V8 seems to outperform other engines is that it compiles JavaScript to native machine code (currently just for x86 and ARM) instead of interpreting it. V8 is written in C++ and is distributed under the BSD license.

...Sun's Java Virtual Machine, codenamed HotSpot, which was released some time ago as part of OpenJDK, provides, among other things, a Just-in-Time (JIT) Java bytecode compiler that "translates" bytecode fragments into machine code on the fly. HotSpot is written in C++ and is distributed under the GPL license.

A couple of days ago I was tinkering with V8's code, specifically with the x86 machine code generation stuff, and I came across a surprising copyright note in the file assembler-ia32.h:

// Copyright (c) 1994-2006 Sun Microsystems Inc.
// All Rights Reserved.

Yes, it is Sun's copyright note! And it can also be found in other files: assembler-ia32.cc, assembler.cc, assembler.h, assembler-arm.h, etc.

After that copyright note comes the BSD license, followed by this comment:

// The original source code covered by the above license above has been
// modified significantly by Google Inc.
// Copyright 2006-2008 the V8 project authors. All rights reserved.

I immediately googled around to find out more, and I found someone who had already noticed it, although he didn't go any further than the discovery itself.

Intrigued, I searched for the piece of Sun software Google took the original files from, and I arrived at OpenJDK: if you carefully compare V8's assembler-ia32.h with OpenJDK's assembler_x86_32.hpp, you'll notice how similar they are.

But I found it strange that V8's supposedly derived code, released under the BSD license, had originally been taken from Sun's JVM, which is distributed under the GPL, since Google would then be violating the license. Also, HotSpot lacks ARM support, while V8's ARM assembler files do carry Sun's copyright note.

After some history lessons I found the actual link between V8 and Sun: Lars Bak, V8's tech lead, is a former Sun employee who worked on StrongTalk, Self, HotSpot and...

...Monty VM, which powers CLDC (Sun's high-performance Java ME implementation) and supports the x86 and ARM architectures. It was open-sourced as part of phoneME under the GPL license.

BINGO!

Although I still have no answer to the licensing question (perhaps Sun licensed some parts of Monty VM to Google under BSD?), I think the origin of V8's Sun-copyrighted files is the Monty VM JIT compiler.

Maybe I am right, maybe not, who cares, but this "investigation" has given me interesting background on where the concepts implemented in V8 come from, and hopefully it will help me understand V8's design decisions as I dig deeper and deeper into its code :-)


UPDATE:
I found a more plausible origin for Sun's copyrighted stuff: the StrongTalk VM (open-sourced by Sun, now hosted at Google Code). I think this is the actual origin because:
  • it has the suspect file;
  • Lars Bak was part of its development team;
  • it is distributed under the BSD license.