Monday, 1 December 2008

Implementation inheritance is like playing Russian roulette

Short version:

Implementation inheritance breaks encapsulation and leads to the Fragile Base Class (FBC) problem.

Lil' longer version:

Here is the list of buzzwords I'm going to use throughout this post: implementation inheritance, method precondition, method postcondition, class invariant, encapsulation, Liskov substitution principle. Let me give some "loose definitions" of them:

  • implementation inheritance is when you use class inheritance as a mechanism for reusing behaviour. This concept is opposed to "interface inheritance", in which a class just inherits "the contract" it has to comply with.
  • method preconditions are the set of conditions that the caller of a method needs to ensure before actually calling it (e.g. argument 'x' is greater than 0, argument 'parent' is not null, etc.).
  • method postconditions are the conditions that hold as the result of a call to a method (either as its return value or as its effect on the state of the object it belongs to).
  • class invariant is the set of constraints that express what it means for an object of such a class to be in a consistent state (e.g. the 'width' attribute of an instance of class 'Window' must not be less than zero).
  • encapsulation (a.k.a. information hiding) is the principle by which a class "hides" anything (state/behaviour) that other classes do not really need to know about, thus reducing their dependence on that class to a bare minimum. This way, you reduce the probability of having to change the former when the latter is modified.
  • Liskov Substitution Principle (LSP) (closely related to Design by Contract) states that it should be possible to treat an instance of a subclass as if it were a base class object, meaning a subclass must comply with everything that could be expected from its parent.
Now let's ask some questions about these terms so we can get some ideas...

First question: How can the developer "communicate" the preconditions and postconditions of a method?

There are several ways to let the reader know the pre/postconditions of a method:
  • Using language facilities: type constraints (e.g. if a parameter must be between 0 and 65535 you can set it to be an unsigned short in C++), assertions, etc.
  • Using comments: method headers usually express in natural language what cannot be expressed using programming language syntax (e.g. /* this class is not thread safe */).
  • Not communicating them at all: either because you don't feel it is necessary or simply because you are not aware of them (in this case the pre/postconditions still exist; they are just implicit in the method body).
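To make the first two options concrete, here is a minimal C++ sketch built around the hypothetical Window class from the invariant definition (the class, its resize() method and the invariant() helper are illustrative assumptions, not code from any real library). The precondition, postcondition and invariant are each stated in a comment and checked with assert from <cassert>. Keep in mind that assert is compiled out when NDEBUG is defined, so these checks document the contract more than they enforce it in release builds.

```cpp
#include <cassert>

// Hypothetical Window class, used only to illustrate contracts.
// Class invariant: width_ >= 0.
class Window {
public:
    // Precondition: width >= 0.
    explicit Window(int width) : width_(width) {
        assert(width >= 0);          // precondition check
        assert(invariant());         // invariant holds after construction
    }
    virtual ~Window() = default;

    // Precondition:  newWidth >= 0.
    // Postcondition: width() == newWidth.
    virtual void resize(int newWidth) {
        assert(newWidth >= 0);       // precondition check
        width_ = newWidth;
        assert(width_ == newWidth);  // postcondition check
        assert(invariant());         // invariant still holds
    }

    int width() const { return width_; }

protected:
    // The invariant, written once so every method can re-check it.
    bool invariant() const { return width_ >= 0; }

private:
    int width_;
};
```

With Window w(10); w.resize(42); the object ends up with w.width() == 42, whereas w.resize(-1) would trip the precondition assertion in a debug build. A subclass overriding resize() would be expected to honour the same three comments.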

Second question: What does LSP actually mean when it says a subclass must comply with everything that could be expected from its parent?

To call a method of an object, you first prepare its arguments as it expects them (preconditions), then perform the call, and after that you expect some sort of result from the invocation (postconditions), either a returned value or a change in the object's state. If the same call were issued to an instance of a subclass, those preconditions and postconditions should still be perfectly valid.


Third question: So what is the relationship between the pre/postconditions and invariant of a class and its subclasses?
  • Preconditions of a subclass should not be stronger than its parent's.
  • Postconditions of a subclass should not be weaker than its parent's.
  • Invariant of a subclass should not be weaker than its parent's.
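The first rule can be illustrated with a small C++ sketch. Padder, SafePadder, EvenOnlyPadder and clientWorks() are hypothetical names made up for this example. The base class promises to accept any n >= 0 and to return a value >= n. SafePadder accepts at least as much and still keeps that promise, while EvenOnlyPadder strengthens the precondition by rejecting odd numbers, so client code written only against the parent's contract breaks when it is handed that subclass. The postcondition and invariant rules work the same way and are not shown here.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical base class. Contract:
//   precondition:  n >= 0
//   postcondition: the returned value is >= n
class Padder {
public:
    virtual ~Padder() = default;
    virtual int pad(int n) const {
        assert(n >= 0);  // the parent's precondition
        return n + 1;
    }
};

// Complies with LSP: it accepts every n the parent accepts (and more),
// and every value it returns is still >= n.
class SafePadder : public Padder {
public:
    int pad(int n) const override { return n < 0 ? 0 : n + 2; }
};

// Violates LSP: it strengthens the precondition by rejecting odd n,
// which the parent's contract allows.
class EvenOnlyPadder : public Padder {
public:
    int pad(int n) const override {
        if (n % 2 != 0) throw std::invalid_argument("odd n not supported");
        return n;
    }
};

// Client code that relies only on Padder's documented contract.
bool clientWorks(const Padder& p) {
    try {
        return p.pad(3) >= 3;  // 3 satisfies the parent's precondition
    } catch (const std::invalid_argument&) {
        return false;          // the subclass broke the caller's expectations
    }
}
```

clientWorks() returns true for Padder and SafePadder and false for EvenOnlyPadder, even though all three are "a Padder" as far as the type system is concerned.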

Fourth question: But that means a subclass is also responsible for maintaining its parent's pre/postconditions and invariant, doesn't it?

Yes. That's precisely the point. This is the very reason why implementation inheritance is said to break encapsulation: the subclass has to know details of the implementation of its base class.


Fifth question: I guess such responsibility can be troublesome upon code changes, am I right?

You are quite right. Every change to the superclass's pre/postconditions or invariant, whether explicit (due to changes in method signatures) or implicit (due to changes in the code), forces you to re-verify each subclass's pre/postconditions and invariant. To perform that verification properly, the coder needs to fully understand the intent of the code of both the base class and the derived one, which is unlikely to happen (even if the same person wrote both classes). This is why modifications to the base class often break subclasses, which is known as the "Fragile Base Class problem".
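Here is a minimal C++ sketch of how this can happen, in the spirit of the counting-collection example from Joshua Bloch's Effective Java (all class names here are hypothetical). Version 1 of the base class happens to implement addAll() by calling add(); a subclass that counts insertions relies on that implicit detail by overriding only add(). Version 2 of the base class changes only the body of addAll(), with no signature change at all, yet the subclass silently stops counting. The subclass is written as a template over its base purely so the same subclass code can be compiled against both versions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Version 1 of a hypothetical base class: addAll() happens to call add().
// Nothing in the interface states this; it is an implicit postcondition.
class BagV1 {
public:
    virtual ~BagV1() = default;
    virtual void add(int x) { items_.push_back(x); }
    virtual void addAll(const std::vector<int>& xs) {
        for (int x : xs) add(x);  // virtual call: reaches a subclass's add()
    }
    std::size_t size() const { return items_.size(); }
protected:
    std::vector<int> items_;
};

// Version 2: the base class author "optimizes" addAll() without touching
// any signature. Plain clients of the class still see the same behaviour...
class BagV2 {
public:
    virtual ~BagV2() = default;
    virtual void add(int x) { items_.push_back(x); }
    virtual void addAll(const std::vector<int>& xs) {
        items_.insert(items_.end(), xs.begin(), xs.end());  // no add() calls
    }
    std::size_t size() const { return items_.size(); }
protected:
    std::vector<int> items_;
};

// ...but this subclass counts insertions by overriding only add(),
// silently relying on version 1's implementation of addAll().
template <typename Base>
class CountingBag : public Base {
public:
    void add(int x) override {
        ++added_;
        Base::add(x);
    }
    std::size_t added() const { return added_; }
private:
    std::size_t added_ = 0;
};
```

After addAll({1, 2, 3}), CountingBag<BagV1> reports 3 insertions, while CountingBag<BagV2> reports 0 even though it holds 3 elements: the subclass compiles fine against the new base class and is simply wrong.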


Sixth question: How does this stuff apply to interface inheritance?

With interface inheritance you also have to comply with the pre/postconditions of the interface you implement; however, in this case there are no implicit conditions hidden in code (as there is no code at all): everything you have to comply with is expressed either with language constructs or as comments in method headers. Upon a change in method signatures, implementors also have to be changed (otherwise they won't compile).
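For contrast, a minimal C++ sketch of interface inheritance (Bag and VectorBag are hypothetical names). The abstract class contains only pure virtual functions, so its whole contract is visible in the signatures and header comments, and there is no inherited code whose implicit behaviour an implementor could come to depend on. The override specifier (available since C++11) is what turns a signature change in the interface into a compile error in the implementor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical interface: no code to inherit, so the whole contract
// lives in the signatures and in these comments.
class Bag {
public:
    virtual ~Bag() = default;
    // Postcondition: size() grows by exactly 1.
    virtual void add(int x) = 0;
    // Postcondition: size() grows by xs.size().
    // No promise is made about whether add() is called.
    virtual void addAll(const std::vector<int>& xs) = 0;
    virtual std::size_t size() const = 0;
};

// The implementor provides every method itself. If a signature in Bag
// changes, these 'override' declarations stop compiling instead of
// silently changing behaviour.
class VectorBag : public Bag {
public:
    void add(int x) override { items_.push_back(x); }
    void addAll(const std::vector<int>& xs) override {
        const std::size_t before = items_.size();
        items_.insert(items_.end(), xs.begin(), xs.end());
        assert(items_.size() == before + xs.size());  // Bag's postcondition
    }
    std::size_t size() const override { return items_.size(); }
private:
    std::vector<int> items_;
};
```

Code that only knows about Bag, e.g. Bag& b = v; b.add(1); b.addAll({2, 3});, sees b.size() == 3 and depends on nothing beyond the stated contract.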


Seventh question: Your points seem to rely on the correctness of LSP. Is it some kind of dogma, or what?

In Uncle Bob's words: "It is only when derived types are completely substitutable for their base types that functions which use those base types can be reused with impunity, and the derived types can be changed with impunity". Sure, you can violate LSP, but not if you want to achieve the code reuse that object-oriented programming promises.


Summing up...

Implementation inheritance breaks encapsulation because, to comply with LSP, subclasses have to know every precondition and postcondition (either explicit or implicit) of their parent, plus its invariant. This knowledge is likely to be flawed because of unknown (but existing) pre/postconditions, and those elements can also change during the lifecycle of the project (e.g. in the maintenance phase). This leads to the Fragile Base Class problem, in which modifications to the base class break subclasses.

Wednesday, 26 November 2008

Trace of Sun's JVM in V8



Short version:

I suspect a few files within V8 source code have Sun's JVM as their origin.


Lil' longer version:

Let me introduce our guests...

...V8 is the high-performance JavaScript engine Google developed for its web browser, Chrome. One of the reasons why V8 seems to outperform other engines is that it compiles JavaScript to native machine code (currently just for x86 and ARM) instead of interpreting it. V8 is written in C++ and is distributed under the BSD license.

...Sun's Java Virtual Machine, codenamed HotSpot, which was released some time ago as part of OpenJDK, provides, among other things, a Just-in-Time (JIT) Java bytecode compiler that "translates" bytecode fragments into machine code on the fly. HotSpot is written in C++ and is distributed under the GPL.

A couple of days ago I was tinkering with V8's code, specifically with the x86 machine code generation stuff, and I came across a surprising copyright note in the file assembler-ia32.h:

// Copyright (c) 1994-2006 Sun Microsystems Inc.
// All Rights Reserved.

Yes, it is Sun's copyright note! And it can be also found in other files: assembler-ia32.cc, assembler.cc, assembler.h, assembler-arm.h, etc.

After that copyright note, you can find the BSD license and, after it, this comment:

// The original source code covered by the above license above has been
// modified significantly by Google Inc.
// Copyright 2006-2008 the V8 project authors. All rights reserved.

I immediately googled around to find out more about this, and I found someone who had already noticed it, although he didn't go any further than the discovery itself.

Intrigued, I searched for the piece of Sun software that Google had taken the original files from, and I arrived at OpenJDK: if you carefully examine the files assembler-ia32.h from V8 and assembler_x86_32.hpp from OpenJDK, you'll notice how similar they are.

But I found it strange that V8's supposedly derived code, released under the BSD license, had initially been taken from Sun's JVM, which is distributed under the GPL, as that would mean Google was violating the license. Also, HotSpot lacks ARM support, while V8's ARM assembler files do carry Sun's copyright note.

After a bit of history research I found the actual link between V8 and Sun: Lars Bak, V8's tech lead, is a former Sun employee who worked on Strongtalk, Self, HotSpot and on...

...Monty VM, which powers Sun's high-performance CLDC (Java ME) implementation and supports the x86 and ARM architectures. It was open-sourced as part of phoneME under the GPL.

BINGO!

Although I still have no answer to the licensing question (perhaps Sun licensed some parts of Monty VM to Google under BSD?), I think the origin of V8's Sun-copyrighted files is the Monty VM JIT compiler.

Maybe I am right, maybe not, who cares; this "investigation" has given me interesting background on where the concepts implemented in V8 come from, and hopefully it will help me understand V8's design decisions as I dig deeper and deeper into its code :-)


UPDATE:
I found a more plausible origin for the Sun-copyrighted code: the Strongtalk VM (open-sourced by Sun, now hosted at Google Code). I think this is the actual origin because:
  • it contains the suspect file.
  • Lars Bak was part of its development team.
  • it is distributed under the BSD license.

Ready, steady, go!


Short version:

Welcome to my blog. I invite you to visit it to read about and discuss software design, good/bad programming practices, design patterns, programming languages, frameworks, software technology and the like.

Lil' longer version:

Welcome to my blog.

I am a young software engineer who enjoys creating software, gathering knowledge about it, sharing it with people and discussing it to enrich my own view of things.

I'd like this blog to be a place to share my ideas and opinions on topics like software design, good/bad programming practices, design patterns, programming languages, frameworks, software technology and so on, and to discuss them with you.

I'll be glad to read and answer any comments.

P.S.: please excuse my sometimes-crappy English :-p