Wednesday, November 05, 2008

Countdown to the U.S.S.R.

Normally I don't blog on politics. I think they tend to draw away from technical content. But it's the day after the election, so I can't help myself. The countdown to the birth of the U.S.S.R. has begun. That's the United States Socialist Republic. For years now Bush has eroded our personal freedoms in the name of physical safety from terrorists. Now Obama will launch down the same path in pursuit of economic safety. In the end, America stands to lose all that has made it great, not from powers from without, but wasted away due to fears from within. Of course it's not too late. We stand to lose our greatest strengths, but they are not lost yet, and what is lost can be regained. We are still a democracy, by and large we are still free, and the voice of liberty will still be heard if we have the courage to raise it.

Sphere: Related Content

Monday, August 04, 2008

Languages Readability and Tool Support

I received a few comments on my blog about type inference and its affect on readability saying that the problem isn't really a problem if you have proper tool support.  You have API docs, IDE based assistance, and even interesting tools like the OCaml Browser.  The problem is that these don't really address the problem.  Programming requires a lot of concentration and is best done in a state of flow.  This means that anything that causes distraction or disruption is the enemy.  Flipping to another window in order to see some documentation requires a non-value added thought.  So does moving the cursor so that an IDE will display a popup with the inferred type.  Thoughts simply flow better if the code is readily readable, and code that requires a special tool to read is not readable.

There's also less benefits to having code that is readable without external assistance.  While code may spend most of its life being displayed in an IDE, it certainly doesn't spend all of its life there.  Books, articles, blogs, and other such media often contain code as well.  Despite the ubiquity of the internet, I think having at least one book in dead tree format is still essential for a programming languages to be successful (and in some cases even taken seriously), and the last time I checked dead trees don't have popup windows.  Most online postings don't have intelligent help, either, although I suppose it would be possible if someone really wanted to put in the effort.  Regardless, the readability of a language in these formats will have a major impact on how easy a language is to learn, and ultimately how well it is accepted.

The bottom line is that despite all the great and useful tools there are out there, it is still critical for a language to stand on its own without major tool support.

Sphere: Related Content

Sunday, August 03, 2008

The Costs of Generality

I've been pondering the results of the Cedric's Code Challenge, and wondering just how much benefit is derived from optimized, purpose-specific solutions as opposed to solutions that rely on a more general libraries or frameworks.  It's fairly common to see debates where one person (or group) insists that general constructs from a standard library or other such solutions represent an unacceptable overhead, and the other side claims that the overhead is meaningless compared to runtime optimizations performed by HotSpot and the cost of programmer time.  These debates can be rather painful to watch, as both sides generally have good points, yet often seem to be arguing right past one another.  Consequently, I think a little exploration of various potential optimizations and what their respective impacts on performance would be beneficial.

For purposes here, I'm going to say that a solution based on a general framework would be one that uses a general purpose library to generate permutations of digits, filters out the ones with a leading zero, converts the permutations to numbers, and then collects the desired statistics.  A purpose specific solution would be one such as Crazy Bob's that is tailor-made for generating numbers based on permuted digits.

The General Solution

I'm not aware of a combinatronics library for Scala, but it is simple enough to write a generic permutation generating function:

  def permute[E](s: Set[E], n: Int)(f: List[E] => Unit): Unit = {
    def p(s: Set[E], r: List[E]): Unit =
      if (r.length == n) f(r) else for(e <- s) p(s - e, e :: r)
    p(s, Nil)
  }

This recursively generates all of the possible permutations.  When it as generated a complete permutation, it passes it to the function specified by the caller.  If s is an ordered set, then the permutations will be generated in a predictable order. This can then be used to generate the permutations of digits for the code challenge, as follows:

  val digits = TreeSet(0L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L)
  def main(args: Array[String]): Unit = {
    val start = java.lang.System.nanoTime
    var last, cnt, jumpLow, jumpMagnitude = 0L
    for(d <- 1 to 10) permute(digits, d) { p =>
      val xs = p.reverse // digits are generated in the wrong order so must be reversed
      if (xs.head != 0L) {  // make sure this is a valid set of digits
        val cur = xs.foldLeft(0L)((z, a) => (z * 10L) + a)
        val dif = cur - last
        if (dif > jumpMagnitude) {
          jumpLow = last
          jumpMagnitude = dif
        }
        last = cur
        cnt = cnt + 1L
      }

    }
    val end = java.lang.System.nanoTime
    println("Count: " + cnt)
    println("Jump: " + jumpMagnitude + " (from " + jumpLow + " to " + (jumpLow + jumpMagnitude) + ")")
    println("Time: " + ((end - start) / 1000000L) + " ms")
  }

This solution takes about 13 seconds on my MacBook.

Generate Only Valid Permutations

The above permutation function can be tweaked as follows to generate only valid permutations (ones without the leading zero), and thereby saving about 10% execution time.

  def digitPermute(n: Int)(f: List[Long] => Unit): Unit = {
    def p(s: Set[Long], r: List[Long]): Unit =
      if (r.length == n) f(r) else for(e <- s) p(s - e, e :: r)
    for(first <- (digits - 0L)) p(digits - first, first :: Nil)
  }

The above solution executes in about 12 seconds.

Accumulating Numbers Instead of Lists

Both of the methods above construct lists of numbers which are later assembled into numbers.  This wastes memory and cycles, because only the resulting numbers are required and they can be accumulated much more efficiently.  Doing so, as shown below, reduces execution time to about 7 seconds.

  val digits = TreeSet(0L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L)
  def digitPermute(n: Int)(f: Long => Unit): Unit = {
    def p(s: Set[Long], d: Int, r: Long): Unit =
      if (d == n) f(r) else for(e <- s) p(s - e, d + 1, r * 10L + e)
    for(first <- (digits - 0L)) p(digits - first, 1, first)
  }

Long Set without Boxing

The above implementations all use TreeSet from Scala's standard library, which imposes a few performance penalties. For one, it is "generic." This means that it requires both type-erasure and boxing instead of using primitives.  Second, if you look carefully at the definition of TreeSet, you'll notice that it doesn't require its contents to be Ordered, but rather uses a (potentially implicit) view converting the contained type into an Ordered.  This adds an extra layers of indirection and therefore an extra cost.

  final class LongSet (val contents: Array[Long]) {
    private def indexOf(v: Long, min: Int, max: Int): Int = {
      if (min > max) -1
      else {
        val mid = (min + max) >>> 1
        val midVal = contents(mid)
        if (midVal < v) indexOf(v, mid + 1, max)
        else if (midVal > v) indexOf(v, min, mid - 1)
        else mid
      }
    }
    def foreach(f: Long => Unit) {
      var i = 0
      val max = contents.length
      while (i < max) {
        f(contents(i))
        i = i + 1
      }
    }
    def -(v: Long): LongSet = {
      val max = contents.length - 1
      if (indexOf(v, 0, max) < 0) this
      else {
        val a = new Array[Long](max)
        var i, j = 0
        while (i <= max) {
          val cur = contents(i)
          if (cur != v) {
            a(j) = contents(i)
            j = j + 1
          }
          i = i + 1
        }
        new LongSet(a)
      }
    }
  }
  val digits = new LongSet(Array(0L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L))
  def digitPermute(n: Int)(f: Long => Unit): Unit = {
    def p(s: LongSet, d: Int, r: Long): Unit =
      if (d == n) f(r) else for(e <- s) p(s - e, d + 1, r * 10L + e)
    for(first <- (digits - 0L)) p(digits - first, 1, first)
  }

This implementation brings the execution time down to ~1.7 seconds, representing a substantial savings over TreeSet. The comparison isn't quite fair, as TreeSet uses a Red-Black balanced tree and the code above uses a sorted array, but the difference is still substantial and shows that having a more targeted data structure can improve performance significantly.  At this point you might be thinking "Well no sh*t sherlock!  Of course a data structure tuned for a specific type is faster than one that is written to handle any type!"  That's true...to a point.  Not all languages implement generics using type erasure and require boxing of values within parameterized classes.  For example, C++ was designed to ensure that data structures implemented using templates imposed little or no overhead above more raw ones.

Special Purpose Set for Permutation Generation

Another approach is to use a more special-purpose data structure in the permutation function without reducing its generality.  The linked set used in Crazy Bob's solution can be generalized to generating permutations of any type.  Unfortunately, this structure is mutable, and mutates on every invocation.  This means that while it would be possible to pass it directly to client code, it would be extremely dangerous because the client code may maintain a reference to the rapidly changing data structure.  Consequently, the structure needs to be copied into a list or similar structure before being passed to client code.  The solution built around the code below completes in ~5 seconds, which is slower than using an structure explicitly coded for dealing with longs and generating longs, but over twice as fast as generating permutations using the standard TreeSet class.

  private final class Element[E](val value: E, var next: Element[E], var previous: Element[E]) {
    /** remove an element from the set*/
    def use() {
      if (previous ne null) previous.next = next
      if (next ne null) next.previous = previous
    }
    /** put an element back in the set */
    def yieldValue() {
      if (previous ne null) previous.next = this
      if (next ne null) next.previous = this
    }
  }
  private def buildElements[E](s: Set[E]): Element[E] = {
    val iter = s.elements
    val first = new Element(iter.next, null, null)
    var cur = first
    while(iter.hasNext) {
      cur.next = new Element(iter.next, null, cur)
      cur = cur.next
    }
    first
  }
  def permute[E](s: Set[E], n: Int)(f: List[E] => Unit): Unit = {
    def p(start: Element[E], head: Element[E], r: List[E]): Unit = {
      def take(current: Element[E]): Unit = {
        if (current ne null) {
          val newR = current.value :: r
          if (newR.length == n) {
            f(newR)
            take(current.next)
          } else {
            current.use()
            val newHead = if (current eq head) head.next else head
            p(newHead, newHead, newR)
            current.yieldValue()
            take(current.next)
          }
        }
      }
      take(start)
    }
    val first = buildElements(s)
    p(first, first, Nil)
  }

Conclusion

The various implementations here represent a sampling of various ways that Cedric's Code Challenge can be implemented in Scala, and the effects they have on performance.  A relatively direct port of Crazy Bob's solution to Scala completes in ~0.4 seconds, making it by far the fastest solution and about 30 times faster than the solution using standard data structures with a generic permutation generator.  That's not really surprising, so what can we conclude?  The most obvious conclusion is that avoiding the construction of intermediate objects yields a substantial speedup.  This can be seen in two places.  The first is in the switch from constructing a List to represent the permutation to accumulating the Long directly.  The second is in using a special-purpose mutable data structure to generate the permutations, thereby avoiding repeated allocations of Set objects.  Finally, reducing overhead due to things like boxing and the casts associated with type erasure does make a noticeable difference in performance.  On the flip side, Scala's closure based constructs, such as nested functions and for loops, added negligible overhead, if any at all. Using more general constructs instead of more specific ones clearly has a substantial performance cost, but it's also worth mentioning that the cost is trivial compared to the benefit received in the transition from a brute-force solution to an optimal algorithm.

Sphere: Related Content

Tuesday, July 15, 2008

System.identityHashCode

A recent thread on the the Scala IRC channel piqued my curiosity about exactly how System.identityHashcode works.  It looks like as less heated remarks have flushed out the conversation more the definition was irrelevant, but regardless, I think it's interesting.  Thankfully Sun open-sourced the their implementation of the Java platform so it was pretty easy to find out exactly how it works.

It turns out the Sun JVM contains three different algorithms for calculating the identity hash, none of which are guaranteed to yield unique hash codes.  That's not surprising, because on a 64 bit architecture, doing that would be almost impossible.  I say almost because one could just generate them as sequential numbers, and then crash the JVM when it runs out of 32 bit hash codes (an OutOfHashCodesError ;-).

From share/vm/runtime/synchronizer.cpp:

static inline intptr_t get_next_hash(Thread * Self, oop obj) {
  intptr_t value = 0 ; 
  if (hashCode == 0) { 
     // This form uses an unguarded global Park-Miller RNG, 
     // so it's possible for two threads to race and generate the same RNG.
     // On MP system we'll have lots of RW access to a global, so the
     // mechanism induces lots of coherency traffic.  
     value = os::random() ; 
  } else
  if (hashCode == 1) { 
     // This variation has the property of being stable (idempotent)
     // between STW operations.  This can be useful in some of the 1-0
     // synchronization schemes.  
     intptr_t addrBits = intptr_t(obj) >> 3 ; 
     value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ; 
  } else 
  if (hashCode == 2) { 
     value = 1 ;            // for sensitivity testing
  } else { 
     // Marsaglia's xor-shift scheme with thread-specific state
     // This is probably the best overall implementation -- we'll
     // likely make this the default in future releases.
     unsigned t = Self->_hashStateX ; 
     t ^= (t << 11) ; 
     Self->_hashStateX = Self->_hashStateY ; 
     Self->_hashStateY = Self->_hashStateZ ; 
     Self->_hashStateZ = Self->_hashStateW ; 
     unsigned v = Self->_hashStateW ; 
     v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ; 
     Self->_hashStateW = v ; 
     value = v ; 
  } 

  value &= markOopDesc::hash_mask;
  if (value == 0) value = 0xBAD ; 
  assert (value != markOopDesc::no_hash, "invariant") ; 
  TEVENT (hashCode: GENERATE) ;
  return value;
}

What's interesting is that there's an algorithm in there (hashCode == 2) where the identity hash code turns out to always be the same all objects.  I'm pretty sure it's there solely for testing purposes, so they can run tests against the JVM and standard library and ensure no piece of code is relying on the uniqueness of the identity hash code.

This relates back to my previous point about transparency.  Because Sun's JVM is open source, it's possible for anyone to peek inside and see how the various bits work, and what assumptions the Sun JVM engineers are and are not making about those pieces.

Sphere: Related Content

Trust in Authority vs Trust in Transparency

This morning Murph posted a blog on the "censorship" of Wikipedia by over-zealous article owners, citing a posting by Lawrence Solomon about an experience with editing an article related to global warming.  Murph uses this to support one of his common conspiracy theories that Wikipedia, and social media in general, is doomed because of this kind of censorship and deliberate distortion of the facts.

What's interesting is that this censorship took place in the open.  Anyone who knows where to look can see exactly what changes were made by Solomon, and their disposition relative to the current official article or any other version of it.  Here's Solomon's version versus the current version.

The core issue here isn't censorship.  Editors will always censor.  Their job is to act as filters.  Furthermore, every editor is subject to biases, whether they be his own or those imposed on him by a third party such as his employer.  With most publications the editorial process happens behind closed doors with unseen forces.  With Wikipedia the process happens right before the eyes of the world.

The real issue here is that Murph trusts authority more than transparency.  Someone corrected an article, and the correction was subsequently removed due to political bias.  I'm sure that's happened time and time again in every encyclopedia that has ever been published.  The difference is in this case the change was transparent, with the politics open for all to judge, where with a traditional model we would have never known.

If history has taught us anything, it's that placing too much faith in authority is a bad idea.  Our authority figures are all human, and our authoritative organizations are still organizations of people.  They are both fallible and corruptable.  We can't entirely strip authority from our society because that would lead to anarchy, but we can make authority more transparent, and with transparency we can judge for ourselves.

In this particular case the editorial process of Wikipedia probably failed to yield the most accurate article possible, at least as the article stands today.  I'm not an expert on the subject matter, but I think Solomon's corrections were most likely correct.  That being said, I think the process has succeeded.  The changes Solomon made have not be entirely censored, they have merely been driven from the main page.  Discourse continues with regards to their validity.  The biases of the editors have been made public.  The last thing we want to do is reseal that process behind closed doors, simply because in this case we didn't like some of the results.

Sphere: Related Content