linguistics, computers, and puns

TopQueue in Scala - Part 3

July 08, 2018 #Algorithms #Data Structures #Deep Dive #Generics #Interviews #Scala #TopQueue

In the last post we finished implementing the TopQueue class and looked at how it was able to perform so much better than the plain sorting solution from Part 1.

This time we're going to drill down into Scala and turn our special purpose integer collection into an elegant, general purpose tool. We'll implement and briefly discuss a number of Scala features along the way.

I'll include snippets of our changes as we go. When we're done, I'll link to the finished (and tested) project.

Traits

Recall that TopQueue has only three public methods:

Here are the methods we need to override to extend the Builder trait. In fact, TopQueue is such a good match that you'd almost think I planned it that way. (I didn't.)

The result method maps directly onto our own interface. += is essentially the same as add with a return value. clear is the only missing element, but is easily added.

import collection.mutable.Builder

class TopQueue(val capacity: Int) extends Builder[Int, Seq[Int]] {

  private[this] val queue = new PriorityQueue[Int]()(Ordering[Int].reverse)

  override def +=(elem: Int): this.type = {
    if (queue.size < capacity) {
      queue.enqueue(elem)
    } else if (elem > queue.head) {
      val _ = queue.dequeue
      queue.enqueue(elem)
    }
    this
  }

  def length: Int = queue.length
  def size: Int = length

  override def clear: Unit = queue.clear
  override def result: Seq[Int] = queue.dequeueAll.reverse
}

After annotating each of our three methods with override, we can take full advantage of all the functionality derived from this trait.

scala> val nums = Seq.fill(20)(rng.nextInt)
nums: Seq[Int] = List(1472582494, -1205266267, 2110246338, -1899208084, -48495158, -1555380301, -930665556, -506692887, -1803449477, 335650907, 1588492614, -1883579611, 1106055165, -207075109, -874047448, 1074765327, -2058546192, -1073301285, 1441493014, -1317047274)

scala> val tq = new TopQueue(3)
tq: TopQueue = TopQueue@b1cc697

scala> tq += 1
res0: tq.type = TopQueue@b1cc697

scala> tq ++= Seq(2, 4, 3, 9)
res1: tq.type = TopQueue@b1cc697

scala> nums.foldLeft(tq)(_ += _)
res2: TopQueue = TopQueue@b1cc697

scala> tq.result
res3: Seq[Int] = Vector(2110246338, 1588492614, 1472582494)

Implicits

TopQueue is fine if you want to find the highest grade levels in your class, the authors who belong on the New York Times Best Sellers list, or which World Cup athletes spent the most time rolling on the ground.

But with just a little work, we can also use it to find the fastest times for the Tour de France, the lowest unemployment rates, or the politicians with the lowest approval rating.

Instead of telling PriorityQueue how we want to order our values, why don't we let our users tell us what they want? Using implicit values, we can turn our TopQueue into a MinQueue. First, here's how it will work:

scala> implicit val minOrd = Ordering[Int].reverse
minOrd: scala.math.Ordering[Int] = scala.math.Ordering$$anon$4@44bbdd5c

scala> val tq = new TopQueue(3)
tq: TopQueue = TopQueue@536d0528

scala> nums.foldLeft(tq)(_ += _)
res4: TopQueue = TopQueue@536d0528

scala> tq.result
res5: Seq[Int] = Vector(-2130155797, -1828018352, -1822446556)

Much like in our TopQueue class, we call reverse on an Ordering[Int] instance. We then assign the "reversed" instance to a variable named minOrd using the implicit keyword ... and never mention it again.

And yet TopQueue has still become a MinQueue, correctly returning the three smallest values passed to it. What is this magic?!

Let's look at the updated implementation:

class TopQueue(val capacity: Int)(implicit val ord: Ordering[Int])
    extends Builder[Int, Seq[Int]] {

  private[this] val queue = new PriorityQueue[Int]()(ord.reverse)

  override def +=(elem: Int): this.type = {
    if (queue.size < capacity) {
      queue.enqueue(elem)
    } else if (ord.gt(elem, queue.head)) {
      val _ = queue.dequeue
      queue.enqueue(elem)
    }
    this
  }

  ...
}

We've only made three changes here:

With these changes, all decision logic is derived from the Ordering[Int] object (that we never even passed... or did we?) If you haven't seen implicit variables before, there are probably a few questions you're asking about now.

First, how does Scala know to implicitly pass the minOrd variable to TopQueue when we didn't even mention it?!

In order for a variable to be implicitly passed, several things have to be in alignment:

If the method parameter is not declared implicit, the method will expect the variable to be passed explicitly.

If there is no implicit variable with the correct type in scope when the method is called, the invocation will fail.

Okay, second question: isn't this implicit business just a little too clever and magical? Probably. But it's also very powerful, and allows the programmer to deterministically choose when to override default behavior --- hence our use of it in TopQueue.

Note that I'm not defending the language's use of implicits. I merely find myself in Rome, surrounded by Romans.

Generics

We've factored the ordering preference out of our class entirely. We can track smallest or largest values at the user's whim. There's nothing innately numeric about any of our code anymore, so why limit ourselves to integers?

In fact, if we look at the implementation of PriorityQueue, we see that it doesn't care what we store in it either. It will happily store and sort any type of value, so long as there's an Ordering[A] implicit for it:

sealed class PriorityQueue[A](implicit val ord: Ordering[A])

And if PriorityQueue can do it, so can we.

And because we've done such a good job of factoring out type-specific logic, turning TopQueue into a generic class is literally just changing Int to A.

class TopQueue[A](val capacity: Int)(implicit val ord: Ordering[A])
    extends Builder[A, Seq[A]] {

  private[this] val queue = new PriorityQueue[A]()(ord.reverse)

  override def +=(elem: A): this.type = {
    if (queue.size < capacity) {
      queue.enqueue(elem)
    } else if (ord.gt(elem, queue.head)) {
      val _ = queue.dequeue
      queue.enqueue(elem)
    }
    this
  }

  def length: Int = queue.length
  def size: Int = length

  override def clear: Unit = queue.clear
  override def result: Seq[A] = queue.dequeueAll.reverse
}

And now for a test spin:

scala> val iq = new TopQueue[Int](3)
scala> iq ++= Seq(1, 3, 5, 2, 4, 7, 8)
scala> iq.result
res0: Seq[Int] = Vector(8, 7, 5)

scala> val dq = new TopQueue[Double](3)
scala> dq ++= Seq(1.0, 2.4, 3.14159, 4.2, 3.0, 2.0, 5)
scala> dq.result
res1: Seq[Double] = Vector(5.0, 4.2, 3.14159)

scala> val sq = new TopQueue[String](3)
scala> sq ++= lorem.split(" ")
scala> sq.result
res2: Seq[String] = Vector(voluptate, veniam, velit)

Companion Objects

Now let's make our user's lives a little easier. Using a companion object, we can streamline the creation of TopQueue instances. Here's how it'll work.

scala> val tq = TopQueue[Int](3)
tq: TopQueue[Int] = TopQueue@4cceb3a1

Well, that's got rid of the new. Feels a little more like Python, but no big deal. Let's move on.

scala> val nums = Seq.fill(20)(rng.nextInt(100))
nums: Seq[Int] = List(70, 34, 59, 48, 3, 20, 54, 0, 4, 66, 3, 59, 64, 66, 72, 8, 99, 13, 83, 72)

scala> TopQueue(3, nums).result
res0: Seq[Int] = Vector(99, 83, 72)

scala> TopQueue(3, lorem.split(" ")).result
res1: Seq[String] = Vector(voluptate, veniam, velit)

With one method, we've disposed of the new, automated populating the queue with our data, and got rid of that pesky type annotation ([Int], [Double], [String]). This works because Scala is able to infer the type of A from the collection we pass (Seq[A]).

But let's go one step further. The following code generates 1,000 192-bit BigInt values, selects the five largest values, and displays them one per line:

scala> val bigNums = TopQueue(5, 1000) {
       | BigInt(192, rng)
       }
scala> bigNums.result.foreach(println)
6271062381133668254405510778307065078929875644850405394465
6267447238665173934555606099974970878732800791325289036101
6261977580545774650519866804001706195383900351836286095420
6247944717903870481800176157059032425225672699263315794305
6241830593043113801159745624306070018623673052833017718145

We accomplish this by passing a code block to our object method as a parameter where it's evaluated 1,000 times, and each result is added to the queue.

Here's the code for our TopQueue companion object.

object TopQueue {
  def apply[A](size: Int)(implicit ord: Ordering[A]): TopQueue[A] =
    new TopQueue[A](size)

  def apply[A](size: Int, coll: Iterable[A])(implicit ord: Ordering[A]): TopQueue[A] = {
    val tq = new TopQueue[A](size)
    coll.foreach(tq += _)
    tq
  }

  def apply[A](size: Int, count: Int)(block: => A)(implicit ord: Ordering[A]): TopQueue[A] = {
    val tq = new TopQueue[A](size)
    for (_ <- 1 to count) tq += block;
    tq
  }
}

These are short, simple methods, but they go a long way to making our class more elegant and intuitive to use. Companion objects are frequently used as convenience constructors for Scala types, making TopQueue more intuitive for Scala programmers to use.

Implicit Classes

Finally for some pure syntactic sugar. Implicit classes allow us to approximate what C# users know as extension methods.

Our target behavior is this:

scala> val nums = Seq.fill(1000)(rng.nextInt(1000))
nums: Seq[Int] = List(223, 213, 19, 968, 555, 714, 291, 638, 1, 476, 199, 957, 727, 206, 283...

scala> nums.top(3)
<console>:13: error: value top is not a member of Seq[Int]
       nums.top(3)
            ^
scala> import topqueue.utils._
scala> nums.top(3)
res0: Seq[Int] = Vector(999, 996, 995)

Let's take a look at this magical utils package:

package object utils {
  implicit class IterableWithTop[A](val iter: Iterable[A]) {
    def top(capacity: Int)(implicit ord: Ordering[A]): Seq[A] =
      TopQueue(capacity, iter).result
  }
}

Implicit classes work much the same way as implicit variables. Where implicit variables are matched by type and passed whenever applicable, implicit classes are also matched by type and invoked when necessary to provide the requested method.

Consider the following.

val lorem = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
val words = lorem.split("\\W+")

import topqueue.utils._

val asciiAscending = Ordering[String].reverse
val alphaAscending = Ordering.by[String, (String, String)] {
  (word) => (word.toLowerCase, word)
}.reverse

val first5ASCII = words.top(5)(asciiAscending)
val first5Alpha = words.top(5)(alphaAscending)

You might remember above that I said implicit values had to be labeled implicit. This is only if we want Scala to pass them implicitly. As you can see here, we're passing them explicitly, so no implicit keyword is required.

This might seem a lot like default parameters, like you find in a lot of dynamic programming languages. But not quite. Recall that all these Ordering instances are being passed from the caller's context, whether they are the "default" values or custom overrides.

Ordering objects make good candidates for implicit values because Scala provides default values automatically. Another common implicit is the ExecutionContext used in the implementation of Scala's Future and Promise.


The Complete Works

With Scala as our canvas, we've followed a simple interview question from rough idea to finished work. We sketched several concepts and trashed the impractical ones. We finished a highly effective design and had it proofed. Once we had executed (and tested) our work, we fleshed it out to a complete, user-friendly tool.

As promised, I've posted the finished product for you to browse, study, or experiment with.

Thank you for joining me!