TopQueue in Scala - Part 3
July 08, 2018 #Algorithms #Data Structures #Deep Dive #Generics #Interviews #Scala #TopQueueIn the last post we finished implementing the TopQueue
class and looked at how it was able to perform so much better than the
plain sorting solution from Part 1.
This time we're going to drill down into Scala and turn our special purpose integer collection into an elegant, general purpose tool. We'll implement and briefly discuss a number of Scala features along the way.
I'll include snippets of our changes as we go. When we're done, I'll link to the finished (and tested) project.
Traits
Recall that TopQueue
has only three public methods:
add(elem: Int): Unit
size: Int
result: Seq[Int]
Here are the methods we need to override to extend the
Builder trait. In fact, TopQueue
is such a good match that you'd almost think I planned it that way. (I
didn't.)
+=(elem: A): Builder.this.type
result: B
clear: Unit
The result
method maps directly onto our own interface. +=
is
essentially the same as add
with a return value. clear
is the only
missing element, but is easily added.
import collection.mutable.Builder
class TopQueue(val capacity: Int) extends Builder[Int, Seq[Int]] {
private[this] val queue = new PriorityQueue[Int]()(Ordering[Int].reverse)
override def +=(elem: Int): this.type = {
if (queue.size < capacity) {
queue.enqueue(elem)
} else if (elem > queue.head) {
val _ = queue.dequeue
queue.enqueue(elem)
}
this
}
def length: Int = queue.length
def size: Int = length
override def clear: Unit = queue.clear
override def result: Seq[Int] = queue.dequeueAll.reverse
}
- In Line 1, we import the
Builder
type. - In Line 3, we extend
Builder
and plug in the type we're building from (Int
), and the type we're building to (Seq[Int]
). - In Line 7, we rename
add
to+=
, and change the return type fromUnit
("void") tothis.type
- In Line 14, we explicitly return
this
. - In Line 19, we add
clear
and delegate it directly to thePriorityQueue
object, just likelength
.
After annotating each of our three methods with override
, we can
take full advantage of all the functionality derived from this trait.
scala> val nums = Seq.fill(20)(rng.nextInt)
nums: Seq[Int] = List(1472582494, -1205266267, 2110246338, -1899208084, -48495158, -1555380301, -930665556, -506692887, -1803449477, 335650907, 1588492614, -1883579611, 1106055165, -207075109, -874047448, 1074765327, -2058546192, -1073301285, 1441493014, -1317047274)
scala> val tq = new TopQueue(3)
tq: TopQueue = TopQueue@b1cc697
scala> tq += 1
res0: tq.type = TopQueue@b1cc697
scala> tq ++= Seq(2, 4, 3, 9)
res1: tq.type = TopQueue@b1cc697
scala> nums.foldLeft(tq)(_ += _)
res2: TopQueue = TopQueue@b1cc697
scala> tq.result
res3: Seq[Int] = Vector(2110246338, 1588492614, 1472582494)
Implicits
TopQueue
is fine if you want to find the highest grade levels in
your class, the authors who belong on the New York Times
Best Sellers list, or which World Cup athletes spent the most
time rolling on the ground.
But with just a little work, we can also use it to find the fastest times for the Tour de France, the lowest unemployment rates, or the politicians with the lowest approval rating.
Instead of telling PriorityQueue how we want to order our values, why
don't we let our users tell us what they want? Using implicit
values, we can turn our TopQueue into a MinQueue. First, here's how it
will work:
scala> implicit val minOrd = Ordering[Int].reverse
minOrd: scala.math.Ordering[Int] = scala.math.Ordering$$anon$4@44bbdd5c
scala> val tq = new TopQueue(3)
tq: TopQueue = TopQueue@536d0528
scala> nums.foldLeft(tq)(_ += _)
res4: TopQueue = TopQueue@536d0528
scala> tq.result
res5: Seq[Int] = Vector(-2130155797, -1828018352, -1822446556)
Much like in our TopQueue
class, we call reverse
on an
Ordering[Int]
instance. We then assign the "reversed" instance to a
variable named minOrd
using the implicit
keyword ... and
never mention it again.
And yet TopQueue has still become a MinQueue, correctly returning the three smallest values passed to it. What is this magic?!
Let's look at the updated implementation:
class TopQueue(val capacity: Int)(implicit val ord: Ordering[Int])
extends Builder[Int, Seq[Int]] {
private[this] val queue = new PriorityQueue[Int]()(ord.reverse)
override def +=(elem: Int): this.type = {
if (queue.size < capacity) {
queue.enqueue(elem)
} else if (ord.gt(elem, queue.head)) {
val _ = queue.dequeue
queue.enqueue(elem)
}
this
}
...
}
We've only made three changes here:
- In Line 1, we add an
implicit
parameter to theTopQueue
constructor, specifying the nameord
and the typeOrdering[Int]
. - In Line 4, we call
reverse
on the implicitly passedord
variable when creating ourPriorityQueue
. - In Line 9, we've changed the
>
operator to theord.gt()
method.
With these changes, all decision logic is derived from the
Ordering[Int]
object (that we never even passed... or did we?) If
you haven't seen implicit variables before, there are probably a few
questions you're asking about now.
First, how does Scala know to implicitly pass the minOrd
variable
to TopQueue
when we didn't even mention it?!
In order for a variable to be implicitly passed, several things have to be in alignment:
- The method must be declared to accept an implicit parameter using
the
implicit
keyword. - The variable to be passed must be declared with the
implicit
keyword. - And finally, the type of the implicit variable must match the type of the method's implicit parameter!
If the method parameter is not declared implicit, the method will expect the variable to be passed explicitly.
If there is no implicit variable with the correct type in scope when the method is called, the invocation will fail.
Okay, second question: isn't this implicit business just a little too clever and magical? Probably. But it's also very powerful, and allows the programmer to deterministically choose when to override default behavior --- hence our use of it in TopQueue.
Note that I'm not defending the language's use of implicits. I merely find myself in Rome, surrounded by Romans.
Generics
We've factored the ordering preference out of our class entirely. We can track smallest or largest values at the user's whim. There's nothing innately numeric about any of our code anymore, so why limit ourselves to integers?
In fact, if we look at the implementation of
PriorityQueue
, we see that it doesn't care what we store in it either.
It will happily store and sort any type of value, so long as there's an
Ordering[A]
implicit for it:
sealed class PriorityQueue[A](implicit val ord: Ordering[A])
And if PriorityQueue
can do it, so can we.
And because we've done such a good job of factoring out type-specific
logic, turning TopQueue
into a generic class is literally just
changing Int
to A
.
class TopQueue[A](val capacity: Int)(implicit val ord: Ordering[A])
extends Builder[A, Seq[A]] {
private[this] val queue = new PriorityQueue[A]()(ord.reverse)
override def +=(elem: A): this.type = {
if (queue.size < capacity) {
queue.enqueue(elem)
} else if (ord.gt(elem, queue.head)) {
val _ = queue.dequeue
queue.enqueue(elem)
}
this
}
def length: Int = queue.length
def size: Int = length
override def clear: Unit = queue.clear
override def result: Seq[A] = queue.dequeueAll.reverse
}
And now for a test spin:
scala> val iq = new TopQueue[Int](3)
scala> iq ++= Seq(1, 3, 5, 2, 4, 7, 8)
scala> iq.result
res0: Seq[Int] = Vector(8, 7, 5)
scala> val dq = new TopQueue[Double](3)
scala> dq ++= Seq(1.0, 2.4, 3.14159, 4.2, 3.0, 2.0, 5)
scala> dq.result
res1: Seq[Double] = Vector(5.0, 4.2, 3.14159)
scala> val sq = new TopQueue[String](3)
scala> sq ++= lorem.split(" ")
scala> sq.result
res2: Seq[String] = Vector(voluptate, veniam, velit)
Companion Objects
Now let's make our user's lives a little easier. Using a companion
object, we can streamline the creation of TopQueue
instances. Here's
how it'll work.
scala> val tq = TopQueue[Int](3)
tq: TopQueue[Int] = TopQueue@4cceb3a1
Well, that's got rid of the new
. Feels a little more like Python,
but no big deal. Let's move on.
scala> val nums = Seq.fill(20)(rng.nextInt(100))
nums: Seq[Int] = List(70, 34, 59, 48, 3, 20, 54, 0, 4, 66, 3, 59, 64, 66, 72, 8, 99, 13, 83, 72)
scala> TopQueue(3, nums).result
res0: Seq[Int] = Vector(99, 83, 72)
scala> TopQueue(3, lorem.split(" ")).result
res1: Seq[String] = Vector(voluptate, veniam, velit)
With one method, we've disposed of the new
, automated populating the
queue with our data, and got rid of that pesky type annotation
([Int]
, [Double]
, [String]
). This works because Scala is able
to infer the type of A
from the collection we pass (Seq[A]
).
But let's go one step further. The following code generates 1,000
192-bit BigInt
values, selects the five largest values, and displays
them one per line:
scala> val bigNums = TopQueue(5, 1000) {
| BigInt(192, rng)
}
scala> bigNums.result.foreach(println)
6271062381133668254405510778307065078929875644850405394465
6267447238665173934555606099974970878732800791325289036101
6261977580545774650519866804001706195383900351836286095420
6247944717903870481800176157059032425225672699263315794305
6241830593043113801159745624306070018623673052833017718145
We accomplish this by passing a code block to our object method as a parameter where it's evaluated 1,000 times, and each result is added to the queue.
Here's the code for our TopQueue
companion object.
object TopQueue {
def apply[A](size: Int)(implicit ord: Ordering[A]): TopQueue[A] =
new TopQueue[A](size)
def apply[A](size: Int, coll: Iterable[A])(implicit ord: Ordering[A]): TopQueue[A] = {
val tq = new TopQueue[A](size)
coll.foreach(tq += _)
tq
}
def apply[A](size: Int, count: Int)(block: => A)(implicit ord: Ordering[A]): TopQueue[A] = {
val tq = new TopQueue[A](size)
for (_ <- 1 to count) tq += block;
tq
}
}
- In Line 2, we add a simple wrapper around the constructor.
- In Line 5, we let the user pass in any
Iterable[A]
and iterate over it, adding each value to the queue. - In Line 11, we see the syntax for passing a block (
=> A
--- essentially "any block expression that evaluates to anA
value"). We evaluate the expressioncount
times, pushing the result each time.
These are short, simple methods, but they go a long way to making our
class more elegant and intuitive to use. Companion objects are
frequently used as convenience constructors for Scala types, making
TopQueue
more intuitive for Scala programmers to use.
Implicit Classes
Finally for some pure syntactic sugar. Implicit classes allow us to approximate what C# users know as extension methods.
Our target behavior is this:
scala> val nums = Seq.fill(1000)(rng.nextInt(1000))
nums: Seq[Int] = List(223, 213, 19, 968, 555, 714, 291, 638, 1, 476, 199, 957, 727, 206, 283...
scala> nums.top(3)
<console>:13: error: value top is not a member of Seq[Int]
nums.top(3)
^
scala> import topqueue.utils._
scala> nums.top(3)
res0: Seq[Int] = Vector(999, 996, 995)
Let's take a look at this magical utils
package:
package object utils {
implicit class IterableWithTop[A](val iter: Iterable[A]) {
def top(capacity: Int)(implicit ord: Ordering[A]): Seq[A] =
TopQueue(capacity, iter).result
}
}
Implicit classes work much the same way as implicit variables. Where implicit variables are matched by type and passed whenever applicable, implicit classes are also matched by type and invoked when necessary to provide the requested method.
Consider the following.
val lorem = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
val words = lorem.split("\\W+")
import topqueue.utils._
val asciiAscending = Ordering[String].reverse
val alphaAscending = Ordering.by[String, (String, String)] {
(word) => (word.toLowerCase, word)
}.reverse
val first5ASCII = words.top(5)(asciiAscending)
val first5Alpha = words.top(5)(alphaAscending)
You might remember above that I said implicit values had to be labeled
implicit
. This is only if we want Scala to pass them
implicitly. As you can see here, we're passing them explicitly, so
no implicit
keyword is required.
This might seem a lot like default parameters, like you find in a lot
of dynamic programming languages. But not quite. Recall that all
these Ordering
instances are being passed from the caller's
context, whether they are the "default" values or custom overrides.
Ordering
objects make good candidates for implicit values because
Scala provides default values automatically. Another common implicit
is the ExecutionContext
used in the
implementation of Scala's Future
and Promise
.
The Complete Works
With Scala as our canvas, we've followed a simple interview question from rough idea to finished work. We sketched several concepts and trashed the impractical ones. We finished a highly effective design and had it proofed. Once we had executed (and tested) our work, we fleshed it out to a complete, user-friendly tool.
As promised, I've posted the finished product for you to browse, study, or experiment with.
Thank you for joining me!