Owen Stephens London-based Polyglot Programmer

Scala Patience Diff

Consider diffing the following two files:

<div style="width: 49%; float: right;">


It's easy to see that the call to the "fib" function has been added instead of the "fact" function call. The "fact" function was removed, with the "fib" function being added a few lines below. Let's examine how a typical diff tool presents these changes.

<div style="width: 49%; float: right;">


It's clear for this example that the patience diff has produced a superior diff output; it represents something closer to how one would describe the changes verbally.

The reason that the patience diff algorithm (whose name comes from the fact that a patience sort is used to calculate the LCS) produces more readable diff outputs is because it only considers common lines that are unique within each file. What this means practically, is that lines that are blank, or contain a single brace are not considered as forming part of the common subsequence; the diff algorithm will not focus on trying to match braces (the GNU diff output shown above does this) but instead, focusses on unique lines (e.g. function definitions).

My Scala implementation can be found on Github, here. An example usage (which takes the two filenames to diff on stdin and prints the diff) is shown below. Patience diff is used by the Bazaar version control system, whose implementation I consulted for ideas whilst coding my Scala version.

import scala.io.Source._
import OwenDiff._
object Main {
    def main(args : Array[String]) : Unit = {
        val file1 = fromFile(args(0)).getLines.toList
        val file2 = fromFile(args(1)).getLines.toList

        val diffList = Diff.diff(file1, file2)


The main diff method is fairly simple; it recursively creates a list of pairs of matching line indices between the two files and coalesces them into contiguous blocks if possible, which are used to calculate the list of line insertion/deletion/changes required to go from file1->file2, as per diff.

A slight modification of my patience sort code, with the inclusion of a bisect method implementation gives the code that calculates the longest increasing subsequence (LIS) of its argument. The LIS code is used to determine the LCS of the line pairs.