Streams

Introduction

In this blog post, we will look at the concept of Streams using the language of Scala as the vehicle.

Motivation

In Scala, a List can be expressed as data type parameterized on type A and contains 2 data constructors: Nil and Cons

Screen Shot 2018-04-13 at 8.33.04 PM

Cons data constructor is a recursive structure which takes an element A as head and another List as tail. The terminal element in the List is a Nil data constructor.

Screen Shot 2018-04-13 at 8.36.54 PM

Strict and non-strict functions

A function is non-strict, if it chooses not to evaluate one or more of its arguments, whereas a strict function always evaluates its arguments.

In the above example, Cons data constructor is strict in evaluating its arguments head and tail. For eg., Consider this example:

Screen Shot 2018-04-13 at 8.52.12 PM.png

In the above example, map(_ + 25) produces an intermediate list which is then filtered with a condition of (_ % 7), producing another list. Each transformation produces an intermediate list which is used as an input to the next transformation and then discarded to produce the final list. If we do it in an imperative way, we can do the above logic in one pass without creating any intermediate lists.

Screen Shot 2018-04-13 at 9.03.24 PM.png

In order to achieve similar behavior as the above using functional programming, we will use “non-strict” (or) “laziness” property of functions.

Thunked…

As explained before, non-strictness (or laziness) is a property of a function where a function may choose not to evaluate one or more of its arguments.

For eg:

Thunk

The parameters a and b are non-strict and they need to be invoked in the function body so as to get evaluated. In lazinessExAGetsEvaluated function, a gets evaluated and b is not evaluated whereas in lazinessExBGetsEvaluated function, b gets evaluated and a is not evaluated. The syntax () => A is called thunk.

The above syntax can be further simplified by removing () as below:

Screen Shot 2018-04-14 at 11.33.56 AM.png

With this new knowledge of creating thunks to make functions parameters non-strict, we will solve the problem of creating a non-strict version of List which we call it Streams.

Streams.png

In the above definition of Stream, similar to List we have a recursive data-structure in Cons. The key difference in this version of Cons is, both the head and tail are non-strict /lazy/thunks. Since they are non-strict, they are yet to be evaluated which is the fundamental difference compared to Cons data constructor of List data type. Note, for Streams we should be having smart constructors, which will make use of lazy val syntax to make sure head and tail thunks for Cons data constructor are being evaluated only once and the result is memoized but we will not cover this part in detail in this blog. 

For eg., if I need to construct a stream with 3 elements:

Screen Shot 2018-04-14 at 9.39.06 PM.png

The most important difference in this case is, we are utilizing non-strictness property of the Cons data constructor. head and tail of the first Cons data constructor are thunks. So in essence, non-strictness or laziness allows us to separate the description of an expression from the evaluation of that expression. This gives us a great ability to describe a larger expression than we need and then evaluate only a portion of it.

Ok, so we have separated program description from evaluation. How do we evaluate the stream? Streams will provide a toList function to actually evaluate the thunks that is captured in the stream.

toList.png

How do we evaluate only a small portion from a larger expression? foldRight is a primitive combinator for Streams using which we can traverse the stream. This gives us the ability to build a lot of derived functions or combinators on top of foldRight.

foldRight.png

The key point in definition of foldRight is function f which is non-strict in its 2nd argument. This gives function f, a chance to figure out whether it needs to evaluate its 2nd argument or not on the need basis. This gives the ability to only evaluate a smaller portion from a larger expression.

For eg., if we need an exists function to figure out whether an element e is present in the stream or not, we will do:

exists.png

Screen Shot 2018-04-15 at 10.52.29 AM.png

If we look at the above example, function to figure out element 2, will stop the execution of the tail 3 and 4 once it figures out element 2 is present in the stream. This gives the powerful ability to only evaluate a smaller portion from a larger expression.

Coming to the original problem of using Stream to avoid intermediate lists, we need to come up with the definition of map and filter for Streams. Both the derived combinators map and filter can be implemented using the primitive combinator foldRight.

Screen Shot 2018-04-16 at 10.24.31 AM

In the above function definition for map and filter, we can see that both are returning Streams. In Streams, we know that both head and tail are thunks where we only describe the expression. In the case of map, we return a Stream.cons(f(h),t)​ and in the case of filter, we either return Stream.cons(h, t) if f(h) is true else we return the tail t which is again a Stream. 

We have all the building blocks to provide the solution by avoiding intermediate data structure creation using Streams instead of Lists

Screen Shot 2018-04-16 at 9.57.23 PM.png

In the above example, toList triggers the computation of map​ to the first element 1. Applying map to element 1​ makes it 26. Now filter is applied to 26 which is not divisible by 7​ and is discarded. toList​ triggers the computation of map to the second element 2 and is then filtered and hence discarded. toList triggers computation of map and filter for element 3​., since element 28 is divisible by 7, element 28​ becomes the first element of the resultant List. toList triggers computation of map and filter for element 4 and then element 5 and both of them are discarded. In this above example, we did not fully instantiate the intermediate stream during map and filter. Also, we took an element and applied both map and filter to see if it’s eligible to make into the resultant List before moving on to the next element. It behaved very similar to the imperative way of solving using loops as shown before.

Conclusion

Streams are very powerful structures which exploit non-strictness or laziness property of the functions using which we can improve the efficiency and modularity of functional programs.

References

Leave a comment