Deferred Execution and ToList-ing everything? – Part 1 of 2

I’m sure we all know someone who will religiously say “Do it LINQ, it’s faster!” Or maybe you are the person who’s saying it. Is it really faster than the traditional way of iterating results? The answer (as it often tends to be) is: well,  it depends.

In my opinion the one thing that makes LINQ so powerful and has made it the bread and butter of almost every .NET developer out there, is that it makes use of Deferred Execution. Deferred what?!? Deferred Execution simply means that the execution of a LINQ query is delayed until you need to get some results. In other words, you can write your epic LINQ query, but it will not execute until you either:

  • Iterate the results (e.g. foreach loop)
  • Cast it to a List (.ToList())
  • Get a single result from the query (e.g. .Count() or .First())

Many years ago, when LINQ started showing it’s face and we could use generics with IEnumerable and List, it was an exciting time for many developers, myself included. A more seasoned developer than me at that time, told me to always .ToList all my LINQ results. His reason was: “You can do more with a List than IEnumerable!“. At that time, young and naive (now I’m just naive), I replied with “Sounds great!” Fortunately since then I’ve painstakingly found that blindly .ToList-ing everthing had cost my code much in performance.

So, is ToList-ing really so bad? Well, it could be. Let’s see why :


namespace DeferredExecution
{
  class Program
  {
    static void Main(string[] args)
    {
      Stopwatch sw = new Stopwatch();
      sw.Start();

      var lotsOfNums = Enumerable.Range(0, 100000000);

      // Get all the even numbers
      var a = lotsOfNums.Where(num => num % 2 == 0);

      // Multiply each even number by 100.
      var b = a.Select(num => num * 100);

      // Get the top 10
      var c = b.Take(10);

      // a, b and c have not executed yet. Execution happens here for the first time.
      foreach (var num in c)
      {
        Console.WriteLine(num);
      }

      sw.Stop();
      Console.WriteLine("Elapsed milliseconds: " + sw.ElapsedMilliseconds);
      Console.Read();
    }
  }
}

The above code snippet does some work. Each result (a, b and c) are of type IEnumerable and do NOT execute until the foreach in line 22.

The code below does exactly the same, except this time we’re converting each LINQ result (a, b and c) ToList().


namespace DeferredExecution
{
  class Program
  {
    static void Main(string[] args)
    {
      Stopwatch sw = new Stopwatch();
      sw.Start();

      var lotsOfNums = Enumerable.Range(0, 100000000).ToList();

      // Get all the even numbers
      var a = lotsOfNums.Where(num => num % 2 == 0).ToList();

      // Multiply each even number by 100.
      var b = a.Select(num => num * 100).ToList();

      // Get the top 10
      var c = b.Take(10);

      // a, b and c have executed on each step.
      foreach (var num in c)
      {
        Console.WriteLine(num);
      }

      sw.Stop();
      Console.WriteLine("Elapsed milliseconds: " + sw.ElapsedMilliseconds);
      Console.Read();
    }
  }
}

And the result of the above 2 snippets:

screenshot1 screenshot2

  • The first test took 8 ms to complete.
  • The second took 3097 ms. (That is 387 times slower than the 1st test. Ouch!!)

Why was the first test fast as lightning compared to the second? The simple reason is that it captured the intent of a, b and c and in the end, we only wanted the first 10 results. Deferred execution has meant that when we actually delayed execution until the foreach loop in line 22. By this time we only needed to process 10 results and so only 10 were processed.

The second test, we converted the results a, b and c into a List (.ToList). This meant that we forced execution and had to go through all 100 million results, find the even numbers and store them in memory (step a). Then we took all the even numbers in memory and did a calculation of all 50 million even numbers, storing it in memory once again (step b). Finally (step c), we took only the first 10 from memory.

This example is a bit silly and we wouldn’t usually break up a, b and c into 3 different queries. But none the less I hope it does show that we need to be aware of deferred execution and forced execution.

Lastly, please do not end this article believing ToList is the root of all evil. On the contrary, there are many reasons why you would want to explicitly cast your results into a List. For examply you may want a list to manipulate the collection by adding, removing and inserting items. Other times it’s a great way to force execution immediately and store a result set in memory which can be great for caching purposes.

Charlie Calvert has covered deferred execution in more detail in his article LINQ and Deferred Execution.

That’s it for now. In Part 2, I will cover yield return. What is it anyways? How does it work? And importantly should I care about it?

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s