Deferred Execution and yield operator – Part 2 of 2

So in Part 1 we discussed what Deferred Execution is and how you could potentially make your application much slower by needlessly .ToList-ing all your LINQ results . If you don’t understand deferred execution, please read this article first.

Deferred Execution is sometimes referred to as “lazy evaluation”. I think “lazy” is a great term, since it only evaluates the query when it has to.

Yield is tied up with IEnumerable and IEnumerable uses deferred execution. Let me explain…

If IEnumerable only executes when we need the results, but the results we’re looking for is in a large result set, we may not see much of a performance enhancement using deferred execution. Why? Because the large result set would need to be calculated and stored in memory, then the LINQ query would filter out only relevant items from the result set. If we needed 10 specific items from a result set that has 500 million items, we’d end up having to load 500 million items into memory, only to retrieve 10.

Enter Yield operator… Don’t worry if you don’t get it yet, the code should help makes things more clear.

Objective: We need to create a method that returns an IEnumerable with ALL the even numbers from 0 – 500 000 000.

If you’re wired like most of us are wired, you’d probably do something along these lines:


public static IEnumerable<int> GetEvenNumbers()
{
  var evens = new List();
  for (int i = 0; i <= 500000000; i++)
  {
    if (i % 2 == 0)
    {
      evens.Add(i);
    }
  }
  return evens;
}

In the above snippet, we:

  1. create a List of integers
  2. loop through all numbers from 0 – 500 000 000
  3. and if the number is even, add it to the list
  4. eventually return the list of even numbers

Seems simple enough…

Let’s see how it performs:


static void Main(string[] args)
{
  Stopwatch sw = new Stopwatch();
  sw.Start();

  var evenNums = GetEvenNumbers();
  foreach (var number in evenNums)
  {
    if (number > 10)
      break;

    Console.WriteLine(number);
  }
  sw.Stop();
  Console.WriteLine("Build List Method: " + sw.ElapsedMilliseconds + " ms");
  Console.Read();
}

public static IEnumerable<int> GetEvenNumbers()
{
  var evens = new List();
  for (int i = 0; i <= 500000000; i++)
  {
    if (i % 2 == 0)
    {
      evens.Add(i);
    }
  }
  return evens;
}

And here’s the output: 3401 ms.

screenshot3

Why did it take so long to get a few results? The answer lies in line 6

var evenNums = GetEvenNumbers();

To loop through evenNums we must populate evenNums. In this case we populated evenNums with 250 Million even numbers, and eventually only used a few. In another case we might call the same method and use a few million results. So what can we do to make this faster?

Let’s consider changing the GetEvenNumbers method to use yield return:


public static IEnumerable<int> GetEvenNumbers()
{
  for (int i = 0; i <= 500000000; i++)
  {
    if (i % 2 == 0)
    {
      yield return i;
    }
  }
}

Notice that we did not build a collection as in the previous example. We still looped through 500 million items, but as soon as we find an even number, instead of adding it to a collection in memory as before, we yield return it.

If you’re anything like me, this was confusing the first time you see this code, as it looks like it’s returning an int even though the return type is IEnumerable.

Now things get interesting. When you call this method (with a yield return) and you are iterating through the items and we step through with the debugger, we see something very different happening. It will step into the method, start looping until it hits yield return (in the above example, that will be the very first iteration). Instead of carrying on with the iteration, it actually exits. It will then use that value in the foreach loop and when it’s done and needs another value, it will jump back to the method and loop until the next yield return is hit.

Let’s put some code to it. I would encourage you to copy this code and step through it yourself:


static void Main(string[] args)
{
  Stopwatch sw = new Stopwatch();
  sw.Start();

  var evenNums = GetEvenNumbers();
  foreach (var number in evenNums)
  {
    if (number < 10)
      break;

    Console.WriteLine(number);
  }
  sw.Stop();
  Console.WriteLine("Yield Return Method: " + sw.ElapsedMilliseconds + " ms");
  Console.Read();
}

public static IEnumerable<int> GetEvenNumbers()
{
  for (int i = 0; i <= 500000000; i++)
  {
    if (i % 2 == 0)
    {
      yield return i;
    }
  }
}

And here’s the amazing result: 1 ms (3401 times faster than the first example… WOW!)

screenshot4

If you stepped through the code, you would have noticed that:

  • the code does NOT go into the GetEvenNumbers method at line 6 like it did in the first test.
  • This time, it goes to line 7 and now that it starts to iterate, it realises it needs one number.
  • Only now do we see it go into GetEvenNumbers.
  • Once it hits yield return, it will jump back to the foreach loop in line 7 and uses the number.
  • When it’s done using it and we’re at line 7 again, requesting the next number in the iteration, it actually jumps back into the GetEvenNumbers method and loops until it hits yield return
  • When it hits yield return again, it jumps back to line 7 and so on and so forth…

Since we have a break; statement at line 10, the second example will not search all 500 million items for even numbers, but will stop once the number 10 is reached. That is why we see an amazing performance boost.

This is just a silly example to show how yield return works. Once again, please note that I’m not saying always use yield return. Sometimes it’s much more efficient to load everything into memory for caching purposes (as in example 1).

It’s important to know how yield return works so that you can anticipate it’s behaviour when you use it. As mentioned before, Yield is tied up with IEnumerable and IEnumerable uses deferred execution. If you call a method that has a yield return, but you never iterate the results, it will never even enter this method (because of deferred execution).

I hope you’ve enjoyed reading this and if you did not know about yield return and stepped through the code, I trust you were just as surprised as me the first time I experienced this “unconventional” behaviour.

Advertisements

One thought on “Deferred Execution and yield operator – Part 2 of 2

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s