What's A Closure?
A closure is a function that is bound to the environment in which it is declared. Thus, the function can reference elements from the environment within its body. In the case of a C# 2.0 anonymous method, the environment to which it is bound is its parenting method body. This means that local variables from the parenting method body can be referenced within the anonymous method's body. So, this code prints 0 to the console as expected:
delegate void Action();
static void Main(string[] args)
{
int x = 0;
Action a = delegate { Console.WriteLine(x); };
a();
}
Most developers don't have any problem with the code above. A local variable "x" is declared and initialized to 0. Then, a new delegate "a" of type Action is declared and assigned to an anonymous method that writes "x" to the console. Finally, "a" is called and the value of "x" (0) is printed to the console. The rub occurs when the code is changed like this:
delegate void Action();
static void Main(string[] args)
{
int x = 0;
Action a = delegate { Console.WriteLine(x); };
x = 1;
a();
}
Now, "x" is reassigned to a value of 1 before "a" is called. What will be output to the console?
(NOTE: This has actually been the cause of misunderstanding and controversy. There has even been confusion by some at Microsoft itself about this issue. You can read about it here and here. Make sure that you read the comments on these blog posts to get the full flavor of the debate. If you're looking for a blog post by a .NET legendary figure that ends the debate, look no further.)
It turns out that the answer is 1, not 0. The reason for this is that the anonymous method is a closure and is bound to its parenting method body and the local variables in it. The important distinction is that it is bound to variables, not to values. In other words, the value of "x" is not copied in when "a" is declared. Instead, a reference to "x" is used so that "a" will always use the most recent value of "x". In fact, this reference to "x" will be persisted even if "x" goes out of scope. Consider this code:
delegate void Action();
static Action GetAction()
{
int x = 0;
Action a = delegate { Console.WriteLine(x); };
x = 1;
return a;
}
static void Main(string[] args)
{
Action a = GetAction();
a();
}
That will still print 1 to the console even though "x" is out of scope by the time that "a" is called. So, how is this achieved? Well, the good news is that this is handled through compiler magic. There isn't any runtime support for closures. That means that you could use the same techniques to create a closure without using an anonymous method. In fact, it is so simple that you could even do it with C# 1.0.
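Because the delegate is bound to the variable rather than to a copy of its value, the binding works in both directions: the closure sees assignments made by the enclosing code, and the enclosing code sees assignments made inside the closure. Here is a minimal sketch, assuming C# 9 or later (top-level statements) and using lambda syntax with the built-in Func<int> instead of the C# 2.0 delegate syntax above; the names "count", "increment" and "read" are illustrative only:

```csharp
using System;

int count = 0;

// Both lambdas capture the same variable 'count', not a snapshot of its value.
Func<int> increment = () => ++count;  // writes through to the local
Func<int> read = () => count;         // reads the current value

increment();
increment();
count += 10;                          // the enclosing code updates the same variable

int seenByLambda = read();
Console.WriteLine(seenByLambda);      // 12
Console.WriteLine(count);             // 12
```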
How's It Work?
I hinted at how closures work earlier when I said "a reference to 'x' is used so that 'a' will always use the most recent value of 'x'". The key here is that a reference is used and not a value. The variable "x" is promoted from the stack to the heap in some way. And that promotion is made in such a way that the scope of "x" can be increased from its local scope. Oh, and this is done without boxing "x" to a reference type.
To make all of this possible, the C# compiler generates a special helper class. "x" becomes a field of that class and the anonymous method assigned to "a" becomes an instance method of that class. In code, it looks something like this:
delegate void Action();
sealed class ActionClosure
{
public int x;
public void AnonMethod()
{
Console.WriteLine(x);
}
}
static Action GetAction()
{
ActionClosure closure = new ActionClosure();
closure.x = 0;
Action a = new Action(closure.AnonMethod);
closure.x = 1;
return a;
}
static void Main(string[] args)
{
Action a = GetAction();
a();
}
The "GetAction" method is really where the magic happens:
- At the beginning of the method, an instance of the "ActionClosure" class is created. (Note that I chose to use the names "ActionClosure" and "AnonMethod" for clarity. In reality, the compiler generates names to prevent name collision.)
- All references to the local variable "x" in the "GetAction" method have been replaced with references to the "x" field on the "ActionClosure" instance.
- The delegate "a" is now assigned to a new delegate instance for "AnonMethod" on the "ActionClosure" instance.
The cool thing to me is that this code will compile under C# 1.0 and work the same as the code using an anonymous method. It's all compiler magic and completely transparent to a C# 2.0 programmer.
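One consequence of the generated helper class is that a new "ActionClosure" instance is created every time the enclosing method runs, so delegates returned from two separate calls do not share state. The sketch below illustrates this with a hypothetical "MakeCounter" local function, assuming C# 9 or later for top-level statements:

```csharp
using System;

// Each call runs the method body again, so each call allocates its own
// closure object holding its own 'count'.
static Func<int> MakeCounter()
{
    int count = 0;
    return () => ++count;
}

Func<int> first = MakeCounter();
Func<int> second = MakeCounter();

first();
first();
int firstValue = first();    // third call on 'first'
int secondValue = second();  // first call on 'second'

Console.WriteLine($"{firstValue} {secondValue}"); // 3 1
```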
Why Do I Care?
The biggest sell for closures is that they can dramatically improve code and make it more robust. Many developers have voiced a mistrust of C# 2.0 anonymous methods because of their potential for abuse. After all, it's not too hard to imagine anonymous methods turning quickly into spaghetti code. But in my experience, many developers have dismissed them simply because they don't understand the power behind them: closures.
Let's look at an example:
static List<string> g_Names = new List<string>(
new string[] {
"Bill Gates",
"Dustin Campbell",
"Dustin's Trophy Wife",
"Foster the Dog"
});
static void Print(List<string> items)
{
foreach (string item in items)
Console.WriteLine(item);
}
static void Main(string[] args)
{
Print(g_Names);
}
This application simply creates a list of names and then outputs them to the console. It works perfectly well.
Now, let's assume that we need to add the ability to retrieve a list of names that start with some particular text. This is pretty easy to implement because List<T> has a handy method called FindAll that simply takes a Predicate<T> delegate and produces a new List<T> containing all of the items for which the predicate returns true. We can add this new function like so:
static string g_StartingText;
static bool NameStartsWith(string name)
{
return name.StartsWith(g_StartingText);
}
static List<string> GetNamesStartingWith(string startingText)
{
g_StartingText = startingText;
return g_Names.FindAll(NameStartsWith);
}
Everything is working fine until our client calls and says that there is a new requirement for this function: it must be thread-safe. In other words, the function must produce valid results even if it is called by multiple threads. This is problematic because, while one thread is finding all names starting with "D", another thread could change "g_StartingText" to something else and bad results would be returned.
It might be tempting to protect "g_StartingText" with a lock. This would certainly make the function thread-safe, but it has some drawbacks. The biggest issue with this approach is that threads will not be able to access this function concurrently. If a thread acquires the lock, all other threads must wait until that thread is finished. In other words, this method becomes a potential bottleneck because only one thread can access it at a time, and if there are any additional processors on the machine, they won't be used.
The solution is to use an anonymous method to create a closure and remove the shared state:
static List<string> GetNamesStartingWith(string startingText)
{
return g_Names.FindAll(
delegate(string name)
{
return name.StartsWith(startingText);
});
}
Even with the verbosity of anonymous methods, the code has been greatly reduced. And, assuming that "g_Names" will never be modified, this function could run concurrently on multiple threads and multiple cores without any synchronization.
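The claim that the closure-based version needs no synchronization can be checked with a small sketch. It assumes modern .NET with top-level statements, uses a lambda instead of the anonymous method, passes StringComparison.Ordinal (an addition not in the original), and relies on List<T> being safe for concurrent readers as long as nobody modifies it:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

var names = new List<string>
{
    "Bill Gates", "Dustin Campbell", "Dustin's Trophy Wife", "Foster the Dog"
};

// No shared mutable field: each call's lambda captures its own 'startingText'.
List<string> GetNamesStartingWith(string startingText) =>
    names.FindAll(name => name.StartsWith(startingText, StringComparison.Ordinal));

string[] prefixes = { "D", "B", "F", "X" };
int[] expected = { 2, 1, 1, 0 };
var results = new int[1000];

// Many concurrent lookups with different prefixes; each iteration writes only
// to its own slot of 'results', so no locking is needed.
Parallel.For(0, results.Length, n =>
{
    results[n] = GetNamesStartingWith(prefixes[n % prefixes.Length]).Count;
});

bool allCorrect = true;
for (int i = 0; i < results.Length; i++)
{
    if (results[i] != expected[i % expected.Length])
        allCorrect = false;
}

Console.WriteLine(allCorrect); // True
```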
Admittedly, my example is highly contrived, but it's not too hard to imagine the same situation in larger-scale applications.
Closures are critical to functional programming. Without closures, several techniques like currying and memoization (more on these in future articles) don't work. And, without understanding the subtleties of closures, you won't be able to use the features of C# 3.0 properly. I don't need to defend the value of functional programming in C# because others have done that extremely well.
We will be building on this knowledge in future articles so understanding closures is important. Things will quickly start to make your head spin if you don't understand this concept.
Until next time...
https://aakinshin.net/posts/closures/
C# gives us the ability to use closures. This is a powerful tool that allows anonymous methods and lambda functions to capture unbound variables from their lexical scope. Many programmers in the .NET world like closures very much, but only a few of them understand how they really work. Let’s start with a simple example:
public void Run()
{
int e = 1;
Foo(x => x + e);
}
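Nothing complicated happens here: we just capture the local variable e in a lambda that is passed to some Foo method. Let’s see how the compiler expands this construction (the code is simplified: the real generated names are not valid C# identifiers, which prevents collisions with user code):
public void Run()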
{
DisplayClass c = new DisplayClass();
c.e = 1;
Foo(c.Action);
}
private sealed class DisplayClass
{
public int e;
public int Action(int x)
{
return x + e;
}
}
As you can see from the example, the compiler creates an additional class that holds the captured variable and the target method of our closure. This knowledge will help us understand how closures behave in different situations.
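You can observe the generated class at run time: for a lambda that captures a local variable, the delegate’s Target property refers to the compiler-generated closure instance. The sketch below relies on implementation details of the current Roslyn compiler (the class name, and the field being named after the captured variable e); they are not guaranteed by the language and are shown only for exploration. It assumes C# 9 or later for top-level statements:

```csharp
using System;
using System.Reflection;

int e = 1;
Func<int, int> addE = x => x + e;

// For a capturing lambda, Target is the compiler-generated closure object.
object closure = addE.Target!;
Type closureType = closure.GetType();

// The captured local is stored as a public field of that object.
FieldInfo field = closureType.GetField("e")!;

e = 41;                                            // assign through the local...
int seenInField = (int)field.GetValue(closure)!;   // ...and the field shows the change

Console.WriteLine(closureType.Name);   // something like <>c__DisplayClass0_0
Console.WriteLine(seenInField);        // 41
Console.WriteLine(addE(1));            // 42
```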
The for loop
This is probably the most classic example, cited by everyone:
public void Run()
{
var actions = new List<Action>();
for (int i = 0; i < 3; i++)
actions.Add(() => Console.WriteLine(i));
foreach (var action in actions)
action();
}
The example contains a typical error. Novice developers expect this code to output "0 1 2", but in fact it outputs "3 3 3". This strange behavior is easy to understand if you look at the expanded version of the method:
public void Run()
{
var actions = new List<Action>();
DisplayClass c = new DisplayClass();
for (c.i = 0; c.i < 3; c.i++)
actions.Add(c.Action);
foreach (Action action in actions)
action();
}
private sealed class DisplayClass
{
public int i;
public void Action()
{
Console.WriteLine(i);
}
}
In this case, people say that the loop variable is captured by reference, not by value. Many programmers criticize this peculiarity of closures and find it confusing, though it is quite logical once you have a clear idea of what is inside a closure.
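The usual workaround is to declare a fresh variable inside the loop body and capture that instead. A variable declared in the body is a new variable on every iteration, so each lambda gets its own closure object. A small sketch comparing both versions, assuming C# 9 or later for top-level statements; the lambdas return values instead of printing so that the results can be compared, and the names shared, copied and copy are illustrative:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var shared = new List<Func<int>>();
var copied = new List<Func<int>>();

for (int i = 0; i < 3; i++)
{
    shared.Add(() => i);      // all three lambdas capture the single loop variable

    int copy = i;             // a new variable on every iteration
    copied.Add(() => copy);   // each lambda captures its own copy
}

string sharedOutput = string.Join(" ", shared.Select(f => f()));
string copiedOutput = string.Join(" ", copied.Select(f => f()));

Console.WriteLine(sharedOutput); // 3 3 3
Console.WriteLine(copiedOutput); // 0 1 2
```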
The foreach loop
Let’s review a more interesting example:
public void Run()
{
var actions = new List<Action>();
foreach (var i in Enumerable.Range(0, 3))
actions.Add(() => Console.WriteLine(i));
foreach (var action in actions)
action();
}
What will this code output? Unfortunately, there is no simple answer to this question. In earlier versions of C#, foreach behaved the same as for: the loop variable was created once and captured by all of the lambdas, so the code printed "2 2 2" (the last value assigned to that variable). Starting with C# 5.0, this behavior changed (here Eric Lippert admits that Microsoft made a breaking change). Now this code outputs "0 1 2". Note that this is a peculiarity of the language, not of the platform: if you work in Visual Studio 2012 and change the target framework to 3.5, nothing changes, while in Visual Studio 2010 you will still see the old behavior. Jon Skeet explains why it was decided to make foreach and for behave differently. Let’s have a look at the new expanded version of the code:
public void Run()
{
var actions = new List<Action>();
foreach (int i in Enumerable.Range(0, 3))
{
DisplayClass c = new DisplayClass();
c.i = i;
actions.Add(c.Action);
}
foreach (Action action in actions)
action();
}
private sealed class DisplayClass
{
public int i;
public void Action()
{
Console.WriteLine(i);
}
}
You can easily see the difference: in C# 5.0, every iteration of the foreach loop gets a new instance of the generated class that provides the closure logic.
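To see both behaviors with a current compiler, the sketch below runs a C# 5.0+ foreach and then emulates the pre-5.0 expansion by hand, with a single variable (the illustrative current) declared outside the loop and reassigned on each iteration. It assumes C# 9 or later for top-level statements:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// C# 5.0 and later: foreach declares a fresh 'i' for every iteration.
var perIteration = new List<Func<int>>();
foreach (var i in Enumerable.Range(0, 3))
    perIteration.Add(() => i);

// Emulation of the old expansion: one variable shared by all iterations.
var oldStyle = new List<Func<int>>();
int current = -1;
foreach (var item in Enumerable.Range(0, 3))
{
    current = item;                 // reassigned, not redeclared
    oldStyle.Add(() => current);
}

string newOutput = string.Join(" ", perIteration.Select(f => f()));
string oldOutput = string.Join(" ", oldStyle.Select(f => f()));

Console.WriteLine(newOutput); // 0 1 2
Console.WriteLine(oldOutput); // 2 2 2
```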
Closure of multiple variables
Let’s review a case where multiple variables are captured by different lambdas:
public void Run()
{
int x = 1, y = 2;
Foo(u => u + x, u => u + y);
}
One might think that in this case two additional classes are generated, each responsible for a single variable. Actually, a single class is generated:
public void Run()
{
DisplayClass c = new DisplayClass();
c.x = 1;
c.y = 2;
Foo(c.ActionX, c.ActionY);
}
private sealed class DisplayClass
{
public int x;
public int y;
public int ActionX(int u)
{
return u + x;
}
public int ActionY(int u)
{
return u + y;
}
}
Thus, the two lambdas are bound together: they share one DisplayClass instance, so the garbage collector cannot reclaim it (or any of the variables captured in it) until no references to either lambda remain. Imagine that the first lambda is used when initializing a long-lived object and the second one when finishing work with it, and that there are many such objects. In this case, the data captured for the initializing lambdas stays in memory for a long time, even though no one will ever invoke them again.
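You can check that both lambdas share one closure object by comparing their Target properties. The sketch also shows one possible mitigation, which is an assumption of this example rather than something from the text above: creating each lambda in its own helper (the hypothetical MakeAdder), so that each gets a separate closure and keeping one delegate alive no longer pins the other’s captured state. Sharing a display class is a compiler implementation detail, so treat the first comparison as an observation of current Roslyn behavior, not a language guarantee. It assumes C# 9 or later for top-level statements:

```csharp
using System;

// x and y live in the same scope, so both lambdas are compiled into one closure class.
static (Func<int, int> AddX, Func<int, int> AddY) MakeShared()
{
    int x = 1, y = 2;
    return (u => u + x, u => u + y);
}

// Hypothetical helper: each call gets its own closure holding only 'n'.
static Func<int, int> MakeAdder(int n) => u => u + n;

var (sharedX, sharedY) = MakeShared();
Func<int, int> ownX = MakeAdder(1);
Func<int, int> ownY = MakeAdder(2);

bool sharesClosure = ReferenceEquals(sharedX.Target, sharedY.Target);
bool separateClosures = !ReferenceEquals(ownX.Target, ownY.Target);

Console.WriteLine(sharesClosure);    // True
Console.WriteLine(separateClosures); // True
Console.WriteLine(sharedX(10) + sharedY(10) == ownX(10) + ownY(10)); // True
```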
Scope
There is one more peculiarity of closures that you need to know about. Let’s review an example:
public void Run(List<int> list)
{
foreach (var element in list)
{
var e = element;
if (Condition(e))
Foo(x => x + e);
}
}
And here is the question: where will the closure object be created? Even though the lambda is created inside the if, the object is created in the same scope where the captured variable is declared:
public void Run(List<int> list)
{
foreach (int element in list)
{
DisplayClass c = new DisplayClass();
c.e = element;
if (Condition(c.e))
Foo(c.Action);
}
}
private sealed class DisplayClass
{
public int e;
public int Action(int x)
{
return x + e;
}
}
This peculiarity matters when the list is quite big and Condition(e) is rarely true: most of the DisplayClass instances are created for nothing, which costs memory and performance. We can fix the situation:
public void Run(List<int> list)
{
foreach (var element in list)
if (Condition(element))
{
var e = element;
Foo(x => x + e);
}
}
The compiler expands this method more efficiently, since the DisplayClass constructor is invoked only when it is really needed:
public void Run(List<int> list)
{
foreach (int element in list)
if (Condition(element))
{
DisplayClass c = new DisplayClass();
c.e = element;
Foo(c.Action);
}
}
private sealed class DisplayClass
{
public int e;
public int Action(int x)
{
return x + e;
}
}
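The difference can be measured. The sketch below is a rough experiment rather than a benchmark, under several assumptions: Condition and Foo are hypothetical stand-ins (Foo is marked NoInlining so the JIT cannot optimize the delegate away), it uses GC.GetAllocatedBytesForCurrentThread from modern .NET, and it needs C# 9 or later for top-level statements. Exact byte counts depend on the runtime, so only the comparison is meaningful:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical stand-ins for the article's Condition and Foo.
static bool Condition(int value) => value % 10_000 == 0;   // true for 10 of 100,000 values

[MethodImpl(MethodImplOptions.NoInlining)]
static int Foo(Func<int, int> f) => f(0);

// 'e' is declared at the top of the loop body, so a closure object is
// allocated on every iteration, even when Condition returns false.
static long Eager(List<int> list)
{
    long sum = 0;
    foreach (var element in list)
    {
        var e = element;
        if (Condition(e))
            sum += Foo(x => x + e);
    }
    return sum;
}

// 'e' is declared inside the if, so a closure object is allocated only
// when the lambda is actually created.
static long Lazy(List<int> list)
{
    long sum = 0;
    foreach (var element in list)
    {
        if (Condition(element))
        {
            var e = element;
            sum += Foo(x => x + e);
        }
    }
    return sum;
}

var numbers = new List<int>();
for (int i = 0; i < 100_000; i++)
    numbers.Add(i);

Eager(numbers); // warm-up so that first-call work is not part of the measurement
Lazy(numbers);

long before = GC.GetAllocatedBytesForCurrentThread();
long eagerSum = Eager(numbers);
long eagerBytes = GC.GetAllocatedBytesForCurrentThread() - before;

before = GC.GetAllocatedBytesForCurrentThread();
long lazySum = Lazy(numbers);
long lazyBytes = GC.GetAllocatedBytesForCurrentThread() - before;

Console.WriteLine($"eager: {eagerBytes} bytes, lazy: {lazyBytes} bytes");
```

Both versions compute the same result; the only intended difference is where the closure object is allocated.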
Problems
You can find problems about the subject in ProblemBook.NET: ClosureAndForeach, ClosureAndFor, ClosureAndVariable.