TensorFlow Eager Execution vs. Graph Mode (@tf.function)

Jonathan Hui
Dec 29, 2020


Eager execution is heavily promoted in TF 2. It makes coding and debugging easier. But it is not necessarily the best choice for real training or production. In this article, we discuss the two options, eager execution mode and graph mode, as well as their pros and cons. In particular, running code in graph mode is not as seamless as projected. It can have significant performance impacts if certain issues are ignored.

By default, TF operations in 2.x run in eager execution mode. For example, tf.matmul (matrix multiplication) below executes immediately and returns a tf.Tensor object containing the result [[4.]]. This is what we usually expect when running Python code: statements execute line by line and computation results are returned immediately.
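A minimal sketch of that eager call (the input value is chosen so the result is [[4.]]):

```python
import tensorflow as tf

x = [[2.]]
# In eager mode, tf.matmul runs immediately and returns a concrete tf.Tensor.
m = tf.matmul(x, x)
print(m)  # tf.Tensor([[4.]], shape=(1, 1), dtype=float32)
```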

However, graph mode paints a different picture. There, tf.matmul returns a symbolic handle to a node in a computational graph, and the execution of the multiplication is deferred.

(Diagram of eager vs. graph execution. Source: TensorFlow documentation.)

Eager Execution Disadvantages

In graph mode, tf.matmul adds computation node(s) (tf.Operation) to a computational graph (tf.Graph). With the TF v1 API, we call session.run later to compile and execute the computation graph. This deferred execution allows the TF Grappler to run automatically in the background and apply a long list of graph optimizations to improve execution performance. For example, node operations can be combined or removed for efficiency. To take advantage of this optimization in 2.x, we need to run code in graph mode instead of eager execution. An internal TF benchmark indicates a 15% performance improvement on average. For computation-heavy models like ResNet50, though, eager execution with a GPU is comparable to graph mode. The gap widens when there are many small operations, and it shrinks when the model consists of fewer, more expensive operations such as convolutions. In reality, your mileage may vary depending on your model.

It is pretty easy to change the code to run in graph mode. We simply annotate a function with @tf.function so that the whole function is compiled, optimized, and run as a single computational graph. No extra code is needed. And @tf.function extends its coverage to all the methods the function calls, creating a single graph.
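For example, a minimal sketch (the function names here are illustrative):

```python
import tensorflow as tf

def inner(x, y):
    # Called from a tf.function, so it is traced into the same graph.
    return tf.matmul(x, y) + 1.0

@tf.function  # compile the whole function, and the functions it calls, into one graph
def outer(x, y):
    return tf.nn.relu(inner(x, y))

x = tf.constant([[2.0]])
print(outer(x, x))  # tf.Tensor([[5.]], shape=(1, 1), dtype=float32)
```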

Graph mode creates a dataflow graph from the Python code, which makes the solution portable. The model can be restored without the original Python code, or deployed onto a device without a Python environment. Indeed, this is required for saving models to files with SavedModel. This portability is a great advantage for production deployment. By exporting a SavedModel that includes data preprocessing, we eliminate possible mistakes in re-creating the preprocessing logic in production. This preprocessing logic can be sensitive to the training data. For example, a TextVectorization layer needs to be initialized with the vocabulary of the original training dataset. Re-creating it during deployment can be error-prone, in particular for NLP problems.
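A sketch of that pattern, with illustrative layer sizes and dummy training text; depending on your TF/Keras version, TextVectorization may live under tf.keras.layers.experimental.preprocessing, and newer Keras releases may require model.export or tf.saved_model.save for the SavedModel format:

```python
import tensorflow as tf

# Learn the vocabulary from the training text so it travels with the model.
train_text = tf.data.Dataset.from_tensor_slices(
    ["a few example sentences", "from the training data"]).batch(2)
vectorizer = tf.keras.layers.TextVectorization(max_tokens=10000,
                                               output_sequence_length=50)
vectorizer.adapt(train_text)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,), dtype=tf.string),
    vectorizer,                               # preprocessing is part of the model
    tf.keras.layers.Embedding(10000, 16),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1),
])

model.save("text_model")  # SavedModel: the graph and the vocabulary are exported together
```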

Graph Mode Catches

However, there is a major catch with graph mode. The key reason that eager execution is the default in TF 2 is to make coding and debugging easier; the TF 1 APIs are tedious and hard to debug. In graph mode, tf.matmul adds node(s) to the computational graph rather than returning the computation result immediately. In TF 2.x graph mode, the debugger cannot stop at a breakpoint inside tf.matmul, which makes the code hard to trace.

So during early development and debugging, we may temporarily comment out the annotation. Or we can call tf.config.run_functions_eagerly(True) to turn eager execution back on. By setting it to True before calling the square function below and back to False afterwards, we can break inside the @tf.function method.
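A minimal sketch of that square function and the toggle:

```python
import tensorflow as tf

@tf.function
def square(x):
    return x * x  # a breakpoint here is reachable only while running eagerly

# Temporarily run tf.functions eagerly so the debugger can step inside.
tf.config.run_functions_eagerly(True)
print(square(tf.constant(2.0)))          # executes eagerly, breakpoints work
tf.config.run_functions_eagerly(False)   # back to graph mode
```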

With eager execution, we can use regular Python constructs to program our logic. This makes the code Python-friendly and much easier to read. But to use the same code in graph mode, these constructs need to become part of the graph. AutoGraph (discussed later) automatically transforms some Python control flow (if, while, and for loops) into TF operations. That said, there are irregularities without easy-to-understand rules. These irregularities may raise an exception, the constructs may simply be ignored, or they may create unexpected side effects. In the next few sections, we go through issues we may encounter in graph mode.

Do Not Use Assert

Some Python constructs are simply not supported in the computational graph. For example, a Python “assert” inside a @tf.function function throws an exception. Use tf.debugging.assert_{condition} instead; it works in both modes.
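A small sketch using tf.debugging.assert_positive as one example of the tf.debugging.assert_* family:

```python
import tensorflow as tf

@tf.function
def f(x):
    # A plain Python `assert x > 0` is not supported inside @tf.function.
    # This graph-compatible assertion works in both eager and graph mode.
    tf.debugging.assert_positive(x, message="x must be positive")
    return tf.sqrt(x)

print(f(tf.constant(4.0)))  # tf.Tensor(2.0, shape=(), dtype=float32)
```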

Trace

But how are Python and TF code transformed into a graph in graph mode? When an annotated tf.function method is first invoked, it is traced to convert the function into a computational graph. Conceptually, the function is compiled into a graph: in this process, TF operations are converted into nodes in the graph. Once the graph is built, it is executed automatically. This is not the full picture, but let's illustrate it with a very simple example first.

print vs. tf.print

Python “print” writes its arguments to the console. But inside a @tf.function method, it executes during tracing only, i.e. the graph compilation phase, and it does not add any node to the graph. Therefore, this operation is described as a Python side effect: it has an impact during tracing but not during graph execution. In contrast, tf.print is a TF operation. It adds a node to the graph during tracing without producing any console output, and it prints to the console when the graph is executed. For this reason, we can use these two operations to troubleshoot and to tell the tracing and execution phases apart.

During tracing, the operations in method f are executed even though the purpose is to compile a graph. We can categorize these operations into Python operations and TF operations. Of course, every statement is executed by Python, but the TF operations do not perform the real computation; they simply add nodes to the graph. Thanks to the @tf.function annotation, once the trace is done, the graph is executed automatically. Let's illustrate this with an example.

(I ran the code line by line in a debugger so the output follows the chronological order of the code. Otherwise, the output can appear slightly out of order.)
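A minimal sketch of such a function; the message strings are illustrative, and the circled numbers mark the console outputs discussed below:

```python
import tensorflow as tf

@tf.function
def f(x):
    print("Traced with", x)       # Python side effect: ① printed only while tracing
    tf.print("Executed with", x)  # graph op: prints ② / ③ each time the graph runs

f(1)  # first call: traces, then executes the new graph, producing ① and ②
f(1)  # second call: reuses the graph, producing ③ only
```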

When f(1) is called for the first time, the method is traced to build a graph. During tracing, print outputs ① and tf.print does nothing but add a node to the graph. Once the graph is created, it is executed: print is absent from the graph and produces nothing, while tf.print outputs ②. So the first call produces the first two console lines, one from each phase.

When we call f(1) again, the method has already been traced and the graph can be reused. Execution goes straight to the graph, so only ③ is printed.

AutoGraph (tf.autograph)

Eager execution allows us to use Python control flow like “while”, “for”, “if”, “break” and “continue”. To make this work in graph mode, AutoGraph automatically converts some of these Python control-flow constructs into TF operations, so they are treated as TF operations instead of Python operations. This lets us reuse the more natural Python control-flow syntax in both modes. Below is an example of AutoGraph converting Python control flow into TF operations.
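A sketch using tf.autograph.to_code simply to peek at the generated code; the function name and values are illustrative:

```python
import tensorflow as tf

def clip_to_zero(x):
    # Plain Python control flow on a Tensor condition.
    if x < 0:
        x = tf.constant(0.0)
    return x

# Show the Python that AutoGraph generates; the `if` becomes a functional tf.cond.
print(tf.autograph.to_code(clip_to_zero))

@tf.function
def f(x):
    return clip_to_zero(x)

print(f(tf.constant(-3.0)))  # tf.Tensor(0.0, shape=(), dtype=float32)
```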

AutoGraph also converts iteration over a dataset in a for loop into TF operations.

Nevertheless, there are major issues that we need to be very careful about. We will defer the discussion until we know more about tracing.

Tracing an “if” statement

These conversions happen when the condition in a “while” or “if” is a Tensor. What happens during the trace below may surprise you. n is a Tensor, so the statement “if n==0” is converted to the equivalent TF operation tf.cond.
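A sketch of such a function; the branch bodies and messages are illustrative, with the circled numbers marking the tracing printouts discussed next:

```python
import tensorflow as tf

@tf.function
def f(n):
    if n == 0:
        print("Tracing branch n == 0")   # ①
        tf.print("n is zero")
    elif n == 1:
        print("Tracing branch n == 1")   # ②
        tf.print("n is one")
    else:
        print("Tracing branch n > 1")    # ③
        tf.print("n is greater than one")

f(tf.constant(1))  # all three prints fire during tracing; only one tf.print at run time
```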

But why are there three tracing printouts (①, ② and ③)?

For the computational graph to work with different values of the input Tensor, TF actually traces all the branches. Therefore, all three branches are visited and each prints one line. This mechanism lets us reuse the trace: when we make a second call to f with a Tensor of the same shape and dtype, no new trace is required, because the graph can already handle any value of n.

What happens when the input n is a Python scalar, as in f(1)? For a scalar, AutoGraph does not transform the “if” into tf.cond; the trace runs the “if” statement as-is. With n=1, only the “elif n==1:” branch is traced, the other branches are not, and no conditional operation is recorded in the graph.

With n = 1, the method is simply traced as follows.
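Conceptually, only the matched branch's TF op is recorded (a sketch, assuming the f shown earlier):

```python
# With the Python scalar n = 1, only the `elif n == 1` branch runs during tracing,
# so the graph effectively records just:
tf.print("n is one")
```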

The “if … elif” statement is simply a Python side effect.

So what happens when we call f(2), which needs the code in another branch? Fortunately, the function is traced again to create another graph for f(2). We will detail this later, including its impact.

Tracing “while” and “for”

Let's repeat this with the “while” loop. If a Tensor is used in the condition, the loop is converted into tf.while_loop and its body is traced once. If the condition is not a Tensor, it runs as a Python “while” loop. As shown in the example below, the loop runs 3 times during tracing, and each iteration adds its body to the graph.
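A sketch of the Python-scalar case; the loop body is illustrative:

```python
import tensorflow as tf

@tf.function
def f(n):
    x = tf.constant(0.0)
    while n > 0:          # n is a Python int, so this is a plain Python loop
        print("Tracing the loop body, n =", n)
        x = x + 1.0       # each unrolled iteration adds new ops to the graph
        n -= 1
    return x

print(f(3))  # the body is unrolled three times into the graph; returns 3.0
```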

If it is invoked with f(4), f is retraced, with the loop body unrolled four times into a new graph.

Again, we should expect the same behavior for the “for” statement. When the expression in “for i in expression” evaluates to a Tensor, the loop is replaced by tf.while_loop. tf.range returns a Tensor, and therefore the for loop below is converted.
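A sketch of the Tensor case; the body and values are illustrative:

```python
import tensorflow as tf

@tf.function
def f():
    x = tf.constant(0)
    for i in tf.range(3):   # a Tensor iterable: AutoGraph converts this to tf.while_loop
        print("Tracing the for-loop body")  # printed once: the body is traced a single time
        x += i
    return x

print(f())  # tf.Tensor(3, shape=(), dtype=int32)
```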

The code below shows how the trace is done differently when the expression is a Python scalar range.
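A matching sketch of the scalar case:

```python
import tensorflow as tf

@tf.function
def g():
    x = tf.constant(0)
    for i in range(3):      # a Python range: the loop is unrolled during tracing
        print("Tracing iteration", i)  # printed three times, once per unrolled iteration
        x += i
    return x

print(g())  # tf.Tensor(3, shape=(), dtype=int32)
```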

Dataset vs. NumPy ndarray

If a training procedure is tf.function-ized, like the one below, it can be important that the “in” expression of the for loop is a dataset (tf.data.Dataset), not a Python or NumPy structure. In the latter case, every iteration adds nodes to the graph during the trace, so hundreds of thousands of nodes may be added. If a dataset is used, only a combination of tf.data.Dataset ops is added to the graph, once rather than per iteration: TensorFlow traces the body of the loop once and dynamically selects how many iterations to run at execution time.
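A sketch along the lines of the example in the tf.function guide; the dummy loss and the graph_size helper are illustrative:

```python
import tensorflow as tf

@tf.function
def train(dataset):
    loss = tf.constant(0)
    for x, y in dataset:          # dummy "training" loop
        loss += tf.abs(y - x)
    return loss

def graph_size(f, *args):
    # Count the nodes in the graph traced for these arguments.
    return len(f.get_concrete_function(*args).graph.as_graph_def().node)

python_data = [(1, 1)] * 100      # Python list: the loop is unrolled during tracing
tf_data = tf.data.Dataset.from_tensor_slices(([1] * 100, [1] * 100))

print(graph_size(train, python_data))  # hundreds of nodes, growing with the data size
print(graph_size(train, tf_data))      # a small, constant number of nodes
```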

Python List vs. TensorArray

Python lists are poorly supported in graph mode, in particular when the list is modified inside or outside the @tf.function method. I have run into so many catches that I suggest not using a Python list within an annotated method.

For example, the l.append operation is handled by the Python runtime and does not create any node in the graph. This is one of the Python constructs that is silently ignored during graph execution, with unexpected behavior during tracing. If you need a list-like data structure that adds items at run time, use TensorArray instead. This is particularly common in RNNs, where we may accumulate the hidden state for every timestep.
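A sketch of the TensorArray pattern; the function name and computation are illustrative:

```python
import tensorflow as tf

@tf.function
def cumulative_sum(values):
    # A list-like, graph-friendly structure that can grow at run time.
    ta = tf.TensorArray(tf.float32, size=0, dynamic_size=True)
    total = tf.constant(0.0)
    for i in tf.range(tf.shape(values)[0]):
        total += values[i]
        ta = ta.write(i, total)   # appending to a Python list here would be ignored by the graph
    return ta.stack()

print(cumulative_sum(tf.constant([1.0, 2.0, 3.0])))  # [1. 3. 6.]
```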

Polymorphism and Performance

Python is a dynamically typed language. It permits method parameters to have different types in different calls, and it is up to the callee to handle them. TensorFlow, on the other hand, is quite static. Parameter datatypes and shape information are required to build the graph. Indeed, TF builds a different graph when a function is invoked with parameters of different data types or shapes, for more efficient execution. Even when only the shape of an input Tensor changes, the trace may be redone.

f.get_concrete_function returns a ConcreteFunction, a wrapper around the tf.Graph that represents the computational graph. In the example below, f1 and f2 correspond to input tensors of different shapes; therefore, they are two different ConcreteFunctions backed by two different graphs. Gladly, this is wrapped by a Function (python.eager.def_function.Function) that manages a cache of ConcreteFunctions. Callers work with this Function object, and the internal differences are hidden from them.
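A minimal sketch; the body of f is illustrative:

```python
import tensorflow as tf

@tf.function
def f(x):
    return x + 1

f1 = f.get_concrete_function(tf.constant([1.0]))           # traced for shape (1,)
f2 = f.get_concrete_function(tf.constant([[1.0], [2.0]]))  # traced for shape (2, 1)

print(f1 is f2)  # False: two shapes, two ConcreteFunctions, two graphs
```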

If we want to force them to use the same graph, we can add an input_signature with a TensorSpec that has a more general shape. For example, by specifying the shape as None, the same graph can be used for the vector and the matrix below, although the graph may be less efficient.

A None dimension below is a wildcard that allows Functions to reuse traces for variably-sized input.
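A sketch of the wildcard signature; the body of f is illustrative:

```python
import tensorflow as tf

# shape=None leaves the shape unspecified: one trace handles tensors of any shape.
@tf.function(input_signature=[tf.TensorSpec(shape=None, dtype=tf.float32)])
def f(x):
    print("Tracing f")     # appears once, even for differently shaped inputs
    return x + 1

print(f(tf.constant([1.0])))           # vector
print(f(tf.constant([[1.0], [2.0]])))  # matrix: no retrace, the graph is reused
```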

When a method is retraced because the parameters have not-yet-encountered data types or shapes, it adds overhead. This is a particular concern when an input parameter is a scalar: TF triggers a retrace whenever the scalar value is different. Calling with yet another scalar input retraces the method and produces a graph different from those of f1 and f2 above. Ironically, this mechanism is what allows TF to handle the scalar conditions in the “if” and “while” statements discussed before. To avoid the overhead, design the method accordingly. For example, a developer may innocently pass the training step number as a scalar parameter; this can trigger many retraces and slow down performance.

To avoid that, pass Tensor objects instead of Python scalars.
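A sketch of the difference; the train_step name and body are illustrative:

```python
import tensorflow as tf

@tf.function
def train_step(step):
    print("Tracing with step =", step)
    return step * 2

train_step(1)               # traces for the Python value 1
train_step(2)               # different Python value: retraces, a new graph

train_step(tf.constant(1))  # traces once for an int32 scalar Tensor
train_step(tf.constant(2))  # same dtype and shape: the graph is reused, no retrace
```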

Graph is a Snapshot

A snapshot of the function is taken when it is traced to create the graph. Hence, even though the list l below is changed before f is called a second time, the graph still sees the old values of l. (But again, just avoid using a Python list.)
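A minimal sketch; the list contents and the body of f are illustrative:

```python
import tensorflow as tf

l = [1.0, 2.0]

@tf.function
def f():
    return tf.reduce_sum(tf.constant(l))   # l's values are baked in at tracing time

print(f())     # 3.0
l.append(3.0)
print(f())     # still 3.0: the graph is a snapshot and does not see the new element
```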

Iterator

Many Python features, such as generators and iterators, rely on the Python runtime to keep track of state. In graph mode, the graph is not aware of these changes. As shown below, the iterator does not advance across multiple calls; the graph only holds a snapshot of the iterator value taken when it was traced.
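A minimal sketch of that behavior:

```python
import tensorflow as tf

it = iter([1.0, 2.0, 3.0])

@tf.function
def f():
    return tf.constant(next(it))   # next() runs only while tracing

print(f())  # 1.0
print(f())  # still 1.0: the graph is reused and the iterator never advances
```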

Variables can be created once only

tf.Variable objects can be created inside a @tf.function only when it is first invoked. Without a guard that creates the variable only on the first call, a later trace would try to create it again and raise an exception. Such an operation would modify the graph after it is created, and TF does not allow that since a TF graph is intended to be quite static. Instead, where applicable, create non-model variables outside the function and pass them in as parameters.
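A sketch of the create-once guard, similar to the pattern in the TF guide; the Counter name is illustrative:

```python
import tensorflow as tf

class Counter(tf.Module):
    def __init__(self):
        self.count = None

    @tf.function
    def __call__(self):
        if self.count is None:            # guard: create the variable on the first trace only
            self.count = tf.Variable(0)
        return self.count.assign_add(1)

c = Counter()
print(c())  # 1
print(c())  # 2: the graph is reused, no attempt to create the variable again
```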

Model Training

We can also configure and turn off eager execution in model.compile. When model.fit runs, the model's training step is traced and executed in graph mode.
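A sketch with a toy model and random data; run_eagerly=False is the default:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(1),
])

# run_eagerly=False lets model.fit trace the train step and run it as a graph;
# set it to True to debug the train step eagerly.
model.compile(optimizer="adam", loss="mse", run_eagerly=False)

x = tf.random.normal((32, 4))
y = tf.random.normal((32, 1))
model.fit(x, y, epochs=1, verbose=0)
```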

Tensor objects & Numpy interoperability

With eager execution, NumPy operations can take tf.Tensor objects as arguments.

Vice versa, tf.math operations convert Python objects and NumPy arrays to tf.Tensor objects. To convert a tf.Tensor into a NumPy ndarray explicitly, use .numpy().
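A short sketch of the three directions:

```python
import numpy as np
import tensorflow as tf

t = tf.constant([[1.0, 2.0], [3.0, 4.0]])

# NumPy operations accept tf.Tensor arguments in eager mode.
print(np.mean(t))                          # 2.5, a NumPy scalar

# TF operations convert Python and NumPy values to tf.Tensor.
print(tf.square(np.array([1.0, 2.0])))     # tf.Tensor([1. 4.], ...)

# Explicit conversion back to a NumPy ndarray.
print(t.numpy())
```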

Function-ize a function

A Python function can also be executed as a graph without the annotation. Calling tf.function on it, as below, converts the function into a python.eager.def_function.Function, the same class discussed for the @tf.function annotation.
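A minimal sketch of that pattern; the names here are illustrative:

```python
import tensorflow as tf

def add(a, b):
    return a + b

# Equivalent to decorating add with @tf.function.
tf_add = tf.function(add)

print(type(tf_add))                             # a TensorFlow Function object
print(tf_add(tf.constant(1), tf.constant(2)))   # tf.Tensor(3, shape=(), dtype=int32)
```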

Execute Python code in Graph

In graph mode, we want to transform all operations into a Python-independent graph for execution. But if we do need to execute Python code inside the graph, we can use tf.py_function as a workaround. tf.py_function casts all input/output parameters to tensors. However, the portability benefit of the graph is lost, and it does not work well in distributed multi-GPU setups. Here is code that operates on a Python list, made to work with py_function.
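A sketch of wrapping list-based Python code; the function names and values are illustrative:

```python
import tensorflow as tf

def double_with_list(x):
    # Ordinary Python code operating on a Python list; runs in the Python runtime.
    l = []
    for v in x.numpy():
        l.append(v * 2)
    return l

@tf.function
def f(x):
    # Wrap the Python code so it can be called from the graph.
    return tf.py_function(double_with_list, inp=[x], Tout=tf.float32)

print(f(tf.constant([1.0, 2.0, 3.0])))  # [2. 4. 6.]
```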

While we should avoid its use as much as we can, the most common use case is image data augmentation with an external library like scipy.ndimage.

Here, we use the arbitrary-angle rotation in scipy.ndimage to augment the data.
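A sketch of that augmentation; the function names, the angle range, and the image size are illustrative:

```python
import numpy as np
import scipy.ndimage
import tensorflow as tf

def random_rotate(image):
    # scipy operates on NumPy arrays, so this must run as Python code.
    angle = np.random.uniform(-30, 30)
    return scipy.ndimage.rotate(image.numpy(), angle, reshape=False)

@tf.function
def augment(image):
    # Wrap the Python/scipy code so it can be called from graph mode.
    rotated = tf.py_function(random_rotate, inp=[image], Tout=tf.float32)
    rotated.set_shape(image.shape)   # py_function loses the static shape; restore it
    return rotated

image = tf.random.uniform((28, 28), dtype=tf.float32)
print(augment(image).shape)  # (28, 28)
```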

Use type annotations to reduce retracing

There is also an experimental feature that can reduce retracing for scalar inputs. An input argument annotated with tf.Tensor is converted to a Tensor even when the caller passes a non-Tensor value, so calling with different scalar values does not retrace. As shown below, f_with_hints(2) will not trigger a retrace.
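A sketch assuming the experimental_follow_type_hints option that existed around TF 2.4; the feature was experimental and may have changed or been removed in newer releases, so check the current tf.function documentation:

```python
import tensorflow as tf

@tf.function(experimental_follow_type_hints=True)
def f_with_hints(x: tf.Tensor):   # the type hint asks TF to convert x to a Tensor
    print("Tracing")
    return x * 2

f_with_hints(1)   # traced once; the scalar is converted to a Tensor first
f_with_hints(2)   # no retrace: same dtype and shape as the previous call
```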

Thoughts

Many issues discussed in this article can be classified as limitations of the current TF implementation rather than as golden design rules to follow. (For example, different input scalar values trigger a retrace.) Sometimes it is hard to justify or explain why things are done a particular way. Nevertheless, because TF changes constantly, check the latest documentation when implementing your code, in particular for areas that feel odd. Fortunately, most model code in many applications does not deal with the tricky issues mentioned in this article.

Credits & References

Introduction to graphs and tf.functions

Better performance with tf.function

tf.function
