Start using Source Generators. It's easier than you think

Tech Deep Dive

2022.07.07

In simple terms, a source generator is a class that produces code based on other code. The result is available upon compilation. It may seem like magic because, without creating any new *.cs files, the developer can start using classes, extension methods, structs or whatever else we decide our generator should create. This works because the generator's output is included in the compilation artifacts. There is a lot the developer has to know about the compiler: what it is and how it sees and processes the code we feed to it. Understanding those aspects is crucial to working efficiently with source generators.

In this article I want to provide everything required to write a simple incremental source generator. You will learn about Roslyn, what differentiates source generators from incremental source generators and finally we will build a generator.

Compilation and Build process
We will be referring to the compilation a lot. It is important not to confuse it with the build process. The build process can be understood as the creation of an executable. In order to build a .NET executable or assembly we must use a specific tool, most often MSBuild. It runs the compiler and provides it with all the inputs it requires: referenced assemblies, source files, etc. The language-specific compiler produces intermediate language from the source code, and running it is one of the steps in the build process. Compilation is lighter than a build and is just one of its components. This is good because we need compilation to run often if we want to use the Compiler API and its richness.

Roslyn
Roslyn is the name of the .NET Compiler Platform. It is open source and includes compilers for C# and Visual Basic. Roslyn exposes various types of APIs:

Compiler APIs - Correspond to the phases of the compiler pipeline. These are the APIs we will mostly use for our generator.
Diagnostic APIs - If you see colored squiggles in your IDE, that is thanks to the Diagnostic API.
Scripting APIs - Allow using C# as a scripting language.
Workspace APIs - Allow working with how our program is structured, i.e. Solution, Project.

Syntax Trees and Syntax Analysis
At the most basic level we work with our code as static text. This text is processed by a parser, which produces syntax trees. Plural, because each source file corresponds to a separate syntax tree. In the compiler pipeline illustration this step corresponds to the Parser box. A syntax tree is a hierarchical representation of text consisting of syntax nodes. It is best pictured with the following tools:

Syntax Tree Viewer in Rider

Syntax Visualizer (with Directed Syntax Graph in Ultimate editions of Visual Studio)

The tree is composed of:

Nodes - Basic building blocks of the syntax tree, consisting of a combination of tokens, trivia and other nodes.
Tokens - Leaves of the syntax tree. These are elements like keywords or identifiers.
Trivia - Parts of syntax with really low significance like whitespace or comments.
Values - Some tokens store the characters they consist of in a separate field called Value.

Syntax trees are used in what is called syntax analysis. You could compare a syntax tree to a diagram of code in one source file. Let's assume this file has a definition of a class. Syntax analysis can tell us a lot about that class but we won't learn about how it is used in the broader context of the entire program. In order to get that kind of information we need semantic analysis.
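To make the node, token and trivia terminology more tangible, here is a minimal sketch of syntax analysis done by hand. It assumes a project referencing the Microsoft.CodeAnalysis.CSharp NuGet package; the SyntaxTreeSketch class and its method names are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical helper, not part of the article's repository.
public static class SyntaxTreeSketch
{
    // Parses one source text into a syntax tree and returns the names
    // of all classes declared in it.
    public static IReadOnlyList<string> GetClassNames(string source)
    {
        SyntaxTree tree = CSharpSyntaxTree.ParseText(source);
        SyntaxNode root = tree.GetRoot();

        return root.DescendantNodes()                 // walk the nodes
            .OfType<ClassDeclarationSyntax>()         // keep class declarations
            .Select(c => c.Identifier.ValueText)      // identifier token -> its text
            .ToList();
    }

    // Counts single-line comments, which the tree stores as trivia.
    public static int CountLineComments(string source) =>
        CSharpSyntaxTree.ParseText(source)
            .GetRoot()
            .DescendantTrivia()
            .Count(t => t.IsKind(SyntaxKind.SingleLineCommentTrivia));
}
```

Note that nothing here knows what a class derives from or where it is used. The sketch only sees the text of a single file, which is exactly the limitation described above.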

Compilation and Semantic Analysis
Next up in the compilation pipeline there are two separate boxes: symbols and metadata import. The metadata allows the formation of symbols. They are the key to obtaining semantic information about our code from the compilation. Why is the metadata required? Some elements are imported into our program from assemblies. Metadata lets us get information about those foreign objects. There are various types of symbols. To illustrate what we can learn from a symbol, let's use an example. This is the documentation for INamedTypeSymbol. We can learn about such properties as:

Arity of the type,

List of its constructors,

List of interfaces this type implements,

If it is static.

This is just a minor part of all the information we can get about the associated code element. With semantic analysis we see our constructs not in isolation like in the case of syntax trees but in a broader landscape. This context can be imagined as a compilation unit: an assembly or a project in our solution. So in other words: compilation can be understood as a bunch of syntax trees stuck together with added metadata.
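The properties listed above can be read from a symbol with a few lines of code. The following is a rough sketch, assuming the Microsoft.CodeAnalysis.CSharp package and a runtime where typeof(object).Assembly.Location points at the core library; SemanticSketch, the assembly name "Sketch" and the method names are hypothetical.

```csharp
#nullable enable
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Hypothetical helper, not part of the article's repository.
public static class SemanticSketch
{
    // Builds a compilation from a single syntax tree plus metadata
    // (the core library) and looks a type up by its metadata name.
    public static INamedTypeSymbol? GetTypeSymbol(string source, string metadataName)
    {
        SyntaxTree tree = CSharpSyntaxTree.ParseText(source);

        CSharpCompilation compilation = CSharpCompilation.Create(
            assemblyName: "Sketch",
            syntaxTrees: new[] { tree },
            references: new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) },
            options: new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));

        return compilation.GetTypeByMetadataName(metadataName);
    }

    // Summarises the properties mentioned in the text.
    public static string Describe(INamedTypeSymbol type) =>
        $"{type.Name}: arity={type.Arity}, " +
        $"constructors={type.Constructors.Length}, " +
        $"interfaces={type.AllInterfaces.Length}, " +
        $"static={type.IsStatic}";
}
```

For a generic class the metadata name carries the arity suffix, so a class declared as Box<T> would be looked up as "Box`1".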

Analyzers
Source generators are the topic of this article. If we want to build a knowledge base to work with them, it is worth mentioning the mechanism they are derived from, namely analyzers. They use the same concepts of syntax trees and compilation to inspect the code. They report diagnostics through the use of DiagnosticAnalyzer. Diagnostics are those very helpful squiggles that we get in our IDE every time we do something fishy. The other helpful IDE feature enabled by analyzers is code fixes. They are implemented with CodeFixProvider, which gives us useful suggestions on how to fix problems. A source generator is an unusual analyzer which, apart from inspecting code, produces it based on the results of that inspection.
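For comparison with the generator we build later, this is roughly what a small DiagnosticAnalyzer looks like. It is only a sketch: the rule (flagging type names that start with a lowercase letter), the DEMO001 id and the class name are invented for illustration, and the Microsoft.CodeAnalysis.CSharp package with C# 9 or later is assumed.

```csharp
using System.Collections.Immutable;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Diagnostics;

// Hypothetical analyzer, not part of the article's repository.
[DiagnosticAnalyzer(LanguageNames.CSharp)]
public sealed class LowercaseTypeNameAnalyzer : DiagnosticAnalyzer
{
    private static readonly DiagnosticDescriptor Rule = new DiagnosticDescriptor(
        id: "DEMO001",
        title: "Type name starts with a lowercase letter",
        messageFormat: "Type '{0}' should start with an uppercase letter",
        category: "Naming",
        defaultSeverity: DiagnosticSeverity.Warning,
        isEnabledByDefault: true);

    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics =>
        ImmutableArray.Create(Rule);

    public override void Initialize(AnalysisContext context)
    {
        context.ConfigureGeneratedCodeAnalysis(GeneratedCodeAnalysisFlags.None);
        context.EnableConcurrentExecution();
        // Ask to be called back for every named type symbol in the compilation.
        context.RegisterSymbolAction(AnalyzeNamedType, SymbolKind.NamedType);
    }

    private static void AnalyzeNamedType(SymbolAnalysisContext context)
    {
        var type = (INamedTypeSymbol)context.Symbol;
        if (type.Name.Length > 0 && char.IsLower(type.Name[0]))
        {
            // This diagnostic is what the IDE renders as a squiggle.
            context.ReportDiagnostic(Diagnostic.Create(Rule, type.Locations[0], type.Name));
        }
    }
}
```

The shape is the same as for a generator: register callbacks once in Initialize and let the compiler invoke them when there is something to inspect.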

Regular Source Generators
In this article we are focusing on incremental source generators. You may be wondering whether there are non-incremental source generators, then. Yes, there are! They were introduced in .NET 5 but there was a problem. All the processing required for them to work happened on each compilation. Compilation itself occurs very often, pretty much with every keystroke. This caused the developer experience in the IDE to deteriorate badly. It could be improved by aggressively filtering the syntax processed by our generator, but the result still was not good enough. What the mechanism needed was caching and filtering built into its contract. Because of those limitations the next iteration was introduced - incremental source generators.
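To show what "all the processing happens on each compilation" means in practice, here is a small sketch of a regular generator built on ISourceGenerator. The generator and the generated GeneratedClassCount class are hypothetical, and the Microsoft.CodeAnalysis.CSharp package with C# 9 or later is assumed.

```csharp
using System.Collections.Generic;
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Text;

// Hypothetical non-incremental generator, shown only for contrast.
[Generator]
public sealed class ClassCountGenerator : ISourceGenerator
{
    public void Initialize(GeneratorInitializationContext context) =>
        // A fresh receiver is created for every generation pass.
        context.RegisterForSyntaxNotifications(() => new ClassReceiver());

    // Runs in full on every pass; nothing is cached between runs.
    public void Execute(GeneratorExecutionContext context)
    {
        if (context.SyntaxReceiver is not ClassReceiver receiver) return;

        string source =
            "internal static class GeneratedClassCount\n" +
            "{\n" +
            $"    public const int Value = {receiver.Classes.Count};\n" +
            "}\n";

        context.AddSource("GeneratedClassCount.g.cs", SourceText.From(source, Encoding.UTF8));
    }

    // The only filtering hook: it is called for every node of every syntax tree.
    private sealed class ClassReceiver : ISyntaxReceiver
    {
        public List<ClassDeclarationSyntax> Classes { get; } = new List<ClassDeclarationSyntax>();

        public void OnVisitSyntaxNode(SyntaxNode syntaxNode)
        {
            if (syntaxNode is ClassDeclarationSyntax declaration) Classes.Add(declaration);
        }
    }
}
```

The receiver is the place for the aggressive filtering mentioned above, but the whole Execute method still reruns each time.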

Incremental Source Generator
The requirement of caching is realized through IncrementalValueProvider<T> (and its sibling IncrementalValuesProvider<T>). When working with a generator we will have access to IncrementalGeneratorInitializationContext, which gives us access to a set of providers. They are the points through which we can work with different components of our program:

SyntaxProvider - this provider will serve us changes in the syntax of our program.
CompilationProvider - is the gateway to semantic analysis.
AdditionalTextsProvider - gives access to files with static content included in the project.
MetadataReferencesProvider - provides information about referenced assemblies.
AnalyzerConfigOptionsProvider & ParseOptionsProvider - allow reading configuration values.

All of them are exposed as either IncrementalValueProvider<T> or IncrementalValuesProvider<T>, e.g., CompilationProvider is an IncrementalValueProvider<Compilation>. The provider hides all of the implementation details related to caching. What's important for us is that the operators applied to a provider run only for changes.
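The sketch below spells out the type of each provider so that the single-value and multi-value ones are easy to tell apart, and then feeds one of them into an output. The generator and the generated AssemblyNameSketch class are hypothetical; the Microsoft.CodeAnalysis.CSharp 4.x package and C# 9 or later are assumed.

```csharp
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Diagnostics;
using Microsoft.CodeAnalysis.Text;

// Hypothetical generator, not part of the article's repository.
[Generator]
public sealed class ProvidersSketchGenerator : IIncrementalGenerator
{
    public void Initialize(IncrementalGeneratorInitializationContext context)
    {
        // Single-value providers: one value at a time.
        IncrementalValueProvider<Compilation> compilation = context.CompilationProvider;
        IncrementalValueProvider<ParseOptions> parseOptions = context.ParseOptionsProvider;
        IncrementalValueProvider<AnalyzerConfigOptionsProvider> configOptions = context.AnalyzerConfigOptionsProvider;

        // Multi-value providers: a stream of items.
        IncrementalValuesProvider<AdditionalText> additionalTexts = context.AdditionalTextsProvider;
        IncrementalValuesProvider<MetadataReference> references = context.MetadataReferencesProvider;

        // Entry point for building syntax-based pipelines.
        SyntaxValueProvider syntax = context.SyntaxProvider;

        // Derive a small value from the compilation...
        IncrementalValueProvider<string> assemblyName =
            compilation.Select(static (c, _) => c.AssemblyName ?? "unknown");

        // ...and emit a file from it.
        context.RegisterSourceOutput(assemblyName, static (spc, name) =>
            spc.AddSource(
                "AssemblyNameSketch.g.cs",
                SourceText.From(
                    $"internal static class AssemblyNameSketch {{ public const string Value = \"{name}\"; }}\n",
                    Encoding.UTF8)));
    }
}
```

Most of the locals are declared only to show their types; a real generator would pick just the providers it needs.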

Let's write our own Incremental Source Generator
The best way to learn something is to create it on your own. I will skim over some important parts of an incremental source generator to get to the vital ones first. The generator has to implement the IIncrementalGenerator interface. The interface consists of only one method:

Initialize(IncrementalGeneratorInitializationContext)

This IncrementalGeneratorInitializationContext is what gives us access to all the providers mentioned before. It is worth mentioning that the implementation of the incremental source generator in this article is a functional one; however, it is distilled so that we can focus on the most important things. It lacks some checks and operations you would normally add.

The Initialize method of our generator does the following:

It uses the context.SyntaxProvider.CreateSyntaxProvider() to construct the filtering pipeline. It consists of two lambda functions:

the first one is called the predicate and is the first level of filtration which processes the syntax,

the second one is called the transform and is used to obtain semantic information from the syntax that got through the predicate

In our generator the predicate looks through the syntax for nodes which represent a class whose name ends with "Controller"

The transform step uses the node to obtain the semantic information and check if the base of the class we are checking is of the ControllerBase type

The context.SyntaxProvider.CreateSyntaxProvider() returns the IncrementalValuesProvider<INamedTypeSymbol> which we already know does all of the caching magic.

It's important to underline that splitting the process into the predicate and the transform is a performance optimisation. The predicate should do only a lightweight check to quickly filter the incoming syntax. If the work it does is time consuming, the experience in the IDE will quickly become unbearable.
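The steps described above could be sketched as follows. This is not the exact code from the repository: the ControllerPipelineSketch class and its method names are hypothetical, the transform walks the whole base type chain instead of checking only the direct base, and the Microsoft.CodeAnalysis.CSharp 4.x package with C# 9 or later is assumed.

```csharp
#nullable enable
using System;
using System.Threading;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical helper that builds the filtering pipeline.
public static class ControllerPipelineSketch
{
    public static IncrementalValuesProvider<INamedTypeSymbol> Create(
        IncrementalGeneratorInitializationContext context) =>
        context.SyntaxProvider
            .CreateSyntaxProvider(
                predicate: static (node, _) => IsCandidate(node),
                transform: static (ctx, ct) => ToControllerSymbol(ctx, ct))
            .Where(static m => m is not null)   // drop nodes rejected by the transform
            .Select(static (m, _) => m!);       // narrow the type to non-nullable

    // Predicate: looks at syntax only, so it has to stay cheap.
    public static bool IsCandidate(SyntaxNode node) =>
        node is ClassDeclarationSyntax declaration &&
        declaration.Identifier.ValueText.EndsWith("Controller", StringComparison.Ordinal);

    // Transform: uses the semantic model to check the base types.
    private static INamedTypeSymbol? ToControllerSymbol(GeneratorSyntaxContext ctx, CancellationToken ct)
    {
        var declaration = (ClassDeclarationSyntax)ctx.Node;
        INamedTypeSymbol? symbol = ctx.SemanticModel.GetDeclaredSymbol(declaration, ct);

        for (INamedTypeSymbol? current = symbol?.BaseType; current is not null; current = current.BaseType)
        {
            if (current.ToDisplayString() == "Microsoft.AspNetCore.Mvc.ControllerBase")
            {
                return symbol;
            }
        }

        return null;
    }
}
```

The predicate never touches the semantic model, which keeps the part that runs for every node cheap.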

LINQ-like syntax
The code ends with a Where(m => m != null). This is not a LINQ operator. It behaves in a similar way, but it is an extension method defined for the incremental value providers. There are other similar ones:

Select
SelectMany
Where
Collect
Combine

All of them are described in more detail in the following document. Collect and Combine don't have counterparts in LINQ.
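As an illustration of chaining these operators, the sketch below turns the additional files of a project into a stream of non-empty lines. The OperatorSketch class, the .txt filter and the method names are assumptions made for the example; the Microsoft.CodeAnalysis.CSharp 4.x package and C# 9 or later are assumed as well.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Text;

// Hypothetical helper, not part of the article's repository.
public static class OperatorSketch
{
    // additional files -> only *.txt -> every line -> trimmed, non-empty lines
    public static IncrementalValuesProvider<string> NonEmptyLines(
        IncrementalGeneratorInitializationContext context) =>
        context.AdditionalTextsProvider
            .Where(static file => file.Path.EndsWith(".txt", StringComparison.OrdinalIgnoreCase))
            .SelectMany(static (file, ct) => ReadLines(file, ct))
            .Select(static (line, _) => line.Trim())
            .Where(static line => line.Length > 0);

    private static IEnumerable<string> ReadLines(AdditionalText file, CancellationToken ct)
    {
        SourceText? text = file.GetText(ct);
        if (text is null) yield break;

        foreach (TextLine line in text.Lines)
        {
            yield return line.ToString();
        }
    }
}
```

Unlike their LINQ namesakes, Select and SelectMany here also receive a CancellationToken, so long-running steps can be cancelled.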

Collect
Collect can be thought of as similar to the materializing operators from LINQ. If we use it, we will get a collection of all items being processed by the provider instead of obtaining them one by one. It will be represented as an ImmutableArray<T>.

Combine
As the name says, it allows us to join two providers. The result will be a series of tuples containing values from both providers. It behaves differently depending on the multiplicity of its arguments:

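collection & collection (Combine has no overload taking two multi-valued providers, so one side has to be turned into a single value with Collect first),
single item & single item,
collection & single item.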
In our case we will use this operator with CompilationProvider which has a single value of a Compilation. We will Combine it with the result of calling the Collect operator on IncrementalValuesProvider<INamedTypeSymbol> which holds the symbols of Controllers we found earlier.
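Here is a small, self-contained sketch of the same Collect and Combine pattern. To keep it short it collects plain class names instead of controller symbols. The generator name and the generated file are hypothetical, and the Microsoft.CodeAnalysis.CSharp 4.x package with C# 9 or later is assumed.

```csharp
using System.Collections.Immutable;
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Text;

// Hypothetical generator, not part of the article's repository.
[Generator]
public sealed class CollectCombineSketchGenerator : IIncrementalGenerator
{
    public void Initialize(IncrementalGeneratorInitializationContext context)
    {
        // Many values: one class name per class declaration.
        IncrementalValuesProvider<string> classNames = context.SyntaxProvider.CreateSyntaxProvider(
            predicate: static (node, _) => node is ClassDeclarationSyntax,
            transform: static (ctx, _) => ((ClassDeclarationSyntax)ctx.Node).Identifier.ValueText);

        // Collect: many values -> a single ImmutableArray holding all of them.
        IncrementalValueProvider<ImmutableArray<string>> allNames = classNames.Collect();

        // Combine: single Compilation + single array -> a single tuple.
        IncrementalValueProvider<(Compilation Left, ImmutableArray<string> Right)> combined =
            context.CompilationProvider.Combine(allNames);

        context.RegisterSourceOutput(combined, static (spc, pair) =>
        {
            var (compilation, names) = pair;
            string source = $"// {names.Length} class(es) found in assembly '{compilation.AssemblyName}'\n";
            spc.AddSource("CollectCombineSketch.g.cs", SourceText.From(source, Encoding.UTF8));
        });
    }
}
```

Because both inputs are single values after Collect, the output callback runs once per change and receives everything it needs in one tuple.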

The details of how the code is generated are hidden in the Execute method. It is important to stress that the code we see in this Initialize() method only deals with registering the pipeline. All of those registered lambdas will execute whenever the context provides relevant changes in the syntax. All of those changes will be passed to the function used in RegisterSourceOutput. In our case it passes processing on to the Execute method.

After some examination you will see that in case of our simple generator the Compilation is not utilized in the Execute method. A lot of more advanced examples use the Compilation in this final step to obtain additional information. I've used this pattern as an opportunity to explain how the Combine and Collect operators work.

The most relevant part here is the SourceProductionContext and its AddSource() method. The method accepts a name for the output file and the source text to add, which in our case is produced by filling in a template. The template is nothing sophisticated in our example: it is just a class with a static method that provides the text with placeholders for the values supplied by the generator.

By looking at the template we can easily come up with what the generator actually does: it provides a very useful functionality of listing the controllers defined in our ASP.NET application. That shows that most of the work done in the generator is extracting the values to combine with the template.
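A template-based Execute step might look like the sketch below. It is deliberately simpler than the repository's template: instead of a controller with an endpoint it generates a hypothetical GeneratedControllerList class, and the ControllerListTemplate name and its members are made up for the example. The Microsoft.CodeAnalysis.CSharp 4.x package is assumed.

```csharp
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Linq;
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Text;

// Hypothetical template and Execute step, not the repository's code.
public static class ControllerListTemplate
{
    // {0} is the placeholder filled with a comma-separated list of string literals.
    // Literal braces are doubled because the text goes through string.Format.
    private const string Template = @"// <auto-generated/>
internal static class GeneratedControllerList
{{
    public static readonly string[] Names = new string[] {{ {0} }};
}}
";

    // Pure text transformation: easy to test without running a compiler.
    public static string Render(IEnumerable<string> controllerNames) =>
        string.Format(Template, string.Join(", ", controllerNames.Select(n => "\"" + n + "\"")));

    // The step wired into RegisterSourceOutput: extract values, fill the template, add the file.
    public static void Execute(SourceProductionContext context, ImmutableArray<INamedTypeSymbol> controllers)
    {
        string source = Render(controllers.Select(c => c.Name));
        context.AddSource("GeneratedControllerList.g.cs", SourceText.From(source, Encoding.UTF8));
    }
}
```

Keeping the rendering separate from AddSource makes it clear how little of the work is actual code generation. Most of it is collecting the names that go into the placeholder.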

Generators at work
How do we know if our generator works? All you need to do is execute the app after cloning it from the repo. You will notice that nowhere does it define the IncrementalMetadataController but after running it and visiting the https://localhost:7259/IncrementalMetadata/incremental/controllers address you will get a response listing all of the controllers defined.

The EmitCompilerGeneratedFiles and CompilerGeneratedFilesOutputPath properties make it possible to save the generated code to disk. There is a caveat: the generator works pretty much on every keystroke, but the files are saved only on build. To observe that behaviour I have added a static method in the controller that has the name of the first Controller in our app. If you change the name of the DummyController, the name of the static method on the IncrementalMetadataController should update immediately in the IDE but not on disk. It will synchronise on disk only after a build. I have noticed some irregularities in how this mechanism behaves, so I would not rely on it. I was surprised that, although it broke from time to time, it worked better in Rider (version 2021.3.3) than in Visual Studio 2022 Community (version 17.1.3).
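In the project that consumes the generator, these two properties could be set as in the fragment below. The Generated folder name is only an example value, not necessarily the one used in the repository.

```xml
<!-- Inside the .csproj of the project that uses the generator -->
<PropertyGroup>
  <!-- Write generator output to disk during a build -->
  <EmitCompilerGeneratedFiles>true</EmitCompilerGeneratedFiles>
  <!-- Example location, relative to the project directory -->
  <CompilerGeneratedFilesOutputPath>Generated</CompilerGeneratedFilesOutputPath>
</PropertyGroup>
```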

Debugging the Source Generator
Unfortunately the code that we write does not always produce the results we expect. How can we debug a source generator? It is a bit awkward. In order to break on generator execution you need to add the following line to it:

Debugger.Launch()

When the generator executes, you will be presented with a prompt letting you choose which IDE to use for the debugging session.

When using Rider, make sure you have the correct Debugger option selected.
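One possible way to keep this call from firing in every build is to guard it, as in the sketch below. The DEBUG guard and the IsAttached check are my own suggestion rather than something the repository necessarily does, and the generator name is hypothetical.

```csharp
using System.Diagnostics;
using Microsoft.CodeAnalysis;

// Hypothetical generator showing only the debugging hook.
[Generator]
public sealed class DebuggableSketchGenerator : IIncrementalGenerator
{
    public void Initialize(IncrementalGeneratorInitializationContext context)
    {
#if DEBUG
        // Ask for a debugger only in Debug builds of the generator,
        // and only if one is not attached already.
        if (!Debugger.IsAttached)
        {
            Debugger.Launch();
        }
#endif
        // The pipeline registration would follow here.
    }
}
```

Remember to remove or disable the call once you are done, otherwise the prompt will keep appearing whenever the generator runs.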

Summary
More and more developers are contributing their source generators to the .NET ecosystem through open source. The list of source generators to use is growing. I hope that after reading this article you will have enough information and resources to make use of this fantastic new tool and maybe add a new entry to this list.

Jarosław Ogiegło
Senior Backend Engineer