In simple terms, a source generator is a class that produces code based on other code. The result becomes available during compilation. It may seem like magic: without creating any new *.cs files, the developer can start using classes, extension methods, structs or whatever else we decide our generator should create. This works because the generator's output is added to the compilation itself. There is a lot a developer has to know about what the compiler is and how it sees and processes the code we feed to it. Understanding those aspects is crucial to working efficiently with source generators.
In this article I want to provide everything you need to write a simple incremental source generator. You will learn about Roslyn and what differentiates source generators from incremental source generators, and finally we will build a generator.
We will be referring to the compilation a lot, so it is important not to confuse it with the build process. The build process can be understood as the creation of an executable. To build a .NET executable or assembly we need a specific tool, most often MSBuild. Among other things, it runs the compiler and provides it with all the inputs it requires: referenced assemblies, source files, etc. The language-specific compiler produces intermediate language (IL) from the source code and is just one of the steps in the build process. Compilation is therefore lighter than a build and is only one of its components. This is good, because compilation has to run often if we want to benefit from the richness of the Compiler API.
Roslyn is the name of the .NET Compiler Platform. It is open source and includes compilers for C# and Visual Basic. Roslyn exposes several layers of APIs: compiler APIs, diagnostic APIs, scripting APIs and workspaces APIs.
The picture above shows the compiler pipeline and APIs corresponding to its phases. We will need some fundamental knowledge about how the compiler works in order to make use of source generators.
At the most basic level we work with our code as static text. This text is processed by a parser, which produces syntax trees (plural, because each source file corresponds to a separate syntax tree). In the compiler pipeline illustration, this is the Parser box. A syntax tree is a hierarchical representation of the text, made up of syntax nodes. It is best pictured with syntax visualization tools, such as the Syntax Visualizer that ships with the .NET Compiler Platform SDK for Visual Studio.
Syntax visualisation of everyone's favourite WeatherForecastController:
It shows the elements the tree is composed of: syntax nodes, syntax tokens and syntax trivia.
Syntax trees are used in what is called syntax analysis. You could compare a syntax tree to a diagram of code in one source file. Let's assume this file has a definition of a class. Syntax analysis can tell us a lot about that class but we won't learn about how it is used in the broader context of the entire program. In order to get that kind of information we need semantic analysis.
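A minimal sketch of pure syntax analysis, assuming the Microsoft.CodeAnalysis.CSharp NuGet package (4.x). The Demo namespace and the SyntaxOnlyInspector class are hypothetical names used only for illustration. It parses a single source file into a syntax tree and lists the methods of each class, without building a compilation:

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

namespace Demo
{
    // Hypothetical helper: inspects one file using syntax only (no Compilation, no symbols).
    public static class SyntaxOnlyInspector
    {
        // Returns "ClassName.MethodName" for every method declared directly in a class.
        public static IReadOnlyList<string> ListMethods(string sourceText)
        {
            // One source file becomes one syntax tree.
            var tree = CSharpSyntaxTree.ParseText(sourceText);

            return tree.GetRoot()
                .DescendantNodes()
                .OfType<ClassDeclarationSyntax>()
                .SelectMany(cls => cls.Members
                    .OfType<MethodDeclarationSyntax>()
                    .Select(method => $"{cls.Identifier.Text}.{method.Identifier.Text}"))
                .ToList();
        }
    }
}
```

Calling ListMethods on "class WeatherForecastController : ControllerBase { public int Get() => 42; }" yields "WeatherForecastController.Get". The tree shows that ControllerBase is written after the colon, but not what ControllerBase is or where it is defined; answering that requires semantic analysis.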
Next in the compilation pipeline there are two separate boxes: Symbols and Metadata Import. Symbols are the key to obtaining semantic information about our code from the compilation. Why is metadata required? Some elements are imported into our program from referenced assemblies, and metadata is what allows the compiler to form symbols for those external types and learn about them. There are various kinds of symbols. To illustrate what we can learn from one, let's take INamedTypeSymbol as an example. Its documentation lists properties such as Arity, IsGenericType, TypeArguments, Constructors and MemberNames.
This is just a small part of the information available about the associated code element. With semantic analysis we no longer see our constructs in isolation, as with syntax trees, but in a broader landscape. This context can be imagined as a single compilation: an assembly, or a project in our solution. In other words, a compilation can be understood as a set of syntax trees stuck together with added metadata.
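To see the semantic side, the following sketch (same assumption of the Microsoft.CodeAnalysis.CSharp package; Demo.SemanticInspector is a hypothetical helper) builds a compilation from several syntax trees plus the metadata of the core library, then walks a class's inheritance chain through INamedTypeSymbol.BaseType. None of the individual trees holds that information on its own:

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

namespace Demo
{
    public static class SemanticInspector
    {
        // Returns the names of the type and all of its base types, e.g. HomeController, BaseController, Object.
        // Returns an empty list if the type cannot be found in the compilation.
        public static IReadOnlyList<string> InheritanceChain(string metadataName, params string[] sourceFiles)
        {
            // Each source file becomes its own syntax tree...
            var trees = sourceFiles.Select(text => CSharpSyntaxTree.ParseText(text));

            // ...and the compilation sticks them together with metadata from referenced assemblies
            // (here only the core library, which defines System.Object).
            var compilation = CSharpCompilation.Create(
                "SemanticDemo",
                trees,
                new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) });

            var chain = new List<string>();
            for (INamedTypeSymbol current = compilation.GetTypeByMetadataName(metadataName);
                 current != null;
                 current = current.BaseType)
            {
                chain.Add(current.Name);
            }
            return chain;
        }
    }
}
```

With "public class BaseController { }" and "public class HomeController : BaseController { }" passed as two separate files, the chain for HomeController is HomeController, BaseController, Object, even though the second file alone says nothing about what BaseController derives from.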
Source generators are the topic of this article, but to build a solid knowledge base it is worth mentioning the mechanism they are derived from: analyzers. Analyzers use the same concepts of syntax trees and compilation to inspect code. They report diagnostics through DiagnosticAnalyzer; diagnostics are those very helpful squiggles we get in the IDE every time we do something fishy. Another helpful IDE feature enabled by analyzers is code fixes. They are implemented with CodeFixProvider and give us useful suggestions on how to fix the reported problems. A source generator is an unusual analyzer which, apart from inspecting code, produces new code based on the results of that inspection.
In this article we are focusing on incremental source generators. Does that mean there are non-incremental source generators too? Yes, there are! They were introduced with .NET 5, but they had a problem: all the processing they required happened on every compilation. Compilation itself occurs very often, pretty much with every keystroke, so the developer experience in the IDE deteriorated badly. It could be improved by aggressively filtering the syntax processed by the generator, but that was still not good enough. What was missing was caching and filtering built into the generator's contract. Because of those limitations, the next iteration was introduced with .NET 6: incremental source generators.
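For contrast, here is a sketch of the original, non-incremental contract (ISourceGenerator), assuming the Microsoft.CodeAnalysis.CSharp package. The class names and the rule "a controller is a class whose name ends with Controller" are hypothetical. The ISyntaxReceiver is the filtering hook mentioned above, but Execute still rebuilds its output from scratch on every run, because nothing is cached:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Text;

namespace Demo
{
    // The original contract: Execute runs on every compilation.
    [Generator]
    public sealed class LegacyControllerListGenerator : ISourceGenerator
    {
        public void Initialize(GeneratorInitializationContext context)
        {
            // The only filtering hook: a receiver that is shown every syntax node.
            context.RegisterForSyntaxNotifications(() => new ControllerSyntaxReceiver());
        }

        public void Execute(GeneratorExecutionContext context)
        {
            if (!(context.SyntaxReceiver is ControllerSyntaxReceiver receiver))
                return;

            // No caching: this list and the generated text are recomputed on each run.
            var names = receiver.Candidates.Select(c => c.Identifier.Text).OrderBy(n => n);
            var source = "public static class LegacyControllerList { public const string Names = \""
                         + string.Join(",", names) + "\"; }";
            context.AddSource("LegacyControllerList.g.cs", SourceText.From(source, Encoding.UTF8));
        }
    }

    internal sealed class ControllerSyntaxReceiver : ISyntaxReceiver
    {
        public List<ClassDeclarationSyntax> Candidates { get; } = new List<ClassDeclarationSyntax>();

        // Called for every syntax node in the compilation, so it must stay cheap.
        public void OnVisitSyntaxNode(SyntaxNode syntaxNode)
        {
            if (syntaxNode is ClassDeclarationSyntax classDeclaration &&
                classDeclaration.Identifier.Text.EndsWith("Controller"))
            {
                Candidates.Add(classDeclaration);
            }
        }
    }
}
```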
The caching requirement is realized through IncrementalValueProvider<T> (a single value) and its sibling IncrementalValuesProvider<T> (a sequence of values). When working with a generator we have access to IncrementalGeneratorInitializationContext, which exposes a set of providers. They are the points through which we can work with different components of our program: CompilationProvider, SyntaxProvider, AdditionalTextsProvider, AnalyzerConfigOptionsProvider, ParseOptionsProvider and MetadataReferencesProvider.
Apart from SyntaxProvider, which is a factory for creating such providers, each of them is an IncrementalValueProvider<T> or an IncrementalValuesProvider<T>; for example, CompilationProvider is an IncrementalValueProvider<Compilation>. The provider hides all the implementation details related to caching. What's important for us is that operators chained on a provider run only when their input changes.
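As a sketch of how a provider and an operator fit together (the Demo.AssemblyInfoGenerator class is hypothetical; Microsoft.CodeAnalysis.CSharp 4.x is assumed), the generator below narrows CompilationProvider down to the assembly name with Select. The Compilation changes on practically every edit, but the selected string usually does not, and when the selected value is equal to the cached one the output step is skipped:

```csharp
using Microsoft.CodeAnalysis;

namespace Demo
{
    [Generator]
    public sealed class AssemblyInfoGenerator : IIncrementalGenerator
    {
        public void Initialize(IncrementalGeneratorInitializationContext context)
        {
            // Select projects the Compilation down to the one value we care about.
            // Strings compare by value, so editing a method body yields an equal value
            // and RegisterSourceOutput's lambda is not re-run.
            IncrementalValueProvider<string> assemblyName =
                context.CompilationProvider.Select(static (compilation, _) => compilation.AssemblyName ?? "Unknown");

            context.RegisterSourceOutput(assemblyName, static (spc, name) =>
                spc.AddSource(
                    "AssemblyInfo.g.cs",
                    $"internal static class GeneratedAssemblyInfo {{ public const string Name = \"{name}\"; }}"));
        }
    }
}
```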
The best way to learn something is to create it on your own. I will skim over some important parts of an incremental source generator to get to the vital ones first. The generator has to implement the IIncrementalGenerator interface, which consists of only one method: void Initialize(IncrementalGeneratorInitializationContext context).
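A minimal skeleton of such a generator might look as follows. The class name is hypothetical; the [Generator] attribute is what lets the compiler discover the type when the assembly is referenced as an analyzer. It registers nothing yet, so it produces no output:

```csharp
using Microsoft.CodeAnalysis;

namespace Demo
{
    [Generator]
    public sealed class SkeletonGenerator : IIncrementalGenerator
    {
        // Called once by the compiler. It only describes the pipeline;
        // the lambdas registered here run later, whenever their inputs change.
        public void Initialize(IncrementalGeneratorInitializationContext context)
        {
            // context.SyntaxProvider, context.CompilationProvider, context.AdditionalTextsProvider, ...
            // would be transformed and combined here, then handed to context.RegisterSourceOutput(...).
        }
    }
}
```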
The IncrementalGeneratorInitializationContext is what gives us access to all the providers mentioned before. It is worth mentioning that the incremental source generator in this article is functional, but distilled so that we can focus on the most important things: it lacks some checks and operations you would normally add.
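The code above uses the SyntaxProvider to find the controllers in our project: a predicate quickly filters the incoming syntax nodes, a transform turns the remaining ones into INamedTypeSymbol instances (or null), and the nulls are filtered out at the end.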
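It's important to underline that splitting the process into the predicate and the transform is an optimisation. The predicate runs for each syntax node in every file that changed, so it should only do a lightweight check that quickly filters the incoming syntax. Heavier work, such as querying the semantic model, belongs in the transform, which runs only for nodes that passed the predicate. If the predicate does time-consuming work, the experience in the IDE will quickly become unbearable.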
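The predicate/transform split can be sketched like this, assuming Microsoft.CodeAnalysis.CSharp 4.x. It assumes, hypothetically, that a controller is any non-abstract class whose name ends with Controller (the article's repository may use different criteria), and it emits one small file per controller just to make the values visible. Like the article's pipeline, it passes INamedTypeSymbol values along; Roslyn's incremental generator guidance recommends projecting symbols into simple equatable data before the output step, because symbols cannot be cached effectively between compilations:

```csharp
using System.Threading;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

namespace Demo
{
    [Generator]
    public sealed class ControllerSymbolsGenerator : IIncrementalGenerator
    {
        public void Initialize(IncrementalGeneratorInitializationContext context)
        {
            IncrementalValuesProvider<INamedTypeSymbol> controllers = context.SyntaxProvider
                .CreateSyntaxProvider(
                    // Predicate: syntax only, so it stays cheap.
                    predicate: static (node, _) =>
                        node is ClassDeclarationSyntax c && c.Identifier.Text.EndsWith("Controller"),
                    // Transform: runs only for nodes that passed the predicate; may use the semantic model.
                    transform: static (ctx, ct) => GetControllerSymbol(ctx, ct))
                // The incremental Where (not LINQ's) drops candidates the transform rejected.
                .Where(static m => m != null);

            // One output per controller, to show individual values flowing through the pipeline.
            context.RegisterSourceOutput(controllers, static (spc, symbol) =>
                spc.AddSource(
                    $"{symbol.Name}.Info.g.cs",
                    $"internal static class {symbol.Name}Info {{ public const string FullName = \"{symbol.ToDisplayString()}\"; }}"));
        }

        private static INamedTypeSymbol GetControllerSymbol(GeneratorSyntaxContext ctx, CancellationToken ct)
        {
            var symbol = ctx.SemanticModel.GetDeclaredSymbol((ClassDeclarationSyntax)ctx.Node, ct);
            // Hypothetical semantic check: skip abstract classes such as a shared base controller.
            return symbol != null && !symbol.IsAbstract ? symbol : null;
        }
    }
}
```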
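The code ends with Where(m => m != null). This is not the LINQ operator: it behaves in a similar way, but it is an extension method on IncrementalValuesProvider<T>, defined in IncrementalValueProviderExtensions. There are other similar ones, such as Select, SelectMany, Collect and Combine.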
Collect and Combine don't have direct counterparts in LINQ.
Collect can be thought of as similar to the materializing LINQ operators such as ToList or ToArray. Instead of obtaining items one by one, we get a single value containing all items produced by the provider, represented as an ImmutableArray<T>.
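Combine, as the name says, joins two providers. The result is made of tuples containing values from both providers. It behaves differently depending on the multiplicity of its arguments: combining two IncrementalValueProvider instances produces a single tuple, while combining an IncrementalValuesProvider with an IncrementalValueProvider pairs each of the many values with the single one, producing many tuples. There is no overload that takes two IncrementalValuesProvider arguments, so many values on the right-hand side have to be gathered with Collect first.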
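Both operators and both Combine shapes can be sketched with a hypothetical generator that produces each kind of output (Microsoft.CodeAnalysis.CSharp 4.x assumed). The sketch spells out the tuple element names Left and Right in its variable types so the shapes are visible:

```csharp
using System.Collections.Immutable;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp.Syntax;

namespace Demo
{
    [Generator]
    public sealed class CombineShapesGenerator : IIncrementalGenerator
    {
        public void Initialize(IncrementalGeneratorInitializationContext context)
        {
            // Many values: one class name per class declaration.
            IncrementalValuesProvider<string> classNames = context.SyntaxProvider.CreateSyntaxProvider(
                static (node, _) => node is ClassDeclarationSyntax,
                static (ctx, _) => ((ClassDeclarationSyntax)ctx.Node).Identifier.Text);

            // Collect: many values -> one ImmutableArray.
            IncrementalValueProvider<ImmutableArray<string>> allNames = classNames.Collect();

            // Values + Value: every class name is paired with the single assembly name (many tuples).
            IncrementalValueProvider<string> assemblyName =
                context.CompilationProvider.Select(static (c, _) => c.AssemblyName ?? "Unknown");
            IncrementalValuesProvider<(string Left, string Right)> perClass = classNames.Combine(assemblyName);

            // Value + Value: exactly one tuple.
            IncrementalValueProvider<(Compilation Left, ImmutableArray<string> Right)> whole =
                context.CompilationProvider.Combine(allNames);

            context.RegisterSourceOutput(perClass, static (spc, pair) =>
                spc.AddSource($"{pair.Left}.Owner.g.cs", $"// {pair.Left} belongs to {pair.Right}"));

            context.RegisterSourceOutput(whole, static (spc, pair) =>
                spc.AddSource("AllClasses.g.cs",
                    $"// {pair.Left.AssemblyName}: {string.Join(",", pair.Right.OrderBy(n => n))}"));
        }
    }
}
```

For a compilation named Shapes containing classes A and B, this produces three files: one per class from the many-tuples branch and a single summary from the one-tuple branch.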
In our case we use this operator with CompilationProvider, which holds a single Compilation value. We Combine it with the result of calling Collect on the IncrementalValuesProvider<INamedTypeSymbol> that holds the controller symbols we found earlier. Combine is the way to make the Compilation object available alongside the values of an IncrementalValuesProvider.
After adding those elements our generator will look like this:
The details of how the code is generated are hidden in the Execute method. It is important to stress that the code in the Initialize() method only registers the pipeline. The registered lambdas execute later, whenever the context provides relevant changes, and the resulting values are passed to the function given to RegisterSourceOutput. In our case, that function passes processing further to the Execute method.
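Putting the pieces together, a generator with this shape could look like the sketch below (Microsoft.CodeAnalysis.CSharp 4.x assumed). The class name, the "name ends with Controller" rule and the generated ControllerList class are hypothetical stand-ins for the article's code; Execute builds the output with a StringBuilder rather than the article's template:

```csharp
using System.Collections.Immutable;
using System.Linq;
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Text;

namespace Demo
{
    [Generator]
    public sealed class ControllerListGenerator : IIncrementalGenerator
    {
        public void Initialize(IncrementalGeneratorInitializationContext context)
        {
            IncrementalValuesProvider<INamedTypeSymbol> controllers = context.SyntaxProvider
                .CreateSyntaxProvider(
                    predicate: static (node, _) => node is ClassDeclarationSyntax c && c.Identifier.Text.EndsWith("Controller"),
                    transform: static (ctx, ct) => ctx.SemanticModel.GetDeclaredSymbol((ClassDeclarationSyntax)ctx.Node, ct))
                .Where(static m => m != null);

            // A single value: the Compilation paired with every controller found.
            IncrementalValueProvider<(Compilation Left, ImmutableArray<INamedTypeSymbol> Right)> compilationAndControllers =
                context.CompilationProvider.Combine(controllers.Collect());

            // Registration only; the lambda runs whenever its combined input changes.
            context.RegisterSourceOutput(compilationAndControllers,
                static (spc, source) => Execute(source.Left, source.Right, spc));
        }

        private static void Execute(Compilation compilation, ImmutableArray<INamedTypeSymbol> controllers, SourceProductionContext context)
        {
            // The Compilation is available here, but this simple version does not need it.
            // Distinct guards against partial classes, whose declarations map to the same symbol.
            var names = controllers.Select(c => c.ToDisplayString()).Distinct().OrderBy(n => n);

            var source = new StringBuilder()
                .AppendLine("internal static class ControllerList")
                .AppendLine("{")
                .Append("    public static readonly string[] Names = { ")
                .Append(string.Join(", ", names.Select(n => $"\"{n}\"")))
                .AppendLine(" };")
                .AppendLine("}")
                .ToString();

            context.AddSource("ControllerList.g.cs", SourceText.From(source, Encoding.UTF8));
        }
    }
}
```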
After some examination you will see that, in the case of our simple generator, the Compilation is not used in the Execute method. Many more advanced examples use the Compilation in this final step to obtain additional information. I've used this pattern as an opportunity to explain how the Combine and Collect operators work.
The most relevant part here is the SourceProductionContext and its AddSource() method. The method accepts a hint name for the output file and the generated source text, which in our case is a template with the values injected into it. The template is nothing sophisticated in our example: it is just a class with a static method that provides the text, with placeholders for the values supplied by the generator.
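Such a template could look like the sketch below. Everything in it is an assumption: the class and method names, the namespace parameter and the attribute routes, which were chosen so that the generated controller would answer at the /IncrementalMetadata/incremental/controllers path used later in the article. It is plain string work with no Roslyn types involved:

```csharp
using System.Collections.Generic;
using System.Linq;

namespace Demo
{
    // Hypothetical template: text with placeholders for the values extracted by the generator.
    public static class IncrementalMetadataControllerTemplate
    {
        public static string Render(string rootNamespace, IEnumerable<string> controllerNames)
        {
            var list = string.Join(", ", controllerNames.Select(n => $"\"{n}\""));

            // "new string[]" (not "new[]") keeps the output valid even when the list is empty.
            return $@"using Microsoft.AspNetCore.Mvc;

namespace {rootNamespace}
{{
    [ApiController]
    [Route(""[controller]"")]
    public class IncrementalMetadataController : ControllerBase
    {{
        [HttpGet(""incremental/controllers"")]
        public string[] GetControllers() => new string[] {{ {list} }};
    }}
}}
";
        }
    }
}
```

Inside Execute, the rendered text would then be handed to context.AddSource together with a hint name such as "IncrementalMetadataController.g.cs".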
Looking at the template, we can easily tell what the generator actually does: it provides the very useful functionality of listing the controllers defined in our ASP.NET application. It also shows that most of the work done in the generator is extracting the values to combine with the template.
How do we know whether our generator works? All you need to do is clone the app from the repo and run it. You will notice that the IncrementalMetadataController is not defined anywhere in the source, yet after running the app and visiting https://localhost:7259/IncrementalMetadata/incremental/controllers you will get a response listing all of the defined controllers.
There is also a different way of verifying what was produced: setting the EmitCompilerGeneratedFiles and CompilerGeneratedFilesOutputPath MSBuild properties in the project file saves the generated code to disk. There is a caveat: the generator runs pretty much on every keystroke, but the files are saved only on build. To observe that behaviour, I have added a static method to the generated controller, named after the first controller in our app. If you rename the DummyController, the name of the static method on the IncrementalMetadataController should update immediately in the IDE, but not on disk; the file on disk is synchronised only after a build. I have noticed some irregularities in how this mechanism behaves, so I would not rely on it. I was surprised that, although it broke from time to time, it worked better in Rider (version 2021.3.3) than in Visual Studio 2022 Community (version 17.1.3).
Unfortunately, the code we write does not always produce the results we expect. How can we debug a source generator? It is a bit awkward. To break on generator execution, you need to add the following line to it: System.Diagnostics.Debugger.Launch();
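A common way to place the System.Diagnostics.Debugger.Launch() call is at the start of Initialize, guarded so that it fires only in Debug builds of the generator and only when no debugger is attached yet. This is a sketch with a hypothetical class name, not necessarily the author's exact code:

```csharp
using System.Diagnostics;
using Microsoft.CodeAnalysis;

namespace Demo
{
    [Generator]
    public sealed class DebuggableGenerator : IIncrementalGenerator
    {
        public void Initialize(IncrementalGeneratorInitializationContext context)
        {
#if DEBUG
            // Ask to attach a debugger when the compiler loads and initializes this generator.
            if (!Debugger.IsAttached)
            {
                Debugger.Launch();
            }
#endif
            // ... pipeline registration as usual ...
        }
    }
}
```

Remember to remove or disable the call before shipping the generator, otherwise every build that uses a Debug build of it will stop at the debugger prompt.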
When the generator executes, you will be presented with a prompt to choose the IDE to use for the debugging session.
When using Rider, make sure you have the correct Debugger option selected:
More and more developers are contributing their source generators to the .NET ecosystem as open source, and the list of available source generators keeps growing. I hope that after reading this article you will have enough information and resources to make use of this fantastic new tool, and maybe add a new entry to that list.
//by Jarosław Ogiegło - Senior Backend Engineer