As I have let myself criticize C++20 modules, this time I’ll try to sketch an idea of how modules could be added to C++ in a form that real C++ developers would find usable and desirable.
Change of plans!
Ok, I was trying to write this article first using the bottom-up method, starting from C-modules and explaining how to transit to the C++20 modules model, but I think top-down will be more interesting.
Before I start though, let me specify the overall goal of this whole project. It is not only to define what the final form of the source files should be and how the build system should look, but also how to get there from what we currently have. Therefore we need, of course, to define the build system both for source files already written as modules (“module-instrumented” source files) and for the currently used source files with header files (“module-uninstrumented” ones).
First of all, let’s consider what exactly we want to have as modules in C++ and why we need them.
Today, software in C++ is written to produce two kinds of targets:
- applications, that is, executable files that can be run in the system environment
- libraries, that is, reusable software packs that can be used by applications or libraries, or extended by other libraries
I think this has been largely ignored by the C++ standard, just like naming “header files” and “implementation files” explicitly. Although for most of the time it was not a problem that the C++ standard ignored them, in the case of modules this leads, at best, to a “half-design” – something that would then be “possibly implemented” by compilers, as long as any of them can make a “full design” out of it, each obviously incompatible with the others. Therefore let’s first try to look at the complete solution, something that, when ready, could be used in a daily software development job.
The software we write is rarely a whole application in a single file, so we have multiple files, each one compiled separately into an intermediate file, which then all get linked together to produce the target. We need some sensible nomenclature, so I’ll try to reuse names that already function:
- Target: an application or a library (the matter of shared libraries will be put aside for now).
- Project: a set of definitions to compile single source files and link them together in order to produce a target. Note that a project may produce multiple output files, but in a single project there can be only one target; if you want to have multiple targets in your build system, then you have multiple projects.
Note also that I’m not trying to redesign the C++20 modules from scratch – I’m trying to reuse as much as possible, even preferably go strictly along with the C++20 module definition, maybe at best adding some small improvement ideas. So let’s consider an example:
Let’s say I have a small project of an application that contains three files: m.cc, t.cc and u.cc. The m.cc file contains the main() function. How then does this function, or any other function inside m.cc, call any function from t.cc or u.cc? For that we need the interface of these files.
But not through header files – they are the biggest PITA here. We want to simply mark the real entities as accessible outside this file, that’s all. Modules give us this opportunity. Therefore we have these instructions in m.cc:
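Something along these lines (a sketch of the syntax proposed in this article):

```cpp
// m.cc – sketch of the proposed import declarations
import .t;
import .u;
```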
For now, ignore the dot before the t and u names – just pretend it isn’t there. This will be explained later.
Alright, but then what are these t and u here? This is the source file name without extension, sure, but binding it strictly to the filename doesn’t serve the language well. Therefore we also want to be able to declare it explicitly, as a name used the same way throughout the whole project. Still, a compiler must have some idea what to search for when this name is seen.
This name then designates a module form file.
There are several types of module form files, so we have:
- A module interface form file is the result of compiling the interface part only; the compiler only grabs the signatures of entities marked for export and makes a database out of them, which can then be used by other files.
- A module template form file is a special form of a module interface form file, which contains alternative entities or fragments thereof, whose form depends on the expansion of a preprocessor macro. These files are the direct form that can be installed among system files, just like header files today – although if there’s no need for macro-selectable alternatives inside, module interface form files can be used instead. Otherwise, the module interface form file needed for building a dependent project will be created as an intermediate build file based on the module template form file.
- A module form file is the result of compiling a single source file (which should contain a complete module or a complete partition). To create such a file you compile the source file, having already compiled the module interface form file for this very module (if one is needed), as well as all dependent module interface form files. The interface file for this module will be included in the resulting module form file, and the derivation and dependency information about the included interfaces of other modules will also be recorded in this file’s manifest.
So, the names used in the import instructions designate a module interface form file.
The module form files are the intermediate files produced directly out of the source files. Note though that we need to be able to compile a particular file without having the dependent module form files yet; in the case of a cyclic dependency between t and u it would otherwise be impossible – that’s why we first need to produce the module interface form files. Let’s say then that t and u use entities defined in each other (so they form a cyclic dependency) and finally m uses entities from t and u. Hence we need to first compile their interfaces only, so let’s imagine a set of required command lines for that:
c++ -mi t.cc #> produces: t.cmi
c++ -mi u.cc #> produces: u.cmi
c++ -mc m.cc #> produces m.cm; requires t.cmi and u.cmi
c++ -mc t.cc #> produces t.cm; requires u.cmi
c++ -mc u.cc #> produces u.cm; requires t.cmi
c++ -ma m.cm -o myprog #> produces executable myprog
Alright, but wait: why do we use only m.cm (the module form file for m.cc) to produce the application? What about the t and u modules?
It’s simple. They are dependent modules, as declared by the import instruction. And yes, this instruction not only requires loading the interface of the imported module, but also applies a dependency on that module form file. Hence all the information about all other files to be “linked together” with m.cm (speaking the language of the current build definition) is already in there. And yes, this way you don’t have any “separate header file to #include and object file to link against”. You have imported a module – this makes your module dependent on that module, both in the interface part and the linkage part. And yes, every unit in the project is a module (note of course that there must still remain a possibility to link *.o files, because libraries for older C++ standards and also for the C language should still be usable).
For a traditional approach using the make tool we’d then have:
%.cmi: %.cc
	c++ -mi $<

# Now according to dependencies (generated by 'c++ -mMM'):
m.cm: m.cc t.cmi u.cmi
t.cm: t.cc u.cmi
u.cm: u.cc t.cmi
%.cm: %.cc
	c++ -mc $<

myprog: m.cm t.cm u.cm
	c++ -ma m.cm -o myprog
Likely the rule for myprog can be generated as well – we don’t need the list of modules to be linked together, we just need the name of the main module, and all deps will already be there; but make must know about them in order to update things the right way.
Ok, just to explain what these options are: assume a somewhat hypothetical C++ compiler that can compile in modules. The following options are used for this c++ command (this is a complete list of options; some of them will be used in later examples – this includes also compiling “uninstrumented” source files, that is, those that do not use the module and export syntax):
- -mc: create a module form file out of the source file. This is similar to gcc’s -c option that creates an object file.
- -mi: create a module interface form file. There’s no sensible equivalent in gcc – maybe compiling a header file, which goes without specific options – but this option is to be used either for header files directly, if the old source layout is in use, or for an export-instrumented source file. The closest equivalent of the module interface file today is the precompiled header file.
- -ma: create an executable file out of the main module. This is similar to using gcc without build-specific options when passing source or object files. It might be that the command line doesn’t need any option, but it is used here for clarity.
- -mheader: specify the header file associated with the given source file when you compile an uninstrumented source file. In this case the option is obligatory, together with -mname.
- -mname: specify the module name for the currently compiled module. For uninstrumented sources this is obligatory, as there’s no other way to specify that name. For instrumented module sources it is optional and overrides the module name.
- -mdepend: specify the dependent module interface files. This is only for the case of compiling an uninstrumented source file that specified its source dependencies using header files instead of module names (no matter whether by #include or by import), or of compiling a module interface file from a single header file. It is necessary to record the module dependency information in the compiled module form file.
- -mroot: specify the directory that will be the toplevel for the source files given by path. It defaults to the current directory, but this option allows specifying the base directory relative to which the source paths should be. This is necessary when you compile instrumented sources and the default module name should use the file path pattern.
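To illustrate how the uninstrumented options would combine, here is a hypothetical compile of a legacy header/source pair (the filenames and the buffer.cmi dependency are invented for the example):

```shell
# Compile the interface from the header alone; the module name must be given
c++ -mi array.h -mname common.utils.array -mdepend buffer.cmi

# Compile the implementation, binding it to its header and module name
c++ -mc array.cc -mheader array.h -mname common.utils.array -mdepend buffer.cmi
```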
Now, why am I starting with this? Because we need a build system working with C++ units as modules (instead of *.o files) first, before we even start thinking about modules in the C++ syntax.
Ok then, what exactly is the *.cm file, and how much does it have in common with the *.o file?
So, likely in today’s systems the *.cm file will have to be an archive file that contains the *.o file inside, additionally the *.cmi file, and a kind of manifest file with dependencies and other information.
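Conceptually, such an archive might then unpack like this (a hypothetical layout, not any existing format):

```shell
$ ar t m.cm        # hypothetical: list the members of the module form file
m.o                # the object code, as produced today
m.cmi              # the embedded module interface form file
MANIFEST           # dependency and linkage information
```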
Now: the compiler can easily extract the dependency information (the -mMM option) by reading the file’s import instructions. The module form files required for linkage are then listed in the manifest inside the *.cm file.
Build system details: fixed directories and naming
It’s not as simple as it was before with plain *.o files, which you could even give whatever names you wanted. This time it’s serious: the module name is bound to the module form filename, and therefore the directory layout of the project also matters. This is because, likely, when you have source files placed in a directory, the modules will have to have directory-path names. Of course, you can configure your build to have intermediate steps: compile all files in a single directory, bind them together, and then bind those intermediate files with other intermediate files created the same way. Or you can build everything in one directory and just have the source files distributed throughout a specific directory tree. That’s why we need to decide something here – and that’s why this will already require a specific source-to-module binding. Unlike *.o files, the name of the *.cm file matters, because this is the name used in C++ source files with the import instruction.
We then need two directories to be defined – noting that we may also have shadow builds:
- The project toplevel directory. This is the directory from which the source path starts for the sake of module naming. Defaults to CWD; can be overridden by -mroot.
- The build target directory. This is the place where target files will be created as a result of compiling. Defaults to CWD; maybe some option could be used to override it (note that for *.o files you could simply override the output name with the -o option, which could also include the path; with module form file names the problem is that you might be able to override the path, but not the filename).
Therefore the most sensible rule for the default module name (that is, the one used when you have a module default; declaration inside) is the pathname as passed on the command line, relative either to the CWD or to the declared toplevel directory (the -mroot option in the examples). Note that this toplevel directory might need to be always explicitly defined in the case of shadow builds. So, this command line (assuming we simply have module default; declared inside):
c++ -mc ../common/utils/array.cc -mroot ..
will produce the file named common.utils.array.cm, and it will be accessible for importing under this name. Be careful here: the exact same file may resolve to a different module form file if compiled with a different toplevel path. How the path to the source file is specified doesn’t matter, though – if it is an absolute path, it will be reshaped relative to the toplevel directory, unless it is not reachable from the toplevel directory (without going uplevel), in which case the command reports an error.
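To illustrate the caveat, the same source file compiled with two different toplevel directories would yield two differently named module form files (hypothetical commands, run from the project root):

```shell
c++ -mc common/utils/array.cc                #> produces: common.utils.array.cm
c++ -mc common/utils/array.cc -mroot common  #> produces: utils.array.cm (same file!)
```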
The other way around: you may have an explicitly named module, so you have, for example, this declaration:
module common.utils.array;
There are two good ways to translate this into a module form filename: it could be common.utils.array.cm or common/utils/array.cm. There are various reasons to have the first or the second possibility, or even both. The compiler should allow either, or even a mixed form like common/utils.array.cm. Searching for these alternatives shouldn’t be a problem.
Note though that if you name your module the above way, as common.utils.array, it will always have this name and will always be identifiable by it, no matter where your source file is located relative to the command’s toplevel directory (note that you can also compile a file with an explicit module name even if its path contains uplevels).
The use of partitions
If you have seen the design of C++20 modules, you have also seen the idea of partitions. But C++20 also has a feature, available without partitions, that I don’t think is worth saving: a single module can be split across multiple files. That’s rather a wrong approach, and partitions would be a method to cover that need, among others.
A module partition is defined in a single source file, but it contains definitions that will belong to the module. Partitions cannot be nested, and the only module that may import a partition is the partition’s master module. For example, we can have a module common.utils.array (the master module) with separate parts defined in partitions named common.utils.array:static and common.utils.array:dynamic (note that one single source file must contain either a complete module or a complete partition).
The merit behind partitions is to allow splitting a single module into multiple source files when splitting into smaller modules is impossible – for example, because you have a big class with a lot of methods, or a big class hierarchy out of which only one class should be exposed as the interface.
Every module that contains partitions should also have a “master source” for the module, which declares the name of the module itself and should import all partitions. The import declaration for a partition doesn’t have quite the same meaning as importing foreign modules. For other modules, it simply declares a dependency on that module. Partitions, however, are not dependencies of the master module – it’s the other way around. That’s why this only defines a binding: it’s a declaration that this partition is an integral part of the module. Partitions can as well be declared export import, in case they provide their part of the interface. The interface for a partitioned module is a little bit more complicated.
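A master source for the partitioned module above might then look like this (a sketch; which partition gets export import is chosen arbitrarily for the example):

```cpp
// common/utils/array.cc – the master source
export module common.utils.array;
export import :static;   // this partition provides part of the interface
import :dynamic;         // this one is implementation-only
// ... base declarations shared by all partitions follow here
```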
The declarations provided in the master module are the base declarations for everything else in this module and all its partitions. Therefore the first thing you do is compile the “default interface” of the master module source. This interface is imported by default by every partition (that is, an import of the master module is assumed in every partition). Additionally, partitions may even depend on one another by using import :another_partition;. Therefore compiling partition files also requires compiling the interfaces of all partitions – and if there’s an export import in one of them, in the right order with regard to the dependencies. Then, when all interfaces are done, an interface integrator call binds all parts together.
As you have seen, processing the full partition name turns the : partition separator into the - separator in the filename. So, if you have a partition source file that declares module common.utils.array:static;, the compile command will produce the common.utils.array-static.cm form file. Note that if a source file is to contain a partition, the module name (with partition) must this time be defined explicitly (name deduction isn’t possible in the case of partitions). The interface filename for the main part of the interface of the partitioned module then has the -default suffix. So, summing up, with this setup you have the following source files:
- common/utils/array.cc – the master module source
- common/utils/array-static.cc – the :static partition
- common/utils/array-dynamic.cc – the :dynamic partition
Now we compile the interfaces in the following order:
- common/utils/array.cc -> compile the default interface to common.utils.array-default.cmi (this always happens when the compiler detects at least one partition import inside a module with no partition name)
- common/utils/array-static.cc -> compile to common.utils.array-static.cmi, dependent on the one above
- common/utils/array-dynamic.cc -> compile to common.utils.array-dynamic.cmi, like above
- Then call the integrator on all the above *.cmi files to produce common.utils.array.cmi
Compiling the implementation then simply requires the integrated interface and all dependent interfaces, as usual. Compiling and integrating all partial interfaces first is required for the case when there are interface dependencies.
Local modules
That’s one of many things that likely were forgotten by the designers of C++20 modules.
As you know, with the #include directive you could pass the argument as <file.h> to be searched globally, or as "file.h" to be searched locally (though with a fallback to global). That distinction doesn’t always make sense, but at least for visibility it is often required – at least to mark that a particular header file comes from the current project, not from some external library. Maybe not every project makes this distinction correctly, and some don’t at all – but still, it’s needed, and the lack of this feature may also make programmers stick to includes.
The sensible solution would be to use a leading dot in the name: if you use it, the module is local, otherwise it’s global. Other solutions are theoretically possible, like marking only global modules with a double dot (to resemble something like an uplevel), but then you’d also have to write import ..std.iostream, which doesn’t look good.
Note though an important difference from #include "file.h" – the local path to the header file there is relative to the source file being compiled, which means you can use any path your system allows, even with multiple uplevel specifications. In the module system this isn’t possible – that’s why the toplevel directory is needed on the command line.
So, if you do import .common.utils.array; it means that the form file should be searched for in the build target directory as the common.utils.array.cm file (plus the path-separator combinations). If you do import gnome.gtk.widgets; it will search for the form file in the installation directory for library interfaces – either something like /usr/include and other standard paths, or paths specified with something similar to the -I option.
Whether the module form files for a local module import declaration should also be searched for in the global path (as is the case with #include "file.h"), or exclusively in the local project path, remains to be decided and should be well thought through – but I personally see no reason for it. If you declare the local path, you want the module to be part of your project, not provided by some external library. Conversely, if you want it to be provided by an external library, you might at best want it provided in one of the “external directories”, with your private project’s directories simply added to that list. That’s the most sensible naming control practice.
On this occasion, let me remind you that many developers use the #include "file.h" syntax even though the actual result is that this file.h isn’t found in the same directory as the file including it, so the compiler falls back to the global search, including the directories specified with the -I option (this is actually quite a popular technique, especially when you keep header files in a directory separate from implementation files). This makes it no different from #include <file.h>, but programmers like this syntax just to make it explicit that what is meant is a file from the local project, merely located in some other directory – people wanted a shortcut for #include "../../include/file.h". This is actually a bad practice (I could tell you a lot about untangling long lists of include directory specifications when the #include "common.h" directive picked the wrong one out of about 7 completely different common.h files in the same project), but programmers often need this distinction, even if it is visual only. That’s why the syntax import .localmod; – referring to a module form file that should exist in the build directory where the compiler is compiling your file, as distinct from an external library (including the C++ standard library) – will be strongly desired.
With the local modules feature there’s one more concern: inside a project you might have various directories holding smaller “groups” of files, and then some bigger parts that rely on them. You could make these parts separate libraries (some projects do it this way), but if they aren’t required to be replaceable at runtime or by a single installation upgrade, it’s not worth the effort. It also wouldn’t be a good idea for a whole project to have to be compiled in a single directory. We then need, for a compile command line, one directory where it stores the currently compiled modules (by default one of the local module directories), but you should additionally be able to specify extra local module directories. I know, so far compilers have had only one -I option for header files, and it only extended the list of global paths; but I think – especially since module names have limitations – there should be separate options to extend the local module path as well as the global module path. Still the same idea: local modules are those in the current project, global modules are outside of it. This is important because these limitations have so far forced projects to split into separate libraries even within the frame of one application.
There’s also one more merit to distinguishing local and global modules: not every module is meant to be used publicly by the library user. A library that is to be used by other projects through importing modules would like to expose only several distinct modules as public ones, and only those should be visible outside the library as importable. Only within a single project should all modules be accessible for importing. Of course, there’s always the question: could this be “hacked” if needed – that is, can I use private modules from a library because my project needs them? No problem: all you have to do is compile the dependent project in a way that lets you reach its modules as local ones, and then add the directory where they are accessible to the list of private module directories – this way you will still be able to import them, of course, as local modules.
The general module syntax
The module syntax in general isn’t going to change much from what is defined in C++20, although there are some distinct changes. Optional parts are surrounded by ?question marks? like this.
This is the general structure:
module; // optional: Global Module Fragment
...
... (only preprocessor directives directly used)
... (definitions provided here stored on the local files' database)
...
?export? module ModuleName; // starts import section
... (import section)
...
... (contents - up to the end of file)
Explanations for this fragment:
The module; declaration, if the GMF is present, must be the very first declaration in the file, possibly except comments. If the GMF is not used, then module <name>; must be the first declaration; otherwise it ends the GMF section and starts the import section.
The import section starts after the named module declaration and ends with the first declaration that isn’t an import. From that point on, no other import declarations are allowed for the rest of the file.
The named module declaration may have the form module default;, in which case the name of the module is defined by the compiler (the compiler may also reject this request). The named module declaration must have the export keyword added if the module is going to export an interface.
The import declaration can have the following forms:
import <filename>; // parse the "filename" like in include and attach declarations
import "filename"; // like above, but first search in the same directory
import ModuleName; // import the interface declarations for the source file
export import ModuleName; // import the interface declarations to the interface of this source file
The import instruction with a filename argument may be used without having a module section in the same file, but then the module declaration is required to be at the beginning of the parsed source file – although it may also appear at the global level between other toplevel declarations. More practically, the import instruction with a filename argument may be contained in a header file. It does merely the same thing as #include, except that it works in a separate environment; that is, declarations provided in this file will be accessible to the source file that declares this import, also recursively.
The import instruction with a module name can be placed exclusively in the import section. If the export declaration is present, it creates an interface dependency; otherwise only the implementation is dependent on that imported interface. Note that the export-import must be used when exported declarations create parts of the interface that depend on parts of the imported interface (there are strict rules about which declarations are part of the interface in which case).
In a module declared as export you can mark various entities as export and this way make them part of the interface. There are several rules for how this is done.
- The namespace declaration is level-transparent; that is, it only defines the namespace for the contained declarations, while namespaces themselves cannot be exported. Unnamed namespaces also cannot contain exported declarations. Note that declarations in unnamed namespaces (just like static ones) are not visible outside the same file, even to other partitions of the same module.
- Type names declared using typedef or announced (with e.g. struct MyType;) become part of the interface if they are used anywhere in the interface. Otherwise they can be used if they are explicitly marked export. This also concerns complete types provided by interfaces of other modules that are not export-imported. If these types cannot be used as incomplete in the interface, an error is reported.
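A sketch of these rules in the proposed syntax (all names here are invented for the example):

```cpp
export module common.utils.search;

struct Impl;                     // announced type: usable as incomplete in the interface
typedef int Index;               // becomes part of the interface because used below

export Index find(Impl* where);  // pulls Index and Impl into the interface implicitly
export typedef double Scalar;    // exported explicitly, even if not used above
```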
- Normal functions (with no special storage class, or extern by default), if marked export, have only their function signature declared as part of the interface (not the body). Functions declared inline are part of the interface as a whole (together with the body).
- Global variables that are exported will be visible with an incomplete declaration of their type. If they are initialized with an initialization expression, this expression will not be part of the interface. The type of such a variable must be explicitly provided in the module that wants to use it, but it need not be imported if the global variable is not used.
- Complete types (struct/class/enum) can be marked export, and this way they become part of the interface as a whole. Additionally, two new features can be considered:
- Normally a method defined inside the class is treated as inline. If you declare a method extern, then it will be just like one defined outside the class (that is, the body will not be part of the interface, even if defined inside the class). You can still define the method outside the class (the old way) with the same effect.
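A sketch of the proposed extern-method rule (invented names):

```cpp
export module geometry;

export class Polygon
{
public:
    int sides() const { return n; }   // inline: body is part of the interface
    extern double area() const        // extern: only the signature is exported,
    {                                 // even though it is defined in-class
        // ... full computation stays out of the interface ...
        return 0.0;
    }
private:
    int n;
};
```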
- If you declare the whole class extern, then it will form something like the “pimpl pattern”. Every method in such a class will be extern (that is, with only the declaration being part of the interface), and all fields and derived classes must be private. Friends are still allowed, but even friended classes and functions have no access to fields. Special fields must be marked extern to be present in the public or protected section or accessible to friends; those are reached using special accessors that do not rely on the structure offset and can survive changes in the class. In extern classes all exported names are part of the interface, but not the class layout – therefore you can’t do sizeof on such a class, for example.
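An extern class could then be sketched like this (hypothetical syntax; note that because the layout stays hidden, sizeof(Engine) would be an error in importing modules):

```cpp
export module engine;

export extern class Engine
{
public:
    void start();          // every method behaves as extern
    extern int status;     // special field: reached via generated accessors,
                           // not via a structure offset
private:
    int internal_state;    // plain fields must be private; not in the interface
};
```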
- Preprocessor macros can be exported into the interface. This is done through the #export directive, used either as a replacement for #define or with the syntax #export (name1, name2...) to declare export for existing symbols.
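The #export directive might then be used like this (a sketch; MAX_BUFFERS and DEBUG_MODE are invented names):

```cpp
export module config;

#export MAX_BUFFERS 16   // defines the macro and exports it to importers

#define DEBUG_MODE 1
#export (DEBUG_MODE)     // exports an already defined macro
```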
Free ordering and open classes
Just before we pass on to the smooth transition rules, let’s also show the additional opportunities given to us by modules defined in one single file, without separate header files, importing (not including) other modules.
The module system provides a unique opportunity to free yourself from the order of definitions. It’s always been annoying in C++ that you have to provide a function signature (a repeated definition that must be kept in sync with the one in the function definition) before the code that calls it, just to be able to place the function after this code – or to reorder the function definitions just to avoid doing it.
Imagine then that you don’t have any #include in your source file, so your source file is plain, reaches the other files’ interfaces through import, and contains only its own entity definitions. Imagine then that the compiler may go multi-pass, where one pass only reads the signatures of the functions – or, for a class, reads the class’s contents, derived classes, fields and method headers, but not method bodies (nor initialization expressions for global variables and in-class fields). That’s actually what the compiler has to do anyway when compiling in the interface mode.
If the compiler can do it this way, there’s no need to preserve the order of declarations. That is, you don’t have to extract functions’ signatures into function declarations just to paste it before a function that calls it. Go simply:
module default;
import std.threads;
import std.functional;
import .mainapp;
int main()
{
start(mainapp::startup);
return mainapp::loop();
}
void start(std::function<void()> fn)
{
::mainthr = std::thread(fn);
}
std::thread mainthr;
This way you are free from any ordering. This should be the rule after the module <name>; declaration; let’s say you can still add some extra things before module (in the global module fragment), and there the order matters. After the module declaration, however, all signatures are read separately from bodies, so you don’t need function declarations at all, not even forward declarations. Even for a global variable, or a constant, the name and type are read and remembered in the first pass, and only the second pass reads the initialization expressions. All that is required is that the compiler can syntactically distinguish the “alleged” (because not yet defined) type name from the object name – and the variable declaration should suffice for this distinction (except for the infamous case of the X Y( Z() ); declaration, which has to be taken care of separately).
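That infamous case is C++’s “most vexing parse”: a declaration with parentheses that the grammar is forced to read as a function declaration. A small self-contained demonstration (X and Z are made-up names, as in the text):

```cpp
struct Z {};
struct X {
    int n;
    X() : n(0) {}
    explicit X(Z) : n(1) {}
};

int demonstrate() {
    X a(Z());   // most vexing parse: this declares a FUNCTION named 'a'
                // (returning X, taking a pointer to a function returning Z),
                // NOT a variable initialized with a Z!
    X b{Z()};   // C++11 brace initialization: unambiguously a variable
    return b.n;
}
```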
Note, of course, that not every declaration can be unordered. That’s because the C++ syntax relies on the basic classification of a symbol, that is, whether it is:
- an unknown symbol (doesn’t exist in the database yet)
- a type
- an object
- a template
So declaring a variable of a type whose declaration is only provided later is theoretically possible, but you first need to give the compiler some information that this particular name designates a type – for example by using class/struct/typename before the type name.
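Today’s C++ already has a narrow form of this: an elaborated-type-specifier lets you use a class name before its definition exists, precisely because the struct keyword tells the compiler the name designates a type (Node is a made-up name):

```cpp
// 'struct Node' announces that Node is a type even though its definition
// only appears later; a pointer parameter needs no complete type.
struct Node* find_head(struct Node* list) { return list; }

struct Node {
    int value;
    Node* next;
};

int head_value(Node& n) { return find_head(&n)->value; }
```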
This may go even further: open classes. That is, you can declare additional methods for a class without declaring them in the class body first, as long as they are defined in the same module as the class. The class would probably need some special explicit modifier to enable this. The feature would have limitations, though:
- You can’t provide an “open method” for a virtual method or virtual method override.
- You cannot define a method with a name that would eclipse a name from the base class. That could create confusion, because in C++ an eclipsing method also eclipses all overloads of that name.
- The access specifier (public/private/protected) must be specified with this method.
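The concern in the second limitation comes from how name hiding already works in C++ today: a method in a derived class eclipses every base-class overload of that name, not just the matching one. A small illustration:

```cpp
#include <string>

struct Base {
    std::string f(int)    { return "Base::f(int)"; }
    std::string f(double) { return "Base::f(double)"; }
};

struct Derived : Base {
    // This single overload hides BOTH Base::f overloads:
    std::string f(const char*) { return "Derived::f(const char*)"; }
    // d.f(3.14) no longer compiles unless you add: using Base::f;
};
```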
There shouldn’t be any problem with the interface – even if you define a public method in a partition (although this may go against readability). When the compiler generates the interface file, it should take the whole class into account, including open methods, and the interface viewer should present the class with all these methods visible as if they were declared inside it, although of course only with their signatures.
Smooth transition
And now we come to the most important part: we need a method for a smooth transition from the current “C modules” style into the C++ module style. That’s why the first thing to do was to define sensible build-system rules for C++ using modules. In this system, every source file compiles into a module or a module partition, whether or not the specific module syntax is used in the source files. Sources in the current form, with separate header files, were referred to earlier as “uninstrumented” sources.
Why is this smooth transition needed and important?
Ok, that’s kinda pathetic, but I feel that I need to explain several things to the people who were designing the modules for C++20.
We are today in a situation where hardly anything in C++ is created from scratch – even if you skip system and standard libraries. Code is widely reused, every software company has lots of legacy C++ code, and switching to another language is not easy and not always possible.
The C++20 modules design, as defined by the ISO standard, offers no possibility at all for a smooth transition (or, at best, leaves this up to the compiler designers, who seem to have no idea how to do it). You would have to translate every file in the project to the new form; nothing can be done partially or kept shareable with older standards – once switched, there’s no going back. And that’s the problem. Given what I said in the previous paragraph, the possibility of a smooth transition is crucial to the adoption of C++ modules. Either this is provided, or C++20 modules will at best be used by a few enthusiasts and will never reach the software business.
The very first thing that would have to be done is to define the new build system. That’s why I started with the build rules using modules already in the C++ syntax; now it should be adapted to the existing files of C++ projects that have no C++20 modules syntax in them.
This is a huge task for the high-level build systems – such as Autotools, CMake, or my own project, Silvercat. These systems should let you add just a single option to your project definition, take the source files as they were before, and generate the appropriate command lines to produce the intermediate files in the module style.
The first thing needed is to make this new build system available for the old sources as well. Recall that we had the following instructions in our first example:
c++ -mi t.cc #> produces: t.cmi
c++ -mi u.cc #> produces: u.cmi
c++ -mc m.cc #> produces m.cm; requires t.cmi and u.cmi
c++ -mc t.cc #> produces t.cm; requires u.cmi
c++ -mc u.cc #> produces u.cm; requires t.cmi
c++ -ma m.cm -o myprog #> produces executable myprog
That was for instrumented sources. Let’s now say we have uninstrumented sources provided as implementation and header files. The following instructions should be used to compile them:
c++ -mi -mheader t.h -mname t
c++ -mi -mheader u.h -mname u
c++ -mc m.cc -mname m -mdepend t -mdepend u
c++ -mc t.cc -mheader t.h -mname t -mdepend u
c++ -mc u.cc -mheader u.h -mname u -mdepend t
c++ -ma m.cm -o myprog
Note that the last instruction hasn’t changed. This is because it already works on module form files, and those are produced just as they would be from the instrumented source files.
Of course, this doesn’t look very good, especially since you need to mention the module name and also the dependencies. Unfortunately, the compiler won’t be able to autodetect module dependencies when the interface is imported as a file, no matter whether you use #include
or import
for it. Only when you use import
with the module name can this dependency be autodetected and only then will the -mdepend
option be superfluous.
Note also that it should be possible to supply multiple dependencies, so the syntax allowed here should be like that of the -Wl
option: either multiple arguments separated by commas, or one argument at a time, with multiple single-argument instances of the option allowed (the latter is used in the example above). For makefiles the second form is actually more convenient, because replacing spaces with commas in make, although possible, is rather complicated.
Mixed sources
This build mode for uninstrumented sources will actually have to be available forever. That’s not only because this form is required during the transition: in lots of projects the source code is shared between projects, one of which must stay with an older standard, while we’d simultaneously like to use it in our project, written in the C++ module style.
Hence the build system should tolerate “traditional” C++ files that keep their own interface in a header file and use #include
for other header files – files that must stay this way – while we still want them to fit into the build system with modules.
Let’s then try to discuss the smooth transition step by step.
Step 0: Source shaping
You should simply think of single implementation files as single modules; in some cases you might resort to having one module in multiple files (partitions). So, in the “C modules” system, source files may be shaped only as the following cases:
- A single implementation file (to become a single module) and its header file.
- Multiple implementation files having one common header file.
- A single header file without an implementation file (“header only”).
- A single implementation file without any interface (usually the file with
main()
function).
If your project contains any differently shaped sources, they must be reworked to fit these explicit layouts. This is the first thing to do: in order to switch your project to modules, you have to start “thinking modules”, that is, have an idea of how to assign particular files to modules and define this assignment.
Once this is ready, you can switch to the new building rules.
Step 1: New building rules
I wanted to have some example project – it’s hard to find anything in the middle between completely simple and unimaginably complicated, so I finally settled on the https://github.com/JanSimek/voice-over-lan project that I found on GitHub.
In this project we have quite a simple structure and a build defined in qmake. When you take a look at it, you can see that the structure is very easy to translate – the main.cpp
file with the main()
function, and then all the remaining code split into implementation and header files. So, for a Makefile we’d have the following rules:
QTPKG := QtCore QtGui QtWidgets QtMultimedia
QTHEADERS := `pkg-config --cflags $(QTPKG)`
QTLIBS := `pkg-config --libs $(QTPKG)`
# Compile header files into module interface files. Module names
# are taken explicitly from the file names. Note that we don't
# compile interface for main.cpp - nothing is dependent on it.
voiceio.cmi: voiceio.h
voicesocket.cmi: voicesocket.h
buffer.cmi: buffer.h
messenger.cmi: messenger.h buffer.cmi voiceio.cmi voicesocket.cmi
%.cmi: %.h
c++ -mi $< -mname $(basename $<) $(QTHEADERS) $(addprefix -mdepend ,$(filter-out $<,$^))
main.cm: main.cpp messenger.cmi #<<< 'messenger.cmi' as identified by header 'messenger.h'
c++ -mc main.cpp -mname main -mdepend messenger.cmi $(QTHEADERS)
# General scheme for modules without dependencies and matching a simple name scheme
%.cm: %.cpp %.h
c++ -mc $< -mheader $(basename $<).h -mname $(basename $<) $(QTHEADERS)
# This takes ^source ^its header file ^module name to be given to this file
messenger.cm: messenger.cpp buffer.cmi voiceio.cmi voicesocket.cmi
c++ -mc messenger.cpp -mheader messenger.h -mname messenger $(addprefix -mdepend ,$(filter-out $<,$^)) $(QTHEADERS)
# You need -mdepend option because messenger.cpp still uses #include with a header file
# Application executable
QtIntercom: main.cm voiceio.cm voicesocket.cm buffer.cm messenger.cm
c++ -ma -o QtIntercom $^ $(QTLIBS)
Note one important thing here: these rules cannot be generated in any way. The definition of the modules in this form is ephemeral – it exists only in the build file and is not known to the source files at all.
And that’s it for now. This build system should work just as well as the traditional one. The original high-level build definition was in qmake, but it was easy to prepare the right structure with make, with the right header/library options provided by pkg-config. What is important is that you have the whole structure of modules defined here without using module declarations explicitly in the code. This is the first step, and the build definition may actually stay the same the whole time, until you finally get rid of all the header files. It doesn’t hurt that, for example, -mdepend
is still in use; it will simply become superfluous in the future, although if you, for example, merge messenger.cpp
and messenger.h
into a single messenger.cpp
file, you’ll have to change this build rule as well. High-level build systems would have to add some special feature that provides in-build module declarations from the source and header file for every module, to be disabled one by one as each module is converted to a single source.
Ok, but let’s explain how this still matches the whole theory of modules. After all, in the module definitions we had the export
keyword to mark things as visible outside, and the export import
variant of the import to mark things that are re-exported – but now we have all of this without any declarations. How does it work in this intermediate form?
That’s where the -mheader
option comes in. Everything in the file that you declare as -mheader
in this compile command is treated as the interface of the module whose name is given in the -mname
option. This way, everything that this header file declares is exported. This obviously limits your control over which parts get exported, but then, the visibility of the public parts remains the same as it was in the C-modules style.
Of course, such a header file can itself include another header file – but this cannot really be tracked. A dependency like this must be defined as a dependency on a module interface form file, which exists only as a definition in the build system. If you add a local module dependency by including a local header file, you have to update the module interface form file dependency manually. Fortunately, this becomes far less troublesome once you switch to real modules.
Step 2: get rid of #include
We now need to replace every occurrence of #include
with import
. As defined for the C++20 modules, import
can also take an argument identical to the one for #include
and with the same meaning. Note also that an #include
directive with a local path resolves it relative to the source file being compiled, so you can freely arrange the top-level directories in the build definition as you need – only when a source file is moved to another location would the include path need fixing.
Note also that you can freely mix #include
and import
, but #include
should not be allowed after the module <name/default>;
declaration (that’s why the global module fragment starts with the module;
statement). This means you should fix any cases where you #include
a file somewhere in the middle of an implementation file – some set of functions moved to another file but compiled together with the implementation file (I have often seen this as a *.inl
file). This would have to be solved somehow, although the partition feature should be a good enough replacement for such cases. In effect, you should simply translate every #include
into import
, and leave #include
(in the GMF) only if you absolutely must.
Here we have then the beginning of the voiceio.cpp
file:
#include "voiceio.h"
#include <QAudio>
#include <QAudioInput>
#include <QAudioOutput>
#include <QDebug>
VoiceIO::VoiceIO(QObject *parent) : QObject(parent)
{
QAudioFormat format;
...
Now we simply change #include
to import
and nothing more for now.
module default;
import "voiceio.h";
import <QAudio>;
import <QAudioInput>;
import <QAudioOutput>;
import <QDebug>;
VoiceIO::VoiceIO(QObject *parent) : QObject(parent)
{
QAudioFormat format;
Why is this step important? Because the project may unconsciously rely on so-called “implicit inclusions”: the compiler doesn’t complain about an entity used, but not defined, in a particular header file only because it was included from another header file earlier in the same implementation file. The import
instruction, however, doesn’t work this way. Every imported file is compiled separately in the “default environment” (that is, with particular preprocessor macros defined), and the resulting database is incorporated into the file that declares the import
. This is a good opportunity to get rid of these bad practices. Note that the macro guards in the header files will not work properly, as a header file will be compiled without any macros defined – but the macro guards will not be necessary anyway, as import
also ensures that the same file imported multiple times resolves to the very first import. It is also an opportunity to get rid of yet another bad practice – including the same file by specifying different paths, even from other header files. There is no problem if you refer to a file using a different relative path (because the path is always recorded as absolute, with symbolic links resolved); the problem arises when you take a file of the same name that actually resolves to a different path.
So, this step should be quite easy to do; after it, you should recompile the whole project and fix any errors that arise from any mess-up with includes.
Note that this time you need the module declaration at the beginning of the file. It is required before any import instruction can appear.
Just a side note – I know that the early implementation of C++20 modules in gcc requires you to “precompile” particular files explicitly and make the resulting file (*.gcm
) available; otherwise even a simple import <iostream>;
won’t work. That’s the wrong approach. The compiler should do exactly the same thing it did with #include
, with the only difference being that the file is compiled in separation. If any “compiled version of the header” is required, the compiler should take care of it on its own, as it did earlier with the “precompiled headers” feature. If that is not possible, it should simply grab the raw text file and process it as before, just in the separated environment.
Note also that some of the includes will likely have to stay in this form: if the Qt library used here isn’t provided with modules (a separate large topic, to be taken up later), then the only way to use it is through its original header files, so an import like import <QAudio>;
is likely to stay for a long time, if not forever. Not to mention C libraries – hardly anyone would want to provide a special C++ module wrapper for them.
Step 3: Translate the module imports
Now that everything still compiles correctly with import
using the header files, let’s change the imports to use the module name – the name given to the module on the command line (at least for now). Reminder: with the current build definition you already compile the project as modules, which means that module form files are produced, and therefore they can also be imported as modules.
Note: you can’t remove the “import of your own interface” in the implementation files yet. We are only changing imports that reach the interfaces of other modules.
In our project we have a somewhat unusual situation: a header file includes other header files. We cannot always be sure whether the intention was to “use” the interface or to “reexport” it – but by including a file you always do the latter, so let’s settle on that; it can always be fixed later. Here is, then, the beginning of the messenger.h
file:
#ifndef MESSENGER_H
#define MESSENGER_H // keep the macroguard until getting rid of H completely.
import <QUdpSocket>;
import <QAudio>;
import <QAudioInput>;
import <QAudioOutput>;
import <QDebug>;
export import .buffer;
export import .voiceio;
export import .voicesocket;
class Messenger : public QObject
...
Note that even though we don’t have a module declaration here, this is still a header file to be imported into an implementation file, which already has the module declaration. Header files may also be imported by other header files, but even then they are ultimately imported from an implementation file, where the module declaration must be provided.
Note also that we have imported some local names as modules even though they don’t use the module syntax yet. That’s not a problem either: the module names are given on the compile command with the -mname
option, and the compiler should then reach the module interface file through the local names in the import
instruction.
Note that there shouldn’t be any problem with replacing the import declarations with module names only partially, or only in some files. Moreover, switching the sources themselves into the module form will be much more complicated. It is therefore even recommended to change the import
form into the module name only for those modules that you plan to switch to single-source in the first shot.
Step 4: Switch “uninstrumented” module sources to single-source form (starting with the least dependent or independent modules).
Ok, as the messenger
module in this project seems short enough to serve as an example, let me paste here the complete source file that would result from transforming messenger.cpp
into the single (“instrumented”) source.
module messenger;
import <QUdpSocket>;
import <QAudio>;
import <QAudioInput>;
import <QAudioOutput>;
import <QDebug>;
import .buffer;
import .voiceio;
import .voicesocket;
export class Messenger : public QObject
{
Q_OBJECT
public:
explicit Messenger(QString address, QObject *parent = 0);
signals:
public slots:
private:
QUdpSocket _udp;
// Voice
VoiceIO* vio;
VoiceSocket* vso;
Buffer* buf1;
Buffer* buf2;
};
Messenger::Messenger(QString address, QObject *parent) : QObject(parent)
{
quint16 buffersize = 0;
buf1 = new Buffer(buffersize);
buf2 = new Buffer(buffersize);
vso = new VoiceSocket();
vio = new VoiceIO();
if(address != "")
{
vso->connectToHost(QHostAddress(address), 30011); // QHostAddress::LocalHost
qDebug() << "Connecting to " << address;
}
else
{
qDebug() << "No peer address specified. Will only play sound from others";
}
vso->setEnabled(true);
vso->startListen();
// Voice in
connect(vio, SIGNAL(readVoice(QByteArray)), buf1, SLOT(input(QByteArray)));
connect(buf1, SIGNAL(output(QByteArray)), vso, SLOT(writeData(QByteArray)));
// Voice out
connect(vso, SIGNAL(readData(QByteArray)), buf2, SLOT(input(QByteArray)));
connect(buf2, SIGNAL(output(QByteArray)), vio, SLOT(writeVoice(QByteArray)));
}
Note one important thing here. It’s just a pasted header and implementation file; the method definition is still separate from the class instead of being inside it. That shouldn’t be a problem – you should be able to use either form, or even use open classes, as I described earlier.
OTOH we should be able to define even long methods inside the class. That would require a change in C++, applicable only when a source is defined as a module: the inline
keyword, and the rule that a method defined inside the class is implicitly inline, could be dropped. The compiler should compile every function body as callable and decide on its own whether to call a function or expand it inline. Inlining should be configurable in the compiler, but the user should also be given an opportunity to decide about it.
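For reference, this is the current rule the text proposes to drop: a member function defined inside the class body is implicitly inline, which is what makes header-defined methods legal across multiple translation units today (Counter is a made-up example):

```cpp
// Today: next() is implicitly inline because it is defined in-class;
// the proposal would instead let the compiler decide about inlining.
struct Counter {
    int next() { return ++n; }   // implicitly inline in today's C++
    int n = 0;
};
```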
There’s one more thing that deserves special attention here. In the earlier header file, messenger.h
, the other interfaces were pulled in by e.g. export import .buffer
, but now we have just import .buffer
. As a result, the interface file for the messenger
module will not reexport the interfaces of buffer
, voiceio
and voicesocket
. But that’s actually not necessary. Let’s now show the main.cpp
file translated into the module form:
module main;
import <QCoreApplication>;
import <QCommandLineParser>;
import .messenger;
export int main(int argc, char *argv[])
{
QCoreApplication a(argc, argv);
a.setApplicationName("voice-over-lan");
a.setApplicationVersion("0.1");
QCommandLineParser parser;
parser.setApplicationDescription("Voice over LAN");
parser.addHelpOption();
parser.addVersionOption();
parser.addPositionalArgument("address", QCoreApplication::translate("main", "Address of the counterpart"));
parser.process(a);
QString address("");
if(!parser.positionalArguments().isEmpty())
{
address = parser.positionalArguments().at(0);
}
Messenger msgr(address);
return a.exec();
}
So, was that export import
really necessary? Looks like not. The only part of the messenger
module being used in the main
module is the Messenger
class, so the main module needs the interface of the messenger
module, and its linkage, but none of the other things this class uses need to be visible. It would be another matter if, for example, this class had a method in its public interface that uses a type defined in those dependent modules, and using the interface of this class required the full definition of that type (if it only uses a pointer or reference to the type, which requires just an incomplete type declaration, it is still not necessary). But that’s not the case here. If the Messenger
class hypothetically had a public method with, for example, a const VoiceSocket&
parameter, it still wouldn’t be necessary for the messenger
module to do export import .voicesocket
. A module using the Messenger
class may never even need to call this method. If it does, however, and must somehow create an object of the VoiceSocket
class in order to call it, then that module must – independently of the messenger
module – do import .voicesocket
on its own. On the other hand, if the class defines an inline method that uses the details of a given type, then its module must indeed reexport the module that defines this type.
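The pointer/reference rule used above can already be checked in today’s C++; here is a stripped-down, self-contained sketch (the class bodies are made up, only the names follow the article’s example):

```cpp
class VoiceSocket;   // incomplete type: enough for pointers and references

class Messenger {
public:
    // Mentioning VoiceSocket by reference does NOT require its definition:
    void attach(const VoiceSocket& s) { sock = &s; }
    bool attached() const { return sock != nullptr; }
private:
    const VoiceSocket* sock = nullptr;
};

// Only code that constructs or inspects a VoiceSocket needs the full type:
class VoiceSocket {
public:
    int port = 30011;
};
```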
As for the Qt library itself, there could be another concern – Qt additionally has a special generator producing extra files that implement the things marked in a class by the Q_OBJECT
and signals
markers. That shouldn’t be a big problem in this case: previously this required a header file so that the class definition could be extracted from it; now it will be the whole module source file. The generated source will likely have to be made a module partition, with its name declared on the command line, until the moc
generator gets an option to create the source as a module partition.
Problems and the environment
There’s one thing you likely won’t be able to change in C++ development: the so-called “environment”.
It is simply the set of macro definitions known before the C++ source file is first parsed by the preprocessor. Some of them are defined internally by the compiler. Others can be provided on the command line (e.g. with -DENABLE_LOGGING
you provide a macro named ENABLE_LOGGING
with the value 1
). Additionally, the current form of the preprocessor allows them to be defined explicitly before an #include
directive, so that the macro is intentionally visible inside. There is the infamous example of the inttypes.h
header file, which requires the user to do #define __STDC_FORMAT_MACROS
before including it so that a particular part gets defined; fortunately the C++ standard didn’t adopt this method, but you can guess that this technique is still widely used in commercial projects, mostly via the command line (that is, a header file has declarations guarded by a condition check on a macro provided on the command line). We can’t simply say “sorry, not supported, and no replacement is available”, because such a project would then simply stay with pure #include
.
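A minimal illustration of such command-line parametrization, reusing the ENABLE_LOGGING name from the example above:

```cpp
#include <string>

// The same source yields different code depending on whether the build
// environment defines ENABLE_LOGGING (e.g. via -DENABLE_LOGGING):
#ifdef ENABLE_LOGGING
#define LOG_TAG "logging on"
#else
#define LOG_TAG "logging off"
#endif

std::string log_mode() { return LOG_TAG; }
```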
A different environment used for compiling a particular source file may mean a different form of the intermediate file created from it, since the preprocessor may expand the raw C++ file differently – and that includes the C++ syntax itself. And macros are everywhere, including the C++ standard library, which also relies heavily on the underlying C headers from system libraries.
This simply means that the module interface form files (*.cmi
in the examples) cannot be a complete replacement for header files – they can replace header files only within the current build environment. It can be even worse: header files in their current form may be parametrized by changing the environment before inclusion, and the same header file configured with different environments (as included in different implementation files) – either through definitions before inclusion or through command-line macros – may result in a different “preprocessed text” of that file in different translation units of the same build. That can also render the precompiled headers feature unable to work properly.
For this reason, if the module system is to go anywhere beyond the current project (that is, if it is to be used for library distribution), we need some form that looks like the header file before preprocessing. Note that the normal workflow for getting a header file into the final code (say you have an inline function defined in the header) is:
Header source file | preprocess -> preprocessed file | compile -> compiled db | link -> exec
How to place the module form files here? Currently the only sensible method is:
Header source file | preprocess -> preprocessed file | precompile -> precompiled db | link
And this precompiled form can be made available as the interface. The problem is that it is already preprocessed, while what we need is something that could be called “a module template form file” (MTFF). Here “template” means something to be instantiated through preprocessor macros, so that it can be used this way:
Header source file | templatize -> MTFF | compile -> compiled templated db | preprocess ->…
How to achieve this? Well, what matters is not exactly how to create these files, but how to use them – how to incorporate them into the whole application–library system. If a compiler has a problem preparing something that is a “precompiled, but not preprocessed” header file, it can simply fall back to providing the original header file, possibly generating one (by assembling it from the exported definitions in a module source file) when you already have the single-source form. That header file would then simply be included raw in the module template form file.
How can I imagine such a thing existing – a form that has passed compilation, but needs to be run through the preprocessor to resolve into the final linkable form? It’s not impossible; I can at least imagine it as first running through the preprocessor, but not the usual one. A special preprocessor is needed, which also parses the C++ entities and does not just “resolve” macros in place, but instead generates alternatives. How?
Preprocessor macros are used in two ways – in source code intended to be C++, as a macro creating a replacement, and in conditionals enabling or disabling particular sections. So, if there is a macro to be resolved, it is resolved in place, as usual (note that an undefined macro simply remains unreplaced and looks like a function call – nothing new here). But conditionals using macros would be resolved into multiple definitions, exposed to be resolved later by a macro. If a macro definition is always provided, but may have a different form depending on other macros, then everywhere this macro is used, alternatives should be generated. The code to be compiled is then provided as a set of possible alternatives, to be resolved by supplying the right values of the macros used in the conditionals. Frankly, I don’t know if this is feasible – it’s a fresh concept with no feasibility study – but that’s more or less how it should look.
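A trivial case of the conditionals such a preprocessor would have to keep unresolved: both branches below would have to survive into the template form, to be selected later by the importing side’s macro environment (USE_DOUBLE and real_t are made-up names):

```cpp
// In a module template form file, BOTH alternatives of real_t would be
// kept, and the choice deferred until the importer's environment
// supplies (or omits) USE_DOUBLE.
#ifdef USE_DOUBLE
typedef double real_t;
#else
typedef float real_t;
#endif

inline unsigned long real_size() { return sizeof(real_t); }
```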
And such an interface file should be able to be created so that it can be attached to the library.
Object files and libraries
And here is one of the biggest problems, partially hooked up with the templates and the infamous “export template” feature from C++03.
The object files, as defined earlier for the C language, cannot hold any form of templates. They also can’t hold classes, structures, function signatures, or even constants, unless these are kept as variables and perhaps use particular platform features marking them as constants.
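This is the familiar consequence that template definitions must be visible at every instantiation site: an object file only ever holds the already-instantiated, concrete symbols (clamp_min is a made-up example):

```cpp
// A template has no representation in a C-style object file: nothing is
// emitted until it is instantiated with concrete types. Its definition
// must therefore travel in the interface (today: the header file).
template <typename T>
T clamp_min(T v, T lo) { return v < lo ? lo : v; }
```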
It’s not a problem at all if you compile everything statically, even if you make a library. You can define a completely new format, which will allow keeping also entities like classes, templates, and compile-time constants, and having a module form file that would be used simultaneously as an object file to be linked and as an interface file. Also, you don’t have to worry about upgrades, nor even variants – when you have a newer version of a library, your dependent application would have to be recompiled anyway, in order to take advantage of the upgrade.
The biggest problem here will be with shared libraries. And it’s not possible to simply move all C++ features into the world of shared libraries. At least in the very beginning you’d have to agree that only several types of entities are allowed to be “shareable”. Others can only be depended on statically, which means that if your application is using a shared library, it can only use the next version of a shared library from the predefined library compatibility line and the correct variant, and of course not all features may undergo a shared upgrade. For example, all publicly used structures must stay the same throughout the whole compatibility line; you can at best modify the code inside the functions that operate on them. In the case of a structure, you can at best change the name of a field, if it was declared as not being in use (kind of a stub or something), but it must remain at the same position and with the same type. This is a problem known as “ABI compatibility” and it’s a big problem in development today, about which the C++ standard doesn’t care. Moreover, the known “pimpl” pattern (making the class that is the real API have only one field that is a pointer to some internally defined class, with all methods defined in the implementation file, so that only those see the true fields of this class) was one of the methods not only to avoid the need of recompiling when changing the structure of the class, but also to make the class flexible for any internal changes. In a “pimpl class” all you must preserve throughout the whole compatibility line are the existing methods. In a true class, you are only allowed to add new methods; everything else, especially fields and their order, must be preserved.
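As a reminder, the “pimpl” pattern can be sketched as follows (a minimal, compilable illustration; the class names are made up, and the usual header/implementation split is shown in one file for brevity):

```cpp
#include <memory>
#include <string>

// widget.h part -- the public API exposes one opaque pointer and no
// data members. Fields can be added to Impl in a new library version
// without breaking the ABI of Widget (its size and layout never change).
class Widget {
public:
    Widget();
    ~Widget();
    void setName(const std::string& n);
    std::string name() const;
private:
    struct Impl;                  // defined only in the implementation file
    std::unique_ptr<Impl> impl_;
};

// widget.cpp part -- the real fields live here, invisible to client code.
struct Widget::Impl {
    std::string name;
};

Widget::Widget() : impl_(new Impl) {}
Widget::~Widget() = default;
void Widget::setName(const std::string& n) { impl_->name = n; }
std::string Widget::name() const { return impl_->name; }
```

Only the existing method signatures have to be preserved along the compatibility line; everything behind the pointer may change freely.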
Much has been written and said about ABI compatibility. And the problem isn’t that these compatibility rules aren’t enforced or granted – the problem is that the current software management tools do not give any possibility to take care of ABI compatibility. At least I saw something in macOS, but actually this problem is much bigger in C++ than in C. The C++ modules feature is a unique opportunity to take care of this problem. If you have a module form file from the old version of the library, there should exist a method to do an automatic ABI compatibility check – that is, something that simulates the old compiled application being used together with the upgraded version of the library. That should be doable, as long as the module form file contains enough information to perform that check. That’s why module form files are superior to object files: they do contain the information that would otherwise be wiped out in the library (it actually doesn’t matter whether static or dynamic, because both are based on object files).
Modules in the service of libraries
Modules might still be useless if we can’t find a way to make our project use an external library still by importing modules and take all the advantages from them. But to allow this, we can’t distribute libraries exactly the same way as before, that is, with header files and *.a and *.so files that will then be referred to by the -l option.
Let’s imagine then first that we’d like to distribute a static library – this will be simpler and more general as development package.
We first need to distinguish the module importing method for modules within one project from importing modules from a foreign library – within the source project of the library. We can have interfaces for modules inside the project, but these are kind of “private interfaces” – which maybe sounds ridiculous, but what is meant here is that these are interfaces and modules used in the project, not to be exposed to the library user; the exposed part is the “public interface”.
And also don’t forget the final form – we want to have everything defined directly in the source implementation file and only have particular functions or types marked as being part of a public interface, while others might be exposed as an interface for other modules within the project. Only those parts will go to the public interface and therefore will be stored in the module form file of the public interface.
There are many problems to solve here and several solutions we could use. Note that you can’t make a public part of an existing module in any sensible way. A module is compiled into a module form file and is visible under the name of the module. The public interface must at least have a different name, or selected modules must be public (as a whole).
The simplest way to solve it – although not quite nice for the programmers (because it simply does the same thing as the “good ole headers”) – is to create a separate module that would only become the public interface. You know which modules constitute a public interface, and only the interface form files of the public interface (you don’t need full form files because those will be in the library) will be copied to the library package, together with the library.
Another possibility is to create a special name with a partition-like suffix, except that it is public. You can then mark any toplevel-capable entity (meaning classes, structures, or functions, but not, for example, methods) as export public, and this way it will be added to the public interface. The compiling command of the module interface will then produce two *.cmi files, including one with a -public suffix (as this is a keyword, you can’t make it a partition name). This module will be loaded by default if you use its name with import, even though it has this suffix.
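A hypothetical module source using this marking could look as follows (this is the syntax proposed in this article, not a feature of any existing compiler):

```
// file: engine.cpp – source of module "app.engine" (proposed syntax)
module app.engine;

export public class Connection { /* ... */ };  // goes into the public interface
export int reconnect_delay(int attempt);       // project-internal interface only

// Compiling the interface would produce two files:
//   app.engine.cmi         – for imports within the project
//   app.engine-public.cmi  – shipped with the library; `import app.engine;`
//                            from outside the project loads this one
```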
Don’t forget, however, what was already said about the current form of header files and the module template form files – namely that you can still use preprocessor directives and, depending on an externally defined macro, provide one or another version of a function signature (for example), or decide whether to provide it at all; the conditional must then also be kind of “repeated” in the interface and allow a user to decide about things in their own application. This could be avoided if module form files were created only out of modules in your project and landed in your build directory, but this is a file to be distributed.
And no, this can’t be avoided. Because it’s currently possible to decide about these things, this possibility must be provided; otherwise many projects won’t be able to move to modules. On the other hand, it’s not that complicated – if the compiler is unable to provide the module template form file in an already compiled form, it can provide it as a hidden header (that is, the whole header file will be put in as an ingredient of the module interface form file). If the header isn’t available (you already have a single-source form), the compiler should simply generate it.
The resulting (static) library then should provide:
- The static library file. This should be something different from the current *.a files (because those are supposed to contain only *.o files); there must be some marker of a different format, and although such files should also be able to contain *.o files, they will likely need to contain some detailed code as well (possibly in source form) that should be “re-imported” into the application’s source files per request of the interface through import. It might be something like a *.ca file, which would still be just an *.a file, but with different contents. Linkage against such a library would be specified in a different way – the directory could be specified more or less like today with the -L option, but the name of the library shouldn’t have to be specified – this information should be provided in the module interface form files.
- The module interface files. You can have as many of these files as you want and you can fine-grain the interface of the library as you wish, just all of them should be provided with the library as a package.
The package can as well contain multiple libraries, and there’s no need to have any outside binding of the module interface files per single library. Each module interface file should simply carry information about the library it interfaces.
Now: when the compiler imports an interface file which, for example, contains a function template, then this template will have to be physically provided in the library, not in the interface file (the interface file doesn’t provide any bodies). If it can’t be in compiled form anyhow (and note that it will still have to be a form of the module template form), then it should be provided as source.
Linkage against such libraries – provided they are still done the traditional way, that is, there is a set of *.o files there and additionally some, possibly precompiled, headers – should be done by having them used where necessary. Note that in the single-source and module-aware build system we only have an import instruction to load the interface (and this would also include the whole header file inside the module package), and then you compile the main module of your target, which will do linkage of the appropriate *.o files when necessary – which ones is already specified in your main module.
Likely your module-based library will also need something like a main module. Your application can then import everything that the library offers by importing, for example, gnome.gtk.all. The module with this name doesn’t have to contain any definitions, just import all modules that this library provides. As for how this can physically be provided, there are two potential possibilities:
- Distribute the library as a package of modules. Every module has its name and contents, for both importing and linkage – linkage will be done automatically anyway. The whole package might be dissolved in a particular directory and provide a pkg-config entry that will instruct the compiler where to get the module path. This will contain module form files that will serve both as import source and as linkage. Note though that this way it would be impossible to use shared libraries.
- Distribute the library in the library form and – in place of traditional headers – the module template form files (which, in simplest form, could be raw header files containing also static definitions, that is, non-shareable ones, see below). This is not exactly necessary for static libraries, but it’s then the only way how shared libraries can work, and it’s also doable for static libraries.
Whichever method will be chosen, it actually doesn’t matter for the library user. All you still have to do is to make the compiler know the directory containing the module form files and you use them in your code by importing.
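The “main module” of such a library could then be nothing more than a list of re-exported imports (hypothetical syntax, reusing the gnome.gtk example from above):

```
// file: all.cpp – source of module "gnome.gtk.all" (proposed syntax)
module gnome.gtk.all;

// No definitions of its own – importing gnome.gtk.all gives the
// application every public module of the library at once.
export import gnome.gtk.widgets;
export import gnome.gtk.events;
export import gnome.gtk.render;
```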
As the first method is quite obvious, let’s try the second method.
This will likely require synthesizing the header files and creating some precompiled form of them (provided the compiler can do it) in the preprocessor-templated form. If you are lucky and don’t need it “templated”, it can suffice to simply create the module interface form files. Note that for entities that are “not shareable” it will have to keep things “header only”, which will in this case be either true headers or the “preprocessor-templated precompiled” form.
These form files will serve as the interface and simultaneously the provider of the “static only” entities, while the “potentially shareable” entities will be provided as before. The library name will be provided in the manifest in the module interface form file, so it’s enough that one source file imports this module through this module form file and it becomes dependent on the whole library.
Motivation for the shared libraries
Before we discuss how this new build system would fit in the shared libraries, let’s check why we need them. There are actually three reasons as to why we need shared libraries, that is, what advantage they have over static libraries:
- The TEXT sharing
- The file sharing
- Separated upgrade
The TEXT is one of memory sections of a program. The program, when started, gets three main memory ranges to use:
- the stack, used for subroutine calls
- the DATA section – a dynamically extendable region that is initially allocated for global variables and can be extended by request of the dynamic allocation.
- the TEXT section – the memory that is read-only (an attempt to modify it crashes the program) and contains the assembly code for the procedures to execute. Sometimes it’s also used to store constant data there.
So here the idea of TEXT sharing is that if the same set of procedures is used by multiple applications, there could be only one instance of them in memory, while multiple applications call them. In the case of static linkage, every such procedure would have one instance per application in memory – at least no one has so far developed any way to identify identical procedures in statically linked applications. Not only that – if you have them provided in a shared library, they are guaranteed to be the same on the same system, while after upgrading one application with a new library version you may potentially have multiple versions of the same function, even if it doesn’t make any sense.
A similar situation is with file sharing – if you have the same library linked to multiple applications, you’ll have this one library multiplied as a part of these applications, even if the linker optimizes the process by not attaching functions that the application doesn’t call.
And the separated upgrade: If you upgrade a static library, nothing will be changed until an application that uses it is itself recompiled and reinstalled. If you upgrade a shared library, all applications using it take advantage from it, while remaining installed as they were before.
With the new rules for a C++ shared library we may have to resign from some of these advantages, but we can still take advantage of the others. For example, TEXT sharing can only happen if you have one solid piece of a procedure to load into memory, and it has exactly this form everywhere. This may not be possible with a function template, which expands to a different body in every instance (although we may still have the compiler easily detect whether multiple instances can indeed be linked to the same compiled version). But even if you can’t do TEXT sharing, you may still take advantage of file sharing and separated upgrade. That’s something for the future, of course, but the build system and the project compiling system can already be prepared to handle these cases, and the system will align to it when ready.
Shared libraries with C++ modules
It was quite easy to define C++ in such a way that all inline functions can be expanded inline, or, if compiled, it’s the compiler’s problem how to implement them (it might make each solid implementation static and multiplied in the code). Hence function templates could be shared only as inline functions, or – since C++11 – not the templates themselves, but their particular instantiations could be shared as well. Likely, until some “dynamic library resolution method for templates” is created, sharing should be available only for solid definitions, hence only those parts will be concerned when upgrading a shared library. In a single-source exporting C++ definition there must then be a way to mark particular parts as capable of being put into a shared library. Some clear marker would be desired to explicitly define whether a particular entity is going into the shared or the static part. It was easy in the C++ we have had so far, where externally visible parts were static when defined in the header (as a header should not contain any parts that would be visible in the *.o file, as long as it is not “adapted” for itself by the implementation file), and shared when defined in the implementation file. But with a single-file definition for everything, you might have an exported solid function defined just as well as an exported function template – the first of which can easily be put into the shared part of the library, while the latter, with the currently available C++ implementations, could not.
My first idea for this would be to obligatorily mark every exported function that is not shareable as inline. Exported classes (and other type definitions) would need a static modifier added. In other words, if you define a class in a single source file, this class is local (not visible outside the module). If you want to make it visible, you have to mark it as static export. Plain export (without static) might be available, but only if the compiler supports it, and whether it does might also depend on the platform, the system, and simply the compiler’s capabilities (none of today’s compilers and platforms is capable of doing it). A class that is marked export (but not static), or a function template that is marked export (but not inline), will have to be resolved at runtime in a special way. For example, if you use fields of a class, the code would contain dynamically resolvable markers with field names, and an advanced C++ runtime linker would resolve these named field references into the appropriate field offsets by reading the information from the shared library. Such solutions do not exist today, at least not for C++, and we’ll likely not see them soon, but this is the only way these things may work when a class is shareable through a shared library. Until then, all exported entities that can’t be resolved during shared linkage must have a special marker that defines that they are not shared-linkage-resolvable.
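Putting these markers together, a single-source module under this proposal might look like the following (hypothetical syntax; the plain, unqualified export forms would require the runtime linker support described above):

```
// file: proto.cpp – source of module "net.proto" (proposed syntax)
module net.proto;

export int open_channel(const char* addr);   // solid entity: goes into the
                                             // shared part of the library

export inline int version() { return 3; }    // not shareable: header-like,
                                             // expanded in each client

export static struct Config {                // type: statically shared only,
    int timeout;                             // layout frozen along the
};                                           // compatibility line

export struct Session { /* ... */ };         // shareable class: would need
                                             // runtime field resolution --
                                             // no compiler can do this today
```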
As for the export static class statement, this static can be tolerated with only an extra warning (no software developer would be surprised that a class can’t be shared), which can also be turned off in the compiler options. But export inline would clearly differ from just export when applied to a function, and in the simplest implementation it would put the body into the (extra) header or the object file, respectively. A function template might simply be required to be made export inline, while if you want to make use of shareable template instantiations, you should define the template without exporting it and then export explicitly (without inline) the single instantiations, which in this case must be declared anyway.
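The last case – exporting single instantiations instead of the template – already has a close analogue in today’s C++: explicit instantiation definitions (C++11), which is roughly what the proposed export-without-inline of an instantiation would map onto. A sketch with illustrative names:

```cpp
// A template kept internal to the library...
template <typename T>
T clamp_to(T v, T lo, T hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

// ...with only chosen instantiations compiled into the library's
// *.o/*.so part. In today's C++ these are explicit instantiation
// definitions; under the proposal, exporting an instantiation
// (without inline) would play the same role.
template int clamp_to<int>(int, int, int);
template double clamp_to<double>(double, double, double);
```

Clients then link against these fixed, solid instantiations instead of expanding the template themselves.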
This is extremely important for library versioning, upgrades, and keeping up ABI compatibility. If you have a shareable class whose details are resolved through runtime linkage, the only thing required for the ability to upgrade the library is that all methods defined so far stay as they are, as well as all fields with their current names and types – but fields may be reordered and new fields can be added (or maybe even some feature could be added that would allow renaming fields while keeping the old name as an alias, through using, for example). Also, if you have an application that calls the library’s function to fill a structure, the new library can modify the structure by adding new fields and everything still works. The application will use the new structure without even knowing that it has some extra fields. In the current implementation, where structures can only be statically shared, this is impossible – the structure with added fields will not be ABI-compatible with the old structure definition, hence the application will have to be recompiled, or otherwise the call to the function will render undefined behavior due to reaching out to fields that are outside the old version of the structure.
But that’s only for the future. No one will create such a thing soon, so shared libraries must stay what they are – collections of only solid entities, that is, functions (including method implementations). They do contain global variables as well, but this is only because the functions stored in one *.o file may use variables from another *.o file, so they are shared between calls. And therefore only those solid entities can be upgraded with the upgrade of the library, while all static parts must remain either completely unchanged, or get at best a “compatible upgrade” – that is, for example, a class may only have new methods added (including virtual methods – because adding a virtual method actually modifies the runtime part), but not new fields. There should also be a mechanism available that takes the compiled version of the earliest backward-compatible library and checks the differences against the upgraded version, verifying that the static entities differ only in things that do not change the module’s ABI. And, obviously, exported static entities must be marked. The compiler may display only warnings when this isn’t used, as it’s just to make the programmer aware of it.
After having this, we should have the library defined simply like before: the compiler-linker creates the shared library *.so file and the module template form files (*.cmt) or module interface form files (*.cmi) containing the library’s interface. For static libraries there are still two forms possible: either the above cmi/cmt files as interfaces plus an *.a file containing the solid part, or just as well simply cmt/cm single files in the package that will be requested as interface and linkage source by the compiler for the library client. That method could be available only as static so far, and only for applications using C++20 and themselves compiled in modules. This may also follow Java’s method with their *.jar files, so you could similarly have *.car files that are really zip files containing module form files (although if compressed, a *.caz extension may fit better :D).
But it is still desired that we have a library that is maybe itself written using C++20, but available for applications that must use an earlier standard, or that even interface to C applications. Let’s then talk about smooth transition here as well.
Modules in the service of public library
And here is the biggest problem, and something likely not even taken up by the designers of C++20 modules.
We want to have a library that is distributed two-fold:
- traditionally, like before, with a development package containing header files and optionally a static library, and a runtime package with a shared library
- some better way, although only available to the newer standard
Actually, for a long time the distribution will have to use both. In order to finally abandon the C++ preprocessor and its key role in interfacing and library distribution, you need to keep the library useful for the older standards, and make it available for the newer one, so that at some point the old way can be abandoned. But the old way will never be abandoned until the new way is widely adopted.
As this has to work just as well in the case of libraries that are already written in a single-source module style, the header files to be distributed with the development package of the library will have to be generated. You no longer have any header files, and the older standard can only use header files to access the library’s interface. Obviously, the compiler must be aware of the oldest standard it should target for the generated header file and which features not to use. For having the header file for the old-standard library distribution, you will obviously have to have a separate module that collects all the things required for interfacing and marks the exported parts appropriately with an export extern "C" block, which will allow generating a header marking extern "C" for C++ and just pure function names for C.
Alternatively, of course, if you want to make an interface for the C language and prefer to complete the header file yourself, you can simply keep your header file as is, and provide an implementation file that defines direct functions as declared in the header file; it will still be compiled in modules, but using the “intermediate” way to compile an uninstrumented file as a module.
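The hand-written variant of such a C interface is the classic wrapper pattern – a flat extern "C" API over an internal C++ implementation. A sketch (the srt-like names are illustrative only, not the real libsrt API):

```cpp
// Internal C++ implementation, not visible in the C header.
namespace srtimpl {
    class Socket {
    public:
        int open_count = 0;
        int open() { return ++open_count; }
    };
}

// The flat C API as it would appear in the distributed header:
// plain functions, an opaque handle, no C++ types in the signatures.
extern "C" {
    void* srt_create()         { return new srtimpl::Socket(); }
    int   srt_open(void* h)    { return static_cast<srtimpl::Socket*>(h)->open(); }
    void  srt_destroy(void* h) { delete static_cast<srtimpl::Socket*>(h); }
}
```

A C application sees only the three function declarations and the opaque pointer; the C++ layer behind them can evolve freely.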
Variants and versioning
Ok, let’s think about this for a while: why would your application require a specific version and variant of the library?
Simply: because your application requires particular features that are only provided in a certain variant (if a library can exist in variants) and are available only in a given version. All of these things must obviously be defined by the library provider, but then you distribute the application without the required shared library, or you even distribute it in sources to be compiled on the target system – it should be known in advance whether the user’s system provides the library in the correct variant and version. Obviously, though, you distribute an application (or a dependent library) compiled in a form matching the appropriate variant and version line of the required library. With sources you have more flexibility; you may be variant-independent, or require a limited variant selection. In short: if you have a source distribution, you need API backward compatibility; if binary – you need ABI backward compatibility. And changing even one detail in the variant selection creates a different ABI. API compatibility defines the ability to even compile your dependent target with this library. ABI compatibility makes a particular library’s binary distribution usable with the current binary distribution of the dependent target.
These things make it hard to distribute a shared library and to use a shared library by the applications in the system. But some things are more avoidable and some less. Variants should at best be used only when absolutely necessary and, if possible, several details should be selected at runtime, so that multiple variants are already available in one library – because the version line is defined only within the frames of one “static variant”.
Now, versions can be upgraded only within a single variant. A variant may be defined by a set of parameters, so it’s the kind of thing where, in theory, you may resolve to multiple parameters and their infinite combinations; in practice, most variants align to particular operating systems (so it’s impossible to have more than one of them in the system), and very few refer to some specific dependent libraries implementing a particular feature in different ways (although usually only one of them is in use in a particular system). Therefore you will rarely have to worry about variants. Problems will mostly be with versions and the version line.
What is the version line? It is started by a certain version of the library, stays with the same variant (a single variant is a single line, so forking off a new variant starts a new base version), and continues with preserved backward compatibility (on currently known platforms, ABI compatibility). A new version of the library may be installed by replacing the old version of the same line, and all targets compiled for an earlier version of this library can work just as well with the new one.
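On today’s ELF platforms this “version line” concept roughly corresponds to the soname convention (a self-contained sketch, GNU toolchain assumed; libfoo is a made-up name):

```shell
# The soname "libfoo.so.1" names the compatibility line: every
# backward-compatible release keeps it, so installed applications
# keep working after an upgrade of the library file.
echo 'extern "C" int foo_answer() { return 42; }' > foo.cpp
c++ -shared -fPIC -Wl,-soname,libfoo.so.1 -o libfoo.so.1.2.0 foo.cpp
ln -sf libfoo.so.1.2.0 libfoo.so.1   # name resolved by the runtime linker
ln -sf libfoo.so.1 libfoo.so         # name resolved by the -lfoo link step
```

An incompatible change would start a new line – libfoo.so.2 – installable side by side with the old one.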
But that’s rather a song for the future and things that need consideration from the operating systems and their packaging and installation systems. For now we need to focus on how to write a library in C++ using the single-source modularization and fit the resulting library into the frames of the current (well, of course, prepared mainly for C libraries) operating systems.
Creating libraries in a distributable package
So, we need to have these things compiled as before; of course, in the build system we do have the module form files produced out of the source files, but finally this should produce the library, possibly both static and dynamic, in two possible forms:
- For the older standards: the library file and, separately, header files. Header files should be generated out of declarations that are marked export. Entities marked export static will be put as a whole into the header (actually it’s not imaginable to remain accessible with the old standard through header files while having structures dynamically exported, so for generating the old-type package non-static export should not be supported), and similarly functions (including methods) marked export inline should be put whole into the header file, while for those marked only export, there should be only the function header in the header file. Macrodefinitions should also be put there if they are exported.
- For C++20 module users: the library file is also created just like above, but instead of header files you have shorthand module form files (template and interface files, whichever fits better) for every exported module. A possibility might exist to do some specific marking of the public modules, but this is only to make things easier for the build definition (so that you can simply walk through all module form files and take into the package only those that are marked public – it might be a good idea that you declare a module as public module, and if so, the module name gets the partition-like suffix -public). If any macrodefinition dependencies are detected, the compiler should create (with the -mi flag) the module template form file (*.cmt); otherwise it will be a module interface form file (*.cmi). Of course, the module template form files must be “instantiated” through the macrodefinitions, and they will produce the corresponding module interface form files in the build directory.
In a system using GNU installation manners, the library files will be installed in the same place as usual. Module form files will have to find some new place to install; neither /usr/lib nor /usr/include is appropriate, especially as we’d like to have one directory for C++ modules that would contain interface and template form files, as well as, possibly in the future, the full form files. For the sake of further explanations, let’s say it will be /usr/modules.
So, for the interim period we’d need the library to contain both the usual header files installed in /usr/include and the module form files in /usr/modules, as well as the libraries in the appropriate place. The module form files that are of interface kind (that is, not full) should also contain information about the name of the library they interface.
So, as a raw example, let’s say we have a library like… ok, I really tried to avoid it, but for the sake of the article I’ll go the simplest way for me: libsrt. This library provides an interface for the C language, although it’s written in C++. It has one main header file srt.h and several secondary headers for special purposes. The interface implementation file is srt_c_api.cpp, and this one would become the public module. All gets compiled into the libsrt.a library file (and similarly the shared one), and the installation contains the headers srt/srt.h, srt/access_control.h, and srt/logging_api.h. There would then have to be three public modules defined in this library.
In the stage of “old C++ sources, new compiling rules”, it would compile as the old sources, and header files will be normally put into the installation directory. The static library will be made by extracting all *.o files embedded into *.cm files and binding them into an archive. For a shared library they will be linked together into a shared library. The module form files produced out of the public modules will then be stored in /usr/modules/srt as srt.cmi, access_control.cmi, and logging_api.cmi. The two latter can simply be produced out of uninstrumented header files, while for srt.cm the command will have to define the module name srt.srt, and pass srt_c_api.cpp as the implementation file and srt.h as the header file.
Now let’s say we have an application consisting of two source files: application.cpp (the main module) and utils.cpp. The traditionally created program will then include the header files at the beginning of the application.cpp file:
#include <srt/srt.h>
#include <srt/access_control.h>
and the source files of the application will be compiled as (with the help of pkg-config):
c++ -c application.cpp
c++ -c utils.cpp
c++ -o application application.o utils.o -lsrt -lcrypto -lssl
while the new C++20 application (assuming it is already module-instrumented) will do:
module default;
import srt.srt;
import srt.access_control;
and the source files of the application will be compiled as:
c++ -mi utils.cpp
c++ -mc application.cpp
c++ -mc utils.cpp
c++ -o application -ma application.cm
Now, you might ask: is the library specification simply skipped in case of “with module” compilation? No, it is merely superfluous. Your application is still free to mix importing by modules and including header files, whichever method the particular library provides; by using the module specification with import, though, you automatically get the library dependency information, so it need not be specified for linkage.
This works for both static and shared libraries, and the resulting application, if it uses shared libraries, has exactly the same shared library dependencies. This method should also be applicable at every step of the gradual module adoption, as shown in the beginning.
The future: installing a module package
Of course, there’s another possible method of installing a C++ library built with modules: install it directly as module form files. There are, however, several problems to solve here:
- Public module interface files should be marked somehow so that three cases can be distinguished: accessing a module from the current project, accessing a public module from an external library (possibly installed in a custom directory), and accessing a private module from an external library – the last of which is disallowed. Shipping a library as before, just with extra module interface form files, satisfies this condition already, while for direct module installation it would have to be solved somehow.
- The module should be loadable as a shared object. Therefore a C++ application must have an appropriate format in which various markers will be filled with the appropriate “meaning” – and it’s not only about filling in the call of a function defined in a separate file, but also things like instantiating a template or finding the appropriate offset of a structure’s field. Yes, that would have to be done by a C++ dynamic linker at the moment the application is run.
These are problems to be solved in the file format on the particular platform, not so much in the C++ language – although the earlier proposed public module statement should be useful here as well. It might be that – as suggested already – separate options should be available to specify directories with modules used in the current project, and separate ones for directories with modules from external libraries. This allows developers to do tricks here – but on the other hand, a separate syntax for local and global modules is also proposed here. Hence, with the local module syntax you can reach any module, while the global syntax will only load public modules.
Simple naïve implementation
Implementing such a system does not require having the compiler loadable database already. If you have it, so much the better, but a simple implementation can be based on the simplest things. What the compiler must be capable of is, of course, header file generation. We then have the following rules here:
- If you have a class declaration (or any other type), it is ignored (considered a local type definition, visible only in the implementation file).
- If you have a type definition with the export modifier, this type definition will be read whole and placed in the header file. If it is a structure definition and has any methods defined inside, they will be grabbed as well. Functions defined outside the class will remain only as signatures.
- If you have an exported function, only the signature goes into the header file. If there is an inline modifier, the whole definition is taken into the header file.
- Templates work similarly to classes and functions; the important thing is that if there is a non-inline function or method, the only allowed sharing is through extern template, that is, only explicit instantiations can be exported.
The compiler should add options to generate the header file this way, and also to configure its name and the macro-guard name, by default INC_MGEN_FILENAME_H.
A naïve implementation of the interface compilation may do the following:
- Generate the header file from the implementation file, as described above.
- Create the archive file that contains the following files:
- The generated header file
- The manifest file
The manifest file should have a 4-byte header, as special-type files in POSIX normally do, which defines its format. The first two characters should then be #!, so let’s follow them with ^C, and then an NL, after which the rest is written in key: value form. So this manifest file will contain:
#!^C
module: F
interface-type: header
header-filename: hdr.h
This file will then be written under the name MODULE.MF. The F.cmi file will then be the AR archive file containing these two files: MODULE.MF and the generated header file.
Now, when you are compiling a module source file – m.cc – that declares import .F inside, the compiler should expect to find the F.cmi file; in it, it finds MODULE.MF and reads the configuration. This identifies the interface method as a header file, so it takes its filename, extracts it from the archive and interprets it in isolation – the same way as it would in case of an import "filename" statement.
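Both sides of this mechanism can be sketched with ordinary ar. The content below is an assumed toy stand-in (module F, header hdr.h): first pack F.cmi as described, then resolve an import of F by reading the manifest member and extracting the declared header – steps a real compiler would perform internally.

```shell
# Producer side: manifest + stand-in generated header, packed into F.cmi.
printf '%s\n' '#!^C' 'module: F' 'interface-type: header' \
              'header-filename: hdr.h' > MODULE.MF
echo 'int add(int a, int b);' > hdr.h
ar rc F.cmi MODULE.MF hdr.h          # the interface form file

# Import side: print the manifest member, find the header, extract it.
hdr=$(ar p F.cmi MODULE.MF | sed -n 's/^header-filename: //p')
ar p F.cmi "$hdr" > "imported_$hdr"
cat "imported_$hdr"
```

The extracted imported_hdr.h is what the compiler would then interpret in isolation, exactly as for an import "filename" statement.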
Now, when you compile the implementation file itself, f.cc, the interface module may already exist, but does not have to. There should always be a designated location where the module form file is to be stored (separate from the directory, or directories, where the other modules of the same project are searched), and the module interface form file should first be searched for in this directory. If it is found, it’s read and interpreted; otherwise the compilation is simply performed anew.
Note also that if the implementation file contains any export declarations, then header file generation must happen; if the module interface form file isn’t found, the header file generation must still happen. This could be the case of an “independent” module file – that is, a module source that does not import any other modules from the project, so it can be compiled first, without having any other modules’ interfaces. As other modules may also depend on it – and that’s usually the intention if you export anything – the *.cm file will effectively be used as an interface. Therefore compilation of such a file involves everything required for an interface, and then the rest of the things required for a complete module form file.
Anyway, the final module form file should contain the following:
f.cm:
MODULE.MF
hdr.h
f.o
And of course the manifest file will contain this time:
#!^C
module: F
interface-type: header
header-filename: hdr.h
implementation-type: object
object-filename: f.o
This module is independent; if a module has dependencies, this list should also be added; for example, when a module X depends on F and L, it has an additional line in the manifest file:
dependencies: .F .L
So, let’s try to return to our first example. We had the t.cc, u.cc and m.cc source files. Their contents are known, so let’s show now how the MODULE.MF file will look in each of them. Note that this time we do have dependencies, which have been taken from all import declarations. In practice, this should list only the first-hand import declarations, as more isn’t necessary – further dependencies will be read recursively from the dependent modules.
File: t.cm:
#!^C
module: t
interface-type: header
header-filename: t.h
implementation-type: object
object-filename: t.o
dependencies: .u
File: u.cm:
#!^C
module: u
interface-type: header
header-filename: u.h
implementation-type: object
object-filename: u.o
dependencies: .t
and now m.cm – note that it doesn’t export anything, so there’s no header generated and no interface declared:
#!^C
module: m
interface-type: none
implementation-type: object
object-filename: m.o
dependencies: .t .u
So, these are the files produced by the proposed c++ -mc <source.cc> command.
Now there’s the c++ -ma m.cm command, which is expected to create an executable (the default output filename is, of course, a.out ;). This command does the following:
- Read the manifest file from the given module form file. This yields the implementation type (here, an *.o file) and the filename that should be present in this archive file.
- Additionally it reads the dependencies and finds modules for these names; the module form files identified for them are located and their manifests read. This time we have the .t and .u modules, so we look for local module form files named t.cm and u.cm (this time we can’t tolerate having only an interface file, although that could be done in the case of public modules of libraries) and read their manifests. This leads us to providing the t.o and u.o object files together with our m.o file, so these files are extracted and the linker is called:
gcc m.o t.o u.o -lstdc++ -o a.out # whatever
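The dependency walk behind this command can be sketched in plain shell. Everything here is an assumed stand-in built from the manifests shown above: the .o members are dummy files rather than real compiled objects, and a helper make_cm fabricates the three module form files before a worklist walk collects the objects to link (following dependencies recursively, tolerating the t/u cycle).

```shell
# Fabricate a toy module form file: stand-in object + manifest, packed with ar.
make_cm() {  # usage: make_cm <name> <dependencies...>
  name=$1; shift
  echo "stand-in object of $name" > "$name.o"
  { printf '%s\n' '#!^C' "module: $name" 'implementation-type: object' \
           "object-filename: $name.o"
    echo "dependencies: $*"
  } > MODULE.MF
  ar rc "$name.cm" MODULE.MF "$name.o"
}
make_cm t .u
make_cm u .t
make_cm m .t .u

# Worklist walk: read each manifest, collect objects, enqueue dependencies.
seen=""; queue="m"; objects=""
while [ -n "$queue" ]; do
  set -- $queue; mod=$1; shift; queue="$*"
  case " $seen " in *" $mod "*) continue;; esac   # skip already-visited modules
  seen="$seen $mod"
  mf=$(ar p "$mod.cm" MODULE.MF)
  objects="$objects $(printf '%s\n' "$mf" | sed -n 's/^object-filename: //p')"
  queue="$queue $(printf '%s\n' "$mf" | sed -n 's/^dependencies: //p' | tr -d '.')"
done
echo "link:$objects" | tee linkline.txt
# then: gcc $objects -lstdc++ -o a.out
```

Starting from m.cm, the walk emits link: m.o t.o u.o – the exact object list handed to the linker above, with the circular t/u dependency handled by the visited set.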
The same could be done for libraries, although in that case you may have simply an interface file, which will be distributed together with the library file. These interface files are also simply archive files containing header files plus the manifest.
Once this implementation works, you can aim for more. For example, the interface compilation may actually compile the implementation file completely, just with unresolved details. That is, function bodies cannot be compiled without having the dependent interface, but they can still be tokenized and stored as a database to be filled with details once we have the interface of the dependent module. The *.cmi file will then contain the completely compiled file, that is, something the compiler can next take (without even looking into the source file) and simply continue compiling the dependent entities. These parts could be kept in a separate file so that the interface file can be additionally processed into a “slim” interface file for publishing as a library interface.
Final notes
I had some more ideas on how to take this system further, but they were rather focused on supporting old practices, mainly bad practices widely used in C development and shared by many C++ projects. One important part was the possibility to deliver alternative implementations – such as a dependency on a library that exists in various flavors, with different APIs but created for the same purpose, including having a variant of the main library without support for the features that require this dependency. Such a thing will still be a problem, and it would be nice to have a solution for it without the preprocessor.
What has been proposed above should, however, allow the C++20 module system to serve real C++ development and help C++ projects do their job better.