References considered harmful

0. Introduction


Yes. Of course, references are indispensable (I don’t count languages that claim not to need them, because I have never written anything in such a language), but that doesn’t change the fact.

References are always needed in a language that operates on large objects. As soon as you write something more than hello world or a simple “school” equation exercise, you will create large structures and dynamic data, which means you need large objects and containers. And if you want to operate on the same variable from several places at once, you need references.

Although references are indispensable, they also have various properties, many of which cause a lot of problems.

I don’t mean only references as in, for example, Smalltalk. I should rather speak about pointers, as this is what they are called in many programming languages. In the case of C++ I mean exactly pointers, not C++ references: C++ references got their name because they keep the “referring” feature of C++ pointers, but they do not play many of the other roles that pointers do (and they have other limitations as well; for example, they cannot be rebound to another object). For C#, on the other hand, I mean references, as pointers in that language are used only as a low-level feature. Let’s state, then, that by “references” I mean references in most languages, and pointers in C, C++, Pascal, and other languages that have pointers but not references 🙂

1. Maybe types

Well, OK, now that I have explained what I’m going to talk about, I’ll talk about something different.

Some languages (especially functional ones) have optional types: a meta-type that wraps some inner value type, so that a variable of this type can hold either a value of exactly the inner type or a special value stating that there is no value at all. In C++’s Boost library there is a template type ‘optional’ that does exactly this; in Haskell it’s called “Maybe”.
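To make the idea concrete, here is a minimal C++ sketch. It assumes std::optional from C++17 as the standard counterpart of boost::optional, and the function names parse_digit and digit_or are made up for illustration.

```cpp
#include <optional>

// Hypothetical helper: converts one character to its decimal digit value.
// The result is either an int or "no value at all" (std::nullopt).
std::optional<int> parse_digit(char c) {
    if (c >= '0' && c <= '9') {
        return c - '0';      // a real value of the inner type
    }
    return std::nullopt;     // the special "no value" state
}

// The caller has to look at the "is there a value?" state before using it.
int digit_or(char c, int fallback) {
    std::optional<int> d = parse_digit(c);
    return d.has_value() ? *d : fallback;
}
```

The point of the sketch is only that the “no value” state is part of the type, so a caller can see from the signature that it has to be handled.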

A funny thing is that in C++ the ‘optional’ type is defined in a similar way to smart pointers, even though it’s not a smart pointer. But of course there’s no other way to keep the original type’s features and add those specific to the meta-type.

Funny, because the pointer itself does exactly the same thing. Of course, pointers should not be used in C++ to implement “optionals” (boost::optional should be preferred), mostly because a pointer by itself does not take care of the storage of a value-type object. But for objects of object types it’s completely sufficient.

But even leaving the matter of implementation aside for a while, it seems that ‘optional’ has more in common with the pointer. If you, for example, create a pointer to bool (and create two global constants initialized with true and false), then you have a reference variable that points to true, points to false, or points to nothing, that is, its pointer value is null.
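The pointer-to-bool trick can be sketched like this. The names kTrue, kFalse, MaybeBool and describe are my own choices for the illustration.

```cpp
#include <string>

// Two global constants, as described in the text; the names are hypothetical.
const bool kTrue  = true;
const bool kFalse = false;

// A "three-state" value: points to kTrue, points to kFalse, or is null.
using MaybeBool = const bool*;

std::string describe(MaybeBool b) {
    if (b == nullptr) {
        return "nothing";            // the pointer refers to no object
    }
    return *b ? "true" : "false";    // safe: null was ruled out above
}
```

Calling describe(&kTrue), describe(&kFalse) and describe(nullptr) walks through the three states.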

2. Nullable means optional

In other words, references that are always nullable (that is, in all the languages mentioned except Vala) play the role of “optionals” (or Maybes). You can have a reference referring to some existing object, but you can also have a null reference, which does not refer to anything.

This is a new trait in OO languages; new and very unfortunate. It’s not true for Smalltalk, almost the only such programming language (including its dialects and extensions, like Self or Strongtalk). In Smalltalk you have only objects, and there is no such thing as “no object”; there is no “null” in the, so to say, Java sense. This role is played in Smalltalk by ‘nil’, but nil is not a reference to nothing: it’s also an object, like any other being in Smalltalk. It’s special only in that it is the single instance of its own class, UndefinedObject, and understands few messages. If you send it a message it does not understand, the call fails the same way as for any other object, usually with an exception raised from doesNotUnderstand:. But still nothing worse than that happens. You could check every argument you receive for nil, but it rarely makes sense; you’d rather check whether the object is of some class, or just try to call some method, which at worst will fail in that same way. Anyway, in Smalltalk, even though there is a ‘nil’, it rarely makes sense to check a reference for it.

In C, C++, Java, C# etc. (Vala does it behind the scenes), on the other hand, manually checking every reference received by a function for null is not rare at all (I would even say it is very common), even if, usually, null would never actually be passed there.

So this is the main problem with references. Maybe they do not make such a mess in Smalltalk, but they do in the other languages.

Well, you can have something very similar to C++’s bool* type, or even better, with Java’s value wrappers, for example java.lang.Boolean. It’s a perfect ternary logic type: it can be true, false, and (nomen omen, see Haskell) “maybe”, when null. The actual problem with adopting the Smalltalk way of “everything is an object” (even integers) in Java was not only performance (although that is significant). The actual problem is that if you write a Java method that takes 3 integer numbers declared as java.lang.Integer (not int), you have to first check that none of them is null, or otherwise you theoretically risk a NullPointerException.

This was one of my most shocking experiences when I joined a project written in Java (after some experience with C++). Until then I was sure that when I pass a string to a function, it can at worst be empty. This time the string passed to a function could be not only empty, but also null, and on a null string I could not perform even the dumbest operation, not even check whether it’s empty. Moreover, I couldn’t do anything to prevent nulls from being passed to the function. That was the first (but not the last) time I felt like returning to the C language.

Especially when I discovered a nice hack to get around the null String problem ("valid".equals( s ) instead of s.equals("valid")).
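For contrast, here is a small sketch of the C++ expectation I described above, where a string argument can at worst be empty, next to the pointer variant where “no string at all” becomes possible. The function names are hypothetical, and treating null like an empty string is just an assumption of this sketch.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Takes the string by reference to const: there is always a string object,
// so the worst case is an empty one and every operation on it is valid.
std::size_t length_of_ref(const std::string& s) {
    return s.size();
}

// Takes a plain pointer: now "no string at all" is a possible argument,
// and it has to be checked before anything else can be done with it.
std::size_t length_of_ptr(const char* s) {
    if (s == nullptr) {
        return 0;   // assumption for this sketch: treat null like empty
    }
    return std::strlen(s);
}
```

The Java situation corresponds to the second function: every String parameter behaves like the pointer, whether you want it or not.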

3. You are doing it wrong

First of all, then, you have to keep in mind that by using a reference you simultaneously allow this reference to be null. Effectively, then, you’d better have some additional type holding a reference that is never null.

In C++ you should first of all avoid plain pointers in favor of smart pointers that can verify things for you. Even before that, you’d better decide whether your data type should be a value type or an object type. If it is a value type, you should operate on it by value only, and use boost::optional if you need it to be optional. If it is an object type, you should also decide whether you allow its reference to be null, and in which specific situations.

There aren’t many things you can do in Java. Every user-defined type must be a reference type there. However, you can create a generic type, say ValidRef, that wraps any reference type, returns its object via get(), and sets the object via set(), where passing null to set() always ends with an exception (not of the RuntimeException class). There is still one problem, though: you can’t prevent null from being passed where a reference to ValidRef is required. So this is still not a solution.
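To keep all the examples in one language, here is the ValidRef idea transliterated to C++ rather than written in Java. The names ValidRef, get() and set() come from the description above; the constructor, the private check() helper and the choice of std::invalid_argument are assumptions of this sketch.

```cpp
#include <stdexcept>

// A reference wrapper that can never hold null. The names ValidRef, get()
// and set() follow the text; everything else is an assumption of this sketch.
template <typename T>
class ValidRef {
public:
    // Reject null at construction, so the error surfaces as early as possible.
    explicit ValidRef(T* p) : ptr_(check(p)) {}

    T& get() const { return *ptr_; }     // always safe to dereference

    void set(T* p) { ptr_ = check(p); }  // on null: throws, old target is kept

private:
    static T* check(T* p) {
        if (p == nullptr) {
            throw std::invalid_argument("ValidRef: null is not allowed");
        }
        return p;
    }

    T* ptr_;
};
```

Note that in C++ a ValidRef passed by value is itself an object, not a reference, so the remaining Java problem (a null where a ValidRef is expected) does not appear in this form.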

In C# you can use structs for types that should be value types, and such types can never be null. You can still make them optional by appending the ‘?’ mark.

In Vala the situation seems to be the best, as this language supports declared nullability. Unfortunately it’s not perfect in this respect. As types are non-nullable by default and can only be made nullable explicitly (by adding ?), a type returned by an external function should be forced to be nullable (and be non-nullable only in specific, explicitly defined cases). Unfortunately it’s a relatively new feature and it’s not used in every place where it should be, so this nullability is visible only through the generated null-checks. Instead, it should be defined that a nullable reference cannot convert to non-nullable implicitly; the conversion could only be done explicitly, and it could end with an exception if the source value is null. In other words, as Vala already has such a great feature, it would be desirable to make the nullability check statically supported, together with good dynamic support that throws an exception not when a null is passed to a function, but already at the first moment a null is assigned to a non-nullable type. If you have read my article about bugs, you know that this is the only right thing to do: report the error at the earliest place possible.

You should also remember that in C++, in some specific situations, pointers may well provide the “optional” feature needed, in particular in places where a function expects a reference to a variable. Recently I was researching some code in WebKit, and there was a function in the JavaScript engine that evaluates a call and returns information about possible exceptions. It took two arguments: one of type ‘bool&’, a reference to the variable to be set, and one of type ‘bool’, which, when true, requested that the variable be written accordingly.

This is an example of the wrong thing to do. What was expected in this case was to write to the variable only when requested. A plain pointer was therefore the best thing to use here: when the user passes the address of a variable, it is written to; when NULL is passed, it means that the user doesn’t want this information. It’s very simple, and it conforms to the well-known rule that reference types should not be used to pass variables to be written by a function, as this fact is not visible at the call site. While I think this is not the right rule in general, as it allows a null pointer to be passed, in this particular case the null pointer would be exactly one of the expected values.
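The pointer version I have in mind could look roughly like this. It is not the WebKit code: the function name evaluate, its int argument and result, and the made-up failure condition are all placeholders for the sketch.

```cpp
// Hypothetical stand-in for the function described above; not the WebKit code.
// Returns a result and, only if the caller asks for it, reports whether
// an exception occurred.
int evaluate(int input, bool* exception_out) {
    bool failed = (input < 0);      // made-up failure condition for the sketch
    if (exception_out != nullptr) {
        *exception_out = failed;    // written only when an address was passed
    }
    return failed ? 0 : input * 2;
}
```

A caller interested in the exception state writes evaluate(x, &flag), and the & at the call site shows that flag may be written; a caller who doesn’t care writes evaluate(x, nullptr). One parameter does the job of the former two.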

4. More precise?

Remember: be as precise as possible; if a language does not give you tools to state something explicitly, at least remember to state it in the documentation.

Also, make sure that you are conscious of your types’ properties: whether it’s a value type, whether you want it to be optional, whether you expect only valid objects. I think I’ll finish with that, because a few more things like that and I’ll just say: THINK, STUPID! Well, this is useless, because if you think, you’ll come to all these conclusions without my help, and if you don’t, you are just wasting your time anyway.
