Whatever method you use to describe the world of data, it can never consist of just one kind of being, all treated the same way. Some have tried. They all failed, of course.
Audacious. Language designers think that, since they create programming languages as they wish, they can make a language do and look exactly as they like; if there’s anything they don’t like or find hard to understand, they replace it with something else. As usual, the result is the “short quilt” effect (cover one end and the other gets exposed), as I described in my previous article.
No division into objects and values? Sure! Let there be only objects! Wonderful. Now let’s fight the problems of “comparing by identity” versus “comparing by contents”, “shallow copy” versus “deep copy” and so on, just so as not to have “values” in the language.
One of the main problems, especially in an imperative (or even partially imperative) language, is how to treat changes made through a variable. That’s why this problem is partially solved by using immutable objects to represent values. Of course, this doesn’t fix the problem as a whole, because this rule cannot be imposed on all value types (especially user-defined ones).
The only way to avoid introducing such complications into a language is to let the user explicitly define whether a type is a value type or an object type. Many languages do use value types, but do not allow users to create their own. Also, many programmers (especially those who like these wounded languages) question whether values and objects really are separate things. Let’s try to take it all apart.
1. Welcome to the real world
Do values and objects exist in the real world, then? Are they just a human invention, or something we actually observe?
On the face of it, it’s very simple. Objects describe things that are material, while values describe things that are immaterial. Consequently, an object is something identifiable that may exist in many specimens, while values are based on pure definitions and can only “hang in the air”.
Now it’s time for a serious question, straight from OO theory:
Can “green” be one of car’s attributes?
You’d say that of course it can. Not so fast. Some people would say that only “a color can be a car’s attribute”.
WTF? A car can have a “color” attribute? Yes, a car is color, probably it’s also speed, power and comfort. LOL.
A car can be green, fast, strong and comfortable. Or it can be red, slow, frail and cumbersome. We can at most say that a car has an attribute of type color (which is green), and attributes of type speed, power and comfort, respectively. So color can at most be one of the possible types of a car’s attributes.
So: the car is green; in other words, the car’s attribute of type color is green, so one of the car’s attributes is green (not “color”). Color, then, is an attribute type, not an attribute.
So if color is not an attribute, is yellow an attribute? Well, not exactly; we’d rather say that “yellow can be an attribute”, but we’re certainly closer to the real world.
And now: is color something material? Well, physicists would say that color is just light of a certain wavelength, but, unfortunately, that isn’t true. Color is not a wave, nor is it a wavelength. Color is a specific signal produced by the human brain in response to light of a certain wavelength (in the same way, luminance is a signal produced in response to the wave’s amplitude; it’s not the same category as color, but it’s close, and it is relative, depending on the eye’s current sensitivity). In any case, “color” is our shortcut: it doesn’t describe something an object possesses, but only a visible trait of an object. It can be the same as the same-type trait of another object, or different. It’s just “something we can say about” this object.
It’s much the same with numbers, whether ordinal or cardinal (counting) numbers (a curiosity: Japanese and Korean each have two sets of number names, a native one and one borrowed from Chinese, used in different contexts; roughly, the native set for counting small quantities and the Chinese-derived set for numbering). A number can also be a trait, and it is also immaterial.
These “immaterials” (I doubt I can call them “immaterial things”) are called “values”. Objects, on the other hand, are material things that may have attributes, and those attributes are values.
This is how things are in the real world.
The important thing about objects and values is not really that they are material or immaterial (I stressed that only to get the point across) but rather that objects may exist in specimens (or copies, or *-plicates; sorry, English is not my first language :)). I still don’t think I have the right word: a duplicate or a copy suggests that two objects are the same, while they may be, but need not be. One object can be a copy of another and then be modified, so that it’s no longer the same as the source object. And even if the contents of one object are the same as the contents of another, they still are not the same object.
2. Creating worlds
In the software world, of course, objects are not material anyway. But it’s their nature of existing in specimens that mainly makes objects different from values. What makes us aware of this trait is that objects may be referred to: we may have two references, and if they refer to the same object, it’s the same object. If not, they refer to two separate objects (not different objects, because these objects may still look the same!). So references are what let us identify objects.
Note that in the real world objects have no references identifying them by default. We may make up some identifier and assign it to particular objects. That’s one of the reasons why (to this day!) relational databases let rows have a special auto-increment field, typically used as the row’s identifier (the “primary key”). But we may also have no identifier at all, and then we can have multiple rows with equal contents.
In programming languages, fortunately, objects have unique identifiers by default, so you may treat the identifier as an attribute of every object: every object has an attribute of type “reference”. This reference identifies the object uniquely; two references refer to the same object if they are equal, and to two separate objects otherwise. A reference, then, is nothing more than a property with one special trait: every object has it by default, and its value is unique among all objects that exist at the same time.
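To make the distinction concrete in C++ terms, the sketch below treats an object's address as its “reference”. It is only an illustration: `Car`, its fields and the helper functions are hypothetical names, and an address is unique only among objects alive at the same time.

```cpp
#include <cassert>
#include <memory>
#include <string>

// A minimal object type: created on the heap, referred to by pointer;
// its identity is its address, not its contents.
struct Car {
    std::string color;
    int speed;
};

// Identity comparison: do two references point to the same object?
bool same_object(const Car* a, const Car* b) {
    return a == b;
}

// Content comparison: do two (possibly separate) objects look the same?
bool same_contents(const Car& a, const Car& b) {
    return a.color == b.color && a.speed == b.speed;
}

// Two separately created cars with equal contents are still two objects.
bool separate_but_equal() {
    auto first = std::make_unique<Car>(Car{"green", 120});
    auto second = std::make_unique<Car>(Car{"green", 120});
    return same_contents(*first, *second) && !same_object(first.get(), second.get());
}

// Two references to one object compare as identical.
bool two_references_one_object() {
    auto car = std::make_unique<Car>(Car{"red", 90});
    Car* ref1 = car.get();
    Car* ref2 = car.get();
    return same_object(ref1, ref2);
}
```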
Also, values cannot have traits; they are traits. A value is always one single thing: you can name it, you can identify the trait itself, but a value can only be equal to another value of the same type. Values can be different, but not separate. Values may (but need not) belong to a domain with a finite number of possible values, although in practice, if we think of values in the real world, they are rarely finite. Numbers are infinite, as you know. You may think that colors are finite, but if you think so, you are probably a man who has never argued with a woman about colors. Nonetheless, values are either fuzzy (like floating-point numbers) or finite (like enumerations), and either way they are, more or less approximately, contained in some already existing (at least in theory) set of values. Objects are not. Even though the number of existing objects of a type may be controlled in a special way (kept constant, for example), usually we can create and destroy objects, so their number is usually dynamic. You might ask whether objects whose number is constant can be values. Well, no, although their references can be values of some specific type, as they are all taken from an already known set.
There have been many attempts to find a “common” form for values and objects. The results are usually pretty sorry.
In theory, you may think of a value as an object with one trait; the object then symbolizes this trait.
Very good, but now try to compare them. Can you?
How does Smalltalk do it? In a very simple way: by giving pointers more roles than they are naturally meant for. In particular, a pointer has some of its least significant bits reserved for special purposes. These bits say whether the word is a pointer or an integer, and the actual value is stored in the remaining bits. From the language’s point of view, then, there are objects (!) of class SmallInteger (a subclass of Integer), which really exist (!) and which can be compared for identity: the “references” to integers are the integers themselves, shifted by a few bits and with tag bits set that are never set in references to ordinary objects. This is a smart approach: when you, say, add two SmallIntegers, you get a new SmallInteger, yet no such object is ever created anew; it just looks as if a matching object were looked up and found. You cannot create objects of this class, but it looks as if the system contained an object for every number in the SmallInteger range.
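The tagging trick can be sketched in C++. This is a simplified model, not the layout of any particular Smalltalk VM: it assumes a single tag bit in the least significant position, with 1 meaning “immediate small integer” and 0 meaning “aligned object pointer”; all names here are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

using Word = std::uintptr_t;

constexpr Word kIntTag = 1;  // assumed: lowest bit set = small integer

// Encode an integer directly in the word: shift left, set the tag bit.
// No object is allocated. Only values that fit in the remaining bits work.
Word make_small_int(std::intptr_t n) {
    return (static_cast<Word>(n) << 1) | kIntTag;
}

bool is_small_int(Word w) {
    return (w & kIntTag) != 0;
}

// Arithmetic right shift restores the value, including its sign.
std::intptr_t small_int_value(Word w) {
    return static_cast<std::intptr_t>(w) >> 1;
}

// Object pointers are assumed to be at least 2-byte aligned, so their
// lowest bit is 0 and never collides with the integer tag.
Word make_object_ref(const void* p) {
    return reinterpret_cast<Word>(p);
}

// "Identity" of two small integers is plain equality of the tagged words,
// so every occurrence of 42 behaves like the same Integer "object".
bool identical(Word a, Word b) {
    return a == b;
}

Word add_small_ints(Word a, Word b) {
    return make_small_int(small_int_value(a) + small_int_value(b));
}

bool object_ref_is_not_small_int() {
    static const long some_object = 0;
    return !is_small_int(make_object_ref(&some_object));
}
```

Such an encoding covers only integers that fit in the remaining bits; anything larger needs a real heap object, which is why the trick does not extend to arbitrary values.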
Yes, it works. For integers. It doesn’t work for other kinds of values, at least in classic implementations (some modern 64-bit VMs also encode characters and a range of floats as immediates): not for floating-point numbers in general, nor for strings. These, like other value types, are simply compared by contents (using the = message, which classes override). In addition, Smalltalk has no variables in the usual sense, only names bound to object references. All these things, together with the garbage collector, allow objects to “emulate” values. For this emulation to work, though, the usual variable semantics must be dropped (no modifications in place). For objects that may be modified in place (for example, collections to which you can add elements), the result is the “shallow copy” versus “deep copy” distinction (so this emulation isn’t perfect either).
Things get complicated once we take variables into account.
A variable is something that exists in (some) programming languages and, beware, can be set to a value. Not an object. A value. This holds even in languages in which Integer is an object. At least one kind of value must exist in every programming language: a value of type “reference to object” (and since integers are encoded as such references, integers belong here, too). This is the only kind of value Smalltalk acknowledges.
But being variable means more than that the value can be set anew. It also means that the value can be changed in place. Not every value type makes its variables changeable in place (a reference doesn’t), but many types do, integer and string among them.
Of course, in theory all such cases can be explained as creating a new value and assigning it to the variable. Indeed, operators like ++ or += were created mainly for convenience, and logically they can be expanded into creating a new value and assigning it to the variable. Also (for integers, of course) the compiler can optimize both forms so that the “expanded” version (a = a + 2) compiles to the same code as the “compressed” one (a += 2). Still, this is thought of as a “change in place”, even if it isn’t implemented that way. Note, however, that the ability to change in place (as an implementation detail) may not be needed for integers, but it is indeed needed for values of complex types, such as a string (or a container of integers). How much effort does it take to add a new item to a list? Not much. But a lot, if it can only be done by creating a new list consisting of the old elements plus the new one (unless it’s a builtin list with dedicated, heavy optimizations).
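The gap between the logical explanation and the implementation can be shown with a small C++ sketch (the helper names are made up for illustration): both approaches produce the same value, but one rebuilds the whole vector while the other changes it in place.

```cpp
#include <cassert>
#include <vector>

// Logically, "append" can be explained as building a brand new value:
// every old element is copied, so this costs O(n) per append.
std::vector<int> appended(const std::vector<int>& old, int item) {
    std::vector<int> result;
    result.reserve(old.size() + 1);
    result.insert(result.end(), old.begin(), old.end());
    result.push_back(item);
    return result;
}

// Implemented as a change in place, the same append is amortized O(1).
void append_in_place(std::vector<int>& v, int item) {
    v.push_back(item);
}

bool both_ways_agree() {
    std::vector<int> a{1, 2, 3};
    std::vector<int> b{1, 2, 3};
    a = appended(a, 4);      // the "a = a + 4" style
    append_in_place(b, 4);   // the "a += 4" style
    return a == b;
}

// For integers the two spellings are indistinguishable.
bool integers_agree() {
    int x = 5;
    int y = 5;
    x = x + 2;
    y += 2;
    return x == y && x == 7;
}
```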
A language that does not allow users to create new value types must at least have builtin support for the most basic ones, such as containers and strings (a container is a meta-type, not a type in itself, so it can be a value type only if its element type is a value type, which is true in most cases). Otherwise we get a mess and compromises. This happens, for example, in Java, which shares one funny trait with C: a string can be either empty or null. In Java, when you assign one string variable to another, you just copy the reference. To prevent uncontrolled changes, such a string cannot be modified (see above), and you don’t have to worry about deleting the object (the GC does it); however, the basic way to glue strings together is the + operator (builtin for java.lang.String), which creates a new String object from the two source strings. This is, of course, very inefficient, so another type is provided: StringBuffer. Since that can still be inefficient, because as a general-purpose type it is synchronized (its methods take a lock), there is one more string type, StringBuilder, which has no such protection. Either way, the StringB* types are really modifiable strings, which means that if you assign one StringB* variable to another and modify the object through the first, the change is visible through the second. But yes, of course, these types are just tools to optimize string operations. And since this is still so bad, Java compilers also have an optimization (!) that detects consecutive string concatenations done with operator + and implements them using StringB* objects (you can check how it is done by decompiling the bytecode back to source, for example with jad, or by disassembling it with javap -c; since Java 9, javac emits an invokedynamic call to StringConcatFactory instead).
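Java itself is outside this document's example language, but the aliasing effect can be mimicked in C++ as an analogy (not Java's implementation): `std::string` behaves as a value, while a `std::shared_ptr<std::string>` behaves like a variable holding a reference to a modifiable string object, roughly as a `StringBuilder` variable does. The function names are illustrative.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Value semantics: assignment copies the value, so the two variables
// are independent afterwards.
bool value_strings_are_independent() {
    std::string first = "abc";
    std::string second = first;   // copies the value
    first += "def";               // changes only 'first'
    return first == "abcdef" && second == "abc";
}

// Reference semantics (like two StringBuilder variables in Java):
// assignment copies the reference, so a change made through one
// variable is visible through the other.
bool shared_strings_alias() {
    auto first = std::make_shared<std::string>("abc");
    auto second = first;          // copies the reference only
    *first += "def";              // changes the single shared object
    return *second == "abcdef";
}
```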
And this is the language that is “much easier than C++”, which has just one type dedicated to strings, std::string. Even when some library defines its own, it usually works the same way. As long as no interaction with legacy C code is required, of course.
The situation in C# is almost the same as in Java, but with at least one problem fewer: comparing strings with operator == compares their values, not their references. The rest of the problems remain, although they can be solved the same way as in Smalltalk: forbid changes in place, overload operator ==, and you have a value type.
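The recipe “forbid changes in place and overload ==” might look like this in C++. `Title` is a hypothetical class; in this sketch the member is not `const`, so whole-value assignment still works, but no member function modifies a `Title` in place.

```cpp
#include <cassert>
#include <string>
#include <utility>

class Title {
public:
    explicit Title(std::string text) : text_(std::move(text)) {}

    const std::string& text() const { return text_; }

    // Every "modification" yields a new Title instead of changing this one.
    Title with_suffix(const std::string& suffix) const {
        return Title(text_ + suffix);
    }

    // Equality compares contents, never addresses.
    friend bool operator==(const Title& a, const Title& b) {
        return a.text_ == b.text_;
    }
    friend bool operator!=(const Title& a, const Title& b) {
        return !(a == b);
    }

private:
    std::string text_;  // not const, so whole-value assignment still works
};

bool original_unchanged() {
    Title t("abc");
    Title u = t.with_suffix("d");
    return t.text() == "abc" && u.text() == "abcd";
}
```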
Strings in these languages would be much better off as a fully builtin value type, just like, for example, double. Why wasn’t it done this way? There are a few possible explanations. First, because Smalltalk does it with objects. Sure, but Smalltalk uses objects for integers and floats, too. Second, because this is how it is done in C (you have to use dynamically allocated memory to manage strings if you want to do anything advanced with them). And this is probably the only sensible explanation, because in any other language older than Java that follows value semantics at least for builtin types, string is a value.
It’s really not a problem to add language features that let a string be both modifiable in place and a value. But we know, of course, that the real reason is that people think a string should be an object, and that comparing two strings is like comparing two cars. Well, let’s consider why.
4. True nature
What is it in the nature of particular data types that predestines them to be object types or value types? We can list several criteria, such as:
- value types should be easily copyable (by “easily” I mean without any complicated or costly operations)
- value types should be easily comparable
- variables of value types should not have significant identity
Unfortunately, all these things are secondary. They all depend on the language, and however justified these requirements are, no language is forced to meet them, especially one that suppresses the existence of value types. So the reason why it’s natural for some data types to be value types and for others to be object types must be sought elsewhere.
You can try to compare them with things in the surrounding world, but the problem is that in the software world nothing is material anyway. A database record, a book title and a color are all made of exactly the same material.
You can point to changeability, but then Smalltalkers and Javers will quickly shoot you down by stating that it’s enough for a class to forbid altering its objects. So changeability isn’t the reason either.
The main problem is that values always exist. In every programming language, no matter how hard the language fights them. Even in Smalltalk there is one value type: a reference to an object. A value type is, in effect, anything that can be the type of a variable.
But this doesn’t explain why, for example, string is a value type rather than an object type.
It’s not easy to explain, but I’ll try.
Even if a string is, for you, only an array of characters: if you copy this array into another array, then for every possible function ‘f’, replacing the first array with the second one in a call to ‘f’ should in no way change the results of that call.
Of course, you may say that the same holds for large objects, as long as the function does not write to them (or, let’s even say, for objects of a class that does not allow altering them). So I have missed something important.
A value should be passed by value, because if you have a reference, you have an object. So I don’t even require that the function leave this array of characters alone. You can pass the array to function ‘f’ by value, which means that ‘f’ can modify the variable in which it holds the string, while the string in the original variable stays unmodified. Alternatively, of course, you can make the receiving variable constant, in which case it doesn’t matter whether you pass by value or by reference.
But why should the string be copied by value, that is, why create another object and waste so much space… No. Stop. I haven’t said that you should allocate a new piece of memory and copy the characters. I only said that you should copy a value. Just as C++ does: if you pass std::string by value, the receiving variable can be modified inside the function, and the modification won’t be seen outside it. That doesn’t always have to mean creating a new array and copying the original into it.
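A short C++ sketch of this behaviour, with a made-up function `shout` that modifies its by-value parameter. Whether characters are physically copied at the call is the library's business; what the caller can observe is the same either way.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// 's' is received by value: the function may freely modify its own copy.
std::string shout(std::string s) {
    for (char& c : s) {
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    s += '!';
    return s;
}

bool caller_is_unaffected() {
    std::string original = "hello";
    std::string copy = original;
    // Passing either variable gives the same result...
    bool same_result = shout(original) == shout(copy);
    // ...and neither variable was changed by the calls.
    return same_result && original == "hello" && copy == "hello";
}
```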
You wouldn’t normally do this with large objects. Not only because such copying could be expensive, but also because objects should only be copied as part of “cloning”, and not every object should be clonable. Also, you wouldn’t pass a whole object to a function just to give it some data; you’d rather pass some of the object’s contents. Or you’d pass the object by reference, so that the function can retrieve those values itself (that is, copy the parts of the object’s contents it needs). And it can do so many times, including while the object is being altered.
Also, a string’s characters make up one single term; they are not independent parts but together form a single entity, even though you may extract parts of it (not every language even lets you extract single characters as such; in many you can at most slice the string, as with the MID$ function in early BASIC dialects, or with the [string range] and [string index] commands in Tcl, where [string index] returns a one-character string). In practice, a separate “character” data type (in some languages just a kind of integer) exists only for efficiency. The fastest way to operate on a string is to organize it as a uniform array of characters, each occupying exactly the same number of bytes. Of course, this way you can’t implement strings with variable-length characters, as in UTF-8 (which is what, for example, Glib::ustring from glibmm does). So extracting single characters from a string is effectively just an implementation detail, a kind of “extension” to the normal set of string operations. The normal operations on a string are concatenation, extracting a range, splitting by a character (or expression), tokenization, search/replace and so on. Treating a string as an object would actually constrain all these operations, as they would be unusual for an object (for example, objects usually do not have a variable number of properties, even though some languages allow it).
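Two of the points above, sketched in C++ with hypothetical helper names: splitting by a character, which yields more values of the same type, and counting characters in UTF-8 text, where indexing gives bytes rather than characters. The code-point counter assumes the input is valid UTF-8 and does no validation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splitting by a character: the parts are strings again, not
// "sub-objects" of some other class.
std::vector<std::string> split(const std::string& s, char sep) {
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    while (true) {
        std::string::size_type pos = s.find(sep, start);
        if (pos == std::string::npos) {
            parts.push_back(s.substr(start));
            return parts;
        }
        parts.push_back(s.substr(start, pos - start));
        start = pos + 1;
    }
}

// In UTF-8, s[i] is a byte, not a character. Counting characters (code
// points) means skipping continuation bytes, which look like 10xxxxxx.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```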
In other words, you can’t take several objects of some class, glue them together and get one “integral” object as a result, and then make a new object of the same class by “extracting” some parts of an object. As you know, extracting parts of an object means extracting sub-objects or accessing properties, and each part is something different (of a different class). You can’t, for example, extract a sub-object of an object of some class as an object of the same class, because the classes of these objects are always different (a class can neither derive from nor contain itself by value!). Yet all these operations are normal for strings and can be done on other value types. You’d say you can do all these things with trees? Yes… because a tree is also a value type, since it’s a container.
Also, remember that objects usually have stable contents, even though in many programming languages (Smalltalk, Python, also OO extensions for Tcl) it is possible to create (and delete!) an object’s fields at runtime. That is usually because in these languages everything happens at runtime (there are no “declarative statements” in them). This, however, does not reflect any real logic; it is possible only because the language allows it. Adding new fields during an object’s lifetime has no counterpart in the logic of what is being modeled. If you object that, for example, a car may carry several people, I’d answer that, at most, containing several objects requires a container, and containers are value types.
Note, though, that in some programming languages variables are identifiable, which means that they are also objects. Yes: a variable of type int in C++ is an object. It has contents and an identity (its address), and this identity can be compared. So can something be a value and an object at the same time?
Not exactly. A type can’t be a value type and an object type at the same time. However, any type (value type or object type) may create objects, because variables are objects, at least in languages where variables are identifiable (they are in C++, Tcl, Perl, Pascal etc., but not in Java or Smalltalk). This doesn’t change the fact that an object type can create only objects, never values. And an object of a value type is something like an object with exactly one property, which is of that value type. That’s all. The same goes for an array of integers, whose single elements you can extract and even modify in place. Variables of array types are objects (in C++ you can take the address of a vector&lt;int&gt; variable), but you should still be able, for example, to return the array by value (in C++ you can return a value of type vector&lt;int&gt;).
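In C++ terms, the following sketch (helper names are illustrative) shows variables of a value type being identifiable objects, while an array as a whole is still passed and returned by value.

```cpp
#include <cassert>
#include <vector>

// Two int variables with equal values: the values are equal, yet the
// variables themselves are two separate objects with distinct addresses.
bool equal_values_separate_variables() {
    int a = 7;
    int b = 7;
    return a == b && &a != &b;
}

// Each element of a vector<int> is an identifiable object as well...
bool elements_are_identifiable() {
    std::vector<int> v{1, 2, 3};
    int* second = &v[1];
    *second = 20;  // modifies that element in place
    return v[1] == 20;
}

// ...yet the whole array can still be passed and returned by value.
std::vector<int> doubled(std::vector<int> v) {
    for (int& x : v) {
        x *= 2;
    }
    return v;
}
```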
So what about value types whose variables are objects at the same time? In summary:
- value types: usually held in individual variables, passed by copying, and identified by their value
- object types: types that create only objects, which should never be copied when passed, nor identified by their contents
Never copied? Well, yes. Objects are not copied. At most, objects can be cloned, which is a totally different thing. Since objects are identified by their references, cloning an object creates a new reference. Copying a value, by contrast, is like copying this “main identifier”, because a value’s identifier is the value itself. In other words, cloning objects increases the number of specimens, while copying a value does not (values have no specimens). Moreover, copying does not concern values; it concerns variables. At most we can say that a value is copied from one variable to another: the previous value of the second variable is overwritten with the value read from the first.
I’d say there is one more thing that makes cloning different from copying. When you copy, you just take the value from one variable and give the other variable the same value (as values are always copyable). When you want to clone, you should:
- check whether the object’s entire contents can be cloned; if not, the object can’t be cloned (that’s why the “noncopyable” idiom is so common in C++: objects are clonable by default, because in C++ every class is a value type by default)
- copy every contained value of a concrete (non-reference) type from the source object to the target
- for every contained reference, decide whether the object it refers to is a private part of the object (then it should be “sub-cloned”, that is, in the target object the reference should point to a newly created clone of the object referred to by the source) or a “widespread” object to which the current object merely refers (then the reference itself should be copied as a value)
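The steps above map onto a hand-written C++ copy constructor. This is a sketch with hypothetical types: `Engine` stands for a private part that must be sub-cloned, and `Registry` for a “widespread” object that is only referred to. Copy assignment is deleted just to keep the example short.

```cpp
#include <cassert>
#include <string>
#include <utility>

struct Engine {      // a private part: every car owns its own engine
    int power = 100;
};

struct Registry {    // a "widespread" object shared by many cars
    std::string name;
};

class Car {
public:
    Car(std::string color, Registry* registry)
        : color_(std::move(color)), engine_(new Engine), registry_(registry) {}

    // Cloning (step 1 holds: every member can be copied or cloned):
    Car(const Car& other)
        : color_(other.color_),                 // concrete value: copied
          engine_(new Engine(*other.engine_)),  // owned object: sub-cloned
          registry_(other.registry_) {}         // shared object: reference copied as a value

    Car& operator=(const Car&) = delete;        // omitted to keep the sketch short
    ~Car() { delete engine_; }

    const std::string& color() const { return color_; }
    Engine* engine() const { return engine_; }
    Registry* registry() const { return registry_; }

private:
    std::string color_;
    Engine* engine_;      // owned, held in a plain pointer
    Registry* registry_;  // not owned
};

bool clone_follows_the_rules() {
    Registry shared{"city"};
    Car original("green", &shared);
    Car clone(original);
    return clone.color() == original.color()
        && clone.engine() != original.engine()                // separate engine
        && clone.engine()->power == original.engine()->power
        && clone.registry() == original.registry();           // same registry
}
```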
Does this sound familiar to a C++ developer? Yes, this is exactly why C++ lets you define copy constructors. Stupid people say that “it’s because if you have a pointer in your class, it usually points to some owned object, so you must define a copy constructor that creates a new object for this pointer, since the default implementation just copies the pointer value” (or they just say “it’s because C++ doesn’t have a garbage collector!”). What bullshit! The default implementation simply copies all of the object’s contents using their own copy constructors, and every pointer is just a value, so it is copied by value, too. So if you keep a reference to an object that is your private part and should not be shared with your clone (but cloned as well), don’t hold it in a plain pointer; use a smart pointer designed specifically for holding “owned objects”, one that creates a clone of the pointee by itself when copied.
You’d say “you still have to define your copy constructor, you just do it in a different place”. Bullshit again! By using such a special pointer I’m explicitly declaring my intentions: why I’m keeping this pointer there, and which of a pointer’s purposes I mean in exactly this place. If I used a plain pointer and a dedicated copy constructor, I would be suggesting “this is only a weak reference” and then overriding that statement with special code in the copy constructor.
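Such a pointer might look like the sketch below. `clone_ptr` is a hypothetical name, not a standard library type; it assumes `T` is copyable and not used polymorphically (copying a derived object through it would slice). With it, `Car` needs no hand-written copy constructor at all.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical owning pointer that deep-copies its pointee when copied.
template <typename T>
class clone_ptr {
public:
    explicit clone_ptr(std::unique_ptr<T> p = nullptr) : p_(std::move(p)) {}

    // Copying creates a new pointee: the copy gets its own private part.
    clone_ptr(const clone_ptr& other) {
        if (other.p_) {
            p_ = std::make_unique<T>(*other.p_);
        }
    }

    clone_ptr& operator=(const clone_ptr& other) {
        if (this != &other) {
            clone_ptr tmp(other);    // clone first, then take ownership
            p_ = std::move(tmp.p_);
        }
        return *this;
    }

    clone_ptr(clone_ptr&&) noexcept = default;
    clone_ptr& operator=(clone_ptr&&) noexcept = default;

    T& operator*() const { return *p_; }
    T* operator->() const { return p_.get(); }
    T* get() const { return p_.get(); }

private:
    std::unique_ptr<T> p_;
};

struct Engine {
    int power = 100;
};

struct Car {
    std::string color;
    // The member's type states the intent: the engine is a private part.
    clone_ptr<Engine> engine{std::make_unique<Engine>()};
    // No user-written copy constructor is needed.
};

bool implicit_copy_clones_engine() {
    Car a;
    a.color = "green";
    Car b = a;               // the implicit copy constructor clones the engine
    b.engine->power = 250;   // affects only the clone's engine
    return a.engine->power == 100 && b.engine.get() != a.engine.get() && b.color == "green";
}
```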
People who think C++ is a stupid language because you must define a copy constructor to specify how pointed-to objects are cloned usually forget that in languages without copy constructors, API users must worry about “shallow copy” versus “deep copy” semantics. As I have already said, these are implementation details that API users shouldn’t even see (with C++’s copy constructor it really is an implementation detail, and it can be hidden from the API user). The shallow/deep distinction is very vague, and it’s practically impossible to decide whether copying should be deep or shallow for a particular data type, while the distinction between value types and object types is much simpler to define, or at least feasible.
But what about value types that are internally implemented as complex objects, which need to be copied, or even cloned? Well, remember that whether a type is a value type or an object type is a purely logical matter, unrelated to the implementation. For example, in gcc’s older (pre-GCC 5) implementation of C++, where std::string uses implicit sharing with copy-on-write for its contents, can we talk about complicated copying? From the logical point of view, no, because that is an implementation detail. A user sees only a string that is a value, and when they assign one string to another, all they see is a value being copied. Whether there is some large object “behind the shell” of a string, and whether it is shared between several “string values”, is still just an implementation detail. From the user’s point of view, they are all independent copies of the same value. Of course, users may care about such details when they cause race conditions (I have found conflicting information about this: I have seen a lock-free algorithm used in one version, while others say only the reference count is atomic, with no protection against simultaneous access; C++11 eventually settled the matter, as its requirements effectively rule out copy-on-write for std::string and rely on move semantics instead, and libstdc++ dropped it with the new ABI in GCC 5), but this has nothing to do with value semantics.
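Implicit sharing can be sketched in a few lines of C++. `CowString` is a hypothetical toy, not how any real `std::string` is implemented, and it is not safe for concurrent use: `use_count()` is only a snapshot, which is exactly the kind of race mentioned above.

```cpp
#include <cassert>
#include <memory>
#include <string>

class CowString {
public:
    CowString(const char* s = "") : data_(std::make_shared<std::string>(s)) {}

    // The implicit copy constructor and assignment only share the buffer:
    // cheap, and no characters are copied.

    const std::string& str() const { return *data_; }

    void append(const std::string& more) {
        detach();        // copy-on-write: get a private buffer first
        *data_ += more;
    }

    bool shares_buffer_with(const CowString& other) const {
        return data_ == other.data_;
    }

private:
    // Make a private copy of the buffer if anyone else shares it.
    void detach() {
        if (data_.use_count() > 1) {
            data_ = std::make_shared<std::string>(*data_);
        }
    }

    std::shared_ptr<std::string> data_;
};

// Observable behaviour is plain value semantics.
bool behaves_like_a_value() {
    CowString a("abc");
    CowString b = a;
    bool shared_before = a.shares_buffer_with(b);
    b.append("def");
    return shared_before && !a.shares_buffer_with(b)
        && a.str() == "abc" && b.str() == "abcdef";
}
```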
Some languages treat value types and object types differently, others do not. One source of this mess is C++. In this language it’s entirely up to you whether you make your type a value type or an object type. You can write “int x = 0” as well as “int* x = new int(0)”. The language doesn’t force anything on you, whether your type should be a value type or an object type.
No language does the right thing here, though. C++ is the best in this respect not because it does the right thing, but because it does nothing, which means it also doesn’t do the wrong thing, unlike other languages.
The right thing would be to have separate keywords for defining object types and value types, and to make them non-interchangeable: objects of value types should never be created with the ‘new’ operator, while objects of object types should only be created with it (and variables could only be of value types). Does Java do that? Don’t be ridiculous. Java treats String and arrays as object types, too. Also, only value types should be copyable and comparable with operator ==; for object types this shouldn’t be possible (only cloning should be available on demand, if the objects support it).
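C++ has no such keywords, but the discipline can be approximated by convention. In this sketch (all names hypothetical), the value type is copyable and comparable, while the object type has its copy operations deleted, no operator==, a private constructor reachable only through a heap-allocating factory, and an explicit `clone()`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Value type: copyable and comparable with ==.
struct Color {
    std::string name;
    friend bool operator==(const Color& a, const Color& b) { return a.name == b.name; }
};

// Object type: no copying, no ==, created only on the heap via create(),
// cloned only on explicit request.
class Car {
public:
    static std::unique_ptr<Car> create(Color color) {
        // std::make_unique cannot reach the private constructor, hence new.
        return std::unique_ptr<Car>(new Car(std::move(color)));
    }

    Car(const Car&) = delete;
    Car& operator=(const Car&) = delete;

    std::unique_ptr<Car> clone() const { return create(color_); }

    const Color& color() const { return color_; }

private:
    explicit Car(Color color) : color_(std::move(color)) {}
    Color color_;
};

static_assert(std::is_copy_constructible<Color>::value, "values can be copied");
static_assert(!std::is_copy_constructible<Car>::value, "objects can only be cloned");

bool clone_is_a_separate_object() {
    auto car = Car::create(Color{"green"});
    auto copy = car->clone();
    return copy.get() != car.get() && copy->color() == car->color();
}
```

Nothing stops a user from putting a `Color` on the heap with `new`; the compiler enforces only the object-type half of the rule.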
All in all, value types must satisfy some important conditions:
- every function receiving a value read from a variable should behave the same regardless of which of the variables holding that value was passed to it
- values do not have contents; at most, some operations can be performed on them that create new values
For object types, what matters is usually not what contents the object has, but which object you pass to the operation, because an object is identified by its reference.
Should a container be an object, then? Well, no. In practice a container is a value in every case, regardless of how it is implemented. Every container consists of nodes, and every node wraps an object of some value type (although the nodes themselves are an implementation detail). The intrusive list (a list where every element’s class must derive from the list node type) may raise some controversy here. But this container is a value type, too: all it holds are references to the elements; it doesn’t even allow random access to them.
I know, you’d say: “doing a[i] extracts some part of it, so it must be an object; moreover, you can assign to a[i], that is, you can change parts independently”. Well, actually you don’t change these parts independently, because the values are kept in a specific order (which does not hold true for objects). Also, doing a[i] performs an operation on a container; it does not extract a part of it. It’s similar with complex numbers: a.re() does not extract the “re” part, but applies the mathematical function “Re” to a complex number, converting it to the Real type holding the “real part” of the number (a “part of the number” in the mathematical sense, which has nothing to do with reading a specific field of the “complex” structure). In the same way you can do operations like “abs” and “conj”, which do not simply read fields, so you wouldn’t doubt that they are conversions rather than extractions. These operations belong to the same category as re() and im(): they are all conversions. Likewise, a[i] is a conversion of a container to the container’s value type. And what about setting via a[i]? Well, as I have already said, variables are objects. The ‘a[i]’ expression converts the ‘a’ container into a variable of its element type.
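In C++ terms, both halves of this argument can be sketched with the standard library alone: `std::vector<int>::operator[]` returns `int&`, that is, a variable of the element type, while `std::real`, `std::imag`, `std::abs` and `std::conj` map a `std::complex<double>` to new values and leave the original untouched. Note that the standard names are `real()`/`imag()` rather than the `re()`/`im()` used in the text, and the function name below is made up for this example.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <type_traits>
#include <vector>

void demo_subscript_and_complex() {
    std::vector<int> a{10, 20, 30};

    // a[1] yields an int&: a variable of the element type,
    // which can be read from or assigned to.
    static_assert(std::is_same_v<decltype(a[1]), int&>);
    int& slot = a[1];
    slot = 25;
    assert(a[1] == 25);

    // real(), imag(), abs() and conj() all map a complex value to a new value.
    std::complex<double> z(3.0, 4.0);
    double re = std::real(z);                // 3.0
    double im = std::imag(z);                // 4.0
    double magnitude = std::abs(z);          // approximately 5.0
    std::complex<double> zc = std::conj(z);  // 3.0 - 4.0i
    assert(re == 3.0 && im == 4.0);
    assert(std::fabs(magnitude - 5.0) < 1e-12);
    assert(zc == std::complex<double>(3.0, -4.0));
    assert(z == std::complex<double>(3.0, 4.0));  // z itself is unchanged
}
```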
So can a value contain multiple elements, then? Such an operation belongs to the category of “conversion”, and it can be compared to extracting the integer part and the fraction from a floating-point value, or extracting a series of bits from an integer treated as a bitset. So values can have parts, and you can compose parts to create a value. Neither of these things gives values object semantics.
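The two analogies above, sketched with standard facilities: `std::modf` splits a `double` into its integer part and fraction, and shifting and masking extract groups of bits from an integer. In both cases the parts can be composed back into the original value. The function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

void demo_parts_of_values() {
    // Split a floating-point value into integer and fractional parts
    // (two conversions), then compose the value back from them.
    double value = 3.75;
    double integer_part = 0.0;
    double fraction = std::modf(value, &integer_part);  // 0.75, integer_part = 3.0
    assert(integer_part == 3.0 && fraction == 0.75);
    assert(integer_part + fraction == value);

    // Treat an integer as a bitset: extract the high and low nibbles.
    std::uint8_t bits = 0xB6;                // 1011 0110
    std::uint8_t high = (bits >> 4) & 0x0F;  // 1011 = 11
    std::uint8_t low = bits & 0x0F;          // 0110 = 6
    assert(high == 11 && low == 6);
    assert(static_cast<std::uint8_t>((high << 4) | low) == bits);
}
```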
Of course, there is one more controversy concerning containers: they also have something like a “number of elements currently contained”. This is a read-only property. Well, I can’t fit this fact into the theory described above other than by saying that the operation that retrieves the number of elements is also a kind of conversion. Whether an operation on an object or value is a conversion or the retrieval of a property is decided on the logical level. If we have already decided that our data type’s characteristics predestine it to be a value type, then any such operation, even one implemented as a method call returning some value only loosely related to the contents (note that in some languages integers have methods, too!), is still a conversion. A lossy conversion, of course, but still a conversion (complex::re() is also a lossy conversion). Containers may provide many operations like that. But don’t forget that integer numbers can also have things that look like properties, especially infinite-precision numbers such as those provided by libgmp. Such numbers may have a kind of “property” that reports the number of bytes currently used to represent a particular value. Under value semantics, of course, this is still a conversion.
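Treating `size()` as a lossy conversion can be made concrete: equal container values always map to equal counts, but different values may map to the same count, so the original value cannot be recovered from it. A minimal sketch, with the hypothetical helper `element_count`:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// size() viewed as a lossy conversion: container value -> element count.
std::size_t element_count(const std::vector<std::string>& v) { return v.size(); }

void demo_lossy_conversion() {
    std::vector<std::string> a{"x", "y"};
    std::vector<std::string> b{"p", "q"};
    assert(a != b);                                // different values...
    assert(element_count(a) == element_count(b));  // ...same result: information is lost

    std::vector<std::string> c = a;
    assert(element_count(c) == element_count(a));  // equal values always give equal counts
}
```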
Another important trait of value types is that they cannot form hierarchies the way ordinary objects can. This is, in practice, a simple consequence of not having contents.
Well, of course, you may think that integer and floating-point numbers could have a common base type named ‘number’. But not in programming languages. You may find some common properties in them, of course, but that doesn’t change much. For types to form a hierarchy, two conditions must be satisfied:
- there should be some common operations on these types that can be performed the same way
- there should be some abstract operations declared in a base class that can only be defined in the derived class
The problem is that, since value types should be kept directly in variables and passed by value, there’s no way to make use of derivation. Of course, we can always define some types as classes derived from others; the main problem is that if we want the derivation to be meaningful for anything, we have to operate on them via references. In C++ this is possible, but only because in C++ value types can be used as object types as well. If you strictly keep to the rule of using value types only in ways typical for value types, and likewise for object types, you won’t have any occasion to make the slightest use of the fact that the types form a hierarchy. At most you can save some work when defining them, but that’s all (and then it isn’t really a hierarchy: it’s only a C++ feature that lets you reuse one type when defining another, a tool usually used, among other things, for derivation; in other words, this is extending, not subclassing).
Of course, you’d say that a string can actually be derived from an array of characters… but why derived rather than containing one? LOL.
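The claim that derivation only becomes meaningful through references can be demonstrated with object slicing. In the sketch below (hypothetical types `Shape` and `Square`), passing a `Square` by value copies only its `Shape` subobject, so the override is lost; passing it by reference keeps it.

```cpp
#include <cassert>

// Hypothetical hierarchy used only to show slicing.
struct Shape {
    virtual ~Shape() = default;
    virtual int corners() const { return 0; }
};

struct Square : Shape {
    int corners() const override { return 4; }
};

// Value parameter: only the Shape subobject is copied ("slicing").
int corners_by_value(Shape s) { return s.corners(); }

// Reference parameter: the dynamic type is preserved.
int corners_by_reference(const Shape& s) { return s.corners(); }

void demo_slicing() {
    Square sq;
    assert(corners_by_value(sq) == 0);      // derivation lost when passed as a value
    assert(corners_by_reference(sq) == 4);  // derivation visible only through a reference
}
```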
5. Some certain statements
So, what can we consider to decide whether a type should be a value type or an object type? There are several important things:
- value types do not have contents, while from object types some contents or elements (especially properties) can be extracted
- values can undergo conversion to another value type, and there doesn’t have to be just one operation that converts between two given types; any suitable “operation” can do it (for example, converting a complex number to a floating-point type with the real() and imag() operations), while object types cannot undergo any conversion (you can only extract parts from them, including base subobjects)
- there should never be “references” to value types; there can only be optional types (a value or nothing; see my article about references) and the pass-by-variable idiom (in C++ this can only be done with C++ references, though; only C# strictly defines a pass-by-variable idiom)
- as I have already said, value types do not undergo derivation, although this is a simple consequence of not having contents (when deriving, you can always extract a “subobject”, that is, a separate, consistent part of the base class type)
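A sketch of the two allowed substitutes for references to values, assuming C++17 or later: a non-const reference parameter for the pass-by-variable idiom, and `std::optional` for “a value or nothing” instead of a nullable pointer. `increment` and `index_of` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Pass-by-variable: the parameter names the caller's variable.
void increment(int& variable) { ++variable; }

// "A value or nothing" instead of a nullable reference.
std::optional<std::size_t> index_of(const std::vector<int>& values, int wanted) {
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (values[i] == wanted) return i;
    }
    return std::nullopt;
}

void demo_no_references_to_values() {
    int n = 41;
    increment(n);
    assert(n == 42);

    std::vector<int> v{5, 7, 9};
    assert(index_of(v, 7) == std::optional<std::size_t>(1));
    assert(!index_of(v, 8).has_value());
}
```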
Let’s collect them then:
| Trait | Value types | Object types |
|---|---|---|
| Identity | Identifiable by value (by itself) | Identifiable by reference |
| Having contents | Do not have contents. Any real contents, or even objects implementing everything required for the value (including special tricks for copying or comparing), are at most operated on behind the scenes. | Have contents. Parts of an object may be exposed publicly and may be changed independently. These parts are usually called “properties”, and some languages provide support for them. Subobjects in the type’s hierarchy are also object contents. |
| Conversion to another type | May undergo conversion. The conversion may be done implicitly or requested by applying a deterministic unary function. There is no limit on how many kinds of operation may convert from one type to another, nor on whether such a conversion is lossy or lossless. | Do not undergo conversion. Note that converting an object’s reference to a reference to its base type is a conversion of the reference (that is, of a value) rather than of the object. You can’t define any operation that “converts” one object to an unrelated type (unrelated, because cloning subobjects may be treated as “conversion” to a related type, but it’s still cloning rather than conversion). Whatever operation you come up with that looks like a “conversion”, it will always be either object cloning or reference conversion. |
| Having references | If a language allows creating references to variables of a value type, such a reference is not significant for the type itself. The variable being referred to should rather be treated as an object with a single property, and referring to variables is usually used for the pass-by-variable idiom (although this can be deferred by using separate objects, so references to variables may sometimes also be parts of objects). Note that some languages do not allow referring to variables at all (Java), or allow it only for the pass-by-variable idiom (C#). | References are indispensable. Every object must have a reference, otherwise it does not exist. References are needed both to identify an object and to have at least some “handle” through which any operation on the object can be done. |
| Type hierarchy | Value types cannot form type hierarchies, which is a direct consequence of not having contents. | May form type hierarchies. Base subobjects may be extracted from derived object types (classes). |
| Number of occurrences | Either constant and unchangeable (for types with a finite number of possible values) or infinite (like numbers, where one more can always be made up, or a number can be divided in half). Whichever is the case, the number of occurrences does not vary at runtime. | The number of objects can at most be limited from above, or some definite number of objects can be pre-created, but usually objects can be created and deleted dynamically (even if deletion is implicit). This means the number of occurrences of objects may vary at runtime. |
6. Typical examples of value types
Ok, so what types are good examples of value types? As the rules may still be a bit confusing, examples should make things clearer. Let’s enumerate them:
- integer numbers, of course, as always
- floating-point numbers, too
- complex numbers: although they are usually defined as a structure of two floating-point fields (and there are also methods that return their values), these fields are not properties of the type; for example, you cannot set only the imaginary part to some other value
- vectors and matrices (in algebraic sense)
- sets of number-based data: points, rectangles, shifts etc.
- strings, as I have already mentioned
- containers (even though they usually keep large objects and seem to have parts, they don’t actually: the elements kept by a container are not parts of the object; the parts of the object are nodes, and nodes are never public; the only seeming exception is an intrusive list, but even that is not one: only a list or a set can have such a structure, the main object usually keeps only references to the first and last elements, and moreover these are still not public)
- iterators for containers (even though iterators for particular types can be very complicated and not even cheap to copy)
- various enumeration types
- various tokens, descriptors, identifiers and the like (they all actually play a role very similar to references)
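Two entries from this list in code: an enumeration, where an operation such as the hypothetical `turn_right` yields a new value, and a standard iterator, where copying produces an independent value that can be advanced without affecting the original.

```cpp
#include <cassert>
#include <list>

enum class Direction { North, East, South, West };

// An operation on an enumeration value yields another value.
Direction turn_right(Direction d) {
    switch (d) {
        case Direction::North: return Direction::East;
        case Direction::East:  return Direction::South;
        case Direction::South: return Direction::West;
        case Direction::West:  return Direction::North;
    }
    return d;  // unreachable for valid enumerators
}

void demo_examples_of_values() {
    Direction facing = Direction::North;
    assert(turn_right(facing) == Direction::East);
    assert(facing == Direction::North);  // the original value is unaffected

    // Iterators: a copy is an independent value.
    std::list<int> items{1, 2, 3};
    auto it = items.begin();
    auto copy = it;
    ++copy;  // advances only the copy
    assert(*it == 1 && *copy == 2);
    assert(copy != it);
}
```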
That’s actually all that currently comes to my mind.
7. Foot in yourself shoot
Among the languages that approach this topic, there are not many that make the programmer’s life easier, and only a few that at least do not make it harder.
Smalltalk seems to be very bad at this. It is a language that would make references objects, too, if that were possible. It does at least allow easy simulation of value types, because operator = cannot be altered or overridden; still, the emulation isn’t perfect, because containers can be changed in place.
The really evil language is Java. Only some selected built-in types are value types (integers, characters, floats, and of course references); a notable exception is String (don’t try to tell me it’s not a built-in type: only built-in types have operators in Java). Users can neither create nor even simulate value types, not even in the imperfect way Smalltalk allows.
Some approach to this topic was made in C#, with structs that are value types. But what kind of value type is it, if it still has to be created with a ‘new’ expression? Another interesting thing is that a struct always has a default constructor, so I can declare a variable of it without calling its constructor and then use it normally. These are only some of my objections to the existing solution; for me it also lacks destructors and derivation (in the sense of extending only). Of course, value types in C# are provided only for optimization, not to support the value type idiom.
Things are a bit better in Vala (values of struct types are created without the ‘new’ keyword), although it’s hard to say anything definite about this language’s concepts, as they exist in various mutations controlled by annotations. A small annotation can turn a huge, bloated GObject-based enumeration type, with assigned strings, memory allocation and so on, into a simple int-based enum type. Likewise, adding the [Compact] annotation can make a “class” a simple value type. This language looks much better than C# (Vala was based on C#, but I would say the C# creators could learn a lot from it), especially regarding nullable and non-nullable types; however, it still doesn’t have anything that supports the value type idiom.
C++ is weird. It doesn’t support the value type idiom at all, but it also doesn’t prevent anyone from creating value types. In fact, every type created in this language can be a value type simply because you can create a variable of this type, and you can always write a function that takes or returns such a type by value (unless, of course, the class’s creator made its destructor private). It doesn’t support the idiom of value and object types, but at least it doesn’t get in the user’s way.
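The exception mentioned above can be shown directly. In this sketch, the hypothetical class `HeapOnly` has a private destructor, so it cannot be a local variable or be passed by value; `std::is_destructible_v` reports this at compile time, while an ordinary struct passes the same check.

```cpp
#include <cassert>
#include <type_traits>

// A class whose creator made the destructor private: it cannot live in a
// local variable or be passed/returned by value, only used through a pointer.
class HeapOnly {
public:
    static HeapOnly* create() { return new HeapOnly(); }
    void destroy() { delete this; }  // the only way to end its lifetime
    int answer() const { return 42; }

private:
    HeapOnly() = default;
    ~HeapOnly() = default;
};

static_assert(!std::is_destructible_v<HeapOnly>);

// For comparison: an ordinary struct is usable as a value.
struct Plain {
    int n = 0;
};
static_assert(std::is_destructible_v<Plain> && std::is_copy_constructible_v<Plain>);

void demo_private_destructor() {
    // HeapOnly local;  // would not compile: the destructor is private
    HeapOnly* h = HeapOnly::create();
    assert(h->answer() == 42);
    h->destroy();
}
```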
The D language seems to add some support for them through a special kind of class; however, it also allows objects to be copyable and comparable by default, so this has nothing to do with value/object semantics.
What we’d expect from a language that supports value and object semantics is that:
- there are separate statements for defining value types and object types
- objects are not copyable and not clonable by default (although there is some support to make cloning easier; for example, if all subobjects are copyable or clonable, they are used to provide default cloning)
- only a value type can be the type of a variable
- the language should provide the ability to create objects of value types, but it should also provide non-nullable references
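Two of these wishes can be partly emulated in today’s C++, as a sketch rather than real language support: the hypothetical object type `Document` is non-copyable but offers an explicit `clone()` built from its copyable members, and `std::reference_wrapper` serves as a reference that cannot be null (it has no default constructor). `count_lines` is also a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical object type: not copyable, cloning only on explicit demand.
class Document {
public:
    explicit Document(std::string title) : title_(std::move(title)) {}
    Document(const Document&) = delete;
    Document& operator=(const Document&) = delete;

    // "Default" cloning assembled from members that are themselves copyable.
    std::unique_ptr<Document> clone() const {
        auto copy = std::make_unique<Document>(title_);
        copy->lines_ = lines_;
        return copy;
    }

    void add_line(std::string line) { lines_.push_back(std::move(line)); }
    std::size_t line_count() const { return lines_.size(); }
    const std::string& title() const { return title_; }

private:
    std::string title_;
    std::vector<std::string> lines_;
};

// std::reference_wrapper is always bound to an object, so it cannot be null.
std::size_t count_lines(std::reference_wrapper<const Document> doc) {
    return doc.get().line_count();
}

void demo_clone_and_reference() {
    Document original("notes");
    original.add_line("first");
    // Document copy = original;  // would not compile: objects are not copyable
    std::unique_ptr<Document> twin = original.clone();  // explicit, on demand
    twin->add_line("second");
    assert(original.line_count() == 1 && twin->line_count() == 2);
    assert(count_lines(std::cref(original)) == 1);
}
```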
I have no hope that any existing language will support all these features, although I do lay some hope on Vala. Unless it gets some commercial support, though, that’s hard to expect.
What you can do about it is to consciously use value types and object types in your software, and make sure you never mix the traits of the two in one type. Ever.
If you don’t, you will be fighting with the very complicated topics of “comparing by value” and “comparing by identity”, or “shallow copying” and “deep copying”, which are nothing more than digging in the implementation dirt because someone doesn’t understand the logical level of the things they are trying to program. Think about valuables and you’ll complete your objectives.