Exceptions konsidered harmfool

Introduction

Some time ago I found a very old article whose author says that exceptions should be avoided as much as possible, and that those you cannot avoid should be caught immediately and handled on the spot. He also says that exceptions are worse than goto because the points where execution may leave a function are invisible. He toned this down in another article, saying that there is, well, no good exception handling in general. Yes, I’m talking about Joel Spolsky. “Uh, oh”, to speak his language.

There are two general faults in this thinking. The first is to call something “hidden” when it is merely written in a different place. The second is to think that a crash is the worst thing that can happen to a program. I see only two possible reasons for stating things like that: either complete ignorance, or having been used to the procedural style for so long that one has no idea what to do with exceptions once a language has them; at most, knowing exceptions only from the Java libraries, where they are used in the most wrong way possible. As Joel Spolsky is an old and experienced programmer, I will mercifully assign him the latter.

Theory of error handling in imperative languages

If you have several instructions, each of which can fail, you should detect the failure and… well, do something. We’ll come back to this “do something” later.

If you are using return-value error handling (I wanted to say “old-fashioned”, but even the old-fashioned APIs don’t rely only on return values), the following things have to be considered:

  • every call may fail, so you must execute the functions separately and check the status after each call
  • every function must have some means of reporting its execution status – a return value or an additional variable passed by reference
  • if the status is reported by the return value, and the return value also means something in case of success, some special value must be reserved for the case of failure

It means, for example, that you cannot build a chained expression, where the return value of one function is passed straight to another call. Consider:

object->GetFactory()->GetObject(value)->create();

Use( getValue(), getStep() );

Unfortunately, if object is NULL, the call to GetFactory will fail, and the same goes for every later link in the chain – so you’d better do it the following way:

// handle_error() is assumed to yield the error result of the enclosing function
if ( !object )
  return handle_error();

Factory* f = object->GetFactory();

if ( !f )
  return handle_error();

Object* o = f->GetObject(value);

if ( !o )
  return handle_error();

if ( !o->create() )
  return handle_error();

Value v = getValue( &status );
if ( status == error )
  return handle_error();

Step s = getStep( &status );
if ( status == error )
  return handle_error();

Use( v, s, &status );
if ( status == error )
  return handle_error();

This has made the code more resistant to runtime errors, but it is totally unreadable compared with the previous form. In the first one you can clearly see how the operations are connected and follow the whole logic, while in the second one every single operation is intermixed with error handling (and that’s the best case, with handling as concise as here!). Seeing any logic in this code is almost impossible.

The exception mechanism seems to be a remedy for most of those problems. Assuming that each of these calls throws instead of returning NULL or setting a status, consider:

try {

  object->GetFactory()->GetObject(value)->create();

  Use( getValue(), getStep() );

} catch ( null_pointer_exception ) {
  handle_error();
} catch ( value_exception ) {
  handle_error();
}

This way you keep the expressions concise and simple, while a failure at any step still terminates the execution and runs handle_error().

I’m not writing these obvious things to show why exceptions are better than reporting via return values (I haven’t even shown all their advantages). If you want to decide whether exceptions are better than return values for reporting problems, you have to consider two separate cases:

  1. Reporting the result of a predicted condition, about which we can’t guarantee anything yet (we are just checking it).
  2. Reporting a problem, when an operation expects some condition to be true and it is false, so the operation has been abnormally terminated.

The above example showed exceptions beating return values, but only because it belongs to case #2. Let’s try something from case #1, done with return values:

if ( !fs::is_readable( "somefile.txt" ) )
    return NO_FILE_TO_READ; // nothing to do

ifstream z( "somefile.txt" );
size_t size = z.seekg( 0, ios::end ).tellg();
z.seekg(0, ios::beg );
string s;
s.resize( size );
z.read( s.data(), size );
if ( !z )
    return FILE_READ_ERROR;

size_t pos = s.find( "lockable" );
if ( pos == string::npos )
   return NO_OCCURRENCE; // nothing to do

... 

Could these checks be replaced with exceptions? Well…

try {
    ifstream z( "somefile.txt" );
    z.exceptions( ios::badbit | ios::failbit );

    size_t size = z.seekg( 0, ios::end ).tellg();
    z.seekg( 0, ios::beg );

    string s;
    try {
        s.resize( size );
        z.read( s.data(), size );
    } catch ( ... ) {
        return FILE_READ_ERROR;
    }

    try {
        Extract( s, "lockable" ); // will throw exception if not found
    } catch ( ... ) {
        return NO_OCCURRENCE;
    }

    ...
} catch ( ... ) {
    return NO_FILE_TO_READ;
}

Does it still look better than the value-return version? I don’t think so.

Saying that the information about the exception is hidden (well, I assume you have never tried to use a function for which you have neither complete documentation nor the source code) is old-fashioned procedural thinking. It’s not true that this is dangerous, as long as nobody violates a few simple rules, one per role:

  • the library user uses resources that have predictable conditions of release, and keeps those resources releasable until the operation is complete
  • the library author throws exceptions only for reasons that the user could have predicted (and prevented), for reasons reported by underlying calls, or, optionally, for other kinds of termination events

I would agree with Joel Spolsky that exceptions should be handled immediately, but only in one case: when the library uses exceptions in some stupid way, for example the way IOException is thrown by file operations in the Java standard library. Handling them on the spot can be called “direct translation”, as it simply translates exceptions back into return values.

The problem isn’t in exceptions. The problem is that exceptions are a mechanism that should be used for reporting only a selected kind of errors, not all possible errors (that is, calls whose job is to “check some condition” should never report a negative result of the check via an exception). An exception is a mechanism that reports an error and, at the same time, causes abnormal termination of the procedure (it is yet another way to exit the procedure, in addition to “return”). So if you ask which way is better for reporting an unrecoverable error with abnormal procedure termination, the answer is simple: exceptions, not return values, because the exception mechanism:

  • ensures immediate termination without involving the return value
  • ensures propagation of the exception when the caller doesn’t handle it
  • in RAII languages (like C++), also ensures automatic destruction of local objects
  • in non-RAII languages, provides a mechanism to release resources at the end (both when an exception was caught and when the program continues normally)
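The propagation and automatic destruction points can be sketched in C++ as follows. This is only a minimal illustration, not anyone’s production code: the Resource class, the inner/outer functions and the log vector are hypothetical names invented for this example. The log should show that both resources are released during propagation, before the handler runs, although no function on the way contains any cleanup code.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical resource wrapper: records acquisition and release in a log.
class Resource {
public:
    Resource(std::string name, std::vector<std::string>& log)
        : name_(std::move(name)), log_(log) {
        log_.push_back("acquire " + name_);
    }
    ~Resource() { log_.push_back("release " + name_); }

    Resource(const Resource&) = delete;
    Resource& operator=(const Resource&) = delete;

private:
    std::string name_;
    std::vector<std::string>& log_;
};

// The innermost level fails; it contains no manual cleanup.
void inner(std::vector<std::string>& log) {
    Resource file("file", log);
    throw std::runtime_error("operation cannot continue");
}

// The middle level neither checks nor forwards anything by hand.
void outer(std::vector<std::string>& log) {
    Resource lock("lock", log);
    inner(log);  // the exception passes through here untouched
}

// Runs the whole operation and returns the log of what happened.
std::vector<std::string> run() {
    std::vector<std::string> log;
    try {
        outer(log);
    } catch (const std::runtime_error& e) {
        log.push_back(std::string("caught: ") + e.what());
    }
    return log;
}
```

With return values, outer() would need its own check and its own release of the lock on the failure path; here the language does both.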

So, there’s no dilemma whether to use exceptions or return values, because this is like choosing between real object-oriented programming in C++ and manual, pointer-to-function-based object-like programming in C. The real dilemma lies elsewhere: in which practical situations should “special situations” be reported by exceptions.

Don’t prevent the prevention

Another thing that is astonishing to hear from a programmer as experienced as Joel Spolsky is the belief that a crash is the worst thing that can happen to your program. Actually, I think I should envy those who think so – I would be happier believing that nothing worse than a crash can happen to my program. Unfortunately, I don’t have this solace of unawareness. I can only hope that the reason behind his advice to handle exceptions in place was something other than preventing programs from crashing, but I’m not sure of that because, well, this is its only practical consequence.

Try to look at it this way: imagine that you are driving a car a bit too fast, and suddenly you see a crowd of people, say, half a kilometer ahead of you. You try to brake, but well… the brakes don’t work. You are still moving. The distance to the people decreases. There is no way around; there are buildings on both sides of the road. You can’t reverse because you’re going too fast. In such a situation you have only two choices – either you crash into a building, or you let the car run into the crowd and squash some of the people.

These two outcomes correspond to what may happen when you forget to handle an error reported by some function you call: the first one happens when you forget to handle an exception, the second one when you forget to handle an erroneous return value.

If you understand this example well enough, you’ll understand that someone who does everything possible to prevent the program from crashing is a complete idiot. Fortunately, the creators of standard libraries usually aren’t idiots, and that’s why calling strlen() with NULL, or passing NULL to std::string’s constructor, usually ends with a crash. Error handling should not prevent the program from crashing (although it may do its best to prevent vital data from being destroyed). Error handling should prevent the program from continuing, which usually means that the sub-functionality, or even the whole program if it cannot continue under this condition, should be abnormally terminated (that is, crashed).

There is, of course, one more motive that may stand behind the idea of preventing the crash even at the cost of destroying the user’s data. When users see the program crash, their first thought is “what a s*** stuff!”, while when some data gets destroyed, they may think that, well, nothing wrong happened; probably they won’t even notice that anything has gone wrong. And even if some data destruction is noticed, you can always say that, well, something is wrong with their operating system, or they have installed some malicious software. Of course you still risk that someone inquisitive will prove that your program has a bug and that you intentionally held back the crash at the cost of data destruction, and then you are doomed. If you want to prevent the crash just so that the program doesn’t give you bad PR, make the process of saving the data and restoring the program to its previous state fully automatic instead. For example, the Opera browser I am using crashes more than any other web browser I have used – but Opera still has many advantages over other browsers, and even if it crashes once in a while, it quickly recovers its previous state after an automatic restart. The short conclusion is: don’t try to trick users, because it takes only one smart user to catch you and you are in the ***hole.

The useless handlers

Once you have realized that a crash actually isn’t such a bad thing, you can think properly about whether you really have to handle the error (with return values you must at least have some way to propagate the error!), given that error handling is not a means of preventing the program from crashing.

In the Tcl language, for example, there is a mechanism that works much like exceptions: status codes. In this language they are used not only to report errors, but also to implement the break/continue statements for loops. This language may teach you one very important, but controversial thing: don’t handle an exception at all if you can’t do anything better than what would happen when it’s not handled. In other words, let an error crash your program if all you can do is quit the application with an error message (which for an average user is the same as a “crash”), simply because the Tcl built-in error handling will do exactly the same, with only a slightly different error message. For example:


set fd [open somefile.txt r]
set contents [read $fd]
close $fd

do_something_with $contents

If the file is not found, [open] will end with an error and crash the script (assuming that the above is executed in a “bare interpreter”, not from within some procedure that may run it in its [catch] block):

couldn't open "somefile.txt": no such file or directory
    while executing
"open somefile.txt r"
    invoked from within
"set fd [open somefile.txt r]"
    (file "somescript.tcl" line 1)

Ouch! Well, so let’s add some error handling to our script:


set failed [catch {set fd [open somefile.txt r]} errmsg]
if { $failed } {
 error "File 'somefile.txt' not found"
}
set contents [read $fd]
close $fd

do_something_with $contents

Just to receive an extremely different and much better 😀 result:

File 'somefile.txt' not found
    while executing
"error "File 'somefile.txt' not found""
    invoked from within
"if { $failed } {
                error "File 'somefile.txt' not found"
}"
    (file "somescript.tcl" line 2)

Uh, oh. Yes, I know, I was a bastard – I used the [error] command to handle the problem. I should have used something else. OK, go on, use something else. You’ll probably want to use the [puts] command to display the error message (just to make it different from the human-readable stack dump that the Tcl interpreter prints by default) and then call [exit 1] (to return status 1 to the shell, which the Tcl interpreter also does by default in case of an error). Or, as you have the Tk package available, would you prefer to display a message window with the error message? Well, put the whole script (that is, the part that is executed directly – procs may stay outside it) into a [catch] block and show the message window when it returns a nonzero value. Just once for the whole script.

You can do all these things, then take an average user, make them use the program until they hit this error, and ask them what happened. Every “average user” will tell you “it crashed”. So what’s the difference?

If you really want to do something “really different”, do not handle this situation via the exception. Reporting the error of opening a file by an exception is exactly the right thing to do, because it’s a problem that you could have predicted and didn’t. You still could:

if { ![file readable somefile.txt] } {
   puts stderr "I kindly inform you that the file 'somefile.txt' is not possible"
   puts stderr "to be open for reading. Please make sure that this file can be read"
   puts stderr "and try again."
   puts stderr "Thank you for using Tcl for writing scripts"
   exit 1
}

set fd [open somefile.txt r]
set contents [read $fd]
close $fd

do_something_with $contents

Please note that this time we didn’t handle the potential exception thrown from the [open] command. But why should we? We have prevented the exception from being thrown!

Of course there still may be cases like “the file was deleted in between”, but this kind of error belongs to a different category, and we’ll talk about it later. Here I can only tell you that if this happens, a crash is quite intended.

Remember that some users will intentionally try to make your program crash, and they may even violate the guarantees of the operating system to make it happen – don’t try to do everything you can to prevent it because, simply, you can do too little. Do your best to prevent your program from crashing during normal use (but not at run time!), not to withstand every abnormal situation. The most valuable thing that users get from your program is their data. Your program should never allow the data to be corrupted, lost, overwritten or leaked. Your program should not allow the safety and security of the data to be compromised. And even if the only way to protect the user’s data is to crash the program – let the program do it and do not deliberate too much, because this is exactly what you should do.

So why should I handle exceptions?

There are generally only two reasons why you may want to handle exceptions:

  1. The exception mechanism was optional in this particular case and you turned it on because you currently prefer it this way (example: exceptions thrown from iostream operations in the C++ standard library, which happens only if the exception mask is set with ios::exceptions).
  2. The procedure that ended with an exception was running in a kind of “subsystem” that isn’t crucial to your application as a whole, only to that subsystem. This way only the “subsystem” is treated as “crashed”, while the rest of the program continues normally. Errors of this kind are either errors that could have been predicted (example: vector::at() throws an exception when the index is out of bounds – you could have prevented this by checking the index before calling vector::at) or “totally unexpected” errors.

Of course, the above is true only when the library has been designed according to good practice. There is also a third reason: the procedure was written by an idiot and an exception is the only way it reports errors, so you just have no choice. An example is IOException in Java, which may be thrown by all I/O operations, including close(), and which is not propagated implicitly (it is a checked exception, not a subclass of RuntimeException).

In case #2, if you want to handle the exception, you should always handle it near the boundary between the subsystem and the higher-level unit of the system. If you cannot separate out a subsystem, just handle it at the highest level, or even don’t handle it at all (let it crash your program). In particular, handle the exception in this case only when you are able to recover and continue the rest of the program. If you are not able to recover, then, well, let the exception crash your program – trying to “do something” in this situation is like trying to cure someone whose brain is already dead.
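Handling at the subsystem boundary might look roughly like this in C++. It is a sketch under assumptions: run_subsystem, SubsystemResult and pick_thumbnail are hypothetical names made up for the illustration, and the “subsystem” is reduced to one function that uses vector::at(). The only catch sits at the boundary; the subsystem itself contains no handlers at all.

```cpp
#include <cstddef>
#include <exception>
#include <functional>
#include <string>
#include <vector>

// Hypothetical outcome of running one subsystem.
struct SubsystemResult {
    bool ok;
    std::string error;  // empty when ok is true
};

// The boundary: the single place where the subsystem's exceptions are caught.
// A failure marks only this subsystem as "crashed"; the caller continues.
SubsystemResult run_subsystem(const std::function<void()>& body) {
    try {
        body();
        return {true, ""};
    } catch (const std::exception& e) {
        return {false, e.what()};
    }
}

// Hypothetical subsystem code: no error handling inside, it simply lets
// vector::at() throw std::out_of_range when the index is invalid.
int pick_thumbnail(const std::vector<int>& sizes, std::size_t index) {
    return sizes.at(index);
}
```

The caller decides what a failed subsystem means for the rest of the program, for example by disabling the feature, and the state outside the subsystem stays as it was before the call.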

The leftist fool programmers

Error handling is a very underestimated topic in programming. Everyone thinks that if there is a potential problem, we should prevent it – so, for example, check that a pointer received by your function is not null.

Really? Check, and then what? Send an error message? Exit the application and lose all the data and the process? Continue with a partial failure? Try to continue as far as possible and report an error? Does your function account for failures in its execution? Did the function that calls yours predict that failure and plan its error handling accordingly? How can we recover from such an error, and – much more important – how can we detect such a problem at the testing stage of development, so that it won’t occur later? Or maybe it’s still OK that this error occurred, because it was caused by an external resource over which we have no influence?

How many times did you consider all these things when planning the error handling?

Why the hell should I check that the pointer I have been passed isn’t NULL? OK, it may be reasonable for a public API, to prevent stupid users from doing stupid things (although I think C++ can afford some better ways to do such a check). But, for example, why should I check that some function returning a pointer returns a non-null pointer?

There’s a lot of really stupid advice on how to prevent errors in your application and the problems they may cause. Well, there are feminists, there are socialists, there are vegetarians, ecologists, anti-globalists… and programmers also have their own version of idiots. They propose simple solutions to complicated problems and this way cause other kinds of problems. One piece of such stupid advice is that you should always use the ‘strncpy’ function instead of ‘strcpy’. To see how stupid this advice is, see an article on Maciek’s Inspirel site. This is a case of proposing a simple statement of the “you should never” kind, which is a typical leftist approach. The engineering approach can at most state “you should do this only if you have a logical explanation of why what you do makes sense”.

If you want to decide whether checking a pointer for NULL makes sense, first of all check whether the documentation of the function states that it may return NULL, and in what situations. If it says so, and you cannot prevent those situations, then you should check for NULL and in response take an action consistent with the error description (in particular, work out which of the resources you need won’t be available and which operations cannot be done because of that). If the function returns NULL only because of some invalid input data, check the data beforehand. If the function declares that it never returns NULL (for example, it returns a base-class pointer to one of 3 singleton objects), or it would return NULL only in a situation that you have prevented, then checking for NULL is useless.

OK, I still should make a small exception to these statements, because they rest on the assumption that programmers document their functions completely, which is unfortunately often not true. So, if the function doesn’t have complete documentation, or you suspect it’s out of date, or it says something like “returns NULL in case of error” and nothing more about the errors (which amounts to the same thing for a “user programmer”), the only way to go is to check the source code and see for yourself. And if even that isn’t available, you should politely refuse to do any work based on such a function, or at least sign a special declaration that you are not responsible for any bugs in this code.

In practice, people do “dumb null checks” just “to have a clear conscience”. People who do that should rather see a psychotherapist. Remember that one stupid null check will not stop even one tenth of the potential bugs in your application from destroying it (as I said, functions in the C standard library do not check whether string pointers are NULL). Instead of preventing the crash, cause the crash by reading the first 4 bytes from under the pointer – as long as the system features memory protection. If you really want to aim at preventing bugs in your application, you should perform checks on multiple levels (usually by static analysis), detect all possible states and behaviors of the program, and eliminate those that you don’t want. Ensure predictability by static declarations and flow analysis. If you really don’t trust the underlying code, try to run it in a separate environment. That’s really all you can do.

Also, as far as the null pointer is concerned – almost every operating system has means to kill a process that tries to dereference a null pointer. So, are you trying to prevent an error in your application, or to prevent a crash? If you are trying to guard against some mistakenly returned NULL, you’re really doomed – by the simple fact that you are using a function which you are not completely sure works. You’d say there’s no program without bugs? Theoretically you’re right; there is only one small problem: if you have a bug, it’s not possible to predict what the program will do. Are you trying to “do something” (the favorite saying of leftists when they want to satisfy their thirst for solving problems) once you have found this problem? OK, do something – call abort(); that’s the best thing you can do.

Before you choose, make yourself aware of what your choice is. If your choice is between a crash and a crash, then why do you think that the second crash is better than the first one? But if your choice is between a crash and unpredictable behavior – the crash is still better. If you are trying to protect your program against unpredictable behavior under unexpected conditions, the only way to do that is to crash. At best you may fool yourself (as people usually do) that you can do something better and, instead of calling abort(), throw an exception, counting on “someone above this place” being able to save something.

With this dumb null check people often forget about one more thing: if you check for a condition that you think is critical, you should also plan some way to test this “recovery code”, right?

When I was adding a feature to an application that had to allow for concurrent access from multiple threads and needed mutex locking, I was obliged to anticipate problems like insufficient system resources or errors when locking and unlocking (the platform was not too reliable). Handling these errors was a bit complicated. Moreover, because of that I wrote test cases in which the mutex operations were stubbed, so that I could set them up to cause one of these errors. With that, I at least knew that even if these errors are very unlikely, there is some potential profit from this error handling code.
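The original code is not shown here, so the following is only a guess at the shape of such a test setup. The Lock interface, the StubLock class, the UpdateStatus enum and guarded_increment are all hypothetical names for this sketch; a real implementation would wrap the platform’s mutex behind the same interface.

```cpp
// Hypothetical lock interface; a production class would wrap the platform mutex.
class Lock {
public:
    virtual ~Lock() = default;
    virtual bool acquire() = 0;  // returns false when locking fails
    virtual bool release() = 0;  // returns false when unlocking fails
};

// Test stub: can be set up to fail either operation on demand.
class StubLock : public Lock {
public:
    StubLock(bool acquire_ok, bool release_ok)
        : acquire_ok_(acquire_ok), release_ok_(release_ok) {}

    bool acquire() override { ++acquire_calls; return acquire_ok_; }
    bool release() override { ++release_calls; return release_ok_; }

    int acquire_calls = 0;
    int release_calls = 0;

private:
    bool acquire_ok_;
    bool release_ok_;
};

enum class UpdateStatus { Ok, LockFailed, UnlockFailed };

// Code under test: the error paths are reachable only through the stub.
UpdateStatus guarded_increment(Lock& lock, int& counter) {
    if (!lock.acquire())
        return UpdateStatus::LockFailed;    // counter must stay untouched
    ++counter;
    if (!lock.release())
        return UpdateStatus::UnlockFailed;  // update done, lock state unknown
    return UpdateStatus::Ok;
}
```

A test can then construct StubLock(false, true) and verify that the counter was not modified and that release() was never called, which is exactly the kind of path that never shows up in normal runs.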

However, many times I have seen code which I traced only theoretically (just by navigating through it) and found that, even though at some function levels these unlikely errors (reported by return value) were handled and propagated, after some call the return value was ignored. I didn’t need a test case to prove that this just won’t work. The error was maybe very unlikely, but if it happened, the handling just wouldn’t prevent the program from doing serious destruction. So why was all this error handling added, if it doesn’t prevent anything?

Moreover, this is one of the serious strengths of exceptions. An exception is something you just can’t ignore without writing extra handling code. The mission of an exception is to crash your program; handling the exception serves to shrink the crashed part to the smallest possible range. The earlier you become aware of this unique trait of exceptions, the better it will serve your error handling work.
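The difference can be shown with a small C++ sketch. All names in it (parse_percent_status, parse_percent_throwing, careless_discount, strict_discount) are hypothetical and exist only for this illustration. The same parser is offered in two flavors, and each has a caller that does nothing about errors.

```cpp
#include <stdexcept>
#include <string>

// Status-code flavor: returns false on failure and leaves `out` untouched.
bool parse_percent_status(const std::string& text, int& out) {
    if (text.empty() || text.size() > 3)
        return false;
    int value = 0;
    for (char c : text) {
        if (c < '0' || c > '9')
            return false;
        value = value * 10 + (c - '0');
    }
    if (value > 100)
        return false;
    out = value;
    return true;
}

// Exception flavor: the failure cannot be dropped by accident.
int parse_percent_throwing(const std::string& text) {
    int value = 0;
    if (!parse_percent_status(text, value))
        throw std::invalid_argument("not a percentage: " + text);
    return value;
}

// A caller that forgets the status: it silently continues with the default.
int careless_discount(const std::string& text) {
    int percent = 0;
    parse_percent_status(text, percent);  // return value ignored
    return percent;
}

// A caller that does nothing about errors either, yet cannot continue
// with a bad value: the exception leaves this function on its own.
int strict_discount(const std::string& text) {
    return parse_percent_throwing(text);
}
```

For the input "abc", careless_discount returns 0 as if nothing had happened, while strict_discount never returns a value at all. To get the first behavior out of the second function you would have to write a catch block on purpose.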

Imagine, for example, that in some configuration the malloc() function may call abort() instead of returning NULL when the allocation fails. It’s a correct approach because people, even though they usually check the returned value, rarely do anything reasonable in response. Practically since the very beginning, people were aware that leaving the problem in the hands of the user (that is, reporting it by return value) may cause so much damage that it’s even better to crash the program without letting the user know. The evolution of error reporting may be seen like this:

  • report the problem by return value; this requires selecting an “erroneous value” among the possible values, or returning the value and the status separately; however, users may always ignore the error status and continue the program, usually leading it to calamity
  • call abort() in case of problems – this is a very drastic error report, but at least it doesn’t leave the user any possibility to ignore the problem
  • raise a signal in case of problems – this usually ends with a crash, although the majority of “killing” signals give an opportunity to handle them and do something better than crash; it proved to be a good solution for cases like division by zero
  • throw an exception in case of problems – this is much like signals together with longjmp, with much better treatment of non-trivially-destructible objects (C++ only) and with language syntax support

So you should not ask whether it’s better to report by return value or by exception, but rather: a) whether leaving room for ignoring this error is dangerous, and b) whether exceptions are better than a signal handler + longjmp (they are).

The negligent make it twice

People intuitively throw an exception at the last possible moment, not at the very beginning, before any significant steps have been taken. That’s why catching exceptions is usually the wrong way of handling errors. And when you check the conditions at the very beginning and take care that the “throwing” situation won’t occur, what’s the point of catching exceptions?

For example, imagine that you hear: before you go to the till, check if you have enough money. Well, good advice; you have collected the things you want to buy, you have brought them all to the checkout, just one small thing – check if you have enough money. And then what? Leave them there and go out? Take them and bring the money later? Give up and repeat the shopping after you have fetched enough money? Of course, every sane person will respond: why not check for enough money before shopping? This way you don’t waste time on shopping if you can’t finish the operation correctly. But if you already made sure that you have enough money while shopping, why check this condition again when going to the till?

Of course, you should have the “last chance to crash” in such a situation. However, when this occurs, the only thing you can do is drop everything you have and run away. It’s way too late for any “fix” that would allow you to continue shopping! You may fix it by going to a cash machine, but not when you are at the till; this can be done only before shopping!

The crux of the problem here is that if you get an exception in some place, it doesn’t mean that the problem happened there – it only means that this was the first place where you had a chance to see it. In other words, it may be the result of something wrong that has already lasted for some time. I’ll give you another example:

If you index an array and you rely on exceptions, should the exception be raised in the place where the array is indexed out of bounds, or rather in the place where the invalid index value was created? In the Ada language, for instance, you can specify the integer type used for indexing an array. This type may be a constrained (range) type whose valid values are exactly the valid indices of the array. With that, there’s just no way to index out of bounds, because an out-of-bounds index value has no chance of being created.
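C++ has no built-in range types, but the idea can be approximated. The sketch below is an imitation written for this article, not an Ada feature and not a standard library facility: Index and Table are hypothetical class templates. The check happens where the index value is created, so the indexing operator itself needs no check.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Hypothetical index type whose only valid values are 0..N-1.
// An out-of-range value fails at creation, not at the later indexing.
template <std::size_t N>
class Index {
public:
    explicit Index(std::size_t value) : value_(value) {
        if (value >= N)
            throw std::out_of_range("index value outside 0..N-1");
    }
    std::size_t get() const { return value_; }

private:
    std::size_t value_;
};

// Hypothetical array that accepts only the matching index type, so
// operator[] cannot be reached with an out-of-bounds value.
template <typename T, std::size_t N>
class Table {
public:
    T& operator[](Index<N> i) { return items_[i.get()]; }
    const T& operator[](Index<N> i) const { return items_[i.get()]; }

private:
    std::array<T, N> items_{};
};
```

Because the constructor is explicit, an expression like table[7] does not even compile; the programmer has to write Index<4>(7), and that is where the exception comes from.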

But wait. This doesn’t mean that the problem disappeared. Of course not – it just occurs in a different place. But it definitely occurs earlier, that is, closer to the place where the problem originated, which probably gives you a bit more chance to recover. For example, if the index comes from some external procedure, you can just state that the procedure got stupid and ignore its request, continuing the rest of the program. Whereas if it happens when you are already indexing, usually you can only crash.

Please note, first of all, that many of these exceptions are thrown in an “unexpected” situation; the functions throw because they can’t continue, or they do their best to keep your application from going out of control – but when this happens, it’s usually way too late to do anything reasonable. There should then be some possibility of detecting this situation a bit earlier, and if there is, you should use it. In particular, when you are going to perform a set of operations that need some resources, check for all of them at the earliest place where you can have them; likewise, check the required conditions at the first possible place where you know what to check for.

For example: if you want to read some data from two files during your job, open both files at the beginning and keep them both open, even if you don’t read from one of them for some time. You may save a lot of resources by not wasting time on an operation that would fail anyway if one of the files can’t be opened (and by not giving the user hope that you can do it while you can’t). Then, once you are sure that both files exist and can be read, start your operation. If any problem then occurs while reading from either file, it will be reported, but this is something you couldn’t check before starting your operation (incidentally, this kind of error is very rare compared with a nonexistent file).
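In C++ this could be sketched as below. The merge_files function and its “job” of concatenating two files are hypothetical, chosen only to keep the example short. Both streams are opened before any work starts, so a missing second file is reported before a single byte of the first one has been processed.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Reads everything that is left in an already opened stream.
std::string read_all(std::istream& in) {
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Hypothetical job: concatenate the contents of two files.
std::string merge_files(const std::string& first_path,
                        const std::string& second_path) {
    // Acquire all resources up front and keep them until the job is done.
    std::ifstream first(first_path);
    if (!first)
        throw std::runtime_error("cannot open " + first_path);
    std::ifstream second(second_path);
    if (!second)
        throw std::runtime_error("cannot open " + second_path);

    // The actual work starts only when both files are known to be open.
    std::string result = read_all(first);
    result += read_all(second);
    return result;
}
```

A read error in the middle of the job is still possible, but that is the rare kind of problem that could not have been checked in advance.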

Only if you plan your error handling with this in mind, can you say that you really have control over your application.

Errors occur, problems happen. Mind the important difference between “error” and “problem”. When a problem happens in an application, there are several things that may happen:

  • an error is reported immediately
  • the problem causes invalid changes in data, which cause another operation to make invalid changes in data, which… until finally one of them reports an error
  • same as above, but it never ends up with an error (it just makes a lot of mess in your program)

On various VM platforms (like the JVM or .NET) a lot of work has been done to make sure the last one never happens. First, they throw an exception when a null pointer is dereferenced. Second, they ensure that every non-null normal reference points to an existing object, and that a weak reference to a released object yields null, which in turn throws an exception when dereferenced. But as long as you are using universal data types (and too many people forget how universal the ‘int’ type is), you will never prevent the second one. And even though you finally catch an error, it’s usually far too late to do anything, even to dump the data to some file, because the data may already have been corrupted. To detect problems early you need a data type with weak abilities to “carry the problem” (which means, as you know, that it should be less expressive).

So, if you think that enough has been done in languages like Java or C# to prevent problems from lasting too long, you’re completely wrong. They only prevent data mess made outside the frame of the current data type. But there is still a variety of errors that can be made within the frame of the current data type, especially if this frame is as wide as it is with the ‘int’ type. Using a simple ‘int’ you can index arrays out of bounds, you can refer to an existing object that was just moved to a different index (did you think that “iterator invalidation” cannot happen in Java?), you can refer to an object you have remembered in a local variable instead of reading this reference by calling some method. And the VM will not prevent you from doing these things.

Think well, then, about the place where you’d like to stop because of some erroneous situation, and beyond which you rely on the granted condition not changing unexpectedly. Because if you don’t think well, you’ll get unrecoverable errors with undefined program behavior, and you’ll have to do your job twice.

How APIs can increase your bug risk

Imagine that you call some function fx() which returns a pointer value, and it’s specified that it may return a null pointer value in some specified case. Then you call another function xset(), which accepts a pointer value of the same type, and it’s specified that this function may accept null values, which mean some special condition (in short: fx() declares that it MAY return NULL and xset() declares that it accepts NULL values). Now imagine that you do:

xset( fx() )

And? Will the xset() function do exactly what you want for any value that may be returned by fx()? In other words, you probably want to pass a valid pointer from fx() to xset(), but did you really want to pass the null pointer value to xset() when fx() returned one?

If not, it may end up with a problem in your application that you haven’t detected.
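To make the situation concrete, here is a small Java sketch. Both fx() and xset() are the hypothetical functions from the text, given made-up bodies; setFrom() is my own addition showing one way to state the intent explicitly, using the standard Objects.requireNonNull.

```java
import java.util.Objects;

final class NullHandOff {
    private static String current = "default";

    // Hypothetical fx(): documented to return null when the name is unknown.
    static String fx(String name) {
        return "known".equals(name) ? "value" : null;
    }

    // Hypothetical xset(): documented to treat null as "reset to default".
    static void xset(String v) {
        current = (v == null) ? "default" : v;
    }

    static String current() { return current; }

    // States the intent explicitly: a null from fx() is a problem here,
    // not a request to reset.
    static void setFrom(String name) {
        xset(Objects.requireNonNull(fx(name), "fx() found nothing for " + name));
    }
}
```

With the plain xset(fx("unknown")) the setting is silently reset and nothing is reported; with setFrom("unknown") the problem is reported at the hand-off and the setting stays untouched.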

This kind of error may happen in C, C++, C#, Java, Ruby, Perl, Python and many, many others (Vala is a notable exception thanks to references being non-nullable by default; however, if you operate with nullable types in the above operation, you’re in the same situation as in the others). You can’t do it in Tcl, but in Tcl you can make similar bugs, like having an empty string in a variable in which you expected an integer. Note that there’s no way to detect this kind of error in the above languages, even at runtime.

The same happens, in Java this time, if you do:

 char c = characters[str.indexOf(match)];

If you have called String::indexOf(), the return value may be -1 if the searched occurrence was not found. Unfortunately, it is then passed as an array index, which will end up with ArrayIndexOutOfBoundsException. Well, in this case the problem was still caught, though maybe not in the most readable form. But this exception will not prevent errors in every case. Now, grab your chair, and watch this:

 char c = characters[str.indexOf(match)+10];

This is not fiction. This is a real bug that I once found.
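In case it isn’t obvious why this is so nasty, here is the same expression with some made-up data (the class and method names are mine, only for the demonstration):

```java
final class OffsetBug {
    // Mirrors the expression from the text with concrete, made-up data.
    static char pick(char[] characters, String str, String match) {
        // indexOf returns -1 when match is absent; -1 + 10 == 9 is a valid
        // index for any array of length >= 10, so no exception is thrown.
        return characters[str.indexOf(match) + 10];
    }
}
```

For a 16-element array and a string that does not contain the match, pick() quietly returns element 9. No exception, no error, just a wrong character travelling further into the program.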

It’s best if the error occurs exactly in the place where the problem happens. Unfortunately, errors usually occur far away from the place where the problem happened; in other words, various instructions pass the problem on and throw it to each other like a hot potato, until one of them drops it on the ground (that is, until the error occurs). If any of them does, needless to say. Compare it with the array indexing example: the problem happened way before the array was indexed, when an invalid index value was created. If we do it in Java, we get an integer value which maybe contains an invalid index, but we’re not indexing yet – only when we index will the JVM throw the out-of-bounds exception.

But the problem doesn’t always arise when we’re indexing. It may arise when we are only computing this index. Or computing it from some other value, which was taken from some other place… and so on. The further we are from the real place where the value was made up, the harder it is to find the real problem. And exceptions are thrown not in the place where the problem happened, but rather where sweeping this bug under the carpet was no longer possible.

Of course, you’d say, this should then be handled by an exception!

I know that in Java standard libraries there is a custom of making the most stupid possible use of exceptions, but this one, even though it can cause such dire problems, is an example of a place this stupidity didn’t reach.

It shouldn’t be handled by an exception because the indexOf function is not meant to retrieve some existing thing from a string when you know it is there. It’s meant to search for an occurrence in a string and provide you with the answer whether it is there or not. The answer that this occurrence is there is just as good as the answer that it’s not – the -1 answer is a bad answer only from the perspective of the user of this function, if they know that, according to the application’s logic, this occurrence must be there. If such a response were handled by an exception, it would mean that the guy who wrote it was a complete idiot. It might be handled by an exception, but only when – for example – the indexOf function is passed an additional argument with a name like “THROW_EXCEPTION_IF_NOT_FOUND”.

So, is this return value still good?

Not exactly. It’s good that this is not reported by an exception. It’s bad, though, that this return value is a good-looking integer index value if mistakenly used in an expression. The best way is to return some dedicated “value” object, or both the index value and a status, and to make sure that the status has been checked before the value is used – perhaps even throwing an exception when the user uses the value by mistake. In Java you can use the Integer class (this makes use of the “optional” property of reference types in Java that I have already described). In C++ you can use boost::optional or some similar technique (say, a pair of a bool and the expected value).
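In Java 8 and later, the standard java.util.OptionalInt is one ready-made “value plus status” type. The sketch below wraps indexOf in it; the class name Search and the method find are my own, not part of any library.

```java
import java.util.OptionalInt;

final class Search {
    // Hypothetical wrapper: "not found" is an empty OptionalInt, never -1.
    static OptionalInt find(String haystack, String needle) {
        int i = haystack.indexOf(needle);
        return i < 0 ? OptionalInt.empty() : OptionalInt.of(i);
    }
}
```

An expression like Search.find(str, match) + 10 does not even compile, and calling getAsInt() on an empty result throws NoSuchElementException at the exact place where the value was mistaken for a valid index.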

If you ask what should then be done in the case of close() (in Java’s stream classes), which may want to report an error (by exception), the answer is the following:

  • this may throw only when the flush() operation can’t be finished (in order to complete writing) – just try to flush first and you should be fine
  • ok, so flush the buffers and close() should not throw any exceptions, you can ignore any errors because no errors will be reported

What? Really all errors?

Not exactly, but we really should not worry much about it (it doesn’t mean that you can leave the catch body empty – it only means that you should rethrow this exception via RuntimeException). Really. Well, let’s explain it in more detail.
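A possible shape of “flush first, then close, and never leave the catch body empty” is sketched below. The helper class Closing is invented; UncheckedIOException is the standard RuntimeException wrapper for IOException available since Java 8.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

final class Closing {
    // Flushes explicitly so that a write failure shows up at the flush,
    // then closes. Any IOException is rethrown unchecked, never swallowed.
    static void finish(OutputStream out) {
        // try-with-resources guarantees close() even when flush() throws;
        // a failure of close() is then attached as a suppressed exception.
        try (OutputStream o = out) {
            o.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```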

Levels of errors

We have various kinds of errors, and the right way to make an error recognizable at the right place depends on its kind.

As far as the “level of logic” is concerned, we have the following kinds of errors, that is, problems that have happened in a program:

  1. Preconditions to be checked beforehand (level 3 – preconditions)
  2. Problems that were found because the preconditions are not satisfied (level 2 – negligences)
  3. Weird errors that occurred in some category where they should not have happened (level 1 – calamities)
  4. System damage (level 0 – cataclysms)

Before you set out to predict any problems that may happen to your program, the first thing is to realize that the only levels you should worry about are levels 3 and 2. You MAY be prepared to handle errors of level 1, but please, don’t be so naive as to think that you can continue your program normally when they occur; try rather to close your application kindly, safely, and first of all quickly (before the system kills it). If you get a problem of level 0, please call abort. No, not exit. Really, you should call abort unconditionally. Don’t even try to check any data, let alone save it!

What kinds of causes may be assigned to particular levels?

Level 0 is something that lies in the system, including hardware. For example, if reading memory yields weird data, then as long as you are able to detect this problem, you should crash. It can also be caused by a lack of system resources (for example, you can’t create a semaphore), or sometimes by the lack of some basic system service. All these things are resources that we believe are crucial not only to our application, but to everything that is running on this platform. That’s why we believe it will never happen. Some APIs allow errors of this kind to be reported, but usually you should handle them by simply calling abort(). APIs for languages that support exceptions should use exceptions to report them; however, this is hard to find because these errors are usually reported on the C level and there are not many C++ wrappers that translate them to exceptions. Generally it depends on how reliable the system is meant to be – the more reliable the system should be, the more kinds of errors are of level 0.

Level 1 concerns errors that usually happen because of some external conditions that the user couldn’t check beforehand, but which still concern things that were supposed to work. In other words, these are “user level” errors that should never happen, but not because of crucial system resources. The cause may be some problem in a library (maybe it wrongly assumed that something would work), or some external service that the application needs and that somehow doesn’t work (while the service isn’t a crucial part of the whole system). For example, transmission or network errors are unexpected problems – how important they are depends on how important the network connection is for you. This level also contains the cases when exceptions of level 2 were thrown even though they shouldn’t have been, as the preconditions were satisfied. Level 1 errors should be handled by exceptions, although they may optionally be reported by return value as well. However, if a return value is used to report the error, the error state must be remembered so that the next operation will not be attempted.

Level 2 is the already mentioned “last chance to crash”. This is a report of a condition that should have been satisfied before calling some API; it wasn’t, and the API did not accept the conditions under which it was called. Errors of this kind are best reported by exceptions and handled on a high level. When this happens, it’s usually too late to “do some reasonable alternative”. An exception thrown at this level is always due to the “programmer’s negligence” because they could have prevented it by checking the required conditions (level 3).

Level 3 is just a condition which may or may not be satisfied; both results are acceptable, maybe not in the program’s logic, but in the API’s logic. For example, your program is working with text files that should have some specific format. You should read the file and check if it has the correct format; if it doesn’t, the program should display some information for the user about invalid usage and then continue normally. This exposes a “fork situation”: we expect that both alternatives are more or less equally good or bad, we just make the program behave in a different way. This kind of problem should never be reported by exceptions as the only alternative; it should always be reported via return value, although reporting via exceptions may be provided as well, optionally.

Note that optional exceptions thrown at level 3 are provided only for the user’s comfort. It’s important to provide this alternative because the API designer cannot predict whether this unwanted situation will be for the API user just some alternative situation or a serious error. Reporting level 3 errors only as return values will cause the problem described by the first example in the beginning. On the other hand, reporting them only as exceptions will cause another problem, described by the second example.

Remember that your software can be at most as reliable as the system and the additional resources you use. Your application can be 100% stable by itself, but if the system or the additional resources cause problems, you can’t do much, and practically, you can’t do anything. So the first rule for error handling is: don’t fix the system and dependent libraries. If your system is unreliable, change the system; if your library is unreliable – use another library.

That’s why you should ignore errors of level 0 and 1, letting them crash your program when they occur. Users should focus on levels 2 and 3.

Generally we can define the system environment as:

  • the basic services – crucial system resources that must always work, resources provided by these services must be always provided every time they are requested; at most there may be some limits defined which must not be overrun by the application; errors when allocating resources from these services, as long as the preconditions are satisfied (if not – level 2 error), are level 0 errors
  • the extension services – system resources that are not crucial and for which there are predicted situations when the resources may not be available. Depending on various conditions, errors when using a resource from this level can be: level 1, if the resource wasn’t available, but it should have been; level 2, if the resource wasn’t available because the user didn’t satisfy some required condition; level 3, if the resource was never guaranteed to be available and, well, it just wasn’t. The difference between level 1 errors here and level 0 errors from the basic resources is that level 1 means that there are problems with applications of little importance because of damage to some extended service, but the crucial part of the system is still functional. Level 0 means that the system is totally unstable.

Let’s try to classify particular types of errors:

  • can’t open a file whose existence we didn’t check: level 2 because you could have checked whether it exists before opening
  • can’t open a file, even though you have checked that it exists and can be read: level 0 because this is not your fault, nor is it the fault of any unpredictable condition; it’s the system’s problem
  • can’t write to a file because the file size exceeded the limit: level 2 because you could have checked the limit beforehand, so it’s your fault
  • can’t write to a file stream, even though you have this stream correctly opened and no limit has been exceeded: level 0 because writing to files is a crucial system resource
  • can’t connect to a socket because the network is off: level 1 because it’s an error that depends on some external conditions, so it’s beyond the system’s basic guarantee
  • can’t connect to a socket because you passed incorrect parameters for the operation: level 2 and you should know why
  • can’t connect to a socket even though you passed correct parameters and the network is working: level 0 because this is one of the crucial system services that have strong guarantees
  • can’t read from a socket because the network connection has been dropped: level 1, for the same reason as when trying to connect a socket when the network is off
  • can’t read from socket because there were no data sent to the socket and the socket is working in a non-blocking mode: level 3 because there are no guarantees that the nonblocking socket will ever provide any data to read
  • can’t read from a stream that was not open for reading: level 2 because, as previously, it’s your fault
  • you search for an occurrence of a string in a bigger string and it wasn’t found: level 3 because no one granted that this string is there
  • you dynamically cast your object to another type and the dynamic cast failed: level 3 because no one said that this object is of this class unless you check it out

Seen this way, each level corresponds to a kind of guarantee that particular errors will never happen:

  • level 0: strong (internal, basic) guarantees: it must always work or you have a total crash
  • level 1: weak (external, extended) guarantees: it should work, but if it doesn’t it’s not a critical problem
  • level 2: your guarantees: if you have done your job well, this error will not occur
  • level 3: no (or private) guarantees: no one said it will ever work – you can try and you may succeed (or maybe you had given some additional guarantees that it will work, but this is your matter and your problem if this isn’t true)
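For those who prefer to see a classification as a data structure, the four levels could be written down as a Java enum. This is only my shorthand for the descriptions above; the names and the one-line policies are a summary, not an API from any library.

```java
// Hypothetical summary of the four error levels described in the text.
enum ErrorLevel {
    CATACLYSM(0, "abort at once"),
    CALAMITY(1, "exception; shut down quickly"),
    NEGLIGENCE(2, "exception, handled at a high level"),
    PRECONDITION(3, "return value, optionally an exception");

    private final int level;
    private final String report;

    ErrorLevel(int level, String report) {
        this.level = level;
        this.report = report;
    }

    int level() { return level; }

    // How a problem of this level is best reported, per the text.
    String report() { return report; }

    // Per the text, only levels 2 and 3 deserve planned handling.
    boolean worthPlanningFor() { return level >= 2; }
}
```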

However, please note that a lot of the system APIs in use today were created at a time when systems gave no strong guarantees. Because of that, even a problem like “I can’t create a socket”, for whatever reason (no matter if it’s “incorrect parameters” or “resource temporarily unavailable”), is reported by a -1 return value (in place of what is normally the descriptor ID). It means that you may even allow some errors to cause other errors (for example, blindly use the value returned by the socket() function in the next call to read() and respond to the error reported by read(), which also reports problems with an invalid descriptor – this is practically the only way to go when you use network descriptors in a multi-threaded environment). If the only system API that you have reports every error by return value (not even by signals!), it simply means that this system (theoretically) provides no guarantees about anything – in other words, everything may crash. A programmer who must use such an API is in a very bad position because, according to the rules of this API, they must be prepared for a situation in which nothing works. The need to worry about all the errors, including those that happen in 1% of cases, forces the user to spend 90% of their coding time planning the error handling, which in effect results in very poor error handling. Then, being accustomed to this kind of error reporting makes a user do “bug prevention tests” like those mentioned in the beginning, which results in very poor software quality.

Remember also that every handling and recovery that you plan must be tested. So you must be able to provide stubs for system functions as well, so that you can make them fail intentionally and observe whether your program responds correctly to this kind of error. If you’re not going to test your error handling – don’t do error handling (I would even say it’s obvious that if you’re not going to test your code, you should not write it). Crash instead. I’m telling you the truth – letting your program continue after the error will make it end up much, much worse.
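A minimal sketch of such a stub in Java is shown below. The interface Source stands in for a real system call and is purely hypothetical, as is the “fall back to an empty document” recovery; the point is only that the failure can be triggered on demand.

```java
import java.io.IOException;

final class Importer {
    // Hypothetical seam standing in for a system call that may fail.
    interface Source {
        String read() throws IOException;
    }

    // A stub that fails on purpose, so that the recovery path can be exercised.
    static final Source ALWAYS_FAILS = () -> {
        throw new IOException("simulated failure");
    };

    // The planned recovery under test: fall back to an empty document.
    static String load(Source source) {
        try {
            return source.read();
        } catch (IOException e) {
            return "";
        }
    }
}
```

A test then calls Importer.load(Importer.ALWAYS_FAILS) and checks that the fallback really is what comes out. If you can’t write such a test, you don’t know what your recovery does.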

I think this is one of the main reasons why termination semantics has been accepted in C++ and preferred to resumption semantics.

Exceptions should be exceptional!

Considering the levels described above and skipping levels 0 and 1, we have two aspects (levels 2 and 3) in which we can consider whether, in a particular situation, the problem should be reported by exception or by return value.

When you want to obtain some resource, which requires some preconditions, the correct way to go is the following:

  • call the resource checking function
  • if this failed, do not perform the operation and do some recovery
  • if this succeeded, perform the operation, IGNORING any exceptions (allow them to self-propagate)
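As a small illustration of these three steps, here is a sketch using the standard java.util.Deque, where isEmpty() plays the role of the resource-checking function and pop() is the operation. The class Worker and the “recovery” of simply returning false are made up.

```java
import java.util.Deque;

final class Worker {
    // Returns false when the precondition fails (the "recovery" here is
    // simply doing nothing); otherwise performs the operation with no try/catch.
    static boolean processNext(Deque<String> queue, StringBuilder log) {
        if (queue.isEmpty()) {          // the resource-checking call
            return false;               // precondition not met: skip the operation
        }
        // pop() throws NoSuchElementException on an empty deque; we have ruled
        // that out, so any exception here is left to propagate.
        log.append(queue.pop());
        return true;
    }
}
```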

The general rule for using exceptions is: give the user the choice not to handle exceptions if they don’t want to. This concerns errors on every level:

  • level 3: provide an alternative (and default) API that reports errors via return value
  • level 2: provide a user with complete information about how to prevent this kind of errors
  • level 1: let it be something that in normal situation would immediately call abort() – by using exception you only give some users possibilities to limit the number of problems caused by the crashed application
  • level 0: the library should not report anything, even via exception, but do abort()

In a very short description: treat reporting exceptions as something “better than abort”.

The optional exceptions for level 3 may just provide a bit more comfort for the user, so that they can build simple and clear-looking expressions. For example, the problem of a key not being found in a tree-like dictionary may be handled this way:

Value p1 = dict.find( key1 );
if ( !p1 ) return;
Value p2 = p1.find( key2 );
if ( !p2 ) return;
Use( p2 );

or this way – if you think that it’s more comfortable (usually it is, and it doesn’t bring a problem of interrupted-in-half operation):

try {
   Use( dict/key1/key2 );
} catch ( KeyNotFound ) {
   return;
}

The semantics of ‘find’ are that we only check whether the key is available and return the status of this operation. But we may optionally want to get the status information via exception or no exception, just to have the operation encoded in a simple expression using the overloaded ‘/’ operator.
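Java has no operator overloading, so a rough Java counterpart of the two styles above needs a method in place of ‘/’. In the sketch below, DictNode, put, find, at and KeyNotFound are all invented names.

```java
import java.util.HashMap;
import java.util.Map;

final class DictNode {
    // Unchecked, so callers are free not to handle it.
    static final class KeyNotFound extends RuntimeException {
        KeyNotFound(String key) { super("key not found: " + key); }
    }

    private final Map<String, DictNode> children = new HashMap<>();

    // Creates the child if it is missing and returns it.
    DictNode put(String key) {
        return children.computeIfAbsent(key, k -> new DictNode());
    }

    // Level 3 by return value: null means "not there", a valid answer.
    DictNode find(String key) {
        return children.get(key);
    }

    // Optional convenience: the same lookup along a whole path, reported by
    // exception so that the path fits in one expression.
    DictNode at(String... path) {
        DictNode n = this;
        for (String key : path) {
            n = n.find(key);
            if (n == null) throw new KeyNotFound(key);
        }
        return n;
    }
}
```

The return-value API stays the default; at("key1", "key2") is only sugar on top of it for those who would rather catch KeyNotFound once.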

BTW, please note that if exceptions in C++ had been intended for anything other than unrecoverable errors, they would have been designed with optional “resumption semantics” (as Common Lisp has). But it was decided that exceptions in C++ support “termination semantics” only. You can refer to “Design and Evolution of C++” for details. I’m talking about C++ because only this language’s designers were reconsidering this topic – the designers of Java and C# didn’t think at all, but simply copied C++’s solution directly (the “finally” clause was also first provided in Borland C++ as an extension). Of course, if you ask them, they will deny it and say that they took it from Haskell :D.

So don’t let anyone tell you that error handling is exception handling. Error handling is early checking in order to prevent an operation from being executed in unacceptable conditions. In other words, error handling is preventing exceptions from being thrown (not from being propagated!).

Why strict exceptions in a language are bad

The developers of the Java language think that the best way to report problems from a procedure is to throw an exception. This is the first stupid idea (never mind that it’s not strictly adhered to because, as I have shown, the indexOf method does not throw an exception). And next, they think that the exception should be declared as propagated by every function on the call path, in order to ensure that no exception will be left unhandled (as if it were impossible to ensure the same thing without requiring propagation to be declared explicitly – especially as the Java compiler does not use the C linker to complete the application, so it is in an even better situation than C++). These two ideas combined are stupidity squared.

Think about it for a while. If Java forces you to handle an exception in case of the slightest problem (with no option to handle it by return value) and forces you to declare the propagation explicitly, it means only one thing: exceptions reported in Java libraries are expected errors. Because if they were unexpected, they would be thrown only when there was some unpredictable problem and would not need to be explicitly declared.

Above you have possible situations when exceptions should be used. If you now compose it with the above statements for Java, there’s one thing clear: the main idea of exceptions in Java is to handle errors of level 3 (not 2!). That’s more than clear. Because if this was any kind of unexpected problem, it would be handled by a derivative of RuntimeException.

As you know, strict exceptions were introduced in Java in order to provide ability to prevent unexpected exceptions (that is, prevent at compile time a situation that is handled by std::unexpected() in C++ – but remember that in C++ you can at most narrow the range of exceptions allowed to be propagated, while in Java you can only widen the range of propagated exceptions). Authors of this statement seem not to realize the depth of their stupidity: they think that there is a situation, when an exception should be expected???

Yes, an exception may be expected. Only in one situation – when a problem in a function will be able to be reported only and exclusively by exception, and you should catch the exception in place (as you know, catching the exception in place is simply translating exceptions to return values). If you ever have a situation that you have a need to handle the exception in place, it simply means that the function that throws this exception was written by an idiot.

No, I’m not abusing anyone. Read thoroughly any book that was written at the time when this idea was being developed, or the documentation of any of the old programming languages to which this feature was added. You’ll find the same thing everywhere: the main idea of exceptions is to handle problems automatically on a higher level than the level where they occurred, and to handle them once, as a group (not in every place where they could occur), and this can only be achieved when exceptions can be freely propagated. Can you imagine, for example, that in the first languages that featured exceptions (e.g. Ada) the only possibility to “catch” exceptions was to install the exception handler once per function? C++ was the first language that introduced the try-catch block and this way allowed you to have exception handlers in multiple places in one function, or even on multiple levels.

Let’s consider this sorry IOException in Java, thrown when some IO operation (read or write) has failed. The stupidity in this case does not lie in the fact that this problem is handled by an exception (as you know, this is a level 1 kind of problem, so it should be handled by an exception). The problem is that this kind of error shouldn’t happen in a normal situation, so it should be handled on a high level as something unexpected. Unexpected errors are similar to errors like NullPointerException. Why then isn’t IOException a derivative of RuntimeException?

If there is some situation when limiting the exceptions is needed, users should decide for themselves whether they want to catch only selected exceptions and crash in case of the others, sink exceptions into one other kind of exception, or let every exception propagate. In Java, not only must you explicitly declare propagated exceptions, but you also can’t add new limitations for expected exceptions. For example, sometimes you’d like to let IOException propagate very “high”, while sometimes you’d like not to let NullPointerException out of your function.

The first and foremost dumb idea standing behind strict exceptions is that “free exceptions” may easily lead your program to crash unexpectedly, so there’s a need for a tool that will prevent these crashes (if you have read my earlier statements thoroughly, such an idea now sounds very funny). It’s still stupid, but even granting that there is some little reason in it, why must the compiler and the set of language rules be that tool? Why can’t there be some other (external and independent) tool to be used? Let users write their (trial, “proof-of-concept”) programs as they want to, let the programs even crash unexpectedly. But let there also be an additional tool to detect possible unexpected exceptions.

Why would such a solution be better? Well, because the main drawback of strict exceptions, like with any other thing enforced by a compiler, is that if it requires too much effort to satisfy it the “good way”, it will be satisfied the “evil way”. This is like fixing const-correctness in C++ – if it requires too much effort, users will end up with const_cast. It’s also like too strict rules for passwords: if they are too hard to remember, users will write them down in some visible place, making the system much more vulnerable than it would be with less strict password rules. In Java, if the compiler won’t let you go until you handle an exception X, people will handle it with an empty body, just to make the compiler shut up. This empty body will then stay unchanged because there are usually more important things to do, and will possibly be quickly forgotten (especially when tools like Eclipse can automatically add the exception handling code to fix code that does not compile – of course, the exception handler’s body isn’t provided :)).

Ok, but why am I talking about this? Will my complaints about strict exceptions change anything? Can I suggest any solution other than using a language other than Java?

Yes. You can easily stop using strict exceptions by just not using them, that is:

  • don’t specify exceptions in your methods
  • when a library function throws an exception, create your own wrappers for this library that will not throw declarable exceptions
  • if it’s too much work to create these wrappers – catch these exceptions in a reasonably near place to where they are thrown and rethrow them via RuntimeException
  • all your exception classes should extend RuntimeException, not Exception
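A sketch of such a wrapper is given below. Files.readAllBytes is a real library call that declares IOException; the class Unchecked, the method readText and the exception StorageFailure are names I made up for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class Unchecked {
    // Application exception: extends RuntimeException, so no throws clause
    // is needed anywhere on the call path.
    static final class StorageFailure extends RuntimeException {
        StorageFailure(String message, Throwable cause) { super(message, cause); }
    }

    // Wrapper around a library call that declares IOException.
    static String readText(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            // Not swallowed: rethrown unchecked with the original as the cause.
            throw new StorageFailure("cannot read " + file, e);
        }
    }
}
```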

The only problem you may have is that there’s no software that would detect the “unexpected exception” situation and you can’t even use the exception filter like in C++ (indeed, Java’s exception filter is something completely different).

The system matters

There are three general levels to be considered by the application, and on each level there are some important rules that should be kept in order to have good and clear error handling:

  • basic system level: the API designer must explicitly define guarantees and possible preconditions for the resources provided on this level; the API user must strictly adhere to these preconditions if they want to be granted anything
  • extension level: the API designer should provide exception-based error handling for “should not continue” problems and describe possible guarantees; the API user should be conscious of the weak guarantees provided by this level and be prepared to respond to a situation when something doesn’t work
  • application level: the API designer should always provide return-value-based error handling in the API and possibly, for the user’s advantage, some exception-based alternative; the API user should remember to adhere to the described preconditions, make sure they are satisfied, possibly install exception handlers if they think there will be any use of them

Separating the check for a file’s existence from opening the file is important – if you, for example, use two files to read and one file to write results to, you should check whether you can open the first two files and whether you will be able to write to the output file before you start the operation (imagine: you read 10GB from one file and 20GB from the second file, and then after 1 hour of reading you realize that you can’t open the file for writing because – guess what – yes, you don’t have privileges to open the file for which you have been given a path). You’d say, then, what about problems that may occur while your program is running? For example, the file existed in the beginning, but when the program is trying to write, the subdirectory of the file to be written has been deleted? Well, these things belong to another category: system safety level, interaction safety, trust in the other code currently being run (they are all level 1 errors). So, if your program is not meant to work in “untrusted” conditions, don’t be so strict as to predict every possible failure. Sometimes you can let your program crash, especially if it’s running in conditions for which it was not designed. And if your program is running in conditions that were not predicted, it means that the problem is on a much different level and your program is most likely not the only application that has problems.
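Such an up-front check might be sketched in Java as follows. Files.isReadable, Files.isWritable and Files.exists are standard; the class Preflight and the rule “the output is fine if it is writable, or if it does not exist yet and its directory is writable” are my assumptions for the example.

```java
import java.nio.file.Files;
import java.nio.file.Path;

final class Preflight {
    // True only if both inputs can be read and the output can be written:
    // either the output file exists and is writable, or its directory is.
    static boolean canStart(Path in1, Path in2, Path out) {
        Path dir = out.toAbsolutePath().getParent();   // null only for a root path
        boolean outOk = Files.exists(out)
                ? Files.isWritable(out)
                : dir != null && Files.isWritable(dir);
        return Files.isReadable(in1) && Files.isReadable(in2) && outOk;
    }
}
```

The answer can of course become stale a moment later; as said above, that is a different category of problem and not something this check is meant to cover.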

Note that opening the files as soon as you know it is possible is a solution, but probably not on every system. On POSIX systems, for example, you can open all of them up front and keep the writable file idly open for some time, because even if some other process deletes the whole directory where this file is located, the file will at most be unlinked, so write operations will succeed anyway. On Windows it’s a bit different: by default the system will not allow the directory to be deleted, because the file can’t be deleted as long as it is open. But there may still be a system that does not protect the file being written, where a write operation may cause a runtime error. The difference is very important: such a system gives only weak guarantees for I/O operations.

If you have any doubts whether I/O operations should report level 1 errors or level 3 errors, recall in which practical situations I/O errors are reported. Because both may be reported! For example, the I/O errors in many languages’ standard libraries may also be errors caused by incorrect formatting, e.g. reading a sequence of characters that could not be turned into a valid int value. Thinking about writes: maybe transmission errors? Forget it; not on today’s systems. On today’s systems memory and the filesystem are integrated, and if the system cannot write to disk because of some disk damage, it’s the system’s problem, not yours. The system must fix it somehow and ensure that you are always able to write to a file. The system never reports transmission errors. The situation is different when your device is not integrated with the system: some external device or the network. In this case the most you may get is an error like… EOF. That’s all. For example, if the device is damaged or the network is broken, the file (even one being written) will receive EOF. If you check which I/O error flags can be set in the C++ standard library, you’ll see that there are three: failbit, which usually means formatting errors, that is, problems of a high-level operation; badbit, which in practice should occur when the stream operates on a device with weak guarantees; and eofbit, set when EOF was reached. It simply means that, for example, on POSIX systems, when operating on a file stream, the badbit flag will practically never be set. The creators of the C++ standard I/O library were probably not aware of the error levels shown above, but sensing that both exceptions and return values may be useful depending on the conditions, they left the mechanism very flexible. In practice, eofbit and failbit are level 3 errors, while badbit is a level 1 error.
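A small sketch of how this split can be used with the C++ stream library: failbit and eofbit are inspected as ordinary state, while badbit alone is turned into an exception through the stream’s exceptions() mask. The ReadResult enum and the read_int function are names invented for this example, not part of any library.

```cpp
#include <cassert>
#include <ios>
#include <istream>
#include <sstream>

// Hypothetical result type for the level 3 (expected) outcomes.
enum class ReadResult { Ok, BadFormat, EndOfInput };

// Reads one int from the stream.
// Formatting problems and EOF come back as return values;
// badbit (the level 1 case) is reported by std::ios_base::failure.
ReadResult read_int(std::istream& in, int& value) {
    // Only badbit throws; failbit and eofbit stay silent.
    in.exceptions(std::ios::badbit);

    if (in >> value) {
        return ReadResult::Ok;
    }
    assert(in.fail());  // a failed extraction always sets failbit or badbit
    if (in.eof()) {
        return ReadResult::EndOfInput;  // nothing left to read
    }
    return ReadResult::BadFormat;       // characters that do not form an int
}
```

The caller handles BadFormat and EndOfInput right where it reads, and does not need a try block unless it has something meaningful to do about a broken device.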

What we can do to make error handling better

If you don’t have bugs in your program, you can let potential level 2 exceptions propagate without handling them and be sure that your program will never crash. If you do have bugs in your program, celebrate if all they cause is a crash. If you don’t want your program to crash, eliminate the bugs in the code instead of blocking the propagation of exceptions! If you really want to eliminate unwanted state and behavior, there are two generally good practices for that:

  • make it possible to run the code in two different configurations: a release configuration, where you allow errors to destroy your application, and a debug configuration, where you catch every possible problem early, without worrying much about what happens to the application afterwards
  • use a lint-type tool that performs thorough verification, detects potential problems and points out the places where runtime verification should be added

So, the first thing is to prepare your code to run in a ‘debug’ configuration, where you verify everything, and in a ‘release’ configuration, where errors may destroy your application. If you think that some code may be untrustworthy, just verify its results with ‘assert()’s. This way you can still verify whether the code ran correctly (when you run in the ‘debug’ configuration), but you won’t burden the code with lots of useless verifications when you run in the ‘release’ configuration.
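As an illustration, here is a minimal sketch using the standard assert() macro, which is compiled out when NDEBUG is defined (the usual ‘release’ setting). The functions sorted_copy and smallest are hypothetical; sorted_copy merely stands in for some code whose results you are not sure you can trust.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Stand-in for code whose result we want to double-check while testing.
std::vector<int> sorted_copy(std::vector<int> values) {
    std::sort(values.begin(), values.end());
    return values;
}

// Returns the smallest element of a non-empty vector.
int smallest(const std::vector<int>& values) {
    // Precondition: violating it is a bug in the caller (level 2).
    assert(!values.empty());

    std::vector<int> sorted = sorted_copy(values);

    // Verification of the untrusted result; present only in debug builds.
    assert(std::is_sorted(sorted.begin(), sorted.end()));
    assert(sorted.size() == values.size());

    return sorted.front();
}
```

Built without NDEBUG, a wrong result stops the program at the earliest possible place; built with -DNDEBUG, the same source carries no checking cost.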

Why use verifications only in the ‘debug’ configuration? Because in this case it usually doesn’t matter how you plan the recovery from errors. You can crash, you can stop in a debugger, you can display an error and go on, even risking destruction of your data: you are not running in a production environment but in a test environment, so even the most dangerous errors shouldn’t matter. If you put the matter this way, it’s really unimportant how thoroughly you have planned the error recovery. Only one thing is important here: every problem must cause an error at the earliest possible place where the problem happens. Moreover, you are testing the application to make sure that a) every problem is reported as early as possible, and b) no error with internal causes can happen in any possible scenario.

The second good practice is to use lint-like tools. These tools process your code according to your build rules and produce a special intermediate form that is used to perform a thorough analysis of your code. This way you will get reports of all dangerous situations, such as possible out-of-bounds indexing, reading an uninitialized variable, dereferencing a null or invalid pointer, or an uncaught exception. A good tool also determines when such a problem has no chance of occurring and does not report it in that case.

Once you have these things checked, it’s better to trust the API that was provided. You should be distrustful only when you are running something REALLY untrustworthy, that is, code that may even have been deliberately written to destroy your application, gather secret data or harm the user’s system. Such untrusted code should be run in a sandbox environment, with limited access to particular elements of the system (applications running as JVM code are a good example here, though of course not the only one). Such code cannot be verified beforehand, either by running it in a debug configuration or by lint-like verification. Therefore it should be run as interpreted code, so that any problem it causes will not cause problems for the application.

If you cooperate with another application, it depends on whether you really need it for your work or whether it only provides optional functionality. Either way, when the cooperation breaks down, the functionality that relies on it can’t be used. In effect, everything depends on how important this functionality is. In short: if your application cannot live without this service, make it commit suicide; if it can, well, make it cut off its hand and stay alive. But the problem of the failed service cannot be fixed by your application!
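The “suicide or cut off a hand” decision can be sketched like this. The Service struct and the attach function are hypothetical, and the available flag is only a stand-in for a real connection attempt.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical description of an external service the application uses.
struct Service {
    std::string name;
    bool required;   // can the application do its job without it?
    bool available;  // stand-in for the result of a real connection attempt
};

// Returns true if the feature backed by this service can be used.
// An unavailable optional service just switches the feature off;
// an unavailable required service ends the run with an exception.
bool attach(const Service& service) {
    assert(!service.name.empty());  // a nameless service is a programming error
    if (service.available) {
        return true;
    }
    if (service.required) {
        throw std::runtime_error("required service unavailable: " + service.name);
    }
    return false;  // keep running without this feature
}
```

Note that neither branch tries to repair the service; the application only decides how much of itself can go on without it.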

Always remember that you pay for every additional verification made out of distrust. And if you do verify, be sure that your function is really prepared to run code that may fail. Verifications should not be added because an operation may potentially fail, but because this failure means something specific for your function. If you really want to be distrustful, you should not write programs at all, and you shouldn’t drive cars, travel by plane, give your money to a bank or even watch television. You just can’t prevent every possible failure. And in many places verifications won’t help you much if something has been really badly destroyed.

Bottom line

Handling exceptions is not a problem in itself, just as there is nothing simpler you can do with a pointer value than check it for null. The problem is what to do in the handling procedure. Before you write a verification, ask yourself what you would do in such a case. In other words: at what level does this particular problem have a meaning?

Also, don’t fall for easy simplifications such as that you should verify every pointer value against null, or that you should always handle a problem in the place where it occurs. You should, however, recognize the level of the problem and take the appropriate steps: maybe you should propagate the error as it is (in the case of exceptions, just let the exception propagate), maybe you should do some recovery in this place, or maybe you should report another kind of error. Sometimes it is also enough that the execution is interrupted and no additional steps have to be taken.
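The three options (propagate, recover in place, report another kind of error) might look like this in C++. All the names here (ConfigError, parse_port and the three port_* functions) are invented for this sketch; only std::stoi and the standard exception types are real library facilities.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical higher-level error type used by option 3.
struct ConfigError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Low-level operation: std::stoi throws std::invalid_argument or
// std::out_of_range, both derived from std::logic_error.
int parse_port(const std::string& text) {
    return std::stoi(text);
}

// 1. Propagate: this level has nothing useful to add, so it has no handler.
int port_or_throw(const std::string& text) {
    return parse_port(text);
}

// 2. Recover in place: a sensible fallback exists at this level.
int port_or_default(const std::string& text, int fallback) {
    assert(fallback > 0 && fallback < 65536);  // caller bug otherwise
    try {
        return parse_port(text);
    } catch (const std::logic_error&) {
        return fallback;
    }
}

// 3. Report another kind of error: translate to the caller's vocabulary.
int port_from_config(const std::string& text) {
    try {
        return parse_port(text);
    } catch (const std::logic_error&) {
        throw ConfigError("bad port setting: " + text);
    }
}
```

Which of the three is right depends on what the failure means at that level, not on a blanket rule.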

There are no simple ways to handle problems and errors. Every case should be treated individually, although errors can generally be classified into four levels. Coding standards may at most specify which exceptions should be used, how to propagate problems between execution levels, how to manage resources and how to report problems with particular resources. But they should never specify which problems should ‘always’ be verified, where to handle errors or what to do when an error is found.

On the other hand, the decision whether a particular situation should be reported by an exception or by a return value is really simple (so simple that it’s strange how many libraries get it wrong): use exceptions alone only if the situation being reported could have been predicted and prevented by the caller; for any other kind of error, report it through return values and offer exceptions as an optional alternative. In other words: your library should not force the user to handle exceptions. It should be possible to write a program that makes sense and works even if it catches no exception at all.
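A sketch of what such a dual interface might look like, with the return-value variant as the base and the exception variant as an optional convenience on top of it. The names try_parse_int and parse_int are assumptions made for this example and do not come from any particular library.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Return-value variant: bad input is an expected outcome, so the function
// returns false and leaves 'out' untouched. No exception handling is needed.
bool try_parse_int(const std::string& text, int& out) {
    std::istringstream in(text);
    int value = 0;
    char extra = 0;
    // Fail if there is no number, or if anything but whitespace follows it.
    if (!(in >> value) || (in >> extra)) {
        return false;
    }
    out = value;
    return true;
}

// Exception variant: offered for callers who are sure the input is valid
// and prefer to treat a failure as a "should not continue" problem.
int parse_int(const std::string& text) {
    int value = 0;
    if (!try_parse_int(text, value)) {
        throw std::invalid_argument("not an integer: " + text);
    }
    return value;
}
```

A program that uses only try_parse_int never has to install a handler; parse_int is there for the places where a failure would mean a bug anyway.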

The main and most important thing in error handling is to focus on the errors that are likely to happen and to plan the recovery from them thoroughly. Exceptions were designed to make this possible, not to produce a degenerate API like that of Java’s standard library, which is the main reason why many good programmers despise exceptions.
