sab39

... indistinguishable from magic
effing the ineffable since 1977
How to completely screw up a good idea: Nullable types in C# 2.0


4/19/2005
(Update way too many months later: the final release of 2.0 fixed most (but not quite all) of these issues. Eventually I'll get around to producing an updated table to show how things improved and what didn't. Sorry to everyone at MS who worked on this for not making this update sooner)

Visual Studio 2005, aka Whidbey, Beta 2 has finally been released, so I guess this is as good a time as any to rant about my biggest pet peeve in the new release. For the record, on the whole I'm thrilled and excited and can't wait for my (3.5GB!) download to finish so I can start working with it (and for Mono to catch up with the new features so I can use them in Free Software too). C# is hands down the nicest language I've ever worked with, and the 2.0 release takes a good thing and makes it even better.

EXCEPT for one of the new features.

When I first heard that C# 2.0 would have nullable types built into the language, I was delighted. The lack of built-in support for nullable types was one of the very first downsides I ever encountered in the language, and one of my first tasks when porting nrdo from Java to C# was to attempt to rectify this lack, at least for the datatypes that nrdo supports. Fortunately C# is expressive enough that I was able to implement Nint, Nbool, NDateTime, etc classes which get reasonably close to providing the desired behavior. There have always been some limits to how close I could get to the ideal, though: there are some scenarios where you simply need some help at the language level. So with C# 2.0 we finally get that help, right?
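To make the wrapper idea concrete, here is a minimal sketch of what a reference-type wrapper in the spirit of Nint might look like. It is not the actual nrdo code: the member names and the choice of interfaces are assumptions made for illustration. Because the wrapper is a class, null really is null, and calling a method on a null wrapper throws NullReferenceException just as it would for a string.

```csharp
using System;

// Hypothetical sketch of a Nint-style wrapper; NOT the nrdo source.
// All member names here are assumptions for illustration.
public sealed class Nint : IComparable
{
    private readonly int value;

    public Nint(int value) { this.value = value; }

    // Lets "Nint x = 1;" compile without a cast.
    public static implicit operator Nint(int value) { return new Nint(value); }

    // Explicit, because it throws when the wrapper is null.
    public static explicit operator int(Nint n)
    {
        if ((object)n == null) throw new NullReferenceException();
        return n.value;
    }

    // Forward to the underlying int; calling this on a null Nint
    // throws NullReferenceException like any other reference type.
    public override string ToString() { return value.ToString(); }

    public override bool Equals(object obj)
    {
        Nint other = obj as Nint;
        return (object)other != null && other.value == value;
    }

    public override int GetHashCode() { return value.GetHashCode(); }

    // Treats null (or a non-Nint argument) as sorting before any value.
    public int CompareTo(object obj)
    {
        Nint other = obj as Nint;
        if ((object)other == null) return 1;
        return value.CompareTo(other.value);
    }
}

public static class NintDemo
{
    // A null Nint stored in an object is still null.
    public static bool BoxedNullIsNull()
    {
        Nint n = null;
        object o = n;
        return o == null;
    }

    // Getting an int back out of an object needs the double cast.
    public static int DoubleCast()
    {
        Nint x = 1;
        object o = x;
        return (int)(Nint)o;
    }

    public static void Main()
    {
        Console.WriteLine(BoxedNullIsNull()); // True
        Console.WriteLine(DoubleCast());      // 1
    }
}
```

The sketch deliberately leaves operator == alone, so comparing two Nint instances with == is a reference comparison; that is one of the places where a library-only approach runs out of road without language help.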

You can guess the answer. No, we don't.

In fact, Microsoft's team of language designers (who as we've already established are pretty good, having produced such a kickass language in the first place) have managed to come up with something worse than what I put together over two years ago, without any compiler changes, using the language that they invented.

Let's review the behavior you want from a nullable types feature. It's pretty simple -- in fact, it's already there in the language. Reference types (classes, interfaces, delegates, etc) are already nullable. The whole concept of nullability and how it should behave is already established. The goal is, or should be, to capture that concept and allow it to apply to value types (structs, enums, and the builtin types like int and bool, which are actually structs under the hood) with as few changes as possible to programmer expectations. Bonus points if you can micro-optimize for close-to-the-metal performance, but this is a secondary concern.

(Some people may argue that point about performance. To me, it's self-evident: every use case I've ever seen given for nullable types has a pre-existing bottleneck in disk or network IO, so a few extra CPU cycles handling the nullability is going to make absolutely zero difference in practice. This is a moot point, however: I'll show later that it's perfectly possible to get the behavior right without any performance penalty.)

The goal of capturing the existing behavior of reference types and applying it to value types is pretty much sufficient to define exactly how an ideal nullable type feature should work, because for any given code construct using nullable types, you can just substitute in an existing reference type and specify that the behavior should be the same. Here are some examples using ints and strings.

| Reference type | Nullable type (ideal) | Nint | C# 2.0 | Comments |
|---|---|---|---|---|
| string x = "hello"; | int? x = 1; | Nint x = 1; | int? x = 1; | Pretty straightforward so far. |
| string n = null; | int? n = null; | Nint n = null; | int? n = null; | It's hardly nullable if you can't put null in it. |
| if (n == null)... | if (n == null)... | if (n == null)... | if (n == null)... | Again hard to get wrong. |
| x.ToString() | x.ToString() | x.ToString() | x.ToString() | 2.0 and Nint both cheat here by providing ToString() methods that call ToString() on the underlying int. A tie - it's cheating, but it works. |
| n.ToString() throws NullReferenceException | n.ToString() throws NullReferenceException | n.ToString() throws NullReferenceException | n.ToString() returns an undefined value | Since 2.0 doesn't really store null as null, it ends up invoking ToString() on an undefined value. No exception, just a meaningless result. In my book, invoking a method on null is the definition of what should be a NullReferenceException. Advantage Nint, but admittedly this isn't a terribly big deal. |
| object o1 = x; | object o1 = x; | object o1 = x; | object o1 = x; | This looks like it worked... |
| object o2 = n; | object o2 = n; | object o2 = n; | object o2 = n; | And so does this... |
| if (o1 == null) is false | if (o1 == null) is false | if (o1 == null) is false | if (o1 == null) is false | And so does this... |
| if (o2 == null) is true | if (o2 == null) is true | if (o2 == null) is true | if (o2 == null) is FALSE? | Nint got this right, but C# 2.0 inexplicably thinks that returning false is a good idea here. If you understand the implementation, there's a perfectly good explanation for why this happens, but a low level explanation isn't an excuse for the language flat-out lying to me. In order to make this work right we need to change the last few lines... |
| object o3 = x; | object o3 = x; | object o3 = x; | object o3 = Nullable.ToObject(x); | Holy verbosity batman! And not even a warning if we forget to do this, which we certainly will most of the time. |
| object o4 = n; | object o4 = n; | object o4 = n; | object o4 = Nullable.ToObject(n); | Okay, so once we've got them into objects, we can easily cast them back, right...? |
| string s = (string) o3; | int? s = (int?) o3; | Nint s = (Nint) o3; | int? s = Nullable.FromObject<int>(o3); | As if the ToObject line wasn't bad enough, now we have to remember to stick <int> in there at the right place for no obvious reason. Actually, the natural approach with casting would work as long as we didn't use ToObject and lived with a null that isn't actually null. But if you want sane behavior, you need this insane syntax. |
| string s1 = (string) o1; string s2 = (string) o3; | int s1 = (int) o1; int s2 = (int) o3; | int s1 = (int) (Nint) o1; int s2 = (int) (Nint) o3; | int s1 = (int) (int?) o1; int s2 = (int) o3; | 2.0 actually gets this one right for o3, but only because we had to jump through hoops to create o3 in the first place. With Nint (and 2.0 if you forget to call ToObject) you have to perform this bizarre double-cast because the value in o3 is actually not an int. In theory, advantage 2.0; in practice it's a wash because you can't get to the correct behaviour without working around the wrong behavior first (and actually, Nint has ToObject and FromObject methods as well, but nobody ever calls them - with null behaving correctly, it turns out to be easier not to bother and just use the double-cast when necessary). Notice that by this point neither version is matching the ideal. |
| string y = t ? "hello" : null; | int? y = t ? 1 : null; | Nint y = t ? 1 : (Nint) null; | int? y = t ? 1 : (int?) null; | This one is truly a genuine tie. It's an extremely common construct when using nullable types and it's a huge pain to always have to remember the cast - I can tell you this from bitter experience. This is one of the most obvious places where the language could have helped us all out, but they didn't bother. |
| IComparable c = x; | IComparable c = x; | IComparable c = x; | IComparable c = x.Value; | Nint cheated here - because the underlying type is hardcoded I didn't need to do any magic to implement the same set of interfaces. Still, it gives the right result. 2.0 makes no attempt to even bother. |
| s.ToString(fmt); | DateTime? dt; dt.ToShortDateString(); | NDateTime dt; dt.ToShortDateString(); | DateTime? dt; dt.Value.ToShortDateString(); | Again, my implementation cheated but got the right result; again, 2.0 doesn't bother. |

Hopefully it's becoming clear by this point that although Nint, NDateTime and company have serious limitations compared to the ideal, the behavior of 2.0 is significantly worse. So what were the creators of 2.0 thinking? These are clearly smart people, so how did they get it so badly wrong?

Well, I can only speculate, but my guess based on their public statements is that they made two fatal mistakes:
  1. Deciding on an implementation first and then fitting the behavior to that implementation, rather than designing the behavior first and then finding a way to implement it.
  2. Treating performance as if it were the critical factor and correctness and intuitiveness were entirely subservient to that overriding need.
It seems to me that the most likely route they took to the current state is by first making the decision that, "for performance reasons", the Nullable<T> type must be a value type. Once they'd made that decision they designed the entire behavior around what was easiest and most natural to do with value types, and never even thought to try to match the behavior of reference types.

The most bitter irony is that in fact it would have been perfectly possible to get the right behavior while still keeping the performance of a value type. If they'd provided a custom attribute which allowed a value type to override the standard "boxing" behavior, all the right behaviors would naturally fall into place. The problem wasn't with the fact that they decided to use a value type, but rather that they made that decision too soon and let it drive their design.
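For comparison with the update note at the top of this post: the released .NET 2.0 runtime ended up special-casing the boxing of Nullable<T> inside the runtime itself (rather than through a custom attribute), which is close to the idea described here. A null int? boxes to a real null reference, and a non-null one boxes as a plain int. The following is a minimal sketch for checking that on a released runtime; the class and method names are hypothetical, and the Beta 2 build discussed in this post would give different answers.

```csharp
using System;

// Hypothetical helper names; this checks how a released .NET runtime
// boxes Nullable<T>. Beta 2, as described in the post, behaved differently.
public static class BoxingCheck
{
    // A null int? stored in an object becomes a real null reference.
    public static bool NullBoxesToNull()
    {
        int? n = null;
        object o = n;
        return o == null;
    }

    // A non-null int? is boxed as a plain int, so a single cast works.
    public static bool ValueBoxesAsInt()
    {
        int? x = 1;
        object o = x;
        return o is int && (int)o == 1;
    }

    // Casting a null object back to int? yields a null int?.
    public static bool NullRoundTrips()
    {
        int? n = null;
        object o = n;
        int? back = (int?)o;
        return !back.HasValue;
    }

    public static void Main()
    {
        Console.WriteLine(NullBoxesToNull()); // True
        Console.WriteLine(ValueBoxesAsInt()); // True
        Console.WriteLine(NullRoundTrips());  // True
    }
}
```

All three checks use plain assignments and casts, with no ToObject or FromObject helper calls, which is the behavior the "ideal" column of the table asks for in the boxing rows.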

I, and others, have reported these issues to Microsoft in their Product Feedback Center. Here are some links: Notice in particular Microsoft's response to the last one as an exercise in missing the point. In response to the complaint that using "(int?)null" is hard to remember, confusing, and needlessly verbose when compared to just "null", what did they suggest as an alternative? "default(int?)"! At least my suggestion has null in it somewhere. And having the casts to and from object work right would be "confusing" because it would be different from the way other value types work. Earth to Microsoft: nobody understands how value types work. Nobody cares. The reason the language did such a great job in general is because you don't have to care, because it just works the way you'd expect. This doesn't -- not even close -- and that's what's confusing.

In retrospect I should have probably argued harder about these issues when I first filed or discovered them. My style of argument is usually to simply try to get people to realize the manifestly obvious rightness of what I'm saying ( ;) ), rather than attempt to use any credentials of my own, because I don't really have any that would impress the world's largest software company. So when I saw that response that missed the point so utterly, I gave up -- if they couldn't see the manifestly obvious wrongness of their own position, they'd never see the rightness of mine. But what never occurred to me is that in this area I actually do have some fairly unique credentials - I've implemented, worked with, and led a team of 4-5 people using, my own implementation of nullable types over several years. When I say "people will find this confusing", it's not just a guess: I've actually presented it to people and seen them find it confusing.

Unfortunately, it'll probably never get fixed now. Backward compatibility and all that. I'm trying to point Microsoft people to this blog entry in hopes that some of the worst misfeatures might still get fixed; one example is here, and Cyrus has responded by forwarding my issues to the language design team. There may yet be hope... (oh, and you can find the code to Nint and my other nullable type wrappers here).
What about nullable objects?
By Andrew Shuttlewood at 2005/04/21 05:18

I always thought that the idea of Nullable types was to introduce something akin to the option types in functional languages, where you can say "this value will never be null".

I don't really see the radical point of providing the opposite personally, but I might be missing something

database interoperability?
By yipyip at 2005/04/21 11:23

I suspect the primary motivation for nullable types is for data access. Databases allow columns of any type to be nullable, and most database types are value types in .NET.

Yep, databases are the biggie
By Stuart at 2005/04/21 14:14

Database access is the biggest motivator, but it's also often the case in general that you have some kind of "optional integer" value. It's extremely common to see code that has some "special" integer value like 0 or -1 used to indicate the absence of anything special. But what if 0 and -1 are both completely legal values for what you're trying to do and you still need something special? -9999? That's just silly. Why not provide a language-supported way to say what you really mean - there is no actual value here. That is to say, this value is null.

Nullable versus Non-Nullable
By Stuart at 2005/04/21 14:23

Andrew, yours is the second comment I've read in the past hour suggesting that Nullable Types were to enable you to say "this value will never be null". To be honest, I don't quite follow the logic. Surely those would be NON-nullable types?

(I do like the idea, but I see it as much less urgent than fixing nullable types at this point. If 2.0 happens with the current design, it will still be possible to add non-nullable types in the future, but it'll never be possible to make the behavior of *nullable* types sane.)

Nullable types lack design information from their designers
By TAG at 2005/04/21 22:45

I was playing devil's advocate on one of the Product Feedback suggestions, defending the current Microsoft implementation.

I was happy to see that all my claims about the usability and usefulness of Nullable types as they are implemented now were trashed by user comments.

I like your analysis and comparison chart. I do not see any real value in Microsoft's implementation of Nullable types.
The only benefit I see is the ?? operator - everything else was already possible using C# 1.0.

But one thing worries me - I feel that Nullable types will have a very long life - at least 8 years after the initial release. By that time we will be using some functional languages or technology nobody can even dream about :-(

I dunno why - but the C# designers made a mistake with Nullable. Probably due to lack of resources. .NET 2.0 is already delayed a lot.

Update?
By Rick Byers at 2006/07/26 13:38

Excellent post Stuart. Do you have an update anywhere indicating how you feel about the Nullable support we actually shipped with Whidbey? See http://blogs.msdn.com/somasegar/archive/2005/08/11/450640.aspx.

I saw several people on the CLR team racing to fix this near the end of Whidbey, and it would be great to see how our customers felt about what we ended up delivering :-) Also, this post might give other people the impression that nullable is still horribly broken (ideally the column labeled "C# 2.0" would say BETA instead of implying RTM).

Thanks!
Rick
