Saturday, May 26, 2007

Software Development Detox Part 1: State

In the first installment of the software development detox program, I am going to review common misconceptions about one of the most fundamental software concepts: state.

The trouble with understanding state in information processing and software development begins when people fail to recognize that state has two distinct aspects. By bundling these two aspects together, people end up with an incorrect picture of their system and consequently paint themselves into a corner by choosing an unsuitable architecture.

I am now going to (temporarily) abandon metaphors (such as 'paint oneself into a corner') and switch to simple, albeit somewhat exaggerated, examples.

Software Types

In this example, I am going to look at a typical software construct: a date. A date is an abstraction devised to encapsulate and express the human concept of time. In information processing, we use software constructs such as types to encapsulate and express abstractions like a calendar date.

Suppose someone offers a software abstraction (i.e. a type) called CustomDate. This abstraction is supposedly capable of accurate date calculations and comes with certain conveniences, one of which is the ability to express, in calendar terms, such human concepts as 'tomorrow', 'yesterday', 'next week', etc.

So we see that this type is capable of certain behavior (such as answering the question 'what date is tomorrow?'). But in addition to discernible behavior, software types typically also possess state. For example, our CustomDate may hold state that records which date is the year-end.

This state may change (different corporations have different year-end dates), and the instance of the type is expected to remember the changed state.
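To make the split between behavior and state concrete, here is a minimal Python sketch of what such a type might look like. Everything in it is hypothetical: the method names (`tomorrow`, `yesterday`, `next_week`, `year_end`, `set_year_end`) and the assumption that "year-end" means the first fiscal year-end falling on or after the current date. The calendar conveniences are behavior; the stored year-end is state that the instance remembers when it changes.

```python
from datetime import date, timedelta


class CustomDate:
    """Hypothetical CustomDate: calendar conveniences plus one piece of state."""

    def __init__(self, today: date, year_end_month: int = 12, year_end_day: int = 31):
        self.today = today
        # State: the company's fiscal year-end as (month, day).
        # Assumption: a month/day that exists in every year (so not Feb 29).
        self._year_end = (year_end_month, year_end_day)

    def set_year_end(self, month: int, day: int) -> None:
        # Different corporations use different year-ends; the instance remembers the change.
        self._year_end = (month, day)

    # Behavior: the 'tomorrow', 'yesterday', 'next week' conveniences.
    def tomorrow(self) -> date:
        return self.today + timedelta(days=1)

    def yesterday(self) -> date:
        return self.today - timedelta(days=1)

    def next_week(self) -> date:
        return self.today + timedelta(weeks=1)

    def year_end(self) -> date:
        # The first year-end falling on or after 'today'.
        month, day = self._year_end
        candidate = date(self.today.year, month, day)
        if candidate < self.today:
            candidate = date(self.today.year + 1, month, day)
        return candidate


d = CustomDate(date(2007, 5, 26))
d.set_year_end(10, 31)
print(d.tomorrow())   # 2007-05-27
print(d.year_end())   # 2007-10-31
```

A company that closes its books on October 31 would call `set_year_end(10, 31)` once; every later `year_end()` call reflects that remembered state.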

What you can say and how it gets interpreted

Upon acquiring a new software type such as CustomDate, we are expected to learn about its capabilities. We are not expected to understand how it works. We're not even expected to understand all of its capabilities; we are free to pick and choose.

For example, if CustomDate possesses 50 different capabilities and all we want from it is the ability to tell us the year-end date, we should be able to safely ignore the remaining 49.
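One way to express "pick and choose" in code is for the client to depend on a narrow interface that names only the capability it uses. The Python sketch below assumes static type checking (for example with a checker such as mypy); `YearEndSource`, `days_until_year_end` and this cut-down `CustomDate` are hypothetical illustrations. `typing.Protocol` provides structural typing, so any object with a matching `year_end()` method qualifies without inheriting from anything.

```python
from datetime import date, timedelta
from typing import Protocol


class YearEndSource(Protocol):
    """The only capability this client relies on."""

    def year_end(self) -> date: ...


def days_until_year_end(source: YearEndSource, today: date) -> int:
    # The client sends just one message and knows nothing about the rest.
    return (source.year_end() - today).days


class CustomDate:
    """Hypothetical stand-in for a type offering many capabilities."""

    def __init__(self, year_end: date):
        self._year_end = year_end

    def year_end(self) -> date:
        return self._year_end

    def tomorrow(self, today: date) -> date:
        return today + timedelta(days=1)

    # ...plus 48 more capabilities this client never needs to learn about.


print(days_until_year_end(CustomDate(date(2007, 10, 31)), date(2007, 5, 26)))  # 158
```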

Violating this basic agreement produces brittle, unreliable software. Here is a fictitious example that illustrates the problem:

If we instantiate CustomDate and assign that instance a handle such as customDate, we should then be able to talk to that instance. If we are only interested in learning about our company's year-end date, we can send a message to our customDate instance, as follows:

customDate.year-end

In response to receiving that message, the customDate instance will return the actual year-end date to us.

The scenario described above should always yield the same behavior. There shouldn't be any surprises in how an instance of CustomDate behaves upon receiving the year-end message. If there is even the slightest possibility that this established message may produce different, unexpected behavior, our software is not only brittle but extremely buggy.

By now you may be wondering how the above scenario could ever yield anything other than the expected behavior. Let me explain with another example:

We've learned so far that, when dealing with an instance of CustomDate, we can say year-end and it will be interpreted as the question: "could you please tell me the year-end date?" In response, the correct year-end date gets rendered and served. This tells us that an instance of CustomDate has state. That state (i.e. the actual value of the company's year-end date) is the only state we're interested in when dealing with this software construct.

However, as mentioned earlier, this software construct may have 49 other capabilities and states of which we know nothing. The fundamental principle of software engineering dictates that we are not required to know anything about any additional, extraneous states and behaviors a software construct may bring to the table.

Despite that prime directive, people who are not well versed in designing software solutions violate this dictum on a daily basis. One way to violate it is to introduce a combination of state and behavior that modifies how a question gets interpreted. Imagine how easy it would be to add a capability to CustomDate that turns it into a currency conversion utility. This example is admittedly unrealistic and exaggerated, but I chose it to illustrate the foolhardiness of arbitrarily assigning unrelated capabilities to a software construct.

In this example, an overzealous, green developer may add a capability that puts CustomDate into a "currency conversion" mode. If someone else is using the same instance of CustomDate and puts it into this mode, that change in state may alter how the instance behaves, rendering the response to the year-end question unintelligible.

Let's now run this hypothetical scenario:
  1. CustomDate gets instantiated as a resource on the server

  2. A message arrives from a client asking the resource to convert 100 USD to Canadian dollars

  3. An instance of CustomDate (i.e. customDate) puts itself into the "currency conversion" mode and renders the proper currency conversion

  4. The client then sends a message to customDate asking it for a year-end

  5. The instance renders an answer that corresponds to the value of 100 US dollars at the year-end

The answer at step 5 comes as a complete shock to the client who asked for the year-end; the client wasn't aware that the instance could shape-shift and consequently might not always return dates when asked about the year-end.
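The five steps above can be reproduced in a few lines of Python. This is a deliberately broken sketch of the anti-pattern: the class name, the `handle` dispatch method, the message strings and the exchange rate are all made up for illustration (the rate is a placeholder, not market data). The `year-end` message is kept as a string because a hyphen is not valid in a Python identifier.

```python
from datetime import date
from typing import Optional, Union

# Made-up placeholder rate for illustration only; not real market data.
HYPOTHETICAL_USD_TO_CAD = 1.25


class ShapeShiftingCustomDate:
    """Anti-pattern: one shared instance whose remembered mode
    (conversation state) changes how 'year-end' is interpreted."""

    def __init__(self, year_end: date):
        self.year_end_date = year_end   # entity state
        self.mode = "date"              # conversation state: the troublemaker
        self.last_amount_usd = 0.0      # also left over from an earlier request

    def handle(self, message: str, amount_usd: Optional[float] = None) -> Union[date, float]:
        if message == "convert":
            if amount_usd is None:
                raise ValueError("convert needs an amount")
            # Steps 2-3: convert, and silently switch modes for every later caller.
            self.mode = "currency conversion"
            self.last_amount_usd = amount_usd
            return amount_usd * HYPOTHETICAL_USD_TO_CAD
        if message == "year-end":
            if self.mode == "currency conversion":
                # Step 5: the remembered amount "at year-end" instead of a date.
                return self.last_amount_usd * HYPOTHETICAL_USD_TO_CAD
            return self.year_end_date
        raise ValueError(f"unknown message: {message}")


resource = ShapeShiftingCustomDate(date(2007, 10, 31))  # step 1
print(resource.handle("year-end"))       # 2007-10-31
print(resource.handle("convert", 100))   # 125.0
print(resource.handle("year-end"))       # 125.0, not a date
```

The same question, asked twice of the same instance, gets two different kinds of answer purely because of what another request did in between.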

In other words, what you can say and how it gets interpreted changes based on the state the instance happens to be in. That is a very bad situation, guaranteed to render the software dysfunctional.

Statelessness

We can see from the above example how disastrous it can be to attempt to manage the state of a resource. In our case, we've been managing the state of an instance of CustomDate, keeping track of when it is a date-rendering machine and when it is a currency conversion machine.

Tracking that state broke working code. Had we abstained from keeping track of the instance's state, the problem would never have emerged in the first place.

From this we see that the only way to achieve robust and reliable software is to ensure that its constituent components are stateless: no memory of what transpired during previous conversations should be retained.

However, keep in mind that we must distinguish between two kinds of state:
  • Entity state
  • Conversation state

It is conversation state that is troublesome. Entity state is perfectly valid and should be remembered. In our example, the entity state is the fact that our company's year-end is October 31.
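Here is one possible shape of the fix, again a hypothetical Python sketch rather than a prescribed design: the year-end (entity state) stays on the instance and may be changed, while currency conversion becomes a separate stateless function that receives everything it needs with each request and remembers nothing afterwards. The names and the placeholder exchange rate are assumptions for illustration.

```python
from datetime import date

# Made-up placeholder rate for illustration only; not real market data.
HYPOTHETICAL_USD_TO_CAD = 1.25


class CustomDate:
    """Keeps entity state (the year-end) but no conversation state."""

    def __init__(self, year_end: date):
        self._year_end = year_end  # entity state: remembered on purpose

    def set_year_end(self, year_end: date) -> None:
        # Changing entity state is legitimate (e.g. a different corporation).
        self._year_end = year_end

    def year_end(self) -> date:
        # Same question, same kind of answer, regardless of earlier requests.
        return self._year_end


def convert_usd_to_cad(amount_usd: float, rate: float = HYPOTHETICAL_USD_TO_CAD) -> float:
    # Stateless: everything needed arrives with the request; nothing is retained.
    return amount_usd * rate


resource = CustomDate(date(2007, 10, 31))
print(convert_usd_to_cad(100))   # 125.0
print(resource.year_end())       # 2007-10-31
```

Because nothing about earlier requests is retained, the order in which clients send their messages no longer affects the answer to the year-end question.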

Keeping track of what transpired as clients have interrogated an instance of a software component, and then retaining that state, is always disastrous. And yet that is how most inexperienced software developers tend to architect and design their software.

Coming up

In the next installment, we'll look more closely into how to architect and design stateless software.
