When GOTO makes sense

By John Stracke

I hate GOTO, you hate GOTO, all Right-Thinking Modern Programmers hate GOTO. It’s a primitive, archaic construct, left over from machine languages, where primitive constructs make sense. When used, it enables bad coding practices, leads to bizarre bugs, and generally contributes to the decline of civilization.

But there is one case where GOTO is The Right Thing: finite state machines.

Think about it. The states of a finite state machine basically correspond to the program counter of a CPU. In this view, state transitions are GOTOs, right? So you can either build your code to store your FSM’s state in some sort of variable (in which case assignments to that variable are GOTOs in disguise), or you can be honest with yourself and write GOTO-based code.

Let me give you an example. This code is taken from my roamd server. It’s a simple check to see if the string matches the regexp "^li[0-9]*.tmp$". I didn’t want to rely on an existing regexp library because they aren’t as portable as I’d like, so I wrote a recognizer, a finite state machine which outputs true or false. The GOTO-based code is below:

int fileMatchesTmpPattern(const char* file)
  const char* cur=file;
  if (!cur)
    return 0;

  if ((*(cur++))!=’l’)
    return 0;

  if ((*(cur++))!=’i’)
    return 0;

  if (isdigit(*(cur++)))
    goto num;
  if ((*(cur++))!=’.’)
    return 0;

  if ((*(cur++))!=’t’)
    return 0;

  if ((*(cur++))!=’m’)
    return 0;

  if ((*(cur++))!=’p’)
    return 0;

  if (*cur)
    return 0;

  return 1;

See? Simple and direct. Every step through the code is a step through the string, and through the FSM; every GOTO is a jump to another state of the FSM.

Now, obviously, there are cases where this doesn’t work. For example, a generic regexp package could not apply this technique, because its state machine can’t be hard-coded. Instead, it builds its FSM as a data structure and then works through that data structure. (Of course, what it’s really doing is emulating a CPU whose instruction set is the syntax of that data structure.) But that sort of technique gets messy when you’re doing a special-purpose FSM that doesn’t need to vary at runtime.

I tried the non-hard-coded FSM approach once: I was working on a library that implemented Windows-like .INI files for Unix (but better, of course :-), and I needed to add operations to delete items or sections. The task was complicated by the fact that each section of the .INI file might be scattered throughout the file (can’t trust those users), and items might even appear more than once (only the first one would be read; but deleting only the first one would mean that the second one would be read instead). I blithered at it for a bit, then I stepped back and drew up a state machine--a fairly complex one, unfortunately, but that was the way it was. (Side note: why didn’t I just read line by line? Because the state machine allowed me to work in constant memory, even with unlimited-length lines.) I then asked myself, "OK, how can I implement this thing?" I thought for a while about building it the usual way, with a state machine interpreter; but that started getting hairy. It was then that I had the insight that every state transition is a GOTO. Once I’d accepted that, I was able to control my reflexive twitches at typing "goto", and banged out the code in nothing flat. It was clear, it worked, it was maintainable (I did hang on to the drawing of the FSM), and it was honest. :-)

Home SCA Geekery Books Thoughts Send mail Mastodon