• Refactoring slightly gnarly code can be relaxing, like untangling knots. Maybe there's some mathematical similarity between the two activities?

Over on a long-gone FogCreek discussion board I wrote:

I've never read the official canon on refactoring, so this stuff may be obvious.

But when I clean up code I basically have two kinds of activities:

  • clustering (putting things that should be together, together)
  • generalizing (making things parameterizable)

And I don't try to do the two at the same time.

Typically I start by trying to pull apart stuff which is just stuck together confusingly. For example, if a source file or an object seems to be doing several different weakly related things, I just split it into several smaller files / classes.

Often, at this stage I create files or classes called "XSystem" of which there's only meant to be one instance, and which does the X-related stuff. Eventually these classes are going to go away again.

At that point, the XSystem object is still intertwined with other parts of the code. So then I start to try to put a more robust interface around it. Functions which use values from global variables or other shared context are changed to get those values from parameters. For each function changed, I try to track down all the calls to it, and change them, there and then. Sometimes the calling function doesn't have the necessary data. So at that point, I change the calling code to get it from the global or shared context.

So these are small changes, pushing references to globals back up the calling stack ... and testing that the code still works each time. (Sometimes with unit tests, but that isn't always possible.)
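Here's a minimal sketch of one such small step, in Python rather than the VBScript I was actually working in. The names (`SETTINGS`, `format_price`, `render_invoice_line`) are made up for illustration.

```python
# Hypothetical shared context that the tangled code reads from directly.
SETTINGS = {"currency": "EUR", "decimals": 2}


# Before: the function reaches into the global itself.
def format_price_before(amount):
    return f"{amount:.{SETTINGS['decimals']}f} {SETTINGS['currency']}"


# After: the values arrive as parameters, so the function no longer
# depends on the shared context.
def format_price(amount, currency, decimals):
    return f"{amount:.{decimals}f} {currency}"


# The caller doesn't have the data itself, so for now *it* reads the global.
# The reference to SETTINGS has moved one level up the call stack; a later
# pass can push it up again by parameterizing this function too.
def render_invoice_line(description, amount):
    price = format_price(amount, SETTINGS["currency"], SETTINGS["decimals"])
    return f"{description}: {price}"
```

The behaviour is unchanged; the only difference is where the global gets read, which is what makes each step easy to test.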

As the code starts to be untangled, some more generalizations and possible parameterizations become obvious. For example, you can often see where code from one place has been copied and modified in another. Now that it's decontextualized, you can also see better how to replace both uses with a more general function or class.
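A sketch of what that can look like, again in Python with invented names (`total_open_orders`, `total_shipped_orders`, `total_orders_with_status`) and an assumed list-of-dicts data shape:

```python
# Two hypothetical functions, one evidently copied from the other and tweaked.
def total_open_orders(orders):
    total = 0
    for order in orders:
        if order["status"] == "open":
            total += order["amount"]
    return total


def total_shipped_orders(orders):
    total = 0
    for order in orders:
        if order["status"] == "shipped":
            total += order["amount"]
    return total


# Once neither copy depends on its surrounding context, the only difference
# left is the status value, so it becomes a parameter of one general function.
def total_orders_with_status(orders, status):
    return sum(o["amount"] for o in orders if o["status"] == status)
```

Both original call sites can then call `total_orders_with_status` with their own status, and the two copies can be deleted.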

This isn't a single-pass process of course. Once you've taken the quick wins for generalization, it's back to more, finer-grained clustering and re-organizing. The XSystem can be broken into YSystem and ZSystem, and the whole thing repeats.

I'm scared of premature generalization, so I'll sometimes see a potential to pull stuff out into a shared base-class or strategy or delegate, but leave it for a couple of iterations, until I feel I know the code and understand the domain better.

Usually, the time to do that generalization is very late, and by then it's trivially simple.


Just noticed that the original questioner was talking about a web-app that currently has a lot of business logic in JSP pages. So here's a description of my current cleaning up of someone else's intranet application in ASP.

1) I started by pulling all the big, hard-to-read pages apart into lots of little sub-pages, squirting these blocks of mixed code / HTML back into the main page with server-side #includes, which in this situation are a great boon.

2) Then I turned many of those sub-pages into procedural functions and supplemented the #includes with trailing calls to the functions. And I embedded the HTML as response.write statements in the functions.

3) Next I parameterized those functions in order to remove references to the page's globals. And I also began pulling related functions that shared data together into classes.

Up until this point, I still didn't try to generalize.

But with the functions more cleanly encapsulated, I could start to discern certain typical components like tables and navigation bars and charts. I merged similar pieces together into more parameterized components; and polished them up a bit, including making the HTML embedded in the code more data-driven and flexible.
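To give a flavour of such a merged component, here's a rough Python sketch (not the VBScript from the actual application). The function name `render_table`, its parameters and the `css_class` default are all hypothetical, and it returns the HTML as a string instead of writing it straight to the response.

```python
from html import escape


def render_table(headers, rows, css_class="report"):
    """Build an HTML table from plain data instead of hard-coded markup."""
    head = "".join(f"<th>{escape(str(h))}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f'<table class="{escape(css_class)}"><tr>{head}</tr>{body}</table>'
```

Several near-identical table-writing functions can then collapse into calls like `render_table(["Name", "Qty"], rows)`, with the page-specific parts passed in as data.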

Now, at the present time, I still have all the code in VBScript. About 90% of the code is in separate #included files with more or less meaningful names, and 50% is in objects. In some applications it might be appropriate to move some of the classes into compiled, server-side ActiveX objects for speed, or for access to other operating system services.

I still have HTML embedded in code. I think that isn't a problem here, because it's very unlikely that any non-programmers are going to work on it in the foreseeable future. But if you are moving classes "deeper" into the system, then you may need to separate the code from HTML. My advice is simply to try to do as much of this separation with style-sheets before you decide whether you really need to use a templating system or TagLibs. They're an extra level of pain to administer, and half the time you won't be able to make the separation work. If the project itself seriously demands specialist HTML designers, the chances are they won't understand or accept the restrictions of a tag system, and you'll end up working on the HTML through editing code anyway; so don't make life harder than it needs to be.

I'm also personally unconvinced that Model-View-Controller is always appropriate for web-based apps. In this case, I'm more inspired by Paul Graham's notion of a continuous negotiation of the border between program and library (here, here): there's a negotiation between re-usable components and pages which are flexible and easy to adapt. As I think SQL queries are really "views" on the database, I think SQL usually lives in the pages, as part of the fast-evolving UI.