The True Nature Of Beautiful Perfection
I’m always pleased when I can build a nice clean system for a customer. I like to be able to look back and say, “That is beautiful. I’m proud of that.” Antoine de Saint-Exupéry said, “A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away.” I like that quote and try to build systems with it in mind. I like to simplify a customer’s business process so it makes more sense to them than it did before.
But there is a danger here that needs to be avoided. We are not dealing with art or literature. We are dealing with business systems, systems that receive input from untrusted sources, and that data needs to be checked. Joel Spolsky puts it best when describing things you should never do. He talks about old code that “has grown little hairs and stuff on it and nobody knows why.” He’s not describing something beautiful; he’s describing something that works, something that has gone through the pain of having its exceptions found and dealt with.
We help customers automate their business. The software products we sell all have some type of workflow built in. Oracle IPM and Liquid Office have a true workflow component where you build your processes graphically. ILINX Capture and Kofax Capture have the idea of queues and the ordering of queues. Systems that combine several products are generally designed to be used in stages such as scan, store and retrieve. In each stage the data moves from one queue to another, or from one piece of software to another, and it needs to arrive correctly. These are the types of system interactions I’m focusing on. Unfortunately, these systems aren’t always configured perfectly, and sooner or later something will go wrong.
Your scanned document will be unreadable. Your form won’t be filled out completely. You’ll have a power outage. The database you rely on will have bad or missing data. Your network connection will drop. You’ll need a strategy for handling all of these.
If you don’t handle exceptions well, your system will be consistently inconsistent. Approvals will not be done on time. Documents may be hard to find. Users will become frustrated with extra, unneeded steps to complete their work. Or worse, no one will notice anything until a year later, when document XYZ turns out not to be in the system, the auditors breathing down your neck lose their patience, and you’re about to be sued. You need to be able to handle these exceptions.
What An Exception Is Not
Just to be clear about the types of things I want to focus on, I’d like to list a few things I’m not talking about first. When I say exception, I do not mean anything that can be considered a part of normal business processes. For instance, in an accounting workflow, it might be part of the process to send invoices over $1,000,000 to a manager for special approval. This is not an exception. While the amount may be exceptional, the approval is a defined business process. Conditional queues in your business process are also not exceptions. If it is normal for a form to be missing specific data and there is a special queue for dealing with that, it is known and a normal part of the business processes.
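To make the distinction concrete, here is a minimal Python sketch of routing logic. The defined business rules are ordinary branches: the $1,000,000 approval threshold, plus a known "missing data" queue, assumed here to be a missing PO number. Only an item that matches no defined rule goes to an exception queue. The queue names, the `Invoice` fields and the `route_invoice` function are hypothetical. In practice, products like these usually express this logic in a workflow map or queue configuration rather than in code.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

# Hypothetical queue names, for illustration only.
MANAGER_APPROVAL = "manager_approval"
MISSING_PO_NUMBER = "missing_po_number"
STANDARD_APPROVAL = "standard_approval"
EXCEPTIONS = "exceptions"

APPROVAL_THRESHOLD = Decimal("1000000")


@dataclass
class Invoice:
    vendor: str
    amount: Optional[Decimal]
    po_number: Optional[str]


def route_invoice(invoice: Invoice) -> str:
    """Return the name of the queue the invoice should go to next."""
    # Unknown situation: no rule defines what to do without an amount.
    if invoice.amount is None:
        return EXCEPTIONS
    # Defined business rule, not an exception: large invoices need a manager.
    if invoice.amount > APPROVAL_THRESHOLD:
        return MANAGER_APPROVAL
    # Known, expected gap with its own conditional queue.
    if not invoice.po_number:
        return MISSING_PO_NUMBER
    return STANDARD_APPROVAL
```

In this sketch, a $2,500,000 invoice goes to `manager_approval` and a $500 invoice without a PO number goes to `missing_po_number`. Both are normal processing. Only the invoice with no amount at all lands in `exceptions`.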
What An Exception Is
Exceptions are when we are dealing with the unknown. Exceptions typically require human intervention, special research or a decision that hasn’t been made before. Exceptions also occur when automated parts of the system suddenly stop doing what they are supposed to. (Actually, those are sometimes just programming bugs, but you need to handle those too.)
How To Handle Exceptions During Development & Testing
- Build maps and scripts that contain catch-all routines. If you have a workflow script that needs to interface with an external system, make sure you can accurately report whether the system is operational, whether you can connect, whether you can perform the tasks you need to, and whether the call returns successfully. Each of these steps may need its own error handling code so you can report the problem accurately. You may also want to catch exceptions outside the known ones so that you can log and report on them. Workflow maps contain exception processing within them: most queues can return with an error, which can then be routed to a special queue. Be willing to add those exception queues. Allow users to route their data to these exception queues with an explanation of the issue. They may not be used much, but you’ll be thankful they were there when something happens.
- Have good metrics and logging. The software we sell and implement already has logging features. Those features need to be turned on. When designing systems that have custom components, make sure they include a logging feature. You will also need to make sure the data that is collected can be easily read. This means investing some time into reviewing available reports or creating new reports.
- Include automatic reporting. Consider an automated way to flag problems or issues that can be detected. Emailing a business administrator or system administrator is a good way to alert people to potential issues. Be careful with this, however, as you can have too much. You want your noise-to-signal ratio to be low. The last thing someone wants to do is read through hundreds of automated emails because one of them might indicate an issue. (There is a reason most people turn off the Vista UAC. Allow? Deny?)
- Test with real data. I’ve seen engineers send the same 5 documents through the test system thousands of times and declare that the system can handle thousands of documents. No, the system can handle those 5 documents. That’s slightly different. You need to test with thousands of different documents; that is where your exceptions will be found. You also need to test with real documents. When someone fills out a form for testing, of course they are going to fill it out correctly, because the programmer wants to make sure his code can read each and every field. But what happens when the fields are not all filled in? What happens when the data used as a unique index is missing or not unique? You need to test to find out. Fake data will only test the common paths. Real data will test for exceptions.
- Test with a lot of real data. I feel the need to state this twice. Exceptions happen when users are using the system. Use the system as much as possible with real data before you go live.
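Several of the points above can be sketched together in one workflow step. The minimal Python example below first rejects the problems real data tends to expose: missing index fields and a repeated "unique" index. It then talks to an external system in separate stages (reachable, connected, task performed, result confirmed), so a failure reports exactly which stage broke. A catch-all handles anything unanticipated. Every outcome is logged, and a failed document is routed to an exception queue. The `ExternalSystem` interface, the `FakeSystem` stand-in, the field names and the queue names are all hypothetical. Substitute whatever your product’s scripting environment actually provides.

```python
import logging
from typing import Protocol

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("workflow")

# Hypothetical index fields that every document must carry.
REQUIRED_FIELDS = ("account_number", "document_date")


class StageError(Exception):
    """A failure in a known stage, carrying a human-readable reason."""


class ExternalSystem(Protocol):
    """Hypothetical interface to the system a workflow script talks to."""

    def ping(self) -> bool: ...
    def connect(self) -> None: ...
    def store(self, doc: dict) -> str: ...
    def exists(self, doc_id: str) -> bool: ...


def validate(doc: dict, seen_keys: set) -> None:
    """Reject missing index fields and index values already stored."""
    for field in REQUIRED_FIELDS:
        if not doc.get(field):
            raise StageError(f"validate: missing {field}")
    if doc["account_number"] in seen_keys:
        raise StageError(f"validate: duplicate index {doc['account_number']}")


def process_document(doc: dict, system: ExternalSystem, seen_keys: set) -> str:
    """Return the next queue: 'stored' on success, 'exceptions' on any failure."""
    try:
        validate(doc, seen_keys)
        # Each stage gets its own check so the log says exactly what broke.
        if not system.ping():
            raise StageError("reachability: system did not respond")
        try:
            system.connect()
        except ConnectionError as exc:
            raise StageError(f"connect: {exc}") from exc
        try:
            doc_id = system.store(doc)
        except OSError as exc:
            raise StageError(f"store: {exc}") from exc
        if not system.exists(doc_id):
            raise StageError(f"confirm: {doc_id} not found after store")
        seen_keys.add(doc["account_number"])
        log.info("stored %s as %s", doc["account_number"], doc_id)
        return "stored"
    except StageError as exc:
        log.warning("routing to exceptions: %s", exc)
    except Exception:
        # Catch-all for anything nobody anticipated; keeps the traceback in the log.
        log.exception("routing to exceptions: unexpected error")
    return "exceptions"


class FakeSystem:
    """In-memory stand-in so the sketch runs without a real backend."""

    def __init__(self, up: bool = True) -> None:
        self.up = up
        self.docs: dict = {}

    def ping(self) -> bool:
        return self.up

    def connect(self) -> None:
        if not self.up:
            raise ConnectionError("connection refused")

    def store(self, doc: dict) -> str:
        doc_id = f"DOC-{len(self.docs) + 1}"
        self.docs[doc_id] = doc
        return doc_id

    def exists(self, doc_id: str) -> bool:
        return doc_id in self.docs
```

In a batch run, a second document reusing account `A1` and a document with no `document_date` both land in the exception queue with a logged reason. They neither overwrite earlier data nor halt the batch. Splitting the stages costs a few lines, but it turns “the integration failed” into “the integration failed at connect,” which is what the person handling exceptions needs to know.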
How To Handle Exceptions During Production
- Expect to find exceptions. Make it part of the production process to spend time looking at the log files and dealing with issues. Every project our company manages includes what we call “rollout support.” We know that exceptions are going to happen and we plan for the time to deal with and fix them. Have someone whose job is to handle the exceptions from a business point of view, not just an IT administrator. Give them the time required on a weekly basis to make sure the system is running smoothly and that the users are happy with the way it runs.
- Be willing to change your process. After an issue occurs a few times, it may make sense to consider it something that should be handled as normal processing. Be willing to review the workflow map or order of queues. Those conditional queues I mentioned earlier, the ones I said weren’t exceptions? Those are really exceptions that everyone already knew about. Now that you’ve found and know about one more, add it to the normal business process.
- Focus on the common problems. It’s okay to think, “This is the first and probably last time I will have to deal with this.” As soon as it happens a second time, though, treat it as a recurring problem and plan how you will deal with it going forward.
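The automatic-reporting advice and the production advice fit together. Alert a person the first time a kind of exception appears, and stay quiet on repeats so the inbox is not flooded. Once that kind of exception recurs, flag it for a review of the workflow. The minimal Python sketch below assumes exceptions can be grouped by a short category string. The `ExceptionTracker` class, the `send_alert` callback (an email sender in practice) and the default threshold of two occurrences are illustrative assumptions, not features of any product named here. The threshold comes from the “second time” rule above.

```python
from collections import Counter
from typing import Callable, List


class ExceptionTracker:
    """Counts exceptions by category, alerts once when a category first appears,
    alerts once more when it recurs, and lists categories that need review."""

    def __init__(self, send_alert: Callable[[str], None], review_threshold: int = 2) -> None:
        if review_threshold < 2:
            raise ValueError("review_threshold must be at least 2")
        self.send_alert = send_alert
        self.review_threshold = review_threshold
        self.counts: Counter = Counter()

    def record(self, category: str, detail: str) -> None:
        self.counts[category] += 1
        count = self.counts[category]
        if count == 1:
            # First sighting: tell a person, once.
            self.send_alert(f"New exception type '{category}': {detail}")
        elif count == self.review_threshold:
            # It came back: consider making it part of normal processing.
            self.send_alert(f"'{category}' has recurred; review the workflow")
        # Further repeats are counted but not emailed, to keep noise down.

    def needs_review(self) -> List[str]:
        return sorted(c for c, n in self.counts.items() if n >= self.review_threshold)
```

For example, three unreadable scans and one missing index produce three alerts rather than four: one for each new category and one when unreadable scans recur. `needs_review()` then returns `["unreadable_scan"]`, which is the candidate for a new conditional queue.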
What To Take With You
By now you can see that your system is going to encounter exceptions too. You’re not alone; we all have to deal with them. You can see the importance of thinking about them, looking for them and dealing with them responsibly. If your system is still being developed, you have the knowledge to plan for them. If your system is already implemented, you now have good reasons to review it and change it for the better.
I hope you can look at the design of your system and say, “The workflow map, scripts, queues and configuration may not be beautiful, but the system handles everything beautifully.”
[photo by wikipedia user Patriarca12]