I wanted to share a recent project experience with you that further strengthened my belief that a picture paints a thousand words. Drawing one helped to identify the root cause of a show-stopping problem where other efforts had failed.
The project landscape:
- A client-enforced, aggressive and fast approaching implementation deadline
- A trade messaging project with complex and distinct business and routing rules silos
- Messages from multiple asset classes sourced from several upstream trade capture systems
- Performance issues requiring a hasty workflow redesign
- Insufficient time to document the complex business/routing rules and workflow
The symptoms of the problem:
- A persistent issue on a single asset class
- An inability to reproduce the problem in the debug testing rig
- A clash between the business and routing logic producing an “intermediate” status
- Upon replaying the message the problem corrected itself
Steps taken to date:
The project team attempted to reproduce the problem locally in their testing rigs. It proved impossible to do so. In the testing rig, the clash between business and routing rules did not occur. The trade messages were processed as expected.
A deep dive was taken into the business and routing rules. Where was the conflict? The business rules said “yes”. The routing rules said “no”.
It was not possible to reproduce the error. Without a reproduction, it was not going to be possible to find the root cause and resolve it.
The clock was ticking and this problem had been ongoing for four days already.
What Red Hound did:
When the Principal Consultant shared the problem with me, my thoughts turned to the workflow model. I could use it to walk through the flow and “be” a trade message. Unfortunately, there wasn’t a workflow model. It had never been produced, a step dropped in the pressure to deliver. I knew this problem wasn’t going to be solved without one, so I took myself away from the maelstrom in order to draw it out.
After a couple of hours of documenting the flow, creating the model, walking trade messages through it and analysing the business and routing rule silos the issue started to reveal itself.
The redesign had introduced an anomaly into the model. It had resulted in adapters in the main flow that retained cross asset class entry business rules but single asset class exit routing rules.
There hadn’t been time to refactor the engines and rule sets.
As the symptoms stated, replay worked but the main workflow failed.
Was it possible that somehow the failing asset class was being processed by the wrong adapter? The entry business rules in the main flow would allow any asset class into the adapter, whereas the exit routing rules could only successfully process a specific, single asset class. Was it possible for the business rules to say “yes” but the routing rules to say “no” in this scenario? It was, if the wrong asset class was being processed in an adapter.
A check with the infrastructure team revealed that a sizeable minority of Rates trades were being mistakenly pushed down the FX queue into the FX adapter. The cross-asset entry business rules in the FX adapter processed the Rates trades successfully, resulting in a “yes” state. However, the FX exit routing rules didn’t recognise them and dropped into their default mode, resulting in a “no” state.
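The failure mode can be illustrated with a minimal sketch. This is not the project's code: the `Trade` and `Adapter` classes, the status strings, and the asset classes other than FX and Rates are all hypothetical, chosen only to show how cross-asset entry rules combined with single-asset exit rules produce the “intermediate” status.

```python
from dataclasses import dataclass

# Hypothetical set of asset classes; only FX and Rates come from the case itself.
ALL_ASSET_CLASSES = frozenset({"FX", "Rates", "Credit", "Equities"})


@dataclass(frozen=True)
class Trade:
    trade_id: str
    asset_class: str


@dataclass(frozen=True)
class Adapter:
    name: str
    entry_asset_classes: frozenset  # business rules applied on entry
    exit_asset_classes: frozenset   # routing rules applied on exit

    def process(self, trade: Trade) -> str:
        business_ok = trade.asset_class in self.entry_asset_classes
        routing_ok = trade.asset_class in self.exit_asset_classes
        if business_ok and routing_ok:
            return "routed"
        if business_ok:
            # Business rules said "yes", routing rules said "no".
            return "intermediate"
        return "rejected"


# After the redesign: cross-asset entry rules, single-asset exit rules.
fx_adapter = Adapter("FX", ALL_ASSET_CLASSES, frozenset({"FX"}))

print(fx_adapter.process(Trade("T1", "FX")))     # routed
print(fx_adapter.process(Trade("T2", "Rates")))  # intermediate
```

In this sketch the adapter logic itself is never wrong for an FX trade, which is consistent with a testing rig that delivers every message to its intended adapter and therefore never shows the clash.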
The queue routing was corrected and the problem went away, as my workflow model predicted.
Lessons learned:
- The starting point of all workflow projects must be a workflow model
- Creating your workflow model is not a waste of time. You do have time to do it and the return on your investment will be worth the effort
- Start workflow redesigns by updating your model. This will reduce your risk exposure and allow you to quickly identify potential flaws
- Infrastructure changes must be documented and communicated – your workflow model is a great basis for this
- Build smoke testing packs to give you confidence that infrastructure changes have been successful
- Bake logging into your adapter framework
- When problems arise in the system, use your toolkit, including your workflow model, to help identify the root cause
- As far as RULES and WORKFLOW are concerned, the code NEVER documents itself
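As an illustration of the smoke testing and logging points above, here is a minimal sketch of a routing check. The queue names, the `EXPECTED_QUEUE` table, and the `route` and `smoke_test` functions are assumptions made for the example, not part of any real framework. In a real system the routing table would come from the messaging infrastructure's configuration.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("adapter")

# Hypothetical expectation taken from the workflow model: asset class -> queue.
EXPECTED_QUEUE = {"FX": "fx.queue", "Rates": "rates.queue"}


def route(asset_class: str, routing_table: dict) -> str:
    """Look up the queue for an asset class and log the decision."""
    queue = routing_table[asset_class]
    log.info("asset_class=%s queue=%s", asset_class, queue)
    return queue


def smoke_test(routing_table: dict) -> list:
    """Return the asset classes whose queue differs from the workflow model."""
    return [
        asset_class
        for asset_class, expected in EXPECTED_QUEUE.items()
        if route(asset_class, routing_table) != expected
    ]


# A routing table with Rates wrongly pointed at the FX queue.
misrouted = smoke_test({"FX": "fx.queue", "Rates": "fx.queue"})
print(misrouted)  # ['Rates']
```

Running a check like this after each infrastructure change compares the deployed routing against the workflow model, and the logged asset class and queue give a trail to follow when a message ends up in the wrong adapter.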
By building out the workflow model and walking it, I was able to resolve the problem that had been plaguing the project for four days. A couple of hours’ effort well spent.
If you’d like to find out more about our approach, the technology we use and the partners we work with, get in touch.