Sunday, November 6, 2011

Gracefully handling orchestration errors

Errors happen; it's sad but true. Services can become unresponsive.

This "article" is about how to handle that when you use BizTalk Server to orchestrate an integration flow.
The image below shows a very simple orchestration flow: a message triggers the flow, an external WCF service is called, and the response from that call is fetched and delivered to some endpoint. In other words, a really simple integration flow that moves data between disparate systems. So what could go wrong? In this simple case, not much. The most critical part of the flow is the external service call made from the orchestration: the service could be unavailable, or the network could be down.
What happens if we don't write any dedicated code to handle this?
Well, the send port calling the external service will retry according to its configured retry values (the default in BizTalk is 3 retries, 5 minutes apart, so 4 attempts in total). If all of these attempts fail, the send port will be suspended along with the orchestration. That's OK: when the network or service is up again we just resume the port... or?
Unfortunately it doesn't work the way we would hope. Once resumed, the send port will indeed send the message to the service, but the response message won't correlate back to our orchestration, so the integration flow stays broken. No response payload will be delivered to our final destination (the endpoint).
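
To make the retry behaviour concrete, here is a minimal Python sketch of "one initial attempt plus a fixed number of retries, then give up". It is only a model, not BizTalk code: the names send_with_retries and RetriesExhausted are made up for illustration, and the defaults (3 retries, 300 seconds apart) are assumed values that you should replace with whatever your send port is actually configured with.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetriesExhausted(Exception):
    """Raised when the initial attempt and every retry have failed."""


def send_with_retries(
    call: Callable[[], T],
    retry_count: int = 3,             # assumed value, check your port settings
    retry_interval_s: float = 300.0,  # assumed value, five minutes
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try `call` once, then up to `retry_count` more times with a fixed pause."""
    last_error: Optional[Exception] = None
    for attempt in range(retry_count + 1):
        try:
            return call()
        except Exception as exc:
            last_error = exc
            # Only wait if there is another attempt left.
            if attempt < retry_count:
                sleep(retry_interval_s)
    # Every attempt failed: this is the point where the port would be suspended.
    raise RetriesExhausted(f"gave up after {retry_count + 1} attempts") from last_error
```

With these assumed values a service that stays down is tried four times, with three five-minute waits in between, before the failure surfaces.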

The design below is a way to solve this issue.

1 - First we must catch the exception thrown back from the failed service call.
(It could be a custom service exception, a SOAP exception, or System.Exception if you want to catch all errors.)

2 - Next we suspend the orchestration with a suspend shape, giving the support operator a relevant error message. (A suspend flag has to be set when you catch the error; a decide (if) shape below the service call uses it to determine whether an error happened.)

3 - A loop shape is used to retry the service call if the operator decides to resume the orchestration. The logic to exit the loop is simply handled by a flag that is reset when the service call is resumed.

A new "service message request" will be sent to the message box, resulting in a new instance of the send port calling the service. If the service call is successful, the response will be handled by the orchestration (thus resuming the integration flow) and, in this case, finally delivered to the endpoint.
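
The three steps above can be sketched in a few lines of Python. This is only a model of the control flow, not orchestration code: run_flow, call_service, deliver and suspend are hypothetical names, and suspend stands in for the suspend shape, i.e. it is assumed to block until an operator resumes the orchestration.

```python
from typing import Callable, Optional


def run_flow(
    call_service: Callable[[], str],
    deliver: Callable[[str], None],
    suspend: Callable[[str], None],
) -> int:
    """Call the service until it succeeds; return how often we were suspended."""
    suspensions = 0
    retry = True
    response: Optional[str] = None
    while retry:                      # loop shape
        error_message = None          # "suspend flag", cleared on every pass
        try:                          # scope with an exception handler
            response = call_service()
        except Exception as exc:      # step 1: catch the failed call
            error_message = f"Service call failed: {exc}"
        if error_message is not None:  # decide shape below the service call
            suspensions += 1
            # Step 2: suspend with a message for the operator.
            # Returning from suspend() models the operator pressing "resume".
            suspend(error_message)
        else:
            retry = False             # success: leave the loop
    deliver(response)                 # hand the payload to the endpoint
    return suspensions
```

The point of the design is that the retry happens inside the orchestration, so the new request and its response belong to the same orchestration instance that is waiting for them.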

You will end up with at least one suspended send port instance in the administration console that has to be terminated manually once the orchestration is resumed. That's the best way I've found to recover from failed service calls (those that exceed the retry interval on the ports).

If there's an even better way, let me know!

Below you can see the orchestration design for this particular simple integration flow and how it looks in the tracking tool when a SOAP error is caught from a failed service call.

Orchestration design
Tracking tool -> catch SOAP error
