Handling retries part 2 – using composition

Introduction

In our line of work there are usually many ways to accomplish a particular task (for better or worse). In this series of posts I want to demonstrate various techniques we can use and the benefits we can gain from each.

So without further ado here is the scenario I want to be able to support:

I need a way of performing a particular action that can also handle an exception being raised by re-trying the action after a specified amount of time for a specified number of retries.

Here is some pseudo-code to give an idea:

set retries = 5
set succeeded = false
    while retries > 0 and not succeeded
        begin
            call task
            set succeeded = true
        exception
            decrement retries
            call exception
            call sleep 3
        end
    end while
if not succeeded then call failure

The most basic way to accomplish this would be to have the C# replicate exactly what we have above. This would do the trick, but it means that if we had other tasks that needed to behave the same way we would end up duplicating the code for every instance; ideally we want to re-use this behaviour.

Composition

In chapter 1 of the Design Patterns (GoF) book there is a section titled Inheritance versus Composition. I highly recommend anyone with the book who has not read it to go and take a look, as it really distills the problems with relying too heavily on inheritance, and it even includes its own principle:

Favor object composition over class inheritance

The principle holds up when you see how many of the design patterns use composition as opposed to inheritance, so let's give composition a go:

public class Retrier
{
    protected readonly IRunner _runner;

    public int RetryCount { protected get; set; }
    public TimeSpan Interval { protected get; set; }
    public event Action<Exception> OnException = delegate { };
    public event Action OnFailure = delegate { };

    public Retrier(IRunner runner)
    {
        _runner = runner;
        // sensible defaults, matching the inheritance version from part 1
        RetryCount = 5;
        Interval = TimeSpan.FromSeconds(10);
    }

    public void Execute()
    {
        var retries = RetryCount;
        while (retries > 0)
        {
            try
            {
                _runner.Run();
                return;   // success - skip the failure callback
            }
            catch (Exception ex)
            {
                retries--;
                OnException(ex);
            }
            Thread.Sleep(Interval);
        }
        OnFailure();   // only reached when all retries are exhausted
    }
}

public interface IRunner
{
    void Run();
}

This would then be used as follows:

public class NetworkFileCopier : IRunner
{
    protected Retrier _retrier;

    public NetworkFileCopier()
    {
        _retrier = new Retrier(this);
        _retrier.Interval = TimeSpan.FromSeconds(30);
        _retrier.OnException += ex => Log.Error(ex);
    }

    public void DoCopy()
    {
        _retrier.Execute();
    }

    public void Run()
    {
        // do file copy here
    }
}

Now we have wrapped the behaviour up inside the Retrier object and we reference it inside NetworkFileCopier. Unlike the inheritance version we no longer need to be part of an inheritance hierarchy, so NetworkFileCopier can inherit from some other base class if it needs to. It does need to implement an interface so that the Retrier object knows what to call when it gets executed, however this could be changed so that you pass the Retrier a delegate to call instead, as sketched below.
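
As a rough sketch of that delegate-based alternative (the DelegatingRetrier name is illustrative, not part of the code above), the retry logic can take an Action directly so no interface is needed:

public class DelegatingRetrier
{
    private readonly Action _action;

    public int RetryCount { get; set; }
    public TimeSpan Interval { get; set; }

    public DelegatingRetrier(Action action)
    {
        _action = action;
        RetryCount = 5;
        Interval = TimeSpan.FromSeconds(10);
    }

    public void Execute()
    {
        var retries = RetryCount;
        while (retries > 0)
        {
            try
            {
                _action();   // invoke the delegate directly, no IRunner required
                return;
            }
            catch
            {
                retries--;
            }
            Thread.Sleep(Interval);
        }
    }
}

NetworkFileCopier would then create it with new DelegatingRetrier(() => Run()) and would not need to implement IRunner at all.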

We still have the issue that NetworkFileCopier has to manage the retry object itself; the next section removes this responsibility, in case that is a problem.

Decorator pattern

One way we could split this responsibility out of NetworkFileCopier is to use the Decorator Pattern:

public interface IFileCopier
{
    void DoCopy();
}

public class NetworkFileCopier : IFileCopier
{
    public void DoCopy()
    {
        // do file copy here
    }
}

public class RetryFileCopier : IFileCopier, IRunner
{
    protected readonly IFileCopier _decoratedFileCopier;
    protected Retrier _retrier;

    public RetryFileCopier(IFileCopier decoratedFileCopier)
    {
        _decoratedFileCopier = decoratedFileCopier;
        _retrier = new Retrier(this);
        _retrier.Interval = TimeSpan.FromSeconds(30);
        _retrier.OnException += ex => Log.Error(ex);
    }

    public void DoCopy()
    {
        _retrier.Execute();
    }

    public void Run()
    {
        _decoratedFileCopier.DoCopy();
    }
}

This can then be used by client code like this:

var fileCopier = new RetryFileCopier(
                    new NetworkFileCopier());
fileCopier.DoCopy();

The first thing to note is how slimmed down NetworkFileCopier is: its only concern is now copying files, which means that if we needed to change the retry behaviour we would not need to touch this object at all (a good example of orthogonal code). The client also gets to decide whether it wants the retry behaviour or not.

I feel that these versions are nicer than the inheritance version we looked at in part 1, however it still feels like we need to perform quite a few tasks (or ceremony) to get this to work:

  • Introduce a new interface IRunner so that Retrier can communicate with the method to execute (this can be alleviated by using a delegate, as sketched earlier)
  • Introduce a new interface IFileCopier so that the decorator pattern can be utilised
  • Introduce a new object RetryFileCopier to wrap up the retry behaviour

In the next part I’m going to be throwing OOP out of the window and looking at how functional programming in C# could potentially save us from some of this overhead.

Handling retries part 1 – using inheritance

Introduction

In our line of work there are usually many ways to accomplish a particular task (for better or worse). In this series of posts I want to demonstrate various techniques we can use and the benefits we can gain from each.

So without further ado here is the scenario I want to be able to support:

I need a way of performing a particular action that can also handle an exception being raised by re-trying the action after a specified amount of time for a specified number of retries.

Here is some pseudo-code to give an idea:

set retries = 5
set succeeded = false
    while retries > 0 and not succeeded
        begin
            call task
            set succeeded = true
        exception
            decrement retries
            call exception
            call sleep 3
        end
    end while
if not succeeded then call failure

The most basic way to accomplish this would be to have the C# replicate exactly what we have above. This would do the trick, but it means that if we had other tasks that needed to behave the same way we would end up duplicating the code for every instance; ideally we want to re-use this behaviour.

Inheritance

In true OOP fashion many will reach for the tried and tested inheritance model, wrapping the behaviour above up inside a base class à la the Template Method Pattern:

public abstract class RetryBase
{
    public RetryBase()
    {
        Interval = TimeSpan.FromSeconds(10);
        RetryCount = 5;
    }

    protected TimeSpan Interval { get; set; }

    protected int RetryCount { get; set; }

    public void Execute()
    {
        var retries = RetryCount;
        while (retries > 0)
        {
            try
            {
                ExecuteImpl();
                return;   // success - skip the failure callback
            }
            catch (Exception ex)
            {
                retries--;
                Exception(ex);
            }
            Thread.Sleep(Interval);
        }
        Failure();   // only reached when all retries are exhausted
    }

    protected abstract void ExecuteImpl();

    protected virtual void Exception(Exception ex)
    {
    }

    protected virtual void Failure()
    {
    }
}

public class NetworkFileCopier : RetryBase
{
    public NetworkFileCopier()
    {
        // override to 30 secs
        Interval = TimeSpan.FromSeconds(30);
    }

    protected override void ExecuteImpl()
    {
        // do file copy here
    }

    // override to provide logging
    protected override void Exception(Exception ex)
    {
        Log.Error(ex);
    }
}

Client usage:

var networkFileCopier = new NetworkFileCopier();
networkFileCopier.Execute();

We now have reusable behaviour for our retry logic: we can override the interval and retry count, and we get hooks that are called when an exception occurs or when the whole operation fails. There are some issues with this approach though:

  • Firstly, it requires quite a bit of work to get this behaviour, because we need to inherit from a specific class; if we had many actions that needed this behaviour it could get tedious
  • Inheritance is static: once the class is compiled into the hierarchy it cannot change its behaviour dynamically (e.g. removing the retry logic on demand) without extra code hooks. This breaks OCP.
  • NetworkFileCopier is now intrinsically tied to this inheritance hierarchy; if it already inherited from another base class we would need that class to inherit from RetryBase, or change RetryBase to inherit from the existing base class (yuk!)
  • Before, NetworkFileCopier was happily getting on with its responsibility of copying a file over the network; now it has to worry about retry logic (intervals, retry count, exception handling etc.). This breaks SRP.

The importance of idempotent services

Introduction

We all know the importance of performing a unit of work (UoW) in a transaction scope, so that if anything goes wrong we can roll back the actions that have taken place and be sure that the state is exactly how it was before the UoW started. This is where we can use a messaging framework (i.e. MSMQ) that supports transactions via a transaction coordinator (i.e. MSDTC): if we receive a message, perform some work, and then hit a problem saving to the DB (i.e. a network problem, or being chosen as a transaction deadlock victim), we know that the message we received will be put back on the queue and any messages we attempted to send will not actually go out (to prevent duplicates):

public void DoWork()
{
    using (var scope = new TransactionScope())
    {
        var command = MsmqInboundWork.NextMessage();
        // *** processing of command produces workItem here ***
        DbGateway.Save(workItem);
        MsmqOutboundResult.SendMessage(
            new ResultMessage { CompletedAt = DateTime.UtcNow });

        scope.Complete();
    }
} 

If this happens we have options (i.e. retry a limited number of times and then move the message to an error queue). However, we don't always have the option of using a reliable/durable/transactional transport layer such as messaging: a 3rd party we integrate with over the internet will usually expose an HTTP based transport such as SOAP/REST, or, as in my case recently, an internal legacy system may only provide an HTTP transport layer (SOAP webservices were the only option in my case).

This post will demonstrate how we can work with services exposed over an unreliable transport layer and still not end up with duplicates, by making the operations idempotent.

What is an idempotent operation?

This basically boils down to being able to call a function with the same arguments and have the same result occur each time. So if we have a method:

public int Add(int first, int second)
{
    return first + second;
} 

This method is idempotent: if I call it multiple times with Add(2, 2) it will always return 4 and there are no side effects. Compare this to the next method (demo only!):

public void DebitAccount(string accountNumber, decimal amount)
{
    var account = FindAccount(accountNumber);
    account.Balance -= amount;
    // account saved away
} 

If I call this multiple times, like DebitAccount("12345678", 100.00m), the client of that account is going to be more out of pocket each time a call is made.

How does this relate to my webservice method?

You may be thinking to yourself: we have a system running that exposes a webservice with a method performing a similar action to the example above, or we call a 3rd party to perform an action (i.e. send an SMS), and so far we haven't had any problems, so what's the big deal!? You've been lucky so far, but remember the first fallacy of distributed computing:

The network is reliable

Once you get network issues this is where everything goes wrong. That may not be a big deal: you might decide to retry the call and the user ends up with another SMS sent to their mobile. However, if you're performing a journal on an account it is a big problem.

If we look at a typical call to a webservice, in this case we are going to use the example of a façade service of a legacy banking system:

[fig 1 – If all goes to plan]

In this case everything has gone OK: the client has received a response back from the account service, so we know that the account has been debited correctly. But what if we have the following:

[fig 2 – Network issue causes loss of response to client]

Now we have a problem, because the client has no way of knowing whether the debit was performed or not. In the example above the journal has taken place, but equally the machine running the account service could have had a problem and never got round to telling the banking system to perform the journal.

Fixing the problem

We established in the last section that when we do not receive a response from the account service the client has no way of knowing whether the debit took place. The only thing we can do is assume that it did not take place and retry the service call; given the scenario from fig 2 this would be very bad, as the debit did take place, so we would debit the account twice.

So we have a few options for dealing with this issue:

  1. We don't perform retries on the client side automatically; instead we handle the situation manually and could perform a compensating action before attempting a retry (i.e. perform a credit journal)
  2. We check on the account service whether we have already had a debit for the chosen account and amount within a specific time frame, and if so we ignore the call (this would not work in this example, as we may have a genuine need to debit the account twice within that time frame)
  3. Have the client send something unique (i.e. a GUID) for the debit that it wants to take place; the account service can then use this to check whether it has already performed the journal associated with this request.

The last 2 options make the account service idempotent. My recommendation would be the last option, and it is the one I will demonstrate for the rest of the post.

OK, so the first change is to add a unique id to the request that gets sent. It is the responsibility of the client application to associate this unique id with the UoW being carried out (i.e. if the client application was allowing an overdrawn account to be settled, then the UoW would be the settlement and we would associate this id with that settlement), so the client needs to be changed to store this unique id.

The next change is for the account service to check that it has not already performed a journal for the id passed to it. This is accomplished by storing each id against its result (in this case the journal reference) once the journal has taken place, and querying that store as the first task; if we find the id, we simply return the stored journal reference in the response back to the client.

If the id is not found we must not have performed a journal, so we make the call to the banking system as before, then store away the id and the journal reference and return the response back to the client.
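
To make that concrete, here is a minimal sketch of the service side. The DebitRequest/DebitResponse shapes and the IJournalStore and IBankingSystem collaborators are illustrative stand-ins, not a real API:

public class DebitRequest
{
    public Guid RequestId { get; set; }        // unique id supplied by the client
    public string AccountNumber { get; set; }
    public decimal Amount { get; set; }
}

public class DebitResponse
{
    public string JournalReference { get; set; }
}

public interface IJournalStore
{
    string FindReferenceFor(Guid requestId);   // null if we have not seen this id
    void Save(Guid requestId, string journalReference);
}

public interface IBankingSystem
{
    string PerformJournal(string accountNumber, decimal amount);
}

public class AccountService
{
    private readonly IJournalStore _journalStore;
    private readonly IBankingSystem _bankingSystem;

    public AccountService(IJournalStore journalStore, IBankingSystem bankingSystem)
    {
        _journalStore = journalStore;
        _bankingSystem = bankingSystem;
    }

    public DebitResponse Debit(DebitRequest request)
    {
        // first task: have we already journalled this request id?
        var existingReference = _journalStore.FindReferenceFor(request.RequestId);
        if (existingReference != null)
            return new DebitResponse { JournalReference = existingReference };

        // not seen before, so perform the journal as normal...
        var journalReference = _bankingSystem.PerformJournal(
            request.AccountNumber, request.Amount);

        // ...then store the id against the result so a retry gets the same answer
        _journalStore.Save(request.RequestId, journalReference);

        return new DebitResponse { JournalReference = journalReference };
    }
}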

Now that this is in place we can have the client perform retries (within a certain threshold) whenever we receive no response back from the account service, safe in the knowledge that we won't get duplicate debits from the account. Here is a sequence diagram to outline the new strategy:

[Sequence diagram – the new idempotent account service]
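
As a rough sketch of the client side (maxAttempts and the TimeoutException choice are illustrative), note that the same RequestId is sent on every attempt so the service can de-duplicate:

var request = new DebitRequest
{
    RequestId = Guid.NewGuid(),   // generated once and stored against the UoW
    AccountNumber = "12345678",
    Amount = 100.00m
};

const int maxAttempts = 3;
for (var attempt = 1; attempt <= maxAttempts; attempt++)
{
    try
    {
        var response = accountService.Debit(request);
        // success - response.JournalReference ties the debit to our UoW
        break;
    }
    catch (TimeoutException)
    {
        // no response received; safe to retry with the same request id
    }
}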

Ah the joys of distributed architecture!

Command Query Separation (CQS)

Introduction

CQS is a term coined by Bertrand Meyer, presented in his book Object-Oriented Software Construction (I haven't read the book, yet!). The principle states that every method should either be a command that changes state, or a query that changes no state, but not both.

On the whole I agree with the principle; I can think of some examples where breaking it is valid, but in general it makes sense to follow. What I want to show in this post is how the principle can be applied at various levels, rather than just at the method level.

Architectural Level

Imagine a system which manages a large amount of data, where the business would like to perform analysis on that data while still providing transactional throughput. It would not be uncommon to split the data into separate Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) stores. This is a good example of separating the data that changes often (OLTP) from the stagnant data that can be analysed (OLAP), which I feel matches the principle strongly; the benefits gained are typically better performance and data shaped to the needs of the operations applied to it.

Object Design Level

When we have a complex domain that requires numerous behaviours, we want to take advantage of domain modelling in our objects by utilising something like DDD to come up with entities, value objects etc. that we can use to represent the domain we're working in.

We use these objects to perform transactional operations such as “Submitting an Order” or “Creating a New User”, as these are where we will be using the behaviours and rules we put together in our objects. What we don't want to start doing is using these same objects for displaying result grids on a search screen: that represents a very different usage pattern, and if we try to use our objects both for that purpose and for transactional processing we get into a mess quite quickly.

Instead it's best to take the CQS principle on board and separate the domain model objects used for our transaction processing from what I think of as reporting or querying objects, which are only interested in grabbing data for things such as search screens and reports. So if we were in an issue tracking domain we would probably end up with something like this:

 
// Uses an ORM such as NHibernate; concerned only with using objects in a
// transactional context, does not do any reporting or querying.
public class IssueRepository : IIssueRepository
{
    public Issue FindBy(int id)
    {
        // use ORM to get by id
    }

    public Issue Save(Issue issue)
    {
        // use ORM to save; only needed when new, as the object will be
        // tracked for updates automatically
    }

    public void Delete(Issue issue)
    {
        // use ORM to delete
    }
}

// Probably uses raw ADO.NET or a light facade on top of ADO.NET.
// May use NHibernate, but DOES NOT return any transactional objects,
// only projections.
public class IssueQuerying : IIssueQuerying
{
    public DataTable FindMatchingCriteria(int priorityId, int pageNo, int pageCount, out int virtualCount)
    {
        // use ADO.NET to fill a DataTable of results, possibly using a
        // stored procedure to support paging on an older DB
    }

    public IssueDisplayModel FindAssignedTo(string username)
    {
        // use NHibernate to fill the projection object IssueDisplayModel
    }
}
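
The querying side returns simple projection objects rather than domain entities; a minimal sketch of what IssueDisplayModel might look like (the property names are illustrative):

// A flat, read-only projection used purely for display and databinding;
// it carries no domain behaviour or invariants.
public class IssueDisplayModel
{
    public int Id { get; set; }
    public string Title { get; set; }
    public string Priority { get; set; }
    public string AssignedTo { get; set; }
}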

The benefits we get from separating these 2 concerns:

  1. Our domain model objects are not polluted with lots of getters that exist just for reporting/querying
  2. We can utilise databinding, because our querying object returns objects that ASP.NET controls know what to do with (one-way databinding only)
  3. Our mapping is simplified, as we only need to map our domain model objects to other objects for transactional processing needs, not for querying/reporting.

Summary

Hopefully this post has shown how we can take the original CQS principle and apply it at other levels when developing software, from the high level architecture right down to the method level.

NHibernate & Null Object Pattern: The Options

Don’t bother using Null Object Pattern

The first option is to just not bother using the Null Object Pattern. This is the easiest solution, however it has the side-effect that you end up with null checks everywhere, which is the reason for moving to the Null Object Pattern in the first place.
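
For illustration (the Issue/AssignedTo names are just examples), this is the kind of check that ends up scattered through every consumer:

// without a Null Object every caller must remember the null check
if (issue.AssignedTo != null)
    assignedToLabel.Text = issue.AssignedTo.Name;
else
    assignedToLabel.Text = "Unassigned";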

Place a Null Object in the database

The next option is to have a row in the database that represents the null object, so it will probably have an Id of zero and a value of << Unassigned >> or along those lines. This has its own problems: if it's a user-editable entity you don't really want users to be able to change or delete this data, so it becomes a special case that needs to be locked against editing/deleting.

Use Field Access and have the Property Handle the Mismatch

This is my preferred method, whereby NHibernate is configured to use field access on the object and the property handles the mismatch internally, for example:

// inside class definition
protected User assignedTo;

public virtual User AssignedTo
{
    get
    {
        return assignedTo ?? User.NotAssigned;
    }
    set
    {
        if (value == User.NotAssigned)
            assignedTo = null;
        else
            assignedTo = value;
    }
}

This gives NHibernate a different view of the assigned-to value than outside objects get: NHibernate, which uses the internal field assignedTo, can set it to null, whereas outside objects that go through the AssignedTo property will never see a null and will instead get the Null Object, in this case User.NotAssigned.
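
For reference, a rough sketch of the corresponding hbm.xml mapping fragment (the class and column names are illustrative); NHibernate's access="field.camelcase" strategy tells it to bypass the property and use the camelCase backing field:

<!-- NHibernate reads/writes the backing field "assignedTo" directly,
     never going through the AssignedTo property -->
<many-to-one name="AssignedTo"
             access="field.camelcase"
             class="User"
             column="AssignedToId" />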

Any other options please add a comment below 🙂

Staff Intranet Updated

I just updated the staff intranet project over on codeplex. I now have it under source control, now that they have added support for SVN 🙂 I have also uploaded a new release of the source in case you don't have a compatible source control client.

The changes are:

  • Incorporating some of my other projects (log4net-altconf, object2object mapper) to help with the maintainability of the app.
  • Refactorings to the way a staff member is saved. After gaining a better understanding of DDD principles it was quite clear that the validation around the photo upload and the check for name duplication should be separated out as Specification objects, to reduce the coupling in the Staff object (see the sketch below).
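
For anyone unfamiliar with the pattern, a minimal sketch of the Specification shape being described (the names are illustrative, not the actual project code):

// encapsulates a single rule so it can be composed and tested
// independently of the Staff entity
public interface ISpecification<T>
{
    bool IsSatisfiedBy(T candidate);
}

// e.g. the duplicate-name check, sketched with a hypothetical repository
public class UniqueStaffNameSpecification : ISpecification<Staff>
{
    private readonly IStaffRepository _repository;

    public UniqueStaffNameSpecification(IStaffRepository repository)
    {
        _repository = repository;
    }

    public bool IsSatisfiedBy(Staff candidate)
    {
        return !_repository.NameExists(candidate.Name);
    }
}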

Remember, you can post any feedback here on my blog or on codeplex.

Null Object Pattern by example

I feel sorry for the Null Object pattern: while Abstract Factory, Strategy and the other patterns from the GoF book get all the attention, the Null Object pattern doesn't get a look in. This is a shame, because it can come in very useful, saving you many scattered == null checks across your code. I'm going to show a real world use of the Null Object pattern (and the Adapter pattern as a by-product) using example logging objects.

Let's get to it. We have an ILog, and it looks like this:

public interface ILog
{
    void Debug(string message);
}

OK, granted, I have made it a simple example, but at least it ain't an ICar or IAnimal 😉 How do we get hold of our ILog? With an ILogFactory, of course:

public interface ILogFactory
{
    ILog GetLog(Type type);
}

We can make this easier to get hold of by having a static gateway, so we aren't having to pass an ILogFactory around to every object or new up an ILogFactory instance every time we want to log:

public static class Log
{
    private static ILogFactory logFactory;

    public static void InitializeWith(ILogFactory logFactory)
    {
        Log.logFactory = logFactory;
    }

    public static ILog Current(Type type)
    {
        return logFactory.GetLog(type);
    }
}

You could also leverage an IoC container here instead.

Now we want some implementations. I'm a fan of log4net, so I would implement the following:

public class Log4NetLogFactory : ILogFactory
{
    public ILog GetLog(Type type)
    {
        // retrieve an adapted log4net log, something like...
        log4net.ILog rawLog = log4net.LogManager.GetLogger(type);
        return new Log4NetLog(rawLog);
    }
}

With Log4NetLog looking like this:

public class Log4NetLog : ILog
{
    private readonly log4net.ILog log;

    public Log4NetLog(log4net.ILog logToAdapt)
    {
        this.log = logToAdapt;
    }

    public void Debug(string message)
    {
        // guard to avoid the cost of logging when debug is switched off
        if (log.IsDebugEnabled)
            log.Debug(message);
    }
}
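
With these in place, the application hooks the factory up once at start-up; a minimal sketch (where exactly this goes, e.g. Main or Application_Start, depends on the app):

// at application start-up
Log.InitializeWith(new Log4NetLogFactory());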

Now we want to be able to log in our client code like this:

Log.Current(this.GetType()).Debug("some debug message");

The problem of course is that if we try to do this without hooking our static Log class up to an ILogFactory, we get the dreaded null reference exception. This is bad! Our logging code should not throw exceptions. Enter the Null Object pattern:

public static class Log
{
    private static ILogFactory logFactory = new NullLogFactory();

    public static void InitializeWith(ILogFactory logFactory)
    {
        if (logFactory != null)
            Log.logFactory = logFactory;
    }

    public static ILog Current(Type type)
    {
        return logFactory.GetLog(type);
    }
}

We instantiate a new NullLogFactory and assign it to the static logFactory member; we also perform a check to make sure no null is passed to the InitializeWith method. Here is the NullLogFactory:

public class NullLogFactory : ILogFactory
{
    public ILog GetLog(Type type)
    {
        return new NullLog();
    }
}

All this does is new up a NullLog object:

public class NullLog : ILog
{
    public void Debug(string message)
    {
    }
}

The NullLog does nothing whatsoever apart from prevent us from receiving null reference exceptions. The neat thing, though, is that we could potentially capture attempts to call methods on the NullLog and perform some sort of internal debugging:

public class NullLog : ILog
{
    public void Debug(string message)
    {
        if (Config.IsInternalDebugEnabled)
            System.Diagnostics.Debug.WriteLine(
                string.Format("'{0}' sent to NullLog", message));
    }
}

That way users of the logging objects can be notified that logging was attempted while no log factory was hooked up.