Leveraging value objects & domain services

Introduction

Over the years I have seen, and also been guilty of, failing to model concepts that deserve first-class treatment in the domain you're working in. This failure leads to two primary code smells:

The first manifests itself as values that are always passed around together yet are taken by functions as standalone parameters; there is usually a breakage of DRY here, with knowledge of what the parameters represent scattered around the codebase. The second relates to a refusal to apply object thinking: procedures are created with no real intent as to where the behaviour should reside, and because we're in an OOP language they have to live somewhere, right? So we end up with badly named classes that become mere containers of procedures (and if you're really unlucky they deal with a whole host of unrelated responsibilities!)

Resolving

So how can we resolve these issues? As it turns out, we can usually stamp out both by introducing a Value Object that properly represents the concept in our domain. Having made this first step, we can usually start moving behaviour related to the new concept out of the *Util class and into the actual object, where it becomes a much richer object with its own behaviour instead of just a dumb data holder.

There are times, however, when certain behaviour cannot simply be moved inside the new concept object; for these cases you will probably want to introduce a Domain Service object. The difference between the Util class and the Domain Service is that the latter is specific to a certain operation you want to perform, and as such can be properly named using the language of the domain you're working in.

Example

The example I'm going to show is a really common scenario I have come across while working on various financial systems over a number of years.

In a financial domain you will have the concept of Money; it will consist of:

  • Amount
  • Currency

Seems fairly straightforward, and it is; however, let's look at how I typically see this concept dealt with in code.

public static string FormatMoney(IDictionary<string, int> currencyDecimalPlacesLookup, decimal amount, string currency, bool includeCurrency)
{
    // lookup for decimal places
    // string.Format using amount with found decimal places and currency flag
}

public static string FormatCurrencyDescription(Dictionary<string, string> currencyDescLookup, string currency)
{
    // lookup by currency code
}

public static decimal ConvertGBP(Dictionary<string, decimal> xrates, string currency, decimal amount)
{
   // lookup rate by currency code
   // apply conversion
}

Here is an example of their usage:

string balance = MoneyUtil.FormatMoney(_decimalPlacesLookup, account.PostedBalance, account.CurrencyCode, true);

string currDesc = MoneyUtil.FormatCurrencyDescription(_currDescLookup, account.CurrencyCode);

decimal inGBP = MoneyUtil.ConvertGBP(_xrateLookup, account.CurrencyCode, account.PostedBalance);

To start with, these functions were probably just inlined inside one of the calling functions, but then someone needed the same thing somewhere else and created the MoneyUtil class (hooray for reuse!). As you can imagine, such classes continue to grow and become dumping grounds for any behaviour related to Money. You can already see that unrelated responsibilities are being introduced: formatting and currency conversion have no reason to live together. You can also see that the caller has to manage lookup dictionaries, which is another sign of primitive obsession and of modelling concepts being missed.

As described in the introduction, certain values are being passed around together (in this case the amount and currency), yet we have not captured the intent of these values by introducing a Money object. Instead, let's introduce the concepts of the domain and see how that changes things:

public class Currency : IEquatable<Currency>
{
	public Currency(string code, int decimalPlaces, string description)
	{
		Code = code;
		DecimalPlaces = decimalPlaces;
		Description = description;
	}

	public string Code { get; private set; }
	public int DecimalPlaces { get; private set; }
	public string Description { get; private set; }
	
	public override string ToString()
	{
		return Code;
	}

// ... omitted rest of implementation
}

First we have the concept of a Currency. This may seem like a pointless object, however even this step alone changes the signatures of the functions above: no longer do they accept a magic string for the currency code, but instead a properly defined Currency object that has its information encapsulated in one place.

There are a few items to note when creating Value Objects:

  • They are immutable
  • Identity is based on their properties (I have not shown this here, but you can see it in the full source; a minimal sketch follows below)
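Since the equality members are not shown in the post, here is a minimal sketch of what property-based equality for Currency might look like (my own illustration; the full source may differ):

public bool Equals(Currency other)
{
	if (ReferenceEquals(other, null)) return false;
	// value objects compare by their properties, not by reference
	return Code == other.Code
		&& DecimalPlaces == other.DecimalPlaces
		&& Description == other.Description;
}

public override bool Equals(object obj)
{
	return Equals(obj as Currency);
}

public override int GetHashCode()
{
	// the code alone is enough to distribute hash buckets for a currency
	return Code == null ? 0 : Code.GetHashCode();
}

With Currency in place we can introduce Money itself: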
public class Money : IEquatable<Money>
{	
	public Money(decimal amount, Currency currency)
	{
		Amount = amount;
		Currency = currency;
	}

	public decimal Amount { get; private set; }
	public Currency Currency { get; private set; }

// ... omitted rest of implementation

	public override string ToString()
	{
		return string.Format("{0} {1}", Currency, FormatAmount());
	}

	private string FormatAmount()
	{
		decimal val = Amount * (decimal)Math.Pow(10, Currency.DecimalPlaces);
		val = Math.Truncate(val);
		val = val / (decimal)Math.Pow(10, Currency.DecimalPlaces);
		return string.Format("{0:N" + Math.Abs(Currency.DecimalPlaces) + "}", val);
	}
}

We can see that the Money object must be supplied a Currency at creation, and it now has the smarts to format itself correctly using the details of its Currency. If you wanted more control over the formatting you could provide an overload that takes a formatter object.
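For example, assuming a pound sterling Currency with two decimal places (illustrative values only), the formatting behaves like this:

var gbp = new Currency("GBP", 2, "Pound Sterling");
var balance = new Money(1234.5678m, gbp);
Console.WriteLine(balance); // "GBP 1,234.56" - truncated, not rounded (separators depend on culture)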

With the changes made above we have nearly made the MoneyUtil class redundant. However, there is still the operation of converting existing Money to Money in another Currency (in this case GBP). This operation is a prime example of where a Domain Service is a good fit: it captures the concept of converting Money from one Currency to another.

First we can define an interface to represent the operation.

public interface ICurrencyConverter
{
    Money ConvertTo(Money from);
}

This captures the operation using the language of the domain, and we're already getting the benefit of our Money and Currency objects by eliminating primitive obsession: we're no longer passing strings and decimals but fully realised domain concepts.

Next we can implement a CurrencyConverter to convert to a specific Currency.

public class ToCurrencyConverter : ICurrencyConverter
{
	private readonly Currency _toCurrency;
	private readonly IDictionary<Currency, decimal> _rates;

	public ToCurrencyConverter(Currency toCurrency, IDictionary<Currency, decimal> rates)
	{
		_toCurrency = toCurrency;
		_rates = rates;
	}

	public Money ConvertTo(Money from)
	{
		if (!_rates.ContainsKey(from.Currency))
		{
			throw new InvalidOperationException(string.Format("Could not find rate from currency: {0} to: {1}", from.Currency, _toCurrency));
		}

		decimal rate = _rates[from.Currency];
		return new Money(from.Amount * rate, _toCurrency);
	}
}

On creation we provide the converter with the Currency we want to convert to and the rates that should be used. These would typically be populated daily and then cached for faster access, although such concerns sit well outside the domain model, which does not care how they are provided.
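Usage might then look something like this (the currencies and rate are illustrative; note the rates dictionary relies on the value-based equality of Currency sketched earlier):

var gbp = new Currency("GBP", 2, "Pound Sterling");
var usd = new Currency("USD", 2, "US Dollar");
var rates = new Dictionary<Currency, decimal> { { usd, 0.79m } };

var toGbp = new ToCurrencyConverter(gbp, rates);
Money inGbp = toGbp.ConvertTo(new Money(100m, usd)); // 79.00 GBP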

With this final piece of the puzzle in place we can now take joy in removing our MoneyUtil class and having a richer domain model instead.

I have published code to go along with this post, which can be found here: https://github.com/stavinski/joasd-domainconcepts

The importance of performing spike solutions

First we should discuss: what is a spike solution? According to The Art of Agile Development [Shore & Warden 2007]:

A spike solution, or spike, is a technical investigation. It’s a small experiment to research the answer to a problem.

The Extreme Programming site defines them as:

A spike solution is a very simple program to explore potential solutions. Build the spike to only address the problem under examination and ignore all other concerns.

Spikes

These two definitions line up well, and both go on to present one major point that must be made before continuing:

Spike/Spike solutions should never be committed into the main codebase

They should be treated as throwaway code; once you have your answer, the spike has served its purpose (if you must keep the code under source control, make sure it's completely separate from the main trunk!).

I also want to clarify the difference, as I see it, between spike solutions and prototyping, which can appear very similar. Prototyping usually encompasses a much larger goal, such as putting some quick static front-end screens together to gauge the UX, whereas a spike solution answers a specific technical question, such as whether EF will be able to map to our legacy Users table correctly. This means that spike solutions should require far less time to complete, and we should probably time-box how long we spend on a particular spike to ensure this.

Now that we have defined what a spike solution is, I want to go through a cut-down real-world example (with names changed) that demonstrates their effectiveness when certain situations arise.

XMHELL

Back Story

Fubar DIY Stores Ltd has an e-commerce site that lists all the products available to buy; when a product is displayed, its reviews are shown and new ones can be submitted. Reviews were previously managed in house, but the company has now decided to use a popular global third-party service, Haveyoursay, which has assured them it can provide a like-for-like match for the data currently stored.

To bring in the reviews from Haveyoursay, twice a day the reviews will be exported as an XML file. We have been tasked with using this exported XML file to import the reviews into the Fubar DIY Stores database so that they can be retrieved along with the product data.

Making a start

For this example we will concentrate only on the import of the XML file, ignoring the subsequent steps.

We have completed a rough architecture of all the moving parts, so we roughly know which objects we need and how they need to collaborate with each other. Essentially there will be a coordinating object that knows which steps need to be performed in which order; this will be the ReviewImportCoordinator, with a PerformImport method that starts the import, taking in the XML file path.

We also know that we need an object for ReviewImportCoordinator to collaborate with that reads the XML file and brings back a trusty XmlDocument object, which we can then use to parse the data we need and save it to the DB.

So we start writing our unit tests for ReviewImportCoordinator first and stub out IXmlReviewFileReader, which has a method that takes the import file path and returns our XmlDocument object, and we continue with our unit testing of the import process.

Setting ourselves up for a fall

It seems we're doing everything right: we have broken the responsibilities up into separate objects and are using TDD/BDD against our import process. However, we have jumped the gun and are making some big assumptions about how we go about reading the XML file, which will have an impact on how our ReviewImportCoordinator does its work.

Just enough design

This is where people new to agile get it wrong and jump straight into the code rather than doing some design up front. Agile does not tell you to do no design; it tells us not to do big design up front, where we try to guess everything about the system before any code is written.

Our first task should be to get a copy of the XML file, as this will get rid of our assumptions about how to handle the XML import. So after chatting to the stakeholder we get a copy of the XML file, and good job we did, as we hit a potential hurdle: the file is around 450MB. This discovery should start us asking questions:

  • Is this a one off or are they all going to be around this size?
  • How much memory will be used if we load this into an XML DOM?
  • Will the machines that are running the import be impacted by the extra memory usage?

After asking the stakeholder for some other import files, we find they are also around 450MB, so this seems to be the expected size for each import. Now we can move on to our crucial question: how much memory will be used if we load this into an XML DOM? Until we get the answer, we have no way of knowing whether there will be an impact on memory usage.

Time for a spike solution

This is the ideal time to write a spike to discover the answer. We knock together a very quick console app with some hacked-together code in the Main method: it simply loads an XmlDocument using one of our supplied import files as the input, followed by a Console.ReadLine() so that it waits for input, allowing us to open up Task Manager and see how much memory the process is using (we just need a ballpark figure; otherwise we could use profiling tools to get more insight).

static void Main(string[] args)
{
    // XmlDocument.Load is an instance method, so create the document first
    var doc = new XmlDocument();
    doc.Load(@"c:\haveyoursay\imports\import.xml");
    Console.ReadLine();
}

Getting Feedback

After we run our spike solution we find that the process is using around 1GB of memory to load the import XML into a DOM. We now have a confident number we can take back to our stakeholder to find out what impact this will have on the machine running the import.

After discussing with the stakeholder, it turns out this machine is already being used to perform other jobs and will suffer badly from having that much memory taken away from them. So we have to look at streaming the XML file rather than loading it into a DOM, which means using XmlReader rather than XmlDocument. We can now start to unit test using this knowledge, heading down the right path from the start.
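As a rough sketch of that direction (the element and attribute names are assumptions on my part, since we don't know the Haveyoursay schema), streaming with XmlReader keeps only one review in memory at a time:

// requires using System.Xml;
using (var reader = XmlReader.Create(@"c:\haveyoursay\imports\import.xml"))
{
    while (reader.ReadToFollowing("review"))
    {
        string productId = reader.GetAttribute("productId");
        // map and save each review here; the full document never sits in memory
    }
}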

Summary

I hope this demonstrates how we can use spike solutions, with a little design up front, to help steer us in the right direction. This example was done at the time of implementation, but you can also use spike solutions as part of estimating when you have to use an unfamiliar technology, library, protocol etc. They give you a quick way of gauging how difficult it is to perform certain tasks, giving you a bit more confidence in your estimates.

So next time you're stuck with a technical question, a spike solution could be just what you're after!

Handling retries part 4 – using both OOP & functional programming

Handling retries part 1 – using inheritance

Handling retries part 2 – using composition

Handling retries part 3 – using functional

Introduction

In our line of work there are usually many ways to accomplish a particular task (for better or worse). In this series of posts I want to demonstrate various techniques we can use, and the benefits we can gain from each.

So without further ado here is the scenario I want to be able to support:

I need a way of performing a particular action that can also handle an exception being raised by re-trying the action after a specified amount of time for a specified number of retries.

Here is some pseudo-code to give the idea:

set retries = 5
    while retries > 0
        begin
            call task
            exit while
        exception
            decrement retries
            call exception
        end
        call sleep 3
    end while
call failure

The most basic way to accomplish this would be to have the C# simply replicate what we have above. That would do the trick, but it means that if we had other tasks that needed to behave the same way we would end up duplicating the code for every instance; ideally we want to reuse this behaviour.

Having the best of both worlds

In the last part we hit a couple of issues with using a purely functional programming (FP) approach (certainly in C#; full FP languages have dealt with these issues).

Let's deal with the client duplication issue first. Ideally we would like to keep the FP retry behaviour, but have it wrapped up in an object that can be reused:

public class RetryInstance
{
    public TimeSpan Interval { get; set; }
    public int Retries { get; set; }
    public Action<Exception> OnException { get; set; }
    public Action OnFailure { get; set; }

    public void Execute(Action action)
    {
        var retryCount = this.Retries;
        while (retryCount > 0)
        {
            try
            {
                action();
                break;
            }
            catch (Exception ex)
            {
                retryCount--;
                this.OnException(ex);
            }
            Thread.Sleep(this.Interval);
        }
        this.OnFailure();
    }
}

This would be used like this:

var retryInstance = new RetryInstance()
{
    Interval = TimeSpan.FromSeconds(30),
    Retries = 5,
    OnException = ex => Log.Error(ex),
    OnFailure = () => Log.Fatal("fail!")
};

retryInstance.Execute(networkFileCopier.DoCopy);
// later in the code
retryInstance.Interval = TimeSpan.FromMinutes(1);
retryInstance.Execute(networkFileCopier.DoCopy);

Here we have moved the FP code into a RetryInstance object. We can then use this object to hold the state for how we want the retries, intervals and callbacks to behave, and in the example we adjust this when needed without duplicating anything. The readability of the code has improved quite a lot as well, but I think we can go one step better by introducing a Builder object to help us construct a RetryInstance and to provide it with suitable defaults:

public class RetryConfiguration
{
    private TimeSpan _interval;
    private int _retries;
    private Action<Exception> _onException;
    private Action _onFailure;

    public void SetInterval(TimeSpan duration)
    {
        _interval = duration;
    }

    public void SetRetries(int count)
    {
        _retries = count;
    }

    public void WhenExceptionRaised(Action<Exception> handler)
    {
        _onException = handler;
    }

    public void WhenFailed(Action handler)
    {
        _onFailure = handler;
    }

    public RetryInstance Build()
    {
        return new RetryInstance()
        {
            Retries = _retries,
            Interval = _interval,
            OnException = _onException,
            OnFailure = _onFailure
        };
    }
}

public static class RetryFactory
{
    public static RetryInstance New(Action<RetryConfiguration> configuration)
    {
        var config = new RetryConfiguration();
        config.SetInterval(TimeSpan.FromSeconds(30));
        config.SetRetries(5);
        config.WhenExceptionRaised(_ => { }); // no-op
        config.WhenFailed(() => { }); // no-op
        configuration(config);
        return config.Build();
    }
}

This would then be used by client code like this:

RetryFactory.New(cfg =>
    {
        cfg.SetInterval(TimeSpan.FromMinutes(1));
        cfg.SetRetries(3);
        cfg.WhenExceptionRaised(ex => Log.Error(ex));
        cfg.WhenFailed(() => Log.Fatal("fail!"));
    }).Execute(networkFileCopier.DoCopy);

We have introduced a couple more objects: one is a configuration object that exposes a really nice API for setting up our RetryInstance, and the other is our builder/fluent DSL that exposes a static method to create a new RetryInstance and provides the suitable defaults we can choose to override.
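And because every setting has a default, a caller that is happy with them can get away with almost no configuration at all (assuming the defaults sketched in RetryFactory above):

// accepts all defaults: 30 second interval, 5 retries, no-op callbacks
RetryFactory.New(_ => { }).Execute(networkFileCopier.DoCopy);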

Summary

We have certainly covered a lot of ground in this series of posts and have ended up with quite a few options for solving the issue in the introduction, and there are no doubt many more ways that I couldn't think of! Some of these may be overkill, especially for the simplistic case of retry behaviour we have been looking at; as with most design and architecture it's always a balance, and this goes hand in hand with how the behaviour is going to be used. Here is a breakdown of how I would personally decide how to implement the retry behaviour:

  • If I have one object with one single use for retry logic, I would probably stick to a simple method call like the pseudo-code in the introduction
  • Once I have another use in a separate object, I may decide to use inheritance (if no inheritance hierarchy is present) or move to composition (the decorator pattern if I don't want the object to manage the retries)
  • If I start to have lots of objects wanting retry behaviour, especially across different projects, that's when I would probably move to using OOP & FP together to provide callers with a nice API and to reduce the amount of code they need to write in order to use it

I hope that this series has helped put forward the case that we should be open to various ways of designing software to solve problems; just because you have always done it one way in the past doesn't mean you should apply that approach across the board, as each problem tends to be unique.

Handling retries part 3 – using functional

Handling retries part 1 – using inheritance

Handling retries part 2 – using composition

Introduction

In our line of work there are usually many ways to accomplish a particular task (for better or worse). In this series of posts I want to demonstrate various techniques we can use, and the benefits we can gain from each.

So without further ado here is the scenario I want to be able to support:

I need a way of performing a particular action that can also handle an exception being raised by re-trying the action after a specified amount of time for a specified number of retries.

Here is some pseudo-code to give the idea:

set retries = 5
    while retries > 0
        begin
            call task
            exit while
        exception
            decrement retries
            call exception
        end
        call sleep 3
    end while
call failure

The most basic way to accomplish this would be to have the C# simply replicate what we have above. That would do the trick, but it means that if we had other tasks that needed to behave the same way we would end up duplicating the code for every instance; ideally we want to reuse this behaviour.

Functional

Sometimes it helps to look at different languages and programming styles when facing a problem, to see how you would solve it there and whether you can borrow any of the techniques. This is especially true of functional programming (FP), now that C# has much more support for FP constructs (lambdas, generics, tuples etc.).

Let's take a look at a JavaScript example of how we can achieve the retry behaviour:

var Retrier = {
    execute: function (action, exception, failure, interval, retries) {
        try
        {
            action();
            return;
        }
        catch (ex)
        {
            retries--;
            exception(ex);
            if (retries > 0) {
              var vals = {
                retries: retries,
                interval: interval,
                action: action,
                exception: exception,
                failure: failure
              };
              window.setTimeout(function() {
                  Retrier.execute(vals.action, vals.exception, vals.failure, vals.interval, vals.retries);
                }, vals.interval);
            } else {
                failure();
            }
        }
    }
};

This would then be used like this:

Retrier.execute(function () { // action
                  networkFileCopier.DoCopy();
                },
                function (ex) { // exception
                  console.log('exception raised! ' + ex);
                },
                function () { // failure
                  console.log('fail!');
                }, 1000, 5);

I'll be the first to admit that my JavaScript is not the best, as I don't tend to use it (I have omitted the anonymous function to close over the scope). There are a number of major differences we had to take into account:

  • JavaScript does not have a concept of a class and instead just uses objects, so we have a simple object to hang the execute method off; you can think of it as a static method in C#
  • All of the state is maintained inside the call. I could have given the Retrier object properties, which would work better if we wanted a standard number of retries, interval and way of handling errors, but instead I have stuck to more of an FP style
  • You generally don't want to do any sort of blocking in JavaScript, as this would block either the UI thread in the browser or the processing thread in NodeJS, so instead we use the setTimeout function to tell JavaScript to call a specific function some time in the future, based on the interval
  • Because we have to use setTimeout instead of sleeping the thread for the interval, we use a recursive call with the retries value decremented each time; before we do so we set up a closure over vals, otherwise the variables would be lost, as JavaScript uses function scoping

Whenever using recursion we need to be careful not to overflow the stack, but in this case, unless you're going to retry a task several thousand times, this should not be an issue.

So let's take the above and create a C# equivalent:

public static class Retrier
{
    public static void Execute(Action action, Action<Exception> exception, Action failure, TimeSpan interval, int retries)
    {
        var retryCount = retries;
        while (retryCount > 0)
        {
            try
            {
                action();
                break;
            }
            catch (Exception ex)
            {
                retryCount--;
                exception(ex);
            }
            Thread.Sleep(interval);
        }
        failure();
    }
}

This would then be used like this:

Retrier.Execute(() => networkFileCopier.DoCopy(),
                ex => Log.Error(ex),
                () => Log.Fatal("fail!"),
                TimeSpan.FromSeconds(30),
                5);

We have completely eliminated OOP from the retry behaviour here and are left with a single static class to hold our Execute method. Clients are no longer required to create new objects to hook into the retry behaviour; however, there are a couple of issues:

  • There is going to be quite a bit of duplication in the client code, as each caller needs to set up all the callback methods and also supply the interval and retry amount
  • The API for the caller is very obtuse; once you start passing lambdas into method calls it can become difficult to understand (named arguments can help, as shown below, but needing them is generally an indication your API could do with being changed)
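For instance, C# named arguments make the same call somewhat more self-describing, though the underlying shape of the API is unchanged:

Retrier.Execute(
    action: () => networkFileCopier.DoCopy(),
    exception: ex => Log.Error(ex),
    failure: () => Log.Fatal("fail!"),
    interval: TimeSpan.FromSeconds(30),
    retries: 5);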

In the next part I want to leverage OOP and FP together to see if we can fix the issues above.

Handling retries part 2 – using composition

Handling retries part 1 – using inheritance

Introduction

In our line of work there are usually many ways to accomplish a particular task (for better or worse). In this series of posts I want to demonstrate various techniques we can use, and the benefits we can gain from each.

So without further ado here is the scenario I want to be able to support:

I need a way of performing a particular action that can also handle an exception being raised by re-trying the action after a specified amount of time for a specified number of retries.

Here is some pseudo-code to give the idea:

set retries = 5
    while retries > 0
        begin
            call task
            exit while
        exception
            decrement retries
            call exception
        end
        call sleep 3
    end while
call failure

The most basic way to accomplish this would be to have the C# simply replicate what we have above. That would do the trick, but it means that if we had other tasks that needed to behave the same way we would end up duplicating the code for every instance; ideally we want to reuse this behaviour.

Composition

In chapter 1 of the Design Patterns GoF book there is a section titled Inheritance versus Composition; I highly recommend anyone with the book who has not read it to go and take a look, as it really distills the problems with relying too heavily on inheritance, and it even includes its own principle:

Favor object composition over class inheritance

The principle holds up when you see how many of the design patterns use composition as opposed to inheritance, so let's give composition a go:

public class Retrier
{
    protected readonly IRunner _runner;

    public int RetryCount { protected get; set; }
    public TimeSpan Interval { protected get; set; }
    public event Action<Exception> OnException = delegate { };
    public event Action OnFailure = delegate { };

    public Retrier(IRunner runner)
    {
        _runner = runner;
    }

    public void Execute()
    {
        var retries = RetryCount;
        while (retries > 0)
        {
            try
            {
                _runner.Run();
                break;
            }
            catch (Exception ex)
            {
                retries--;
                OnException(ex);
            }
            Thread.Sleep(Interval);
        }
        OnFailure();
    }
}

public interface IRunner
{
    void Run();
}

This would then be used as follows:

public class NetworkFileCopier : IRunner
{
    protected Retrier _retrier;

    public NetworkFileCopier()
    {
        _retrier = new Retrier(this);
        _retrier.RetryCount = 5; // without this the loop in Execute() never runs
        _retrier.Interval = TimeSpan.FromSeconds(30);
        _retrier.OnException += ex => Log.Error(ex);
    }

    public void DoCopy()
    {
        _retrier.Execute();
    }

    public void Run()
    {
        // do file copy here
    }
}

Now we have wrapped the behaviour up inside the Retrier object and reference it from NetworkFileCopier. Unlike the inheritance version, we no longer need to be part of an inheritance hierarchy, so NetworkFileCopier can inherit from some other base class if it needs to. It does need to implement an interface so that the Retrier object knows what to call when it executes, although this could be changed so that you pass the Retrier a delegate to call instead, as sketched below.
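As a rough illustration of that delegate alternative (my own sketch, not part of the original code), the Retrier would take an Action in its constructor and the IRunner interface would disappear:

public class Retrier
{
    private readonly Action _run;

    public int RetryCount { protected get; set; }
    public TimeSpan Interval { protected get; set; }
    public event Action<Exception> OnException = delegate { };
    public event Action OnFailure = delegate { };

    public Retrier(Action run)
    {
        _run = run;
    }

    // Execute() stays exactly as before, but calls _run() instead of _runner.Run()
}

NetworkFileCopier would then wire itself up without implementing any interface:

_retrier = new Retrier(() => { /* do file copy here */ });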

We still have the issue that NetworkFileCopier is managing the retry object itself; the next section removes this, in case it is a problem for you.

Decorator pattern

One way we could split this responsibility out of NetworkFileCopier is to use the Decorator Pattern:

public interface IFileCopier
{
    void DoCopy();
}

public class NetworkFileCopier : IFileCopier
{
    public void DoCopy()
    {
        // do file copy here
    }
}

public class RetryFileCopier : IFileCopier, IRunner
{
    protected readonly IFileCopier _decoratedFileCopier;
    protected Retrier _retrier;

    public RetryFileCopier(IFileCopier decoratedFileCopier)
    {
        _decoratedFileCopier = decoratedFileCopier;
        _retrier = new Retrier(this);
        _retrier.RetryCount = 5; // without this the loop in Execute() never runs
        _retrier.Interval = TimeSpan.FromSeconds(30);
        _retrier.OnException += ex => Log.Error(ex);
    }

    public void DoCopy()
    {
        _retrier.Execute();
    }

    public void Run()
    {
        _decoratedFileCopier.DoCopy();
    }
}

This can then be used by client code like this:

var fileCopier = new RetryFileCopier(
                    new NetworkFileCopier());
fileCopier.DoCopy();

The first thing to note is how slimmed down NetworkFileCopier is: its only concern now is copying files, which means that if we needed to change the retry behaviour we would not need to touch this object at all (a good example of orthogonal code), and the client gets to decide whether it wants the behaviour or not.

I feel these versions are nicer than the inheritance version we looked at in part 1; however, it still feels like we need to perform quite a few tasks (or ceremony) to get this to work:

  • Introduce a new interface, IRunner, so that Retrier can communicate with the method to execute (this can be alleviated with a delegate)
  • Introduce a new interface, IFileCopier, so the decorator pattern can be utilised
  • Introduce a new object, RetryFileCopier, to wrap up the retry behaviour

In the next part I'm going to throw OOP out of the window and look at how functional programming in C# could save us from some of this overhead.

Handling retries part 1 – using inheritance

Introduction

In our line of work there are usually many ways to accomplish a particular task (for better or worse). In this series of posts I want to demonstrate various techniques we can use, and the benefits we can gain from each.

So without further ado here is the scenario I want to be able to support:

I need a way of performing a particular action that can also handle an exception being raised by re-trying the action after a specified amount of time for a specified number of retries.

Here is some pseudo-code to give the idea:

set retries = 5
    while retries > 0
        begin
            call task
            exit while
        exception
            decrement retries
            call exception
        end
        call sleep 3
    end while
call failure

The most basic way to accomplish this would be to have the C# simply replicate what we have above. That would do the trick, but it means that if we had other tasks that needed to behave the same way we would end up duplicating the code for every instance; ideally we want to reuse this behaviour.
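Translated literally into C#, it might look like this (a sketch only; CopyFileOverNetwork and the logging calls stand in for whatever your real action and handlers are). Note that it mirrors the pseudo-code exactly, including calling the failure handler after the loop:

int retries = 5;
while (retries > 0)
{
    try
    {
        CopyFileOverNetwork(); // the task we want to retry
        break;
    }
    catch (Exception ex)
    {
        retries--;
        Log.Error(ex); // the "call exception" step
    }
    Thread.Sleep(TimeSpan.FromSeconds(3));
}
Log.Fatal("fail!"); // the "call failure" step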

Inheritance

In true OOP fashion, many will reach for the tried and tested inheritance model and wrap up the behaviour above inside a base class, à la the Template Method Pattern:

public abstract class RetryBase
{
    public RetryBase()
    {
        Interval = TimeSpan.FromSeconds(10);
        RetryCount = 5;
    }

    protected TimeSpan Interval
    {
        get; set;
    }
    protected int RetryCount
    {
        get; set;
    }

    public void Execute()
    {
        var retries = RetryCount;
        while (retries > 0)
        {
            try
            {
                ExecuteImpl();
                break;
            }
            catch (Exception ex)
            {
                retries--;
                Exception(ex);
            }
            Thread.Sleep(Interval);
        }
        Failure();
    }

    protected abstract void ExecuteImpl();

    protected virtual void Exception(Exception ex)
    {
    }

    protected virtual void Failure()
    {
    }
}

public class NetworkFileCopier : RetryBase
{
    public NetworkFileCopier()
    {
        // override to 30 secs
        Interval = TimeSpan.FromSeconds(30);
    }

    protected override void ExecuteImpl()
    {
        // do file copy here
    }

    // override to provide logging
    protected override void Exception(Exception ex)
    {
        Log.Error(ex);
    }
}

Client usage:

var networkFileCopier = new NetworkFileCopier();
networkFileCopier.Execute();

We now have reusable behaviour for our retry logic, the ability to override the interval and retry count, and hooks that are called when an exception occurs or on failure. There are some issues with this approach though:

  • Firstly, it requires quite a bit of work to get this behaviour, because we need to inherit from a specific class; if we had many actions that needed the behaviour this could get tedious
  • Inheritance is static: once the class is compiled into the hierarchy it cannot change its behaviour dynamically (e.g. removing retry logic on demand) without extra code hooks, which breaks OCP
  • NetworkFileCopier is now intrinsically tied to this inheritance hierarchy; if it already inherited from another base class we would need that class to inherit from RetryBase, or change RetryBase to inherit from the existing base class (yuk!)
  • Before, NetworkFileCopier was happily getting on with its responsibility of copying a file over the network; now it has to worry about retry logic (intervals, retry count, exception handling etc.), which breaks SRP

The importance of idempotent services

Introduction

We all know the importance of performing a unit of work (UoW) in a transaction scope, so that if anything goes wrong we can roll back the actions that have taken place and be sure the state is exactly as it was before the UoW started. This is where we can use a messaging framework (e.g. MSMQ) that supports transactional capabilities via a transaction coordinator (e.g. MSDTC): if we receive a message and perform some work, but there is an issue saving to the DB (e.g. a network problem, or being chosen as a transaction deadlock victim), we know that the message received will be put back on the queue and any messages we attempted to send will be recalled (to prevent duplicates):

public void DoWork()
{
    using (var scope = new TransactionScope())
    {
        var command = MsmqInboundWork.NextMessage();
        // *** process the command into a workItem here ***
        DbGateway.Save(workItem);
        MsmqOutboundResult.SendMessage(new ResultMessage { CompletedAt = DateTime.UtcNow });

        scope.Complete();
    }
} 

If this happens we have options (e.g. retry a limited number of times and then move the message to an error queue). However, we don't always have the option of a reliable/durable/transactional transport layer such as messaging; for instance, a 3rd party you integrate with over the internet will usually provide an HTTP-based transport such as SOAP/REST, or, as in my case recently, an internal legacy system may only provide an HTTP transport layer (SOAP webservices were the only option in my case).

This post will demonstrate how we can work with services exposed over an unreliable transport layer and still not end up with duplicates, by making the operations idempotent.

What is an idempotent operation?

This basically boils down to being able to call a function with the same arguments and have the same result occur, with no additional side effects. So if we have this method:

public int Add(int first, int second)
{
    return first + second;
} 

This method is idempotent: if I call it multiple times with Add(2, 2) it will always return 4 and there are no side effects. Compare this to the next method (for demonstration only!):

public void DebitAccount(string accountNumber, decimal amount)
{
    var account = FindAccount(accountNumber);
    account.Balance -= amount;
    // account saved away
} 

If I call this multiple times with DebitAccount("12345678", 100.00m), the client of that account is going to be further out of pocket each time the call is made.

How does this relate to my webservice method?

You may be thinking to yourself: we have a system running that exposes a webservice with a method performing a similar action to the example above, or we make a call to a 3rd party to perform an action (e.g. send an SMS), and so far we haven't had any problems, so what's the big deal!? You've been lucky so far, but remember the first fallacy of distributed computing:

The network is reliable

Once you get network issues, this is where everything goes wrong. It may not be a big deal: you may decide to retry the call, and the user ends up with another SMS on their mobile. However, if you're performing a journal on an account, this is a big problem.

Let's look at a typical call to a webservice; in this case we will use the example of a façade service over a legacy banking system:

[Figure: Account Service sequence diagram]
fig 1 – If all goes to plan

In this case everything has gone OK and the client has received a response back from the account service, so we know the account has been debited correctly. But what if we have the following:

[Figure: Account Service sequence diagram]
fig 2 – Network issue causes loss of response to client

Now we have a problem, because the client has no way of knowing whether the debit was performed or not. In the example above the journal has taken place, but equally the machine running the account service could have a problem and never get round to telling the banking system to perform the journal.

Fixing the problem

We established in the last section that the client has no way of knowing whether the debit took place if no response comes back from the account service. The only thing we can do is assume it did not take place and retry the service call; given the scenario in fig 2 this would be very bad, as the debit did take place, so we would duplicate the debiting of the account.

So we have a few options of how we can deal with this issue:

  1. We don't perform retries on the client side automatically; instead we handle the situation manually, and could perform a compensating action before attempting a retry (e.g. perform a credit journal)
  2. We check on the account service whether we have already had a debit for the chosen account and amount within a specific time frame, and if so we ignore the call (this would not work in this example, as we may have a genuine need to debit the account twice within that time frame)
  3. We have the client send something unique (e.g. a GUID) for the debit it wants to take place; the account service can then use this to check whether we have already performed the journal associated with the request

The last two options make the account service idempotent. My recommendation would be the last option, and it is the one I will demonstrate for the rest of the post.

OK, so the first change is to add a unique id to the request that gets sent. It is the responsibility of the client application to associate this unique id with the UoW being carried out (e.g. if the client application was allowing an overdrawn account to be settled, then the UoW would be the settlement and we would associate the id with it), so the client needs to be changed to store this unique id.

The next change is for the account service to check that we have not already performed a journal for the id passed to it. This is accomplished by storing each id against its result (in this case the journal reference) once the journal has taken place, and querying this store as the first task; if we find the id, we simply return the stored journal reference in the response back to the client.

If the id is not found we cannot have performed the journal, so we make the call to the banking system as before, store away the id and the journal reference, and return the response to the client.
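Pulling those two paragraphs together, a minimal sketch of the service-side check might look like this (all the names here, the request/response types, _journalStore and _bankingSystem, are hypothetical collaborators, not taken from the original system):

public DebitResponse Debit(DebitRequest request)
{
    // request.RequestId is the client-supplied unique id (e.g. a GUID) for this UoW
    string existingReference = _journalStore.FindReference(request.RequestId);
    if (existingReference != null)
    {
        // journal already performed: return the stored result, no side effects
        return new DebitResponse { JournalReference = existingReference };
    }

    // first time we have seen this id: perform the journal as before
    string reference = _bankingSystem.PerformJournal(request.AccountNumber, request.Amount);

    // store the id against the result so any retry gets the same response back
    _journalStore.Save(request.RequestId, reference);
    return new DebitResponse { JournalReference = reference };
}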

Now that this is in place, the client can perform retries (within a certain threshold) when it receives no response from the account service, safe in the knowledge that we won't get duplicate debits from the account. Here is a sequence diagram outlining the new strategy:

[Figure: New idempotent account service sequence diagram]

Ah the joys of distributed architecture!