Handling retries part 4 – using both OOP & functional programming

Handling retries part 1 – using inheritance

Handling retries part 2 – using composition

Handling retries part 3 – using functional

Introduction

In our line of work there are usually many ways to accomplish a particular task (for better or worse). In this series of posts I want to demonstrate various techniques that we can use and the benefits we can gain from each.

So without further ado here is the scenario I want to be able to support:

I need a way of performing a particular action that can also handle an exception being raised by re-trying the action after a specified amount of time for a specified number of retries.

Here is the pseudo-code to give an idea:

set retries = 5
while retries > 0
    begin
        call task
        return
    exception
        decrement retries
        call exception
    end
    call sleep 3
end while
call failure

The most basic way to accomplish this would be to have the C# replicate exactly what we have above. This would do the trick, but if we had other tasks that needed to behave the same way we would end up duplicating the code for every instance. Ideally we want to re-use this behaviour.
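To make that starting point concrete, here is a minimal sketch of what the inline version might look like. The class name InlineRetryFileCopier, the injected copy delegate and the two status properties are my own stand-ins for illustration (they are not part of the later posts), and the loop returns on success so that the failure step only runs once every attempt has failed:

```csharp
using System;
using System.Threading;

// Hypothetical class: the retry loop is written inline in the method that needs it.
public class InlineRetryFileCopier
{
    private readonly Action _copy;        // stand-in for the real file copy (assumption)
    private readonly TimeSpan _interval;  // how long to wait between attempts

    public int ExceptionsSeen { get; private set; }
    public bool Failed { get; private set; }

    public InlineRetryFileCopier(Action copy, TimeSpan interval)
    {
        _copy = copy;
        _interval = interval;
    }

    public void DoCopy()
    {
        var retries = 5;
        while (retries > 0)
        {
            try
            {
                _copy();
                return;               // success, nothing more to do
            }
            catch (Exception)
            {
                retries--;
                ExceptionsSeen++;     // "call exception" (e.g. log the error)
            }
            Thread.Sleep(_interval);  // "call sleep"
        }
        Failed = true;                // "call failure"
    }
}
```

Every other class that needed the same behaviour would have to repeat this loop, which is exactly the duplication the rest of the series tries to remove.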

Having the best of both worlds

In the last part we had a couple of issues with using just a functional programming (FP) approach (certainly in C#; full FP languages have dealt with these issues).

Let’s deal with the client duplication issue first. Ideally we would like to keep the FP retry behaviour but have it wrapped up in an object that can be re-used:

using System;
using System.Threading;

public class RetryInstance
{
    public TimeSpan Interval { get; set; }
    public int Retries { get; set; }
    public Action<Exception> OnException { get; set; }
    public Action OnFailure { get; set; }

    public void Execute(Action action)
    {
        var retryCount = this.Retries;
        while (retryCount > 0)
        {
            try
            {
                action();
                return; // success, so OnFailure must not be called
            }
            catch (Exception ex)
            {
                retryCount--;
                this.OnException(ex);
            }
            Thread.Sleep(this.Interval);
        }
        // every attempt failed
        this.OnFailure();
    }
}

This would be used like this:

var retryInstance = new RetryInstance
{
    Interval = TimeSpan.FromSeconds(30),
    Retries = 5,
    OnException = ex => Log.Error(ex),
    OnFailure = () => Log.Fatal("fail!")
};

// pass the call as a delegate rather than the result of calling it
retryInstance.Execute(() => networkFileCopier.DoCopy());
// later in the code
retryInstance.Interval = TimeSpan.FromMinutes(1);
retryInstance.Execute(() => networkFileCopier.DoCopy());

Here we have moved the FP code into a RetryInstance object. We can then use this object to maintain the state of how we want the retries, intervals and callbacks to behave, and as the example shows we can adjust this when we need to without having to duplicate anything. The readability of the code has improved quite a lot as well, but I think we can go one step better by introducing a builder object to help us build a RetryInstance and to provide it with suitable defaults:

public class RetryConfiguration
{    
    private TimeSpan _interval;    
    private int _retries;    
    private Action<Exception> _onException;    
    private Action _onFailure;    
    
    public void SetInterval(TimeSpan duration)    
    {        
        _interval = duration;    
    }    
    
    public void SetRetries(int count)    
    {        
        _retries = count;    
    }    
    
    public void WhenExceptionRaised(Action<Exception> handler)    
    {        
        _onException = handler;    
    }    
    
    public void WhenFailed(Action handler)    
    {        
        _onFailure = handler;    
    }    
    
    public RetryInstance Build()    
    {        
        return new RetryInstance()        
        {            
            Retries = _retries,            
            Interval = _interval,            
            OnException = _onException,            
            OnFailure = _onFailure        
        };    
    }
}
    
public static class RetryFactory
{
    public static RetryInstance New(Action<RetryConfiguration> configuration)
    {
        var config = new RetryConfiguration();
        // suitable defaults, which the caller can override
        config.SetInterval(TimeSpan.FromSeconds(30));
        config.SetRetries(5);
        config.WhenExceptionRaised(_ => { }); // no-op
        config.WhenFailed(() => { }); // no-op
        configuration(config);
        return config.Build();
    }
}

This would then be used by client code like this:

RetryFactory.New(cfg =>
    {
        cfg.SetInterval(TimeSpan.FromMinutes(1));
        cfg.SetRetries(3);
        cfg.WhenExceptionRaised(ex => Log.Error(ex));
        cfg.WhenFailed(() => Log.Fatal("fail!"));
    }).Execute(() => networkFileCopier.DoCopy());

We have introduced a couple more objects. One is a configuration object that exposes a really nice API to set up our RetryInstance object. The other is our builder object/fluent DSL, which exposes a static method to create a new RetryInstance and also provides suitable defaults that we can choose to override.
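The configuration object above uses plain setter methods, so its calls cannot be chained. If a chained style were wanted, one possible variation is to have each method return the configuration itself. This is only a sketch: FluentRetryConfiguration, Every and UpTo are names I have made up for illustration, and the defaults simply mirror the ones set in RetryFactory:

```csharp
using System;

// Hypothetical fluent variation of the configuration object.
public class FluentRetryConfiguration
{
    public TimeSpan Interval { get; private set; }
    public int Retries { get; private set; }
    public Action<Exception> OnException { get; private set; }
    public Action OnFailure { get; private set; }

    public FluentRetryConfiguration()
    {
        // same defaults as RetryFactory provides
        Interval = TimeSpan.FromSeconds(30);
        Retries = 5;
        OnException = _ => { };
        OnFailure = () => { };
    }

    // each method returns this so that calls can be chained
    public FluentRetryConfiguration Every(TimeSpan interval)
    {
        Interval = interval;
        return this;
    }

    public FluentRetryConfiguration UpTo(int retries)
    {
        Retries = retries;
        return this;
    }

    public FluentRetryConfiguration WhenExceptionRaised(Action<Exception> handler)
    {
        OnException = handler;
        return this;
    }

    public FluentRetryConfiguration WhenFailed(Action handler)
    {
        OnFailure = handler;
        return this;
    }
}
```

A caller could then write new FluentRetryConfiguration().Every(TimeSpan.FromMinutes(1)).UpTo(3), and a Build method like the one above could copy the values into a RetryInstance.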

Summary

We have certainly covered a lot of ground in this series of posts and have ended up with quite a lot of options for solving the issue in the introduction, and there are no doubt many other ways that I couldn’t think of! Some of these may be overkill, especially for the simplistic case of retry behaviour we have been looking at. As with most design & architecture it’s always a balance, and this goes hand in hand with how the behaviour is going to be used. Here is a breakdown of how I would personally go about deciding how to implement the retry behaviour:

  • If I have one object that has a single use for retry logic I would probably stick to a simple method call like the pseudo-code in the introduction
  • Once I have another use in a separate object I may decide to use inheritance (if no inheritance hierarchy is present) or move to using composition (decorator pattern if I don’t want the object to manage the retries)
  • If I start to have lots of objects wanting retry behaviour, especially across different projects, that’s when I would probably move to using OOP & FP together to provide the callers with a nice API and to reduce the amount of code they need to write in order to use it

I hope that this series has helped to put forward the case that we should be open to various different ways of designing software to solve problems. Just because you have always done it one way in the past doesn’t mean that you should apply that across the board, as each problem tends to be unique.


Handling retries part 3 – using functional

Handling retries part 1 – using inheritance

Handling retries part 2 – using composition

Functional

Sometimes it helps to take a look at different languages and programming styles when facing a problem, to see how you would solve the problem there and whether you can take any of the techniques and utilise them. This is especially true of functional programming (FP) now that C# has a lot more support for FP constructs (lambdas, generics, tuples etc…).

Let’s take a look at a JavaScript example of how we can achieve the retry behaviour:

var Retrier = {
    execute: function (action, exception, failure, interval, retries) {
        try
        {
            action();
            return;
        }
        catch (ex)
        {
            retries--;
            exception(ex);
            if (retries > 0) {
              var vals = {
                retries: retries,
                interval: interval,
                action: action,
                exception: exception,
                failure: failure
              };
              window.setTimeout(function() {
                  Retrier.execute(vals.action, vals.exception, vals.failure, vals.interval, vals.retries);
                }, vals.interval);
            } else {
                failure();
            }
        }
    }
};

This would then be used like this:

Retrier.execute(function () { // action
                  networkFileCopier.DoCopy();
                },
                function (ex) { // exception
                  console.log('exception raised! ' + ex);
                },
                function () { // failure
                  console.log('fail!');
                }, 1000, 5);

I’ll be the first to admit that my JavaScript is not the best as I don’t tend to use it (I have omitted the anonymous function to close the scope). There are a number of major differences we have had to take into account:

  • JavaScript (prior to ES2015) does not have a concept of a class and instead just uses objects, therefore we have a simple object to hang the execute method off; you can think of it as a static method in C#
  • All of the state is maintained inside the call. I could have had the Retrier object have properties, and this would work better if we wanted to have a standard number of retries, interval and way of handling errors, but instead I have stuck to more of an FP style
  • You generally don’t want to do any sort of blocking in JavaScript as this would either block the UI thread in the browser or block the processing thread in Node.js. Instead we have to use the setTimeout function to tell JavaScript to call a specific function sometime in the future based on the interval
  • Because we have to use setTimeout instead of sleeping the thread for the interval, we use a recursive call with the retries value decremented each time. The callback passed to setTimeout is a closure, and I have gathered the values it needs into the vals object to make this explicit (strictly speaking the callback could close over the function’s parameters directly)

Whenever using recursion we need to be careful not to end up overflowing the stack. In this case each retry is scheduled through setTimeout and so starts on a fresh call stack, which means this should not be an issue.

So let’s take the above and create a C# equivalent:

using System;
using System.Threading;

public static class Retrier
{
    public static void Execute(Action action, Action<Exception> exception, Action failure, TimeSpan interval, int retries)
    {
        var retryCount = retries;
        while (retryCount > 0)
        {
            try
            {
                action();
                return; // success, so the failure callback must not be called
            }
            catch (Exception ex)
            {
                retryCount--;
                exception(ex);
            }
            Thread.Sleep(interval);
        }
        // every attempt failed
        failure();
    }
}

This would then be used like this:

Retrier.Execute(() => networkFileCopier.DoCopy(),
                ex => Log.Error(ex),
                () => Log.Fatal("fail!"),
                TimeSpan.FromSeconds(30),
                5);

Well, we have completely eliminated OOP from the retry behaviour here and are left with a single static class to hold our Execute method. Clients are no longer required to create new objects to hook into the retry behaviour, however there are a couple of issues:

  • There is going to be quite a bit of duplication in the client code, as each call needs to set up all the callback methods and also assign the interval and retry amount
  • The API for the caller is very obtuse. Once you start to have lambdas being passed in to method calls it can get difficult to understand (named arguments can help but needing them is generally an indication your API could do with being changed)
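As an aside, the C# version blocks the calling thread with Thread.Sleep, whereas the JavaScript version hands the wait over to setTimeout. If blocking were a concern, a rough C# counterpart could await Task.Delay instead. This is a sketch under the assumption that async/await (C# 5, .NET 4.5 or later) is available; AsyncRetrier and ExecuteAsync are hypothetical names:

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical non-blocking variation of the static Retrier.
public static class AsyncRetrier
{
    public static async Task ExecuteAsync(Action action, Action<Exception> exception, Action failure,
                                          TimeSpan interval, int retries)
    {
        var retryCount = retries;
        while (retryCount > 0)
        {
            try
            {
                action();
                return; // success, so the failure callback is skipped
            }
            catch (Exception ex)
            {
                retryCount--;
                exception(ex);
            }
            // releases the thread while waiting instead of blocking it
            await Task.Delay(interval);
        }
        failure();
    }
}
```

The two issues listed above still apply, since the caller passes exactly the same set of arguments.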

In the next part I want to leverage OOP and FP together to see if we can fix the issues above.


Handling retries part 2 – using composition

Handling retries part 1 – using inheritance

Composition

In chapter 1 of the Design Patterns (GoF) book there is a section titled Inheritance versus Composition. I highly recommend that anyone with the book who has not read it goes and takes a look, as it really distills the problems with relying too heavily on inheritance and even includes its own principle:

Favor object composition over class inheritance

The principle holds up when you see how many of the design patterns use composition as opposed to inheritance, so let’s give composition a go:

using System;
using System.Threading;

public class Retrier
{
    protected readonly IRunner _runner;

    public int RetryCount { protected get; set; }
    public TimeSpan Interval { protected get; set; }
    // initialised to no-op handlers so they can be raised without a null check
    public event Action<Exception> OnException = _ => { };
    public event Action OnFailure = () => { };

    public Retrier(IRunner runner)
    {
        _runner = runner;
    }

    public void Execute()
    {
        var retries = RetryCount;
        while (retries > 0)
        {
            try
            {
                _runner.Run();
                return; // success, so OnFailure must not be raised
            }
            catch (Exception ex)
            {
                retries--;
                OnException(ex);
            }
            Thread.Sleep(Interval);
        }
        // every attempt failed
        OnFailure();
    }
}

public interface IRunner
{
    void Run();
}

This would then be used as follows:

public class NetworkFileCopier : IRunner
{
    protected Retrier _retrier;

    public NetworkFileCopier()
    {
        _retrier = new Retrier(this);
        _retrier.RetryCount = 5; // without this the count defaults to 0 and nothing runs
        _retrier.Interval = TimeSpan.FromSeconds(30);
        _retrier.OnException += ex => Log.Error(ex);
    }

    public void DoCopy()
    {
        _retrier.Execute();
    }

    public void Run()
    {
        // do file copy here
    }
}

Now we have wrapped up the behaviour inside the Retrier object and we reference it inside NetworkFileCopier. Unlike the inheritance version we no longer need to be in an inheritance hierarchy, so NetworkFileCopier can inherit from some other base class if it needs to. It does need to implement an interface so that the Retrier object knows what to call when it gets executed, however this could be changed so that you pass the Retrier a delegate to call instead.

We still have the issue that NetworkFileCopier has to manage the retry object. The next section removes this responsibility in case it is an issue.
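The delegate variation mentioned above might look something like the following. This is a sketch rather than a finished design: DelegateRetrier is a hypothetical name, and the events are declared as plain Action delegates with no-op defaults to keep the example short:

```csharp
using System;
using System.Threading;

// Hypothetical variation of Retrier that takes a delegate instead of an IRunner.
public class DelegateRetrier
{
    private readonly Action _action;

    public int RetryCount { get; set; }
    public TimeSpan Interval { get; set; }
    // no-op defaults so the events can be raised without a null check
    public event Action<Exception> OnException = _ => { };
    public event Action OnFailure = () => { };

    public DelegateRetrier(Action action)
    {
        _action = action;
    }

    public void Execute()
    {
        var retries = RetryCount;
        while (retries > 0)
        {
            try
            {
                _action();
                return; // success, so OnFailure is not raised
            }
            catch (Exception ex)
            {
                retries--;
                OnException(ex);
            }
            Thread.Sleep(Interval);
        }
        OnFailure();
    }
}
```

NetworkFileCopier could then create it with new DelegateRetrier(CopyFile), where CopyFile is a private method holding the copy logic, and would no longer need to implement IRunner.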

Decorator pattern

One way we could split this responsibility out of NetworkFileCopier is to use the Decorator Pattern:

public interface IFileCopier
{
    void DoCopy();
}

public class NetworkFileCopier : IFileCopier
{
    public void DoCopy()
    {
        // do file copy here
    }
}

public class RetryFileCopier : IFileCopier, IRunner
{
    protected readonly IFileCopier _decoratedFileCopier;
    protected Retrier _retrier;

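    public RetryFileCopier(IFileCopier decoratedFileCopier)
    {
        _decoratedFileCopier = decoratedFileCopier;
        _retrier = new Retrier(this);
        _retrier.RetryCount = 5; // without this the count defaults to 0 and nothing runs
        _retrier.Interval = TimeSpan.FromSeconds(30);
        _retrier.OnException += ex => Log.Error(ex);
    }

    public void DoCopy()
    {
        // Retrier exposes Execute(), which calls back into Run() below
        _retrier.Execute();
    }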

    public void Run()
    {
        _decoratedFileCopier.DoCopy();
    }
}

This can then be used by client code like this:

var fileCopier = new RetryFileCopier(
                    new NetworkFileCopier());
fileCopier.DoCopy();

The first thing to note is how slimmed down NetworkFileCopier is now: its only concern is copying files. This means that if we needed to change the retry behaviour we would not need to make any changes to this object, a good example of orthogonal code. The client also gets to decide whether it wants the retry behaviour or not.

I feel that these versions are nicer than the inheritance version we looked at in part 1, however it still feels like we need to perform quite a few tasks (or ceremony) to get this to work:

  • Introduce new interface IRunner so that Retrier can communicate with the method to execute (this can be alleviated by a delegate)
  • Introduce a new interface IFileCopier for the decorator pattern to be utilised
  • Introduce new object RetryFileCopier to wrap up the retry behaviour

In the next part I’m going to be throwing OOP out of the window and looking at how functional programming in C# could potentially save us from some of this overhead.


Handling retries part 1 – using inheritance

Inheritance

In true OOP fashion many will reach for the tried and tested inheritance model to wrap up the behaviour above inside a base class à la Template Method Pattern:

public abstract class RetryBase
{
    public RetryBase()
    {
        Interval = TimeSpan.FromSeconds(10);
        RetryCount = 5;
    }

    protected TimeSpan Interval
    {
        get; set;
    }
    protected int RetryCount
    {
        get; set;
    }

    public void Execute()
    {
        var retries = RetryCount;
        while (retries > 0)
        {
            try
            {
                ExecuteImpl();
                return; // success, so Failure() must not be called
            }
            catch (Exception ex)
            {
                retries--;
                Exception(ex);
            }
            Thread.Sleep(Interval);
        }
        // every attempt failed
        Failure();
    }

    protected abstract void ExecuteImpl();

    protected virtual void Exception(Exception ex)
    {
    }

    protected virtual void Failure()
    {
    }
}

public class NetworkFileCopier : RetryBase
{
    public NetworkFileCopier()
    {
        // override to 30 secs
        Interval = TimeSpan.FromSeconds(30);
    }

    protected override void ExecuteImpl()
    {
        // do file copy here
    }

    // override to provide logging
    protected override void Exception(Exception ex)
    {
        Log.Error(ex);
    }
}

Client usage:

var networkFileCopier = new NetworkFileCopier();
networkFileCopier.Execute();

We now have reusable behaviour for our retry logic. We also have the ability to override the interval & retries, and to hook into the calls made when an exception occurs or in case of failure. There are some issues with this approach though:

  • Firstly it requires quite a bit of work to get this behaviour because we need to inherit from a specific class; if we had many actions that needed this behaviour this could get tedious
  • Inheritance is static: once the class is compiled into the hierarchy it cannot change its behaviour dynamically (e.g. removing retry logic on demand) without extra code hooks, and this breaks the Open/Closed Principle (OCP)
  • The NetworkFileCopier is now intrinsically tied to this inheritance hierarchy. If it already inherited from another base class we would need that class to inherit from RetryBase or change RetryBase to inherit from the existing base class (yuk!)
  • Before, NetworkFileCopier was happily getting on with its responsibility of copying a file over the network. Now it has to worry about retry logic (intervals, retry count, exception handling etc…), and this breaks the Single Responsibility Principle (SRP)


WCF MaxReceivedMessageSize Handle with care

I see a number of questions get raised when the following exception is raised from a WCF client/server:

The maximum message size quota for incoming messages (65536) has been exceeded. To increase the quota, use the MaxReceivedMessageSize property on the appropriate binding element.

And a lot of the responses I see are similar to the following:

Easy to solve you just need to set the maxReceivedMessageSize to 2147483647

So you make the change and the error goes away, all is good…

I’m not so sure. You see, the reason WCF has maxReceivedMessageSize in the first place is to act as a safeguard by restricting the size of the message being transported. If we think about the change we have just made, we have gone from allowing up to a 64 KB message to allowing up to 2 GB. Chances are you’re not going to hit this, but suppose we get up to just 5 MB because we are selecting every transaction that a client has ever made and they have been a customer for several years. Now the following is put under extra stress:

  • Server has to serialize 5 MB of data, which increases memory & CPU usage
  • Any network points now have to cope with 5 MB going over the wire
  • Calling application has to deserialize 5 MB of data, which also increases memory & CPU usage
  • The data then needs to be displayed to the client. If this is a web application that means returning the HTML for the 5 MB of data, which will probably end up being more due to the presentation HTML structure, and this will increase the outbound bandwidth usage significantly

Looking at the above you may be thinking 5 MB is not much and it should be handled easily, and you’re probably correct. However these are the steps for a single request. Once you factor in multiple clients with similar amounts of transactions all browsing the site, you’re going to have major problems, especially given the amount of time a request may take to come back to the client’s browser, as the client will usually refresh the page a number of times before giving up.

There are a number of potential problems that could arise:

  • Depending on the memory the server has, it may reach its maximum usage if it has to serialize multiple large objects
  • The calling application may also suffer the same fate when deserializing multiple large objects
  • The network may not have been prepared for the amount of traffic now going over it, leading to sluggish replies
  • The calling application could run out of threads, all waiting for replies coming back from the server (hopefully there will at least be a timeout set so that the thread does not wait indefinitely)
  • The client’s browser may not be able to deal with the vast amounts of HTML being returned, or more likely the client will give up before the browser completes rendering it

So how do we solve this? Well, let’s take a step back. Once we start seeing errors about the size of messages we should really be thinking about what it is we are trying to achieve. If we look at the example above:

As a client I want to be able to view my previous transactions

If we think about a client who has 5 years’ worth of transactions, possibly thousands of them, we want to present the transactions that are important to them now. Chances are they are interested in their recent transactions, so you could offer:

  • Last 6 months’ transactions
  • Last 12 months’ transactions
  • Transactions made in 2012
  • Transactions greater than £1000

By partitioning the transactions we have made it easier for the client to locate the transactions they’re interested in and also solved the issue of sending huge data objects over the network.
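To give an idea of what partitioning could mean for the service contract, here is a minimal sketch of a query object that carries the partition the client picked. All of the names (Transaction, TransactionQuery, TransactionFilter) are hypothetical, the MaxResults cap is my own addition as an extra safeguard, and in a real service the filter would be pushed down into the database query rather than applied to an in-memory list:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Transaction
{
    public DateTime MadeAt { get; set; }
    public decimal Amount { get; set; }
}

// Hypothetical request object describing the partition the client asked for.
public class TransactionQuery
{
    public DateTime From { get; set; }          // inclusive
    public DateTime To { get; set; }            // exclusive
    public decimal? MinimumAmount { get; set; } // null means no amount filter
    public int MaxResults { get; set; }         // hard cap on the result size
}

public static class TransactionFilter
{
    public static List<Transaction> Apply(IEnumerable<Transaction> source, TransactionQuery query)
    {
        return source
            .Where(t => t.MadeAt >= query.From && t.MadeAt < query.To)
            .Where(t => !query.MinimumAmount.HasValue || t.Amount > query.MinimumAmount.Value)
            .OrderByDescending(t => t.MadeAt) // most recent first
            .Take(query.MaxResults)
            .ToList();
    }
}
```

The "last 6 months" option would then be a query with From set to six months ago and To set to now, so the response size is bounded by the partition rather than by the customer's whole history.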

Now some people may be thinking that this is a cop out: if I were a client I may want to see all my transactions, so how would you get this to work?

  1. The client would be able to request all transactions
  2. The client is presented with a message to inform them that the results are being collated and that they will be sent an email with the results
  3. In the meantime a backend service would deal with this request by querying the relevant database directly (preferably one separate from the OLTP database)
  4. It would then build the results into a file (CSV, XML, JSON etc…)
  5. Provide it to the client over a more appropriate channel (FTP, zipped email attachment)
  6. Notify the client via email that their transactions can now be viewed, with the details needed to access the channel above

What I have done above is acknowledge that providing all of a user’s transactions could result in a massive result set and therefore needs to be treated differently to a simple query performed via the website.

The key point I want to make here is that when you get an error raised due to message size it’s usually a dead give-away that you need to re-think what it is you’re trying to achieve, especially if you have already increased the message size once before!


NServiceBus Host and MS DTC Problem

This post is basically to help anyone who finds themselves in the same situation I was in this morning. Over the weekend our DB server went down (it runs MS SQL Server 2000, yeah I know). When it came back up the MS DTC service was set to manual and not started, so the NServiceBus Host that the message handlers run in ended up throwing the common exception:

The Transaction Manager is not available

This is fair enough: our messages end up in the error queue ready for me to sort things out and use ReturnToSourceQueue.exe. Once I got in and saw the errors I did the following:

  1. Started the MS DTC service on the DB server
  2. Ran ReturnToSourceQueue.exe against the messages in the error queue

I was somewhat surprised to find the messages return straight back to the error queue and the following exception now being logged:

DTC transaction prepre [sic] phase failed

To solve this I restarted the NServiceBus Host running the message handlers and this resolved it completely. Hopefully this saves someone some time.


The importance of idempotent services

Introduction

We all know the importance of performing a unit of work (UoW) in a transaction scope so that if anything goes wrong we can roll back the actions that have taken place and be sure that the state is exactly how it was before the UoW took place. This is where we can use a messaging framework (e.g. MSMQ) that supports transactional capabilities using a transaction coordinator (e.g. MSDTC). When we receive a message and perform some work, but there is an issue when saving to the DB (e.g. network problem, transaction deadlock victim), we know that the message received will be put back on the queue and any messages that we attempted to send will be withdrawn (to prevent duplicates):

public void DoWork()
{
    using (var scope = new TransactionScope())
    {
        var command = MsmqInboundWork.NextMessage();
        // *** processing here ***
        DbGateway.Save(workItem);
        MsmqOutboundResult.SendMessage(new ResultMessage { CompletedAt = DateTime.UtcNow });

        scope.Complete();
    }
} 

If this happens we have options (e.g. retry a limited number of times and then put the message into an error queue). However we don’t always have the option of using a reliable/durable/transactional transport layer such as messaging. For instance, if we are integrating with a 3rd party over the internet they are usually going to provide an HTTP based transport layer such as SOAP/REST, or, as in my case recently, we may be integrating with an internal legacy system that only provides an HTTP transport layer (SOAP web services were the only option in my case).

This post will demonstrate how we can work with services exposed on an unreliable transport layer to perform actions but still not end up with duplicates, by making the operations idempotent.

What is an idempotent operation?

This basically boils down to being able to call an operation multiple times with the same arguments and have the same result occur as if it had been called once, so if we have a method:

public int Add(int first, int second)
{
    return first + second;
} 

This method is idempotent because if I call it multiple times with Add(2,2) it will always return 4 and there are no side effects. Compare this to the next method (demo only!):

public void DebitAccount(string accountNumber, decimal amount)
{
    var account = FindAccount(accountNumber);
    account.Balance -= amount;
    // account saved away
} 

If I call this multiple times like this DebitAccount("12345678", 100.00m) the client of that account is going to be more out of pocket each time a call is made.

How does this relate to my webservice method?

You may be thinking to yourself: we have a system running that exposes a web service with a method that performs a similar action to the example above, or we make a call to a 3rd party to perform an action (e.g. send an SMS), and so far we haven’t had any problems, so what’s the big deal!? You’ve been lucky so far, but remember the first fallacy of distributed computing:

The network is reliable

Once you get network issues this is where everything goes wrong. This may not be a big deal: you may decide to retry the call and the user ends up with another SMS sent to their mobile. However, if you’re performing a journal on an account this is a big problem.

If we look at a typical call to a webservice, in this case we are going to use the example of a façade service of a legacy banking system:

Account Service
fig 1 – If all goes to plan

In this case everything has gone OK and the client has received a response back from the account service, so we know that the account has been debited correctly. What if we have the following:

Account Service
fig 2 – Network issue causes loss of response to client

Now we have a problem because the client has no way of knowing whether the debit was performed or not. In the example above the journal has taken place, but equally the machine running the account service could have had a problem and never got round to telling the banking system to perform the journal.

Fixing the problem

We established in the last section that the client has no way of knowing whether the debit took place if we do not receive a response from the account service. The only thing we can do is assume that it did not take place and retry the service call. Given the scenario from fig 2 this would be very bad: the debit did take place, so we would duplicate the debiting of the account.

So we have a few options of how we can deal with this issue:

  1. We don’t perform retries on the client side automatically. Instead we handle the situation manually and could perform a compensating action before attempting a retry (e.g. perform a credit journal)
  2. We check on the account service whether we have already had a debit for the chosen account and amount within a specific time frame, and if so we ignore the call (this would not work in this example as we may have a genuine need to debit the account twice within that time frame)
  3. Have the client send something unique (e.g. a GUID) for the debit that it wants to take place. This can then be used by the account service to check whether we have already performed the journal associated with this request

The last 2 options make the account service idempotent. My recommendation would be to use the last option, and this is the option I will demonstrate for the rest of the post.

OK, so the first change we should make is to add a unique id to the request that gets sent. It is the responsibility of the client application to associate this unique id with the UoW being carried out (e.g. if the client application was allowing an overdrawn account to be settled then the UoW would be a settlement and we can associate this id with the settlement), so the client needs to be changed to store this unique id.

The next change is for the account service to check that we have not already performed a journal for the id passed to it. This is accomplished by storing each id against the result (in this case the journal reference) once the journal has taken place, and querying this store as the first task. If we find the id then we simply return the stored journal reference in the response back to the client.

If the id is not found we cannot have performed a journal yet, so we make the call to the banking system as before, store away the id and the journal reference, and return the response back to the client.
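The check-then-store logic described above can be sketched roughly as follows. This is an illustration under assumptions rather than the real service: IdempotentAccountService is a hypothetical name, the call to the banking system is represented by a delegate that returns a journal reference, and the in-memory dictionary stands in for what would need to be a durable store:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical account service facade that remembers which requests it has already handled.
public class IdempotentAccountService
{
    // request id -> journal reference (a real service would persist this durably)
    private readonly Dictionary<Guid, string> _completed = new Dictionary<Guid, string>();

    // stand-in for the call to the banking system; returns the journal reference
    private readonly Func<string, decimal, string> _performJournal;

    public IdempotentAccountService(Func<string, decimal, string> performJournal)
    {
        _performJournal = performJournal;
    }

    public string DebitAccount(Guid requestId, string accountNumber, decimal amount)
    {
        string journalReference;
        if (_completed.TryGetValue(requestId, out journalReference))
        {
            // already processed: return the stored result, do not debit again
            return journalReference;
        }

        journalReference = _performJournal(accountNumber, amount);
        _completed[requestId] = journalReference;
        return journalReference;
    }
}
```

Note that in this sketch the lookup, the journal and the store are not atomic. A real implementation would have to consider what happens if the service fails between performing the journal and recording the id, and what happens when two retries with the same id arrive at the same time.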

Now that this is in place we can have the client perform retries (within a certain threshold) if we receive no response back from the account service, safe in the knowledge that we won’t get duplicate debits from the account. Here is a sequence diagram to outline the new strategy:

New idempotent account service

Ah the joys of distributed architecture!
