The importance of performing spike solutions

First we should discuss what a spike solution actually is. According to The Art of Agile Development [Shore & Warden 2007]:

A spike solution, or spike, is a technical investigation. It’s a small experiment to research the answer to a problem.

The Extreme Programming site defines them as:

A spike solution is a very simple program to explore potential solutions. Build the spike to only address the problem under examination and ignore all other concerns.


These two definitions line up perfectly and also go on to present one major point that must be made before continuing:

Spike/Spike solutions should never be committed into the main codebase

They should be treated as throwaway code: once you have your answer, the spike has served its purpose (if you must keep the code under source control, make sure it's completely separate from the main trunk!).

I also want to clarify how I see the difference between spike solutions and prototyping, which can appear very similar. Prototyping usually has a much larger goal, such as putting some quick static front-end screens together to gauge the UX, whereas a spike solution answers a specific technical question, such as whether EF will be able to map to our legacy Users table correctly. This means that spike solutions should take far less time to complete, and we should probably time-box how long we spend on a particular spike to ensure this.

Now that we have defined what a spike solution is, I want to walk through a cut-down real world example (with names changed) that demonstrates their effectiveness when certain situations arise.

XMHELL

Back Story

Fubar DIY Stores Ltd has an e-commerce site that lists all the products available to buy. When a product is displayed, its reviews are shown and new ones can be submitted. Reviews were previously managed in house, but the company has now decided to use a popular global third party service, Haveyoursay, who have assured them they can provide a like-for-like match for the data currently stored.

To bring in the reviews from Haveyoursay, twice a day the reviews will be exported as an XML file. We have been tasked with using this exported XML file to import the reviews into the Fubar DIY Stores database so that they can be retrieved along with the product data.

Making a start

For this example we will only be concentrating on the import of the XML file and ignoring the subsequent steps.

We have completed a rough architecture of all the moving parts, so we roughly know which objects we need and how they need to collaborate with each other. Essentially there will be a coordinating object that knows which steps need to be performed in which order; this will be the ReviewImportCoordinator, with a PerformImport method that starts the import and takes in the XML file path.

We also know that we need an object that ReviewImportCoordinator will collaborate with to read the XML file and bring back a trusty XmlDocument object, which we can then use to parse the data we need and save to the DB.

So we start writing our unit tests first for ReviewImportCoordinator, stubbing out IXmlReviewFileReader, which has a method that takes the import file path and returns our XmlDocument object, and we continue with our unit testing of the import process.
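As a rough sketch (using the names described above; the parsing and saving details are just illustrative assumptions), the shape we have in mind looks something like this:

// Illustrative sketch only - requires using System.Xml;
public interface IXmlReviewFileReader
{
    XmlDocument Read(string importFilePath);
}

public class ReviewImportCoordinator
{
    private readonly IXmlReviewFileReader _fileReader;

    public ReviewImportCoordinator(IXmlReviewFileReader fileReader)
    {
        _fileReader = fileReader;
    }

    public void PerformImport(string importFilePath)
    {
        var document = _fileReader.Read(importFilePath);
        // parse the reviews out of the document and save them to the DB...
    }
}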

Setting ourselves up for a fall

It seems we're doing everything right: we have broken the responsibilities up into separate objects and are using TDD/BDD against our import process. However, we have jumped the gun here and are making some big assumptions about how we go about reading the XML file, which will have an impact on how our ReviewImportCoordinator does its work.

Just enough design

This is where people new to agile get it wrong and jump straight into the code rather than doing some design up front. Agile does not tell you to do no design; it tells us not to do big design up front, where we try to guess everything about the system before any code is written.

Our first task should be to get a copy of the XML file, as this will get rid of our assumptions about how to handle the XML import. After chatting to the stakeholder we get a copy of the file, and a good job we did, as we hit a potential hurdle: the file is around 450MB. This discovery should start us asking questions:

  • Is this a one off or are they all going to be around this size?
  • How much memory will be used if we load this into an XML DOM?
  • Will the machines that are running the import be impacted by the extra memory usage?

After asking the stakeholder for some other import files, we find they are also around 450MB, so this seems to be the expected size for each import. Now we can move onto our crucial question: how much memory will be used if we load this into an XML DOM? Until we get the answer we have no way of knowing whether there will be an impact on memory usage.

Time for a spike solution

This is the ideal time for us to write a spike to discover the answer. We knock together a very quick console app with some hacked-together code in the Main method that simply loads an XmlDocument using one of our supplied import files as the input, followed by a Console.ReadLine() so that it waits for input, allowing us to open up Task Manager and see how much memory the process is using (we just need a ballpark figure; otherwise we could use some profiling tools to get more insight).

static void Main(string[] args)
{
    // XmlDocument.Load is an instance method, so we need an instance first.
    var document = new XmlDocument();
    document.Load(@"c:\haveyoursay\imports\import.xml");
    Console.ReadLine();
}
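If we don't want to keep switching over to Task Manager, the spike could also print a rough number itself. Here's a minimal variation using the process's working set (Process.GetCurrentProcess() comes from System.Diagnostics); it's still only a ballpark figure, a profiler would give a more accurate picture:

static void Main(string[] args)
{
    var document = new XmlDocument();
    document.Load(@"c:\haveyoursay\imports\import.xml");

    // Working set is only a rough indication of how much memory the DOM is costing us.
    var workingSetMb = Process.GetCurrentProcess().WorkingSet64 / (1024 * 1024);
    Console.WriteLine("Working set after load: {0} MB", workingSetMb);
    Console.ReadLine();
}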

Getting Feedback

After we run our spike solution we find that the process is using around 1GB of memory to load the import XML into a DOM. We now have a solid number that we can take back to our stakeholder in order to find out what impact this will have on the machine running the import.

After discussing it with the stakeholder, it turns out this machine is already being used to perform other jobs and will suffer badly from having that much memory taken away from them. So we have to look at streaming the XML file rather than loading it into a DOM, which means using XmlReader rather than XmlDocument. We can now start to unit test using this knowledge and head down the right path from the start.
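As a minimal sketch of the streaming direction (the review element name and the mapping/saving step are assumptions about the Haveyoursay format rather than the real schema), the import could read one review at a time like this:

// Requires: using System.Xml; using System.Xml.Linq;
using (var reader = XmlReader.Create(@"c:\haveyoursay\imports\import.xml"))
{
    while (reader.Read())
    {
        if (reader.NodeType == XmlNodeType.Element && reader.Name == "review")
        {
            // ReadSubtree scopes a reader to the current element, so only one
            // review is materialised in memory at a time.
            using (var reviewReader = reader.ReadSubtree())
            {
                var reviewElement = XElement.Load(reviewReader);
                // map reviewElement onto our review entity and save it to the DB...
            }
        }
    }
}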

Summary

I hope this demonstrates how we can use spike solutions, along with a little design up front, to help steer us in the right direction. This example was done at the time of implementation, but you can also use spike solutions as part of estimating when you have to use an unfamiliar technology, library, protocol etc. They give you a quick way of gauging how difficult it is to perform certain tasks, which gives you a bit more confidence in your estimates.

So next time you're stuck with a technical question, a spike solution could be just what you're after!


Handling retries part 3 – using functional



Introduction

In our line of work there are usually many ways to accomplish a particular task (for better or worse). In this series of posts I want to demonstrate various techniques we can use and the benefits we can gain from each.

So without further ado here is the scenario I want to be able to support:

I need a way of performing a particular action that can also handle an exception being raised by re-trying the action after a specified amount of time for a specified number of retries.

Here is some pseudo-code to give an idea:

set retries = 5
while retries > 0
    begin
        call task
        return                  -- success, no failure handling needed
    exception
        decrement retries
        call exception
    end
    call sleep 3
end while
call failure

The most basic way to accomplish this would be to have the C# replicate exactly what we have above. That would do the trick, but it means that if we had other tasks that needed to behave the same way we would end up duplicating the code for every instance; ideally we want to re-use this behaviour.
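For example, the inline version might look something like the block below (networkFileCopier and Log are just stand-ins for whatever task and logging you have); imagine copying and pasting this around every task that needs the behaviour:

// naive inline retry - this whole block gets duplicated for each task
var retries = 5;
var succeeded = false;
while (retries > 0 && !succeeded)
{
    try
    {
        networkFileCopier.DoCopy();   // the task we want to retry
        succeeded = true;
    }
    catch (Exception ex)
    {
        retries--;
        Log.Error(ex);
        Thread.Sleep(TimeSpan.FromSeconds(3));
    }
}
if (!succeeded)
{
    Log.Fatal("fail!");
}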

Functional

Sometimes it helps to take a look at different languages and programming styles when facing a problem, to see how you would solve it there and whether you can borrow any of the techniques. This is especially true of functional programming (FP) now that C# has a lot more support for FP constructs (lambdas, generics, tuples etc.).

If we take a look at a javascript example of how we can achieve the retry behaviour:

var Retrier = {
    execute: function (action, exception, failure, interval, retries) {
        try
        {
            action();
            return;
        }
        catch (ex)
        {
            retries--;
            exception(ex);
            if (retries > 0) {
              var vals = {
                retries: retries,
                interval: interval,
                action: action,
                exception: exception,
                failure: failure
              };
              window.setTimeout(function() {
                  Retrier.execute(vals.action, vals.exception, vals.failure, vals.interval, vals.retries);
                }, vals.interval);
            } else {
                failure();
            }
        }
    }
};

This would then be used like this:

Retrier.execute(function () { // action
                  networkFileCopier.DoCopy();
                },
                function (ex) { // exception
                  console.log('exception raised! ' + ex);
                },
                function () { // failure
                  console.log('fail!');
                }, 1000, 5);

I'll be the first to admit that my javascript is not the best as I don't tend to use it (I have omitted the anonymous function to close off the scope). There are a number of major differences we have had to take into account:

  • Javascript does not have the concept of a class and instead just uses objects, so we have a simple object to hang the execute method off; you can think of it as a static method in C#
  • All of the state is maintained inside the call. I could have given the Retrier object properties instead, which would work better if we wanted a standard number of retries, interval and way of handling errors; instead I have stuck to more of an FP style
  • You generally don't want to do any sort of blocking in javascript, as this would either block the UI thread in the browser or the processing thread in NodeJS, so we have to use the setTimeout function to tell javascript to call a specific function some time in the future based on the interval
  • Because we have to use setTimeout instead of sleeping the thread for the interval, we use a recursive call with the retries value decremented each time; before we do so we set up a closure vals, otherwise the variables would be lost as javascript uses function scoping

Whenever using recursion we need to be careful not to end up overflowing the stack, but in this case, unless you're going to retry a task several thousand times, this should not be an issue.

So let's take the above and create a C# equivalent:

public static class Retrier
{
    // Requires: using System; using System.Threading;
    public static void Execute(Action action, Action<Exception> exception, Action failure, TimeSpan interval, int retries)
    {
        var retryCount = retries;
        while (retryCount > 0)
        {
            try
            {
                action();
                return; // success - no need to invoke the failure callback
            }
            catch (Exception ex)
            {
                retryCount--;
                exception(ex);
            }

            if (retryCount > 0)
            {
                Thread.Sleep(interval);
            }
        }
        failure();
    }
}

This would then be used like this:

Retrier.Execute(() => networkFileCopier.DoCopy(),
                ex => Log.Error(ex),
                () => Log.Fatal("fail!"),
                TimeSpan.FromSeconds(30),
                5);

We have completely eliminated OOP from the retry behaviour here and are instead left with a single static class to hold our Execute method. From the client side they are no longer required to create new objects to hook into the retry behaviour; however, there are a couple of issues:

  • There is going to be quite a bit of duplication in the client code, as each caller needs to set up all the callback methods and also supply the interval and retry count
  • The API for the caller is quite obtuse; once you start passing lambdas into method calls it can get difficult to understand (named arguments can help a little, as in the sketch below, but they are generally an indication your API could do with being changed)
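For example, named arguments make it a little clearer which lambda is which, but the call site is still fairly heavy:

Retrier.Execute(action: () => networkFileCopier.DoCopy(),
                exception: ex => Log.Error(ex),
                failure: () => Log.Fatal("fail!"),
                interval: TimeSpan.FromSeconds(30),
                retries: 5);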

In the next part I want to leverage OOP and FP together to see if we can fix the issues above.

The importance of idempotent services

Introduction

We all know the importance of performing a unit of work (UoW) in a transaction scope, so that if anything goes wrong we can roll back the actions that have taken place and be sure that the state is exactly how it was before the UoW started. This is where we can use a messaging framework (i.e. MSMQ) with transactional capabilities via a transaction coordinator (i.e. MSDTC), so that if we receive a message, perform some work, and then hit a problem while saving to the DB (i.e. a network problem, or being chosen as a transaction deadlock victim), we know the received message will be put back on the queue and any messages we attempted to send will not actually be sent (preventing duplicates):

public void DoWork()
{
    using (var scope = new TransactionScope())
    {
        var command = MsmqInboundWork.NextMessage();
        // *** processing of command happens here and produces workItem ***
        DbGateway.Save(workItem);
        MsmqOutboundResult.SendMessage(new ResultMessage { CompletedAt = DateTime.UtcNow });

        scope.Complete();
    }
}

If this happens we have options (i.e. retry a limited number of times and then put the message into an error queue). However, we don't always have the option of using a reliable/durable/transactional transport layer such as messaging; for instance, a 3rd party we integrate with over the internet will usually provide an HTTP-based transport layer such as SOAP/REST, or, as in my case recently, an internal legacy system may only provide an HTTP transport layer (SOAP webservices were the only option in my case).

This post will demonstrate how we can work with services exposed over an unreliable transport layer to perform actions and still not end up with duplicates, by making the operations idempotent.

What is an idempotent operation?

This basically boils down to being able to call a function multiple times with the same arguments and have the same result occur, with no additional side effects. So if we have a method:

public int Add(int first, int second)
{
    return first + second;
} 

This method is idempotent because if I call it multiple times with Add(2,2) it will always return 4 and there are no side effects. Compare this to the next method (demo only!):

public void DebitAccount(string accountNumber, decimal amount)
{
    var account = FindAccount(accountNumber);
    account.Balance -= amount;
    // account saved away
} 

If I call this multiple times, like DebitAccount("12345678", 100.00m), the client of that account is going to be more out of pocket with each call made.

How does this relate to my webservice method?

You may be thinking to yourself: we have a system running that exposes a webservice with a method performing an action similar to the example above, or we make a call to a 3rd party to perform an action (i.e. send an SMS), and so far we haven't had any problems, so what's the big deal!? You've been lucky so far, but remember the first fallacy of distributed computing:

The network is reliable

Once you get network issues, this is where everything goes wrong. It may not be a big deal: you may decide to retry the call and the user ends up with another SMS sent to their mobile. However, if you're performing a journal on an account, this is a big problem.

If we look at a typical call to a webservice, in this case the example of a façade service sitting over a legacy banking system:

fig 1 – If all goes to plan (Account Service sequence diagram)

In this case everything has gone OK and the client has received a response back from the account service, so we know that the account has been debited correctly. But what if we have the following:

fig 2 – Network issue causes loss of response to client (Account Service sequence diagram)

Now we have a problem, because the client has no way of knowing whether the debit was performed or not. In the example above the journal has taken place, but the machine running the account service could equally have hit a problem and never got round to telling the banking system to perform the journal.

Fixing the problem

We established in the last section that if we do not receive a response from the account service, the client has no way of knowing whether the debit took place. The only thing we could do is assume it did not take place and retry the service call; given the scenario from fig 2 this would be very bad, as the debit did take place, so we would debit the account twice.

So we have a few options of how we can deal with this issue:

  1. We don't perform retries on the client side automatically; instead we handle the situation manually and could perform a compensating action before attempting a retry (i.e. perform a credit journal)
  2. We check on the account service whether we have already had a debit for the chosen account and amount within a specific time frame and, if so, ignore the call (this would not work in this example, as we may have a genuine need to debit the account twice within that time frame)
  3. Have the client send something unique (i.e. a GUID) for the debit it wants to take place; the account service can then use this to check whether it has already performed the journal associated with this request.

The last 2 options make the account service idempotent. My recommendation would be the last option, and this is the one I will demonstrate for the rest of the post.

The first change we should make is to add a unique id to the request that gets sent. It is the responsibility of the client application to associate this unique id with the UoW being carried out (i.e. if the client application was allowing an overdrawn account to be settled, then the UoW would be the settlement and we would associate the id with that settlement), so the client needs to be changed to store this unique id.
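As a rough illustration, the request contract might gain an id along these lines (DebitRequest is a made-up name, not the real contract):

public class DebitRequest
{
    // Generated once by the client and stored against the unit of work,
    // so retries of the same UoW always send the same id.
    public Guid RequestId { get; set; }

    public string AccountNumber { get; set; }
    public decimal Amount { get; set; }
}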

The next change is for the account service to check whether we have already performed a journal for the id passed to it. This is accomplished by storing each id against the result (in this case the journal reference) once the journal has taken place, and querying that store as the first task; if we find the id then we simply return the stored journal reference in the response back to the client.

If the id is not found, we cannot have performed the journal yet, so we make the call to the banking system as before, store away the id and the journal reference, and return the response back to the client.
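Putting those two steps together, the service method might look roughly like this (IJournalReferenceStore, BankingSystemGateway and DebitResponse are made-up names standing in for the real storage and legacy system calls):

public DebitResponse Debit(DebitRequest request)
{
    // _journalReferenceStore and _bankingSystemGateway are fields on the service.
    // Have we already journaled this request? If so, just hand back the original reference.
    var existingReference = _journalReferenceStore.FindBy(request.RequestId);
    if (existingReference != null)
    {
        return new DebitResponse { JournalReference = existingReference };
    }

    // First time we've seen this id - perform the journal and remember the result.
    var journalReference = _bankingSystemGateway.PerformDebitJournal(request.AccountNumber, request.Amount);
    _journalReferenceStore.Save(request.RequestId, journalReference);

    return new DebitResponse { JournalReference = journalReference };
}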

Now that this is in place we can have the client perform retries (within a certain threshold) when we receive no response back from the account service, safe in the knowledge that we won't get duplicate debits from the account. Here is a sequence diagram to outline the new strategy:

New idempotent account service

Ah the joys of distributed architecture!

Command Query Separation (CQS)

Introduction

CQS is a term coined by Bertrand Meyer and presented in his book Object Oriented Software Construction (I haven't read the book, yet!). The principle states that every method should either be a command that changes state, or a query that does not change any state, but not both.

On the whole I agree with the principle; I can think of some valid cases for breaking it, but generally it makes sense to follow. However, what I want to show in this post is how the principle can be applied at various levels rather than just at the method level.
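At the method level the idea is simply that a method which changes state returns nothing, and a method which returns something changes nothing; a trivial sketch:

public class InvoiceNumberGenerator
{
    private int _current;

    // Command - changes state, returns nothing.
    public void MoveNext()
    {
        _current++;
    }

    // Query - returns a value, changes nothing.
    public int Current
    {
        get { return _current; }
    }

    // A CQS violation would be a single method that both advances and returns the value:
    // public int GetNextValue() { return ++_current; }
}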

Architectural Level

If we imagine a system which manages large amounts of data, where the business would like to perform analysis on this data while still maintaining transactional throughput, it would not be uncommon to split the data into separate Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) stores. This is a good example of separating the data that changes often (OLTP) from the more stagnant data that is analysed (OLAP), and I feel it matches the principle strongly; the benefits typically include better performance and data shaped to the operations applied to it.

Object Design Level

When we have a complex domain that requires numerous behaviours, we want to take advantage of domain modelling in our objects by utilising something like DDD to come up with entities, value objects etc. that we can use to represent the domain we're working in.

We use these objects to perform transactional operations such as “Submitting an Order” or “Creating a New User”, as these are where we use the behaviours and rules we put together in our objects. What we don't want to do is then use these same objects for displaying result grids on a search screen; this represents a very different usage pattern, and if we try to use our objects for this purpose as well as for transactional processing we get into a mess quite quickly.

Instead it's best to take the CQS principle on board and separate our domain model objects, used for our transaction processing, from what I think of as reporting or querying objects that are only interested in grabbing data for things such as search screens and reports. So if we were in an issue tracking domain we would probably end up with something like this:

 
// uses ORM such as nHibernate concerned only with using objects in a transactional 
// context does not do any reporting or querying 
public class IssueRepository : IIssueRepository 
{ 
	public Issue FindBy(int id) 
	{ 
		// use ORM to get by id 
	} 
	
	public Issue Save(Issue issue) 
	{ 
		// use ORM to save only used when new as object will be tracked 
		// for updates automatically 
	} 
		
	public void Delete(Issue issue) 
	{ 
		// use ORM to delete 
	} 
} 

// probably just uses ADO.NET raw or a light facade on top of ADO.NET 
// may use nHibernate but DOES NOT return any transactional objects only 
// projections 
public class IssueQuerying : IIssueQuerying 
{ 
	public DataTable FindMatchingCriteria(int priorityId, int pageNo, int pageCount, out int virtualCount) 
	{ 
		// use ADO.NET to fill datatable of results, possibly using SP to support paging 
		// for older DB 
	} 

	public IssueDisplayModel FindAssignedTo(string username) 
	{ 
		// use nHibernate to fill projection object IssueDisplayModel 
	} 
} 

The benefits we get from separating these 2 concerns:

  1. Our domain model objects are not polluted with lots of getters that exist only for reporting/querying
  2. We can utilise databinding, because our querying object returns objects that ASP.NET controls know how to handle (one way databinding only)
  3. Our mapping is simplified, as we only need to map our domain model objects to other objects for transactional processing needs, not for querying/reporting.

Summary

Hopefully this post has shown how we can take the original CQS principle and apply it at other levels when developing software, from the high level architecture right down to the method level.

Staff Intranet Updated

I just updated the staff intranet project over on codeplex. I now have it under source control, now that they have added support for SVN 🙂 I have also uploaded a new release of the source in case you don't have a compatible source control client.

The changes are:

  • Incorporating some of my other projects (log4net-altconf, object2object mapper) to help with the maintainability of the app.
  • Refactorings I have made to the way a staff member is saved. After gaining a better understanding of DDD principles it was quite clear that the validation around the photo upload & checking for name duplication should be separated as Specification objects to reduce the coupling in the Staff object.

Remember, any feedback can be posted here on my blog or on codeplex.

New Staff Intranet Release

I have found enough spare time to put up a new release of the staff intranet project. For those who are not aware of this project, it is a demonstration of using best practices, principles and patterns in a real world web application, so if you're looking for pointers or some code to use in your own applications, give it a look on codeplex.

In this newest version I have added AOP support to cut down on cross-cutting code, and also the ability to delete staff members from the GridView. Most of the time was spent fighting against the ASP.NET controls (surprise, surprise) such as the GridView and the ObjectDataSource; I'm not sure what the guy(s) who created the ObjectDataSource were smoking at the time, but it must have been stronger than just tobacco 🙂

In my next release I want to demonstrate adding some service support, showing how we can re-use existing code so the services become little more than a remote facade (in theory!).