How To: Anonymise swap trade data for your project team

I was recently asked for some “production-like” OTC swaps coming from Calypso so that a development partner could test their proof-of-concept project. I needed to provide trade data as well as referential data for product and account look-ups to support the testing. The following shows some of the techniques I employed to anonymise the swap data whilst still enabling the vendor to use it to prove their system.

Anonymise Swap

Technique: HASHING
Purpose: To change the value of data items like account IDs
Effect: Breaks the link between the data to be released and the original production data
Applied To: Account, Product, Index and Trade IDs
Comments:
We developed an algorithm that could change the IDs in the data. We needed to maintain the integrity of look-ups from the trade to the referential data, so the algorithm scrambled the IDs in exactly the same way across both the trade and referential datasets. This scrambling removed the ability to link the data back to the production system.
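
Our production algorithm is not reproduced here, but as an illustration a keyed hash gives exactly this behaviour: run both the trade extract and the referential extract through the same function with the same secret key and a given account ID always maps to the same scrambled value in both, so the look-ups still resolve while the mapping back to production stays hidden. The class name, truncation length and key handling below are assumptions.

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;

public class IdScrambler {

    private final Mac mac;

    public IdScrambler(byte[] secretKey) throws Exception {
        mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(secretKey, "HmacSHA256"));
    }

    // Deterministic: the same input ID always produces the same scrambled ID,
    // so trade and referential extracts processed with the same key still join up.
    public synchronized String scramble(String originalId) {
        byte[] digest = mac.doFinal(originalId.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (int i = 0; i < 8; i++) {               // truncate to 8 bytes / 16 hex characters
            hex.append(String.format("%02X", digest[i]));
        }
        return hex.toString();
    }
}

Keep the secret key out of the released dataset, of course; anyone holding it can recompute the mapping.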

Technique: DATE SLIDING
Purpose: To slide all the dates in the trade data forward/backward by a consistent value
Effect: Changes the dates on the trade whilst still maintaining the integrity of the dates
Applied To: Trade, As Of, Execution, Cleared, Effective, Termination and Payment dates
Comments:
We developed an algorithm based on a couple of trade attributes. The first was used to determine the offset value to be applied to the dates in the trade data; the second was used to determine the direction (forward or backward). This was particularly effective as it applied a different slide to each trade.
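
A minimal sketch of the idea, assuming a 1–90 day window and a simple hash-based derivation (the actual attributes and window we used are not shown):

import java.time.LocalDate;

public class DateSlider {

    // Derives a per-trade slide from two trade attributes: the first fixes the size
    // of the offset (1-90 days here, an assumed window), the second the direction.
    public static long slideDays(String offsetAttribute, String directionAttribute) {
        int offset = 1 + Math.floorMod(offsetAttribute.hashCode(), 90);
        int direction = (Math.floorMod(directionAttribute.hashCode(), 2) == 0) ? 1 : -1;
        return (long) direction * offset;
    }

    // The same slide is applied to every date on the trade, so the day counts between
    // effective, termination and payment dates are preserved.
    public static LocalDate slide(LocalDate date, long slideDays) {
        return date.plusDays(slideDays);
    }
}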

Technique: PERTURBATION
Purpose: To adjust the economic values in a dataset so they no longer match the original
Effect: Changes the economic values by applying aggregates across various ranges
Applied To: Notional, Fixed Rate, Premium
Comments:
We analysed the economic data in the dataset and applied averages across bands of data. This means that the dataset as a whole is still mathematically intact but individual economic values on trades have been adjusted.
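
As a hedged sketch of the band-averaging idea (the band width below is an arbitrary assumption, not the banding agreed with the business), each value is replaced with the average of its band, so individual notionals change while the total within each band, and therefore overall, is unchanged:

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class NotionalPerturber {

    // Replace each value with the average of its band: any individual trade's notional
    // changes, but band-level totals (and the overall total) stay mathematically intact.
    public static List<Double> perturb(List<Double> values, double bandWidth) {
        Map<Long, Double> bandAverage = values.stream()
                .collect(Collectors.groupingBy(v -> (long) Math.floor(v / bandWidth),
                        Collectors.averagingDouble(Double::doubleValue)));
        return values.stream()
                .map(v -> bandAverage.get((long) Math.floor(v / bandWidth)))
                .collect(Collectors.toList());
    }
}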

Technique: MASKING
Purpose: To prevent text within the data from providing information to the consumer
Effect: Replaces text strings with “*” characters
Applied To: Party names, country of residence, contact information, trader names
Comments:
Simple masking was implemented on this data. As an additional security step, we ensured that all the masks were the same length. This prevents anyone from trying to deduce client names from the length of the mask.
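
The masking itself is a one-liner; the only subtlety is returning a constant-length mask rather than one “*” per character of the original. The mask length below is an arbitrary choice:

public class Masker {

    private static final String FIXED_MASK = "********";   // constant length, so nothing can be inferred from it

    public static String mask(String value) {
        return (value == null || value.isEmpty()) ? value : FIXED_MASK;
    }
}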

Finally, we applied one further technique, adding “noise” to the dataset in the form of additional entries.

Technique: K-ANONYMISATION
Purpose: To distort the number of entries in the dataset for a given set of criteria
Effect: Ensures that there will always be at least “K” occurrences of trades matching the criteria
Applied To: Additional trades across the dataset
Comments:
We were concerned that it might be possible to narrow down trades for a specific counterparty. In the scenario where the consumer of the data knew that a single trade had taken place with that counterparty for a specific value, it could be possible to identify the trade. To obfuscate the dataset, we developed an algorithm that ensured there would always be at least “K” entries matching the specified criteria.
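
Our implementation is not shown here; the generic sketch below captures the idea – group the trades by the chosen criteria and top up any group that falls short of K with cloned-and-perturbed noise trades. The criteria function and the noise-copy function are assumptions supplied by the caller.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

public class KAnonymiser {

    // Guarantees at least k trades per criteria value (e.g. per counterparty) by cloning
    // and perturbing existing trades in under-represented groups as "noise" entries.
    public static <T> List<T> pad(List<T> trades, Function<T, String> criteria,
                                  UnaryOperator<T> noiseCopy, int k) {
        Map<String, List<T>> groups = trades.stream().collect(Collectors.groupingBy(criteria));
        List<T> padded = new ArrayList<>(trades);
        for (List<T> group : groups.values()) {
            for (int i = group.size(); i < k; i++) {
                padded.add(noiseCopy.apply(group.get(i % group.size())));
            }
        }
        return padded;
    }
}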

Postscript
I’ve created a spreadsheet demonstrating some of these techniques, which you can download via the form below.


Automated FpML Message Testing with JMeter


One of the ingredients of a successful messaging project is strong testing. However, the fluid nature of messaging projects means iteration after iteration of system releases. This presents a challenge for the testers, who need to run the tests and verify the results over and over again. Given the complex routing, functional and regression testing requirements in messaging projects, you will need an automated process; without it you will struggle to prove that your release is fit for purpose in a timely manner. We have found that the Apache Foundation’s JMeter provides a perfect solution.

The Apache Foundation’s JMeter solution provides a way to automate testing as well as check the results. Although designed to flex systems to provide load testing and monitoring services, the software can also orchestrate tests – which is perfect for the testing of messaging systems. Additionally, JMeter doesn’t need a full Developer software setup. It doesn’t require an install – simply dropping the JMeter files on your machine is enough to get it up and running.


The following article details how we used JMeter to orchestrate the testing of a messaging system.


Before we started

Before we rushed into building out tests for the messaging system, we needed to think a few things through:

  • Strategy: What would prove that the system worked?
  • Test Pack: What would our test inputs look like?
  • Orchestration: How would we present the test inputs and check the outputs?
  • Visibility: How would we know which were our tests in a busy environment?
  • Control: How could we maintain strict version control of our tests?

Strategy

We designed our tests using the black box testing strategy. This means ignoring the inner workings of the messaging system and looking at the inputs and outputs from it. In our messaging system, we concentrated on a single target system. There are numerous other targets that are fed by our messaging system but we chose to build our test pack around this particular system.


Fig 1.1 – Black Box testing strategy

[A point of note. JMeter is sufficiently flexible to support us moving to white box testing in later iterations.]


Test Pack

The test data for our system would consist of FpML messages. We won’t cover here how we determined the content of these messages. However, it’s important to understand how we stored them. We decided to use individual CSV files to contain the messages for each functional test that we required. This resulted in approximately ten CSV files, each holding numerous FpML messages. We stored these in our version control system.


Orchestration

This is where JMeter came into its own. We made use of the following functionality within the tool in order to support our testing.

  • HTTP Header Manager: This allowed us to connect to the input queue via the RabbitMQ HTTP web service
  • JDBC Connection: This allowed us to connect to the target Oracle database
  • CSV Data Set Config: This allowed us to read in our CSV test packs and load the messages
  • Constant Timer: This allowed us to build in a delay between posting and checking the results
  • BeanShell Sampler: This allowed us to get creative with generating IDs and counting the rows in our CSV test packs (see the sketch below)
  • Loop Controller: This allowed us to loop through our message push for each row in our CSV test packs
  • JDBC Request: This allowed us to run SQL queries against the target database to pull our results back
  • Response Assertion: This allowed us to compare the results returned to our expected results
  • View Results Tree: This allowed us to see the results of our tests

That’s quite a lot of functionality that we could call on out-of-the-box, all contained within JMeter. JMeter allowed us to string these elements together to meet our requirements. They are all added to the test plan’s tree structure and configured via the UI. Our Business Analyst was able to build all of this without a developer-spec machine.
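
As an illustration of the BeanShell Sampler mentioned above, a few lines of Java-style script are enough to generate a run number for the execution and to count the rows in a CSV test pack so the Loop Controller knows how many messages to post. The file name and JMeter variable names are illustrative, not the ones from our project.

// BeanShell Sampler: generate a run number and count the rows in the CSV test pack.
import java.io.BufferedReader;
import java.io.FileReader;
import java.text.SimpleDateFormat;
import java.util.Date;

String runNumber = new SimpleDateFormat("yyMMddHHmmss").format(new Date());
vars.put("RUN_NUMBER", runNumber);             // trade IDs in the test pack are prefixed with ${RUN_NUMBER}

int rows = 0;
BufferedReader reader = new BufferedReader(new FileReader("fpml_new_trades.csv"));
while (reader.readLine() != null) {
    rows++;
}
reader.close();
vars.put("ROW_COUNT", String.valueOf(rows));   // drives the Loop Controller via ${ROW_COUNT}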


Visibility

Our test environment had a lot of activity taking place within it. To ensure that we could see our tests, we decided to generate a “run number” for each test run and prefix all our trade IDs with that number. We could then quickly identify our trades, and it also allowed us to pull back the results for that run only from the target database.

JMeter provides built-in User Defined Variables functionality, which allowed us to automate this run number and set a runtime variable to hold the value. It was then straightforward to adjust our test packs to include this variable.


Control

The outstanding feature of JMeter is that it can easily pull in version-controlled files. This ensured that our test packs could be checked into version control and become part of our project artifacts. The JMeter test plan itself can also be saved as a .jmx file and stored in version control. This is a critical feature when working on such fluid development projects.


When you put it all together, what does it look like?


Fig 1.2 – Our JMeter Testing Framework


Summary

JMeter allowed us to quickly build out an automated testing function for our BAs to use. We were able to save the orchestration as well as our test data in our version control system. Moving from a slow manual process utilising multiple tools to an automated, self-contained and self-checking testing tool was critical to the project’s success. It is also possible to add JMeter to your Jenkins automated build so that these tests run with every build in the future.


If you want to know more about how we did this and what we could do for you and your projects, then feel free to get in touch.



MariaDB CONNECT – Avoiding the pitfalls!


There will come a time when you need to make data available to your MariaDB application from other database management systems. The CONNECT storage engine allows you to do this. This article covers how to use it to access remote data, along with some of the challenges and pitfalls you may encounter.

In one of our recent projects, we needed to calculate some count statistics from two Oracle 11g database tables and store the results in our MariaDB 10.0.22 database. We were dealing with approximately 2 million rows in each of the Oracle tables and, as we were calculating set theory counts, we needed to compare the keys on both tables. The tables were indexed correctly and performance within Oracle was really good.

In order to access the Oracle tables we needed to set up CONNECT. Having rushed through the CONNECT documentation, we set up two CONNECT tables in our MariaDB database, one for each of the remote Oracle tables.

The MariaDB CREATE TABLE statements looked a bit like this:

CREATE TABLE CONNECT_Remote_Data_TableA
ENGINE=CONNECT
TABLE_TYPE=ODBC
TABNAME=TableA
CONNECTION='Driver={Oracle ODBC driver};Server=://xxx.xxx.xxx.xxx:1521/ORCL;UID=USERID;PWD=PASSWORD;';

CREATE TABLE CONNECT_Remote_Data_TableB
ENGINE=CONNECT
TABLE_TYPE=ODBC
TABNAME=TableB
CONNECTION='Driver={Oracle ODBC driver};Server=://xxx.xxx.xxx.xxx:1521/ORCL;UID=USERID;PWD=PASSWORD;';

When we ran these, both statements completed successfully and the two tables were created. A quick test via “Select * from CONNECT_Remote_Data_TableA” proved that data was indeed flowing from Oracle to MariaDB.

We built our queries in MariaDB, referring to the CONNECT tables, and started our unit testing. The results were good and we could insert the data returned from them into a MariaDB table. CONNECT was a success and we could push on with the rest of the development, having built and tested this functionality.

Everything went well until we started to ramp up the volume in the Oracle tables. Then we witnessed an alarming degradation in performance that got worse as we added more and more data. At first we struggled to understand what the problem was – the tables were indexed, after all, so access should be really quick. It was only when we started to think through what a CONNECT table actually is, and did some more reading, that we found the problem. The solution was based around where the actual SQL query was being executed.

Here is a representation of what we had built:

[Diagram: maria_connect_pic1 – initial configuration, query executed in MariaDB]

In this configuration, our SQL query was running in MariaDB and drawing data from the Oracle tables. MariaDB inserted the result into the results table, but it was very slow. Out of interest, we took the SQL query, converted it to Oracle PL/SQL and ran it in Oracle. The results were lightning quick, as you’d expect given that the tables were correctly indexed. So the problem was related to where the SQL ran:

  • In mariaDB – very slow
  • In Oracle – very fast

What’s the usual solution to make a slow query run quickly? Indexing. So we looked at that. In our rush to get this up and running, we had missed the fact that ODBC CONNECT tables cannot be indexed. In effect, all we had created was a conduit or “pipe” to the data, which arrived as a stream of unindexed rows that MariaDB then had to work heroically to turn into our results.

So how could we make use of the Oracle indexing within our query and still get the results into MariaDB? We needed to “push down” the SQL query to the Oracle end of the CONNECT “pipe”. To do this, we realised that we only needed a single MariaDB CONNECT table, but that table would need the SRCDEF parameter added to it. SRCDEF allows you to execute SQL on the remote database system instead of in MariaDB. The SRCDEF needed to contain a PL/SQL query, as it would be running natively in Oracle. Our new CONNECT statement looked like this:

CREATE TABLE CONNECT_Remote_Data_Count
ENGINE=CONNECT
TABLE_TYPE=ODBC
TABNAME=TableA
CONNECTION='Driver={Oracle ODBC driver};Server=://xxx.xxx.xxx.xxx:1521/ORCL;UID=USERID;PWD=PASSWORD;'
SRCDEF='…PL/SQL equivalent of PSEUDO SQL: Count the entries on TableA that are also on TableB…';

However, when we executed a “Select count(*) from CONNECT_Remote_Data_Count” we received a strange result – 1. The answer was returned very quickly, which was encouraging, but we knew it wasn’t correct – we expected many thousands of entries to be on both tables. After a little more head scratching, we tried “Select * from CONNECT_Remote_Data_Count” and voilà – our expected result was returned. In effect, we were selecting the content of the CONNECT table’s query: the count(*) was simply counting the single row that the remote query returns.

So we now had an Oracle PL/SQL query wrapped inside a MariaDB CONNECT “pipe” and executed remotely in the Oracle database, where it could make full use of the indexing. The result was then the only data item sent down the “pipe” from Oracle to MariaDB.

The final solution looked like this:

[Diagram: maria_connect_pic2 – final configuration, query pushed down to Oracle]

So, as we can see, CONNECT is a powerful thing. It allowed us to build a solution that populated our MariaDB system with the results of a query against two tables sitting on an Oracle database. The full power of the indexing was utilised and the results were returned very quickly.

If you’d like to know more about how we are using CONNECT, then just get in touch.

 


Continuous Lifecycle London 2016 – Conference Notes

Who was there

Big names: Jez Humble and Dave Farley (authors of Continuous Delivery), and Katherine Daniels (Etsy).

Reportedly there were 270 delegates (it certainly felt like it).

Vendors

In general, thin on the ground – New Relic, HPE, JetBrains, Automic, Chef, Serena, CloudBees and Perforce.

We didn’t see Red Hat or Puppet, and there were no service companies with a stand, although plenty of consultancies on the speaker list.

We asked some of the vendors about Docker support (which ended up feeling a bit like pulling garlic on vampires) and responses varied from “we don’t really have anything there” to “we’ve got something coming soon.”

Favourite Moments / Thoughts / Quotes

@drsnooks: microservices (n,pl); an efficient device for transforming business problems into distributed transaction problems.

“The Chaos Snail – smaller than the Chaos Monkey, and runs on shell.”

“With the earlier revolution (virtualisation), every tool that runs on bare metal also runs on a VM. With the container revolution, this is not true.” (Dima Stopel from Twistlock)

“Tools will not fix a broken culture.”

Katherine Daniels ending her talk with a passionate speech on the need for more diversity in the IT industry.

“Continuous Delivery != Continuous Deployment.” (Jez Humble and Dave Farley repeatedly)

Puppet Should Charge Per-Stream Royalties for Their Report

Memo To All Consultants: It Is Now Time To Stop Quoting The 2014/5 Puppet Labs State of DevOps Report.

Jez Humble probably got away with it by speaking first 🙂

Platforms In The Wild

An entirely unscientific sample of platforms that people are using in the wild for continuous delivery, microservices and DevOps:

  • Financial Times: 45 microservices running on Docker + CoreOS + AWS. Were using private cloud but now use AWS. Live.
  • Pearson Technologies: Docker + Kubernetes + OpenStack. Two AWS availability zones and one private cloud. Not yet live.
  • Home Office: Docker + Kubernetes.
  • <private chat>: NServiceBus, Azure, RabbitMQ.
  • Government Digital Service (gov.uk): Open Source Cloud Foundry + vCentre. Preparing for move to OpenStack on AWS.
  • Azure Container Service: Reputed to be using Mesos …

Personal Opinion:

For infrastructure-as-a-service, AWS is starting to sound like the choice for the early majority as well as the early adopters. Organisations with sensitive information requirements are already positioning themselves for the arrival of UK data centres. Relatively little mention of Cloud Foundry or Heroku – Docker is the topic du jour.

The objection to ‘rolling your own container platform’ is the amount of work you have to do around orchestration, logging, monitoring, management and so on. This didn’t seem to be putting people off, nor were we seeing much mention of frameworks such as Rancher.

Further Reading

Empathy: The Essence of DevOps – Jeff Sussna
http://blog.ingineering.it/post/72964480807/empathy-the-essence-of-devops

Why Every Company Needs Continuous Delivery – Sarah Goff-Dupont
http://blogs.atlassian.com/2015/10/why-continuous-delivery-for-every-development-team/


Continuous Integration with Docker and Jenkins – Not So Easy

TL;DR: It takes a few minutes to pull a Jenkins container, it takes a few weeks of hard work to get it playing nicely with Docker.

Intro

We wanted to build a CI pipeline to do automated deployment and testing against our containerised web application. And we picked the most mainstream, vanilla technology set we could:

[Image: jenkins-technology-soup]
Our Reasoning

[1] The link between hosted GitHub repositories and hosted Docker Hub builds is lovely.

[2] Triggering Jenkins jobs from Docker Hub web hooks *must* be just as lovely.

[3] There *must* just be a Jenkins plugin to stand up Docker applications.

Reality Bites #1 – Docker Hub Web Hooks

These aren’t reliable. Sometimes Docker Hub builds time out if the queue is busy, so the web hook never fires. But the upstream change has still happened in GitHub, and you still need your CI pipeline to run.

Our Solution

We changed our Jenkins job to be triggered by GitHub web hooks. Once triggered, our job called a script that polled Docker Hub until it spotted a change in the image identifier.

Reality Bites #2 – So there is a Jenkins Plugin …

… but it doesn’t work, and is now dormant. The main issue is that authentication no longer works since the Docker API 2.0 release, but there is a reasonable list of other issues too.

Our First Solution

We looked at Docker in Docker (https://blog.docker.com/2013/09/docker-can-now-run-within-docker/) and Docker outside Docker (https://forums.docker.com/t/using-docker-in-a-dockerized-jenkins-container/322). The latter had some success and we were able to execute docker commands, though this wasn’t scalable as you are limited to a single Docker engine – which may or may not be an issue depending on the scale of your set-up.

Our Second Solution

We set up a Jenkins master/slave configuration. The master is a Dockerised Jenkins image (it doesn’t need access to Docker in this configuration). The slave is another Linux instance (in this case on AWS). Our instance is fairly lightweight – a standard t2.micro (which is free-tier eligible) AWS Linux instance on which we installed SSH, Java, Maven and Docker.

A user is created that has permission to run docker and access to a user-created folder /var/lib/Jenkins. The Jenkins master can then run the slave via SSH, and we can confine Jenkins jobs to run only on that slave and run shell scripts such as docker pull. This is fully extensible and allows for parallel job execution and segregation of Jenkins job types, e.g. compile on one slave, docker on another and so on.

Reality Bites #3 – I’m sorry, can you just explain that last bit again?

The Jenkins Docker image is tempting as an easy way to get Jenkins, but it creates a new problem which is “controlling a target Docker Engine from inside a Docker container controlled by another Docker Engine.”

If you create a Jenkins “slave” on a separate host, your Jenkins Docker container can happily send commands to that slave via SSH. Your “slave” is just a VM running Jenkins alongside Docker Engine, so you can run shell scripts locally on the “slave” to call docker compose.

Summary

The hard bit of this is getting from a nice diagram (http://messageconsulting.com/wp-content/uploads/2016/03/ContinuousBuildAndIntegration02.png) to a set of running applications deployed either as containers or native processes on a set of hosts that are all talking to each other, and to your upstream source code repository. Plenty of people have done it already, but be prepared to do some head scratching, and write some bash scripts!


Three Amigos, One Cucumber and Where to Stick it in Jenkins

This article is aimed at the stalwart of software development, the Test Manager! The scenario: your boss has been on a jolly and has heard the term Cucumber whilst enjoying the free bar! Apparently, using a cucumber has helped his friend/rival steal a march on the market and it’s your job to work out how you can repeat this success using a long, green-skinned fruit! The aim here is to give the Test Manager enough information to understand what on earth Three Amigos would do with a cucumber in an automated way. Don’t worry, we promise to avoid all food analogies or allergies, sorry!

Ok, let’s think of some common things we do during our Test Planning and Automation Phases, what usually goes wrong and how we can fix it.

The first thing we try to do is understand the Application Under Test (this is true whether you are working in Agile, Waterfall or whatever). This typically involves, amongst other things, a workshop with the Business Analyst, the Development Team and the Testers. I make that a count of three, aha! The Three Amigos! Of course, this meeting can involve a whole host of others, though the point is there are three groups of people present, or in jargon three domains. These groups are trying to come to a shared understanding of the requirements, which typically results in three sets of documentation, each with its own vocabulary. The long-running specification workshop eventually wraps up, with each group relatively content that they know what they are doing and can carry out their respective tasks. The Test Manager and Team set about their business, only to discover some way into the process that there have been several misunderstandings and the tests don’t validate the customer requirements – and even though it’s a no-blame culture, people want to know whose fault it is that it’s not working. Sound familiar? Wouldn’t it be nice if there was a tool that could effectively close this gap in understanding, using a shared language, and at the same time give us a flying start in the automation process? Well, brace yourself for Cucumber!

I made a common mistake when first looking into Cucumber – I took it to be a pure test automation tool. I missed the point; it really is a superb way of collaborating. The Three Amigos (yes, it was taken from the film) work together in short meetings (hopefully no more than an hour) on a regular basis to arrive at a shared understanding of how the software should behave, and capture this behaviour in scenarios. Well, that’s nothing new, you say! The clever bit is the way that the scenario is captured; Cucumber makes use of Feature files. These files are in plain English and have a very lightweight structure. For example, at Redhound we have developed a product called Rover Test Intelligence and below is the actual feature file we use to test a particular scenario. Without any other form of documentation, can you tell what the product and the test do?

Feature: Rover Categorise Data
As a Release Manager I want to be able to categorise unexpected
differences in data between Production and UAT whilst ignoring 
irrelevant fields
Scenario: User categorises data
Given That I am logged in to the Rover Categorisation screen
When I select a difference group
And I launch the categorise process
Then I can tag the record with a difference label

Try another

Feature: Rover See Data Differences
As a Release Manager I want to be able to see differences in data 
between Production and UAT whilst ignoring irrelevant fields
Scenario: User views differences
Given That I am logged in to the Rover Categorisation screen
When I select a difference group
And I launch the see differences process
Then I can view the differences in data between the two record sets

As you can see, this is, hopefully, understandable to most people reading it. And, this is important, the steps “Given, When, And & Then” can be interpreted by a computer – so that is Three Amigos, a Cucumber and a Laptop! It may not be obvious, but this feature file is written in a language called Gherkin. The Feature files can be developed in their totality outside the Three Amigos specification meeting, so long as a feedback loop is in place to ensure the Amigos stay friendly.

When I say it can be interpreted by a computer, there is work to do here. At this point the Test Engineers get busy: when you run Cucumber it takes the Feature file steps and creates a skeleton framework, and the Test Engineers then have to put the meat on the bones – no, Cucumber will not write your automation tests, you still have to code them.

At Redhound we are using IntelliJ IDEA for the Integrated Development Environment, Maven for dependency management and Java as the language of choice. With this set up, when you run a Feature file for the first time, rover_see_data_differences.feature, Cucumber will helpfully generate the following:

//You can implement missing steps with the snippets below:
@Given("^That I am logged in to the Rover Categorisation screen$")
public void That_I_am_logged_in_to_the_Rover_Categorisation_screen() throws Throwable {
    // Express the Regexp above with the code you wish you had
    throw new PendingException();
}

@When("^I select a difference group$")
public void I_select_a_difference_group() throws Throwable {
    // Express the Regexp above with the code you wish you had
    throw new PendingException();
}

@When("^I launch the see differences process$")
public void I_launch_the_see_differences_process() throws Throwable {
    // Express the Regexp above with the code you wish you had
    throw new PendingException();
}

@Then("^I can view the differences in data between the two record sets$")
public void I_can_view_the_differences_in_data_between_the_two_record_sets() throws Throwable {
    // Express the Regexp above with the code you wish you had
    throw new PendingException();
}

Granted, the above does look a little more technical, though you can recognise that the steps from the Feature file are now linked via a regular expression to Java code – brilliant! The generated code snippets can be cut and pasted directly into a Java class and the process of developing your automated tests, and indeed your software, can begin. Your own test code is placed in the auto-generated method bodies, replacing the “throw new PendingException();” statement.
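
For illustration, a filled-in step definition might end up looking like the sketch below. The URL, element IDs and WebDriver set-up are assumptions for the purpose of the example, not the actual Rover test code.

import cucumber.api.java.en.Given;
import org.junit.Assert;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public class RoverStepDefinitions {

    private final WebDriver driver = new FirefoxDriver();

    @Given("^That I am logged in to the Rover Categorisation screen$")
    public void That_I_am_logged_in_to_the_Rover_Categorisation_screen() throws Throwable {
        // Hypothetical URL and element IDs - substitute your application's own.
        driver.get("http://localhost:8080/rover/login");
        driver.findElement(By.id("username")).sendKeys("testuser");
        driver.findElement(By.id("password")).sendKeys("testpassword");
        driver.findElement(By.id("loginButton")).click();
        Assert.assertTrue(driver.getTitle().contains("Rover Categorisation"));
    }
}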

The real advantage here is that there is a shared understanding of what the feature steps mean, a so-called ubiquitous language; the developers can make it, the testers can break it, and the Business Analyst can see that the product is being developed in line with actual requirements and that the tests are sound. This is an iterative process that goes under the guise of Behaviour Driven Development – the desired behaviour drives the development and test! Another term you may see used for the same process is “Specification by Example” (2). The irony is not lost on us that the names Cucumber and Gherkin in no way describe what the tool is or does – still, it is catchy!

Ok, pause for a breath…

To recap, Cucumber should be thought of as a collaboration tool that brings the Three Amigos together in order to define, using examples, a scenario. When Cucumber runs it generates helpful skeleton code that can be filled in by Test Engineers to create an automated acceptance test framework. The cumulative behaviours in all of the Feature files will eventually equate to the specifications for the system.


Now, how to link Cucumber into your Continuous Integration and Deployment framework? We have discussed Continuous Integration & Deployment with Docker, Jenkins & Selenium; however, it can be confusing to see just how all these bits link together…

The way we do it is to have our automated tests safely tucked away in Git Hub. We have Jenkins and Git Hub linked together using the inbuilt facility of Git Hub – Web Hooks. A change in the base code will trigger Jenkins to run the job. The source code uses Maven for dependency management, which in turn uses Profiles – simply a collection of tests under a Test Suite. Jenkins is configured to execute Maven tests so that test suites can be run accordingly. (See (3) for a diagram.)

We can’t finish without mentioning that maximum benefits are achieved if you use Cucumber within an Agile testing framework. You get all the benefits of short iterations, quickly finding defects, fewer handovers due to multi-disciplinary teams, etc. However, just collaborating in the Three Amigos style can assist you no end in understanding what you are supposed to be testing.

Final Summary – Cucumber can be thought of as a collaboration tool that, in conjunction with a Specification by Example process, can bring enormous benefits to your automation test efforts. Cucumber won’t write your automation tests for you, though it creates skeleton code from a plain English Feature file. If the Three Amigos (a Business Analyst, a Test Analyst and a Developer) work together in short bursts, a common understanding and language is achieved, greatly increasing your chances of delivering successful software. We actively encourage the adoption of this approach to enable you to achieve your goals.


Continuous Integration and Deployment with Docker, Jenkins and Selenium

Key Technologies and Tools: Jenkins, Docker, Docker Hub, Git, Git Hub, Amazon Web Services, Saucelabs, Blazemeter, Selenium, Appium, WebDriver, Test Automation, Agile, Waterfall, Rapid Development and Test, Business Driven Testing, Data Driven Testing, JUnit, Test Suites, Java, Maven

This article is aimed at the long-suffering Test Manager. Often the unsung hero who, at the last minute and under great pressure, brings it all together, polishing the proverbial and assuring that the delivered product actually meets most of its requirements and will operate as expected. Amongst the day-to-day chaos you have been given the task of finding out what all the fuss is around virtualisation, continuous integration and delivery and also, if it’s any good, can we have one as soon as possible! If this description fits you, we think you will find this article very useful. We describe how we have successfully implemented Test Automation within a continuous build and deployment framework.

A quick google will bring you all the definitions you could ever need (there are some good links at the end of this article), so let’s think about what you would like, what you can have, what it would cost and what tools and processes you would use. The list could far exceed the space we have here so, in the interest of keeping this brief, we list a few of the nice-to-haves that are consistent amongst our client surveys:

  • Wouldn’t it be good if you could test changes as they were developed and automatically deploy or stage the tested code so we don’t have a mass panic before go-live?
  • As a Test Manager I would like to have a cost-effective and rapid way of setting up the Test Environment, Application and Test Data, then wiping it all clean and starting afresh on demand
  • I would like a common framework that allows me to write a test once and run it in multiple ways, say across web and mobile platforms
  • My set-up would have to grow and shrink in near real time, with us only paying for what we use
  • The return on investment must exceed the cost of set-up

Well, that would be nice, wouldn’t it? It’s probably no surprise that the above is possible; what probably is surprising is that the tooling required to get going is absolutely free and is industry standard! That is worth repeating: the cost of tooling is absolutely free!

The above in fact describes just some of the benefits of continuous integration and virtualisation.

Ah, you say, that all sounds great but what does it really mean? Where do I get started?

Let’s take this a step at a time…

The first thing you will need is a platform to act as a backdrop. There are lots of cloud providers competing for your business – we have settled on Amazon Web Services (AWS). AWS is free to get started and will allow you to spin up and dispose of servers at will. You only pay for what you use and can replicate your builds easily. For example, we have created Linux-based servers and Windows boxes. You can log on using a device of your choice – laptops, tablets etc. – and utilise the full power of the Cloud. If you find your machines lacking in power or storage you can expand at will. This will, of course, lead to higher charges, so if you find after a particularly intense testing effort you no longer need the horsepower, you can scale back and reduce costs. This is where the “elastic” comes from in EC2 (Elastic Compute Cloud).

The second thing you need is something to orchestrate the end-to-end flow, and that is Jenkins. Jenkins is a continuous integration and continuous delivery application. Use Jenkins to build and test your software projects continuously. It is truly a powerful tool, so it must be expensive, right? It is free! Also, you would expect it to be hard to install and configure – well, the basic implementation is quick and easy. Complexity of job configuration will increase in line with your actual tests; however, there is a wide range of plugins that ease the task of set-up and configuration and cater for nearly everything you can think of. Once you get into the swing of it you will find it hard to resist tinkering, as you can set up a new job in minutes.

What about code control and deployment? We use a combination of Git Hub and Docker Hub for our version control and image build. GitHub is a web-based Git hosting service. It offers all of the distributed revision control and source code management (SCM) functionality of Git and comes with plugins for linking to Jenkins. The Docker Hub is a Cloud-based registry service for building and shipping application or service containers. It provides a centralized resource for container image discovery, distribution and change management, user and team collaboration, and workflow automation throughout the development pipeline. Both Git Hub and Docker Hub are, you guessed it, free to get started. If you want to make your repositories private you will start paying a small fee.

We mentioned images earlier, and in this context we refer to Docker images. Docker allows you to package an application with all of its dependencies and data into a standardised unit for software development. With a single command you can, for example, run a Tomcat server with a baked-in application along with any static and user data. Sound useful? It is! With another command or two you can flatten and pull a new version, allowing a total reset of the environment. So, if the Development Team build and push the code, you can extract and test it in a rapid time-frame.

The above components allow the software and data bundle to be developed, tested, and changed as required and pushed again. The cycle continues on and on building test coverage as it goes.

[Diagram: ContinuousBuildAndIntegration02]

In summary so far:

  • Developers create code using their tool of choice and push it to the Git repository
  • Git Hub triggers Docker Hub – We use this to bundle the application and data into a single package for test
  • Docker Hub notifies Jenkins that a fresh build is available for test

At last, I have mentioned testing! True, the above does start to stray into development and deployment territory, though it is important information for you to wrap your head around. From a testing perspective it really helps to focus on the Docker image as being the product.

We have built an application, ROVER TEST INTELLIGENCE – an excellent application in its own right, allowing rapid comparison and analysis of millions of records in seconds. To test this we need a Tomcat server, a WAR file containing our application and a supporting database; a fairly typical bundle for a web-based application. We have a single Docker image for the Tomcat server and WAR file, another for the database and one for the data. That is three in total – this suits our development approach. However, for testing purposes all of these can be treated as a single unit. For us, a change in any of the underlying components triggers the full set of test suites.

We use Jenkins to control our tests. A Git change triggers a Docker build, which in turn triggers Jenkins to spin up a ‘slave’ machine on AWS and execute the tests. As illustrated, we have two slave machines. Docker-type operations are executed on a native Linux instance and GUI tests are run on a Windows-based platform; the instances are only active whilst needed, keeping costs to a minimum.

We create tests using the JUnit framework and Selenium WebDriver classes. The code is reusable and a single script can be executed for Web, JMeter and Appium mobile testing, minimising redundancy and duplication.
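
As a minimal sketch of what one of these reusable scripts can look like (the URLs, system property names and page title are assumptions), the same JUnit class can drive a local browser or a remote one – for example a Saucelabs endpoint – simply by passing in a different driver URL:

import java.net.URL;
import org.junit.After;
import org.junit.Assert;
import org.junit.Before;
import org.junit.Test;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.remote.DesiredCapabilities;
import org.openqa.selenium.remote.RemoteWebDriver;

public class DashboardSmokeTest {

    private WebDriver driver;

    @Before
    public void setUp() throws Exception {
        // If -Dgrid.url is supplied, run against a remote grid (e.g. Saucelabs); otherwise run locally.
        String gridUrl = System.getProperty("grid.url");
        driver = (gridUrl == null)
                ? new FirefoxDriver()
                : new RemoteWebDriver(new URL(gridUrl), DesiredCapabilities.firefox());
    }

    @Test
    public void homePageLoads() {
        driver.get(System.getProperty("app.url", "http://localhost:8080/rover/"));
        Assert.assertTrue(driver.getTitle().toLowerCase().contains("rover"));
    }

    @After
    public void tearDown() {
        driver.quit();
    }
}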

We also take advantage of some of the services offered by third-party Cloud-based providers, namely Saucelabs to provide extensive cross-browser testing, and Blazemeter to scale up performance tests when we really need to crank up the horsepower and perform short-burst, enterprise-level testing. This is done with minimal alteration to the script – configuration is passed in via request parameters. Saucelabs and Blazemeter are elastic too, with a free-tier account ramping up and down with usage.

Further, Jenkins can be configured to run on a schedule as well as in response to changes; this allows you to soak test applications, driving out intermittency due to, for example, environment factors, and to run tests when it’s cheaper. You can actually negotiate for cheap server time! Also, it will keep you updated by email.

In summary:

  • Jenkins, Git Hub and Docker Hub can be used as your automated framework to build, test and deploy your code
  • Focus on the Docker image as being the testable product; this can include code, databases, data and even servers
  • JUnit and Selenium can be used for writing your reusable automated test scripts
  • Test scripts are portable and can be directly utilised by third-party Cloud providers to extend testing capabilities in an elastic fashion
  • The tooling cost for your initial set-up is zero; you just need to add time and effort

When you get this combination right, it really does liberate you, with less time spent manually testing and more time spent innovating. The traditional test integration phase all but disappears, and non-functional requirements, so often forsaken in an agile context, get built as part of the deal. The return on investment accumulates as you go, with test coverage increasing at each iteration. Of course, there is a learning curve and a (smaller than you may think) maintenance cost, though we feel the benefits gained are well worth the time and effort.

If you would like us to help you please get in touch at
info@redhound.net
Tel: +44 (0)800 255 0169

Demos and further reading
What is Rover Test Intelligence? http://redhound.net/rover-test-intelligence/
What are Amazon Web Services? https://aws.amazon.com/
What is Jenkins? https://wiki.jenkins-ci.org/display/JENKINS/Meet+Jenkins
What is Git Hub? https://github.com/
What is Docker Hub? https://docs.docker.com/docker-hub/overview/
What is Docker?
What is JUnit? http://www.tutorialspoint.com/junit/junit_test_framework.htm
What is Selenium WebDriver? http://www.seleniumhq.org/
What is Sauce Labs? https://saucelabs.com/
What is Blazemeter? https://www.blazemeter.com/


WaveMaker – 7 Tips For Week 1


TLDR: Technical business analysts *really can* use WaveMaker to build web applications on the enterprise Java stack.

Intro

The promise of RAD development tools has always been the opportunity for non-programmers to build applications, and the big downside has always been the cost of licensing and deploying a specialised runtime environment to each user. The interesting thing about WaveMaker’s approach to this market is that the runtime cost has gone. You compile your web application to a WAR file, and you can then deploy it to the web application server of your choice.

We have been using WaveMaker Beta 8.0.0.3 to build a prototype of an enterprise dashboard application (free trial here: http://www.wavemaker.com/ ). We were happy enough with the progress we made and the support we got to commit to licensing – here are some of our early lessons.

Lesson 1: Get a real problem

We had a prototype of our application built in Microsoft Excel, so we knew exactly what we had to build. This made it harder for us to shy away from the trickier problems, such as master-detail navigation.

Lesson 2: Variable Scope and Type

Keep an eye out for whether your variable is at application or page level. This is key, and determines how the application refreshes or reloads your variable. Variable is a broad-brush term in the WaveMaker context and encompasses procedures, queries and widgets, to name but a few. It helped us to think of them as objects.

[Screenshot: variable scope and type]

Lesson 3: Use Timers to auto-update

We had a requirement to have an auto-updating dashboard, and we used a special kind of variable of type Timer. This fires an event as per its set parameters, and was very useful.

[Screenshot: Timer variable settings]

Lesson 4: Be aware of where localhost is with database connections

We discovered a real ‘gotcha’ when importing a database – one of the key parameters asked for is the host. When developing in the cloud, ‘localhost’ will point to the cloud instance, not the database on your local machine, so use the database server’s full URL instead. We have successfully connected WaveMaker to Amazon RDS for SQL Server, Oracle, MariaDB and PostgreSQL.

Lesson 5: Where to find project configuration files

If you want to take a peek at your project files look at the following. This lets you edit configuration files and update your parameters.

[Screenshot: viewing project configuration files]

Lesson 6: How to pass parameters between pages

Two widgets on a page can be made to talk to each other very easily with a few clicks of a mouse, as they have the same scope. However, what is not so intuitive is how to pass parameters from a parent to a child page, where the scope of the widgets is constrained to their individual page. The solution: on the parent page, create a static variable, set the scope to Application level and then bind the data values to the appropriate parent widget; on the child page, bind the child widget to the static variable values.

Lesson 7: Beware of scheduled maintenance

You are happily coding away, you are feeling pleased, even a little smug with yourself, and you hit the run button when … What can be happening? Erm … nothing! Have you lost your mind? Will your boss mock you? Don’t worry – it might just be that you’ve missed a maintenance window. These are extremely easy to miss – the small banner that appears briefly in the bottom right-hand side of the studio window will be your only warning.

[Screenshot: the maintenance banner]

Just in case you missed that …

[Screenshot: WaveMaker maintenance notice]

Summary

We found that, after a bit of experimenting, WaveMaker does everything we needed it to do and more. The product is not instantly intuitive, but after a couple of weeks there is a flow that you go with and application development does indeed become rapid. Hopefully, the tips above will get you where you need to be rapidly and stop you barking up the wrong tree. We give the WaveMaker product an overall 9 out of 10.

Useful Links:

Main website, free trial, tutorials: http://www.wavemaker.com/

Series of tutorials on YouTube:  https://www.youtube.com/channel/UCQXjfhBWpBiqpXol_WGh71A

 


Neo4j GraphTalks – Fraud Detection and Risk Management Talk Review

I went to an excellent seminar this morning hosted by Neo4j, the graph database vendor. I used Neo4j a couple of years back to model change requests at an investment bank, and I’ve had a soft spot for its speed and ease of use ever since, so it was good to hear that adoption is growing, and also to hear about some real-life experience.

Key takeaways:
– Neo4j is particularly relevant for fraud detection, since it allows you to spot patterns that you didn’t know you were looking for
– Some impressive claims about performance – one of the speakers was running a 16 million node proof-of-concept on a notebook with 4 GB RAM!
– Interesting (and coherent) explanation of the difference between graph databases like Neo4j, RDBMS and other NoSQL solutions – Neo4j captures relationships as data, RDBMS allow you to construct relationships through server side processing (queries) and something like MongoDB puts the onus of constructing relationships on the application side
– Neo4j lets you model the real world directly – you don’t need to decompose into an RDBMS schema

Speaker notes:
The talk was billed as a more ‘business friendly’ event than the beer and pizza developer meet-ups, and I think Jonny Cheetham’s introduction to Neo4j was very nicely pitched at a mixed audience. I’m pretty sure he got as far as cypher without losing people, and the visual display of a conventional person-account-address relationship, versus a fraud chain, was highly instructive.

Charles Pardue from Prophis did a great job of describing why using a graph database is so useful for portfolio risk management. Most people look at their relationships to their counterparties, and then stop. Charles’ example showed how you could use a graph database to go out into the world of your counterparties’ relationships and beyond, detecting (for instance) an exposure to a particular country that only existed at three steps’ remove.

Cristiano Motto had clearly had a good experience exporting a Teradata data mart into a Neo4j proof-of-concept running on someone’s notebook. Apart from speaking volumes about the product’s impressive performance, it also made the point that you can use the product to mine an existing (expensive) data store without having to repurpose that data store itself.

One always comes away from these talks with a couple of resolutions – my own were to:
– See what kind of transaction speed I can get for storing a high volume message flow
– Figure out how to model time series in Neo4j (for instance a set of cashflows)
– Figure out how to model historical data in Neo4j (current/previous address)


Automating TICK: Using the TICK API for Reporting


I like TICK for its clean and intuitive user interface, but my organisation uses “day” as the time period for budgets whereas TICK reports everything by the hour.

So I needed my own app to take TICK report hours and convert them to days.

Fortunately the nice people at TICK helpfully provide an API to facilitate the automation of this task …

Preliminaries

You know that TICK provides manual reporting capabilities from the homepage

[Screenshot: TICK homepage]

But did you know you can also access this functionality for scripting via the TICK API?

There are some basics you will need to get started with the API:

  1. Install cURL as the tool to provide the capability for automated access to TICK: http://curl.haxx.se/
  2. Give yourself a means to run cURL in a unix environment. Not a problem for those of you working with Linux, but Windows users will need a tool such as Cygwin installed: https://cygwin.com/
  3. Make a note of your TICK Subscription ID and API Token. These can be found on the Users screen.

OK, so now you are ready to build some cURL commands to access all that useful TICK data.

Main Course

To collect the data for Time Entries you need to follow the basic structure of TICK:

Clients –> Projects –> Tasks –> Time Entries

There are also the Users who enter the times linked to Time Entries.

First, let’s have a look at the Clients for your Subscription.

The cURL command you need is:

curl --insecure -H "Authorization: Token token=Your API Token" --user-agent "aNameForYourApp (yourEmailAddress@wherever)" https://www.tickspot.com/Your Subscription ID/api/v2/clients.json

  • --insecure stops cURL performing SSL certificate verification. You need to decide if this is appropriate for your circumstances. If not, then off you go to sort out SSL certification …
  • -H is the HTTP header. TICK insists on the format shown here to recognise the API Token
  • --user-agent is where you give TICK a label to identify your app and an email address to communicate with you if anything goes wrong
  • Finally, there’s the address for the API. Note that these notes refer to version 2; keep an eye on https://www.tickspot.com/api to check if the version ever changes

To quote the documentation, “the Tick API has been designed around RESTful concepts with JSON for serialization” – so that’s why it has clients.json

But what, I hear you cry, is JSON?

Well, it’s a standard way to format the returned outputs so that you can use other freely available tools to parse it into something more useful for your app. More of that shortly …

Running the cURL command for Clients in Cygwin will give you output that looks like this:

[{"id":257764,"name":"Client No.1","archive":false,"url":"https://www.tickspot.com/Your Subscription ID/api/v2/clients/257764.json","updated_at":"2014-09-14T15:15:26.000-04:00"},{"id":257766,"name":"Client No.2","archive":false,"url":"https://www.tickspot.com/Your Subscription ID/api/v2/clients/257766.json","updated_at":"2014-11-05T18:24:59.000-05:00"}]

JSON format puts the full output in square brackets with each record in braces. Keys and values are separated by colons, entries by commas, and strings are in double quotes.

So you can see how it can be parsed into rows and columns or array elements.

We have already looked at the cURL command for Clients.

Now here are the other cURL commands you need for Projects, Tasks, Time Entries and Users:

  • curl --insecure -H "Authorization: Token token=Your API Token" --user-agent "aNameForYourApp (yourEmailAddress@wherever)" https://www.tickspot.com/Your Subscription ID/api/v2/projects.json
  • curl --insecure -H "Authorization: Token token=Your API Token" --user-agent "aNameForYourApp (yourEmailAddress@wherever)" https://www.tickspot.com/Your Subscription ID/api/v2/projects/project_id/tasks.json

project_id is the numeric code taken from the output of the Projects command

  • curl --insecure -H "Authorization: Token token=Your API Token" --user-agent "aNameForYourApp (yourEmailAddress@wherever)" "https://www.tickspot.com/Your Subscription ID/api/v2/entries?project_id=project_id&start_date=yyyy-mm-dd&end_date=yyyy-mm-dd.json"

start_date and end_date give a restricted range for the Time Entries. You must specify values for them.

  • curl --insecure -H "Authorization: Token token=Your API Token" --user-agent "aNameForYourApp (yourEmailAddress@wherever)" https://www.tickspot.com/Your Subscription ID/api/v2/users.json

The output of the Time Entries command looks like this:

[{"id":44400081,"date":"2015-03-06","hours":7.5,"notes":"","task_id":6470920,"user_id":222443,"url":"https://www.tickspot.com/Your Subscription ID/api/v2/entries/44400081.json","created_at":"2015-03-09T04:28:05.000-04:00","updated_at":"2015-03-09T04:28:05.000-04:00"}]

  • id is the unique identifier of that Time Entry
  • task_id links back to the list of Tasks. From Task you can link back to Project. From Project you can link back to Client
  • user_id links to the User

That’s everything you need to collect all the TICK Time Entry data and feed the rest of your app to construct your own reports.
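
As a sketch of the final step – the hours-to-days conversion that prompted all of this – the fragment below pulls a page of Time Entries and totals the hours, converting them at an assumed 7.5-hour day. The subscription ID, API token, project ID and date range are placeholders, and in a real app you would use a proper JSON library rather than the crude regular expression shown here.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TickHoursToDays {

    private static final double HOURS_PER_DAY = 7.5;   // assumed working day

    public static void main(String[] args) throws Exception {
        // Placeholders - substitute your own subscription ID, token, project and dates.
        String url = "https://www.tickspot.com/YOUR_SUBSCRIPTION_ID/api/v2/"
                + "entries?project_id=PROJECT_ID&start_date=2015-03-01&end_date=2015-03-31.json";

        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestProperty("Authorization", "Token token=YOUR_API_TOKEN");
        conn.setRequestProperty("User-Agent", "aNameForYourApp (yourEmailAddress@wherever)");

        StringBuilder json = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                json.append(line);
            }
        }

        // Crude extraction of every "hours" value from the JSON; a parser is the better option.
        Matcher m = Pattern.compile("\"hours\":([0-9.]+)").matcher(json);
        double totalHours = 0;
        while (m.find()) {
            totalHours += Double.parseDouble(m.group(1));
        }
        System.out.printf("Total: %.2f hours = %.2f days%n", totalHours, totalHours / HOURS_PER_DAY);
    }
}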

Good luck!
