I was recently asked for some “production like” OTC swaps coming from Calypso so that a development partner could test their proof of concept project. I needed to provide trade data as well as referential data for both product and account look ups to support the testing. The following shows you some of the techniques that I employed to anonymise the swap data whilst still enabling the vendor to use it to prove their system.
Purpose: To change the value of data items like account IDs
Effect: Breaks the link between the data to be released and the original production data
Applied To: Account, Product, Index and Trade IDs
We developed an algorithm that could change IDs in the data. We needed to maintain the integrity of lookups from the trade to the referential data. Therefore, our algorithm scrambled the data in the same way for both the trade and referential datasets. This scrambling removed the ability to link the data back to the production system.
Technique: DATE SLIDING
Purpose: To slide all the dates in the trade data forward/backward by a consistent value
Effect: Changes the dates on the trade whilst still maintaining the integrity of the dates
Applied To: Trade, As Of, Execution, Cleared, Effective. Termination and Payment dates
We developed algorithm based on a couple of trade attributes. The first was used to determine the offset value to be applied to the dates in the trade data. The second was used to determine the direction (forward or backward). This was particularly effective as it applied different slides to each trade.
Purpose: To adjust the economic values in a dataset so they no longer match the original
Effect: Changes the economic values by applying aggregates across various ranges
Applied To: Notional, Fixed Rate, Premium
We analysed the economic data in the dataset and applied averages across bands of data. This means that the dataset as a whole is still mathematically intact but individual economic values on trades have been adjusted.
Purpose: To prevent test within the data from providing information to the consumer
Effect: Replaces text strings with “*” characters
Applied To: Party names, country of residence, contact information, trader names
Simple masking was implemented on this data. As an additional security step, we ensured that all the masking was the same length. This would prevent someone for trying to deduce client names from the length of the mask.
Finally, we also applied an additional technique to the data in order to apply “noise” to the data by adding additional entries to the dataset.
Purpose: To distort the number of entries in the dataset for a set criteria
Effect: Ensures that there will always be at least “K” occurrences of trades matching the criteria
Applied To: Additional trades across the dataset
We were concerned that it might be possible to narrow down trades for a specific counterparty, In the scenario where the consumer of the data knew that a single trade had taken place with the counterparty for a specific value it could be possible to identify this trade. In order to obfuscate the dataset we developed an algorithm that would ensure that there would always be “K” entries for the specified criteria.
I’ve created a spreadsheet demonstrating some of these techniques, which you can download via the form below.