I went to an excellent seminar this morning hosted by Neo4j, the graph database vendor. I used Neo4j a couple of years back to model change requests at an investment bank, and I’ve had a soft spot for its speed and ease of use ever since, so it was good to hear that adoption is growing, and to hear about some real-life experience.
Key takeaways:
– Neo4j is particularly relevant for fraud detection, since it allows you to spot patterns that you didn’t know you were looking for
– Some impressive claims about performance – one of the speakers was running a 16 million node proof-of-concept on a notebook with 4 GB RAM!
– Interesting (and coherent) explanation of the difference between graph databases like Neo4j, RDBMS and other NoSQL solutions: Neo4j captures relationships as data; an RDBMS lets you construct relationships through server-side processing (queries); and something like MongoDB puts the onus of constructing relationships on the application side (see the sketch after this list)
– Neo4j lets you model the real world directly – you don’t need to decompose into an RDBMS schema
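To make the “relationships as data” point concrete, here’s a minimal Cypher sketch of my own – the labels and property names are illustrative, not from the talk. The relationship is a stored record with its own properties, not something reassembled from a join at query time:

    // The OWNS relationship is created and stored as data in its own right
    CREATE (p:Person {name: 'Alice'})-[:OWNS {since: 2014}]->(a:Account {number: '12345'})

    // Traversal reads the stored relationship directly – no join,
    // no application-side stitching
    MATCH (p:Person)-[o:OWNS]->(a:Account)
    RETURN p.name, o.since, a.number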
Speaker notes:
The talk was billed as a more ‘business friendly’ event than the beer-and-pizza developer meet-ups, and I think Jonny Cheetham’s introduction to Neo4j was very nicely pitched at a mixed audience. I’m pretty sure he got as far as Cypher without losing people, and the visual display of a conventional person-account-address relationship, versus a fraud chain, was highly instructive.
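For flavour, this is my own reconstruction in Cypher of the kind of query that sits behind that visual – flagging account holders who share an address, a classic first-party fraud signal. The schema and names are my assumptions, not Jonny’s actual demo:

    // Two different people registered at the same address, each holding
    // their own account – worth a closer look
    MATCH (p1:Person)-[:LIVES_AT]->(addr:Address)<-[:LIVES_AT]-(p2:Person),
          (p1)-[:HOLDS]->(a1:Account),
          (p2)-[:HOLDS]->(a2:Account)
    WHERE p1 <> p2
    RETURN p1.name, p2.name, addr.street, a1.number, a2.number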
Charles Pardue from Prophis did a great job of describing why using a graph database is so useful for portfolio risk management. Most people look at their relationships to their counterparties, and then stop. Charles’ example showed how you could use a graph database to go out into the world of your counterparties’ relationships and beyond, detecting, for instance, an exposure to a particular country that only appeared at three steps’ remove.
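A traversal like Charles’ can be expressed as a variable-length path in Cypher. The schema below is hypothetical – I’m guessing at an EXPOSED_TO relationship and the node labels – but the shape of the query is the point: step out one to three hops rather than stopping at the direct counterparty:

    // Find any route from my portfolio to a given country within three hops,
    // including exposures hidden behind my counterparties' own counterparties
    MATCH path = (p:Portfolio {name: 'My Fund'})-[:EXPOSED_TO*1..3]->(c:Country {code: 'GR'})
    RETURN path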
Cristiano Motto had clearly had a good experience exporting a Teradata data mart into a Neo4j proof of concept running on someone’s notebook. Apart from speaking volumes about the product’s impressive performance, it also made the point that you can use Neo4j to mine an existing (expensive) data store without having to repurpose that data store itself.
One always comes away from these talks with a couple of resolutions – my own were to:
– See what kind of transaction speed I can get for storing a high-volume message flow
– Figure out how to model time series in Neo4j (for instance, a set of cashflows – see the sketch after this list)
– Figure out how to model historical data in Neo4j (current/previous address)
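For the record, here are the two modelling patterns I’ll try first for the last two items – a linked list of dated nodes for the time series, and dated relationships for the address history. Both are sketches under my own assumed schema, not established answers:

    // Cashflows as a linked list: NEXT preserves order without sorting
    CREATE (c1:Cashflow {date: '2016-01-15', amount: 100.0}),
           (c2:Cashflow {date: '2016-04-15', amount: 100.0}),
           (c3:Cashflow {date: '2016-07-15', amount: 100.0}),
           (c1)-[:NEXT]->(c2),
           (c2)-[:NEXT]->(c3)

    // Walk the series in order from the first cashflow
    MATCH (:Cashflow {date: '2016-01-15'})-[:NEXT*0..]->(c)
    RETURN c.date, c.amount

    // Address history: retire the current LIVES_AT relationship to a dated
    // PREVIOUS_ADDRESS, then point LIVES_AT at the new address
    MATCH (p:Person {name: 'Alice'})-[r:LIVES_AT]->(old:Address)
    CREATE (p)-[:PREVIOUS_ADDRESS {from: r.since, to: '2016-06-01'}]->(old)
    CREATE (p)-[:LIVES_AT {since: '2016-06-01'}]->(:Address {street: '1 New Road'})
    DELETE r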