Making Spark Work for Next Generation Information Fusion Workflows

Outline:

How to integrate a graph metadata repository with Spark
Benefits of using Spark with a graph metadata repository
Examples of data fusion workflows using Spark

Apache Spark's strengths include in-memory cluster computing and iterative batch analytics. Both capabilities lend themselves well to graph analytics and machine learning, which let users derive valuable business context, such as how different customer groups are connected, which people show similar purchasing patterns over time, and who the biggest influencers in a market are. The problem with large-scale batch graph analytics is that any delay reduces the impact that real-time business intelligence can deliver. By tightly coupling a graph metadata repository with a running Spark engine, you not only gain greater contextual awareness by fusing data from multiple sources, but also enable real-time decision making by adding contextual processing to the workflow. Making contextual awareness part of the workflow, rather than just its output, gives the overall workflow greater significance in deriving business value.

Nick Quinn is the Principal Engineer for InfiniteGraph, a distributed graph-oriented data management technology from Objectivity Inc. Since joining Objectivity, Nick has worked exclusively on the InfiniteGraph project and has played a key role in the design and architecture of InfiniteGraph 3.0 and 3.1. Prior to joining Objectivity in 2010, Nick worked as a lead Java developer at Savi Networks, a Lockheed Martin company. Nick holds a master's degree from the College of Engineering at Santa Clara University in Santa Clara, California.