Thursday, August 20, 2015
01:00 PM - 04:15 PM
Level: Technical - Intermediate
What are the essential components of a data platform? This tutorial will explain how the various parts of the Hadoop and big data ecosystem fit together in production to create a data platform supporting batch, interactive and real-time analytical workloads.
By tracing the flow of data from source to output, we’ll explore the options and considerations for each component, including:
- Acquisition: from internal and external data sources
- Ingestion: offline and real-time processing (a minimal sketch follows this list)
- Storage
- Providing data services: exposing data to applications
- Analytics: batch and interactive
- Data management: data security, lineage, metadata and quality
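To make the ingestion step concrete, here is a minimal sketch of a real-time path using Spark Streaming and Kafka (both discussed below). The broker address, topic name, and HDFS output path are illustrative assumptions, not part of the tutorial materials.

```python
# Minimal real-time ingestion sketch (Spark 1.x era): consume events from
# Kafka with Spark Streaming and land them on HDFS for later batch analytics.
# The broker, topic, and output path below are hypothetical placeholders.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="IngestEvents")
ssc = StreamingContext(sc, batchDuration=10)  # 10-second micro-batches

stream = KafkaUtils.createDirectStream(
    ssc,
    topics=["events"],                                    # hypothetical topic
    kafkaParams={"metadata.broker.list": "broker:9092"})  # hypothetical broker

# Each Kafka record arrives as a (key, value) pair; keep the value and
# write each micro-batch to HDFS as a set of text files.
stream.map(lambda kv: kv[1]).saveAsTextFiles("hdfs:///data/raw/events")

ssc.start()
ssc.awaitTermination()
```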
We’ll also give advice on:
- Tool selection
- The function of the major Hadoop components and other big data technologies such as Spark and Kafka (a short batch example follows this list)
- Hardware sizing and cloud provisioning
- Integration with legacy systems
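As a taste of the batch side, here is a minimal Spark sketch that aggregates the ingested events. The input path and the tab-separated record layout, with the event type in the first field, are assumptions for illustration.

```python
# Minimal batch-analytics sketch: count ingested events by type with Spark.
# The HDFS glob and tab-separated record layout are hypothetical placeholders.
from pyspark import SparkContext

sc = SparkContext(appName="EventCounts")

counts = (sc.textFile("hdfs:///data/raw/events*")        # hypothetical path
            .map(lambda line: (line.split("\t")[0], 1))  # event type -> 1
            .reduceByKey(lambda a, b: a + b))            # sum per type

for event_type, n in counts.collect():
    print(event_type, n)

sc.stop()
```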
John Akred helps organizations become more data-driven. Mr. Akred has over 15 years of experience in advanced analytical applications and analytical system architecture. He is a recognized expert in applied business analytics, machine learning, predictive analytics, and operational data mining, with deep expertise in applying architectural approaches such as distributed non-relational data stores (NoSQL), stream processing, in-database analytics, event-driven architectures, and specialized appliances to real-time scoring, real-time optimization, and similar applications of analytics at scale.
John received a BA in Economics from the University of New Hampshire and an MS in Computer Science, focused on distributed systems, from DePaul University.

A leading expert on big data architecture and Hadoop, Stephen brings over 20 years of experience creating scalable, high-availability data and application solutions. A veteran of WalmartLabs, Sun, and Yahoo!, Stephen leads data architecture and infrastructure.

An Apache Cassandra committer and PMC member, Gary specializes in building distributed systems. His recent experience includes creating an open source high-volume metrics processing pipeline and building out several geographically distributed API services in the cloud.