AM4: Apache Spark Workshop

Apache Spark is emerging as the next-generation Big Data platform. Compared to Hadoop MapReduce, Spark is easier to use and offers a more flexible compute model. Much of Spark’s popularity comes from its strength in in-memory computation, which can be many times faster than MapReduce. Spark’s stream processing capability also makes it a very good fit for processing data from connected devices, also known as the Internet of Things (IoT).

This workshop will introduce Spark concepts and teach students how to use Spark through hands-on labs. The intended audience is developers and technical architects.

Students will be provided with a Spark environment running in the cloud for the duration of the class. Students who want to keep working with Spark on their own afterward are strongly encouraged to bring their own Spark environment. Spark can be downloaded from http://spark.apache.org/

Sujee Maniyam has been consulting on and teaching Hadoop, NoSQL, and cloud technologies for large enterprises (such as Intuit and Hitachi) and startups. Sujee is a co-author of the open source Hadoop book 'Hadoop Illuminated'. He is active in the Hadoop and open source communities and is a contributor to Hadoop and other open source projects. He runs a Big Data meetup called 'Big Data Gurus' in San Jose. Sujee has presented at a variety of conferences and meetups. He is a co-founder and principal at Elephant Scale, an expert consulting and training company for Big Data technologies.