Days with Data Workshop

Introduction to Big Data and Apache Spark

Apache Spark (http://spark.apache.org) is currently the fastest-growing project in the Big Data ecosystem. It makes processing large data sets faster and easier than existing solutions do. This workshop will jump-start your work with Spark and help you make the transition from analyst or developer to Big Data Engineer or Data Scientist. It focuses on the practical aspects of working with Apache Spark, exploring real-life data sets.

You will learn:

  • the basics of working with data at Big Data scale,
  • what RDDs and DataFrames are,
  • about the biggest revolution in Big Data systems since Apache Hadoop,
  • the pros and cons of using Big Data technologies in practice.

During the workshop we will work in Python. For best results, you should have some knowledge of at least one of the following: SQL, Python, Java, or another programming or scripting language.

Course Agenda
  1. Introduction to Big Data
    • Definition
    • What is Big Data?
    • History of Big Data
    • Big Data problems
  2. Apache Spark
    • Introduction
    • History
    • Spark vs Hadoop
    • Resilient Distributed Datasets (RDDs)
    • Architecture
    • Operation variants
    • Administration
  3. Spark Core
    • Introduction
    • Java vs Scala vs Python
    • Connecting to cluster
    • Dataset distribution
    • RDD operations
    • Shared variables
    • Execution and testing
  4. Spark SQL
    • Introduction
    • Spark SQL vs Hive
    • Basic operation
    • Data and schema
    • Queries
    • Hive integration
    • Execution and testing
Testimonials

“The trainer is very competent with good organisational skills.”

“Excellent workshop!”

“Lots of practical exercises and useful information.”

Minimum Requirements
  • basic knowledge of Java, SQL, bash, Python (or other scripting languages)
  • device: Intel i5 or newer/similar processor, more than 6 GB of RAM

This is a BYOD (bring your own device) workshop, so remember to bring your own laptop.

The workshop will run for 8 hours, from 9 AM until 5 PM. There will be a few coffee breaks and a one-hour lunch break (on your own).

The workshop will be conducted in English.

Trainer - Jakub Nowacki

Jakub is a graduate of the University of Bristol, where he obtained a PhD in Engineering Mathematics. On a daily basis he applies his analytical and development skills in software development. He is mostly interested in distributed processing and analysis of big data sets. Jakub originally has a C/C++ background but currently works mostly in the JVM and Python world.

What makes us unique?
  • Over 5500 course participants
  • 98% satisfied clients
  • 9 years' experience
  • Unique offer of over 300 specialised training courses
  • Over 100 active coaches and consultants
Date and Time

Friday, Feb 02, 2018

09:00 – 17:00 CET

Location

Berlin, Germany

Büroservice Dorotheenstadt/Internationales Handelszentrum, Friedrichstr. 95

Register

150 EUR per participant

Ticket price includes

  • Full-day workshop
  • Coffee & Tea
  • Wi-Fi access
  • Workshop attendance certificate


Our Partners

Sages
Stacja IT
