Big Data with Java Lambdas!
The course is 6 hours long and is equivalent to a 3-day training course.
Having problems? Check the errata.
Introduction 16m 56s A brief overview of Spark and some of the jargon terms you'll be encountering. |
Getting Started 21m 35s Let's get Spark "installed" - it's just a Maven dependency. |
Reduces 14m 19s Reduces are fundamental transformations. Here we'll do a very basic reduce to establish the idea. |
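To establish the idea before the chapter, here is a minimal sketch of a reduce. It uses plain `java.util.stream` as a stand-in so that it runs without a Spark dependency; it is not the course's own code. The class name `ReduceSketch` and the sample values are hypothetical. In Spark's Java API the analogous call is `JavaRDD.reduce`, which likewise takes a two-argument function.

```java
import java.util.List;

final class ReduceSketch {
    // Folds the whole list into one value by repeatedly applying a
    // two-argument function: (a, b) -> a + b.
    static int sum(List<Integer> values) {
        return values.stream().reduce(0, (a, b) -> a + b);
    }

    public static void main(String[] args) {
        // Hypothetical sample data, chosen only for illustration.
        List<Integer> inputData = List.of(35, 12, 90, 20);
        System.out.println(sum(inputData));
    }
}
```

The function passed to a reduce must return the same type it receives, because each intermediate result is fed back in as an input.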
Update - problems with NotSerializableExceptions? 6m 28s If you hit a NotSerializableException in the next chapter, "Mapping" (or in any later chapter), it is because your CPU architecture is sophisticated enough for Spark to treat each CPU as a node in a cluster, and this causes a crash with System.out.println. See this video for a simple workaround. |
Mapping 17m 45s Mapping allows you to transform the RDD from one form to another. |
Tuples 18m 12s Commonly used in Scala, Tuples appear everywhere in the Spark Core API. We can use them in Java, but they are a bit awkward. |
PairRDDs 41m 30s A PairRDD is a key/value representation of a dataset. |
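As a rough illustration of the key/value idea, the sketch below counts log lines per level. It uses plain `java.util.stream` as a stand-in rather than Spark, so it runs without any dependency. The class name `PairCountSketch`, the method `countByLevel`, and the `LEVEL: message` line format are assumptions made for this example. In Spark's Java API the corresponding steps would be `mapToPair` followed by `reduceByKey`.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class PairCountSketch {
    // Turns each line into a (level, 1) pair, then merges the values of
    // pairs that share a key using Long::sum.
    static Map<String, Long> countByLevel(List<String> logLines) {
        return logLines.stream()
                .collect(Collectors.toMap(
                        line -> line.split(":")[0], // key: text before the first colon
                        line -> 1L,                 // value: one occurrence
                        Long::sum));                // merge function for equal keys
    }

    public static void main(String[] args) {
        // Hypothetical log lines, assumed to look like "LEVEL: message".
        List<String> logLines = List.of(
                "WARN: disk nearly full",
                "ERROR: job failed",
                "WARN: slow response");
        System.out.println(countByLevel(logLines));
    }
}
```

Unlike a map, a key/value dataset may hold the same key many times, which is why a merge function is needed to combine the values.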
FlatMap and Filtering 14m 46s FlatMap looks complicated, but it's a simple transformation. We'll also see how to filter. |
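A minimal sketch of both transformations: each input line is expanded into zero or more words (the flat map), and then short words are dropped (the filter). It uses plain `java.util.stream` as a stand-in so that it runs without Spark. The class name `FlatMapFilterSketch`, the method `longWords`, and the length threshold are hypothetical. The Spark Java API offers the analogous `JavaRDD.flatMap` and `JavaRDD.filter`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class FlatMapFilterSketch {
    // flatMap: one line in, many words out, flattened into a single stream.
    // filter: keeps only the words for which the predicate returns true.
    static List<String> longWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .filter(word -> word.length() > 1)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical input lines, chosen only for illustration.
        System.out.println(longWords(List.of("a big cat", "I ran")));
    }
}
```

A plain map always produces exactly one output per input; a flat map is what you reach for when one input may produce several outputs or none.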
Reading Files 13m 26s We can read local files, or files from big data file systems such as S3 or HDFS. |
Keyword Ranking 41m 47s A major exercise: we'll automatically generate keywords for training courses based on their subtitle files. |
Sorts and Coalesces 28m 44s Sorts are often misunderstood, and we'll address that here. We'll also look at what Coalesce is used for (and when it shouldn't be used). |
Deploying to EMR 40m 42s We'll now deploy to a live cluster. Spark can deploy to Hadoop YARN clusters, or you can build a standalone cluster. Here we use Amazon EMR. Even if you're not using EMR, do watch this chapter, as there is a lot to learn from running on real hardware. |
Joins 27m 27s One last transformation type on the course - how to do Inner, Outer, Full and Cartesian Joins. |
Big Data Big Exercise 51m 35s A chance for you to practice everything - a real "course ranking" process we run here at VirtualPairProgrammers. |
Performance 80m 8s A deeper look into the internals of Spark. |