Big Data Essentials

Lecture notes

  • Course Introduction (slides)
  • Introduction to Big Data (slides)
  • Linux
    • Using Linux as a Data Scientist (slides)
  • Parallel Computing (slides)
  • Distributed computing
    • Exploring the World of Hadoop (slides)
    • Hadoop HDFS (slides)
    • Hadoop MapReduce (slides)
    • MapReduce with Hadoop Streaming (slides)
    • Hive (slides)
    • Introduction to Spark (slides)
    • Data Processing with Spark (slides)
    • Machine Learning with Spark (slides)
    • Text Processing with Spark (slides)
  • Student presentations

Note: The interactive slides are built on Jupyter Notebook and the Aliyun E-MapReduce service, and are hosted on the Aliyun server. They are available only during the Autumn semester.