Big Data Essentials

Lecture notes

  • L0: Course Introduction (slides)
  • L1: Introduction to Big Data (slides)
  • Linux
    • L2: Using Linux as a Data Scientist (slides)
  • Git
  • Python
  • Distributed computing
    • L6: Introduction to Parallel Computing (slides)
    • L7: Exploring the World of Hadoop (slides)
    • L8: Hadoop Streaming (slides)
    • L9: Hive (slides)
    • L10: Introduction to Spark (slides)
    • L11: Data Processing with Spark (slides)
    • L12: Machine Learning with Spark (slides)
    • L13: Text Processing with Spark (slides)
    • L14: Guest lecture on advanced topics

Note: The interactive slides are built on Jupyter Notebook and the Aliyun E-MapReduce service, and are hosted on the Aliyun server. They are available only during the autumn semester.