Lecture notes
- Course Introduction (slides)
- Introduction to Big Data (slides)
- Linux
  - Using Linux as a Data Scientist (slides)
  - Parallel Computing (slides)
- Distributed Computing
  - Exploring the World of Hadoop (slides)
  - Hadoop HDFS (slides)
  - Hadoop MapReduce (slides)
  - MapReduce with Hadoop Streaming (slides)
  - Hive (slides)
  - Introduction to Spark (slides)
  - Data Processing with Spark (slides)
  - Machine Learning with Spark (slides)
  - Distributed Linear Regression using Gradient Descent (slides)
  - Text Processing with Spark (slides)
- Student Presentations
Note: The interactive slides are built with Jupyter Notebook and the Aliyun E-MapReduce service. They are hosted on the Aliyun server and are available only during the autumn semester.