Lecture notes
- L0: Course Introduction (interactive slides | static slides)
- L1: Introduction to Big Data (interactive slides | static slides)
- Linux
- L2: Using Linux as a Data Scientist (interactive slides | static slides)
- Python
- L3: Introduction to Python (interactive slides | static slides)
- L4: Statistical Modeling with Python (interactive slides | static slides)
- L5: Web Scraping with Python (interactive slides | static slides)
- Hadoop
- L6: Introduction to Parallel Computing (interactive slides | static slides)
- L7: Exploring the World of Hadoop (interactive slides | static slides)
- L8: Hadoop Streaming (interactive slides | static slides)
- L9: Hadoop Hive (interactive slides | static slides)
- L10: Introduction to Spark (interactive slides | static slides)
- L11: Data Processing with Spark (interactive slides | static slides)
- L12: Machine Learning with Spark (interactive slides | static slides)
- L13: Text Processing with Spark (interactive slides | static slides)
- L14: SparkR
Note: Interactive slides are based on Jupyter Notebook and the Aliyun E-MapReduce service. They are available only in the 2020 Autumn semester.