General Information
Course: Big Data Essentials (2 credits).
Lecturer: Yanfei Kang.
Language: Taught in Chinese. Materials are in English.
Office hours: Questions about this course can be asked during or after each lecture, or via email.
Lecture notes: available at https://yanfei.site/teaching/bde.
Textbook: in Chinese. Additional reference materials will be provided for each lecture.
More on lecture notes
Reproducible lecture notes.
Based on Jupyter Notebook, a web-based interactive environment that combines code, text, and data.
You can save the *.ipynb files locally.
You can also edit and build your own lecture notes.
You will find both interactive and static slides.
Flexible as it is!
Unit objectives
- learn about various features of Big Data and its sources;
- program with Linux command lines;
- learn git for version control;
- learn statistical modeling and web scraping with Python;
- have an overview of the most popular Big Data technologies: Hadoop, Hive, and Spark;
- work through case studies to build a fundamental understanding of real-world big data problems.
Course contents
- Introduction to Big Data
- Linux
  - Using Linux as a Data Scientist
- Git
  - Using Git for Version Control
- Python
  - Statistical Modeling with Python
  - Web Scraping with Python
- Distributed computing
  - Introduction to Parallel Computing
  - Exploring the World of Hadoop
  - Hadoop Streaming
  - Hive
  - Spark
- Guest lecture with case studies