Big Data Essentials

L0: Course Information





Yanfei Kang
yanfeikang@buaa.edu.cn
School of Economics and Management
Beihang University
http://yanfei.site

General Information

  • Course: Big Data Essentials (2 credits).

  • Lecturer: Yanfei Kang.

  • Language: Taught in Chinese. Materials are in English.

  • Office hours: Questions about this course can be asked during or after each lecture, or via email.

  • Lecture notes: available at https://yanfei.site/teaching/bde.

  • Textbook: in Chinese. Additional reference materials will be given in each lecture.

More on lecture notes

  • Reproducible lecture notes.

  • Based on Jupyter Notebook, a web-based interactive environment that combines code, text, and data.

  • You can save the *.ipynb files to your own computer.

  • You can also edit and build your own lecture notes.

  • You will find both interactive and static slides.

  • Use them however suits you best!

Unit objectives

  1. learn about the various features of Big Data and its sources;
  2. program with the Linux command line;
  3. learn parallel computing using your own computer;
  4. get an overview of the most popular Big Data technologies: Hadoop, Hive, and Spark;
  5. work through case studies to build a fundamental understanding of real-world Big Data problems.

Course contents

  • Introduction to Big Data
  • Linux
    • Using Linux as a Data Scientist
  • Parallel computing
  • Distributed computing
    • Exploring the World of Hadoop
    • Hadoop HDFS
    • Hadoop MapReduce
    • Hive
    • Spark
  • Student presentations