Big Data Essentials

L0: Course Information





Yanfei Kang
yanfeikang@buaa.edu.cn
School of Economics and Management
Beihang University
http://yanfei.site

General Information

  • Course: Big Data Essentials (2 credits).

  • Lecturer: Yanfei Kang.

  • Language: Taught in Chinese. Materials are in English.

  • Office hours: Questions about this course can be asked during or after each lecture, or by email.

  • Lecture notes: available at https://yanfei.site/teaching/bde2020.

  • Textbook: in Chinese. Additional reference materials will be given in each lecture.

More on lecture notes

  • Reproducible lecture notes.

  • Based on Jupyter notebooks, which are edited in a web-based interactive environment for code, data, and text.

  • You can download the *.ipynb files to your own computer.

  • You can also edit and build your own lecture notes.

  • Both interactive and static slides are provided on the course website.

  • As flexible as it gets!
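
Since the notes are distributed as *.ipynb files that can be saved and edited, it may help to see that such a file is plain JSON. The following is only a minimal sketch, assuming the nbformat 4.4 layout; the file name `my_notes.ipynb`, the cell contents, and the helper names `make_notebook` and `count_cells` are hypothetical and not part of the course materials.

```python
import json
import tempfile
from pathlib import Path


def make_notebook(cells):
    """Build a minimal notebook dict from (cell_type, source) pairs.

    Assumes the nbformat 4.4 layout; real lecture notebooks carry
    more metadata than this sketch does.
    """
    nb_cells = []
    for cell_type, source in cells:
        cell = {"cell_type": cell_type, "metadata": {}, "source": source}
        if cell_type == "code":
            # Code cells additionally record an execution counter and outputs.
            cell["execution_count"] = None
            cell["outputs"] = []
        nb_cells.append(cell)
    return {"cells": nb_cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


def count_cells(path):
    """Read an .ipynb file as JSON and count its cells by type."""
    nb = json.loads(Path(path).read_text(encoding="utf-8"))
    counts = {}
    for cell in nb["cells"]:
        counts[cell["cell_type"]] = counts.get(cell["cell_type"], 0) + 1
    return counts


# Hypothetical content for a student's own notes.
notebook = make_notebook([
    ("markdown", "# My notes on L0"),
    ("code", "print('hello, big data')"),
    ("markdown", "Questions to ask by email."),
])

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "my_notes.ipynb"
    path.write_text(json.dumps(notebook, indent=1), encoding="utf-8")
    summary = count_cells(path)

print(summary)
```

A file written this way should open in Jupyter like any downloaded notebook, and `count_cells` can be pointed at a saved lecture file to get a quick overview of its contents.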

Unit objectives

  1. learn about the features of Big Data and its sources;
  2. program with the Linux command line;
  3. learn Python, including web scraping and natural language processing;
  4. gain an overview of the most popular Big Data technology, the Hadoop platform;
  5. work through case studies to build a fundamental understanding of real-world Big Data problems.

Course contents

  • Introduction to Big Data
  • Linux
    • Using Linux as a Data Scientist
  • Python
    • Introduction to Python
    • Web Scraping with Python
    • Natural Language Processing with Python
  • Hadoop
    • Introduction to Parallel Computing
    • Parallel Computing in R
    • Parallel Computing in Python
    • Hadoop Ecosystem
    • Hadoop HDFS
    • Hadoop Hive
    • Distributed Computing based on MapReduce with Case Studies
  • Guest lecture with case studies