School of Economics and Management
Beihang University
http://yanfei.site

General Information

  • Course: Big Data Essentials (2 credits)
  • Lecturer: Yanfei Kang
  • Tutor: Xixi Li
  • Language: Taught in Chinese. Materials are in English.
  • Reception hours: Questions about this course can be asked during or after each lecture, or by email.
  • Lecture notes: available at https://yanfei.site/teaching/bde.

Unit objectives

  1. learn about the various characteristics of Big Data and its sources;
  2. program with the Linux command line;
  3. learn Python, including web scraping and natural language processing;
  4. gain an overview of the most popular Big Data technology, the Hadoop platform;
  5. work through case studies to build a fundamental understanding of real-world big data problems.

Course contents

  • Introduction to Big Data
  • Linux
    • Using Linux as a Data Scientist
  • Python
    • Introduction to Python
    • Web Scraping with Python
    • Natural Language Processing with Python
  • Hadoop
    • Introduction to Parallel Computing
    • Parallel Computing in R
    • Parallel Computing in Python
    • Hadoop Ecosystem
    • Hadoop HDFS
    • Hadoop Hive
    • Distributed Computing based on MapReduce with Case Studies
  • Guest lecture with case studies