# Big Data Essentials

#### L3: Introduction to Python

Yanfei Kang
yanfeikang@buaa.edu.cn
School of Economics and Management
Beihang University
http://yanfei.site

• A widely used general-purpose, high-level programming language.
• Allows programmers to express concepts in fewer lines of code than C++ or Java.

# Features

The core philosophy of the language:

• Beautiful is better than ugly.
• Explicit is better than implicit.
• Simple is better than complex.
• Complex is better than complicated.

Python enables programs to be written compactly and readably:

• the high-level data types allow you to express complex operations in a single statement.
• statement grouping is done by indentation instead of beginning and ending brackets.
• no variable or argument declarations are necessary.
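
As a small illustration of these three points, here is a minimal sketch. The list of words and the variable names are made up for this example.

```python
# No declarations: a name comes into existence when it is assigned.
words = ["big", "data", "essentials"]

# One statement builds a dictionary that maps each word to its length.
lengths = {word: len(word) for word in words}

# Indentation alone marks the body of the loop; no brackets are needed.
for word, size in lengths.items():
    print(word, size)
```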

# Using Python

The Python interpreter is usually installed as /usr/bin/python3 on those machines where it is available.

• To start a Python interpreter, type the command in your terminal: python3.

• To terminate the Python interpreter, type an end-of-file character (Control-D on Linux, Control-Z on Windows) at the primary prompt. If that doesn’t work, you can exit the interpreter by typing the following command: quit() or exit().

# Executing Python scripts

• Write your Python script and save it with the .py extension, e.g. hello.py.

• The contents of the script look like this:

#!/usr/bin/python3
print('Hello World')

• In your terminal, make the script executable: chmod +x hello.py.
• Run the script in your terminal: ./hello.py.

Note: The line #!/usr/bin/python3 should appear at the very beginning of your file.

# Text Editors/IDEs

To edit Python code, you just need a handy text editor. Many are available, from lightweight editors to full IDEs.

# Getting help

• Help is available in Python sessions using help(function).

• Some functions (and modules) have very long help files. When using IPython, these can be paged using the command ?function or function?, so that the text can be scrolled using page up and page down, with q to quit. ??function or function?? can be used to display the entire function, including both the docstring and the code.
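
The text that help() displays comes from the object's documentation string, which can also be read programmatically. The following is a minimal sketch; the function area is hypothetical and exists only for this example.

```python
import inspect


def area(width, height):
    """Return the area of a rectangle."""
    return width * height


# inspect.getdoc returns the same text that help(area) would display.
summary = inspect.getdoc(area)

# Built-in functions carry documentation too.
builtin_summary = inspect.getdoc(len)

print(summary)
```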

# Libraries

• Python has a large standard library, commonly cited as one of Python's greatest strengths, providing tools suited to many tasks.

• Modules for creating graphical user interfaces, connecting to relational databases, generating pseudorandom numbers, doing arithmetic with arbitrary-precision decimals, manipulating regular expressions, and doing unit testing are also included.

• As of September 2020, the Python Package Index, the official repository of third-party software for Python, contains more than 261,000 packages offering a wide range of functionality, including:

• graphical user interfaces, web frameworks, multimedia, databases, networking and communications.
• test frameworks, automation and web scraping, documentation tools, system administration.
• scientific computing, text processing, image processing.
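
To give a feel for the standard library mentioned above, the sketch below touches three of its modules: random, decimal and re. The seed value and the sample sentence are arbitrary choices for this example.

```python
import random
import re
from decimal import Decimal

# Pseudorandom numbers: seeding makes the sequence reproducible.
random.seed(42)
roll = random.randint(1, 6)  # an integer between 1 and 6, inclusive

# Decimal arithmetic avoids the rounding error of binary floats.
exact = Decimal("0.1") + Decimal("0.2")

# Regular expressions: extract every run of digits from a string.
numbers = re.findall(r"\d+", "L3 was taught in 2020")

print(roll, exact, numbers)
```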

# Using Python as a Calculator

In [14]:
3 + 2 + 4

Out[14]:
9
In [4]:
5 + 4*3

Out[4]:
17
In [1]:
8/5.0 # int / float -> float

Out[1]:
1.6
In [2]:
8//5.0 # explicit floor division discards the fractional part

Out[2]:
1.0
In [3]:
5**2

Out[3]:
25

The equal sign (=) is used to assign a value to a variable. An assignment itself displays no result; evaluating the variable afterwards shows its value:

In [10]:
a = 3
b = 5
c = a + b
c

Out[10]:
8

In interactive mode, the last printed expression is assigned to the variable _. This means that when you are using Python as a desk calculator, it is somewhat easier to continue calculations.

In [16]:
100/3.0

Out[16]:
33.333333333333336
In [18]:
_

Out[18]:
33.333333333333336
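
The operators from this section combine naturally in a script. The sketch below converts a number of minutes into hours and minutes; the value 135 is an arbitrary example. Note that the variable _ is only available in interactive mode, so a script has to store results in named variables.

```python
minutes = 135

hours = minutes // 60       # floor division gives the whole hours
remainder = minutes % 60    # the remainder gives the leftover minutes

# divmod returns both results at once as a pair.
both = divmod(minutes, 60)

print(hours, remainder, both)
```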

# Strings

• Python can also manipulate strings, which can be expressed in several ways. They can be enclosed in single quotes ('...') or double quotes ("...") with the same result.
In [7]:
LastName = "Kang"
FirstName = "Yanfei"

In [19]:
print("Hello\nWorld!")

Hello
World!

• If you don't want characters prefaced by \ to be interpreted as special characters, you can use raw strings by adding an r before the first quote:
In [15]:
print(r"Hello \n World!")

Hello \n World!


Strings can be concatenated (glued together) with the + operator, and repeated with *:

In [7]:
"I " + 'L' + 'o'*5  + 've' + ' you'

Out[7]:
'I Looooove you'
In [8]:
n = 10
"G"+"o"*n+"gle"

Out[8]:
'Goooooooooogle'

The built-in function len() returns the length of a string:

In [1]:
len("Yanfei Kang")

Out[1]:
11

Two or more string literals (i.e. the ones enclosed between quotes) next to each other are automatically concatenated. This feature is particularly useful when you want to break long strings:

In [2]:
"Yanfei" "Kang"

Out[2]:
'YanfeiKang'
In [3]:
print("Hi, my name is Yanfei Kang."
" And I am from Beijing.")

Hi, my name is Yanfei Kang. And I am from Beijing.


Strings can be indexed (subscripted), with the first character having index 0. There is no separate character type; a character is simply a string of size one:

In [23]:
Name = "Yanfei Kang"
Name[0]

Out[23]:
'Y'
In [24]:
Name[-1]

Out[24]:
'g'
In [26]:
Name[-2]

Out[26]:
'n'

In addition to indexing, slicing is also supported. While indexing is used to obtain individual characters, slicing allows you to obtain a substring:

In [33]:
Name[0:6]  # Remember that [0:6] mathematically means [0, 6).

Out[33]:
'Yanfei'
In [34]:
Name[6:11]

Out[34]:
' Kang'
In [35]:
Name[:6]

Out[35]:
'Yanfei'
In [36]:
Name[6:]

Out[36]:
' Kang'
In [37]:
Name[-4:]

Out[37]:
'Kang'
In [38]:
Name[6:100]

Out[38]:
' Kang'
In [16]:
Name2 = "Yanfei.Kang"
Name2.lower().title().split(".")

Out[16]:
['Yanfei', 'Kang']
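
String methods return new strings, so they can be chained as in the cell above. The sketch below performs the same chain one step at a time; the intermediate variable names are introduced only for this illustration.

```python
name2 = "Yanfei.Kang"

lowered = name2.lower()      # all characters in lower case
titled = lowered.title()     # first letter of each word capitalised
parts = titled.split(".")    # split into a list at every "."

# join is the inverse of split: glue the parts together with a space.
rejoined = " ".join(parts)

print(lowered, titled, parts, rejoined)
```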

# Lists

• Python knows a number of compound data types, used to group together other values.
• The most versatile is the list, which can be written as a list of comma-separated values (items) between square brackets.
• Lists might contain items of different types, but usually the items all have the same type.
In [39]:
values = [1,5,7,9,12]

In [20]:
len(values)

Out[20]:
5
In [30]:
values[0]

Out[30]:
1
In [31]:
values[-2:]

Out[31]:
[9, 12]

Lists also support operations like concatenation (+) and repetition (*):

In [35]:
values + ["Hello","World"]*3

Out[35]:
[1,
5,
7,
9,
12,
'Hello',
'World',
'Hello',
'World',
'Hello',
'World']

Lists are a mutable type, i.e. it is possible to change their content:

In [22]:
values = [1,2,3,4,67,22]
values

Out[22]:
[1, 2, 3, 4, 67, 22]
In [23]:
values[2] = 1000
values

Out[23]:
[1, 2, 1000, 4, 67, 22]

You can also add new items at the end of the list by using the append() method:

In [24]:
values.append(9999)
values

Out[24]:
[1, 2, 1000, 4, 67, 22, 9999]

Assignment to slices is also possible, and this can even change the size of the list or clear it entirely:

In [25]:
values[2:4] = [2,3,4]
values

Out[25]:
[1, 2, 2, 3, 4, 67, 22, 9999]
In [27]:
values[:] = []
values
len(values)

Out[27]:
0
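
Slice assignment can replace, remove or insert items, depending on the lengths of the slice and of the new values. The following sketch shows the three cases on a made-up list of letters.

```python
letters = ["a", "b", "c", "d", "e"]

# Replace two items with two new ones: the length stays the same.
letters[1:3] = ["B", "C"]
after_replace = list(letters)

# Assign an empty list to a slice: the items are removed.
letters[1:3] = []
after_remove = list(letters)

# Assign to an empty slice: the new items are inserted at that position.
letters[1:1] = ["x", "y"]
after_insert = list(letters)

print(after_replace, after_remove, after_insert)
```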

# Built-in Functions

The Python interpreter has a number of functions built into it that are always available. The built-in functions of Python 3 are listed below in alphabetical order, reading across each row (newer versions may add a few more). Use e.g. help(abs) to see the function help.

abs()           all()           any()           ascii()
bin()           bool()          breakpoint()    bytearray()
bytes()         callable()      chr()           classmethod()
compile()       complex()       delattr()       dict()
dir()           divmod()        enumerate()     eval()
exec()          filter()        float()         format()
frozenset()     getattr()       globals()       hasattr()
hash()          help()          hex()           id()
input()         int()           isinstance()    issubclass()
iter()          len()           list()          locals()
map()           max()           memoryview()    min()
next()          object()        oct()           open()
ord()           pow()           print()         property()
range()         repr()          reversed()      round()
set()           setattr()       slice()         sorted()
staticmethod()  str()           sum()           super()
tuple()         type()          vars()          zip()
__import__()
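
Several built-in functions are often used together. The sketch below summarises a small set of scores; the names and numbers are invented for this example.

```python
scores = [82, 95, 67, 74]
names = ["Ann", "Bo", "Cy", "Di"]

total = sum(scores)
best = max(scores)
average = round(total / len(scores), 1)

# zip pairs each score with its name; sorted orders the pairs by score.
ranking = sorted(zip(scores, names), reverse=True)

# enumerate numbers the items, here starting from 1.
for position, (score, name) in enumerate(ranking, start=1):
    print(position, name, score)
```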

# Import modules

To use a module (like math) that is not loaded by default, import it first:

In [38]:
import math


Then you can use all the mathematical functions inside the math module:

In [39]:
math.exp(0)

Out[39]:
1.0

Alternatively, you can import the module under a shorter alias:

In [40]:
import math as mt
mt.exp(1)

Out[40]:
2.718281828459045

If you just want to import one or two functions from a module, name them explicitly (optionally with an alias):

In [41]:
from math import exp
exp(3)

Out[41]:
20.085536923187668
In [42]:
from math import exp as myexp

myexp(1)

Out[42]:
2.718281828459045

# The if Statement

Perhaps the most well-known statement type is the if statement. For example:

In [8]:
x = -5

if x < 0:
    x = 0
    print('Negative changed to zero')
elif x == 0:
    print('Zero')
elif x == 1:
    print('Single')
else:
    print('More')

Negative changed to zero


Note

• a colon (:) must end each if, elif and else line.
• indentation is very important: it is how Python groups statements, so all statements in the same block must be indented by the same amount.
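
Blocks can be nested by indenting one level further. The following sketch shows an if statement inside another one; the variables and thresholds are made up for this example.

```python
temperature = 31
raining = False

if temperature > 30:
    advice = "stay cool"
    # This inner if belongs to the block above because it is indented.
    if raining:
        advice = advice + " and take an umbrella"
elif temperature < 5:
    advice = "wrap up"
else:
    advice = "enjoy the day"

print(advice)
```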

# The for Statement

In [9]:
words = ['cat', 'window', 'defenestrate']
for j in words:
    print(j, len(j))

print("I am done!")

cat 3
window 6
defenestrate 12
I am done!
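
A for loop can also run over a range of numbers, or over positions and items together. Here is a minimal sketch that reuses the same list of words; the other variable names are chosen for this example.

```python
words = ['cat', 'window', 'defenestrate']

# range(1, 6) yields 1, 2, 3, 4, 5 (the end point is excluded).
total = 0
for k in range(1, 6):
    total = total + k

# enumerate yields each position together with the item.
for index, word in enumerate(words):
    print(index, word)

# Combining for and if: keep track of the longest word seen so far.
longest = ""
for word in words:
    if len(word) > len(longest):
        longest = word

print(total, longest)
```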


# Defining Functions

In [10]:
def fib(n=200):    # write Fibonacci series up to n
    """
    Print a Fibonacci series up to n.

    Usage

    fib(n)
    """ # the function help
    a, b = 0, 1
    while a < n:
        print(a, end="  ")
        a, b = b, a+b

• We can create a function that writes the Fibonacci series up to an arbitrary boundary.

• The first statement of the function body can optionally be a string literal; this string literal is the function’s documentation string, or docstring.

• The first line of the docstring should always be a short, concise summary that ends with a period.

• If there are more lines in the docstring, the second line should be blank.

In [32]:
help(fib)

Help on function fib in module __main__:

fib(n=200)
    Print a Fibonacci series up to n.

    Usage

    fib(n)


In [11]:
fib(200)

0  1  1  2  3  5  8  13  21  34  55  89  144
In [34]:
fib()

0  1  1  2  3  5  8  13  21  34  55  89  144
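
A function that prints its result cannot pass it on to other code. A common variation, sketched below, returns the numbers as a list instead; the name fib_list is chosen for this example.

```python
def fib_list(n=200):
    """Return the Fibonacci numbers smaller than n as a list."""
    result = []
    a, b = 0, 1
    while a < n:
        result.append(a)
        a, b = b, a + b
    return result


series = fib_list(200)
print(series)
print(sum(series))  # the returned list can be used in further computations
```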

# Lab 1

Write a function to find the roots of $ax^2+bx+c=0$.
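
One possible approach, offered as a sketch rather than the expected answer, applies the quadratic formula $x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$. It assumes $a \neq 0$ and uses cmath so that a negative discriminant yields complex roots; the function name quadratic_roots is made up for this example.

```python
import cmath


def quadratic_roots(a, b, c):
    """Return both roots of a*x**2 + b*x + c = 0 as complex numbers."""
    if a == 0:
        raise ValueError("a must be nonzero for a quadratic equation")
    # cmath.sqrt also handles a negative discriminant.
    root_of_discriminant = cmath.sqrt(b**2 - 4*a*c)
    return ((-b + root_of_discriminant) / (2*a),
            (-b - root_of_discriminant) / (2*a))


print(quadratic_roots(1, -3, 2))   # two real roots
print(quadratic_roots(1, 0, 1))    # two complex roots
```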

# Functions with Default Values

The most useful form is to specify a default value for one or more arguments. This creates a function that can be called with fewer arguments than it is defined to allow. For example:

In [12]:
def ask_ok(prompt, retries=4, complaint='Yes or no, please!'):
    while True:
        ok = input(prompt)
        if ok in ('y', 'ye', 'yes'):
            return True
        if ok in ('n', 'no', 'nop', 'nope'):
            return False
        retries = retries - 1
        if retries < 0:
            raise IOError('refusenik user')
        print(complaint)

In [13]:
ask_ok("Do you really want to go?")

Do you really want to go?hao
Do you really want to go?shi
Do you really want to go?yes

Out[13]:
True
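
Default values can be seen without any keyboard input in the sketch below. The function greet is hypothetical; it can be called with one, two or three arguments, and an argument can also be given by name to skip over an earlier default.

```python
def greet(name, greeting="Hello", punctuation="!"):
    """Return a greeting for name."""
    return greeting + ", " + name + punctuation


print(greet("Yanfei"))                     # both defaults are used
print(greet("Yanfei", "Hi"))               # greeting is overridden
print(greet("Yanfei", punctuation="."))    # only punctuation is overridden
```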

# Coding Style

• Use 4-space indentation, and no tabs.
• Wrap lines so that they don't exceed 79 characters.
• Use blank lines to separate functions and classes, and larger blocks of code inside functions.
• When possible, put comments on a line of their own.
• Use docstrings.
• Use spaces around operators and after commas, but not directly inside bracketing constructs: a = f(1, 2) + g(3, 4).
• Name your classes and functions consistently; the convention is to use CamelCase for classes and lower_case_with_underscores for functions and methods. Always use self as the name for the first method argument (see the section "A First Look at Classes" in the official Python tutorial for more on classes and methods).
• Don’t use fancy encodings if your code is meant to be used in international environments. Plain ASCII works best in any case.
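
The short sketch below tries to follow these conventions. The class RunningTotal and the function mean_of are invented for this illustration and are not part of any library.

```python
class RunningTotal:
    """Accumulate numbers and report their sum."""

    def __init__(self):
        self.total = 0

    def add_value(self, value):
        """Add value to the total and return the new total."""
        # Comments sit on a line of their own where possible.
        self.total = self.total + value
        return self.total


def mean_of(values):
    """Return the arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


counter = RunningTotal()
counter.add_value(3)
counter.add_value(4)
print(counter.total, mean_of([1, 2, 3]))
```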