# Data structure and algorithm(1)

Notes from Udacity Data Science Interview Preparation lesson.

- big O Notation

http://bigocheatsheet.com/

## Data Structure

### The difference between Array and list

- Array has index
- List has order but no indices
- Python list is built as an array

### linked lists

- will tell you where is the next list
- the same as array is that store the value information
- the difference with array is that they store the different information. array store the index inforation, and linked list store a reference to the next element in the list.

when coding：

it has two parts in one node. one of them is value of linked list(self.value), another is pointer(self.next) which point to next list.

1 | lass Node(): |

reference:

python数据结构——链表的实现 http://www.cnblogs.com/linxiyue/p/3551633.html

### stack

- like stack pancake, list-based data structure. But different from linked_list and Array
- LIFO: last in first out.
- PUSH:insert； POP：remove

### Queues

- FIFO: first in first out; list-based
- dequeues or double-end queues can dequeue and enqueue in both head and tail
- priority queue. dequeue with highest priority(1). if have same priority, dequeue the oldest one.

### sets

- the difference from list is that it don’t have orders, but can’t have repeated elements

### Maps(dictionary)

- like dictionary. Maps = <key, value>. a group of keys is a set.
- Dictionaries are wonderfully flexible—you can store a wide variety of structures as values. You store another dictionary or a list:

1 | Cities to add: |

### Hashing

Using a data structure that employs a hash function allows you to look up in constant time.

- using the reminder as index, store this value at this place
- if have collisions, two ways: 1, change hash function; 2, store all the value into one collection, the array will turn to ‘bucket’.
- in bucket, still can apply hash function
- For the load factor, you should divide the number of values by the number of buckets. $Load Factor = Number of Entries / Number of Buckets$

https://classroom.udacity.com/courses/ud944/lessons/7118294395/concepts/79336691870923

### Hash Maps

- In interview, you will be asked about creating "Hash table" to test your understand hashing. also need to know the upside and downside of a hash function### Hash for String Keys

- string turn into ASCLL, A=65, B=66,C=67

1 | example: Hash Value = (ASCII Value of First Letter * 100) + ASCII Value of Second Letter |

## Trees_Data Structure

- tree is an extension of linked-list
- the following figure, different color has its meaning.

### Tree Traversal

DFS(depth first search):

pre-order

in-order

post-order (right first)

Q&A: D, F, E, B, C, A

BFS(Breadth first search): check the same level node first

Search and Delete, insert. perfect tree, each node has two child

### Binary search tree(BST)

1 | class BinaryTree(object): |

rule to BSTS: right of node is larger than left

unbalanced BSTS

apply the rule to BSTS and pre-order Traversal. Search and insert

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31class BST(object):

def __init__(self, root):

self.root = Node(root)

def insert(self, new_val):

self.insert_helper(self.root, new_val)

def insert_helper(self, current, new_val):

if current.value < new_val:

if current.right:

self.insert_helper(current.right, new_val)

else:

current.right = Node(new_val)

else:

if current.left:

self.insert_helper(current.left, new_val)

else:

current.left = Node(new_val)

def search(self, find_val):

return self.search_helper(self.root, find_val)

def search_helper(self, current, find_val):

if current:

if current.value == find_val:

return True

elif current.value < find_val:

return self.search_helper(current.right, find_val)

else:

return self.search_helper(current.left, find_val)

return False

### Heaps

- also a kind of tree has its own rule
- Max heaps and min heaps. max: parent always big than child; min vice versa. link
- heaps can have several trees, not like binary
- heaps binary trees search big 0 time is $O(n)$
- heapify: insert in the child leaf first, then compare it to the parent, if bigger then swap their value. The big O time is $O(logn)$
- heap implementation: heap often store as sorted array. as shown below, array only store value and index, but tree will store bunch of pointers, which mean array save space.

### self-balanced tree(red-black)

definition: minimize the level of tree

**Red Black-Tree**an extension of Binary tree. 5 rules- every node is red or black
- Every leaf (NULL) is black. Every node in your tree doesn’t otherwise have two leaves, must have null children(black)
- If a node is red, then both its children are black.
- the root node is black
- Every simple path from a node to a descendant leaf contains the same number of black nodes.

still watching: https://classroom.udacity.com/courses/ud944/lessons/7122604912/concepts/78867246220923

https://www.youtube.com/watch?v=O5Yl-m0YbVA

## Graphs(Networks)

- tree is a type of graph. - Node, edge### directions and cycles

directions like this figure show

avoid graphs has cycles, infinite loop

DAG: directed Acyclic(no cycles) Graph. type show up often

### connectivity

- right seems has more connection than left one. - remove one connection in left group, whole group will destoryed(disconnected)#### definition

disconnected:

Disconnected graphs are very similar whether the graph’s directed or undirected—there is some vertex or group of vertices that have no connection with the rest of the graph.weakly connected:

A directed graph is weakly connected when only replacing all of the directed edges with undirected edges can cause it to be connected. Imagine that your graph has several vertices with one outbound edge, meaning an edge that points from it to some other vertex in the graph. There’s no way to reach all of those vertices from any other vertex in the graph, but if those edges were changed to be undirected all vertices would be easily accessible.connected:

Here we only use “connected graph” to refer to undirected graphs. In a connected graph, there is some path between one vertex and every other vertex.Strongly Connected:

Strongly connected directed graphs must have a path from every node and every other node. So, there must be a path from A to B AND B to A.

#### Graph Representations

vertex orbject can have a list of edges it’s connected to and vice versa.

There are other graph represent ways:

Edge list, could be 2D, and 3D

Adjacency list, store the node that adjacent to index

Adjacency Matrix, store a matrix, if node 0 connect to node 1, then show 1, otherwise show 0 in matrix.

code from udacity link

1 | class Node(object): |

#### Graph Traversal

- just like tree traversal
- DFS(depth first search)
- BFS(breadth first search)

## Algorithm_Searching and Sorting

code visualization: https://visualgo.net/en

### Binary Search

efficiency: $O(log(n))$

1

2

3

4

5

6

7

8

9

10

11

12def binarySearch(listData, value):

low = 0

high = len(listData)-1

while (low <= high):

mid = (low+high)/2

if (listData[mid] = value):

return mid

elif(listData[mid] < value):

low = mid+1

else:

high = mid-1

return -1Resource: https://www.cs.usfca.edu/~galles/visualization/Search.html

### Recursion

- 1, call it self; 2, base case(get out the loop); 3, alter the input parament
1

2

3

4

5the Fibonacci Sequence.

def get_fib(position):

if position == 0 or position == 1:

return position

return get_fib(position - 1) + get_fib(position - 2)

### Sorting

- in plce or not, in place: lest space; not: lest time. Tradeoff

#### bubble Sort(in place)

Big O is $(n-1)(n-1)=n^2$, best case is $O(n)$, the array is already sorted.

reference: https://en.wikipedia.org/wiki/Bubble_sort

#### Merge Sort

divide the arrray into one by one, then combine every two element except the first element. Then the combined two element compare first, then compare to others just as following images.

Merge sort efficiency: $O(n*logn)$

#### Quick Sort

- random choose one of element, then compare and move as following figure show. - efficiency: the worst case is O(n^2), the average and the best case is $O(nlogn)$

1 | for each (unsorted) partition |

reference: http://visualgo.net/en/sorting/