Notes from Udacity Data Science Interview Preparation lesson.

## Data Structure

### The difference between Array and list

• Array has index
• List has order but no indices
• Python list is built as an array

• will tell you where is the next list
• the same as array is that store the value information
• the difference with array is that they store the different information. array store the index inforation, and linked list store a reference to the next element in the list. when coding：
it has two parts in one node. one of them is value of linked list(self.value), another is pointer(self.next) which point to next list.

reference:
python数据结构——链表的实现 http://www.cnblogs.com/linxiyue/p/3551633.html

### stack

• like stack pancake, list-based data structure. But different from linked_list and Array
• LIFO: last in first out.
• PUSH:insert； POP：remove

### Queues

• FIFO: first in first out; list-based
• dequeues or double-end queues can dequeue and enqueue in both head and tail • priority queue. dequeue with highest priority(1). if have same priority, dequeue the oldest one. ### sets

• the difference from list is that it don’t have orders, but can’t have repeated elements

### Maps(dictionary)

• like dictionary. Maps = <key, value>. a group of keys is a set.
• Dictionaries are wonderfully flexible—you can store a wide variety of structures as values. You store another dictionary or a list:

### Hashing

Using a data structure that employs a hash function allows you to look up in constant time.

• using the reminder as index, store this value at this place • if have collisions, two ways: 1, change hash function; 2, store all the value into one collection, the array will turn to ‘bucket’. • in bucket, still can apply hash function • For the load factor, you should divide the number of values by the number of buckets. $Load Factor = Number of Entries / Number of Buckets$
https://classroom.udacity.com/courses/ud944/lessons/7118294395/concepts/79336691870923  ### Hash Maps - In interview, you will be asked about creating "Hash table" to test your understand hashing. also need to know the upside and downside of a hash function

### Hash for String Keys

• string turn into ASCLL, A=65, B=66,C=67 - why use 31, it's more likely convention.

## Trees_Data Structure

• tree is an extension of linked-list
• the following figure, different color has its meaning. ### Tree Traversal

• DFS(depth first search):

• pre-order • in-order • post-order (right first) • Q&A: D, F, E, B, C, A • BFS(Breadth first search): check the same level node first

• Search and Delete, insert. perfect tree, each node has two child ### Binary search tree(BST)

• rule to BSTS: right of node is larger than left • unbalanced BSTS • apply the rule to BSTS and pre-order Traversal. Search and insert

### Heaps

• also a kind of tree has its own rule
• Max heaps and min heaps. max: parent always big than child; min vice versa. link • heaps can have several trees, not like binary
• heaps binary trees search big 0 time is $O(n)$
• heapify: insert in the child leaf first, then compare it to the parent, if bigger then swap their value. The big O time is $O(logn)$ • heap implementation: heap often store as sorted array. as shown below, array only store value and index, but tree will store bunch of pointers, which mean array save space.  ### self-balanced tree(red-black) • definition: minimize the level of tree

• Red Black-Tree an extension of Binary tree. 5 rules

• every node is red or black
• Every leaf (NULL) is black. Every node in your tree doesn’t otherwise have two leaves, must have null children(black)
• If a node is red, then both its children are black.
• the root node is black
• Every simple path from a node to a descendant leaf contains the same number of black nodes. ## Graphs(Networks) - tree is a type of graph. - Node, edge ### directions and cycles

• directions like this figure show

• avoid graphs has cycles, infinite loop • DAG: directed Acyclic(no cycles) Graph. type show up often

### connectivity - right seems has more connection than left one. - remove one connection in left group, whole group will destoryed(disconnected)

#### definition

• disconnected:
Disconnected graphs are very similar whether the graph’s directed or undirected—there is some vertex or group of vertices that have no connection with the rest of the graph.

• weakly connected:
A directed graph is weakly connected when only replacing all of the directed edges with undirected edges can cause it to be connected. Imagine that your graph has several vertices with one outbound edge, meaning an edge that points from it to some other vertex in the graph. There’s no way to reach all of those vertices from any other vertex in the graph, but if those edges were changed to be undirected all vertices would be easily accessible.

• connected:
Here we only use “connected graph” to refer to undirected graphs. In a connected graph, there is some path between one vertex and every other vertex.

• Strongly Connected:
Strongly connected directed graphs must have a path from every node and every other node. So, there must be a path from A to B AND B to A.

#### Graph Representations

vertex orbject can have a list of edges it’s connected to and vice versa. There are other graph represent ways:

• Edge list, could be 2D, and 3D  • Adjacency Matrix, store a matrix, if node 0 connect to node 1, then show 1, otherwise show 0 in matrix. #### Graph Traversal

• just like tree traversal
• DFS(depth first search)

## Algorithm_Searching and Sorting

code visualization: https://visualgo.net/en

• efficiency: $O(log(n))$ ### Recursion

• 1, call it self; 2, base case(get out the loop); 3, alter the input parament ### Sorting

• in plce or not, in place: lest space; not: lest time. Tradeoff

#### bubble Sort(in place)

• Big O is $(n-1)(n-1)=n^2$, best case is $O(n)$, the array is already sorted. #### Merge Sort

• divide the arrray into one by one, then combine every two element except the first element. Then the combined two element compare first, then compare to others just as following images.  • Merge sort efficiency: $O(n*logn)$   #### Quick Sort

• random choose one of element, then compare and move as following figure show.  - efficiency: the worst case is O(n^2), the average and the best case is $O(nlogn)$

reference: http://visualgo.net/en/sorting/