- What this course is:
- Classic data structurs and algos
- Queues, dicts, graphs, sorting
- Parallelism and concurrency (new! exciting!)

- Data structures are clever ways of storing information to perform efficent compututation
- There are
*always*tradeoffs with each option

- There are
- Abstract Data Type (ADT) is a mathematical description of a “thing” with a set of operations
- Data Structure: specific orginization of data and family of algos for implementing ADT
- Implementation: the actual concrete code
- We can use circular arrays or double-linked lists for queues
- Both work!

- What do we care about? (DSA)
- Correctness - does it work?
- Performance - speed! (and memory)

- Evaluating the
*program*is difficult - better to evaluate the*algorithm* - Analyzing code: counting code constructs:
- Basic operations (indexing, assignment): \(\text{constant time}\)
- Loops: \(\text{num iterations} \times \text{loop body}\)
- Conditionals: \(\text{time of condition} + \text{slowest branch}\)
- Function calls: \(\text{timee of functions body}\)
- Recursion: \(\text{solve recurrence equation}\)

- Complexity cases:
- Worst-case complexity: max num of steps algo takes on “most challenging” input \(n\)
- Best-case complexity: min num of steps algo takes on “easiest” input of size \(n\)
- Average-case complexity:
*what does “average” mean?* - Amortized-case complexity: we learn about this later

- Linear search, best case: \(O(1)\), worst case: \(O(n)\)
- Asymptotic analysis: big \(O\) notation
- A function \(f\) is in \(O(\tilde{f})\) if \(f\) and \(\tilde{f}\) have the same asymptoic behavior

- How to show \(f(n)\) is in \(O(g(n))\)?
- Pick a \(c\) large enough to cover the constant factors
- Pick a \(n_0\) large enough to cover the lower order terms

- Dropping coefficents is ususally okay, be careful otherwise
- Upper bound: Big-O
- Lower bound: Big-Omega, same as upper bound but for min
- Big-Theta bound (tight bound): \(f(n)\) is in the upper and lower bounds of \(g(n)\)
- Worst/best-case vs asymptotic analysis
- Think of it as comparing scenarios

- Amortization: worst-case is too pessimistic
- “what is the average run time of operations in the
*worst*sequence?” **Not**the average (average is an average input, this is average operations over a sequence of worst-case operations)

- “what is the average run time of operations in the

- Priority Queue: Highest Priority, First Out
- Holds comparable data
`insert`

: adds an item at the end`deleteMin`

: finds, returns, and removes the minimum element in the queue

- Heap: only pay for functionality needed (half-sorted)
- Similar to trees!
- Both operations are: \(\Theta(log n)\)

- Tree terminology:
`depth`

: how many levels from root to node`height`

: how far from “deepest” descendent leaf node

- Binary tree: each node has max 2 children
`n`

-ary tree: each node has max`n`

children- Perfect tree: each row is completely full
- Complete tree: each row is completely full except the bottom row
- The bottom row must be filled up from left to right

- (Binary Min-) Heap:
- Complete binary tree structure
- Every (non-root) node has a priority value greater than its parent
`insert:`

- Preserve complete tree structure property
- This might break heap order property
- Percolate up to restore heap order property

`deleteMin:`

- Remove root node
- Move bottom side node to root
- This might break heap order property
- Percolate down to restore property

- We can use an array to store the heap
- We will skip index 0 to make the math easier
- To get to the child nodes from node
`i`

, go to`i*2`

and`i*2 + 1`

- Parent is integer division

- To increase or decrease a key by an amount, we change it then percolate
- To delete, we simply decrease the key by infinity, then delete min

- Naively building a heap:
`insert`

for each element- This is \(O(n * log n)\)

- We can do better: Floyd’s
`buildHeap`

- \(O(n)\) runtime
- Randomly add all elements into the array
- Percolate down from each element one level above the leaves, up to the root
- As each level increases the amount of operations but halves the number of iterations, it is \(O(n)\)

- Counting recursive code:
- Each recursive method call:
- Perform some
*non-recursive*work: \(w(n)\) - Call the method \(T(n)\) on a smaller part of the list \(T(n-1)\) (where \(T(n)\) is the time or cost function)
- Do some base case work: \(b(n)\)

- Perform some

- Each recursive method call:
- Total cost: \(T(n) = w(n) + T(n - 1)\) where some base case: \(T(1) = b(n)\)
- This is basically the recurrence function/relation
- General formula and closed form are ways to “solve” the function

- How to solve a recurrence function?
- Unrolling: substitution until we find a pattern
- Write a recurrence
- Find general formula: expand
- Should still have a \(T\) function on both sides

- Find closed form: find when base case occurs, then get to the base case
- Now no \(T\) or \(i\): all constants or \(n\)

- Asymptotic analysis

- Recurrence relation: Tree Method
- We draw an actual tree of the work required for function calls
- Allows us to find the amount of nodes quickly and the work per node

- Common recurrences:
- \(T(n) = T(\frac{n}{2}) + c \in \Theta (log n)\): Binary search
- \(T(n) = 2T(\frac{n}{2}) + n \in \Theta (n log n)\): Merge sort
- \(T(n) = 2T(\frac{n}{2}) + c \in \Theta (n)\): Recursive binary sum
- \(T(n) = T(n - 1) + c \in \Theta (n)\): Recursive sum

- Dictionary: associate associates
`key`

,`value`

pairs- Set of unique
`key`

,`value`

pairs - (For some) keys must be comparable
- Operations:
`insert(k, v)`

: place`k`

,`v`

in dictionary`find(k)`

: return the`v`

associated with`k`

`delete(k)`

: delete and return the`k`

,`v`

- Set of unique
- Set is just a dictionary with boolean values
- Naieve implementation operations are \(\Theta (n)\)
`find(k)`

is \(\Theta (log n)\) for sorted arrays

- Many data structure operations come down to preserving structure and order properties
- Comparable keys allow for ordering in storage
- Lazy deletion: we only “mark” as deleted - is the element there?
- Allows for removing in batches
- Increases
`find(k)`

runtime

- Binary search trees (BST) have “average” \(\Theta (log n)\) runtime, but could be \(\Theta (n)\)
- All elements are either roots, left subtrees, or right subtrees

- BST Delete:
- Case 1: Leaf
- Just remove the leaf

- Case 2: One child
- Connect parent to the child node

- Case 3: Two children
- We can replace with smallest element from right tree or largest from left tree, predecessor or successor
- Find the max on the left subtree, swap with parent then delete
- Find the min on the right subtree, swap with parent then delete

- Case 1: Leaf
- Worse case for BST occurs when the tree is unbalanced
- Ideas to balance the BST: (while keeping fast runtime)
- Left and right subtree of
`root`

have same number of nodes- Too weak

- Left and right subtree of
`root`

have same height- Too weak

- Left and right subtree of every tree have same number of nodes
- Perfect tree, too strong

- Left and right subtree of every tree have same height
- Perfect tree, too strong

- Left and right subtree of each node’s heights differ by at most 1
- Ensures
`root`

height is \(\Theta (log n)\) - Quick: \(\Theta (1)\) rotations

- Ensures

- Left and right subtree of
- AVL Balance Property:
`balance(node) = height(node.left) - height(node.right)`

- AVL Tree: a BST with guaranteed balancing
- BST with the added condition that for all
`node`

s,`-1 <= balance(node) <= 1`

- We need to keep track of the heights

- BST with the added condition that for all

- AVL Operations:
- Find: same as BST
- Insert: BST insert, then check and fix balance

- Let
`p`

be the problem node where an imbalance occurs - Two main cases: inserted node is in the outside or inside (e.g. right-right vs right-left)
- Outside cases are solved with a “single rotation”
- Inside cases are solved with a “double rotation”

- Insert:
- BST insert: afterwards, each node’s height in the path to the bottom
*might*have changed - Recursive backtracking: calculate new height and detect any height imbalances
- If imbalance, find case and rotate. Only the deepest
`p`

needs to be addressed

- BST insert: afterwards, each node’s height in the path to the bottom
- Single rotation: (left-left)
- Move child of
`p`

to position of`p`

`p`

becomes the “other” child- Old “other” child takes
`p`

’s new empty child - Other subtrees move accordingly

- Move child of
- Double rotation:
- Rotate into an outside case with
`p`

’s child and grandchild - Rotate
`p`

and`p`

’s new child- These rotations are in different directions

- Rotate into an outside case with

- Motivation behind B-Trees: \(\Theta (1)\) memory access operations are not insignificant
- Getting data from disk is many, many operations (even if linear time)
- Thus, minimizing disk access is a goal
- This is of course if data doesn’t fit in cache (i.e. databases)

- If we’re accessing a byte in memory, might as well access a whole block and store it in cache
- Temporal locality: if an address is referenced, it tends to be referenced
- Spatial locality: if an address is referenced, addresses close by are more likely to be referenced soon

- The amount of data moved from disk to memory is block size, memory to cache is line size
- These are not under program control

- Problem: large dictionaries requre that most of memory will be stored on disk
- Desire: a balanced tree that is even shallower than AVL trees to minimize disk accesses
- A key idea: increase branching factor of the tree
- B Trees:
- Two types of nodes
- Internal nodes, having M-1 keys and M children
- Leaf nodes, storing up to L sorted data items

- Order property:
- Subtree between keys a and b contains only data which is: a <= x < b

- Structure properties:
- Root: If tree has <= L items, root is a leaf, else has between 2 and M children
- Internal nodes: have between ceil(M/2) and M children (i.e. at least half full)
- Leaf nodes: all leaves at the same depth, have between ceil(L/2) and L data items

- Any M > 2 and L will work, but we pick them based on disk-block size

- Two types of nodes
- Many trees stored in one internal node, binary search can be inconsequential compared to disk access
- Only bring one leaf of data items into memory, no need for loading unnecessary items into memory
- Adding to a leaf which is not already full is easy, just add like a sorted list
- When a leaf is full, we split it into two leaves and add them to the parent
- If the parent is full (so we cannot add another child), we split the parent
- If the parents parent … we eventually split the root, where we create a new root with the two children
- This is how a B-Tree gains height

- Deletion:
- First remove the item
- Then if the leaf contains too few, adopt from a neighbor
- Unless neighbor is at minimum, then combine to create new child
- If parent is now at minimum, adopt from a neighbor
- repeat…

- Now switching from tree-based data structures to array-based ones
- Hashing: take in a key, convert it into an integer
- This allows us to index into an array with the integer

- If two keys have the same hash, we have a collision
- Always will happen: infinite keys, finite array length

- Hash table runtime average is close to \(\Theta (1)\)
- Hash tables don’t allow us to order well unlike trees
- We expect the number of key-value pairs to be much less than the possible domain
- Ideal hash function is fast and avoids excessive collisions
- Often the case that the client implements some hash function, then program mods by the array length
- In java, the client will override the
`hashCode`

method

- In java, the client will override the
- Use prime number as a table size
- Real life data tends to follow a pattern, primes don’t follow them

- Try to use the most identifiable fields into the hash function
- Trade-off between how many fields and amount of collisions
- We use
*educated guesses*

- We use
- If the key space is an int, using int and modulo is intuitive
- Even if every value has a distinct key, still collisions because mod

- Number one step is to rely on expertise of others
- Separate chaining:
- All keys that map to the same table location are added to a list

- Worst case runtime for
`find`

is \(\Theta (n)\)- Not worth optimizing if your hash function isn’t terrible

- Load factor is the average number of elements per bucket
- Want to keep load factor constant, prevent it from being too large
- Typically if lambda is greater than one, increase the table size

`delete`

is the same as`find`

intuitively

- Open Addressing: attempt to not store chains
- Resolving collisions by trying a sequence of other positions in the table

- Linar Probing: when inserting, if index
`x`

is full, then try`x+1`

- Repeat as necessary…
- We are “probing” the array “linarlly”

- Probing:
`i`

th probe into array:`(h(key) + f(i)) % TableSize`

- Open addressing is much worse as load factor increses
- To find using probing, we follow probe sequence until we find the key or empty spot
- Delete: use lazy deletion, replace value with a flag
- Primary clustering
- Linar probing tends to produce clusters of values
- This leads to runtime issues

- Quadratic probing:
`i`

th probe into array:`(h(key) + i**2) % TableSize`

- Possible to have infinite cycles because of mod

- If TableSize is prime and the load factor is less than 0.5, quadratic probing will find empty slot in maximum TableSize / 2 probe
- Secondary clutering happens when the two keys hash to the same index, probing doesn’t help
- Double hashing:
`f(i) = i*g(key)`

- Idea is that two hashes won’t collide much
- Ensure
`g(key) != 0`

otherwise infinite loop - Can still lead to infinite cycles
- Important to pick good functions with guarantees

- When we resize the array we need to rehash everything
- Good to resize separate chaining when load factor approaches one, open addressing half

- Good to make size twice as big, but that won’t be prime!
- Keep a hardcoded list of primes at good intervals
- If it continues to grow, say screw it and double (I think)

- View slides or Effective Java for tips on how to hash different data types

**Lecture 12**

- Sorting goals:
- Stable: preserve original ordering in case of ties
- In-Place: only use a constant amount of auxiliary storage
- Fast: \(O(n log n)\) and/or good constant factors

- Simple algorithms: \(O(n^2)\)
- Insertion sort
- Selection sort
- Bubble sort

- Fancier algorithms: \(O(n log n)\)
- Heap sort
- Merge sort
- Quick sort (avg)

- Specialized algorithms: \(O(n)\)
- Bucket sort
- Radix sort

- Insertion sort:
- Maintain a sorted subarray, insert the unsorted elements into it
- Stable!
- In-place!
- Slow :(

- Selection sort:
- Maintain a sorted subarray, find the smallest remaining element and append it to sorted
- Stable!
- In-place!
- \(O(n^2)\), but good constant factors

- Bubble sort: (it sucks!)
- \(O(n^2)\) and bad constant factors
- Never use

- Heap sort:
- Put all elements into a heap, then remove the elements
- Not stable
- Can be in-place by treating the inital array as a heap, then inserting into back of array
- \(O(n log n)\), but bad constant factors

- AVL sort:
- We pretend this doesn’t exist!
- Not in-place, worse constant factors
- Heap sort is strictly better

- Merge sort: divide and conquer!
- Sort left and right halves of elements recursivly, then combine into one list
- Stable (always pick from left array first)
- Not in-place: \(O(n)\)
- \(O(n log n)\), bad constant factors because of copying

- Quick sort:
- Make sure to use the 332 quicksort
- Algorithm:
- Pick a pivot element, hopefully the median element
- Divide the array into two pieces, less than and greater than pivot
- Recursivly sort the two pieces
- Combine the sorted pieces

- Picking a good pivot:
- Pick first or last element: fast to pick but likely worst-case
- Pick random element: good, but randomness is expensive
- Median of 3: pick median of first, middle, and last element, overall good

- How to split into two pieces: Hoare partitioning
- Swap pivot with
`arr[lo]`

(move it out of the way) - Use two pointers
`l`

and`r`

starting at`lo+1`

and`hi-1`

and move inwards - Put pivot back in middle: swap with
`arr[l]`

- Swap pivot with
- Not stable
- In-place!
- \(O(n log n)\) average case, worst constant factors
- In practice, way, way better

- Comparison sorting: CUTOFF strategy
- Use comparison sort until the array is small enough, then use a sort with good constants

- All comparison sorting algorithms have a lower bound of \(O(n log n)\)
- Bucket sort:
- Find the
`min`

and`max`

value, make an auxillary array of that range - Iterate through the array and count how many of each number
- Copy the auxillary array into the original array
- Stable!
- Not in-place
- Possibly very large memory usage

- Good speed \(O(n + k)\) and constants
- Non-integers: make each bucket a linked list of items

- Find the
- Radix sort:
- For each digit (starting from LSD), implement bucket sort and repeat
- As it is stable, this allows for ordering
- Stable
- Not in-place
- Very fast, \(O(d(n+k))\) and good constant factors

- A graph is a mathematical representation of a set of objects connected by links
- Verticies and edges!

- Terminology:
- Undirected graphs: edges have no direction, two-way, facebook friends
- Degree of a vertex: number of edges containing that vertex (undirected)
- Directed graphs: edges have a direction, pointing, instagram follows
- In-Degree of a vertex: the number of in-bound edges
- Out-Degree of a vertex: the number of outbound edges
- Weighted graphs: each edge has a weight or cost, represting relationships
- Walk: a sequence of adjacent vertices
- Path: a walk that doesn’t repeat a vertex
- Cycle: a walk that doesn’t repeat a vertex
*except*the first and last vertex - Path length: number of edges in a path
- Path cost: the sum of the weights in the path
- Undirected graph connectivity: an undirected graph is connected if there exists a path between each pair of vertices
- Undirected graph completeness: an undirected graph is complete if there is an edge between each pair of vertices
- Directed strong connectivity: a directed graph is strongly connected if there is a path between each pair of vertices
- Directed weakly connectivity: a directed graph is weakly connected if there exists a path between each pair of vertices
*ignoring*edge directions - Directed graph completeness: a directed graph is complete if for all pairs of vertices there is an edge in each direction

- Self-edges: we pretend they don’t exist… (in 332)
- Trees are an undirected, acyclic, and connected graph
- Trees are rooted graphs where we think of edges as directed
- Directed Acyclic Graphs (DAGs):
- Directed graph with
*no*cycles

- Directed graph with
- Maximum number of edges:
- Undirected: \(0 \leq \mid E \mid < \mid V \mid ^2\)
- Directed: \(0 \leq \mid E \mid \leq \mid V \mid ^2\)
- So: \(\mid E \mid \in O(\mid V \mid ^2)\)

- Sparse graph: \(\mid E \mid \in \Theta (\mid V \mid)\)
- Dense graph: \(\mid E \mid \in \Theta (\mid V \mid ^2)\)
- Graphs: the data structure
- Common operations:
- Is \((v, u)\) an edge?
- "What are the neighbors of \(v\)?

- Two standards:
- Adjacency Matrix
- Adjacency List

- Common operations:
- Adjacency Matrix: a 2D array of booleans representing the presence of an edge
- Properties:
- Get a vertex’s out-bound edges: \(O(\mid V \mid)\)
- Get a vertex’s in-bound edges: \(O(\mid V \mid)\)
- Decide if edge exists: \(O(1)\)
- Insert an edge: \(O(1)\)
- Delete an edge: \(O(1)\)

- Space requirements: \(O(\mid V \mid ^2)\)
- Adding vertices is not
*really*possible - Better for dense graphs (you’re already using the memory and complexity)
- Can be weighted by using non-booleans as type
- Then you must define “not an edge”

- Properties:
- Adjacency List: an array of vertices each with a linked list of edges
- Properties:
- Get a vertex’s out-bound edges: \(O(d)\)
- Get a vertex’s in-bound edges: \(O(\mid V \mid + \mid E \mid )\)
- Decide if edge exists: \(O(d)\)
- Insert an edge: \(O(1)\)
- Delete an edge: \(O(d)\)

- Space requirements: \(O(\mid V \mid + \mid E \mid )\)
- Better for sparse graphs (more operations depend on the number of elements)

- Properties:
- Processing a node: “doing something” at a node
- Visiting a node: exploring it

- Basic idea: follow nodes and mark them to prevent cycles
- Use slide code for the algorithms
- DFS uses a stack, BFS uses a queue (that’s the main difference)
- Iterative DFS:
- Use a stack, we process after popping each node

- Recursive DFS:
- The “stack” is the program’s recursive stack

- BFS:
- Similar algorithm structure, just use a queue

- Comparisons:
- DFS
- Typically uses much less memory
- Applications: topological sorting, cycle detection

- BFS
- Typically uses more memory
- Applications: shortest paths

- Iterative Deep DFS (IDDFS)
- Use DFS with increasing depth limits
- Good memory and finds the shortest path

- DFS
- To get the final path, we include the predecessor node in the mark
- To get path, we then backtrack
- Often implemented with an array

- Shortest paths for weighted graphs: we often assume no negative weights
- Dijkstra’s Algorithm:
- Initially, start node has cost 0 and all other nodes have infinite cost
- At each step:
- Pick closesst unvisited vertex
`v`

- Add it to the set of visited vertices
- Update distances for nodes with edges from
`v`

- Pick closesst unvisited vertex

- Dijkstra’s is a greedy algorithm
- At each step, it does what seems best at that step
- It explores every node, so it must find globablly optimal path
- Unoptimal runtime is: \(O(\mid V \mid ^2 + \mid E \mid )\)

- Can make more optimal by using a minHeap: \(O(\mid V \mid log \mid V \mid + \mid E \mid log \mid V \mid )\)
- \(O( \mid V \mid log \mid V \mid )\) for a sparse graph

**Parallelism**- We typically have worked with sequential programming: one thing happens after another
- Removing this assumption means many new things to think about:
- How can we divide work between different threads and synchronize?
- How can we parallelize algorithms?
- How can we support concurrent access with datastructures?

- This means we should be able to do multiple things once in one program
- Parallelism: using extra resources to solve a problem faster
- Concurrency: correctly and efficiently manage access to shared resources
- Threads can communicate by writing to shared locations
- Need to make sure that it doesn’t write in bad way!

- Message-passing: threads communicate by sending and receiving messages
- Wait for one thread to coordinate all of the worker threads
- Java threading basics:
- We will use
`java.lang.Thread`

to start off with, then use a better library later - Define and instantiate a subclass
`C`

of java.lang.Thread`, overriding`

run` - Call the objects
`start`

method`run`

method runs in the current thread

- We will use
**Fork-Join Framework:**- To start parallelism, we must use
`ForkJoinPool.invoke(RECURSIVE_ACTION)`

- This calls the object’s
`compute`

field - Each program should only have one
`ForkJoinPool`

, use`commonPool`

to get static

- This calls the object’s
- Subclass
`RecursiveAction`

to use`fork`

,`compute`

, and`join`

`compute`

is the code we want to run`fork`

starts a new thread running`compute`

`join`

waits for the thread to finish- Better for code which don’t return anything

- Subclasss
`RecursiveTask`

if you*do*want to return values

- To start parallelism, we must use

- The fork-join pattern is quite common for solving many different types of problems
- All that changes is the base-case and way we combine results
- These are called
*reductions* - It is required that the operation is associative

- A parallel
*map*is applying an operation on each input element independently- E.g. multiplying each element of an array by two
- Better to subclass
`RecursiveAction`

as nothing is returned

- To use divide-and-conquer parallelism, we need to be able to dive the problem up efficently
- We can still use maps on sequential data to speedup the process, think HF DS maps

- Analyzing Fork-Join algorithms:
- \(T_P\) is the time a program takes to run if there are \(P\) processors available
- \(T_1\) is the time it takes to run if there is one processor, note: not sequential algorithm
- This is called the
*work*, the total run time of all parts of the algorithm

- This is called the
- \(T_{\infty}\) is the
*span*, how long it takes to run on an infinite amount of processors- Also called:
*critical path length*or*computational depth*

- Also called:

- We can describe a parallel program execution as a DAG:
- Nodes are constant-time pieces of work the program performs
- The number of nodes adds up to \(T_1\)

- Edges represent
*computational dependencies*, what we must complete before moving onto another node- \(T_{\infty}\) is the length of the longest path in the DAG

- Nodes are constant-time pieces of work the program performs
- A parallel reduction is two balanced trees on top of each other
- This allows \(T_1\) and \(T_{\infty}\) to become simple graph properties
- \(T_1\) is \(O(2n)\), because two trees
- \(T_{\infty}\) is \(O(log n)\), because that’s the height of the trees

- The
*speedup*on \(P\) processors is: \(\frac{T_P}{T_1}\)- Note that this is often dishonest, as a true sequential algorithm might be faster

*Perfect speedup*is when \(\frac{T_P}{T_1} = P\), and is rare*Perfect linear speedup*is the case where doubling \(P\) halves the running time- \(\frac{T_1}{T_{\infty}}\) is the
*parallelism*of an algorithm- Parallel reduction: \(\frac{n}{logn}\), so there is an exponential speedup

- The ForkJoin Framework has the expected time bound: \(T_P\) is \(O(\frac{T_1}{P} + T_{\infty}\)
- There is some randomness involved
- This works as long as each thread is expected to be about the same amount of work
- Also, all threads are expected to do a small but not tiny amount of work
- This is approx 5000 (within an order of magnitude) computational steps

- Amdahl’s Law:
- \(\frac{T_1}{T_P} = \frac{1}{S + \frac{1 - S}{P}}\)
- As \(P\) goes to \(\infty\), \(\frac{T_1}{T_{\infty}} = \frac{1}{S}\)

- Takeaways from Amdahl’s Law:
- This means that scaling computational resources leads to non-linear speedups
- We cannot always rely on extra compute for speedups when there are chunks of sequential code
- We can always find new algorithms work with the Law
- We can still use parallelism to solve
*bigger*problems, but not faster- This happens when some parallel parts grow at a faster rate then the sequential parts

- Parallel-Prefix Sum:
- We run this in two parts: first create a tree where each node is the sum of a portion of the array
- The leaves are the sums of the one-element leaves (we would use sequential cutoff)

- We run through the tree, passing along as an argument the sum of the values to the left of the node
- This allows the “sequential” problem to have in \(O(logn)\) span
- This pattern is good for problems where you want to do
*operation*to the left of`i`

- We run this in two parts: first create a tree where each node is the sum of a portion of the array
- Pack: produce an array with the same order where all elements satisfy some condition
- Perform a parallel map to produce a bit (binary) vector of elements that satisfy a condition
- Perform parallel prefix sum to get the number of indicies to the left which satisfy the condition
- Run a parallel map for each index to move the index value from the input map to the new output map
- Use
`bitsum`

to get the output indicief - Use
`bitvector`

to check if the value satisfies the condition

- Use

- Parallel Quicksort:
- Parallelize sorting both partitions
- Parallelize partitioning using an auxillary array

- Parallel Mergesort:
- We can parallelize recursivly sorting the halves easilly, but we need more for exponential speedup
- To do so, we need to parallelize the merge operation:
- Determine the median element of the larger array: \(O(1)\)
- Binary search to find the position of the median in the smaller array: \(O(logn)\)
- Recursivly merge the left half of the large array with the left portions of the small array
- Recursivly merge the right half in the same way

- Concurrent programming is about controlling access to shared resources
- In concurrent programming, threads are
*largely*doing their own thing- Operations are
*interleaved*, happening before, after, or at the same time as other threads

- Operations are
- We must enforce
*mutual exclusion*on memory which might change- We do this by “hanging a sign” telling other threads to wait
- Checking there is no sign and then hanging the sign must be one operation
- The work done while the sign is hanging is a
*critical section* - We typically do this using synchronization primitives within a language

- A mutual-exclusion lock (lock or mutex) is an abstract datatype for concurrent programming
`new`

creates a new lock which is “not held”`acquire`

takes a lock and blocks it until it is currently “not held”, then sets it to “held” and returns`release`

takes a lock and sets it to “not held”

- It is important to ensure that exceptions do not prevent a lock from ever getting released
- Locks which allow a thread re-acquire a lock it holds are called
*reentrant locks* - We use the
`synchronized`

keyword in Java to evaluate, acquire, and release a lock- This seems somewhat similar to the Python
`with`

statement

- This seems somewhat similar to the Python
- As any object can be a lock in Java, we can just use the instance (
`this`

) as the lock - If the entire method should be
`synchronized`

, we can use that as a method keyword instead

- A
*race condition*is a bug in a program that makes program correctness dependent on the order that threads execute - There are two kinds of data races:
- When one thread reads and another writes at the same time
- When both threads attempt to write at the same time

- Even if we
*think*that it wouldn’t cause bad behavior, never have race conditions - When running programs, there are often hidden intermediate states
- This is like times where the rep inv isn’t true
- If a program reads during this intermediate state, things get weird
- These are critical sections

- One reason why data races are not allowed is because of compiler optimizations
- The Grand Comprimise:
- The programmer promises not to write data races
- The compiler will make optimizations seem as if there was a global interweaving

- We can also declare fields as
`volitile`

, making accesses not count as data races- They are good when there is only one shared field

**Concurrency Programming Guidelines**- Conceptually split memory into three parts: all memory should be at least one of the following
- Memory that is thread-local
- It is typically better to copy data and have more thread-local memory

- Memory that is immutable (after initialization)
- Make new objects instead of updating fields

- Memory that is synchronized, where locks are needed

- Memory that is thread-local
- Approaches to synchronization:
- Avoid data races
- Use consistant locking (and document it)
- Use more course-grained locking and only be more specific if performance is hurt
- Keep critical sections small and do not perform expensive operations in them
- Think of what needs to be atomic, then determine a locking strategy
- Always rely on the standard libraries

- Deadlock: threads being blocked forever
- This happens where there is a cycle of waiting
- We can define an ordering for which locks are allocated

- Topological Sort
- Given a DAG, output all vertices in order such that no parent after children
- Think of class pre-reqs

- This can be used in backpropagation algorithms or other dependency graphs
- There can be multiple legal permutations per DAG

- Given a DAG, output all vertices in order such that no parent after children
- Simple Topo-sort algorithm
- Find all the in-degrees for each vertex
- These can be stored in a data structure

- While not all vertices are output:
- Choose a vertex \(v\) labed with in-degree of 0
- Output \(v\) and remove it from the graph (or add to an outside set)
- For each vertex \(w\) adjacent to \(v\), decrement in-degree of \(w\)

- Find all the in-degrees for each vertex
- By using a queue, we can lower the time complexity to \(O(V + E)\)

- Minimum spanning trees: the minimal cost graph such that all nodes are connected
- Undirected graph

- Two main algorithms to find the MST:
- Prim’s algorithm: similar to Dijkstra’s
- Randomly pick a vertex, then use Dijkstra’s from there
- Pick the vertices which are the lowest cost
- The random pick of vertex leads to different possible MSTs (if they exists)

- Kruskals’ algorithm: completely different
- We make clusters of sub-MSTs, then they end up unifying

- Prim’s algorithm: similar to Dijkstra’s
- Prim’s:
- Pick random element
- Djikstra’s connecting unconnected nodes to connected nodes
- Create MST by connecting edges with their preds

- Kruskals’:
- Sort all edges by weight
- Add edges in order if they connect nodes to MSTs they don’t already connect to

- Disjoint Set ADT:
`Union(x, y)`

: combines the two sets which contain`x`

and`y`

- Constant time

`Find(x)`

: returns the name of the set containing`x`

- Amortized constant time

- A decision problem: a problem that takes some input and returns true or false
- Halting problem: takes in some code and an input, returns if it terminates
- Defintion of \(P\) in \(P = NP\), a problem is \(P\) if it satisfies:
- It’s a decision problem
- There exists a polynomial time algorithm to solve it

- Euler Circuit Problem:
- Input: Undirected graph \(G\)
- Output: True iff there is a path in \(G\) that visits each edge exactly once
- Polynomial time solution: true iff
- Degree of every vertex is even
- Connected graph

- Definition of \(NP\), non-deterministic polynomial: a problem is in \(NP\) iff:
- It’s a decision problem
- For every input where the output is true, there exists some quick way to verify output is true

- Every problem in \(P\) is also \(NP\), i.e. \(P \subset NP\)
- Is \(NP \subset P\)? Is \(P = NP\)?
- We don’t know…
- Most people believe that \(P \neq NP\)

- Hamiltonian Circuit Problem:
- Input A graph \(G\)
- Output: True iff there exists a cycle in \(G\) that visits every vertex exactly once
- Much more challenging than Euler Circut Problem
- \(NP\)-Complete

- \(NP\)-Complete: they are the
*hardest*problems in \(NP\)- A problem is \(NP\)-complete if:
- It’s \(NP\) itself
- Every problem in \(NP\) is polynomial time reducible to it

- This means that if any \(NP\)-complete problem is in \(NP\), then \(P = NP\)

- A problem is \(NP\)-complete if:
- Reduction:
- Consider two decision problems \(A\) and \(B\)
- \(A\)
*reduces*to \(B\) if we can solve \(A\) using a solution to problem \(B\) - Notation: \(A \leq B\)
- We say that \(A\) reduces to \(B\) in polynomial time if we can solve \(A\) using a polynomial number of calls to \(B\)
- If \(B \in P\) and \(A \leq B\) then \(A \in P\)

- The fact that there are thousands of \(NP\)-complete problems and we haven’t been able to find one that is in \(P\), thus people generally believe that \(P \neq NP\)
- \(NP\)-Hard: \(NP\)-complete problems which
*aren’t*in \(NP\)- Example problem: \(N \times N\) chessboard, verify a move is optimal

- What to do when faced with a probelm that you think is hard?
- Confirm that it is actually hard
- Take an \(NP\)-complete problem, take your problem, show that the \(NP\)-complete problem reduces to your problem
- If so, stop trying to find a general solution to your problem
*lol* - Instead, possibly use an approximation algorithm (within a constant factor of true)
- Possibly used a randomized algorithm, repeat using the randomized algorithm
- Consider special cases for your problem

- Confirm that it is actually hard

- \(NP\)-Hard: every \(NP\) problem can be reduced to it
- \(NP\)-Complete: \(NP\)-hard and \(NP\)
- If an \(NP\)-hard problem is in \(NP\), then \(P = NP\)
- 3-Colorable:
- Input: undirected graph \(G\)
- Output: true iff you can color each vertex of the graph such that no two adjacent verticies have the same color
- \(NP\)-Complete

- 2-Colorable: a related problem
- Not \(NP\)-Complete
- All verticies that are at distance \(i+1\) from the start must be colored differently from those at distance \(i\)
- Proof is to color it, return true if it works, else false

- 2-Colorable reduction to 3-Colorable
- Add a dummy vertex to each other node - this takes up one of the three colors
- Thus, 3-Colorable on that graph will test if it is 2-Colorable

- 3-SAT:
- Terminology:
- Literal: a boolean variable (or its negation)
- Clause: a series of literals, OR’d together

- Input: a series of clauses AND’d together, each clause has at most 3 literals
- Output: true iff there is a setting of boolean variables such that the expression is true
- \(NP\)-Complete
- The general problem of satisfiability is reduciable to 3-SAT

- Terminology:
- Vertex Cover:
- Input: undirected graph \(G\), integer \(k\)
- Output: true iff there is a set of verticies of size \(k\) such that every edge has at least one vertex in the set
- \(NP\)-Complete
- “The set of vertices covers all edges”
- Special cases:
- If the graph is a tree, there is a linear time solution
- If the graph is 2-Colorable, there is a polynomial time solution

- Approximation algorithm for finding minimum vertex cover
- At most \(2x\) larger than optimal
- While there are edges in \(G\):
- Pick an arbitrary one
- Put both nodes in the vertex cover and delete nodes/edges connecting to those nodes