CSE 332
Lecture 1 - June 21
- What this course is:
- Classic data structures and algorithms
- Queues, dicts, graphs, sorting
- Parallelism and concurrency (new! exciting!)
- Data structures are clever ways of storing information to perform
efficient computation
- There are always tradeoffs with each option
- Abstract Data Type (ADT) is a mathematical description of a “thing”
with a set of operations
- Data Structure: a specific organization of data and a family of algorithms
for implementing an ADT
- Implementation: the actual concrete code
- We can use circular arrays or doubly linked lists for queues (see the sketch below)
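
A minimal sketch of the circular-array option mentioned above; the class and field names are my own and the capacity is fixed to keep it short.

```java
// Minimal circular-array FIFO queue sketch (illustrative names; fixed capacity).
public class CircularArrayQueue {
    private final int[] data;
    private int front = 0; // index of the oldest element
    private int size = 0;  // number of elements currently stored

    public CircularArrayQueue(int capacity) {
        data = new int[capacity];
    }

    public void enqueue(int x) {
        if (size == data.length) throw new IllegalStateException("full");
        data[(front + size) % data.length] = x; // wrap around the end of the array
        size++;
    }

    public int dequeue() {
        if (size == 0) throw new IllegalStateException("empty");
        int x = data[front];
        front = (front + 1) % data.length; // advance front with wrap-around
        size--;
        return x;
    }
}
```
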
Lecture 2 - June 23
- What do we care about? (DSA)
- Correctness - does it work?
- Performance - speed! (and memory)
- Evaluating the program is difficult - better to evaluate
the algorithm
- Analyzing code: counting code constructs:
- Basic operations (indexing, assignment): \(\text{constant time}\)
- Loops: \(\text{num iterations} \times \text{loop body}\)
- Conditionals: \(\text{time of condition} + \text{slowest branch}\)
- Function calls: \(\text{time of function body}\)
- Recursion: \(\text{solve recurrence equation}\)
- Complexity cases:
- Worst-case complexity: max num of steps the algo takes on the “most
challenging” input of size \(n\)
- Best-case complexity: min num of steps algo takes on “easiest” input
of size \(n\)
- Average-case complexity: what does “average” mean?
- Amortized-case complexity: we learn about this later
- Linear search, best case: \(O(1)\),
worst case: \(O(n)\)
- Asymptotic analysis: big \(O\)
notation
- A function \(f\) is in \(O(\tilde{f})\) if \(f\) grows no faster than \(\tilde{f}\) asymptotically
(ignoring constant factors)
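
A small, self-contained example (mine, not from the lecture) of counting constructs: the loop body is constant time, so the best case (target at the first index) is \(O(1)\) and the worst case is \(O(n)\).

```java
// Linear search sketch: best case O(1), worst case O(n).
public class LinearSearch {
    public static int indexOf(int[] arr, int target) {
        for (int i = 0; i < arr.length; i++) { // loop: num iterations x loop body
            if (arr[i] == target) {            // conditional: constant-time check
                return i;                      // best case: found at the first index
            }
        }
        return -1;                             // worst case: scanned all n elements
    }
}
```
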
Lecture 3 - June 26
- How to show \(f(n)\) is in \(O(g(n))\)? (worked example at the end of this lecture's notes)
- Pick a \(c\) large enough to cover the constant factors
- Pick an \(n_0\) large enough to cover the lower order terms
- Dropping coefficients is usually okay, be careful otherwise
- Upper bound: Big-O
- Lower bound: Big-Omega, defined like the upper bound but bounding from below
- Big-Theta bound (tight bound): \(f(n)\) is in both \(O(g(n))\) and \(\Omega(g(n))\)
- Worst/best-case vs asymptotic analysis
- Think of it as comparing scenarios
- Amortization: worst-case is too pessimistic
- “what is the average run time of operations in the worst
sequence?”
- Not the same as average-case (average-case averages over inputs; amortized
averages the cost per operation over a worst-case sequence of operations)
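
A worked example of the definition above, with numbers I chose for illustration:

```latex
% Show f(n) = 3n^2 + 5n is in O(n^2) by exhibiting c and n_0.
\[
  3n^2 + 5n \le 3n^2 + 5n^2 = 8n^2 \quad \text{for all } n \ge 1,
\]
\[
  \text{so with } c = 8 \text{ and } n_0 = 1,\; f(n) \le c\,n^2 \text{ for all } n \ge n_0 .
\]
```
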
Lecture 4 - June 28
- Priority Queue: Highest Priority, First Out
- Holds comparable data
- insert: adds an item at the end
- deleteMin: finds, returns, and removes the minimum element in the queue
- Heap: only pay for functionality needed (half-sorted)
- Similar to trees!
- Both operations are: \(\Theta(log n)\)
- Tree terminology:
- depth: how many levels from the root down to the node
- height: how far from the node down to its deepest descendant leaf
- Binary tree: each node has max 2 children
- n-ary tree: each node has max n children
- Perfect tree: each row is completely full
- Complete tree: each row is completely full except the bottom row
- The bottom row must be filled up from left to right
- (Binary Min-) Heap:
- Complete binary tree structure
- Every (non-root) node has a priority value greater than its
parent
- insert:
- Preserve complete tree structure property
- This might break heap order property
- Percolate up to restore heap order property
- deleteMin:
- Remove root node
- Move the last node in the bottom level to the root
- This might break heap order property
- Percolate down to restore property
- We can use an array to store the heap
- We will skip index 0 to make the math easier
- To get to the child nodes from node i, go to i*2 and i*2 + 1
- The parent of node i is at i/2 (integer division); the index math is sketched at the end of this lecture's notes
- To increase or decrease a key by an amount, we change it then
percolate
- To delete, we simply decrease the key by infinity, then delete
min
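
A minimal sketch of the array-backed binary min-heap index math and percolate-up described above; the class and field names are illustrative and resizing is omitted.

```java
// 1-indexed array min-heap sketch (index 0 unused); fixed capacity, no resizing.
public class MinHeapSketch {
    private final int[] heap = new int[64]; // heap[1..size] hold the priorities
    private int size = 0;

    public void insert(int priority) {
        heap[++size] = priority;   // preserve completeness: add at the end
        percolateUp(size);         // then fix any heap-order violation
    }

    private void percolateUp(int i) {
        while (i > 1 && heap[i] < heap[i / 2]) { // parent of i is i / 2 (integer division)
            int tmp = heap[i];                   // swap the node with its parent
            heap[i] = heap[i / 2];
            heap[i / 2] = tmp;
            i = i / 2;
        }
    }
    // Children of node i live at indices 2*i and 2*i + 1.
}
```
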
Lecture 5 - June 30
- Naively building a heap: insert for each element
- We can do better: Floyd's buildHeap
- \(O(n)\) runtime
- Put all elements into the array in arbitrary order
- Percolate down from each element one level above the leaves, up to
the root
- Moving up a level increases the work per node but halves the number of
nodes, so the total work is \(O(n)\)
- Counting recursive code:
- Each recursive method call:
- Perform some non-recursive work: \(w(n)\)
- Recursively call the method on a smaller part of the list, costing \(T(n-1)\)
(where \(T(n)\) is the time or cost function)
- Do some base case work: \(b(n)\)
- Total cost: \(T(n) = w(n) + T(n - 1)\) with a base case such as \(T(1) = b(1)\)
- This is basically the recurrence function/relation
- General formula and closed form are ways to “solve” the
function
- How to solve a recurrence function?
- Unrolling: substitution until we find a pattern
- Write a recurrence
- Find general formula: expand
- Should still have a \(T\) function
on both sides
- Find closed form: find when base case occurs, then get to the base
case
- Now no \(T\) or \(i\): all constants or \(n\)
- Asymptotic analysis
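
A worked unrolling of the simplest recurrence above (constant base-case cost \(b\)), to show the general-formula and closed-form steps:

```latex
% Unrolling T(n) = T(n-1) + c with base case T(1) = b:
\[
  T(n) = c + T(n-1) = 2c + T(n-2) = \dots = i\,c + T(n-i) \quad \text{(general formula)} .
\]
% The base case is reached when n - i = 1, i.e. i = n - 1:
\[
  T(n) = (n-1)c + T(1) = (n-1)c + b \in \Theta(n) \quad \text{(closed form)} .
\]
```
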
Lecture 6 - July 3
- Recurrence relation: Tree Method
- We draw an actual tree of the work required for function calls
- Allows us to find the amount of nodes quickly and the work per
node
- Common recurrences:
- \(T(n) = T(\frac{n}{2}) + c \in \Theta (log n)\): Binary search
- \(T(n) = 2T(\frac{n}{2}) + n \in \Theta (n log n)\): Merge sort
- \(T(n) = 2T(\frac{n}{2}) + c \in \Theta (n)\): Recursive binary sum
- \(T(n) = T(n - 1) + c \in \Theta (n)\): Recursive sum
- Dictionary: associates key, value pairs
- Set of unique key, value pairs
- (For some implementations) keys must be comparable
- Operations:
- insert(k, v): place k, v in the dictionary
- find(k): return the v associated with k
- delete(k): delete and return the k, v pair
- Set is just a dictionary with boolean values
- Naive implementation operations are \(\Theta (n)\)
- find(k) is \(\Theta (log n)\) for sorted arrays
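
A worked tree-method example for the merge sort recurrence listed above (my arithmetic, following the method described in lecture):

```latex
% Tree method on T(n) = 2T(n/2) + n:
% level i has 2^i nodes, each doing n / 2^i work, so every level does n total work.
\[
  \text{work per level} = 2^i \cdot \frac{n}{2^i} = n,
  \qquad \text{number of levels} \approx \log_2 n ,
\]
\[
  T(n) \in \Theta(n \log n).
\]
```
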
Lecture 7 - July 5
- Many data structure operations come down to preserving structure and
order properties
- Comparable keys allow for ordering in storage
- Lazy deletion: we only “mark” an element as deleted; a find must then check
whether the element is really there
- Allows for removing in batches
- Increases find(k) runtime
- Binary search trees (BST) have “average” \(\Theta (log n)\) runtime, but could be
\(\Theta (n)\)
- Recursive structure: every element is the root of a subtree, with smaller keys in its left subtree and larger keys in its right subtree
- BST Delete:
- Case 1: Leaf
- Case 2: One child
- Connect parent to the child node
- Case 3: Two children
- We can replace with smallest element from right tree or largest from
left tree, predecessor or successor
- Find the max on the left subtree, swap with parent then delete
- Find the min on the right subtree, swap with parent then delete
- The worst case for a BST occurs when the tree is unbalanced
- Ideas to balance the BST (while keeping fast runtime):
- Left and right subtree of root have the same number of nodes
- Left and right subtree of root have the same height
- Left and right subtree of every node have the same number of nodes
- Left and right subtree of every node have the same height
- Left and right subtrees of each node have heights that differ by at most 1
- Ensures root height is \(\Theta (log n)\)
- Quick: \(\Theta (1)\) rotations
- AVL Balance Property: balance(node) = height(node.left) - height(node.right)
- AVL Tree: a BST with guaranteed balancing
- BST with the added condition that for all nodes, -1 <= balance(node) <= 1
- We need to keep track of the heights
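
A minimal sketch of a BST node and find, to make the "average \(\Theta (log n)\), worst \(\Theta (n)\)" point concrete; the class names are my own.

```java
// Minimal BST find sketch (illustrative node/field names).
class BSTNode {
    int key;
    BSTNode left, right;
    BSTNode(int key) { this.key = key; }
}

class BST {
    BSTNode root;

    // Returns true iff key is in the tree; O(height) comparisons,
    // i.e. Theta(log n) when balanced but Theta(n) when the tree degenerates.
    boolean find(int key) {
        BSTNode cur = root;
        while (cur != null) {
            if (key == cur.key) return true;
            cur = (key < cur.key) ? cur.left : cur.right; // go left or right by order
        }
        return false;
    }
}
```
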
Lecture 8 - July 7
- AVL Operations:
- Find: same as BST
- Insert: BST insert, then check and fix balance
- Let p be the problem node where an imbalance occurs
- Two main cases: inserted node is in the outside or inside
(e.g. right-right vs right-left)
- Outside cases are solved with a “single rotation”
- Inside cases are solved with a “double rotation”
- Insert:
- BST insert: afterwards, each node’s height in the path to the bottom
might have changed
- Recursive backtracking: calculate new height and detect any height
imbalances
- If there is an imbalance, find the case and rotate. Only the deepest p needs to be addressed
- Single rotation: (outside case, e.g. left-left)
- Move the child of p into p's position
- p becomes the “other” child
- The old “other” child takes p's now-empty child slot
- Other subtrees move accordingly
- Double rotation:
- Rotate into an outside case with p's child and grandchild
- Rotate p and p's new child
- These rotations are in different directions
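
A sketch of the single rotation for the left-left case described above; node/field names and the height helper are my own, and the symmetric right-right case is just the mirror image.

```java
// Single right rotation for the left-left case (illustrative names).
// p is the deepest imbalanced node; its left child moves up into p's position.
class AVLNode {
    int key, height;
    AVLNode left, right;
}

class AVLRotations {
    static int height(AVLNode n) { return n == null ? -1 : n.height; }

    static AVLNode rotateRight(AVLNode p) {
        AVLNode child = p.left;
        p.left = child.right;          // child's old right subtree becomes p's left
        child.right = p;               // p becomes the "other" (right) child
        p.height = 1 + Math.max(height(p.left), height(p.right));
        child.height = 1 + Math.max(height(child.left), height(child.right));
        return child;                  // child is the new root of this subtree
    }
    // Inside cases do two such rotations in opposite directions:
    // first on p's child and grandchild, then on p and its new child.
}
```
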
Lecture 9 - July 10
- Motivation behind B-Trees: \(\Theta (1)\) memory access operations are not insignificant
- Getting data from disk is many, many operations (even if linear
time)
- Thus, minimizing disk access is a goal
- This is of course if data doesn’t fit in cache (i.e. databases)
- If we’re accessing a byte in memory, might as well access a whole
block and store it in cache
- Temporal locality: if an address is referenced, it tends to be
referenced again soon
- Spatial locality: if an address is referenced, addresses close by
are more likely to be referenced soon
- The amount of data moved from disk to memory is block size, memory
to cache is line size
- These are not under program control
- Problem: large dictionaries require that most of the data will be
stored on disk
- Desire: a balanced tree that is even shallower than AVL trees to
minimize disk accesses
- A key idea: increase branching factor of the tree
- B Trees:
- Two types of nodes
- Internal nodes, having up to M-1 keys and M children
- Leaf nodes, storing up to L sorted data items
- Order property:
- The subtree between keys a and b contains only data x with a <= x < b
- Structure properties:
- Root: If tree has <= L items, root is a leaf, else has between 2
and M children
- Internal nodes: have between ceil(M/2) and M children (i.e. at least
half full)
- Leaf nodes: all leaves at the same depth, have between ceil(L/2) and
L data items
- Any M > 2 and L will work, but we pick them based on disk-block
size
- Many keys are stored in one internal node; the binary search within a node is
inconsequential compared to a disk access
- Only bring one leaf of data items into memory, no need for loading
unnecessary items into memory
- Adding to a leaf which is not already full is easy, just add like a
sorted list
- When a leaf is full, we split it into two leaves and add them to the
parent
- If the parent is full (so we cannot add another child), we split the
parent
- If the parent's parent is also full, we keep splitting; eventually we split the root,
creating a new root with the two children
- Deletion:
- First remove the item
- Then if the leaf contains too few, adopt from a neighbor
- Unless neighbor is at minimum, then combine to create new child
- If parent is now at minimum, adopt from a neighbor
- repeat…
Lecture 10 - July 12
- Now switching from tree-based data structures to array-based
ones
- Hashing: take in a key, convert it into an integer
- This allows us to index into an array with the integer
- If two keys have the same hash, we have a collision
- Always will happen: infinite keys, finite array length
- Hash table runtime average is close to \(\Theta (1)\)
- Hash tables don’t allow us to order well unlike trees
- We expect the number of key-value pairs to be much less than the
possible domain
- Ideal hash function is fast and avoids excessive collisions
- Often the case that the client implements some hash function, then
program mods by the array length
- In Java, the client will override the hashCode method
- Use a prime number as the table size
- Real-life data tends to follow patterns, and primes don't line up with them
- Try to use the most identifiable fields in the hash function
- Trade-off between how many fields are used and the amount of collisions
- If the key space is an int, using the int and modulo is intuitive
- Even if every value has a distinct key, there are still collisions because of the mod
- The number one step is to rely on the expertise of others
- Separate chaining:
- All keys that map to the same table location are added to a
list
- Worst case runtime for find is \(\Theta (n)\)
- Not worth optimizing if your hash function isn’t terrible
- Load factor is the average number of elements per bucket
- Want to keep load factor constant, prevent it from being too
large
- Typically if lambda is greater than one, increase the table
size
- delete is the same as find intuitively
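
A minimal separate-chaining sketch of the ideas above (client hashCode, mod by table length, a list per bucket); the class is my own, with generics and resizing omitted for brevity.

```java
import java.util.LinkedList;

// Separate-chaining hash table sketch (String keys, int values; no resizing).
public class ChainedHashTable {
    private static class Entry {
        String key; int value;
        Entry(String k, int v) { key = k; value = v; }
    }

    private final LinkedList<Entry>[] table;

    @SuppressWarnings("unchecked")
    public ChainedHashTable(int tableSize) {       // ideally a prime size
        table = new LinkedList[tableSize];
        for (int i = 0; i < tableSize; i++) table[i] = new LinkedList<>();
    }

    private int index(String key) {
        // client-provided hashCode, then mod by the array length (kept non-negative)
        return Math.floorMod(key.hashCode(), table.length);
    }

    public void insert(String key, int value) {
        for (Entry e : table[index(key)]) {
            if (e.key.equals(key)) { e.value = value; return; } // overwrite existing key
        }
        table[index(key)].add(new Entry(key, value));
    }

    public Integer find(String key) {
        for (Entry e : table[index(key)]) {
            if (e.key.equals(key)) return e.value;
        }
        return null; // not present
    }
}
```
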
Lecture 11 - July 17
- Open Addressing: attempt to not store chains
- Resolving collisions by trying a sequence of other positions in the
table
- Linear probing: when inserting, if index x is full, then try x+1
- Repeat as necessary…
- We are “probing” the array “linearly”
- Probing: ith probe into array: (h(key) + f(i)) % TableSize
- Open addressing is much worse as the load factor increases
- To find using probing, we follow probe sequence until we find the
key or empty spot
- Delete: use lazy deletion, replace value with a flag
- Primary clustering
- Linear probing tends to produce clusters of values
- This leads to runtime issues
- Quadratic probing: ith probe into array: (h(key) + i**2) % TableSize
- Possible to have infinite cycles because of mod
- If TableSize is prime and the load factor is less than 0.5,
quadratic probing will find an empty slot in at most TableSize / 2
probes
- Secondary clustering happens when two keys hash to the same index; they
then follow the same probe sequence, so probing doesn't help
- Double hashing: f(i) = i*g(key)
- Idea is that two hashes won't collide much
- Ensure g(key) != 0, otherwise infinite loop
- Can still lead to infinite cycles
- Important to pick good functions with guarantees
- When we resize the array we need to rehash everything
- Good to resize separate chaining when the load factor approaches one; for
open addressing, resize around one half
- Good to make size twice as big, but that won’t be prime!
- Keep a hardcoded list of primes at good intervals
- If it continues to grow, say screw it and double (I think)
- View slides or Effective Java for tips on how to hash different data
types
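
A small sketch comparing the i-th probe index under the three schemes above; h and g stand in for the primary and secondary hash values, and the method names are my own.

```java
// i-th probe index under the three open-addressing schemes discussed above.
public class ProbeSequences {
    static int linear(int h, int i, int tableSize) {
        return Math.floorMod(h + i, tableSize);       // (h(key) + i) % TableSize
    }

    static int quadratic(int h, int i, int tableSize) {
        return Math.floorMod(h + i * i, tableSize);   // (h(key) + i^2) % TableSize
    }

    static int doubleHash(int h, int g, int i, int tableSize) {
        // g must never be 0, otherwise the probe sequence never moves
        return Math.floorMod(h + i * g, tableSize);   // (h(key) + i*g(key)) % TableSize
    }
}
```
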
Lecture 12
- Sorting goals:
- Stable: preserve original ordering in case of ties
- In-Place: only use a constant amount of auxiliary storage
- Fast: \(O(n log n)\) and/or good
constant factors
- Simple algorithms: \(O(n^2)\)
- Insertion sort
- Selection sort
- Bubble sort
- Fancier algorithms: \(O(n log n)\)
- Heap sort
- Merge sort
- Quick sort (avg)
- Specialized algorithms: \(O(n)\)
- Insertion sort:
- Maintain a sorted subarray, insert the unsorted elements into
it
- Stable!
- In-place!
- Slow :(
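
A short sketch of the insertion sort just described (stable, in-place, \(O(n^2)\)):

```java
// Insertion sort sketch: maintain a sorted prefix and insert each new element into it.
public class InsertionSort {
    public static void sort(int[] arr) {
        for (int i = 1; i < arr.length; i++) {
            int cur = arr[i];                    // next unsorted element
            int j = i - 1;
            while (j >= 0 && arr[j] > cur) {     // shift larger sorted elements right
                arr[j + 1] = arr[j];
                j--;
            }
            arr[j + 1] = cur;                    // insert into the sorted prefix
        }
    }
}
```
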
Lecture 12 - July 19
- Selection sort:
- Maintain a sorted subarray, find the smallest remaining element and
append it to sorted
- Stable!
- In-place!
- \(O(n^2)\), but good constant
factors
- Bubble sort: (it sucks!)
- \(O(n^2)\) and bad constant
factors
- Never use
- Heap sort:
- Put all elements into a heap, then remove the elements
- Not stable
- Can be in-place by treating the initial array as the heap, then placing each
removed min at the back of the array
- \(O(n log n)\), but bad constant
factors
- AVL sort:
- We pretend this doesn’t exist!
- Not in-place, worse constant factors
- Heap sort is strictly better
- Merge sort: divide and conquer!
- Sort left and right halves of the elements recursively, then combine into one list
- Stable (always pick from the left array first)
- Not in-place: \(O(n)\) extra space
- \(O(n log n)\), bad constant
factors because of copying
- Quick sort:
- Make sure to use the 332 quicksort
- Algorithm:
- Pick a pivot element, hopefully the median element
- Divide the array into two pieces, less than and greater than
pivot
- Recursively sort the two pieces
- Combine the sorted pieces
- Picking a good pivot:
- Pick first or last element: fast to pick but likely worst-case
- Pick random element: good, but randomness is expensive
- Median of 3: pick median of first, middle, and last element, overall
good
- How to split into two pieces: Hoare partitioning
- Swap pivot with arr[lo] (move it out of the way)
- Use two pointers l and r starting at lo+1 and hi-1 and move inwards
- Put pivot back in middle: swap with arr[l]
- Not stable
- In-place!
- \(O(n log n)\) average case, \(O(n^2)\) worst case
- In practice, way, way better constant factors
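
A quicksort sketch with median-of-3 pivot selection and a two-pointer partition. This is a simplified variant written by me, not necessarily the exact partition from the 332 slides; the class and helper names are my own.

```java
// Quicksort sketch: median-of-3 pivot, two-pointer (Hoare-style) partition.
public class QuickSortSketch {
    public static void quickSort(int[] a) { quickSort(a, 0, a.length - 1); }

    private static void quickSort(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        medianOfThreeToFront(a, lo, hi);   // pivot ends up at a[lo], out of the way
        int pivot = a[lo];
        int l = lo + 1, r = hi;            // two pointers moving inwards
        while (l <= r) {
            while (l <= r && a[l] <= pivot) l++;   // find an element > pivot
            while (l <= r && a[r] >= pivot) r--;   // find an element < pivot
            if (l < r) { swap(a, l, r); l++; r--; }
        }
        swap(a, lo, r);                    // put the pivot back between the pieces
        quickSort(a, lo, r - 1);           // recursively sort both pieces
        quickSort(a, r + 1, hi);
    }

    private static void medianOfThreeToFront(int[] a, int lo, int hi) {
        int mid = lo + (hi - lo) / 2;
        if (a[mid] < a[lo]) swap(a, lo, mid);
        if (a[hi] < a[lo]) swap(a, lo, hi);
        if (a[hi] < a[mid]) swap(a, mid, hi);  // now a[lo] <= a[mid] <= a[hi]
        swap(a, lo, mid);                      // move the median to the front as the pivot
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```
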
Lecture 13 - July 21
- Comparison sorting: CUTOFF strategy
- Use the recursive sort until the subarray is small enough, then switch to a sort
with good constants
- All comparison sorting algorithms have a lower bound of \(\Omega(n log n)\)
- Bucket sort:
- Find the min and max value, make an auxiliary array covering that range
- Iterate through the array and count how many of each number
- Copy the auxiliary counts back into the original array
- Stable!
- Not in-place
- Possibly very large memory usage
- Good speed \(O(n + k)\) and
constants
- Non-integers: make each bucket a linked list of items
- Radix sort:
- For each digit (starting from LSD), implement bucket sort and
repeat
- As it is stable, this allows for ordering
- Stable
- Not in-place
- Very fast, \(O(d(n+k))\) and good
constant factors
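
A minimal sketch of the count-based bucket sort described above, for ints in a known small range; the class name is my own.

```java
// Bucket (counting) sort sketch for ints in a small known range: O(n + k).
public class BucketSort {
    public static void sort(int[] arr) {
        if (arr.length == 0) return;
        int min = arr[0], max = arr[0];
        for (int x : arr) {                      // find the min and max values
            if (x < min) min = x;
            if (x > max) max = x;
        }
        int[] counts = new int[max - min + 1];   // auxiliary array covering the range
        for (int x : arr) counts[x - min]++;     // count how many of each number
        int i = 0;
        for (int v = 0; v < counts.length; v++)  // copy counts back in sorted order
            for (int c = 0; c < counts[v]; c++) arr[i++] = v + min;
    }
}
```
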
Lecture 14 - July 24
- A graph is a mathematical representation of a set of objects
connected by links
- Terminology:
- Undirected graphs: edges have no direction, two-way, facebook
friends
- Degree of a vertex: number of edges containing that vertex
(undirected)
- Directed graphs: edges have a direction, pointing, instagram
follows
- In-Degree of a vertex: the number of in-bound edges
- Out-Degree of a vertex: the number of outbound edges
- Weighted graphs: each edge has a weight or cost, representing
relationships
- Walk: a sequence of adjacent vertices
- Path: a walk that doesn’t repeat a vertex
- Cycle: a walk that doesn’t repeat a vertex except the first
and last vertex
- Path length: number of edges in a path
- Path cost: the sum of the weights in the path
- Undirected graph connectivity: an undirected graph is connected if
there exists a path between each pair of vertices
- Undirected graph completeness: an undirected graph is complete if
there is an edge between each pair of vertices
- Directed strong connectivity: a directed graph is strongly connected
if there is a path between each pair of vertices
- Directed weakly connectivity: a directed graph is weakly connected
if there exists a path between each pair of vertices ignoring
edge directions
- Directed graph completeness: a directed graph is complete if for all
pairs of vertices there is an edge in each direction
- Self-edges: we pretend they don’t exist… (in 332)
- A tree is an undirected, acyclic, connected graph
- Rooted trees are trees where we pick a root and think of the edges as directed
- Directed Acyclic Graphs (DAGs):
- Directed graph with no cycles
- Maximum number of edges:
- Undirected: \(0 \leq \mid E \mid < \mid V \mid ^2\)
- Directed: \(0 \leq \mid E \mid \leq \mid V \mid ^2\)
- So: \(\mid E \mid \in O(\mid V \mid ^2)\)
- Sparse graph: \(\mid E \mid \in \Theta (\mid V \mid)\)
- Dense graph: \(\mid E \mid \in \Theta (\mid V \mid ^2)\)
- Graphs: the data structure
- Common operations:
- Is \((v, u)\) an edge?
- “What are the neighbors of \(v\)?”
- Two standards:
- Adjacency Matrix
- Adjacency List
- Adjacency Matrix: a 2D array of booleans representing the presence
of an edge
- Properties:
- Get a vertex's out-bound edges: \(O(\mid V \mid)\)
- Get a vertex's in-bound edges: \(O(\mid V \mid)\)
- Decide if edge exists: \(O(1)\)
- Insert an edge: \(O(1)\)
- Delete an edge: \(O(1)\)
- Space requirements: \(O(\mid V \mid ^2)\)
- Adding vertices is not really possible
- Better for dense graphs (you’re already using the memory and
complexity)
- Can be weighted by using non-booleans as type
- Then you must define “not an edge”
- Adjacency List: an array of vertices each with a linked list of
edges
- Properties: (where \(d\) is the degree of the vertex)
- Get a vertex's out-bound edges: \(O(d)\)
- Get a vertex's in-bound edges: \(O(\mid V \mid + \mid E \mid )\)
- Decide if edge exists: \(O(d)\)
- Insert an edge: \(O(1)\)
- Delete an edge: \(O(d)\)
- Space requirements: \(O(\mid V \mid + \mid E \mid )\)
- Better for sparse graphs (more operations depend on the number of
elements)
- Processing a node: “doing something” at a node
- Visiting a node: exploring it
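
A minimal adjacency-list sketch matching the costs above; the class is my own and uses ArrayLists where the lecture describes linked lists (the costs are the same in big-O terms).

```java
import java.util.ArrayList;
import java.util.List;

// Adjacency-list graph sketch: an array/list of vertices, each with its out-neighbors.
public class AdjListGraph {
    private final List<List<Integer>> adj = new ArrayList<>();

    public AdjListGraph(int numVertices) {
        for (int v = 0; v < numVertices; v++) adj.add(new ArrayList<>());
    }

    public void addEdge(int from, int to) {     // O(1) insert
        adj.get(from).add(to);
    }

    public boolean hasEdge(int from, int to) {  // O(d), where d = out-degree of `from`
        return adj.get(from).contains(to);
    }

    public List<Integer> neighbors(int v) {     // O(d) to walk the out-bound edges
        return adj.get(v);
    }
}
```
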
Lecture 15 - July 26
- Basic idea: follow nodes and mark them to prevent cycles
- Use slide code for the algorithms
- DFS uses a stack, BFS uses a queue (that’s the main difference)
- Iterative DFS:
- Use a stack, we process after popping each node
- Recursive DFS:
- The “stack” is the program’s recursive stack
- BFS:
- Similar algorithm structure, just use a queue
- Comparisons:
- DFS
- Typically uses much less memory
- Applications: topological sorting, cycle detection
- BFS
- Typically uses more memory
- Applications: shortest paths
- Iterative Deepening DFS (IDDFS)
- Use DFS with increasing depth limits
- Good memory and finds the shortest path
- To get the final path, we include the predecessor node in the mark
- To get path, we then backtrack
- Often implemented with an array
- Shortest paths for weighted graphs: we often assume no negative
weights
- Dijkstra’s Algorithm:
- Initially, start node has cost 0 and all other nodes have infinite
cost
- At each step:
- Pick the closest unvisited vertex v
- Add it to the set of visited vertices
- Update distances for nodes with edges from v
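
A BFS sketch tying together the marking, queue, and predecessor ideas from this lecture; the class and array names are my own, and swapping the queue for a stack gives iterative DFS.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// BFS over an adjacency list, recording each vertex's predecessor so the
// shortest (fewest-edges) path can be reconstructed by backtracking.
public class BFSSketch {
    // adj.get(v) holds v's neighbors; returns pred[], with -1 for unreached vertices.
    public static int[] bfs(List<List<Integer>> adj, int start) {
        int[] pred = new int[adj.size()];
        Arrays.fill(pred, -1);
        boolean[] marked = new boolean[adj.size()]; // mark nodes to prevent cycles
        Queue<Integer> queue = new ArrayDeque<>();  // a stack here would give DFS instead
        marked[start] = true;
        queue.add(start);
        while (!queue.isEmpty()) {
            int v = queue.remove();                 // process v
            for (int w : adj.get(v)) {
                if (!marked[w]) {
                    marked[w] = true;
                    pred[w] = v;                    // remember how we reached w
                    queue.add(w);
                }
            }
        }
        return pred;
    }
}
```
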
Lecture 16 - July 28
- Dijkstra’s is a greedy algorithm
- At each step, it does what seems best at that step
- It explores every node, so it must find the globally optimal path
- Unoptimized runtime is \(O(\mid V \mid ^2 + \mid E \mid )\)
- Can make it faster by using a minHeap: \(O(\mid V \mid log \mid V \mid + \mid E \mid log \mid V \mid )\)
- \(O( \mid V \mid log \mid V \mid )\) for a sparse graph
- Parallelism
- We typically have worked with sequential programming: one thing
happens after another
- Removing this assumption means many new things to think about:
- How can we divide work between different threads and
synchronize?
- How can we parallelize algorithms?
- How can we support concurrent access with data structures?
- This means we should be able to do multiple things at once in one program
- Parallelism: using extra resources to solve a problem faster
- Concurrency: correctly and efficiently manage access to shared
resources
- Threads can communicate by writing to shared locations
- Need to make sure that they don't write in a bad way!
- Message-passing: threads communicate by sending and receiving
messages
- Wait for one thread to coordinate all of the worker threads
- Java threading basics:
- We will use java.lang.Thread to start off with, then use a better library later
- Define and instantiate a subclass C of java.lang.Thread, overriding run
- Call the object's start method
- Calling run directly just runs the method in the current thread
- Fork-Join Framework:
- To start parallelism, we must use ForkJoinPool.invoke(RECURSIVE_ACTION)
- This calls the object's compute method
- Each program should only have one ForkJoinPool; use commonPool to get the shared static pool
- Subclass RecursiveAction to use fork, compute, and join
- compute is the code we want to run
- fork starts a new thread running compute
- join waits for the thread to finish
- Better for code which doesn't return anything
- Subclass RecursiveTask if you do want to return values (see the sketch below)
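
A sketch of the RecursiveTask pattern above: a parallel array sum using ForkJoinPool.commonPool(). The cutoff constant and class name are my own choices for illustration.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Fork-Join sketch: parallel array sum with RecursiveTask (a value is returned).
public class SumTask extends RecursiveTask<Long> {
    private static final int CUTOFF = 5000;   // illustrative sequential cutoff
    private final int[] arr;
    private final int lo, hi;

    public SumTask(int[] arr, int lo, int hi) {
        this.arr = arr; this.lo = lo; this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= CUTOFF) {              // small enough: just sum sequentially
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += arr[i];
            return sum;
        }
        int mid = lo + (hi - lo) / 2;
        SumTask left = new SumTask(arr, lo, mid);
        SumTask right = new SumTask(arr, mid, hi);
        left.fork();                          // run the left half in another thread
        long rightSum = right.compute();      // do the right half ourselves
        return left.join() + rightSum;        // wait for the left half, then combine
    }

    public static long sum(int[] arr) {
        return ForkJoinPool.commonPool().invoke(new SumTask(arr, 0, arr.length));
    }
}
```
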
Lecture 17 - July 31
- The fork-join pattern is quite common for solving many different
types of problems
- All that changes is the base-case and way we combine results
- These are called reductions
- It is required that the operation is associative
- A parallel map is applying an operation on each input
element independently
- E.g. multiplying each element of an array by two
- Better to subclass RecursiveAction as nothing is returned
- To use divide-and-conquer parallelism, we need to be able to divide
the problem up efficiently
- We can still use maps on sequential data to speedup the process,
think HF DS maps
- Analyzing Fork-Join algorithms:
- \(T_P\) is the time a program takes
to run if there are \(P\) processors
available
- \(T_1\) is the time it takes to run
if there is one processor, note: not sequential algorithm
- This is called the work, the total run time of all parts of
the algorithm
- \(T_{\infty}\) is the
span, how long it takes to run on an infinite amount of
processors
- Also called: critical path length or computational
depth
- We can describe a parallel program execution as a DAG:
- Nodes are constant-time pieces of work the program performs
- The number of nodes adds up to \(T_1\)
- Edges represent computational dependencies, what we must
complete before moving onto another node
- \(T_{\infty}\) is the length of the
longest path in the DAG
- A parallel reduction is two balanced trees on top of each other
- This allows \(T_1\) and \(T_{\infty}\) to become simple graph
properties
- \(T_1\) is \(O(2n)\), because two trees
- \(T_{\infty}\) is \(O(log n)\), because that’s the height of
the trees
- The speedup on \(P\) processors is: \(\frac{T_1}{T_P}\)
- Note that this is often dishonest, as a true sequential algorithm
might be faster
- Perfect speedup is when \(\frac{T_1}{T_P} = P\), and is rare
- Perfect linear speedup is the case where doubling \(P\) halves the running time
- \(\frac{T_1}{T_{\infty}}\) is the
parallelism of an algorithm
- Parallel reduction: \(\frac{n}{logn}\), so there is an
exponential speedup
- The ForkJoin Framework has the expected time bound: \(T_P\) is \(O(\frac{T_1}{P} + T_{\infty})\)
- There is some randomness involved
- This works as long as each task is expected to do about the same
amount of work
- Also, all threads are expected to do a small but not tiny amount of
work
- This is approx 5000 (within an order of magnitude) computational
steps
- Amdahl’s Law:
- \(\frac{T_1}{T_P} = \frac{1}{S + \frac{1 - S}{P}}\), where \(S\) is the sequential fraction of the work
- As \(P\) goes to \(\infty\), \(\frac{T_1}{T_{\infty}} = \frac{1}{S}\)
- Takeaways from Amdahl’s Law:
- This means that scaling computational resources leads to non-linear
speedups
- We cannot always rely on extra compute for speedups when there are
chunks of sequential code
- We can try to find new, more parallel algorithms that work around the Law
- We can still use parallelism to solve bigger problems, but
not faster
- This happens when the parallel parts grow at a faster rate than the
sequential parts
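
A quick worked instance of Amdahl's Law with numbers I made up, to show how the sequential fraction caps the speedup:

```latex
% Example: if S = 1/3 of the work is sequential, then on P = 4 processors
\[
  \frac{T_1}{T_4} = \frac{1}{S + \frac{1-S}{P}}
  = \frac{1}{\tfrac{1}{3} + \tfrac{2/3}{4}} = \frac{1}{1/2} = 2 ,
\]
% and no matter how many processors we add, the speedup is capped at 1/S = 3.
```
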
Lecture 18 - August 2
- Parallel-Prefix Sum:
- We run this in two parts: first create a tree where each node is the
sum of a portion of the array
- The leaves are the sums of one-element ranges (in practice we would use a
sequential cutoff)
- We run through the tree, passing along as an argument the sum of the
values to the left of the node
- This allows the “sequential” problem to have \(O(log n)\) span
- This pattern is good for problems where you want to compute something over
all elements to the left of index i
- Pack: produce an array with the same order where all elements
satisfy some condition
- Perform a parallel map to produce a bit (binary) vector of elements
that satisfy a condition
- Perform a parallel prefix sum to get the number of indices to the left which
satisfy the condition
- Run a parallel map for each index to move the value from the input array to
the new output array
- Use bitsum to get the output indices
- Use bitvector to check if the value satisfies the condition
- Parallel Quicksort:
- Parallelize sorting both partitions
- Parallelize partitioning using an auxiliary array
- Parallel Mergesort:
- We can parallelize recursively sorting the halves easily, but we
need more for exponential speedup
- To do so, we need to parallelize the merge operation:
- Determine the median element of the larger array: \(O(1)\)
- Binary search to find the position of the median in the smaller
array: \(O(logn)\)
- Recursively merge the left half of the large array with the left
portion of the small array
- Recursively merge the right half in the same way
Lecture 19 - August 4
- Concurrent programming is about controlling access to shared
resources
- In concurrent programming, threads are largely doing their
own thing
- Operations are interleaved, happening before, after, or at
the same time as other threads
- We must enforce mutual exclusion on memory which might
change
- We do this by “hanging a sign” telling other threads to wait
- Checking there is no sign and then hanging the sign must be one
operation
- The work done while the sign is hanging is a critical
section
- We typically do this using synchronization primitives within a
language
- A mutual-exclusion lock (lock or mutex) is an abstract datatype for
concurrent programming
- new creates a new lock which is “not held”
- acquire takes a lock and blocks until it is currently “not held”, then sets it to “held” and returns
- release takes a lock and sets it to “not held”
- It is important to ensure that exceptions do not prevent a lock from
ever getting released
- Locks which allow a thread to re-acquire a lock it already holds are called
reentrant locks
- We use the synchronized keyword in Java to evaluate, acquire, and release a lock
- This seems somewhat similar to the Python with statement
- As any object can be a lock in Java, we can just use the instance (this) as the lock
- If the entire method should be synchronized, we can use that as a method keyword instead
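
A tiny sketch of both synchronized forms just described; the counter class is my own example, not from the lecture.

```java
// Using an object's intrinsic lock via synchronized, in both forms.
public class SyncCounter {
    private int count = 0;

    public void increment() {
        synchronized (this) {   // acquire this object's lock; it is released on exit,
            count++;            // even if the body throws an exception
        }
    }

    // Equivalent form: synchronized on the method also locks on this.
    public synchronized int get() {
        return count;
    }
}
```
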
Lecture 20 - August 7
- A race condition is a bug in a program that makes program
correctness dependent on the order that threads execute
- There are two kinds of data races:
- When one thread reads and another writes at the same time
- When both threads attempt to write at the same time
- Even if we think that it wouldn’t cause bad behavior, never
have race conditions
- When running programs, there are often hidden intermediate states
- This is like times where the rep inv isn’t true
- If a program reads during this intermediate state, things get
weird
- These are critical sections
- One reason why data races are not allowed is because of compiler
optimizations
- The Grand Compromise:
- The programmer promises not to write data races
- The compiler will make optimizations seem as if there was a global
interleaving
- We can also declare fields as volatile, making accesses not count as data races
- They are good when there is only one shared field
- Concurrency Programming Guidelines
- Conceptually split memory into three parts: all memory should be at
least one of the following
- Memory that is thread-local
- It is typically better to copy data and have more thread-local
memory
- Memory that is immutable (after initialization)
- Make new objects instead of updating fields
- Memory that is synchronized, where locks are needed
- Approaches to synchronization:
- Avoid data races
- Use consistent locking (and document it)
- Use more coarse-grained locking and only be more specific if
performance is hurt
- Keep critical sections small and do not perform expensive operations
in them
- Think of what needs to be atomic, then determine a locking
strategy
- Always rely on the standard libraries
- Deadlock: threads being blocked forever
- This happens when there is a cycle of waiting
- We can prevent it by defining an ordering in which locks are acquired
Lecture 21 - August 9
- Topological Sort
- Given a DAG, output all vertices in an order such that no parent appears after
its children
- This can be used in backpropagation algorithms or other dependency
graphs
- There can be multiple legal permutations per DAG
- Simple Topo-sort algorithm
- Find all the in-degrees for each vertex
- These can be stored in a data structure
- While not all vertices are output:
- Choose a vertex \(v\) labeled with in-degree of 0
- Output \(v\) and remove it from the
graph (or add to an outside set)
- For each vertex \(w\) adjacent to
\(v\), decrement in-degree of \(w\)
- By using a queue, we can lower the time complexity to \(O(V + E)\)
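
A queue-based sketch of the algorithm above (Kahn's algorithm) over an adjacency list; the class name is my own, and the input is assumed to be a DAG.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Queue-based topological sort: O(|V| + |E|) over an adjacency list of a DAG.
public class TopoSortSketch {
    public static List<Integer> topoSort(List<List<Integer>> adj) {
        int n = adj.size();
        int[] inDegree = new int[n];
        for (List<Integer> outs : adj)            // find all in-degrees
            for (int w : outs) inDegree[w]++;

        Queue<Integer> ready = new ArrayDeque<>();
        for (int v = 0; v < n; v++)
            if (inDegree[v] == 0) ready.add(v);   // vertices with no remaining parents

        List<Integer> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            int v = ready.remove();
            order.add(v);                         // output v
            for (int w : adj.get(v))              // "remove" v from the graph
                if (--inDegree[w] == 0) ready.add(w);
        }
        return order;                             // size < n would mean a cycle existed
    }
}
```
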
Lecture 22 - August 11
- Minimum spanning trees: the minimum-cost set of edges such that all nodes
are connected
- Two main algorithms to find the MST:
- Prim’s algorithm: similar to Dijkstra’s
- Randomly pick a vertex, then use Dijkstra’s from there
- Pick the vertices which are the lowest cost
- The random pick of starting vertex leads to different possible MSTs (if
multiple exist)
- Kruskal's algorithm: completely different
- We make clusters of sub-MSTs, then they end up unifying
- Prim’s:
- Pick random element
- Run Dijkstra's-style steps, connecting unconnected nodes to connected nodes
- Create the MST by connecting edges with their predecessors
- Kruskal's:
- Sort all edges by weight
- Add edges in order as long as they connect components that are not already
connected (no cycles)
- Disjoint Set ADT:
- Union(x, y): combines the two sets which contain x and y
- Find(x): returns the name of the set containing x
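
A minimal sketch of the Disjoint Set ADT above, as Kruskal's would use it to check whether an edge connects two different components; the class is my own and union-by-rank is omitted for brevity.

```java
// Disjoint sets (union-find) sketch with path compression only.
public class DisjointSets {
    private final int[] parent;

    public DisjointSets(int n) {
        parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;  // each element starts in its own set
    }

    public int find(int x) {
        if (parent[x] != x) parent[x] = find(parent[x]); // path compression
        return parent[x];                                // the root names the set
    }

    public void union(int x, int y) {
        parent[find(x)] = find(y);  // point x's root at y's root, merging the sets
    }
}
```
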
Lecture 23 - August 14
- A decision problem: a problem that takes some input and returns true
or false
- Halting problem: takes in some code and an input, returns if it
terminates
- Definition of \(P\) (as in \(P = NP\)): a problem is in \(P\) if it satisfies:
- It’s a decision problem
- There exists a polynomial time algorithm to solve it
- Euler Circuit Problem:
- Input: Undirected graph \(G\)
- Output: True iff there is a path in \(G\) that visits each edge exactly once
- Polynomial time solution: true iff
- Degree of every vertex is even
- Connected graph
- Definition of \(NP\),
non-deterministic polynomial: a problem is in \(NP\) iff:
- It’s a decision problem
- For every input where the output is true, there exists some quick
way to verify output is true
- Every problem in \(P\) is also in \(NP\), i.e. \(P \subseteq NP\)
- Is \(NP \subset P\)? Is \(P = NP\)?
- We don’t know…
- Most people believe that \(P \neq
NP\)
- Hamiltonian Circuit Problem:
- Input A graph \(G\)
- Output: True iff there exists a cycle in \(G\) that visits every vertex exactly
once
- Much more challenging than the Euler Circuit Problem
- \(NP\)-Complete
- \(NP\)-Complete: they are the
hardest problems in \(NP\)
- A problem is \(NP\)-complete if:
- It's in \(NP\) itself
- Every problem in \(NP\) is polynomial time reducible to it
- This means that if any \(NP\)-complete problem is in \(P\), then \(P = NP\)
- Reduction:
- Consider two decision problems \(A\) and \(B\)
- \(A\) reduces to \(B\) if we can solve \(A\) using a solution to problem \(B\)
- Notation: \(A \leq B\)
- We say that \(A\) reduces to \(B\) in polynomial time if we can solve
\(A\) using a polynomial number of
calls to \(B\)
- If \(B \in P\) and \(A \leq B\) then \(A \in P\)
- There are thousands of \(NP\)-complete problems and no one has been
able to show that any of them is in \(P\), so
people generally believe that \(P \neq NP\)
- \(NP\)-Hard: problems that every problem in \(NP\) reduces to; they
need not be in \(NP\) themselves
- Example problem: \(N \times N\)
chessboard, verify a move is optimal
- What to do when faced with a problem that you think is hard?
- Confirm that it is actually hard
- Take an \(NP\)-complete problem,
take your problem, show that the \(NP\)-complete problem reduces to your
problem
- If so, stop trying to find a general solution to your problem
lol
- Instead, possibly use an approximation algorithm (within a constant
factor of true)
- Possibly use a randomized algorithm, repeating it as needed
- Consider special cases for your problem
Lecture 24 - August 16
- \(NP\)-Hard: every \(NP\) problem can be reduced to it
- \(NP\)-Complete: \(NP\)-hard and \(NP\)
- If an \(NP\)-hard problem is in \(P\), then \(P = NP\)
- 3-Colorable:
- Input: undirected graph \(G\)
- Output: true iff you can color each vertex of the graph such that no
two adjacent vertices have the same color
- \(NP\)-Complete
- 2-Colorable: a related problem
- Not \(NP\)-Complete
- All vertices at distance \(i+1\) from the start must be colored
differently from those at distance \(i\)
- The polynomial-time algorithm is to try to color it this way, return true if it works, else false
- 2-Colorable reduction to 3-Colorable
- Add a dummy vertex connected to every other node - it takes up one of the
three colors
- Thus, 3-Colorable on that graph tests whether the original graph is 2-Colorable
- 3-SAT:
- Terminology:
- Literal: a boolean variable (or its negation)
- Clause: a series of literals, OR’d together
- Input: a series of clauses AND’d together, each clause has at most 3
literals
- Output: true iff there is a setting of boolean variables such that
the expression is true
- \(NP\)-Complete
- The general problem of satisfiability is reducible to 3-SAT
- Vertex Cover:
- Input: undirected graph \(G\),
integer \(k\)
- Output: true iff there is a set of vertices of size \(k\) such that every edge has at least one
vertex in the set
- \(NP\)-Complete
- “The set of vertices covers all edges”
- Special cases:
- If the graph is a tree, there is a linear time solution
- If the graph is 2-Colorable, there is a polynomial time
solution
- Approximation algorithm for finding minimum vertex cover
- At most \(2x\) larger than
optimal
- While there are edges in \(G\): (sketch below)
- Pick an arbitrary edge
- Put both of its vertices in the vertex cover and delete those vertices and all
edges touching them
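
A minimal sketch of that 2-approximation; the class and edge representation (int pairs) are my own. Skipping edges whose endpoints are already in the cover plays the role of "deleting" the edges touching chosen vertices.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// 2-approximation for minimum vertex cover: take both endpoints of each
// still-uncovered edge; the result is at most 2x the optimal cover size.
public class VertexCoverApprox {
    // edges are int pairs {u, v}
    public static Set<Integer> approxCover(List<int[]> edges) {
        Set<Integer> cover = new HashSet<>();
        for (int[] e : edges) {
            int u = e[0], v = e[1];
            if (!cover.contains(u) && !cover.contains(v)) { // edge still uncovered
                cover.add(u);   // take both endpoints,
                cover.add(v);   // which covers every edge touching either of them
            }
        }
        return cover;
    }
}
```
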