CSE 332
Lecture 1 - June 21
- What this course is:
- Classic data structures and algorithms
- Queues, dicts, graphs, sorting
- Parallelism and concurrency (new! exciting!)
- Data structures are clever ways of storing information to perform
efficient computation
- There are always tradeoffs with each option
- Abstract Data Type (ADT) is a mathematical description of a “thing”
with a set of operations
- Data Structure: a specific organization of data and a family of algorithms
for implementing an ADT
- Implementation: the actual concrete code
- We can use circular arrays or doubly linked lists for queues (see the sketch below)
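
A minimal sketch of the circular-array option mentioned above; the class and field names are my own and the capacity is fixed to keep it short.

```java
// Minimal circular-array FIFO queue sketch (illustrative names; fixed capacity).
public class CircularArrayQueue {
    private final int[] data;
    private int front = 0; // index of the oldest element
    private int size = 0;  // number of elements currently stored

    public CircularArrayQueue(int capacity) {
        data = new int[capacity];
    }

    public void enqueue(int x) {
        if (size == data.length) throw new IllegalStateException("full");
        data[(front + size) % data.length] = x; // wrap around the end of the array
        size++;
    }

    public int dequeue() {
        if (size == 0) throw new IllegalStateException("empty");
        int x = data[front];
        front = (front + 1) % data.length; // advance front with wrap-around
        size--;
        return x;
    }
}
```
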
Lecture 2 - June 23
- What do we care about? (DSA)
- Correctness - does it work?
- Performance - speed! (and memory)
- Evaluating the program is difficult - better to evaluate
the algorithm
- Analyzing code: counting code constructs:
- Basic operations (indexing, assignment): \(\text{constant time}\)
- Loops: \(\text{num iterations} \times \text{loop body}\)
- Conditionals: \(\text{time of condition} + \text{slowest branch}\)
- Function calls: \(\text{time of function body}\)
- Recursion: \(\text{solve recurrence equation}\)
- Complexity cases:
- Worst-case complexity: max num of steps the algo takes on the “most
challenging” input of size \(n\)
- Best-case complexity: min num of steps algo takes on “easiest” input
of size \(n\)
- Average-case complexity: what does “average” mean?
- Amortized-case complexity: we learn about this later
- Linear search, best case: \(O(1)\),
worst case: \(O(n)\)
- Asymptotic analysis: big \(O\)
notation
- A function \(f\) is in \(O(\tilde{f})\) if \(f\) grows no faster than \(\tilde{f}\) asymptotically
(ignoring constant factors)
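
A small, self-contained example (mine, not from the lecture) of counting constructs: the loop body is constant time, so the best case (target at the first index) is \(O(1)\) and the worst case is \(O(n)\).

```java
// Linear search sketch: best case O(1), worst case O(n).
public class LinearSearch {
    public static int indexOf(int[] arr, int target) {
        for (int i = 0; i < arr.length; i++) { // loop: num iterations x loop body
            if (arr[i] == target) {            // conditional: constant-time check
                return i;                      // best case: found at the first index
            }
        }
        return -1;                             // worst case: scanned all n elements
    }
}
```
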
Lecture 3 - June 26
- How to show \(f(n)\) is in \(O(g(n))\)? (worked example at the end of this lecture's notes)
- Pick a \(c\) large enough to cover the constant factors
- Pick an \(n_0\) large enough to cover the lower order terms
- Dropping coefficients is usually okay, be careful otherwise
- Upper bound: Big-O
- Lower bound: Big-Omega, defined like the upper bound but bounding from below
- Big-Theta bound (tight bound): \(f(n)\) is in both \(O(g(n))\) and \(\Omega(g(n))\)
- Worst/best-case vs asymptotic analysis
- Think of it as comparing scenarios
- Amortization: worst-case is too pessimistic
- “what is the average run time of operations in the worst
sequence?”
- Not the same as average-case (average-case averages over inputs; amortized
averages the cost per operation over a worst-case sequence of operations)
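
A worked example of the definition above, with numbers I chose for illustration:

```latex
% Show f(n) = 3n^2 + 5n is in O(n^2) by exhibiting c and n_0.
\[
  3n^2 + 5n \le 3n^2 + 5n^2 = 8n^2 \quad \text{for all } n \ge 1,
\]
\[
  \text{so with } c = 8 \text{ and } n_0 = 1,\; f(n) \le c\,n^2 \text{ for all } n \ge n_0 .
\]
```
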
Lecture 4 - June 28
- Priority Queue: Highest Priority, First Out
- Holds comparable data
- insert: adds an item at the end
- deleteMin: finds, returns, and removes the minimum element in the queue
- Heap: only pay for functionality needed (half-sorted)
- Similar to trees!
- Both operations are: \(\Theta(log n)\)
- Tree terminology:
- depth: how many levels from the root down to the node
- height: how far from the node down to its deepest descendant leaf
- Binary tree: each node has max 2 children
- n-ary tree: each node has max n children
- Perfect tree: each row is completely full
- Complete tree: each row is completely full except the bottom row
- The bottom row must be filled up from left to right
- (Binary Min-) Heap:
- Complete binary tree structure
- Every (non-root) node has a priority value greater than its
parent
- insert:
- Preserve complete tree structure property
- This might break heap order property
- Percolate up to restore heap order property
- deleteMin:
- Remove root node
- Move the last node in the bottom level to the root
- This might break heap order property
- Percolate down to restore property
- We can use an array to store the heap
- We will skip index 0 to make the math easier
- To get to the child nodes from node i, go to i*2 and i*2 + 1
- The parent of node i is at i/2 (integer division); the index math is sketched at the end of this lecture's notes
- To increase or decrease a key by an amount, we change it then
percolate
- To delete, we simply decrease the key by infinity, then delete
min
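
A minimal sketch of the array-backed binary min-heap index math and percolate-up described above; the class and field names are illustrative and resizing is omitted.

```java
// 1-indexed array min-heap sketch (index 0 unused); fixed capacity, no resizing.
public class MinHeapSketch {
    private final int[] heap = new int[64]; // heap[1..size] hold the priorities
    private int size = 0;

    public void insert(int priority) {
        heap[++size] = priority;   // preserve completeness: add at the end
        percolateUp(size);         // then fix any heap-order violation
    }

    private void percolateUp(int i) {
        while (i > 1 && heap[i] < heap[i / 2]) { // parent of i is i / 2 (integer division)
            int tmp = heap[i];                   // swap the node with its parent
            heap[i] = heap[i / 2];
            heap[i / 2] = tmp;
            i = i / 2;
        }
    }
    // Children of node i live at indices 2*i and 2*i + 1.
}
```
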
Lecture 5 - June 30
- Naively building a heap: insert for each element
- We can do better: Floyd's buildHeap
- \(O(n)\) runtime
- Put all elements into the array in arbitrary order
- Percolate down from each element one level above the leaves, up to
the root
- Moving up a level increases the work per node but halves the number of
nodes, so the total work is \(O(n)\)
- Counting recursive code:
- Each recursive method call:
- Perform some non-recursive work: \(w(n)\)
- Recursively call the method on a smaller part of the list, costing \(T(n-1)\)
(where \(T(n)\) is the time or cost function)
- Do some base case work: \(b(n)\)
- Total cost: \(T(n) = w(n) + T(n - 1)\) with a base case such as \(T(1) = b(1)\)
- This is basically the recurrence function/relation
- General formula and closed form are ways to “solve” the
function
- How to solve a recurrence function?
- Unrolling: substitution until we find a pattern
- Write a recurrence
- Find general formula: expand
- Should still have a \(T\) function
on both sides
- Find closed form: find when base case occurs, then get to the base
case
- Now no \(T\) or \(i\): all constants or \(n\)
- Asymptotic analysis
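
A worked unrolling of the simplest recurrence above (constant base-case cost \(b\)), to show the general-formula and closed-form steps:

```latex
% Unrolling T(n) = T(n-1) + c with base case T(1) = b:
\[
  T(n) = c + T(n-1) = 2c + T(n-2) = \dots = i\,c + T(n-i) \quad \text{(general formula)} .
\]
% The base case is reached when n - i = 1, i.e. i = n - 1:
\[
  T(n) = (n-1)c + T(1) = (n-1)c + b \in \Theta(n) \quad \text{(closed form)} .
\]
```
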
Lecture 6 - July 3
- Recurrence relation: Tree Method
- We draw an actual tree of the work required for function calls
- Allows us to find the amount of nodes quickly and the work per
node
- Common recurrences:
- \(T(n) = T(\frac{n}{2}) + c \in \Theta (log n)\): Binary search
- \(T(n) = 2T(\frac{n}{2}) + n \in \Theta (n log n)\): Merge sort
- \(T(n) = 2T(\frac{n}{2}) + c \in \Theta (n)\): Recursive binary sum
- \(T(n) = T(n - 1) + c \in \Theta (n)\): Recursive sum
- Dictionary: associates key, value pairs
- Set of unique key, value pairs
- (For some implementations) keys must be comparable
- Operations:
- insert(k, v): place k, v in the dictionary
- find(k): return the v associated with k
- delete(k): delete and return the k, v pair
- Set is just a dictionary with boolean values
- Naive implementation operations are \(\Theta (n)\)
- find(k) is \(\Theta (log n)\) for sorted arrays
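
A worked tree-method example for the merge sort recurrence listed above (my arithmetic, following the method described in lecture):

```latex
% Tree method on T(n) = 2T(n/2) + n:
% level i has 2^i nodes, each doing n / 2^i work, so every level does n total work.
\[
  \text{work per level} = 2^i \cdot \frac{n}{2^i} = n,
  \qquad \text{number of levels} \approx \log_2 n ,
\]
\[
  T(n) \in \Theta(n \log n).
\]
```
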
Lecture 7 - July 5
- Many data structure operations come down to preserving structure and
order properties
- Comparable keys allow for ordering in storage
- Lazy deletion: we only “mark” an element as deleted; a find must then check
whether the element is really there
- Allows for removing in batches
- Increases find(k) runtime
- Binary search trees (BST) have “average” \(\Theta (log n)\) runtime, but could be
\(\Theta (n)\)
- Recursive structure: every element is the root of a subtree, with smaller keys in its left subtree and larger keys in its right subtree
- BST Delete:
- Case 1: Leaf
- Case 2: One child
- Connect parent to the child node
- Case 3: Two children
- We can replace with smallest element from right tree or largest from
left tree, predecessor or successor
- Find the max on the left subtree, swap with parent then delete
- Find the min on the right subtree, swap with parent then delete
- The worst case for a BST occurs when the tree is unbalanced
- Ideas to balance the BST (while keeping fast runtime):
- Left and right subtree of root have the same number of nodes
- Left and right subtree of root have the same height
- Left and right subtree of every node have the same number of nodes
- Left and right subtree of every node have the same height
- Left and right subtrees of each node have heights that differ by at most 1
- Ensures root height is \(\Theta (log n)\)
- Quick: \(\Theta (1)\) rotations
- AVL Balance Property: balance(node) = height(node.left) - height(node.right)
- AVL Tree: a BST with guaranteed balancing
- BST with the added condition that for all nodes, -1 <= balance(node) <= 1
- We need to keep track of the heights
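
A minimal sketch of a BST node and find, to make the "average \(\Theta (log n)\), worst \(\Theta (n)\)" point concrete; the class names are my own.

```java
// Minimal BST find sketch (illustrative node/field names).
class BSTNode {
    int key;
    BSTNode left, right;
    BSTNode(int key) { this.key = key; }
}

class BST {
    BSTNode root;

    // Returns true iff key is in the tree; O(height) comparisons,
    // i.e. Theta(log n) when balanced but Theta(n) when the tree degenerates.
    boolean find(int key) {
        BSTNode cur = root;
        while (cur != null) {
            if (key == cur.key) return true;
            cur = (key < cur.key) ? cur.left : cur.right; // go left or right by order
        }
        return false;
    }
}
```
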
Lecture 8 - July 7
- AVL Operations:
- Find: same as BST
- Insert: BST insert, then check and fix balance
- Let p be the problem node where an imbalance occurs
- Two main cases: inserted node is in the outside or inside
(e.g. right-right vs right-left)
- Outside cases are solved with a “single rotation”
- Inside cases are solved with a “double rotation”
- Insert:
- BST insert: afterwards, each node’s height in the path to the bottom
might have changed
- Recursive backtracking: calculate new height and detect any height
imbalances
- If there is an imbalance, find the case and rotate. Only the deepest p needs to be addressed
- Single rotation: (outside case, e.g. left-left)
- Move the child of p into p's position
- p becomes the “other” child
- The old “other” child takes p's now-empty child slot
- Other subtrees move accordingly
- Double rotation:
- Rotate into an outside case with p's child and grandchild
- Rotate p and p's new child
- These rotations are in different directions
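
A sketch of the single rotation for the left-left case described above; node/field names and the height helper are my own, and the symmetric right-right case is just the mirror image.

```java
// Single right rotation for the left-left case (illustrative names).
// p is the deepest imbalanced node; its left child moves up into p's position.
class AVLNode {
    int key, height;
    AVLNode left, right;
}

class AVLRotations {
    static int height(AVLNode n) { return n == null ? -1 : n.height; }

    static AVLNode rotateRight(AVLNode p) {
        AVLNode child = p.left;
        p.left = child.right;          // child's old right subtree becomes p's left
        child.right = p;               // p becomes the "other" (right) child
        p.height = 1 + Math.max(height(p.left), height(p.right));
        child.height = 1 + Math.max(height(child.left), height(child.right));
        return child;                  // child is the new root of this subtree
    }
    // Inside cases do two such rotations in opposite directions:
    // first on p's child and grandchild, then on p and its new child.
}
```
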
Lecture 9 - July 10
- Motivation behind B-Trees: \(\Theta (1)\) memory access operations are not insignificant
- Getting data from disk is many, many operations (even if linear
time)
- Thus, minimizing disk access is a goal
- This is of course if data doesn’t fit in cache (i.e. databases)
- If we’re accessing a byte in memory, might as well access a whole
block and store it in cache
- Temporal locality: if an address is referenced, it tends to be
referenced again soon
- Spatial locality: if an address is referenced, addresses close by
are more likely to be referenced soon
- The amount of data moved from disk to memory is block size, memory
to cache is line size
- These are not under program control
- Problem: large dictionaries require that most of the data will be
stored on disk
- Desire: a balanced tree that is even shallower than AVL trees to
minimize disk accesses
- A key idea: increase branching factor of the tree
- B Trees:
- Two types of nodes
- Internal nodes, having up to M-1 keys and M children
- Leaf nodes, storing up to L sorted data items
- Order property:
- The subtree between keys a and b contains only data x with a <= x < b
- Structure properties:
- Root: If tree has <= L items, root is a leaf, else has between 2
and M children
- Internal nodes: have between ceil(M/2) and M children (i.e. at least
half full)
- Leaf nodes: all leaves at the same depth, have between ceil(L/2) and
L data items
- Any M > 2 and L will work, but we pick them based on disk-block
size
- Many keys are stored in one internal node; the binary search within a node is
inconsequential compared to a disk access
- Only bring one leaf of data items into memory, no need for loading
unnecessary items into memory
- Adding to a leaf which is not already full is easy, just add like a
sorted list
- When a leaf is full, we split it into two leaves and add them to the
parent
- If the parent is full (so we cannot add another child), we split the
parent
- If the parent's parent is also full, we keep splitting; eventually we split the root,
creating a new root with the two children
- Deletion:
- First remove the item
- Then if the leaf contains too few, adopt from a neighbor
- Unless neighbor is at minimum, then combine to create new child
- If parent is now at minimum, adopt from a neighbor
- repeat…
Lecture 10 - July 12
- Now switching from tree-based data structures to array-based
ones
- Hashing: take in a key, convert it into an integer
- This allows us to index into an array with the integer
- If two keys have the same hash, we have a collision
- Always will happen: infinite keys, finite array length
- Hash table runtime average is close to \(\Theta (1)\)
- Hash tables don’t allow us to order well unlike trees
- We expect the number of key-value pairs to be much less than the
possible domain
- Ideal hash function is fast and avoids excessive collisions
- Often the case that the client implements some hash function, then
program mods by the array length
- In Java, the client will override the hashCode method
- Use a prime number as the table size
- Real-life data tends to follow patterns, and primes don't line up with them
- Try to use the most identifiable fields in the hash function
- Trade-off between how many fields are used and the amount of collisions
- If the key space is an int, using the int and modulo is intuitive
- Even if every value has a distinct key, there are still collisions because of the mod
- The number one step is to rely on the expertise of others
- Separate chaining:
- All keys that map to the same table location are added to a
list
- Worst case runtime for find is \(\Theta (n)\)
- Not worth optimizing if your hash function isn’t terrible
- Load factor is the average number of elements per bucket
- Want to keep load factor constant, prevent it from being too
large
- Typically if lambda is greater than one, increase the table
size
- delete is the same as find intuitively
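
A minimal separate-chaining sketch of the ideas above (client hashCode, mod by table length, a list per bucket); the class is my own, with generics and resizing omitted for brevity.

```java
import java.util.LinkedList;

// Separate-chaining hash table sketch (String keys, int values; no resizing).
public class ChainedHashTable {
    private static class Entry {
        String key; int value;
        Entry(String k, int v) { key = k; value = v; }
    }

    private final LinkedList<Entry>[] table;

    @SuppressWarnings("unchecked")
    public ChainedHashTable(int tableSize) {       // ideally a prime size
        table = new LinkedList[tableSize];
        for (int i = 0; i < tableSize; i++) table[i] = new LinkedList<>();
    }

    private int index(String key) {
        // client-provided hashCode, then mod by the array length (kept non-negative)
        return Math.floorMod(key.hashCode(), table.length);
    }

    public void insert(String key, int value) {
        for (Entry e : table[index(key)]) {
            if (e.key.equals(key)) { e.value = value; return; } // overwrite existing key
        }
        table[index(key)].add(new Entry(key, value));
    }

    public Integer find(String key) {
        for (Entry e : table[index(key)]) {
            if (e.key.equals(key)) return e.value;
        }
        return null; // not present
    }
}
```
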
Lecture 11 - July 17
- Open Addressing: attempt to not store chains
- Resolving collisions by trying a sequence of other positions in the
table
- Linear probing: when inserting, if index x is full, then try x+1
- Repeat as necessary…
- We are “probing” the array “linearly”
- Probing: ith probe into array: (h(key) + f(i)) % TableSize
- Open addressing is much worse as the load factor increases
- To find using probing, we follow probe sequence until we find the
key or empty spot
- Delete: use lazy deletion, replace value with a flag
- Primary clustering
- Linear probing tends to produce clusters of values
- This leads to runtime issues
- Quadratic probing: ith probe into array: (h(key) + i**2) % TableSize
- Possible to have infinite cycles because of mod
- If TableSize is prime and the load factor is less than 0.5,
quadratic probing will find an empty slot in at most TableSize / 2
probes
- Secondary clustering happens when two keys hash to the same index; they
then follow the same probe sequence, so probing doesn't help
- Double hashing: f(i) = i*g(key)
- Idea is that two hashes won't collide much
- Ensure g(key) != 0, otherwise infinite loop
- Can still lead to infinite cycles
- Important to pick good functions with guarantees
- When we resize the array we need to rehash everything
- Good to resize separate chaining when the load factor approaches one; for
open addressing, resize around one half
- Good to make size twice as big, but that won’t be prime!
- Keep a hardcoded list of primes at good intervals
- If it continues to grow, say screw it and double (I think)
- View slides or Effective Java for tips on how to hash different data
types
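
A small sketch comparing the i-th probe index under the three schemes above; h and g stand in for the primary and secondary hash values, and the method names are my own.

```java
// i-th probe index under the three open-addressing schemes discussed above.
public class ProbeSequences {
    static int linear(int h, int i, int tableSize) {
        return Math.floorMod(h + i, tableSize);       // (h(key) + i) % TableSize
    }

    static int quadratic(int h, int i, int tableSize) {
        return Math.floorMod(h + i * i, tableSize);   // (h(key) + i^2) % TableSize
    }

    static int doubleHash(int h, int g, int i, int tableSize) {
        // g must never be 0, otherwise the probe sequence never moves
        return Math.floorMod(h + i * g, tableSize);   // (h(key) + i*g(key)) % TableSize
    }
}
```
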
Lecture 12
- Sorting goals:
- Stable: preserve original ordering in case of ties
- In-Place: only use a constant amount of auxiliary storage
- Fast: \(O(n log n)\) and/or good
constant factors
- Simple algorithms: \(O(n^2)\)
- Insertion sort
- Selection sort
- Bubble sort
- Fancier algorithms: \(O(n log n)\)
- Heap sort
- Merge sort
- Quick sort (avg)
- Specialized algorithms: \(O(n)\)
- Insertion sort:
- Maintain a sorted subarray, insert the unsorted elements into
it
- Stable!
- In-place!
- Slow :(
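
A short sketch of the insertion sort just described (stable, in-place, \(O(n^2)\)):

```java
// Insertion sort sketch: maintain a sorted prefix and insert each new element into it.
public class InsertionSort {
    public static void sort(int[] arr) {
        for (int i = 1; i < arr.length; i++) {
            int cur = arr[i];                    // next unsorted element
            int j = i - 1;
            while (j >= 0 && arr[j] > cur) {     // shift larger sorted elements right
                arr[j + 1] = arr[j];
                j--;
            }
            arr[j + 1] = cur;                    // insert into the sorted prefix
        }
    }
}
```
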
Lecture 12 - July 19
- Selection sort:
- Maintain a sorted subarray, find the smallest remaining element and
append it to sorted
- Stable!
- In-place!
- \(O(n^2)\), but good constant
factors
- Bubble sort: (it sucks!)
- \(O(n^2)\) and bad constant
factors
- Never use
- Heap sort:
- Put all elements into a heap, then remove the elements
- Not stable
- Can be in-place by treating the initial array as the heap, then placing each
removed min at the back of the array
- \(O(n log n)\), but bad constant
factors
- AVL sort:
- We pretend this doesn’t exist!
- Not in-place, worse constant factors
- Heap sort is strictly better
- Merge sort: divide and conquer!
- Sort left and right halves of the elements recursively, then combine into one list
- Stable (always pick from the left array first)
- Not in-place: \(O(n)\) extra space
- \(O(n log n)\), bad constant
factors because of copying
- Quick sort:
- Make sure to use the 332 quicksort
- Algorithm:
- Pick a pivot element, hopefully the median element
- Divide the array into two pieces, less than and greater than
pivot
- Recursively sort the two pieces
- Combine the sorted pieces
- Picking a good pivot:
- Pick first or last element: fast to pick but likely worst-case
- Pick random element: good, but randomness is expensive
- Median of 3: pick median of first, middle, and last element, overall
good
- How to split into two pieces: Hoare partitioning
- Swap pivot with arr[lo] (move it out of the way)
- Use two pointers l and r starting at lo+1 and hi-1 and move inwards
- Put pivot back in middle: swap with arr[l]
- Not stable
- In-place!
- \(O(n log n)\) average case, \(O(n^2)\) worst case
- In practice, way, way better constant factors
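
A quicksort sketch with median-of-3 pivot selection and a two-pointer partition. This is a simplified variant written by me, not necessarily the exact partition from the 332 slides; the class and helper names are my own.

```java
// Quicksort sketch: median-of-3 pivot, two-pointer (Hoare-style) partition.
public class QuickSortSketch {
    public static void quickSort(int[] a) { quickSort(a, 0, a.length - 1); }

    private static void quickSort(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        medianOfThreeToFront(a, lo, hi);   // pivot ends up at a[lo], out of the way
        int pivot = a[lo];
        int l = lo + 1, r = hi;            // two pointers moving inwards
        while (l <= r) {
            while (l <= r && a[l] <= pivot) l++;   // find an element > pivot
            while (l <= r && a[r] >= pivot) r--;   // find an element < pivot
            if (l < r) { swap(a, l, r); l++; r--; }
        }
        swap(a, lo, r);                    // put the pivot back between the pieces
        quickSort(a, lo, r - 1);           // recursively sort both pieces
        quickSort(a, r + 1, hi);
    }

    private static void medianOfThreeToFront(int[] a, int lo, int hi) {
        int mid = lo + (hi - lo) / 2;
        if (a[mid] < a[lo]) swap(a, lo, mid);
        if (a[hi] < a[lo]) swap(a, lo, hi);
        if (a[hi] < a[mid]) swap(a, mid, hi);  // now a[lo] <= a[mid] <= a[hi]
        swap(a, lo, mid);                      // move the median to the front as the pivot
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```
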
Lecture 13 - July 21
- Comparison sorting: CUTOFF strategy
- Use the recursive sort until the subarray is small enough, then switch to a sort
with good constants
- All comparison sorting algorithms have a lower bound of \(\Omega(n log n)\)
- Bucket sort:
- Find the min and max value, make an auxiliary array covering that range
- Iterate through the array and count how many of each number
- Copy the auxiliary counts back into the original array
- Stable!
- Not in-place
- Possibly very large memory usage
- Good speed \(O(n + k)\) and
constants
- Non-integers: make each bucket a linked list of items
- Radix sort:
- For each digit (starting from LSD), implement bucket sort and
repeat
- As it is stable, this allows for ordering
- Stable
- Not in-place
- Very fast, \(O(d(n+k))\) and good
constant factors
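
A minimal sketch of the count-based bucket sort described above, for ints in a known small range; the class name is my own.

```java
// Bucket (counting) sort sketch for ints in a small known range: O(n + k).
public class BucketSort {
    public static void sort(int[] arr) {
        if (arr.length == 0) return;
        int min = arr[0], max = arr[0];
        for (int x : arr) {                      // find the min and max values
            if (x < min) min = x;
            if (x > max) max = x;
        }
        int[] counts = new int[max - min + 1];   // auxiliary array covering the range
        for (int x : arr) counts[x - min]++;     // count how many of each number
        int i = 0;
        for (int v = 0; v < counts.length; v++)  // copy counts back in sorted order
            for (int c = 0; c < counts[v]; c++) arr[i++] = v + min;
    }
}
```
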
Lecture 14 - July 24
- A graph is a mathematical representation of a set of objects
connected by links
- Terminology:
- Undirected graphs: edges have no direction, two-way, facebook
friends
- Degree of a vertex: number of edges containing that vertex
(undirected)
- Directed graphs: edges have a direction, pointing, instagram
follows
- In-Degree of a vertex: the number of in-bound edges
- Out-Degree of a vertex: the number of outbound edges
- Weighted graphs: each edge has a weight or cost, representing
relationships
- Walk: a sequence of adjacent vertices
- Path: a walk that doesn’t repeat a vertex
- Cycle: a walk that doesn’t repeat a vertex except the first
and last vertex
- Path length: number of edges in a path
- Path cost: the sum of the weights in the path
- Undirected graph connectivity: an undirected graph is connected if
there exists a path between each pair of vertices
- Undirected graph completeness: an undirected graph is complete if
there is an edge between each pair of vertices
- Directed strong connectivity: a directed graph is strongly connected
if there is a path between each pair of vertices
- Directed weakly connectivity: a directed graph is weakly connected
if there exists a path between each pair of vertices ignoring
edge directions
- Directed graph completeness: a directed graph is complete if for all
pairs of vertices there is an edge in each direction
- Self-edges: we pretend they don’t exist… (in 332)
- A tree is an undirected, acyclic, connected graph
- Rooted trees are trees where we pick a root and think of the edges as directed
- Directed Acyclic Graphs (DAGs):
- Directed graph with no cycles
- Maximum number of edges:
- Undirected: \(0 \leq \mid E \mid < \mid V \mid ^2\)
- Directed: \(0 \leq \mid E \mid \leq \mid V \mid ^2\)
- So: \(\mid E \mid \in O(\mid V \mid ^2)\)
- Sparse graph: \(\mid E \mid \in \Theta (\mid V \mid)\)
- Dense graph: \(\mid E \mid \in \Theta (\mid V \mid ^2)\)
- Graphs: the data structure
- Common operations:
- Is \((v, u)\) an edge?
- “What are the neighbors of \(v\)?”
- Two standards:
- Adjacency Matrix
- Adjacency List
- Adjacency Matrix: a 2D array of booleans representing the presence
of an edge
- Properties:
- Get a vertex's out-bound edges: \(O(\mid V \mid)\)
- Get a vertex's in-bound edges: \(O(\mid V \mid)\)
- Decide if edge exists: \(O(1)\)
- Insert an edge: \(O(1)\)
- Delete an edge: \(O(1)\)
- Space requirements: \(O(\mid V \mid ^2)\)
- Adding vertices is not really possible
- Better for dense graphs (you’re already using the memory and
complexity)
- Can be weighted by using non-booleans as type
- Then you must define “not an edge”
- Adjacency List: an array of vertices each with a linked list of
edges
- Properties: (where \(d\) is the degree of the vertex)
- Get a vertex's out-bound edges: \(O(d)\)
- Get a vertex's in-bound edges: \(O(\mid V \mid + \mid E \mid )\)
- Decide if edge exists: \(O(d)\)
- Insert an edge: \(O(1)\)
- Delete an edge: \(O(d)\)
- Space requirements: \(O(\mid V \mid + \mid E \mid )\)
- Better for sparse graphs (more operations depend on the number of
elements)
- Processing a node: “doing something” at a node
- Visiting a node: exploring it
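
A minimal adjacency-list sketch matching the costs above; the class is my own and uses ArrayLists where the lecture describes linked lists (the costs are the same in big-O terms).

```java
import java.util.ArrayList;
import java.util.List;

// Adjacency-list graph sketch: an array/list of vertices, each with its out-neighbors.
public class AdjListGraph {
    private final List<List<Integer>> adj = new ArrayList<>();

    public AdjListGraph(int numVertices) {
        for (int v = 0; v < numVertices; v++) adj.add(new ArrayList<>());
    }

    public void addEdge(int from, int to) {     // O(1) insert
        adj.get(from).add(to);
    }

    public boolean hasEdge(int from, int to) {  // O(d), where d = out-degree of `from`
        return adj.get(from).contains(to);
    }

    public List<Integer> neighbors(int v) {     // O(d) to walk the out-bound edges
        return adj.get(v);
    }
}
```
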
Lecture 15 - July 26
- Basic idea: follow nodes and mark them to prevent cycles
- Use slide code for the algorithms
- DFS uses a stack, BFS uses a queue (that’s the main difference)
- Iterative DFS:
- Use a stack, we process after popping each node
- Recursive DFS:
- The “stack” is the program’s recursive stack
- BFS:
- Similar algorithm structure, just use a queue
- Comparisons:
- DFS
- Typically uses much less memory
- Applications: topological sorting, cycle detection
- BFS
- Typically uses more memory
- Applications: shortest paths
- Iterative Deepening DFS (IDDFS)
- Use DFS with increasing depth limits
- Good memory and finds the shortest path
- To get the final path, we include the predecessor node in the mark
- To get path, we then backtrack
- Often implemented with an array
- Shortest paths for weighted graphs: we often assume no negative
weights
- Dijkstra’s Algorithm:
- Initially, start node has cost 0 and all other nodes have infinite
cost
- At each step:
- Pick the closest unvisited vertex v
- Add it to the set of visited vertices
- Update distances for nodes with edges from v
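
A BFS sketch tying together the marking, queue, and predecessor ideas from this lecture; the class and array names are my own, and swapping the queue for a stack gives iterative DFS.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// BFS over an adjacency list, recording each vertex's predecessor so the
// shortest (fewest-edges) path can be reconstructed by backtracking.
public class BFSSketch {
    // adj.get(v) holds v's neighbors; returns pred[], with -1 for unreached vertices.
    public static int[] bfs(List<List<Integer>> adj, int start) {
        int[] pred = new int[adj.size()];
        Arrays.fill(pred, -1);
        boolean[] marked = new boolean[adj.size()]; // mark nodes to prevent cycles
        Queue<Integer> queue = new ArrayDeque<>();  // a stack here would give DFS instead
        marked[start] = true;
        queue.add(start);
        while (!queue.isEmpty()) {
            int v = queue.remove();                 // process v
            for (int w : adj.get(v)) {
                if (!marked[w]) {
                    marked[w] = true;
                    pred[w] = v;                    // remember how we reached w
                    queue.add(w);
                }
            }
        }
        return pred;
    }
}
```
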
Lecture 16 - July 28
- Dijkstra’s is a greedy algorithm
- At each step, it does what seems best at that step
- It explores every node, so it must find the globally optimal path
- Unoptimized runtime is \(O(\mid V \mid ^2 + \mid E \mid )\)
- Can make it faster by using a minHeap: \(O(\mid V \mid log \mid V \mid + \mid E \mid log \mid V \mid )\)
- \(O( \mid V \mid log \mid V \mid )\) for a sparse graph
- Parallelism
- We typically have worked with sequential programming: one thing
happens after another
- Removing this assumption means many new things to think about:
- How can we divide work between different threads and
synchronize?
- How can we parallelize algorithms?
- How can we support concurrent access with data structures?
- This means we should be able to do multiple things at once in one program
- Parallelism: using extra resources to solve a problem faster
- Concurrency: correctly and efficiently manage access to shared
resources
- Threads can communicate by writing to shared locations
- Need to make sure that they don't write in a bad way!
- Message-passing: threads communicate by sending and receiving
messages
- Wait for one thread to coordinate all of the worker threads
- Java threading basics:
- We will use java.lang.Thread to start off with, then use a better library later
- Define and instantiate a subclass C of java.lang.Thread, overriding run
- Call the object's start method
- Calling run directly just runs the method in the current thread
- Fork-Join Framework:
- To start parallelism, we must use ForkJoinPool.invoke(RECURSIVE_ACTION)
- This calls the object's compute method
- Each program should only have one ForkJoinPool; use commonPool to get the shared static pool
- Subclass RecursiveAction to use fork, compute, and join
- compute is the code we want to run
- fork starts a new thread running compute
- join waits for the thread to finish
- Better for code which doesn't return anything
- Subclass RecursiveTask if you do want to return values (see the sketch below)
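
A sketch of the RecursiveTask pattern above: a parallel array sum using ForkJoinPool.commonPool(). The cutoff constant and class name are my own choices for illustration.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Fork-Join sketch: parallel array sum with RecursiveTask (a value is returned).
public class SumTask extends RecursiveTask<Long> {
    private static final int CUTOFF = 5000;   // illustrative sequential cutoff
    private final int[] arr;
    private final int lo, hi;

    public SumTask(int[] arr, int lo, int hi) {
        this.arr = arr; this.lo = lo; this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= CUTOFF) {              // small enough: just sum sequentially
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += arr[i];
            return sum;
        }
        int mid = lo + (hi - lo) / 2;
        SumTask left = new SumTask(arr, lo, mid);
        SumTask right = new SumTask(arr, mid, hi);
        left.fork();                          // run the left half in another thread
        long rightSum = right.compute();      // do the right half ourselves
        return left.join() + rightSum;        // wait for the left half, then combine
    }

    public static long sum(int[] arr) {
        return ForkJoinPool.commonPool().invoke(new SumTask(arr, 0, arr.length));
    }
}
```
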
Lecture 17 - July 31
- The fork-join pattern is quite common for solving many different
types of problems
- All that changes is the base-case and way we combine results
- These are called reductions
- It is required that the operation is associative
- A parallel map is applying an operation on each input
element independently
- E.g. multiplying each element of an array by two
- Better to subclass RecursiveAction as nothing is returned
- To use divide-and-conquer parallelism, we need to be able to divide
the problem up efficiently
- We can still use maps on sequential data to speedup the process,
think HF DS maps
- Analyzing Fork-Join algorithms:
- \(T_P\) is the time a program takes
to run if there are \(P\) processors
available
- \(T_1\) is the time it takes to run
if there is one processor, note: not sequential algorithm
- This is called the work, the total run time of all parts of
the algorithm
- \(T_{\infty}\) is the
span, how long it takes to run on an infinite amount of
processors
- Also called: critical path length or computational
depth
- We can describe a parallel program execution as a DAG:
- Nodes are constant-time pieces of work the program performs
- The number of nodes adds up to \(T_1\)
- Edges represent computational dependencies, what we must
complete before moving onto another node
- \(T_{\infty}\) is the length of the
longest path in the DAG
- A parallel reduction is two balanced trees on top of each other
- This allows \(T_1\) and \(T_{\infty}\) to become simple graph
properties
- \(T_1\) is \(O(2n)\), because two trees
- \(T_{\infty}\) is \(O(log n)\), because that’s the height of
the trees
- The speedup on \(P\) processors is: \(\frac{T_1}{T_P}\)
- Note that this is often dishonest, as a true sequential algorithm
might be faster
- Perfect speedup is when \(\frac{T_1}{T_P} = P\), and is rare
- Perfect linear speedup is the case where doubling \(P\) halves the running time
- \(\frac{T_1}{T_{\infty}}\) is the
parallelism of an algorithm
- Parallel reduction: \(\frac{n}{logn}\), so there is an
exponential speedup
- The ForkJoin Framework has the expected time bound: \(T_P\) is \(O(\frac{T_1}{P} + T_{\infty})\)
- There is some randomness involved
- This works as long as each task is expected to do about the same
amount of work
- Also, all threads are expected to do a small but not tiny amount of
work
- This is approx 5000 (within an order of magnitude) computational
steps
- Amdahl’s Law:
- \(\frac{T_1}{T_P} = \frac{1}{S + \frac{1 - S}{P}}\), where \(S\) is the sequential fraction of the work
- As \(P\) goes to \(\infty\), \(\frac{T_1}{T_{\infty}} = \frac{1}{S}\)
- Takeaways from Amdahl’s Law:
- This means that scaling computational resources leads to non-linear
speedups
- We cannot always rely on extra compute for speedups when there are
chunks of sequential code
- We can try to find new, more parallel algorithms that work around the Law
- We can still use parallelism to solve bigger problems, but
not faster
- This happens when the parallel parts grow at a faster rate than the
sequential parts
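
A quick worked instance of Amdahl's Law with numbers I made up, to show how the sequential fraction caps the speedup:

```latex
% Example: if S = 1/3 of the work is sequential, then on P = 4 processors
\[
  \frac{T_1}{T_4} = \frac{1}{S + \frac{1-S}{P}}
  = \frac{1}{\tfrac{1}{3} + \tfrac{2/3}{4}} = \frac{1}{1/2} = 2 ,
\]
% and no matter how many processors we add, the speedup is capped at 1/S = 3.
```
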
Lecture 18 - August 2
- Parallel-Prefix Sum:
- We run this in two parts: first create a tree where each node is the
sum of a portion of the array
- The leaves are the sums of one-element ranges (in practice we would use a
sequential cutoff)
- We run through the tree, passing along as an argument the sum of the
values to the left of the node
- This allows the “sequential” problem to have \(O(log n)\) span
- This pattern is good for problems where you want to compute something over
all elements to the left of index i
- Pack: produce an array with the same order where all elements
satisfy some condition
- Perform a parallel map to produce a bit (binary) vector of elements
that satisfy a condition
- Perform a parallel prefix sum to get the number of indices to the left which
satisfy the condition
- Run a parallel map for each index to move the value from the input array to
the new output array
- Use bitsum to get the output indices
- Use bitvector to check if the value satisfies the condition
- Parallel Quicksort:
- Parallelize sorting both partitions
- Parallelize partitioning using an auxiliary array
- Parallel Mergesort:
- We can parallelize recursively sorting the halves easily, but we
need more for exponential speedup
- To do so, we need to parallelize the merge operation:
- Determine the median element of the larger array: \(O(1)\)
- Binary search to find the position of the median in the smaller
array: \(O(logn)\)
- Recursively merge the left half of the large array with the left
portion of the small array
- Recursively merge the right half in the same way
Lecture 19 - August 4
- Concurrent programming is about controlling access to shared
resources
- In concurrent programming, threads are largely doing their
own thing
- Operations are interleaved, happening before, after, or at
the same time as other threads
- We must enforce mutual exclusion on memory which might
change
- We do this by “hanging a sign” telling other threads to wait
- Checking there is no sign and then hanging the sign must be one
operation
- The work done while the sign is hanging is a critical
section
- We typically do this using synchronization primitives within a
language
- A mutual-exclusion lock (lock or mutex) is an abstract datatype for
concurrent programming
- new creates a new lock which is “not held”
- acquire takes a lock and blocks until it is currently “not held”, then sets it to “held” and returns
- release takes a lock and sets it to “not held”
- It is important to ensure that exceptions do not prevent a lock from
ever getting released
- Locks which allow a thread to re-acquire a lock it already holds are called
reentrant locks
- We use the synchronized keyword in Java to evaluate, acquire, and release a lock
- This seems somewhat similar to the Python with statement
- As any object can be a lock in Java, we can just use the instance (this) as the lock
- If the entire method should be synchronized, we can use that as a method keyword instead
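
A tiny sketch of both synchronized forms just described; the counter class is my own example, not from the lecture.

```java
// Using an object's intrinsic lock via synchronized, in both forms.
public class SyncCounter {
    private int count = 0;

    public void increment() {
        synchronized (this) {   // acquire this object's lock; it is released on exit,
            count++;            // even if the body throws an exception
        }
    }

    // Equivalent form: synchronized on the method also locks on this.
    public synchronized int get() {
        return count;
    }
}
```
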
Lecture 20 - August 7
- A race condition is a bug in a program that makes program
correctness dependent on the order that threads execute
- There are two kinds of data races:
- When one thread reads and another writes at the same time
- When both threads attempt to write at the same time
- Even if we think that it wouldn’t cause bad behavior, never
have race conditions
- When running programs, there are often hidden intermediate states
- This is like times where the rep inv isn’t true
- If a program reads during this intermediate state, things get
weird
- These are critical sections
- One reason why data races are not allowed is because of compiler
optimizations
- The Grand Compromise:
- The programmer promises not to write data races
- The compiler will make optimizations seem as if there was a global
interleaving
- We can also declare fields as volatile, making accesses not count as data races
- They are good when there is only one shared field
- Concurrency Programming Guidelines
- Conceptually split memory into three parts: all memory should be at
least one of the following
- Memory that is thread-local
- It is typically better to copy data and have more thread-local
memory
- Memory that is immutable (after initialization)
- Make new objects instead of updating fields
- Memory that is synchronized, where locks are needed
- Approaches to synchronization:
- Avoid data races
- Use consistent locking (and document it)
- Use more coarse-grained locking and only be more specific if
performance is hurt
- Keep critical sections small and do not perform expensive operations
in them
- Think of what needs to be atomic, then determine a locking
strategy
- Always rely on the standard libraries
- Deadlock: threads being blocked forever
- This happens when there is a cycle of waiting
- We can prevent it by defining an ordering in which locks are acquired
Lecture 21 - August 9
- Topological Sort
- Given a DAG, output all vertices in an order such that no parent appears after
its children
- This can be used in backpropagation algorithms or other dependency
graphs
- There can be multiple legal permutations per DAG
- Simple Topo-sort algorithm
- Find all the in-degrees for each vertex
- These can be stored in a data structure
- While not all vertices are output:
- Choose a vertex \(v\) labeled with in-degree of 0
- Output \(v\) and remove it from the
graph (or add to an outside set)
- For each vertex \(w\) adjacent to
\(v\), decrement in-degree of \(w\)
- By using a queue, we can lower the time complexity to \(O(V + E)\)
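
A queue-based sketch of the algorithm above (Kahn's algorithm) over an adjacency list; the class name is my own, and the input is assumed to be a DAG.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Queue-based topological sort: O(|V| + |E|) over an adjacency list of a DAG.
public class TopoSortSketch {
    public static List<Integer> topoSort(List<List<Integer>> adj) {
        int n = adj.size();
        int[] inDegree = new int[n];
        for (List<Integer> outs : adj)            // find all in-degrees
            for (int w : outs) inDegree[w]++;

        Queue<Integer> ready = new ArrayDeque<>();
        for (int v = 0; v < n; v++)
            if (inDegree[v] == 0) ready.add(v);   // vertices with no remaining parents

        List<Integer> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            int v = ready.remove();
            order.add(v);                         // output v
            for (int w : adj.get(v))              // "remove" v from the graph
                if (--inDegree[w] == 0) ready.add(w);
        }
        return order;                             // size < n would mean a cycle existed
    }
}
```
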
Lecture 22 - August 11
- Minimum spanning trees: the minimum-cost set of edges such that all nodes
are connected
- Two main algorithms to find the MST:
- Prim’s algorithm: similar to Dijkstra’s
- Randomly pick a vertex, then use Dijkstra’s from there
- Pick the vertices which are the lowest cost
- The random pick of starting vertex leads to different possible MSTs (if
multiple exist)
- Kruskal's algorithm: completely different
- We make clusters of sub-MSTs, then they end up unifying
- Prim’s:
- Pick random element
- Run Dijkstra's-style steps, connecting unconnected nodes to connected nodes
- Create the MST by connecting edges with their predecessors
- Kruskal's:
- Sort all edges by weight
- Add edges in order as long as they connect components that are not already
connected (no cycles)
- Disjoint Set ADT:
- Union(x, y): combines the two sets which contain x and y
- Find(x): returns the name of the set containing x
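
A minimal sketch of the Disjoint Set ADT above, as Kruskal's would use it to check whether an edge connects two different components; the class is my own and union-by-rank is omitted for brevity.

```java
// Disjoint sets (union-find) sketch with path compression only.
public class DisjointSets {
    private final int[] parent;

    public DisjointSets(int n) {
        parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;  // each element starts in its own set
    }

    public int find(int x) {
        if (parent[x] != x) parent[x] = find(parent[x]); // path compression
        return parent[x];                                // the root names the set
    }

    public void union(int x, int y) {
        parent[find(x)] = find(y);  // point x's root at y's root, merging the sets
    }
}
```
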
Lecture 23 - August 14
- A decision problem: a problem that takes some input and returns true
or false
- Halting problem: takes in some code and an input, returns if it
terminates
- Definition of \(P\) (as in \(P = NP\)): a problem is in \(P\) if it satisfies:
- It’s a decision problem
- There exists a polynomial time algorithm to solve it
- Euler Circuit Problem:
- Input: Undirected graph \(G\)
- Output: True iff there is a path in \(G\) that visits each edge exactly once
- Polynomial time solution: true iff
- Degree of every vertex is even
- Connected graph
- Definition of \(NP\),
non-deterministic polynomial: a problem is in \(NP\) iff:
- It’s a decision problem
- For every input where the output is true, there exists some quick
way to verify output is true
- Every problem in \(P\) is also in \(NP\), i.e. \(P \subseteq NP\)
- Is \(NP \subset P\)? Is \(P = NP\)?
- We don’t know…
- Most people believe that \(P \neq
NP\)
- Hamiltonian Circuit Problem:
- Input A graph \(G\)
- Output: True iff there exists a cycle in \(G\) that visits every vertex exactly
once
- Much more challenging than the Euler Circuit Problem
- \(NP\)-Complete
- \(NP\)-Complete: they are the
hardest problems in \(NP\)
- A problem is \(NP\)-complete if:
- It's in \(NP\) itself
- Every problem in \(NP\) is polynomial time reducible to it
- This means that if any \(NP\)-complete problem is in \(P\), then \(P = NP\)
- Reduction:
- Consider two decision problems \(A\) and \(B\)
- \(A\) reduces to \(B\) if we can solve \(A\) using a solution to problem \(B\)
- Notation: \(A \leq B\)
- We say that \(A\) reduces to \(B\) in polynomial time if we can solve
\(A\) using a polynomial number of
calls to \(B\)
- If \(B \in P\) and \(A \leq B\) then \(A \in P\)
- There are thousands of \(NP\)-complete problems and no one has been
able to show that any of them is in \(P\), so
people generally believe that \(P \neq NP\)
- \(NP\)-Hard: problems that every problem in \(NP\) reduces to; they
need not be in \(NP\) themselves
- Example problem: \(N \times N\)
chessboard, verify a move is optimal
- What to do when faced with a problem that you think is hard?
- Confirm that it is actually hard
- Take an \(NP\)-complete problem,
take your problem, show that the \(NP\)-complete problem reduces to your
problem
- If so, stop trying to find a general solution to your problem
lol
- Instead, possibly use an approximation algorithm (within a constant
factor of true)
- Possibly use a randomized algorithm, repeating it as needed
- Consider special cases for your problem
Lecture 24 - August 16
- \(NP\)-Hard: every \(NP\) problem can be reduced to it
- \(NP\)-Complete: \(NP\)-hard and \(NP\)
- If an \(NP\)-hard problem is in \(P\), then \(P = NP\)
- 3-Colorable:
- Input: undirected graph \(G\)
- Output: true iff you can color each vertex of the graph such that no
two adjacent vertices have the same color
- \(NP\)-Complete
- 2-Colorable: a related problem
- Not \(NP\)-Complete
- All vertices at distance \(i+1\) from the start must be colored
differently from those at distance \(i\)
- The polynomial-time algorithm is to try to color it this way, return true if it works, else false
- 2-Colorable reduction to 3-Colorable
- Add a dummy vertex connected to every other node - it takes up one of the
three colors
- Thus, 3-Colorable on that graph tests whether the original graph is 2-Colorable
- 3-SAT:
- Terminology:
- Literal: a boolean variable (or its negation)
- Clause: a series of literals, OR’d together
- Input: a series of clauses AND’d together, each clause has at most 3
literals
- Output: true iff there is a setting of boolean variables such that
the expression is true
- \(NP\)-Complete
- The general problem of satisfiability is reducible to 3-SAT
- Vertex Cover:
- Input: undirected graph \(G\),
integer \(k\)
- Output: true iff there is a set of vertices of size \(k\) such that every edge has at least one
vertex in the set
- \(NP\)-Complete
- “The set of vertices covers all edges”
- Special cases:
- If the graph is a tree, there is a linear time solution
- If the graph is 2-Colorable, there is a polynomial time
solution
- Approximation algorithm for finding minimum vertex cover
- At most \(2x\) larger than
optimal
- While there are edges in \(G\): (sketch below)
- Pick an arbitrary edge
- Put both of its vertices in the vertex cover and delete those vertices and all
edges touching them
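
A minimal sketch of that 2-approximation; the class and edge representation (int pairs) are my own. Skipping edges whose endpoints are already in the cover plays the role of "deleting" the edges touching chosen vertices.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// 2-approximation for minimum vertex cover: take both endpoints of each
// still-uncovered edge; the result is at most 2x the optimal cover size.
public class VertexCoverApprox {
    // edges are int pairs {u, v}
    public static Set<Integer> approxCover(List<int[]> edges) {
        Set<Integer> cover = new HashSet<>();
        for (int[] e : edges) {
            int u = e[0], v = e[1];
            if (!cover.contains(u) && !cover.contains(v)) { // edge still uncovered
                cover.add(u);   // take both endpoints,
                cover.add(v);   // which covers every edge touching either of them
            }
        }
        return cover;
    }
}
```
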