CSE 422 (Modern Algos)
James R. Lee, Winter 2024
Lecture 1 - Jan 4
- 9 topics + a bonus topic, one per week
- One mini-project per week, always due on Wed
- “Power of two choices”
- Related problem:
- Put n balls into n bins, each ball into a bin chosen i.i.d. uniformly at random
- “What is the expected number of balls in the most-full bin?”
- It grows as Θ(log n / log log n)
- Two-choice strategy:
- Choose two bins uniformly at random
- Throw the ball into the least loaded of the two!
- Max load now grows as Θ(log log n)
- d choices: max load grows as Θ(log log n / log d)
- The same exponential “boost” doesn’t happen again beyond d = 2; going from 2 to d choices only improves things by a constant factor (see the simulation sketch below)
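- A minimal simulation sketch of the comparison, in Python (the value of n, the trial count, and the function name are my own choices):
import random

def max_load(n, choices, trials=5):
    # Throw n balls into n bins; each ball picks `choices` random bins
    # and goes into the least loaded one. Return the average max load.
    total = 0
    for _ in range(trials):
        bins = [0] * n
        for _ in range(n):
            picks = [random.randrange(n) for _ in range(choices)]
            best = min(picks, key=lambda b: bins[b])
            bins[best] += 1
        total += max(bins)
    return total / trials

# One choice gives a noticeably larger max load (~ log n / log log n)
# than the best of two choices (~ log log n).
print(max_load(100_000, choices=1), max_load(100_000, choices=2))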
Lecture 2 - Jan 9
- Basic web caching:
- Akamai was one of the first companies to build a business around this (content delivery networks)
- Basic web use: use a URL to request data from a server
- Caching: use an intermediary cache to request the data first
- Hash functions: map arbitrary keys to (pseudo)random values/buckets
- Consistent hashing problem: given a cache distributed over n machines, how to quickly figure out which machine a string/key is associated with, while balancing load between the machines
- Motivated by web-caching: how can we cache data at URLs?
- The basic solution would be to use a typical hash function and modulo: server(x) = h(x) mod n
- However we need to agree on a fixed n, and in real, dynamic environments we cannot do so
- Changing n would force nearly every object to be remapped to a different server
- “Circular hashing solution”:
- (Assume the hash outputs are 32-bit for this problem)
- Hash each server, place them on a circle
- When caching data x, find its hash h(x), then go clockwise to the first hashed server; this is the cache server for x
- Expected load per server: m/n, where m is the number of objects and n the number of servers
- When we add another server, we only need to relocate about m/n objects in expectation
- How to do the “clockwise scan”?
- We want some data structure which, given a hash value z, finds the server whose hash is the first one clockwise from z (a successor query)
- Balanced binary search trees (e.g. red-black trees) can do this in O(log n) time
- If the number of servers is small, we can also use other, simpler data structures
- Variance reduction trick:
- Using the basic method, some servers can get overloaded
- Trick: compute k hashes for each server, placing k copies of it on the circle (filling out the circle)
- If k is around log n, there are theoretical bounds showing no server gets overloaded, with high probability
- This also helps when we have servers with differing memory capacities: a server with 2x the memory can get 2x the number of hash-keys!
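- A rough Python sketch of the ring with k virtual copies per server (the hash choice, server names, and k here are placeholders of mine, not from lecture):
import bisect, hashlib

def h(s):
    # 32-bit position on the circle (md5 truncated; any decent hash works)
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % (2**32)

class Ring:
    def __init__(self, servers, k=100):
        # k virtual copies per server reduces load variance
        self.points = sorted((h(f"{s}#{i}"), s) for s in servers for i in range(k))
        self.keys = [p for p, _ in self.points]

    def server_for(self, url):
        # clockwise scan = successor query over the sorted hash positions
        i = bisect.bisect_right(self.keys, h(url)) % len(self.keys)
        return self.points[i][1]

ring = Ring(["cache-a", "cache-b", "cache-c"])
print(ring.server_for("http://example.com/page"))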
Lecture 3 - Jan 11
- Majority element problem: given an array of length n that has a majority element (one appearing more than n/2 times), find that element!
- Simple linear-time algorithm: keep a candidate and a counter for the most-seen element, and increment/decrement the counter depending on what is seen (sketch below)
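- A runnable version of that counter idea, the Boyer-Moore majority vote (helper name is mine):
def majority(arr):
    # One pass: keep a candidate and a counter; matching elements increment,
    # non-matching decrement; a counter of 0 means adopt a new candidate.
    candidate, count = None, 0
    for x in arr:
        if count == 0:
            candidate, count = x, 1
        elif x == candidate:
            count += 1
        else:
            count -= 1
    return candidate  # correct whenever a true majority (> n/2) exists

print(majority([3, 1, 3, 3, 2, 3, 3]))  # -> 3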
- Heavy hitters problem: given an array A of large length n and a smaller parameter k, compute the values which appear at least n/k times.
- Many applications: given a large stream of data, compute the elements which appear a certain number of times
- If the array can fit in memory: trivial. Sort the array in O(n log n) time, then find which elements appear at least n/k times in a row
- There is no algorithm which solves Heavy Hitters in one pass with a sublinear amount of auxiliary space
- We cannot store all counts, therefore if the possible candidates change, we won’t know the counts of the items which could replace them
- ε-approximate heavy hitters problem:
- Input: same as HH, but with an additional parameter ε
- Output: a list where:
- All values in A which occur at least n/k times are in the list
- All values in the list occur at least n/k - εn times in A (provides margin of error)
- Auxiliary memory required grows as 1/ε
- We can’t have ε = 0, as the memory requirement would grow to infinity
- Count-min sketch: a data structure for ε-approximate heavy hitters
- Supports two operations:
- inc(x): increment the count of x
for all i = 1, 2, ..., L: arr[i][hash_i(x)] += 1
- O(L) time
- count(x): get the (approximate) count of x
return min_i arr[i][hash_i(x)]
- O(L) time
- Parameters:
- Number of buckets b (larger than k, much smaller than n)
- Number of hash functions L (small)
- Data structure: an L × b 2-D array of ints, init at 0
- The array counts can only overestimate the true counts (collisions only add to a bucket), so the error is one-sided; taking the min over the L rows gives the tightest estimate
- Given reasonable choices of parameters, the expected overcount in any single row is at most n/b, so b around 1/ε gives error on the order of εn; taking the min over L independent rows makes a much larger error very unlikely
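- A small runnable version of the pseudocode above (the values of b and L, and the salted md5 standing in for L independent hash functions, are my own choices):
import hashlib

class CountMinSketch:
    def __init__(self, b=1024, L=5):
        self.b, self.L = b, L
        self.arr = [[0] * b for _ in range(L)]

    def _hash(self, x, i):
        # i-th hash function, simulated by salting a single hash with i
        return int(hashlib.md5(f"{i}:{x}".encode()).hexdigest(), 16) % self.b

    def inc(self, x):
        for i in range(self.L):            # O(L) time
            self.arr[i][self._hash(x, i)] += 1

    def count(self, x):
        # min over rows; collisions only inflate counts, never shrink them
        return min(self.arr[i][self._hash(x, i)] for i in range(self.L))

cms = CountMinSketch()
for word in ["a", "b", "a", "c", "a"]:
    cms.inc(word)
print(cms.count("a"))  # >= 3, usually exactly 3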
- Role model: bloom filters
- Remembers which elements have been inserted
- Has false positives! Sometimes confirms elements which weren’t added
- No false negatives however
- 1 byte per element results in a false positive rate of less than 2%
- Represents accuracy-space tradeoff
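- A toy sketch of a bloom filter (the bit-array size and number of hashes are arbitrary picks, not tuned to the 2% figure):
import hashlib

class BloomFilter:
    def __init__(self, n_bits=8 * 10_000, n_hashes=5):
        self.n_bits, self.n_hashes = n_bits, n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, x):
        for i in range(self.n_hashes):
            yield int(hashlib.md5(f"{i}:{x}".encode()).hexdigest(), 16) % self.n_bits

    def add(self, x):
        for p in self._positions(x):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, x):
        # May say True for something never added (false positive),
        # but never says False for something that was added.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(x))

bf = BloomFilter()
bf.add("seattle")
print("seattle" in bf, "portland" in bf)  # True, almost surely False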
Lecture 4 - Jan 16
- Metric Space:
- X, a set of points in the space
- d, a distance function between two points: d : X × X → R
- Norms are a common example of a distance function
- Edit distance: between two strings, what’s the smallest number of
insertions/deletions to make them equal?
- Can either use an exact O(n^2) dynamic-programming algorithm or a fast approximate algorithm
- Jaccard Similarity: J(A, B) = |A ∩ B| / |A ∪ B|
- Higher means more similar
- Not a measure of distance
- Used for BoW (bag-of-words) models between documents to find similarity
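- For concreteness, Jaccard similarity on Python sets:
def jaccard(a, b):
    # |A ∩ B| / |A ∪ B|; 1 means identical sets, 0 means disjoint
    return len(a & b) / len(a | b)

print(jaccard({"the", "cat", "sat"}, {"the", "dog", "sat"}))  # 0.5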
- Nearest Neighbor search:
- Trivial time: O(n) per query (linear scan), exact algorithms
- Gold standard time: O(log n) per query, approximate algorithms
- c-NN: find a point within c times the actual min distance to the query point
- Classical, low-dimensional NN:
- Use space partitioning
- Make a decision tree on space, splitting up on median points
- Works with one dimension
- For more than one dimension, use K-D trees
- Split on the median point along one dimension, then repeat over the different dimensions
- No longer any guarantee of finding the nearest neighbor first
- Time: fast (roughly logarithmic per query) in low dimensions, but the work blows up exponentially as the dimension grows (see the scipy sketch below)
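- A quick check of the K-D tree idea, assuming scipy is available (in low dimensions queries are fast; in high dimensions they degrade toward a linear scan):
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.normal(size=(10_000, 3))   # low-dimensional data
tree = cKDTree(points)

query = rng.normal(size=3)
dist, idx = tree.query(query, k=1)      # exact nearest neighbor
print(idx, dist)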
- Modern, high-dimensional NN: use locality-sensitive hashing
- Curse of Dimensionality: costs and space grow exponentially w.r.t.
dimension
Lecture 5 - Jan 18
- Volume of the Euclidean ball of radius r in d dimensions grows proportionally to r^d
- c^d disjoint balls of radius r/4 can fit inside a ball of radius r, for some constant c > 1
- Almost all of the volume of the ball is near the boundary: the fraction of volume within distance εr of the boundary is 1 - (1 - ε)^d, which goes to 1 as d grows
- This means that space partitioning is much worse in high dimensions
- Johnson-Lindenstrauss Lemma:
- Let A be a k × d random matrix of i.i.d. N(0, 1) random variables, normalized by 1/sqrt(k)
- For x in R^d, let f(x) = Ax
- For any n-point subset S of R^d and all vectors x, y in S, with high probability we have: (1 - ε)·||x - y|| ≤ ||f(x) - f(y)|| ≤ (1 + ε)·||x - y||, once k is on the order of log(n)/ε^2
- This means that a random matrix mapping d dimensions to about log(n)/ε^2 dimensions will with high probability preserve interpoint distances between the n points
- A matrix where all elements are positive or negative signs at random (scaled by 1/sqrt(k)) also has this property
- Idea is to have many one-dim maps which preserve distance in expectation, and use them in parallel
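- A numpy sketch of the random-projection map (the dimensions and ε below are arbitrary demo choices):
import numpy as np

rng = np.random.default_rng(0)
d, n, eps = 1000, 50, 0.2
k = int(8 * np.log(n) / eps**2)           # target dimension ~ log(n)/eps^2

A = rng.normal(size=(k, d)) / np.sqrt(k)  # entries N(0, 1), scaled by 1/sqrt(k)
X = rng.normal(size=(n, d))               # n points in R^d
Y = X @ A.T                               # f(x) = Ax for each point

# Compare one interpoint distance before and after projection
orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Y[0] - Y[1])
print(orig, proj, proj / orig)            # ratio close to 1 w.h.p.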
- MinHash:
- Let π be a random permutation of all elements (a random labeling of the universe)
- For a subset S, let h(S) = min over x in S of π(x)
- This basically returns the minimum of the random labels assigned
- The probability of two subsets colliding, h(A) = h(B), is equal to their Jaccard similarity!
- Independence of MinHash:
- Let h_1(S), ..., h_k(S) be the results of k independent MinHash functions
- Similarity estimate: the fraction of indices i on which h_i(A) = h_i(B)
- We have that the estimate’s expected value is J(A, B), and it concentrates around it
- Similar to JL Lemma
- As long as k is on the order of log(n)/ε^2 for a set of n users/subsets
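- A sketch of estimating Jaccard similarity with k independent MinHashes (the random permutations are simulated with seeded hashes; names and k are mine):
import hashlib

def minhash(s, seed):
    # simulate a random permutation by hashing each element with a seed
    return min(int(hashlib.md5(f"{seed}:{x}".encode()).hexdigest(), 16) for x in s)

def estimate_jaccard(a, b, k=200):
    # fraction of the k MinHashes on which the two sets collide
    return sum(minhash(a, i) == minhash(b, i) for i in range(k)) / k

A = {"the", "cat", "sat", "on", "mat"}
B = {"the", "dog", "sat", "on", "mat"}
print(estimate_jaccard(A, B))   # close to |A ∩ B| / |A ∪ B| = 4/6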
- NN search using MinHash:
- Preprocess all elements y by finding their MinHash h(y)
- When searching for x’s NN, compute h(x), find all elements y where h(y) = h(x), and search through them
- One-dimensional hash: dot product between the input and a random normal vector
- ⟨g, x⟩ with g ~ N(0, I) has mean 0 and variance ||x||^2
- Comes from the sum of independent Gaussian variables being another Gaussian
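- A tiny numpy check of that claim (x and the sample count are arbitrary):
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=20)
g = rng.normal(size=(100_000, 20))            # many independent Gaussian vectors
dots = g @ x                                  # each entry is <g_i, x>
print(dots.mean(), dots.var(), np.dot(x, x))  # mean ~ 0, variance ~ ||x||^2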
Lecture 6 - Jan 23
- Binary classification:
- Given a distribution D over (point, label) pairs, create a classifier function
- The classifier maps to labels 0 and 1 for this example
- Problem: given n vectors with ground-truth labels, create the mapping function
- Train error: error of the function on the training data
- Test error: error on held-out test data
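- As a concrete reading of “error” here, the empirical misclassification rate (the helper and toy classifier below are hypothetical, not from lecture):
import numpy as np

def error_rate(f, X, y):
    # fraction of points on which the classifier's prediction disagrees with the label
    return np.mean([f(x) != yi for x, yi in zip(X, y)])

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
f = lambda x: int(x[0] > 1.5)      # a toy threshold classifier
print(error_rate(f, X, y))         # train error would use training data, test error held-out data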
Lecture 7 - Jan 25
Missed
Lecture 8 - Jan 30
PCA
- How do we get feature vectors from data?
- Good to start with centering the data (de-meaning it)
- When our data is effectively lower-dimensional than its vector representation, we can use linear combinations of a few vectors to represent it
- PCA:
- Find features which “explain” the data the most
- Maximizes the spread (variance) of the points when projected onto the basis vectors
- Can use it to map DNA sequences of Europeans to locations
- Keep on adding PCA vectors until the representation doesn’t improve much
Lecture 9 - Feb 1
- Input to PCA is n d-dimensional data points x_1, ..., x_n, and a parameter k
- Output is k orthonormal vectors v_1, ..., v_k, the top k principal components
- This output maximizes the total squared projection length: the sum over points i and components j of ⟨x_i, v_j⟩^2
- We can express our problem via the SVD: X = U S V^T, where U and V are orthonormal matrices and S is diagonal
- Solutions are just diagonal matrices combined with rotations
- Top k principal components are the first k rows of V^T in X = U S V^T
- Those are the first k principal components of X
- PCA gets the eigenvectors of X^T X which have the largest eigenvalues
- SVD is cubic time, power iteration is quicker
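- A minimal numpy sketch of the above on synthetic data (np.linalg.svd stands in for the cubic-time SVD):
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))          # 200 synthetic points in R^10

Xc = X - X.mean(axis=0)                 # center (de-mean) the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
components = Vt[:k]                     # top-k principal components = first k rows of V^T
scores = Xc @ components.T              # k-dimensional representation of each point
print(components.shape, scores.shape)   # (2, 10) (200, 2)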
- Power Iteration:
- Algo:
- Given matrix A = X^T X
- Select a random unit vector u_0
- For t = 1, 2, ..., set u_t = A u_{t-1} / ||A u_{t-1}||. If ||u_t - u_{t-1}|| is small, return u_t
- Finds the first eigenvector
- Stretches out the random vector in the direction of the eigenvector with the largest eigenvalue
- Can just repeatedly square A instead of multiplying by it over and over
- Number of iterations scales as: roughly log(d/ε) / log(λ_1/λ_2)
- λ_1/λ_2 is the spectral gap, the gap between how much the top two directions get stretched on the sphere
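- A Python sketch of power iteration for the top eigenvector of A = X^T X (the tolerance and iteration cap are my choices):
import numpy as np

def top_eigenvector(A, tol=1e-8, max_iter=1000):
    # Repeatedly apply A to a random unit vector and renormalize;
    # the iterate rotates toward the eigenvector with the largest eigenvalue.
    rng = np.random.default_rng(0)
    u = rng.normal(size=A.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(max_iter):
        v = A @ u
        v /= np.linalg.norm(v)
        if np.linalg.norm(v - u) < tol:
            break
        u = v
    return v

X = np.random.default_rng(1).normal(size=(100, 5))
A = X.T @ X
print(top_eigenvector(A))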
- Continuing power iteration:
- Find the top component v_1 using power iteration
- Subtract the variance of the data explained by v_1 (project the data off of v_1)
- Recurse to find the components of the new data matrix
- Often best to just find many eigenvalues, plot them, then pick your k
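- Continuing the sketch above and reusing its top_eigenvector helper, one common deflation approach (whether it matches the lecture’s exact procedure is my assumption):
import numpy as np

def top_k_components(X, k):
    # Repeatedly: find the top component of the current data with power
    # iteration, record it, then subtract its contribution from the data.
    X = X - X.mean(axis=0)                 # work with centered data
    components = []
    for _ in range(k):
        v = top_eigenvector(X.T @ X)       # helper from the sketch above
        components.append(v)
        X = X - np.outer(X @ v, v)         # remove the variance explained by v
    return np.array(components)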