Data structure cheat sheet

A quick cheat sheet on common algorithms and data structures:

Linear Structures

Linked Lists

Basic Implementation

String

Related Algorithms

Arrays

Sorting and Ordering

Simple Sort

Insertion sort: keep a sorted prefix of the array and insert each new element into its correct position within that prefix.
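
As a quick reference, a minimal in-place sketch on a vector of ints (the element type and interface are assumptions, not from the original page):

#include <vector>

// Insertion sort: for each element, shift larger elements of the sorted
// prefix one slot to the right, then drop the element into the gap.
void insertion_sort(std::vector<int>& a) {
    for (size_t i = 1; i < a.size(); ++i) {
        int key = a[i];
        size_t j = i;
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];
            --j;
        }
        a[j] = key;
    }
}

Worst case O(n^2), but O(n) on an already (or nearly) sorted input.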

Quick Sort

The core of quicksort is dividing the array to be sorted into two parts and repeating the operation on each part. The dividing procedure is called partition, and there are two typical implementations of it: the Lomuto and the Hoare partition. The Lomuto partition processes the elements one by one and swaps elements smaller than the pivot to the left side. The Hoare partition scans the array from both ends, swaps pairs that are on the wrong side, and stops when the two scans meet.

Quicksort takes Θ(n lg n) time on average; in the worst case it can take Θ(n^2). There are a number of algorithms based on the idea of partitioning and then processing either or both sides. If only one side is processed (as in quickselect), the expected running time is O(n).
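
A minimal sketch of quicksort using the Lomuto partition described above; taking the last element as the pivot is an assumption:

#include <utility>
#include <vector>

// Lomuto partition: walk left to right, keeping elements <= pivot in a
// growing prefix; finally place the pivot between the two regions.
int lomuto_partition(std::vector<int>& a, int lo, int hi) {
    int pivot = a[hi];
    int i = lo;                        // next slot for a "small" element
    for (int j = lo; j < hi; ++j) {
        if (a[j] <= pivot) {
            std::swap(a[i], a[j]);
            ++i;
        }
    }
    std::swap(a[i], a[hi]);            // the pivot lands at its final position
    return i;
}

void quicksort(std::vector<int>& a, int lo, int hi) {
    if (lo >= hi) return;
    int p = lomuto_partition(a, lo, hi);
    quicksort(a, lo, p - 1);           // recurse on both sides of the pivot
    quicksort(a, p + 1, hi);
}

Called as quicksort(a, 0, (int)a.size() - 1).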

Merge Sort

Median and order statistics

Selecting the min/max of an array takes linear time, and selecting the general k-th element takes an expected Θ(n) time using a method similar to quicksort (quickselect): partition, then recurse only into the side that contains the k-th element.
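
A sketch of that one-sided variant (quickselect), reusing lomuto_partition() from the quicksort sketch above; the 0-based k is an assumption:

#include <vector>

// Returns the k-th smallest element (k is 0-based).
// Only the side of the partition containing k is processed further.
int quickselect(std::vector<int>& a, int lo, int hi, int k) {
    while (lo < hi) {
        int p = lomuto_partition(a, lo, hi);
        if (p == k) return a[p];
        if (p < k) lo = p + 1;         // k-th element is to the right of the pivot
        else       hi = p - 1;         // k-th element is to the left of the pivot
    }
    return a[lo];
}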

Tree-based Structures

Binary Tree

Property

  • A full n-level binary tree has 2^n - 1 elements.
  • An n-node complete binary tree has ⌊lg n⌋ + 1 levels.

Storage

Traversal

A full traversal of a binary tree takes O(n) time: apart from the initial call on the root, the traversal routine is called exactly twice for each node (once for each of its two child links), so the total number of calls is linear in n.

For binary search trees, min/max is easy: just follow the left/right links. Getting the successor/predecessor is trickier: the successor of x is either a) the minimum element of x's right subtree, if that subtree is non-empty, or b) the lowest ancestor of x whose left child is also an ancestor of x.
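
A minimal sketch of the successor rule above, assuming nodes carry a parent pointer; the TreeNode struct and field names are illustrative, not from the original page:

struct TreeNode {
    int key;
    TreeNode* left = nullptr;
    TreeNode* right = nullptr;
    TreeNode* parent = nullptr;
};

// Leftmost node of a subtree: the minimum of a BST.
TreeNode* tree_minimum(TreeNode* x) {
    while (x->left) x = x->left;
    return x;
}

// Successor of x: case a) minimum of the non-empty right subtree;
// case b) climb until we come up from a left child.
TreeNode* tree_successor(TreeNode* x) {
    if (x->right) return tree_minimum(x->right);
    TreeNode* y = x->parent;
    while (y && x == y->right) {
        x = y;
        y = y->parent;
    }
    return y;                          // nullptr if x was the maximum
}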

Insertion and Deletion

Insertion is relatively simple: just follow the search procedure until you reach an empty link, and attach the new node there.

Deletion is more complicated: if the target z has two children, we find its successor y. If y is the right child of z, z can be replaced by y directly; otherwise, we first replace y by its own right child, and then replace z with y.
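
A sketch of the insertion procedure, reusing the TreeNode struct from the successor sketch above; passing the root by reference is an assumption:

// Walk down as in a search; attach the new node at the empty link we fall off.
void tree_insert(TreeNode*& root, TreeNode* z) {
    TreeNode* parent = nullptr;
    TreeNode* cur = root;
    while (cur) {
        parent = cur;
        cur = (z->key < cur->key) ? cur->left : cur->right;
    }
    z->parent = parent;
    if (!parent)                   root = z;          // tree was empty
    else if (z->key < parent->key) parent->left = z;
    else                           parent->right = z;
}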

Heap

A (binary) heap is a complete binary tree that maintains a specific condition on its nodes: for a max-heap, A[parent[i]] >= A[i]; for a min-heap, A[parent[i]] <= A[i]. A heap can be used to maintain a priority queue. A heap is usually stored as a contiguous vector.

Heapify

Heapify (sift-down) is the fundamental operation that restores the heap property when the value at the root of a subtree may violate it, for example after a new value has been placed there.

Complexity: Θ(lg n) in the worst case. You can prove that with the master theorem on the recurrence T(n) ≤ T(2n/3) + Θ(1).

Build a heap

To build a heap, call heapify() on each of the roughly n/2 non-leaf nodes, working bottom-up from the last non-leaf node to the root.

Complexity: Θ(n)

Heapsort

To sort an array in increasing order, build a max-heap, swap the first element (the maximum) into the final position of the array, shrink the heap by one, and restore the heap property by calling heapify on the root; repeat until the heap is empty.

Complexity: Θ(n lg n)

Priority Queue

A (max) priority queue supports four operations: insert, max, extract-max, and increase-key. With a heap, these can be done in Θ(lg n), Θ(1), Θ(lg n), and Θ(lg n) time, respectively.

Codes

#include <utility>  // std::swap

// Node is assumed to be a comparable element type (supports operator<).
// 0-based layout: the children of node i live at 2*i + 1 and 2*i + 2.
void max_heapify(Node* A, int size, int start) {
    auto largest = start;
    int left = start * 2 + 1, right = start * 2 + 2;
    if(left < size && A[largest] < A[left])
        largest = left;
    if(right < size && A[largest] < A[right])
        largest = right;
    if(largest != start) {
        std::swap(A[start], A[largest]);
        max_heapify(A, size, largest);   // sift the displaced value further down
    }
}


void build_max_heap(Node* A, int size) {
  // Heapify every non-leaf node, bottom-up; the leaves are already valid heaps.
  for(int i = size / 2 - 1; i >= 0; --i) {
    max_heapify(A, size, i);
  }
}
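
Heapsort, as described above, can be written on top of these two routines; the calling convention below is an assumption:

// Sort A[0..size) in increasing order: repeatedly move the current maximum
// (the root of the max-heap) to the end of the shrinking heap region.
void heap_sort(Node* A, int size) {
    build_max_heap(A, size);
    for (int end = size - 1; end > 0; --end) {
        std::swap(A[0], A[end]);   // place the maximum at its final position
        max_heapify(A, end, 0);    // restore the heap on the remaining prefix
    }
}

Each of the n - 1 extractions costs O(lg n), matching the Θ(n lg n) bound quoted above.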

n-ary Tree

Union Find

Hash Table

Graphs

Storage

Fundamentally, the representation of a graph is always equivalent to an adjacency matrix. The most common storage methods are the uncompressed (dense) matrix, the adjacency list, and the compressed sparse matrix.

Traverse

  • BFS is the simplest traversal one can conduct on a graph. It is useful for basic (unweighted) shortest-path discovery; see the sketch after this list.
  • DFS is useful for finding strongly connected components. It can also be used to classify the edges of the graph (tree, back, forward, cross).
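
A minimal BFS sketch that returns unweighted shortest-path distances from a source; the vector-of-vectors adjacency list is an assumption:

#include <queue>
#include <vector>

// adj[u] lists the neighbours of u; unreachable vertices keep distance -1.
std::vector<int> bfs_distances(const std::vector<std::vector<int>>& adj, int source) {
    std::vector<int> dist(adj.size(), -1);
    std::queue<int> q;
    dist[source] = 0;
    q.push(source);
    while (!q.empty()) {
        int u = q.front(); q.pop();
        for (int v : adj[u]) {
            if (dist[v] == -1) {       // first visit gives the shortest distance
                dist[v] = dist[u] + 1;
                q.push(v);
            }
        }
    }
    return dist;
}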

Topological sort and Strongly Connected Components

  • Topological sort: run DFS and, as each vertex is finished, insert it into the front of a linked list; the list then holds the vertices in topological order (see the sketch after this list).
  • For a directed graph, the strongly connected components can be computed as follows: 1) run a DFS and record the finishing time of each vertex, 2) run a DFS on the transposed graph, visiting the vertices in decreasing order of their finishing times; each tree of the second DFS is one strongly connected component.
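
A sketch of the DFS-based topological sort described above; appending at finish time and reversing at the end stands in for the front-of-a-linked-list trick (an implementation choice, not the original's):

#include <algorithm>
#include <vector>

// Post-order DFS: a vertex is appended to 'order' when it finishes.
static void topo_dfs(const std::vector<std::vector<int>>& adj, int u,
                     std::vector<bool>& seen, std::vector<int>& order) {
    seen[u] = true;
    for (int v : adj[u])
        if (!seen[v]) topo_dfs(adj, v, seen, order);
    order.push_back(u);
}

// Topological order of a DAG: reverse order of DFS finishing times.
std::vector<int> topological_sort(const std::vector<std::vector<int>>& adj) {
    std::vector<bool> seen(adj.size(), false);
    std::vector<int> order;
    for (int u = 0; u < (int)adj.size(); ++u)
        if (!seen[u]) topo_dfs(adj, u, seen, order);
    std::reverse(order.begin(), order.end());
    return order;
}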

Minimum Spanning Trees

  • Kruskal: repeatedly add the lowest-weight edge that connects two different components (see the sketch after this list). O(E lg V)
  • Prim: grow a single tree, always adding the light edge that crosses the cut between the tree and the remaining vertices. O(E lg V) with a binary heap
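
A sketch of Kruskal's algorithm with a small union-find; the (weight, u, v) edge tuples and the path-compression detail are assumptions:

#include <algorithm>
#include <numeric>
#include <tuple>
#include <vector>

// Union-find with path compression; the O(E lg V) bound is dominated
// by sorting the edges anyway.
struct DisjointSet {
    std::vector<int> parent;
    explicit DisjointSet(int n) : parent(n) {
        std::iota(parent.begin(), parent.end(), 0);
    }
    int find(int x) {
        return parent[x] == x ? x : parent[x] = find(parent[x]);
    }
    bool unite(int a, int b) {
        a = find(a); b = find(b);
        if (a == b) return false;      // already in the same component
        parent[a] = b;
        return true;
    }
};

// Returns the total weight of a minimum spanning tree of an n-vertex graph.
long long kruskal(int n, std::vector<std::tuple<int, int, int>> edges) {
    std::sort(edges.begin(), edges.end());          // lowest-weight edges first
    DisjointSet ds(n);
    long long total = 0;
    for (auto& [w, u, v] : edges)
        if (ds.unite(u, v)) total += w;             // edge joins two components: keep it
    return total;
}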

Shortest Paths

  • Bellman-Ford: relax every edge V - 1 times, O(VE).
  • On a DAG, shortest paths can be computed in O(V+E) time by relaxing the edges in topological order.
  • Dijkstra: repeatedly take the unvisited vertex with the smallest tentative distance and relax all of its outgoing edges. O(V^2) with a linear scan, or O((V+E) lg V) with a binary heap (see the sketch after this list).
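
A sketch of Dijkstra's algorithm with a binary heap, for non-negative edge weights; the lazy skipping of stale queue entries is an implementation choice, not from the original page:

#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// adj[u] holds (neighbour, weight) pairs; returns the distance array.
std::vector<long long> dijkstra(
        const std::vector<std::vector<std::pair<int, int>>>& adj, int source) {
    const long long INF = std::numeric_limits<long long>::max();
    std::vector<long long> dist(adj.size(), INF);
    // min-heap of (tentative distance, vertex)
    std::priority_queue<std::pair<long long, int>,
                        std::vector<std::pair<long long, int>>,
                        std::greater<>> pq;
    dist[source] = 0;
    pq.push({0, source});
    while (!pq.empty()) {
        auto [d, u] = pq.top(); pq.pop();
        if (d != dist[u]) continue;            // stale entry: u was already improved
        for (auto [v, w] : adj[u]) {           // relax every outgoing edge of u
            if (d + w < dist[v]) {
                dist[v] = d + w;
                pq.push({dist[v], v});
            }
        }
    }
    return dist;
}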

Dynamic Programming

Recursion

Divide and conquer

Searching

Depth-first search / backtracking

Breadth-first search

Mathematical Problems

Permutations and Combinations

Generate next lexicographic permutation

  • find the longest non-increasing suffix; the element just before it is the pivot:
67[1]5432
  • find the rightmost element of the suffix that is larger than the pivot:
67[1]543[2]
  • swap the pivot with that element:
67[2]543[1]
  • reverse the suffix (see the sketch below):
672[1345]
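
The four steps above as a single function (equivalent in spirit to std::next_permutation); the in-place vector interface is an assumption:

#include <algorithm>
#include <utility>
#include <vector>

// Transforms a into its next lexicographic permutation.
// Returns false (leaving a sorted ascending) if a was already the last one.
bool next_perm(std::vector<int>& a) {
    int n = (int)a.size();
    int i = n - 2;
    while (i >= 0 && a[i] >= a[i + 1]) --i;      // a[i+1..] is the non-increasing suffix
    if (i < 0) {                                 // whole array non-increasing: wrap around
        std::reverse(a.begin(), a.end());
        return false;
    }
    int j = n - 1;
    while (a[j] <= a[i]) --j;                    // rightmost element larger than the pivot
    std::swap(a[i], a[j]);                       // swap the pivot with that element
    std::reverse(a.begin() + i + 1, a.end());    // reverse the suffix
    return true;
}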

Generate the k-th permutation of N

The idea is to build the permutation element by element: there are (N-1)! permutations for each choice of the first element, so the first element is the one at index k / (N-1)! among the remaining elements (with a 0-based k); then recurse on the rest with k mod (N-1)! (see the sketch below).
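
A sketch of that idea; a 0-based k and elements 1..N are assumptions:

#include <vector>

// Returns the k-th (0-based) lexicographic permutation of 1..n.
std::vector<int> kth_permutation(int n, long long k) {
    std::vector<long long> fact(n, 1);           // fact[i] = i!
    for (int i = 1; i < n; ++i) fact[i] = fact[i - 1] * i;

    std::vector<int> remaining;
    for (int i = 1; i <= n; ++i) remaining.push_back(i);

    std::vector<int> result;
    for (int i = n - 1; i >= 0; --i) {
        long long idx = k / fact[i];             // which block of i! permutations k falls in
        k %= fact[i];
        result.push_back(remaining[(size_t)idx]);
        remaining.erase(remaining.begin() + (int)idx);
    }
    return result;
}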

Generate permutations with duplicate elements

This can be done with a simple backtracking search: sort the elements first and, at each position, skip a candidate that equals a value already tried at that position (see the sketch below).
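
A backtracking sketch that emits each distinct permutation exactly once, using the sort-and-skip-equal-siblings rule; the interface is an assumption:

#include <algorithm>
#include <vector>

static void permute_rec(const std::vector<int>& pool, std::vector<bool>& used,
                        std::vector<int>& current,
                        std::vector<std::vector<int>>& out) {
    if (current.size() == pool.size()) {
        out.push_back(current);
        return;
    }
    for (size_t i = 0; i < pool.size(); ++i) {
        if (used[i]) continue;
        // Only the first unused copy of a value may be placed at this position,
        // so duplicate values never produce duplicate permutations.
        if (i > 0 && pool[i] == pool[i - 1] && !used[i - 1]) continue;
        used[i] = true;
        current.push_back(pool[i]);
        permute_rec(pool, used, current, out);
        current.pop_back();
        used[i] = false;
    }
}

std::vector<std::vector<int>> unique_permutations(std::vector<int> pool) {
    std::sort(pool.begin(), pool.end());         // make duplicates adjacent
    std::vector<bool> used(pool.size(), false);
    std::vector<int> current;
    std::vector<std::vector<int>> out;
    permute_rec(pool, used, current, out);
    return out;
}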

Heap's algorithm