Lei Mao bio photo

Lei Mao

Machine Learning, Artificial Intelligence, Computer Science.

Twitter Facebook LinkedIn GitHub   G. Scholar E-Mail RSS

Introduction

Graph is a common data structure that represents the relationships between different entities using edges and vertices. In practice, a graph can be extremely large, and there is a need to traverse or search the graph along the edges. Breadth-first search and depth-first search algorithms are the most commonly used traverse and search algorithm for graphs.

Graph Search and Traversal

In this blog post, I would like to discuss the implementations of directed and undirected graphs, breath-first and depth-first search algorithms.

Graph Search and Traversal

Graph search and traversal are fundamentally the same. We will use the terms, search and traversal, interchangeably.

Graph Representation and Implementation

Graph is commonly represented using adjacency matrix or adjacency list. Adjacency matrix is not favored for large graphs because usually the edges in large graphs are sparse. Adjacency list uses the least amount of the memory, but querying the specified neighbor of a node can be inefficient in a graph whose edges are dense. So we uses adjacency set instead which is fast for query and not too bad for memory.


Graph can be directed and undirected. There could also be isolated vertices in the graph that have no edges. There could also be vertices that connects with themselves. These graph features should not be ignored and have to be implemented accordingly.

from typing import Iterable, List, Union, Set


class Graph:
    def __init__(self) -> None:

        # Adjacency set representation.
        # key: node name, value: adjacency node name set.
        self.edges = dict()

    def add_node(self, u: Union[int, str]) -> None:

        # Add a node.
        if u not in self.edges:
            self.edges[u] = set()

    def add_edge(self, u: Union[int, str], v: Union[int, str]) -> None:

        # Add an edge from u to v.
        self.add_node(u)
        self.add_node(v)
        self.edges[u].add(v)

    def get_neighbors(self, u: Union[int, str]) -> Set[Union[int, str]]:

        if u not in self.edges:
            raise RuntimeError(f"Node {u} does not exist in the graph.")

        return self.edges[u]

    def get_num_nodes(self) -> int:

        return len(self.edges)


class UndirectedGraph(Graph):
    def __init__(self) -> None:

        super(UndirectedGraph, self).__init__()

    def add_edge(self, u: Union[int, str], v: Union[int, str]) -> None:

        # Add an edge from u to v and from v to u.
        self.add_node(u)
        self.add_node(v)
        self.edges[u].add(v)
        self.edges[v].add(u)


class DirectedGraph(Graph):
    def __init__(self) -> None:

        super(DirectedGraph, self).__init__()

        # key: child name, value: parents set
        self.parents = dict()
        # key: parent name, value: children set
        self.children = self.edges

    def add_node(self, u: Union[int, str]) -> None:

        if u not in self.children:
            self.children[u] = set()
        if u not in self.parents:
            self.parents[u] = set()

    def add_edge(self, u: Union[int, str], v: Union[int, str]) -> None:

        self.add_node(u)
        self.add_node(v)
        self.children[u].add(v)
        self.parents[v].add(u)

    def get_parents(self, u: Union[int, str]) -> Set[Union[int, str]]:

        if u not in self.parents:
            raise RuntimeError(f"Node {u} does not exist in the graph.")

        return self.parents[u]

    def get_children(self, u: Union[int, str]) -> Set[Union[int, str]]:

        if u not in self.children:
            raise RuntimeError(f"Node {u} does not exist in the graph.")

        return self.children[u]

Breadth-First Search (BFS)

The breadth-first search algorithm explores all the nodes at the current level before it explores the next level. To achieve this, it uses a first-in-first-out (FIFO) queue.

# Graph search/traversal algorithms.
# These algorithms could be modified to achieve different goals for graphs.
from typing import Iterable, List, Union, Set
from queue import Queue
from graph import Graph, DirectedGraph, UndirectedGraph


def breadth_first_traversal(graph: Graph, start_nodes: List[int]) -> List[int]:

    # BFS supports multiple start nodes.
    # O(V + E) where V is the number of nodes and E is the number of edges.

    search_order = []

    num_nodes = graph.get_num_nodes()
    # First in first out (FIFO)
    q = Queue(maxsize=num_nodes)

    # Store the nodes that have been visited.
    visited = set()

    # Mark the source node as visited and enqueue it.
    for node in start_nodes:
        q.put(node)
        visited.add(node)

    while not q.empty():
        src_node = q.get()
        search_order.append(src_node)
        for tgt_node in graph.get_neighbors(src_node):
            if tgt_node not in visited:
                q.put(tgt_node)
                visited.add(tgt_node)

    return search_order

Depth-First Search (DFS)

The depth-first search algorithm explores as far as possible along each branch before backtracking. It can be implemented recursively or iteratively. The recursive algorithm naturally explores the depth first but is not favored for large graphs because of stack overflow. The iterative algorithm uses a last-in-first-out (LIFO) stack to explore the depth first.

def recursive_depth_first_traversal(graph: Graph,
                                    start_node: int) -> List[int]:

    # Recursive algorithm is not favored for large graphs because of stack overflow.
    # O(V + E) where V is the number of nodes and E is the number of edges.

    def depth_first_recursion(graph: Graph, visited: Set, src_node: int,
                              search_order: List[int]):

        visited.add(src_node)
        search_order.append(src_node)
        for tgt_node in graph.get_neighbors(src_node):
            if tgt_node not in visited:
                depth_first_recursion(graph=graph,
                                      visited=visited,
                                      src_node=tgt_node,
                                      search_order=search_order)

    # Store the nodes that have been visited.
    visited = set()
    search_order = []

    if start_node not in visited:
        depth_first_recursion(graph=graph,
                              visited=visited,
                              src_node=start_node,
                              search_order=search_order)

    return search_order


def iterative_depth_first_traversal(graph: Graph,
                                    start_node: List[int]) -> List[int]:

    # O(V + E) where V is the number of nodes and E is the number of edges.

    # In Python, the list data structure can be used as stack.

    # Store the nodes that have been visited.
    visited = set()
    search_order = []
    # Last in first out (LIFO)
    stack = list()

    stack.append(start_node)
    while len(stack) != 0:
        src_node = stack.pop()
        if src_node not in visited:
            visited.add(src_node)
            search_order.append(src_node)
            for tgt_node in graph.get_neighbors(src_node):
                stack.append(tgt_node)

    return search_order

Example

We created a dummy directed graph and a dummy undirected graph to test our search algorithms.

def create_dummy_directed_graph() -> DirectedGraph:

    graph = DirectedGraph()
    graph.add_edge(0, 1)
    graph.add_edge(0, 2)
    graph.add_edge(1, 2)
    graph.add_edge(2, 0)
    graph.add_edge(2, 3)
    graph.add_edge(3, 3)
    # Add an isolated node without connections.
    graph.add_node(4)

    return graph


def create_dummy_undirected_graph() -> UndirectedGraph:

    graph = UndirectedGraph()
    graph.add_edge(1, 2)
    graph.add_edge(1, 3)
    graph.add_edge(2, 5)
    graph.add_edge(3, 5)
    graph.add_edge(2, 4)
    graph.add_edge(4, 5)
    graph.add_edge(4, 6)
    graph.add_edge(5, 6)
    # Add an isolated node without connections.
    graph.add_node(0)

    return graph


if __name__ == "__main__":

    directed_graph = create_dummy_directed_graph()
    undirected_graph = create_dummy_undirected_graph()

    search_order = breadth_first_traversal(graph=directed_graph,
                                           start_nodes=[2, 0])
    print(search_order)
    search_order = breadth_first_traversal(graph=undirected_graph,
                                           start_nodes=[2, 0])
    print(search_order)

    search_order = recursive_depth_first_traversal(graph=directed_graph,
                                                   start_node=2)
    print(search_order)
    search_order = recursive_depth_first_traversal(graph=undirected_graph,
                                                   start_node=2)
    print(search_order)

    search_order = iterative_depth_first_traversal(graph=directed_graph,
                                                   start_node=2)
    print(search_order)
    search_order = iterative_depth_first_traversal(graph=undirected_graph,
                                                   start_node=2)
    print(search_order)

The expected outputs from the terminal are

$ python search.py 
[2, 0, 3, 1]
[2, 0, 1, 4, 5, 3, 6]
[2, 0, 1, 3]
[2, 1, 3, 5, 4, 6]
[2, 3, 0, 1]
[2, 5, 6, 4, 3, 1]

Note that the iterative and recursive depth-first search algorithms do not necessarily result in the same traversal order.

Source Code

All the source code could be found on GitHub Gist.

Miscellaneous

Many search and traversal applications in practice are implemented using modified breath-first and depth-first search algorithms.

Reference