The travelling salesman problem (TSP) asks the following question: “Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city exactly once and returns to the origin city?” It is a classical NP-hard problem in combinatorial optimization, important in theoretical computer science and operations research.
It turns out that the TSP can be formulated as an integer linear programming problem. In this blog post, I would like to implement an integer linear programming solver for the TSP.
Travelling Salesman Problem
Miller-Tucker-Zemlin (MTZ) Integer Linear Programming Formulation
There are two notable TSP integer linear programming formulations, the Miller-Tucker-Zemlin (MTZ) formulation and the Dantzig-Fulkerson-Johnson (DFJ) formulation. We will be focusing on the MTZ formulation, as it is easier to implement using an existing integer linear programming library.
Suppose we have $n$ cities, from $1$ to $n$. We define a binary square matrix $X \in \{0,1\}^{n \times n}$, where
$$ x_{ij} = \begin{cases} 1 & \text{if the path goes from city $i$ to city $j$}\\ 0 & \text{otherwise}\\ \end{cases} $$
and a distance square matrix $C \in \mathbb{R}^{n \times n}$, where $c_{ij}$ is the distance from city $i$ to city $j$.
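To have concrete data to work with, here is a minimal sketch of how the distance data could be prepared. The names `coords` and `distance_matrix` are my own; `n` and the row-major flattened `distance_list` match the names used by the solver code later in this post.

```python
import math
import random

n = 6  # number of cities; kept small for illustration
random.seed(0)

# Sample random 2D coordinates and build a symmetric distance matrix C,
# where distance_matrix[i][j] is the distance from city i+1 to city j+1.
coords = [(random.random(), random.random()) for _ in range(n)]
distance_matrix = [
    [math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)
]

# Flatten row-major so the list aligns with the x_{ij} variables.
distance_list = [distance_matrix[i][j] for i in range(n) for j in range(n)]
```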
It is straightforward to construct the integer linear programming optimization problem for the TSP with the following objective and constraints:
$$ \begin{gather} \min \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} c_{ij} x_{ij} \\ \sum_{i=1, i \neq j}^{n} x_{ij} = 1 \quad \forall j \in [1, n] \\ \sum_{j=1, j \neq i}^{n} x_{ij} = 1 \quad \forall i \in [1, n] \\ \end{gather} $$
The two constraints ensure that each city is entered exactly once and left exactly once.
However, with these constraints alone, the solution will likely contain subtours. A subtour solution consists of several disjoint directed cycles, which violates the TSP requirement that the travelling salesman visit all cities in a single tour and return to the origin city. For example, suppose we have six cities $1, 2, 3, 4, 5, 6$. Without additional constraints, $x_{1,2}=1, x_{2,3}=1, x_{3,1}=1, x_{4,5}=1, x_{5,6}=1, x_{6,4}=1$ ($1 \rightarrow 2 \rightarrow 3 \rightarrow 1$, $4 \rightarrow 5 \rightarrow 6 \rightarrow 4$) is a valid solution to the optimization problem above, but it is not a valid TSP tour.
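To make the notion of a subtour concrete, here is a small illustrative helper (hypothetical, not part of the solver) that decomposes a solution, given as a list of $(i, j)$ edge pairs, into its directed cycles. A valid TSP tour yields exactly one cycle.

```python
def extract_cycles(edges):
    """Split directed edges (one outgoing edge per city) into cycles."""
    successor = {i: j for i, j in edges}
    cycles = []
    unvisited = set(successor)
    while unvisited:
        start = min(unvisited)
        cycle = [start]
        unvisited.discard(start)
        node = successor[start]
        while node != start:
            cycle.append(node)
            unvisited.discard(node)
            node = successor[node]
        cycles.append(cycle)
    return cycles

# The subtour example from the text: two cycles instead of one tour.
edges = [(1, 2), (2, 3), (3, 1), (4, 5), (5, 6), (6, 4)]
print(extract_cycles(edges))  # [[1, 2, 3], [4, 5, 6]]
```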
To prevent subtours, Miller, Tucker, and Zemlin introduced additional auxiliary integer variables $u_2, u_3, \cdots, u_n$ and additional constraints.
$$ \begin{gather} u_i \in \mathbb{Z} \quad \forall i \in [2, n] \\ 1 \leq u_i \leq n - 1 \quad \forall i \in [2, n] \\ u_i - u_j + n x_{ij} \leq n - 1 \quad \forall i \neq j \in [2, n] \\ \end{gather} $$
With these constraints, whenever $x_{ij} = 1$ for $2 \leq i \neq j \leq n$, we must have $u_i - u_j + n \leq n - 1$, i.e., $u_i < u_j$. If a solution contains more than one directed cycle, at least one cycle will not contain node $1$, and the $u$ values along that cycle would have to be strictly increasing all the way around, which is impossible. For the previous incorrect TSP solution, we would need $u_4 < u_5 < u_6 < u_4$, a contradiction.
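The contradiction can be checked exhaustively. The snippet below (an illustrative check, not part of the solver) enumerates every integer assignment of $u_4, u_5, u_6$ in $[1, n-1]$ and confirms that none of them satisfies the MTZ constraints along the subtour $4 \rightarrow 5 \rightarrow 6 \rightarrow 4$.

```python
from itertools import product

n = 6
# Edges of the subtour 4 -> 5 -> 6 -> 4 from the example in the text.
subtour_edges = [(4, 5), (5, 6), (6, 4)]

def mtz_feasible(u):
    # Check u_i - u_j + n * x_ij <= n - 1 for every edge with x_ij = 1.
    return all(u[i] - u[j] + n <= n - 1 for i, j in subtour_edges)

# Exhaustively try every integer assignment of u_4, u_5, u_6 in [1, n - 1].
feasible = any(
    mtz_feasible({4: a, 5: b, 6: c})
    for a, b, c in product(range(1, n), repeat=3)
)
print(feasible)  # False: no assignment satisfies the MTZ constraints
```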
Also notice that a linear function is both convex and concave (the proof is a simple exercise), which implies that every local optimum of a linear program is a global optimum. This means that the optimal solution returned by the linear programming solver for the TSP problem is the global optimum.
Install Dependencies
We will install and use PuLP, a Python linear programming modeling library, together with the GLPK solver backend, to solve TSP problems.
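The following install commands are one possible setup; the GLPK package name varies by platform, and a package manager other than `apt-get` or Homebrew may apply to your system.

```shell
# Install the PuLP modeling library.
pip install pulp

# Install the GLPK solver.
# Debian/Ubuntu:
sudo apt-get install glpk-utils
# macOS with Homebrew:
# brew install glpk
```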
TSP Integer Linear Programming Solver Python Implementation
We implemented a brute-force TSP solver and an integer linear programming TSP solver, and verified the correctness of the two solvers against each other on symmetric TSP problems. The two solvers could also be used for asymmetric TSP problems.
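The brute-force solver is not listed in this post; a minimal sketch (function and variable names are my own) fixes city $0$ as the start, to avoid enumerating rotations of the same tour, and tries all permutations of the remaining cities:

```python
from itertools import permutations

def brute_force_tsp(distance_matrix):
    """Return (best_cost, best_tour) by enumerating all tours.

    Cities are 0-indexed; city 0 is fixed as the start so that each
    cyclic tour is enumerated exactly once.
    """
    n = len(distance_matrix)
    best_cost, best_tour = float("inf"), None
    for perm in permutations(range(1, n)):
        tour = (0,) + perm
        # Sum the edge costs, wrapping around back to the start city.
        cost = sum(
            distance_matrix[tour[k]][tour[(k + 1) % n]] for k in range(n)
        )
        if cost < best_cost:
            best_cost, best_tour = cost, tour
    return best_cost, best_tour

# Tiny asymmetric example: the cheap edges form the cycle 0 -> 1 -> 2 -> 3 -> 0.
d = [
    [0, 1, 9, 9],
    [9, 0, 1, 9],
    [9, 9, 0, 1],
    [1, 9, 9, 0],
]
print(brute_force_tsp(d))  # (4, (0, 1, 2, 3))
```

Enumerating $(n-1)!$ tours is only feasible for very small $n$, which is exactly why the integer linear programming formulation is interesting.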
```python
from itertools import permutations
import random
import math
from timeit import default_timer as timer
from datetime import timedelta
from typing import List, Tuple, Union, Callable

import numpy as np
from pulp import LpMaximize, LpMinimize, LpProblem, LpStatus, lpSum, LpVariable, LpBinary, LpInteger, lpDot
from pulp import GLPK
```
```python
model = LpProblem(name="TSP-Problem", sense=LpMinimize)
```
```python
x_variables = []
for i in range(n):
    for j in range(n):
        x_variables.append(LpVariable(name=f"x_{i+1}_{j+1}", cat=LpBinary))

# The first value in u_variables is a dummy value and is never used.
# Depending on the literature, the lower bound might be 2 and the upper
# bound might be n. It is actually equivalent.
u_variables = [1] + [
    LpVariable(name=f"u_{i+2}", lowBound=1, upBound=n - 1, cat=LpInteger)
    for i in range(n - 1)
]
```
```python
# Add the objective: total distance of the selected edges.
model += lpDot(x_variables, distance_list)
```
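The constraint code below indexes the flattened `x_variables` list with an `idx_2d_to_1d` helper whose definition is not shown above. A version consistent with the row-major, 1-indexed loops would be:

```python
def idx_2d_to_1d(i, j, width):
    """Map a 1-indexed (i, j) city pair to a 0-indexed position
    in the row-major flattened variable list."""
    return (i - 1) * width + (j - 1)
```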
```python
# Add constraints.
# sum_{i=1, i \neq j}^{n} x_{ij} = 1 for all j: each city is entered exactly once.
for j in range(1, n + 1):
    sum_exp = 0
    for i in range(1, n + 1):
        if i != j:
            idx_id = idx_2d_to_1d(i=i, j=j, width=n)
            sum_exp += x_variables[idx_id]
    model += sum_exp == 1
```
```python
# sum_{j=1, j \neq i}^{n} x_{ij} = 1 for all i: each city is left exactly once.
for i in range(1, n + 1):
    sum_exp = 0
    for j in range(1, n + 1):
        if j != i:
            idx_id = idx_2d_to_1d(i=i, j=j, width=n)
            sum_exp += x_variables[idx_id]
    model += sum_exp == 1
```
```python
# u_i - u_j + n x_{ij} \leq n - 1 for all 2 <= i, j <= n and i \neq j
for i in range(2, n + 1):
    for j in range(2, n + 1):
        if i != j:
            idx_id = idx_2d_to_1d(i=i, j=j, width=n)
            model += u_variables[i - 1] - u_variables[
                j - 1] + n * x_variables[idx_id] <= n - 1
```
```python
status = model.solve(solver=GLPK(msg=False))
```
```python
# PuLP returns 1 (LpStatusOptimal) when an optimal solution was found.
assert status == 1, "Linear programming solver did not solve the problem successfully."
```
```python
# Collect the TSP solution: the (i, j) pairs whose x_{ij} = 1.
non_zero_idx = []
for variable in model.variables():
    if variable.name.startswith("x_") and variable.value() == 1:
        _, i, j = variable.name.split("_")
        i = int(i)
        j = int(j)
        non_zero_idx.append((i, j))
assert len(non_zero_idx) == n
```
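With the selected edges collected, the tour order can be recovered by following successor edges from city $1$. This post-processing step is a sketch of my own; it assumes the edge list has the same shape as `non_zero_idx` above.

```python
def edges_to_tour(edges, start=1):
    """Follow successor edges from `start` until the tour closes."""
    successor = dict(edges)
    tour = [start]
    node = successor[start]
    while node != start:
        tour.append(node)
        node = successor[node]
    return tour

# Example: edges of the tour 1 -> 3 -> 2 -> 4 -> 1.
print(edges_to_tour([(1, 3), (3, 2), (2, 4), (4, 1)]))  # [1, 3, 2, 4]
```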
Although the linear programming solver is much faster (about $100\times$) than the brute-force solver for TSP problems in Python when the number of nodes is small, the linear programming solver still becomes impractically slow as the number of nodes grows ($\geq 23$), because the numbers of variables and constraints grow quadratically with the number of nodes.