Solving Sudoku Algebraically
Introduction
In today’s post I will show you how to solve a sudoku using tools from algebraic geometry.
Sudokus consist of a 9x9 grid containing some numbers, together with some special requirements. The task is to fill in the missing numbers. If you do not know what a sudoku is, please take a look here.
The usual approach for solving sudokus is a back-tracking approach. The first empty cell is filled in with a one and the conditions are checked. If the conditions are met, the algorithm continues with the next cell. If the conditions are not met, the current number is increased. If the current number is one, then execution continues at the previous number. This is the usual algorithmic approach for solving a sudoku. However, this does not give us any intuition about the problem.
In this blog post I will define the sudoku as a set of polynomial equations. These equations can be solved in order to yield the solution (supposing the solution is unique). As we are using tools from algebraic geometry, we can get more intuition about sudokus (at least if you are a mathematician). If you are not a mathematician I hope you find the presented methods beautiful nevertheless. If you are just interested in the code without so much math, scroll down.
Idea
First, we introduce variables \(x_0, ..., x_{80}\), one for each cell in the sudoku. Formally, we are now dealing with the polynomial ring \(\mathbb{Q}[x_0, ..., x_{80}]\). The idea is to encode the conditions on the sudoku as a set of polynomials \(F\subset \mathbb{Q}[x_0, ..., x_{80}]\). Using Hilberts Nullstellensatz the vanishing set \(V(\langle F\rangle)\) of the ideal generated by \(F\) corresponds to an algebraic variety \(V(\langle F\rangle)\subset \mathbb{Q}^{81}\). If we have done everything correctly then \(F\) is a maximal ideal and therefore its form is \(\langle x_0-a_0, ..., x_{80}-a_{80}\rangle\) from which we can simply read the solution.
So, from an intuitive point of view, each polynomial within \(F\) defines some subspace in that 81-dimensional space. If take the locus where all polynomials vanish, then this point should correspond to the solution of the sudoku.
The equations
In this paragraph, I will define the necessary equations which we put in F.
For restricting the values to whole numbers from 1 to 9, we can define the following polynomials for all \(i=0, ..., 80\).
\[\begin{align*} F_i=(x_i-1)\cdot(x_i-2)\cdot(x_i-3)\cdot(x_i-4)\cdot(x_i-5)\cdot(x_i-6)\cdot(x_i-7)\cdot(x_i-8)\cdot(x_i-9) \end{align*}\]It is clear that this polynomial vanishes exactly at \(1,2,...,9\).
Next, we want to define polynomials for the condition that each number can only be used once in each column/row, and in each block. This can be done by defining that the sum of all columns/rows and of each block should be \(45\). Its product should be \(362880\). And with these two conditions we have no duplicate numbers.
Addendum from 2024-03-08: This is not true. Matthias Paulsen informed me by mail that these two properties do not uniquely determine that each number is used exactly once. Take as example the numbers 1, 2, 4, 4, 4, 5, 7, 9, 9. Then their sum is also \(45\) and their product is \(362880\). Thus, my sudoku solver may yield more (incorrect) solutions. Thanks, Matthias. Instead a correct set of polynomials is \(G_ij=(F_i/F_j)/(x_i-x_j)\) for all \(i, j\), with \(i\not =j\) and \(x_i\) and \(x_j\) in the same block and row. However, this leads to my program becoming unbearably slow.
As a last step, some of the cells in the sudoku are already filled out. We call the numbers there \(a_j\), where \(J\subset\{1, ..., 80\}\). Then, we can define new polynomials for each \(j\).
\[\begin{align*} x_j-a_j \end{align*}\]Setting these two zero gives us a hyperplane that cuts the space at the desired point.
Solving the equations
Now we have a system of polynomial equations in \(81\) unknowns. In particular we have 135 equations plus one for each given cell in the sudoku. How do we solve this system?
For linear systems of equations there is Gaussian elimination which transforms a system of linear equations into a standard representation. In the case of linear equations, this normal form is a triangular matrix. For systems of polynomial equations there is also a normal form, called the Gröbner basis. And there is Buchberger’s algorithm which can compute a Gröbner basis. In a very real sense Buchberger’s algorithm is a generalization of Gaussian elimination and in the case of linear equations those two algorithms are equal.
So, in summary we can use Buchberger’s algorithm to compute a Gröbner basis of the equations and then read off the solution to the sudoku.
Code
In this section you can find the code. Please be aware that the code is slow. This is not due to my coding skills, but due to the Buchberger algorithm being slow.
from sympy import groebner, symbols
x = [symbols('x%d' % i) for i in range(81)]
sudoku = [0,3,0,0,0,0,0,0,0,
0,0,0,1,9,5,0,0,0,
0,0,8,0,0,0,5,6,7,
8,0,0,0,6,0,0,0,0,
4,0,0,8,0,0,0,0,1,
0,0,0,0,2,0,0,0,0,
0,6,0,0,0,0,2,8,0,
0,0,0,4,1,9,0,0,5,
0,0,0,0,0,0,0,7,0]
% solution = [5,3,4,6,7,8,9,1,2,
% 6,7,2,1,9,5,3,4,8,
% 1,9,8,3,4,2,5,6,7,
% 8,5,9,7,6,1,4,2,3,
% 4,2,6,8,5,3,7,9,1,
% 7,1,3,9,2,4,8,5,6,
% 9,6,1,5,3,7,2,8,4,
% 2,8,7,4,1,9,6,3,5,
% 3,4,5,2,8,6,1,7,9]
F=[]
# Restrict values to 1-9
for xi in x:
F.append((xi-1)*(xi-2)*(xi-3)*(xi-4)*(xi-5)*(xi-6)*(xi-7)*(xi-8)*(xi-9))
# Sum of lines = 45
for i in range(0,80,9):
F.append(x[i]+x[i+1]+x[i+2]+x[i+3]+x[i+4]+x[i+5]+x[i+6]+x[i+7]+x[i+8]-45)
for i in range(9):
F.append(x[i]+x[i+9]+x[i+18]+x[i+27]+x[i+36]+x[i+45]+x[i+54]+x[i+63]+x[i+72]-45)
# Product of lines = 362880
for i in range(0,80,9):
F.append(x[i]*x[i+1]*x[i+2]*x[i+3]*x[i+4]*x[i+5]*x[i+6]*x[i+7]*x[i+8]-362880)
for i in range(9):
F.append(x[i]*x[i+9]*x[i+18]*x[i+27]*x[i+36]*x[i+45]*x[i+54]*x[i+63]*x[i+72]-362880)
# Blocks sum to 45
for i in [0,3,6,27,30,33,54,57,60]:
F.append(x[i]+x[i+1]+x[i+2]+x[i+9]+x[i+10]+x[i+11]+x[i+18]+x[i+19]+x[i+20]-45)
# Product of numbers in blocks = 362880
for i in [0,3,6,27,30,33,54,57,60]:
F.append(x[i]*x[i+1]*x[i+2]*x[i+9]*x[i+10]*x[i+11]*x[i+18]*x[i+19]*x[i+20]-362880)
# Addendum from 2024-03-08: As already written above, these
# polynomials are not enough. To restrict the Sudoku solver to only
# output valid solutions, the following equations can be used instead
# of the line and block condition. Note that the program becomes
# unbearably slow.
#
# def G(y):
# for xi in y:
# fi=(xi-1)*(xi-2)*(xi-3)*(xi-4)*(xi-5)*(xi-6)*(xi-7)*(xi-8)*(xi-9)
# for xj in y:
# if xi != xj:
# fj=(xj-1)*(xj-2)*(xj-3)*(xj-4)*(xj-5)*(xj-6)*(xj-7)*(xj-8)*(xj-9)
# F.append((fi-fj)/(xi-xj))
#
# # all lines and blocks that contain different numbers
# G((x[0],x[1],x[2],x[3],x[4],x[5],x[6],x[7],x[8]))
# G((x[9],x[10],x[11],x[12],x[13],x[14],x[15],x[16],x[17]))
# G((x[18],x[19],x[20],x[21],x[22],x[23],x[24],x[25],x[26]))
# G((x[27],x[28],x[29],x[30],x[31],x[32],x[33],x[34],x[35]))
# G((x[36],x[37],x[38],x[39],x[40],x[41],x[42],x[43],x[44]))
# G((x[45],x[46],x[47],x[48],x[49],x[50],x[51],x[52],x[53]))
# G((x[54],x[55],x[56],x[57],x[58],x[59],x[60],x[61],x[62]))
# G((x[63],x[64],x[65],x[66],x[67],x[68],x[69],x[70],x[71]))
# G((x[72],x[73],x[74],x[75],x[76],x[77],x[78],x[79],x[80]))
# G((x[0],x[9],x[18],x[27],x[36],x[45],x[54],x[63],x[72]))
# G((x[1],x[10],x[19],x[28],x[37],x[46],x[55],x[64],x[73]))
# G((x[2],x[11],x[20],x[29],x[38],x[47],x[56],x[65],x[74]))
# G((x[3],x[12],x[21],x[30],x[39],x[48],x[57],x[66],x[75]))
# G((x[4],x[13],x[22],x[31],x[40],x[49],x[58],x[67],x[76]))
# G((x[5],x[14],x[23],x[32],x[41],x[50],x[59],x[68],x[77]))
# G((x[6],x[15],x[24],x[33],x[42],x[51],x[60],x[69],x[78]))
# G((x[7],x[16],x[25],x[34],x[43],x[52],x[61],x[70],x[79]))
# G((x[8],x[17],x[26],x[35],x[44],x[53],x[62],x[71],x[80]))
# G((x[0],x[1],x[2],x[9],x[10],x[11],x[18],x[19],x[20]))
# G((x[3],x[4],x[5],x[12],x[13],x[14],x[21],x[22],x[23]))
# G((x[6],x[7],x[8],x[15],x[16],x[17],x[24],x[25],x[26]))
# G((x[27],x[28],x[29],x[36],x[37],x[38],x[45],x[46],x[47]))
# G((x[30],x[31],x[32],x[39],x[40],x[41],x[48],x[49],x[50]))
# G((x[33],x[34],x[35],x[42],x[43],x[44],x[51],x[52],x[53]))
# G((x[54],x[55],x[56],x[63],x[64],x[65],x[72],x[73],x[74]))
# G((x[57],x[58],x[59],x[66],x[67],x[68],x[75],x[76],x[77]))
# G((x[60],x[61],x[62],x[69],x[70],x[71],x[78],x[79],x[80]))
# Fix points
# Due to how the Buchberger algorithm for computing Gröbner bases
# works, this is significantly faster, if we prepend the equations.
for count, val in enumerate(sudoku):
if val != 0:
F.insert(0,x[count]-val)
G = groebner(F)
If you execute the code and wait long enough, then G contains the Gröbner basis representation of the maximal ideal corresponding to the solution of the sudoku as shown below.
GroebnerBasis([x0 - 5, x1 - 3, x2 - 4, x3 - 6, x4 - 7, x5 - 8, x6 - 9, x7 - 1, x8 - 2, x9 - 6, x10 - 7, x11 - 2, x12 - 1, x13 - 9, x14 - 5, x15 - 3, x16 - 4, x17 - 8, x18 - 1, x19 - 9, x20 - 8, x21 - 3, x22 - 4, x23 - 2, x24 - 5, x25 - 6, x26 - 7, x27 - 8, x28 - 5, x29 - 9, x30 - 7, x31 - 6, x32 - 1, x33 - 4, x34 - 2, x35 - 3, x36 - 4, x37 - 2, x38 - 6, x39 - 8, x40 - 5, x41 - 3, x42 - 7, x43 - 9, x44 - 1, x45 - 7, x46 - 1, x47 - 3, x48 - 9, x49 - 2, x50 - 4, x51 - 8, x52 - 5, x53 - 6, x54 - 9, x55 - 6, x56 - 1, x57 - 5, x58 - 3, x59 - 7, x60 - 2, x61 - 8, x62 - 4, x63 - 2, x64 - 8, x65 - 7, x66 - 4, x67 - 1, x68 - 9, x69 - 6, x70 - 3, x71 - 5, x72 - 3, x73 - 4, x74 - 5, x75 - 2, x76 - 8, x77 - 6, x78 - 1, x79 - 7, x80 - 9], x0, x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12, x13, x14, x15, x16, x17, x18, x19, x20, x21, x22, x23, x24, x25, x26, x27, x28, x29, x30, x31, x32, x33, x34, x35, x36, x37, x38, x39, x40, x41, x42, x43, x44, x45, x46, x47, x48, x49, x50, x51, x52, x53, x54, x55, x56, x57, x58, x59, x60, x61, x62, x63, x64, x65, x66, x67, x68, x69, x70, x71, x72, x73, x74, x75, x76, x77, x78, x79, x80, domain='ZZ', order='lex')
So we can see that \(x0\) is \(5\), \(x1\) is \(3\), and so on.
FAQ
- Your algorithm is stupidly slow
Yes. That is due to the Buchberger algorithm being slow. If you are looking for performance, you need a different approach. You could try to choose a different monomial ordering or even rotate the sudoku. All of this can impact the performance. In fact, the fast computation of Gröbner bases is a current research topic.
- Where did you find the inspiration?
Back when I was studying math at university, I read “A First Course in Computational Algebraic Geometry” by Wolfram Decker and Gerhard Pfister. There, in the last chapter, there is an approach to solve sudokus using Gröbner bases. However, the approach given there is different. In particular they are using a different set of equations.