Sudoku Puzzle Complexity


Sian K. Jones, Paul A. Roach and Stephanie Perkins, Faculty of Advanced Technology

Abstract—A Sudoku grid is a 9×9 grid, arranged into nine 3×3 mini-grids each containing the values 1,…,9 such that no value is repeated in any row, column or mini-grid. The Sudoku grid has been shown to be an interesting mathematical structure. A Sudoku puzzle contains some given values, enabling a solver to arrange the missing values so as to complete the grid uniquely. Typically, published puzzles (such as those found in newspapers or puzzle books) are accompanied by ratings such as 'easy', 'medium' and 'hard'. These complexity ratings are assigned through a number of different means, such as by the length of time taken, or the difficulty of the methods required, to solve the puzzle. In this paper, a new measure of complexity is defined which is related to the number of states in a search space for the automatic solution of a puzzle. This measure is compared with the published ratings of a test set of 100 puzzles, which leads to the conclusion that the size of a search space is a useful measure for identifying the complexity of the puzzle. However, the search space size is shown not to provide all the information required to rate a puzzle.

Index Terms—Sudoku, Search Space, Puzzles, Complexity.

I. INTRODUCTION

A Sudoku grid, $S_{x,y}$, is a z×z array subdivided into z mini-grids of size x×y (where z = xy); the values 1,..., z are contained within the array in such a way that each value occurs exactly once in every row, column and mini-grid. $S_{x,y}$ consists of y bands, each composed of x horizontally-consecutive mini-grids, and x stacks, each composed of y vertically-consecutive mini-grids. Each x×y mini-grid possesses x sub-rows, or tiers, and y sub-columns, or pillars. (Note that for the case where x = y the mini-grid is square and thus possesses two diagonals.) The word Sudoku is generally used to represent the structure $S_{3,3}$ with the properties described above.
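The definition of a Sudoku grid above can be turned into a short validity check. The following is a minimal sketch rather than anything taken from the paper: it assumes a completed grid stored as a list of rows, with each mini-grid spanning x rows (tiers) and y columns (pillars). The names `is_sudoku_grid` and `pattern_grid` are hypothetical, and `pattern_grid` merely builds one valid grid by cyclic shifts so that the check has something to run on.

```python
from typing import List

Grid = List[List[int]]


def is_sudoku_grid(grid: Grid, x: int, y: int) -> bool:
    """Check a completed grid against the row, column and mini-grid conditions.

    Each mini-grid is taken to have x rows (tiers) and y columns (pillars),
    so the whole grid is z x z with z = x * y.
    """
    z = x * y
    full = set(range(1, z + 1))
    if len(grid) != z or any(len(row) != z for row in grid):
        return False
    rows_ok = all(set(row) == full for row in grid)
    cols_ok = all({grid[r][c] for r in range(z)} == full for c in range(z))
    boxes_ok = all(
        {grid[r][c] for r in range(br, br + x) for c in range(bc, bc + y)} == full
        for br in range(0, z, x)
        for bc in range(0, z, y)
    )
    return rows_ok and cols_ok and boxes_ok


def pattern_grid(x: int, y: int) -> Grid:
    """Hypothetical helper: build one valid grid from a cyclic-shift pattern."""
    z = x * y
    return [[(y * (r % x) + r // x + c) % z + 1 for c in range(z)] for r in range(z)]


print(is_sudoku_grid(pattern_grid(3, 3), 3, 3))  # True
```

Swapping two values within a single row of a valid grid keeps that row valid but breaks two columns, which is a quick way to see the check reject a grid.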

Fig. 1: An Example Sudoku Puzzle and its Solution [1]

Manuscript received March 3, 2011. S. K. Jones, S. Perkins and P. A. Roach are with the Faculty of Advanced Technology, Department of Computing and Mathematical Sciences, University of Glamorgan, Pontypridd, Mid Glamorgan, CF37 1DL, United Kingdom (phone: +44 (0)1443 48 3486; e-mail: skjones@glam.ac.uk).

A Sudoku puzzle contains an incomplete assignment of values to a grid, with the goal of the puzzle being to complete the assignment. The values shown in the puzzle are referred to as givens and are unmovable. An example Sudoku puzzle and its solution are presented in Fig. 1. The puzzle is solved when all remaining assignments of missing values have been made in such a way as to satisfy the constraints, by hand using logic or by using automated solvers. When solving Sudoku puzzles a wide range of types of reasoning may be required to find the solution of a puzzle [2].

The terminology in [3] for Latin squares is applied here, so that a Sudoku puzzle is strongly completable if it can be solved by logic alone and weakly completable if it requires some degree of search (trial and error). Sufficient givens should be provided in the Sudoku puzzle so as to specify a unique solution to the puzzle (i.e. the puzzle is proper or well-formed), though this is not always the case in published puzzles. Proper puzzles may be either strongly or weakly completable.

Puzzles are typically categorized in terms of the difficulty in completing them by hand, with the use of four or five rating levels being common. Terms such as ‘easy’, ‘medium’ and ‘hard’ (and even ‘tough’ and ‘diabolical’) are commonly used. These rating levels are subjective and so the actual ranges and labels of the difficulty rating can vary greatly. In Sudoku, the relationship between the number of givens and puzzle complexity is not simple [4]. It has been reported in [4] that the positioning of the givens is a more important determinant of their merit as ‘hints’ than their quantity.

Automated solution algorithms may be used to estimate the difficulty that human solvers face in finding a solution to a Sudoku puzzle.
In [5], it was shown that polynomial-time propositional satisfiability (SAT) inference techniques provide a way to efficiently differentiate between Sudoku puzzles according to their difficulty, by analyzing which resolution technique solves a given puzzle; [6] also gives a difficulty metric which provides an effective measure for predicting human difficulty. Since constraint-based approaches may effectively mimic the methods that a human solver would use to solve a Sudoku puzzle by hand, some algorithms have been developed as tools to assist human solvers [7]. Additionally, constraint-based methods, such as the graph colouring methods for solving Sudoku puzzles [8], may be used to produce solving strategies that would be of interest to human solvers.

As an additional theme to the investigation presented in this paper, consideration will be given to the effect on complexity of adding further constraints to Sudoku grids. A Δ-Quasi-Magic Sudoku grid is an $S_{3,3}$ with the additional constraints that the values contained in the tiers, pillars and diagonals of the mini-grids sum to an integer in the interval [15 − Δ, 15 + Δ] for Δ ∈ {0, ..., 9}. The effect that such additional constraints have on the complexity of a puzzle is examined by comparing the number of distinct states in the search space for 2-Quasi-Magic Sudoku puzzles to those for Sudoku puzzles.

Section II examines the complexity of Sudoku puzzles, and describes three formulations for constructing intermediate states in the search space of an automated Sudoku solver (subsections A, B and D). The numbers of distinct states in the search spaces for these different formulations are compared and related to both puzzle rating and the number of puzzle givens (subsections C and E). Concluding remarks are presented in Section III.
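The Δ-Quasi-Magic condition defined in this introduction is easy to state in code. The sketch below is illustrative only and the function names are hypothetical. It checks the additional sum constraint on tiers, pillars and diagonals, and deliberately leaves the ordinary Sudoku row, column and mini-grid conditions to a separate check.

```python
from typing import List


def box_is_quasi_magic(box: List[List[int]], delta: int) -> bool:
    """True if every tier, pillar and diagonal of a 3x3 mini-grid sums to a
    value in the interval [15 - delta, 15 + delta]."""
    sums = [sum(row) for row in box]                              # tiers
    sums += [sum(box[i][j] for i in range(3)) for j in range(3)]  # pillars
    sums.append(sum(box[i][i] for i in range(3)))                 # main diagonal
    sums.append(sum(box[i][2 - i] for i in range(3)))             # anti-diagonal
    return all(15 - delta <= s <= 15 + delta for s in sums)


def is_quasi_magic(grid: List[List[int]], delta: int) -> bool:
    """Apply the mini-grid test to all nine mini-grids of a 9x9 grid."""
    return all(
        box_is_quasi_magic([grid[br + i][bc:bc + 3] for i in range(3)], delta)
        for br in range(0, 9, 3)
        for bc in range(0, 9, 3)
    )


lo_shu = [[2, 7, 6], [9, 5, 1], [4, 3, 8]]   # classical magic square
natural = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # tier sums are 6, 15 and 24
print(box_is_quasi_magic(lo_shu, 0))   # True
print(box_is_quasi_magic(natural, 2))  # False
print(box_is_quasi_magic(natural, 9))  # True
```

Three distinct values from 1,..., 9 always sum to between 6 and 24, so Δ = 9 imposes no restriction beyond the Sudoku constraints, while Δ = 0 forces every mini-grid to be a magic square.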

II. COMPLEXITY

Published ratings are indications of puzzle complexity. In this paper the ratings are compared with numerical measures of complexity, in order to establish whether numerical measures might act as predictors of puzzle rating. The methods usually considered for determining puzzle complexity fall into three categories: the time taken by standard constraint-based solvers; the time taken by groups of human solvers to arrive at the solution; and, more recently, a classification of the types of reasoning required to solve the puzzle [9]. Since measures of complexity are subjective, the actual ranges and labels of difficulty ratings can vary greatly.

It is shown here that the number of givens in the puzzle and the ratings assigned by the authors of the puzzle are not directly correlated. A test set of 100 different Sudoku puzzles, which varied in both level of difficulty and number of givens, was constructed. This test set includes some puzzles with very hard ratings and a puzzle claimed to be the hardest Sudoku puzzle (called AI-Escargot and developed by Arto Inkala [10]). For this test set, the difficulty rating and the number of non-given values were compared. The result of this comparison is given in Fig. 2.

Fig. 2: A Comparison of Ratings and Number of Non-Given Cells

Although there is a general trend between the number of non-given values and the difficulty rating of the puzzle, Fig. 2 shows that the number of givens alone is not sufficient to predict the difficulty rating of the puzzle. It is suggested by the current authors in [4] that the positioning of the givens is a more important determinant of their merit as ‘hints’ than their quantity.

In the following sections, the complexity of a puzzle is measured by considering the number of distinct states in its search space. The puzzle is used to construct an initial state at the root of the search space. Each branch of the space represents a sequence of ‘moves’ leading from the initial state; each move produces an intermediate state. Solving a puzzle is equivalent to locating the solution within a branch of the search space. The intermediate states may be produced in a number of formulations, three of which are considered in this section. The givens in the initial state of the puzzles already meet the problem constraints, namely that there is no repetition of the values 1,..., 9 in rows, columns or mini-grids. It is also known a priori which values are missing from the initial grid (i.e. how many of each value); the total number of missing values is referred to as n throughout.

A. Formulation One

The first formulation of an intermediate state used for creating the search space of Sudoku puzzle solutions is given in Construction 1.

Construction 1. Take the n non-given values for the puzzle. Place these values in any order in the empty cells. A move consists of permuting any two non-given values in the entire grid.

Lemma 2. If an intermediate state is formulated using Construction 1, and the number of non-given cells in the puzzle is n, then the number of different states in the search space is n!.

Proof. There are n! ways of ordering n objects. □

Lemma 3. If an intermediate state is formulated using Construction 1, and the number of non-given cells in the puzzle is n, then the number of different states in the neighbourhood is $\binom{n}{2} = \frac{n(n-1)}{2}$.

Proof. By the Handshake Lemma [12]. □

B. Formulation Two

The second formulation used for creating the intermediate state of Sudoku puzzle solutions is given in Construction 4. A Sudoku grid consists of nine 9-tuples. The nine 9-tuples correspond to the 9 cells in either the nine rows, the nine columns or the nine mini-grids, and the number of empty cells in the ith 9-tuple is $n_i$. Relating the terms $n_i$ to the total number of non-given cells n,

$$\sum_{i=1}^{9} n_i = n.$$

Construction 4. For each 9-tuple take each of the values 1,..., 9 and remove any value which resides as a given in that 9-tuple. Place the remaining values in any order in the empty cells of the 9-tuple. A move consists of permuting any two non-given values within one of the defined 9-tuples.
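Construction 4 can be sketched directly for the case where the 9-tuples are rows. The code below is an illustrative sketch under stated assumptions, not the authors' implementation: the puzzle is assumed to be a 9×9 list of rows with 0 marking an empty cell, and the names `initial_state_rows` and `swap_move` are hypothetical.

```python
import random
from typing import List

Grid = List[List[int]]


def initial_state_rows(puzzle: Grid, rng: random.Random) -> Grid:
    """Fill the empty cells (0) of each row with that row's missing values,
    placed in a random order, as in Construction 4 with rows as 9-tuples."""
    state = []
    for row in puzzle:
        missing = [v for v in range(1, 10) if v not in row]
        rng.shuffle(missing)
        fill = iter(missing)
        state.append([v if v != 0 else next(fill) for v in row])
    return state


def swap_move(state: Grid, puzzle: Grid, r: int, c1: int, c2: int) -> Grid:
    """Return a new state with two non-given values of row r permuted."""
    if puzzle[r][c1] != 0 or puzzle[r][c2] != 0:
        raise ValueError("givens are unmovable")
    new_state = [row[:] for row in state]
    new_state[r][c1], new_state[r][c2] = new_state[r][c2], new_state[r][c1]
    return new_state
```

Every state reachable this way holds each of the values 1,..., 9 exactly once per row, so only the column and mini-grid constraints remain to be satisfied by the search.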


Lemma 5. If an intermediate state is formulated using Construction 4, and the number of non-given values in the ith 9-tuple in the puzzle is $n_i$, then the number of different states in the search space is equal to

$$\prod_{i=1}^{9} n_i!$$

Proof. For each 9-tuple there are $n_i!$ ways of arranging the $n_i$ values. Since each 9-tuple is considered independently, the number of different states in the search space is equal to the product of these. □

The search spaces produced using the formulation described in Construction 4 are different if the 9-tuples are defined as rows, columns or mini-grids. If the givens in the Sudoku puzzle have positional diagonal symmetry (symmetry in the given values is a necessary condition for Sudoku puzzles published by Nikoli [11]) then the number of different states in the search space for a formulation using Construction 4 is equal for 9-tuples defined as rows or columns.
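Lemma 5 and the symmetry remark can be checked numerically. The sketch below assumes a puzzle stored as a 9×9 list of rows with 0 for an empty cell, and it reads "positional diagonal symmetry" as reflection of the given positions in the main diagonal, which is an interpretation rather than something the text spells out. The function names are hypothetical.

```python
from math import factorial, prod
from typing import List

Grid = List[List[int]]


def empties_per_tuple(puzzle: Grid, kind: str) -> List[int]:
    """Return the counts n_i of empty cells for the nine 9-tuples of the
    given kind: 'row', 'col' or 'box' (mini-grid)."""
    counts = [0] * 9
    for r in range(9):
        for c in range(9):
            if puzzle[r][c] == 0:
                index = {"row": r, "col": c, "box": 3 * (r // 3) + c // 3}[kind]
                counts[index] += 1
    return counts


def search_space_size(puzzle: Grid, kind: str) -> int:
    """Lemma 5: the product of n_i! over the nine 9-tuples."""
    return prod(factorial(n) for n in empties_per_tuple(puzzle, kind))
```

When the given positions are symmetric about the main diagonal, row i and column i contain the same number of empty cells, so the row-wise and column-wise products coincide. The mini-grid product can still differ.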

Lemma 6. If an intermediate state is formulated using Construction 4, and the number of non-given values in the ith 9-tuple in the puzzle is $n_i$, then the number of different states in the neighbourhood is given by

$$\sum_{i=1}^{9} \frac{n_i(n_i-1)}{2}$$

Proof. The number of different states in the neighbourhood for each 9-tuple is $\frac{n_i(n_i-1)}{2}$ by the Handshake Lemma [12]. Since each 9-tuple is independent, the number of different states in the neighbourhood is equal to the sum of these. □

C. A Comparison of Formulations One and Two

If an intermediate state is formulated using Construction 1, the correct number of each value is guaranteed to be contained within the grid. However, any state in the search space may violate the constraints on the rows, columns and/or mini-grids. If an intermediate state is formed using Construction 4, the correct number of each value is again guaranteed to be contained within the grid. In addition, any state in the search space is guaranteed to satisfy the constraints of the 9-tuples, be they defined on rows, columns or mini-grids.

By comparing the number of different states in the search spaces, the puzzle complexity of Sudoku puzzles using Constructions 1 and 4 may be compared. By comparing Lemmas 2 and 5 it can be seen [13] that by using Construction 4 instead of Construction 1 the number of different states in the search space can be reduced by a number of states equal to

$$n! - \prod_{i=1}^{9} n_i!$$

By comparing Lemmas 3 and 6 it can be seen [13] that by satisfying more of the problem constraints in the formulation itself, the number of different states in the neighbourhood is reduced by a number of states equal to

$$\frac{n(n-1)}{2} - \sum_{i=1}^{9} \frac{n_i(n_i-1)}{2}$$
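The counts in Lemmas 2, 3, 5 and 6 can be evaluated for a concrete distribution of empty cells. The sketch below uses an assumed distribution of eight rows with six empty cells and one row with five (so n = 53). This was chosen because it reproduces the Construction 1 and row-wise Construction 4 figures reported in Table I; the layout of the actual example puzzle is not reproduced here, so the distribution is an inference, not a fact from the source. The function names are hypothetical.

```python
from math import comb, factorial, prod
from typing import List, Tuple


def construction1(n: int) -> Tuple[int, int]:
    """Lemmas 2 and 3: search space size n! and neighbourhood size n(n-1)/2."""
    return factorial(n), comb(n, 2)


def construction4(counts: List[int]) -> Tuple[int, int]:
    """Lemmas 5 and 6: product of n_i! and sum of n_i(n_i-1)/2."""
    return prod(factorial(k) for k in counts), sum(comb(k, 2) for k in counts)


counts = [6] * 8 + [5]  # assumed empty cells per row
n = sum(counts)         # 53
space1, hood1 = construction1(n)
space4, hood4 = construction4(counts)

print(hood1, hood4)     # 1378 130
print(space4)           # 8666449635704832000000000
print(hood1 - hood4)    # 1248 fewer neighbours under Construction 4
print(f"{space1 / space4:.3e}")  # factor by which the search space shrinks
```

The absolute reduction in the search space, n! minus the product, is almost as large as n! itself, so the ratio printed on the last line is usually the more informative figure.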

D. Formulation Three

The third formulation used for creating the intermediate state of Sudoku puzzle solutions is given in Construction 7.

Construction 7. For each cell in a Sudoku puzzle which does not contain a given value, create a set by taking the values 1,..., 9 and removing from it those values which appear as a given in the same row, column or mini-grid as that cell. Place no values within the remaining cells, but allow any cell to be allocated any value from its corresponding set. A move requires assigning to a specific non-given cell in the grid a value from its corresponding set.

If an intermediate state is formulated using Construction 7, a solution in the search space may contain more than nine of a specific value and may contain repetitions of values in rows, columns or mini-grids. However, there are no violations on the given values in the puzzle.

Lemma 8. If an intermediate state is formulated using Construction 7, and if $C_x$ is the set containing the candidate values for each empty cell x, then the number of different states in the search space is equal to:

$$\prod_{x} |C_x|$$

Proof. For each empty cell, x, in a Sudoku puzzle the number of candidate values is equal to the size (cardinality) of the set $C_x$, $|C_x|$. Since each cell is considered to be independent of any other cell, the number of different states in the search space is equal to the product of the number of assignments possible for each cell. □

The number of different states in the neighbourhood for Construction 7 varies at each stage in the search space, as it is related to the number of givens in the same row, column or mini-grid as the cell as well as to the number of values which have previously been assigned to this cell. Once a value has been assigned to a non-given cell it is temporarily regarded as a ‘given’ value, and the neighbourhood is altered accordingly. The method continues until all values have been assigned, backtracking and changing some assigned values as appropriate until a solution is found.

E. Analysis

An example Sudoku puzzle, with 27 givens, is shown in Fig. 3.
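Construction 7 and Lemma 8 lend themselves to a short sketch. The code below is illustrative only. It assumes a 9×9 list of rows with 0 marking an empty cell, and the names `candidate_sets` and `search_space_size` are hypothetical. It computes the initial candidate sets $C_x$ from the givens alone and does not model the later narrowing of the neighbourhood as values are assigned.

```python
from math import prod
from typing import Dict, List, Set, Tuple

Grid = List[List[int]]


def candidate_sets(puzzle: Grid) -> Dict[Tuple[int, int], Set[int]]:
    """Map each empty cell (r, c) to the values 1..9 that do not appear as a
    given in its row, column or mini-grid."""
    candidates = {}
    for r in range(9):
        for c in range(9):
            if puzzle[r][c] != 0:
                continue
            br, bc = 3 * (r // 3), 3 * (c // 3)
            seen = set(puzzle[r]) | {puzzle[i][c] for i in range(9)}
            seen |= {puzzle[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
            candidates[(r, c)] = set(range(1, 10)) - seen
    return candidates


def search_space_size(puzzle: Grid) -> int:
    """Lemma 8: the product of |C_x| over all empty cells x."""
    return prod(len(s) for s in candidate_sets(puzzle).values())


empty = [[0] * 9 for _ in range(9)]
print(search_space_size(empty) == 9 ** 81)  # True: no givens, nine candidates per cell
```

A grid with no empty cells gives an empty product, which evaluates to 1: the single state is the completed grid itself.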


Fig. 3: An Example Sudoku Grid

Fig. 4: Comparison Between Number of Distinct States in Search Space and Number of Non-Given Values (Construction 1)

The measures of complexity for this puzzle (number of distinct states in a search space, and size of neighbourhood) are given in Table I using Constructions 1, 4 and 7.

TABLE I
COMPARISON OF SEARCH SPACES USING DIFFERENT METHODS OF CONSTRUCTION

Construction   Number of Distinct States in Search Space                                Size of N’hood
1              4274883284060025564298013753389399649690343788366813724672000000000000   1378
4 (m-g)        10110857908322304000000000                                               131
4 (row)        8666449635704832000000000                                                130
4 (col)        20221715816644608000000000                                               136
7              673903123672407736320000000                                              variable

(m-g: 9-tuples defined on mini-grids; row: on rows; col: on columns.)

By examination of Table I, a comparison may be made of the properties of the search space for Constructions 1, 4 and 7 for the puzzle given in Fig. 3. The number of different states in the search space for Construction 1 is considerably larger than those for Constructions 4 and 7. The smallest number of distinct states is given using Construction 4. The numbers of different states formed using Construction 4 for the cases where the 9-tuples are defined on the rows, columns or mini-grids, differ but only slightly. The number of different states in the neighbourhood for Construction 1 is larger than that for 4, although proportionally (to the search space) the neighbourhood for Construction 4 is larger than that for Construction 1. It can be observed that if the formulation itself results in the satisfaction of more puzzle constraints, the number of distinct states is smaller. The number of different states in the search space was calculated for each puzzle in the test set of Sudoku puzzles, for each of the state formulations. The results are compared to both the number of givens (Figs. 4, 5 and 6) and the difficulty of the puzzles (Figs. 7, 8 and 9). Figs. 4, 5 and 6 compare the average number of different states in the search space for groupings of puzzles categorized by the number of non-given values (empty cells) in the puzzle for Constructions 1, 4 and 7 respectively. In Fig. 5 the 9-tuples are defined on rows, columns and mini-grids, with results given for each.

Fig. 5: Comparison Between Number of Distinct States in Search Space and Number of Non-Given Values (Construction 4)

Fig. 6: Comparison Between Number of Distinct States in Search Space and Number of Non-Given Values (Construction 7)

Figs. 4, 5 and 6 show, for at least this test set, that the number of different states in the search space increases with the number of non-given values, regardless of whether Construction 1, 4 or 7 is used. Figs. 7, 8 and 9 compare the average number of different states in the search space for groupings of puzzles categorized by the stated difficulty rating of the puzzles for Constructions 1, 4 and 7 respectively.

Fig. 7: Comparison Between Number of Distinct States in Search Space and Difficulty Rating (Construction 1)

Fig. 8: Comparison Between Number of Distinct States in Search Space and Difficulty Rating (Construction 4)

Fig. 9: Comparison Between Number of Distinct States in Search Space and Difficulty Rating (Construction 7)

By examining Figs. 7, 8 and 9 a very slight trend can be seen between stated puzzle difficulty and the number of different states in the search space; however, there is a stronger correlation with the number of givens than with the difficulty rating of the puzzle. Furthermore, Fig. 8 shows that the average number of different states in the search space does not differ significantly between the 9-tuples in Construction 4 being defined on rows, columns or mini-grids. (The size of the neighbourhood for Construction 7 is variable and thus not measurable in this context.)

F. 2-Quasi-Magic Sudoku

Since 2-Quasi-Magic Sudoku grids possess the same properties as Sudoku grids, but with the additional constraint that the values in the tiers, pillars and diagonals of the mini-grids sum to an integer in the interval [13, 17], the consideration of the puzzle complexity for 2-Quasi-Magic Sudoku puzzles is equivalent to that of Sudoku puzzles for these formulations. 2-Quasi-Magic Sudoku puzzles are not commonly published, but a test set of 180 puzzles was assembled from [14] and [15]. Since no consistent measure of puzzle complexity was generally available for the puzzles of the test set, the results are categorized only in terms of the numbers of givens, with the largest being 18, and the smallest 4.

The average complexity of the puzzles for both the test set of 100 Sudoku puzzles and the test set of 180 2-Quasi-Magic Sudoku puzzles is compared in Table II by calculating the number of different states in the search space using the formulations of the puzzle solutions given in Constructions 1, 4 and 7.

TABLE II
COMPARISON OF SEARCH SPACES USING DIFFERENT METHODS OF CONSTRUCTION

                       Construction   Sudoku        2-Quasi-Magic Sudoku
No. of empty cells     Mean           52.76         70.53
                       Median         55            71
Size of Search Space   1              8.55×10^88    8.08×10^110
                       4 (m-g)        2.22×10^33    1.07×10^44
                       4 (row)        2.77×10^33    1.08×10^44
                       4 (col)        2.31×10^33    1.08×10^44
                       7              1.25×10^43    2.19×10^67
Size of N’hood         1              1430.53       2455.57
                       4 (m-g)        137.43        245.65
                       4 (row)        141.79        244.79
                       4 (col)        137.43        245.23

III. CONCLUSION

It is not wholly surprising that the number of different states in the search space is so strongly correlated to the number of non-given values, since these formulations are so heavily dependent on the number of givens in the puzzle. These formulations do, however, indicate that the number of puzzle constraints fulfilled by the formulation affects the number of different states in the search space.

Currently published 2-Quasi-Magic Sudoku puzzles tend to have fewer given values than Sudoku puzzles, which would indicate that 2-Quasi-Magic Sudoku puzzles have larger search spaces than Sudoku puzzles (for these formulations). Table II shows that the search space for 2-Quasi-Magic Sudoku puzzles is considerably larger than that for Sudoku puzzles regardless of the formulation of the puzzle. (Therefore, when designing an automatic solver for 2-Quasi-Magic Sudoku, a more restrictive form of search, such as one with pruning rules (implemented in [16]), may be beneficial.)

As can be seen from Figs. 7, 8 and 9, the numerical measure provided by the number of different states in the search space is related to the difficulty ratings of the puzzles. However, the measure alone does not appear to be sufficient to predict difficulty rating.

REFERENCES

[1] G. Royle. Combinatorial concepts with Sudoku I: Symmetry. Available at: http://people.csse.uwa.edu.au/gordon/sudoku/sudokusymmetry.pdf, March 2006. Last Accessed: 30/09/09.
[2] S. K. Jones. Solving methods and enumeration of Sudoku. Final year project, University of Glamorgan, 2006.
[3] A. D. Keedwell. "Two remarks about Sudoku squares," Mathematical Gazette, vol. 90, pp. 425–430, November 2006.
[4] S. K. Jones, P. A. Roach, and S. Perkins. "Construction of heuristics for a search-based approach to solving SuDoku," in M. Bramer, F. Coenen, and M. Petridis, editors, Research and Development in Intelligent Systems XXIV: Proceedings of AI-2007, the Twenty-seventh SGAI International Conference on Artificial Intelligence, no. 3, pp. 37–49. Springer-Verlag, 2007.
[5] I. Lynce and J. Ouaknine. "Sudoku as a SAT problem," in M. Golumbic, F. Hoffman, and S. Zilberstein, editors, Proceedings of the Ninth International Symposium on Artificial Intelligence and Mathematics, AIMATH 2006, Fort Lauderdale. Springer, 2006.
[6] A. Leone, D. Mills, and P. Vaswani. Sudoku: Bagging a difficulty metric and building up puzzles. Technical report, University of Washington, February 2008.
[7] C. Reeson, K. Bayer, B. Y. Choueiry, and K. Huang. "An interactive constraint-based approach to Sudoku," in Twenty-Second AAAI Conference on Artificial Intelligence (AAAI-2007), vol. 2, pp. 1976–1978, 2007.
[8] D. Eppstein. "Nonrepetitive paths and cycles in graphs with application to Sudoku," ACM Computing Research Repository, July 2005. cs.DS/0507053.
[9] B. Hayes. "Unwed numbers," American Scientist, vol. 94, no. 1, pp. 12–15, 2006.
[10] A. Inkala. AI Escargot - the most difficult Sudoku puzzle. Available at: http://zonkedyak.blogspot.com/2006/11/worlds-hardest-sudokupuzzle-al.html, 2007. Last Accessed: 29/09/09.
[11] Nikoli Co, Ltd. Sudoku history. Available at: http://www.nikoli.co.jp/en/puzzles/sudoku/index_text.htm, 2007. Last Accessed: 16/09/09.
[12] M. Bona. A Walk Through Combinatorics: An Introduction to Enumeration and Graph Theory. World Scientific Publishing Co. Pte. Ltd., 2nd edition, 2006.
[13] S. K. Jones. On the Enumeration of Sudoku and Similar Combinatorial Structures. PhD Thesis, University of Glamorgan, 2011.
[14] T. Forbes. "Quasi-Magic Sudoku puzzles," M500, vol. 215, pp. 1–10, April 2007.
[15] T. Forbes. Sudoku puzzles. Available at: http://anthony.d.forbes.googlepages.com/sudoku.htm, 2007. Last Accessed: 01/06/08.
[16] P. A. Roach, I. J. Grimstead, S. K. Jones, and S. Perkins. "A knowledge-rich approach to the rapid enumeration of Quasi-Magic Sudoku search spaces," in J. Filipe, A. Fred, and B. Sharp, editors, Proceedings of ICAART 2009, the 1st International Conference on Agents and Artificial Intelligence, pp. 246–254, Porto, Portugal, January 2009.