Mentions légales du service

Skip to content
Snippets Groups Projects

This repository contains some code to try to compute all possible shuffle-square-free words over an alphabet of some size.

The main program is ./sat_out.py and it is used as follows:

./sat_out.py NLETTERS CACHEFILE METHOD ORDER

where:

  • NLETTERS is the number of letters of the alphabet: for NLETTERS=3 the code will quickly terminate, for NLETTERS=4 it is open whether it does
  • CACHEFILE is the name of a file which will be used as cache of the tests of whether a word is a shuffle square or not, i.e., the file will be read at startup to populate an array, and each time that a word is tested to be a shuffle square or not, the result will be appended to that file. The file should already exist: you can initialize it as an empty file by running: "touch CACHEFILE"
  • METHOD is the algorithm to use to test whether some word is a shuffle square, possible values are:
    • "naive" (default), performs a naive exponential bruteforce algorithm (no dependencies)
    • all other methods solve the problem by a SAT encoding performed in shuffle.py, and correspond to an engine
      • "pycosat", the Pycosat engine (should be installed with python bindings)
      • "cadical", using Cadical (should be compiled as ../cadical/build/cadical)
      • "glucose", using Glucose (should be compiled as ../glucose-syrup-4.1/parallel/glucose-syrup)
      • "glucose.sh", using a wrapper script glucose.sh to work around a problem with Glucose. Specifically, Glucose sometimes hangs on an input and fails to terminate but has no high CPU usage. The wrapper script detects this and restarts glucose. This is the most efficient option on large instances.
  • ORDER controls the exploration order:
    • random (see below)
    • counts (see below)
    • anything else (default, see below)

The algorithm is that the program will simply explore all possible words in a DFS fashion, trying at each step to add a letter: it either tries each possible letter in order (default), or at random (random), or depending on the number of occurrences of each letter (count). Whenever we try a new word, the program consider each suffix (in increasing length) and checks if it is a shuffle square. The program doesn't do the test for one suffix if the suffix doesn't contain an even number of occurrences of each letter (as it is then guaranteed not to be a shuffle square).

The repository also contains a bruteforce implementation in C++ called test.cpp, which is faster than the naive Python version, but eventually slower than using SAT solvers.

We have run the program on an alphabet of size four with Glucose for around four weeks, but it did not terminate; suggesting that there are either infinitely many shuffle-square-free words on an alphabet of size 4 or that there are a large number of such words. The longest such word found by the program when we stopped it was the following 320-letter word:

abacabadabacabcdabadbcabacbdabacadabcbacdbacabadbcbabdacabadbacbdbcacdabdcabacbdabacdbcbdcabdbadcadbdcdabcdcabadbcbdabacabcdbadcdbcabdbacadacbdadcbabdcacbdbcdabcbadabdcbabcdadcdbcacbadacabcadbabdcacbcdadbdabcbabdcbcdabacdcbdcdbcacbadabdcdadbdacabadbcbabcdcbcacbcdcbadbdacadabadacabdbacadcbabcdbcbdacbcabdbcbadcabcdabadcb

We provide in cache_glucosesh the cache of all the tests for shuffle-square-ness that were performed during this run, so this file can be used to resume the computation where we left it.

Details of the main python script

The main file is shuffle.py which contains poorly commented code to generate clauses and to test if they are satisfiable calling different sat solvers. Some functions of interest:

  • The function _get_clauses generates the clauses.
  • The function word_to_DIMACS_str generates clause to the DIMACS format (returns a str).
  • The function solve solve the shuffle square problem for its input