PAPERMAN Charles
sat

Repository



This repository contains some code to try to compute all possible
shuffle-square-free words over an alphabet of some size.
The main program is ./sat_out.py and it is used as follows:
./sat_out.py NLETTERS CACHEFILE METHOD ORDER
where:

NLETTERS is the number of letters of the alphabet: for NLETTERS=3 the code
will quickly terminate, for NLETTERS=4 it is open whether it does
CACHEFILE is the name of a file which will be used as cache of the tests of
whether a word is a shuffle square or not, i.e., the file will be read at
startup to populate an array, and each time that a word is tested to be a
shuffle square or not, the result will be appended to that file. The file
should already exist: you can initialize it as an empty file by running:
"touch CACHEFILE"
METHOD is the algorithm to use to test whether some word is a shuffle square,
possible values are:

"naive" (default), performs a naive exponential bruteforce algorithm (no
dependencies)
all other methods solve the problem by a SAT encoding performed in
shuffle.py, and correspond to an engine

"pycosat", the Pycosat engine (should be installed with python bindings)
"cadical", using Cadical (should be compiled as ../cadical/build/cadical)
"glucose", using Glucose (should be compiled as
../glucose-syrup-4.1/parallel/glucose-syrup)
"glucose.sh", using a wrapper script glucose.sh to work around a problem
with Glucose. Specifically, Glucose sometimes hangs on an input and fails
to terminate but has no high CPU usage. The wrapper script detects this
and restarts glucose. This is the most efficient option on large
instances.


ORDER controls the exploration order:

random (see below)
counts (see below)
anything else (default, see below)


The algorithm is that the program will simply explore all possible words in a
DFS fashion, trying at each step to add a letter: it either tries each possible
letter in order (default), or at random (random), or depending on the number of
occurrences of each letter (count). Whenever we try a new word, the program
consider each suffix (in increasing length) and checks if it is a shuffle
square. The program doesn't do the test for one suffix if the suffix doesn't
contain an even number of occurrences of each letter (as it is then guaranteed
not to be a shuffle square).
The repository also contains a bruteforce implementation in C++ called test.cpp,
which is faster than the naive Python version, but eventually slower than using
SAT solvers.
We have run the program on an alphabet of size four with Glucose for around four
weeks, but it did not terminate; suggesting that there are either infinitely
many shuffle-square-free words on an alphabet of size 4 or that there are a
large number of such words. The longest such word found by the program when we
stopped it was the following 320-letter word:
abacabadabacabcdabadbcabacbdabacadabcbacdbacabadbcbabdacabadbacbdbcacdabdcabacbdabacdbcbdcabdbadcadbdcdabcdcabadbcbdabacabcdbadcdbcabdbacadacbdadcbabdcacbdbcdabcbadabdcbabcdadcdbcacbadacabcadbabdcacbcdadbdabcbabdcbcdabacdcbdcdbcacbadabdcdadbdacabadbcbabcdcbcacbcdcbadbdacadabadacabdbacadcbabcdbcbdacbcabdbcbadcabcdabadcb
We provide in cache_glucosesh the cache of all the tests for shuffle-square-ness
that were performed during this run, so this file can be used to resume the
computation where we left it.

Details of the main python script
The main file is shuffle.py which contains poorly commented code
to generate clauses and to test if they are satisfiable calling
different sat solvers.
Some functions of interest:

The function _get_clauses generates the clauses.
The function word_to_DIMACS_str generates clause to the DIMACS format (returns a str).
The function solve solve the shuffle square problem for its input