bwc: dispatch fails with large-ish matrix and too few threads
In `bwc/balancing_workhorse.cpp`, in the function `dispatcher::prepare_pass()`, there is an implicit assumption that everything fits in 32-bit integers. More precisely, it is assumed that each core holds fewer than 2^32 entries. Therefore, dispatching a matrix with more than 2^32 coefficients on a single core fails with a segfault. For instance, a matrix of size 10M and weight 8G fails with `thr=1x1` but works with `thr=2x2`. A weight of 8G exceeds 2^32 (about 4.29G), whereas with 2x2 threads each of the 4 cores receives roughly 2G coefficients.
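To make the threshold concrete, here is a small C++ sketch of the per-core arithmetic. It is only an illustration. The helper names `coeffs_per_core`, `fits_in_u32` and `as_u32` are hypothetical and do not come from the cado-nfs sources. The sketch also assumes that coefficients are split evenly across the thread grid, which the real balancing only approximates.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of cado-nfs): coefficients per core when a
// matrix of the given weight is split evenly over an nh x nv thread grid.
std::uint64_t coeffs_per_core(std::uint64_t weight, unsigned nh, unsigned nv) {
    assert(nh > 0 && nv > 0);
    const std::uint64_t ncores = std::uint64_t(nh) * nv;
    return (weight + ncores - 1) / ncores;  // round up
}

// True if a count can be held in a 32-bit unsigned integer without wrapping.
bool fits_in_u32(std::uint64_t n) {
    return n <= UINT32_MAX;
}

// Narrowing as a 32-bit counter would: values >= 2^32 silently wrap modulo 2^32.
std::uint32_t as_u32(std::uint64_t n) {
    return static_cast<std::uint32_t>(n);
}
```

With a weight of 8000000000, `thr=1x1` gives 8000000000 coefficients per core. This does not fit in 32 bits and wraps to 3705032704. With `thr=2x2`, each core gets 2000000000, which fits. This is consistent with the observed failure and success.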