Default number of threads counts logical CPUs, not cores
When running on a hyperthreaded system, chameleon's timing examples by default use as many threads as there are logical CPUs rather than physical cores. This oversubscribes the cores, which hurts performance.
This comes from get_thread_count() in timing.c, which simply calls sysconf() and therefore returns the number of logical CPUs.
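As a rough illustration of how a core count could be obtained instead, here is a minimal sketch of a hypothetical helper, count_physical_cores() (not part of chameleon). It assumes Linux and reads the sysfs topology files, counting distinct (physical_package_id, core_id) pairs so that hyperthread siblings sharing a core are counted once. When that information is unavailable, it falls back to the logical CPU count from sysconf(). A portable implementation would more likely rely on a topology library such as hwloc.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Reads one integer from
 * /sys/devices/system/cpu/cpu<cpu>/topology/<name> (Linux-only assumption).
 * Returns 0 on success, -1 if the file is missing or unreadable. */
static int read_topology_value(long cpu, const char *name, int *out)
{
    char path[128];
    snprintf(path, sizeof path,
             "/sys/devices/system/cpu/cpu%ld/topology/%s", cpu, name);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int rc = (fscanf(f, "%d", out) == 1) ? 0 : -1;
    fclose(f);
    return rc;
}

/* Hypothetical alternative to get_thread_count(): counts physical cores
 * as the number of distinct (package, core) pairs. Falls back to the
 * logical CPU count (what sysconf() alone reports) if sysfs gives nothing. */
int count_physical_cores(void)
{
    long logical = sysconf(_SC_NPROCESSORS_ONLN);
    long ncpus = sysconf(_SC_NPROCESSORS_CONF);
    if (logical < 1)
        logical = 1;
    if (ncpus < 1)
        return (int)logical;

    int *pkg = malloc((size_t)ncpus * sizeof *pkg);
    int *core = malloc((size_t)ncpus * sizeof *core);
    if (pkg == NULL || core == NULL) {
        free(pkg);
        free(core);
        return (int)logical;
    }

    long ncores = 0;
    for (long cpu = 0; cpu < ncpus; cpu++) {
        int p, c;
        /* CPUs whose topology entries cannot be read (e.g. offline) are skipped. */
        if (read_topology_value(cpu, "physical_package_id", &p) != 0 ||
            read_topology_value(cpu, "core_id", &c) != 0)
            continue;
        long i;
        for (i = 0; i < ncores; i++)
            if (pkg[i] == p && core[i] == c)
                break;
        if (i == ncores) { /* first logical CPU seen on this core */
            pkg[ncores] = p;
            core[ncores] = c;
            ncores++;
        }
    }
    free(pkg);
    free(core);
    return ncores > 0 ? (int)ncores : (int)logical;
}
```

On a machine with 4 cores and 2 hardware threads per core, this would return 4 where a plain sysconf() call reports 8.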
Is there a reason not to let the runtime detect its preferred number of threads automatically? (If a runtime cannot detect it on its own, the corresponding runtime/ directory could fall back to get_thread_count().)
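The proposed fallback could look roughly like the following sketch. Every name here (THREADS_AUTO, runtime_default_threads_fn, resolve_thread_count() and the two example hooks) is hypothetical and not chameleon's actual API. An explicit user value always wins. Otherwise, each runtime supplies a default hook: one that sizes its own worker pool returns the THREADS_AUTO sentinel, while one that cannot returns a concrete count, here the logical CPU count from sysconf(), mirroring what get_thread_count() does today.

```c
#include <unistd.h>

/* Hypothetical sentinel meaning "let the runtime choose". */
#define THREADS_AUTO (-1)

/* Hypothetical per-runtime hook returning that runtime's default. */
typedef int (*runtime_default_threads_fn)(void);

/* Example hook for a runtime able to detect its own preferred count. */
static int runtime_self_detects(void)
{
    return THREADS_AUTO;
}

/* Example hook for a runtime that cannot: fall back to the logical
 * CPU count, like a plain sysconf()-based get_thread_count(). */
static int runtime_needs_count(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? (int)n : 1;
}

/* An explicit user value wins; otherwise defer to the runtime's default. */
int resolve_thread_count(int user_threads, runtime_default_threads_fn dflt)
{
    if (user_threads > 0)
        return user_threads;
    return dflt();
}
```

With this split, only the runtimes that lack their own detection would end up depending on the sysconf() count.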