CGBN GPU part 2
This is part 2 of the CGBN GPU code (see !22 (merged)).
I spent a large chunk of time rewriting the commit history to split the code into several self-contained commits. Because I started from a very different place and did a complicated refactor/rebase, some of the commits are ordered oddly or contain extra bits of cleanup that were hard to move. Anyone looking at the history should still be able to make good sense of what changed and why.
- Adds multiple kernels of various sizes (512, 1024, 1536, 2048, 3072, 4096 bits)
- Adds progress status and ETA in verbose mode (e.g. `Computing 871 bits/call, 74864/144343 (51.9%), ETA 7 + 8 = 15 seconds (~8 ms/curves)`)
- Roughly halves compile time (requires CGBN to be updated with https://github.com/NVlabs/CGBN/pull/17)
- Fix bug in overflow detection (previously allowed numbers up to 1024+6 bits instead of 1024-6 bits)
- Always set `ECM_GPU_CURVES_BY_BLOCK=32` now that min_cc > 20
- Major code refactoring and cleanup
- Remove arbitrary B1 limit of ~35M (100M bits); reduce GPU memory usage 8x
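
For reviewers, here is a rough sketch of how the kernel sizes and the corrected overflow check fit together. It is not the code from this MR: `pick_kernel_bits`, `KERNEL_BITS` and `HEADROOM_BITS` are hypothetical names made up for illustration. The only things taken from the description above are the list of kernel sizes and the 6-bit margin, and the sketch assumes the margin is the same for every kernel size.

```c
#include <stddef.h>

/* Kernel widths listed in this MR, in bits. */
static const size_t KERNEL_BITS[] = {512, 1024, 1536, 2048, 3072, 4096};
#define NUM_KERNELS (sizeof(KERNEL_BITS) / sizeof(KERNEL_BITS[0]))

/* Bits that must stay free in a kernel (assumed equal for all widths). */
#define HEADROOM_BITS 6

/*
 * Hypothetical helper: return the smallest kernel width that can hold a
 * number of n_bits bits, or 0 if no kernel is large enough.
 */
size_t pick_kernel_bits(size_t n_bits)
{
    for (size_t i = 0; i < NUM_KERNELS; i++) {
        /* Correct bound is width - headroom. The old bug behaved as if
         * the bound were width + headroom, letting oversized inputs in. */
        if (n_bits + HEADROOM_BITS <= KERNEL_BITS[i])
            return KERNEL_BITS[i];
    }
    return 0;
}
```

With this check a 1018-bit number still fits the 1024-bit kernel, while a 1019-bit number is pushed to the 1536-bit kernel instead of silently overflowing.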