    Implement assembly fallback for __builtin_roundeven on Arm · c9bd263c
    John Mather authored
    Implemented __builtin_roundeven with the corresponding assembly instruction for
    Arm platforms when the compiler doesn't provide __builtin_roundeven.
    
    __builtin_roundeven isn't provided by Apple clang at the moment, so this makes
    a large difference on Apple silicon as it removes 19 instructions. The observed
    performance increase varies from 1.00x to 1.15x on an Apple M1 Ultra.
    
    cosf     4.65580 ns/call ->  4.03543 ns/call (-0.62037 ns) [1.15x]
    coshf    3.84132 ns/call ->  3.68328 ns/call (-0.15804 ns) [1.04x]
    sinf     4.65548 ns/call ->  4.19025 ns/call (-0.46523 ns) [1.11x]
    sinhf    3.99483 ns/call ->  3.68801 ns/call (-0.30682 ns) [1.08x]
    tanf     4.34781 ns/call ->  4.19637 ns/call (-0.15144 ns) [1.04x]
    tgammaf 20.97220 ns/call -> 20.62030 ns/call (-0.35190 ns) [1.02x]
    acos     7.94175 ns/call ->  7.48937 ns/call (-0.45238 ns) [1.06x]
    erfc    17.50430 ns/call -> 17.32990 ns/call (-0.17440 ns) [1.01x]
    exp      4.47247 ns/call ->  4.17019 ns/call (-0.30228 ns) [1.07x]
    exp2     4.08864 ns/call ->  3.77784 ns/call (-0.31080 ns) [1.08x]
    expm1    4.91174 ns/call ->  4.88564 ns/call (-0.02610 ns) [1.01x]
    log1p    5.75928 ns/call ->  5.75774 ns/call (-0.00154 ns) [1.00x]
    pow     15.27340 ns/call -> 14.98090 ns/call (-0.29250 ns) [1.02x]