This speeds up hard_sigmoid and ultra_fast_sigmoid by 2x. We now ignore inputs that are broadcasted scalars when checking for Fortran-ordered inputs.
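
For illustration only, here is a minimal NumPy sketch of the kind of check described above. It is not the actual implementation. The helper names (`is_broadcasted_scalar`, `is_strictly_fortran`, `all_fortran_ignoring_scalars`) are hypothetical, and it assumes that "Fortran input" means an array that is F-contiguous but not C-contiguous.

```python
import numpy as np


def is_broadcasted_scalar(arr):
    # Hypothetical helper: every dimension has length 1 (or the array is 0-d),
    # so the value is broadcast against the other inputs.
    return all(dim == 1 for dim in arr.shape)


def is_strictly_fortran(arr):
    # Assumption: "Fortran input" means F-contiguous and not C-contiguous.
    return arr.flags.f_contiguous and not arr.flags.c_contiguous


def all_fortran_ignoring_scalars(inputs):
    # Skip broadcasted scalars so they cannot veto the Fortran-order decision.
    candidates = [a for a in inputs if not is_broadcasted_scalar(a)]
    return bool(candidates) and all(is_strictly_fortran(a) for a in candidates)


x = np.asfortranarray(np.ones((3, 4)))
slope = np.full((1, 1), 0.2)  # broadcasted scalar
shift = np.array(0.5)         # 0-d scalar

# A check over every input rejects the Fortran path because of the scalars.
naive = all(is_strictly_fortran(a) for a in (x, slope, shift))
# Ignoring the scalars lets the Fortran-ordered input decide.
filtered = all_fortran_ignoring_scalars([x, slope, shift])
```

In this sketch the scalar operands no longer prevent `x` from being treated as Fortran-ordered, which is the situation that arises when an elementwise expression combines one large array with scalar constants.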