Add AVX implementation of graphene_simd4f_madd() #269

All supported compilers define `__AVX__` when building with the AVX instruction set enabled.
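To illustrate the compile-time check, a hypothetical helper like the one below (not part of Graphene) reports whether the translation unit was built with AVX enabled. The test happens in the preprocessor, so it reflects the compiler flags, not the CPU the binary later runs on:

```c
#include <stdbool.h>

/* Hypothetical helper: reports whether this translation unit was compiled
 * with AVX code generation enabled (e.g. -mavx with GCC/Clang, /arch:AVX
 * with MSVC). This is a build-time property; no CPUID query is made. */
static bool
built_with_avx (void)
{
#if defined(__AVX__)
  return true;
#else
  return false;
#endif
}
```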

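The _mm_fmadd_ps() intrinsic computes the fused multiply-add `(a * b) + c` on four packed single-precision floats in a single instruction, so we can use it if AVX (or an equivalent instruction set) is available when building Graphene. Strictly speaking, _mm_fmadd_ps() belongs to the FMA3 extension rather than to AVX itself, so FMA code generation must be enabled as well (e.g. `-mfma` with GCC and Clang). There is no functional difference in this commit if AVX is not available, except that we moved from a generic static inline implementation to a SIMD-specific one.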
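A minimal sketch of the dispatch, assuming an x86 target with SSE and a hypothetical `simd4f_madd()` helper standing in for graphene_simd4f_madd(); it is not Graphene's actual code. It guards on `__FMA__` rather than `__AVX__`, because GCC and Clang reject _mm_fmadd_ps() unless FMA code generation is enabled, and falls back to a separate multiply and add otherwise:

```c
#include <immintrin.h> /* __m128, _mm_mul_ps, _mm_add_ps, _mm_fmadd_ps */

/* Hypothetical stand-in for graphene_simd4f_t on SSE builds. */
typedef __m128 simd4f_t;

/* Hypothetical helper: computes (m1 * m2) + a, lane by lane. */
static inline simd4f_t
simd4f_madd (simd4f_t m1, simd4f_t m2, simd4f_t a)
{
#if defined(__FMA__)
  /* FMA3 path: one fused instruction with a single rounding step. */
  return _mm_fmadd_ps (m1, m2, a);
#else
  /* Generic SSE path: multiply, then add (two rounding steps). */
  return _mm_add_ps (_mm_mul_ps (m1, m2), a);
#endif
}
```

Because the fused path rounds once and the fallback rounds twice, the two branches can differ in the last bit for some inputs; for exactly representable values such as small integers and halves they agree.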