New implementation of ftrsm and ftrmm, based on the multicascade algorithm (cf. C. Pernet's PhD thesis), which reduces the number of modular reductions performed during the updates. The code for each of the 48 variants is generated automatically.
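The saving comes from delaying modular reductions: while the accumulated sum of products is guaranteed to fit in a machine word, it does not need to be reduced modulo p. The sketch below illustrates only that underlying idea on a unit lower triangular solve. It is not the multicascade scheme from the thesis and not the ftrsm code. The function names, the `word_bits` parameter (a simulated unsigned machine word), and the assumption that all entries are stored reduced in [0, p-1] are hypothetical choices made for illustration.

```python
def delayed_bound(p, word_bits=64):
    """Largest k such that (p-1) + k*(p-1)**2 fits in an unsigned word_bits integer."""
    m = p - 1
    return ((1 << word_bits) - 1 - m) // (m * m)


def trsv_lower_unit_delayed(L, b, p, word_bits=64):
    """Solve L x = b over Z/pZ by forward substitution, where L is lower
    triangular with unit diagonal and all entries lie in [0, p-1].
    Returns (x, number of modular reductions done in the updates)."""
    if p < 2:
        raise ValueError("p must be at least 2")
    kmax = delayed_bound(p, word_bits)
    if kmax < 1:
        raise ValueError("p is too large for the chosen word size")
    limit = (1 << word_bits) - 1
    n = len(b)
    x = [0] * n
    reductions = 0
    for i in range(n):
        acc, pending = 0, 0  # partially reduced sum of L[i][j] * x[j]
        for j in range(i):
            acc += L[i][j] * x[j]
            # Simulated overflow check: acc must still fit in the machine word.
            assert acc <= limit
            pending += 1
            if pending == kmax:  # one more product could overflow: reduce now
                acc %= p
                pending = 0
                reductions += 1
        if pending:  # final reduction for the products still pending
            acc %= p
            reductions += 1
        x[i] = (b[i] - acc) % p
    return x, reductions


x, r = trsv_lower_unit_delayed([[1, 0, 0], [3, 1, 0], [5, 7, 1]], [2, 4, 6], 17)
print(x, r)  # [2, 15, 10] 2
```

With a small prime and a 64-bit word, each row is reduced only once, whereas reducing after every product would cost n(n-1)/2 reductions in total. For primes close to 2^32 the bound drops to k = 1 and the delay no longer helps.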