A good ALU is not "just a MUX of all the functions you desire."
A good ALU reuses as much logic as possible to provide a richer set of opcodes at as little cost as possible. It merges similar operations and combines others, optimisations that automated tools couldn't figure out by themselves.
An optimised ALU not only provides useful opcodes but also reduces gate count, footprint, latency and power consumption, all of which a custom CPU needs to keep low.
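
As an illustration of merging similar operations, here is a minimal behavioural sketch in Python (not a hardware description, and not taken from any particular design). It assumes an 8-bit datapath and uses hypothetical names (`shared_adder`, `alu`, the opcode strings). The point is that ADD, SUB and an unsigned comparison can all share one adder: subtraction is an addition with operand B inverted and the carry-in set to 1, and the comparison is simply the carry-out of that subtraction.

```python
WIDTH = 8                    # assumed datapath width, for illustration only
MASK = (1 << WIDTH) - 1


def shared_adder(a, b, invert_b, carry_in):
    """Single adder shared by several opcodes.

    In hardware, the optional inversion of B is one row of XOR gates
    in front of the adder, rather than a second arithmetic unit.
    """
    b_eff = (b ^ MASK) if invert_b else b
    total = (a & MASK) + (b_eff & MASK) + carry_in
    return total & MASK, total >> WIDTH   # (result, carry-out)


def alu(op, a, b):
    """Hypothetical opcode decoder: each opcode only sets control bits."""
    if op == "ADD":
        return shared_adder(a, b, invert_b=False, carry_in=0)
    if op == "SUB":
        # a - b == a + ~b + 1 in two's complement
        return shared_adder(a, b, invert_b=True, carry_in=1)
    if op == "CMP_GE":
        # Unsigned a >= b is the carry-out of a - b: no extra comparator.
        _, carry = shared_adder(a, b, invert_b=True, carry_in=1)
        return carry, carry
    raise ValueError(f"unknown opcode: {op}")
```

A naive "MUX of functions" would instantiate an adder, a subtractor and a comparator side by side and select one output. Here the three opcodes differ only in two control bits feeding the same adder.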