Nexus
We introduce Nexus, a novel optimizer that achieves better downstream generalization at the same pretraining loss by aligning gradients and encouraging convergence to shared minima.
We introduce Nexus, a novel optimizer that achieves better downstream generalization at the same pretraining loss by aligning gradients and encouraging convergence to shared minima.