Mirror of https://github.com/karpathy/nanochat.git, synced 2026-03-22 12:53:26 +00:00
- Added a reset_parameters method in MoEFeedForward to reinitialize expert parameters.
- Updated the GPT class to call reset_parameters for MoEFeedForward instances during weight initialization.
- Introduced a new test in test_moe.py to validate gradient updates for MoE experts, ensuring proper functionality during training.

| File |
|---|
| test_moe.py |
| test_rustbpe.py |