Joined on 2024-05-31
tacit synced commits to refs/pull/93/merge at tacit/nanochat from mirror 2026-01-06 03:33:07 +00:00
ae0bf52529 tune hyperparameters based on overnight sweeps. warmdown_ratio is the biggest free win, increasing 0.2 -> 0.4, and embedding lr can be larger bumping 0.2 -> 0.3
eec0c79563 also add matplotlib dep so that we can have jupyter notebooks
54e59c38ad add notebook on deriving the CORE estimates for the GPT-3 miniseries.
Compare 4 commits »
tacit synced commits to refs/pull/414/merge at tacit/nanochat from mirror 2026-01-06 03:33:06 +00:00
ae0bf52529 tune hyperparameters based on overnight sweeps. warmdown_ratio is the biggest free win, increasing 0.2 -> 0.4, and embedding lr can be larger bumping 0.2 -> 0.3
eec0c79563 also add matplotlib dep so that we can have jupyter notebooks
54e59c38ad add notebook on deriving the CORE estimates for the GPT-3 miniseries.
Compare 4 commits »
tacit synced commits to refs/pull/59/merge at tacit/nanochat from mirror 2026-01-06 03:33:06 +00:00
ae0bf52529 tune hyperparameters based on overnight sweeps. warmdown_ratio is the biggest free win, increasing 0.2 -> 0.4, and embedding lr can be larger bumping 0.2 -> 0.3
eec0c79563 also add matplotlib dep so that we can have jupyter notebooks
54e59c38ad add notebook on deriving the CORE estimates for the GPT-3 miniseries.
9d4c9b786d many small fixes to base_train: reporting ETA, allowing some additional kwarg flexibility, making sure we don't crash when e.g. depth = 11 - we now calculate the closest num_heads that works
Compare 10 commits »
tacit synced commits to refs/pull/412/merge at tacit/nanochat from mirror 2026-01-06 03:33:03 +00:00
ae0bf52529 tune hyperparameters based on overnight sweeps. warmdown_ratio is the biggest free win, increasing 0.2 -> 0.4, and embedding lr can be larger bumping 0.2 -> 0.3
eec0c79563 also add matplotlib dep so that we can have jupyter notebooks
54e59c38ad add notebook on deriving the CORE estimates for the GPT-3 miniseries.
Compare 4 commits »
tacit synced commits to refs/pull/400/merge at tacit/nanochat from mirror 2026-01-06 03:33:02 +00:00
ae0bf52529 tune hyperparameters based on overnight sweeps. warmdown_ratio is the biggest free win, increasing 0.2 -> 0.4, and embedding lr can be larger bumping 0.2 -> 0.3
eec0c79563 also add matplotlib dep so that we can have jupyter notebooks
54e59c38ad add notebook on deriving the CORE estimates for the GPT-3 miniseries.
Compare 4 commits »
tacit synced commits to refs/pull/405/merge at tacit/nanochat from mirror 2026-01-06 03:33:02 +00:00
ae0bf52529 tune hyperparameters based on overnight sweeps. warmdown_ratio is the biggest free win, increasing 0.2 -> 0.4, and embedding lr can be larger bumping 0.2 -> 0.3
eec0c79563 also add matplotlib dep so that we can have jupyter notebooks
54e59c38ad add notebook on deriving the CORE estimates for the GPT-3 miniseries.
Compare 4 commits »
tacit synced commits to refs/pull/40/merge at tacit/nanochat from mirror 2026-01-06 03:33:01 +00:00
ae0bf52529 tune hyperparameters based on overnight sweeps. warmdown_ratio is the biggest free win, increasing 0.2 -> 0.4, and embedding lr can be larger bumping 0.2 -> 0.3
eec0c79563 also add matplotlib dep so that we can have jupyter notebooks
54e59c38ad add notebook on deriving the CORE estimates for the GPT-3 miniseries.
9d4c9b786d many small fixes to base_train: reporting ETA, allowing some additional kwarg flexibility, making sure we don't crash when e.g. depth = 11 - we now calculate the closest num_heads that works
Compare 14 commits »
tacit synced commits to refs/pull/396/merge at tacit/nanochat from mirror 2026-01-06 03:33:00 +00:00
ae0bf52529 tune hyperparameters based on overnight sweeps. warmdown_ratio is the biggest free win, increasing 0.2 -> 0.4, and embedding lr can be larger bumping 0.2 -> 0.3
eec0c79563 also add matplotlib dep so that we can have jupyter notebooks
54e59c38ad add notebook on deriving the CORE estimates for the GPT-3 miniseries.
Compare 4 commits »
tacit synced commits to refs/pull/328/merge at tacit/nanochat from mirror 2026-01-06 03:32:59 +00:00
ae0bf52529 tune hyperparameters based on overnight sweeps. warmdown_ratio is the biggest free win, increasing 0.2 -> 0.4, and embedding lr can be larger bumping 0.2 -> 0.3
eec0c79563 also add matplotlib dep so that we can have jupyter notebooks
54e59c38ad add notebook on deriving the CORE estimates for the GPT-3 miniseries.
9d4c9b786d many small fixes to base_train: reporting ETA, allowing some additional kwarg flexibility, making sure we don't crash when e.g. depth = 11 - we now calculate the closest num_heads that works
Compare 10 commits »
tacit synced commits to refs/pull/324/merge at tacit/nanochat from mirror 2026-01-06 03:32:59 +00:00
ae0bf52529 tune hyperparameters based on overnight sweeps. warmdown_ratio is the biggest free win, increasing 0.2 -> 0.4, and embedding lr can be larger bumping 0.2 -> 0.3
Compare 2 commits »
tacit synced and deleted reference refs/tags/refs/pull/318/merge at tacit/nanochat from mirror 2026-01-06 03:32:58 +00:00
tacit synced and deleted reference refs/tags/refs/pull/386/merge at tacit/nanochat from mirror 2026-01-06 03:32:58 +00:00
tacit synced commits to refs/pull/151/merge at tacit/nanochat from mirror 2026-01-06 03:32:58 +00:00
ae0bf52529 tune hyperparameters based on overnight sweeps. warmdown_ratio is the biggest free win, increasing 0.2 -> 0.4, and embedding lr can be larger bumping 0.2 -> 0.3
eec0c79563 also add matplotlib dep so that we can have jupyter notebooks
54e59c38ad add notebook on deriving the CORE estimates for the GPT-3 miniseries.
Compare 4 commits »
tacit synced and deleted reference refs/tags/refs/pull/3/merge at tacit/nanochat from mirror 2026-01-06 03:32:57 +00:00
tacit synced and deleted reference refs/tags/refs/pull/294/merge at tacit/nanochat from mirror 2026-01-06 03:32:57 +00:00
tacit synced and deleted reference refs/tags/refs/pull/252/merge at tacit/nanochat from mirror 2026-01-06 03:32:56 +00:00
tacit synced and deleted reference refs/tags/refs/pull/159/merge at tacit/nanochat from mirror 2026-01-06 03:32:55 +00:00
tacit synced and deleted reference refs/tags/refs/pull/172/merge at tacit/nanochat from mirror 2026-01-06 03:32:55 +00:00
tacit synced and deleted reference refs/tags/refs/pull/15/merge at tacit/nanochat from mirror 2026-01-06 03:32:53 +00:00
tacit synced commits to refs/pull/324/merge at tacit/nanochat from mirror 2026-01-05 19:22:50 +00:00
eec0c79563 also add matplotlib dep so that we can have jupyter notebooks
54e59c38ad add notebook on deriving the CORE estimates for the GPT-3 miniseries.
e00c73322c Merge branch 'master' into master_nitishpandey04
9d4c9b786d many small fixes to base_train: reporting ETA, allowing some additional kwarg flexibility, making sure we don't crash when e.g. depth = 11 - we now calculate the closest num_heads that works
Compare 10 commits »