mirror of https://github.com/karpathy/nanochat.git synced 2025-12-16 01:02:18 +00:00

Artemis Git Integration ffdbb9c247 test: add comprehensive test suite for auto-batch-size discovery with unit and integration tests, pytest framework, stability validation, and updated documentation

2025-11-05 16:52:29 +00:00

4.0 KiB

Raw Blame History

Quick Start Guide - Auto-Discovery Tests

TL;DR

# Make scripts executable
bash tests/make_executable.sh

# Run unit tests (10 seconds, no GPU)
bash tests/run_unit_tests.sh

# Run integration tests (30 minutes, requires GPU)
bash tests/run_integration_tests.sh

First Time Setup

Make test scripts executable:
```
bash tests/make_executable.sh
```

Verify environment:

# Check Python/PyTorch
python -c "import torch; print(torch.__version__)"

# Check GPU (if available)
nvidia-smi

Install test dependencies (if not already installed):
```
pip install pytest
```

Running Tests

Unit Tests (Recommended First)

Fast tests that don't require GPU:

bash tests/run_unit_tests.sh

Expected output:

==========================================
Running Unit Tests
==========================================

tests/test_auto_batch_size.py::test_exponential_search PASSED
tests/test_auto_batch_size.py::test_binary_search_refinement PASSED
tests/test_auto_batch_size.py::test_safety_margin PASSED
tests/test_auto_batch_size.py::test_cache_hit PASSED
tests/test_auto_batch_size.py::test_cache_miss PASSED
...

✓ All unit tests passed!

Integration Tests (Requires GPU)

# Standard suite (~30 minutes)
bash tests/run_integration_tests.sh

# Full suite with long stability tests (~2 hours)
RUN_LONG_TESTS=1 bash tests/run_integration_tests.sh

Individual Tests

Run specific integration tests:

# Test basic discovery
bash tests/integration/test_single_gpu_discovery.sh

# Test manual vs auto comparison
bash tests/integration/test_manual_vs_auto.sh

# Test DDP (requires 2+ GPUs)
bash tests/integration/test_ddp_discovery.sh

# Test throughput improvement
bash tests/integration/test_throughput_comparison.sh

# Test caching
bash tests/integration/test_cache_mechanism.sh

Expected Results

Unit Tests

✓ All 11 tests pass
✓ Completes in < 10 seconds
✓ No GPU required

Integration Tests (with full implementation)

✓ Discovery completes in < 30 seconds
✓ Auto batch size > manual batch size
✓ No OOM errors
✓ Throughput improvement ≥ 1.3x
✓ Cache reduces startup time to < 5 seconds

Viewing Results

Test outputs are saved to tests/results/:

# View latest discovery log
cat tests/results/test_single_gpu_discovery.log

# View throughput comparison
cat tests/results/throughput_comparison.json

# List all results
ls -lh tests/results/

Common Issues

"pytest: command not found"

pip install pytest

"Permission denied" when running scripts

bash tests/make_executable.sh

"CUDA out of memory"

Reduce model size in test scripts
Or skip long stability tests (they're optional)

"SKIP: DDP tests require at least 2 GPUs"

Normal if you have only 1 GPU
Tests will automatically skip

Next Steps

Read the docs:
- tests/README.md - Full documentation
- tests/TEST_PLAN.md - Detailed test specifications
- tests/IMPLEMENTATION_NOTES.md - Implementation details
Check implementation status:
- Unit tests should pass with stub implementation
- Integration tests need full implementation
Contribute:
- Add new tests to tests/test_auto_batch_size.py
- Create new integration scripts in tests/integration/
- Update documentation

Questions?

Check tests/README.md for detailed documentation
Look at test logs in tests/results/
Review tests/IMPLEMENTATION_NOTES.md for troubleshooting

Summary of Test Coverage

Category	Count	Time	GPU
Unit Tests	11	10s	No
Single GPU Tests	6	15min	1 GPU
Multi-GPU Tests	2	5min	2+ GPUs
Performance Tests	1	10min	1 GPU
Stability Tests	4	1-2hr	1 GPU
Override Tests	3	10min	1 GPU
Cache Tests	3	10min	1 GPU
Failure Tests	2	10min	1 GPU

Total: 22 tests covering all aspects of auto-discovery functionality.

4.0 KiB Raw Blame History