Implementation Checklist

Files Created ✓

Core Module

  • nanochat/auto_batch_size.py - Stub implementation with full interface

Unit Tests

  • tests/test_auto_batch_size.py - 11 unit tests (required Tests 1-5 plus edge cases)

Integration Test Scripts

  • tests/integration/test_single_gpu_discovery.sh (Test 6)
  • tests/integration/test_manual_vs_auto.sh (Test 7)
  • tests/integration/test_ddp_discovery.sh (Tests 8-9)
  • tests/integration/test_throughput_comparison.sh (Test 10)
  • tests/integration/test_stability_depth12.sh (Test 11)
  • tests/integration/test_stability_depth20.sh (Test 12)
  • tests/integration/test_stability_depth26.sh (Test 13)
  • tests/integration/test_stability_depth32.sh (Test 14)
  • tests/integration/test_overrides.sh (Tests 15-17)
  • tests/integration/test_cache_mechanism.sh (Tests 18-20)
  • tests/integration/test_failure_handling.sh (Tests 21-22)

Test Infrastructure

  • tests/run_unit_tests.sh - Unit test runner
  • tests/run_integration_tests.sh - Integration test orchestrator
  • tests/make_executable.sh - Helper that marks all .sh scripts executable

Documentation

  • tests/README.md - User-facing documentation
  • tests/TEST_PLAN.md - Detailed test specifications
  • tests/IMPLEMENTATION_NOTES.md - Implementation details
  • tests/QUICKSTART.md - Quick start guide
  • tests/CHECKLIST.md - This file

Infrastructure

  • tests/results/.gitkeep - Results directory
  • tests/integration/.gitkeep - Integration tests directory
  • Updated .gitignore to exclude test results
  • Updated README.md to document tests

Test Coverage ✓

Unit Tests (5 Required, 11 Implemented)

  • Test 1: Exponential Search Logic
  • Test 2: Binary Search Refinement
  • Test 3: Safety Margin Application
  • Test 4: Cache Hit
  • Test 4: Cache Miss
  • Test 4: Cache Key Validation
  • Test 5: DDP Broadcast (Rank 0)
  • Test 5: DDP Broadcast (Non-zero rank)
  • Min/Max Batch Size Constraints
  • Discover with No Cache
  • Cache Corruption Handling
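
Tests 1-3 and the min/max constraint test exercise the search logic without a GPU. The following is a minimal sketch of that logic, not the stub's actual code. It assumes the OOM check is abstracted as a `fits(batch_size)` callback (a hypothetical name) that is monotone in batch size. The 0.85 default margin is also an illustrative assumption.

```python
from typing import Callable, Optional


def find_max_batch_size(
    fits: Callable[[int], bool],
    min_batch: int = 1,
    max_batch: int = 1024,
    margin: float = 0.85,  # hypothetical default safety margin
) -> int:
    """Return the largest batch size accepted by `fits`, scaled by `margin`.

    `fits(b)` must return True when a step with batch size b runs without OOM,
    and is assumed monotone: if b fits, every smaller size fits too.
    """
    if min_batch < 1:
        raise ValueError("min_batch must be at least 1")
    if not fits(min_batch):
        raise RuntimeError(f"even batch size {min_batch} does not fit")

    lo = min_batch             # largest size known to fit
    hi: Optional[int] = None   # smallest size known to fail, once found

    # Exponential search: keep doubling (1, 2, 4, 8, ... when min_batch=1)
    # until a size fails or max_batch is reached.
    b = min_batch
    while b < max_batch:
        b = min(b * 2, max_batch)
        if fits(b):
            lo = b
        else:
            hi = b
            break

    # Binary search refinement: shrink the gap between lo (fits) and hi (fails).
    if hi is not None:
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if fits(mid):
                lo = mid
            else:
                hi = mid

    # Safety margin: back off from the edge, but never below min_batch.
    return max(min_batch, int(lo * margin))
```

Because `fits` is injected, a unit test can pass a plain lambda such as `lambda b: b <= 100` in place of a real training step. This is what lets this kind of logic run on CPU.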

Integration Tests (17 Required, All Implemented)

  • Test 6: Basic Discovery Run
  • Test 7: Manual vs Auto Comparison
  • Test 8: DDP Discovery (2 GPUs)
  • Test 9: DDP Discovery (4 GPUs)
  • Test 10: Throughput Comparison
  • Test 11: Stability (depth=12)
  • Test 12: Stability (depth=20)
  • Test 13: Stability (depth=26)
  • Test 14: Stability (depth=32)
  • Test 15: Manual Override
  • Test 16: Disable Auto-Discovery
  • Test 17: Custom Safety Margin
  • Test 18: Cache Hit
  • Test 19: Cache Key Validation
  • Test 20: Cache Invalidation
  • Test 21: Artificial Memory Constraint
  • Test 22: Mid-Training Override Warning
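
Tests 18-20 (and unit Test 4) revolve around a cache keyed on the run configuration. The sketch below shows one way to structure it. The helper names are hypothetical, and the key is assumed to consist of depth, sequence length and GPU name, because this checklist does not specify the actual key fields.

```python
import hashlib
import json
from pathlib import Path
from typing import Optional


def make_cache_key(depth: int, seq_len: int, gpu_name: str) -> str:
    """Hash every (assumed) input that affects memory use into a short key."""
    payload = json.dumps(
        {"depth": depth, "seq_len": seq_len, "gpu": gpu_name}, sort_keys=True
    )
    return hashlib.sha256(payload.encode()).hexdigest()[:16]


def load_cached(cache_dir: Path, key: str) -> Optional[int]:
    """Return the cached batch size, or None on a miss or a corrupted entry."""
    path = cache_dir / f"{key}.json"
    try:
        data = json.loads(path.read_text())
        return int(data["batch_size"])
    except (FileNotFoundError, json.JSONDecodeError, KeyError, TypeError, ValueError):
        # Missing or corrupted entry: treat it as a miss so discovery reruns.
        return None


def save_cached(cache_dir: Path, key: str, batch_size: int) -> None:
    cache_dir.mkdir(parents=True, exist_ok=True)
    (cache_dir / f"{key}.json").write_text(json.dumps({"batch_size": batch_size}))


def invalidate(cache_dir: Path, key: str) -> None:
    """Drop one entry; a later load_cached() call then reports a miss."""
    (cache_dir / f"{key}.json").unlink(missing_ok=True)
```

Hashing a sorted JSON payload means any change to a key field produces a different file name, so a stale entry is never looked up. A corrupted file degrades to a cache miss rather than a crash.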

Implementation Status

Completed ✓

  • Stub module with full interface
  • All unit tests
  • All integration test scripts
  • Test runners
  • Documentation
  • Results directory structure

Pending (Outside Scope)

  • Full auto-discovery implementation (Task 41)
  • Integration into training scripts (Task 45)
  • GPU info detection for cache keys
  • Real exponential + binary search
  • Robust OOM detection
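
For the pending OOM detection, a probe along these lines is one option. It assumes that the caller supplies a hypothetical `run_step(batch_size)` that runs one forward/backward pass. In recent PyTorch versions `torch.cuda.OutOfMemoryError` is a subclass of `RuntimeError` and its message contains "out of memory". The sketch therefore matches on the message, which keeps it free of a torch import. This is a heuristic, not the module's actual check.

```python
from typing import Callable


def probe_batch_size(run_step: Callable[[int], None], batch_size: int) -> bool:
    """Return True if run_step(batch_size) completes, False if it runs out of memory.

    Any other exception is re-raised so real bugs are not mistaken for OOM.
    """
    try:
        run_step(batch_size)
        return True
    except RuntimeError as exc:
        # CUDA OOM errors are RuntimeErrors whose message mentions "out of memory".
        if "out of memory" in str(exc).lower():
            return False
        raise
    except MemoryError:
        # Host-side allocation failure also counts as "does not fit".
        return False
```

In a real `_test_batch_size()`, the probe would typically also release cached GPU memory before the next attempt, for example with `torch.cuda.empty_cache()`.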

Verification Steps

Step 1: Make Scripts Executable

bash tests/make_executable.sh

Expected: All .sh files become executable

Step 2: Run Unit Tests

bash tests/run_unit_tests.sh

Expected: Most tests pass; some may fail or be limited because the module is still a stub

Step 3: Verify File Structure

ls -R tests/

Expected: All files and directories listed under Files Created are present
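
Steps 1 and 3 can also be checked programmatically. The small sketch below compares the tree against the 23 files listed under Files Created and flags any .sh file without an execute bit. The helper name is illustrative.

```python
import os
from pathlib import Path
from typing import List

# Integration scripts as listed under "Integration Test Scripts".
INTEGRATION = [
    "test_single_gpu_discovery.sh",
    "test_manual_vs_auto.sh",
    "test_ddp_discovery.sh",
    "test_throughput_comparison.sh",
    "test_stability_depth12.sh",
    "test_stability_depth20.sh",
    "test_stability_depth26.sh",
    "test_stability_depth32.sh",
    "test_overrides.sh",
    "test_cache_mechanism.sh",
    "test_failure_handling.sh",
]

EXPECTED = [
    "nanochat/auto_batch_size.py",
    "tests/test_auto_batch_size.py",
    "tests/run_unit_tests.sh",
    "tests/run_integration_tests.sh",
    "tests/make_executable.sh",
    "tests/README.md",
    "tests/TEST_PLAN.md",
    "tests/IMPLEMENTATION_NOTES.md",
    "tests/QUICKSTART.md",
    "tests/CHECKLIST.md",
    "tests/results/.gitkeep",
    "tests/integration/.gitkeep",
] + [f"tests/integration/{name}" for name in INTEGRATION]


def check_layout(repo_root: Path) -> List[str]:
    """Return a list of problems; an empty list means the layout matches."""
    problems = []
    for rel in EXPECTED:
        path = repo_root / rel
        if not path.is_file():
            problems.append(f"missing: {rel}")
        elif path.suffix == ".sh" and not os.access(path, os.X_OK):
            problems.append(f"not executable: {rel}")
    return problems
```

Run from the repository root, `check_layout(Path("."))` should return an empty list once Step 1 has been run.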

Step 4: Check Documentation

cat tests/README.md
cat tests/QUICKSTART.md

Expected: Complete documentation exists

Step 5: Try a Quick Integration Test (if a GPU is available)

bash tests/integration/test_single_gpu_discovery.sh

Expected: Runs without errors (the stub may not find the optimal batch size)

Success Criteria

Implementation Complete ✓

  • All 22 tests (Tests 1-22) implemented across 12 test files
  • Test runners functional
  • Documentation comprehensive
  • Stub module provides expected interface

Tests Ready to Run ✓

  • Unit tests can run on CPU
  • Integration tests have proper structure
  • Error handling and test skipping work
  • Results directory configured

Documentation Complete ✓

  • README with usage instructions
  • TEST_PLAN with specifications
  • QUICKSTART for new users
  • IMPLEMENTATION_NOTES for developers

Next Steps (For Full Implementation)

  1. Implement Core Algorithms

    • Replace stub _perform_discovery() with real search
    • Implement exponential search (1, 2, 4, 8, ...)
    • Implement binary search refinement
    • Improve OOM detection in _test_batch_size()
  2. Integrate with Training Scripts

    • Add --auto_batch_size flag to base_train.py
    • Add --batch_size_margin flag
    • Add discovery call before training loop
    • Add logging messages
  3. Test and Validate

    • Run unit tests: bash tests/run_unit_tests.sh
    • Run integration tests: bash tests/run_integration_tests.sh
    • Verify all tests pass
    • Check performance improvements with the throughput comparison (Test 10)
  4. Optimize and Polish

    • Tune safety margins
    • Optimize discovery speed
    • Add more error handling
    • Update documentation with results
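
Steps 2 and 3 hinge on how the new flags interact. The sketch below assumes this precedence: an explicit batch size beats auto-discovery, and disabling discovery falls back to a default. Here `discover` stands in for the real discovery call. The parameter names mirror the proposed --auto_batch_size and --batch_size_margin flags, but the function itself and the (0, 1] margin check are hypothetical.

```python
import logging
from typing import Callable, Optional

log = logging.getLogger("auto_batch_size_sketch")


def resolve_batch_size(
    manual_batch_size: Optional[int],
    auto_batch_size: bool,
    batch_size_margin: float,
    default_batch_size: int,
    discover: Callable[[float], int],
) -> int:
    """Pick the batch size to train with, under the assumed precedence."""
    if manual_batch_size is not None:
        # Manual override (cf. Test 15): an explicit value skips discovery.
        log.info("using manual batch size %d", manual_batch_size)
        return manual_batch_size
    if not auto_batch_size:
        # Discovery disabled (cf. Test 16): fall back to the configured default.
        log.info("auto-discovery disabled; using default batch size %d", default_batch_size)
        return default_batch_size
    if not 0.0 < batch_size_margin <= 1.0:
        raise ValueError(f"batch_size_margin must be in (0, 1], got {batch_size_margin}")
    # Custom margin (cf. Test 17): forwarded unchanged to discovery.
    batch_size = discover(batch_size_margin)
    log.info("auto-discovered batch size %d (margin %.2f)", batch_size, batch_size_margin)
    return batch_size
```

Keeping this resolution separate from flag parsing lets cheap unit tests with a fake `discover` mirror the override tests, regardless of how base_train.py parses its options.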

File Count Summary

Category                    Count
Core Module                 1
Unit Test Files             1
Integration Test Scripts    11
Test Runners                3
Documentation Files         5
Infrastructure              2
Total                       23

Line Count Estimate

File Type                           Lines
Python (auto_batch_size.py)         ~200
Python (test_auto_batch_size.py)    ~350
Bash (integration tests)            ~900
Bash (runners)                      ~150
Documentation (Markdown)            ~1200
Total                               ~2800

Deliverables Summary

All deliverables were completed as specified in the task:

  • Stub auto_batch_size module with the expected interface
  • 11 unit tests covering required Tests 1-5 plus edge cases
  • 11 integration test scripts (covering Tests 6-22)
  • Test execution infrastructure
  • Documentation (4 guides plus this checklist)
  • Results directory structure
  • CI-ready test suite (unit tests run on CPU)

The testing infrastructure is in place and ready to validate auto-discovery once the full implementation lands.