Optimization Report — pyscf-diis_scf

Note

The optimized code summarized in this report was generated by the FermiLink AI agent. Review and validate the code changes yourself before using the modified code in scientific or production work. This optimization reporting feature is experimental and is not a final, mature solution.

Primary metric: Weighted median SCF kernel time (s) (lower is better).

Goal

Copied source goal for this optimization: goal.md

# Optimization Goal

## Package
pyscf

## Language
python

## Target
Optimize the molecular Kohn-Sham DFT SCF loop in PySCF, with primary focus on
DIIS-family acceleration and early-cycle stabilization in the conventional
`mf.kernel()` path.

Focus on the shared molecular RKS / UKS hot path:
- `pyscf/scf/hf.py` -- generic SCF kernel, DIIS setup, Fock update call sites, and convergence checks
- `pyscf/scf/diis.py` -- CDIIS / ADIIS / EDIIS logic, error-vector construction, and switching behavior
- `pyscf/lib/diis.py` -- DIIS subspace storage, conditioning, extrapolation, and rollback behavior
- `pyscf/dft/rks.py`, `pyscf/dft/uks.py` -- DFT effective-potential path and class wiring
- `pyscf/scf/uhf.py` only when a change is required by the same shared DIIS / SCF machinery

In scope:
- more reliable DIIS subspace rejection or rollback when the extrapolation problem is ill-conditioned
- better CDIIS / ADIIS / EDIIS handoff in early or oscillatory cycles
- adaptive use of already-supported damping / level-shift controls without changing their public semantics
- reductions in SCF cycle count or DIIS-phase cost on hard but incumbent-converged molecular DFT cases

Out of scope:
- switching workloads to Newton / SOSCF
- loosening `conv_tol` / `conv_tol_grad` or increasing `max_cycle`
- changing molecule, basis, XC functional, grid level, charge, spin, occupations, symmetry, or initial guess
- speedups from unrelated code outside the SCF / DIIS path

## Editable Scope
- pyscf/lib/diis.py
- pyscf/scf/diis.py
- pyscf/scf/hf.py
- pyscf/scf/uhf.py
- pyscf/dft/rks.py
- pyscf/dft/uks.py

## Performance Metric
Minimize `weighted_median_scf_kernel_seconds`, defined as the weighted median of
per-case end-to-end `mf.kernel()` wall-clock time across all representative
workloads under pinned single-thread execution.
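As a rough illustration of the metric, a weighted median can be computed as below. The goal does not state which convention the runner uses (lower, upper, or interpolated), so this sketch assumes the lower weighted median; the function name `weighted_median` is hypothetical and not part of FermiLink or PySCF.

```python
def weighted_median(values, weights):
    """Return the lower weighted median of `values`.

    Assumption: the smallest value whose cumulative weight reaches half of
    the total weight is taken as the median. The real runner may differ.
    """
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equal length")
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    return pairs[-1][0]


# Per-case kernel times in seconds (made-up numbers), all with weight 1.0.
times = [3.0, 1.0, 2.0]
median_time = weighted_median(times, [1.0, 1.0, 1.0])
```

With equal weights, as in the benchmark cases listed later in this report, this reduces to an ordinary (lower) median of the per-case times.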

Secondary objectives:
- lower `scf_cycles`
- lower `diis_update_seconds`, `get_fock_seconds`, and `eig_seconds` when the runner can expose them
- no loss of convergence on any representative case

## Correctness Constraints
- All representative workloads must converge in the incumbent baseline and in accepted candidates under the stated `conv_tol`, `conv_tol_grad`, and `max_cycle` settings.
- Total SCF energy absolute delta <= `5e-8` Hartree for RKS cases and <= `1e-7` Hartree for UKS cases versus the incumbent baseline.
- Final orbital-gradient norm must remain <= the workload `conv_tol_grad` threshold when the runner exposes it.
- Molecular-orbital energies RMS delta for occupied orbitals and the lowest 10 virtual orbitals <= `2e-5` Hartree when the runner exposes comparable orbitals.
- Density-matrix RMS delta <= `2e-6` versus the incumbent baseline when the runner exposes density matrices.
- Preserve user-facing semantics for `mf.diis`, `mf.DIIS`, `diis_space`, `diis_start_cycle`, `diis_space_rollback`, `diis_damp`, `damp`, `level_shift`, `conv_tol`, `conv_tol_grad`, and `max_cycle`.
- Do not disable DIIS, silently enable Newton / SOSCF / smearing / fractional occupations, reduce DFT grid quality, or increase thread count to gain speed.
- Easy regression cases must remain converged with no more than 1 additional SCF cycle versus the incumbent baseline.
- No case-specific shortcuts keyed on molecule identity, charge, spin, basis, XC functional, or whether a case is `train-` or `test-`.
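The energy and density-matrix gates above could be checked along the following lines. This is a minimal sketch, not the actual FermiLink runner: the names `ENERGY_TOL`, `rms_delta`, and `passes_gates` are hypothetical, and density matrices are assumed to arrive as flat sequences of floats.

```python
import math

# Tolerances copied from the constraints above (Hartree for energies).
ENERGY_TOL = {"RKS": 5e-8, "UKS": 1e-7}
DM_RMS_TOL = 2e-6


def rms_delta(a, b):
    """Root-mean-square difference of two equal-length flat sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def passes_gates(method, e_base, e_cand, dm_base=None, dm_cand=None):
    """Return True if the candidate stays within the energy and DM gates.

    The density-matrix gate is applied only when both matrices are given,
    mirroring the "when the runner exposes" wording of the constraints.
    """
    if abs(e_cand - e_base) > ENERGY_TOL[method]:
        return False
    if dm_base is not None and dm_cand is not None:
        if rms_delta(dm_base, dm_cand) > DM_RMS_TOL:
            return False
    return True
```

The orbital-energy and gradient-norm gates would follow the same pattern with their own thresholds.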

## Representative Workloads
- train-rks-h4-square: H4 square, side length 1.70 Angstrom, charge=0, spin=0, RKS, `xc='pbe0'`, `basis='cc-pVDZ'`, `init_guess='minao'`, `grids.level=3`, `conv_tol=1e-9`, `max_cycle=80`, `diis_space=8`, no smearing.
- train-rks-stretched-n2: N2 at bond length 2.40 Angstrom, charge=0, spin=0, RKS, `xc='b3lyp'`, `basis='def2-SVP'`, `init_guess='minao'`, `grids.level=3`, `conv_tol=1e-9`, `max_cycle=80`, `diis_space=8`, no smearing.
- test-uks-no2-radical: bent NO2 radical, N-O=1.20 Angstrom, O-N-O angle=134 degrees, charge=0, spin=1, UKS, `xc='b3lyp'`, `basis='6-31g*'`, `init_guess='minao'`, `grids.level=3`, `conv_tol=1e-9`, `max_cycle=80`, `diis_space=8`.
- test-uks-feo: linear FeO, Fe-O=1.62 Angstrom, charge=0, spin=4, UKS, `xc='pbe0'`, `basis='def2-SVP'`, `init_guess='atom'`, `grids.level=3`, `conv_tol=1e-8`, `max_cycle=100`, `diis_space=10`.
- test-rks-benzene-anion-diffuse: benzene radical anion, standard planar D6h geometry, charge=-1, spin=1, UKS, `xc='b3lyp'`, `basis='6-31+g*'`, `init_guess='minao'`, `grids.level=3`, `conv_tol=1e-9`, `max_cycle=100`, `diis_space=10`.
- test-rks-h6-ring: H6 regular hexagon, side length 1.45 Angstrom, charge=0, spin=0, RKS, `xc='pbe0'`, `basis='cc-pVDZ'`, `init_guess='minao'`, `grids.level=3`, `conv_tol=1e-9`, `max_cycle=80`, `diis_space=8`.
- test-rks-stretched-co: CO at bond length 2.30 Angstrom, charge=0, spin=0, RKS, `xc='b3lyp'`, `basis='def2-SVP'`, `init_guess='minao'`, `grids.level=3`, `conv_tol=1e-9`, `max_cycle=80`, `diis_space=8`.
- test-uks-o2-dimer: two O2 fragments separated by 3.20 Angstrom, each O-O=1.21 Angstrom, total charge=0, spin=4, UKS, `xc='pbe'`, `basis='def2-SVP'`, `init_guess='atom'`, `grids.level=3`, `conv_tol=1e-8`, `max_cycle=100`, `diis_space=10`.
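For orientation, one of the workloads above (train-rks-h4-square) could be set up in PySCF roughly as follows. This is a sketch under assumptions: the `CASE` dictionary and the `build_rks` helper are hypothetical, the atom string is taken from the benchmark cases listed later in this report, and the generated benchmark_runner.py may configure the calculation differently.

```python
# Settings for train-rks-h4-square, as listed in this report.
CASE = {
    "atom": "H 0 0 0; H 1.70 0 0; H 0 1.70 0; H 1.70 1.70 0",
    "basis": "cc-pVDZ",
    "charge": 0,
    "spin": 0,
    "xc": "pbe0",
    "grids_level": 3,
    "init_guess": "minao",
    "conv_tol": 1e-9,
    "max_cycle": 80,
    "diis_space": 8,
}


def build_rks(case):
    """Build an RKS object for `case` (hypothetical helper)."""
    # Imported lazily so the settings above can be inspected without PySCF.
    from pyscf import dft, gto

    mol = gto.M(
        atom=case["atom"],
        unit="Angstrom",
        basis=case["basis"],
        charge=case["charge"],
        spin=case["spin"],
        verbose=0,
    )
    mf = dft.RKS(mol)
    mf.xc = case["xc"]
    mf.grids.level = case["grids_level"]
    mf.init_guess = case["init_guess"]
    mf.conv_tol = case["conv_tol"]
    mf.max_cycle = case["max_cycle"]
    mf.diis_space = case["diis_space"]
    return mf
```

With PySCF installed, `mf = build_rks(CASE)` followed by `e_tot = mf.kernel()` runs the conventional SCF path that the goal targets; `mf.converged` then reports whether the case converged within `max_cycle`.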

## Build
```bash
export SOURCE_REPO_ROOT="$(cd "$(git rev-parse --git-common-dir)/.." && pwd)"
export VENV="/anvil/scratch/x-tli22/fermilink_optimize/project_pyscf/venvs/fermilink-optimize/pyscf-diis_scf"
source "$VENV/bin/activate"
module remove cmake
cd pyscf/lib
mkdir -p build
cd build
cmake ..
cmake --build . -j4
cd ../../../
python -m pip install -e .
```

## Notes
- Keep benchmark behavior deterministic with `OMP_NUM_THREADS=1`, `MKL_NUM_THREADS=1`, `OPENBLAS_NUM_THREADS=1`, and `NUMEXPR_NUM_THREADS=1`.
- Run the `## Build` commands from the PySCF repo root inside the campaign's active Python environment.
- Preserve the `train-` / `test-` case ids directly in generated benchmark cases and let FermiLink infer the split from the prefixes.
- Treat the cases above as hard but incumbent-converged workloads. Do not intentionally include baseline-nonconverged cases in the initial benchmark suite; baseline correctness must pass before the optimize campaign can start.
- If the runner can expose them, record per-case `scf_kernel_seconds`, `scf_cycles`, `diis_update_seconds`, `get_fock_seconds`, `eig_seconds`, `converged`, `e_tot`, `norm_gorb`, and density-matrix change norm.
- Assume the campaign is launched with `bin/fermilink-optimize-python` or an already activated environment. Do not hard-code a site-specific absolute venv path in generated runtime commands.
- If a case proves too fragile to pass preflight reproducibly, replace it with a nearby molecule/system of similar AO size and SCF difficulty rather than weakening the correctness gates.
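The thread-pinning and timing notes above might be realized as in the sketch below. The helper names `pin_single_thread` and `timed` are hypothetical, and the real runner may pin threads through `runtime.pre_commands` or the launcher instead.

```python
import os
import time

# Thread-control variables named in the notes above.
THREAD_VARS = (
    "OMP_NUM_THREADS",
    "MKL_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "NUMEXPR_NUM_THREADS",
)


def pin_single_thread(env=None):
    """Set every thread-control variable to "1" in `env` and return it.

    Defaults to os.environ. This must run before NumPy or PySCF is first
    imported, because the BLAS/OpenMP runtimes read these values at load time.
    """
    if env is None:
        env = os.environ
    for name in THREAD_VARS:
        env[name] = "1"
    return env


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

A per-case `scf_kernel_seconds` value could then be obtained with `e_tot, seconds = timed(mf.kernel)` for an already constructed mean-field object `mf`.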

Summary

Optimization Trajectory

[Figure: metric vs. iteration, with the running incumbent]

All iterations

| iter | commit | status | metric | summary |
| --- | --- | --- | --- | --- |
| 0 | 44c83aaae41f | baseline | 1.22637 | baseline |
| 1 | 61af054e3309 | correctness_failure | 1.12702 | Skip redundant unshifted SCF extra-cycle potential rebuild after canonical density is already bel… |
| 2 | 3fc6d11adee9 | accepted | 1.13042 | Guard final SCF extra-cycle get_veff rebuild with small canonical-density and gradient thresholds… |
| 3 | fe174d17db12 | rejected | 1.11806 | Use AO-basis CDIIS error vectors for restricted SCF runs with a near-zero initial HOMO-LUMO gap, … |
| 4 | cc9080893ef4 | rejected | 1.12786 | Lazy DIIS Corth initialization in hf.py, retaining AO-basis CDIIS for restricted near-zero first … |
| 5 | d5d6fd79c2d9 | rejected | 1.12583 | Extend the guarded final SCF extra-cycle potential reuse with an energy-stability fallback for mo… |
| 6 | 4115f7500839 | rejected | 1.12727 | Reuse the DIIS setup eigensolution for the first standard SCF cycle and use direct plain-Fock con… |
| 7 | 823bbe5bec61 | rejected | 1.11199 | Defer per-cycle dump_chk writes for the automatically created temporary chkfile when no callback … |
| 8 | a5198fab0782 | accepted | 1.1058 | Combine AO-basis CDIIS error vectors for restricted near-zero initial HOMO-LUMO gaps with deferre… |
| 9 | cee4ed30a38d | rejected | 1.11158 | Guard final SCF extra diagonalization with a small last-density-step check, preserving the existi… |
| 10 | 15371b3f64b0 | correctness_failure | 1.11423 | Guarded early final-convergence certification in `hf.kernel`, saving a near-converged follow-up S… |
| 11 | 9ceb1992d98c | rejected | 1.11542 | Use AO-basis CDIIS error vectors for restricted SCF runs to avoid restricted-only Corth setup eig… |
| 12 | 89eac5279c1a | accepted | 0.860287 | Cache reusable DFT numerical-integration AO block-loop outputs for unchanged RKS/UKS molecular gr… |
| 13 | de439af669fb | rejected | 0.857868 | Prefer sparse density-matrix rho evaluation for cached RKS/UKS numerical integration by stripping… |
| 14 | b24465bd95e3 | rejected | 0.85909 | Use plain ndarray density matrices for UKS numerical-integration calls so cached-AO UKS uses the … |
| 15 | 14eae176b837 | accepted | 0.82932 | Build primary molecular RKS/UKS numerical-integration grids without the initial atom-group sortin… |
| 16 | 15e0e3d707cc | rejected | 0.815344 | Fuse alpha and beta tagged-MO density contractions for the common real single-density UKS GGA num… |
| 17 | 60729b5492d4 | rejected | 0.814027 | Route cached-AO DFT density evaluation through plain density matrices for RKS after AO cache popu… |
| 18 | ae6b140db86a | rejected | 0.822071 | Route cached-AO DFT density evaluation through plain density matrices, fuse tagged-MO UKS GGA alp… |
| 19 | 8d8a60d41b1b | accepted | 0.812246 | Reduce cached-AO DFT NumInt overhead with per-call molecule shell-metadata reuse, RKS cached-dens… |
| 20 | 7c0d081c5cac | rejected | 0.81156 | Cache immutable ANO basis data used by MINAO initial guesses in hf.py to avoid repeated basis par… |
| 21 | 6508de525961 | rejected | 0.819044 | Tighten the default RKS/UKS density-pruning cutoff from 1e-7 to 1e-8, preserving config/user over… |
| 22 | 05050f2446e1 | rejected | 0.812563 | Specialize standard GGA deriv=1 XC derivative evaluation inside cached RKS/UKS NumInt calls using… |
| 23 | 1b92d9748ab0 | rejected | 0.813801 | Tighten cached UKS GGA fused rho2 dispatch to require a rho2 ratio of at least 5, letting borderl… |
| 24 | 3236d2301379 | rejected | 0.803676 | Use spin-summed Coulomb builds in UKS hybrid get_veff while keeping spin-resolved exchange, avoid… |
| 25 | e3c52bd40935 | rejected | 0.808245 | Streamline UKS get_veff by avoiding repeated post-initialization spin-density grid sums and using… |
| 26 | c0d0df07a312 | rejected | 0.80453 | Skip unobserved automatic temporary chkfile writes and use total-density Coulomb J in standard hy… |
| 27 | 64a35b457b66 | rejected | 0.803638 | Avoid copying the cached AO tensor when a stock NumInt block loop produces a single grid block, p… |
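The accepted rows in the iteration list above can be turned into a running-incumbent series with a few lines of arithmetic. The sketch below assumes that the incumbent is simply the most recently accepted candidate (rejected and failed rows never replace it, even when their measured metric is lower); the helper name `relative_improvement` is hypothetical.

```python
# (iteration, metric) for the baseline and every accepted candidate above.
ACCEPTED = [
    (0, 1.22637),
    (2, 1.13042),
    (8, 1.1058),
    (12, 0.860287),
    (15, 0.82932),
    (19, 0.812246),
]


def relative_improvement(baseline, metric):
    """Fractional reduction of the metric relative to the baseline."""
    return (baseline - metric) / baseline


baseline = ACCEPTED[0][1]
for iteration, metric in ACCEPTED[1:]:
    gain = relative_improvement(baseline, metric)
    print(f"iter {iteration:2d}: {metric:.6f} s ({gain:.1%} below baseline)")

final_gain = relative_improvement(baseline, ACCEPTED[-1][1])
speedup = baseline / ACCEPTED[-1][1]
```

For the final accepted commit this gives a weighted median kernel time roughly one third lower than the baseline, with most of that gain arriving at iteration 12.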

Accepted Commits

Accepted candidate detail pages and current manual-review status:

| accepted commit | Human verification |
| --- | --- |
| 3fc6d11adee9 | not verified |
| a5198fab0782 | not verified |
| 89eac5279c1a | not verified |
| 14eae176b837 | not verified |
| 8d8a60d41b1b | not verified |

Benchmark Contracts

Necessary files to reproduce the FermiLink optimization results:

Runtime Data

FermiLink runtime data for accepted/rejected commits.

Rerun Guide

Agent provider codex; model gpt-5.5-xhigh

Use the bundled contract files from this report to recreate the optimization against a fresh upstream checkout.

  • default upstream clone: git@github.com:skilled-scipkg/pyscf.git

  • confirm the upstream default branch before creating the worktree (master on GitHub)

  • detected package language: python; use fermilink-optimize-python for goal-mode reruns

  • if goal_inputs.json is present, restage the listed auxiliary workload files before rerunning

git clone git@github.com:skilled-scipkg/pyscf.git
cd pyscf
git worktree add -b fermilink-optimize/pyscf-<modified-feature> ../pyscf-<modified-feature> master

Path 1: Rerun from goal.md

Rerun from the bundled goal.md.

Note

Tune the copied ## Build section in goal.md before rerunning. Update environment activation, module loads, compiler paths, install prefixes, and other machine-specific setup so FermiLink builds the package correctly.

export SOURCE_REPO_ROOT="$(cd "$(git rev-parse --git-common-dir)/.." && pwd)"
export VENV="/anvil/scratch/x-tli22/fermilink_optimize/project_pyscf/venvs/fermilink-optimize/pyscf-diis_scf"
source "$VENV/bin/activate"
module remove cmake
cd pyscf/lib
mkdir -p build
cd build
cmake ..
cmake --build . -j4
cd ../../../
python -m pip install -e .

Run this from the cloned main repo so the launcher can create or reuse the sibling worktree:

fermilink-optimize-python \
  --project-root "$PWD" \
  --goal /path/to/report/contract/goal.md \
  --branch fermilink-optimize/pyscf-<modified-feature> \
  --worktree-root .. \
  --worktree-name pyscf-<modified-feature>

Path 2: More deterministic rerun from benchmark.yaml

Rerun from the copied benchmark.yaml and benchmark_runner.py. FermiLink generates these files from goal.md, and together they form the deterministic benchmark contract that the agent must follow during optimization iterations. The iterations themselves do not read goal.md directly.

This avoids regenerating the benchmark contract from goal.md before the campaign starts:

Note

Inspect benchmark.yaml before rerunning. Update runtime.pre_commands for machine-specific build/setup steps, and verify that runtime.command paths point at files that exist in the new worktree.

cd ../pyscf-<modified-feature>
mkdir -p .fermilink-optimize/autogen
cp /path/to/report/contract/benchmark.yaml .fermilink-optimize/autogen/benchmark.yaml
cp /path/to/report/contract/benchmark_runner.py .fermilink-optimize/autogen/benchmark_runner.py
printf '%s\n' '.fermilink-optimize/' >> .git/info/exclude
fermilink optimize pyscf "$PWD" \
  --benchmark "$PWD/.fermilink-optimize/autogen/benchmark.yaml" \
  --skills-source existing

Benchmark Examples

Worker iterations run the train-* benchmark cases below while searching for candidate changes:

cases:
- id: train-rks-h4-square
  weight: 1.0
  description: Small closed-shell square H4 RKS hybrid case stressing early-cycle
    stabilization.
  atom: H 0 0 0; H 1.70 0 0; H 0 1.70 0; H 1.70 1.70 0
  unit: Angstrom
  basis: cc-pVDZ
  charge: 0
  spin: 0
  method: RKS
  xc: pbe0
  grids_level: 3
  init_guess: minao
  conv_tol: 1.0e-09
  max_cycle: 80
  diis_space: 8
- id: train-rks-stretched-n2
  weight: 1.0
  description: Closed-shell stretched N2 RKS B3LYP case with difficult SCF convergence.
  atom: N 0 0 0; N 0 0 2.40
  unit: Angstrom
  basis: def2-SVP
  charge: 0
  spin: 0
  method: RKS
  xc: b3lyp
  grids_level: 3
  init_guess: minao
  conv_tol: 1.0e-09
  max_cycle: 80
  diis_space: 8
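The train/test split is carried by the case id prefix. How FermiLink infers the split internally is not shown in this report; the sketch below is one hypothetical way to do it, using the six case ids that appear in this section. The function name `split_cases` is an assumption.

```python
# Case ids as they appear in the benchmark cases of this report.
CASE_IDS = [
    "train-rks-h4-square",
    "train-rks-stretched-n2",
    "test-uks-no2-radical",
    "test-uks-feo",
    "test-rks-h6-ring",
    "test-uks-o2-dimer",
]


def split_cases(case_ids):
    """Group case ids by their "train-" or "test-" prefix."""
    split = {"train": [], "test": []}
    for case_id in case_ids:
        prefix = case_id.split("-", 1)[0]
        if prefix not in split:
            raise ValueError(f"case id without train-/test- prefix: {case_id}")
        split[prefix].append(case_id)
    return split


split = split_cases(CASE_IDS)
```

Rejecting ids without a recognized prefix keeps a mistyped case from silently falling out of both sets.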

Controller reviews run the test-* benchmark cases below to validate accepted candidates:

cases:
- id: test-uks-no2-radical
  weight: 1.0
  description: Open-shell bent NO2 radical UKS case covering spin-polarized DFT and
    hybrid-GGA path.
  atom: N 0 0 0; O 1.104605 0 0.468877; O -1.104605 0 0.468877
  unit: Angstrom
  basis: 6-31g*
  charge: 0
  spin: 1
  method: UKS
  xc: b3lyp
  grids_level: 3
  init_guess: minao
  conv_tol: 1.0e-09
  max_cycle: 80
  diis_space: 8
- id: test-uks-feo
  weight: 1.0
  description: High-spin transition-metal UKS hybrid case for hard oscillatory convergence.
  atom: Fe 0 0 0; O 0 0 1.62
  unit: Angstrom
  basis: def2-SVP
  charge: 0
  spin: 4
  method: UKS
  xc: pbe0
  grids_level: 3
  init_guess: atom
  conv_tol: 1.0e-08
  max_cycle: 100
  diis_space: 10
- id: test-rks-h6-ring
  weight: 1.0
  description: Closed-shell H6 regular hexagon RKS hybrid case covering near-degenerate
    occupied/virtual structure.
  atom: H 1.45 0 0; H 0.725 1.255737 0; H -0.725 1.255737 0; H -1.45 0 0; H -0.725
    -1.255737 0; H 0.725 -1.255737 0
  unit: Angstrom
  basis: cc-pVDZ
  charge: 0
  spin: 0
  method: RKS
  xc: pbe0
  grids_level: 3
  init_guess: minao
  conv_tol: 1.0e-09
  max_cycle: 80
  diis_space: 8
- id: test-uks-o2-dimer
  weight: 1.0
  description: High-spin separated O2 dimer UKS pure-GGA case using atom guess and
    larger DIIS space.
  atom: O -0.605 0 -1.60; O 0.605 0 -1.60; O -0.605 0 1.60; O 0.605 0 1.60
  unit: Angstrom
  basis: def2-SVP
  charge: 0
  spin: 4
  method: UKS
  xc: pbe
  grids_level: 3
  init_guess: atom
  conv_tol: 1.0e-08
  max_cycle: 100
  diis_space: 10
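The Cartesian coordinates in the cases above follow from the bond lengths and angles given in the goal. As a cross-check, the sketch below regenerates the NO2 and H6 geometries; the helper names `bent_triatomic` and `regular_polygon` are hypothetical, and the axis conventions are chosen to match the listed coordinates.

```python
import math


def bent_triatomic(bond, angle_deg):
    """Central atom at the origin, two ligands placed symmetrically in the xz-plane."""
    half = math.radians(angle_deg) / 2.0
    x = bond * math.sin(half)
    z = bond * math.cos(half)
    return [(0.0, 0.0, 0.0), (x, 0.0, z), (-x, 0.0, z)]


def regular_polygon(n, side):
    """Vertices of a regular n-gon with the given side length, in the xy-plane."""
    radius = side / (2.0 * math.sin(math.pi / n))
    return [
        (radius * math.cos(2.0 * math.pi * k / n),
         radius * math.sin(2.0 * math.pi * k / n),
         0.0)
        for k in range(n)
    ]


no2 = bent_triatomic(1.20, 134.0)   # N-O = 1.20 Angstrom, O-N-O = 134 degrees
h6 = regular_polygon(6, 1.45)       # H6 ring, side length 1.45 Angstrom
```

For a regular hexagon the circumradius equals the side length, which is why the first H atom of test-rks-h6-ring sits at x = 1.45.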