Optimization Report — pyscf-davidson

Note

The optimized code summarized in this report was generated by the FermiLink AI agent. Review and validate the code changes yourself before using the modified code in scientific or production work. This optimization reporting feature is experimental and is not a final, mature solution.

Primary metric: Weighted median td kernel wall time (s) (lower is better).

Goal

Copied source goal for this optimization: goal.md

# Optimization Goal

## Package
pyscf

## Language
python

## Target
Optimize the Davidson-style subspace eigensolver used by PySCF TDDFT/TDA, with primary focus on `pyscf/tdscf/_lr_eig.py` and the TD response call sites in `pyscf/tdscf/rhf.py`, `pyscf/tdscf/rks.py`, `pyscf/tdscf/uhf.py`, and `pyscf/tdscf/uks.py`.

Target optimization opportunities include:
- a more efficient preconditioner strategy that reduces the number of Davidson cycles
- lower-cost projected-subspace construction and update in `eigh`, `eig`, and `real_eig`

Do not treat this as a local Python micro-optimization task. The goal is materially faster TDDFT/TDA eigensolver behavior through better Davidson/subspace algorithm choices.
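
As a point of reference for the preconditioner item above, the following is a minimal sketch of the textbook diagonal Davidson preconditioner applied with a separate Ritz value per root. It is not the code in `pyscf/tdscf/_lr_eig.py`; the function name `precondition_residuals`, the `floor` guard, and the sign convention are assumptions made for illustration only.

```python
from typing import List, Sequence


def precondition_residuals(
    residuals: Sequence[Sequence[float]],
    ritz_values: Sequence[float],
    hdiag: Sequence[float],
    floor: float = 1e-8,
) -> List[List[float]]:
    """Scale each residual by the inverse of (theta_k - hdiag), root by root.

    For root k the correction is t_k[i] = r_k[i] / (theta_k - hdiag[i]), so
    every residual uses its own Ritz value theta_k instead of one shared shift.
    """
    corrections = []
    for r_k, theta_k in zip(residuals, ritz_values):
        t_k = []
        for r_i, d_i in zip(r_k, hdiag):
            denom = theta_k - d_i
            # Illustrative guard (hypothetical parameter): keep the sign of the
            # denominator but bound its magnitude away from zero.
            if abs(denom) < floor:
                denom = floor if denom >= 0 else -floor
            t_k.append(r_i / denom)
        corrections.append(t_k)
    return corrections


# Hypothetical toy data: three diagonal elements, two roots.
hdiag = [1.0, 2.0, 4.0]
residuals = [[0.1, 0.2, 0.3], [0.3, 0.2, 0.1]]
ritz_values = [0.5, 1.5]
corrections = precondition_residuals(residuals, ritz_values, hdiag)
```

The overall sign of a correction vector does not matter once it is orthonormalized against the existing subspace, so implementations differ on whether they divide by `theta_k - hdiag` or `hdiag - theta_k`.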

## Editable Scope
- pyscf/tdscf/_lr_eig.py
- pyscf/tdscf/rhf.py
- pyscf/tdscf/rks.py
- pyscf/tdscf/uhf.py
- pyscf/tdscf/uks.py
- pyscf/lib/linalg_helper.py

## Performance Metric
Minimize end-to-end TDDFT/TDA kernel time.

Primary objective should be weighted median total wall-clock time across all benchmark cases. Secondary objective should be lower Davidson iteration count or fewer matrix-vector applications when the benchmark runner can expose those metrics.
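
The report does not spell out how the weighted median is computed, so the sketch below assumes one common convention: the lower weighted median, meaning the smallest value whose cumulative weight reaches half of the total weight. The function name and the sample timings are hypothetical.

```python
from typing import Sequence


def weighted_median(values: Sequence[float], weights: Sequence[float]) -> float:
    """Return the lower weighted median of `values` under `weights`."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and equal in length")
    pairs = sorted(zip(values, weights))
    half = 0.5 * sum(weights)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    return pairs[-1][0]


# Hypothetical per-case wall times (s) with equal weights.
case_times = [3.0, 1.0, 2.0]
case_weights = [1.0, 1.0, 1.0]
metric = weighted_median(case_times, case_weights)
```

With equal weights on an odd number of cases, this reduces to the ordinary median of the per-case times.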

## Correctness Constraints
- Excitation energies absolute delta <= 5e-6 Hartree vs incumbent baseline for every reported root
- Oscillator strengths absolute delta <= 1e-4 for singlet closed-shell cases where the benchmark exposes them
- An exact match of transition dipole moment values is not required, because a gauge change may flip the sign of the transition dipoles
- All requested roots must converge, and root ordering should remain consistent with the incumbent baseline
- Do not loosen SCF `conv_tol`, TD solver `conv_tol`, `lindep`, `max_cycle`, `positive_eig_threshold`, `deg_eia_thresh`, `nstates`, or symmetry filtering
- Do not replace TDDFT with TDA/Casida, reduce the number of roots, change functionals/basis sets, or alter DFT grid settings to gain speed
- No case-specific shortcuts keyed on molecule identity, spin state, functional family, or whether the case is train vs test
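
The numerical tolerances in the list above can be checked with a short helper. This is a sketch under stated assumptions: the name `roots_match` and its argument layout are hypothetical, and comparing roots index by index is used here as a simple stand-in for the root-ordering requirement.

```python
from typing import Optional, Sequence

ENERGY_TOL = 5e-6  # Hartree, from the excitation-energy constraint
OSC_TOL = 1e-4     # from the oscillator-strength constraint


def roots_match(
    e_base: Sequence[float],
    e_new: Sequence[float],
    f_base: Optional[Sequence[float]] = None,
    f_new: Optional[Sequence[float]] = None,
) -> bool:
    """Compare candidate roots against the baseline, root by root."""
    if len(e_base) != len(e_new):
        return False
    if any(abs(a - b) > ENERGY_TOL for a, b in zip(e_base, e_new)):
        return False
    # Oscillator strengths are only compared when both sides expose them.
    if f_base is not None and f_new is not None:
        if len(f_base) != len(f_new):
            return False
        if any(abs(a - b) > OSC_TOL for a, b in zip(f_base, f_new)):
            return False
    return True


# Hypothetical values: the second candidate root differs by 1e-6 Hartree.
ok = roots_match([0.20, 0.25], [0.20, 0.250001])
```

Transition dipole moments are deliberately left out of the comparison, in line with the sign-flip caveat in the constraints.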

## Representative Workloads
- train-rks-bp86-casida-benzene: benzene geometry from `examples/2-benchmark/bz.py` but with a smaller basis / 6-31g / RKS / `xc='b88,p86'` / `CasidaTDDFT` / singlet / `nstates=12`
- train-rks-b3lyp-tddft-benzene: benzene geometry from `examples/2-benchmark/bz.py` but with a smaller basis / 6-31g / RKS / `xc='b3lyp5'` / `TDDFT` / singlet / `nstates=10`
- train-uks-bp86-casida-allyl: allyl radical geometry from `examples/mp/12-dfump2-natorbs.py` but with a smaller basis / def2-svp / spin=1 / UKS / `xc='b88,p86'` / `CasidaTDDFT` / `nstates=8`
- test-rks-bp86-casida-benzene-631gss: benzene geometry from `examples/2-benchmark/bz.py` / 6-31g** / RKS / `xc='b88,p86'` / `CasidaTDDFT` / singlet / `nstates=12`
- test-rks-b3lyp-tddft-benzene-631gss: benzene geometry from `examples/2-benchmark/bz.py` / 6-31g** / RKS / `xc='b3lyp5'` / `TDDFT` / singlet / `nstates=10`
- test-uks-bp86-casida-allyl-def2tzvp: allyl radical geometry from `examples/mp/12-dfump2-natorbs.py` / def2-TZVP / spin=1 / UKS / `xc='b88,p86'` / `CasidaTDDFT` / `nstates=8`

## Build
```bash
export SOURCE_REPO_ROOT="$(cd "$(git rev-parse --git-common-dir)/.." && pwd)"
export VENV="/anvil/scratch/x-tli22/fermilink_optimize/project_pyscf/venvs/fermilink-optimize/pyscf-davidson"
source "$VENV/bin/activate"
module remove cmake
cd pyscf/lib
mkdir -p build
cd build
cmake ..
cmake --build . -j4
cd ../../../
python -m pip install -e .
```

## Notes
- Base the benchmark setups on the larger single-machine geometries already shipped in the local PySCF tree:
  - benzene from `examples/2-benchmark/bz.py`
  - allyl radical from `examples/mp/12-dfump2-natorbs.py`
- Prefer a smaller number of materially larger cases over many toy test cases, so the benchmark is dominated by Davidson/subspace work rather than Python overhead or SCF startup noise.
- For DFT cases, mirror the upstream test setup with `dft.radi.ATOM_SPECIFIC_TREUTLER_GRIDS = False` and `mf.grids.prune = None` so the benchmark is dominated by TDDFT/TDA solver behavior instead of grid-noise differences.
- Keep benchmark behavior deterministic across repeated runs.
- If the benchmark runner can expose them, record per-case Davidson iteration count, matrix-vector application count, and total TD kernel wall time.
- Keep all workloads runnable on a single workstation-class machine with BLAS thread counts pinned to 1; prefer increasing molecular size or `nstates` only until TD kernel time clearly dominates SCF time.
- In the generated benchmark YAML, include a top-level split block:
  ```yaml
  split:
    train_case_ids:
      - train-rks-bp86-casida-benzene
      - train-rks-b3lyp-tddft-benzene
      - train-uks-bp86-casida-allyl
  ```
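
The thread-pinning note above can be approximated by setting the usual OpenMP/BLAS environment variables before any numerical library is imported. This is a minimal sketch; the helper name `pin_blas_threads` is hypothetical, and which variables actually take effect depends on the BLAS backend the local build links against.

```python
import os

# Environment variables read by OpenMP, OpenBLAS, and MKL at load time.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def pin_blas_threads(n: int = 1) -> dict:
    """Set the thread-count variables; call this before importing numpy or pyscf."""
    for name in THREAD_ENV_VARS:
        os.environ[name] = str(n)
    return {name: os.environ[name] for name in THREAD_ENV_VARS}


pinned = pin_blas_threads(1)
```

Setting these after the numerical libraries are loaded generally has no effect, which is why the call belongs at the very top of a benchmark script.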

Summary

Optimization Trajectory

[Figure: metric vs. iteration, with the running incumbent]

All iterations

| iter | commit | status | metric (s) | summary |
|---|---|---|---|---|
| 0 | 44c83aaae41f | baseline | 128.753 | baseline |
| 1 | b93246c4bf06 | accepted | 99.5699 | Use root-specific Ritz values for LR Davidson residual preconditioning, including vectorized shif… |
| 2 | e159c73ee967 | accepted | 58.7037 | Limit real TDDFT Davidson expansion in `_lr_eig.real_eig` to requested roots to avoid non-target … |
| 3 | 7b1eeb5571f7 | rejected | 58.6193 | Limit symmetric `_lr_eig.eigh` Davidson trial-vector expansion to requested roots while keeping r… |
| 4 | f46de8e0dbb1 | rejected | 59.1828 | Cap symmetric LR Davidson expansion to requested roots and add a 1e-3 default TD preconditioner l… |
| 5 | ccc6bedc3559 | accepted | 45.1467 | Correct real TDDFT Davidson preconditioning to pass the full lower LR residual block by using `-R… |
| 6 | 7612d6ba4e26 | accepted | 37.4126 | Cap symmetric `_lr_eig.eigh` Davidson expansion to requested roots to avoid non-target Casida res… |
| 7 | c1e86bf88a7c | accepted | 31.5845 | Limit symmetric `_lr_eig.eigh` residual and preconditioner candidate generation to requested targ… |
| 8 | 3c057cbab323 | rejected | 31.47 | Pass occupied-only MO coefficient/occupation arrays into DFT TD response kernel cache setup for r… |
| 9 | caffdfe9f92b | rejected | 31.5343 | Add a configurable 0.02 Hartree spectral shift to `_lr_eig.real_eig` correction preconditioning t… |
| 10 | 64da7449ef1c | accepted | 30.0181 | Add a 0.05 Hartree real_eig correction preconditioner spectral shift to reduce late-cycle B3LYP T… |
| 11 | 110904bf4af3 | rejected | 29.8953 | Reduce DFT TD response setup/allocation overhead by using occupied-only response cache inputs and… |
| 12 | 701a4b52c9e7 | rejected | 29.9925 | Stable-sort threshold-selected RHF/UHF Koopmans TD initial guesses by increasing diagonal gap bef… |
| 13 | c2b4290991d4 | rejected | 36.0526 | Add configurable LR correction preconditioner shifts in `_lr_eig.py`: a small `+1e-3` shift for s… |
| 14 | 933fdb1739c3 | rejected | 29.9178 | Reduce DFT TD response setup/allocation overhead by using occupied-only response-cache inputs, fu… |
| 15 | c046c9ec1362 | rejected | 29.968 | Vectorize symmetric `_lr_eig.eigh` correction preconditioning for unconverged target residuals in… |
| 16 | 2a188b646cff | rejected | 30.5035 | Taper the accepted real_eig correction preconditioner shift downward for late-stage residuals bel… |
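
The running incumbent can be recomputed from the iteration listing above. The sketch below assumes that the incumbent advances only on rows marked `baseline` or `accepted`, which the report implies but does not state. The status and metric values are copied from the listing; the function name is hypothetical.

```python
from typing import List, Sequence, Tuple

# (iteration, status, metric in seconds), copied from the iteration listing.
ROWS: List[Tuple[int, str, float]] = [
    (0, "baseline", 128.753),
    (1, "accepted", 99.5699),
    (2, "accepted", 58.7037),
    (3, "rejected", 58.6193),
    (4, "rejected", 59.1828),
    (5, "accepted", 45.1467),
    (6, "accepted", 37.4126),
    (7, "accepted", 31.5845),
    (8, "rejected", 31.47),
    (9, "rejected", 31.5343),
    (10, "accepted", 30.0181),
    (11, "rejected", 29.8953),
    (12, "rejected", 29.9925),
    (13, "rejected", 36.0526),
    (14, "rejected", 29.9178),
    (15, "rejected", 29.968),
    (16, "rejected", 30.5035),
]


def running_incumbent(rows: Sequence[Tuple[int, str, float]]) -> List[float]:
    """Carry forward the metric of the most recent baseline/accepted row."""
    best = None
    incumbents = []
    for _, status, metric in rows:
        if status in ("baseline", "accepted"):
            best = metric
        incumbents.append(best)
    return incumbents


incumbents = running_incumbent(ROWS)
speedup = ROWS[0][2] / incumbents[-1]  # baseline metric over final incumbent
```

Under that assumption the final incumbent is the iteration 10 commit, and the baseline-to-incumbent ratio is roughly 4.3.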

Accepted Commits

Accepted candidate detail pages and current manual-review status:

| accepted commit | Human verification |
|---|---|
| b93246c4bf06 | not verified |
| e159c73ee967 | not verified |
| ccc6bedc3559 | not verified |
| 7612d6ba4e26 | not verified |
| c1e86bf88a7c | not verified |
| 64da7449ef1c | not verified |

Benchmark Contracts

Necessary files to reproduce the FermiLink optimization results:

Runtime Data

FermiLink runtime data for accepted/rejected commits.

Rerun Guide

Agent provider codex; model gpt-5.4-xhigh

Use the bundled contract files from this report to recreate the optimization against a fresh upstream checkout.

  • default upstream clone: git@github.com:skilled-scipkg/pyscf.git

  • confirm the upstream default branch before creating the worktree: master on GitHub

  • detected package language: python; use fermilink-optimize-python for goal-mode reruns

  • if goal_inputs.json is present, restage the listed auxiliary workload files before rerunning

git clone git@github.com:skilled-scipkg/pyscf.git
cd pyscf
git worktree add -b fermilink-optimize/pyscf-<modified-feature> ../pyscf-<modified-feature> master

Path 1: Rerun from goal.md

Rerun from the bundled goal.md.

Note

Tune the copied ## Build section in goal.md before rerunning. Update environment activation, module loads, compiler paths, install prefixes, and other machine-specific setup so FermiLink builds the package correctly.

export SOURCE_REPO_ROOT="$(cd "$(git rev-parse --git-common-dir)/.." && pwd)"
export VENV="/anvil/scratch/x-tli22/fermilink_optimize/project_pyscf/venvs/fermilink-optimize/pyscf-davidson"
source "$VENV/bin/activate"
module remove cmake
cd pyscf/lib
mkdir -p build
cd build
cmake ..
cmake --build . -j4
cd ../../../
python -m pip install -e .

Run this from the cloned main repo so the launcher can create or reuse the sibling worktree:

fermilink-optimize-python \
  --project-root "$PWD" \
  --goal /path/to/report/contract/goal.md \
  --branch fermilink-optimize/pyscf-<modified-feature> \
  --worktree-root .. \
  --worktree-name pyscf-<modified-feature>

Path 2: More deterministic rerun from benchmark.yaml

Rerun from the copied benchmark.yaml and benchmark_runner.py. FermiLink generates these files from goal.md, and together they form the deterministic benchmark contract that the agent must follow during optimization iterations. The iterations themselves do not rely on goal.md directly.

This avoids regenerating the benchmark contract from goal.md before the campaign starts:

Note

Inspect benchmark.yaml before rerunning. Update runtime.pre_commands for machine-specific build/setup steps, and verify that runtime.command paths point at files that exist in the new worktree.

cd ../pyscf-<modified-feature>
mkdir -p .fermilink-optimize/autogen
cp /path/to/report/contract/benchmark.yaml .fermilink-optimize/autogen/benchmark.yaml
cp /path/to/report/contract/benchmark_runner.py .fermilink-optimize/autogen/benchmark_runner.py
printf '%s\n' '.fermilink-optimize/' >> .git/info/exclude
fermilink optimize pyscf "$PWD" \
  --benchmark "$PWD/.fermilink-optimize/autogen/benchmark.yaml" \
  --skills-source existing

Benchmark Examples

Worker iterations run the train-* benchmark cases below while searching for candidate changes:

cases:
- id: train-rks-bp86-casida-benzene
  weight: 1.0
  geometry_name: benzene
  geometry_source: examples/2-benchmark/bz.py
  basis: 6-31g
  charge: 0
  spin: 0
  symmetry: false
  scf_method: RKS
  xc: b88,p86
  td_method: CasidaTDDFT
  nstates: 12
  singlet: true
  frozen: null
  wfnsym: null
  scf_conv_tol: 1.0e-10
  td_conv_tol: 1.0e-05
  lindep: 1.0e-12
  max_cycle: 100
  positive_eig_threshold: 0.001
  deg_eia_thresh: 0.001
  max_memory: 4000
  oscillator_strength: true
- id: train-rks-b3lyp-tddft-benzene
  weight: 1.0
  geometry_name: benzene
  geometry_source: examples/2-benchmark/bz.py
  basis: 6-31g
  charge: 0
  spin: 0
  symmetry: false
  scf_method: RKS
  xc: b3lyp5
  td_method: TDDFT
  nstates: 10
  singlet: true
  frozen: null
  wfnsym: null
  scf_conv_tol: 1.0e-10
  td_conv_tol: 1.0e-05
  lindep: 1.0e-12
  max_cycle: 100
  positive_eig_threshold: 0.001
  deg_eia_thresh: 0.001
  max_memory: 4000
  oscillator_strength: true
- id: train-uks-bp86-casida-allyl
  weight: 1.0
  geometry_name: allyl
  geometry_source: examples/mp/12-dfump2-natorbs.py
  basis: def2-svp
  charge: 0
  spin: 1
  symmetry: false
  scf_method: UKS
  xc: b88,p86
  td_method: CasidaTDDFT
  nstates: 8
  singlet: null
  frozen: null
  wfnsym: null
  scf_conv_tol: 1.0e-10
  td_conv_tol: 1.0e-05
  lindep: 1.0e-12
  max_cycle: 100
  positive_eig_threshold: 0.001
  deg_eia_thresh: 0.001
  max_memory: 4000
  oscillator_strength: false

Controller reviews run the test-* benchmark cases below to validate accepted candidates:

cases:
- id: test-rks-bp86-casida-benzene-631gss
  weight: 1.0
  geometry_name: benzene
  geometry_source: examples/2-benchmark/bz.py
  basis: 6-31g**
  charge: 0
  spin: 0
  symmetry: false
  scf_method: RKS
  xc: b88,p86
  td_method: CasidaTDDFT
  nstates: 12
  singlet: true
  frozen: null
  wfnsym: null
  scf_conv_tol: 1.0e-10
  td_conv_tol: 1.0e-05
  lindep: 1.0e-12
  max_cycle: 100
  positive_eig_threshold: 0.001
  deg_eia_thresh: 0.001
  max_memory: 4000
  oscillator_strength: true
- id: test-rks-b3lyp-tddft-benzene-631gss
  weight: 1.0
  geometry_name: benzene
  geometry_source: examples/2-benchmark/bz.py
  basis: 6-31g**
  charge: 0
  spin: 0
  symmetry: false
  scf_method: RKS
  xc: b3lyp5
  td_method: TDDFT
  nstates: 10
  singlet: true
  frozen: null
  wfnsym: null
  scf_conv_tol: 1.0e-10
  td_conv_tol: 1.0e-05
  lindep: 1.0e-12
  max_cycle: 100
  positive_eig_threshold: 0.001
  deg_eia_thresh: 0.001
  max_memory: 4000
  oscillator_strength: true
- id: test-uks-bp86-casida-allyl-def2tzvp
  weight: 1.0
  geometry_name: allyl
  geometry_source: examples/mp/12-dfump2-natorbs.py
  basis: def2-TZVP
  charge: 0
  spin: 1
  symmetry: false
  scf_method: UKS
  xc: b88,p86
  td_method: CasidaTDDFT
  nstates: 8
  singlet: null
  frozen: null
  wfnsym: null
  scf_conv_tol: 1.0e-10
  td_conv_tol: 1.0e-05
  lindep: 1.0e-12
  max_cycle: 100
  positive_eig_threshold: 0.001
  deg_eia_thresh: 0.001
  max_memory: 4000
  oscillator_strength: false