Optimization Report: pyscf-davidson

Note

The optimized code summarized in this report was generated by the FermiLink AI agent. Review and validate the code changes yourself before using the modified code in scientific or production work. This optimization reporting feature is experimental and is not a final, mature solution.

Primary metric: Weighted median td kernel wall time (s) (lower is better).
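
As a rough illustration of the metric, a weighted median over per-case wall times can be computed as below. This is a minimal sketch under one common definition (the smallest value at which the cumulative weight reaches half of the total weight). The exact convention used by the FermiLink benchmark runner is not stated in this report, and the function name and sample timings are hypothetical.

```python
def weighted_median(values, weights):
    """Return the smallest value whose cumulative weight reaches half the total."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and of equal length")
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    return pairs[-1][0]  # only reached through floating-point round-off


# Hypothetical per-case TD kernel wall times in seconds, equal weights.
times = [12.0, 30.0, 45.0]
weights = [1.0, 1.0, 1.0]
print(weighted_median(times, weights))
```

With equal weights and an odd number of cases, this definition reduces to the ordinary median.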

Goal

Copied source goal for this optimization: goal.md

# Optimization Goal

## Package
pyscf

## Language
python

## Target
Optimize the Davidson-style subspace eigensolver used by PySCF TDDFT/TDA, with primary focus on `pyscf/tdscf/_lr_eig.py` and the TD response call sites in `pyscf/tdscf/rhf.py`, `pyscf/tdscf/rks.py`, `pyscf/tdscf/uhf.py`, and `pyscf/tdscf/uks.py`.

Target optimization opportunities include:
- a more efficient preconditioner strategy to reduce the number of Davidson cycles
- lower-cost projected-subspace construction and update in `eigh`, `eig`, and `real_eig`

Do not treat this as a local Python micro-optimization task. The goal is materially faster TDDFT/TDA eigensolver behavior through better Davidson/subspace algorithm choices.

## Editable Scope
- pyscf/tdscf/_lr_eig.py
- pyscf/tdscf/rhf.py
- pyscf/tdscf/rks.py
- pyscf/tdscf/uhf.py
- pyscf/tdscf/uks.py
- pyscf/lib/linalg_helper.py

## Performance Metric
Minimize end-to-end TDDFT/TDA kernel time.

Primary objective should be weighted median total wall-clock time across all benchmark cases. Secondary objective should be lower Davidson iteration count or fewer matrix-vector applications when the benchmark runner can expose those metrics.

## Correctness Constraints
- Excitation energies absolute delta <= 5e-6 Hartree vs incumbent baseline for every reported root
- Oscillator strengths absolute delta <= 1e-4 for singlet closed-shell cases where the benchmark exposes them
- An exact match of transition dipole moment values is not required, because a gauge (phase) change may flip the sign of the transition dipoles
- All requested roots must converge, and root ordering should remain consistent with the incumbent baseline
- Do not loosen SCF `conv_tol`, TD solver `conv_tol`, `lindep`, `max_cycle`, `positive_eig_threshold`, `deg_eia_thresh`, `nstates`, or symmetry filtering
- Do not replace TDDFT with TDA/Casida, reduce the number of roots, change functionals/basis sets, or alter DFT grid settings to gain speed
- No case-specific shortcuts keyed on molecule identity, spin state, functional family, or whether the case is train vs test

## Representative Workloads
- train-rks-bp86-casida-benzene: benzene geometry from `examples/2-benchmark/bz.py` but with smaller basis / 6-31g / RKS / `xc='b88,p86'` / `CasidaTDDFT` / singlet / `nstates=12`
- train-rks-b3lyp-tddft-benzene: benzene geometry from `examples/2-benchmark/bz.py` but with smaller basis / 6-31g / RKS / `xc='b3lyp5'` / `TDDFT` / singlet / `nstates=10`
- train-uks-bp86-casida-allyl: allyl radical geometry from `examples/mp/12-dfump2-natorbs.py` but with smaller basis / def2-svp / spin=1 / UKS / `xc='b88,p86'` / `CasidaTDDFT` / `nstates=8`
- test-rks-bp86-casida-benzene-631gss: benzene geometry from `examples/2-benchmark/bz.py` / 6-31g** / RKS / `xc='b88,p86'` / `CasidaTDDFT` / singlet / `nstates=12`
- test-rks-b3lyp-tddft-benzene-631gss: benzene geometry from `examples/2-benchmark/bz.py` / 6-31g** / RKS / `xc='b3lyp5'` / `TDDFT` / singlet / `nstates=10`
- test-uks-bp86-casida-allyl-def2tzvp: allyl radical geometry from `examples/mp/12-dfump2-natorbs.py` / def2-TZVP / spin=1 / UKS / `xc='b88,p86'` / `CasidaTDDFT` / `nstates=8`

## Build
```bash
export SOURCE_REPO_ROOT="$(cd "$(git rev-parse --git-common-dir)/.." && pwd)"
export VENV="/anvil/scratch/x-tli22/fermilink_optimize/project_pyscf/venvs/fermilink-optimize/pyscf-davidson"
source "$VENV/bin/activate"
module remove cmake
cd pyscf/lib
mkdir -p build
cd build
cmake ..
cmake --build . -j4
cd ../../../
python -m pip install -e .
```

## Notes
- Base the benchmark setups on the larger single-machine geometries already shipped in the local PySCF tree:
  - benzene from `examples/2-benchmark/bz.py`
  - allyl radical from `examples/mp/12-dfump2-natorbs.py`
- Prefer a smaller number of materially larger cases over many toy test cases, so the benchmark is dominated by Davidson/subspace work rather than Python overhead or SCF startup noise.
- For DFT cases, mirror the upstream test setup with `dft.radi.ATOM_SPECIFIC_TREUTLER_GRIDS = False` and `mf.grids.prune = None` so the benchmark is dominated by TDDFT/TDA solver behavior instead of grid-noise differences.
- Keep benchmark behavior deterministic across repeated runs.
- If the benchmark runner can expose them, record per-case Davidson iteration count, matrix-vector application count, and total TD kernel wall time.
- Keep all workloads runnable on a single workstation-class machine with BLAS thread counts pinned to 1; prefer increasing molecular size or `nstates` only until TD kernel time clearly dominates SCF time.
- In the generated benchmark YAML, include a top-level split block:
  ```yaml
  split:
    train_case_ids:
      - train-rks-bp86-casida-benzene
      - train-rks-b3lyp-tddft-benzene
      - train-uks-bp86-casida-allyl
  ```
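
The single-thread requirement in the notes above can be met in several ways. The sketch below sets the usual thread-count environment variables from Python. Which variable actually takes effect depends on the BLAS/OpenMP runtime that PySCF was built against, so treat the variable list as an assumption to verify locally; the helper name is hypothetical.

```python
import os

# Commonly honored thread-count variables (assumption: at least one of these
# applies to the BLAS/OpenMP runtime in use on the benchmark machine).
THREAD_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")


def pin_single_thread(env=None):
    """Set each thread-count variable to "1" in env (defaults to os.environ)."""
    env = os.environ if env is None else env
    for name in THREAD_VARS:
        env[name] = "1"
    return env


# Apply to the current process and show the resulting settings.
pin_single_thread()
print({name: os.environ[name] for name in THREAD_VARS})
```

These variables are typically read when the numerical libraries are loaded, so the call belongs before `numpy` or `pyscf` is imported. Exporting the same variables in the shell before launching the benchmark achieves the same effect.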

Summary

baseline (44c83aaae41f): 128.753
best accepted (64da7449ef1c): 30.0181 (76.69% lower than baseline)
published GitHub branch: fermilink-optimize/pyscf-davidson
iterations: 17 total (including the baseline) | 6 accepted | 10 rejected | 0 correctness failures
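
The correctness constraints in the goal bound excitation energies to within 5e-6 Hartree and oscillator strengths to within 1e-4 of the incumbent baseline, with root ordering preserved. A per-case check along those lines might look like the sketch below. The function name and sample values are hypothetical, and how the FermiLink benchmark runner actually performs the comparison is not shown in this report.

```python
ENERGY_TOL = 5e-6  # Hartree, per reported root (from the goal)
OSC_TOL = 1e-4     # oscillator strength, where the benchmark exposes it (from the goal)


def check_case(base_e, cand_e, base_f=None, cand_f=None):
    """Return a list of violations; an empty list means the case passes."""
    problems = []
    if len(base_e) != len(cand_e):
        return ["root count differs"]
    # Comparing index by index also catches a change in root ordering.
    for i, (b, c) in enumerate(zip(base_e, cand_e)):
        if abs(b - c) > ENERGY_TOL:
            problems.append(f"root {i}: energy delta {abs(b - c):.2e} Ha")
    if base_f is not None and cand_f is not None:
        for i, (b, c) in enumerate(zip(base_f, cand_f)):
            if abs(b - c) > OSC_TOL:
                problems.append(f"root {i}: oscillator strength delta {abs(b - c):.2e}")
    return problems


# Made-up excitation energies (Hartree) for illustration only.
baseline = [0.180000, 0.220000]
candidate = [0.180002, 0.220020]
print(check_case(baseline, candidate))
```

In this made-up example the first root is within tolerance and the second is not, so the list contains a single entry.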

Optimization Trajectory

All iterations

iter	commit	status	metric	summary
0	44c83aaae41f	baseline	128.753	baseline
1	b93246c4bf06	accepted	99.5699	Use root-specific Ritz values for LR Davidson residual preconditioning, including vectorized shif…
2	e159c73ee967	accepted	58.7037	Limit real TDDFT Davidson expansion in `_lr_eig.real_eig` to requested roots to avoid non-target …
3	7b1eeb5571f7	rejected	58.6193	Limit symmetric `_lr_eig.eigh` Davidson trial-vector expansion to requested roots while keeping r…
4	f46de8e0dbb1	rejected	59.1828	Cap symmetric LR Davidson expansion to requested roots and add a 1e-3 default TD preconditioner l…
5	ccc6bedc3559	accepted	45.1467	Correct real TDDFT Davidson preconditioning to pass the full lower LR residual block by using `-R…
6	7612d6ba4e26	accepted	37.4126	Cap symmetric `_lr_eig.eigh` Davidson expansion to requested roots to avoid non-target Casida res…
7	c1e86bf88a7c	accepted	31.5845	Limit symmetric `_lr_eig.eigh` residual and preconditioner candidate generation to requested targ…
8	3c057cbab323	rejected	31.47	Pass occupied-only MO coefficient/occupation arrays into DFT TD response kernel cache setup for r…
9	caffdfe9f92b	rejected	31.5343	Add a configurable 0.02 Hartree spectral shift to `_lr_eig.real_eig` correction preconditioning t…
10	64da7449ef1c	accepted	30.0181	Add a 0.05 Hartree real_eig correction preconditioner spectral shift to reduce late-cycle B3LYP T…
11	110904bf4af3	rejected	29.8953	Reduce DFT TD response setup/allocation overhead by using occupied-only response cache inputs and…
12	701a4b52c9e7	rejected	29.9925	Stable-sort threshold-selected RHF/UHF Koopmans TD initial guesses by increasing diagonal gap bef…
13	c2b4290991d4	rejected	36.0526	Add configurable LR correction preconditioner shifts in `_lr_eig.py`: a small `+1e-3` shift for s…
14	933fdb1739c3	rejected	29.9178	Reduce DFT TD response setup/allocation overhead by using occupied-only response-cache inputs, fu…
15	c046c9ec1362	rejected	29.968	Vectorize symmetric `_lr_eig.eigh` correction preconditioning for unconverged target residuals in…
16	2a188b646cff	rejected	30.5035	Taper the accepted real_eig correction preconditioner shift downward for late-stage residuals bel…
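
The summary figures can be re-derived from the table above. The sketch below copies the iteration number, status, and metric of each row, tallies the statuses, and computes the reduction of the best accepted metric relative to the baseline. The variable names are illustrative only.

```python
# (iter, status, metric) copied from the trajectory table.
ROWS = [
    (0, "baseline", 128.753), (1, "accepted", 99.5699), (2, "accepted", 58.7037),
    (3, "rejected", 58.6193), (4, "rejected", 59.1828), (5, "accepted", 45.1467),
    (6, "accepted", 37.4126), (7, "accepted", 31.5845), (8, "rejected", 31.47),
    (9, "rejected", 31.5343), (10, "accepted", 30.0181), (11, "rejected", 29.8953),
    (12, "rejected", 29.9925), (13, "rejected", 36.0526), (14, "rejected", 29.9178),
    (15, "rejected", 29.968), (16, "rejected", 30.5035),
]

# Tally how many rows carry each status.
counts = {}
for _iter, status, _metric in ROWS:
    counts[status] = counts.get(status, 0) + 1

baseline = ROWS[0][2]
accepted = [(i, m) for i, s, m in ROWS if s == "accepted"]
best_iter, best = min(accepted, key=lambda pair: pair[1])

# Lower is better, so the improvement is a relative reduction.
reduction_pct = 100.0 * (baseline - best) / baseline
speedup = baseline / best

print(counts)
print(f"best accepted: iter {best_iter}, {reduction_pct:.2f}% lower, {speedup:.2f}x faster")

# Gain of each accepted commit over the previous incumbent.
incumbent = baseline
for i, metric in accepted:
    print(f"iter {i}: {incumbent:.4f} -> {metric:.4f}")
    incumbent = metric
```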

Accepted Commits

Accepted candidate detail pages and current manual-review status:

accepted commit	Human verification
b93246c4bf06	not verified
e159c73ee967	not verified
ccc6bedc3559	not verified
7612d6ba4e26	not verified
c1e86bf88a7c	not verified
64da7449ef1c	not verified
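
Several of the accepted changes (iterations 1, 5, and 10 in the trajectory table) adjust how the Davidson residual is preconditioned. For readers unfamiliar with that step, a textbook diagonal preconditioner that uses a root-specific Ritz value is sketched below. It is a generic illustration and not the code from these commits; the function name, the `floor` guard, and the sample numbers are assumptions, and sign conventions differ between implementations.

```python
def diagonal_precondition(residual, diag, ritz_value, floor=1e-8):
    """Return residual / (ritz_value - diag) elementwise, guarding small denominators."""
    correction = []
    for r, d in zip(residual, diag):
        denom = ritz_value - d
        if abs(denom) < floor:
            # Keep the sign but avoid dividing by a (nearly) zero denominator.
            denom = floor if denom >= 0 else -floor
        correction.append(r / denom)
    return correction


# Hypothetical diagonal (for example orbital-energy differences) and
# residual vectors keyed by the Ritz value of the root they belong to.
diag = [0.30, 0.45, 0.80]
residuals = {0.28: [1e-3, 2e-3, 1e-3], 0.41: [2e-3, 1e-3, 3e-3]}

# Each root is preconditioned with its own Ritz value instead of a shared one.
for theta, r in residuals.items():
    print(theta, diagonal_precondition(r, diag, theta))
```

Using a separate Ritz value per root makes each correction vector target its own eigenpair, which is the general idea behind root-specific preconditioning.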

Benchmark Contracts

Necessary files to reproduce the FermiLink optimization results:

Runtime Data

FermiLink runtime data for accepted/rejected commits.

Rerun Guide

Agent provider: codex; model: gpt-5.4-xhigh

Use the bundled contract files from this report to recreate the optimization against a fresh upstream checkout.

default upstream clone: git@github.com:skilled-scipkg/pyscf.git
confirm the upstream default branch before creating the worktree: master on GitHub
detected package language: python; use fermilink-optimize-python for goal-mode reruns
if goal_inputs.json is present, restage the listed auxiliary workload files before rerunning

git clone git@github.com:skilled-scipkg/pyscf.git
cd pyscf
git worktree add -b fermilink-optimize/pyscf-<modified-feature> ../pyscf-<modified-feature> master

Path 1: Rerun from goal.md

Rerun from the bundled goal.md.

Note

Tune the copied ## Build section in goal.md before rerunning. Update environment activation, module loads, compiler paths, install prefixes, and other machine-specific setup so FermiLink builds the package correctly.

export SOURCE_REPO_ROOT="$(cd "$(git rev-parse --git-common-dir)/.." && pwd)"
export VENV="/anvil/scratch/x-tli22/fermilink_optimize/project_pyscf/venvs/fermilink-optimize/pyscf-davidson"
source "$VENV/bin/activate"
module remove cmake
cd pyscf/lib
mkdir -p build
cd build
cmake ..
cmake --build . -j4
cd ../../../
python -m pip install -e .

Run this from the cloned main repo so the launcher can create or reuse the sibling worktree:

fermilink-optimize-python \
  --project-root "$PWD" \
  --goal /path/to/report/contract/goal.md \
  --branch fermilink-optimize/pyscf-<modified-feature> \
  --worktree-root .. \
  --worktree-name pyscf-<modified-feature>

Path 2: More deterministic rerun from benchmark.yaml

Rerun from the copied benchmark.yaml and benchmark_runner.py. FermiLink generates these files from goal.md, and they serve as the deterministic benchmark contract that the agent must follow during optimization iterations. The iterations themselves do not rely on goal.md directly.

This avoids regenerating the benchmark contract from goal.md before the campaign starts:

Note

Inspect benchmark.yaml before rerunning. Update runtime.pre_commands for machine-specific build/setup steps, and verify that runtime.command paths point at files that exist in the new worktree.

cd ../pyscf-<modified-feature>
mkdir -p .fermilink-optimize/autogen
cp /path/to/report/contract/benchmark.yaml .fermilink-optimize/autogen/benchmark.yaml
cp /path/to/report/contract/benchmark_runner.py .fermilink-optimize/autogen/benchmark_runner.py
printf '%s\n' '.fermilink-optimize/' >> .git/info/exclude
fermilink optimize pyscf "$PWD" \
  --benchmark "$PWD/.fermilink-optimize/autogen/benchmark.yaml" \
  --skills-source existing

Benchmark Examples

Worker iterations run the train-* benchmark cases below while searching for candidate changes:

cases:
- id: train-rks-bp86-casida-benzene
  weight: 1.0
  geometry_name: benzene
  geometry_source: examples/2-benchmark/bz.py
  basis: 6-31g
  charge: 0
  spin: 0
  symmetry: false
  scf_method: RKS
  xc: b88,p86
  td_method: CasidaTDDFT
  nstates: 12
  singlet: true
  frozen: null
  wfnsym: null
  scf_conv_tol: 1.0e-10
  td_conv_tol: 1.0e-05
  lindep: 1.0e-12
  max_cycle: 100
  positive_eig_threshold: 0.001
  deg_eia_thresh: 0.001
  max_memory: 4000
  oscillator_strength: true
- id: train-rks-b3lyp-tddft-benzene
  weight: 1.0
  geometry_name: benzene
  geometry_source: examples/2-benchmark/bz.py
  basis: 6-31g
  charge: 0
  spin: 0
  symmetry: false
  scf_method: RKS
  xc: b3lyp5
  td_method: TDDFT
  nstates: 10
  singlet: true
  frozen: null
  wfnsym: null
  scf_conv_tol: 1.0e-10
  td_conv_tol: 1.0e-05
  lindep: 1.0e-12
  max_cycle: 100
  positive_eig_threshold: 0.001
  deg_eia_thresh: 0.001
  max_memory: 4000
  oscillator_strength: true
- id: train-uks-bp86-casida-allyl
  weight: 1.0
  geometry_name: allyl
  geometry_source: examples/mp/12-dfump2-natorbs.py
  basis: def2-svp
  charge: 0
  spin: 1
  symmetry: false
  scf_method: UKS
  xc: b88,p86
  td_method: CasidaTDDFT
  nstates: 8
  singlet: null
  frozen: null
  wfnsym: null
  scf_conv_tol: 1.0e-10
  td_conv_tol: 1.0e-05
  lindep: 1.0e-12
  max_cycle: 100
  positive_eig_threshold: 0.001
  deg_eia_thresh: 0.001
  max_memory: 4000
  oscillator_strength: false

Controller reviews run the test-* benchmark cases below to validate accepted candidates:

cases:
- id: test-rks-bp86-casida-benzene-631gss
  weight: 1.0
  geometry_name: benzene
  geometry_source: examples/2-benchmark/bz.py
  basis: 6-31g**
  charge: 0
  spin: 0
  symmetry: false
  scf_method: RKS
  xc: b88,p86
  td_method: CasidaTDDFT
  nstates: 12
  singlet: true
  frozen: null
  wfnsym: null
  scf_conv_tol: 1.0e-10
  td_conv_tol: 1.0e-05
  lindep: 1.0e-12
  max_cycle: 100
  positive_eig_threshold: 0.001
  deg_eia_thresh: 0.001
  max_memory: 4000
  oscillator_strength: true
- id: test-rks-b3lyp-tddft-benzene-631gss
  weight: 1.0
  geometry_name: benzene
  geometry_source: examples/2-benchmark/bz.py
  basis: 6-31g**
  charge: 0
  spin: 0
  symmetry: false
  scf_method: RKS
  xc: b3lyp5
  td_method: TDDFT
  nstates: 10
  singlet: true
  frozen: null
  wfnsym: null
  scf_conv_tol: 1.0e-10
  td_conv_tol: 1.0e-05
  lindep: 1.0e-12
  max_cycle: 100
  positive_eig_threshold: 0.001
  deg_eia_thresh: 0.001
  max_memory: 4000
  oscillator_strength: true
- id: test-uks-bp86-casida-allyl-def2tzvp
  weight: 1.0
  geometry_name: allyl
  geometry_source: examples/mp/12-dfump2-natorbs.py
  basis: def2-TZVP
  charge: 0
  spin: 1
  symmetry: false
  scf_method: UKS
  xc: b88,p86
  td_method: CasidaTDDFT
  nstates: 8
  singlet: null
  frozen: null
  wfnsym: null
  scf_conv_tol: 1.0e-10
  td_conv_tol: 1.0e-05
  lindep: 1.0e-12
  max_cycle: 100
  positive_eig_threshold: 0.001
  deg_eia_thresh: 0.001
  max_memory: 4000
  oscillator_strength: false
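
Each test case in the listings above mirrors a training case but uses a larger basis set. One quick way to confirm that nothing else changed between a pair is to diff the two mappings, as in the sketch below. The helper name is hypothetical and the dictionaries are abbreviated copies of two cases, not the full entries.

```python
def case_diff(train, test):
    """Return {key: (train_value, test_value)} for every key whose values differ."""
    keys = sorted(set(train) | set(test))
    return {k: (train.get(k), test.get(k)) for k in keys if train.get(k) != test.get(k)}


# Abbreviated copies of one train/test pair from the benchmark listings.
train_case = {
    "id": "train-rks-bp86-casida-benzene", "basis": "6-31g", "scf_method": "RKS",
    "xc": "b88,p86", "td_method": "CasidaTDDFT", "nstates": 12,
    "td_conv_tol": 1.0e-05, "lindep": 1.0e-12, "max_cycle": 100,
}
test_case = {
    "id": "test-rks-bp86-casida-benzene-631gss", "basis": "6-31g**", "scf_method": "RKS",
    "xc": "b88,p86", "td_method": "CasidaTDDFT", "nstates": 12,
    "td_conv_tol": 1.0e-05, "lindep": 1.0e-12, "max_cycle": 100,
}

# Only the case id and the basis set should show up as differences.
print(case_diff(train_case, test_case))
```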