Optimization Report: pyscf-casscf

Note

The optimized code summarized in this report was generated by the FermiLink AI agent. Review and validate the code changes yourself before using the modified code in scientific or production work. This optimization reporting feature is experimental and is not a final, mature solution.

Primary metric: weighted median CASSCF kernel time in seconds (lower is better).

Goal

Copied source goal for this optimization: goal.md

# Optimization Goal

## Package
pyscf

## Language
python

## Target
Optimize PySCF CASSCF convergence behavior and end-to-end runtime, with primary focus on the matrix-free second-order / CIAH solver in `pyscf/mcscf/newton_casscf.py` and the one-step / two-step macro-micro iteration control flow in `pyscf/mcscf/mc1step.py`.

Prioritize algorithmic and numerical improvements that reduce wasted iterations without changing the scientific target:
- more stable augmented-Hessian / CIAH preconditioning using inexpensive diagonal, orbital-gap, active-space, or response information without forming the full orbital Hessian
- more reliable handling of near-redundant orbital rotations, tiny denominators, and indefinite local curvature
- better trust-region, step-acceptance, and keyframe-recovery behavior so difficult cases reach the same stationary point with fewer macroiterations and AH microiterations
- lower wasted CASCI, Hessian-vector, and restart work through smarter reuse of existing solver state when that does not change the mathematical problem
- safer state-specific and state-averaged root handling through overlap-aware diagnostics and root tracking
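
As an illustration of the overlap-aware root tracking named in the last bullet, here is a minimal sketch. It assumes the reference and candidate CI vectors are real, normalized, and expressed in the same determinant basis so their overlaps are meaningful. `overlap` and `match_roots` are hypothetical helpers, not PySCF functions.

```python
from itertools import permutations


def overlap(u, v):
    """Absolute inner product of two real CI vectors given as flat lists."""
    return abs(sum(a * b for a, b in zip(u, v)))


def match_roots(ref_vectors, new_vectors):
    """Assign each reference root to one new root by maximum total overlap.

    Brute force over permutations; adequate for the handful of roots in a
    typical state-averaged run. Assumes both lists have the same length.
    Returns (assignment, overlaps) where assignment[i] is the index of the
    new root matched to reference root i.
    """
    n = len(ref_vectors)
    s = [[overlap(r, c) for c in new_vectors] for r in ref_vectors]
    best = max(
        permutations(range(n)),
        key=lambda p: sum(s[i][p[i]] for i in range(n)),
    )
    return list(best), [s[i][best[i]] for i in range(n)]


# Two roots that swapped order between iterations (illustrative numbers).
ref = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
new = [[0.1, 0.99, 0.0], [0.99, -0.1, 0.0]]
assignment, overlaps = match_roots(ref, new)
print(assignment)  # [1, 0]
```

A small matched overlap is a sign that the root identity may have been lost. Any cutoff for flagging that would be a policy choice that this report does not specify.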

Out of scope:
- loosening CASSCF, AH, or FCI tolerances or increasing iteration limits to win the benchmark
- replacing CASSCF with CASCI, DMRG, selected CI, reduced active spaces, or molecule-specific shortcuts
- broad unrelated changes to SCF, integral generation, or non-MCSCF modules unless required by the same solver path

## Editable Scope
- pyscf/mcscf/newton_casscf.py
- pyscf/mcscf/newton_casscf_symm.py
- pyscf/mcscf/mc1step.py
- pyscf/mcscf/mc1step_symm.py
- pyscf/mcscf/addons.py
- pyscf/mcscf/casci.py
- pyscf/soscf/ciah.py
- pyscf/lib/linalg_helper.py

## Performance Metric
Minimize `weighted_median_casscf_kernel_seconds`, defined as the weighted median of per-case end-to-end solver wall time measured from the selected CASSCF call (`mc.mc1step()`, `mc.mc2step()`, or `mc.newton().kernel()`) until convergence, after molecule construction, RHF or ROHF setup, and active-orbital selection are complete.
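
The report does not say which weighted-median convention FermiLink uses. The sketch below assumes the lower weighted median: the smallest value whose cumulative weight reaches half of the total weight. `weighted_median` is a hypothetical helper, not a FermiLink function.

```python
def weighted_median(values, weights):
    """Lower weighted median of `values` with positive `weights`."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equal in length")
    pairs = sorted(zip(values, weights))  # sort by value
    half = 0.5 * sum(weights)
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    return pairs[-1][0]  # fallback against floating-point round-off


# With a single case the metric is simply that case's wall time.
print(weighted_median([102.444], [3.0]))  # 102.444
```

With one case per split, as in the benchmark listed in this report, the case weight does not change the metric value.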

Secondary objectives:
- lower `macro_cycles`, `micro_cycles`, `ah_seconds`, `h_op_calls`, `casci_seconds`, `ao2mo_seconds`, `keyframe_restarts`, and `rejected_steps` when the runner can expose them
- preserve convergence and root identity on every representative case

## Correctness Constraints
- All benchmark cases must converge in the incumbent baseline and in accepted candidates under the stated `conv_tol`, `conv_tol_grad`, `max_cycle_macro`, `max_cycle_micro`, and AH settings.
- Single-state final total CASSCF energy absolute delta must be <= `5e-8` Hartree versus the incumbent baseline.
- State-averaged final total energy absolute delta must be <= `1e-7` Hartree, and per-state energy absolute delta must be <= `5e-6` Hartree, when the runner exposes `e_states`.
- Final orbital-gradient norm and CI-gradient norm must remain no worse than the workload convergence thresholds when the runner exposes them.
- Preserve the same molecule, basis, charge, spin, symmetry flag, active electron count, active orbital count, state weights, target root, solver family, and initial orbital sorting or projection path as the baseline workload.
- Root matching for state-specific excited-root or state-averaged cases must use overlaps when comparable vectors are available; sorted-energy-only matching is not sufficient on crowded cases.
- Do not loosen `conv_tol`, `conv_tol_grad`, `fcisolver.conv_tol`, `max_cycle_macro`, `max_cycle_micro`, `max_stepsize`, `ah_conv_tol`, `ah_lindep`, `ah_level_shift`, `ah_start_tol`, `ah_start_cycle`, `ah_max_cycle`, `nroots`, state weights, `fix_spin_`, `wfnsym`, or `internal_rotation`.
- Do not replace Newton-CASSCF with an easier solver, reduce the number of states, alter active-space selection, disable scheduler callbacks, or change public API behavior to gain speed.
- No case-specific shortcuts keyed on molecule identity, bond length, basis, active space, root number, or whether a case is `train-` or `test-`.
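
The energy thresholds above can be sketched as a small acceptance check. This is an illustration under stated assumptions: the result dictionaries and their keys (`converged`, `e_tot`, `e_states`) mirror the field names in this goal, but `energies_ok` is a hypothetical helper and the numbers in the example are placeholders, not benchmark results.

```python
E_TOT_TOL_SINGLE = 5e-8  # Hartree, single-state total energy
E_TOT_TOL_SA = 1e-7      # Hartree, state-averaged total energy
E_STATE_TOL = 5e-6       # Hartree, per-state energy


def energies_ok(base, cand, state_averaged=False):
    """Compare a candidate result against the baseline result."""
    # Both runs must have converged before energies are compared.
    if not (base["converged"] and cand["converged"]):
        return False
    tol = E_TOT_TOL_SA if state_averaged else E_TOT_TOL_SINGLE
    if abs(cand["e_tot"] - base["e_tot"]) > tol:
        return False
    # Per-state check only when both runs expose e_states.
    states_b, states_c = base.get("e_states"), cand.get("e_states")
    if state_averaged and states_b is not None and states_c is not None:
        if len(states_b) != len(states_c):
            return False
        return all(abs(c - b) <= E_STATE_TOL for b, c in zip(states_b, states_c))
    return True


baseline = {"converged": True, "e_tot": -1.0}          # placeholder energy
candidate = {"converged": True, "e_tot": -1.0 + 4e-8}  # within 5e-8
print(energies_ok(baseline, candidate))  # True
```

The gradient-norm and root-identity constraints would need separate checks; they are left out here to keep the sketch short.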

## Representative Workloads
- train-newton-benzene-6-31g-cas66-pi-sort: Benzene geometry and active-orbital sort from `examples/mcscf/17-approx_orbital_hessian.py`; RHF; `basis='6-31g'`; `symmetry=True`; CAS(6o,6e); fixed active-orbital sort `[17,20,21,22,23,30]`; run `mc.newton().kernel(mo)`; optional comparison against `mcscf.approx_hessian(mcscf.CASSCF(...)).kernel(mo)`; purpose: larger AO/virtual orbital space with a small exact-FCI CAS, useful for testing matrix-free AH/preconditioner behavior without making the CI solve the bottleneck.
```python
atom = """
C -0.65830719  0.61123287 -0.00800148
C  0.73685281  0.61123287 -0.00800148
C  1.43439081  1.81898387 -0.00800148
C  0.73673681  3.02749287 -0.00920048
C -0.65808819  3.02741487 -0.00967948
C -1.35568919  1.81920887 -0.00868348
H -1.20806619 -0.34108413 -0.00755148
H  1.28636081 -0.34128013 -0.00668648
H  2.53407081  1.81906387 -0.00736748
H  1.28693681  3.97963587 -0.00925948
H -1.20821019  3.97969587 -0.01063248
H -2.45529319  1.81939187 -0.00886348
"""
```
- test-newton-benzene-ccpvtz-cas66-pi-sort: Benzene geometry and active-orbital sort from `examples/mcscf/17-approx_orbital_hessian.py`; RHF; `basis='ccpvtz'`; `symmetry=True`; CAS(6o,6e); fixed active-orbital sort `[17,20,21,22,23,30]`; run `mc.newton().kernel(mo)`; optional comparison against `mcscf.approx_hessian(mcscf.CASSCF(...)).kernel(mo)`; purpose: larger AO/virtual orbital space with a small exact-FCI CAS, useful for testing matrix-free AH/preconditioner behavior without making the CI solve the bottleneck.
```python
atom = """
C -0.65830719  0.61123287 -0.00800148
C  0.73685281  0.61123287 -0.00800148
C  1.43439081  1.81898387 -0.00800148
C  0.73673681  3.02749287 -0.00920048
C -0.65808819  3.02741487 -0.00967948
C -1.35568919  1.81920887 -0.00868348
H -1.20806619 -0.34108413 -0.00755148
H  1.28636081 -0.34128013 -0.00668648
H  2.53407081  1.81906387 -0.00736748
H  1.28693681  3.97963587 -0.00925948
H -1.20821019  3.97969587 -0.01063248
H -2.45529319  1.81939187 -0.00886348
"""
```
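
A minimal sketch of how one of these workloads might be run and timed with the public PySCF API. It is not the generated `benchmark_runner.py`. The function name `run_newton_casscf` is hypothetical, and the SCF tolerance of `1e-10` is taken from the `mf_conv_tol` value in the benchmark cases listed later in this report. PySCF is imported inside the function so the helper can be defined without PySCF installed.

```python
import time


def run_newton_casscf(atom, basis="6-31g", cas_list=(17, 20, 21, 22, 23, 30)):
    """Run the benzene CAS(6,6) Newton-CASSCF workload and time the solver.

    Only the Newton-CASSCF call is timed; molecule construction, RHF, and
    active-orbital sorting happen before the clock starts.
    """
    from pyscf import gto, scf, mcscf  # lazy import, see lead-in

    mol = gto.M(atom=atom, basis=basis, symmetry=True, verbose=0)
    mf = scf.RHF(mol)
    mf.conv_tol = 1e-10
    mf.kernel()

    mc = mcscf.CASSCF(mf, 6, 6)      # 6 active orbitals, 6 active electrons
    mo = mc.sort_mo(list(cas_list))  # orbital indices are 1-based by default
    solver = mc.newton()             # second-order (Newton) CASSCF solver

    t0 = time.perf_counter()
    solver.kernel(mo)
    elapsed = time.perf_counter() - t0
    return solver.e_tot, solver.converged, elapsed
```

To use it, pass the `atom` string defined above, for example `run_newton_casscf(atom, basis="ccpvtz")` for the test case.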

## Build
```bash
export SOURCE_REPO_ROOT="$(cd "$(git rev-parse --git-common-dir)/.." && pwd)"
export VENV="/anvil/scratch/x-tli22/fermilink_optimize/project_pyscf_casscf/venvs/fermilink-optimize/pyscf-casscf"
source "$VENV/bin/activate"
module remove cmake
cd pyscf/lib
mkdir -p build
cd build
cmake ..
cmake --build . -j4
cd ../../../
python -m pip install -e .
```

## Notes
- Keep benchmark behavior deterministic with `OMP_NUM_THREADS=16`, `OPENBLAS_NUM_THREADS=1`, `MKL_NUM_THREADS=1`, and `NUMEXPR_NUM_THREADS=1`.
- Run the `## Build` commands from the PySCF repo root inside the campaign's active environment; the generated benchmark should use explicit build pre-commands derived from this section.
- Preserve the `train-` and `test-` ids directly and let FermiLink infer the split from the prefixes; do not add a manual `split` block when every case already uses those prefixes.
- Time only the selected CASSCF solver call after molecule setup, SCF setup, and active-orbital selection are complete.
- If the runner can expose them, record per-case `casscf_kernel_seconds`, `macro_cycles`, `micro_cycles`, `ah_seconds`, `h_op_calls`, `casci_seconds`, `ao2mo_seconds`, `keyframe_restarts`, `rejected_steps`, `converged`, `e_tot`, `e_states`, `norm_gorb`, `norm_gci`, and root-overlap diagnostics.
- Keep the initial optimize benchmark to fixed-geometry cases that finish reproducibly in repeated local runs; do not include whole geometry scans, multistage homotopy schedules, or very heavy Cr2-style stress cases in the initial autogen suite.
- If a listed case proves baseline-nonreproducible on the target machine, replace it before the campaign starts with a nearby fixed-geometry case in the same CASSCF regime rather than weakening tolerances or broadening the editable scope.
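
The thread settings in the first note can be pinned from Python as well as from the shell. This is a small sketch; `pin_threads` is a hypothetical helper. The variables are generally read when the numerical libraries load, so the call should happen before NumPy or PySCF is imported.

```python
import os

# Values copied from the first note above.
THREAD_ENV = {
    "OMP_NUM_THREADS": "16",
    "OPENBLAS_NUM_THREADS": "1",
    "MKL_NUM_THREADS": "1",
    "NUMEXPR_NUM_THREADS": "1",
}


def pin_threads(env=os.environ):
    """Write the thread-count variables into `env` and return what was set."""
    env.update(THREAD_ENV)
    return {key: env[key] for key in THREAD_ENV}


pin_threads()  # call before importing numpy or pyscf
```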

Summary

baseline (44c83aaae41f): 102.444
best accepted (911063b081d2): 39.7457 (61.20% lower than baseline)
published GitHub branch: fermilink-optimize/pyscf-casscf
iterations: 22 total | 1 baseline | 4 accepted | 16 rejected | 1 correctness failure
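
For reference, the percentage above is the relative reduction in the primary metric. The short calculation below reproduces it from the two reported values and also expresses the same change as a speedup factor.

```python
baseline = 102.444  # seconds, commit 44c83aaae41f
best = 39.7457      # seconds, commit 911063b081d2

reduction_pct = (baseline - best) / baseline * 100.0
speedup = baseline / best

print(f"reduction: {reduction_pct:.2f}%")  # reduction: 61.20%
print(f"speedup:   {speedup:.2f}x")        # speedup:   2.58x
```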

Optimization Trajectory

All iterations

iter	commit	status	metric	summary
0	44c83aaae41f	baseline	102.444	baseline
1	fdc4b31c84b8	rejected	102.853	Vectorize incore Newton-CASSCF Hessian-vector ERI contractions while preserving the out-of-core f…
2	77d0fdfe785d	rejected	102.771	Optimize Newton-CASSCF Hessian-vector H_co JK contraction by exploiting sparse core-response rows
3	f6b960fe047c	rejected	102.157	Skip redundant final Newton-CASSCF AO2MO/CASCI after a no-step keyframe-gradient check
4	0e0c3c0191da	correctness_failure	95.7243	Honor the already scaled Newton-CASSCF inner AH gradient stop target in update_orb_ci instead of …
5	c08bd5458630	rejected	102.44	Split Newton-CASSCF AH JK response into independent dm3/dm4 contractions and skip redundant termi…
6	a3ed2ebcaf36	rejected	102.163	Reassociate Newton-CASSCF AH Hessian-vector density products over core/active support and skip re…
7	2eb53381cda9	accepted	48.5098	Use lazy density-fitted JK for Newton-CASSCF AH Hessian-vector response contractions while keepin…
8	7193e2b76d2b	accepted	41.7139	Factorize large Newton-CASSCF AH density-fitted JK response builds over low-rank dm3/dm4 factors,…
9	db1c93a03725	rejected	41.9501	Batch Newton-CASSCF low-rank AH DF-JK factor transforms across response densities to reduce per-b…
10	7ed98d8c2d85	rejected	41.1557	Cache Newton-CASSCF low-rank AH density-fitted response intermediates by reusing unpacked AO DF b…
11	334a40a0e17c	accepted	40.7776	Materialize exact Newton-CASSCF outcore ppaa/papa AO2MO datasets into guarded in-memory NumPy arr…
12	a60cd89bb8c2	rejected	40.2259	Vectorize exact Newton-CASSCF AH H_co contractions over materialized ppaa/papa tensors while pres…
13	911063b081d2	accepted	39.7457	Cache per-AO2MO low-rank DF A-side transforms and use cached materialized ppaa/papa slices to vec…
14	188c01d21e99	rejected	39.4836	Guard exact Newton-CASSCF outcore AO2MO to build ppaa/papa directly as same-run in-memory arrays,…
15	e06fb6d6e54d	rejected	39.6404	Incremental exact direct-JK updates for Newton-CASSCF keyframe gradients plus BLAS vectorization …
16	dd9e49defdfb	rejected	39.8904	Guarded vectorized Newton-CASSCF gen_g_hop setup over same-run materialized exact ppaa/papa tenso…
17	ccb1454453cd	rejected	40.264	Skip redundant terminal Newton-CASSCF AO2MO/CASCI when no microstep is taken at strict gradient c…
18	3d3efe61c91f	rejected	39.2684	Cache exact per-AO2MO hcore MO transforms on Newton-CASSCF ERIS objects and reuse them in gen_g_h…
19	f4c984052a2a	rejected	39.5938	Same-state Newton-CASSCF reuse: cache exact per-ERIS hcore MO transforms and skip the redundant A…
20	0e6acc89cdae	rejected	46.2534	Project large-AO low-rank Newton-CASSCF AH DF exchange directly into the required MO K blocks ins…
21	b7da83d5ec68	rejected	39.0866	Cache same-run Newton-CASSCF invariant DF/hcore intermediates and canonicalize from the exact MO-…
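
The status counts and the best accepted metric can be cross-checked against the table with a short script. The rows below are transcribed from the `iter`, `status`, and `metric` columns above; the script is only a consistency check and is not part of FermiLink.

```python
from collections import Counter

# (iter, status, metric) transcribed from the table above.
ROWS = [
    (0, "baseline", 102.444), (1, "rejected", 102.853),
    (2, "rejected", 102.771), (3, "rejected", 102.157),
    (4, "correctness_failure", 95.7243), (5, "rejected", 102.44),
    (6, "rejected", 102.163), (7, "accepted", 48.5098),
    (8, "accepted", 41.7139), (9, "rejected", 41.9501),
    (10, "rejected", 41.1557), (11, "accepted", 40.7776),
    (12, "rejected", 40.2259), (13, "accepted", 39.7457),
    (14, "rejected", 39.4836), (15, "rejected", 39.6404),
    (16, "rejected", 39.8904), (17, "rejected", 40.264),
    (18, "rejected", 39.2684), (19, "rejected", 39.5938),
    (20, "rejected", 46.2534), (21, "rejected", 39.0866),
]

counts = Counter(status for _, status, _ in ROWS)
# Lowest metric among accepted iterations (lower is better).
best_iter, best_metric = min(
    ((it, metric) for it, status, metric in ROWS if status == "accepted"),
    key=lambda pair: pair[1],
)

print(len(ROWS), dict(counts))
print(best_iter, best_metric)  # 13 39.7457
```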

Accepted Commits

Accepted candidate detail pages and current manual-review status:

accepted commit	Human verification
2eb53381cda9	not verified
7193e2b76d2b	not verified
334a40a0e17c	not verified
911063b081d2	not verified

Benchmark Contracts

Necessary files to reproduce the FermiLink optimization results:

Runtime Data

FermiLink runtime data for accepted/rejected commits.

Rerun Guide

Agent provider: codex; model: gpt-5.5-xhigh

Use the bundled contract files from this report to recreate the optimization against a fresh upstream checkout.

default upstream clone: git@github.com:skilled-scipkg/pyscf.git
upstream default branch: master on GitHub; confirm it before creating the worktree
detected package language: python; use fermilink-optimize-python for goal-mode reruns
if goal_inputs.json is present, restage the listed auxiliary workload files before rerunning

git clone git@github.com:skilled-scipkg/pyscf.git
cd pyscf
git worktree add -b fermilink-optimize/pyscf-<modified-feature> ../pyscf-<modified-feature> master

Path 1: Rerun from goal.md

Rerun from the bundled goal.md.

Note

Tune the copied ## Build section in goal.md before rerunning. Update environment activation, module loads, compiler paths, install prefixes, and other machine-specific setup so FermiLink builds the package correctly.

export SOURCE_REPO_ROOT="$(cd "$(git rev-parse --git-common-dir)/.." && pwd)"
export VENV="/anvil/scratch/x-tli22/fermilink_optimize/project_pyscf_casscf/venvs/fermilink-optimize/pyscf-casscf"
source "$VENV/bin/activate"
module remove cmake
cd pyscf/lib
mkdir -p build
cd build
cmake ..
cmake --build . -j4
cd ../../../
python -m pip install -e .

Run this from the cloned main repo so the launcher can create or reuse the sibling worktree:

fermilink-optimize-python \
  --project-root "$PWD" \
  --goal /path/to/report/contract/goal.md \
  --branch fermilink-optimize/pyscf-<modified-feature> \
  --worktree-root .. \
  --worktree-name pyscf-<modified-feature>

Path 2: More deterministic rerun from benchmark.yaml

Rerun from the copied benchmark.yaml and benchmark_runner.py. FermiLink generates these files from goal.md, and they serve as the deterministic benchmark contract that the agent must follow during optimization iterations. The iterations themselves do not read goal.md directly.

This avoids regenerating the benchmark contract from goal.md before the campaign starts:

Note

Inspect benchmark.yaml before rerunning. Update runtime.pre_commands for machine-specific build/setup steps, and verify that runtime.command paths point at files that exist in the new worktree.

cd ../pyscf-<modified-feature>
mkdir -p .fermilink-optimize/autogen
cp /path/to/report/contract/benchmark.yaml .fermilink-optimize/autogen/benchmark.yaml
cp /path/to/report/contract/benchmark_runner.py .fermilink-optimize/autogen/benchmark_runner.py
printf '%s\n' '.fermilink-optimize/' >> .git/info/exclude
fermilink optimize pyscf "$PWD" \
  --benchmark "$PWD/.fermilink-optimize/autogen/benchmark.yaml" \
  --skills-source existing

Benchmark Examples

Worker iterations run the train-* benchmark cases below while searching for candidate changes:

cases:
- id: train-newton-benzene-6-31g-cas66-pi-sort
  weight: 3.0
  description: >-
    Goal workload: benzene pi CAS(6,6), symmetry enabled, Newton-CASSCF
    matrix-free AH/CIAH path.
  atom: |
    C -0.65830719  0.61123287 -0.00800148
    C  0.73685281  0.61123287 -0.00800148
    C  1.43439081  1.81898387 -0.00800148
    C  0.73673681  3.02749287 -0.00920048
    C -0.65808819  3.02741487 -0.00967948
    C -1.35568919  1.81920887 -0.00868348
    H -1.20806619 -0.34108413 -0.00755148
    H  1.28636081 -0.34128013 -0.00668648
    H  2.53407081  1.81906387 -0.00736748
    H  1.28693681  3.97963587 -0.00925948
    H -1.20821019  3.97969587 -0.01063248
    H -2.45529319  1.81939187 -0.00886348
  basis: 6-31g
  charge: 0
  spin: 0
  symmetry: true
  mf_type: RHF
  mf_conv_tol: 1.0e-10
  ncas: 6
  nelecas: 6
  active_orbital_sort:
  - 17
  - 20
  - 21
  - 22
  - 23
  - 30
  solver_family: newton
  conv_tol: 1.0e-07
  conv_tol_grad: null
  max_cycle_macro: 50
  max_cycle_micro: 10
  max_stepsize: 0.03
  ah_level_shift: 1.0e-08
  ah_conv_tol: 1.0e-12
  ah_lindep: 1.0e-14
  ah_start_tol: 500.0
  ah_start_cycle: 3
  ah_max_cycle: 30

Controller reviews run the test-* benchmark cases below to validate accepted candidates:

cases:
- id: test-newton-benzene-ccpvtz-cas66-pi-sort
  weight: 3.0
  description: >-
    Goal test workload: same benzene pi CAS(6,6), symmetry enabled, Newton-CASSCF
    path with a larger cc-pVTZ AO/virtual space.
  atom: |
    C -0.65830719  0.61123287 -0.00800148
    C  0.73685281  0.61123287 -0.00800148
    C  1.43439081  1.81898387 -0.00800148
    C  0.73673681  3.02749287 -0.00920048
    C -0.65808819  3.02741487 -0.00967948
    C -1.35568919  1.81920887 -0.00868348
    H -1.20806619 -0.34108413 -0.00755148
    H  1.28636081 -0.34128013 -0.00668648
    H  2.53407081  1.81906387 -0.00736748
    H  1.28693681  3.97963587 -0.00925948
    H -1.20821019  3.97969587 -0.01063248
    H -2.45529319  1.81939187 -0.00886348
  basis: ccpvtz
  charge: 0
  spin: 0
  symmetry: true
  mf_type: RHF
  mf_conv_tol: 1.0e-10
  ncas: 6
  nelecas: 6
  active_orbital_sort:
  - 17
  - 20
  - 21
  - 22
  - 23
  - 30
  solver_family: newton
  conv_tol: 1.0e-07
  conv_tol_grad: null
  max_cycle_macro: 50
  max_cycle_micro: 10
  max_stepsize: 0.03
  ah_level_shift: 1.0e-08
  ah_conv_tol: 1.0e-12
  ah_lindep: 1.0e-14
  ah_start_tol: 500.0
  ah_start_cycle: 3
  ah_max_cycle: 30
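
The solver keys in these cases share their names with attributes of the PySCF CASSCF solver objects, so one plausible way for a runner to consume a case is to copy them across by name. The sketch below assumes that mapping; the report does not show how `benchmark_runner.py` actually does it, and `apply_case_settings` is a hypothetical helper.

```python
SOLVER_KEYS = (
    "conv_tol", "conv_tol_grad", "max_cycle_macro", "max_cycle_micro",
    "max_stepsize", "ah_level_shift", "ah_conv_tol", "ah_lindep",
    "ah_start_tol", "ah_start_cycle", "ah_max_cycle",
)


def apply_case_settings(solver, case):
    """Copy solver settings from a parsed case mapping onto `solver`.

    Keys missing from the case keep the solver's own defaults. A YAML
    `null` arrives as None and is copied as-is, which leaves the choice of
    gradient threshold to the solver.
    """
    applied = {}
    for key in SOLVER_KEYS:
        if key in case:
            setattr(solver, key, case[key])
            applied[key] = case[key]
    return applied


# Subset of the train case above, as it would look after YAML parsing.
case = {"conv_tol": 1.0e-07, "conv_tol_grad": None, "max_cycle_macro": 50,
        "max_stepsize": 0.03, "ah_max_cycle": 30, "basis": "6-31g"}
```

Calling `apply_case_settings(mc.newton(), case)` on a real solver would set only the listed keys; non-solver keys such as `basis` are ignored by design.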