
Should have \sum_n W_nk = 1 error with pymbar3.0.5 while analysis is working with pymbar3.0.3 #419

hannahbaumann opened this issue Nov 25, 2020 · 30 comments

@hannahbaumann

Hi,

I'm analyzing free energy calculations (Gromacs-based) using alchemlyb. The analysis fails with the following error:
Warning: Should have \sum_n W_nk = 1. Actual column sum for state 0 was 0.835621. 21 other columns have similar problems.
The pymbar version here was 3.0.5

Running the same input files/script in a different environment with pymbar==3.0.3 didn't raise this error, the script was able to calculate the free energy difference.
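For context, the warning refers to a basic MBAR invariant: the N x K weight matrix must have every column sum to 1 once the free energies have converged. Here is a minimal numpy sketch of that kind of check (an illustration only, not pymbar's actual code; the function name and tolerance are made up):

```python
import numpy as np

def column_sum_check(W, tol=1e-4):
    """Return column sums of an N x K MBAR weight matrix and a mask of
    columns that violate sum_n W_nk = 1 beyond the tolerance."""
    col_sums = W.sum(axis=0)
    bad = np.abs(col_sums - 1.0) > tol
    return col_sums, bad

# Toy 4x2 weight matrix whose columns sum to 1 (the converged case):
W_ok = np.full((4, 2), 0.25)
sums, bad = column_sum_check(W_ok)
print(sums, bad.any())   # [1. 1.] False

# Scale one column to mimic unconverged free energies:
W_bad = W_ok.copy()
W_bad[:, 0] *= 0.8       # column 0 now sums to 0.8, like the 0.835621 above
sums, bad = column_sum_check(W_bad)
print(bad)               # [ True False]
```

A column sum well below 1 (such as the 0.835621 reported here) is therefore a symptom of the solver stopping before the free energies converged, not of bad input data per se.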

The issue seems to be similar to other issues, see e.g. #320

Here is a folder with the .xvg files from one of the runs, a python script for the analysis, and the dependencies in the two different environments:
https://drive.google.com/drive/folders/1lKncHmRj38OY8H2YjsnYHdIpEcvxKKpX?usp=sharing
Let me know if you need more information!

@davidlmobley

Tagging @mrshirts

@jaimergp
Member

This is the diff between two versions: 3.0.3...3.0.5

This particular change in the default tolerance might be interesting to analyze.

Otherwise, the standard approach here is to run git bisect run ... between the two tags and let the script's exit code reveal the offending commit. For this you'll need a local copy of the repo and a fresh environment where pymbar is installed in editable mode. Roughly:

# get pymbar env file for the 3.0.x branch
% wget https://github.com/choderalab/pymbar/raw/3.0_branch/devtools/conda-envs/test_env.yaml
% conda env create -n pymbar-dev -f test_env.yaml
% conda activate pymbar-dev
% git clone https://github.com/choderalab/pymbar
% cd pymbar
% git checkout 3.0.3  # start with the working state
% pip install -e .    # editable install tracks each bisect checkout
% git bisect start 3.0.5 3.0.3  # bad tag first, then the known-good tag
% git bisect run python your_script.py  # non-zero exit marks a commit bad

@xiki-tempula

This is similar to the issue that I got in #407

@jaimergp
Member

jaimergp commented Dec 1, 2020

Ok, I think I have found the issue.

History is not linear at that point because both #295 and #293 were being developed in parallel back then. The dict API change makes it a bit more complicated to test, because the provided script breaks between the merge of #293 (released in 3.0.4) and 3.0.5.

Anyway, the error should lie in this diff: fe9ea53...fe57313

Hope that helps debug the actual issue!

@mikemhenry
Contributor

Also running into this issue when following the openmm alchemical free energy tutorial

@rsdefever

I'm also having this issue where I get the error with 3.0.5 but not 3.0.3. It doesn't happen for all of my systems.

In case it helps with debugging, I have attached a minimal working example with energy distributions for which 3.0.3 gives a result and 3.0.5 fails with the Should have \sum_n W_nk = 1 error. The attached zip file contains three things: u_kn.txt, N_k.txt, and a short python script that loads the matrices and calls pymbar.

Thanks in advance!

pymbar_debug.zip

@jchodera
Member

@mrshirts Do you have the resources to look into this right now? I think this must have been introduced in the overhaul.

@mrshirts
Collaborator

Let me take a look over the weekend (Mar 13-15) and see what I can identify.

@mrshirts
Collaborator

I don't know if this is the problem every time, but I suspect that for the majority of 3.0.3 -> 3.0.5 issues the default solver switched (which is good - the "adaptive" solver is much more consistent) while the default maximum number of iterations stayed too low (250), so MBAR doesn't run to completion.

It's actually a bit of a pain to increase the maximum number of iterations in 3.0; the way to do it is to invoke MBAR like this:

    solver_options = {"maximum_iterations": 10000, "verbose": True}
    solver_protocol = {"method": "adaptive", "options": solver_options}
    mbar = MBAR(u_kn, N_k, solver_protocol=(solver_protocol,))

Let me know how many of the issues are addressed with this change. We will look at better ways of passing in parameters for 4.0.

@rsdefever

rsdefever commented Mar 22, 2021

This helped for some cases that were failing in 3.0.5. However, I still have several cases that result in the Warning: Should have \sum_n W_nk = 1 error in 3.0.5 but seem to converge in 3.0.3. Interestingly, the cases that fail require far fewer solver iterations. When I switch the solver in 3.0.5 to BFGS, the warning goes away.

If it would be helpful, I can pull together some more MWEs.

Thanks!

@mrshirts
Collaborator

Yes, I would very much like to see those examples.

@rsdefever

rsdefever commented Mar 22, 2021

Attached is another MWE. The zip file contains u_kn.txt, N_k.txt, and test.py, a short python script that loads the matrices and calls pymbar. The script has blocks for 3.0.3, 3.0.5 + adaptive, and 3.0.5 + BFGS. At least for me, 3.0.3 and 3.0.5 + BFGS work, while 3.0.5 + adaptive results in the \sum_n W_nk error. Let me know if you have any questions. Thanks!

pymbar_debug_solver.zip

@mrshirts
Collaborator

OK, looking into this; adaptive is terminating too soon. However, it does look like one CAN use BFGS with 3.0.5, so at least the correct results can be obtained without changing the code itself! Better automated handling is, of course, preferable when possible.
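For readers curious why BFGS is a reasonable alternative here: the MBAR equations are also the stationarity conditions of a convex objective whose gradient components are N_k (sum_n W_nk - 1), so a tightly minimized solution satisfies the column-sum invariant by construction. A self-contained toy sketch (made-up harmonic data, not the attached MWE, and not pymbar's implementation) using scipy's BFGS:

```python
import numpy as np
from scipy.optimize import minimize

# Toy data: three harmonic states with force constants 1, 4, 9 centered
# at x = 0, sampled exactly. Their reduced free energies relative to
# state 0 are analytically 0.5*ln(4) and 0.5*ln(9).
rng = np.random.default_rng(1)
kf = np.array([1.0, 4.0, 9.0])
N_k = np.array([100, 100, 100])
x = np.concatenate([rng.normal(0.0, 1.0 / np.sqrt(k), n) for k, n in zip(kf, N_k)])
u_kn = 0.5 * kf[:, None] * x[None, :] ** 2     # K x N reduced potentials

def mbar_objective(f_k):
    # F(f) = sum_n log sum_k N_k exp(f_k - u_kn) - sum_k N_k f_k
    a = np.log(N_k)[:, None] + f_k[:, None] - u_kn
    m = a.max(axis=0)
    log_denom = m + np.log(np.exp(a - m).sum(axis=0))
    return log_denom.sum() - (N_k * f_k).sum()

res = minimize(mbar_objective, np.zeros(len(N_k)), method="BFGS")
f_k = res.x - res.x[0]                         # gauge: report against state 0
print(f_k)                                     # close to [0, 0.693, 1.099]

# Weight matrix from the solution; every column should sum to ~1,
# which is the invariant the error message in this thread checks.
a = np.log(N_k)[:, None] + f_k[:, None] - u_kn
m = a.max(axis=0)
log_denom = m + np.log(np.exp(a - m).sum(axis=0))
W = np.exp(f_k[:, None] - u_kn - log_denom[None, :]).T
print(W.sum(axis=0))
```

This is only a sketch of the general technique; the hard datasets in this thread are precisely the ones where a given minimizer may stall before the gradient (and hence the column sums) is small enough.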

@jchodera
Member

Why not have it fall back to a more robust solver if it fails to converge with the requested solver?
This robust fallback behavior can be the default, but you can let people disable it if they want to live dangerously.

We should always provide robust results by default if at all possible, and force people to specify optional arguments if they want to try something specific.

@mrshirts
Collaborator

Why not have it fall back to a more robust solver if it fails to converge with the requested solver?

Adaptive was previously the more robust solver. This is the first case where BFGS worked and adaptive didn't (which is why we reintroduced adaptive and made it the default in 3.0.5 - MOST of the issues were caused by improper handling of the minimum number of iterations). Basically, we need to assemble a set of hard problems, see what the best overall strategy may be, and provide clearer details about other options to take when it fails. In this particular case, self-consistent iteration is getting stuck with the free energies not changing between iterations, apparently because of rounding errors.

Going through all the cases and finding a robust scheme is probably going to have to wait until classes are over for me (last week in April), at which point getting a full 4.0 release is a high priority (90% done, except for things like this).
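To illustrate the self-consistent iteration being discussed, here is a plain-numpy sketch on easy toy data (not pymbar's implementation; on well-overlapping states like these it converges quickly and will not reproduce the rounding-error stall). Each sweep updates f_k from the current weights, and at a fixed point every column of W sums to 1, which is exactly the property the error message checks:

```python
import numpy as np

def logsumexp(a, axis):
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(a - m).sum(axis=axis))

# Toy data: three overlapping harmonic states (made-up parameters).
rng = np.random.default_rng(0)
mu = np.array([0.0, 0.5, 1.0])
N_k = np.array([200, 200, 200])
x = np.concatenate([rng.normal(m, 1.0, n) for m, n in zip(mu, N_k)])
u_kn = 0.5 * (x[None, :] - mu[:, None]) ** 2   # K x N reduced potentials

f_k = np.zeros(len(mu))
for iteration in range(10000):
    # log denominator per sample: log sum_j N_j exp(f_j - u_jn)
    log_denom = logsumexp(np.log(N_k)[:, None] + f_k[:, None] - u_kn, axis=0)
    # self-consistent update: f_k = -log sum_n exp(-u_kn) / denom_n
    f_new = -logsumexp(-u_kn - log_denom[None, :], axis=1)
    f_new -= f_new[0]                           # gauge: fix f_0 = 0
    delta = np.abs(f_new - f_k).max()
    f_k = f_new
    if delta < 1e-12:                           # converged
        break

log_denom = logsumexp(np.log(N_k)[:, None] + f_k[:, None] - u_kn, axis=0)
W = np.exp(f_k[:, None] - u_kn - log_denom[None, :]).T   # N x K weights
print(iteration, W.sum(axis=0))   # at convergence each column sums to ~1
```

The stall described above corresponds to delta reaching exactly zero (the update no longer changes f_k in floating point) before the weights are actually normalized to within tolerance.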

@jchodera
Member

Based on what you said above, wouldn't simply falling back to adaptive---and then to BFGS if that fails---work in all the cases we know about?

@mrshirts
Collaborator

Probably? But I'd rather get everything incorporated into 4.0 and understand WHY these cases are failing in different places. It will just take some undivided hours to get everything together - and to update the documentation, etc.

@xiki-tempula

@rsdefever Thanks for the example - I'm interested in reproducing this problem, but the link you provided seems to be broken. Could I have the files, please? Thank you.

@rsdefever

@xiki-tempula could you check again? I was able to download the zip from the link just now.

@xiki-tempula

@rsdefever Thank you. It seems that changing browsers allowed me to download it. Would you mind if I put this file in https://github.com/alchemistry/alchemtest to check the stability of the estimator?

@rsdefever

@xiki-tempula that is absolutely ok.

@zhang-ivy

@jchodera @mrshirts What's the status of this issue? Will there be a bug fix release soon? I just ran into this issue as well and fixed it by installing from master (I think PR #425 fixes the issue for the adaptive solver).

(Note: I was having trouble installing pymbar 3.0.3 in my existing environment because it uses python 3.8.11, which is why I installed from master instead)

@mrshirts
Collaborator

Semester is over, so I'm planning on revisiting things over the next few weeks for a FINAL 4.0 release; I should have time to play around with some options. I have the two data sets, so should be able to test with both of those. Maybe a target of Jan 10th?

@mikemhenry
Contributor

@mrshirts Just wanted to give this issue a nudge :)

@mrshirts
Collaborator

Hah, thanks. It was a crazy break (Colorado wildfires, car crash, illness). Still on my radar, but may be another couple of weeks.

@sef43

sef43 commented Jul 14, 2023

I was getting this error on the OpenMM Alchemical free energy tutorial, but it has now gone away after fixing a bug in the simulation that would have been incorrectly sampling states: openmm/openmm-cookbook#28

@ijpulidos
Contributor

We are hitting the same issue with pymbar 4 for our perses GPU CI tests.

I have been able to reproduce it locally, and if I use pymbar 3 the tests run just fine. We probably need to dive into this in more detail and maybe come up with an easy-to-reproduce example, since these tests take at least 20 minutes on a decent GPU.

@mrshirts
Collaborator

There are about 20 different optimizers for MBAR now (adaptive or any of the scipy ones), so it would be good to know which options are failing, and whether they fail because they reach the maximum number of iterations or through some other failure mode.

Some data sets are going to be numerically impossible to converge, so perhaps we need a better reporting framework for when a given dataset has problems - this particular error is uninformative.

@ijpulidos
Contributor

I hope this can help debug the problem. I'm attaching a .tar.gz package with the required .nc files and a python script to reproduce the problem. Just copy the .nc files and the python script into a directory and run python process_with_pymbar.py

The output with pymbar 4.0.x is as follows:

Warning on use of the timeseries module: If the inherent timescales of the system are long compared to those being analyzed, this statistical inefficiency may be an underestimate.  The estimate presumes the use of many statistically independent samples.  Tests should be performed to assess whether this condition is satisfied.   Be cautious in the interpretation of the data.

****** PyMBAR will use 64-bit JAX! *******
* JAX is currently set to 32-bit bitsize *
* which is its default.                  *
*                                        *
* PyMBAR requires 64-bit mode and WILL   *
* enable JAX's 64-bit mode when called.  *
*                                        *
* This MAY cause problems with other     *
* Uses of JAX in the same code.          *
******************************************

Warning: The openmmtools.multistate API is experimental and may change in future releases
Warning: The openmmtools.multistate API is experimental and may change in future releases

******* JAX 64-bit mode is now on! *******
*     JAX is now set to 64-bit mode!     *
*   This MAY cause problems with other   *
*      uses of JAX in the same code.     *
******************************************

Failed to reach a solution to within tolerance with hybr: trying next method
WARNING: Did not converge to within specified tolerance.
max_delta = 0.000000e+00, tol = 1.000000e-12, maximum_iterations = 10000, iterations completed = 9999
Failed to reach a solution to within tolerance with adaptive: trying next method
No solution found to within tolerance.
The solution with the smallest gradient 5.107602e+02 norm is hybr
Please exercise caution with this solution and consider alternative methods or a different tolerance.
Traceback (most recent call last):
  File "/home/user/mambaforge/envs/perses-dev/lib/python3.10/site-packages/openmmtools/multistate/multistateanalyzer.py", line 1923, in _compute_free_energy
    Deltaf_ij, dDeltaf_ij = self.mbar.getFreeEnergyDifferences()
AttributeError: 'MBAR' object has no attribute 'getFreeEnergyDifferences'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/workdir/debugging/pymbar/process_with_pymbar.py", line 10, in <module>
    f_ij, df_ij = analyzer.get_free_energy()
  File "/home/user/mambaforge/envs/perses-dev/lib/python3.10/site-packages/openmmtools/multistate/multistateanalyzer.py", line 1965, in get_free_energy
    self._compute_free_energy()
  File "/home/user/mambaforge/envs/perses-dev/lib/python3.10/site-packages/openmmtools/multistate/multistateanalyzer.py", line 1926, in _compute_free_energy
    results = self.mbar.compute_free_energy_differences()
  File "/home/user/mambaforge/envs/perses-dev/lib/python3.10/site-packages/pymbar/mbar.py", line 698, in compute_free_energy_differences
    Theta_ij = self._computeAsymptoticCovarianceMatrix(
  File "/home/user/mambaforge/envs/perses-dev/lib/python3.10/site-packages/pymbar/mbar.py", line 1793, in _computeAsymptoticCovarianceMatrix
    check_w_normalized(W, N_k)
  File "/home/user/mambaforge/envs/perses-dev/lib/python3.10/site-packages/pymbar/utils.py", line 371, in check_w_normalized
    raise ParameterError(
pymbar.utils.ParameterError: Warning: Should have \sum_n W_nk = 1. Actual column sum for state 0 was 0.000000. 14 other columns have similar problems. 
This generally indicates the free energies are not converged.

whereas with pymbar 3.1.1 you shouldn't get any errors. I tried comparing the relevant code paths between the two versions and couldn't see any difference that would explain the error. I hope this helps with the debugging.

pymbar4-test.tar.gz

@pitmanme

pitmanme commented Feb 20, 2024

This is more for users: if anybody is running into this issue, one option is to downgrade pymbar. If you want to continue with pymbar 4.0, you can also set the solver to robust to avoid this bug:

    analyzer = MultiStateSamplerAnalyzer(reporter, max_n_iterations=n_iterations, analysis_kwargs={"solver_protocol": "robust"})
