Fix bug in bivariate MI/TE estimation and min stats

Fix bug in bivariate MI/TE estimation: conditioning was not performed correctly. The non-uniform embedding is now found separately for each link. This requires conditioning on the target's past (for TE) and on all variables already selected for this link. Fix bug in minimum statistics: the conditional used for surrogate creation should not contain the minimum candidate. Otherwise the minimum candidate's CMI is calculated with a conditional that is smaller by one dimension than the conditional used for surrogate creation. Add unit tests and update documentation.
pwollstadt · Aug 19, 2018 · 634a769
1 parent 3596453
commit 634a769
Showing 10 changed files with 735 additions and 238 deletions.
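
The minimum-statistics fix described in the commit message can be illustrated with a minimal sketch. It assumes variables are represented as `(process, lag)` tuples; the helper `conditional_for_min_candidate` is hypothetical and is not part of the IDTxl API. The point is only that the conditioning set built for the minimum candidate excludes the candidate itself, so the CMI estimate and the surrogate creation can share the same conditional.

```python
def conditional_for_min_candidate(selected_vars, min_candidate):
    """Return one conditioning set for both the CMI and its surrogates.

    Hypothetical helper (not IDTxl API). Variables are (process, lag) tuples.
    """
    if min_candidate not in selected_vars:
        raise ValueError('min_candidate must be one of the selected variables')
    # Exclude the candidate itself, so that surrogate creation conditions on
    # the same set (same dimension) as the CMI estimate for the candidate.
    return [v for v in selected_vars if v != min_candidate]


selected = [(0, 1), (0, 3), (1, 2)]
conditional = conditional_for_min_candidate(selected, (0, 3))
print(conditional)
```

Passing the same list to the CMI estimate and to the surrogate test avoids the one-dimension mismatch that the commit message describes.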
diff --git a/idtxl/bivariate_mi.py b/idtxl/bivariate_mi.py
@@ -170,15 +170,22 @@ def analyse_single_target(self, settings, data, target, sources='all'):
         processes and the target process. Uses bivariate, non-uniform embedding
         found through information maximisation
 
-        MI is calculated in two steps:
+        MI is calculated in three steps:
 
-        (1) find all relevant samples in a single source processes' past, by
-            iteratively adding candidate samples that have significant
+        (1) find all relevant variables in a single source processes' past, by
+            iteratively adding candidate variables that have significant
             conditional mutual information (CMI) with the current value
-            (conditional on all samples that were added previously)
-        (2) statistics on the final set of sources (test for over-all transfer
+            (conditional on all variables that were added previously)
+        (2) prune the final conditional set for each link (i.e., each
+            process-target pairing): test the CMI between each variable in
+            the final set and the current value, conditional on all other
+            variables in the final set of the current link; treat each
+            potential source process separately, i.e., the CMI is calculated
+            with respect to already selected variables from the current
+            processes' past only
+        (3) statistics on the final set of sources (test for over-all transfer
             between the final conditional set and the current value, and for
-            significant transfer of all individual samples in the set)
+            significant transfer of all individual variables in the set)
 
         Note:
             For a further description of the algorithm see references in the
@@ -263,7 +270,9 @@ class docstring.
         # Main algorithm.
         print('\n---------------------------- (1) include source candidates')
         self._include_source_candidates(data)
-        print('\n---------------------------- (2) omnibus test')
+        print('\n---------------------------- (2) prune candidates')
+        self._prune_candidates(data)
+        print('\n---------------------------- (3) final statistics')
         self._test_final_conditional(data)
 
         # Clean up and return results.
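
The per-link pruning described in step (2) of the bivariate MI docstring can be sketched roughly as follows. This is a simplified illustration, not the IDTxl implementation: the names `prune_link` and `prune_all_links` are hypothetical, the significance test is passed in as a callable, and re-testing the remaining variables after each removal is an assumption of this sketch. Each variable is tested conditional on the other variables of the same link only.

```python
def prune_link(link_vars, is_significant):
    """Prune one link; is_significant(candidate, conditional) returns a bool.

    Hypothetical sketch: the real method uses a minimum statistic.
    """
    remaining = list(link_vars)
    pruned = True
    while pruned and remaining:
        pruned = False
        for candidate in list(remaining):
            # Condition only on the other variables of the current link.
            conditional = [v for v in remaining if v != candidate]
            if not is_significant(candidate, conditional):
                remaining.remove(candidate)
                pruned = True
                break  # re-test the reduced set from the start
    return remaining


def prune_all_links(selected_by_source, is_significant):
    """Treat every source process separately (one link per source)."""
    return {source: prune_link(link_vars, is_significant)
            for source, link_vars in selected_by_source.items()}


seen = []  # records (candidate, conditional) pairs for inspection


def toy_test(candidate, conditional):
    """Toy stand-in for a CMI test: lags below 3 count as significant."""
    seen.append((candidate, tuple(conditional)))
    return candidate[1] < 3


selected_by_source = {0: [(0, 1), (0, 4)], 1: [(1, 2)]}
result = prune_all_links(selected_by_source, toy_test)
print(result)
```

Keeping one list per source makes it hard to condition on another source's variables by accident, which is the bivariate behaviour the docstring asks for.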

diff --git a/idtxl/bivariate_te.py b/idtxl/bivariate_te.py
@@ -176,15 +176,22 @@ def analyse_single_target(self, settings, data, target, sources='all'):
 
         Bivariate TE is calculated in four steps:
 
-        (1) find all relevant samples in the target processes' own past, by
-            iteratively adding candidate samples that have significant
+        (1) find all relevant variables in the target processes' own past, by
+            iteratively adding candidate variables that have significant
             conditional mutual information (CMI) with the current value
-            (conditional on all samples that were added previously)
-        (2) find all relevant samples in the single source processes' pasts
-            (again by finding all candidates with significant CMI)
-        (3) statistics on the final set of sources (test for over-all transfer
+            (conditional on all variables that were added previously)
+        (2) find all relevant variables in the single source processes' pasts
+            (again by finding all candidates with significant CMI); treat each
+            potential source process separately, i.e., the CMI is calculated
+            with respect to already selected variables from the target's past
+            and from the current processes' past only
+        (3) prune the final conditional set for each link (i.e., each
+            process-target pairing): test the CMI between each variable in
+            the final set and the current value, conditional on all other
+            variables in the final set of the current link
+        (4) statistics on the final set of sources (test for over-all transfer
             between the final conditional set and the current value, and for
-            significant transfer of all individual samples in the set)
+            significant transfer of all individual variables in the set)
 
         Note:
             For a further description of the algorithm see references in the
@@ -268,7 +275,9 @@ class docstring.
         self._include_target_candidates(data)
         print('\n---------------------------- (2) include source candidates')
         self._include_source_candidates(data)
-        print('\n---------------------------- (3) omnibus test')
+        print('\n---------------------------- (3) prune candidates')
+        self._prune_candidates(data)
+        print('\n---------------------------- (4) final statistics')
         self._test_final_conditional(data)
 
         # Clean up and return results.
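
For bivariate TE, step (2) of the docstring says that candidates from a source are tested against the selected variables from the target's past and from that source's past only. A minimal sketch of building such a conditioning set is shown below. The function `conditional_for_link` and the `(process, lag)` tuple representation are assumptions made for illustration and do not reflect IDTxl's internal data structures.

```python
def conditional_for_link(selected_target_vars, selected_source_vars, source):
    """Build the conditioning set for candidates from `source`'s past.

    Hypothetical helper (not IDTxl API). Variables are (process, lag) tuples.
    """
    # Always condition on the selected variables from the target's own past.
    own_past = list(selected_target_vars)
    # Add only variables already selected for this link, i.e. from `source`;
    # variables selected from other sources are deliberately left out.
    same_source = [v for v in selected_source_vars if v[0] == source]
    return own_past + same_source


selected_target_vars = [(2, 1)]                  # target is process 2
selected_source_vars = [(0, 1), (1, 2), (0, 3)]  # from sources 0 and 1
cond_source_0 = conditional_for_link(
    selected_target_vars, selected_source_vars, source=0)
cond_source_1 = conditional_for_link(
    selected_target_vars, selected_source_vars, source=1)
print(cond_source_0)
print(cond_source_1)
```

A multivariate analysis would instead condition on the selected variables of all sources; the filter on `source` is what makes this sketch bivariate.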

diff --git a/idtxl/multivariate_te.py b/idtxl/multivariate_te.py
@@ -179,18 +179,18 @@ def analyse_single_target(self, settings, data, target, sources='all'):
         through information maximisation. Multivariate TE is calculated in four
         steps:
 
-        (1) find all relevant samples in the target processes' own past, by
-            iteratively adding candidate samples that have significant
+        (1) find all relevant variables in the target processes' own past, by
+            iteratively adding candidate variables that have significant
             conditional mutual information (CMI) with the current value
-            (conditional on all samples that were added previously)
-        (2) find all relevant samples in the source processes' pasts (again
+            (conditional on all variables that were added previously)
+        (2) find all relevant variables in the source processes' pasts (again
             by finding all candidates with significant CMI)
         (3) prune the final conditional set by testing the CMI between each
-            sample in the final set and the current value, conditional on all
-            other samples in the final set
+            variable in the final set and the current value, conditional on all
+            other variables in the final set
         (4) statistics on the final set of sources (test for over-all transfer
             between the final conditional set and the current value, and for
-            significant transfer of all individual samples in the set)
+            significant transfer of all individual variables in the set)
 
         Note:
             For a further description of the algorithm see references in the