Added a pickle save() function for SpawnExperiment, updated the README, and set plot-pca-all to False by default (True just for SpawnExp for now).
README.md (33 changed lines)
@@ -1,27 +1,26 @@
-# code ALIFE paper journal edition
+# self-rep NN paper - ALIFE journal edition
- see journal_basins.py for the "train -> spawn with noise -> train again and see where they end up" first draft. Applying noise follows the `vary` function that was used in the paper's robustness test, with `+- prng() * eps`. Change if desired.
- has some interesting results, but maybe due to the PCA the newly spawned weights + noise get plotted quite a bit away from the parent particle, even though the weights are similar to within 10e-8?
- see journal_basins.py for an attempt at a distance matrix between the nets / end weight states of an experiment. Also has the position-invariant Manhattan distance as an option. (checks n^2 pairs, neither fast nor elegant ;D)
- i forgot what "sanity check for the beacon" meant, but we leave that out anyway, right?
- [x] Plateau / Pillar size: What does happen to the fixpoints after noise introduction and retraining? Options being: same fixpoint, similar fixpoint (basin), different fixpoint? Do they do the clustering thingy?
- see journal_basins.py for the "train -> spawn with noise -> train again and see where they end up" functionality. Applying noise follows the `vary` function that was used in the paper's robustness test, with `+- prng() * eps`. Change if desired.
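A minimal sketch of what such a `+- prng() * eps` perturbation could look like; the actual `vary` implementation lives in the repo, and details here (the sign choice, operating on a flat list) are illustrative assumptions:

```python
import random

def vary(weights, eps=1e-4, prng=random.random):
    # Perturb each weight by +- prng() * eps (sketch, not the repo's exact code).
    return [w + random.choice((-1, 1)) * prng() * eps for w in weights]
```

Calling something like this on a parent's flattened weights before retraining would give the spawned clone its starting point.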
- there is also a distance matrix for all-to-all particle comparisons (with the distance parameter being one of: `MSE`, `MAE` (mean absolute error = mean Manhattan) and `MIM` (mean position-invariant Manhattan))
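For reference, the two Manhattan-based options could be sketched like this; treating "position invariant" as "compare sorted weight vectors" is my assumption here, and the repo's `mean_invariate_manhattan_distance` may do something different:

```python
import numpy as np

def mae(a, b):
    # MAE: mean absolute error = mean Manhattan distance per weight
    return float(np.mean(np.abs(np.asarray(a) - np.asarray(b))))

def mim(a, b):
    # MIM sketch: sort both weight vectors first, so permutations of the
    # same values count as distance zero (assumed reading of "position invariant")
    return float(np.mean(np.abs(np.sort(np.asarray(a)) - np.sort(np.asarray(b)))))
```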
## Some changes from cristian's code (or suggestions, rather)
- [ ] Same thing with soup interaction: We would expect the same behaviour... Influence of interaction with near and far-away particles.
This is just my understanding, I might be wrong here. Just a short writeup of what I noticed from trying to implement the new experiments.
EDIT: I also saw that you updated your branch, so some of these things might have already been addressed.
- [ ] Robustness test with a trained network: Training for high-quality fixpoints, compare with the "perfect" fixpoint. Average loss per application step.
1. I think id_function is only training to reproduce the *very first weight* configuration, right? Now I see where the confusion is. According to my understanding, the self-rep network gets trained with the task to output the *current weights at each training timestep*, i.e. dynamic targets that move as the weights learn until they stabilize/converge. I have changed that accordingly in the experiments to produce one input/target **per step** and train on that once (batch_size 1) for e.g. ST_step-many times (not ST_step-many times on the initial single input/target).
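The moving-target training loop described above could be sketched as follows. `TinyNet` and its `input_weight_matrix` are illustrative stand-ins for the repo's net class and its weight-encoding helper, not the actual implementation:

```python
import torch

class TinyNet(torch.nn.Module):
    # Illustrative stand-in for the repo's Net: maps a 4-dim encoding of a
    # weight (value + crude position indices) to that weight's value.
    def __init__(self):
        super().__init__()
        self.layers = torch.nn.Sequential(
            torch.nn.Linear(4, 8), torch.nn.Tanh(), torch.nn.Linear(8, 1))

    def forward(self, x):
        return self.layers(x)

    def input_weight_matrix(self):
        # One (encoding, target) row per weight; the target column is the
        # *current* value, so the target moves as training updates the net.
        rows, targets = [], []
        for layer_idx, p in enumerate(self.parameters()):
            for w_idx, w in enumerate(p.detach().flatten()):
                rows.append([w.item(), float(layer_idx), float(w_idx % 4), float(w_idx // 4)])
                targets.append([w.item()])
        return torch.tensor(rows), torch.tensor(targets)

def self_train(net, steps, lr=0.01):
    # One fresh input/target pair per step, trained on once, instead of
    # training `steps` times on the initial snapshot.
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    loss = None
    for _ in range(steps):
        inputs, targets = net.input_weight_matrix()  # re-read current weights
        loss = torch.nn.functional.mse_loss(net(inputs), targets)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss.item()
```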
- [ ] Adjust self-training so that it favors second-order fixpoints -> second-order test implementation (?)
2. Not sure about this one, but: Train only seems to save the *output* (i.e., the prediction, not the net's weight states)? Semantically this changes the 3d trajectories from the paper:
- from "the trajectory doesn't change anymore because the *weights* are always the same", i.e. the backprop gradient doesn't change anything because the loss of the prediction is basically nonexistent,
- to "the net has learned to return the input vector 1:1 (id_function, yes) and the *output* prediction is the *same* every time". Eventually weights == output == target_data, but we are interested in the weight-state trajectory during learning and not really the output, I guess (because we know the output will eventually converge to the right prediction of the weights, but not how the weights develop during training to accommodate this ability). Logging target_data would be better, because that is basically the weights at each step we are aiming for. That's what I am using now, at least.
3. the robustness test doesn't seem to self-apply the prediction currently, it only changes the weights (apply weights ≠ self.apply), right? That's why the README has the notice "never fails for the smaller values": it only adds an epsilon small enough not to destroy the fixpoint property (10e-6 onwards) and doesn't actually try to self-apply. If the changed weights + noise are small enough to still be a fixpoint, then the property will always hold without change (i.e., without the actual self-application). Also, the noise is *on the input*, which is a robustness test for id_function, yes, while the paper experiment has the noise *on the weights*. Semantically, noise on the input asks "can the same net/weights produce the same input/output even when we change the input", which of course it cannot. But the output may be changed little enough that it stays within an epsilon-degree of change and therefore doesn't lose the fixpoint property.
The robustness experiment in the paper tests self-application resistance, which means: how much faster do the nets lose prediction accuracy on self-application when the weights get x amount of noise? They all lose precision even without noise (see the paper; self-application is by nature a value-degrading operation, pushing predictions closer to 0-values, "easier to predict"), it's "just" the visualisation of how much faster they collapse to the 0-fixpoint with different amounts of noise on the weights (and, since the nets sample from within their own weights, on the input as well; weights => input).
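A rough sketch of such a weight-noise + repeated-self-application test; the function name, the uniform noise, and the degradation criterion are illustrative assumptions, not the paper's exact code:

```python
import numpy as np

def robustness_steps(weights, predict, eps, max_steps=100, tol=1e-1, rng=None):
    # Noise goes on the *weights* (and therefore on the input too, since
    # the net samples its input from its own weights), then we self-apply
    # repeatedly and count how many steps the fixpoint property survives.
    rng = rng or np.random.default_rng(0)
    noisy = weights + rng.uniform(-eps, eps, size=weights.shape)
    for step in range(max_steps):
        prediction = predict(noisy)      # self-application: weights -> new weights
        if np.mean(np.abs(prediction - noisy)) > tol:
            return step                  # fixpoint property lost at this step
        noisy = prediction
    return max_steps
```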
---
## Notes:
4. getting randdigit for the path destroys the save order, no? Makes finding stuff tricky. IIRC that's why steffen used timestamps, they sort in ascending order?
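For illustration, a timestamp-based directory name sorts lexicographically in creation order, unlike a random suffix (the helper name here is made up):

```python
from datetime import datetime

def timestamped_dir(prefix="exp"):
    # Zero-padded fields make string order == chronological order.
    return f"{prefix}_{datetime.now().strftime('%Y-%m-%d_%H-%M-%S')}"
```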
- In the spawn experiment we now fit and transform the PCA over *ALL* trajectories, instead of each net's history on its own. This can be toggled by the `plot_pca_together` parameter in `visualisation.py/plot_3d_self_train() & plot_3d()` (default: `False`, but set to `True` in the spawn-experiment class).
5. the normalize() is different from the paper, right? It normalizes over len(state_dict) = 14, not over the positional encoding of each layer / cell / weight_value?
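How I read the paper's scheme, as a sketch: each positional coordinate would be normalized by the size of its own dimension rather than by one global constant like len(state_dict). Function and parameter names here are mine, not the repo's:

```python
def normalize_position(layer_idx, cell_idx, weight_idx, n_layers, n_cells, n_weights):
    # Per-dimension normalization: each index is scaled by its own range,
    # not by a single global constant such as len(state_dict).
    return (layer_idx / n_layers, cell_idx / n_cells, weight_idx / n_weights)
```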
- I have also added a `start_time` property for the nets (default: `1`). This is intended to be set flexibly, e.g. for clones (when they are spawned midway through the experiment), such that the PCA can start the plotting trace from this timestep. When we spawn clones, we deepcopy their parent's saved weight_history too, so that the PCA transforms same-length trajectories. With `plot_pca_together` that means that clones and their parents will literally be plotted perfectly overlaid on top of each other, up until the spawn time where you can see the offset / noise we apply. By setting the start_time, you can avoid this overlap and avoid hiding the parent's trace color, which gets plotted first (because the parent is always added to self.nets first). **But more importantly, you can effectively zoom into the plot by setting the parent's start_time to just shy of the end of the first epoch (where they get checked on the fixpoint property and spawn clones) and the start_times of the clones to the second epoch. This will make the plot begin at spawn time, cutting off the parent's initial trajectory and zooming in to the action (see `journal_basins.py/spawn_and_continue()`).**
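The fit-over-all-trajectories idea, plus the `start_time` slicing, could be sketched like this. It is a simplified stand-in for what `plot_3d_self_train`/`plot_3d` do, assuming scikit-learn's PCA and 1-based start times:

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_all_trajectories(histories, start_times, n_components=2):
    # Fit ONE PCA over every net's weight history stacked together, so all
    # trajectories share a coordinate system, then slice each trajectory
    # from its start_time so clones don't overdraw their parent's trace.
    stacked = np.vstack(histories)                    # (total_steps, n_weights)
    transformed = PCA(n_components=n_components).fit_transform(stacked)
    out, offset = [], 0
    for history, t0 in zip(histories, start_times):
        coords = transformed[offset:offset + len(history)]
        out.append(coords[t0 - 1:])                   # start_time is 1-based
        offset += len(history)
    return out
```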
6. test_for_fixpoint doesn't return/set the id_functions array? How does that work? Do you then just filter all nets with the fitting "string" property somewhere?
- I saved the whole experiment class as a pickle dump (`experiment_pickle.p`, just like cristian), hope that's fine.
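Reloading such a dump for post-hoc analysis would look something like this (a counterpart to the new `save()`; the helper name is mine):

```python
import pickle

def load_experiment(directory):
    # Counterpart to SpawnExperiment.save(): read back the pickled
    # experiment object from <directory>/experiment_pickle.p.
    with open(f"{directory}/experiment_pickle.p", "rb") as f:
        return pickle.load(f)
```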
- I have also added a `requirement.txt` for quick venv / pip -r installs. Append as necessary.
@@ -1,5 +1,6 @@
 import os
 from pathlib import Path
+import pickle
 
 from tqdm import tqdm
 import random
@@ -51,6 +52,7 @@ def distance_matrix(nets, distance="MIM", print_it=True):
 
 
 def distance_from_parent(nets, distance="MIM", print_it=True):
+    list_of_matrices = []
     parents = list(filter(lambda x: "clone" not in x.name and is_identity_function(x), nets))
     distance_range = range(10)
     for parent in parents:
@@ -67,13 +69,16 @@ def distance_from_parent(nets, distance="MIM", print_it=True):
                 matrix[idx][dist] = MAE(parent_weights, clone_weights) < pow(10, -dist)
             elif distance in ["MIM"]:
                 matrix[idx][dist] = mean_invariate_manhattan_distance(parent_weights, clone_weights) < pow(10, -dist)
 
         if print_it:
             print(f"\nDistances from parent {parent.name} [{distance}]:")
             col_headers = [str(f"10e-{d}") for d in distance_range]
             row_headers = [str(f"clone_{i}") for i in range(len(clones))]
             print(tabulate(matrix, showindex=row_headers, headers=col_headers, tablefmt='orgtbl'))
 
-        return matrix
+        list_of_matrices.append(matrix)
+
+    return list_of_matrices
 
 
 class SpawnExperiment:
@@ -115,8 +120,10 @@ class SpawnExperiment:
         self.spawn_and_continue()
         self.weights_evolution_3d_experiment()
         # self.visualize_loss()
-        distance_matrix(self.nets)
-        distance_from_parent(self.nets)
+        self.distance_matrix = distance_matrix(self.nets)
+        self.parent_clone_distances = distance_from_parent(self.nets)
+
+        self.save()
 
     def populate_environment(self):
         loop_population_size = tqdm(range(self.population_size))
@@ -184,7 +191,7 @@ class SpawnExperiment:
 
     def weights_evolution_3d_experiment(self):
         exp_name = f"ST_{str(len(self.nets))}_nets_3d_weights_PCA"
-        return plot_3d_self_train(self.nets, exp_name, self.directory, self.log_step_size)
+        return plot_3d_self_train(self.nets, exp_name, self.directory, self.log_step_size, plot_pca_together=True)
 
     def visualize_loss(self):
         for i in range(len(self.nets)):
@@ -193,6 +200,10 @@ class SpawnExperiment:
         plot_loss(self.loss_history, self.directory)
 
+    def save(self):
+        pickle.dump(self, open(f"{self.directory}/experiment_pickle.p", "wb"))
+        print(f"\nSaved experiment to {self.directory}.")
+
 
 if __name__ == "__main__":
 
     NET_INPUT_SIZE = 4
@@ -73,7 +73,7 @@ def bar_chart_fixpoints(fixpoint_counter: Dict, population_size: int, directory:
 
 
 def plot_3d(matrices_weights_history, directory: Union[str, Path], population_size, z_axis_legend,
-            exp_name="experiment", is_trained="", batch_size=1, plot_pca_together=True):
+            exp_name="experiment", is_trained="", batch_size=1, plot_pca_together=False):
     """ Plotting the weights of the nets in a 3d form using principal component analysis (PCA) """
 
     fig = plt.figure()
@@ -168,7 +168,7 @@ def plot_3d(matrices_weights_history, directory: Union[str, Path], population_si
     plt.show()
 
 
-def plot_3d_self_train(nets_array: List, exp_name: str, directory: Union[str, Path], batch_size: int):
+def plot_3d_self_train(nets_array: List, exp_name: str, directory: Union[str, Path], batch_size: int, plot_pca_together: bool):
     """ Plotting the evolution of the weights in a 3D space when doing self training. """
 
     matrices_weights_history = []
@@ -181,7 +181,7 @@ def plot_3d_self_train(nets_array: List, exp_name: str, directory: Union[str, Pa
 
     z_axis_legend = "epochs"
 
-    return plot_3d(matrices_weights_history, directory, len(nets_array), z_axis_legend, exp_name, "", batch_size)
+    return plot_3d(matrices_weights_history, directory, len(nets_array), z_axis_legend, exp_name, "", batch_size, plot_pca_together=plot_pca_together)
 
 
 def plot_3d_self_application(nets_array: List, exp_name: str, directory_name: Union[str, Path], batch_size: int) -> None: