Initial Push

Steffen Illium
2022-02-02 12:03:31 +01:00
parent 1b7581e656
commit eb3b9b8958
3 changed files with 83 additions and 63 deletions

@@ -1,52 +1,48 @@
-# self-rep NN paper - ALIFE journal edition
-- [x] Plateau / Pillar size
-  What happens to the fixpoints after noise introduction and retraining?
-  Options being: Same Fixpoint, Similar Fixpoint (Basin),
-  - Different Fixpoint?
-    Yes, we did not find the same one (10-5)
-  - Do they do the clustering thingy?
-    Kind of: Small movement towards (MIM-Distance getting smaller) parent fixpoint.
-    Small movement for everyone? -> Distribution
-  - see `journal_basins.py` for the "train -> spawn with noise -> train again and see where they end up" functionality. Apply noise follows the `vary` function that was used in the paper robustness test with `+- prng() * eps`. Change if desired.
-  - there is also a distance matrix for all-to-all particle comparisons (with distance parameter one of: `MSE`, `MAE` (mean absolute error = mean manhattan) and `MIM` (mean position invariant manhattan))
-- [ ] Same Thing with Soup interaction. We would expect the same behaviour...
-  Influence of interaction with near and far away particles.
-- [x] Robustness test with a trained Network
-  Training for high quality fixpoints, compare with the "perfect" fixpoint. Average Loss per application step
-  - see `journal_robustness.py` for the robustness test modeled after Cristian's robustness-exp (with the exception that we put noise on the weights). Has a `synthetic` bool to switch to a hand-modeled perfect fixpoint instead of naturally trained ones.
-  - Also added the difference between the "time-as-fixpoint" and "time-to-verge" (i.e. to divergence / zero).
-  - We might need to consult about the "average loss per application step", as I think the application loss gets gradually higher the worse the weights get. So the average might not tell us much here.
-- [x] Adjust Self Training so that it favors second order fixpoints
-  -> Second order test implementation (?)
-- [x] Barplot over clones -> how many become a fixpoint vs how many diverge per noise level
-- [x] Box-Plot of Avg. Distance of clones from parent
-- [x] Search subspace between two fixpoints by linage(10**-5), check where they end up
-- [x] How are basins / "attractor areas" shaped?
-# Future Todos:
-- [ ] Find a statistic over weight space that provides a better init function
-- [ ] Test this init function on a mnist classifier - just for the lolz
+# Bureaucratic Cohort Swarms
+### (The Meta-Task Experience) # Deadline: 28.02.22
+## Experiments
+Fixpoint Tests:
+-> Dropout Test
+(Does the particle take part in the goal task, or is it only an SRN?)
+Zero_ident diff = -00.04999637603759766 %
+-> gnf(1) -> Approx. Weight
+Translation into a weight scalar
+-> Embedding into a regular network
+(-> Translation into an explainable AI framework)
+-> Inferences about the micro networks
+-> Visualization
+-> of membership
+-> of connectivity
+-> PCA()
+- -> Dataframe: Epoch, Weight, dim_1, ..., dim_n
+- -> Visualization as a trajectory cube
+-> Literature research on macro/micro network structures
+Does this already exist?
+Hypernetwork?
+arxiv: 1905.02898
 ---
-## Notes:
-- In the spawn-experiment we now fit and transform the PCA over *ALL* trajectories, instead of each net-history on its own. This can be toggled by the `plot_pca_together` parameter in `visualisation.py/plot_3d_self_train() & plot_3d()` (default: `False` but set `True` in the spawn-experiment class).
-- I have also added a `start_time` property for the nets (default: `1`). This is intended to be set flexibly for e.g., clones (when they are spawned midway through the experiment), such that the PCA can start the plotting trace from this timestep. When we spawn clones we deepcopy their parent's saved weight_history too, so that the PCA transforms same length trajectories. With `plot_pca_together` that means that clones and their parents will literally be plotted perfectly overlaid on top, up until the spawn-time, where you can see the offset / noise we apply. By setting the start_time, you can avoid this overlap and avoid hiding the parent's trace color which gets plotted first (because the parent is always added to self.nets first). **But more importantly, you can effectively zoom into the plot, by setting the parent's start-time to just shy of the end of the first epoch (where they get checked on fixpoint-property and spawn clones) and the start-times of clones to the second epoch. This will make the plot begin at spawn time, cutting off the parent's initial trajectory and zooming in on the action (see `journal_basins.py/spawn_and_continue()`).**
-- Now saving the whole experiment class as pickle dump (`experiment_pickle.p`, just like Cristian), hope that's fine.
-- Added a `requirement.txt` for quick venv / pip -r installs. Append as necessary.
+Tasks for Steffen:
+- Training with smaller GNs
+- Train longer -> 500 epochs?
+- Adjust the loss weighting
+- Training without the residual skip connection
+- Test against a baseline dense network
+-> with a comparable neuron count
+-> with the total weight count
+- Task/goal instead of SRNN task
+---
+For people with too much time:
+-> Sparse network training of the self-replication
+(Just for the lulz and speeeeeeed)
+---
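
The removed README notes describe the spawn noise as `+- prng() * eps`. A minimal sketch of that idea follows, assuming a flat list of weights; `apply_noise` is a hypothetical helper written for illustration and is not the repository's `vary` function.

```python
import random


def apply_noise(weights, eps, rng):
    """Shift every weight by a random amount in [-eps, eps].

    Hypothetical helper: mirrors the `+- prng() * eps` rule from the notes,
    not the actual implementation in the repository.
    """
    noisy = []
    for w in weights:
        sign = rng.choice((-1.0, 1.0))        # the "+-" part
        noisy.append(w + sign * rng.random() * eps)  # prng() * eps
    return noisy


rng = random.Random(42)                # seeded so the run is repeatable
parent = [0.5, -0.25, 0.0, 1.0]        # made-up parent weights
clone = apply_noise(parent, eps=1e-5, rng=rng)

# The largest per-weight shift never exceeds eps (up to float rounding).
max_shift = max(abs(c - p) for c, p in zip(clone, parent))
```

Each clone stays inside a small box of half-width `eps` around its parent, so retraining shows whether it falls back to the parent's fixpoint or drifts elsewhere.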

@@ -165,7 +165,7 @@ if __name__ == '__main__':
     self_train = False
     training = False
-    plotting = False
+    plotting = True
     particle_analysis = True
     as_sparse_network_test = True
@@ -185,22 +185,28 @@ if __name__ == '__main__':
         d = DataLoader(dataset, batch_size=BATCHSIZE, shuffle=True, drop_last=True, num_workers=WORKER)
         interface = np.prod(dataset[0][0].shape)
-        metanet = MetaNet(interface, depth=4, width=6, out=10).to(DEVICE).train()
+        metanet = MetaNet(interface, depth=5, width=6, out=10).to(DEVICE)
         loss_fn = nn.CrossEntropyLoss()
-        optimizer = torch.optim.SGD(metanet.parameters(), lr=0.004, momentum=0.9)
+        optimizer = torch.optim.SGD(metanet.parameters(), lr=0.008, momentum=0.9)
         train_store = new_train_storage_df()
         for epoch in tqdm(range(EPOCH), desc='MetaNet Train - Epochs'):
             is_validation_epoch = epoch % VALIDATION_FRQ == 0 if not debug else True
             is_self_train_epoch = epoch % SELF_TRAIN_FRQ == 0 if not debug else True
+            metanet = metanet.train()
             if is_validation_epoch:
                 metric = torchmetrics.Accuracy()
             else:
                 metric = None
             for batch, (batch_x, batch_y) in tqdm(enumerate(d), total=len(d), desc='MetaNet Train - Batch'):
                 if self_train and is_self_train_epoch:
-                    self_train_loss = metanet.combined_self_train(optimizer)
+                    # Zero your gradients for every batch!
+                    optimizer.zero_grad()
+                    self_train_loss = metanet.combined_self_train()
+                    self_train_loss.backward()
+                    # Adjust learning weights
+                    optimizer.step()
                     step_log = dict(Epoch=epoch, Batch=batch, Metric='Self Train Loss', Score=self_train_loss.item())
                     train_store.loc[train_store.shape[0]] = step_log
@@ -225,6 +231,7 @@ if __name__ == '__main__':
                     break
             if is_validation_epoch:
+                metanet = metanet.eval()
                 validation_log = dict(Epoch=int(epoch), Batch=BATCHSIZE,
                                       Metric='Train Accuracy', Score=metric.compute().item())
                 train_store.loc[train_store.shape[0]] = validation_log
@@ -241,8 +248,9 @@ if __name__ == '__main__':
                     step_log = dict(Epoch=int(epoch), Batch=BATCHSIZE, Metric=key, Score=value)
                     train_store.loc[train_store.shape[0]] = step_log
             train_store.to_csv(df_store_path, mode='a', header=not df_store_path.exists())
-            train_store = new_train_storage_df()
+            # train_store = new_train_storage_df()
+        metanet.eval()
         accuracy = checkpoint_and_validate(metanet, run_path, EPOCH, final_model=True)
         validation_log = dict(Epoch=EPOCH, Batch=BATCHSIZE,
                               Metric='Test Accuracy', Score=accuracy.item())
@@ -254,7 +262,7 @@ if __name__ == '__main__':
         plot_training_result(df_store_path)
     if particle_analysis:
-        model_path = next(run_path.glob('*ckpt.tp'))
+        model_path = next(run_path.glob(f'*e{EPOCH}.tp'))
         latest_model = torch.load(model_path, map_location=DEVICE).eval()
         counter_dict = defaultdict(lambda: 0)
         _ = test_for_fixpoints(counter_dict, list(latest_model.particles))
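
The hunks above move `zero_grad()`, `backward()` and `step()` out of `combined_self_train` and into the training loop, so that one optimizer step is taken on the summed loss of all particles. A minimal sketch of that pattern follows. It assumes PyTorch is installed, and the three small linear layers with an identity target are placeholders chosen for illustration, not the author's `MetaNet` particles or their real self-replication target.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Placeholder "particles": three tiny bias-free linear maps (assumption).
particles = nn.ModuleList([nn.Linear(4, 4, bias=False) for _ in range(3)])
optimizer = torch.optim.SGD(particles.parameters(), lr=0.008, momentum=0.9)


def combined_self_train(particles):
    """Return the summed per-particle loss as one differentiable tensor."""
    losses = []
    for particle in particles:
        input_data = torch.eye(4)          # placeholder input
        target_data = input_data.clone()   # placeholder target
        losses.append(F.mse_loss(particle(input_data), target_data))
    return torch.stack(losses).sum()


particles.train()
before = combined_self_train(particles).item()
for _ in range(50):
    optimizer.zero_grad()                  # clear gradients once per step
    loss = combined_self_train(particles)
    loss.backward()                        # one backward pass over all particles
    optimizer.step()                       # one update for all parameters

particles.eval()
with torch.no_grad():
    after = combined_self_train(particles).item()
```

Because the loss function no longer touches the optimizer, it stays differentiable and reusable, and the caller decides when gradients are cleared and applied.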

@@ -296,7 +296,7 @@ class MetaCell(nn.Module):
         self.name = name
         self.interface = interface
         self.weight_interface = 5
-        self.net_hidden_size = 4
+        self.net_hidden_size = 3
         self.net_ouput_size = 1
         self.meta_weight_list = nn.ModuleList()
         self.meta_weight_list.extend(
@@ -413,22 +413,38 @@ class MetaNet(nn.Module):
     def particles(self):
         return (cell for metalayer in self._meta_layer_list for cell in metalayer.particles)
-    def combined_self_train(self, external_optimizer):
+    def combined_self_train(self):
         losses = []
         for particle in self.particles:
-            # Zero your gradients for every batch!
-            external_optimizer.zero_grad()
             # Integrate optimizer and backward function
             input_data = particle.input_weight_matrix()
             target_data = particle.create_target_weights(input_data)
             output = particle(input_data)
-            loss = F.mse_loss(output, target_data)
-            losses.append(loss.detach)
-            loss.backward()
-            # Adjust learning weights
-            external_optimizer.step()
-        # return torch.hstack(losses).sum(dim=-1, keepdim=True)
-        return sum(losses)
+            losses.append(F.mse_loss(output, target_data))
+        return torch.hstack(losses).sum(dim=-1, keepdim=True)
+class MetaNetCompareBaseline(nn.Module):
+    def __init__(self, interface=4, depth=3, width=4, out=1, activation=None):
+        super().__init__()
+        self.activation = activation
+        self.out = out
+        self.interface = interface
+        self.width = width
+        self.depth = depth
+        self._meta_layer_list = nn.ModuleList()
+        self._meta_layer_list.append(nn.Linear(self.interface, self.width, bias=False))
+        self._meta_layer_list.extend([nn.Linear(self.width, self.width, bias=False) for _ in range(self.depth - 2)])
+        self._meta_layer_list.append(nn.Linear(self.width, self.out, bias=False))
+    def forward(self, x):
+        tensor = x
+        for meta_layer in self._meta_layer_list:
+            tensor = meta_layer(tensor)
+        return tensor
 if __name__ == '__main__':
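
To compare the dense baseline against the meta network by total weight count, the number of weights in `MetaNetCompareBaseline` can be computed directly from its constructor arguments. The sketch below assumes the bias-free layer layout shown in the diff; `dense_weight_count` is a hypothetical helper, and the 784-wide input in the second call is an assumed MNIST-sized example rather than a value taken from the commit.

```python
def dense_weight_count(interface=4, depth=3, width=4, out=1):
    """Count weights of a bias-free dense stack.

    Layout assumed from the diff: one interface->width layer,
    (depth - 2) width->width layers, and one width->out layer.
    """
    if depth < 2:
        raise ValueError("depth must be at least 2 (input and output layer)")
    hidden_layers = depth - 2
    return interface * width + hidden_layers * width * width + width * out


# Defaults from the constructor: 4*4 + 1*(4*4) + 4*1
default_count = dense_weight_count()

# Assumed example: flattened 28x28 input with depth=5, width=6, out=10
example_count = dense_weight_count(interface=784, depth=5, width=6, out=10)
```

Since every layer is created with `bias=False`, the count is just the sum of the weight matrix sizes, which makes it easy to pick a `width` that matches another model's parameter budget.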