Initial Push
README.md

@@ -1,52 +1,48 @@

# self-rep NN paper - ALIFE journal edition

# Bureaucratic Cohort Swarms

### (The Meta-Task Experience) - Deadline: 28.02.22

## Experiments

Fixpoint Tests:

- [x] Plateau / Pillar size: What happens to the fixpoints after noise introduction and retraining? Options being: Same Fixpoint, Similar Fixpoint (Basin),
- Different Fixpoint?

  Yes, we did not find the same one (within 10^-5).

- Do they do the clustering thingy?

  Kind of: small movement towards the parent fixpoint (MIM distance getting smaller).

  Small movement for everyone? -> Distribution

-> Dropout Test

(Does the particle take part in the goal task, or is it only an SRN?)

Zero_ident diff = -00.04999637603759766 %

- see `journal_basins.py` for the "train -> spawn with noise -> train again and see where they end up" functionality. Applying noise follows the `vary` function that was used in the paper's robustness test, with `+- prng() * eps`. Change if desired; a sketch of that noise step is shown below.
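
A minimal sketch of that noise step (the repo's actual `vary` may differ in detail; `eps` and the uniform `prng()` follow the description above):

```python
import torch

def vary_noise(weights: torch.Tensor, eps: float = 1e-2) -> torch.Tensor:
    """Perturb every weight by +- prng() * eps (random sign, uniform magnitude)."""
    magnitude = torch.rand_like(weights) * eps         # prng() * eps
    sign = torch.randint_like(weights, 0, 2) * 2 - 1   # random +/- 1
    return weights + sign * magnitude
```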

-> gnf(1) -> Approx. Weight

Translation into a weight scalar

-> Embedding into a regular network

- there is also a distance matrix for all-to-all particle comparisons (with the distance parameter being one of `MSE`, `MAE` (mean absolute error = mean Manhattan) or `MIM` (mean position-invariant Manhattan)); see the sketch below.
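
A rough sketch of the three distances and the all-to-all matrix (flattened weight vectors assumed; "position invariant" is read here as comparing sorted weight vectors, which may not match the repo's `MIM` exactly):

```python
import numpy as np

def mse(a, b):
    return float(np.mean((a - b) ** 2))

def mae(a, b):
    # mean absolute error = mean Manhattan distance per weight
    return float(np.mean(np.abs(a - b)))

def mim(a, b):
    # mean position-invariant Manhattan: compare sorted weight vectors (assumption)
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))

def all_to_all_distances(particle_weights, metric=mim):
    """Symmetric distance matrix over flattened particle weight vectors."""
    flat = [np.asarray(w, dtype=float).ravel() for w in particle_weights]
    n = len(flat)
    matrix = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i, j] = matrix[j, i] = metric(flat[i], flat[j])
    return matrix
```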

(-> Translation into an explainable-AI framework)

-> Inferences about micro networks

-> Visualisation

-> of membership

-> of connectivity

- [ ] Same thing with Soup interaction. We would expect the same behaviour... Influence of interaction with near and far-away particles.

-> PCA()

-> Dataframe: Epoch, Weight, dim_1, ..., dim_n

-> Visualisation as a trajectory cube (a sketch of the PCA / dataframe step follows below)
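
A sketch of that PCA / dataframe step (illustrative names only: it uses a `Net` id where the actual frame also carries the weights, and assumes each history is a list of flat weight vectors):

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

def trajectory_dataframe(weight_histories, n_components=3):
    """One row per (net, epoch) with PCA coordinates dim_1 ... dim_n."""
    # fit a single PCA over every snapshot of every net
    stacked = np.vstack([np.asarray(history) for history in weight_histories])
    pca = PCA(n_components=n_components).fit(stacked)

    rows = []
    for net_id, history in enumerate(weight_histories):
        for epoch, point in enumerate(pca.transform(np.asarray(history))):
            rows.append({'Net': net_id, 'Epoch': epoch,
                         **{f'dim_{k + 1}': coord for k, coord in enumerate(point)}})
    return pd.DataFrame(rows)
```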

- [x] Robustness test with a trained network: train for high-quality fixpoints, compare with the "perfect" fixpoint. Average loss per application step.

- see `journal_robustness.py` for the robustness test, modeled after Cristian's robustness experiment (with the exception that we put noise on the weights). Has a `synthetic` bool to switch to a hand-modeled perfect fixpoint instead of naturally trained ones.

- Also added the two measures "time-as-fixpoint" and "time-to-verge" (i.e. time to divergence / zero); a sketch of counting both is shown below.
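
A hedged sketch of counting both quantities under repeated self-application (`apply_once`, `is_fixpoint` and `is_diverged` are placeholder callables, not the repo's API):

```python
def robustness_counters(net, apply_once, is_fixpoint, is_diverged, max_steps=1000):
    """Return (time_as_fixpoint, time_to_verge) for one noisy net."""
    time_as_fixpoint = 0
    still_fixpoint = True
    for step in range(1, max_steps + 1):
        apply_once(net)                        # one self-application step
        if still_fixpoint and is_fixpoint(net):
            time_as_fixpoint = step            # fixpoint property still holds
        else:
            still_fixpoint = False
        if is_diverged(net):                   # diverged or collapsed to zero
            return time_as_fixpoint, step
    return time_as_fixpoint, max_steps
```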

- We might need to consult about the "average loss per application step", as I think the application loss gets gradually higher the worse the weights get. So the average might not tell us much here.

- [x] Adjust self-training so that it favors second-order fixpoints -> second-order test implementation (?)

- [x] Barplot over clones -> how many become a fixpoint vs. how many diverge per noise level

- [x] Box-plot of avg. distance of clones from parent

- [x] Search the subspace between two fixpoints by lineage (10**-5), check where they end up (a hedged interpolation sketch is below)
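
If that subspace search is read as sampling along the straight line between the two fixpoints' weight vectors (an assumption on my part), a minimal sketch:

```python
import numpy as np

def between_fixpoints(weights_a, weights_b, steps=11):
    """Evenly spaced weight vectors on the line from fixpoint A to fixpoint B."""
    weights_a, weights_b = np.asarray(weights_a), np.asarray(weights_b)
    return [weights_a + t * (weights_b - weights_a) for t in np.linspace(0.0, 1.0, steps)]
```

Each sampled vector would then be loaded into a net and self-trained to see where it ends up.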

- [x] How are basins / "attractor areas" shaped?

# Future Todos:

- [ ] Find a statistic over the weight space that provides a better init function
- [ ] Test this init function on an MNIST classifier - just for the lolz (a rough sketch is below)
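
One way the statistic could look (a hedged sketch: it just fits a normal distribution to the weights of already-trained fixpoint particles and reuses its parameters as an init; whether that is the right statistic is exactly the open question above; assumes the particles are `nn.Module`s):

```python
import torch
import torch.nn as nn

def fixpoint_weight_stats(particles):
    """Mean and std over all weights of trained fixpoint particles."""
    flat = torch.cat([p.detach().flatten() for net in particles for p in net.parameters()])
    return flat.mean().item(), flat.std().item()

def init_from_stats(module, mean, std):
    """Initialise Linear layers from the measured statistic."""
    if isinstance(module, nn.Linear):
        nn.init.normal_(module.weight, mean=mean, std=std)

# usage sketch:
#   mean, std = fixpoint_weight_stats(trained_particles)
#   mnist_classifier.apply(lambda m: init_from_stats(m, mean, std))
```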

-> Research on macro/micro network structures

Does this already exist?

Hypernetwork?

arXiv: 1905.02898

---
## Notes:

- In the spawn-experiment we now fit and transform the PCA over *ALL* trajectories, instead of over each net's history on its own. This can be toggled by the `plot_pca_together` parameter in `visualisation.py/plot_3d_self_train() & plot_3d()` (default: `False`, but set to `True` in the spawn-experiment class).

Tasks for Steffen:

- Training with smaller GNs
- Continue training -> 500 epochs?
- Adjust the loss weighting
- Training without the residual skip connection
- Test with a baseline dense network
  -> with a comparable neuron count
  -> with the same total weight count
- Task/goal training instead of the SRNN task

- I have also added a `start_time` property for the nets (default: `1`). This is intended to be set flexibly, e.g. for clones (when they are spawned midway through the experiment), such that the PCA can start the plotting trace from this timestep. When we spawn clones we deepcopy their parent's saved weight_history too, so that the PCA transforms same-length trajectories. With `plot_pca_together` that means that clones and their parents will literally be plotted perfectly overlaid on top of each other, up until the spawn-time, where you can see the offset / noise we apply. By setting the start_time, you can avoid this overlap and avoid hiding the parent's trace color, which gets plotted first (because the parent is always added to self.nets first). **But more importantly, you can effectively zoom into the plot by setting the parent's start-time to just shy of the end of the first epoch (where they get checked on the fixpoint property and spawn clones) and the start-times of clones to the second epoch. This will make the plot begin at spawn time, cutting off the parent's initial trajectory and zooming in to the action (see `journal_basins.py/spawn_and_continue()`).** A small sketch of that spawn step is shown below.
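
A small sketch of that spawn step (attribute and helper names here, `weight_history`, `start_time`, `apply_noise`, are illustrative, not the repo's exact API):

```python
from copy import deepcopy

def spawn_clone(parent, apply_noise, spawn_epoch):
    """The deepcopy carries over the parent's full weight_history (so the PCA
    sees same-length trajectories); the clone then gets perturbed weights and
    a plotted trace that starts at the spawn epoch instead of at 1."""
    clone = deepcopy(parent)
    apply_noise(clone)                   # e.g. the +- prng() * eps perturbation
    clone.start_time = spawn_epoch + 1   # PCA trace of the clone begins here
    return clone
```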

---

- Now saving the whole experiment class as a pickle dump (`experiment_pickle.p`, just like Cristian); hope that's fine.
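
For reference, the dump / reload pattern (with `experiment` and `run_path` as stand-in names):

```python
import pickle
from pathlib import Path

def dump_experiment(experiment, run_path: Path):
    """Persist the whole experiment object (nets, weight histories, settings)."""
    with open(run_path / 'experiment_pickle.p', 'wb') as f:
        pickle.dump(experiment, f)

def load_experiment(run_path: Path):
    with open(run_path / 'experiment_pickle.p', 'rb') as f:
        return pickle.load(f)
```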

For people with too much time:

-> Sparse-network training of the self-replication

(Just for the lulz and speeeeeeed)

- Added a `requirement.txt` for quick venv / `pip install -r` installs. Append as necessary.

---

@@ -165,7 +165,7 @@ if __name__ == '__main__':

    self_train = False
    training = False
    plotting = False
    plotting = True
    particle_analysis = True
    as_sparse_network_test = True

@@ -185,22 +185,28 @@ if __name__ == '__main__':
    d = DataLoader(dataset, batch_size=BATCHSIZE, shuffle=True, drop_last=True, num_workers=WORKER)

    interface = np.prod(dataset[0][0].shape)
    metanet = MetaNet(interface, depth=4, width=6, out=10).to(DEVICE).train()
    metanet = MetaNet(interface, depth=5, width=6, out=10).to(DEVICE)

    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(metanet.parameters(), lr=0.004, momentum=0.9)
    optimizer = torch.optim.SGD(metanet.parameters(), lr=0.008, momentum=0.9)

    train_store = new_train_storage_df()
    for epoch in tqdm(range(EPOCH), desc='MetaNet Train - Epochs'):
        is_validation_epoch = epoch % VALIDATION_FRQ == 0 if not debug else True
        is_self_train_epoch = epoch % SELF_TRAIN_FRQ == 0 if not debug else True
        metanet = metanet.train()
        if is_validation_epoch:
            metric = torchmetrics.Accuracy()
        else:
            metric = None
        for batch, (batch_x, batch_y) in tqdm(enumerate(d), total=len(d), desc='MetaNet Train - Batch'):
            if self_train and is_self_train_epoch:
                self_train_loss = metanet.combined_self_train(optimizer)
                # Zero your gradients for every batch!
                optimizer.zero_grad()
                self_train_loss = metanet.combined_self_train()
                self_train_loss.backward()
                # Adjust learning weights
                optimizer.step()
                step_log = dict(Epoch=epoch, Batch=batch, Metric='Self Train Loss', Score=self_train_loss.item())
                train_store.loc[train_store.shape[0]] = step_log

@@ -225,6 +231,7 @@ if __name__ == '__main__':
                break

        if is_validation_epoch:
            metanet = metanet.eval()
            validation_log = dict(Epoch=int(epoch), Batch=BATCHSIZE,
                                  Metric='Train Accuracy', Score=metric.compute().item())
            train_store.loc[train_store.shape[0]] = validation_log

@@ -241,8 +248,9 @@ if __name__ == '__main__':
            step_log = dict(Epoch=int(epoch), Batch=BATCHSIZE, Metric=key, Score=value)
            train_store.loc[train_store.shape[0]] = step_log
        train_store.to_csv(df_store_path, mode='a', header=not df_store_path.exists())
        train_store = new_train_storage_df()
        # train_store = new_train_storage_df()

    metanet.eval()
    accuracy = checkpoint_and_validate(metanet, run_path, EPOCH, final_model=True)
    validation_log = dict(Epoch=EPOCH, Batch=BATCHSIZE,
                          Metric='Test Accuracy', Score=accuracy.item())

@@ -254,7 +262,7 @@ if __name__ == '__main__':
        plot_training_result(df_store_path)

    if particle_analysis:
        model_path = next(run_path.glob('*ckpt.tp'))
        model_path = next(run_path.glob(f'*e{EPOCH}.tp'))
        latest_model = torch.load(model_path, map_location=DEVICE).eval()
        counter_dict = defaultdict(lambda: 0)
        _ = test_for_fixpoints(counter_dict, list(latest_model.particles))

network.py

@@ -296,7 +296,7 @@ class MetaCell(nn.Module):
        self.name = name
        self.interface = interface
        self.weight_interface = 5
        self.net_hidden_size = 4
        self.net_hidden_size = 3
        self.net_ouput_size = 1
        self.meta_weight_list = nn.ModuleList()
        self.meta_weight_list.extend(

@@ -413,22 +413,38 @@ class MetaNet(nn.Module):
    def particles(self):
        return (cell for metalayer in self._meta_layer_list for cell in metalayer.particles)

    def combined_self_train(self, external_optimizer):
    def combined_self_train(self):
        losses = []
        for particle in self.particles:
            # Zero your gradients for every batch!
            external_optimizer.zero_grad()
            # Integrate optimizer and backward function
            input_data = particle.input_weight_matrix()
            target_data = particle.create_target_weights(input_data)
            output = particle(input_data)
            loss = F.mse_loss(output, target_data)
            losses.append(loss.detach)
            loss.backward()
            # Adjust learning weights
            external_optimizer.step()
        # return torch.hstack(losses).sum(dim=-1, keepdim=True)
        return sum(losses)
            losses.append(F.mse_loss(output, target_data))
        return torch.hstack(losses).sum(dim=-1, keepdim=True)


class MetaNetCompareBaseline(nn.Module):

    def __init__(self, interface=4, depth=3, width=4, out=1, activation=None):
        super().__init__()
        self.activation = activation
        self.out = out
        self.interface = interface
        self.width = width
        self.depth = depth

        self._meta_layer_list = nn.ModuleList()

        self._meta_layer_list.append(nn.Linear(self.interface, self.width, bias=False))
        self._meta_layer_list.extend([nn.Linear(self.width, self.width, bias=False) for _ in range(self.depth - 2)])
        self._meta_layer_list.append(nn.Linear(self.width, self.out, bias=False))

    def forward(self, x):
        tensor = x
        for meta_layer in self._meta_layer_list:
            tensor = meta_layer(tensor)
        return tensor


if __name__ == '__main__':