- arxiv.org/abs/2411.12537 arxiv.org/abs/2405.17394 Allowing negative gates widens the class of regular languages a linear RNN can recognize, specifically those that need periodic state tracking (e.g. modular counting, bitstring parity). I had a similar thought back when I was first toying with linear RNNs, so I tried tanh. (Nov 25, 2024 00:03)
- Regarding periodicity in general, another thing I experimented with at some point was simply inserting periodic position encodings, inspired by arxiv.org/abs/2402.00236
- I wonder if keeping track of cumulative multipliers like in LRNNs, but then using them as a soft mask for attention, would help with this in transformers. Sort of like CoPE arxiv.org/abs/2405.18719
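
To make the parity point concrete, here is a minimal toy sketch, not taken from any of the linked papers. It assumes a scalar diagonal recurrence h_t = a_t * h_{t-1} with h_0 = 1, where the gate a_t depends only on the current bit; `scan_state`, `parity_from_state`, and `gate_for_one` are hypothetical names used only for illustration. With a gate that can reach -1 (a tanh-style range), the sign of the state flips on every 1 and so tracks parity. With a gate restricted to (0, 1) (a sigmoid-style range), the state can only shrink and never changes sign.

```python
def scan_state(bits, gate_for_one):
    """Run the toy recurrence h_t = a_t * h_{t-1}, starting from h_0 = 1.

    Assumption for this sketch: a_t = gate_for_one when the bit is 1,
    and a_t = 1 (state unchanged) when the bit is 0.
    """
    h = 1.0
    for b in bits:
        a = gate_for_one if b == 1 else 1.0
        h = a * h
    return h


def parity_from_state(h):
    # With gate_for_one = -1 the state equals (-1) ** (number of ones),
    # so a negative state means an odd number of ones.
    return 0 if h > 0 else 1


bits = [1, 0, 1, 1, 0, 1, 1]

# Signed gate: the state oscillates between +1 and -1.
signed = scan_state(bits, gate_for_one=-1.0)

# Positive-only gate: the state decays monotonically and stays positive.
decayed = scan_state(bits, gate_for_one=0.5)

print(parity_from_state(signed), sum(bits) % 2)
print(decayed)
```

In this toy setup the positive-gate state still encodes the count of ones, but only as a magnitude that keeps shrinking, so reading off a periodic quantity from it gets harder as the sequence grows, whereas the signed state stays at exactly +1 or -1.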