Going beyond ReLU

In a ReLU neural network, a neuron in one layer connects forward through n weights to the next layer (one weight for each neuron in that next layer).
When the weighted-sum value x in the neuron is greater than or equal to zero, the pattern defined by the forward weights is projected onto the next layer with intensity x. When x < 0 the ReLU activation function is off, and it is as though the forward-connected weights were not there.
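As a concrete illustration (my own minimal sketch in Python, not code from the original post; the function name and shapes are assumptions), here is one ReLU neuron's contribution to the next layer viewed as projecting its forward weight pattern with intensity x:

```python
# Minimal sketch: a ReLU neuron as an amplitude-modulated projection
# of its forward weight pattern onto the next layer.
import numpy as np

def relu_neuron_projection(x, forward_weights):
    """x: the neuron's weighted-sum input (a scalar).
    forward_weights: the n outgoing weights to the next layer.
    Returns the neuron's contribution to the next layer."""
    if x >= 0.0:
        return x * forward_weights              # pattern projected with intensity x
    return np.zeros_like(forward_weights)       # ReLU off: as if the weights were absent

# Example: a neuron with 4 forward weights
w = np.array([0.5, -1.0, 0.25, 2.0])
print(relu_neuron_projection(1.5, w))    # pattern scaled by 1.5
print(relu_neuron_projection(-0.7, w))   # all zeros, the input value is lost
```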
However, there may be a better way to deal with the x < 0 case.
When x<0 you can switch to an alternative set of forward weights, again projected with intensity x.
That doubles the number of weights you need in a neural network! Quite the opposite of sparsity. Nevertheless, you can still use the idea with sparse nets in various ways.
The main benefit is that you no longer face the absolute loss of information that happens when ReLU is off.
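Here is a minimal sketch of that two-weight-set idea, again my own illustration rather than the original sample code (names and shapes are assumptions, but the switching rule is the one described above):

```python
# Minimal sketch: when x < 0, switch to an alternative set of forward
# weights instead of outputting zero; either pattern is scaled by x.
import numpy as np

def beyond_relu_projection(x, weights_pos, weights_neg):
    """weights_pos: forward weights used when x >= 0.
    weights_neg: alternative forward weights used when x < 0.
    The selected pattern is projected with intensity x, so no
    input value is discarded outright."""
    if x >= 0.0:
        return x * weights_pos
    return x * weights_neg

w_pos = np.array([0.5, -1.0, 0.25, 2.0])
w_neg = np.array([-0.3, 0.8, 1.5, -0.1])   # the doubled, alternative weight set
print(beyond_relu_projection(1.5, w_pos, w_neg))
print(beyond_relu_projection(-0.7, w_pos, w_neg))  # nonzero: information retained
```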
I made some further comments in these places:
https://discourse.numenta.org/t/relu-neural-networks-as-amplitude-modulated-dictionaries/8904
https://discourse.processing.org/t/relu-is-half-a-cookie/32134
I also provided some sample code that mixes multiple width-4 neural layers, using the beyond-ReLU idea together with the fast Walsh-Hadamard transform, in an arrangement you could call sparse.
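That sample code is not reproduced here, but a rough sketch of how width-4 blocks, the two-sided switching, and the fast Walsh-Hadamard transform might be combined could look like the following. The layer width handling, parameter shapes, and the placement of the transform as a fixed mixing step are my assumptions, not the linked code:

```python
# Hedged sketch only: block-diagonal width-4 layers with two-sided switching,
# mixed globally by a parameter-free fast Walsh-Hadamard transform.
# Input length must be a power of 2 and divisible by 4.
import numpy as np

def fwht(v):
    # In-place unnormalized fast Walsh-Hadamard transform, O(n log n).
    h, n = 1, len(v)
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    return v

def two_sided_relu_block(x4, w_pos, w_neg):
    # x4: a width-4 slice of the activation vector.
    # w_pos, w_neg: 4x4 forward weight matrices for the x >= 0 and x < 0 cases.
    out = np.zeros(4)
    for k, x in enumerate(x4):
        out += x * (w_pos[k] if x >= 0.0 else w_neg[k])
    return out

def sparse_layer(vec, pos_blocks, neg_blocks):
    # Independent width-4 blocks (block-diagonal weights, hence "sparse"),
    # followed by the WHT as a cheap global mixing step.
    out = np.concatenate([
        two_sided_relu_block(vec[i:i + 4], pos_blocks[i // 4], neg_blocks[i // 4])
        for i in range(0, len(vec), 4)
    ])
    return fwht(out.copy()) / np.sqrt(len(out))   # normalized mix of all blocks

rng = np.random.default_rng(0)
n = 16
pos = rng.normal(size=(n // 4, 4, 4))
neg = rng.normal(size=(n // 4, 4, 4))
print(sparse_layer(rng.normal(size=n), pos, neg))
```

The appeal of this kind of arrangement is that the learned weights stay small and local (4x4 blocks) while the transform spreads information across the whole layer at O(n log n) cost with no extra parameters.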