There are a number of ways you can use the fast Walsh-Hadamard Transform (WHT) for sparse systems.
Since a change in any single input element alters all the output elements in one way or another, the transform provides full connectivity at very low cost.
You can use it to provide full initial connectivity before a sparse net. A slight problem is that the transform produces a spectrum of the input data. That can be dealt with by applying a fixed, randomly chosen pattern of sign flips to the input data before calculating the transform, which results in what is effectively a random projection. Sub-random projections are also possible.
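As a rough sketch of that idea in NumPy (the fwht helper below and the 1/sqrt(n) normalization are my own illustrative choices, not something fixed by the technique):

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform (unnormalized).
    len(x) must be a power of two; costs n*log2(n) add/subtract operations."""
    x = np.asarray(x, dtype=float).copy()
    n = x.size
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = x[i:i + h].copy()
            x[i:i + h] = a + x[i + h:i + 2 * h]
            x[i + h:i + 2 * h] = a - x[i + h:i + 2 * h]
        h *= 2
    return x

# Fixed random sign flips followed by the WHT act as a cheap random projection.
rng = np.random.default_rng(0)
n = 8                                      # must be a power of two
signs = rng.choice([-1.0, 1.0], size=n)    # chosen once, then kept fixed
v = rng.normal(size=n)                     # example input vector
projection = fwht(signs * v) / np.sqrt(n)  # 1/sqrt(n) keeps the Euclidean length unchanged
```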
You can also use the fast WHT directly to create neural networks. You can view the WHT as a fixed dense layer of weighted sums with a computational cost of n·log2(n) add/subtract operations, far cheaper than a conventional dense weighted-sum layer, which costs n² fused multiply-add operations.
There is a big problem though: there is nothing to adjust. If you create a neural network using the WHT for the weighted sums you end up with a frozen neural network that does something. Who knows what, but something.
The solution is to use individually adjustable parametric activation functions.
Then you can have a complete neural network layer for n·log2(n) add/subtract operations and n multiplies, using 2n parameters, for example.
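As one concrete, purely illustrative reading of that, here is a minimal sketch where the 2n parameters are a pair of adjustable slopes per element, one applied when the pre-activation is non-negative and one when it is negative; the WHTLayer class, its initialization, and the forward pass are my own assumptions:

```python
import numpy as np

def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform; len(x) must be a power of two."""
    x = np.asarray(x, dtype=float).copy()
    n = x.size
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = x[i:i + h].copy()
            x[i:i + h] = a + x[i + h:i + 2 * h]
            x[i + h:i + 2 * h] = a - x[i + h:i + 2 * h]
        h *= 2
    return x

class WHTLayer:
    """One layer: fixed sign flips + WHT as the 'weighted sums', followed by
    a per-element two-slope activation giving 2n adjustable parameters."""
    def __init__(self, n, rng):
        self.scale = 1.0 / np.sqrt(n)
        self.signs = rng.choice([-1.0, 1.0], size=n)  # fixed, not trained
        self.pos_slope = 0.1 * rng.normal(size=n)     # adjustable slope for h >= 0
        self.neg_slope = 0.1 * rng.normal(size=n)     # adjustable slope for h < 0

    def forward(self, x):
        h = fwht(self.signs * x) * self.scale                           # n*log2(n) adds/subtracts
        return np.where(h >= 0.0, self.pos_slope, self.neg_slope) * h  # n multiplies

rng = np.random.default_rng(1)
layer = WHTLayer(8, rng)
print(layer.forward(rng.normal(size=8)))
```

Stacking several such layers gives a full network whose only trainable parameters are the activation slopes; the weighted sums stay frozen in the WHT.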