thoughts on deep learning

What we are doing is modeling the world with priors. Two instances illustrating this viewpoint are given below.

The first instance is the model for discovering quantum mechanics introduced in arXiv:1901.11103. There, the (possibly simulated) data of quantum experiments are modeled by assuming, as a prior, that the encoding obeys a differential equation depending only on the observable, i.e., the potential. (The encoding is then found to be the wave-function.) The undetermined parts are all left to neural networks, as universal function approximators.
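To make this concrete, here is a minimal sketch, assuming the prior is a stationary Schrödinger-type equation -ψ'' + V(x)ψ = Eψ with a fixed potential and energy; the MLP encoder, the harmonic potential, and the loss terms are illustrative stand-ins, not taken from 1901.11103. The undetermined encoding ψ is a neural network, and the differential-equation prior enters only as a residual penalty.

```python
import torch
import torch.nn as nn

# Hypothetical encoder: maps a position x to the encoding psi(x).
# In the paper's setting the encoder would consume experimental data;
# here a plain MLP over x stands in for that undetermined part.
class Psi(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x)

def schrodinger_residual(psi_model, x, potential, energy):
    """Residual of the prior -psi'' + V(x) psi = E psi, via autograd."""
    x = x.requires_grad_(True)
    psi = psi_model(x)
    dpsi = torch.autograd.grad(psi.sum(), x, create_graph=True)[0]
    d2psi = torch.autograd.grad(dpsi.sum(), x, create_graph=True)[0]
    return -d2psi + potential(x) * psi - energy * psi

# Illustrative setup: harmonic potential and a fixed energy level.
potential = lambda x: 0.5 * x ** 2
energy = torch.tensor(0.5)
model = Psi()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(1000):
    x = torch.rand(256, 1) * 10 - 5          # collocation points on [-5, 5]
    loss = schrodinger_residual(model, x, potential, energy).pow(2).mean()
    # A soft pinning condition keeps the trivial solution psi = 0 away.
    loss = loss + (model(torch.zeros(1, 1)) - 1.0).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The prior does the structural work (the equation constrains what the encoding can be), while the network is only asked to fill in the function that satisfies it.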

The second instance is the transformer for natural language processing, which models the attention phenomenon observed in reality. Its extension, the universal transformer, further models repeated attention on the ambiguous parts of a sentence. In the same way, the undetermined parts are all modeled by neural networks, as universal function approximators.
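As a rough sketch of this instance: below is scaled dot-product attention plus a universal-transformer-style encoder that reapplies one weight-shared block for a fixed number of steps, mimicking repeated attention. The actual Universal Transformer also adds position/timestep signals and adaptive halting, which are omitted here; all names and sizes are illustrative.

```python
import torch
import torch.nn as nn

def attention(q, k, v):
    """Scaled dot-product attention: every position attends to every other."""
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    return scores.softmax(dim=-1) @ v

class Block(nn.Module):
    """Self-attention + feed-forward block; the undetermined parts
    (the projections and the MLP) are plain neural networks."""
    def __init__(self, d=64):
        super().__init__()
        self.q, self.k, self.v = (nn.Linear(d, d) for _ in range(3))
        self.ff = nn.Sequential(nn.Linear(d, 4 * d), nn.ReLU(), nn.Linear(4 * d, d))
        self.n1, self.n2 = nn.LayerNorm(d), nn.LayerNorm(d)

    def forward(self, x):
        x = self.n1(x + attention(self.q(x), self.k(x), self.v(x)))
        return self.n2(x + self.ff(x))

class UniversalEncoder(nn.Module):
    """Universal-transformer-style encoder: the *same* block is applied
    repeatedly, modeling repeated attention over the sequence."""
    def __init__(self, d=64, steps=4):
        super().__init__()
        self.block, self.steps = Block(d), steps

    def forward(self, x):
        for _ in range(self.steps):   # weight sharing across depth
            x = self.block(x)
        return x

tokens = torch.randn(2, 10, 64)        # (batch, sequence, embedding)
out = UniversalEncoder()(tokens)       # same shape as the input
print(out.shape)                       # torch.Size([2, 10, 64])
```

Here the prior is the attention structure itself (what interacts with what, and how often), while the projections and feed-forward layers are the parts left for the machine to fit.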

The transformer, as well as its extension, is thus an imitation of how humans process natural language, taken as the prior, while the undetermined parts are left to the machine, which searches for the best-fitting possibility.