Mathematical notation (also known as "mathematical symbols") is a system of symbols for recording mathematical concepts and relations.

It can represent concepts, methods, and logical relations precisely and compactly. A sophisticated formula, if described in everyday language instead of notation, is often lengthy and ambiguous. As far back as six or seven thousand years ago, symbols indicating numerals were carved on pottery. Since the 15th century, mathematical notation has developed rapidly. The plus and minus signs were first introduced by the German mathematician Widmann in 1489; the French mathematician Vieta began to use brackets, "()", in 1591; and Benjamin Peirce of Harvard University proposed his own symbols for π and e in 1859. To date, a system of more than 200 symbols has been created. Mathematics has evolved into a universal symbolic language for scientific research and plays an important role in the advancement of modern science, computing included. Kenneth E. Iverson, winner of the 1979 Turing Award, argued in his award lecture that mathematical notation "sets one's brain free to concentrate on more advanced problems" and can serve as a coherent language for expressing the executability and versatility of computer programs.

Today, mathematical notation has also become the main medium through which AI scholars and industry pioneers conduct research, study, and exchange ideas about theories and technologies.

However, in emerging disciplines such as AI and machine learning, the proliferation of mathematical notation has brought inconsistency and confusion. To some extent, this hinders the exchange of ideas that the fast-growing field of artificial intelligence needs in both theory and technology. To address the problem, BAAI (Beijing Academy of Artificial Intelligence) has released the first version of a general mathematical notation for machine learning (https://notation.baai.ac.cn/). Guided by the principles of accuracy, self-consistency, and intuitiveness, this version was jointly composed by scholars from Purdue University, the Institute for Advanced Study, BAAI, Peking University, Shanghai Jiao Tong University, and other institutions. These scholars, who specialize in computational mathematics, computational neuroscience, partial differential equations, deep learning theory, and related areas, solicited the opinions of many machine learning researchers.

The following sections describe the notation in terms of its purpose, content, and design principles.

The mathematical notation proposed by BAAI provides a standardized way to unify commonly used but easily confused symbols, laying a foundation for addressing two challenges: 1) choosing conventional notation when writing papers; and 2) communication problems caused by chaotic notation usage. The significance of a standardized notation lies in the following aspects:

Expedite literature review. Theoretical articles usually devote a specific section to notation. As a result, the meaning of a symbol is unlikely to be restated where the theorems are presented, and readers have to go back to the earlier section to look it up. A shared notation spares them that detour.

Avoid misunderstanding the content. Readers facing a heavy reading load may jump straight to the theorems and interpret the symbols according to their own habits. The confusion caused by non-standard notation can lead to fundamental misunderstanding of the theorems. For example, m, n, M, and N are often used for the number of neurons and the number of samples. Because this usage is not yet unified, a reader can easily mistake one quantity for the other.

Improve communication efficiency. In academic lectures, the audience has very little time to grasp the content. Memorizing and recognizing symbols tends to impose a heavy burden and may prevent listeners from keeping up with or properly understanding the material. For instance, f sometimes represents an objective function and sometimes a neural network. In some cases it is extremely difficult to tell at a glance, from context alone, which meaning is intended, and the audience's understanding suffers.

Lower the barrier to understanding notation. This document helps researchers new to machine learning by easing both the reading of papers and the choice of notation in their own writing.

The following examples show the inconvenience caused by multiple representations of a single concept in machine learning. The papers listed below, published in 2018 and 2019 in prominent research areas (mean field theory, neural tangent kernel theory, convergence of over-parameterized networks to global minima, the frequency principle, and function space analysis), use different notation for the same objects of study. This diversity of notation places an extra memory load on readers and raises additional barriers to comprehension.

Mean field theory:

Mei et al., 2019, A mean field view of the landscape of two-layer neural networks

Rotskoff et al., 2018, Parameters as interacting particles: long time convergence and asymptotic error scaling of neural networks

Sirignano et al., 2018, Mean Field Analysis of Neural Networks

Neural tangent kernel theory:

Jacot et al., 2018, Neural Tangent Kernel: Convergence and Generalization in Neural Networks

Arora et al., 2019, On Exact Computation with an Infinitely Wide Neural Net

Over-parameterized network finds global minima:

Du et al., 2018, Gradient Descent Finds Global Minima of Deep Neural Networks

Zou et al., 2018, Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

Allen-Zhu et al., 2018, A Convergence Theory for Deep Learning via Over-Parameterization

Frequency perspective in the study of deep learning:

Rahaman et al., 2018, On the Spectral Bias of Neural Networks

Xu et al., 2018, Training Behavior of Deep Neural Network in Frequency Domain

Function space analysis:

E et al., 2019, A Priori Estimates For Two-layer Neural Networks

This version of the notation mainly covers symbols that are commonly used and essential to understanding machine learning papers and talks. The suggested notation includes Dataset, Function, Loss Function, Activation Function, Two-layer Neural Network, General Deep Neural Network, Complexity, Training, Fourier Frequency, Convolution, etc. In addition, LaTeX code for these is publicly available.

Here are some examples:

Objective functions and neural networks are the objects of most concern to designers. Given the diverse notation preferences among researchers, we adopt \(f\) for both. The objective function is denoted \(f(x)\). A neural network is written \(f_\theta(x)\) or \(f(x, \theta)\), where \(\theta\) denotes its parameters. A single letter thus distinguishes the two kinds of function by whether \(\theta\) appears, which is concise and leaves other letters free for other concepts.
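
As an illustration, the convention could be typeset as in the minimal LaTeX sketch below. The macro names `\objective` and `\network` are hypothetical, chosen for this sketch only and not taken from the released LaTeX code, and the sketch assumes the subscripted form \(f_\theta(x)\) for the network.

```latex
\documentclass{article}
\usepackage{amsmath}
% Hypothetical macro names, defined only for this sketch.
\newcommand{\objective}{f}           % objective function: no parameter shown
\newcommand{\network}{f_{\theta}}    % neural network: parameter \theta shown
\begin{document}
The objective function is $\objective(x)$. The neural network is
$\network(x)$, which may also be written $f(x, \theta)$.
\end{document}
```

Both objects share the letter \(f\); only the presence of \(\theta\) tells the reader which one is meant.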

The loss function is also known as the loss or the risk. Since both \(L\) and \(R\) are in common use, we keep both. Some papers write the training loss as \(\hat{L}\), because in statistics an estimator usually wears a hat, \(\hat{\cdot}\). For simplicity, we prefer to leave out the hat.
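
A worked example may make the choice concrete. The sketch below is illustrative only: the sample count \(n\), the training pairs \((x_i, y_i)\), the per-sample loss \(\ell\), and the network \(f_\theta\) are symbols assumed here for the sake of the example.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Assumed symbols: n samples (x_i, y_i), per-sample loss \ell, network f_\theta.
% Statistical convention: the empirical estimator wears a hat.
\[
  \hat{L}_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} \ell\bigl(f_{\theta}(x_i),\, y_i\bigr)
\]
% Simplified convention: the same quantity, written without the hat.
\[
  L_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} \ell\bigl(f_{\theta}(x_i),\, y_i\bigr)
\]
% Authors who prefer "risk" would write R_n(\theta) for the same quantity.
\end{document}
```

The right-hand sides are identical; only the name on the left changes.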

For the weights of a multilayer neural network, \(W^{[l]}\), we put the layer index at the top to leave room at the bottom, so that \(W_{ij}^{[l]}\) can denote each element of \(W^{[l]}\). We use \([\cdot]\) instead of \((\cdot)\) to avoid possible confusion with derivative notation.
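
The following LaTeX sketch shows how the superscript and subscript indices coexist in a generic layer update. Only \(W^{[l]}\) and \(W_{ij}^{[l]}\) come from the text above. The layer output \(h^{[l]}\), the bias \(b^{[l]}\), the activation \(\sigma\), the function \(g\), and the layer-indexing convention are assumptions made for illustration and may differ from the released notation.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Assumed symbols: h^{[l]} layer output, b^{[l]} bias, \sigma activation.
% Layer index on top, element indices at the bottom.
\[
  h^{[l]} = \sigma\bigl(W^{[l]} h^{[l-1]} + b^{[l]}\bigr),
  \qquad
  h^{[l]}_{i} = \sigma\Bigl(\sum_{j} W^{[l]}_{ij}\, h^{[l-1]}_{j} + b^{[l]}_{i}\Bigr).
\]
% Square brackets mark a layer; round brackets would read as a derivative.
\[
  W^{[2]} \ \text{(weights of layer 2)}
  \qquad \text{versus} \qquad
  g^{(2)}(x) = \frac{d^{2} g}{d x^{2}}(x)
  \ \text{(second derivative of a function } g\text{)}.
\]
\end{document}
```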

It should be noted that this first version does not yet cover all notation used in machine learning. At present, some notation related to reinforcement learning, generative adversarial networks, recurrent neural networks, and other topics has not been fully taken into account. We will continue to expand the scope of the notation as the field progresses.

Because machine learning is an interdisciplinary field, notation preferences vary across its subfields. For this reason, the basic principles of this notation are accuracy, consistency, and intuitiveness. By standardizing existing notation and taking into account users' habits in mathematics and machine learning, the suggested notation is intended to let readers recognize symbols at first sight.

Consideration is also given to compatibility among different schools of AI research. Taking the loss function as an example, we keep both loss (\(L\)) and risk (\(R\)) as names for it. On the other hand, we prefer the simpler notation where it is safe to do so. For example, the estimator of the training error wears a hat in statistics and is written \(\hat{L}_n\), but many researchers simply write \(L_n\), which is simpler and introduces no ambiguity. In this case, we choose the latter.

To date, this notation has been distributed to a number of machine learning researchers for trial use, and its applicability has been preliminarily recognized and verified by peers. We hope this official release will attract more people to take part in shaping notation standards for machine learning. Please follow us for regular updates.

The LaTeX code is available at https://github.com/Mayuyu/suggested-notation-for-machine-learning. Feedback through GitHub is also appreciated.

References:

[1] Mei et al., 2019, A mean field view of the landscape of two-layer neural networks

[2] Rotskoff et al., 2018, Parameters as interacting particles: long time convergence and asymptotic error scaling of neural networks

[3] Sirignano et al., 2018, Mean Field Analysis of Neural Networks

[4] Jacot et al., 2018, Neural Tangent Kernel: Convergence and Generalization in Neural Networks

[5] Arora et al., 2019, On Exact Computation with an Infinitely Wide Neural Net

[6] Du et al., 2018, Gradient Descent Finds Global Minima of Deep Neural Networks

[7] Zou et al., 2018, Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

[8] Allen-Zhu et al., 2018, A Convergence Theory for Deep Learning via Over-Parameterization

[9] Rahaman et al., 2018, On the Spectral Bias of Neural Networks

[10] Xu et al., 2018, Training Behavior of Deep Neural Network in Frequency Domain

[11] E et al., 2019, A Priori Estimates For Two-layer Neural Networks