Mathematics as information compression via the matching and unification of patterns

This paper describes a novel perspective on the foundations of mathematics: how mathematics may be seen to be largely about 'information compression via the matching and unification of patterns' (ICMUP). ICMUP is itself a novel approach to information compression, couched in terms of non-mathematical primitives, as is necessary in any investigation of the foundations of mathematics. This new perspective on the foundations of mathematics has grown out of an extensive programme of research developing the "SP Theory of Intelligence" and its realisation in the "SP Computer Model", a system in which a generalised version of ICMUP -- the powerful concept of SP-multiple-alignment -- plays a central role. These ideas may be seen to be part of a "Big Picture" comprising six areas of interest, with information compression as a unifying theme. The paper describes the close relation between mathematics and information compression, and describes examples showing how variants of ICMUP may be seen in widely-used structures and operations in mathematics. Examples are also given to show how the mathematics-related disciplines of logic and computing may be understood as ICMUP. There are many potential benefits and applications of these ideas.


Introduction
This paper, which draws on and considerably expands some of the thinking in [1,Chapter 10], describes how much of mathematics, perhaps all of it, may be seen as a set of techniques for the compression of information, and their application.
For reasons given in Section 2.5, information compression is seen here as a process of searching for patterns that match each other and merging or 'unifying' patterns that are the same. The expression 'information compression via the matching and unification of patterns' is abbreviated as 'ICMUP'. The main subject of this paper, mathematics as ICMUP, is referred to as 'MICMUP'.
This MICMUP perspective appears to be novel. It has apparently not been previously described in writings about the philosophy of mathematics, the philosophy of science, or elsewhere.
The MICMUP perspective has grown out of a programme of research developing the SP Theory of Intelligence and its realisation in the SP Computer Model [2,1], seeking to simplify and integrate observations and concepts across artificial intelligence, mainstream computing, mathematics, and human learning, perception, and cognition. The SP system is outlined in Section 2.3 and described in a little more detail in Appendix A, with pointers to where fuller information may be found.
MICMUP and the SP system may be seen to be part of a 'Big Picture' with information compression as a unifying theme, described in outline in Section 2.4.

Presentation
The next section provides some background to the main body of the paper including: the novelty of 'mathematics as information compression'; mathematics as an aid to human thinking may be seen to reflect the importance of information compression in human learning, perception, and cognition; a sketch of the SP Theory of Intelligence and its realisation in the SP Computer Model; an outline of a 'Big Picture' of which this research is a part; ICMUP as an alternative approach to information compression; and an outline of seven variants of ICMUP.
Section 3 describes, first, how information compression may be seen in the workings of mathematics, and then how variants of ICMUP may be seen in the structure and workings of mathematics. Section 4 outlines how the mathematics-related disciplines of logic and computing may be seen as ICMUP.
Section 5 discusses briefly how MICMUP is in keeping with other evidence that mathematics is fundamentally probabilistic, and how this may be reconciled with the all-or-nothing, 'exact', forms of calculation or inference that are familiar in mathematics.
Section 7 outlines some potential benefits and applications of MICMUP, and of ideas that are associated with MICMUP in the Big Picture.

Background
This section outlines some preliminaries to the sections that follow.

The novelty of the idea that mathematics may be seen as compression of information
Three recent books about the philosophy of mathematics [3,4,5] make no mention of anything resembling information compression or ICMUP. More generally, the idea that information compression might be part of the foundations of mathematics appears to be invisible in writings about the nature of mathematics.
Keith Devlin's academic book, Logic and Information [6], aims to develop a mathematical theory of information, a goal which is related to but distinct from the central idea in MICMUP, that mathematics may be seen to be largely ICMUP.
Devlin's later book for the general reader, Mathematics: The Science of Patterns [7], discusses things like "patterns of symmetry [such as] the symmetry of a snowflake or a flower" (p. 145) and "the patterns involved in packing objects in an efficient manner" (p. 152), which would be relatively complex kinds of pattern representing abstract concepts. But the key ideas in MICMUP, as described in this paper, are not discussed.
Amongst the several "isms" in the philosophy of mathematics (foundationism, logicism, intuitionism, formalism, Platonism, neo-Fregeanism, and more), the three which are perhaps most closely related to MICMUP are: psychologism (mathematical concepts derive from human psychology); embodied mind theories (mathematical thought is a natural outgrowth of human cognition); and intuitionism (mathematics is a creation of the human mind). This is because these three views are broadly consistent with evidence that much of human learning, perception, and cognition (HLPC) may be understood as information compression [8]. But it appears that there is nothing like information compression or ICMUP in any of those three views or any other school of thought in the philosophy of mathematics.

Mathematics as an aid to human thinking, and evidence for information compression in human cognition
Since mathematics may be seen as an aid to human thinking, it should not be surprising to find that it conforms to much evidence for the importance of information compression, and more specifically ICMUP, in human learning, perception, and cognition [8].

Outline of the SP Theory of Intelligence and the SP Computer Model
The ideas and arguments presented in this paper have grown out of an extensive programme of research developing the SP Theory of Intelligence and its realisation in the SP Computer Model. This subsection highlights some key ideas in this research. There is more detail about the SP system in Appendix A with pointers to where fuller information may be found.
The main aim in this research is, in accordance with Ockham's razor, simplification and integration of observations and concepts across artificial intelligence, mainstream computing, mathematics, and human learning, perception, and cognition (HLPC), with ICMUP as a unifying theme. The focus on information compression and ICMUP is because there is substantial evidence that much of HLPC may be understood in those terms [8].
An important idea in the SP system is the powerful concept of SP-multiple-alignment, borrowed and adapted from the concept of 'multiple sequence alignment' in bioinformatics, and outlined in Appendix A.1. The SP-multiple-alignment concept, which may be seen as a generalised version of ICMUP, is the key to the SP system's versatility in modelling diverse aspects of human intelligence, in the representation of diverse kinds of knowledge, and in the seamless integration of diverse aspects of intelligence and diverse forms of knowledge, in any combination. Some related issues are described briefly in Appendix C with pointers to where fuller information may be found.

The 'Big Picture'
This research may be seen to be part of a 'Big Picture' with (at least) six components:
• Information compression as a foundation for mathematics. The present paper, "Mathematics as Information Compression Via the Matching and Unification of Patterns," argues that much of mathematics, perhaps all of it, may be understood in terms of ICMUP.
• Information compression as a unifying principle in science. It is widely agreed that "Science is, at root, just the search for compression in the world" [9, p. 247], with variations such as "Science may be regarded as the art of data compression" [10, p. 585].
• Information compression and concepts of inference and probability. It is known that there is an intimate relation between information compression and concepts of inference and probability (Section 5).
• Evidence for information compression as a unifying principle in HLPC. A companion to the present paper describes relatively direct empirical evidence for information compression, which in many cases may be seen as ICMUP, as a unifying principle in HLPC [8]. In view of that evidence, and since mathematics has been developed as an aid to human thinking, it should not be surprising that mathematics may be founded on information compression (Section 6).
• Information compression in the SP Theory of Intelligence. A central idea in the SP Theory of Intelligence (Section 2.3 and Appendix A) is the powerful concept of SP-multiple-alignment which may be seen to be a generalised version of ICMUP (Section 2.6.7 and [11, Appendix B]).
• Information compression in neuroscience. Because of its central role in the SP system, ICMUP is likely to prove significant in SP-Neural [12], a version of the SP theory which describes how abstract concepts in the theory may be realised in terms of neurons and their interconnections.
The six components of the Big Picture are mutually supportive in the sense that the credibility of any one of them, including the main thesis of this paper, is strengthened via empirical and analytical evidence in support of the Big Picture across all six of its components.
The significance of the Big Picture is discussed briefly in Section 7.

Information compression via the matching and unification of patterns
Readers who are acquainted with techniques for the compression of information will know that many of them, such as Huffman coding, arithmetic coding, and wavelet compression, have a mathematical flavour (see, for example, [13]). Much the same may be said about algorithmic compression in the framework of algorithmic information theory (AIT) [10].
Since ideas of that kind have a good pedigree and have proved their worth in many applications, one might suppose that they would be the starting point for any discussion of how mathematics may be understood in terms of information compression. But:
• The SP programme of research attempts to reach down below the mathematics of other approaches, to focus on the relatively simple, 'primitive' idea that information compression may be understood in terms of the matching and unification of patterns.
• In any discussion of the fundamentals of mathematics, it would not be appropriate to use mathematics itself.
• Since ICMUP is a relatively 'concrete' idea, less abstract than much of mathematics, it suggests avenues that may be explored in understanding possible mechanisms for information compression in artificial systems and in brains and nervous systems.
ICMUP and its variants may appear too childishly simple to merit attention in any discussion of the fundamentals of mathematics. But ICMUP is bedrock in the powerful concept of SP-multiple-alignment, outlined in Appendix A.1, which has proven capabilities, not only in modelling the six other versions of ICMUP described in Section 2.6 (as described in [11, Appendix B]), but also, more importantly, in modelling diverse aspects of human intelligence, the representation of diverse forms of knowledge, and the seamless integration of diverse aspects of intelligence and diverse forms of knowledge, in any combination [2,1]. And, as described in [8], ICMUP is prominent in HLPC.

Seven techniques for the compression of information via the matching and unification of patterns
While care has been taken in this programme of research to avoid unnecessary duplication of information across different publications, the importance of the following seven variants of ICMUP has made it necessary, for the sake of clarity, to describe them quite fully both in this paper and also in [8].

Basic ICMUP: information compression via the matching and unification of patterns
The simplest of the techniques to be described is to find two or more patterns that match each other within a body of information, I, and then merge or 'unify' them so that multiple instances are reduced to one. This is illustrated in the upper part of Figure 1, where two instances of the pattern 'INFORMATION' near the top of the figure have been reduced to one instance, shown in the middle of the figure, with 'w62' appended at the front, for reasons given in Section 2.6.2, below.
Here, and in subsections below, we shall assume that the single pattern which is the product of unification is placed in some kind of dictionary of patterns that is separate from I.
The version of ICMUP just described will be referred to as basic ICMUP. A detail that should not distract us from the main idea is that, when compression of a body of information, I, is to be achieved via basic ICMUP, any repeating pattern that is to be unified should occur more often in I than one would expect by chance.
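As a rough illustration of basic ICMUP, the following Python sketch unifies repeated instances of a pattern into a single stored copy. The function name `unify_pattern`, the sample text, and the use of a plain list as the 'dictionary of patterns' are assumptions made for this example; they are not taken from the SP Computer Model.

```python
def unify_pattern(body: str, pattern: str) -> tuple[str, list[str]]:
    """Merge all instances of `pattern` in `body` into one stored copy.

    Returns the residue of `body` with the instances removed, plus a
    hypothetical 'dictionary of patterns' (a plain list) that holds the
    single unified copy.
    """
    if body.count(pattern) < 2:
        # Fewer than two matching instances: there is nothing to unify.
        return body, []
    residue = body.replace(pattern, "")
    return residue, [pattern]


body = "abINFORMATIONcdeINFORMATIONfg"
residue, dictionary = unify_pattern(body, "INFORMATION")

# One copy of the pattern plus the residue is shorter than the original body.
saving = len(body) - (len(residue) + len(dictionary[0]))
```

In this sketch nothing records where each instance of the pattern occurred, so the original body cannot be rebuilt from `residue` and `dictionary` alone.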

Chunking-with-codes
A point that has been glossed over in describing basic ICMUP is that, when a body of information, I, is to be compressed by unifying two or more instances of a pattern like 'INFORMATION', there is a loss of information about the location within I of each instance of 'INFORMATION'. In other words, basic ICMUP achieves 'lossy' compression of I.
This problem is overcome in the chunking-with-codes variant of ICMUP:
• A unified pattern like 'INFORMATION', which is often referred to as a 'chunk' of information, is stored in a dictionary of patterns, as mentioned in Section 2.6.1.
• As before, the unified chunk is given a relatively short name, identifier, or 'code', like the 'w62' pattern appended at the front of the 'INFORMATION' pattern in the middle of Figure 1.
• Then the 'w62' code is used as a shorthand which replaces the 'INFORMATION' chunk of information wherever it occurs within I. This is shown at the bottom of Figure 1.
• Since the code 'w62' is shorter than each instance of the pattern 'INFORMATION' which it replaces, the overall effect is to shorten I. But, unlike basic ICMUP, chunking-with-codes achieves 'lossless' compression of I.
• A detail here is that compression can be optimised by giving shorter codes to chunks that occur frequently and longer codes to chunks that are rare. This may be done using some such scheme as Shannon-Fano-Elias coding, described in, for example, [14].
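The steps in the list above can be sketched in Python as follows. This is a minimal illustration, assuming that a code such as 'w62' never occurs in the uncompressed text; the function names `encode` and `decode` and the sample text are hypothetical.

```python
def encode(body: str, chunks: dict[str, str]) -> str:
    """Replace every instance of each chunk with its short code."""
    for code, chunk in chunks.items():
        body = body.replace(chunk, code)
    return body


def decode(encoded: str, chunks: dict[str, str]) -> str:
    """Restore the original text by expanding each code to its chunk.

    Assumes that no code appears in the text other than as a code.
    """
    for code, chunk in chunks.items():
        encoded = encoded.replace(code, chunk)
    return encoded


chunks = {"w62": "INFORMATION"}  # dictionary of patterns: code -> chunk
body = "abINFORMATIONcdeINFORMATIONfg"
encoded = encode(body, chunks)
restored = decode(encoded, chunks)
```

Because each code marks the position of the chunk it replaces, decoding recovers the original text exactly, which is what makes this variant lossless.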

Schema-plus-correction
A variant of the chunking-with-codes version of ICMUP is called schema-plus-correction. Here, the 'schema' is like a chunk of information and, as with chunking-with-codes, there is a relatively short identifier or code that may be used to represent the chunk.
What is different about the schema-plus-correction idea is that the schema may be modified or 'corrected' in various ways on different occasions.
For example, a menu for a meal in a cafe or restaurant may be something like 'MN: ST MC PG', where 'MN' is the identifier or code for the menu, 'ST' is a variable that may take values representing different kinds of 'starter', 'MC' is a variable that may take values representing different kinds of 'main course', and 'PG' is a variable that may take values representing different kinds of 'pudding'.
With this scheme, a particular meal may be represented economically as something like 'MN: ST(st2) MC(mc5) PG(pg3)', where 'st2' is the code or identifier for 'minestrone soup', 'mc5' is the code for 'vegetable lasagne', and 'pg3' is the code for 'ice cream'; another meal may be represented economically as 'MN: ST(st6) MC(mc1) PG(pg4)', where 'st6' is the code or identifier for 'prawn cocktail', 'mc1' is the code for 'lamb shank', and 'pg4' is the code for 'apple crumble'; and so on. Here, the codes for different dishes serve as modifiers or 'corrections' to the categories 'ST', 'MC', and 'PG' within the schema 'MN: ST MC PG'.
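The menu example may be sketched in Python as below. The data structures and the function name `expand_meal` are assumptions made for illustration; only the codes and dish names come from the example above.

```python
# The schema 'MN: ST MC PG' as an ordered tuple of slots.
SCHEMA = ("ST", "MC", "PG")

# Dictionary of codes for dishes, as in the example in the text.
DISHES = {
    "st2": "minestrone soup",
    "st6": "prawn cocktail",
    "mc5": "vegetable lasagne",
    "mc1": "lamb shank",
    "pg3": "ice cream",
    "pg4": "apple crumble",
}


def expand_meal(corrections: dict[str, str]) -> list[str]:
    """Fill each slot of the schema with the dish named by its code."""
    return [DISHES[corrections[slot]] for slot in SCHEMA]


# 'MN: ST(st2) MC(mc5) PG(pg3)'
meal_a = expand_meal({"ST": "st2", "MC": "mc5", "PG": "pg3"})
# 'MN: ST(st6) MC(mc1) PG(pg4)'
meal_b = expand_meal({"ST": "st6", "MC": "mc1", "PG": "pg4"})
```

The schema is stored once, and each meal needs only three short codes to specify it.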

Run-length coding
A third variant, run-length coding, may be used where there is a sequence of two or more copies of a pattern, each one except the first following immediately after its predecessor like this: 'INFORMATIONINFORMATIONINFORMATIONINFORMATIONINFORMATION'. In this case, the multiple copies may be reduced to one, as before, with something to say how many copies there are (eg, '(INFORMATION)×5'), or where the repetition begins and ends (eg, '[(INFORMATION)*]' where '[' and ']' are the beginning and end symbols, and '*' signifies repetition), or, more vaguely, that the pattern is repeated without anything to say when the sequence stops (eg, '(INFORMATION)*').
In a similar way, a sports coach might specify exercises as something like "touch toes (×15), push-ups (×10), skipping (×30), ..." or "Start running on the spot when I say 'start' and keep going until I say 'stop' ".
With the 'running' example, "start" marks the beginning of the sequence, "keep going" in the context of "running" means "keep repeating the process of putting one foot in front of the other, in the manner of running", and "stop" marks the end of the repeating process. It is clearly much more economical to say "keep going" than to constantly repeat the instruction to put one foot in front of the other.
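The counted form of run-length coding, as in '(INFORMATION)×5' or "push-ups (×10)", can be sketched in Python as follows. The function names are hypothetical, and the sketch covers only the variant in which the number of repetitions is recorded.

```python
def run_length_encode(items: list[str]) -> list[tuple[str, int]]:
    """Collapse each unbroken run of identical items into (item, count)."""
    runs: list[tuple[str, int]] = []
    for item in items:
        if runs and runs[-1][0] == item:
            # Same as the previous item: extend the current run.
            runs[-1] = (item, runs[-1][1] + 1)
        else:
            # A different item starts a new run.
            runs.append((item, 1))
    return runs


def run_length_decode(runs: list[tuple[str, int]]) -> list[str]:
    """Expand each (item, count) pair back into `count` copies of item."""
    return [item for item, count in runs for _ in range(count)]


copies = ["INFORMATION"] * 5
runs = run_length_encode(copies)

exercises = ["touch toes"] * 15 + ["push-ups"] * 10 + ["skipping"] * 30
plan = run_length_encode(exercises)
```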

Class-inclusion hierarchies
A widely-used idea in everyday thinking and elsewhere is the class-inclusion hierarchy: the grouping of entities into classes and the grouping of classes into higher-level classes through as many levels as are needed.
This idea may achieve ICMUP because, at each level in the hierarchy, attributes may be recorded which apply to that level and all levels below it. This can mean great economies because, for example, it is not necessary to record that cats have fur, dogs have fur, rabbits have fur, and so on: it is only necessary to record that mammals have fur and ensure that all lower-level classes and entities can 'inherit' that attribute. In effect, multiple instances of the attribute 'fur' have been merged or unified to create that attribute for mammals, thus achieving compression of information.
This idea may be generalised to cross-classification, where any one entity or class may belong in one or more higher-level classes that do not have the relationship superclass/subclass, one with another. For example, a given person may belong in the class 'woman' and 'doctor' although 'woman' is not a subclass of 'doctor' and vice versa.
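A minimal Python sketch of inheritance in a class-inclusion hierarchy is shown below. The table layout, the function name `lookup`, and the 'sound' attributes are hypothetical additions for illustration; only the 'fur' attribute of mammals comes from the example above.

```python
from typing import Optional

# Each class records only its own attributes plus a link to its superclass.
CLASSES = {
    "mammal": {"parent": None, "attributes": {"covering": "fur"}},
    "cat": {"parent": "mammal", "attributes": {"sound": "purr"}},
    "dog": {"parent": "mammal", "attributes": {"sound": "bark"}},
    "rabbit": {"parent": "mammal", "attributes": {}},
}


def lookup(cls: Optional[str], attribute: str) -> Optional[str]:
    """Find an attribute on a class or on the nearest superclass that has it."""
    while cls is not None:
        node = CLASSES[cls]
        if attribute in node["attributes"]:
            return node["attributes"][attribute]
        cls = node["parent"]  # move one level up the hierarchy
    return None


# 'fur' is recorded once, for mammals, and inherited by every subclass.
coverings = {name: lookup(name, "covering") for name in ("cat", "dog", "rabbit")}
```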

Part-whole hierarchies
Another widely-used idea is the part-whole hierarchy in which a given entity or class of entities is divided into parts and sub-parts through as many levels as are needed. Here, ICMUP may be achieved because two or more parts of a class such as 'car' may share the overarching structure in which they all belong. So, for example, each wheel of a car, the doors of a car, the engine of a car, and so on, all belong in the same encompassing structure, 'car', and it is not necessary to repeat that enveloping structure for each individual part.

SP-multiple-alignment
The seventh version of ICMUP, the SP-multiple-alignment construct outlined in Appendix A.1, encompasses all the preceding six versions of ICMUP and much more besides.
How the preceding six versions of ICMUP may be modelled within the SP-multiple-alignment framework is described in [11, Appendix B]. The strengths and potential of the SP-multiple-alignment construct in modelling aspects of human intelligence and the representation of knowledge are outlined in [16] with pointers to where fuller information may be found. The potential of this construct in modelling aspects of mathematics is described in [1, Chapter 10].

Mathematics as information compression via the matching and unification of patterns
The first step in the argument for MICMUP depends on evidence that mathematics is fundamentally about the compression of information. The following subsections present evidence in support of this idea.
In what follows, ICMUP may be seen to have an impact on the structuring of mathematics, and on the dynamics of mathematical calculation or inference.

An example of information compression via mathematics
The equation $s = (gt^2)/2$, one of several that can be derived from Newton's Second Law of Motion, is a very compact means of representing any table, including large ones, showing the distance travelled by a falling object ($s$) in a given time since it started to fall ($t$), as illustrated in Table 1. The constant, $g$, is the acceleration due to gravity, about $9.8\,\mathrm{m/s^2}$. That small equation would represent the values in the table even if it were 1000 times or a million times bigger, and so on. Likewise for other equations.
To make these points, it is not strictly necessary to show Table 1. But the table helps to emphasise the contrast between the potentially huge volumes of data in such a table and the small size of the equation which describes those data-and, correspondingly, the potentially high levels of information compression that may be achieved with ordinary mathematics which is not specialised for compression of information.
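The contrast between the size of the equation and the size of the table can be made concrete with a short Python sketch. The function name `distance_fallen` and the choice of whole-second time steps are assumptions for illustration; the values are computed from the equation and are not copied from Table 1.

```python
G = 9.8  # acceleration due to gravity in m/s^2, as given in the text


def distance_fallen(t: float) -> float:
    """Distance in metres fallen after t seconds: s = (g * t**2) / 2."""
    return G * t**2 / 2


# The one-line equation regenerates a table of any size on demand.
small_table = [(t, distance_fallen(t)) for t in range(1, 11)]
large_table = [(t, distance_fallen(t)) for t in range(1, 1_000_001)]
```

The million-row table and the ten-row table are both reproduced by the same small function, which is the sense in which the equation compresses the data.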

How ICMUP may be seen in the structure and workings of mathematics
The subsections that follow describe how some of the basic principles and techniques for the compression of information that were outlined in Section 2.6 may be seen in the structure and workings of mathematics.
In themselves, these examples do not prove that mathematics may be understood as being entirely devoted to the compression of information. But there are reasons to think that compression of information is fundamental in mathematics:
• Since the techniques to be described are low-level techniques that are part of the foundations of mathematics and widely used in more complex forms of mathematics, it seems likely that mathematics may indeed be understood in its entirety as ICMUP.
• As described in Section 4.1.1, the workings of simple logical functions, including the NAND logical function, may be understood in terms of ICMUP. Since it is widely accepted that, in principle, the computational heart of any general-purpose digital computer may be constructed entirely from NAND gates [17], it appears that, within the bounds imposed by computational complexity, ICMUP has the generality to support any kind of computation, including mathematical computations.

Basic ICMUP
The 'basic' version of ICMUP (Section 2.6.1) may be seen in mathematics whenever one identifier is matched with another, with implicit unification of the two.

The matching and unification of identifiers
In mathematics, ICMUP may be seen wherever there is a need to invoke a named entity. If, for example, we want to calculate the value of z from these equations: x = 4; y = 5; z = x + y, we need to match the identifier x in the third equation with the identifier x in the first equation, and to unify the two so that the correct value is used for the calculation of z. Likewise for y.
In a similar way, if we wish to invoke or 'call' a function such as 'log x' (the logarithm of a number), there must be a match between the name of the function in the call to the function (such as 'log 1000') and the name of the function in its definition, 'log x'. Unification of the call to the function with the definition of the function may be seen to have the effect of assigning the number in the call (1000 in this example) to the variable x in the definition of the function.
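The matching of identifiers in the example x = 4; y = 5; z = x + y can be sketched in Python as a lookup in a table of definitions. The names `definitions` and `evaluate_sum` are hypothetical, and the sketch covers only the addition of two named values.

```python
# Definitions from the first two equations: x = 4; y = 5.
definitions = {"x": 4, "y": 5}


def evaluate_sum(left: str, right: str, definitions: dict[str, int]) -> int:
    """Evaluate 'left + right' by matching each identifier with its definition.

    Each dictionary lookup matches an identifier in the expression with the
    same identifier in a definition, so that the defined value is used.
    """
    return definitions[left] + definitions[right]


# Third equation: z = x + y.
z = evaluate_sum("x", "y", definitions)
```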

The execution of a function
At an abstract level, any function may be seen as a table in which each row shows the connection between one or more input values and one or more output values. And simple functions, such as a one-bit adder, may be specified in exactly that way, as shown in Table 2.
In the workings of this adder, basic ICMUP may be seen, for example, in the matching and unification of input values '1' and '0' with corresponding values in 'input' columns of the table. In this case, the matches which achieve the greatest compression (both '1' and '0' in one row) will be to select the second row in the table, with the sum '1' and the carry digit '0', which are of course the correct outputs for those two inputs.
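As a rough sketch of this idea, the Python code below treats a one-bit adder as a lookup table and selects the row that matches the most input values. Table 2 is not reproduced here, so the two-input (half-adder) layout and the order of the rows are assumptions, as are the names `HALF_ADDER` and `add_bits`.

```python
# Assumed layout of each row: (input_a, input_b, sum, carry).
HALF_ADDER = [
    (0, 0, 0, 0),
    (0, 1, 1, 0),
    (1, 0, 1, 0),
    (1, 1, 0, 1),
]


def add_bits(a: int, b: int) -> tuple[int, int]:
    """Return (sum, carry) from the row whose input columns best match a, b."""

    def matched_inputs(row: tuple[int, int, int, int]) -> int:
        # Number of input values that match this row (0, 1, or 2).
        return int(row[0] == a) + int(row[1] == b)

    best = max(HALF_ADDER, key=matched_inputs)
    if matched_inputs(best) < 2:
        raise ValueError("inputs must each be 0 or 1")
    return best[2], best[3]


total, carry = add_bits(1, 0)
```

Choosing the row with the most matched inputs is a simple stand-in for choosing the match that achieves the greatest compression.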

Matching and unification of patterns with Peano's axiom for natural numbers
The sixth of Peano's axioms for natural numbers (for every natural number n, S(n) is a natural number) provides the basis for a succession of numbers: S(0), S(S(0)), S(S(S(0))) ..., itself equivalent to unary numbers in which 1 = /, 2 = //, 3 = ///, and so on. Here, S at one level in the recursive definition is repeatedly matched and unified with S at the next level.
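The correspondence between successor terms and unary numbers can be sketched in Python as follows. The function names are hypothetical, and the sketch simply builds the terms as strings.

```python
def successor_term(n: int) -> str:
    """Write the natural number n as nested applications of S to 0."""
    return "0" if n == 0 else f"S({successor_term(n - 1)})"


def term_to_unary(term: str) -> str:
    """Convert a successor term to unary: one stroke per application of S."""
    return "/" * term.count("S")


three = successor_term(3)
strokes = term_to_unary(three)
```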

Chunking-with-codes
This subsection describes aspects of mathematics that may be seen to exemplify the chunking-with-codes technique for information compression, as described in Section 2.6.2.

Named functions
If a body of mathematics is repeated in two or more parts of something larger then it is natural to declare it once as a named 'function', where the body of the function may be seen as a 'chunk' of information, and the name of the function is its 'code' or identifier. This avoids the need to repeat that body of mathematics in two or more places.
An example of this kind of thing is the calculations needed to find the square root of a number, often provided as a ready-made square-root function with the non-alphabetic name '√x'. That name may be used to invoke the function wherever it is needed, like this: '√16'. Similar things may be done with functions such as 'sin(x)', 'cos(x)', and 'log(x)'.
Although they are not commonly seen as 'functions', all of the operations of addition, subtraction, multiplication, the power notation, and division, may be cast in that mould as, for example, 'plus(x,y)', 'subtract(x,y)', and so on. As such, they may be seen as examples of chunking-with-codes and schema-plus-correction (Section 3.5). As we shall see in Section 3.6, they may also be seen as examples of run-length coding.

The number system
Number systems with bases greater than 1, like the binary, octal, decimal and hexadecimal number systems, may all be seen to illustrate the chunking-with-codes technique for compressing information. For example:
• A unary number like '///////' may be referred to more briefly in the decimal system as '7'.
• Of course, this 'positional' system can be extended so that a digit in the third position from the right represents the number of 100s, a digit in the fourth position represents the number of 1000s, and so on.
Here, we can see how the chunking-with-codes technique allows us to eliminate the repetition or redundancy that exists in all unary numbers except '/'. This means that large numbers, like 2035723, may be expressed in a form that is very much more compact than the equivalent unary number.
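The relation between unary and decimal numbers can be sketched in Python as below. The function names are hypothetical; the sketch assumes that a decimal numeral is a string of the digits 0 to 9.

```python
def unary_to_decimal(unary: str) -> str:
    """Replace a run of strokes with its short decimal code."""
    return str(len(unary))


def decimal_to_unary(decimal: str) -> str:
    """Expand a decimal numeral back into the equivalent run of strokes."""
    return "/" * positional_value(decimal)


def positional_value(decimal: str) -> int:
    """Read a decimal numeral position by position, left to right.

    Each step multiplies the running value by 10, so a digit that ends up
    in the third position from the right counts 100s, and so on.
    """
    value = 0
    for digit in decimal:
        value = value * 10 + "0123456789".index(digit)
    return value


code = unary_to_decimal("///////")
big_unary = decimal_to_unary("2035723")
```

The seven-character numeral '2035723' stands for a unary number of more than two million strokes.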

Schema-plus-correction
Most functions in mathematics, like those mentioned above, are not only examples of chunking-with-codes: they are also examples of the schema-plus-correction device for compressing information. This is because they normally require input via one or more 'arguments' or 'parameters'. For example, the square root function needs a number like 49 for it to work on. Without that number, the function is a very general 'schema' for solving square root problems. With a number like 49, which may be regarded as a 'correction' to the schema, the function becomes focussed much more narrowly on finding the square root of 49.

Run-length coding
Run-length coding appears in various forms in mathematics, often combined with other things.
The key idea is that some entity, pattern, or operation is repeated two or more times in an unbroken sequence. Here are some examples:
• Since all numbers with bases above 1 may be seen to be compressed representations of unary numbers (Section 3.4.2), unary numbers may be regarded as more fundamental than non-unary numbers. If that is accepted, then, for example, '3 + 7' may be seen as a shorthand for the repeated process of transferring one unary digit from a group of seven unary digits to a group of three unary digits. Thus the expression '+7' within '3 + 7' may be seen as an example of run-length coding.
Subtraction may be interpreted in a similar way when a smaller number is subtracted from a larger number.
• Multiplication is repeated addition. So, for example, '3 × 10' is the 10-fold repetition of the operation 'x + 3', where 'x' starts with the value '0'. Then '×10' within '3 × 10' may be seen as run-length coding. Since addition is itself a form of run-length coding (as described in the preceding bullet point), multiplication may be seen as run-length coding on two levels.
• Division of a larger number by a smaller one (eg, '12/3') is repeated subtraction which, as with multiplication, may be seen as run-length coding. Of course there will be a 'remainder' if the larger number is not an exact multiple of the smaller number. As with addition as a part of multiplication, subtraction as a part of division means that division may be seen as run-length coding on two levels.
• The power notation (eg, '$10^9$') is repeated multiplication, and is thus another example of run-length coding. Since multiplication, as repeated addition, is a form of run-length coding, and since addition may be seen as run-length coding (the first bullet point above), the power notation may be seen as run-length coding on three levels!
• The bounded summation notation (eg, '$\sum i$') and the bounded power notation (eg, '$\prod_{n=1}^{10} \frac{n}{n-1}$') are shorthands for repeated addition and repeated multiplication, respectively. In both cases, there is normally a change in the value of one or more variables on each iteration, so these notations may be seen as a combination of run-length coding and schema-plus-correction.
• In matrix multiplication, 'AB', for example, is a shorthand for the repeated operation of multiplying the entries in each row of matrix 'A' with the corresponding entries in each column of matrix 'B', and adding up the products.
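The layering of repetition described in the bullet points on addition, multiplication, division, and the power notation can be sketched in Python as follows. This is an illustration for non-negative integers only (with a positive divisor), and the function names are hypothetical; each operation is written deliberately as a loop over the operation one level below it.

```python
def add(a: int, b: int) -> int:
    """'a + b' as b repetitions of transferring one unary digit to a."""
    for _ in range(b):
        a += 1
    return a


def multiply(a: int, times: int) -> int:
    """'a × times' as repeated addition of a, starting from 0."""
    total = 0
    for _ in range(times):
        total = add(total, a)
    return total


def power(base: int, exponent: int) -> int:
    """'base ** exponent' as repeated multiplication, starting from 1."""
    result = 1
    for _ in range(exponent):
        result = multiply(result, base)
    return result


def divide(a: int, b: int) -> tuple[int, int]:
    """'a / b' as repeated subtraction; returns (quotient, remainder)."""
    count = 0
    while a >= b:
        a -= b
        count += 1
    return count, a


examples = (add(3, 7), multiply(3, 10), power(2, 5), divide(12, 3))
```

Each notation ('+7', '×10', and an exponent) replaces an explicit run of repeated steps with a single count, which is the sense in which these operations resemble run-length coding.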

Class-inclusion hierarchies
Classes and subclasses (Section 2.6.5) feature in mathematics as 'sets', both as a sometimes-disputed foundation for mathematics and as a branch of mathematics.
The notion of 'inheritance' does not have the prominence in set theory that it does in object-oriented programming, but, nevertheless, ICMUP may be seen in other concepts associated with sets, described in Section 4.1.

Part-whole hierarchies
It seems that part-whole hierarchies are not much used in mathematics, except perhaps in set theory, but, as we shall see in Section 4.2, they are quite prominent in the mathematics-related discipline of computing.

SP-multiple-alignment
Preliminary work described in [1, Chapter 10] shows that the SP system, with SP-multiple-alignment centre-stage, has potential to model mathematical constructs and mathematical processes. This should not be altogether surprising since, as noted in Section 2.6.7, SP-multiple-alignments can do everything that can be done with the six variants of ICMUP described in Sections 2.6.1 to 2.6.6, and they provide for their seamless integration too.
Other reasons for believing that the SP system has potential to model many and perhaps all concepts and processes in mathematics are:
• The generality of information compression as a means of representing knowledge in a succinct manner.
• The central role of information compression in the SP-multiple-alignment framework.
• The versatility of the SP-multiple-alignment framework in aspects of intelligence and the representation of knowledge (Appendix A.2).
• The close connection that is known to exist between information compression and concepts of inference and probability (Section 5).

Some equations
It seems that most equations that have become established in mathematics and science may be interpreted in terms of some combination of the techniques for compressing information described in Section 2.6. Thus:
• Einstein's equation, $E = mc^2$, illustrates run-length coding in its power notation ($c^2$) and in the multiplication of $m$ with $c^2$.
• Newton's equation, $s = (gt^2)/2$, that featured in Section 3.1, illustrates run-length coding in its power notation ($t^2$), in the multiplication of $g$ with $t^2$, and in the division of $(gt^2)$ by 2.
• Pythagoras's equation, $a^2 + b^2 = c^2$, illustrates run-length coding via the power notation in $a^2$, $b^2$, and $c^2$, and via the addition of $b^2$ to $a^2$ (the first bullet point in Section 3.6).
• Boyle's law, $PV = k$, illustrates run-length coding in the multiplication of $P$ by $V$.
• The charged particle equation, $F = q(E + v \times B)$, illustrates run-length coding in the multiplication of $v$ by $B$, in the multiplication of $(E + v \times B)$ by $q$, and in the addition of $v \times B$ to $E$.
• One of special relativity's equations for time dilation, $\Delta t' = \Delta t / \sqrt{1 - v^2/c^2}$, illustrates chunking-with-codes and schema-plus-correction in its use of the square root function, and it illustrates run-length coding in the division of $v^2$ by $c^2$, in the subtraction of $v^2/c^2$ from 1, and in the division of $\Delta t$ by $\sqrt{1 - v^2/c^2}$.
• In its use of bounded summation ($\sum$), Shannon's equation for entropy, $H = -\sum_i p_i \log_2(p_i)$, illustrates a combination of run-length coding and schema-plus-correction (as noted in Section 3.6). It also illustrates chunking-with-codes in its use of the $\log_2$ notation.
Since addition, subtraction, multiplication, the power notation, and division, may each be seen as an example of chunking-with-codes and schema-plus-correction (Sections 3.4 and 3.5), as well as run-length coding (Section 3.6), the same can be said about the appearance of those notations in each of the examples above.
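The run-length reading of these notations can be sketched in a few lines of Python. The fragment below is only an illustration, assuming non-negative whole-number repeat counts; the function names `expand_power` and `expand_product` are hypothetical and do not come from the SP system. Each function writes the compact notation out as the repeated operation that it abbreviates.

```python
from functools import reduce
from operator import mul


def expand_power(base, exponent):
    """Return base**exponent written out as repeated multiplication."""
    # The exponent acts as a repeat count: c**2 stands for c * c.
    return reduce(mul, [base] * exponent, 1)


def expand_product(value, count):
    """Return value * count written out as repeated addition."""
    # The count says how many copies of `value` are added together.
    return sum([value] * count)


# The compact notation and its expanded form give the same value.
assert expand_power(3, 4) == 3 ** 4
assert expand_product(7, 5) == 7 * 5
```

In each case the short form records one copy of the repeated item together with the number of repetitions, which is the essence of run-length coding.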

Mathematics-related disciplines as information compression via the matching and unification of patterns
It seems that, to a large extent, what has been said about mathematics in Section 3 also applies to the mathematics-related disciplines of logic and computing. The following two subsections present some examples in support of that idea.

Logic
Subsections that follow describe some evidence for ICMUP in logic.

XOR and other logical operations
The XOR logical function, and other simple logical functions, may be defined and interpreted in much the same way as the one-bit adder shown in Table 2, as shown in Table 3.
As with the one-bit adder, the operation of the XOR function may be understood in terms of basic ICMUP. Input values such as 1 (first) and 0 (second) may be matched and unified with values in the corresponding 'input' columns of the table. With those two input values, the third row is selected because it yields most matches-which, with unification, also means the greatest compression of information. And of course the third row yields the correct output value, which in this example is 1.
There are two points of interest here:
• The XOR Function and Artificial Neural Networks. As is well known, Marvin Minsky and Seymour Papert [(year?)] demonstrated that basic perceptrons of the kind that were available in the late 1960s could not produce correct results with the XOR function, a demonstration which, for a time, led to a fall in interest in artificial neural networks.
• The Generality of the NAND Logical Function. As noted in Section 3.2, the fact that the NAND logical function may, like XOR and other simple logical functions, be understood in terms of ICMUP, and the generally-accepted idea that the computational heart of any general-purpose computer may, in principle, be constructed entirely from NAND gates, provide evidence in support of the idea that compression of information is fundamental in all kinds of computation including mathematical computations.
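The row-selection account of XOR given above can be sketched as follows. This is a minimal illustration, not the SP Computer Model: the name `lookup` is hypothetical, and the row order is an assumption (the conventional truth-table order, which is consistent with the text's statement that inputs 1 and 0 select the third row).

```python
# Truth table for XOR: each row is (input 1, input 2, output).
# The row order is assumed to be the conventional one.
XOR_TABLE = [
    (0, 0, 0),
    (0, 1, 1),
    (1, 0, 1),
    (1, 1, 0),
]


def lookup(table, first, second):
    """Select the row whose input columns best match the given inputs."""
    def matches(row):
        # Count the input columns that agree with the supplied values.
        return (row[0] == first) + (row[1] == second)

    best_row = max(table, key=matches)
    # Report the 1-based row number and the output value of that row.
    return table.index(best_row) + 1, best_row[2]


row_number, output = lookup(XOR_TABLE, 1, 0)
# Inputs 1 and 0 select the third row, whose output is 1.
assert (row_number, output) == (3, 1)
```

Replacing the table with the rows for AND, OR, or NAND leaves `lookup` unchanged, which reflects the point that these functions differ only in their stored patterns.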

Deriving a set from a multiset
In logic and mathematics, a 'multiset' or 'bag' is like a set but any element within the multiset may be repeated as, for example, in the multiset {a, b, a, c, b, b, c, a, c}.
Conversion of any such multiset into the corresponding set means matching each element within the multiset with every other element and, wherever a match is found, unifying the two elements, including elements that are the result of earlier unifications, thus achieving ICMUP. In this case, the multiset {a, b, a, c, b, b, c, a, c} is reduced to the set {a, b, c}.
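A minimal Python sketch of this conversion follows. The function name `unify_multiset` is hypothetical, and a list is used rather than Python's built-in `set` type so that the matching step is visible.

```python
def unify_multiset(multiset):
    """Reduce a multiset to a set by matching and merging equal elements."""
    unified = []
    for element in multiset:
        # Match the element against those already kept; when a match
        # is found the two are merged, so nothing new is stored.
        if element not in unified:
            unified.append(element)
    return unified


bag = ["a", "b", "a", "c", "b", "b", "c", "a", "c"]
assert unify_multiset(bag) == ["a", "b", "c"]
```

Nine stored elements are reduced to three, and the built-in call `set(bag)` yields the same three elements.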

The union and intersection of sets
In much the same way that a set may be derived from a multiset (Section 4.1.2), the union and intersection of two sets may be found by the matching and unification of elements, yielding a reduction in the overall size of the two sets when unification has been achieved. Thus, for example, the union of the sets {b, f, d, a, c, e} and {e, g, i, f, d, h} is {a, b, c, d, e, f, g, h, i}, with the intersection {d, e, f}. In accordance with ICMUP, the union, with 9 elements, is smaller than the 12 elements of the two sets taken together, because the 3 matching elements have been unified.
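The example can be checked with Python's built-in set operations, as in the short sketch below. The variable names are chosen only for illustration.

```python
first = {"b", "f", "d", "a", "c", "e"}
second = {"e", "g", "i", "f", "d", "h"}

# Elements that match across the two sets are stored once in the union.
union = first | second
# The intersection contains exactly the elements that were matched.
intersection = first & second

assert union == set("abcdefghi")
assert intersection == {"d", "e", "f"}
# The saving in size equals the number of unified elements.
assert len(first) + len(second) - len(union) == len(intersection) == 3
```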

ICMUP in Prolog
Further evidence for the significance of ICMUP in logic is that systems like Prolog-a computer-based version of logic-may be seen to function largely via the matching and merging of patterns.
Here, the meaning of 'unification' in Prolog-comparing two terms to see if they can be made to represent the same structure-is quite close to the meaning of 'unification' in this paper.
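To make the Prolog sense of 'unification' concrete, here is a simplified sketch in Python. It is not Prolog's actual implementation: it assumes that variables are strings beginning with an upper-case letter, that compound terms are tuples, and it omits the occurs check. The function names are hypothetical.

```python
def is_variable(term):
    """Treat strings starting with an upper-case letter as variables."""
    return isinstance(term, str) and term[:1].isupper()


def resolve(term, bindings):
    """Follow variable bindings until the term is no longer a bound variable."""
    while is_variable(term) and term in bindings:
        term = bindings[term]
    return term


def unify(left, right, bindings=None):
    """Return bindings that make the two terms identical, or None."""
    bindings = dict(bindings or {})
    left, right = resolve(left, bindings), resolve(right, bindings)
    if left == right:
        return bindings
    if is_variable(left):
        bindings[left] = right
        return bindings
    if is_variable(right):
        bindings[right] = left
        return bindings
    both_tuples = isinstance(left, tuple) and isinstance(right, tuple)
    if both_tuples and len(left) == len(right):
        # Match the parts pairwise, carrying the bindings forward.
        for left_part, right_part in zip(left, right):
            bindings = unify(left_part, right_part, bindings)
            if bindings is None:
                return None
        return bindings
    return None


result = unify(("parent", "X", "bob"), ("parent", "alice", "Y"))
assert result == {"X": "alice", "Y": "bob"}
```

The two terms are merged into the single structure `parent(alice, bob)`, while terms that cannot be matched, such as `f(a)` and `f(b)`, yield no unification.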

Versatility in reasoning with the SP system
Since SP-multiple-alignment is a generalised form of ICMUP (Section 2.6.7), and since SP-multiple-alignment is an important part of the SP system, it is pertinent to say that the SP Computer Model demonstrates several kinds of reasoning including: one-step 'deductive' reasoning; chains of reasoning; abductive reasoning; reasoning with probabilistic networks and trees; reasoning with 'rules'; nonmonotonic reasoning and reasoning with default values; Bayesian reasoning with 'explaining away'; causal reasoning; reasoning that is not supported by evidence; the inheritance of attributes in class hierarchies; and inheritance of contexts in part-whole hierarchies ([2, Section 10], [1,Chapter 7]).
Because of the probabilistic nature of the SP system, these forms of reasoning are probabilistic, although some of them, such as one-step 'deductive' reasoning, have the all-or-nothing character of traditional forms of logic. Nevertheless, if it is accepted that logic, like mathematics, is probabilistic at a deep level-for reasons given in Section 5-then the above-mentioned strengths of the SP system in probabilistic reasoning may be seen as further evidence for the importance of ICMUP in logic.

Computing
As with logic, it seems likely that, since computing is closely related to mathematics, it may, like mathematics, be understood in terms of ICMUP. Evidence in support of that view is presented in subsections that follow.

Matching and unification of patterns in definitions of 'computing'
Emil Post's [(year?)] "Canonical System", which is recognised as a definition of 'computing' that is equivalent to a universal Turing machine, may be seen to work largely via the matching and unification of patterns [1,Chapter 4].
Much the same is true of the workings of the transition function in a universal Turing machine. This is essentially a look-up table like that shown in Table 4.
Much as with the examples described in Sections 3.3.2 and 4.1.1, ICMUP may be seen, for example, in the matching and unification of input values '$s_1$' and '1' with corresponding values in the input columns of the table. In this case, the effect will be to select the third row in the table, with the output values '$s_1$' and '«', which mean "Set the state of the machine to '$s_1$' and move the read/write head of the machine one place to the left".
In a similar way, ICMUP may be seen in the workings of the NAND logical function which, as noted in Sections 3.2 and 4.1.1, may in principle provide the computational heart of any general-purpose digital computer.
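As a small illustration of both points, the Python sketch below treats NAND as a look-up in a table of stored patterns and then composes XOR from four such look-ups, a standard construction. The names `NAND_TABLE`, `nand`, and `xor_from_nand` are hypothetical, introduced only for this example.

```python
# Truth table for NAND, keyed by the pair of input values.
NAND_TABLE = {
    (0, 0): 1,
    (0, 1): 1,
    (1, 0): 1,
    (1, 1): 0,
}


def nand(a, b):
    """Find the row whose inputs match (a, b) and return its output."""
    return NAND_TABLE[(a, b)]


def xor_from_nand(a, b):
    """Compose XOR from four NAND look-ups."""
    shared = nand(a, b)
    return nand(nand(a, shared), nand(b, shared))


# The composed function agrees with XOR on every pair of inputs.
for a in (0, 1):
    for b in (0, 1):
        assert xor_from_nand(a, b) == (a ^ b)
```

Every step in `xor_from_nand` is a match between input values and a stored row, so the composite function involves nothing beyond repeated matching and unification.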

Some other examples of ICMUP in computing
Here, in brief, are some other putative examples of ICMUP in computing:
• Basic ICMUP. As in mathematics (Section 3.3), basic ICMUP may be seen in computing in the matching of identifiers for variables and in calls to functions.
• Class-Inclusion and Part-Whole Hierarchies. In computing, the creation of classes and hierarchies of classes is supported in such object-oriented programming languages as Simula, Smalltalk, C++, and many more. Part-whole hierarchies are also prominent in software. In both cases, ICMUP has a role to play, much as described in Sections 2.6.5 and 2.6.6.
• Retrieving Data From Computer Memory. It is true that electronic circuits provide the mechanism for finding an address in computer memory but, at a more abstract level, the process may be seen as one of searching for a match between the address held in the CPU and the corresponding address in computer memory. When a match has been found between the address in the CPU and the corresponding address in memory, there is implicit unification of the two.
• Query-by-Example. A popular technique for retrieving information from databases, query-by-example, is essentially a process of finding good matches between a query pattern and patterns in the database, with unification of the best matches.
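The last of these examples can be sketched as below. This is a toy illustration under stated assumptions, not a description of any real database system: records are Python dictionaries, the database is non-empty, a 'good match' is taken to mean agreement on the largest number of query fields, and the records and the name `query_by_example` are hypothetical.

```python
def query_by_example(records, query):
    """Return the records that agree with the query on the most fields."""
    def score(record):
        # Count the query fields whose values match the record.
        return sum(record.get(field) == value for field, value in query.items())

    best = max(score(record) for record in records)
    return [record for record in records if score(record) == best]


# Hypothetical records, used only to exercise the function.
records = [
    {"name": "Ada", "city": "London", "role": "engineer"},
    {"name": "Bob", "city": "Paris", "role": "engineer"},
    {"name": "Cy", "city": "London", "role": "analyst"},
]

found = query_by_example(records, {"city": "London", "role": "engineer"})
assert found == [{"name": "Ada", "city": "London", "role": "engineer"}]
```

The query pattern supplies some fields, and the best-matching record supplies the rest, here the value of `name`.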
Table 4: An example of a transition function in a universal Turing machine, represented as a look-up table with the columns Input (1), Input (2), and Output, as described in the text. Key: '»' means "move the read/write head one place to the right"; '«' means "move the read/write head one place to the left". Based on the example in [20, Section 2], with permission.

Information compression, inference, and probabilities
The main focus of this paper is on MICMUP, but it is relevant to mention that it has been recognised for some time that there is an intimate connection between information compression and concepts of inference and probability, as described in [21], in Ray Solomonoff's Algorithmic Probability Theory (APT) [22,23], and in the closely-related AIT [10]. Information compression and concepts of inference and probability may be seen as two sides of the same coin.
The close connection between those things makes sense in terms of ICMUP (Section 2.6): • A pattern that repeats is one that, via inductive reasoning, we naturally regard as a guide to what may happen in the future.
• A pattern that repeats is one that, via the merging or unification of patterns, may yield compression of information.
• A partial match between one pattern and another can be the basis for inferring the occurrence of the unmatched parts, a form of inference that is sometimes called prediction by partial matching [24].
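The third point can be illustrated with a toy sketch in Python. It is not an implementation of any published prediction-by-partial-matching algorithm: it assumes that patterns are tuples of symbols, that the stored collection is non-empty, and that the best match is simply the stored pattern sharing the longest run of leading symbols with what has been observed. The patterns and the name `infer_unmatched` are hypothetical.

```python
def infer_unmatched(stored_patterns, observed):
    """Return the unmatched remainder of the best partially matching pattern."""
    def matched_length(pattern):
        # Count the leading symbols shared by the pattern and the observation.
        length = 0
        for stored_symbol, seen_symbol in zip(pattern, observed):
            if stored_symbol != seen_symbol:
                break
            length += 1
        return length

    best = max(stored_patterns, key=matched_length)
    return best[matched_length(best):]


# Hypothetical stored patterns, each one a repeated past experience.
patterns = [
    ("black", "clouds", "rain"),
    ("red", "sky", "fine", "weather"),
]

assert infer_unmatched(patterns, ("black", "clouds")) == ("rain",)
```

The observed symbols are unified with the matching part of the stored pattern, and the part left over serves as the prediction.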
What has this got to do with mathematics? It would take us too far afield to discuss this issue in any depth. A few brief remarks are made here. The close connection between information compression and concepts of inference and probability, and evidence for MICMUP presented in this paper, suggest that:
• Notwithstanding the apparent certainty of equations like 2 + 2 = 4, mathematics may be seen to be fundamentally probabilistic.
• In view of the important role that mathematics has in the making of inferences in science and elsewhere, and notwithstanding the apparent certainties of many of those inferences, MICMUP may be seen as a driver for the making of 'exact' inferences.
Regarding the first point, a probabilistic foundation for mathematics is consistent with the discovery of randomness in number theory: "I have recently been able to take a further step along the path laid out by Gödel and Turing. By translating a particular computer program into an algebraic equation of a type that was familiar even to the ancient Greeks, I have shown that there is randomness in the branch of pure mathematics known as number theory. My work indicates that-to borrow Einstein's metaphor-God sometimes plays dice with whole numbers." [25, p. 80].
As indicated in this quotation, randomness in number theory is closely related to Gödel's incompleteness theorems. These are themselves closely related to the phenomenon of recursion, a feature of many formal systems, several of Escher's pictures, and much of Bach's music, as described in some detail by Douglas Hofstadter in Gödel, Escher, Bach: An Eternal Golden Braid [26].
Again with regard to the first point, the SP system, which is dedicated to information compression, has clear strengths in the making of uncertain inferences and the calculation of associated probabilities ([2, Section 4.4] and [1, Section 3.7 and Chapter 7]). It seems possible that, with further development, the SP system may have strengths to rival conventional statistics.
With regard to the second point, it seems possible that, although mathematics may be fundamentally probabilistic, it may, with appropriate data or under appropriate conditions, deliver results where the associated probabilities are at or very close to 0 or 1. This kind of possibility is discussed briefly in [2, Section 6.3] and in [
• And if human thought, like other aspects of HLPC, is seen to be driven largely by processes of information compression, in accordance with evidence presented in [8], and in accordance with the principles at the heart of the SP Theory of Intelligence (Section 2.3 and Appendix A).

So what?
While it may be accepted that mathematics may be understood as information compression via the matching and unification of patterns, readers may wonder what benefits or applications there may be, if any, for MICMUP and related ideas. Here are some possibilities:
• The Big Picture. The evidence and arguments in this paper provide support for the Big Picture and its six components, outlined in Section 2.4. In keeping with Ockham's Razor, the Big Picture is important in showing the potential of information compression as a unifying principle across a wide canvas.
• The development of mathematics. Since the MICMUP ideas in this paper have grown out of the SP Theory of Intelligence (Section 1), there is potential for augmenting mathematics with concepts and mechanisms from the SP System, especially SP-multiple-alignment and its associated mechanisms, and unsupervised learning via the building of grammars. Those concepts, together with the MICMUP concepts, may lead to such things as: radically new ways of creating mathematical conjectures or hypotheses (more below); a radically new approach to the proof of theorems, propositions, lemmas, and so on, via compression of information.
• A new mathematics for science. There is potential for the development of a new mathematics for science (NMFS) as outlined in Appendix B. Possibilities here include:
- Extending the range of things that mathematics may do in science so that it becomes a universal framework for the representation and processing of diverse kinds of knowledge (UFK) [32, Section III] that may take over from diagrams, videos, and descriptions in natural language.
- Using mathematics as a means of quantifying the Simplicity of any scientific theory, and its descriptive or explanatory Power, and thus facilitating quantitative comparisons amongst rival scientific theories.
- The automatic or semi-automatic creation of scientific theories from data [? , Section 6.10.7].
- By providing a UFK for the description and processing of related but incompatible theories such as quantum mechanics and relativity, an NMFS has the potential to help iron out inconsistencies and facilitate the integration of such theories.
• Sources of hypotheses. In view of the evidence presented in this paper and in [8], and evidence in support of the SP Theory of Intelligence (Appendix A), information compression, and more specifically ICMUP, SP-multiple-alignment and the SP Theory of Intelligence, are likely to be fertile sources of hypotheses in the study of: the foundations of mathematics, logic and computer science; concepts of inference and probability; science and scientific methods; human learning, perception, and cognition; and neuroscience.
In view of the many potential benefits and applications of the SP System (Appendix A.4), there are reasons to anticipate more such benefits and applications from MICMUP concepts and associated ideas.

Conclusion
This paper describes a novel perspective on the foundations of mathematics: how mathematics may be seen to be largely about 'information compression via the matching and unification of patterns' (ICMUP). 'Mathematics as ICMUP' may be shortened to 'MICMUP'.
ICMUP is itself a novel approach to information compression, couched in terms of non-mathematical primitives, as is necessary in any investigation of the foundations of mathematics.
This new perspective on the foundations of mathematics has grown out of an extensive programme of research developing the SP Theory of Intelligence and its realisation in the SP Computer Model, a system in which a generalised version of ICMUP-the powerful concept of SP-multiple-alignment-plays a central role.
These ideas may be seen to be part of a 'Big Picture' comprising six areas of interest, with information compression as a unifying theme.
Seven variants of ICMUP are described in Section 2.6. In arguing for MICMUP, Section 3 shows first how mathematics may achieve compression of information. Then it shows how variants of ICMUP may be seen in widely-used structures and operations in mathematics.
Section 4 argues that, in a similar way, the mathematics-related disciplines of logic and computing may be seen as ICMUP.
Section 5 discusses how the intimate relation between information compression and concepts of inference and probability may be seen as a driver for the making of mathematical inferences, how it relates to the already-established view that, at a fundamental level, mathematics is intrinsically probabilistic, and how that latter view may be reconciled with the all-or-nothing, 'exact', forms of calculation or inference that are familiar in mathematics.
In Section 6, it is argued that MICMUP makes sense if mathematics is seen as a form of human thought and if human thought, like other aspects of HLPC, is seen to be driven largely by processes of information compression, in accordance with much empirical evidence [8], and the principles at the heart of the SP Theory of Intelligence.
Section 7 outlines many potential benefits and applications from MICMUP concepts and associated ideas.