SUMMARIES OF CERTAIN SPATIAL PATTERNS RETRIEVED FROM MULTIDATE REMOTE-SENSING DATA
HEMA NAIR

This paper presents an approach to describe patterns in 
remote-sensed images utilising fuzzy logic. The truth of a 
linguistic proposition such as Y is F can be determined for 
each pattern characterised by a tuple in the database, where Y is 
the pattern and F is a summary that applies to that pattern. 
This proposition is formulated in terms of primary quantitative 
measures, such as area, length, perimeter, and so forth, of the 
pattern. Fuzzy descriptions of linguistic summaries help to 
evaluate the degree to which a summary describes a pattern or 
object in the database. Techniques, such as clustering and genetic 
algorithms, are used to mine images. Image mining is a relatively 
new area of research. It is used to extract patterns from 
multidated satellite images of a geographic area.


Introduction
The objective of this paper is to propose an approach for developing linguistic summaries of certain spatial patterns or features in remote-sensed images. A linguistic proposition of the form "Y is F" is formulated using primary quantitative measures, such as length, area, perimeter, and so forth, of the pattern. From these measures, it is possible to compute secondary measures, such as circularity ratio and degree of self-affinity exponents of Hurst and Hack (see [7]). If images of the same geographic area over several time periods are available, it then becomes possible to construct a spatiotemporal model.
In the past, much research has focused on data mining, or extracting implicit patterns, in relational databases [6,10,13,16], but data mining in multimedia environments has met with limited success. This is mainly because multimedia data is not as structured as relational data [17]. There is also the issue of diverse multimedia types, such as images, sound, video, and so forth. While one method of data mining may succeed with one type of multimedia, such as images, the same method may not be well suited to many other types due to varying structure and content. Some related work [17] has met with success. In [17], the objective is to mine internet-based images and video. The results generated could be a set of characteristic features based on a topic (keyword), a set of association rules which associate data items, a set of comparison characteristics that contrast different sets of data, or a classification of data using keywords. Data mining techniques can be used in image mining [15] to classify, cluster, or associate images. Image mining is a relatively new research area with applications in many domains including space images and geological images. It can be used to extract unusual patterns, such as forest fires, from multidated satellite images of a geographic area.
This paper proposes an approach that utilises fuzzy logic to describe patterns in remote-sensed images. This method aims to extract some feature descriptors, such as area, length, and so forth, of objects in remote-sensed images and store them in a relational table. Data-mining techniques that employ clustering and genetic algorithms (GAs) are then used to develop the most suitable linguistic summary of each object stored in the table. The objective is to generate linguistic summaries of natural patterns, such as land, island, water body, river, and so forth, in remote-sensed images. The approach is to use fuzzy logic to match actual image feature descriptors with feature definitions and to evolve the best-suited linguistic summary of the image object/pattern using GAs.
This paper is organised as follows. Section 2 describes the system architecture, Section 3 describes the approach, Section 4 discusses the implementation issues, and Section 5 discusses the conclusions and future work.

System architecture
The system architecture is shown in Figure 2.1. The data summarizer is the key component of the system. The input image is analysed and some feature descriptors are extracted by the image analysis component. Feature descriptors are extracted using MATLAB [5] and ENVI [4], which perform the functionality of the image analysis component. These descriptors are stored thereafter in a relational table in the database. The knowledge base uses geographic facts to define feature descriptors in a typical remote-sensed image. It interacts with a built-in library of linguistic labels. As new feature definitions are added into the knowledge base, corresponding linguistic labels are added in the built-in library. Likewise, as we see a need to expand the built-in library, we would add corresponding feature definitions based on geographic facts in the knowledge base. The built-in library also interacts with the summarizer as it supplies the necessary labels to it. The summarizer receives input from the database and the knowledge base. It compares the actual feature descriptors of the image stored in the database with the feature definitions stored in the knowledge base. After this comparison, the summarizer uses the labels supplied by the library to develop some possible linguistic summaries describing each object. From these summaries, the most suitable one is selected by interaction with the engine (GA). The GA evolves the most suitable solution, which has the highest fitness value (Section 4), after several generations. This solution is passed back to the summarizer, which translates it into its corresponding linguistic summary. Thus, this system is composed of two subsystems at this stage. The feature descriptor extraction using MATLAB and ENVI is a manual subsystem involving user interaction. After descriptors are extracted and stored in a relational table in the database, the automated subsystem consisting of summarizer, knowledge base, library, and engine evaluates the descriptors and compares them with feature definitions. An optimal linguistic summary of each object is then generated automatically.

Approach
The following assumptions are made regarding the data model. R is a relational table defined by its attributes (columns) and its tuples (rows). A fuzzy set is the most natural representation of a linguistic variable. A linguistic variable is one whose value is not a number but a word or a sentence in a natural language [9]. As we are concerned with generating linguistic summaries of objects, we will define some fuzzy sets that represent our notion of what the object description or summary should look like.
The general form of a linguistically quantified proposition is "Q Y's are F", where Q is a fuzzy linguistic quantifier, Y is a class of objects, and F is a summary that applies to that class. F is defined as a fuzzy set in Y. Q represents a linguistic quantifier that groups objects in the class Y. An object/pattern in the image is characterised by a single tuple in the database; therefore, Q can be ignored in this analysis.
An example of such a linguistically quantified proposition in the domain of remote-sensed images would be as follows: island is moderately large.
In the above example, Y is island and F is moderately large. In terms of linguistics, this description is equivalent to: moderately large island.
The objects considered are river, expanse of water (other water body which is not river), land, and island. The attributes of the objects that are used to develop their linguistic summaries are (1) area, (2) length, (3) location in image, (4) additional information.
Area, length, and location (x- and y-coordinates in image) are extracted automatically by the image analysis component in Figure 2.1. For river, the most significant feature descriptor that is extracted is its length. For land, island, and expanse of water, the most significant feature descriptor extracted is area. If $Y = \{y_i\}$ denotes the class of objects (3.2), then $\mathrm{truth}(y_i \text{ is } F) = \mu_F(y_i)$ (3.3), where $\mu_F(y_i)$ is the degree of membership of $y_i$ in the fuzzy set F and $0 \le \mu_F(y_i) \le 1$. The higher the degree of membership, the higher the truth value of the linguistic proposition.
In this case, referring to (3.2) and (3.3), $y_i$ could be island, area of land, expanse of water, or river. Area of land represents land other than island; expanse of water represents any water body that is not a river. For each object $y_i$, the degree of membership of its feature descriptor, such as area or length, in the corresponding fuzzy sets is calculated. Fuzzy sets for area are large, considerably large, moderately large, fairly large, and small, and fuzzy sets for length are long, considerably long, relatively long, fairly long, and short. The linguistic description is calculated as follows:

$$T_j = m_{1j} \wedge m_{2j} \wedge \cdots \wedge m_{nj} \qquad (3.4)$$

where $m_{ij}$ is the matching degree [6] of the ith attribute in the jth tuple. $m_{ij} \in [0,1]$ is a measure of the degree of membership of the ith attribute value in a fuzzy set denoted by a fuzzy label. Referring to (3.4), $T_j$ thus evaluates the truth value for each object $y_i$, as it matches the feature descriptors of that object with fuzzy set definitions by calculating the matching degrees and combining them using the logical AND operator. The logical AND ($\wedge$) of matching degrees is calculated as the minimum of the matching degrees [6].
Equation (3.5) means that we calculate the conjunction of only those matching degrees that are nonzero in order to evaluate $T_j$. This aids in computational efficiency. All such $T_j$'s are added up to evaluate T. T is a numeric value that represents the truth of a possible set of summaries of all the objects in the database.
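The evaluation of $T_j$ and T described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the matching degrees in the example are hypothetical, and returning 0 for a tuple whose matching degrees are all zero is an assumption that the text does not address.

```python
from typing import Sequence


def tuple_truth(matching_degrees: Sequence[float]) -> float:
    """T_j: logical AND (minimum) over the nonzero matching degrees of one tuple."""
    nonzero = [m for m in matching_degrees if m > 0.0]
    # Assumption of this sketch: a tuple with no nonzero matching degree
    # contributes 0 to T.
    return min(nonzero) if nonzero else 0.0


def total_truth(table: Sequence[Sequence[float]]) -> float:
    """T: sum of T_j over all tuples in the table."""
    return sum(tuple_truth(row) for row in table)


# Hypothetical matching degrees for k = 3 tuples and n = 2 attributes.
example = [
    [0.8, 0.6],  # both nonzero, so T_1 is the smaller of the two
    [0.0, 0.9],  # the zero degree is skipped
    [1.0, 0.0],  # the zero degree is skipped
]
print(total_truth(example))
```

Skipping zero matching degrees before taking the minimum mirrors the remark that only nonzero degrees enter the conjunction.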
As an example, consider the area as the attribute in the single-attribute case. Possible fuzzy labels are large, considerably large, moderately large, fairly large, and small. If k = 4 (there are 4 objects in the table),

$$T = \sum_{j=1}^{4} m_{1j}, \quad m_{1j} \neq 0 \qquad (3.7)$$

where $m_{1j}$ represents the fuzzy membership value of area in fuzzy sets large, considerably large, moderately large, fairly large, or small. Equation (3.7) means that each object has a nonzero fuzzy membership value for its area in any of the fuzzy sets mentioned above; that membership value is added cumulatively to the membership value calculated similarly for the next object in the table. Thus T is evaluated as the sum of the (nonzero) membership values of the area of all 4 objects in the table. The next section discusses how the GA evolves the most suitable linguistic summary for all the objects by maximising T.

Implementation issues
This section explains the GA approach and then discusses the results from applying this approach to mining images.

GA approach.
A GA emulates biological evolutionary theories as it attempts to solve optimisation problems. The GA comprises a set of individual elements (the population) and a set of biologically inspired operators, such as selection, crossover, and mutation. According to evolutionary theories, only the most suited elements in a population are likely to survive and generate offspring, thus transmitting their biological heredity to new generations. In computing terms, a GA maps a problem onto a set of binary strings (the population), each representing a potential solution. Using selection, crossover, and mutation operators, the GA then manipulates the most promising strings (denoted by their high fitness value from the evaluation function), as it searches for the best solution to the problem [2,3,14].
Given n attributes, each having m possible fuzzy labels, it is possible to generate $m^n + 1$ descriptions. The GA searches for an optimal solution among these descriptions. Each of these summaries is represented by a uniquely coded chromosome string (a string of 0's and 1's). The population of such strings is manipulated (using selection, crossover, and mutation operators [14]) and evaluated by the GA, and the most suitable linguistic summary that fits each object is generated. The evaluation function for the linguistic summaries or descriptions of all objects in the table is

$$f = \max(T) \qquad (4.1)$$

where T in (4.1) is evaluated as shown in the previous section and f is the maximum fitness value of a particular set of linguistic summaries that have evolved over several generations of the GA.
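The search described above can be illustrated with a small GA in Python. This is a minimal sketch under stated assumptions, not the implementation used in this work: the 3-bits-per-object encoding, the roulette-wheel selection, the elitism step, the function names, and the membership values in the example are all hypothetical choices made for illustration. Only the crossover and mutation probabilities are taken from the parameter values reported in this paper.

```python
import random
from typing import List, Sequence

BITS_PER_OBJECT = 3  # assumption: 3 bits index one of up to 8 fuzzy labels


def decode(chromosome: Sequence[int], n_labels: int) -> List[int]:
    """Split the bit string into one label index per object (-1 if out of range)."""
    labels = []
    for start in range(0, len(chromosome), BITS_PER_OBJECT):
        bits = chromosome[start:start + BITS_PER_OBJECT]
        index = int("".join(map(str, bits)), 2)
        labels.append(index if index < n_labels else -1)
    return labels


def fitness(chromosome: Sequence[int], memberships: Sequence[Sequence[float]]) -> float:
    """Single-attribute T: sum of each object's membership in its chosen label."""
    n_labels = len(memberships[0])
    total = 0.0
    for obj, label in zip(memberships, decode(chromosome, n_labels)):
        if label >= 0:
            total += obj[label]
    return total


def run_ga(memberships, pop_size=20, generations=30,
           p_cross=0.53, p_mut=0.001, seed=0):
    """Evolve a bit string that maximises T; returns (best chromosome, fitness)."""
    rng = random.Random(seed)
    length = BITS_PER_OBJECT * len(memberships)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best = max(population, key=lambda c: fitness(c, memberships))
    for _ in range(generations):
        # Roulette-wheel selection; the small offset avoids all-zero weights.
        weights = [fitness(c, memberships) + 1e-9 for c in population]
        parents = rng.choices(population, weights=weights, k=pop_size)
        children = []
        for a, b in zip(parents[0::2], parents[1::2]):
            a, b = a[:], b[:]
            if rng.random() < p_cross:  # single-point crossover
                cut = rng.randrange(1, length)
                a[cut:], b[cut:] = b[cut:], a[cut:]
            children += [a, b]
        for child in children:  # bit-flip mutation
            for i in range(length):
                if rng.random() < p_mut:
                    child[i] ^= 1
        children[0] = best[:]  # elitism, an addition of this sketch
        population = children
        best = max(population, key=lambda c: fitness(c, memberships))
    return best, fitness(best, memberships)


# Hypothetical memberships of two objects in five fuzzy labels.
example = [[0.0, 0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.3, 0.7, 0.0]]
best, value = run_ga(example)
print(decode(best, 5), value)
```

Because the best string is copied unchanged into each new generation, the best fitness in this sketch never decreases from one generation to the next.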

Results.
In general, image objects are classified at the highest level into land and water. Land is further classified into island and other land. Water is further classified into river (characterised by its length) and other water body (characterised by area). Some of the fuzzy sets being considered are (1) for land: large, considerably large, moderately large, fairly large, and small, based on degree of membership of area of the land in the respective fuzzy sets, (2) for other water body or expanse of water (except river): large, considerably large, moderately large, fairly large, and small, based on degree of membership of area of the water body in the respective fuzzy sets, (3) for river: long, considerably long, relatively long, fairly long, and short, based on degree of membership of length of the river in the respective fuzzy sets.
These fuzzy sets are defined based on geographic facts, such as (i) largest continent is Asia with area of 44579000 km², (ii) largest freshwater lake is Lake Superior with area of 82103 km², (iii) smallest continent is Australia/Oceania with area of 7687000 km², (iv) longest river is the Nile with length 6669 km, (v) shortest river is the Roe with length 0.037 km.
The fuzzy set for large expanse of water [11,12] is defined in (4.2) referring to Figure 4.1(a) and the geographic fact describing the size of the largest freshwater lake, where $x_1 = 79900$ km², $x_2 = 82103$ km². The fuzzy set for considerably large expanse of water is defined in (4.3) referring to Figure 4.2, where $x_1 = 28034.33$ km², $x_2 = 55068.66$ km², $x_3 = 82103$ km². The fuzzy set for small expanse of water is defined in (4.6) referring to Figure 4.1(b). The set for small area of land is defined in (4.7) referring to Figure 4.1(b). Evaluating the membership functions for expanse of water at x = 6.683 gives (i) $\mu_{\text{large expanse of water}}(6.683) = 0$, (ii) $\mu_{\text{considerably large expanse of water}}(6.683) = 0$, (iii) $\mu_{\text{moderately large expanse of water}}(6.683) = 0$, (iv) $\mu_{\text{fairly large expanse of water}}(6.683) = 0$, (v) $\mu_{\text{small expanse of water}}(6.683) = 1$. Thus this pattern can be described as a small expanse of water.
Similarly, for the other object in Table 4.1, the truth values are evaluated. The fuzzy label with the highest truth value is selected to form the most suitable linguistic summary for the corresponding object/pattern.
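The evaluation of membership values and the selection of the label with the highest truth value can be sketched as follows. The breakpoints are the ones quoted above for large and considerably large expanse of water; the piecewise-linear shapes (a rising ramp and a triangle) are assumptions of this sketch, since the defining equations are not reproduced here, and the function names are hypothetical.

```python
from typing import Callable, Dict, Tuple


def mu_large_water(x: float, x1: float = 79900.0, x2: float = 82103.0) -> float:
    """Assumed rising ramp: 0 up to x1, linear between x1 and x2, 1 from x2 on."""
    if x <= x1:
        return 0.0
    if x >= x2:
        return 1.0
    return (x - x1) / (x2 - x1)


def mu_considerably_large_water(x: float, x1: float = 28034.33,
                                x2: float = 55068.66, x3: float = 82103.0) -> float:
    """Assumed triangle: peaks at x2 and vanishes outside the interval (x1, x3)."""
    if x <= x1 or x >= x3:
        return 0.0
    if x <= x2:
        return (x - x1) / (x2 - x1)
    return (x3 - x) / (x3 - x2)


def best_label(x: float, fuzzy_sets: Dict[str, Callable[[float], float]]) -> Tuple[str, float]:
    """Return the (label, truth value) pair with the highest membership for x."""
    return max(((label, mu(x)) for label, mu in fuzzy_sets.items()),
               key=lambda pair: pair[1])


water_sets = {
    "large expanse of water": mu_large_water,
    "considerably large expanse of water": mu_considerably_large_water,
}

# An area of 6.683 km^2 lies below both lower breakpoints, so both
# memberships are zero under any shape that vanishes below x1.
print(mu_large_water(6.683), mu_considerably_large_water(6.683))
```

A complete label set would also need the moderately large, fairly large, and small sets; they are omitted here because their parameters would have to be guessed.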
An example pair of SPOT multispectral images to be analysed is shown in Figures 4.3 and 4.4. These are subimages of the original images, used here due to file-size limitations. The geographic coordinates of the original images are approximately 3°17′U to 3°48′U latitude and 100°58′T to 101°38′T longitude referring to the topographic map. The scale of the images is approximately 1 : 0.0003764. This means that 1 pixel square represents 0.0003764 km². Table 4.1 shows a small sample data set of feature descriptors from some of the objects in the image (Figure 4.3). Figure 4.4 shows an image of the same geographic area taken on a later date. Table 4.2 shows a small sample data set of feature descriptors from some of the objects in the image (Figure 4.4). Area is in km² and length is in km. The additional information attribute denotes numbers as follows: 0 = river, 1 = other water body, 2 = island, 3 = land, and 4 = fire. Location indicates the x- and y-coordinates of the centroid of the object. A location of X, Y = 0 indicates the remaining part of the image. The grey-level values are from the R band, as this band shows all the patterns clearly. For objects where area is considered as the most significant parameter in calculations, their length is set to 0. Fire is considered as a separate pattern.
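One possible in-memory form of a table row, together with the pixel-to-area conversion implied by the stated scale, is sketched below. The class name, field names, and the pixel count in the usage line are hypothetical; only the scale factor and the attribute codes come from the text.

```python
from dataclasses import dataclass

KM2_PER_PIXEL = 0.0003764  # one pixel square, from the stated image scale

# Codes of the additional information attribute, as listed in the text.
PATTERN_CODES = {0: "river", 1: "other water body", 2: "island", 3: "land", 4: "fire"}


@dataclass
class FeatureDescriptor:
    """One tuple of the relational table (field names are illustrative)."""
    area_km2: float       # area in km^2
    length_km: float      # 0 when area is the significant parameter
    x: float              # centroid x-coordinate
    y: float              # centroid y-coordinate
    additional_info: int  # one of the codes in PATTERN_CODES

    @property
    def pattern(self) -> str:
        return PATTERN_CODES[self.additional_info]


def pixels_to_km2(pixel_count: int) -> float:
    """Convert an object's pixel count to an area in km^2."""
    return pixel_count * KM2_PER_PIXEL


# Hypothetical object covering 17755 pixels, i.e. roughly 6.683 km^2.
water = FeatureDescriptor(pixels_to_km2(17755), 0.0, 120.0, 40.0, 1)
print(water.pattern, round(water.area_km2, 3))
```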
The objective of this paper is to describe patterns/objects, such as river, land, island, expanse of water, and so forth, quantitatively in terms of measures, such as area or length. Patterns are classified using the following rules.
(1) If a pattern is to be classified as an island, it should have a water envelope surrounding it such that it has a uniform band ratio of at least eight points on this envelope (corresponding to directions E, W, N, S, NE, NW, SE, SW). Also, grey-level values on the envelope could be lower than the grey-level values on the pattern.
(2) If a pattern does not have an envelope in all directions as described in the first rule above, then it is classified as land.
(3) If a pattern is to be classified as water body (or other water body or expanse of water), it is necessary that it should have a uniform band ratio.
The above rules hold for multiband images.
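The three rules can be expressed as simple predicates, as in the sketch below. Several details are assumptions made only for illustration: band ratios are taken to be sampled once per compass direction and passed in as a mapping, a missing direction stands for a missing envelope, and "uniform" is interpreted as the spread of values staying within a hypothetical tolerance. The optional grey-level comparison in rule (1) is not modelled.

```python
from typing import Iterable, Mapping

DIRECTIONS = ("E", "W", "N", "S", "NE", "NW", "SE", "SW")


def is_uniform(values: Iterable[float], tolerance: float) -> bool:
    """Assumed meaning of 'uniform': max and min differ by at most the tolerance."""
    values = list(values)
    return bool(values) and max(values) - min(values) <= tolerance


def classify_land_pattern(envelope_ratios: Mapping[str, float],
                          tolerance: float = 0.05) -> str:
    """Rules (1) and (2): island if a uniform envelope exists in all eight directions."""
    sampled = [envelope_ratios[d] for d in DIRECTIONS if d in envelope_ratios]
    if len(sampled) == len(DIRECTIONS) and is_uniform(sampled, tolerance):
        return "island"
    return "land"


def is_water_body(pattern_ratios: Iterable[float], tolerance: float = 0.05) -> bool:
    """Rule (3): a water body must have a uniform band ratio."""
    return is_uniform(pattern_ratios, tolerance)


# Hypothetical band ratios sampled on the envelope of one pattern.
surrounded = {d: 0.50 for d in DIRECTIONS}
print(classify_land_pattern(surrounded))
```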
Fire is identified in Table 4.2 with the additional information attribute equal to 4. Future work (Section 5) will focus on developing rules for classifying river.
The spatial location attribute in Tables 4.1 and 4.2 is given a linguistic value, such as centre, left, top left, and so forth, using the following calculation. Centre span is a variable defined to denote a circular distance around the x- and y-coordinates of the centre of an image. The value of centre span may vary from image to image as it is subjective. It is a number that is obtained by measuring the distance around the centre of the image, which can be used to denote an area that still represents the centre of the overall image. This value is evaluated by user interaction with the image. All objects whose centroids [1] lie within the range of centre span from the centre of the image are still located at the centre of the image. If the differences between the x- and y-coordinates of the centroid of the object and those of the centre of the image are both greater than centre span, then the object is located at lower right (diagonally from image centre). If the reverse is true, then the object is located at top left (diagonally from image centre). If the difference between the x-coordinate of the object and the x-coordinate of image centre is greater than centre span and the difference between the y-coordinate of image centre and the y-coordinate of the centroid of the object is greater than centre span, then the object is located at the top right of the image. Similar calculations are used to evaluate the locations lower left, right, left, top, and bottom of image. An x- and y-coordinate of 0, 0 evaluates the location as remainder of image, because the actual coordinates of the image have an origin greater than 0, 0.
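The location calculation can be sketched as a small function. This is an illustrative reading of the rules rather than the exact procedure: the function name and the example coordinates are hypothetical, image y-coordinates are assumed to grow downwards, and the circular centre region is approximated by testing each axis against centre span separately.

```python
def location_label(x: float, y: float, centre_x: float, centre_y: float,
                   centre_span: float) -> str:
    """Return a linguistic location for an object centroid (x, y)."""
    if x == 0 and y == 0:
        return "remainder of image"
    dx = x - centre_x
    dy = y - centre_y  # assumption: rows grow downwards, so dy > 0 is below centre
    horizontal = "right" if dx > centre_span else "left" if dx < -centre_span else ""
    vertical = "lower" if dy > centre_span else "top" if dy < -centre_span else ""
    if not horizontal and not vertical:
        return "centre"
    if not horizontal:
        return "bottom" if vertical == "lower" else "top"
    if not vertical:
        return horizontal
    return f"{vertical} {horizontal}"


# Hypothetical image centre (100, 100) with a centre span of 20 pixels.
print(location_label(150, 50, 100, 100, 20))
```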
It is to be noted that patterns, such as urban area settlements, are ignored as trivial in this analysis. The main concerns are natural patterns, such as water bodies, land, island, and so forth. Additionally, patterns that signal calamities, such as fire, are also extracted and described. The linguistic summaries are generated with reference to the scale of land and water defined in the geographic facts from which the fuzzy sets are developed, even though the area of land in the images may be large compared to the expanse of water.
The GA is run with the following input parameter set. These parameter values are set after several trial runs. With other values, the GA produces the summary of only one object/pattern in the table: (1) number of bits in a chromosome string of the population = 10, (2) generations per cycle = 26, (3) population size = 200 strings, (4) probability of crossover = 0.53, (5) probability of mutation = 0.001.
After 208 generations, the linguistic summaries generated from the image in Figure 4.3 (no fire) are (i) a small area of land at the centre, (ii) a small expanse of water at the top right.
The GA input parameters are varied to obtain the linguistic summaries of patterns in Table 4.2. The parameters used are (1) number of bits in a chromosome string of the population = 10, (2) generations per cycle = 10, (3) population size = 200 strings, (4) probability of crossover = 0.53, (5) probability of mutation = 0.001.
After 80 generations, the linguistic summaries generated from the image in Figure 4.4 are (i) bluish white smoke indicating fire at the left, (ii) a small expanse of water at the top right, (iii) a small area of land at the centre.
Thus, comparing the results of the GA after mining the images of the same geographic area without fire and with fire, taken on two dates separated by a period of more than three years, we can see that the GA can correctly describe an unusual pattern, such as the fire indicated in the image in Figure 4.4. Referring to the corresponding topographic map, it is possible to conclude that this fire could be the result of burning in a paddy field or a nearby primary forest.
Thus, with two attributes such as length and area, each having five possible fuzzy labels, it is possible to generate $5^2 + 1$ descriptions. The GA has searched for an optimal solution among these descriptions within a very short time.

Conclusions and future work
This paper has presented a new approach to describing images using linguistic summaries that use fuzzy labels. A GA technique has been employed to evolve the most suitable linguistic summary that describes each object/pattern in the database. This method can be extended to an array of images of the same geographic area, taken over a period of several years, to describe many other interesting and unusual patterns that emerge over time. Some directions for future work include the following.
(1) Development of a user-friendly tool with a graphical interface to ease the task of extracting and calculating feature descriptors, such as area, length, grey-level intensity, colour, and so forth, stored in the tables. Currently, both MATLAB and ENVI are required in order to populate the tables. Each has its own limitations.

Figure 4.1. (a) Fuzzy set for large. (b) Fuzzy set for small or short.

Figure 4.5. Indications of the smoke plume prominently in purple and the associated burnt scar in dark green in the green slice range (32 to 63) or blue slice range (64 to 95).

Comparing the images in Figures 4.3 and 4.4, it is noted that there are a few major changes in the spatial patterns. A fire is indicated prominently in Figure 4.4 on the left side of the image. Fire is considered as a separate pattern characterised by its bluish white smoke plume and burnt scar area close to it. These patterns become clearly visible if density slicing is performed on the image in the R-band. The colour image of Figure 4.5 indicates the smoke plume prominently in purple and the associated burnt scar in dark green in the green slice range (32 to 63) or blue slice range (64 to 95). Histograms of the area near the fire (Figures 4.6 and 4.7) corresponding to Figures 4.3 and 4.4 also indicate that most of the pixels are of lower intensity for the burnt scar area from the fire image (Figure 4.4) when compared with the same area without fire (Figure 4.3).
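Density slicing of a single band can be sketched as a mapping from grey level to slice index. The sketch assumes that every slice spans 32 grey levels, which is consistent with the two quoted ranges (32 to 63 and 64 to 95) but is not stated for the remaining slices; the function names and the sample band are hypothetical.

```python
from typing import List

SLICE_WIDTH = 32  # assumption: all slices are as wide as the two quoted ranges


def slice_index(grey_level: int) -> int:
    """Index of the density slice containing a grey level (1 = 32..63, 2 = 64..95)."""
    return grey_level // SLICE_WIDTH


def density_slice(band: List[List[int]]) -> List[List[int]]:
    """Replace every grey level of a 2-D band with its slice index."""
    return [[slice_index(value) for value in row] for row in band]


# Hypothetical 2 x 2 excerpt of R-band grey levels.
print(density_slice([[10, 40], [70, 95]]))
```

Each slice index can then be assigned a display colour, which is how the smoke plume and burnt scar come to stand out in distinct colours.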

$A_1, A_2, \ldots, A_n$ are the attributes in the table R (i.e., the columns of the relational table) and $t_1, t_2, \ldots, t_k$ are the tuples or records or entries in the table R (i.e., the rows of the relational table).
The fuzzy set for moderately large expanse of water is defined in (4.4) referring to Figure 4.2, where $x_1 = 1000$ km², $x_2 = 28034.33$ km², $x_3 = 55068.66$ km². Consider the data in Table 4.1. For the second tuple, which is an expanse of water with approximate area 6.683 km², the possible fuzzy labels are large expanse of water, considerably large expanse of water, moderately large expanse of water, fairly large expanse of water, or small expanse of water. The truth value (fuzzy membership value) for each of these cases is evaluated by substituting x = 6.683 in (4.2), (4.3), (4.4), (4.5), and (4.6), respectively.