Why am I writing this post?
A few years ago I made a tweet about replicating the style of a vintage map in ArcGIS Pro. It was a map depicting England and Wales, that I had found in the David Rumsey Map Collection, drawn in 1827 by Anthony Finley.
What I liked, and still like, about this map is its simple use of color. Even though it shows over fifty neighboring administrative areas, it applies only five distinct colors to separate them from one another. This, in a nutshell, is the concept behind the Four or Five Color Theorem: four or five colors are sufficient to symbolize a polygon dataset on a map in such a way that no feature shares the same color with its neighbors.
Recently, I was developing an Arcade expression (and burning brain cells) to calculate color classes for a polygon dataset, so that when it is symbolized with Unique Values no adjacent features have the same color. While writing the expression I found, to my surprise, that such a tool already existed under the name ColorTools, available for download from ArcGIS Online!
Having explored two distinct methods to apply the Four or Five Color Theorem – a ready-made tool and a custom Arcade expression – I decided to write this post to share them both!
So, in the following paragraphs, I first give a few boring history facts (all stolen from the internet) and a personal opinion. Then I describe how I prepare maps and datasets, and I go through the ColorTools toolbox approach, or how to apply the Four or Five Color Theorem easily. Finally, I reinvent the wheel by trying to achieve the Four or Five Color Theorem with Arcade!
I have uploaded a Map Package for ArcGIS Pro on my ArcGIS Online account here with all the maps and data and tools and expressions that will be described, so feel free to download and use it under a CC BY-NC-SA 4.0 license.
A few boring history facts (all stolen from the internet) and a personal opinion
The Four Color Theorem and the Five Color Theorem are two well-known results in graph theory. They are both concerned with the number of colors needed to color a map in such a way that no two adjacent regions (countries, states, etc.) share the same color. While they are closely related, they differ in terms of their history, difficulty of proof, and implications.
The Four Color Theorem states that any map drawn on a plane or the surface of a sphere can be colored using at most four distinct colors, such that no two adjacent regions share the same color. This theorem, first conjectured in 1852 by Francis Guthrie, was one of the most famous and enduring puzzles in mathematics. Guthrie, while trying to color the map of England, noticed that four colors seemed sufficient to color the regions in such a way that no two adjacent regions had the same color. Despite its apparent simplicity, proving this statement rigorously took over a century of mathematical effort.
Many prominent mathematicians attempted to prove the theorem, and several incorrect proofs were published before a valid proof was finally found. The breakthrough came in 1976 when Kenneth Appel and Wolfgang Haken provided the first successful proof using a combination of traditional mathematical techniques and computer algorithms. This was significant because it was one of the first major theorems to be proved using a computer, which sparked debates about the nature of mathematical proof. Critics argued that the reliance on a computer made the proof less elegant or transparent, but it has since been widely accepted.
The Five Color Theorem is a simpler and earlier result. It states that any map can be colored with at most five colors such that no two adjacent regions share the same color. This theorem was proven by Percy Heawood in 1890, long before the Four Color Theorem was resolved. The Five Color Theorem is often seen as a stepping stone to the more difficult Four Color Theorem.
Heawood’s proof of the Five Color Theorem was far simpler than the Four Color Theorem’s, relying on classical mathematical reasoning and not requiring the use of computers. While Heawood himself sought to solve the four-color problem, his work only succeeded in reducing the problem to five colors. Nevertheless, his result was crucial in showing that five colors are always sufficient, providing partial progress towards the more elusive four-color problem.
Confused?
So which one of the two should one use on a map? Strictly speaking, the Four Color Theorem guarantees that four colors are always enough when every region is a single contiguous piece. In practice, though, I have found that the number of color classes I end up needing depends on the complexity of the polygon dataset and on the method used to assign the classes. In less complex datasets, with a relatively small number of features and not many neighbors, four classes seem to suffice. As the complexity of a dataset increases, simple methods that color one feature at a time do not always find a four-color solution, so in more complex datasets I have found that five classes are suitable. In even more complex datasets, for example those with multipart features, which the theorems do not cover, I have found that six, seven or even more color classes are needed so that no feature shares the same color with its neighbors.
This is why I gave this post the title “The Four or Five or Whatever Color Theorem”: the number of color classes depends on the complexity of the features.
The first method I describe here strictly applies the Five Color Theorem (it assigns only five color classes), while the one I have made with Arcade is more flexible: it can work with four, five or more than five color classes, depending on the complexity of the dataset.
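To see why a simple method can end up using more classes than strictly necessary, here is a minimal Python sketch. It is not part of the ColorTools toolbox or of my Arcade expression; the function name greedy_colors and the toy map of four regions in a row are made up for illustration. Each region takes the lowest class not used by an already colored neighbor, and only the order in which the regions are visited changes.

```python
def greedy_colors(adjacency, order):
    """Give each region the lowest class not used by its colored neighbors."""
    colors = {}
    for region in order:
        taken = {colors[n] for n in adjacency[region] if n in colors}
        color = 1
        while color in taken:
            color += 1
        colors[region] = color
    return colors


# Hypothetical toy map: four regions in a row, A-B-C-D.
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}

in_order = greedy_colors(adjacency, ["A", "B", "C", "D"])
ends_first = greedy_colors(adjacency, ["A", "D", "B", "C"])

print(max(in_order.values()))    # classes used when walking left to right
print(max(ends_first.values()))  # classes used when both ends are colored first
```

Walking left to right needs two classes, while coloring both ends first needs three, even though the map itself has not changed.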
Prepare maps and datasets
Let’s turn to present tense and illustrate the process step by step. First of all, I have to download the dataset with which I will work. I will use the Municipal Units 2021 polygons provided by the Hellenic Statistical Authority in shapefile format. If that webpage is Greek to you, you may directly download that specific dataset from here.
I download the dataset, unzip it in a folder and then add it to a newly created map in ArcGIS Pro. As shown in Picture 1, the map zooms to the dataset’s extent, which covers the entire country, and the map’s coordinate system automatically switches to that of the dataset, which is the Hellenic Geodetic Reference System 1987, or GGRS87, or Greek Grid (EPSG:2100).
The next step is to save the shapefile as a feature class in the project’s geodatabase. This is a very important step, not only because I think it is good practice to keep all of your data in one location, but mostly because I have found from my experience that most Arcade methods (for instance the FeatureSet methods) only work properly when they are applied to feature classes and tables stored in the same geodatabase.
So, with the geoprocessing tool Feature Class To Geodatabase I convert the shapefile to a feature class within the project’s default geodatabase, as shown in Picture 2.
I then remove the shapefile and the basemaps that Pro adds by default on a newly created map, and I add the feature class, named MunicipalUnit2021, as shown in Picture 3.
Now, I want to show only the polygons of the Peloponnese area and not all of Greece. There are several ways to do this, including deleting all polygons but the ones I need, or selecting those I need and exporting the selection to a new feature class. Even though both ways are correct, I always prefer a non-destructive way, or a way that in the end produces the fewest possible feature classes. That way is filtering the dataset with a definition query.
To do this, I open the attribute table of the dataset to explore its fields and a potential way to apply a query. As shown in Picture 4, there is a field named “CODE” which stores the official municipal unit code of each polygon. This code is unique for each polygon and it consists of six digits. From my experience working with Greek administrative datasets, I know that the first two digits describe the regional unit to which each municipal unit belongs. So, perhaps I can use these two digits to filter the dataset.
By selecting polygons in the Peloponnese area, I can see the regional unit each municipal unit belongs to by checking the first two characters in the CODE field of the selected features, as shown in Picture 5.
Finishing this exercise, I conclude that the Peloponnese polygons share the following values as the first two digits of the CODE field: 37, 39, 40, 41, 42, 43 and 44. This is the first piece of information.
The second is that I must find a way to retrieve the first two digits of the CODE field in the definition query. This can be achieved with the SQL SUBSTRING function.
So the SQL expression in the definition query dialog should be:
SUBSTRING(CODE, 1, 2) IN ('37','39','40','41','42','43','44')
As shown in Picture 6, I open the Properties panel of the MunicipalUnit2021 dataset and at the Definition Query tab I activate the SQL dialog, where I write the expression above.
When I click OK the definition query is applied to the dataset, so that only the features of the Peloponnese remain visible, as shown in Picture 7.
Note that such queries will vary depending on the dataset in use. Most probably, administrative datasets of other countries will have a field with a naming convention similar to the one described above for Greece.
Note also the single quotes around each number in the second part of the query. The single quotes have been used because the data type of the CODE field is text (string) and not numerical.
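The logic of the definition query can also be tried outside ArcGIS Pro. The following is a minimal Python sketch, assuming the codes are handled as text; the names PELOPONNESE_PREFIXES and in_peloponnese are made up for illustration, and the code 440305 is a hypothetical sample value.

```python
# The two-digit prefixes found for the Peloponnese, kept as text like the CODE field.
PELOPONNESE_PREFIXES = {"37", "39", "40", "41", "42", "43", "44"}


def in_peloponnese(code):
    """Mimic SUBSTRING(CODE, 1, 2) IN (...) for a single code."""
    # Python slices start counting at 0, while SQL SUBSTRING starts at 1.
    return code[:2] in PELOPONNESE_PREFIXES


# 440305 is a hypothetical code, used only to show a second match.
sample_codes = ["370101", "010102", "440305", "060202"]
kept = [code for code in sample_codes if in_peloponnese(code)]
print(kept)
```

Comparing text with text is the same reason why the numbers in the SQL expression are wrapped in single quotes.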
A final touch before proceeding is to change the map’s base color, merely for aesthetic purposes. To do this, I open the Map Properties panel and at the General tab I open the Color Editor for the Background color, as shown in Picture 8.
At the HEX field I type the value 333333 and click OK and OK. The map’s base has now turned to dark gray and matches Pro’s dark-mode interface, as shown in Picture 9.
The ColorTools toolbox or how to apply the Four or Five color Theorem easily!
Time for fun! But first let’s download the ColorTools toolbox from ArcGIS Online, unzip the downloaded file and store it somewhere on my computer.
Then, in the Catalog pane I right-click on the Toolboxes tab and from the list that opens I click on Add Toolbox, as shown in Picture 10, to connect to a toolbox.
I then navigate to the location in my system where I have unzipped the ColorTools toolbox, I select it and I click OK, as shown in Picture 11.
The ColorTools toolbox is added to the available toolboxes of the current project and it is accessible from either the Geoprocessing pane or the Catalog pane, as shown in Picture 12.
To expand the ColorTools toolbox I can click on the arrow left of its name, or double-click on its name. As shown in Picture 13, the toolbox expands and it reveals its two available tools, which are:
- The Eliminate Overlaps Tool, which generates a clean, single-part polygon class with overlaps removed out of the original polygon dataset and
- The Five Color Tool, which calculates color classes and stores them in a field of the polygon dataset, using the Five Color algorithm.
Let’s apply the first tool. I double-click on its name and the corresponding dialog opens.
As shown in Picture 14:
- For Input Features I select the MunicipalUnit2021 feature class,
- I leave the predefined selection that warns me that only the 152 filtered records will be used and
- for the Output features I leave the default name MunicipalUnit2021_EliminateOverlapsTool, automatically generated by the tool.
When I hit the Run button the tool generates the new feature class, stored in the default geodatabase of the project as MunicipalUnit2021_EliminateOverlapsTool, and adds it to the Contents pane and to the map, as shown in Picture 15.
Now, for the second tool to run I need to create a new field in the new feature class named MunicipalUnit2021_EliminateOverlapsTool. As shown in Picture 15, I open the fields table and I add a new field with name ColorClass and with data type Long (numeric).
Now, I go back to the Catalog pane and I double-click on the Five Color Tool.
As shown in Picture 16:
- For Input Features I select the MunicipalUnit2021_EliminateOverlapsTool and
- For Color Field I select the ColorClass field, created in the previous step.
When I click on the Run button, the tool will calculate the values in the ColorClass field in a way that no adjacent features share the same color class.
As shown in Picture 17, when I open the attribute table of the MunicipalUnit2021_EliminateOverlapsTool feature layer I can see that all the values in the ColorClass field have been calculated! There are five color class values (0, 1, 2, 3 and 4) and these are the ones with which I will symbolize the polygon dataset.
Now, on the Contents pane I right-click on the MunicipalUnit2021_EliminateOverlapsTool feature class to open its Symbology pane.
On the Symbology pane:
- For Primary symbology I select Unique values and
- For Field 1 I select the ColorClass field.
As shown in Picture 18, all five values stored in the ColorClass field are added and they symbolize each polygon feature in such a way that no adjacent features share the same color.
Voila!
The Four or Five Color Theorem in Action!
As a final touch I will open the available Color schemes for the polygon layer to change the predefined Random Color Scheme to a Discrete Color Scheme with five color blocks, as shown in Picture 19.
Reinventing the wheel: Achieve the Four or Five Color Theorem with Arcade!
Since I am a little obsessed with Arcade, I could not resist exploring a manual method to achieve the Four or Five Color Theorem with a custom expression. Besides, I had started experimenting with the expression before discovering the ready-made ColorTools toolbox, which is awesome, but I definitely wanted to finish my research and development with Arcade.
So, if we share the same bizarre attraction to self-torturing with writing code, keep reading, as I will describe my methodology in the following paragraphs.
To get started, as before, I create a new map and I add from the geodatabase the feature class named MunicipalUnit2021, as shown in Picture 20.
For the Arcade expression to work, I first need to create two new fields in the existing table of the MunicipalUnit2021 feature class. As shown in Picture 21, I open the Fields table and I add:
- The Neighbors field with data type Text (string) and with a length of 2550 characters (ten times larger than the default, which is 255 characters, to ensure that there will be enough room for the neighbors list to fit in) and
- The ColorClass field with data type Long (numeric).
The Neighbors field will store a comma separated string with the codes of the neighbors of each feature, while the ColorClass field will store the color class for each feature.
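As a rough check of the field length, the arithmetic can be sketched in a few lines of Python. This assumes that every code has six digits, as described for the CODE field, and that the codes are separated by a single comma; the function name max_neighbors_that_fit is made up for illustration.

```python
CODE_LENGTH = 6  # each CODE value consists of six digits


def max_neighbors_that_fit(field_length, code_length=CODE_LENGTH):
    """Return how many comma-separated codes fit in a text field."""
    # n codes joined by commas need n * code_length + (n - 1) characters,
    # so n <= (field_length + 1) / (code_length + 1).
    return (field_length + 1) // (code_length + 1)


print(max_neighbors_that_fit(255))   # default text field length
print(max_neighbors_that_fit(2550))  # length chosen for the Neighbors field
```

Under these assumptions the default length holds a list of 36 neighbors and the enlarged field holds 364, so the larger value is a generous safety margin rather than a strict requirement.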
Next step is to create a new table in the geodatabase which will indicate the neighbors of each feature of the MunicipalUnit2021 feature class. I will do this with the Polygon Neighbors (Analysis) geoprocessing tool.
So, as shown in Picture 22, I open the Polygon Neighbors (Analysis) tool where:
- For Input Features I select the MunicipalUnit2021 feature class,
- For Output Table I leave the predefined name that the tool automatically generated, which is MunicipalUnit2_PolygonNeighb,
- For Report By Fields I select the CODE field (which exists in the attribute table of the MunicipalUnit2021 feature class and stores the code value of each feature) and
- I leave all other options as they are.
When I click the Run button the Polygon Neighbors (Analysis) tool will create a new table in the geodatabase with the name MunicipalUnit2_PolygonNeighb and it will also add it on the Contents pane.
As shown in Picture 23, I open the MunicipalUnit2_PolygonNeighb table, where I can see two fields:
- The src_CODE field which stores the code of each feature of the MunicipalUnit2021 feature class and
- The nbr_CODE field which stores the code of one of its neighbors, so that every pair of neighboring features gets its own row.
In Picture 23, I show, for example, that the feature with code 010102 has as neighbors the features with codes 010101, 010103, 010301, 010303 and 060202.
More information about how the ArcGIS geoprocessing tool Polygon Neighbors (Analysis) finds neighbors and populates the output table can be found in How Polygon Neighbors Works.
I will now use the two fields (src_CODE and nbr_CODE) of the MunicipalUnit2_PolygonNeighb table to aggregate a comma-separated list of the neighbor values for each feature of the MunicipalUnit2021 feature class.
So, I open the attribute table of the MunicipalUnit2021 feature class and I find its Neighbors field (created in the previous steps), as shown in Picture 24.
Before proceeding, I filter the MunicipalUnit2021 feature class with the same definition query I had used for the previous approach (see Picture 6).
I then right-click on the Neighbors field name in the attribute table to open the Calculate Field (Data Management) tool, as shown in Picture 25.
As shown in Picture 25, I complete the options of the Calculate Field tool as follows:
- For Input Table I leave the predefined selection MunicipalUnit2021,
- I leave selected the Use the filtered records button,
- For Field Name I leave the predefined Neighbors,
- For Expression Type I select Arcade and
- At the expression building field I write the following Arcade expression:
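// Code of the feature currently being calculated
var code = $feature.CODE;
// Collects the codes of this feature's neighbors
var neighborsArray = [];
// Output table of the Polygon Neighbors tool, read from the same geodatabase
var neighborsTable = FeatureSetByName($datastore, 'MunicipalUnit2_PolygonNeighb', ['*'], false);
// Keep only the rows that describe the current feature
var neighbors = Filter(neighborsTable, 'src_CODE = @code');
for (var neighbor in neighbors) {
    Push(neighborsArray, neighbor['nbr_CODE']);
}
// Return the neighbor codes as one comma-separated string
return Concatenate(neighborsArray, ',');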
This expression firstly creates two variables:
- the variable code where it stores the value from the CODE field of the MunicipalUnit2021 feature class and
- the variable neighborsArray which is a new, virtual empty array.
It then creates:
- a third variable named neighborsTable, with which it retrieves the table MunicipalUnit2_PolygonNeighb from the project’s geodatabase, using the FeatureSetByName method, as well as
- a fourth variable named neighbors, which keeps only the rows of the MunicipalUnit2_PolygonNeighb table whose src_CODE matches the code of the current feature (src_CODE = @code), using the Filter method.
Then it loops through the neighbors virtual table with a For…in loop and, for each row, it reads the neighbor code (nbr_CODE) and stores it in the neighborsArray, using the Push method.
Finally, it turns the neighborsArray into a comma-separated text (string) using the Concatenate method. The field calculation repeats these steps for every feature.
As shown in Picture 26, this expression calculated a comma-separated list of the neighbors of each feature of the MunicipalUnit2021 feature class and stored it in the Neighbors field.
Compare, for example, the neighbors of the feature with code 370101, which are those with the codes 370103, 370104, 370105, 370205 and 370403. In the MunicipalUnit2_PolygonNeighb table they are separate rows, while in the attribute table of the MunicipalUnit2021 feature class they are a list within a field for each row (feature).
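The same aggregation, from one row per pair of neighbors to one list per feature, can be sketched in plain Python. This is only an illustration of the idea, not a replacement for the Arcade expression; the function name neighbor_lists is made up, and the rows below are a small hand-typed sample based on the codes mentioned above.

```python
from collections import defaultdict


def neighbor_lists(pairs):
    """Group (src_CODE, nbr_CODE) rows into one comma-separated string per source."""
    grouped = defaultdict(list)
    for src_code, nbr_code in pairs:
        grouped[src_code].append(nbr_code)
    return {src: ",".join(nbrs) for src, nbrs in grouped.items()}


# Hand-typed sample rows in the shape of the Polygon Neighbors output table.
pairs = [
    ("370101", "370103"),
    ("370101", "370104"),
    ("370101", "370105"),
    ("370101", "370205"),
    ("370101", "370403"),
    ("370103", "370101"),
]

lists = neighbor_lists(pairs)
print(lists["370101"])
```

Five separate rows for the feature 370101 collapse into a single string, which is what the Neighbors field stores for that feature.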
Now that I have generated the neighbors list of each feature, I will use this field to calculate the color classes in such a way that no feature shares the same color class with its neighbors.
Therefore, I right-click on the ColorClass field name in the attribute table to open the Calculate Field (Data Management) tool, as shown in Picture 27.
I complete the options of the Calculate Field tool as follows:
- For Input Table I leave the predefined selection MunicipalUnit2021,
- I leave selected the Use the filtered records button,
- For Field Name I leave the predefined ColorClass,
- For Expression Type I select Arcade and
- At the expression building field I write the following Arcade expression:
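// Available color classes; add or remove items to change their number
var colorClasses = [1, 2, 3, 4, 5];
// Neighbor codes of the current feature, read from the Neighbors field
var neighbors = Split($feature.Neighbors, ',');
// Color classes already taken by the neighbors
var neighborColorClasses = [];
for (var i in neighbors) {
    var neighborCode = neighbors[i];
    var thisLayer = FeatureSetByName($datastore, 'MunicipalUnit2021', ['*'], false);
    // Look up the neighbor feature by its code
    var neighborFeature = First(Filter(thisLayer, "CODE = @neighborCode"));
    if (!IsEmpty(neighborFeature)) {
        var neighborColor = neighborFeature['ColorClass'];
        // Record each neighbor color class only once
        if (!IsEmpty(neighborColor) && IndexOf(neighborColorClasses, neighborColor) == -1) {
            Push(neighborColorClasses, neighborColor);
        }
    }
}
// Color classes that no neighbor is using
var availableColors = [];
for (var j in colorClasses) {
    if (IndexOf(neighborColorClasses, colorClasses[j]) == -1) {
        Push(availableColors, colorClasses[j]);
    }
}
// Return the first free class, or fall back to the first class if none is free
if (Count(availableColors) > 0) {
    return First(availableColors);
} else {
    return colorClasses[0];
}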
The second expression is a little more complicated, so let’s break it down and explain it as simply as possible.
At first, it creates a variable named colorClasses which is an array of values. The number of the items within the array (the length of the array) is the number of the color classes that the expression will calculate, while the items themselves are the values for these color classes. The colorClasses array in this example will create five color classes with the values 1, 2, 3, 4 and 5.
It then creates a second variable named neighbors which retrieves the comma-separated values from the Neighbors field of the MunicipalUnit2021 feature layer, calculated before, and stores them on-the-fly in a virtual array, using the Split method.
Then a third variable is created named neighborColorClasses, which is an empty virtual array. This array will store the color classes of all the neighbors of every feature.
Right after these three variables, a For…in loop begins, which loops through the neighbors virtual array. Inside this loop, a variable named neighborCode is created, which holds the neighbor code at the current position of the array (neighbors[i]). Then, another variable named thisLayer virtually stores the table of the MunicipalUnit2021 feature layer (its own self), using the FeatureSetByName method.
Finally, a third variable within the loop is created, named neighborFeature, which retrieves the feature of the MunicipalUnit2021 feature layer whose CODE matches neighborCode, using the Filter and then the First methods.
After all variables have been initialized, a pair of nested if statements begins. The first one examines whether a neighbor feature was actually found, by negating the IsEmpty function applied to neighborFeature. The negation is written with the exclamation mark (!) logical operator. If a neighbor feature was found, the expression creates a variable named neighborColor, where it stores the color class value of that neighbor.
Then, if the feature’s neighbor has a color class and that color class does not yet exist in the neighborColorClasses virtual array, the expression adds it to that array, using the Push method. To examine whether the color class exists in the neighborColorClasses virtual array, the statement uses the IndexOf method and tests if it returns -1 (which means that the color class in question does not exist within the array).
Running this conditional statement for all features (rows) of the MunicipalUnit2021 feature layer will populate the virtual array named neighborColorClasses for each feature which will store on-the-fly all the color classes for each one of its neighbors. That way, the expression understands which color classes are occupied by the neighbors of each feature, thus indicating which ones have been left and remain available for the feature itself.
The expression continues and creates a new variable named availableColors, which is a new empty virtual array. This new array will store the remaining color classes, those not occupied by the neighbors of each feature, which basically are the candidates for the feature’s color class about to be assigned.
One more For…in loop begins. This one goes through the colorClasses array and, if it finds a color class which is not included in the neighborColorClasses virtual array, it adds it to the availableColors virtual array. As before, it uses the IndexOf and Push methods. This way, the expression stores in the availableColors virtual array all the color classes that are available for the feature because they are not being used by its neighbors.
Finally, an if/else conditional statement tests whether the availableColors virtual array holds at least one color class and, if so, returns the first one. This is the color class that will be written to the ColorClass field of the feature. If no available color is found, the statement returns the first item of the colorClasses array (colorClasses[0]), which means that the feature will share its color class with at least one of its neighbors.
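The selection logic can be tried on a toy example with a short Python sketch. It assumes that features are processed one at a time and that each one sees the classes already assigned to its neighbors; that is an assumption of this sketch and may not match exactly how the Calculate Field tool evaluates rows. The function name pick_color_class and the four features A to D are hypothetical.

```python
def pick_color_class(neighbor_codes, assigned, color_classes=(1, 2, 3, 4, 5)):
    """Return the first class not used by any already colored neighbor."""
    used = {assigned[c] for c in neighbor_codes if assigned.get(c) is not None}
    available = [c for c in color_classes if c not in used]
    # Same fallback as the expression: the first class when nothing is free.
    return available[0] if available else color_classes[0]


# Hypothetical toy data: A, B and C touch each other, D touches only C.
neighbors = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}

assigned = {}
for code in neighbors:
    assigned[code] = pick_color_class(neighbors[code], assigned)

print(assigned)
```

The three mutually adjacent features receive three different classes, while D can reuse the first class because its only neighbor holds a different one.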
When this expression runs, the ColorClass field in the attribute table of the MunicipalUnit2021 feature layer is calculated and one color class value is assigned to each feature (row), as shown in Picture 28.
Now, I go to the Contents pane and right-click on the MunicipalUnit2021 feature layer to open its Symbology pane.
On the Symbology pane for Primary symbology I select Unique values and for Field 1 I select the ColorClass field.
As shown in Picture 29, all five values stored in the ColorClass field are added and they symbolize each polygon feature in such a way that no adjacent features share the same color.
Again the Four or Five Color Theorem has been achieved with Arcade!
As a final touch I will open the available Color schemes for the polygon layer to change the predefined Random Color Scheme to a Discrete Color Scheme with five color blocks, as shown in Picture 30.
Considerations about the Arcade Expression
There are a few things to consider when using the Arcade expression I wrote. These are:
- Iterations: I have found that in some datasets more than one iteration of the expression brings the desired result. This means that I might have to run the field calculation for the ColorClass field two or three times for the expression to aptly apply the color classes to each feature, so that none of them share the same color class with their neighbors.
- Number of color classes: The first line of the expression defines an array named colorClasses. The number of items in the array sets the number of color classes: four or five items correspond to the Four or the Five Color Theorem, and a larger number of classes, like six or seven, can be assigned by adding more items.
- Complexity of the dataset: The number of classes (the number of items within the colorClasses array) should be adjusted to the complexity of the dataset.
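To decide whether another iteration or an extra color class is needed, it helps to check the result instead of inspecting the map by eye. Here is a minimal Python sketch of such a check, assuming the neighbor lists and the color classes have been exported to two dictionaries; the function name find_conflicts and the toy data are made up for illustration.

```python
def find_conflicts(neighbors, color_class):
    """Return the pairs of neighboring features that share a color class."""
    conflicts = set()
    for code, nbr_codes in neighbors.items():
        for nbr in nbr_codes:
            if nbr in color_class and color_class[code] == color_class[nbr]:
                # Sort each pair so that A-C and C-A are reported only once.
                conflicts.add(tuple(sorted((code, nbr))))
    return sorted(conflicts)


# Hypothetical toy data: three features that all touch each other.
neighbors = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
color_class = {"A": 1, "B": 2, "C": 1}

print(find_conflicts(neighbors, color_class))
```

An empty list means that the coloring is valid; any pair that is reported points to features whose color class should be recalculated, possibly after adding one more item to the colorClasses array.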
Conclusion
That was a tough one. I mean it took me a while to write this article, not to mention the time I spent writing the Arcade expression and exploring the available tools. But I think the results are quite satisfying.
I have uploaded a Map Package for ArcGIS Pro on my ArcGIS Online account here which includes all the maps and data and tools described in the previous paragraphs, as well as the two Arcade expressions in .cal format. When you extract the package all maps should work properly in ArcGIS Pro, while the two expressions can be found in the \commondata\userdata folder. Feel free to download and use the package under a CC BY-NC-SA 4.0 license.
I would be very happy if any of my readers find bugs or potential for improvement in the expressions or in the overall approach. Please share them with me if you do.
I would also like to see if anyone actually benefited from applying the Four or Five Color Theorem with any of the aforementioned techniques and applied them in their own polygon feature datasets. I would like to see any results, so please make sure you tag me if you post on social.
Wishes for a colorful cartographic journey, with four, five, or as many colors as you like!
Kindest regards from Crete, Greece!
Spiros