Removal of text and cartographic symbols in scanned maps 
By Stefan Ene, Director, GeoProcessing Unit, Dept. of Human Geography, Stockholm University, Stockholm, Sweden.
Jan 1998. 

Scanning maps is a convenient way of obtaining digital geographic data. However, maps usually contain information that is not always of spatial relevance, e.g. text and cartographic symbols. Because these objects can vary greatly in size, conventional filtering is not a good approach for getting rid of them. Instead, I have found that ordinary distance operators provide a smooth and accurate way of solving the problem, provided two conditions are fulfilled:

  • the objects to be removed have a color that differs from that of the objects to be kept;
  • the features to be kept are neither dithered nor patterned (if they are, filtering might solve the problem, at the expense of resolution).
 
 
Figure: Digital map after scanning
Figure: Result after processing

Step-by-step solution

  • Scan the map.
  • If necessary, apply a noise removal filter. If your map is "fuzzy", you might need several different types of filtering to get it as "clean" as possible. (If you are scanning a large series of similar maps, take your time to test; it will pay off in the long run.)
  • Thematically classify your map into as many classes as you want to extract. Put everything you want to get rid of in a separate class (= color). Let us call it the "scrap class".
  • For every pixel in the "scrap class", measure the distance to the nearest pixel belonging to any of the other classes.
  • Change the value of each scrap pixel to the value of the closest "non-scrap" class.

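The last two steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the author's implementation: it assumes the map has already been classified into an integer label grid (a list of lists) in which the hypothetical code `SCRAP = 0` marks the scrap class. Instead of building a separate distance image, a multi-source breadth-first search grows outwards from all non-scrap pixels at once, so each scrap pixel inherits the class of whichever source reaches it first. Note that this measures distance in 4-connected (city-block) steps rather than Euclidean distance, and ties between equidistant classes are broken by scan order.

```python
from collections import deque

SCRAP = 0  # hypothetical label code for the "scrap class"


def remove_scrap(labels, scrap=SCRAP):
    """Replace each scrap pixel with the class of the nearest non-scrap pixel.

    Returns (cleaned_labels, distances). Distances are city-block steps.
    A scrap pixel with no reachable non-scrap pixel keeps its label and
    gets distance None.
    """
    rows, cols = len(labels), len(labels[0])
    cleaned = [row[:] for row in labels]
    # Non-scrap pixels are the sources (distance 0); scrap pixels start unknown.
    dist = [[None if labels[r][c] == scrap else 0 for c in range(cols)]
            for r in range(rows)]
    queue = deque((r, c) for r in range(rows) for c in range(cols)
                  if labels[r][c] != scrap)
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                cleaned[nr][nc] = cleaned[r][c]  # inherit the nearest class
                queue.append((nr, nc))
    return cleaned, dist


grid = [
    [1, 1, 0, 2],
    [1, 0, 0, 2],
    [3, 3, 0, 2],
]
cleaned, dist = remove_scrap(grid)
for row in cleaned:
    print(row)
# [1, 1, 1, 2]
# [1, 1, 2, 2]
# [3, 3, 3, 2]
```

A breadth-first search is used here only because it needs nothing beyond the standard library and visits each pixel once. For a full map sheet, a Euclidean distance transform would match the distance-based procedure more closely.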

Example 
(Software used in this example: Adobe Photoshop and IDRISI.)

A map is scanned on a Vidar CS 400 A0 scanner.
Figure: Digital map after scanning (with detail)
 

The "Dust and Scratches" filter in Adobe Photoshop is applied to remove noise.
Figure: Map after noise removal (with detail)
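Photoshop's "Dust and Scratches" filter is not reproduced here. As a rough, hypothetical stand-in for the noise-removal step, the sketch below applies a square median filter to a grayscale grid. The `threshold` parameter is an assumption: a pixel is only replaced when it differs from its local median by more than that amount, so areas that are already uniform are left alone. Larger radii remove larger specks but also erode thin lines such as roads and streams, which is one reason testing the settings on a sample sheet pays off.

```python
from statistics import median_low


def median_filter(img, radius=1, threshold=0):
    """Median-filter a 2-D grayscale grid (list of lists of ints).

    The window is (2*radius + 1) pixels square, clipped at the borders.
    A pixel is replaced by the window median only if it differs from that
    median by more than `threshold` (a hypothetical parameter).
    """
    rows, cols = len(img), len(img[0])
    out = []
    for r in range(rows):
        new_row = []
        for c in range(cols):
            window = [img[rr][cc]
                      for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                      for cc in range(max(0, c - radius), min(cols, c + radius + 1))]
            m = median_low(window)  # median_low always returns an actual pixel value
            new_row.append(m if abs(img[r][c] - m) > threshold else img[r][c])
        out.append(new_row)
    return out


page = [[200] * 5 for _ in range(5)]
page[2][2] = 0  # an isolated dark speck on a light background
clean = median_filter(page)
print(clean[2][2])  # 200
```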
 

The map is classified with the "Color Selection" tool in Adobe Photoshop. (If "unwanted" text or symbols fall in the same color range as information you want to keep, this step will require more manual work.)
Figure: Map classified into 5 classes (with detail)
Water (blue)
Forest (green)
Open land (yellow)
Built area (red)
Scrap class (black)
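In this workflow the classification is done interactively in Photoshop. As a hypothetical, tool-independent illustration of the same idea, the sketch below labels each RGB pixel with the class whose reference color is nearest, using squared Euclidean distance in RGB space. The class codes and reference colors in `PALETTE` are invented for the example; in practice you would sample them from your own scan. As noted above, this only works when the scrap symbols do not share a color range with features you want to keep.

```python
# Hypothetical class codes and reference colours; sample real ones from your scan.
PALETTE = {
    0: (0, 0, 0),        # scrap class (black text and symbols)
    1: (0, 0, 255),      # water (blue)
    2: (0, 160, 0),      # forest (green)
    3: (255, 230, 0),    # open land (yellow)
    4: (220, 0, 0),      # built area (red)
}


def classify(rgb_img, palette=PALETTE):
    """Label each (r, g, b) pixel with the class of the nearest reference colour."""
    def nearest(pixel):
        # squared Euclidean distance in RGB space; no need for the square root
        return min(palette,
                   key=lambda k: sum((a - b) ** 2 for a, b in zip(pixel, palette[k])))
    return [[nearest(px) for px in row] for row in rgb_img]


scan = [[(10, 10, 10), (20, 20, 240)],
        [(30, 150, 20), (250, 220, 30)],
        [(200, 30, 30), (0, 0, 0)]]
print(classify(scan))  # [[0, 1], [2, 3], [4, 0]]
```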
 
 
After import into IDRISI, the "Distance" module is used to create a distance matrix. Distances are measured from every pixel in the "scrap class" to the nearest "non-scrap" pixel.
Figure: Distance map (with detail)
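In the example, IDRISI's "Distance" module is used as a black box. To show what the resulting distance matrix contains, the sketch below computes it directly, by brute force, on a small label grid. It assumes (hypothetically) that the scrap class is coded 0. Each scrap pixel receives the straight-line distance, in pixel units, to the nearest non-scrap pixel, and every other pixel gets 0. The cost grows with (scrap pixels) x (non-scrap pixels), so a real map sheet needs a proper distance transform. The sketch only illustrates the definition.

```python
import math


def distance_image(labels, scrap=0):
    """Euclidean distance (pixels) from each scrap pixel to the nearest non-scrap pixel.

    Non-scrap pixels get 0.0. If the grid has no non-scrap pixel at all,
    scrap pixels get math.inf. Brute force, so only suitable for small tiles.
    """
    sources = [(r, c) for r, row in enumerate(labels)
               for c, v in enumerate(row) if v != scrap]
    return [[0.0 if v != scrap else
             min((math.hypot(r - sr, c - sc) for sr, sc in sources), default=math.inf)
             for c, v in enumerate(row)]
            for r, row in enumerate(labels)]


tile = [[1, 0, 0],
        [0, 0, 0]]
for row in distance_image(tile):
    print([round(d, 3) for d in row])
# [0.0, 1.0, 2.0]
# [1.0, 1.414, 2.236]
```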
 

The IDRISI module "Allocate" is then used to assign every "scrap class" pixel to the closest "non-scrap" class.
Figure: Resulting map with text and symbols removed (with detail)
Water (blue)
Forest (green)
Open land (yellow)
Built area (red)
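The allocation step can likewise be sketched without IDRISI. The hypothetical function below is not the "Allocate" module. It produces the result the step aims for by replacing each scrap pixel (again assumed to be coded 0) with the class of the nearest non-scrap pixel by straight-line distance. Ties between equidistant classes go to whichever source comes first in row-major order, so a thin symbol lying exactly on a class boundary may be split between the two neighbouring classes.

```python
import math


def allocate(labels, scrap=0):
    """Assign each scrap pixel the class of its nearest non-scrap pixel (Euclidean).

    Ties go to the source found first in row-major order. If there are no
    non-scrap pixels, a copy of the grid is returned unchanged.
    Brute force: small tiles only.
    """
    sources = [(r, c, v) for r, row in enumerate(labels)
               for c, v in enumerate(row) if v != scrap]
    if not sources:
        return [row[:] for row in labels]

    def nearest_class(r, c):
        # min() returns the first source with the smallest distance
        return min(sources, key=lambda s: math.hypot(r - s[0], c - s[1]))[2]

    return [[v if v != scrap else nearest_class(r, c) for c, v in enumerate(row)]
            for r, row in enumerate(labels)]


print(allocate([[1, 0, 0, 0, 2]]))  # [[1, 1, 1, 2, 2]]
```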
 
 
In this example, we have extracted four land-use classes from a scanned map with a minimum of manual work. I have found this to be both a fast and an accurate way of post-processing scanned maps.

Stefan Ene