Python for Quick and Easy GIS Data Manipulation

Written by , May 20, 2017

Getting started with Python

The key to every information system is the data within it. It’s rare to find an organization that has all of their data structured perfectly and organized in the exact formats and systems that they need to accomplish their goals. Many of our clients come to us with data that is incomplete or maybe in a spreadsheet when it needs to be in a Shapefile or a geodatabase. The Python programming language is an invaluable tool when it comes to geo-data manipulation. From copying field values from one table to another, to converting between spatial data storage formats, using Python scripts can speed up many data-related work flows.

Python scripts are great time savers for situations when there are a lot of features to be updated and more complicated rules regarding the updates. For something like calculating the value of a field based on another field, the field calculator built into ArcGIS or QGIS is the way to go. However, with situations like updates that depend upon a spatial relationship, or field calculations based on a field in another table, Python is an excellent tool for the job. Without scripting, these types of workflows become painstaking grinds of manual feature-by-feature updates. But with a little know-how, it doesn’t have to be that way.

Recently, we needed to update several thousand parcels to create a field that would indicate which rivers and tributaries ran through each parcel. In our rivers layer, we had several dozen rivers and several hundred creeks and tributaries. Instead of doing a tedious manual update, we wrote to a Python script using the open source OGR module. Read on for a line by line walk through of the script.

Python Script – Line by Line Walk Through:  Startup

Assuming you have the Python OGR/GDAL module installed (it comes with QGIS, or you can install separately with pip or conda), to start with, import the OGR module:

from osgeo import ogr

First, we define the data sources by assigning their file system paths’ to variables:

rivers_shp = r"L:\Line 45\Shapefiles\rivers.shp"
parcels_shp = r"L:\Line 45\Shapefiles\parcels.shp"

Next, we open the files and store the resulting objects to variables:

parcels_src = ogr.Open(parcels_shp, 1)
rivers_src = ogr.Open(rivers_shp, 0)

The digits passed after the file name variables indicate the mode the files should be opened in: 0 for read-only, 1 for write access. Since we’re updating the parcels layer, we need to open with write access.

OGR works with objects called layers, which we can get from our opened file object variables using the GetLayer() method:

rivers_layer = rivers_src.GetLayer()
parcels_layer = parcels_src.GetLayer()

Python Script – Line by Line Walk Through:  Look Through Rivers

Once we have a layer, we can iterate through it, feature by feature using a for loop. If you’re familiar with ArcPy, this is similar to how the Data Access Cursors allow you to iterate data. Now that we have rivers to iterate, we need to do something with each river:

for river in rivers_layer:
    river_name = river.GetField("GNIS_NAME")
    river_shape = river.GetGeometryRef()

The first line above sets up the for loop, that will access each feature (each river in our case) in the layer one at a time. The three lines following tell it what we want done with that river once it’s accessed. First, we read a field, GNIS_NAME, and store the value in that field to the variable river_name. Second, we look up the geometry (the shape) of the river, and store that to the variable river_shape. Finally, we set a spatial filter on our parcels layer. This is like a select by location, where we limit our parcels to only those intersecting the shape we pass to the filter. In our case, we want only parcels that intersect the river.

Python Script – Line by Line Walk Through:  Look Through Parcels

With the parcels selected, we can iterate through them with another for loop to apply our updates:

for parcel in parcels_layer:
    frontage = parcel.GetField("frontage")
    if frontage:
        frontage += ", {}".format(river_name)
        frontage = river_name
        parcel.SetField("frontage", frontage)
        parcel = None

Just like before, we use GetField() to read the value of the frontage field and store it to a variable. The next line uses if to create a condition, stating if there was a value in frontage field, then complete the code indented below. Otherwise, complete the code indented below the corresponding else statement. So, if the frontage field has any value in it, we add to our frontage variable a comma, a space, and the name of the current river. If there was no value at all in the frontage field, then we set our frontage variable to the name of the river.

Similar to how we can read values of fields with GetField(), we can write values to fields using SetField(). In the code, the first value, the string frontage, indicates which field we want to write, the second value is what we want to write to the field (our variable that we have updated above with the river name). To save our changes to the file, we call SetFeature() on the parcel we’re updating, then de-reference it by setting its value to None. This will finalize our changes to that parcel.

Easily Extensible

This is the logic for a single iteration of the loop through the rivers. Now that we have that set up, when we run the script it will go through all of our hundreds of rivers and thousands of parcels and update them, all in a matter of seconds. This example uses parcels and rivers, but the same logic could be applied to compile records of all the roads that pass through sensitive areas, all the trails that lead into a particular wilderness area, all the buildings within a given business district, or any number of other scenarios. With a little tweaking, we could get even more information, such as how many feet of frontage a parcel has along each river.

If this has you interested in using Python to solve your problems, check out some of the examples in this GDAL/OGR cookbook.  ArcPy is also very well documented with some quick walk throughs.  We’d also love to hear from you and help implement solutions like these with your data and requirements.

– Ben