Spatial Analytics
The use of location data has been on the rise in recent years. Much of what we do is geo-tagged, whether it's the restaurant we had a meal at or the jogging circuit we frequent. Companies selling goods and services want to know where their revenue is high, where their customers are based, and the answers to many similar location-related questions. But data about places usually comes as addresses or descriptions of the location, and these can't be plotted on a map directly. To map and visualise them, we first need to assign geographical coordinates to each place of interest. The process that does precisely this is called geocoding. In this blog, we look at the process of geocoding and how we can geocode addresses for free.
If you would like to read more about how location (geospatial) data is rising in importance and is the way forward in data analytics, check out my article on the topic here.
What's Geocoding?
Geocoding is the process of transforming a description of a location (usually the name of a place or its address) into geographical coordinates that represent the place's position on the earth's surface. The result is a pair of coordinates called latitude and longitude. Latitude represents the position in the north-south direction (-90° to 90°), while longitude represents the position in the east-west direction (-180° to 180°).
In some cases, the description of the location can take other forms as well, such as an IP address or a pair of coordinates in a different coordinate system.
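The latitude and longitude ranges given above make for a cheap sanity check on any geocoding result. The snippet below is a minimal sketch of such a check; the function name is illustrative and the sample values are approximate.

```python
def is_valid_coordinate(lat: float, lon: float) -> bool:
    """Return True when the pair lies inside the valid latitude/longitude ranges."""
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


# Approximate coordinates of Copacabana, Rio de Janeiro (illustrative values)
print(is_valid_coordinate(-22.97, -43.18))   # True

# Swapping latitude and longitude is a common mistake; here the "latitude"
# of -122.08 falls outside the -90 to 90 range, so the check fails
print(is_valid_coordinate(-122.08, 37.39))   # False
```

A check like this catches swapped columns only when the swapped value is out of range, so it complements a visual check on a map rather than replacing it.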
Process of Geocoding
Individual Address vs Bulk Addresses
Several map platforms today let you search for any address on the globe. Google Maps is one of the most used tools for this purpose. If you only have a handful of places to geocode, you can look them up on this platform and read off the geographic coordinates. You can also use other platforms like Bing, Mapbox, OpenStreetMap etc. This approach suits quick use cases where you just need to locate an address on a map and visualise it.
But in many cases, you have a large number of addresses to geocode. Looking up several hundred places one by one is not feasible, and it would be a terrible waste of time even if you did. So we look for options to geocode multiple addresses in bulk by automating the process.
For geocoding in bulk, we use tools offered as a UI or an API on the web. These services usually charge for each geocoding request. If you have a few thousand addresses, it can cost you anywhere between $2 and $10, depending on the provider you choose. We'll look at a few service providers and tools that geocode addresses for free for most of our use cases.
Bonus: If you are interested and have some time to spare, you can even set up your own geocoding server and use it to geocode as many addresses as you want.
Preparing the address column
Before you send in your addresses for geocoding, make sure each address is in the standard format that geocoders expect. Instead of keeping the street address, area, city and postal code in separate columns, it's preferable to have a single line of text in the order you generally see on a postcard.
It's ideal to have each component of an address standardised. If it doesn't match the format of the addresses in the official postal database, you might end up with a wrong match or no match at all. Fundamentally, geocoding checks an address against a standardised database of locations to find the best match. Therefore, this is a necessary step to avoid errors.
Standard order of the address used - name, house number, street name, city name, state name, ZIP Code
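If your data keeps the components in separate columns, they can be joined into one line before geocoding. Here is a minimal sketch of that step, assuming each record is a dictionary; the key names (`house_number`, `zip_code` and so on) are hypothetical and should be adapted to your own column names.

```python
def build_address_line(row: dict) -> str:
    """Join address components into one line, skipping missing parts."""
    def clean(value) -> str:
        return str(value or "").strip()

    # House number and street belong together without a comma
    street = " ".join(
        part for part in (clean(row.get("house_number")), clean(row.get("street"))) if part
    )
    parts = [clean(row.get("name")), street, clean(row.get("city")), clean(row.get("state"))]
    line = ", ".join(part for part in parts if part)
    # The ZIP code follows the state, separated by a space
    return f"{line} {clean(row.get('zip_code'))}".strip()


record = {
    "name": "Googleplex",
    "house_number": "1600",
    "street": "Amphitheatre Parkway",
    "city": "Mountain View",
    "state": "CA",
    "zip_code": "94043",
}
print(build_address_line(record))
# Googleplex, 1600 Amphitheatre Parkway, Mountain View, CA 94043
```

Skipping empty components avoids stray commas in the output, which can otherwise confuse the matching step.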
GUI-based Geocoding Tools
There are a few options online that offer bulk geocoding of addresses for free. Of course, these services have limits on how many geocoding requests you can make. But these tools should be sufficient for someone who is working on some one-off projects or handles a limited amount of data.
Local Focus
Known as the Batch Geocoder for Journalists, this tool can geocode addresses in bulk. Once you open the link, choose the country from the drop-down; narrowing the search to one country helps optimise the geocoding. Next, paste all the addresses you want to geocode, with each address on its own line. Clicking "Add to Geocoder" starts the process, and at the end you get an output to copy with three columns (latitude, longitude and status) appended to each address.
Google Spreadsheets - Using Add-ons
Google Spreadsheets offers a simple solution for geocoding. All you need is an add-on, and you can start geocoding a few thousand addresses. There are several add-ons, and here we demo Geocode by Awesome Table. This add-on geocodes the 'address' column you specify on the sheet and adds two more columns for the latitude and longitude when done.
To use this tool, choose Add-ons > Geocode by Awesome Table > Start Geocoding. Be sure to check the 'Try wider results' box before you start geocoding. Another cool feature of this tool is that if your address is spread across multiple columns, such as street address and city name, it lets you concatenate the columns into a full address on the go.
There's a limit of 10,000 rows per day per account on this tool, but it should be enough for most use cases.
You can check out another add-on for GSheets: Geocoding by SmartMonkey
QGIS Plugins
QGIS is one of the software packages you'll come across when learning to handle geospatial data. It is open source and comes with many features and algorithms for working with spatial data, and plugins extend its capabilities further. One such plugin is 'MMQGIS'. This plugin can do several analysis tasks on spatial data, but for geocoding we use its 'Geocode CSV with Web Service' option from the drop-down.
Supply a CSV file with the address column and, optionally, city, state and country columns to run the geocoding. You can geocode using four different services offered in the plugin: Google, OSM Nominatim, US Census Bureau and ESRI Server. You need API keys for Google and ESRI; with the other two, you can start geocoding straight away.
Script-based Geocoding using APIs
Even though the tools mentioned above can handle geocoding in many cases, the process becomes much faster and easier if you run bulk geocoding requests from a script. Many services offer an API with endpoint URLs. You make a web request, supplying the address as a query parameter, and receive a response that is generally a JSON object. This object holds the details of the address along with its geographic coordinates.
These API services typically follow a subscription model and charge for each geocoding request you make. But almost all of them offer a free tier that is sufficient for a typical user's needs in a given month. Following are some of the services and their free quota limits.
- Position Stack's Accurate Forward & Reverse Batch Geocoding REST API: This service offers geocoding for more than 2 billion addresses around the world. It has multiple service tiers based on usage requirements, and the free tier allows a total of 25,000 requests per month per account. You can perform both forward and reverse geocoding on this API using any of the following languages: PHP, Python, Go, Ruby, Node.js or jQuery. The responses are JSON, XML or GeoJSON objects. See pricing. Sign up and get your free API key here and read the documentation on API usage here.
- ESRI's Geocoding and Search API: Signing up for an ArcGIS developer account gives you access to a wide variety of mapping services, and geocoding is one of the default services activated for your API key. You need to choose how you wish to use the API, whether from Python, JavaScript or another method. This service offers a total of 20,000 free requests per month and charges $0.5 per 1,000 geocodes beyond that limit. Note that the geocode results from the web requests you make are temporary JSON response objects and are not stored. See pricing. Sign up and get your API key here and read the documentation on the API usage here.
- Mapbox: Mapbox has been a go-to service for building custom maps with JavaScript, and its maps, navigation, search and tilesets APIs are among the cheapest solutions available. It generously offers a total of 100,000 free requests per month on its geocoding API, with tiered pricing per 1,000 requests beyond that. Each address counts as one geocoding request, and the results are not stored on the account. The response received is a JSON object similar to other services. See pricing. Sign up and get your API key here and read the documentation on the API usage here.
- LocationIQ: Another provider that offers multiple mapping services along with geocoding. It has some of the lowest prices and a whopping 5,000 free requests per day, which translates to about 150,000 requests per month. Though the limit looks good on paper, a per-day limit has its drawbacks. Even if you have only slightly more than 5,000 places, you'll have to wait a whole day for the counter to reset after you exhaust your daily quota. Also, you can't make more than 2 requests per second. However, when dealing with a limited number of addresses without a time constraint, this service offers the largest free quota for geocoding. See pricing. Sign up and get your API key here and read the documentation on the API usage here.
- MapQuest: MapQuest offers several mapping services, ranging from geocoding, directions and elevation to static maps, traffic etc. You get a free quota of 15,000 transactions per month. This quota is shared across all the services offered rather than applying to geocoding alone. See pricing. Sign up and get your API key here and read the documentation on the API usage here.
- OpenRouteService: An open-source service that provides a geocoding API along with directions, isochrones, elevation, time-distance matrices etc. It offers a quota of 1,000 requests per day at up to 100 requests per minute. You can donate to the development of the service, but there's no paid tier. Sign up and get your API key here and read the documentation on the API usage here.
- What3Words: We generally use geographic coordinates or postal addresses to identify a place on the earth's surface, but this mapping service completely changes how we identify a location. It uses a unique algorithm to divide the entire globe into squares of 3m x 3m and assigns a 3-word address to each. This 3-word address is available in multiple languages offered by the provider. If you want to geocode these 3-word addresses, you can use their API, which offers up to 1,000 free requests per month. Reverse geocoding (geographical coordinates to 3-word address) is unlimited. See pricing. Sign up and get your API key here and read the documentation on the API usage here.
- FreeGeoIP.net: Another unique type of address used for geocoding is the IP address of a network. Each IP address has a piece of associated location information, which we can get in any format required (country/state/city/lat-long) by geocoding. It helps to geolocate the origin of network pings on the web to identify systems or people like hackers. FreeGeoIP.net offers an API with which you can geocode IP addresses, with a free quota of 10,000 requests per month. See pricing. Sign up and get your API key here and read the documentation on the API usage here.
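To see how a per-day quota and a per-second limit play out for a given dataset, a quick calculation helps. The sketch below uses the LocationIQ figures quoted above (5,000 requests per day, 2 requests per second) as defaults; the function names are illustrative.

```python
import math


def days_needed(total_addresses: int, daily_quota: int = 5000) -> int:
    """Number of calendar days needed when the quota resets once a day."""
    return math.ceil(total_addresses / daily_quota)


def min_seconds(total_addresses: int, max_per_second: int = 2) -> float:
    """Lower bound on the run time imposed by the per-second rate limit."""
    return total_addresses / max_per_second


print(days_needed(5001))    # 2: one extra address costs a whole extra day
print(days_needed(12000))   # 3
print(min_seconds(5000))    # 2500.0 seconds, roughly 42 minutes
```

The same two functions work for any provider with a daily or monthly quota; only the default values change.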
Requesting from API endpoints
You typically use an API through the endpoints set by the service provider. When you make a web request with all the required query parameters, you receive the response, generally as a JSON object. Here we demonstrate fetching output from the services using cURL and Python.
1. Positionstack - Python
The following code shows how to make a geocoding request for the address 'Copacabana' in the Rio de Janeiro region using the positionstack API. Look into the usage docs to know more about the parameters and response objects.
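A minimal sketch of such a request, using only the Python standard library. The endpoint and the `access_key`, `query`, `region` and `limit` parameter names reflect my reading of the positionstack documentation and should be checked against it; the function names are illustrative and `YOUR_ACCESS_KEY` is a placeholder.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Forward-geocoding endpoint (assumed from the positionstack docs)
BASE_URL = "http://api.positionstack.com/v1/forward"


def build_forward_url(api_key, query, region=None, limit=1):
    """Build the request URL for a forward geocoding query."""
    params = {"access_key": api_key, "query": query, "limit": limit}
    if region:
        params["region"] = region
    return f"{BASE_URL}?{urlencode(params)}"


def extract_lat_lon(payload):
    """Return (latitude, longitude) of the first match, or None if there is none."""
    results = payload.get("data") or []
    if not results:
        return None
    return results[0]["latitude"], results[0]["longitude"]


def geocode(api_key, query, region=None):
    """Send the request and return the first match (needs network access)."""
    with urlopen(build_forward_url(api_key, query, region), timeout=10) as response:
        return extract_lat_lon(json.load(response))


# Example call, left commented out because it needs a real key:
# print(geocode("YOUR_ACCESS_KEY", "Copacabana", region="Rio de Janeiro"))
```

Keeping the URL building and the response parsing in separate functions makes both easy to check without spending any of the monthly quota.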
The printed output of this request would look as below.
2. Open Route Service - Python
When you make a web request using this open-source service, you get multiple matches for the address. You need to mention the max number of results you wish to receive with each request.
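The sketch below shows one way to build such a request and read multiple matches from the response. It assumes the geocoding search endpoint takes `api_key`, `text` and `size` parameters and returns a GeoJSON FeatureCollection, which is my understanding of the openrouteservice documentation; the function names and the sample response are made up for illustration.

```python
from urllib.parse import urlencode

# Geocoding search endpoint (assumed from the openrouteservice docs)
ORS_SEARCH_URL = "https://api.openrouteservice.org/geocode/search"


def build_search_url(api_key, text, size=3):
    """Build the search URL; `size` caps the number of matches returned."""
    return f"{ORS_SEARCH_URL}?{urlencode({'api_key': api_key, 'text': text, 'size': size})}"


def matches_from_geojson(feature_collection):
    """Turn a GeoJSON FeatureCollection into (label, lat, lon) tuples."""
    matches = []
    for feature in feature_collection.get("features", []):
        # GeoJSON stores coordinates as [longitude, latitude]
        lon, lat = feature["geometry"]["coordinates"][:2]
        label = feature.get("properties", {}).get("label", "")
        matches.append((label, lat, lon))
    return matches


# Hand-written sample in the assumed response shape, not real API output
sample = {
    "features": [
        {"geometry": {"coordinates": [8.69, 49.41]}, "properties": {"label": "Match A"}},
        {"geometry": {"coordinates": [8.70, 49.40]}, "properties": {"label": "Match B"}},
    ]
}
for label, lat, lon in matches_from_geojson(sample):
    print(label, lat, lon)
```

Note the coordinate order: GeoJSON puts longitude first, so the values are swapped when unpacking to avoid plotting points in the wrong place.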
The printed output of this request would look as below.
3. Mapbox - Curl
In this case, we use curl, run from the terminal, to geocode 'Mountain View, California' using the Mapbox API.
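To stay with one language for the examples in this blog, the sketch below builds the request URL in Python and prints the equivalent curl command to paste into a terminal. The URL pattern follows the Mapbox Geocoding API v5 as I understand it and should be checked against the current documentation; `YOUR_MAPBOX_TOKEN` is a placeholder and the function names are illustrative.

```python
from urllib.parse import quote


def mapbox_geocode_url(search_text, access_token):
    """Build a forward geocoding URL; the search text must be URL-encoded."""
    encoded = quote(search_text, safe="")
    return (
        "https://api.mapbox.com/geocoding/v5/mapbox.places/"
        f"{encoded}.json?access_token={access_token}"
    )


def curl_command(url):
    """Wrap the URL in quotes so the shell does not interpret '?' or '&'."""
    return f'curl "{url}"'


print(curl_command(mapbox_geocode_url("Mountain View, California", "YOUR_MAPBOX_TOKEN")))
```

The search text becomes part of the URL path, which is why spaces and commas are percent-encoded rather than passed as a query parameter.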
The printed output of this request would look as below.
4. FreeGeoIP.net - Python
We use a random US IP address for geocoding using the service here.
The printed output of this request would look as below.
5. What3Words - Python
For this service, the provider offers a Python module, 'what3words', which you can install with pip install what3words. You then import it and request geocoding for a 3-word address, as given in the code below.
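A minimal sketch of that flow, assuming the wrapper exposes `what3words.Geocoder` with a `convert_to_coordinates` method and that a successful response carries a `coordinates` object with `lat` and `lng` keys, as I recall from the what3words documentation. The helper names are illustrative and `YOUR_API_KEY` is a placeholder.

```python
def parse_coordinates(response):
    """Return (lat, lng) from a response dict, or None if it has no coordinates."""
    coords = response.get("coordinates")
    if not coords:
        return None
    return coords["lat"], coords["lng"]


def words_to_lat_lon(api_key, three_words):
    """Convert a 3-word address to coordinates (needs network and an API key)."""
    import what3words  # third-party: pip install what3words

    geocoder = what3words.Geocoder(api_key)
    return parse_coordinates(geocoder.convert_to_coordinates(three_words))


# Example call, left commented out because it needs a real key:
# print(words_to_lat_lon("YOUR_API_KEY", "filled.count.soap"))
```

Returning None when the response has no coordinates keeps a bulk run going when one 3-word address is mistyped.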
The printed output of this request would look as below.
Using Modules on Python
Using different APIs through their endpoints means you have to remember the URLs and query parameters for each of them, and build a request for each address with its query, headers and parameters. To simplify this and automate most of the work involved, we use Python modules that have functions to run geocoding. Two modules in particular help by collating various APIs and their endpoints. You only have to provide the address (and an API key in some cases) to run geocoding.
Geocoder and GeoPy are powerful modules and offer a lot of features. In this blog, we cover the Geocoder module in detail. You can check out the documentation for both here: Geocoder, Geopy.
1. OSM Nominatim
The Nominatim server is based on OpenStreetMap data and offers free geocoding without an API key. You can run as many requests as you like, subject to a limit of one request per second. Nominatim restricts bulk geocoding with these rules (from the Nominatim usage policy):
- limit your requests to a single thread
- limited to 1 machine only, no distributed scripts (including multiple Amazon EC2 instances or similar)
- Results must be cached on your side. Clients sending, repeatedly, the same query may be classified as faulty and blocked.
This means you can make up to 86,400 geocoding requests per day.
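The rules above (one request per second, a single thread, cached results) can be sketched as a small loop. The function below is an illustration rather than an official client: `geocode_fn` stands for whatever call does the actual lookup, and the `sleep` parameter exists only so the waiting can be swapped out when testing.

```python
import time


def geocode_all(addresses, geocode_fn, min_interval=1.0, sleep=time.sleep):
    """Geocode addresses one at a time, at most one request per `min_interval` seconds.

    Results are cached by address, so a repeated address is never sent twice.
    """
    cache = {}
    last_call = None
    for address in addresses:
        if address in cache:
            continue  # already geocoded, no new request
        if last_call is not None:
            wait = min_interval - (time.monotonic() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = time.monotonic()
        cache[address] = geocode_fn(address)
    return cache
```

With the Geocoder module installed, `geocode_fn` could be something like `lambda address: geocoder.osm(address).latlng`. For runs that span several sessions, the cache would need to be saved to disk, which this sketch leaves out.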
However, if you have an Ubuntu machine, you can set up your own Nominatim server locally by installing Nominatim. Doing so lets you make as many geocoding requests as you want. Once the server is set up, you can also update its data regularly.
2. ArcGIS
For this service, you can run a query similar to the OSM one to get the resulting JSON object. You can also supply maxRows as a parameter to limit the number of results returned.
3. Mapbox
This function uses all the parameters used by the Mapbox API. You can even set up an environment variable for your API key instead of using it directly in your script. You can define the API key by doing the following from your system's terminal.
Parameters that can be used:
proximity: Search nearby [lat, lng]
bbox: Search within a bounding box [minX, minY, maxX, maxY]. Pass as an array.
country: Filtering by country code
method: (default=geocode) geocode, reverse
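As a sketch of the environment-variable approach, the helper below reads the key inside the script and passes it to the Geocoder module explicitly. The variable name `MAPBOX_ACCESS_TOKEN` is what I recall from the Geocoder documentation and should be verified there; the helper names are illustrative.

```python
import os


def key_from_env(var_name):
    """Read an API key from the environment, failing loudly if it is missing."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key


def geocode_mapbox(address, country=None):
    """Geocode one address with Mapbox (needs network and a valid key)."""
    import geocoder  # third-party: pip install geocoder

    kwargs = {"key": key_from_env("MAPBOX_ACCESS_TOKEN")}
    if country:
        kwargs["country"] = country  # filter by country code
    return geocoder.mapbox(address, **kwargs).latlng


# Example call, left commented out because it needs a real key:
# print(geocode_mapbox("Mountain View, California", country="US"))
```

Keeping the key out of the script means the script can be shared or committed without leaking credentials.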
4. LocationIQ
This function uses all the parameters used by LocationIQ API. You can set up an environment variable for your API key just as for Mapbox instead of using it directly in your script. You can define the API key by doing the following from your system's terminal.
Parameters that can be used:
url: custom osm server
maxRows: (default=1) Max number of results to fetch
method: (default=geocode) geocode
5. MapQuest
This function uses all the parameters used by MapQuest API. You can set up an environment variable for your API key instead of using it directly in your script. You can define the API key by doing the following from your system's terminal.
Parameters that can be used:
maxRows: (default=1) Max number of results to fetch
bbox: Search within a bounding box [minX, minY, maxX, maxY]. Pass as an array.
method: (default=geocode) geocode, batch
Similarly, we can run the geocoding for other services like What3Words, FreeGeoIP.net etc.
Conclusion
Geocoding is an integral part of spatial data science. It converts every address in the data into usable, mappable coordinates on the surface of the earth. But the costs of geocoding are usually high in large-scale projects, so we look for free resources to fulfil our needs. The services covered here handle the most basic types of geocoding for free, which is enough for many use cases.
Subscribe to the blog now and get notified about future blog posts. You can find me on LinkedIn, Twitter for any queries or discussions. Check out my previous blog on How Alternative Data is Helping the Companies Invest Big here, and why you need to use Geopackage instead of Shapefile or Geojson here.