I am not an expert on GIS but I like to think of myself as a geomapping enthusiastic. I recently downloaded Code-Point® Open file from the Ordnance Survey, a CSV file that provides a precise geographic location for each postcode unit in Great Britain. The product is a CSV file containing postcodes, grid references, NHS® health and regional health authority codes, administrative ward, district, county and country area codes.

The geo coordinates of this file come as Easting & Northing. From Wikipedia "*the terms easting and northing are geographic Cartesian coordinates for a point. Easting refers to the eastward-measured distance (or the x-coordinate), while northing refers to the northward-measured distance (or the y-coordinate)*".

Unfortunately things like Google Maps, Google Earth & others use as internal coordinate system the geographic coordinates (latitude/longitude) on the World Geodetic System of 1984 (WGS84) datum so in other to use Code-Point® Open file on one the above tools I had to convert eastings & northings to WGS84.

Thus I scanned the internet for possible solutions to my dilemma but none of them where exactly what I needed it. Some options where to use well-known APIs that will do the conversion for you, but my problem was I had 1.7 million postcodes so this wasn't an option really.

I also looked for python modules and read about Fiona and Shapely but none of them had the solution (probably due to my lack of knowledge on the subject) I was looking for.

Then I came across a python package called Pyproj which along with Pandas solved my problem easily (well, after couple of days of research).

Here it is my solution:

  1. Download and unzip the Code-Point® Open file in your computer
  2. Use the following this piece of code:
  • Import modules:
import os
import pandas as pd
import pyproj
import re
  • Read in all csv files into a Pandas Data Frame:
listfiles = os.listdir("codepo_gb/Data")
pieces = []
columns = ['pstcode','positional_quality_indicator','eastings','northings','country_code','nhs_regional_ha_code',
           'nhs_ha_code','admin_county_code','admin_district_code','admin_ward_code']
for f in listfiles:
  path = "codepo_gb/Data/%s" % f
  frame=pd.read_csv(path, names = columns)
  frame['filename']=f
  pieces.append(frame)
    
postcodes = pd.concat(pieces, ignore_index=True)
  • Check that Data frame postcodes has been created suscessfully:
postcodes
    
    Int64Index: 1687605 entries, 0 to 1687604
    Data columns:
    pstcode                         1687605  non-null values
    positional_quality_indicator    1687605  non-null values
    eastings                        1687605  non-null values
    northings                       1687605  non-null values
    country_code                    1687605  non-null values
    nhs_regional_ha_code            1440838  non-null values
    nhs_ha_code                     1596706  non-null values
    admin_county_code               636420  non-null values
    admin_district_code             1687514  non-null values
    admin_ward_code                 1687514  non-null values
    filename                        1687605  non-null values
    dtypes: int64(3), object(8)

Now here it is a quick example of converting eastings and northings to latitude and logitude and further down is the same piece of code wrapped into a function. I also extracting the area level from the post code.

bng = pyproj.Proj(init='epsg:27700')
wgs84 = pyproj.Proj(init='epsg:4326')
# AL1 1AB - pyproj.transform(from,to,easting,northing)
lon,lat = pyproj.transform(bng,wgs84, 394251, 806376)
print lon, lat
-2.0966478976 57.1482316621
sample = postcodes
def proj_transform(df):
    bng = pyproj.Proj(init='epsg:27700')
    wgs84 = pyproj.Proj(init='epsg:4326')
    lon,lat = pyproj.transform(bng,wgs84,df['eastings'], df['northings'])
    area_pattern = r'([a-zA-Z]+)(?=\d)'
    #area = re.search(r'[a-zA-Z]+(?=\d)',df['pstcode'].str[:-3]).group()
    
    df['lat'] = lat
    df['lon'] = lon
    df['area'] = df['pstcode'].str.findall(area_pattern, flags=re.IGNORECASE).str[0]
    return df
    

I now create a new data frame called sample2 with 3 new columns: latitude, longitud and area postcode level.

sample2 = proj_transform(sample)

With my new dataframe I then export it to a csv file and job done.

sample2.to_csv('uk-postcodes-latlng.csv',index=False)

I have been playing with QuatumGIS but not enough to add something interesting yet so I'll be posting about producing shapefiles than could be use on Google maps soon.

I am done for now.



blog comments powered by Disqus