Guide to plotting with GeoPandas in Python

GeoPandas 0.9.0

Making work with geospatial data in Python easier with GeoPandas

GeoPandas is an open source project to make working with geospatial data in Python easier. GeoPandas extends the datatypes used by pandas to allow spatial operations on geometric types. Geometric operations are performed by shapely. GeoPandas further depends on fiona for file access and matplotlib for plotting.

Description
The goal of GeoPandas is to make working with geospatial data in Python easier. It combines the capabilities of pandas and shapely, providing geospatial operations in pandas and a high-level interface to multiple geometries to shapely. GeoPandas enables you to easily do operations in Python that would otherwise require a spatial database such as PostGIS.

GeoPandas is a community-led project written, used and supported by a wide range of people from all around the world and from a large variety of backgrounds.

GeoPandas will always be 100% open source software, free for all to use and released under the liberal terms of the BSD-3-Clause license.

Further information:

  • Getting started

https://geopandas.org/getting_started.html

  • Documentation

https://geopandas.org/docs.html

  • About GeoPandas

https://geopandas.org/about.html

  • Community

https://geopandas.org/community.html

Compiled by Geolink from GeoPandas

Being an intern at FORSK TECHNOLOGIES, I have explored quite a few Python libraries (Matplotlib, Pandas, Numpy, Seaborn, Shapefile, Basemap, Geopandas) which have really helped in plotting data (sometimes real-time data too) over maps.


Mapping Geography In Python

Visualizing data over a map is very helpful in data science work, and modules such as geopandas make it possible. Here we will explore how to create a geo map and visualize data over it, using shapefiles (.shp) and some other Python libraries.

Here we will work with the city-wise population of Rajasthan and then visualize that data on a map.

The shapefile required for this article can be downloaded from this link: click here

Installing Shapefile Library

conda install pyshp
# or
pip install pyshp

Importing Libraries

import numpy as np
import pandas as pd
import shapefile as shp
import matplotlib.pyplot as plt
import seaborn as sns

Initializing Visualization Set

sns.set(style="whitegrid", palette="pastel", color_codes=True)
sns.mpl.rc("figure", figsize=(10,6))

Opening The Vector Map

A vector map in shapefile format is a group of several files that share a base name (.shp, .shx, .dbf and others); the .shp file holds the geometry.

#opening the vector map
shp_path = "\\District_Boundary.shp"
#reading the shape file by using reader function of the shape lib
sf = shp.Reader(shp_path)
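Since a shapefile is spread over several files, it can help to check that the companion files sit next to the .shp before opening it. A minimal sketch using only the standard library, assuming the usual .shp/.shx/.dbf trio; the helper name missing_sidecars is hypothetical and not part of pyshp:

```python
from pathlib import Path

# Companion files a shapefile reader normally expects next to the .shp
# (assumption: geometry, index and attribute table; .prj etc. are optional).
REQUIRED_SUFFIXES = (".shp", ".shx", ".dbf")


def missing_sidecars(shp_path):
    """Return the required suffixes that have no file next to shp_path."""
    base = Path(shp_path)
    return [s for s in REQUIRED_SUFFIXES if not base.with_suffix(s).exists()]
```

An empty list means all three files were found; anything else names the pieces to look for before calling shp.Reader.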

Number of different shapes which were imported by shp.Reader:

len(sf.shapes())

The result comes out to be 33, which tells us that there are 33 shapes, or we can say cities, in the region of Rajasthan.

To explore those records:

sf.records()


A sample output

To explore a particular record where 1 is the Id or row number and 0 refers to the column:

sf.records()[1][0]

Result-

Output= Barmer

Converting Shapefile Data Into Pandas Dataframes:

Converting the shapefile data into a more familiar pandas DataFrame makes accessing cities easier.

def read_shapefile(sf):
    #fetching the headings from the shape file
    fields = [x[0] for x in sf.fields][1:]
    #fetching the records from the shape file
    records = [list(i) for i in sf.records()]
    shps = [s.points for s in sf.shapes()]
    #converting shapefile data into pandas dataframe
    df = pd.DataFrame(columns=fields, data=records)
    #assigning the coordinates
    df = df.assign(coords=shps)
    return df
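To see what read_shapefile does with the field list, here is a small sketch on made-up data that needs no shapefile. It assumes, as pyshp does, that the first entry of sf.fields is a DeletionFlag placeholder with no matching value in the records; the district names and numbers are illustrative only.

```python
# Made-up stand-ins for sf.fields and sf.records() (illustrative values only).
fields_raw = [
    ("DeletionFlag", "C", 1, 0),   # placeholder entry, skipped by [1:]
    ("DIST_NAME", "C", 50, 0),
    ("POPULATION", "N", 10, 0),
]
records = [["AJMER", 100], ["BARMER", 200]]

# Same slicing as in read_shapefile: keep the names, drop the first entry.
fields = [f[0] for f in fields_raw][1:]

# Pair every record with the field names, one dict per row.
rows = [dict(zip(fields, rec)) for rec in records]
print(rows[1]["DIST_NAME"])
```

Without the [1:] slice there would be one more column name than values per record, and building the DataFrame would fail.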

Inspecting the data after conversion into a DataFrame, where shape refers to the number of rows and columns:

df = read_shapefile(sf)
df.shape

A DataFrame with a shape of (33, 6) has 33 rows and 6 columns.

Let’s See a Sample Of The Dataframe Created

# sample of a data representation; the last column has the coordinates of the data (latitude and longitude) which will be used to create a specific map shape
df.sample(5)

The result will look like:


Output

Here coords are the latitudes and longitudes which will be used to create the map.

Plotting The Map Of a City In Rajasthan Or a Specific Shape With The Help Of Matplotlib

#a) Plots the shape (polygon) based on the city's coordinates and,

#b) calculates and returns the mid point of that specific shape (x0, y0).

#This mid point is also used to define where to print the city name.

def plot_shape(id, s=None):
    plt.figure()
    #plotting the graphical axes where map plotting will be done
    ax = plt.axes()
    ax.set_aspect('equal')
    #storing the shape with the given id number to be worked upon
    shape_ex = sf.shape(id)
    #np.zeros initializes an array of the given rows and columns filled with 0
    #array with len(shape_ex.points) rows and 1 column for the longitudes
    x_lon = np.zeros((len(shape_ex.points),1))
    #array with len(shape_ex.points) rows and 1 column for the latitudes
    y_lat = np.zeros((len(shape_ex.points),1))
    for ip in range(len(shape_ex.points)):
        x_lon[ip] = shape_ex.points[ip][0]
        y_lat[ip] = shape_ex.points[ip][1]
    #plotting using the derived coordinates stored in the arrays created by numpy
    plt.plot(x_lon,y_lat)
    x0 = np.mean(x_lon)
    y0 = np.mean(y_lat)
    plt.text(x0, y0, s, fontsize=10)
    # use bbox (bounding box) to set plot limits
    plt.xlim(shape_ex.bbox[0],shape_ex.bbox[2])
    return x0, y0
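The mid point returned by plot_shape is the plain average of the polygon's vertices, which is good enough for placing a label but is not the area-weighted centroid. A minimal sketch of that arithmetic in plain Python; the function name vertex_mean and the square are hypothetical:

```python
def vertex_mean(points):
    """Average the x and y coordinates of a list of (x, y) vertices."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return sum(xs) / len(xs), sum(ys) / len(ys)


# A closed ring repeats its first vertex at the end, so that corner is
# counted twice and pulls the average slightly towards it.
square = [(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)]
x0, y0 = vertex_mean(square)
```

For this square the true centre is (2, 2), while the vertex average lands a little closer to the repeated corner, which is usually harmless for a text label.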

Setting The City Name To Plot Respective Map

DIST_NAME = 'JAIPUR'
#to get the id of the city map to be plotted
com_id = df[df.DIST_NAME == 'JAIPUR'].index[0]
plot_shape(com_id, DIST_NAME)
sf.shape(com_id)


Output

To plot a specific shape we had to know the ID of the particular city, but converting the shapefile data into a pandas DataFrame made the work much easier and simpler. Now we can look it up directly by its name.
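Looking up the row id by name can be sketched on a tiny stand-in DataFrame. Note that Index.get_values() was removed in pandas 1.0, so this sketch indexes the result directly; the three rows are made up for illustration:

```python
import pandas as pd

# Stand-in for the DataFrame built by read_shapefile (made-up rows).
df_demo = pd.DataFrame({"DIST_NAME": ["AJMER", "BARMER", "JAIPUR"]})

# Boolean mask -> matching rows -> first matching row label.
com_id_demo = df_demo[df_demo.DIST_NAME == "JAIPUR"].index[0]
```

If no row matches, the index is empty and [0] raises an IndexError, so a misspelled name fails loudly instead of plotting the wrong shape.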

Plotting The Full Map

This function combines all the single shapes that were made from coordinates and shows them as one combined shape.

def plot_map(sf, x_lim = None, y_lim = None, figsize = (11,9)):
    plt.figure(figsize = figsize)
    id=0
    for shape in sf.shapeRecords():
        x = [i[0] for i in shape.shape.points[:]]
        y = [i[1] for i in shape.shape.points[:]]
        plt.plot(x, y, 'k')

        # label each shape with its id only when the full map is shown
        if x_lim is None and y_lim is None:
            x0 = np.mean(x)
            y0 = np.mean(y)
            plt.text(x0, y0, id, fontsize=10)
        id = id+1

    if x_lim is not None and y_lim is not None:
        plt.xlim(x_lim)
        plt.ylim(y_lim)

#calling the function and passing required parameters to plot the full map
plot_map(sf)


Plotting a Zoomed Map

y_lim = (2900000,3000000) # latitude
x_lim = (200000, 400000) # longitude
plot_map(sf, x_lim, y_lim)
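Instead of typing the limits by hand, they can be derived from a bounding box. A small sketch, assuming the (xmin, ymin, xmax, ymax) order that pyshp uses for a shape's bbox; the helper padded_limits and the sample box are hypothetical:

```python
def padded_limits(bbox, pad=0.1):
    """Turn (xmin, ymin, xmax, ymax) into x and y limits with a margin.

    pad is the margin as a fraction of the box width and height.
    """
    xmin, ymin, xmax, ymax = bbox
    dx = (xmax - xmin) * pad
    dy = (ymax - ymin) * pad
    return (xmin - dx, xmax + dx), (ymin - dy, ymax + dy)


# Made-up box in the same projected units as the zoom example.
x_lim_demo, y_lim_demo = padded_limits((200000, 2900000, 400000, 3000000))
```

With a real shapefile the box could come from sf.shape(id).bbox, and the two tuples can be passed on as the x_lim and y_lim arguments of plot_map.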


output

Highlighting a Single Shape Over a Complete Map

Combining the previous functions does the job: we can plot a single shape over a complete map. The ID and the colour code are the required parameters for the function.

def plot_map_fill(id, sf, x_lim = None,
                  y_lim = None,
                  figsize = (11,9),
                  color = 'r'):

    # plt.subplots already creates the figure, so no separate plt.figure call is needed
    fig, ax = plt.subplots(figsize = figsize)
    #drawing the outline of every shape
    for shape in sf.shapeRecords():
        x = [i[0] for i in shape.shape.points[:]]
        y = [i[1] for i in shape.shape.points[:]]
        ax.plot(x, y, 'k')

    #filling the selected shape with the given colour
    shape_ex = sf.shape(id)
    x_lon = np.zeros((len(shape_ex.points),1))
    y_lat = np.zeros((len(shape_ex.points),1))
    for ip in range(len(shape_ex.points)):
        x_lon[ip] = shape_ex.points[ip][0]
        y_lat[ip] = shape_ex.points[ip][1]
    ax.fill(x_lon,y_lat, color)

    if x_lim is not None and y_lim is not None:
        plt.xlim(x_lim)
        plt.ylim(y_lim)

#plot_map_fill(0, sf, x_lim, y_lim, color='y')
plot_map_fill(13, sf, color='y')


Desired Output

Highlighting Multiple Shapes Over The Complete Map With City IDs

With this function we can pass the IDs of multiple cities as a parameter, which highlights multiple cities instead of one.

def plot_map_fill_multiples_ids(title, city, sf,
                                x_lim = None,
                                y_lim = None,
                                figsize = (11,9),
                                color = 'r'):

    # plt.subplots already creates the figure, so no separate plt.figure call is needed
    fig, ax = plt.subplots(figsize = figsize)
    fig.suptitle(title, fontsize=16)
    #drawing the outline of every shape
    for shape in sf.shapeRecords():
        x = [i[0] for i in shape.shape.points[:]]
        y = [i[1] for i in shape.shape.points[:]]
        ax.plot(x, y, 'k')

    #filling and labelling each requested shape
    for id in city:
        shape_ex = sf.shape(id)
        x_lon = np.zeros((len(shape_ex.points),1))
        y_lat = np.zeros((len(shape_ex.points),1))
        for ip in range(len(shape_ex.points)):
            x_lon[ip] = shape_ex.points[ip][0]
            y_lat[ip] = shape_ex.points[ip][1]
        ax.fill(x_lon,y_lat, color)

        x0 = np.mean(x_lon)
        y0 = np.mean(y_lat)
        plt.text(x0, y0, id, fontsize=10)

    if x_lim is not None and y_lim is not None:
        plt.xlim(x_lim)
        plt.ylim(y_lim)

Let's see what the map looks like

#naming the id numbers of the cities to be coloured
city_id = [0, 1, 2, 3, 4, 5, 6]
plot_map_fill_multiples_ids("Multiple Shapes", city_id, sf, color = 'g')


Highlighting Multiple Shapes Over The Complete Map By City Names

So far we have highlighted shapes by city ID (index), but since our data is in a pandas DataFrame we can also do it by giving the city names.

# plotting the city on the map to be coloured by using the dist_name
def plot_cities_2(sf, title, cities, color):

    df = read_shapefile(sf)
    city_id = []
    #translating each city name into its row id
    for i in cities:
        city_id.append(df[df.DIST_NAME == i.upper()]
                       .index[0])
    plot_map_fill_multiples_ids(title, city_id, sf,
                                x_lim = None,
                                y_lim = None,
                                figsize = (11,9),
                                color = color)

Let’s have a look at the output

south = ['jaipur','churu','bikaner']
plot_cities_2(sf, 'DIST', south, 'c')


Plotting a Heat Map

It is a type of map where shapes are filled with a specific colour of varying intensity according to the value provided. It provides clear data interpretation in a geographic format.

In the first function, we divide our list of data into intervals or bins, where each bin has a specific colour intensity: 6 bins and 4 different colour palettes.

def calc_color(data, color=None):
    #choosing the 6 colour tones and the matching seaborn palette name
    if color == 1:
        color_sq = ['#dadaebFF','#bcbddcF0','#9e9ac8F0','#807dbaF0','#6a51a3F0','#54278fF0']
        colors = 'Purples'
    elif color == 2:
        color_sq = ['#c7e9b4','#7fcdbb','#41b6c4','#1d91c0','#225ea8','#253494']
        colors = 'YlGnBu'
    elif color == 3:
        color_sq = ['#f7f7f7','#d9d9d9','#bdbdbd','#969696','#636363','#252525']
        colors = 'Greys'
    elif color == 9:
        color_sq = ['#ff0000','#ff0000','#ff0000','#ff0000','#ff0000','#ff0000']

    else:
        color_sq = ['#ffffd4','#fee391','#fec44f','#fe9929','#d95f0e','#993404']
        colors = 'YlOrBr'
    #splitting the data into 6 quantile bins labelled 0 to 5
    new_data, bins = pd.qcut(data, 6, retbins=True,
                             labels=list(range(6)))
    color_ton = []
    for val in new_data:
        color_ton.append(color_sq[val])
    #showing the palette and the value range of each bin
    if color != 9:
        colors = sns.color_palette(colors, n_colors=6)
        sns.palplot(colors, 0.6)
        for i in range(6):
            print ("\n"+str(i+1)+': '+str(int(bins[i]))+
                   " => "+str(int(bins[i+1])-1))
        print("\n\n 1 2 3 4 5 6")
    return color_ton, bins
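The binning step can be tried on its own. A minimal sketch of how pd.qcut turns values into bin numbers that index a colour list, using twelve made-up values and the default palette from calc_color:

```python
import pandas as pd

# Twelve made-up values, so each of the 6 quantile bins receives two of them.
values = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]
palette = ['#ffffd4', '#fee391', '#fec44f', '#fe9929', '#d95f0e', '#993404']

# labels=list(range(6)) makes qcut return the bin number of each value,
# and retbins=True also returns the 7 bin edges.
bin_ids, edges = pd.qcut(values, 6, retbins=True, labels=list(range(6)))

# The bin number is used directly as an index into the colour list.
tones = [palette[int(b)] for b in bin_ids]
```

Because the bins are quantiles, each colour covers roughly the same number of districts rather than the same value range, which keeps one very large value from washing out the rest of the map.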

Functions plot_cities_2() and plot_map_fill_multiples_ids() should be adapted to take advantage of this new colour scheme:

def plot_cities_data(sf, title, cities, data=None, color=None, print_id=False):

    #one colour tone per data value, in the same order as the city names
    color_ton, bins = calc_color(data, color)
    df = read_shapefile(sf)
    city_id = []
    for i in cities:
        city_id.append(df[df.DIST_NAME ==
                          i.upper()].index[0])
    plot_map_fill_multiples_ids_tone(sf, title, city_id,
                                     print_id,
                                     color_ton,
                                     bins,
                                     x_lim = None,
                                     y_lim = None,
                                     figsize = (11,9))

def plot_map_fill_multiples_ids_tone(sf, title, city,
                                     print_id, color_ton,
                                     bins,
                                     x_lim = None,
                                     y_lim = None,
                                     figsize = (11,9)):

    # plt.subplots already creates the figure, so no separate plt.figure call is needed
    fig, ax = plt.subplots(figsize = figsize)
    fig.suptitle(title, fontsize=16)
    #drawing the outline of every shape
    for shape in sf.shapeRecords():
        x = [i[0] for i in shape.shape.points[:]]
        y = [i[1] for i in shape.shape.points[:]]
        ax.plot(x, y, 'k')

    #filling each requested shape with the tone at the same position in color_ton
    for id in city:
        shape_ex = sf.shape(id)
        x_lon = np.zeros((len(shape_ex.points),1))
        y_lat = np.zeros((len(shape_ex.points),1))
        for ip in range(len(shape_ex.points)):
            x_lon[ip] = shape_ex.points[ip][0]
            y_lat[ip] = shape_ex.points[ip][1]
        ax.fill(x_lon,y_lat, color_ton[city.index(id)])
        if print_id != False:
            x0 = np.mean(x_lon)
            y0 = np.mean(y_lat)
            plt.text(x0, y0, id, fontsize=10)
    if x_lim is not None and y_lim is not None:
        plt.xlim(x_lim)
        plt.ylim(y_lim)

Let's take an example and plot some data in a heat map format.

names = ['jaipur','bikaner','churu','bhilwara','udaipur']
data = [100, 2000, 300, 400000, 500, 600, 100, 2000, 300, 400, 500, 600, 100, 2000, 300, 400, 500, 600]
print_id = True # The shape id will be printed
color_pallete = 1 # 'Purple'
plot_cities_data(sf, 'Heat map of given cities', names, data, color_pallete, print_id)


Plotting Real Data

Plotting the population of the Rajasthan region, which is what we refer to here as the real data.

# reading data set
census_17 = df.POPULATION
census_17.shape
#plotting
title = 'Population Distribution on Rajasthan Region'
data = census_17
names = df.DIST_NAME
plot_cities_data(sf, title, names, data, 1, True)


Hope you have understood the concept of plotting maps through Python libraries.

You can refer to my GitHub for the exact code.

Mapping With Geopandas


Now that we have seen the procedure of mapping with a pandas DataFrame, it's time to visualize the data with a Geopandas GeoDataFrame. Geopandas makes working with geospatial data (data that has a geographic component to it) in Python easier. It combines the capabilities of pandas and shapely in much more compact code. It is one of the best ways to get started with making choropleth maps.

Let's start with some mapping through Geopandas and let's map Rajasthan's population on it!

The shapefile used in the previous topic is sufficient for working with Geopandas.

Installation

conda install geopandas

The Very First Step Is To Import Required Libraries

import pandas as pd
import matplotlib.pyplot as plt
import geopandas as gpd

Getting The Data Of Interest

Rajasthan, the largest state of India by area, is a highly populated state. Mapping its population will make visualization much simpler and more efficient. Let's set the path to open the shapefile for the Rajasthan region through Geopandas.

# set the filepath and load
fp = "\\District_Boundary.shp"
#reading the file stored in variable fp
map_df = gpd.read_file(fp)
# check data type so we can see that this is not a normal dataframe, but a GEOdataframe
map_df.head()


Let’s preview the map

#plotting the map of the shape file preview of the maps without data in it
map_df.plot()


Now it's time to load the data to plot. We could have made a CSV file for the required data, but I extracted it from the shapefile itself instead of creating a CSV or searching the web, which saved a lot of time.

#opening the shapefile (.shp) which contains the data to be plotted on the map
df = gpd.read_file("\\District_Boundary.shp")
df.head()
#selecting the columns required
df = df[['DIST_NAME','POPULATION']]
#renaming the column name
data_for_map = df.rename(index=str, columns={'DIST_NAME': 'DISTRICT', 'POPULATION': 'POP'})

Let’s preview the Geodataframe

# check the dataframe
data_for_map.head()


Now, let’s join our geodata with our dataset

# joining the geodataframe with the cleaned up dataframe
merged = map_df.set_index('DIST_NAME').join(data_for_map.set_index('DISTRICT'))
#.head() returns the top 5 (by default) lines of the dataframe
merged.head()


Output after merging datasets
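Because join matches index labels exactly, a spelling or case difference silently produces missing values. A sketch on made-up frames (the AREA column and all values are hypothetical) showing how to list the districts that found no partner:

```python
import pandas as pd

# Made-up stand-ins for map_df and data_for_map.
left = pd.DataFrame({"DIST_NAME": ["AJMER", "BARMER", "JAIPUR"],
                     "AREA": [1.0, 2.0, 3.0]})
right = pd.DataFrame({"DISTRICT": ["AJMER", "Barmer", "JAIPUR"],
                      "POP": [100, 200, 300]})

# join() matches on the index, so both sides are indexed by district name.
joined = left.set_index("DIST_NAME").join(right.set_index("DISTRICT"))

# Rows without a partner get NaN; "Barmer" does not match "BARMER".
unmatched = joined[joined["POP"].isna()].index.tolist()
```

An empty unmatched list is a quick sign that every shape received a value before the map is drawn.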

Time To Map

First, we need to do some preparatory work for Matplotlib to plot the map: setting the variable and the range, and creating a basic figure for the map.

# set a variable that will call whatever column we want to visualise on the map
variable = 'POP'
# set the range for the choropleth from the data itself, so the colour bar matches the map
vmin, vmax = merged[variable].min(), merged[variable].max()
# create figure and axes for Matplotlib
fig, ax = plt.subplots(1, figsize=(10, 6))

Time To Create The Map

merged.plot(column=variable, cmap='BuGn', linewidth=0.8, ax=ax, edgecolor='0.8', vmin=vmin, vmax=vmax)


This is what we wanted, the map is ready! But it requires some beautification and customization.

# remove the axis
ax.axis('off')
# add a title
ax.set_title('Population of Rajasthan', fontdict={'fontsize': '25', 'fontweight' : '3'})
# create an annotation for the data source
ax.annotate('Source: Rajasthan Datastore, 2019', xy=(0.1, .08), xycoords='figure fraction', horizontalalignment='left', verticalalignment='top', fontsize=12, color='#555555')

A colour bar is a must in a map, as it tells us which values the colours stand for. Let's customize it for our map.

# Create colorbar as a legend
sm = plt.cm.ScalarMappable(cmap='BuGn', norm=plt.Normalize(vmin=vmin, vmax=vmax))
# empty array for the data range
sm.set_array([])
# add the colorbar to the figure, next to the map axes
cbar = fig.colorbar(sm, ax=ax)
#saving our map as .png file.
fig.savefig('map_export.png', dpi=300)
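The colour bar relies on plt.Normalize, which rescales values linearly so that vmin maps to 0 and vmax maps to 1. The arithmetic, as a plain-Python sketch with made-up limits of 120 and 220 (the function name rescale is hypothetical; by default nothing is clipped, so values outside the range land outside 0..1 and are drawn with the end colours):

```python
def rescale(value, vmin, vmax):
    """Linear rescaling: vmin -> 0.0, vmax -> 1.0."""
    return (value - vmin) / (vmax - vmin)


mid = rescale(170, 120, 220)    # halfway between the two limits
above = rescale(270, 120, 220)  # beyond vmax, so larger than 1
```

This is why the range given to the colour bar should be the same one used for the map: if the two differ, the same colour stands for different values in each.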


You might have seen why mapping with Geopandas is a better way to get started. It has very compact and simple code and gives an excellent output. We can plot any kind of data over any region through this approach.

Refer to my GitHub for the exact code.

Mapping With Basemap


The Matplotlib basemap toolkit is a library for plotting 2D data on maps in Python. Basemap does not do any plotting on its own but provides the facilities to transform coordinates to one of 25 different map projections. Matplotlib is then used to plot contours, images, vectors, lines or points in the transformed coordinates. Shoreline, river and political boundary datasets are provided, along with methods for plotting them.

In this section, you will be learning about plotting data on a map through basemap toolkit.

Let’s see the map visualization through basemap toolkit.

Installation

conda install basemap
conda install basemap-data-hires

Importing Libraries

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
import geopandas as gpd
import pandas as pd

Note: If you run into a difficulty or an error (PROJ_LIB) when importing the basemap library, you can set the path to where it is located on your PC directly; run this before importing basemap.

#to import the basemap library give the direct path to the library
import os
os.environ["PROJ_LIB"]="C:\\Users\\Anaconda3\\Library\\share"
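A slightly more defensive sketch of the same workaround sets PROJ_LIB only when it is not already defined. The directory below is the example path from above and is an assumption; it has to be replaced with the location on your own machine:

```python
import os

# Example location taken from the snippet above (assumption: adjust to your install).
proj_dir = "C:\\Users\\Anaconda3\\Library\\share"

# setdefault keeps an existing PROJ_LIB and only fills it in when missing.
os.environ.setdefault("PROJ_LIB", proj_dir)
```

As with the original snippet, this has to run before the basemap import for it to have any effect.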

Let's get our data of interest to be plotted from the csv and the shapefile. Here's the link to download the required csv (CLICK HERE) and the required shapefile (CLICK HERE).

city=gpd.read_file("F:\\District_Boundary.shp")
csv=pd.read_csv("\\latlong_raj.csv")

We start by loading the data. The lat-long values are imported from a separately made csv, and other data such as district names and their population come from the .shp file downloaded in the previous sections.

lat=csv['LAT'].values
lon=csv['LONG'].values
population = city['POPULATION'].values
dist=city['DIST_NAME'].values

This data is stored as numpy arrays; you can check that with type(lat).

Next, we set up the map projection, scatter the data, and then create a colour bar

fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='lcc', resolution='h',
            lat_0=27.0238, lon_0=74.2179,
            width=1.05E6, height=1.2E6)
m.shadedrelief()

Our map background is now ready for data to be plotted on it. Here the lat-long of Rajasthan, India has been set with the 'lcc' projection and a certain amount of zoom to focus only on that state.


Let’s add some details to it and separate the boundaries.

m.drawcoastlines(color='blue',linewidth=3)
m.drawcountries(color='gray',linewidth=3)
m.drawstates(color='gray')


Now it's time to scatter the data over the map projection and set the colour bar.

# scatter city data, with c reflecting population
m.scatter(lon, lat, latlon=True,
          c=population, s=700,
          cmap='YlGnBu_r', alpha=0.5)
#create colorbar
plt.colorbar(label=r'Population')
plt.clim(300000, 4000000)


Doesn't it look like something is missing? Yes, of course: the district names. We are not able to identify the districts through this projection.

Let's plot them.

We have the district names and their lat-longs stored in the variables above, but as numpy arrays, so we need to store them in lists or dictionaries.

dict1={}
list1=[]
list2=[]
list3=[]
n=0
#storing each value in different lists
for z in lat:
    list1.append(z)
for c in lon:
    list2.append(c)
for b in dist:
    list3.append(b)
#storing the values of lat long in a dictionary with lat as keys and long as values
while n < len(list1):
    dict1[list1[n]]=list2[n]
    n+=1

Now, lat-long have been stored into a dictionary(dict1) and district names in a list(list3). Let’s use them for naming districts over projection.
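One caveat with keying the dictionary by latitude: two districts that share a latitude would collapse into one entry, and the names in list3 would then drift out of step. A sketch that keeps name, latitude and longitude together with zip, on made-up values:

```python
# Made-up stand-ins for the dist, lat and lon arrays (illustrative values only).
dist_demo = ["A", "B", "C"]
lat_demo = [26.9, 28.0, 26.9]   # A and C share a latitude
lon_demo = [75.8, 73.3, 70.9]

# A dict keyed by latitude silently drops one of the duplicates...
by_lat = dict(zip(lat_demo, lon_demo))

# ...while a list of (name, lat, lon) tuples keeps every district.
places = list(zip(dist_demo, lat_demo, lon_demo))
```

Looping over such tuples gives each label its own coordinates without a separate counter.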

i=0
# Map (long, lat) to (x, y) for plotting
#naming the cities of Rajasthan with the help of their lat(z)long(c)
for z,c in dict1.items():
    x,y = m(c, z)
    plt.plot(x, y, 'ok', markersize=5)
    plt.text(x, y, list3[i], fontsize=10)
    i+=1


Refer to my GitHub for the exact code.

There you have it. Thank you for reading.

For more on Geopandas and Basemap, or if you want to learn trending industry technologies like Python, ML, DL, AI, IoT etc., connect with Forsk Technologies.