[SEDONA-328] Add SedonaPyDeck | [SEDONA-329] Remove geometry_col from SedonaKepler APIs (#913)
iGN5117 committed Jul 24, 2023
1 parent 49f7d6f commit eb5b8d5
Showing 13 changed files with 16,969 additions and 45 deletions.
7 changes: 7 additions & 0 deletions LICENSE
@@ -217,6 +217,13 @@ BSD 3-Clause License
--------------------------------------
python/src/pygeos/c_api.h (modified based on https://github.com/pygeos/pygeos/blob/master/src/c_api.h)

Google Buildings License [https://creativecommons.org/licenses/by/4.0/ (CC by 4.0)]
--------------------------------------
core/src/test/resources/813_buildings_test.csv

Chicago Crimes License
--------------------------------------
core/src/test/resources/Chicago_Crimes.csv

No-copyright data used in unit tests
--------------------------------------
1,000 changes: 1,000 additions & 0 deletions core/src/test/resources/813_buildings_test.csv

Large diffs are not rendered by default.

15,271 changes: 15,271 additions & 0 deletions core/src/test/resources/Chicago_Crimes.csv

Large diffs are not rendered by default.

125 changes: 110 additions & 15 deletions docs/tutorial/sql.md
@@ -507,28 +507,123 @@ There are lots of other functions can be combined with these queries. Please rea

## Visualize query results

==Sedona >= 1.5.0==
Sedona provides `SedonaPyDeck` and `SedonaKepler` wrappers, both of which expose APIs to create interactive map visualizations from SedonaDataFrames in a Jupyter environment.

!!!Note
Both SedonaPyDeck and SedonaKepler expect geometries in lon-lat order. If your dataframe has geometries in lat-lon order, use [ST_FlipCoordinates](https://sedona.apache.org/latest-snapshot/api/sql/Function/#st_flipcoordinates) to swap them.
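
To make the ordering concrete, here is a minimal plain-Python sketch of what flipping coordinate order means. It works on tuples rather than Sedona geometries, and the helper name `flip_coordinates` is hypothetical; in a Sedona query, the `ST_FlipCoordinates` function linked above is what operates on geometry columns.

```python
def flip_coordinates(points):
    """Swap each (lat, lon) pair into (lon, lat) order.

    Illustration only: a hypothetical helper on plain tuples, not a Sedona API.
    """
    return [(lon, lat) for lat, lon in points]


# A point given as (lat, lon) becomes (lon, lat), the order the map wrappers expect.
lat_lon_points = [(41.8781, -87.6298)]
lon_lat_points = flip_coordinates(lat_lon_points)
```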

### SedonaPyDeck
Spatial query results can be visualized in Jupyter lab/notebook using SedonaPyDeck.

SedonaPyDeck exposes APIs to create interactive map visualizations using [pydeck](https://pydeck.gl/index.html#), which is based on [deck.gl](https://deck.gl/).


#### Creating a Choropleth map using SedonaPyDeck

SedonaPyDeck exposes a `create_choropleth_map` API that renders a choropleth map from a SedonaDataFrame containing polygons, each with an associated observation:

```python
def create_choropleth_map(cls, df, fill_color=None, plot_col=None, initial_view_state=None, map_style=None,
                          map_provider=None, elevation_col=0)
```

The parameter `fill_color` can be given a list of RGB/RGBA values, or a string that contains RGB/RGBA values based on a column.

For example, all these are valid values of fill_color:
```python
fill_color=[255, 12, 250]
fill_color=[0, 12, 250, 255]
fill_color='[0, 12, 240, AirportCount * 10]'  # AirportCount is a column in the passed df
```
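
As a rough mental model of the column-based form, the sketch below resolves a `fill_color` value for a single row in plain Python. It is an illustration under assumptions, not how SedonaPyDeck or pydeck actually evaluates these expressions; `resolve_fill_color` and the sample row are hypothetical.

```python
def resolve_fill_color(fill_color, row):
    """Return the RGB/RGBA list that a fill_color value yields for one row.

    Hypothetical helper for illustration; `row` maps column names to values.
    """
    if isinstance(fill_color, str):
        # Evaluate the expression with the row's columns as the only visible names.
        # eval is tolerable here only because the expression is a trusted literal.
        color = eval(fill_color, {"__builtins__": {}}, dict(row))
    else:
        color = list(fill_color)
    if len(color) not in (3, 4):
        raise ValueError("fill_color must have 3 (RGB) or 4 (RGBA) components")
    return color


row = {"AirportCount": 5}
static_color = resolve_fill_color([255, 12, 250], row)
column_color = resolve_fill_color("[0, 12, 240, AirportCount * 10]", row)
```

With the static list every polygon gets the same color, while the string form lets the alpha channel vary with `AirportCount` from row to row.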

Instead of `fill_color`, a `plot_col` can be passed to name the column whose values drive the choropleth coloring.
SedonaPyDeck then creates a default color scheme based on the values in that column.
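
The text does not spell out the default color scheme, so the following is only a sketch of the general idea: min-max normalize the column and interpolate between two colors. The function name and the two endpoint colors are assumptions chosen for illustration, not values taken from SedonaPyDeck.

```python
def value_to_color(value, vmin, vmax, low=(255, 255, 204), high=(189, 0, 38)):
    """Map a value in [vmin, vmax] to an RGB color by linear interpolation.

    Hypothetical helper; `low` and `high` are illustrative endpoint colors.
    """
    if vmax == vmin:
        t = 0.0  # A constant column maps everything to the low color.
    else:
        t = (value - vmin) / (vmax - vmin)
    t = max(0.0, min(1.0, t))  # Clamp values that fall outside the range.
    return [round(lo + t * (hi - lo)) for lo, hi in zip(low, high)]


counts = [2, 10, 18]
colors = [value_to_color(c, min(counts), max(counts)) for c in counts]
```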

The parameter `elevation_col` can be given a numeric value or a string (a column name, optionally with operations on it) to give the plotted polygons a 3D elevation.


Optionally, the parameters `initial_view_state`, `map_style`, and `map_provider` can be passed to configure the map.
More details on these parameters and their default values can be found on the pydeck website.

#### Creating a Geometry map using SedonaPyDeck

SedonaPyDeck exposes a `create_geometry_map` API that visualizes a SedonaDataFrame containing geometries of any type:

```python
def create_geometry_map(cls, df, fill_color="[85, 183, 177, 255]", line_color="[85, 183, 177, 255]",
                        elevation_col=0, initial_view_state=None,
                        map_style=None, map_provider=None):
```

The parameter `fill_color` can be given a list of RGB/RGBA values, or a string that contains RGB/RGBA values based on a column. It is used to color polygon and point geometries in the map.

The parameter `line_color` can be given a list of RGB/RGBA values, or a string that contains RGB/RGBA values based on a column. It is used to color line geometries in the map.

The parameter `elevation_col` can be given a static elevation or, like `fill_color`, an elevation based on column values. It only applies to polygon geometries in the map.

Optionally, the parameters `initial_view_state`, `map_style`, and `map_provider` can be passed to configure the map.
More details on these parameters and their default values can be found on the pydeck website, as well as in the deck.gl documentation [here](https://github.com/visgl/deck.gl/blob/8.9-release/docs/api-reference/layers/geojson-layer.md).

#### Creating a Scatterplot map using SedonaPyDeck

SedonaPyDeck exposes a `create_scatterplot_map` API that renders a scatterplot from a SedonaDataFrame containing points:

```python
def create_scatterplot_map(cls, df, fill_color="[255, 140, 0]", radius_col=1, radius_min_pixels=1,
                           radius_max_pixels=10, radius_scale=1, initial_view_state=None,
                           map_style=None, map_provider=None)
```

The parameter `fill_color` can be given a list of RGB/RGBA values, or a string that contains RGB/RGBA values based on a column.

The parameter `radius_col` can be given a numeric value, or a string naming a column (optionally with operations on it), to specify the radius of each plotted point.

The parameter `radius_min_pixels` takes a numeric value that sets the minimum radius in pixels. This prevents a plotted circle from getting too small when zoomed out.

The parameter `radius_max_pixels` takes a numeric value that sets the maximum radius in pixels. This prevents a plotted circle from getting too big when zoomed in.

The parameter `radius_scale` can be given a numeric value that sets a global radius multiplier for all points.
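
Taken together, the radius parameters can be read as a scale-then-clamp rule. The sketch below is a simplified model of that rule, not the deck.gl implementation; `rendered_radius_px` and `pixels_per_unit` (how many screen pixels one radius unit covers at the current zoom) are hypothetical names.

```python
def rendered_radius_px(radius, pixels_per_unit, radius_scale=1,
                       radius_min_pixels=1, radius_max_pixels=10):
    """Approximate on-screen radius: scale first, then clamp to the pixel bounds.

    Hypothetical helper for illustration; not part of SedonaPyDeck or pydeck.
    """
    scaled = radius * radius_scale * pixels_per_unit
    return max(radius_min_pixels, min(radius_max_pixels, scaled))


# Zoomed far out, the point would shrink below 1 px, so the minimum applies.
zoomed_out = rendered_radius_px(100, pixels_per_unit=0.001)
# Zoomed far in, the point would grow past 10 px, so the maximum applies.
zoomed_in = rendered_radius_px(100, pixels_per_unit=1.0)
# In between, the scaled radius is used unchanged.
in_range = rendered_radius_px(5, pixels_per_unit=1.0)
```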

Optionally, the parameters `initial_view_state`, `map_style`, and `map_provider` can be passed to configure the map.
More details on these parameters and their default values can be found on the pydeck website, as well as in the deck.gl documentation [here](https://github.com/visgl/deck.gl/blob/8.9-release/docs/api-reference/layers/scatterplot-layer.md).


#### Creating a heatmap using SedonaPyDeck

SedonaPyDeck exposes a `create_heatmap` API that renders a heatmap from a SedonaDataFrame containing points:
```python
def create_heatmap(cls, df, color_range=None, weight=1, aggregation="SUM", initial_view_state=None, map_style=None,
                   map_provider=None)
```

The parameter `color_range` can optionally be given a list of RGB values. By default, SedonaPyDeck uses the `6-class YlOrRd` scheme as `color_range`.
More examples can be found on [colorbrewer](https://colorbrewer2.org/#type=sequential&scheme=YlOrRd&n=6).

The parameter `weight` can be given a numeric value, or a string containing a column (optionally with operations on it), to determine the weight of each point in the heatmap.
By default, SedonaPyDeck assigns a weight of 1 to each point.

The parameter `aggregation` defines the strategy used when aggregating the heatmap to a lower resolution (zooming out).
Either "MEAN" or "SUM" can be provided. Per the signature above, the default is "SUM".
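
The difference between the two strategies is easiest to see on the weights that land in a single cell. The sketch below is a plain-Python illustration only; the real heatmap layer aggregates on the GPU, and `aggregate_weights` is a hypothetical helper.

```python
def aggregate_weights(weights, aggregation):
    """Combine the point weights that fall into one cell.

    Hypothetical helper for illustration; accepts "SUM" or "MEAN".
    """
    if not weights:
        raise ValueError("weights must not be empty")
    if aggregation == "SUM":
        return sum(weights)
    if aggregation == "MEAN":
        return sum(weights) / len(weights)
    raise ValueError(f"unsupported aggregation: {aggregation!r}")


cell_weights = [1, 2, 3]
summed = aggregate_weights(cell_weights, "SUM")
averaged = aggregate_weights(cell_weights, "MEAN")
```

Under "SUM", cells that collect more points get hotter as they merge, while "MEAN" keeps the intensity tied to the typical weight rather than the point count.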

Optionally, the parameters `initial_view_state`, `map_style`, and `map_provider` can be passed to configure the map.
More details on these parameters and their default values can be found on the pydeck website, as well as in the deck.gl documentation [here](https://github.com/visgl/deck.gl/blob/8.9-release/docs/api-reference/aggregation-layers/heatmap-layer.md).


### SedonaKepler


Spatial query results can be visualized in Jupyter lab/notebook using SedonaKepler.

SedonaKepler exposes APIs to create interactive and customizable map visualizations using [KeplerGl](https://kepler.gl/).

-### Creating a map object using SedonaKepler.create_map
+#### Creating a map object using SedonaKepler.create_map

SedonaKepler exposes a create_map API with the following signature:

```python
-create_map(df: SedonaDataFrame=None, name: str='unnamed', geometry_col: str='geometry', config: dict=None) -> map
+create_map(df: SedonaDataFrame=None, name: str='unnamed', config: dict=None) -> map
```

The parameter `name` identifies the passed SedonaDataFrame within the map object, and any config applied to the map is linked to this name. It is recommended to pass a unique identifier for each dataframe.

-The parameter 'geometry_col' is used to identify the geometry containing column. This is required if the column has a name other than the standard 'geometry'.
-
-!!!Note
-Failure to pass the correct geometry column name (if it has a name other than 'geometry') will result in a failure to create a map object.

If no SedonaDataFrame object is passed, an empty map (with the config applied, if one is passed) is returned. A SedonaDataFrame can be added later using the method `add_df`.
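
This control flow can be sketched with a stand-in map class. `StubMap` is hypothetical and only mimics the two members used here (`add_data` and `config`); the real object is a KeplerGl map, and the real wrappers also convert the SedonaDataFrame before adding it.

```python
class StubMap:
    """Stand-in for a kepler map object; NOT the keplergl API."""

    def __init__(self):
        self.data = {}
        self.config = {}

    def add_data(self, data, name="unnamed"):
        self.data[name] = data


def add_df(stub_map, df, name="unnamed"):
    # Mutates the map in place and returns nothing.
    stub_map.add_data(df, name=name)


def create_map(df=None, name="unnamed", config=None):
    stub_map = StubMap()
    if df is not None:
        add_df(stub_map, df, name)
    if config is not None:
        stub_map.config = config
    return stub_map


empty_map = create_map()                           # no data yet
add_df(empty_map, ["row1", "row2"], name="later")  # data added afterwards
named_map = create_map(df=["row1"], name="AirportCount", config={"zoom": 4})
```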

A map config can optionally be passed to pre-apply customizations to the map.
@@ -541,20 +636,20 @@ A map config can be passed optionally to apply pre-apply customizations to the m

=== "Python"
```python
-map = SedonaKepler.create_map(df=groupedresult, name="AirportCount", geometry_col="country_geom")
+map = SedonaKepler.create_map(df=groupedresult, name="AirportCount")
map
```

-### Adding SedonaDataFrame to a map object using SedonaKepler.add_df
+#### Adding SedonaDataFrame to a map object using SedonaKepler.add_df
SedonaKepler exposes an `add_df` API with the following signature:

```python
-add_df(map, df: SedonaDataFrame, name: str='unnamed', geometry_col='geometry')
+add_df(map, df: SedonaDataFrame, name: str='unnamed')
```

This API can be used to add a SedonaDataFrame to an already created map object. The map object passed is directly mutated and nothing is returned.

-The parameters name and geometry_col have the same conditions as 'create_map'
+The parameter `name` has the same conditions as in `create_map`.

!!!Tip
This method can be used to add multiple dataframes to a map object to be able to visualize them together.
@@ -563,17 +658,17 @@ The parameters name and geometry_col have the same conditions as 'create_map'
=== "Python"
```python
map = SedonaKepler.create_map()
-SedonaKepler.add_df(map, groupedresult, name="AirportCount", geometry_col="country_geom")
+SedonaKepler.add_df(map, groupedresult, name="AirportCount")
map
```

-### Setting a config via the map
+#### Setting a config via the map
A map rendered by accessing the map object created by SedonaKepler includes a config panel which can be used to customize the map.

<img src="../../image/sedona_customization.gif" width="1000">


-### Saving and setting config
+#### Saving and setting config

A map object's current config is available through its `config` attribute, as in `map.config`. This config can be saved for future use, or reused across notebooks if the exact same map is to be rendered every time.
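
Since `create_map` takes the config as a `dict`, one simple way to reuse it across notebooks is to round-trip it through JSON. This is a sketch under the assumption that the config is JSON-serializable; the helper names, the file name, and the sample config contents are hypothetical stand-ins for a real `map.config`.

```python
import json
import tempfile
from pathlib import Path


def save_config(config, path):
    """Write a map config dict to a JSON file (hypothetical helper)."""
    Path(path).write_text(json.dumps(config, indent=2))


def load_config(path):
    """Read a map config dict back from a JSON file (hypothetical helper)."""
    return json.loads(Path(path).read_text())


# Illustrative stand-in for the dict you would read from `map.config`.
sample_config = {"version": "v1", "config": {"mapState": {"zoom": 4}}}

with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = Path(tmp_dir) / "kepler_config.json"
    save_config(sample_config, config_path)
    restored_config = load_config(config_path)
```

The restored dict could then be passed as the `config` argument of `SedonaKepler.create_map` in another notebook.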

33 changes: 33 additions & 0 deletions licenses/LICENSE-Chicago-Crimes
@@ -0,0 +1,33 @@
LICENSE-Chicago-Crimes
referenced from https://www.chicago.gov/city/en/narr/foia/data_disclaimer.html


DISCLAIMER OF LIABILITY

The City of Chicago (“City”) voluntarily provides the data on this website as a service to the public. The City makes no warranty, representation, or guaranty as to the content, accuracy, timeliness, or completeness of any of the data provided at this website. The City makes this data available on an “as is” basis and explicitly disclaims any representations and warranties, including, without limitation, the implied warranties of merchantability and fitness for a particular purpose. The City shall assume no liability for: 1. any errors, omissions, or inaccuracies in the data provided at this website regardless how caused; or, 2. any decision made or action taken or not taken by anyone using or relying upon data provided at this website. The City assumes no liability for any virus or other damage to any computer that might occur during or as a result of accessing this website or the data provided herein.



USE OF DATA

The City may require a user of this data to terminate any and all display, distribution or other use of any or all of the data provided at this website for any reason including, without limitation, violation of these Terms of Use or other terms as defined by City agencies or departments contributing data to this website.

Any user of this website providing any software application, or other secondary or derivative application using data supplied at this website shall do the following:

Include the following disclaimer at the site where the software application, or other secondary or derivative application can be accessed or downloaded:

“This site provides applications using data that has been modified for use from its original source, www.cityofchicago.org, the official website of the City of Chicago. The City of Chicago makes no claims as to the content, accuracy, timeliness, or completeness of any of the data provided at this site. The data provided at this site is subject to change at any time. It is understood that the data provided at this site is being used at one’s own risk.”

Comply with any additional Terms of Use set forth by the City agency or department providing data used by the software application, or other secondary or derivative application, including, without limitation, requirements to include additional citations or disclaimers at the site where the application can be accessed or downloaded.



RESERVATION OF RIGHTS

The City reserves the right to discontinue availability of content on this website at any time and for any reason. The City reserves the right to claim or seek to protect any patent, copyright, trademark, or other intellectual property rights in any of the information, images, software, or processes displayed or used at this website. If the City claims or seeks to protect any intellectual property rights in any of the information, images, software, or processes displayed or used at this website, then this website will so indicate on the webpage on or from which such information, images, software, or processes are accessed. These Terms of Use do not grant anyone any title or right to any patent, copyright, trademark or other intellectual property rights that the City may have in any of the information, images, software, or processes displayed or used at this website.



INDEMNITY

To the fullest extent permitted by law, any user of the data provided at this website shall indemnify and hold harmless the City from any claim, loss, damage, injury, or liability of any kind (including, without limitation, incidental and consequential damages, court costs, attorney’s fees and costs of investigation), that arises directly or indirectly, in whole or in part, from that user’s use of this data, including any secondary or derivative use of the information provided herein. Every user of this data also specifically acknowledges and agrees to have an immediate and independent obligation to defend the City from any claim that may fall within this indemnification provision, even if the allegations are or may be groundless, false or fraudulent, which obligation arises at the time such claim is tendered to the user by the City and continues at all times thereafter.
1 change: 1 addition & 0 deletions python/Pipfile
@@ -18,6 +18,7 @@ pyspark=">=2.3.0"
attrs="*"
pyarrow="*"
keplergl = "==0.3.2"
pydeck = "===0.8.0"

[requires]
python_version = "3.7"
33 changes: 9 additions & 24 deletions python/sedona/maps/SedonaKepler.py
@@ -16,54 +16,39 @@
 # under the License.

 from keplergl import KeplerGl
-import geopandas as gpd
+from sedona.maps.SedonaMapUtils import SedonaMapUtils


 class SedonaKepler:

     @classmethod
-    def create_map(cls, df=None, name="unnamed", geometry_col="geometry", config=None):
+    def create_map(cls, df=None, name="unnamed", config=None):
         """
         Creates a map visualization using kepler, optionally taking a sedona dataFrame as data input
         :param df: [Optional] SedonaDataFrame to plot on the map
-        :param name: [Optional] Name to be associated with the given dataframe, if a df is passed with no name, a default name of 'unnamed' is set for it.
-        :param geometry_col: [Optional] Custom name of geometry column in the sedona data frame,
-        if no name is provided, it is assumed that the column has the default name 'geometry'.
-        :param config: [Optional] A map config to be applied to the rendered map
-        :return: A map object
+        :param name: [Optional] Name to be associated with the given
+        dataframe, if a df is passed with no name, a default name of 'unnamed' is set for it.
+        :param config: [Optional] A map config to be applied to the rendered map
+        :return: A map object
         """
         kepler_map = KeplerGl()
         if df is not None:
-            SedonaKepler.add_df(kepler_map, df, name, geometry_col)
+            SedonaKepler.add_df(kepler_map, df, name)

         if config is not None:
             kepler_map.config = config

         return kepler_map

     @classmethod
-    def add_df(cls, kepler_map, df, name="unnamed", geometry_col="geometry"):
+    def add_df(cls, kepler_map, df, name="unnamed"):
         """
         Adds a SedonaDataFrame to a given map object.
         :param kepler_map: Map object to add SedonaDataFrame to
         :param df: SedonaDataFrame to add
         :param name: [Optional] Name to assign to the dataframe, default name assigned is 'unnamed'
-        :param geometry_col: [Optional] Custom name of geometry_column if any, if no name is provided, a default name of 'geometry' is assumed.
         :return: Does not return anything, adds df directly to the given map object
         """
-        geo_df = SedonaKepler._convert_to_gdf(df, geometry_col)
+        geo_df = SedonaMapUtils.__convert_to_gdf__(df)
         kepler_map.add_data(geo_df, name=name)
-
-    @classmethod
-    def _convert_to_gdf(cls, df, geometry_col="geometry"):
-        """
-        Converts a SedonaDataFrame to a GeoPandasDataFrame and also renames geometry column to a standard name of 'geometry'
-        :param df: SedonaDataFrame to convert
-        :param geometry_col: [Optional]
-        :return:
-        """
-        pandas_df = df.toPandas()
-        geo_df = gpd.GeoDataFrame(pandas_df, geometry=geometry_col)
-        if geometry_col != "geometry":
-            geo_df = geo_df.rename(columns={geometry_col: "geometry"})
-        return geo_df
