postal_expand

Using

If my_address_file.csv is a file in the current working directory with an address column named address, then the DeGAUSS command:

docker run --rm -v $PWD:/tmp ghcr.io/degauss-org/postal_expand:0.1.0 my_address_file.csv

will produce my_address_file_postal_expand_0.1.0.csv with added columns:

  • cleaned_address: address with non-alphanumeric characters and excess whitespace removed (with dht::clean_address())

  • expanded_addresses: the expanded addresses for cleaned_address

Addresses are expanded into several possible normalized addresses using libpostal_expand. This can be useful for matching these addresses with other messy, real-world addresses.

Because each cleaned_address will likely result in more than one expanded address, each input row is duplicated to accommodate the several values of expanded_addresses. This means that when expanding addresses, the input CSV file is "expanded" too: the output file will usually have more rows than the input.
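
The row duplication described above can be illustrated with a minimal Python sketch. This is not the container's code: the `EXAMPLE_EXPANSIONS` table and the `expand_rows` helper are hypothetical stand-ins, the expansions shown are made up for illustration rather than produced by libpostal, and falling back to the cleaned address when no expansion is listed is an assumption of the sketch.

```python
import csv
import io

# Hypothetical expansions, for illustration only. The container derives
# the real values with libpostal, not from a lookup table like this one.
EXAMPLE_EXPANSIONS = {
    "123 main st": ["123 main street", "123 main saint"],
}


def expand_rows(rows, expansions):
    """Duplicate each input row once per expanded address."""
    out = []
    for row in rows:
        cleaned = row["cleaned_address"]
        # Assumption of this sketch: an address with no listed expansion
        # keeps a single row holding the cleaned address itself.
        for candidate in expansions.get(cleaned, [cleaned]):
            out.append({**row, "expanded_addresses": candidate})
    return out


input_csv = "id,cleaned_address\n1,123 main st\n2,456 oak ave\n"
rows = list(csv.DictReader(io.StringIO(input_csv)))
expanded = expand_rows(rows, EXAMPLE_EXPANSIONS)

for row in expanded:
    print(row["id"], row["expanded_addresses"])
```

Here two input rows become three output rows, because the first address has two candidate expansions. All other columns (such as `id`) are copied unchanged onto each duplicate, so the output can still be joined back to the original data.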

Geomarker Methods

Input addresses are normalized using libpostal_expand by:

  1. removing non-alphanumeric characters (except -) and excess whitespace (with dht::clean_address())
  2. expanding the cleaned address into several possible normalized addresses
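
Step 1 above can be approximated in Python as follows. This is only a sketch of the behavior described in this README, not the implementation of dht::clean_address(); the function name `clean_address_sketch` is hypothetical, and the real function may treat edge cases (for example non-ASCII letters, or whether removed characters leave a space behind) differently.

```python
import re


def clean_address_sketch(address: str) -> str:
    """Approximate the cleaning step described in this README."""
    # Drop every character that is not an ASCII letter, a digit,
    # a hyphen, or whitespace.
    kept = re.sub(r"[^A-Za-z0-9\s-]", "", address)
    # Collapse runs of whitespace to one space and trim both ends.
    return re.sub(r"\s+", " ", kept).strip()


print(clean_address_sketch("123  Main St., Apt #4-B"))
# prints: 123 Main St Apt 4-B
```

The hyphen is kept because it can carry meaning in an address, such as a unit or a house-number range.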

DeGAUSS Details

For detailed documentation on DeGAUSS, including general usage and installation, please see the DeGAUSS homepage.