Parse addresses with HC XXX and XXXX Box __ #259

Open: wants to merge 1 commit into base: main

@bbharathrao commented Mar 25, 2019

This PR adds training data for addresses in which "HC" (highway contract route) is followed by a numeric-only group ID of 3 or 4 digits, with or without leading zeros, and then a "Box" number.
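To make the target pattern concrete, here is a minimal sketch of a regular expression that recognizes the shape these examples share and maps it to the desired labels. The helper name match_hc_box and the exact pattern are illustrative assumptions, not part of usaddress; the retrained model is what actually does the tagging.

```python
import re
from collections import OrderedDict

# Hypothetical pattern (not part of usaddress): "HC", a numeric-only group ID of
# 3 or 4 digits (leading zeros allowed), "Box", then an alphanumeric box ID.
HC_BOX_PATTERN = re.compile(
    r"^\s*(HC)\s+(\d{3,4})\s+(Box)\s+([0-9A-Za-z]+)\s*$",
    re.IGNORECASE,
)


def match_hc_box(address):
    """Return the desired label -> token mapping, or None if the string does not fit."""
    match = HC_BOX_PATTERN.match(address)
    if match is None:
        return None
    labels = ("USPSBoxGroupType", "USPSBoxGroupID", "USPSBoxType", "USPSBoxID")
    return OrderedDict(zip(labels, match.groups()))


for example in ("HC 095 Box 23", "HC 235 Box 1A", "HC 2302 Box 65", "HC 0955 Box 12"):
    print(example, "->", dict(match_hc_box(example)))
```

A 2-digit or 5-digit group ID (for example "HC 12 Box 3") returns None, matching the 3 to 4 digit scope described above.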
Example 1:

address1 = usaddress.tag("HC 095 Box 23")

Currently it raises a RepeatedLabelError, because both HC and Box are tagged USPSBoxType:

ORIGINAL STRING: HC 095 Box 23
PARSED TOKENS: [(u'HC', 'USPSBoxType'), (u'095', 'USPSBoxID'), (u'Box', 'USPSBoxType'), (u'23', 'USPSBoxID')]
UNCERTAIN LABEL: USPSBoxType

It needs to be parsed this way.

pprint(address1)
(OrderedDict([('USPSBoxGroupType', 'HC'),
('USPSBoxGroupID', '095'),
('USPSBoxType', 'Box'),
('USPSBoxID', '23')]),
'PO Box')

Example 2:

address1 = usaddress.tag("HC 235 Box 1A")

Currently it raises a RepeatedLabelError, because both HC and Box are tagged USPSBoxType:

ORIGINAL STRING: HC 235 Box 1A
PARSED TOKENS: [(u'HC', 'USPSBoxType'), (u'235', 'USPSBoxID'), (u'Box', 'USPSBoxType'), (u'1A', 'USPSBoxID')]
UNCERTAIN LABEL: USPSBoxType

It needs to be parsed this way.

pprint(address1)
(OrderedDict([('USPSBoxGroupType', 'HC'),
('USPSBoxGroupID', '235'),
('USPSBoxType', 'Box'),
('USPSBoxID', '1A')]),
'PO Box')
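Both errors above come from HC and Box each receiving the USPSBoxType label. Until a retrained model is available, a caller could in principle post-process the token list. The sketch below uses a hypothetical relabel_hc_group helper (not a usaddress function): it takes (token, label) pairs in the PARSED TOKENS format, relabels a leading HC plus a 3 to 4 digit ID as the box group, and then builds a label-to-token mapping.

```python
import re
from collections import OrderedDict

# Hypothetical post-processing step, not part of usaddress.
GROUP_ID = re.compile(r"^\d{3,4}$")


def relabel_hc_group(parsed):
    """Relabel a leading 'HC' + 3-4 digit ID as a box group, then map labels to tokens.

    `parsed` is a list of (token, label) pairs in the PARSED TOKENS format.
    This simplified version raises ValueError whenever a label repeats, rather
    than reproducing usaddress.tag's own handling of repeated labels.
    """
    tokens = list(parsed)
    if (len(tokens) >= 2 and tokens[0][0].upper() == "HC"
            and GROUP_ID.match(tokens[1][0])):
        # Ignore whatever the model assigned to these two tokens.
        tokens[0] = (tokens[0][0], "USPSBoxGroupType")
        tokens[1] = (tokens[1][0], "USPSBoxGroupID")
    tagged = OrderedDict()
    for token, label in tokens:
        if label in tagged:
            raise ValueError("repeated label: %s" % label)
        tagged[label] = token
    return tagged


# Token list from Example 2's PARSED TOKENS output.
example_2 = [("HC", "USPSBoxType"), ("235", "USPSBoxID"),
             ("Box", "USPSBoxType"), ("1A", "USPSBoxID")]
print(relabel_hc_group(example_2))
```

Because it overrides the labels on the first two tokens regardless of what they were, the same helper would also cover the Street Address mis-tagging in the examples that follow. It is a workaround sketch only; the training data in this PR is the intended fix.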

Example 3:

address1 = usaddress.tag("HC 2302 Box 65")

Currently it is parsed as a Street Address:

pprint(address1)
(OrderedDict([('AddressNumber', 'HC'),
('StreetName', '2302'),
('USPSBoxType', 'Box'),
('USPSBoxID', '65')]),
'Street Address')

It needs to be parsed this way.

pprint(address1)
(OrderedDict([('USPSBoxGroupType', 'HC'),
('USPSBoxGroupID', '2302'),
('USPSBoxType', 'Box'),
('USPSBoxID', '65')]),
'PO Box')

Example 4:

address1 = usaddress.tag("HC 0955 Box 12")

Currently it is parsed as a Street Address:

pprint(address1)
(OrderedDict([('AddressNumber', 'HC'),
('StreetName', '0955'),
('USPSBoxType', 'Box'),
('USPSBoxID', '12')]),
'Street Address')

It needs to be parsed this way.

pprint(address1)
(OrderedDict([('USPSBoxGroupType', 'HC'),
('USPSBoxGroupID', '0955'),
('USPSBoxType', 'Box'),
('USPSBoxID', '12')]),
'PO Box')
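When comparing current output against the desired tagging, as in the Street Address cases above, it can help to list exactly which tokens received the wrong label. A minimal sketch using the Example 4 input "HC 0955 Box 12"; the label_diff helper is hypothetical and assumes no token appears twice in the address, since it inverts the label-to-token mapping.

```python
from collections import OrderedDict


def label_diff(actual, expected):
    """Map each token to (actual_label, expected_label) where the two taggings disagree.

    Both arguments are label -> token mappings like the OrderedDicts above.
    Assumes every token is distinct, because `actual` is inverted to token -> label.
    """
    actual_by_token = {token: label for label, token in actual.items()}
    diff = OrderedDict()
    for label, token in expected.items():
        if actual_by_token.get(token) != label:
            diff[token] = (actual_by_token.get(token), label)
    return diff


# Example 4 input: "HC 0955 Box 12"
current = OrderedDict([("AddressNumber", "HC"), ("StreetName", "0955"),
                       ("USPSBoxType", "Box"), ("USPSBoxID", "12")])
desired = OrderedDict([("USPSBoxGroupType", "HC"), ("USPSBoxGroupID", "0955"),
                       ("USPSBoxType", "Box"), ("USPSBoxID", "12")])
print(label_diff(current, desired))
```

Only the first two tokens differ; Box and its ID are already tagged correctly in both Street Address cases.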

The training XML is located at:
usaddress/training/HC_XXXX.xml

The test XML is located at:
usaddress/measure_performance/test_data/test_HC_XXXX.xml
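For reference, here is a sketch of how training entries for these examples could be generated with the standard library. It assumes the AddressCollection / AddressString layout, with one element per token named after its label and tokens separated by spaces, that usaddress's other training files use; it does not reproduce the actual contents of HC_XXXX.xml.

```python
import xml.etree.ElementTree as ET

# Assumed layout: <AddressCollection> holding one <AddressString> per address,
# with each token wrapped in an element named after its label.
EXAMPLES = [
    [("HC", "USPSBoxGroupType"), ("095", "USPSBoxGroupID"),
     ("Box", "USPSBoxType"), ("23", "USPSBoxID")],
    [("HC", "USPSBoxGroupType"), ("2302", "USPSBoxGroupID"),
     ("Box", "USPSBoxType"), ("65", "USPSBoxID")],
]


def build_collection(examples):
    root = ET.Element("AddressCollection")
    for tokens in examples:
        address = ET.SubElement(root, "AddressString")
        previous = None
        for token, label in tokens:
            if previous is not None:
                previous.tail = " "  # single space between tokens, as in the raw string
            previous = ET.SubElement(address, label)
            previous.text = token
    return root


print(ET.tostring(build_collection(EXAMPLES), encoding="unicode"))
```

Storing the separating space as each element's tail means the concatenated text of an AddressString reproduces the original address string exactly.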
