Releases: Unstructured-IO/unstructured

0.3.5

05 Jan 00:50
a75499d

  • Add support for local inference
  • Add new pattern to recognize plain text dash bullets
  • Add test for bullet patterns
  • Fix for partition_html that allows for processing div tags that have both text and child elements
  • Add ability to extract document metadata from .docx, .xlsx, and .jpg files
  • Helper functions for identifying and extracting phone numbers
  • Add new function extract_attachment_info that extracts and decodes the attachments of an email
  • Staging brick to convert a list of Elements to a pandas DataFrame
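
The dash-bullet and phone-number items above can be illustrated with a small standalone sketch. The patterns and function names here (DASH_BULLET, US_PHONE, is_dash_bullet, extract_phone_number) are hypothetical and are not taken from the library; they only show the kind of regex matching such helpers might perform.

```python
import re

# Hypothetical pattern: optional indentation, a dash, then whitespace.
DASH_BULLET = re.compile(r"^\s*-\s+")

# Hypothetical pattern for US-style numbers such as 215-867-5309 or (215) 867-5309.
US_PHONE = re.compile(r"(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")


def is_dash_bullet(text: str) -> bool:
    """Return True if the line looks like a plain-text dash bullet."""
    return DASH_BULLET.match(text) is not None


def extract_phone_number(text: str) -> str:
    """Return the first phone-like substring, or an empty string if none is found."""
    match = US_PHONE.search(text)
    return match.group(0) if match else ""


if __name__ == "__main__":
    print(is_dash_bullet("- first item"))
    print(extract_phone_number("Phone: 215-867-5309"))
```

A hyphenated word such as "well-known" is not treated as a bullet because the pattern requires the dash at the start of the line, followed by whitespace.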

0.3.4

21 Dec 15:29
962c9dc

  • Python 3.7 compatibility

0.3.3

20 Dec 20:03
de4d0d4

  • Removes BasicConfig from logger configuration
  • Adds the partition_email partitioning brick
  • Adds the replace_mime_encodings cleaning brick
  • Small fix to HTML parsing related to processing list items with sub-tags
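
To show what replacing MIME encodings means in practice, here is a minimal sketch that decodes quoted-printable text with the standard library's quopri module. The function name decode_mime_text is hypothetical, and this is not necessarily how replace_mime_encodings is implemented.

```python
import quopri


def decode_mime_text(text: str, encoding: str = "utf-8") -> str:
    """Decode quoted-printable sequences such as =E2=80=99 into characters."""
    raw = quopri.decodestring(text.encode(encoding))
    return raw.decode(encoding)


if __name__ == "__main__":
    # =E2=80=99 is the UTF-8 byte sequence for a right single quotation mark.
    print(decode_mime_text("5 w=E2=80=99s"))
```

Email bodies often contain such sequences, along with soft line breaks written as "=" at the end of a line, so a cleaning step of this kind is commonly applied after email partitioning.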

0.3.2

15 Dec 22:20
1d68bb2

  • Added translate_text brick for translating text between languages
  • Add an apply method to make it easier to apply cleaners to elements
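
The idea behind an apply method can be sketched as follows. SketchElement is a hypothetical stand-in for the library's element classes, and the validation shown is an assumption; the point is only that cleaners are plain string-to-string callables applied in order.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SketchElement:
    """Hypothetical stand-in for a text element; not the library's class."""

    text: str

    def apply(self, *cleaners: Callable[[str], str]) -> None:
        """Run each cleaner on the text in order, updating it in place."""
        for cleaner in cleaners:
            cleaned = cleaner(self.text)
            if not isinstance(cleaned, str):
                raise ValueError("Cleaner produced a non-string output.")
            self.text = cleaned


if __name__ == "__main__":
    element = SketchElement("  Hello world  ")
    element.apply(str.strip, str.upper)
    print(element.text)
```

Accepting any number of callables lets a caller chain several cleaners in one call instead of reassigning the text after each step.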

0.3.1

14 Dec 18:00
1700d4d

  • Added __init__.py to partition

0.3.0

14 Dec 16:39
151732c

  • Implement staging brick for Argilla. Converts lists of Text elements to Argilla dataset classes.
  • Removing the local PDF parsing code and any dependencies and tests.
  • Reorganizes the staging bricks in the unstructured.partition module
  • Allow entities to be passed into the Datasaur staging brick
  • Added HTML escapes to the replace_unicode_quotes brick
  • Fix bad responses in partition_pdf to raise ValueError
  • Adds partition_html for partitioning HTML documents.

0.2.4

11 Nov 00:31
4f539dd
  • Add an alternative way of importing Final to support Google Colab

0.2.3

10 Nov 21:37
300c564

  • Add cleaning bricks for removing prefixes and postfixes
  • Add cleaning bricks for extracting text before and after a pattern
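
A rough sketch of what these cleaning operations do, using only the standard library. The function names (strip_prefix, strip_postfix, text_after, text_before) are hypothetical and the behavior shown, including whitespace trimming and raising ValueError when the pattern is absent, is an assumption rather than the library's documented contract.

```python
import re


def strip_prefix(text: str, pattern: str) -> str:
    """Remove a regex match anchored at the start of the text."""
    return re.sub(r"^(?:" + pattern + r")", "", text).lstrip()


def strip_postfix(text: str, pattern: str) -> str:
    """Remove a regex match anchored at the end of the text."""
    return re.sub(r"(?:" + pattern + r")$", "", text).rstrip()


def text_after(text: str, pattern: str) -> str:
    """Return everything after the first match of the pattern."""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"Pattern {pattern!r} not found.")
    return text[match.end():].lstrip()


def text_before(text: str, pattern: str) -> str:
    """Return everything before the first match of the pattern."""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"Pattern {pattern!r} not found.")
    return text[:match.start()].rstrip()


if __name__ == "__main__":
    print(strip_prefix("SUMMARY: A great year", r"SUMMARY:"))
    print(text_after("SPEAKER 1: Look at me", r"SPEAKER \d:"))
```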

0.2.2

08 Nov 22:07
2715950

  • Add staging brick for Datasaur

0.2.1

21 Oct 18:53
de31df5

  • Added brick to convert an ISD dictionary to a list of elements
  • Update PDFDocument to use the from_file method
  • Added staging brick for converting ISD (Initial Structured Data) to CSV format.
  • Added staging brick for separating text into attention window size chunks for transformers.
  • Added staging brick for LabelBox.
  • Added ability to upload LabelStudio predictions
  • Added utility function for JSONL reading and writing
  • Added staging brick for CSV format for Prodigy
  • Added staging brick for Prodigy
  • Added ability to upload LabelStudio annotations
  • Added text_field and id_field to stage_for_label_studio signature
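
The attention-window item above refers to splitting long text so that each piece fits within a transformer's input limit. The sketch below illustrates the idea with whitespace tokens, which is a simplifying assumption; a real pipeline would count tokens with the model's own tokenizer. The function name chunk_by_window is hypothetical.

```python
from typing import List


def chunk_by_window(text: str, max_tokens: int) -> List[str]:
    """Split text into consecutive chunks of at most max_tokens whitespace tokens."""
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1.")
    tokens = text.split()
    return [
        " ".join(tokens[start:start + max_tokens])
        for start in range(0, len(tokens), max_tokens)
    ]


if __name__ == "__main__":
    for chunk in chunk_by_window("one two three four five", max_tokens=2):
        print(chunk)
```

Chunks here do not overlap; overlapping windows are a common variation when context at the chunk boundaries matters.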