Accuracy checker #309

Open
DanialPahlavan opened this issue Aug 26, 2024 · 3 comments
Labels
enhancement (New feature or request), postponed (no soon implementation)

Is your feature request related to a problem? Please describe.

I need to convert PDF files to Word without using APIs due to cost constraints. I want to use Python libraries for this task but need to ensure the accuracy of the conversion.

Describe the solution you'd like

I would like to develop an automated system that evaluates the accuracy of different PDF to Word conversion methods using Python libraries. The system should identify and use the most accurate method.
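
One way such an evaluation could be sketched, assuming a reference text exists for each test PDF and that each conversion method's output has already been reduced to plain text: score every candidate against the reference and keep the highest. The function names `text_similarity` and `pick_best_method` are hypothetical, and character-level similarity is only a stand-in for accuracy; it says nothing about layout or formatting fidelity.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Collapse runs of whitespace so line-wrapping differences are ignored."""
    return " ".join(text.split())


def text_similarity(reference: str, candidate: str) -> float:
    """Return a similarity ratio in [0, 1] between reference and candidate text."""
    matcher = SequenceMatcher(
        None, normalize(reference), normalize(candidate), autojunk=False
    )
    return matcher.ratio()


def pick_best_method(
    reference: str, outputs: dict[str, str]
) -> tuple[str, dict[str, float]]:
    """Score each method's extracted text and return the best name plus all scores.

    `outputs` maps a method name (hypothetical labels such as "method_a")
    to the plain text recovered from that method's converted document.
    """
    if not outputs:
        raise ValueError("no conversion outputs to compare")
    scores = {name: text_similarity(reference, text) for name, text in outputs.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

The text passed in would come from whichever libraries are being compared; how the text is read back from the generated Word file is left out here on purpose.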

Describe alternatives you've considered

  1. Using various Python libraries such as pdf2image, pdfplumber, PyMuPDF, and camelot.
  2. Manually comparing the output of different methods to determine accuracy.
  3. Exploring other open-source tools that might offer better accuracy.

Additional context

  • Screenshots of the current conversion results.
  • Examples of PDFs and their expected Word outputs.
  • Any specific requirements for maintaining layout and formatting.
  • RTL languages are not supported: during preprocessing, I need to check whether the document contains right-to-left (RTL) text.
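
The RTL check mentioned in the last item could be sketched as below, assuming the page text has already been extracted to a string. It uses the standard library's `unicodedata.bidirectional`, which reports the Unicode bidirectional class of a character ("R" and "AL" are the strong right-to-left classes). The helper names `rtl_ratio` and `contains_rtl` and the threshold parameter are illustrative choices, not part of any library.

```python
import unicodedata

# Strong bidirectional classes: "L" is left-to-right,
# "R" (e.g. Hebrew) and "AL" (e.g. Arabic, Persian) are right-to-left.
RTL_CLASSES = {"R", "AL"}
STRONG_CLASSES = {"L", "R", "AL"}


def rtl_ratio(text: str) -> float:
    """Return the fraction of strongly directional characters that are RTL."""
    classes = [unicodedata.bidirectional(ch) for ch in text]
    strong = [c for c in classes if c in STRONG_CLASSES]
    if not strong:
        # Digits, punctuation, and whitespace carry no strong direction.
        return 0.0
    rtl_count = sum(1 for c in strong if c in RTL_CLASSES)
    return rtl_count / len(strong)


def contains_rtl(text: str, threshold: float = 0.0) -> bool:
    """Return True if the share of RTL characters exceeds the threshold."""
    return rtl_ratio(text) > threshold
```

With the default threshold, a single Arabic or Hebrew letter is enough to flag the document; a higher threshold would flag only documents that are predominantly RTL.
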
DanialPahlavan added the enhancement (New feature or request) label Aug 26, 2024
JorjMcKie added the postponed (no soon implementation) label Aug 27, 2024
5 participants
@DanialPahlavan @JorjMcKie and others