Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
BED-4463: Implement BOM encoding package and refactor WriteAndValidateJSON for improved Unicode handling (#678)

feat(bed-4463): Implement BOM encoding detection and UTF conversion

This commit introduces a robust package for handling Byte Order Mark (BOM) encoding detection and UTF conversions, focusing on UTF-8, UTF-16 (BE/LE), and UTF-32 (BE/LE). It also refactors the WriteAndValidateJSON function to use this new package.

Key changes and features:

1. Encoding Interface:
   - Introduced a unified Encoding interface with String(), Sequence(), and HasSequence() methods.
2. BOM Detection and UTF Conversion:
   - Implemented precise BOM detection for multiple encodings.
   - Added UTF-16 and UTF-32 to UTF-8 conversion with proper surrogate pair handling for UTF-16.
   - Utilized efficient bitwise operations for byte manipulation and endianness handling.
3. WriteAndValidateJSON Refactor:
   - Now uses bomenc.NormalizeToUTF8 for BOM detection and removal.
   - Normalizes input to UTF-8 for consistent JSON processing.
   - Simplified logic and improved error handling.
4. Testing:
   - Added comprehensive test suites for encoding detection, conversion, and edge cases.
   - Updated tests for WriteAndValidateJSON to reflect the new functionality.
5. Documentation:
   - Added detailed comments explaining bitwise operations and the rationale behind implementation choices.

This implementation enhances our handling of text encodings across the codebase, particularly for IO operations with potentially BOM-prefixed content. It improves robustness and maintainability and sets a foundation for future Unicode-related features.

Note: The current implementation of ValidateMetaTag does not invalidate JSON data with syntax errors, which is reflected in the updated tests.
Showing 16 changed files with 1,658 additions and 72 deletions.