
Missing endianness conversion for dense constant lowering to LLVM dialect #2851

Open
Ansaya opened this issue Jun 17, 2024 · 6 comments

@Ansaya

Ansaya commented Jun 17, 2024

I am using onnx-mlir to build models for the SPARC CPU target, which is big-endian, and I am facing a data-conversion issue caused by the string-type initialization used for large dense constants.
MLIR's IR and LLVM's IR expect the data representation to be little-endian until code generation for the target system is performed; at that point, the data is byte-swapped to match the target system's endianness according to the data type.
During the Krnl-to-LLVM lowering, dense raw data larger than 1 KB is lowered to an LLVM global constant as a string of bytes, which loses the information about the actual data type of the initialized elements. Consequently, when code generation happens, the data's endianness is not updated to that of the target system, since the string type is endianness-independent. This leads to incorrect constant-data initialization for big-endian targets.
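A minimal sketch of the mismatch, using Python's struct module (the bit pattern 0x3D98FE91 is just an example f32 value; the same effect applies to any element):

```python
import struct

# 0x3D98FE91 is the IEEE-754 f32 pattern for ~0.0747.
pattern = bytes.fromhex("3d98fe91")          # bytes in big-endian order
value_be = struct.unpack(">f", pattern)[0]   # correct read: ~0.0747
value_le = struct.unpack("<f", pattern)[0]   # same bytes read in the opposite order

print(round(value_be, 4))                    # 0.0747
assert value_be != value_le                  # the two reads disagree
```

Once the constant is a typeless string of bytes, the compiler has no way to know which of these two reads the target will perform, so it cannot insert the swap.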

I understand that converting the dense constant data to a string saves space, since LLVM IR stores floating-point constants as double even for smaller data types; however, this removes the data-type information needed for the endianness fix during the code-generation step.
I would like to contribute a fix for this, but first I would like to know which solution you think is best for this case:

  1. Avoid the dense-to-string conversion and keep the correct data type (this is the solution I tried, and it solves the issue, but it results in the initialization data being promoted to double until the code generation phase)
  2. Keep the current implementation and perform the byte-swapping operations on the generated string of bytes (I do not know which is the proper way to implement this, but this may be a better solution from the space-saving point of view)
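For option 2, a hypothetical sketch of the byte-swapping step, assuming the element byte width is still known at the point where the raw string is emitted (the function name and interface here are made up for illustration):

```python
def byteswap_raw_buffer(raw: bytes, elem_size: int) -> bytes:
    """Reverse the byte order of each elem_size-byte element in raw."""
    if len(raw) % elem_size != 0:
        raise ValueError("buffer length is not a multiple of the element size")
    return b"".join(raw[i:i + elem_size][::-1]
                    for i in range(0, len(raw), elem_size))

# Two little-endian f32 elements become big-endian:
swapped = byteswap_raw_buffer(bytes.fromhex("91fe983db8ba2ebe"), 4)
assert swapped == bytes.fromhex("3d98fe91be2ebab8")
```

The swap would only run when host and target endianness differ, which is why the element size must be captured before the type information is discarded.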

Please let me know if you are interested and what you think is the best approach to solve the issue.

@tungld
Collaborator

tungld commented Jun 18, 2024

Hi @Ansaya, thanks for looking at this! FYI, we use Z machines, which are also big-endian, and constants work fine on them. So I would like to understand more about your issue. Do you have IRs showing the information loss? What information do you need to preserve? Any concrete examples would help.

Though keeping the dense constants as they are would be best, we had to use strings here to avoid poor performance when translating from the LLVM dialect to LLVM IR: https://discourse.llvm.org/t/performance-issues-with-memref-global-and-llvm-ir/68604. It's really slow for big deep-learning models.

@Ansaya
Author

Ansaya commented Jun 18, 2024

Here are the steps to reproduce the issue starting from the mnist model available in docs/mnist_example/mnist.onnx in this repository.

onnx-mlir --EmitLLVMIR mnist.onnx -o test -mtriple=sparc -mcpu=leon3
mlir-translate --mlir-to-llvmir -o test.ll test.onnx.mlir
clang --target=sparc -mcpu=leon3 -c -o test.o test.ll
llvm-objdump --section=.rodata -D test.o 2> /dev/null | grep -A 3 constant_3_test
llvm-objdump --section=.rodata -D test.o 2> /dev/null | grep -A 3 constant_4_test

The output of the two llvm-objdump commands should look something like the following:

000001a0 <constant_3_test>:
     1a0: 91 fe 98 3d   <unknown>
     1a4: b8 ba 2e be   xnorcc %o0, 3774, %i4
     1a8: c4 b3 ed 3d   <unknown>

00000170 <constant_4_test>:
     170: 3d 13 ca a6   sethi 1297062, %fp
     174: 3c f8 5c 44   <unknown>
     178: bc 75 11 14   <unknown>

After a quick hex-to-float conversion of the first element of constant_3_test, you can see that the bytes are in the wrong order: the expected floating-point value is 0.0747, which corresponds to 0x3d98fe91, but reading the initialization data as a big-endian 32-bit value yields 0x91fe983d, which is wrong data.
Data in the constant_4_test symbol, on the other hand, is packed properly as big-endian: the expected value of its first element is 0.0360, which corresponds to 0x3d13caa6.
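The two reads described above can be checked in a few lines of Python, taking the first four bytes of each symbol from the llvm-objdump output:

```python
import struct

# constant_3_test: first element bytes as found in .rodata.
c3 = bytes.fromhex("91fe983d")
# Only a little-endian read recovers the expected value (~0.0747),
# so these bytes were emitted unswapped for the big-endian target.
assert abs(struct.unpack("<f", c3)[0] - 0.0747) < 1e-4

# constant_4_test: a big-endian read recovers the expected value directly.
c4 = bytes.fromhex("3d13caa6")
assert abs(struct.unpack(">f", c4)[0] - 0.0360819325) < 1e-6
```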
Looking at the test.onnx.mlir file, you can see that the constant_3 is initialized using a raw string, while constant_4 is initialized from dense data.

llvm.mlir.global internal constant @constant_4(dense<[0.0360819325, ...]> : tensor<10xf32>) {addr_space = 0 : i32, alignment = 16 : i64} : !llvm.array<10 x f32>
llvm.mlir.global internal constant @constant_3("\91\FE\98=\B8\BA. ...") {addr_space = 0 : i32, alignment = 16 : i64}

Please let me know if you can reproduce the issue.

P.S. I did not use --EmitObj directly so that you also have the intermediate result before compilation happens, but you can obtain the same result by replacing --EmitLLVMIR with --EmitObj and skipping the mlir-translate and clang commands above.

@tungld
Collaborator

tungld commented Jun 20, 2024

@Ansaya thank you for the example!

I followed your steps and ran on a Z machine, and the result looked good:

00000000000001a0 <constant_3_mnist>:
     1a0: 3d 98         der     %f9, %f8
     1a2: fe 91 be 2e ba b8     <unknown>
     1a8: 3d ed         der     %f14, %f13

0000000000000170 <constant_4_mnist>:
     170: 3d 13         der     %f1, %f3
     172: ca a6 3c f8 5c 44     <unknown>
     178: bc 75 11 14   <unknown>

One thing I noticed is that constant_3_mnist in the output of --EmitLLVMIR looks different from yours: my constant_3_mnist is llvm.mlir.global internal constant @constant_3_mnist("=\98\FE\91\BE.\BA\B8=\ED\B3\C4\BE$\ ...; the bytes are already swapped.

@tungld
Collaborator

tungld commented Jun 20, 2024

Avoid the dense-to-string conversion and keep the correct data type (this is the solution I tried, and it solves the issue, but it results in the initialization data being promoted to double until the code generation phase)

If this solves your issue, I am OK if you would like to add a compiler flag that disables dense-to-string conversion.

@Ansaya
Author

Ansaya commented Jun 20, 2024

This may be related to the fact that your build and target machines are the same, which is not true in my case: I build the executable on an x86 little-endian machine targeting a SPARC big-endian machine, so the input weights have a different endianness from the target's. In your case, you may be able to reproduce the issue by setting a little-endian machine as the target.
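A small sketch of the cross-compilation hazard described above, assuming the serializer packs constants in the *host's* native byte order (the function name is illustrative, not onnx-mlir's actual code):

```python
import struct
import sys

def serialize_f32_host_order(value: float) -> bytes:
    """What a naive serializer does: pack in the host's native byte order."""
    return struct.pack("=f", value)

target_bytes = struct.pack(">f", 0.0747)   # the SPARC target is big-endian

host_bytes = serialize_f32_host_order(0.0747)
if sys.byteorder == "little":              # x86 host: bytes come out swapped
    assert host_bytes != target_bytes
else:                                      # big-endian host (e.g. a Z machine): they match
    assert host_bytes == target_bytes
```

On a big-endian host such as a Z machine, host and target order coincide, so the bug is invisible; it only surfaces when host and target endianness differ.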

@AlexandreEichenberger
Collaborator

@tungld @Ansaya Any resolution on this?

@Ansaya, did you succeed in cross-compiling, or have you moved to compiling on a machine with the same endianness as the target?
