Hi! I think I have a problem similar to #679. Our dataset contains `.xz` files as data/samples (just point-cloud text files saved with `np.savetxt(fd, points, fmt='%.5e')`, then xz-compressed for efficiency) and `.txt` files as GT (space-separated). The first line of each GT file is a sort of header: just an int telling how many lines must be read from the file. It is followed by an arbitrary number of lines with 7 or 8 columns, again space-separated. I honestly don't see an "easy" way to represent all of this in Croissant in a meaningful way, or at least I can't figure out how to proceed. Since it's a tool designed to "ingest" ML datasets, I would have expected a vocabulary closer to the discipline (dataset, subset, sample, ground truth, etc.). I've uploaded two single `file_objects`, one as GT and one as data/sample. The interface then asks me for the field names and lets me specify a regular expression (actually a good idea, e.g. to grab the header/number of lines), but it gives no feedback about what is really happening or what would happen with a given input. I think I'll give up for the moment: the idea is good, but the tool doesn't seem usable yet, at least not for non-standard cases like our dataset.
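To make the file layout described above concrete, here is a minimal Python sketch of a plain reader for it, outside of Croissant. It does not show how to express this layout in Croissant itself. It assumes UTF-8 text, whitespace-separated values, and that extra lines after the announced row count are ignored. GT columns are kept as strings because their types aren't specified. The function names are hypothetical.

```python
import io
import lzma
from pathlib import Path
from typing import List

import numpy as np


def decode_point_cloud(raw: bytes) -> np.ndarray:
    """Decompress xz bytes and parse the whitespace-separated text written by np.savetxt."""
    text = lzma.decompress(raw).decode("utf-8")
    # ndmin=2 keeps a single-point cloud as shape (1, n_cols) instead of a 1-D array.
    return np.loadtxt(io.StringIO(text), ndmin=2)


def load_point_cloud(path: str) -> np.ndarray:
    """Read one .xz sample file from disk (hypothetical helper)."""
    return decode_point_cloud(Path(path).read_bytes())


def parse_ground_truth(text: str) -> List[List[str]]:
    """Parse a GT file: first line is the row count N, then N rows of 7 or 8 columns.

    Columns are returned as strings because their types are not specified (assumption).
    """
    # Skip blank lines so a trailing newline does not count as a row.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty ground-truth file")
    n_rows = int(lines[0].strip())
    # Only the number of rows announced by the header is read; anything after is ignored.
    rows = [ln.split() for ln in lines[1:1 + n_rows]]
    if len(rows) != n_rows:
        raise ValueError(f"header announces {n_rows} rows, found {len(rows)}")
    for i, row in enumerate(rows, start=1):
        if len(row) not in (7, 8):
            raise ValueError(f"row {i} has {len(row)} columns, expected 7 or 8")
    return rows
```

The header is the only thing a per-line regex would need to special-case: it tells the reader where to stop, while every following row is split on whitespace.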