-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
new block: BasicFile[Sink, Source] #353
Conversation
... nomen est omen. Signed-off-by: Ralph J. Steinhagen <[email protected]>
A<std::string, "file name", Doc<"base filename, prefixed if ">, Visible> file_name; | ||
FileMode _mode = FileMode::override; | ||
A<std::string, "mode", Doc<"mode: \"override\", \"append\", \"multi\"">, Visible> mode = std::string(magic_enum::enum_name(_mode)); | ||
A<gr::Size_t, "max bytes per file", Doc<"max bytes per file, 0: infinite ">, Visible> max_bytes_per_file = 0U; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's good to enforce that max_bytes_per_file % sizeof(T) == 0
. Otherwise it seems that awkward things will happen, as items will be split over different files and lose alignment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The alignment is respected in any case because the counting is done in samples i.e. sizeof(T)
. The length is more of a maximum size. I'd expect something like in the range of a few MBs and/or GBs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The alignment is respected in any case because the counting is done in samples i.e.
sizeof(T)
Good point. Although if I followed the code correctly, if max_bytes_per_file % sizeof(T) != 0
, then at the end of the file there will be some bytes of an item in order to complete the file to a size of exactly max_bytes_per_file
, and then the whole item will be repeated at the beginning of the next file.
I still think that it would be better to error out if max_bytes_per_file % sizeof(T) != 0
, in order to prevent any surprising behaviour (caused perhaps by a user typo when writing the value of max_bytes_per_file
). I agree that typical sizes to be used will be large powers of 2 or 10.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree and would like to add that the tracking of produced bytes in L126f will become inconsistent with the number of input samples consumed (L118) . Since totalBytesWritten
is not used anywhere except in the unit-tests, it's not that big issue, but it might cause confusion in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree the max_bytes_per_file
config parameter is simple and does what its name implies w/o surprises:
The goal of this parameter is to limit the upper bound size of the file in terms of what is relevant for the file system w/o doing or requiring mental math and how many samples this corresponds to by the user.
The latter can be also quite difficult (and/or independent) depending on the type and sizeof(T)
since T
can be not just fixed-sized fundamental types, but more complex ones, and compound types (eventually Packet<T>
, Tensor<T>
, or DataSet<T>
) that are even variably sized.
The use-case example is: "user wants to write data to file and the file not exceeding, for example, 1GB per file regardless whether it stores, std::int16_t
, float
, std::complex<float>
, or a DataSet<T>
...
This also makes the max_bytes_per_file % sizeof(T) != 0
difficult to compute and annoying to the user, who usually does not care how big the individual samples/entities in the file are.
N.B. _totalBytesWritten
(not totalBytesWritten
as mentioned) is declared as a private implementation-specific variable and not a public API. We could make it a runtime parameter (then: total_bytes_written
), for example, for the UI and/or another monitoring process to track how many bytes are written. Together with max_bytes_per_file
this could provide an indicator of whether progress is made on the block and/or when the file is complete.
PortOut<T> out; | ||
|
||
A<std::string, "file name", Doc<"Base filename, prefixed if necessary">, Visible> file_name; | ||
FileMode _mode = FileMode::override; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the meaning of mode
for a FileSource? Reading a file in append
mode doesn't seem to make sense, for instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, agree with the reasoning but didn't want to create a second enum class.
std::string _actualFileName; | ||
|
||
void settingsChanged(const property_map& /*oldSettings*/, const property_map& /*newSettings*/) { | ||
_mode = magic_enum::enum_cast<FileMode>(mode, magic_enum::case_insensitive).value_or(_mode); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this error out if the mode
as a string cannot be converted to one of the enum
values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, correct. If it's thrown then an exception may be thrown that is intercepted by Block<T>
(as all exceptions are in user-defined functions), converted to an error message and forwarded via the scheduler back to the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
N.B. nothing needs to be done here because the function already has a try-catch block in the calling function e.g.
gnuradio4/core/include/gnuradio-4.0/Block.hpp
Lines 921 to 935 in f00cba2
invokeUserProvidedFunction("applyChangedSettings()", [this] noexcept(false) { | |
auto applyResult = settings().applyStagedParameters(); | |
checkBlockParameterConsistency(); | |
if (!applyResult.forwardParameters.empty()) { | |
publishTag(applyResult.forwardParameters, 0); | |
} | |
settings()._changed.store(false); | |
if (!applyResult.appliedParameters.empty()) { | |
notifyListeners(block::property::kStagedSetting, applyResult.appliedParameters); | |
} | |
notifyListeners(block::property::kSetting, settings().get()); | |
}); |
and
gnuradio4/core/include/gnuradio-4.0/Block.hpp
Lines 508 to 524 in f00cba2
template<typename TFunction, typename... Args> | |
[[maybe_unused]] constexpr inline auto | |
invokeUserProvidedFunction(std::string_view callingSite, TFunction &&func, Args &&...args, const std::source_location &location = std::source_location::current()) noexcept { | |
if constexpr (noexcept(func(std::forward<Args>(args)...))) { // function declared as 'noexcept' skip exception handling | |
return std::forward<TFunction>(func)(std::forward<Args>(args)...); | |
} else { // function not declared with 'noexcept' -> may throw | |
try { | |
return std::forward<TFunction>(func)(std::forward<Args>(args)...); | |
} catch (const gr::exception &e) { | |
emitErrorMessageIfAny(callingSite, std::unexpected(gr::Error(std::move(e)))); | |
} catch (const std::exception &e) { | |
emitErrorMessageIfAny(callingSite, std::unexpected(gr::Error(e, location))); | |
} catch (...) { | |
emitErrorMessageIfAny(callingSite, std::unexpected(gr::Error("unknown error", location))); | |
} | |
} | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point was not about throwing exceptions. I believe that value_or()
can't throw any exception in this case.
My point was that this piece of code is converting the settings provided string to a an enum value, or keeping the current _mode
if the conversion results in an error (that's what value_or()
is doing here).
Thus, it appears that the behaviour of this code is that when new settings are given, the _mode
is changed to the one indicated by the new settings if the new mode
can be parsed and otherwise the parse error is silently ignored and the old _mode
is kept. This is what I find potentially surprising for the user. It would be better to error out (for instance throw an exception) if the mode
cannot be parsed.
FileMode _mode = FileMode::override; | ||
A<std::string, "mode", Doc<"mode: \"override\", \"append\", \"multi\"">, Visible> mode = std::string(magic_enum::enum_name(_mode)); | ||
A<bool, "repeat", Doc<"true: repeat back-to-back">> repeat = false; | ||
A<gr::Size_t, "offset", Doc<"file start offset in bytes">, Visible> offset = 0U; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the multi
mode, it's unclear what the most natural semantics for offset
would be. The current code seems to apply the same offset
to all of the files. I guess that the most common usage of multi
is a long continuous recording that has been segmented into multiple files of a maximum size (as the BasicFileSink
block does). In that case, the most nature meaning of offset
is an offset with respect to the concatenation of all these files. This means that the offset should potentially cause the first few files to be completely skipped if their sum of lengths is smaller than the offset, and then the remaining part of the offset would only be applied to the first file that is read.
A similar remark applies to length
in multi
mode. Maybe it should be applied to the concatenation of all the input files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some raw binary files have a header information (usually provided in bytes) which need to be skipped for each file. the length is to read only the first N samples from each file. This is useful for chunked data (together with the start trigger) like packet data or triggered acquisitions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I guess there are two potential use cases for offset
:
- Replay only a time segment of a recording (very useful in combination with
length
). - Skip the header of a file.
These can be handled in the same way for a single file, but they lead to conflicting requirements with the multi
option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for a first implementation this doesn't have to handle all cases, but can be extended either by adding additional blocks or by changing behavior. For now as implemented, offset wil skip from each file and length will specify the number of samples to be read from each file.
For moderate amounts of samples, the global range could also be done with a separate head block and only integrated into BasicFileSource once this turns out to be a performance problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For moderate amounts of samples, the global range could also be done with a separate head block and only integrated into BasicFileSource once this turns out to be a performance problem.
No you cannot replicate this with other blocks as post processing.
The BasicBlockSource
reads N-files where for each file an offset
amount of files has to be skipped and length
amount of samples. This information cannot be easily reconstructed once the information is put into the buffer (N.B. 'bytes' vs 'samples').
The common real-world use case is:
"for a burst or transient analysis: open all files, skip their (optional) header information, and show the first N samples."
Examples for transients:
GSI/FAIR: RF bunch oscillations at injection/start of the RF ramp.
Radar domain: emitted or received pulses
Medical: MRI/CT or ultrasound pulses (conceptually similar to radar).
Signed-off-by: Ralph J. Steinhagen <[email protected]>
Quality Gate failedFailed conditions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice implementation, that should nicely fit 90% of usecases and could be extended to fit most. It also nicely showcases the expected API for a bit more complex blocks.
Also thanks to @daniestevez for the feedback. Added small remarks to the open comments to document information from chat.
FileMode _mode = FileMode::override; | ||
A<std::string, "mode", Doc<"mode: \"override\", \"append\", \"multi\"">, Visible> mode = std::string(magic_enum::enum_name(_mode)); | ||
A<bool, "repeat", Doc<"true: repeat back-to-back">> repeat = false; | ||
A<gr::Size_t, "offset", Doc<"file start offset in bytes">, Visible> offset = 0U; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for a first implementation this doesn't have to handle all cases, but can be extended either by adding additional blocks or by changing behavior. For now as implemented, offset wil skip from each file and length will specify the number of samples to be read from each file.
For moderate amounts of samples, the global range could also be done with a separate head block and only integrated into BasicFileSource once this turns out to be a performance problem.
A<std::string, "file name", Doc<"base filename, prefixed if ">, Visible> file_name; | ||
FileMode _mode = FileMode::override; | ||
A<std::string, "mode", Doc<"mode: \"override\", \"append\", \"multi\"">, Visible> mode = std::string(magic_enum::enum_name(_mode)); | ||
A<gr::Size_t, "max bytes per file", Doc<"max bytes per file, 0: infinite ">, Visible> max_bytes_per_file = 0U; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree and would like to add that the tracking of produced bytes in L126f will become inconsistent with the number of input samples consumed (L118) . Since totalBytesWritten
is not used anywhere except in the unit-tests, it's not that big issue, but it might cause confusion in the future.
constexpr gr::Size_t nSamples = 1024U; | ||
const gr::Size_t maxFileSize = mode == gr::blocks::fileio::Mode::multi ? 256U : 0U; | ||
std::string modeName{magic_enum::enum_name(mode)}; | ||
std::string fileName = fmt::format("/tmp/gr4_file_sink_test/TestFileName_{}.bin", modeName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could use std::temp_directory_path for portability, but we probably have to do some changes once we try to extend to other platforms, anyway, so it can be done then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the info. Didn't know about std::temp_directory_path
.
... nomen est omen.
@xaratustrah enjoy!