
new block: BasicFile[Sink, Source] #353

Merged: 2 commits into main on Jun 3, 2024

Conversation

RalphSteinhagen (Member)

... nomen est omen.

@xaratustrah enjoy!

... nomen est omen.

Signed-off-by: Ralph J. Steinhagen <[email protected]>
A<std::string, "file name", Doc<"base filename, prefixed if ">, Visible> file_name;
FileMode _mode = FileMode::override;
A<std::string, "mode", Doc<"mode: \"override\", \"append\", \"multi\"">, Visible> mode = std::string(magic_enum::enum_name(_mode));
A<gr::Size_t, "max bytes per file", Doc<"max bytes per file, 0: infinite ">, Visible> max_bytes_per_file = 0U;
Contributor

Maybe it's good to enforce that max_bytes_per_file % sizeof(T) == 0. Otherwise it seems that awkward things will happen, as items will be split over different files and lose alignment.
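A minimal sketch of such a guard (hypothetical, not part of this PR; the helper name and the use of std::invalid_argument are illustrative):

// hypothetical helper, not part of the PR: reject byte budgets that are not an
// integer number of samples of type T (0 == no per-file limit)
#include <cstddef>
#include <stdexcept>
#include <string>

template<typename T>
void validateMaxBytesPerFile(std::size_t maxBytesPerFile) {
    if (maxBytesPerFile != 0 && maxBytesPerFile % sizeof(T) != 0) {
        throw std::invalid_argument("max_bytes_per_file (" + std::to_string(maxBytesPerFile)
            + ") must be a multiple of sizeof(T) (" + std::to_string(sizeof(T)) + ")");
    }
}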

Member Author

The alignment is respected in any case because the counting is done in samples, i.e. in multiples of sizeof(T). The limit is more of a maximum size; I'd expect values in the range of a few MB and/or GB.

Contributor

The alignment is respected in any case because the counting is done in samples i.e. sizeof(T)

Good point. Although, if I followed the code correctly, when max_bytes_per_file % sizeof(T) != 0 the end of a file will contain only the first bytes of an item (to pad the file to a size of exactly max_bytes_per_file), and the whole item will then be repeated at the beginning of the next file.

I still think that it would be better to error out if max_bytes_per_file % sizeof(T) != 0, in order to prevent any surprising behaviour (caused perhaps by a user typo when writing the value of max_bytes_per_file). I agree that typical sizes to be used will be large powers of 2 or 10.

Member

I agree, and would like to add that the tracking of produced bytes in L126f will become inconsistent with the number of input samples consumed (L118). Since totalBytesWritten is not used anywhere except in the unit tests, it's not that big an issue, but it might cause confusion in the future.

Member Author

I disagree: the max_bytes_per_file config parameter is simple and does what its name implies w/o surprises:

The goal of this parameter is to limit the upper-bound size of the file in terms that are relevant for the file system, w/o requiring the user to do mental math on how many samples this corresponds to.

The latter can also be quite difficult (and/or type-dependent), since T can be not just a fixed-size fundamental type but also a more complex or compound type (eventually Packet<T>, Tensor<T>, or DataSet<T>) that may even be variably sized.

The use-case example is: "the user wants to write data to file, with each file not exceeding, for example, 1 GB, regardless of whether it stores std::int16_t, float, std::complex<float>, or a DataSet<T> ...".

This also makes the max_bytes_per_file % sizeof(T) != 0 check difficult to compute and annoying for the user, who usually does not care how big the individual samples/entities in the file are.

N.B. _totalBytesWritten (not totalBytesWritten as mentioned) is declared as a private implementation-specific variable and is not part of the public API. We could make it a runtime parameter (then: total_bytes_written), for example, for the UI and/or another monitoring process to track how many bytes have been written. Together with max_bytes_per_file this could provide an indicator of whether the block is making progress and/or when the file is complete.
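For illustration only (this is not the PR's code): a sketch of how a per-file byte budget can be mapped to a whole-sample budget at write time, which is why alignment is preserved even when max_bytes_per_file is not an exact multiple of sizeof(T):

// illustrative sketch, not the PR implementation: how many whole samples of
// type T still fit into the current file given a byte budget
#include <algorithm>
#include <cstddef>

template<typename T>
std::size_t samplesThatStillFit(std::size_t maxBytesPerFile, std::size_t bytesWrittenToFile, std::size_t samplesAvailable) {
    if (maxBytesPerFile == 0) { // 0 == unlimited file size
        return samplesAvailable;
    }
    const std::size_t bytesLeft = maxBytesPerFile > bytesWrittenToFile ? maxBytesPerFile - bytesWrittenToFile : 0;
    return std::min(samplesAvailable, bytesLeft / sizeof(T)); // integer division rounds down -> never splits a sample
}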

blocks/fileio/include/gnuradio-4.0/fileio/BasicFileIo.hpp: 3 further review threads (outdated, resolved)
PortOut<T> out;

A<std::string, "file name", Doc<"Base filename, prefixed if necessary">, Visible> file_name;
FileMode _mode = FileMode::override;
Contributor

What is the meaning of mode for a FileSource? Reading a file in append mode doesn't seem to make sense, for instance.

Member Author (@RalphSteinhagen, May 31, 2024)

Yes, I agree with the reasoning, but I didn't want to create a second enum class.
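A hypothetical guard for the source side (illustrative only; the PR keeps a single shared FileMode enum and does not add this check):

// hypothetical, not part of the PR: reuse the shared FileMode enum but reject
// the mode that is meaningless when reading
#include <stdexcept>

enum class FileMode { override, append, multi }; // mirrors the enum used in the PR

inline void validateSourceMode(FileMode mode) {
    if (mode == FileMode::append) {
        throw std::invalid_argument("BasicFileSource: 'append' is a write-only mode and not supported for reading");
    }
}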

std::string _actualFileName;

void settingsChanged(const property_map& /*oldSettings*/, const property_map& /*newSettings*/) {
_mode = magic_enum::enum_cast<FileMode>(mode, magic_enum::case_insensitive).value_or(_mode);
Contributor

Shouldn't this error out if the mode as a string cannot be converted to one of the enum values?

Member Author

Yes, correct. If an exception is thrown here, it is intercepted by Block<T> (as all exceptions in user-defined functions are), converted to an error message, and forwarded via the scheduler back to the user.

Member Author

N.B. nothing needs to be done here because the calling function already wraps this in a try-catch block, e.g.

invokeUserProvidedFunction("applyChangedSettings()", [this] noexcept(false) {
    auto applyResult = settings().applyStagedParameters();
    checkBlockParameterConsistency();
    if (!applyResult.forwardParameters.empty()) {
        publishTag(applyResult.forwardParameters, 0);
    }
    settings()._changed.store(false);
    if (!applyResult.appliedParameters.empty()) {
        notifyListeners(block::property::kStagedSetting, applyResult.appliedParameters);
    }
    notifyListeners(block::property::kSetting, settings().get());
});

and
template<typename TFunction, typename... Args>
[[maybe_unused]] constexpr inline auto
invokeUserProvidedFunction(std::string_view callingSite, TFunction &&func, Args &&...args, const std::source_location &location = std::source_location::current()) noexcept {
    if constexpr (noexcept(func(std::forward<Args>(args)...))) { // function declared as 'noexcept' skip exception handling
        return std::forward<TFunction>(func)(std::forward<Args>(args)...);
    } else { // function not declared with 'noexcept' -> may throw
        try {
            return std::forward<TFunction>(func)(std::forward<Args>(args)...);
        } catch (const gr::exception &e) {
            emitErrorMessageIfAny(callingSite, std::unexpected(gr::Error(std::move(e))));
        } catch (const std::exception &e) {
            emitErrorMessageIfAny(callingSite, std::unexpected(gr::Error(e, location)));
        } catch (...) {
            emitErrorMessageIfAny(callingSite, std::unexpected(gr::Error("unknown error", location)));
        }
    }
}

Contributor

My point was not about throwing exceptions; I believe that value_or() can't throw any exception in this case.

My point was that this piece of code converts the settings-provided string to an enum value, or keeps the current _mode if the conversion fails (that's what value_or() is doing here).

Thus, the behaviour of this code appears to be: when new settings are given, _mode is changed to the one indicated by the new settings if the new mode can be parsed; otherwise the parse error is silently ignored and the old _mode is kept. This is what I find potentially surprising for the user. It would be better to error out (for instance, throw an exception) if the mode cannot be parsed.
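A sketch of the suggested behaviour (assumes the FileMode enum and the magic_enum usage shown in the snippet above; the helper name and exception type are illustrative):

// illustrative alternative to `.value_or(_mode)`: fail loudly on an unknown mode string;
// any exception thrown here is intercepted by Block<T> via invokeUserProvidedFunction (quoted above)
#include <stdexcept>
#include <string>

inline FileMode parseFileModeOrThrow(const std::string& mode) {
    if (auto parsed = magic_enum::enum_cast<FileMode>(mode, magic_enum::case_insensitive); parsed.has_value()) {
        return parsed.value();
    }
    throw std::invalid_argument("invalid mode '" + mode + "' - expected \"override\", \"append\", or \"multi\"");
}

The settings handler would then become `_mode = parseFileModeOrThrow(mode);`.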

FileMode _mode = FileMode::override;
A<std::string, "mode", Doc<"mode: \"override\", \"append\", \"multi\"">, Visible> mode = std::string(magic_enum::enum_name(_mode));
A<bool, "repeat", Doc<"true: repeat back-to-back">> repeat = false;
A<gr::Size_t, "offset", Doc<"file start offset in bytes">, Visible> offset = 0U;
Contributor

For the multi mode, it's unclear what the most natural semantics for offset would be. The current code seems to apply the same offset to each of the files. I guess the most common usage of multi is a long continuous recording that has been segmented into multiple files of a maximum size (as the BasicFileSink block does). In that case, the most natural meaning of offset is an offset with respect to the concatenation of all these files. This means that the offset should potentially cause the first few files to be skipped completely if the sum of their lengths is smaller than the offset, and only the remaining part of the offset would then be applied to the first file that is actually read.

A similar remark applies to length in multi mode: maybe it should be applied to the concatenation of all the input files.
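A sketch of what the "offset over the concatenation of files" semantics could look like (hypothetical; the names are illustrative and this is not what the PR implements):

// hypothetical sketch of 'offset applies to the concatenation of all files':
// skip whole files whose cumulative size is below the offset, then apply the
// remainder only to the first file that is actually read
#include <cstddef>
#include <utility>
#include <vector>

// returns {index of first file to read, byte offset within that file};
// an index equal to fileSizes.size() means the offset lies past the end of all files
inline std::pair<std::size_t, std::size_t> globalOffsetToFileOffset(const std::vector<std::size_t>& fileSizes, std::size_t globalOffset) {
    std::size_t fileIndex = 0;
    while (fileIndex < fileSizes.size() && globalOffset >= fileSizes[fileIndex]) {
        globalOffset -= fileSizes[fileIndex]; // this file is skipped entirely
        ++fileIndex;
    }
    return {fileIndex, globalOffset}; // remaining offset applies to the first file read
}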

Member Author

Some raw binary files have header information (usually specified in bytes) that needs to be skipped for each file. The length is used to read only the first N samples from each file. This is useful for chunked data (together with the start trigger), like packet data or triggered acquisitions.

Contributor (@daniestevez, May 31, 2024)

Good point. I guess there are two potential use cases for offset:

  • Replay only a time segment of a recording (very useful in combination with length).
  • Skip the header of a file.

These can be handled in the same way for a single file, but they lead to conflicting requirements with the multi option.

Member

I think for a first implementation this doesn't have to handle all cases, but it can be extended either by adding additional blocks or by changing behavior. As currently implemented, the offset will be skipped at the start of each file and length will specify the number of samples to be read from each file.

For moderate amounts of samples, the global range could also be done with a separate head block and only integrated into BasicFileSource once this turns out to be a performance problem.

Member Author

For moderate amounts of samples, the global range could also be done with a separate head block and only integrated into BasicFileSource once this turns out to be a performance problem.

No, you cannot replicate this with other blocks as post-processing.

The BasicFileSource reads N files where, for each file, an offset amount of bytes has to be skipped and a length amount of samples read. This information cannot be easily reconstructed once the data has been put into the buffer (N.B. 'bytes' vs. 'samples').

The common real-world use case is:
"for a burst or transient analysis: open all files, skip their (optional) header information, and show the first N samples."

Examples for transients:
GSI/FAIR: RF bunch oscillations at injection/start of the RF ramp.
Radar domain: emitted or received pulses
Medical: MRI/CT or ultrasound pulses (conceptually similar to radar).

Signed-off-by: Ralph J. Steinhagen <[email protected]>

sonarcloud bot commented Jun 1, 2024

Quality Gate failed

Failed conditions
63.6% Coverage on New Code (required ≥ 80%)

See analysis details on SonarCloud

wirew0rm (Member) left a comment

Nice implementation that should nicely fit 90% of use cases and could be extended to cover most of the rest. It also nicely showcases the expected API for somewhat more complex blocks.

Also thanks to @daniestevez for the feedback. I added small remarks to the open comments to document information from the chat.


constexpr gr::Size_t nSamples = 1024U;
const gr::Size_t maxFileSize = mode == gr::blocks::fileio::Mode::multi ? 256U : 0U;
std::string modeName{magic_enum::enum_name(mode)};
std::string fileName = fmt::format("/tmp/gr4_file_sink_test/TestFileName_{}.bin", modeName);
Member

Could use std::filesystem::temp_directory_path for portability, but we will probably have to make some changes once we try to extend to other platforms anyway, so it can be done then.
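A minimal sketch of that suggestion (hypothetical adaptation of the test path above; uses fmt as the test already does):

// illustrative only: build the test file path on top of the platform's temp directory
#include <filesystem>
#include <string>
#include <fmt/format.h>

inline std::string testFileName(const std::string& modeName) {
    const std::filesystem::path dir = std::filesystem::temp_directory_path() / "gr4_file_sink_test";
    return (dir / fmt::format("TestFileName_{}.bin", modeName)).string();
}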

Member Author

Thanks for the info. I didn't know about std::filesystem::temp_directory_path.

@wirew0rm merged commit bea174e into main on Jun 3, 2024
7 of 10 checks passed
@wirew0rm deleted the basicFileIo branch on June 3, 2024 17:42