
new block: BasicFile[Sink, Source] #353

Merged: 2 commits into main on Jun 3, 2024

Conversation

RalphSteinhagen (Member)

... nomen est omen.

@xaratustrah enjoy!

... nomen est omen.

Signed-off-by: Ralph J. Steinhagen <[email protected]>
A<std::string, "file name", Doc<"base filename, prefixed if ">, Visible> file_name;
FileMode _mode = FileMode::override;
A<std::string, "mode", Doc<"mode: \"override\", \"append\", \"multi\"">, Visible> mode = std::string(magic_enum::enum_name(_mode));
A<gr::Size_t, "max bytes per file", Doc<"max bytes per file, 0: infinite ">, Visible> max_bytes_per_file = 0U;
Contributor

Maybe it's good to enforce that max_bytes_per_file % sizeof(T) == 0. Otherwise it seems that awkward things will happen, as items will be split over different files and lose alignment.
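A minimal sketch of such a guard (hypothetical, not part of this PR; the helper name and the use of std::invalid_argument are illustrative):

// hypothetical helper, not part of the PR: reject byte budgets that are not an
// integer number of samples of type T (0 == no per-file limit)
#include <cstddef>
#include <stdexcept>
#include <string>

template<typename T>
void validateMaxBytesPerFile(std::size_t maxBytesPerFile) {
    if (maxBytesPerFile != 0 && maxBytesPerFile % sizeof(T) != 0) {
        throw std::invalid_argument("max_bytes_per_file (" + std::to_string(maxBytesPerFile)
            + ") must be a multiple of sizeof(T) (" + std::to_string(sizeof(T)) + ")");
    }
}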

Member Author

The alignment is respected in any case because the counting is done in samples, i.e. in multiples of sizeof(T). The limit is more of a maximum size; I'd expect values in the range of a few MB and/or GB.

Contributor

The alignment is respected in any case because the counting is done in samples i.e. sizeof(T)

Good point. Although, if I followed the code correctly, when max_bytes_per_file % sizeof(T) != 0 the end of a file will contain only the first bytes of an item (to pad the file to a size of exactly max_bytes_per_file), and the whole item will then be repeated at the beginning of the next file.

I still think that it would be better to error out if max_bytes_per_file % sizeof(T) != 0, in order to prevent any surprising behaviour (caused perhaps by a user typo when writing the value of max_bytes_per_file). I agree that typical sizes to be used will be large powers of 2 or 10.

Member

I agree, and would like to add that the tracking of produced bytes in L126f will become inconsistent with the number of input samples consumed (L118). Since totalBytesWritten is not used anywhere except in the unit tests, it's not that big an issue, but it might cause confusion in the future.

Member Author

I disagree: the max_bytes_per_file config parameter is simple and does what its name implies w/o surprises:

The goal of this parameter is to limit the upper-bound size of the file in terms that are relevant for the file system, w/o requiring the user to do mental math on how many samples this corresponds to.

The latter can also be quite difficult (and/or type-dependent), since T can be not just a fixed-size fundamental type but also a more complex or compound type (eventually Packet<T>, Tensor<T>, or DataSet<T>) that may even be variably sized.

The use-case example is: "the user wants to write data to file, with each file not exceeding, for example, 1 GB, regardless of whether it stores std::int16_t, float, std::complex<float>, or a DataSet<T> ...".

This also makes the max_bytes_per_file % sizeof(T) != 0 check difficult to compute and annoying for the user, who usually does not care how big the individual samples/entities in the file are.

N.B. _totalBytesWritten (not totalBytesWritten as mentioned) is declared as a private implementation-specific variable and is not part of the public API. We could make it a runtime parameter (then: total_bytes_written), for example, for the UI and/or another monitoring process to track how many bytes have been written. Together with max_bytes_per_file this could provide an indicator of whether the block is making progress and/or when the file is complete.
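For illustration only (this is not the PR's code): a sketch of how a per-file byte budget can be mapped to a whole-sample budget at write time, which is why alignment is preserved even when max_bytes_per_file is not an exact multiple of sizeof(T):

// illustrative sketch, not the PR implementation: how many whole samples of
// type T still fit into the current file given a byte budget
#include <algorithm>
#include <cstddef>

template<typename T>
std::size_t samplesThatStillFit(std::size_t maxBytesPerFile, std::size_t bytesWrittenToFile, std::size_t samplesAvailable) {
    if (maxBytesPerFile == 0) { // 0 == unlimited file size
        return samplesAvailable;
    }
    const std::size_t bytesLeft = maxBytesPerFile > bytesWrittenToFile ? maxBytesPerFile - bytesWrittenToFile : 0;
    return std::min(samplesAvailable, bytesLeft / sizeof(T)); // integer division rounds down -> never splits a sample
}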

blocks/fileio/include/gnuradio-4.0/fileio/BasicFileIo.hpp: 3 further review threads (outdated, resolved)
PortOut<T> out;

A<std::string, "file name", Doc<"Base filename, prefixed if necessary">, Visible> file_name;
FileMode _mode = FileMode::override;
Contributor

What is the meaning of mode for a FileSource? Reading a file in append mode doesn't seem to make sense, for instance.

Member Author (@RalphSteinhagen, May 31, 2024)

Yes, I agree with the reasoning, but I didn't want to create a second enum class.
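A hypothetical guard for the source side (illustrative only; the PR keeps a single shared FileMode enum and does not add this check):

// hypothetical, not part of the PR: reuse the shared FileMode enum but reject
// the mode that is meaningless when reading
#include <stdexcept>

enum class FileMode { override, append, multi }; // mirrors the enum used in the PR

inline void validateSourceMode(FileMode mode) {
    if (mode == FileMode::append) {
        throw std::invalid_argument("BasicFileSource: 'append' is a write-only mode and not supported for reading");
    }
}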

std::string _actualFileName;

void settingsChanged(const property_map& /*oldSettings*/, const property_map& /*newSettings*/) {
_mode = magic_enum::enum_cast<FileMode>(mode, magic_enum::case_insensitive).value_or(_mode);
Contributor

Shouldn't this error out if the mode as a string cannot be converted to one of the enum values?

Member Author

Yes, correct. If an exception is thrown here, it is intercepted by Block<T> (as all exceptions in user-defined functions are), converted to an error message, and forwarded via the scheduler back to the user.

Member Author

N.B. nothing needs to be done here because the calling function already wraps this in a try-catch block, e.g.

invokeUserProvidedFunction("applyChangedSettings()", [this] noexcept(false) {
    auto applyResult = settings().applyStagedParameters();
    checkBlockParameterConsistency();
    if (!applyResult.forwardParameters.empty()) {
        publishTag(applyResult.forwardParameters, 0);
    }
    settings()._changed.store(false);
    if (!applyResult.appliedParameters.empty()) {
        notifyListeners(block::property::kStagedSetting, applyResult.appliedParameters);
    }
    notifyListeners(block::property::kSetting, settings().get());
});

and
template<typename TFunction, typename... Args>
[[maybe_unused]] constexpr inline auto
invokeUserProvidedFunction(std::string_view callingSite, TFunction &&func, Args &&...args, const std::source_location &location = std::source_location::current()) noexcept {
    if constexpr (noexcept(func(std::forward<Args>(args)...))) { // function declared as 'noexcept' skip exception handling
        return std::forward<TFunction>(func)(std::forward<Args>(args)...);
    } else { // function not declared with 'noexcept' -> may throw
        try {
            return std::forward<TFunction>(func)(std::forward<Args>(args)...);
        } catch (const gr::exception &e) {
            emitErrorMessageIfAny(callingSite, std::unexpected(gr::Error(std::move(e))));
        } catch (const std::exception &e) {
            emitErrorMessageIfAny(callingSite, std::unexpected(gr::Error(e, location)));
        } catch (...) {
            emitErrorMessageIfAny(callingSite, std::unexpected(gr::Error("unknown error", location)));
        }
    }
}

Contributor

My point was not about throwing exceptions; I believe that value_or() can't throw any exception in this case.

My point was that this piece of code converts the settings-provided string to an enum value, or keeps the current _mode if the conversion fails (that's what value_or() is doing here).

Thus, the behaviour of this code appears to be: when new settings are given, _mode is changed to the one indicated by the new settings if the new mode can be parsed; otherwise the parse error is silently ignored and the old _mode is kept. This is what I find potentially surprising for the user. It would be better to error out (for instance, throw an exception) if the mode cannot be parsed.
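A sketch of the suggested behaviour (assumes the FileMode enum and the magic_enum usage shown in the snippet above; the helper name and exception type are illustrative):

// illustrative alternative to `.value_or(_mode)`: fail loudly on an unknown mode string;
// any exception thrown here is intercepted by Block<T> via invokeUserProvidedFunction (quoted above)
#include <stdexcept>
#include <string>

inline FileMode parseFileModeOrThrow(const std::string& mode) {
    if (auto parsed = magic_enum::enum_cast<FileMode>(mode, magic_enum::case_insensitive); parsed.has_value()) {
        return parsed.value();
    }
    throw std::invalid_argument("invalid mode '" + mode + "' - expected \"override\", \"append\", or \"multi\"");
}

The settings handler would then become `_mode = parseFileModeOrThrow(mode);`.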

FileMode _mode = FileMode::override;
A<std::string, "mode", Doc<"mode: \"override\", \"append\", \"multi\"">, Visible> mode = std::string(magic_enum::enum_name(_mode));
A<bool, "repeat", Doc<"true: repeat back-to-back">> repeat = false;
A<gr::Size_t, "offset", Doc<"file start offset in bytes">, Visible> offset = 0U;
Contributor

For the multi mode, it's unclear what the most natural semantics for offset would be. The current code seems to apply the same offset to each of the files. I guess the most common usage of multi is a long continuous recording that has been segmented into multiple files of a maximum size (as the BasicFileSink block does). In that case, the most natural meaning of offset is an offset with respect to the concatenation of all these files. This means that the offset should potentially cause the first few files to be skipped completely if the sum of their lengths is smaller than the offset, and only the remaining part of the offset would then be applied to the first file that is actually read.

A similar remark applies to length in multi mode: maybe it should be applied to the concatenation of all the input files.
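A sketch of what the "offset over the concatenation of files" semantics could look like (hypothetical; the names are illustrative and this is not what the PR implements):

// hypothetical sketch of 'offset applies to the concatenation of all files':
// skip whole files whose cumulative size is below the offset, then apply the
// remainder only to the first file that is actually read
#include <cstddef>
#include <utility>
#include <vector>

// returns {index of first file to read, byte offset within that file};
// an index equal to fileSizes.size() means the offset lies past the end of all files
inline std::pair<std::size_t, std::size_t> globalOffsetToFileOffset(const std::vector<std::size_t>& fileSizes, std::size_t globalOffset) {
    std::size_t fileIndex = 0;
    while (fileIndex < fileSizes.size() && globalOffset >= fileSizes[fileIndex]) {
        globalOffset -= fileSizes[fileIndex]; // this file is skipped entirely
        ++fileIndex;
    }
    return {fileIndex, globalOffset}; // remaining offset applies to the first file read
}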

Member Author

Some raw binary files have header information (usually specified in bytes) that needs to be skipped for each file. The length is used to read only the first N samples from each file. This is useful for chunked data (together with the start trigger), like packet data or triggered acquisitions.

Contributor (@daniestevez, May 31, 2024)

Good point. I guess there are two potential use cases for offset:

  • Replay only a time segment of a recording (very useful in combination with length).
  • Skip the header of a file.

These can be handled in the same way for a single file, but they lead to conflicting requirements with the multi option.

Member

I think for a first implementation this doesn't have to handle all cases, but it can be extended either by adding additional blocks or by changing behavior. As currently implemented, the offset will be skipped at the start of each file and length will specify the number of samples to be read from each file.

For moderate amounts of samples, the global range could also be done with a separate head block and only integrated into BasicFileSource once this turns out to be a performance problem.

Member Author

For moderate amounts of samples, the global range could also be done with a separate head block and only integrated into BasicFileSource once this turns out to be a performance problem.

No, you cannot replicate this with other blocks as post-processing.

The BasicFileSource reads N files where, for each file, an offset amount of bytes has to be skipped and a length amount of samples read. This information cannot be easily reconstructed once the data has been put into the buffer (N.B. 'bytes' vs. 'samples').

The common real-world use case is:
"for a burst or transient analysis: open all files, skip their (optional) header information, and show the first N samples."

Examples for transients:
GSI/FAIR: RF bunch oscillations at injection/start of the RF ramp.
Radar domain: emitted or received pulses
Medical: MRI/CT or ultrasound pulses (conceptually similar to radar).

Signed-off-by: Ralph J. Steinhagen <[email protected]>

sonarcloud bot commented Jun 1, 2024

Quality Gate failed

Failed conditions
63.6% Coverage on New Code (required ≥ 80%)

See analysis details on SonarCloud

wirew0rm (Member) left a comment

Nice implementation that should nicely fit 90% of use cases and could be extended to cover most of the rest. It also nicely showcases the expected API for somewhat more complex blocks.

Also thanks to @daniestevez for the feedback. I added small remarks to the open comments to document information from the chat.


constexpr gr::Size_t nSamples = 1024U;
const gr::Size_t maxFileSize = mode == gr::blocks::fileio::Mode::multi ? 256U : 0U;
std::string modeName{magic_enum::enum_name(mode)};
std::string fileName = fmt::format("/tmp/gr4_file_sink_test/TestFileName_{}.bin", modeName);
Member

Could use std::filesystem::temp_directory_path for portability, but we will probably have to make some changes once we try to extend to other platforms anyway, so it can be done then.
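A minimal sketch of that suggestion (hypothetical adaptation of the test path above; uses fmt as the test already does):

// illustrative only: build the test file path on top of the platform's temp directory
#include <filesystem>
#include <string>
#include <fmt/format.h>

inline std::string testFileName(const std::string& modeName) {
    const std::filesystem::path dir = std::filesystem::temp_directory_path() / "gr4_file_sink_test";
    return (dir / fmt::format("TestFileName_{}.bin", modeName)).string();
}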

Member Author

Thanks for the info. I didn't know about std::filesystem::temp_directory_path.

@wirew0rm merged commit bea174e into main on Jun 3, 2024
7 of 10 checks passed
@wirew0rm deleted the basicFileIo branch on June 3, 2024 17:42