Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GC Pinning #1082

Open
Aidan63 opened this issue Dec 29, 2023 · 4 comments
Open

GC Pinning #1082

Aidan63 opened this issue Dec 29, 2023 · 4 comments

Comments

@Aidan63
Copy link
Contributor

Aidan63 commented Dec 29, 2023

We can root objects, but from what I can tell we can't pin to prevent them from being moved by the GC between collections.
My use case for this is wanting to make it easier to integrate with async libraries like libuv, currently if I want to write haxe bytes to a file through libuv I need to copy the data into a non GC staging buffer and write that otherwise the GC could move the haxe object mid write.
I've not really dug around the GC before but I can see a commented out IMMIX_ALLOC_IS_PINNED define which is not referenced anywhere else, was this functionality planned or existed in a previous version?

Thanks.

@hughsando
Copy link
Member

Currently, the only pinning that happens is when a pointer is found on the stack. In this case, the whole allocation block that contains the pointer is pinned.
Note that if you store stuff in a GC root, then the pointer will be updated - however, this update will happen synchronously for haxe threads, but it will be async for foreign threads so perhaps not suitable.
Note also that most of the APIs are for hx::Object* but in the case of a byte array, it will be the bytes (contents) that need to be pinned, not the Array object (this will need to be rooted to prevent collection rather than moving). The byte itself is not an instance of hx::Object.

If the byte array is big enough (IMMIX_LARGE_OBJ_SIZE = 4000), it will be allocated separately from the block system and will not need to be pinned. This perhaps offers a reasonable solution - copy the data if is is smaller than this value or just use the pointer if it is bigger.

If you don't like this, then code will need to be added - either a list of pinned pointers, which get run though after "MarkAll" in the moving case to pin the blocks, or a "pinCount" in the BlockDataInfo structure and can be increased/decreased externally and can be used to initialize the "mPinned" variable to true in clearBlockMarks.

@Aidan63
Copy link
Contributor Author

Aidan63 commented Dec 31, 2023

Hi Hugh,

Thank for the info. What I've been doing for large byte arrays is to root it and then copy chunks of the arrays data into a std::vector and give the data pointer of that to libuv to write. In the libuv callback I then copy the next chunk from the original haxe array into that vector and send to write again and once there is nothing else to write the array is un-rooted.
This works fine but I need to then keep a queue for writes and flushes so everything is done in order, which adds a fair bit of code complexity.

Checking if the array is large enough to have been placed on the LOH sounds good, but I think I'd still need a way to root the underlying array allocation, not just the array object, to avoid odd edge cases. e.g. If the user sends haxe bytes to write in this async api and then decides to resize the bytes data array before that write has finished this could cause the array storage to be re-allocated and the original storage to be eligable for collection which would be a problem if a collection happened while the libuv thread pool was using a pointer to that.
From what I can see Array uses InternalNew and MarkAlloc which operate on void*s where as the sgRootSet and root api opertate on `hx::Object* variants, unless I'm missing something I don't think there's a way to root the underlying array storage to then safely pass around the pointer even if its on the LOH.

The solution I have works fine without any GC issues from what I can tell, but I've been wondering for a while now if there's a better way. I may have a play around and see about adding either rooting for non hx::Object allocations or pinning allocations.

Thanks.

@hughsando
Copy link
Member

Yes, there are a few problems. What about the case where the user changes the contents of the array (not the pointer) while the write is taking place - this would imply you really need a copy unless this is understood to be an unsafe operation.

One way to handle this might be with a single custom "__Mark" function. Have a single class instance (LibUvMarker) that has a custom "void __Mark(hx::MarkContext *__inCtx)" function. In this function, you can traverse your internal structure and mark and pin whatever you want. This is probably the simplest method by far.

@Aidan63
Copy link
Contributor Author

Aidan63 commented Jan 1, 2024

yeah, modifying the array after sending to write is a tricky one which can only really be solved by copying the entire array when the user makes the write call. This is all for the haxe 5 asys api which doesn't mention anything about the safety of the array after sending to write so I'm taking a "as long as it doesn't crash / corrupt the program" approach until someone says otherwise, which allows me to avoid that potentially big copy.

The custom __Mark function is a good shout and I think I've got something working. I've created a LibuvWriteMarker class which stores the base pointer of the array being written and then calls the MarkAlloc function to avoid the re-allocation issue mentioned before with just rooting the array object. This class is then rooted for the duration of the libuv write.

class LibuvWriteMarker final : public hx::Object
{
    char* const arrayBase;

public:
    LibuvWriteMarker(char* _arrayBase) : arrayBase(_arrayBase) {}

    void __Mark(hx::MarkContext* ctx) override
    {
        if (nullptr != arrayBase)
        {
            hx::MarkAlloc(arrayBase, ctx);
        }
    }
};

Full code sample below for reference and anyone else trying to do something similar in the future. Where RootedObject<T> is a RAII wrapped around the existing root api.

class WriteRequest final
{
	const hx::RootedObject<LibuvWriteMarker> marker;

public:
	uv_fs_t uv;
	uv_buf_t buffer;
	std::vector<char> staging;

	const hx::RootedObject<hx::Object> cbSuccess;
	const hx::RootedObject<hx::Object> cbFailure;

    WriteRequest(Array<uint8_t> _array, int _offset, int _length, Dynamic _cbSuccess, Dynamic _cbFailure)
    	: cbSuccess(_cbSuccess.mPtr)
    	, cbFailure(_cbFailure.mPtr)
        , marker(new LibuvWriteMarker(_array->GetBase()))
        , staging(0)
        , buffer(uv_buf_init(_array->GetBase() + _offset, _length)) {}

    WriteRequest(int _length, Dynamic _cbSuccess, Dynamic _cbFailure)
    	: cbSuccess(_cbSuccess.mPtr)
    	, cbFailure(_cbFailure.mPtr)
        , marker(nullptr)
        , staging(_length)
        , buffer(uv_buf_init(staging.data(), _length)) {}

	static void callback(uv_fs_t* request)
	{
		auto gcZone    = hx::AutoGCZone();
		auto spRequest = std::unique_ptr<WriteRequest>(static_cast<WriteRequest*>(request->data));

		if (spRequest->uv.result < 0)
		{
			Dynamic(spRequest->cbFailure.rooted)(hx::asys::libuv::uv_err_to_enum(spRequest->uv.result));
		}
		else
		{
			Dynamic(spRequest->cbSuccess.rooted)(spRequest->uv.result);
		}
	}
};

void hx::asys::libuv::filesystem::LibuvFile_obj::write(::cpp::Int64 pos, Array<uint8_t> data, int offset, int length, Dynamic cbSuccess, Dynamic cbFailure)
{
	std::unique_ptr<WriteRequest> request;

    switch (hx::GetMemType(data->GetBase()))
    {
    case hx::MemType::memLarge:
    {
        request = std::make_unique<WriteRequest>(data, offset, length, cbSuccess, cbFailure);
    }
    break;
    case hx::MemType::memBlock:
    {
        request = std::make_unique<WriteRequest>(length, cbSuccess, cbFailure);

        std::memcpy(request->staging.data(), data->GetBase() + offset, length);
    }
    break;
    case hx::MemType::memUnmanaged:
    {
        // I assume an unmanaged base pointer would be due to the set unmanaged data api,
        // would it be safe to just use that pointer or should we copy to the staging vector, hmm.
    }
    break;
    }

    auto result  = uv_fs_write(loop, &request->uv, file, &request->buffer, 1, pos, WriteRequest::callback);
    if (result < 0)
    {
        cbFailure(hx::asys::libuv::uv_err_to_enum(result));
    }
    else
    {
        request.release();
    }
}

I thought about using a simple length > 4000 check to determine if the arrays contents would be on the LOH but instead made the GetMemType of sGlobalAlloc available in the hx namespace as that seems more robust and I assume will return memUnmanaged when the array holds non GC memory set through the set unmanaged function which would also allow different code paths for copying.
Depending on the type of the memory I either copy it into a staging buffer for objects not on the LOH, or have that LibuvWriteMarker store the arrays base pointer so it is marked and pass it directly to libuv.

Assuming this looks correct and using GetMemType outside of a paused world doesn't potentially cause horrible side effects I'll open a merge request with my new function to access it from the hx namespace.
I'll probably also have a play around with pinning just for my own curiosity, but the above seems to be a good solution.

@Aidan63 Aidan63 mentioned this issue Jan 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants