Serialization issues with Mono 5 #689

Open
purefunkce opened this issue Apr 3, 2018 · 1 comment

purefunkce commented Apr 3, 2018

I get a serialization error any time I pass data into a lambda at runtime. It happens with any call whose lambda expression uses data originating outside the lambda expression itself. Using broadcast variables gives the same error.

Gives serialization error:
    string x = "/path";
    var results = rdd.Map(input => { Console.WriteLine(x); return input; });

but, no serialization error here:
    var results = rdd.Map(input => { Console.WriteLine("/path"); return input; });

This is running Spark 2.0.2 with Mono 5.10.1.20 on Linux, built with msbuild. I've also tested Mono 4.8.1 with xbuild and get the same error.

actual error:

ERROR System.Runtime.Serialization.SerializationException: Type '(MyClassName+<>c__DisplayClass4_0' in Assembly 'MyAssembly, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null' is not marked as serializable.
at System.Runtime.Serialization.FormatterServices.InternalGetSerializableMembers (System.RuntimeType type) [0x00045] in <8fbafb724c144c9dad69bccfec38ae40>:0
at System.Runtime.Serialization.FormatterServices+<>c__DisplayClass9_0.b__0 (System.Runtime.Serialization.MemberHolder _) [0x00000] in <8fbafb724c144c9dad69bccfec38ae40>:0
at System.Collections.Concurrent.ConcurrentDictionary`2[TKey,TValue].GetOrAdd (TKey key, System.Func`2[T,TResult] valueFactory) [0x00034] in <8fbafb724c144c9dad69bccfec38ae40>:0
at System.Runtime.Serialization.FormatterServices.GetSerializableMembers (System.Type type, System.Runtime.Serialization.StreamingContext context) [0x0005e] in <8fbafb724c144c9dad69bccfec38ae40>:0

purefunkce changed the title from "Serialization issues after call to Glom()" to "Serialization issues with Mono 5" on Apr 10, 2018

purefunkce commented Apr 12, 2018

OK, so for anyone else out there thinking their PySpark-style lambda expressions are going to work: you are going to have a bad day. There is a fundamental issue. When a lambda uses variables from its enclosing scope, the C# compiler moves them into a generated closure class (the <>c__DisplayClass4_0 in the error above), and that class is not marked serializable. SparkCLR does some work to serialize fully closed lambda expressions, but any lambda that references data outside its own scope ends up bound to one of these non-serializable classes.
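One way to see this for yourself is to inspect the object a lambda is bound to. The following is a minimal sketch, not SparkCLR code: `ClosureInspector` and its methods are hypothetical names made up for illustration, and it only checks the `[Serializable]` flag that the exception complains about (SparkCLR's own serializer may check more). On recent .NET versions `Type.IsSerializable` may produce an obsolescence warning.

```csharp
using System;

// Hypothetical helper for illustration only; not part of SparkCLR.
public static class ClosureInspector
{
    // True when the object the delegate is bound to carries the
    // [Serializable] flag, or when there is no bound object at all
    // (Target is null for delegates over static methods).
    public static bool TargetIsSerializable(Delegate d)
    {
        return d.Target == null || d.Target.GetType().IsSerializable;
    }

    // The lambda uses the 'path' parameter, so the compiler hoists it
    // into a generated closure class (a <>c__DisplayClass... type).
    public static Func<int, string> MakeCapturing(string path)
    {
        return input => path + input;
    }

    // The lambda uses only its own parameter and a literal.
    public static Func<int, string> MakeClosed()
    {
        return input => "/path" + input;
    }

    public static void Main()
    {
        Func<int, string> capturing = MakeCapturing("/path");
        Func<int, string> closed = MakeClosed();

        // Name of the compiler-generated closure type.
        Console.WriteLine(capturing.Target.GetType().FullName);

        // The generated closure class is not marked [Serializable].
        Console.WriteLine(TargetIsSerializable(capturing)); // prints False

        // Result for the fully closed lambda; compare it with the line above.
        Console.WriteLine(TargetIsSerializable(closed));
    }
}
```

If the first check prints False for one of your own lambdas, that lambda is bound to a compiler-generated closure class and will likely hit the same SerializationException.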

I'm sure there are ways to elegantly turn any anonymous function into a method on a serializable class, but they have thus far eluded me. The bottom line is that you need to write a serializable helper class and pass a method of that class directly to Map etc., with no lambda expressions that are not fully closed.
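For the "/path" example from the first comment, such a helper could look roughly like this. It is a sketch under assumptions: `PathHelper` and `Demo` are hypothetical names, the element type of the RDD is assumed to be `string`, and the demo only exercises the delegate locally rather than through Spark.

```csharp
using System;

// Hypothetical helper for illustration; it carries the value that the
// lambda would otherwise have captured from the enclosing scope.
[Serializable]
internal class PathHelper
{
    private readonly string path;

    internal PathHelper(string path)
    {
        this.path = path;
    }

    // Method handed to Map in place of a capturing lambda.
    internal string Execute(string input)
    {
        return path + "/" + input;
    }
}

internal static class Demo
{
    private static void Main()
    {
        // A method group over a [Serializable] instance, not a lambda.
        Func<string, string> mapper = new PathHelper("/path").Execute;

        Console.WriteLine(mapper("part-00000")); // prints /path/part-00000

        // The delegate is bound to PathHelper itself, which is marked [Serializable].
        Console.WriteLine(mapper.Target.GetType().IsSerializable); // prints True
    }
}
```

With an RDD, the call would then presumably take the form `rdd.Map(new PathHelper(x).Execute)` instead of a lambda that reads `x`.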

In the SparkCLR samples there is a BroadcastHelper class that transfers data as a broadcast variable. You can use the same pattern to send any sort of data to the workers: first initialize a new object with the data you want the delegate to use, then pass its method:
    [Serializable]
    internal class BroadcastHelper<T, U>
    {
        private readonly Broadcast<T> broadcastVar;

        internal BroadcastHelper(Broadcast<T> broadcastVar)
        {
            this.broadcastVar = broadcastVar;
        }

        internal T Execute(U i)
        {
            return broadcastVar.Value;
        }
    }

now, this can work:
    string parallel = "test serialization string";
    var broadcast = sc.Broadcast(parallel);
    Console.WriteLine(test.Map(new BroadcastHelper<string, int>(broadcast).Execute).First());

whereas
    string parallel = "test serialization string";
    var broadcast = sc.Broadcast(parallel);
    Console.WriteLine(test.Map(x => broadcast.Value).First());
== bad day

I would love to chat with anyone out there who has the experience to make these kinds of lambda expressions work out of the box, without writing helper classes.
