Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fast collection initialization for benchmark #2624

Open
swtrse opened this issue Aug 23, 2024 · 2 comments
Open

Fast collection initialization for benchmark #2624

swtrse opened this issue Aug 23, 2024 · 2 comments

Comments

@swtrse
Copy link

swtrse commented Aug 23, 2024

I know this is not a problem of BenchmarkDotNet. But maybe some of you folks did have a similar problem and already have a solution at hand. I also asked on stackoverflow but there are only some cheap asses at the moment and the only thing they can respond is "please provide a minimal reproducible example showing your code"

Well here is my problem
For a benchmark I need to create a set of different collections to measure their performance for certain specific tasks.

My problem is that initialization of this collections slows down the whole benchmarking.The collections are initialized 100 and 1000 of times during the benchmark so every second saved counts.

I optimized my code for SortedList and ImmutableSortedDictionary (see code example below).
My problem is that while I managed to initialize SortedList and ImmutableSortedDictionary in a few Seconds. The SortedDictionary takes several minutes.

What I need is a way to initalize SortedDictionary as fast as possible. Id does not need to be a DeepCopy since Keys and Values will not be modified. In fact I would prefer a ShallowCopy of the Data inside the Collection to save some memory during the Benchmark. The only thing that is important that the collections do not interfere with each other. For example, I tried to use the ShallowClone extension from DeepCloner nuget package. This extension manages to clone the SortedDictionary in Seconds but when I add an Item in one of this Dictionaries all other Dictionaries do get modified too which is an unwanted side effect since they share the same SortedTree object.

And using the constructor of SortedDictonary the creation of 100 SortedDictionary with 1000000 entries take ~9 minutes.

repeatCount is the number of SortedLists to create, for this specific test 100.

PrefillArray holds 1000000 random generated items.

I am trying to bring that down to ~2 seconds so that my benchmark set can finish in an acceptable timeframe.

My fastest found solution at the moment is (for a complede code example look to the bottom)

private static void A_Iteration_Initialize(int repeatCount, IList<SortedDictionary<IOrderDataKey<T>, IOrderData<T>>> collection)
{
    var sw = Stopwatch.StartNew();
    Dictionary<IOrderDataKey<T>, IOrderData<T>> prefillDictionary = PrefillArray.ToDictionary(static pair => SpecificTypeFactory.GetKey(pair), static p => p);
    var prefillSortedDictionary = new SortedDictionary<IOrderDataKey<T>, IOrderData<T>>(prefillDictionary);
    sw.Stop();
    Console.WriteLine($@"{DateTime.Now} - A_Iteration_Initialize Prepare SortedDictionary: {sw.Elapsed}");
    sw.Reset();
    sw.Start();
    Parallel.For(0, repeatCount, i => { collection[i] = new SortedDictionary<IOrderDataKey<T>, IOrderData<T>>(prefillSortedDictionary); });
    sw.Stop();
    Console.WriteLine($@"{DateTime.Now} - A_Iteration_Initialize Fill SortedDictionary: {sw.Elapsed}");
}

Minimal reproducible example:

  • This example is stripped down from a much more complicated codebase with lot of generic classes.
  • The original code is used in an BenchmarkDotNet setup.
  • ReRuns = 100; <- The number of identical collections generated.
  • A_Static_Global_Setup(1_000_000, 2_000); <- Prepares an array with 1000000 "random" items. Runs once for every set of benchmarks.
  • A_Iteration_Initialize Runs at least 8 times for ever benchmarks defined. With over 1000 benchmarks in the queue this is the absolute time critical part.
#nullable enable
using System;
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Threading.Tasks;
using System.Diagnostics;
using System.Numerics;
using System.Linq;
using System.Reflection;

public class Program
{
    private const int ReRuns = 100;
    private static readonly Type ListType = typeof(SortedList<OrderDataBigIntegerKey, OrderDataBigInteger>);
    private static readonly FieldInfo KeysFieldInfo = ListType.GetField("keys", BindingFlags.Instance | BindingFlags.NonPublic) ?? throw new NullReferenceException("keys not found");
    private static readonly FieldInfo ValuesFieldInfo = ListType.GetField("values", BindingFlags.Instance | BindingFlags.NonPublic) ?? throw new NullReferenceException("values not found");
    private static readonly FieldInfo SizeFieldInfo = ListType.GetField("_size", BindingFlags.Instance | BindingFlags.NonPublic) ?? throw new NullReferenceException("_size not found");
    private static OrderDataBigInteger[] _prefillArray = [];
    public static void Main()
    {
        A_Static_Global_Setup(1_000_000, 2_000);
        var sl = new SortedList<OrderDataBigIntegerKey, OrderDataBigInteger>[ReRuns];
        A_Iteration_Initialize(ReRuns, sl);
        sl = null;
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        var isd = new ImmutableSortedDictionary<OrderDataBigIntegerKey, OrderDataBigInteger>[ReRuns];
        A_Iteration_Initialize(ReRuns, isd);
        isd = null;
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        var sd = new SortedDictionary<OrderDataBigIntegerKey, OrderDataBigInteger>[ReRuns];
        A_Iteration_Initialize(ReRuns, sd);
        sd = null;
    }

    private static void A_Iteration_Initialize(int repeatCount, IList<SortedList<OrderDataBigIntegerKey, OrderDataBigInteger>> collection)
    {
        var sw = Stopwatch.StartNew();
        Dictionary<OrderDataBigIntegerKey, OrderDataBigInteger> prefillDictionary = _prefillArray.ToDictionary(static pair => GetKey(pair), static p => p);
        int count = prefillDictionary.Count;
        OrderDataBigIntegerKey[] k = prefillDictionary.Keys.ToArray();
        OrderDataBigInteger[] v = prefillDictionary.Values.ToArray();
        Array.Sort(k, v);
        Parallel.For(0, repeatCount, i =>
        {
            var help = new SortedList<OrderDataBigIntegerKey, OrderDataBigInteger>(count);
            var tr = __makeref (help);
            SizeFieldInfo!.SetValueDirect(tr, count);
            Array.Copy(k, (OrderDataBigIntegerKey[])KeysFieldInfo.GetValueDirect(tr)!, count);
            Array.Copy(v, (OrderDataBigInteger[])ValuesFieldInfo.GetValueDirect(tr)!, count);
            collection[i] = help;
        });
        sw.Stop();
        Console.WriteLine($@"{DateTime.Now} - A_Iteration_Initialize Fill SortedList: {sw.Elapsed}");
    }

    private static void A_Iteration_Initialize(int repeatCount, IList<SortedDictionary<OrderDataBigIntegerKey, OrderDataBigInteger>> collection)
    {
        var sw = Stopwatch.StartNew();
        Dictionary<OrderDataBigIntegerKey, OrderDataBigInteger> prefillDictionary = _prefillArray.ToDictionary(static pair => GetKey(pair), static p => p);
        var prefillSortedDictionary = new SortedDictionary<OrderDataBigIntegerKey, OrderDataBigInteger>(prefillDictionary);
        Parallel.For(0, repeatCount, i =>
        {
            collection[i] = new SortedDictionary<OrderDataBigIntegerKey, OrderDataBigInteger>(prefillSortedDictionary);
        });
        sw.Stop();
        Console.WriteLine($@"{DateTime.Now} - A_Iteration_Initialize Fill SortedDictionary: {sw.Elapsed}");
    }

    private static void A_Iteration_Initialize(int repeatCount, IList<ImmutableSortedDictionary<OrderDataBigIntegerKey, OrderDataBigInteger>> collection)
    {
        var sw = Stopwatch.StartNew();
        IEnumerable<KeyValuePair<OrderDataBigIntegerKey, OrderDataBigInteger>> prefill = _prefillArray.Select(static o => new KeyValuePair<OrderDataBigIntegerKey, OrderDataBigInteger>(GetKey(o), o));
        var builder = ImmutableSortedDictionary.CreateBuilder<OrderDataBigIntegerKey, OrderDataBigInteger>();
        builder.AddRange(prefill);
        ImmutableSortedDictionary<OrderDataBigIntegerKey, OrderDataBigInteger> cacheDictionary = builder.ToImmutable();
        Parallel.For(0, repeatCount, i =>
        {
            collection[i] = cacheDictionary;
        });
        sw.Stop();
        Console.WriteLine($@"{DateTime.Now} - A_Iteration_Initialize Fill ImmutableSortedDictionary: {sw.Elapsed}");
    }

    private static void A_Static_Global_Setup(int prefill, int priceRange)
    {
        _prefillArray = GetOrderDataArray(prefill, priceRange);
    }

    private static OrderDataBigIntegerKey GetKey(OrderDataBigInteger data)
    {
        ArgumentNullException.ThrowIfNull(data.Price);
        return new OrderDataBigIntegerKey(data.Price, data.OrderId);
    }

    private static OrderDataBigInteger[] GetOrderDataArray(int count, int priceRange)
    {
        var ret = new OrderDataBigInteger[count];
        Parallel.For(0, count, i =>
        {
            ret[i] = GetSingleOrderData(i, priceRange);
        });
        return ret;
    }

    private static OrderDataBigInteger GetSingleOrderData(int i, int priceRange)
    {
        var rnd = new Random(i);
        long val = rnd.NextInt64(0, priceRange);
        var price = new BigInteger(val);
        var orderId = new Guid(i, 0, 0, [0, 0, 0, 0, 0, 0, 0, 0]);
        return new OrderDataBigInteger(price, orderId, 1);
    }
}

public interface IOrderData<T>
{
    Guid OrderId { get; init; }

    T Price { get; init; }

    T Quantity { get; init; }

    T UsedQuantity { get; init; }
}

public interface IOrderDataKey<T> : IComparable<IOrderDataKey<T>>
{
    Guid OrderId { get; init; }

    T Price { get; init; }
}

public readonly record struct OrderDataBigInteger(BigInteger Price, Guid OrderId, BigInteger Quantity);
public readonly record struct OrderDataBigIntegerKey(BigInteger Price, Guid OrderId) : IComparable<OrderDataBigIntegerKey>, IComparisonOperators<OrderDataBigIntegerKey, OrderDataBigIntegerKey, bool>
{
    public int CompareTo(OrderDataBigIntegerKey other)
    {
        int result = Price.CompareTo(other.Price);
        return result != 0 ? result : OrderId.CompareTo(other.OrderId);
    }

    public static bool operator>(OrderDataBigIntegerKey left, OrderDataBigIntegerKey right)
    {
        return left.CompareTo(right) > 0;
    }

    public static bool operator >=(OrderDataBigIntegerKey left, OrderDataBigIntegerKey right)
    {
        return left.CompareTo(right) >= 0;
    }

    public static bool operator <(OrderDataBigIntegerKey left, OrderDataBigIntegerKey right)
    {
        return left.CompareTo(right) < 0;
    }

    public static bool operator <=(OrderDataBigIntegerKey left, OrderDataBigIntegerKey right)
    {
        return left.CompareTo(right) <= 0;
    }

    public int CompareTo(object? other)
    {
        if (other is not OrderDataBigIntegerKey key)
        {
            return -1;
        }

        return CompareTo(key);
    }
}

Any ideas?
A working and updated example if I found a faster solution can be viewed here: https://dotnetfiddle.net/1Enx1q
However I did have to lower some values because of memory limitations.

@swtrse swtrse changed the title Collection initializaton Fast collection initialization for benchmark Aug 23, 2024
@adamsitnik
Copy link
Member

My problem is that initialization of this collections slows down the whole benchmarking.The collections are initialized 100 and 1000 of times during the benchmark so every second saved counts.

The initialization logic should not be part of the benchmark: https://github.com/dotnet/performance/blob/main/docs/microbenchmark-design-guidelines.md#setup (I very much encourage you to read the whole doc)

But maybe some of you folks did have a similar problem and already have a solution at hand.

Over here you can see how we have implemented collection benchmarks on the .NET Team: https://github.com/dotnet/performance/tree/main/src/benchmarks/micro/libraries/System.Collections

@swtrse
Copy link
Author

swtrse commented Sep 3, 2024

The initialization logic should not be part of the benchmark: https://github.com/dotnet/performance/blob/main/docs/microbenchmark-design-guidelines.md#setup (I very much encourage you to read the whole doc)

@adamsitnik Maybe I was not clear enough. Of course the initialization logic is not part of the benchmark. However it is done in Methods decorated as [IterationSetup]. Which runs before every Benchmark.

Over here you can see how we have implemented collection benchmarks on the .NET Team: https://github.com/dotnet/performance/tree/main/src/benchmarks/micro/libraries/System.Collections

I now that benchmarks. However they are not much of a help. At no place they have the need to benchmark Add, Remove, Lookup performance of a collection containing already 100_000_000 items. And to prepare the prefilled collections is the thing that is time consuming. I do not have a problem with the benchmarks itself or the structure. That I have figured out nicely.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants