LINQ to SharePoint Performance Pitfalls

September 8 2010 10 comments

SharePoint 2010 introduced LINQ to SharePoint which enables the programmer an easy access to SharePoint lists and content types using LINQ.

While LINQ to SharePoint greatly simplifies the code and frees the programmer from writing CAML queries, it can easily result in poor code in terms of performance. The main problem is, IMHO, that the full power of LINQ is exposed to the programmer while only a limited subset of all possible operations can be translated to CAML. To my understanding, the LINQ provider performs the following steps:

  • Translate some supported parts of the LINQ query to CAML
  • Send the CAML to SharePoint content database
  • Fetch the results
  • Apply those LINQ operations that were not translated to CAML

One problem is that fetching data from the content database may be slow, and more data gets loaded to the web front-end server memory than necessary. But to me the big problem is that the whole process and its performance implications are completely invisible to the programmer – what actually happens cannot be clearly seen from the LINQ statement.

I will illustrate this with a simple example. I have some 4000 auto-generated Calendar Events in my Calendar list. I measure the performance of two queries with the following method:

private TimeSpan Benchmark(int repeatCount, Action action)
{
    var start = DateTime.Now;
    for (int i = 0; i < repeatCount; i++)
    {
        action();
    }
    return (DateTime.Now - start);
}

And then execute the following code:

var time1 = Benchmark(100, () =>
    {
        using (var context = new MyDataContext(url))
        {
            var item2 = context.Calendar
               .Where(x => x.Id == 1)   // where-condition (translated to CAML)
               .Take(1)                 // limit the result set (translated to CAML)
               .ToList()                // execute the CAML query and fetch results to server memory
               .SingleOrDefault();      // take the only element returned, if any
        }
    });
var time2 = Benchmark(100, () =>
    {
        using (var context = new MyDataContext(url))
        {
            var item2 = context.Calendar.FirstOrDefault(x => x.Id == 1);
        }
    });

Console.WriteLine("Time 1: {1}, Time 2: {2}", time1, time2);

Both queries return the same results. Guess what is printed to the console?

Time 1: 00:00:06.5607988, Time 2: 00:02:50.9675024

The first query is tens of times faster than the innocent-looking second one! Why? To see that, we will have to utilize the Log-property of the DataContext to read the generated CAML. For the first query, the CAML looks like this (I have excluded the contents of the <ViewFields> -element):

<View>
  <Query>
    <Where>
      <And>
        <BeginsWith>
          <FieldRef Name="ContentTypeId" />
          <Value Type="ContentTypeId">0x010200</Value>
        </BeginsWith>
        <Eq>
          <FieldRef Name="ID" />
          <Value Type="Counter">1</Value>
        </Eq>
      </And>
    </Where>
  </Query>
  <ViewFields>
  </ViewFields>
  <RowLimit Paged="TRUE">1</RowLimit>
</View>

The second query produces the following CAML:

<View>
  <Query>
    <Where>
      <BeginsWith>
        <FieldRef Name="ContentTypeId" />
        <Value Type="ContentTypeId">0x010200</Value>
      </BeginsWith>
    </Where>
  </Query>
  <ViewFields>
  </ViewFields>
  <RowLimit Paged="TRUE">2147483647</RowLimit>
</View>

You see? It seems that the provider doesn’t know how to interpret FirstOrDefault properly to CAML. The unfortunate result is that all 4000 calendar elements are loaded from content database to the web front-end, and the FirstOrDefault is executed in memory. Using FirstOrDefault may not be the intended use of LINQ to SharePoint, but should the programmer be aware of that? Similar problems will arise if the query contains custom columns mapped using the ICustomMapping interfacethe provider cannot translate them to CAML either.

Conclusion
Even though the LINQ interface to SharePoint seems lucratively easy-to-use, it should be used with caution. I suggest considering the following principles when writing LINQ to SharePoint queries, especially at performance-critical parts or when dealing with potentially large lists:

  • Plan explicitly what part of the query is intended to be executed in SharePoint content database (CAML) and what part in the web front-end server memory. For this, you need to understand the limitations of the LINQ to SharePoint provider.
  • Write and comment your code so that these intentions are made explicit to the reader.
  • Test your code to check that the generated CAML is what you expected it to be (use the Log-property for this). You may be surprised.

Most likely you will want to pose some simple where-conditions and a limit condition in the CAML part to guarantee adequate performance, and then do the complex filtering logic in memory.

Popularity: 2% [?]

10 comments to “LINQ to SharePoint Performance Pitfalls”

  1. Thank you for that. Very useful to know. And now I’m off to rewrite my linq queries. :)

  2. Petri Tolonen says:

    I would quess the reason is in a use of FirstOrDefault/SingleOrDefault, but it’s just a quess with a zero experience. It would be interesting to know does it make any difference.

  3. On some dummy tests I’ve done, I’ve found performance testing to be a _very_ confusing area, especially when you’re trying to base decisions off of it. Here’s a suggestion you might find interesting – try switch the order of execution of the two tests… When I did it, I found the results switched completely! What you can consider doing is throwing the very first execution of the very first test away, as it skews the results.

    A small suggestion – instead of using DateTime.Now, rather use the Stopwatch class from System.Diagnostics. It’s got a MUCH finer level of granularity.

  4. Here’s another take on the Benchmark method that might be of interest:

    private static TimeSpan Benchmark(int repeatCount, Action action)
    {
    Stopwatch sw = new Stopwatch();

    for (int i = 0; i < repeatCount; i++)
    {
    if (i == 1)
    sw.Start();

    action();
    }

    sw.Stop();
    return sw.Elapsed;
    }

  5. THe first time it takes to connect. If you repeat the first test, you’l still see the second one is faster.

  6. Kornelis says:

    Thanks for sharing this, Lauri :)

    Been trying to solve a problem with huge memory loads on our initial synchronisation tool for SharePoint, that gets lots and lots of data from another application in our network, and puts it in SharePoint. I found that somewhere in a piece of code, that gets hit a lot during the synchronisation, I’m using .FirstOrDefault() to retrieve possible existing items from sometimes very big lists (>25000 items). Going to find out whether it will solve the problem.

    @ Boris: That’s true, but don’t forget that Lauri is repeating the process a 100 times before making any conclusions. I think the differences generated by the connection-speed is averaged out well enough :)

  7. “Thank you for that. Very useful to know.”

  8. Togagreem says:

    ugg boots, drugs whatever send me i buy all – ok manservant nearly village ? here i my adress details
    search1computers@yahoo.com

  9. Raj says:

    Try 2nd query this way: var item2 = context.Calendar.Where(x => x.Id == 1).FirstOrDefault(); and see the difference.

  10. Javier V says:

    This one doesn’t have the where clause: var item2 = context.Calendar.FirstOrDefault(x => x.Id == 1);

    As Raj says, the correct way to do the query is this: var item2 = context.Calendar.Where(x => x.Id == 1).FirstOrDefault();

    Hope it helps!

Leave a Reply