I’m working on a project that imports over 1 million users from an Oracle database used with SharePoint/Forms Authentication into the SharePoint user profile store. This is done as a custom SharePoint timer job that pulls the users from the DB and creates/updates User Profiles through the SharePoint API.
When running a job on a recordset of this size, there are several things to strive for:
- Limit the time that the process needs to run (jobs can take days and overlap themselves)
- Reduce memory usage (the OWSTIMER.exe can already consume quite a bit with the regular timer jobs)
Two ways you can achieve this:
- Avoid UserExists() method
- Use a DataReader if possible
Avoid UserExists() method
Most code samples on the web that deal with programmatic creation of User Profiles will show code such as this:
1 |
<span style="color: #0000ff">if</span> (profileManager.UserExists(accountName) |
1 |
{ |
1 |
userProfile = profileManager.GetUserProfile(accountName)... |
1 |
} |
1 |
<span style="color: #0000ff">else</span> |
1 |
{ |
1 |
userProfile = profileManager.CreateUserProfile(accountName)... |
1 |
} |
On small recordsets, this is fine, but for large recordsets the UserExists method represents a bottleneck that can increase the duration that your process runs. In addition, in the code above, you will unknowingly call this method a second useless time, because the CreateUserProfile() method internally calls UserExists() as well.
There are two ways to avoid this method:
- Cache profile IDs in a Dictionary/Hashtable type object
- Use reflection to create user profiles
Cache Profile IDs (and MemberGroup IDs too)
The UserProfileManager object is an IEnumerable that you can iterate over and access all the Profiles in SharePoint. Caching the IDs of these profiles up front enables you to index into a Dictionary object to see if your profile exists, rather than hitting SQL Server with UserExists(). The following code helped to reduce processing time significantly (you take a hit up front, but it’s far less than the delay imposed by UserExists over large recordsets):
1 |
Dictionary<<span style="color: #0000ff">string</span>, Guid> cachedProfiles = <span style="color: #0000ff">new</span> Dictionary<<span style="color: #0000ff">string</span>, Guid>(); |
1 |
<span style="color: #0000ff">foreach</span>(UserProfile profile <span style="color: #0000ff">in</span> profileManager) |
1 |
{ |
1 |
cachedProfiles.Add(profile.AccountName, profile.ID); |
1 |
} |
1 |
... |
1 |
<span style="color: #0000ff">if</span>(cachedProfiles.ContainsKey(accountName) |
1 |
{ |
1 |
... |
1 |
} |
In addition, caching the Guid of the UserProfile lets you later use the overloaded method of GetUserProfile() that takes a Guid as a parameter, which seems to perform slightly better than the alternative that takes a string for AccountName.
This approach also works very well when importing large numbers of MemberGroups:
1 |
<span style="color: #0000ff">foreach</span> (MemberGroup memberGroup <span style="color: #0000ff">in</span> memberGroupManager) |
1 |
{ |
1 |
cachedMemberGroups.Add(memberGroup.DisplayName, memberGroup.Id); |
1 |
} |
NOTE: If you are wondering why not simply cache the entire UserProfile in the Dictionary (Dictionary<string, UserProfile>), the memory usage for this will be much higher, which will undo any gains by avoiding UserExists().
Use Reflection to Create User Profiles
The UserProfileManager’s CreateUserProfile() method internally calls the UserExists method, and then calls an internal constructor on the UserProfile object to actually create the profile. By using reflection, you can call this internal constructor yourself and avoid UserExists():
1 |
<span style="color: #008000">// Get some reflected information about the UserProfile object for later use</span> |
1 |
ConstructorInfo ci = <span style="color: #0000ff">typeof</span>(UserProfile).GetConstructor( |
1 |
BindingFlags.NonPublic | BindingFlags.Instance, |
1 |
<span style="color: #0000ff">null</span>, |
1 |
<span style="color: #0000ff">new</span> Type[] { <span style="color: #0000ff">typeof</span>(UserProfileManager), <span style="color: #0000ff">typeof</span>(<span style="color: #0000ff">string</span>), <span style="color: #0000ff">typeof</span>(<span style="color: #0000ff">string</span>) }, |
1 |
<span style="color: #0000ff">null</span>); |
1 |
Once you’ve got the reflected information, you can use the following code to create your UserProfile:
1 |
<span style="color: #0000ff">if</span> (cachedProfiles.ContainsKey(accountName)) |
1 |
{ |
1 |
<span style="color: #008000">// Get existing profile...</span> |
1 |
} |
1 |
<span style="color: #0000ff">else</span> |
1 |
{ |
1 |
<span style="color: #008000">// Create new profile</span> |
1 |
UserProfile newProfile = (UserProfile)ci.Invoke(<span style="color: #0000ff">new</span> <span style="color: #0000ff">object</span>[] { profileManager, accountName, displayName }); |
1 |
} |
NOTE: I’ve tried creating a user profile in this manner that already existed to see what would happen. The existing profile was updated, and I did not get any duplicate records in the SharePoint db. It appears the SQL under the hood already takes care of avoiding duplicates. General cautions about reflection still apply here though (API may change, etc.).
Use a DataReader if Possible
Instead of pulling a huge recordset into a DataTable, DataSet, or into a collection of custom objects, try to process your records one at a time using a data reader if your data source permits. This will keep memory usage down, as the garbage collector will dispose frequently any variables you create within a while(reader.Read()) loop. A DataTable with 1 million records in it will take up tons of memory on top of the large memory consumption that OWSTIMER.exe does already.
1 |
<span style="color: #0000ff">using</span> (OracleConnection conn = <span style="color: #0000ff">new</span> OracleConnection(_connectionString)) |
1 |
{ |
1 |
1 |
<span style="color: #0000ff">using</span> (OracleCommand cmd = <span style="color: #0000ff">new</span> OracleCommand(_sqlGetAllUsers, conn)) |
1 |
{ |
1 |
conn.Open(); |
1 |
1 |
OracleDataReader rdr = cmd.ExecuteReader(); |
1 |
1 |
<span style="color: #0000ff">if</span>(!rdr.HasRows()) |
1 |
{ |
1 |
<span style="color: #0000ff">return</span>; |
1 |
} |
1 |
1 |
<span style="color: #0000ff">while</span> (rdr.Read()) |
1 |
{ |
1 |
UserProfile profile = <span style="color: #0000ff">null</span>; |
1 |
<span style="color: #0000ff">string</span> accountName = rdr[<span style="color: #006080">"ACCOUNT_NAME"</span>] <span style="color: #0000ff">as</span> <span style="color: #0000ff">string</span>; |
1 |
<span style="color: #0000ff">string</span> firstName = rdr[<span style="color: #006080">"FIRST_NAME"</span>] <span style="color: #0000ff">as</span> <span style="color: #0000ff">string</span>; |
1 |
<span style="color: #0000ff">string</span> lastName = rdr[<span style="color: #006080">"LAST_NAME"</span>] <span style="color: #0000ff">as</span> <span style="color: #0000ff">string</span>; |
1 |
1 |
<span style="color: #0000ff">if</span> (cachedProfiles.ContainsKey(accountName)) |
1 |
{ |
1 |
profile = profileManager.GetUserProfile(cachedProfiles[accountName]); |
1 |
} |
1 |
<span style="color: #0000ff">else</span> |
1 |
{ |
1 |
profile = (UserProfile)ci.Invoke(<span style="color: #0000ff">new</span> <span style="color: #0000ff">object</span>[] { profileManager, accountName, displayName }); |
1 |
cachedProfiles.Add(accountName, profile.ID); |
1 |
} |
1 |
1 |
<span style="color: #0000ff">if</span> (!<span style="color: #0000ff">string</span>.IsNullOrEmpty(firstName)) |
1 |
{ |
1 |
profile[<span style="color: #006080">"FirstName"</span>].Value = firstName; |
1 |
} |
1 |
<span style="color: #0000ff">if</span> (!<span style="color: #0000ff">string</span>.IsNullOrEmpty(lastName)) |
1 |
{ |
1 |
profile[<span style="color: #006080">"LastName"</span>].Value = lastName; |
1 |
} |
1 |
<span style="color: #008000">// ... </span> |
1 |
profile.Commit(); |
1 |
1 |
} |
1 |
1 |
rdr.Close(); |
1 |
rdr.Dispose(); |
1 |
1 |
} |
1 |
} |
Of course other best practices also apply, such as:
- Getting pages of records, rather than all at once
- Implementing incremental change queries rather than all records all the time
- Only getting what you need from the data source
- Disposing your objects and data connections properly
Pingback: Links 2009-03-07 - Gunnar Peipman's ASP.NET blog
cool sitename man)))
————————
signature: http://pedeno.ru/
Hello, I can’t understand how to add your blog in my rss reader
————————
signature: http://werato.ru/
Oh, thanks for the feedback arrintout. My RSS link is here: http://feeds2.feedburner.com/LifeOnPlanetGroove
I’ll try to highlight that a little better in my design soon.
Nice article, how you managing the cache because if service is restarted you will loos you cache