Querying Solr while indexing causes loss of documents from the index - C#

I wrote a simple .NET Windows service that pushes documents to Apache Solr v4.1. To access Solr, I use SolrNet. My code is:
var solr = _container.Resolve<ISolrOperations<Document>>();
solr.Delete(SolrQuery.All); // inferred: the answer below refers to this initial clear
var docs = from o in documents
           orderby o.Id ascending
           select o;
for (var i = 0; i < docs.Count(); i++)
{
    var texts = new List<string>();
    if (docs.ToList()[i].DocumentAttachments.Count > 0)
    {
        foreach (var attach in docs.ToList()[i].DocumentAttachments)
        {
            using (var fileStream = System.IO.File.OpenRead(...))
            {
                var extractResult = solr.Extract(
                    new ExtractParameters(fileStream, attach.Id.ToString(CultureInfo.InvariantCulture))
                    {
                        ExtractFormat = ExtractFormat.Text,
                        ExtractOnly = true
                    });
                texts.Add(extractResult.Content); // inferred
            }
        }
    }
    docs.ToList()[i].GetFilesText = texts;
    solr.Add(docs.ToList()[i]); // inferred from the answer below
    if (i % _commitStep == 0)
    {
        solr.Commit(); // inferred
    }
}
"Document.GetFilesText" is a field that stores the text extracted from PDF files.
Logging calls (which write to the Windows Event Log) have been removed from this example. While indexing, I watch:
a) Event Log - shows documents indexing progress
b) "Core Admin" page in "Solr Admin" webapp - shows count of documents in index
When I only index documents, without searching, everything works: the event log shows a "7500 docs added" entry and "Core Admin" shows num docs = 7500.
But if I search for documents while indexing is running, I get these errors:
- the search results do not contain all of the indexed documents
- "Core Admin" resets the num docs value. For example, the Event Log shows 7500 docs indexed, but "Core Admin" shows num docs = 23. And num docs resets every time I query Solr.
My querying code:
searchPhrase = textBox1.Text;
var documents = Solr.Query(new SolrQuery(searchPhrase), new QueryOptions
{
    Highlight = new HighlightingParameters
    {
        UsePhraseHighlighter = true,
        Fields = new Collection<string> { "Field1", "Field2", "Field3" },
        BeforeTerm = "<b>",
        AfterTerm = "</b>"
    },
    Rows = 100
});
UPD: to make things clear
I have these lines in my webapp's "search" page:
public class MyController : Controller
{
    public ISolrOperations<Document> Solr { get; set; }

    public MyController()
    {
        //_solr = solr;
    }

    // GET: /Search/My/
    public ActionResult Index()
    {
        return View();
    }
}
And opening this page in a browser causes a total loss of documents from the Solr index. :-)

You are seeing this behavior because the first thing you do is clear the index.
This removes all documents from the index. So once reindexing starts the index will be empty.
In your subsequent code you add the items back into the index in batches. However, new documents are not visible to users querying the index until a commit is issued. Since you add documents and issue commits in batches, that explains why your document counts increase while you are rebuilding and why not all documents are visible. The total number of documents in the index will not reach 7500 until the last commit is issued.
A couple of options might help alleviate this:
- Issue soft commits to Solr, using either commitWithin or auto soft commits. CommitWithin is supported as an optional AddParameters argument to the Add method in SolrNet. You could call solr.Add(docs.ToList()[i], new AddParameters { CommitWithin = 3000 }); which tells Solr to commit the added item within 3 seconds.
- Use Solr cores to have an "active" core that users search against, and reload your logs data into a "standby" core. Once the load into the standby core has completed, you can issue a command to SWAP the cores, and this will be totally transparent to users. CoreAdmin commands are supported in SolrNet as well; see the tests in SolrCoreAdminFixture.cs for examples.
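As a rough sketch of the core-swap option: the CoreAdmin SWAP request can also be sent as a plain HTTP call rather than through SolrNet's core admin support. The helper below only builds that URL. The class name, the base URL and the core names are assumptions chosen for illustration, not values from the question.

```csharp
using System;

public static class SolrCoreSwap
{
    // Builds the Solr CoreAdmin URL that swaps two cores.
    // solrBaseUrl, liveCore and standbyCore are hypothetical values supplied by the caller.
    public static Uri BuildSwapUri(string solrBaseUrl, string liveCore, string standbyCore)
    {
        var query = "action=SWAP"
            + "&core=" + Uri.EscapeDataString(liveCore)
            + "&other=" + Uri.EscapeDataString(standbyCore);
        return new Uri(solrBaseUrl.TrimEnd('/') + "/admin/cores?" + query);
    }
}
```

The resulting URI would be requested (for instance with HttpClient) only after the standby core has been fully loaded and committed, so that searches never hit a half-built index.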
Hope this helps.


Read all users from AD using Novell.Directory.Ldap.NETStandard

I need to read all users from AD. Here is the code I am using:
using Novell.Directory.Ldap;
using Novell.Directory.Ldap.Controls;
using System.Linq;
namespace LdapTestApp
class Program
static void Main()
LdapConnection ldapConn = new LdapConnection();
ldapConn.SecureSocketLayer = true;
ldapConn.Connect(HOST, PORT);
var cntRead = 0;
int? cntTotal = null;
var curPage = 0;
var constraints = new LdapSearchConstraints();
constraints.SetControls(new LdapControl[]
new LdapSortControl(new LdapSortKey("sn"), true),
new LdapVirtualListControl("sn=*", 0, 10)
ILdapSearchResults searchResults = ldapConn.Search(
"OU=All Users,DC=homecredit,DC=ru",
while (searchResults.HasMore() && ((cntTotal == null) || (cntRead < cntTotal)))
LdapEntry entry = searchResults.Next();
catch (LdapReferralException)
cntTotal = GetTotalCount(searchResults as LdapSearchResults);
} while ((cntTotal != null) && (cntRead < cntTotal));
private static int? GetTotalCount(LdapSearchResults results)
{
    if (results.ResponseControls != null)
    {
        var r = (from c in results.ResponseControls
                 let d = c as LdapVirtualListResponse
                 where (d != null)
                 select (LdapVirtualListResponse)c).SingleOrDefault();
        if (r != null)
        {
            return r.ContentCount;
        }
    }
    return null;
}
I used the question "Page LDAP query against AD in .NET Core using Novell LDAP" as a basis.
Unfortunately, I get this exception when I try to receive the very first entry:
"Unavailable Critical Extension"
000020EF: SvcErr: DSID-03140594, problem 5010 (UNAVAIL_EXTENSION), data 0
What am I doing wrong?
VLVs are browsing indexes and are not directly related to whether you can browse large numbers of entries (see generic documentation). So even if this control were enabled on your AD, you would not be able to retrieve more than 1000 entries this way:
- how VLVs work on AD
- MaxPageSize is 1000 by default on AD (see documentation)
So what you can do:
- use a specific paged results control, but it seems that the Novell C# LDAP library does not have one
- ask yourself: "is it pertinent to look for all the users in a single request?" (your request looks like a batch request: remember that an LDAP server is not designed for the same purposes as a classic database, which can easily return millions of entries, and that's why most LDAP directories have default size limits around 1000).
  - The answer is no: review your design, be more specific in your LDAP search filter, your search base, etc.
  - The answer is yes:
    - you have a single AD server: ask your administrator to change the MaxPageSize value, but this setting is global and can have several side effects (e.g. what happens if everybody starts requesting all the users all the time?)
    - you have several AD servers: you can configure one for specific "batch-like" queries such as the one you're trying to run (so large MaxPageSize, large timeouts, etc.)
I had to use the approach described here:
The solution is far from being perfect but at least I am able to move on.
Starting with version 3.5 the library supports Simple Paged Results Control - https://ldapwiki.com/wiki/Simple%20Paged%20Results%20Control - and the usage is as simple as ldapConnection.SearchUsingSimplePaging(searchOptions, pageSize) or ldapConnection.SearchUsingSimplePaging(ldapEntryConverter, searchOptions, pageSize) - see Github repo for more details - https://github.com/dsbenghe/Novell.Directory.Ldap.NETStandard and more specifically use the tests as usage samples.

Ektron taxonomy and library items (in v9)

We recently upgraded from Ektron 8.6 to 9.0 (Ektron CMS400.NET, Version: 9.00 SP2(Build
I have some code (below) which we use to display links to items in a taxonomy. Under 8.6, this would show library items if they had been added to the taxonomy. As of 9.0, it no longer displays library items. It still works for DMS items and normal pages (all first class content in Ektron).
private List<ContentData> getTaxonomyItems(long TaxonomyId)
{
    listContentManager = new ContentManager();
    criteria = new ContentTaxonomyCriteria(ContentProperty.Id, EkEnumeration.OrderByDirection.Ascending);
    criteria.PagingInfo = new Ektron.Cms.PagingInfo(400); // there's a lot of items and I don't want to page them.
    criteria.AddFilter(TaxonomyId, true); // this gets sub taxonomies too :)
    List<ContentData> contentList = listContentManager.GetList(criteria);
    return contentList;
}
(I would love to simply say to users to use the DMS instead of the library, but we have a security requirement and I'm not aware of a way I can enforce security on DMS items like we can with library items by dropping a webconfig file in the library folder.)
Is this a bug that anyone else has experienced?
Or is there a problem with my code (did an API change in the upgrade to 9.0)?
I ended up emailing Ektron support in Sydney (I'm in Australia), and they said:
I would expect ContentManager to only return content, not library
items – must have been a loophole which is now closed. Taxonomy is the
way to go.
So I used some of the code they provided and came up with the following, which appears to work...
private List<TaxonomyItemData> getTaxonomyItems(long TaxonomyId)
List<TaxonomyItemData> list = new List<TaxonomyItemData>();
TaxonomyManager taxManager = new TaxonomyManager(Ektron.Cms.Framework.ApiAccessMode.Admin);
TaxonomyCriteria taxonomyCriteria = new Ektron.Cms.Organization.TaxonomyCriteria();
Ektron.Cms.Common.CriteriaFilterOperator.StartsWith, GetTaxonomyPathById(TaxonomyId));
List<TaxonomyData> TaxonomyDataList = taxManager.GetList(taxonomyCriteria);
foreach (TaxonomyData taxd in TaxonomyDataList)
TaxonomyData taxTree = taxManager.GetTree(taxd.Path,
1, // depth. doesn't seem to work. have to manually tranverse lower taxonomies.
true, // include items
foreach (TaxonomyItemData taxItem in taxTree.TaxonomyItems)
return list;
private static String GetTaxonomyPathById(long taxonomyId)
{
    TaxonomyManager tMgr = new TaxonomyManager();
    TaxonomyData tData = tMgr.GetItem(taxonomyId);
    if (tData != null)
    {
        return tData.Path;
    }
    return "";
}
This code fetches items for all the child taxonomies as well as returning library items.
The one problem is that it fetches duplicates for some items, but those are easy to clean out.
I was also told by Ektron that...
TaxonomyManager.GetItem(“{path}”) is a more efficient way to get the
That's why I've included the GetTaxonomyPathById() method (inspired by this blog post: http://www.nimbleuser.com/blog/posts/2009/iterating-through-ektron-content-in-multiple-taxonomies-via-directly-interfacing-with-search-indexing-services/ )

Adding an AsParallel() call causes my code to break when writing a file

I'm building a console application that has to process a bunch of documents.
To keep it simple, the process is:
- for each year between X and Y, query the DB to get a list of document references to process
- for each of these references, process a local file
The process method is, I think, independent and should be parallelizable as long as the input arguments differ:
private static bool ProcessDocument(
    DocumentsDataset.DocumentsRow d,
    string langCode
)
{
    try
    {
        var htmFileName = d.UniqueDocRef.Trim() + langCode + ".htm";
        var htmFullPath = Path.Combine(@"x:\path", htmFileName);
        missingHtmlFile = !File.Exists(htmFullPath);
        if (!missingHtmlFile)
        {
            var html = File.ReadAllText(htmFullPath);
            // ProcessHtml takes quite long: it runs a regex search for a list of references
            // to other documents, then sends the result to a custom WS
            ProcessHtml(ref html);
            File.WriteAllText(htmFullPath, html);
        }
        return true;
    }
    catch (Exception exc)
    {
        Trace.TraceError("{0,8}Fail processing {1} : {2}","[FATAL]", d.UniqueDocRef, exc.ToString());
        return false;
    }
}
To enumerate my documents, I have this method:
private static IEnumerable<DocumentsDataset.DocumentsRow> EnumerateDocuments()
{
    return Enumerable.Range(1990, 2020 - 1990).AsParallel().SelectMany(year => {
        return Document.FindAll((short)year).Documents;
    });
}
Document is a business class that wraps the retrieval of documents. The output of this method is a typed dataset (I'm returning the Documents table). The method expects a year, and I'm sure a document can't be returned for more than one year (the year is actually part of the key).
Note the use of AsParallel() here; I never had any issue with this one.
Now, my main method is :
var documents = EnumerateDocuments();
var result = documents.Select(d => {
bool success = true;
foreach (var langCode in new string[] { "-e","-f" })
success &= ProcessDocument(d, langCode);
return new {
using (var sw = File.CreateText("summary.csv"))
foreach (var item in result)
string level;
if (!item.success) level = "[ERROR]";
else level = "[OK]";
This method works as expected in this form. However, if I replace
var documents = EnumerateDocuments();
with
var documents = EnumerateDocuments().AsParallel();
it stops working, and I don't understand why.
The error appears exactly here (in my process method):
File.WriteAllText(htmFullPath, html);
It tells me that the file is already opened by another program.
I don't understand what can cause my program not to work as expected. Since my documents variable is an IEnumerable returning unique values, why is my process method breaking?
Thanks for any advice.
[Edit] Code for retrieving document :
/// <summary>
/// Get all documents in data store
/// </summary>
public static DocumentsDS FindAll(short? year)
{
    Database db = DatabaseFactory.CreateDatabase(connStringName); // MS Entlib
    DbCommand cm = db.GetStoredProcCommand("Document_Select");
    if (year.HasValue) db.AddInParameter(cm, "Year", DbType.Int16, year.Value);
    string[] tableNames = { "Documents", "Years" };
    DocumentsDS ds = new DocumentsDS();
    db.LoadDataSet(cm, ds, tableNames);
    return ds;
}
[Edit2] Possible source of my issue, thanks to mquander. If I write:
var test = EnumerateDocuments().AsParallel().Select(d => d.UniqueDocRef);
var testGr = test.GroupBy(d => d).Select(d => new { d.Key, Count = d.Count() }).Where(c=>c.Count>1);
var testLst = testGr.ToList();
Console.WriteLine(testLst.Where(x => x.Count == 1).Count());
Console.WriteLine(testLst.Where(x => x.Count > 1).Count());
I get this result :
Removing the AsParallel returns the same output.
Conclusion: something is wrong with my EnumerateDocuments; it returns each document twice.
I think I have to dig in here.
My source enumeration is probably the cause.
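A minimal sketch of the check described in the edit above, plus one way to drop duplicates before parallelizing. It works on plain string keys; in the question these would be the UniqueDocRef values. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateCheck
{
    // Returns every key that occurs more than once, together with its count.
    public static Dictionary<string, int> FindDuplicates(IEnumerable<string> keys)
    {
        return keys.GroupBy(k => k)
                   .Where(g => g.Count() > 1)
                   .ToDictionary(g => g.Key, g => g.Count());
    }

    // Keeps only the first item seen for each key. This runs sequentially,
    // so it should be applied before any AsParallel() call.
    public static List<T> DistinctByKey<T>(IEnumerable<T> items, Func<T, string> key)
    {
        var seen = new HashSet<string>();
        return items.Where(item => seen.Add(key(item))).ToList();
    }
}
```

If the source really yields each document twice, two workers end up writing the same file at the same time, which would explain the "already opened" error. Deduplicating first removes that collision, although fixing the enumeration itself remains the cleaner option.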
I suggest having each task put the file data into a global queue, and having a separate thread take write requests from the queue and do the actual writing.
Anyway, writing in parallel to a single disk performs much worse than writing sequentially, because the disk has to seek to the next write location each time, so you just bounce the head around between seeks. It's better to do the writes sequentially.
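The queue idea could look roughly like the sketch below, using BlockingCollection from the base class library. The transform delegate stands in for the ProcessHtml step, and the class name is hypothetical. It assumes the paths are distinct; with duplicate paths a read could still collide with the writer.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

public static class QueuedWriter
{
    // Reads and transforms files in parallel, but performs every write
    // from a single consumer task so that writes are sequential.
    public static void ProcessAll(IEnumerable<string> paths, Func<string, string> transform)
    {
        using (var pending = new BlockingCollection<KeyValuePair<string, string>>())
        {
            // Single writer: drains the queue until CompleteAdding is called.
            var writer = Task.Run(() =>
            {
                foreach (var item in pending.GetConsumingEnumerable())
                {
                    File.WriteAllText(item.Key, item.Value);
                }
            });

            try
            {
                // Producers: the expensive transformation runs on several threads.
                paths.AsParallel().ForAll(path =>
                {
                    var text = transform(File.ReadAllText(path));
                    pending.Add(new KeyValuePair<string, string>(path, text));
                });
            }
            finally
            {
                pending.CompleteAdding(); // lets the writer loop finish
                writer.Wait();            // surfaces any write error
            }
        }
    }
}
```

The queue here is unbounded, so all transformed texts may sit in memory if writing is slower than processing. Passing a capacity to the BlockingCollection constructor would bound that.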
Is Document.FindAll((short)year).Documents threadsafe? Because the difference between the first and the second version is that in the second (broken) version, this call is running multiple times concurrently. That could plausibly be the cause of the issue.
Sounds like you're trying to write to the same file. Only one thread/program can write to a file at a given time, so you can't use Parallel.
If you're reading from the same file, then you need to open the file with read-only access so as not to put a write lock on it.
The simplest way to fix the issue is to place a lock around your File.WriteAllText, assuming the writing is fast and it's worth parallelizing the rest of the code.
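A sketch of the locking approach, with one variation that should be stated plainly: instead of a single lock around File.WriteAllText, it keeps one lock object per file path and holds it across the read and the write, so two workers handling the same file run one after the other while different files still proceed in parallel. The class name and the transform delegate are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;

public static class LockedFileWriter
{
    // One lock object per full path. Case-insensitive keys match Windows
    // path semantics; elsewhere this only causes some extra serialization.
    private static readonly ConcurrentDictionary<string, object> Locks =
        new ConcurrentDictionary<string, object>(StringComparer.OrdinalIgnoreCase);

    // Reads the file, applies the transform and writes the result back,
    // holding the per-path lock for the whole read-modify-write cycle.
    public static void Rewrite(string path, Func<string, string> transform)
    {
        var gate = Locks.GetOrAdd(Path.GetFullPath(path), _ => new object());
        lock (gate)
        {
            var text = File.ReadAllText(path);
            File.WriteAllText(path, transform(text));
        }
    }
}
```

This only protects against threads inside the same process. If another program has the file open, the write can still fail.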

TFS 2010: How to produce a changelog (ie. list of work items) between two releases of the application using labels?

I'm looking for a way to automatically produce a changelog (actually a list of workitems) between two releases of my application. I have two versions of my application, v1 and v2, each is identified by a label in TFS 2010 (LABEL1 and LABEL2) that I manually created before building the setups of my app.
I have a branching system, which means I have a trunk where most bugs are fixed, and a branch where patches are applied, mostly using merges from the trunk (but there are also some fixes on the branch only that do not concern the trunk). The two versions of my application (v1 and v2) are versions from the branch.
I would like TFS 2010 to be able to return the list of bugs that were fixed (ie. the list of work items with type = Bug that are closed and verified) between these two labels.
I tried to achieve this using the web UI of TFS 2010, or using Visual Studio, but I didn't find any way.
Then I tried to ask tf.exe for a history using the following command line:
tf history /server:http://server_url/collection_name "$/project_path" /version:LLABEL1~LLABEL2 /recursive /noprompt /format:brief
where LABEL1 is the label that has been associated with the source code of the v1 of the application, and LABEL2 the label that has been associated with the source code of the v2 of the application.
It actually fails in two ways:
- the command line only returns a list of changesets, not a list of associated closed work items
- the list of changesets only contains the changesets that I applied on the branch itself, not the changesets that I applied on the trunk and then merged to the branch. Setting the "/slotmode" parameter or not doesn't change anything.
Then I tried to write a piece of C# code to retrieve the list of work items (not the list of changesets):
var tfs = TfsTeamProjectCollectionFactory.GetTeamProjectCollection(new Uri("http://server_url/collection_name"));
VersionControlServer controlServer = tfs.GetService<VersionControlServer>();
VersionControlServer vcs = tfs.GetService<VersionControlServer>();
VersionSpec sFrom = VersionSpec.ParseSingleSpec("LLABEL1", null);
VersionSpec sTo = VersionSpec.ParseSingleSpec("LLABEL2", null);
var changesets = vcs.QueryHistory(
false); // Slotmode to false
Dictionary<int, WorkItem> dico = new Dictionary<int, WorkItem>();
foreach (Changeset set in changesets)
foreach (WorkItem zz in set.WorkItems)
if (!dico.ContainsKey(zz.Id))
dico.Add(zz.Id, zz);
foreach (KeyValuePair<int, WorkItem> pair in dico.OrderBy(z => z.Key))
Console.WriteLine(string.Format("ID: {0}, Title: {1}", pair.Key, pair.Value.Title));
This actually works, I get the list of workitems between my two labels which is actually what I wanted. But only workitems associated to changesets that were committed on the branch itself are taken into account: the workitems of type "Bug" that were solved on the trunk then merged to the branch don't appear. Slotmode doesn't change anything.
Then I finally tried to replace VersionSpecs that were defined by a label with VersionSpecs that are defined by changesets:
VersionSpec sFrom = VersionSpec.ParseSingleSpec("C5083", null);
VersionSpec sTo = VersionSpec.ParseSingleSpec("C5276", null);
And my code finally works.
So my question is: how could I get the same result with labels, which are the TFS objects I use to identify a version? If it's not possible, how should I identify a version in TFS 2010?
Btw I found some questions on stackoverflow, but none of them gave me answers with labels. For instance:
Question example
I think http://tfschangelog.codeplex.com/ can possibly help you here.
The TFS ChangeLog application allows users to automatically generate release notes from TFS. Users have to provide information on their project, branch and changeset range; the TFS ChangeLog application then extracts information from each changeset in the given range and from all the work items associated with those changesets. In other words, it travels from the starting changeset up to the ending changeset and writes the data about each changeset, along with its associated work items, to an XML file.
Users can then use their own transformation logic including filter, sorting, styling, output formatting, etc. to generate Release Notes Report.
Another thing I would like to add here will be related to Labels in TFS. Labels are basically assigned / associated with changesets. Currently, TFS ChangeLog application does not support Labels to define starting and ending point but it does support changeset which can be used as a workaround solution.
Hope this is useful.
In general, the definitive way to identify a point in time in any SCM is the check-in ID. Using labels to abstract this is not optimal in TFS, as discussed here & here. A better approach is to use builds instead, especially in a modern CI environment.
To retrieve the highest changeset contained in a given build, you'd have to do something like this:
using System;
using System.Collections.Generic;
using Microsoft.TeamFoundation.Build.Client;
using Microsoft.TeamFoundation.Client;

namespace GetChangesetsFromBuild
{
    class Program
    {
        static void Main()
        {
            TfsTeamProjectCollection tpc = TfsTeamProjectCollectionFactory.GetTeamProjectCollection(new Uri("http://TFSServer:8080/Name"));
            IBuildServer bs = (IBuildServer)tpc.GetService(typeof(IBuildServer));
            IBuildDetail build = bs.GetAllBuildDetails(new Uri("vstfs:///..."));
            List<IChangesetSummary> associatedChangesets = InformationNodeConverters.GetAssociatedChangesets(build);
            int idMax = associatedChangesets[0].ChangesetId;
        }
    }
}
A difficulty with the above is to retrieve the BuildUri of the builds you are interested in. In order to get this information you could do something like this:
IBuildDetail[] builds = bs.QueryBuilds("TeamProjectName", "yourBuildDefinitionName");
and then retrieve the Uris that are important to you.
This is also a good vehicle if you still insist on using labels: besides the Uri, each element of builds[] also has a LabelName.
I have been in the same situation as you. I also want work items from merged changesets included. I only include work items that are Done. Also, if the same work item is linked to multiple changesets, only the last changeset is reported. I use this in a CI setup and create a changelog for each build. The List<ChangeInfo> can then be exported to an XML/HTML/TXT file. Here is my solution:
namespace TFSChangelog
public class TFSChangelogGenerator
private const string workItemDoneText = "Done";
/// <summary>
/// This class describes a change by:
/// Changeset details
/// and
/// WorkItem details
/// </summary>
public class ChangeInfo
#region Changeset details
public DateTime ChangesetCreationDate { get; set; }
public int ChangesetId { get; set; }
#region WorkItem details
public string WorkItemTitle { get; set; }
public int WorkItemId { get; set; }
public static List<ChangeInfo> GetChangeinfo(string tfsServer, string serverPath, string from, string to)
// Connect to server
var tfs = TfsTeamProjectCollectionFactory.GetTeamProjectCollection(new Uri(tfsServer));
var vcs = tfs.GetService<VersionControlServer>();
// Create versionspec's
VersionSpec versionFrom = null;
if (!string.IsNullOrEmpty(from))
versionFrom = VersionSpec.ParseSingleSpec(from, null);
VersionSpec versionTo = VersionSpec.Latest;
if (!string.IsNullOrEmpty(to))
versionTo = VersionSpec.ParseSingleSpec(to, null);
// Internally used dictionary
var changes = new Dictionary<int, ChangeInfo>();
// Find Changesets that are checked into the branch
var directChangesets = vcs.QueryHistory(
foreach (var changeset in directChangesets)
foreach (var workItem in changeset.WorkItems.Where(workItem => workItem.State == workItemDoneText))
if (changes.ContainsKey(workItem.Id))
if (changeset.ChangesetId < changes[workItem.Id].ChangesetId) continue;
changes[workItem.Id] = new ChangeInfo { ChangesetId = changeset.ChangesetId, ChangesetCreationDate = changeset.CreationDate, WorkItemId = workItem.Id, WorkItemTitle = workItem.Title };
// Find Changesets that are merged into the branch
var items = vcs.GetItems(serverPath, RecursionType.Full);
foreach (var item in items.Items)
var changesetMergeDetails = vcs.QueryMergesWithDetails(
foreach (var merge in changesetMergeDetails.Changesets)
foreach (var workItem in merge.WorkItems.Where(workItem => workItem.State == workItemDoneText))
if (changes.ContainsKey(workItem.Id))
if (merge.ChangesetId < changes[workItem.Id].ChangesetId) continue;
changes[workItem.Id] = new ChangeInfo { ChangesetId = merge.ChangesetId, ChangesetCreationDate = merge.CreationDate, WorkItemId = workItem.Id, WorkItemTitle = workItem.Title };
// Return a list sorted by ChangesetId
return (from entry in changes orderby entry.Value.ChangesetId descending select entry.Value).ToList();
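The "only the last changeset per work item" rule used above can be shown in isolation. The sketch below uses a stand-in Link type rather than the TFS object model, so every name in it is hypothetical. It takes (changeset, work item) pairs from any source, direct or merged, and keeps the newest changeset for each work item.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LatestChangePerWorkItem
{
    // Stand-in for a (changeset, work item) association; not a TFS type.
    public sealed class Link
    {
        public int ChangesetId { get; set; }
        public int WorkItemId { get; set; }
        public string Title { get; set; }
    }

    // Keeps, for every work item, the link with the highest changeset id,
    // then returns the survivors sorted by changeset id, newest first.
    public static List<Link> Reduce(IEnumerable<Link> links)
    {
        var latest = new Dictionary<int, Link>();
        foreach (var link in links)
        {
            Link current;
            if (latest.TryGetValue(link.WorkItemId, out current)
                && link.ChangesetId < current.ChangesetId)
            {
                continue; // a newer changeset is already recorded for this work item
            }
            latest[link.WorkItemId] = link;
        }
        return latest.Values.OrderByDescending(l => l.ChangesetId).ToList();
    }
}
```

Feeding the direct and the merged changesets through the same reduction avoids repeating the dictionary logic in two loops.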
This question got me closer to solving a similar problem I was having.
Use the type LabelVersionSpec instead of VersionSpec for label versions.
Replace:
VersionSpec sFrom = VersionSpec.ParseSingleSpec("LLABEL1", null);
VersionSpec sTo = VersionSpec.ParseSingleSpec("LLABEL2", null);
with:
LabelVersionSpec sFrom = new LabelVersionSpec("LLABEL1");
LabelVersionSpec sTo = new LabelVersionSpec("LLABEL2");

How to programmatically get the latest top "n" commit messages from an SVN repository in C#

I'd like to build a site which simply displays the top latest (by date, revision?) "n" commit logs plus other associated info.
What's the best way to do this? I started having a quick look at SharpSvn, but the GET seems to be based on Revision ranges rather than date.
I'd like a simple example for .Net in c# based on any available library which gets the job done.
Since you mentioned using SharpSVN, I happen to have written this in BuildMaster:
private static IList<string> GetLatestCommitMessages(Uri repository, int count)
{
    using (var client = new SvnClient())
    {
        System.Collections.ObjectModel.Collection<SvnLogEventArgs> logEntries;
        var args = new SvnLogArgs()
        {
            Limit = count
        };

        client.GetLog(repository, args, out logEntries);
        return logEntries.Select(log => log.LogMessage).ToList();
    }
}

