I created a tool that iterates through all commits of a repository.
For each commit it makes a diff against all of its parents and then reads the file contents for some checks.
It turned out that this slows down very quickly. It always slows down after the same specific commit, which is quite large because it is a merge commit.
Here is how I iterate through the commits. The following code is slightly simplified to keep the focus.
var repo = new Repository(path);
foreach (LibGit2Sharp.Commit commit in repo.Commits)
{
    IEnumerable<FileChanges> changed = repo.GetChangedToAllParents(commit);
    var files = ResolveChangeFileInfos(changed);
    var entry = new Commit(commit.Id.ToString(), commit.Author.Email, commit.Committer.When, commit.Message, files);
    yield return entry;
}
In GetChangedToAllParents I basically make a diff for each parent, like this:
foreach (var parent in commit.Parents)
{
    var options = new CompareOptions
    {
        Algorithm = DiffAlgorithm.Minimal,
        IncludeUnmodified = false
    };
    var patches = repo.Diff.Compare<Patch>(parent.Tree, commit.Tree, options); // difference
}
and later I read the content of the files in this way:
var blob = repo.Lookup<Blob>(patchEntry.Oid.Sha); // find blob
Stream contentStream = blob.GetContentStream();
string result = null;
using (var tr = new StreamReader(contentStream, Encoding.UTF8))
{
    result = tr.ReadToEnd();
}
Are there any known issues? Am I missing any leaks?
Update
I found out that most of the time (about 90%) is taken by the diff, and it gets constantly slower:
var options = new CompareOptions
{
    Algorithm = DiffAlgorithm.Minimal,
    IncludeUnmodified = false
};
var patches = repo.Diff.Compare<Patch>(parent.Tree, commit.Tree, options); // difference
I can reproduce it with this code:
var repo = new Repository(path);
int pos = 0;
foreach (var commit in repo.Commits)
{
    pos++;
    if (pos % 100 == 0)
    {
        Console.WriteLine(pos);
    }
    var options = new CompareOptions
    {
        Algorithm = DiffAlgorithm.Minimal,
        IncludeUnmodified = false,
        Similarity = new SimilarityOptions
        {
            RenameDetectionMode = RenameDetectionMode.None,
            WhitespaceMode = WhitespaceMode.IgnoreAllWhitespace
        }
    };
    foreach (var parent in commit.Parents)
    {
        var changedFiles =
            repo.Diff.Compare<TreeChanges>(parent.Tree, commit.Tree, options).ToList();
    }
}
It allocates about 500 MB for every 1000 commits and at some point it just crashes. So I also reported it here:
https://github.com/libgit2/libgit2sharp/issues/1359
Is there a faster way to get all files that were changed in a specific commit?
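One mitigation sketch, not from the original code: it assumes a LibGit2Sharp version in which the diff results implement IDisposable, and the batch size and the ProcessDiff callback are placeholders. The idea is to dispose each diff result and to dispose and re-open the Repository every few thousand commits, since libgit2 caches the objects it touches inside the repository handle and that memory is only released when the Repository is disposed.

// Sketch: walk the history in batches and re-open the Repository between
// batches so libgit2's internal object cache is released.
const int batchSize = 1000; // placeholder, tune as needed
int processed = 0;
bool more = true;
while (more)
{
    using (var repo = new Repository(path))
    {
        // Skip() re-walks the already-processed commits; that is the price of re-opening.
        var batch = repo.Commits.Skip(processed).Take(batchSize).ToList();
        more = batch.Count == batchSize;
        foreach (var commit in batch)
        {
            foreach (var parent in commit.Parents)
            {
                // On recent LibGit2Sharp versions the diff result is disposable.
                using (var changes = repo.Diff.Compare<TreeChanges>(parent.Tree, commit.Tree))
                {
                    ProcessDiff(commit, changes); // hypothetical placeholder for the actual checks
                }
            }
        }
        processed += batch.Count;
    }
}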
I have difficulty understanding this example of how to use facets:
https://lucenenet.apache.org/docs/4.8.0-beta00008/api/Lucene.Net.Demo/Lucene.Net.Demo.Facet.SimpleFacetsExample.html
My goal is to create an index in which each document field has a facet, so that at search time I can choose which facets to use to navigate the data.
What I am confused about is the setup of facets at index-creation time. To summarize my question: is an index with facets compatible with ReferenceManager?
Does the DirectoryTaxonomyWriter need to be actually written and persisted on disk, or is it embedded into the index itself and just temporary? Given the line indexWriter.AddDocument(config.Build(taxoWriter, doc)); from the example, I expect it is temporary and will be embedded into the index (but then the example also shows that you need the taxonomy to drill down into facets). So can the taxonomy be tied to the index in some way so that they are handled together by ReferenceManager?
If not, may I just use the same folder I use for storing the index?
Here is a more detailed list of points that confuse me:
In my scenario I am indexing the documents asynchronously (in a background process) and then fetching the index as soon as possible through ReferenceManager in an ASP.NET application. I hope this way of fetching the index is compatible with the DirectoryTaxonomyWriter needed by facets.
I then modified my code to introduce the taxonomy writer as indicated in the example, but I am a bit confused: it seems I can't store the DirectoryTaxonomyWriter in the same folder as the index because the folder is locked. Do I need to persist it, or is it embedded into the index (so a RAMDirectory is enough)? If I need to persist it in a different directory, can I safely persist it into a subdirectory?
Here is the code I am actually using:
private static void BuildIndex(IndexEntry entry)
{
    string targetFolder = ConfigurationManager.AppSettings["IndexFolder"] ?? string.Empty;

    //** LOG
    if (System.IO.Directory.Exists(targetFolder) == false)
    {
        string message = @"Index folder not found";
        _fileLogger.Error(message);
        _consoleLogger.Error(message);
        return;
    }

    var metadata = JsonConvert.DeserializeObject<IndexMetadata>(File.ReadAllText(entry.MetdataPath) ?? "{}");

    string[] header = new string[0];
    List<dynamic> csvRecords = new List<dynamic>();

    using (var reader = new StreamReader(entry.DataPath))
    {
        CsvConfiguration csvConfiguration = new CsvConfiguration(CultureInfo.InvariantCulture);
        csvConfiguration.AllowComments = false;
        csvConfiguration.CountBytes = false;
        csvConfiguration.Delimiter = ",";
        csvConfiguration.DetectColumnCountChanges = false;
        csvConfiguration.Encoding = Encoding.UTF8;
        csvConfiguration.HasHeaderRecord = true;
        csvConfiguration.IgnoreBlankLines = true;
        csvConfiguration.HeaderValidated = null;
        csvConfiguration.MissingFieldFound = null;
        csvConfiguration.TrimOptions = CsvHelper.Configuration.TrimOptions.None;
        csvConfiguration.BadDataFound = null;

        using (var csvReader = new CsvReader(reader, csvConfiguration))
        {
            csvReader.Read();
            csvReader.ReadHeader();
            csvReader.Read();
            header = csvReader.HeaderRecord;
            csvRecords = csvReader.GetRecords<dynamic>().ToList();
        }
    }

    string targetDirectory = Path.Combine(targetFolder, "Index__" + metadata.Boundle + "__" + DateTime.Now.ToString("yyyyMMdd_HHmmss") + "__" + Path.GetRandomFileName().Substring(0, 6));
    System.IO.Directory.CreateDirectory(targetDirectory);

    //** LOG
    {
        string message = @"..creating index : {0}";
        _fileLogger.Information(message, targetDirectory);
        _consoleLogger.Information(message, targetDirectory);
    }

    using (var dir = FSDirectory.Open(targetDirectory))
    {
        using (DirectoryTaxonomyWriter taxoWriter = new DirectoryTaxonomyWriter(dir))
        {
            Analyzer analyzer = metadata.GetAnalyzer();
            var indexConfig = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer);
            using (IndexWriter writer = new IndexWriter(dir, indexConfig))
            {
                long entryNumber = csvRecords.Count();
                long index = 0;
                long lastPercentage = 0;
                foreach (dynamic csvEntry in csvRecords)
                {
                    Document doc = new Document();
                    IDictionary<string, object> dynamicCsvEntry = (IDictionary<string, object>)csvEntry;
                    var indexedMetadataFiled = metadata.IdexedFields;
                    foreach (string headField in header)
                    {
                        if (indexedMetadataFiled.ContainsKey(headField) == false || (indexedMetadataFiled[headField].NeedToBeIndexed == false && indexedMetadataFiled[headField].NeedToBeStored == false))
                            continue;

                        var field = new Field(headField,
                            ((string)dynamicCsvEntry[headField] ?? string.Empty).ToLower(),
                            indexedMetadataFiled[headField].NeedToBeStored ? Field.Store.YES : Field.Store.NO,
                            indexedMetadataFiled[headField].NeedToBeIndexed ? Field.Index.ANALYZED : Field.Index.NO
                        );
                        doc.Add(field);

                        var facetField = new FacetField(headField, (string)dynamicCsvEntry[headField]);
                        doc.Add(facetField);
                    }

                    long percentage = (long)(((decimal)index / (decimal)entryNumber) * 100m);
                    if (percentage > lastPercentage && percentage % 10 == 0)
                    {
                        _consoleLogger.Information($"..indexing {percentage}%..");
                        lastPercentage = percentage;
                    }

                    writer.AddDocument(doc);
                    index++;
                }
                writer.Commit();
            }
        }
    }

    //** LOG
    {
        string message = @"Index Created : {0}";
        _fileLogger.Information(message, targetDirectory);
        _consoleLogger.Information(message, targetDirectory);
    }
}
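For reference, here is a minimal sketch of how the taxonomy is typically wired up, assuming a separate "taxonomy" subfolder (the folder name and the documents variable are my own placeholders; targetDirectory and analyzer are the ones from the code above): the taxonomy gets its own Directory next to the main index, and every document is passed through FacetsConfig.Build before it is added, so the facet ordinals land in the main index while the facet labels land in the taxonomy index.

// Sketch only: separate Directory instances for the index and the taxonomy,
// and FacetsConfig.Build(...) applied to every document before AddDocument.
using (var indexDir = FSDirectory.Open(targetDirectory))
using (var taxoDir = FSDirectory.Open(Path.Combine(targetDirectory, "taxonomy")))
using (var writer = new IndexWriter(indexDir, new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer)))
using (var taxoWriter = new DirectoryTaxonomyWriter(taxoDir))
{
    var facetsConfig = new FacetsConfig();

    foreach (Document doc in documents) // documents built with their FacetFields, as above
    {
        // Build() translates the FacetFields into indexable fields and
        // records the facet labels in the taxonomy.
        writer.AddDocument(facetsConfig.Build(taxoWriter, doc));
    }

    writer.Commit();
    taxoWriter.Commit();
}

At search time the taxonomy directory is opened alongside the index (a DirectoryTaxonomyReader next to the IndexSearcher); the facet module also ships a SearcherTaxonomyManager that refreshes both together, which plays the role that ReferenceManager/SearcherManager plays for a plain index.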
Here is the code:
string[] files =
    System.IO.Directory.GetFiles(@"C:\Users\Matheus Miranda\Pictures", "*.jpg");

foreach (var file in files)
{
    var uploadParams = new ImageUploadParams()
    {
        File = new FileDescription(file),
        PublicId = "my_folder/images",
        EagerAsync = true
    };
    var uploadResult = cloudinary.Upload(uploadParams);
}
It is not working: it always overwrites the previous file.
I'm trying to save multiple images to Cloudinary with no success; only one image is saved. I am using the Cloudinary library.
Any solution?
When I tested it out, it works as expected; however, I would adjust a couple of things. First, you do not need the EagerAsync parameter, as no eager transformation is being applied to the assets. An eager transformation lets you create a modified version of the original asynchronously after the asset has been uploaded. Second, if you wish to see the upload response, you can use the JsonObj property and display it in the console. I have modified your sample here:
string[] files =
    System.IO.Directory.GetFiles(@"C:\Users\Matheus Miranda\Pictures", "*.jpg");

foreach (var file in files)
{
    var uploadParams = new ImageUploadParams()
    {
        File = new FileDescription(file),
        UseFilename = true
    };
    var uploadResult = cloudinary.Upload(uploadParams);
    Console.WriteLine(uploadResult.JsonObj);
}
I found the solution! The problem was that every upload used the same PublicId, so each image overwrote the previous one; giving each file its own PublicId fixes it:
string[] files =
    System.IO.Directory.GetFiles(@"C:\Users\Matheus Miranda\Pictures\teste", "*.jpg");

for (int i = 0; i < files.Length; i++)
{
    var uploadParams = new ImageUploadParams()
    {
        File = new FileDescription(files[i]),
        PublicId = $"my_folder/images/{System.IO.Path.GetFileName(files[i])}"
    };
    var uploadResult = cloudinary.Upload(uploadParams);
}
I am trying to speed up my Lucene search in my WPF application.
I hoped that my search would take somewhere around 30 ms.
87 search hits were found in the index, so that is not very much.
But a Stopwatch timer says it takes around 400 ms, way too much for me.
So can you check my code and tell me how I can improve it?
I also measured the time from the beginning of the try block up to the foreach, to rule out a big loss of time through initialization; it is not the problem (0 ms).
List<CardView> aAll_CardView = new List<CardView>();
try
{
    SortField field = new SortField(LUCENT_STATE_MAIN, SortFieldType.STRING);
    Sort sort = new Sort(field);

    searchManager.MaybeRefreshBlocking(); // execute with fresh index searcher
    var searcher = searchManager.Acquire();

    var topDocs = searcher.Search(aBooleanQuery, 100, sort);
    var _totalHits = topDocs.TotalHits;
    CardView aCardView = null;

    // measured time: takes ~400-500 ms
    foreach (var result in topDocs.ScoreDocs)
    {
        #region iterate through findings and put lucene data into CardView list
        var aDoc = searcher.Doc(result.Doc);
        aAll_CardView.Add(new CardView
        {
            // all fields are defined as TextField()...
            // must be first, because used in e.g. Nr_Main
            RelatedItemCount = aDoc.Get(LUCENT_RELATED_ITEMS),
            Nr_Main = aDoc.Get(LUCENT_NR_MAIN),
            Nr_Parent = aDoc.Get(LUCENT_NR_PARENT),
            Antwort = aDoc.Get(LUCENT_ANTWORT),
            Beschreibung = aDoc.Get(LUCENT_BESCHREIBUNG),
            Note = aDoc.Get(LUCENT_NOTES),
            Question_Main = aDoc.Get(LUCENT_TITLE_MAIN),
            Question_Parent = aDoc.Get(LUCENT_TITLE_PARENT),
            Book = aDoc.Get(LUCENT_BOOK),
            Date_Create = aDoc.Get(LUCENT_DATE_CREATED),
            Date_LastEdit = aDoc.Get(LUCENT_DATE_LASTEDIT),
            Bibelstelle = aDoc.Get(LUCENT_BIBELSTELLE),
            // ParseCore just uses TryParse to get enum for state
            Status_Main = ParseCore(aDoc.Get(LUCENT_STATE_MAIN)),
            Status_Parent = ParseCore(aDoc.Get(LUCENT_STATE_PARENT))
        });
        #endregion
    }
}
catch (Exception e)
{
    string exp = e.ToString();
    new JMsg(exp).ShowDialog();
}
finally
{
}
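Two things worth noting, assuming searchManager is a SearcherManager: the searcher acquired with Acquire() is conventionally released in the finally block, and loading the stored fields with searcher.Doc() for every hit is often where most of the time goes, so loading only the fields that are actually needed can help. A sketch under those assumptions (the field set below is just an example):

var searcher = searchManager.Acquire();
try
{
    var topDocs = searcher.Search(aBooleanQuery, 100, sort);
    // Load only the stored fields needed for the list instead of the whole document.
    var fieldsToLoad = new HashSet<string> { LUCENT_NR_MAIN, LUCENT_TITLE_MAIN, LUCENT_STATE_MAIN };
    foreach (var scoreDoc in topDocs.ScoreDocs)
    {
        var aDoc = searcher.Doc(scoreDoc.Doc, fieldsToLoad);
        // ... map aDoc to a CardView as before ...
    }
}
finally
{
    searchManager.Release(searcher); // counterpart of Acquire()
}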
In an ASP.Net MVC4 application, I'm using the following code to process a Go To Webinar Attendees report (CSV format).
For some reason, the file that is being loaded is not being released by IIS and it is causing issues when attempting to process another file.
Do you see anything out of the ordinary here?
The CSVHelper (CsvReader) is from https://joshclose.github.io/CsvHelper/
public AttendeesData GetRecords(string filename, string webinarKey)
{
    StreamReader sr = new StreamReader(Server.MapPath(filename));
    CsvReader csvread = new CsvReader(sr);
    csvread.Configuration.HasHeaderRecord = false;

    List<AttendeeRecord> record = csvread.GetRecords<AttendeeRecord>().ToList();
    record.RemoveRange(0, 7);

    AttendeesData attdata = new AttendeesData();
    attdata.Attendees = new List<Attendee>();

    foreach (var rec in record)
    {
        Attendee aa = new Attendee();
        aa.Webinarkey = webinarKey;
        aa.FullName = String.Concat(rec.First_Name, " ", rec.Last_Name);
        aa.AttendedWebinar = 0;
        aa.Email = rec.Email_Address;
        aa.JoinTime = rec.Join_Time.Replace(" CST", "");
        aa.LeaveTime = rec.Leave_Time.Replace(" CST", "");
        aa.TimeInSession = rec.Time_in_Session.Replace("hour", "hr").Replace("minute", "min");
        aa.Makeup = 0;
        aa.RegistrantKey = Registrants.Where(x => x.email == rec.Email_Address).FirstOrDefault().registrantKey;

        List<string> firstPolls = new List<string>()
        {
            rec.Poll_1.Trim(), rec.Poll_2.Trim(), rec.Poll_3.Trim(), rec.Poll_4.Trim()
        };
        int pass1 = firstPolls.Count(x => x != "");

        List<string> secondPolls = new List<string>()
        {
            rec.Poll_5.Trim(), rec.Poll_6.Trim(), rec.Poll_7.Trim(), rec.Poll_8.Trim()
        };
        int pass2 = secondPolls.Count(x => x != "");

        aa.FirstPollCount = pass1;
        aa.SecondPollCount = pass2;

        if (aa.TimeInSession != "")
        {
            aa.AttendedWebinar = 1;
        }
        if (aa.FirstPollCount == 0 || aa.SecondPollCount == 0)
        {
            aa.AttendedWebinar = 0;
        }

        attdata.Attendees.Add(aa);
        attendeeToDB(aa); // adds to Oracle DB using EF6.
    }

    // Should I call csvread.Dispose() here?
    sr.Close();
    return attdata;
}
Yes. You have to dispose of those objects too.
sr.Close();
csvread.Dispose();
sr.Dispose();
A better strategy is to use the using keyword (a sketch follows below).
You should use using blocks for your stream readers and writers.
You should follow some naming conventions (lists always contain multiple entries, so rename record to records).
You should use clear names (not aa).
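For illustration, a sketch of the method with the readers wrapped in using blocks (the surrounding types are the ones from the question; the mapping body is elided):

public AttendeesData GetRecords(string filename, string webinarKey)
{
    // Both readers are disposed when the using blocks end, so the file handle
    // is released even if an exception is thrown while reading.
    using (var reader = new StreamReader(Server.MapPath(filename)))
    using (var csv = new CsvReader(reader))
    {
        csv.Configuration.HasHeaderRecord = false;
        List<AttendeeRecord> records = csv.GetRecords<AttendeeRecord>().ToList();
        records.RemoveRange(0, 7);

        var attendeesData = new AttendeesData { Attendees = new List<Attendee>() };
        foreach (var record in records)
        {
            // ... map record to an Attendee and add it, exactly as in the original loop ...
        }
        return attendeesData;
    }
}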
What would be an effective way to do pagination with Active Directory searches in .NET? There are many ways to search in AD but so far I couldn't find how to do it effectively. I want to be able to indicate Skip and Take parameters and be able to retrieve the total number of records matching my search criteria in the result.
I have tried searching with the PrincipalSearcher class:
using (var ctx = new PrincipalContext(ContextType.Domain, "FABRIKAM", "DC=fabrikam,DC=com"))
using (var criteria = new UserPrincipal(ctx))
{
    criteria.SamAccountName = "*foo*";
    using (var searcher = new PrincipalSearcher(criteria))
    {
        ((DirectorySearcher)searcher.GetUnderlyingSearcher()).SizeLimit = 3;
        var results = searcher.FindAll();
        foreach (var found in results)
        {
            Console.WriteLine(found.Name);
        }
    }
}
Here I was able to limit the search results to 3, but I wasn't able to get the total number of records matching my search criteria (SamAccountName contains foo), nor was I able to tell the searcher to skip the first 50 records, for example.
I also tried using the System.DirectoryServices.DirectoryEntry and System.DirectoryServices.Protocols.SearchRequest but the only thing I can do is specify the page size.
So is the only way to fetch all the results on the client and do the Skip and Count there? I really hope there is a more effective way to achieve this directly on the domain controller.
You may try a virtual list view search. The following sorts the users by cn and then gets 51 users starting from the 100th one.
DirectoryEntry rootEntry = new DirectoryEntry("LDAP://domain.com/dc=domain,dc=com", "user", "pwd");
DirectorySearcher searcher = new DirectorySearcher(rootEntry);
searcher.SearchScope = SearchScope.Subtree;
searcher.Filter = "(&(objectCategory=person)(objectClass=user))";
searcher.Sort = new SortOption("cn", SortDirection.Ascending);
searcher.VirtualListView = new DirectoryVirtualListView(0, 50, 100);
foreach (SearchResult result in searcher.FindAll())
{
    Console.WriteLine(result.Path);
}
For your use case you only need the BeforeCount, AfterCount and Offset properties of DirectoryVirtualListView (the three arguments in the DirectoryVirtualListView constructor). The documentation for DirectoryVirtualListView is very limited; you may need to experiment to see how it behaves.
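For example, a Skip/Take style page could be expressed roughly like this (the numbers are placeholders; the offset is the 1-based position of the first entry to return, and to my knowledge ApproximateTotal is filled in by the server after the search, which would also give you the total match count):

int skip = 50, take = 25; // placeholders
searcher.Sort = new SortOption("cn", SortDirection.Ascending); // VLV requires a server-side sort
searcher.VirtualListView = new DirectoryVirtualListView(0, take - 1, skip + 1);
foreach (SearchResult result in searcher.FindAll())
{
    Console.WriteLine(result.Path);
}
int total = searcher.VirtualListView.ApproximateTotal; // total number of matching entries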
If SizeLimit is set to zero and PageSize is set to 500, a search that matches, say, 12,000 entries will return all 12,000 results in pages of 500 items, with the last page containing only 200 items. The paging occurs transparently to the application; it does not have to perform any special processing other than setting the PageSize property to the proper value.
SizeLimit limits the number of results that you can retrieve at once, so your PageSize needs to be less than or equal to 1000 (Active Directory limits the maximum number of search results to 1000; in this case, setting the SizeLimit property to a value greater than 1000 has no effect). The paging is done automatically behind the scenes when you call FindAll().
For more details, please refer to MSDN:
https://msdn.microsoft.com/en-us/library/ms180880.aspx
https://msdn.microsoft.com/en-us/library/system.directoryservices.directorysearcher.pagesize.aspx
https://msdn.microsoft.com/en-us/library/system.directoryservices.directorysearcher.sizelimit.aspx
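A minimal sketch of the paged search itself (server, credentials and filter are placeholders):

// With PageSize set, FindAll() transparently fetches the results page by page
// from the domain controller; SizeLimit stays 0 so no client-side cap applies.
using (var rootEntry = new DirectoryEntry("LDAP://domain.com/dc=domain,dc=com", "user", "pwd"))
using (var searcher = new DirectorySearcher(rootEntry))
{
    searcher.Filter = "(&(objectCategory=person)(objectClass=user)(samAccountName=*foo*))";
    searcher.PageSize = 500;  // must be <= 1000
    searcher.SizeLimit = 0;   // 0 = unlimited on the client side
    using (SearchResultCollection results = searcher.FindAll())
    {
        foreach (SearchResult result in results)
        {
            Console.WriteLine(result.Path);
        }
    }
}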
Waaaay late to the party, but this is what I'm doing:
I'm using FindOne() instead of FindAll() and member;range=<start>-<end> on PropertiesToLoad.
There's a catch on member;range: when it's the last page, even if you pass member;range=1000-1999 (for instance), it returns member;range=1000-*, so you have to check for the * at the end to know if there is more data.
public List<string> PagedSearch()
{
    var list = new List<string>();
    bool lastPage = false;
    int start = 0, end = 0, step = 1000;

    var rootEntry = new DirectoryEntry("LDAP://domain.com/dc=domain,dc=com", "user", "pwd");
    var filter = "(&(objectCategory=person)(objectClass=user)(samAccountName=*foo*))";

    using (var memberSearcher = new DirectorySearcher(rootEntry, filter, null, SearchScope.Base))
    {
        while (!lastPage)
        {
            start = end;
            end = start + step - 1;
            memberSearcher.PropertiesToLoad.Clear();
            memberSearcher.PropertiesToLoad.Add(string.Format("member;range={0}-{1}", start, end));

            var memberResult = memberSearcher.FindOne();
            var membersProperty = memberResult.Properties.PropertyNames.Cast<string>().FirstOrDefault(p => p.StartsWith("member;range="));
            if (membersProperty != null)
            {
                lastPage = membersProperty.EndsWith("-*");
                list.AddRange(memberResult.Properties[membersProperty].Cast<string>());
                end = list.Count;
            }
            else
            {
                lastPage = true;
            }
        }
    }
    return list;
}
private static DirectoryEntry forestlocal = new DirectoryEntry(LocalGCUri, LocalGCUsername, LocalGCPassword);
private DirectorySearcher localSearcher = new DirectorySearcher(forestlocal);

public List<string> GetAllUsers()
{
    List<string> users = new List<string>();
    localSearcher.SizeLimit = 10000;
    localSearcher.PageSize = 250;

    string localFilter = string.Format(@"(&(objectClass=user)(objectCategory=person)(!(objectClass=contact))(msRTCSIP-PrimaryUserAddress=*))");
    localSearcher.Filter = localFilter;

    SearchResultCollection localForestResult;
    try
    {
        localForestResult = localSearcher.FindAll();
        if (localForestResult != null)
        {
            foreach (SearchResult result in localForestResult)
            {
                if (result.Properties.Contains("mail"))
                    users.Add((string)result.Properties["mail"][0]);
            }
        }
    }
    catch (Exception ex)
    {
    }
    return users;
}