I have a data set for which I generate every permutation, then check some properties on each one to see if it is an object that I want to keep and use. The number of permutations is staggering, in the quadrillions. Is there anything you can see in the code below that I can use to speed this up? I suspect I can't get it down to a reasonable amount of time, so I'm also looking at sharding it onto multiple servers to process, but I'm having a hard time deciding where to shard it.
Any opinions or ideas are appreciated.
// Materialise the repository results once so the nested loops don't re-enumerate them.
var boats = _warMachineRepository.AllBoats().ToList();
var marines = _warMachineRepository.AllMarines().ToList();
var bombers = _warMachineRepository.AllBombers().ToList();
var carriers = _warMachineRepository.AllCarriers().ToList();
var tanks = _warMachineRepository.AllTanks().ToList();
var submarines = _warMachineRepository.AllSubmarines().ToList();
var armies = new List<Army>();
var lockObject = new object();
long processed = 0; // long: the total count is in the quadrillions, far past int.MaxValue
Console.WriteLine((long)boats.Count * marines.Count * bombers.Count * carriers.Count * tanks.Count * submarines.Count);
// 70k of these
Parallel.ForEach(boats, new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount }, boat =>
{
    // 7500 of these
    foreach (var marine in marines)
    {
        // 200 of these
        foreach (var bomber in bombers)
        {
            // 200 of these
            foreach (var carrier in carriers)
            {
                // 400 of these
                foreach (var tank in tanks)
                {
                    // 50 of these
                    foreach (var submarine in submarines)
                    {
                        var lineup = new Army()
                        {
                            Tank = tank,
                            Submarine = submarine,
                            Carrier = carrier,
                            Marine = marine,
                            Bomber = bomber,
                            Boats = boat
                        };
                        if (lineup.Hitpoints > 50000)
                        {
                            lock (lockObject)
                            {
                                armies.Add(lineup);
                            }
                        }
                        // Interlocked keeps the shared counter correct across threads.
                        long current = Interlocked.Increment(ref processed);
                        if (current % 10000000 == 0)
                        {
                            Console.WriteLine("Processed: {0}, valid: {1}, DateTime: {2}", current, armies.Count, DateTime.Now);
                        }
                    }
                }
            }
        }
    }
});
return armies;
If this code is part of a simulation, you might want to add some optimizations:
Mark an object as changed (put it in a list) when it changes so there is no need to search multiple times
Decrease/throttle/tune the object update frequency
Use other information available to filter objects: are objects close to one another so they might affect/hurt/heal each other -> only then investigate changes
Change the data structure; by putting all attributes of all objects in a smartly setup matrix you might be able to use simple matrix multiplication to have the object interact. You might even be able to offload the multiplication to the GPU
You might simply be asking too much of one machine, so scale out by using more nodes/machines; a sharding sketch follows below.
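For the scale-out option, one natural place to shard is the outermost collection: give each node its own slice of the boats and let it run the same nested loops over the full inner collections. A minimal sketch, where the node index and node count are hypothetical parameters you would pass to each machine:
// Hypothetical sharding parameters, e.g. read from the command line or config:
// nodeIndex is this machine's 0-based index, nodeCount the number of machines.
int nodeIndex = 0;
int nodeCount = 4;
var allBoats = _warMachineRepository.AllBoats().ToList();
// Give each node every nodeCount-th boat so the work is spread evenly,
// then run the existing Parallel.ForEach over myBoats instead of allBoats
// and merge the armies returned by each node afterwards.
var myBoats = allBoats
    .Where((boat, i) => i % nodeCount == nodeIndex)
    .ToList();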
Yes, I'm well aware not to do this, but I have no choice. I'd agree that it's an XY problem, but since I can't update the service I have to use, it's out of my hands. I need some help to save some time, and maybe learn something handy in the process.
I'm looking to map a list of models (items in this example) to what are essentially numbered variables on a service I'm posting to; in the example, those are the fields on 'newUser'.
Additionally, there may not always be X items in the list (on the right in the example), yet I have a finite number (say 10) of numbered variables on 'newUser' to map to (on the left in the example). So I'll have to perform a bunch of checks to avoid indexing a null value as well.
Current example:
if (items.Count >= 1 && !string.IsNullOrWhiteSpace(items[0].id))
{
newUser.itemId1 = items[0].id;
newUser.itemName1 = items[0].name;
newUser.itemDate1 = items[0].date;
newUser.itemBlah1 = items[0].blah;
}
else
{
// This isn't necessary, but this is effectively what will happen
newUser.itemId1 = string.Empty;
newUser.itemName1 = string.Empty;
newUser.itemDate1 = string.Empty;
newUser.itemBlah1 = string.Empty;
}
if (items.Count >= 2 && !string.IsNullOrWhiteSpace(items[1].id))
{
newUser.itemId2 = items[1].id;
newUser.itemName2 = items[1].name;
newUser.itemDate2 = items[1].date;
newUser.itemBlah2 = items[1].blah;
}
// Removed the else to clean it up, but you get the idea.
// And so on, repeated many more times..
I looked into an example using Dictionary, but I'm unsure of how to map that to the model without just manually mapping all the variables.
PS: To all who come across this question: if you're implementing numbered variables in your API, please don't; it's wildly unnecessary and time consuming.
As an alternative to fiddling with the JSON, you could get down and dirty and use Reflection.
Given the following test data:
const int maxItemsToSend = 3;
class ItemToSend {
public string
itemId1, itemName1,
itemId2, itemName2,
itemId3, itemName3;
}
ItemToSend newUser = new();
record Item(string id, string name);
Item[] items = { new("1", "A"), new("2", "B") };
Using the rules you set forth in the question, we can loop through the projected fields like so:
// If `itemid1`,`itemId2`, etc are fields:
var fields = typeof(ItemToSend).GetFields();
// If they're properties, replace GetFields() with
// .GetProperties(BindingFlags.Instance | BindingFlags.Public);
for(var i = 1; i <= maxItemsToSend; i++){
// bounds check
var item = (items.Count() >= i && !string.IsNullOrWhiteSpace(items[i-1].id))
? items[i-1] : null;
// Use Reflection to find and set the fields
fields.FirstOrDefault(f => f.Name.Equals($"itemId{i}"))
?.SetValue(newUser, item?.id ?? string.Empty);
fields.FirstOrDefault(f => f.Name.Equals($"itemName{i}"))
?.SetValue(newUser, item?.name ?? string.Empty);
}
It's not pretty, but it works. Here's a fiddle.
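If the repeated FirstOrDefault scans over the field array ever become a concern, the lookups can be cached once in a dictionary keyed by field name. A small sketch against the same ItemToSend type (same rules as above, just a different lookup structure):
// Build the name -> FieldInfo map once, outside the loop.
var fieldsByName = typeof(ItemToSend).GetFields()
    .ToDictionary(f => f.Name, StringComparer.Ordinal);
for (var i = 1; i <= maxItemsToSend; i++)
{
    var item = (items.Length >= i && !string.IsNullOrWhiteSpace(items[i - 1].id))
        ? items[i - 1] : null;
    // TryGetValue avoids scanning the field array on every iteration.
    if (fieldsByName.TryGetValue($"itemId{i}", out var idField))
        idField.SetValue(newUser, item?.id ?? string.Empty);
    if (fieldsByName.TryGetValue($"itemName{i}", out var nameField))
        nameField.SetValue(newUser, item?.name ?? string.Empty);
}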
I am using Deedle from C#, and windowing over a frame is very slow compared with the same operation on a series. For example, for a series and a frame of similar size I am seeing 60 ms vs. 3500 ms (series vs. frame).
Has anyone seen this before?
var msftRaw = Frame.ReadCsv(@"C:\Users\olivi\source\repos\ConsoleApp\MSFT.csv");
var msft = msftRaw.IndexRows<DateTime>("Date").SortRowsByKey();
var rollingFrame = msft.Window(60); // 7700 ms
var openSeries = msft.GetColumn<double>("Open");
var rollingSeries = openSeries.Window(60); // 14 ms
var oneSeriesFrame = Frame.FromColumns(new Dictionary<string, Series<DateTime, double>> { { "Open", openSeries } });
var rollingFakeFrame = oneSeriesFrame.Window(60); // 3300 ms
This is quite a common operation when working with financial time series data, for example calculating rolling correlation between prices, or calculating rolling realized volatility when there is a condition on another price time series.
I found a workaround for the performance issue: perform the rolling operation on each series individually, join the rolling series in a frame so they are aligned by date, and write the processing function against that frame, selecting each series inside the processing function.
Continuing from the example above:
private static double CalculateRealizedCorrelation(ObjectSeries<string> objectSeries)
{
var openSeries = objectSeries.GetAs<Series<DateTime, double>>("Open");
var closeSeries = objectSeries.GetAs<Series<DateTime, double>>("Close");
return MathNet.Numerics.Statistics.Correlation.Pearson(openSeries.Values, closeSeries.Values);
}
var rollingAgg = new Dictionary<string, Series<DateTime, Series<DateTime, double>>>();
foreach (var column in msft.ColumnKeys)
{
rollingAgg[column] = msft.GetColumn<double>(column).Window(60); // roll each column's series individually
}
var rollingDf = Frame.FromColumns(rollingAgg);
var rollingCorr = rollingDf.Rows.Select(kvp => CalculateRealizedCorrelation(kvp.Value));
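On the same theme, a rolling statistic over a single column can stay on the series the whole way through, avoiding the slow Frame.Window path. A minimal sketch, assuming the msft frame from above and MathNet.Numerics (already referenced for the correlation):
var open = msft.GetColumn<double>("Open");
// 60-row rolling standard deviation, computed per window on the series.
var rollingStdDev = open
    .Window(60)
    .Select(kvp => MathNet.Numerics.Statistics.Statistics.StandardDeviation(kvp.Value.Values));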
I am saving leaderboard data using Firebase transactions.
I am looking for a way to replace the array numbering in the saved data with the userId, so that I can update a user's information later.
Code:
private TransactionResult AddScoreTransaction(MutableData mutableData)
{
playerNewGlobalScore = false;
List<object> leaders = mutableData.Value as List<object>;
if (leaders == null)
{
leaders = new List<object>();
}
else if (mutableData.ChildrenCount >= LeaderBoardManager.Instance.MaxScoreRows)
{
// If the current list of scores is greater or equal to our maximum allowed number,
// we see if the new score should be added and remove the lowest existing score.
long minScore = long.MaxValue;
object minVal = null;
foreach (var child in leaders)
{
if (!(child is Dictionary<string, object>))
continue;
long childScore = (long)((Dictionary<string, object>)child)["score"];
if (childScore < minScore)
{
minScore = childScore;
minVal = child;
}
}
// If the new score is lower than the current minimum, we abort.
if (minScore > Score)
{
return TransactionResult.Abort();
}
// Otherwise, we remove the current lowest to be replaced with the new score.
leaders.Remove(minVal);
}
// Now we add the new score as a new entry that contains the email address and score.
Dictionary<string, object> newScoreMap = new Dictionary<string, object>();
newScoreMap["name"] = Name;
newScoreMap["country"] = Country;
newScoreMap["photoUrl"] = PhotoUrl;
newScoreMap["level"] = Level;
newScoreMap["userId"] = UserId;
newScoreMap["score"] = Score;
leaders.Add(newScoreMap);
// You must set the Value to indicate data at that location has changed.
mutableData.Value = leaders;
playerNewGlobalScore = true;
return TransactionResult.Success(mutableData);
}
Real time database example: (database structure screenshot not included)
Firebase does not officially support C#, and they seem happy about that (see https://stackshare.io/stackups/firebase-vs-pusher-vs-signalr); Firestore does.
They suggest you go to https://groups.google.com/forum/#!forum/firebase-talk
However, https://github.com/step-up-labs/firebase-database-dotnet from Step Up Labs, Inc. (the Firebase package on NuGet) is useful for all the things you want to do over the REST API (https://firebase.google.com/docs/reference/rest/database).
In the underbelly of the docs is: "Conditional requests, the REST equivalent of SDK Transaction Operations, update data according to a certain condition. See an overview of the workflow and learn more about conditional requests for REST in Saving Data." That section can be found at https://firebase.google.com/docs/database/rest/save-data#section-conditional-requests
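To illustrate what those conditional requests look like over plain HTTP (a sketch only; the database URL, path, and payload are assumptions): read the node with an ETag, modify it locally, then write it back with if-match so the write fails if someone else changed it in the meantime.
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
static async Task<bool> TryConditionalWriteAsync(HttpClient http, string json)
{
    // Hypothetical database URL and path.
    var url = "https://your-project.firebaseio.com/leaders.json";
    // 1. Read the current value and ask Firebase to include an ETag.
    var get = new HttpRequestMessage(HttpMethod.Get, url);
    get.Headers.Add("X-Firebase-ETag", "true");
    var getResponse = await http.SendAsync(get);
    getResponse.Headers.TryGetValues("ETag", out var etags);
    var etag = etags?.FirstOrDefault();
    // 2. Write back only if nothing changed since the read; this is the REST
    //    equivalent of an SDK transaction attempt.
    var put = new HttpRequestMessage(HttpMethod.Put, url)
    {
        Content = new StringContent(json)
    };
    put.Headers.TryAddWithoutValidation("if-match", etag);
    var putResponse = await http.SendAsync(put);
    // A 412 Precondition Failed response means the data changed underneath you;
    // re-read, re-apply your change, and retry.
    return putResponse.IsSuccessStatusCode;
}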
Hope that is of some help.
The following code results in deletions instead of updates.
My question is: is this a bug in the way I'm coding against Entity Framework or should I suspect something else?
Update: I got this working, but I'm leaving the question up with both the original and the working versions, in hopes that I can learn something I didn't understand about EF.
In the original, non-working code below, when the database is fresh all of the SearchDailySummary additions succeed, but on the second run, when the code was supposed to perform updates, the net result is once again an empty table in the database, i.e. this logic manages to be the equivalent of removing each entity.
//Logger.Info("Upserting SearchDailySummaries..");
using (var db = new ClientPortalContext())
{
foreach (var item in items)
{
var campaignName = item["campaign"];
var pk1 = db.SearchCampaigns.Single(c => c.SearchCampaignName == campaignName).SearchCampaignId;
var pk2 = DateTime.Parse(item["day"].Replace('-', '/'));
var source = new SearchDailySummary
{
SearchCampaignId = pk1,
Date = pk2,
Revenue = decimal.Parse(item["totalConvValue"]),
Cost = decimal.Parse(item["cost"]),
Orders = int.Parse(item["conv1PerClick"]),
Clicks = int.Parse(item["clicks"]),
Impressions = int.Parse(item["impressions"]),
CurrencyId = item["currency"] == "USD" ? 1 : -1 // NOTE: non USD (if exists) -1 for now
};
var target = db.Set<SearchDailySummary>().Find(pk1, pk2) ?? new SearchDailySummary();
if (db.Entry(target).State == EntityState.Detached)
{
db.SearchDailySummaries.Add(target);
addedCount++;
}
else
{
// TODO?: compare source and target and change the entity state to unchanged if no diff
updatedCount++;
}
AutoMapper.Mapper.Map(source, target);
itemCount++;
}
Logger.Info("Saving {0} SearchDailySummaries ({1} updates, {2} additions)", itemCount, updatedCount, addedCount);
db.SaveChanges();
}
Here is the working version. I'm not 100% sure it's optimized, but it's working reliably and performing fine, as long as I batch it in groups of 500 or fewer items at a time; beyond that it slows down exponentially, but I think that may be a different question/subject...
//Logger.Info("Upserting SearchDailySummaries..");
using (var db = new ClientPortalContext())
{
foreach (var item in items)
{
var campaignName = item["campaign"];
var pk1 = db.SearchCampaigns.Single(c => c.SearchCampaignName == campaignName).SearchCampaignId;
var pk2 = DateTime.Parse(item["day"].Replace('-', '/'));
var source = new SearchDailySummary
{
SearchCampaignId = pk1,
Date = pk2,
Revenue = decimal.Parse(item["totalConvValue"]),
Cost = decimal.Parse(item["cost"]),
Orders = int.Parse(item["conv1PerClick"]),
Clicks = int.Parse(item["clicks"]),
Impressions = int.Parse(item["impressions"]),
CurrencyId = item["currency"] == "USD" ? 1 : -1 // NOTE: non USD (if exists) -1 for now
};
var target = db.Set<SearchDailySummary>().Find(pk1, pk2);
if (target == null)
{
db.SearchDailySummaries.Add(source);
addedCount++;
}
else
{
AutoMapper.Mapper.Map(source, target);
db.Entry(target).State = EntityState.Modified;
updatedCount++;
}
itemCount++;
}
Logger.Info("Saving {0} SearchDailySummaries ({1} updates, {2} additions)", itemCount, updatedCount, addedCount);
db.SaveChanges();
}
The thing that keeps popping up in my mind is that maybe the Entry(entity) or Find(pk) method has some side effects. I should probably be consulting the documentation, but any advice is appreciated.
It's a slight assumption on my part (without looking into your models/entities), but have a look at what's going on within this block (see if the objects being attached here are related to the deletions):
if (db.Entry(target).State == EntityState.Detached)
{
db.SearchDailySummaries.Add(target);
addedCount++;
}
Your detached object won't be able to use its navigation properties to locate its related objects; you'll be re-attaching an object in a potentially conflicting state (without the correct relationships).
You haven't mentioned what is being deleted above, so I may be way off. I'm just heading out, so this is a little rushed; hope there's something useful in there.
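For comparison, here is a minimal sketch of the same upsert using EF6's CurrentValues.SetValues to copy the scalar values onto the tracked entity instead of AutoMapper; the entity and key variables follow the question's code, but treat this as an illustration rather than the poster's method:
var target = db.Set<SearchDailySummary>().Find(pk1, pk2);
if (target == null)
{
    // No row with this key yet: insert the new values.
    db.SearchDailySummaries.Add(source);
    addedCount++;
}
else
{
    // Row exists and is tracked by Find: copy the scalar properties so EF
    // marks the entity Modified and issues an UPDATE on SaveChanges().
    db.Entry(target).CurrentValues.SetValues(source);
    updatedCount++;
}
itemCount++;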
I need to match email sends with email bounces so I can find if they were delivered or not. The catch is, I have to limit the bounce to within 4 days of the send to eliminate matching the wrong send to the bounce. Send records are spread over a period of 30 days.
LinkedList<event_data> sent = GetMyHugeListOfSends(); //for example 1M+ records
List<event_data> bounced = GetMyListOfBounces(); //for example 150k records
bounced = bounced.OrderBy(o => o.event_date).ToList(); //this ensures the most accurate match of bounce to send (since we find the first match)
List<event_data> delivered = new List<event_data>();
event_data deliveredEmail = new event_data();
foreach (event_data sentEmail in sent)
{
event_data bounce = bounced.Find(item => item.email.ToLower() == sentEmail.email.ToLower() && (item.event_date > sentEmail.event_date && item.event_date < sentEmail.event_date.AddDays(deliveredCalcDelayDays)));
//create delivered records
if (bounce != null)
{
//there was a bounce! don't add a delivered record!
}
else
{
//if sent is not bounced, it's delivered
deliveredEmail.sid = siteid;
deliveredEmail.mlid = mlid;
deliveredEmail.mid = mid;
deliveredEmail.email = sentEmail.email;
deliveredEmail.event_date = sentEmail.event_date;
deliveredEmail.event_status = "Delivered";
deliveredEmail.event_type = "Delivered";
deliveredEmail.id = sentEmail.id;
deliveredEmail.number = sentEmail.number;
deliveredEmail.laststoretransaction = sentEmail.laststoretransaction;
delivered.Add(deliveredEmail); //add the new delivered
deliveredEmail = new event_data();
//remove bounce, it only applies to one send!
bounced.Remove(bounce);
}
if (bounced.Count() == 0)
{
break; //no more bounces to match!
}
}
So I did some testing and it's processing about 12 sent records per second. At 1M+ records, it will take 25+ hours to process!
Two questions:
How can I find the exact line that is taking the most time?
I am assuming it's the lambda expression finding the bounce that is taking the longest since this was much faster before I put that in there. How can I speed this up?
Thanks!
Edit
---Ideas---
One idea I just had is to sort the sends by date like I did the bounces so that the search through the bounces will be more efficient, since an early send would be likely to hit an early bounce as well.
Another idea I just had is to run a couple of these processes in parallel, although I would hate to multi-thread this simple application.
I would be reasonably confident in saying that yes, it is your Find that is taking the time.
It looks like you are certain that the Find method will return 0 or 1 records only (not a list). In that case, the way to speed this up is to create a lookup (a dictionary) instead: rather than a List<event_data> for your bounced variable, create a Dictionary<key, event_data>, then you can just look the value up by key instead of doing a Find.
The trick is in creating your key (I don't know enough about your app to help with that), but essentially it is the same criteria that is in your Find.
EDIT (adding some pseudo code):
void Main()
{
var hugeListOfEmails = GetHugeListOfEmails();
var allBouncedEmails = GetAllBouncedEmails();
var bouncedLookup = CreateLookupOfBouncedEmails(allBouncedEmails);
foreach(var info in hugeListOfEmails)
{
if(bouncedLookup.ContainsKey(info.emailAddress))
{
// Email is bounced;
}
else
{
// Email is not bounced
}
}
}
public IEnumerable<EmailInfo> GetHugeListOfEmails()
{
yield break;
}
public IEnumerable<EmailInfo> GetAllBouncedEmails()
{
yield break;
}
public IDictionary<string, EmailInfo> CreateLookupOfBouncedEmails(IEnumerable<EmailInfo> emailList)
{
var result = new Dictionary<string, EmailInfo>();
foreach(var e in emailList)
{
if(!result.ContainsKey(e.emailAddress))
{
if(true /* satisfies the date conditions */)
{
result.Add(e.emailAddress, e);
}
}
}
return result;
}
public class EmailInfo
{
public string emailAddress { get; set; }
public DateTime DateSent { get; set; }
}
You could improve this by using the ToLookup method to create a lookup table keyed by email address:
var bouncedLookup = bounced.ToLookup(k => k.email.ToLower());
and use this in the loop to look the email up first:
var filteredBounced = bouncedLookup[sentEmail.email.ToLower()];
// mini optimisation here
var endDate = sentEmail.event_date.AddDays(deliveredCalcDelayDays);
event_data bounce = filteredBounced.FirstOrDefault(item => item.event_date > sentEmail.event_date && item.event_date < endDate);
I could not compile it, but I think that should do. Please try it.
You are finding items in a list, which means it has to traverse the whole list, so it is an O(n) operation. Could you not store those sent emails in a Dictionary with the key being the email address you are searching on? Then go through the bounces, linking back to the emails in the dictionary. The lookup will be constant time and you go through the bounces once, so it is O(n) overall. Your current method is O(n^2).
Converting bounced to a SortedList might be a good solution:
SortedList<string, event_data> sl = new SortedList<string, event_data>(bounced.ToDictionary(s => s.email, s => s));
and to find a bounce use
var bounce = sl.Where(c => c.Key.Equals(sentEmail.email, StringComparison.OrdinalIgnoreCase) /* && the date conditions */).Select(c => c.Value).FirstOrDefault();
There's another concern about your code that I want to point out: memory consumption. I don't know your machine configuration, but here are some thoughts about the code:
Initially you are allocating space for 1.2M+ objects of the event_data type. I can't see the full event_data type definition, but assuming the emails are all unique, and seeing that the type has quite a few properties, I'd assume such a collection is rather heavy (possibly hundreds of megabytes).
Next you are allocating another batch of event_data objects (almost 1M if I've counted right), which makes memory consumption even heavier.
I don't know about the other objects in your application's data model, but considering all of the above you can easily get close to the memory limit of a 32-bit process and thus force the GC to run very often. In fact, you could easily have a GC collection after each call to bounced.Remove(bounce), and that really would slow your app down significantly.
So even if you have plenty of memory left and/or your app is 64-bit, I would try to minimize memory consumption; I'm pretty sure it would make your code run faster. For example, you could process each deliveredEmail completely without storing it, or load your initial event_data in chunks, etc.
On consideration, the number of bounces is relatively small, so why not pre-optimise the bounce lookup as much as possible? The code below makes a delegate for each possible bounce and groups them into a dictionary keyed by email.
private static bool DateInRange(
DateTime sentDate,
DateTime bouncedDate,
int deliveredCalcDelayDays)
{
if (bouncedDate <= sentDate)
{
return false;
}
return bouncedDate < sentDate.AddDays(deliveredCalcDelayDays);
}
static IEnumerable<event_data> GetDeliveredMails(
IEnumerable<event_data> sent,
IEnumerable<event_data> bounced,
int siteId,
int mlId,
int mId,
int deliveredCalcDelayDays)
{
var grouped = bounced.GroupBy(
b => b.email.ToLowerInvariant());
var lookup = grouped.ToDictionary(
g => g.Key,
g => g.OrderBy(e => e.event_date).Select(
e => new Func<DateTime, bool>(
s => DateInRange(s, e.event_date, deliveredCalcDelayDays))).ToList());
foreach (var s in sent)
{
var key = s.email.ToLowerInvariant();
List<Func<DateTime, bool>> checks;
if (lookup.TryGetValue(key, out checks))
{
var match = checks.FirstOrDefault(c => c(s.event_date));
if (match != null)
{
checks.Remove(match);
continue;
}
}
yield return new event_data
{
sid = siteId,
mlid = mlId,
mid = mId,
email = s.email,
event_date = s.event_date,
event_status = "Delivered",
event_type = "Delivered",
id = s.id,
number = s.number,
laststoretransaction = s.laststoretransaction
};
}
}
You could try pre-compiling the delegates in lookup if this is not fast enough.
OK, the final solution I found was a Dictionary for the bounces.
The sent LinkedList was sorted by sent_date so it would loop through in chronological order. That's important because I have to match the right send to the right bounce.
I made a Dictionary<string, List<event_data>>, so the key was the email and the value was a List of all event_data bounces for that email address. The List was sorted by event_date, since I wanted to make sure the first bounce was matched to the send.
Final result: it went from processing 700 records/minute to 500k+ records/second.
Here is the final code:
LinkedList<event_data> sent = GetMyHugeListOfSends();
IEnumerable<event_data> sentOrdered = sent.OrderBy(send => send.event_date);
Dictionary<string, List<event_data>> bounced = GetMyListOfBouncesAsDictionary();
List<event_data> delivered = new List<event_data>();
event_data deliveredEmail = new event_data();
List<event_data> bounces = null;
bool matchedBounce = false;
foreach (event_data sentEmail in sentOrdered)
{
matchedBounce = false;
//create delivered records
if (bounced.TryGetValue(sentEmail.email, out bounces))
{
//there was a bounce! find out if it was within 4 days after the send!
foreach (event_data bounce in bounces)
{
if (bounce.event_date > sentEmail.event_date &&
bounce.event_date <= sentEmail.event_date.AddDays(4))
{
matchedBounce = true;
//remove the record because a bounce can only match once back to a send
bounces.Remove(bounce);
if(bounces.Count == 0) //no more bounces for this email
{
bounced.Remove(sentEmail.email);
}
break;
}
}
if (matchedBounce == false) //no matching bounces in the list!
{
//if sent is not bounced, it's delivered
deliveredEmail.sid = siteid;
deliveredEmail.mlid = mlid;
deliveredEmail.mid = mid;
deliveredEmail.email = sentEmail.email;
deliveredEmail.event_date = sentEmail.event_date;
deliveredEmail.event_status = "Delivered";
deliveredEmail.event_type = "Delivered";
deliveredEmail.id = sentEmail.id;
deliveredEmail.number = sentEmail.number;
deliveredEmail.laststoretransaction = sentEmail.laststoretransaction;
delivered.Add(deliveredEmail); //add the new delivered
deliveredEmail = new event_data();
}
}
else
{
//if sent is not bounced, it's delivered
deliveredEmail.sid = siteid;
deliveredEmail.mlid = mlid;
deliveredEmail.mid = mid;
deliveredEmail.email = sentEmail.email;
deliveredEmail.event_date = sentEmail.event_date;
deliveredEmail.event_status = "Delivered";
deliveredEmail.event_type = "Delivered";
deliveredEmail.id = sentEmail.id;
deliveredEmail.number = sentEmail.number;
deliveredEmail.laststoretransaction = sentEmail.laststoretransaction;
delivered.Add(deliveredEmail); //add the new delivered
deliveredEmail = new event_data();
}
if (bounced.Count() == 0)
{
break; //no more bounces to match!
}
}
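For completeness, a minimal sketch of how the bounce dictionary used above (GetMyListOfBouncesAsDictionary) might be built from the original bounce list; the grouping and sorting mirror the description above, but the body is an illustration, not the original code:
Dictionary<string, List<event_data>> GetMyListOfBouncesAsDictionary()
{
    List<event_data> bounced = GetMyListOfBounces();
    // Group bounces by email (case-insensitive) and keep each group sorted by
    // event_date so the earliest bounce is the first one matched to a send.
    return bounced
        .GroupBy(b => b.email, StringComparer.OrdinalIgnoreCase)
        .ToDictionary(
            g => g.Key,
            g => g.OrderBy(b => b.event_date).ToList(),
            StringComparer.OrdinalIgnoreCase);
}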