Grouping usage data per week

I am trying to create a usage report that plots the number of files created per week by each user. I want to put the data into a DataTable so I can use it in a Chart. The path I'm headed down is clunky, and I'm guessing there is a much more elegant way to do this in LINQ.
The File class has an OpenDate and a LastModUser value. I want to count the files created in each week for each user. In table form the data looks like this:
File#  OpenDate  LastModUser
1      1/1/2015  ASmith
2      1/2/2015  ASmith
3      1/2/2015  DJones
4      1/2/2015  CBanks
The query should return:
Week#  ASmith  DJones  CBanks
1      2       1       1
2      etc     etc     etc
Here is what I have thus far.
public static DataTable GetFileCountByUserByClient(Int32 clientID, Int32 weeks)
{
    using (EtaDataModelContainer12 etaDbContext = new EtaDataModelContainer12())
    {
        DataTable table = new DataTable();
        table.Columns.Add("week", typeof(Int32));
        // These are all the client's files
        List<File> files = etaDbContext.Files.Where(b => b.ClientClientId == clientID).ToList();
        // Get a list of all users
        List<string> users = files.GroupBy(b => b.LastModUser).Select(b => b.Key).Distinct().ToList();
        // Start at 0 so the first user also gets a column
        for (int i = 0; i < users.Count; i++)
        {
            // Create one column per user
            table.Columns.Add(users[i], typeof(string));
        }
        // Add rows to the table based on how many files created in a given week
        DateTime today = DateTime.Today;
        int filecount = 0;
        // Loop through the number of selected weeks (rows in DataTable) and populate sums
        for (int j = 0; j < weeks; j++)
        {
            // Look at each file and determine if it fits in the selected week
            foreach (File item in files)
            {
                // If a match is found determine what column in the DataTable should be incremented
            }
            table.Rows.Add(j, filecount);
        }
        return table;
    }
}
There has to be a more elegant way to do this.

I think this can solve your problem. You may need to tune some constants to fit it to your grid. The week key returned by this query is the number of whole weeks since DateTime.MinValue (1 January of year 1), not the week of the year.
const long TicksPerWeek = TimeSpan.TicksPerDay * 7;
// one group per user
var userFiles = files.GroupBy(f => f.LastModUser);
// for each user, count the files that fall into each week
var userStats = userFiles.Select(u =>
    u.GroupBy(f => f.OpenDate.Ticks / TicksPerWeek)
     .Select(w => new { week = w.Key, modifiedCount = w.Count() }));
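To get from per-user week counts to the Week#/user grid the question asks for, the counts still need to be pivoted into a DataTable. The following is only a sketch under assumptions: FileRecord is a hypothetical stand-in for the question's File entity, week 1 is assumed to start at a caller-supplied firstWeekStart, and value tuples require C# 7 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class WeeklyUsage
{
    // Hypothetical stand-in for the question's File entity.
    public sealed class FileRecord
    {
        public DateTime OpenDate { get; set; }
        public string LastModUser { get; set; }
    }

    // Builds one row per week and one integer column per user.
    public static DataTable BuildTable(IEnumerable<FileRecord> files, DateTime firstWeekStart, int weeks)
    {
        var list = files.ToList();
        var users = list.Select(f => f.LastModUser).Distinct().OrderBy(u => u).ToList();

        var table = new DataTable();
        table.Columns.Add("week", typeof(int));
        foreach (var user in users)
            table.Columns.Add(user, typeof(int));

        // Count files per (week number, user); files outside the requested range are ignored.
        var counts = list
            .Select(f => new
            {
                Week = (int)Math.Floor((f.OpenDate.Date - firstWeekStart.Date).TotalDays / 7) + 1,
                f.LastModUser
            })
            .Where(x => x.Week >= 1 && x.Week <= weeks)
            .GroupBy(x => (x.Week, x.LastModUser))
            .ToDictionary(g => g.Key, g => g.Count());

        for (int week = 1; week <= weeks; week++)
        {
            DataRow row = table.NewRow();
            row["week"] = week;
            foreach (var user in users)
                row[user] = counts.TryGetValue((week, user), out int n) ? n : 0;
            table.Rows.Add(row);
        }
        return table;
    }
}
```

Grouping once and looking counts up in a dictionary avoids rescanning the file list for every week and user.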

Related

Excel List<List<string>> per each row

I have a small program where you can select some database tables and create an Excel file with all values for each table. This is my solution for creating the Excel file:
foreach (var selectedDatabase in this.lstSourceDatabaseTables.SelectedItems)
{
    //creates a new worksheet foreach selected table
    foreach (TableRetrieverItem databaseTable in tableItems.FindAll(e => e.TableName.Equals(selectedDatabase)))
    {
        _xlWorksheet = (Excel.Worksheet) xlApp.Worksheets.Add();
        _xlWorksheet.Name = databaseTable.TableName.Length > 31 ? databaseTable.TableName.Substring(0, 31) : databaseTable.TableName;
        _xlWorksheet.Cells[1, 1] = string.Format("{0}.{1}", databaseTable.TableOwner, databaseTable.TableName);
        ColumnRetriever retrieveColumn = new ColumnRetriever(SourceConnectionString);
        IEnumerable<ColumnRetrieverItem> dbColumns = retrieveColumn.RetrieveColumns(databaseTable.TableName);
        var results = retrieveColumn.GetValues(databaseTable.TableName);
        // results.Item3 is a List<List<string>> that contains all values from a table;
        // each table row is stored as its own inner list
        for (int j = 0; j < results.Item3.Count(); j++)
        {
            int tmp = 1;
            foreach (var value in results.Item3[j])
            {
                _xlWorksheet.Cells[j + 3, tmp] = value;
                tmp++;
            }
        }
    }
}
It works, but when a table has 5,000 or more values it takes a very long time.
Does anyone know a better way to write the List<List<string>> row by row than my for/foreach solution?

My code sample uses a GetExcelColumnName helper function to convert a column count to the Excel column name.
The whole idea is that writing Excel cells one by one is very slow. Instead, precompute the whole table of values and assign the result in a single operation. To assign values to a two-dimensional range, use a two-dimensional array of values:
var rows = results.Item3.Count;
var cols = results.Item3.Max(x => x.Count);
object[,] values = new object[rows, cols];
// copy the list of rows into the rectangular array
for (int r = 0; r < rows; r++)
    for (int c = 0; c < results.Item3[r].Count; c++)
        values[r, c] = results.Item3[r][c];
// get the appropriate range; w is the worksheet (_xlWorksheet in the question), data starts in row 3
Range range = w.Range["A3", GetExcelColumnName(cols) + (rows + 2)];
// assign all values at once
range.Value = values;
You may need to adjust some details of the index ranges; I can't test my code right now.
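The answer relies on a GetExcelColumnName helper without showing it. A common way to write such a helper, together with the copy from the nested list into a rectangular array, might look like the sketch below. Both method bodies are assumptions for illustration, not the answerer's actual code, and nothing here touches the Excel interop API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExcelBatch
{
    // Converts a 1-based column number to Excel letters: 1 -> "A", 27 -> "AA".
    public static string GetExcelColumnName(int columnNumber)
    {
        if (columnNumber < 1)
            throw new ArgumentOutOfRangeException(nameof(columnNumber));

        string name = string.Empty;
        while (columnNumber > 0)
        {
            int remainder = (columnNumber - 1) % 26;
            name = (char)('A' + remainder) + name;
            columnNumber = (columnNumber - 1) / 26;
        }
        return name;
    }

    // Copies a list of rows into a rectangular array; short rows are padded with null.
    public static object[,] ToRectangular(List<List<string>> rows)
    {
        int cols = rows.Count == 0 ? 0 : rows.Max(r => r.Count);
        var values = new object[rows.Count, cols];
        for (int r = 0; r < rows.Count; r++)
            for (int c = 0; c < rows[r].Count; c++)
                values[r, c] = rows[r][c];
        return values;
    }
}
```

The resulting array can then be assigned to a range of matching size in one call, which is what removes the per-cell overhead.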

As far as I can see, you didn't do any profiling. I recommend profiling first (for example with dotTrace) to see which parts of your code are actually causing the performance issues.
In my experience it is rare (almost unheard of) for code to execute slower than the database requests, even if the code is really awful in algorithmic terms.
First, I recommend filling your Excel sheet by rows, not by columns. If your table has many columns, filling by columns causes multiple round trips to the database, which has a big impact on performance.
Second, write to Excel in batches, row by row. Think of Excel files as mini-databases where the same "batch is faster than one by one" principle applies.

Iterate through list in paged manner, within another loop

I have three collections. First, a collection of days. Next, a collection of time spans in each day. These time spans are the same for each day. Next, I have a collection of sessions.
There are 4 days. There are 6 time spans. There are 30 sessions.
I need to iterate through each day, assigning all of the time spans to each day the same way for each day. However, I need to assign the sessions to time blocks in sequence. For example, day 1 gets all 6 time spans, but only the first 6 sessions, 1-6. Day 2 gets the same time spans, but gets the next 6 sessions, 7-12.
How can I do this within the same method?
Here's what I have so far, but I'm having trouble wrapping my head around the paged iteration part.
var timeSlots = TimeSlotDataAccess.GetItems(codeCampId);
// number of time slots that sessions can be assigned to
var assignableSlotCount = timeSlots.Count(t => !t.SpanAllTracks);
// determine how many days the event lasts for
agenda.NumberOfDays = (int)(agenda.CodeCamp.EndDate - agenda.CodeCamp.BeginDate).TotalDays;
// iterate through each day
agenda.EventDays = new List<EventDayInfo>(agenda.NumberOfDays);
var dayCount = 0;
while (dayCount <= agenda.NumberOfDays)
{
    var eventDate = agenda.CodeCamp.BeginDate.AddDays(dayCount);
    var eventDay = new EventDayInfo()
    {
        Index = dayCount,
        Day = eventDate.Day,
        Month = eventDate.Month,
        Year = eventDate.Year,
        TimeStamp = eventDate
    };
    // iterate through each timeslot
    foreach (var timeSlot in timeSlots)
    {
        var slot = new AgendaTimeSlotInfo(timeSlot);
        // iterate through each session
        // first day gets the first set of assignableSlotCount, then the next iteration gets the next set of that count, etc.
        slot.Sessions = SessionDataAccess.GetItemsByTimeSlotId(slot.TimeSlotId, codeCampId).ToList();
        // iterate through each speaker
        foreach (var session in slot.Sessions)
        {
            session.Speakers = SpeakerDataAccess.GetSpeakersForCollection(session.SessionId, codeCampId);
        }
    }
    agenda.EventDays.Add(eventDay);
    dayCount++;
}

I ended up using LINQ in a new method based on the GetItemsByTimeSlotId() method. The new signature, and an example of getting a matching subset of that collection, are below.
Here's how I'm calling it:
slot.Sessions = SessionDataAccess.GetItemsByTimeSlotIdByPage(slot.TimeSlotId,
    codeCampId, dayCount + 1, timeSlotCount).ToList();
Here's what it looks like:
public IEnumerable<SessionInfo> GetItemsByTimeSlotIdByPage(int timeSlotId, int codeCampId, int pageNumber, int pageSize)
{
    var items = repo.GetItems(codeCampId).Where(t => t.TimeSlotId == timeSlotId);
    // this is the important part: skip the earlier pages, take one page,
    // and materialize it so the updates below are made on the objects that get returned
    var resultSet = items.Skip(pageSize * (pageNumber - 1)).Take(pageSize).ToList();
    foreach (var item in resultSet)
    {
        // a bare items.Select(...) is lazy and has no effect unless its result is enumerated,
        // so the registrant count is set here instead
        item.RegistrantCount = GetRegistrantCount(item.SessionId);
        item.Speakers = speakerRepo.GetSpeakersForCollection(item.SessionId, item.CodeCampId);
    }
    return resultSet;
}
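The paging step itself is independent of the session repository. As a minimal sketch, assuming 1-based page numbers as in the call above (the Paging class and GetPage name are made up for illustration), Skip and Take can be wrapped like this:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Paging
{
    // Returns the items of the requested 1-based page as a materialized list.
    public static List<T> GetPage<T>(IEnumerable<T> source, int pageNumber, int pageSize)
    {
        if (pageNumber < 1) throw new ArgumentOutOfRangeException(nameof(pageNumber));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));

        return source
            .Skip(pageSize * (pageNumber - 1)) // skip all earlier pages
            .Take(pageSize)                    // take at most one page
            .ToList();
    }
}
```

With 30 sessions and 6 time spans per day, GetPage(sessions, 2, 6) yields sessions 7 through 12, which matches the day 2 assignment described in the question.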

Best way to remove duplicates from DataTable depending on column values

I have a DataSet which contains just one Table, so you could say I'm working with a DataTable here.
The code you see below works, but I want to have the best and most efficient way to perform the task because I work with some data here.
Basically, the data from the Table should later be in a Database, where the primary key - of course - must be unique.
The primary key of the data I work with is in a column called Computer Name. For each entry we also have a date in another column called date.
I wrote a function that searches for duplicates in the Computer Name column and then compares the dates of these duplicates to delete all but the newest.
The function I wrote looks like this:
private void mergeduplicate(DataSet importedData)
{
    Dictionary<String, List<DataRow>> systems = new Dictionary<String, List<DataRow>>();
    DataSet importedDataCopy = importedData.Copy();
    importedData.Tables[0].Clear();
    foreach (DataRow dr in importedDataCopy.Tables[0].Rows)
    {
        String systemName = dr["Computer Name"].ToString();
        if (!systems.ContainsKey(systemName))
        {
            systems.Add(systemName, new List<DataRow>());
        }
        systems[systemName].Add(dr);
    }
    foreach (KeyValuePair<String, List<DataRow>> entry in systems) {
        if (entry.Value.Count > 1) {
            int firstDataRowIndex = 0;
            int secondDataRowIndex = 1;
            while (entry.Value.Count > 1) {
                DateTime time1 = Validation.ConvertStringIntoDateTime(entry.Value[firstDataRowIndex]["date"].ToString());
                DateTime time2 = Validation.ConvertStringIntoDateTime(entry.Value[secondDataRowIndex]["date"].ToString());
                //delete older entry: if time1 is later than or equal to time2, the second row is the older one
                if (DateTime.Compare(time1, time2) >= 0) {
                    entry.Value.RemoveAt(secondDataRowIndex);
                } else {
                    entry.Value.RemoveAt(firstDataRowIndex);
                }
            }
        }
        importedData.Tables[0].ImportRow(entry.Value[0]);
    }
}
My Question is, since this code works - what is the best and fastest/most efficient way to perform the task?
I appreciate any answers!

I think this can be done more efficiently. You copy the DataSet once with DataSet importedDataCopy = importedData.Copy(); and then you copy it again into a dictionary and then you delete the unnecessary data from the dictionary. I would rather just remove the unnecessary information in one pass. What about something like this:
private void mergeduplicate(DataSet importedData)
{
    Dictionary<String, DataRow> systems = new Dictionary<String, DataRow>();
    int i = 0;
    while (i < importedData.Tables[0].Rows.Count)
    {
        DataRow dr = importedData.Tables[0].Rows[i];
        String systemName = dr["Computer Name"].ToString();
        if (!systems.ContainsKey(systemName))
        {
            systems.Add(systemName, dr);
            // Nothing was removed, so move on to the next row.
            i++;
        }
        else
        {
            // Existing date is the date in the dictionary.
            DateTime existing = Validation.ConvertStringIntoDateTime(systems[systemName]["date"].ToString());
            // Candidate date is the date of the current DataRow.
            DateTime candidate = Validation.ConvertStringIntoDateTime(dr["date"].ToString());
            // If the candidate date is greater than the existing date then replace the existing DataRow
            // with the candidate DataRow and delete the existing DataRow from the table.
            if (DateTime.Compare(existing, candidate) < 0)
            {
                importedData.Tables[0].Rows.Remove(systems[systemName]);
                systems[systemName] = dr;
            }
            else
            {
                importedData.Tables[0].Rows.Remove(dr);
            }
            // A row at or before index i was removed, so the next unvisited row
            // has shifted into index i; do not increment here.
        }
    }
}
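Removing rows while walking the table by index is easy to get wrong, because every removal shifts the rows behind it. One way to sidestep that is to collect the stale rows first and remove them afterwards. The sketch below assumes the date column is typed as DateTime; the question stores dates as strings and converts them with its own Validation.ConvertStringIntoDateTime, so that conversion would have to be added. The class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class Deduplicator
{
    // Keeps only the newest row per key value and removes all others in place.
    public static void KeepNewestPerKey(DataTable table, string keyColumn, string dateColumn)
    {
        var newest = new Dictionary<string, DataRow>();
        var stale = new List<DataRow>();

        foreach (DataRow row in table.Rows)
        {
            string key = row[keyColumn].ToString();
            if (!newest.TryGetValue(key, out DataRow kept))
            {
                newest[key] = row;
            }
            else if ((DateTime)row[dateColumn] > (DateTime)kept[dateColumn])
            {
                stale.Add(kept);   // the row kept so far is older
                newest[key] = row;
            }
            else
            {
                stale.Add(row);    // the current row is older or equally old
            }
        }

        // Remove only after the enumeration has finished.
        foreach (DataRow row in stale)
            table.Rows.Remove(row);
    }
}
```

This makes a single pass over the rows and never copies the table.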

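Maybe not the most efficient way, but you said you appreciate any answers:
List<DataRow> toDelete = dt.Rows.Cast<DataRow>()
    .GroupBy(s => s["Computer Name"].ToString())
    // order newest first, so Skip(1) selects every row except the newest
    .SelectMany(grp => grp.OrderByDescending(x => Validation.ConvertStringIntoDateTime(x["date"].ToString()))
        .Skip(1)).ToList();
toDelete.ForEach(x => dt.Rows.Remove(x));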

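You could try to use CopyToDataTable. The Tables indexer of a DataSet is read-only, so the result goes into a new DataTable:
DataTable deduplicated = importedData.Tables[0].AsEnumerable()
    // group by computer name only; including the date in the key would keep every row
    .GroupBy(r => r["Computer Name"].ToString())
    .Select(g => g.OrderByDescending(r => Validation.ConvertStringIntoDateTime(r["date"].ToString())).First())
    .CopyToDataTable();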

Why am I getting index out of bounds error from database

I know what an index-out-of-bounds error is, and when I debug I can see why it happens. Basically, I filter my database for records that are potential/pending. I then gather an array of those numbers and send them to another server to check whether they have been upgraded to a sale. If one has been upgraded, the server responds with the new Sales Order ID and my old pending Sales Order ID (SourceID). I then loop over that list, filter down to the specific SourceID, update the SourceID to be the Sales Order ID, and change a couple of other values.
The problem is that when I apply that filter for the very first record, it throws an index-out-of-bounds error. I check the results returned by the filter and the count is 0, which I find strange, because I took the sales order number from the list, so it should be there. Here is the code in question that throws the error. It doesn't happen every time: I ran the code this morning and it didn't throw the error, but last night it did before I went home.
filter.RowFilter = string.Format("Stage = '{0}'", Potential.PotentialSale);
if (filter.Count > 0)
{
    var Soids = new int[filter.Count];
    Console.Write("Searching for Soids - (");
    for (int i = 0; i < filter.Count; i++)
    {
        Console.Write(filter[i][1].ToString() + ",");
        Soids[i] = (int)filter[i][1];
    }
    Console.WriteLine(")");
    var pendingRecords = Server.GetSoldRecords(Soids);
    var updateRecords = new NameValueCollection();
    for (int i = 0; i < pendingRecords.Length; i++)
    {
        filter.RowFilter = "Soid = " + pendingRecords[i][1];
        filter[0].Row["Soid"] = pendingRecords[i][0];
        filter[0].Row["SourceId"] = pendingRecords[i][1];
        filter[0].Row["Stage"] = Potential.ClosedWon;
        var potentialXML = Potential.GetUpdatePotentialXML(filter[0].Row["Soid"].ToString(), filter[0].Row["Stage"].ToString());
        updateRecords.Add(filter[0].Row["ZohoID"].ToString(), potentialXML);
    }
}
If I'm counting right, line 17 is where the error is thrown. pendingRecords is an object[][] array, and pendingRecords[i] is an individual record: pendingRecords[i][0] is the new Sales Order ID (SOID) and pendingRecords[i][1] is the old SOID (now the SourceID).
Any help on this one? Is it because I'm changing the SOID to the new SOID and the filter updates itself automatically? I just don't know.

Well, I ended up changing how it works altogether, and it actually sorts things out a bit more nicely now. The code I am about to post has a bunch of hard-coded numbers due to the structure of the table that is returned. Sorry about that. I have since learned not to do that, but I am working on a different project now and will change it when I next have to change the program. Here is the solution.
    var potentials = Server.GetNewPotentials(); //loads all records from server
    for (int i = 0; i < potentials.Length; i++)
    {
        var filter = AllPotentials.DefaultView;
        var result1 = CheckSoidOrSource(potentials[i].Soid, true);
        var result2 = CheckSoidOrSource(potentials[i].SourceID, false);
        //This potential can't be found at all so let's add it to our table
        if (result1 + result2 == 0)
        {
            Logger.WriteLine("Found new record. Adding it to DataTable and sending it to Zoho");
            AllPotentials.Add(potentials[i]);
            filter.RowFilter = string.Format("Soid = '{0}'", potentials[i].SourceID);
            var index = AllPotentials.Rows.IndexOf(filter[0].Row);
            ZohoPoster posterInsert = new ZohoPoster(Zoho.Fields.Potentials, Zoho.Calls.insertRecords);
            AllPotentials.Rows[index]["ZohoID"] = posterInsert.PostNewPotentialRecord(3, filter[0].Row);
        }
        //This potential is not found, but has a SourceId that matches a Soid of another record.
        if (result1 == 0 && result2 == 1)
        {
            Logger.WriteLine("Found a record that needs to be updated on Zoho");
            ZohoPoster posterUpdate = new ZohoPoster(Zoho.Fields.Potentials, Zoho.Calls.updateRecords);
            filter.RowFilter = string.Format("Soid = '{0}'", potentials[i].SourceID);
            var index = AllPotentials.Rows.IndexOf(filter[0].Row);
            AllPotentials.Rows[index]["Soid"] = potentials[i].Soid;
            AllPotentials.Rows[index]["SourceId"] = potentials[i].SourceID;
            AllPotentials.Rows[index]["PotentialStage"] = potentials[i].PotentialStage;
            AllPotentials.Rows[index]["UpdateRecord"] = true;
            AllPotentials.Rows[index]["Amount"] = potentials[i].Amount;
            AllPotentials.Rows[index]["ZohoID"] = posterUpdate.UpdatePotentialRecord(3, filter[0].Row);
        }
    }
    AllPotentials.AcceptChanges();
}
private int CheckSoidOrSource(string Soid, bool checkSource)
{
    var filter = AllPotentials.DefaultView;
    if (checkSource)
        filter.RowFilter = string.Format("Soid = '{0}' OR SourceId = '{1}'", Soid, Soid);
    else
        filter.RowFilter = string.Format("Soid = '{0}'", Soid);
    return filter.Count;
}
Basically, I noticed something about my data when I filter it this way. The two checks only ever return one of the following combinations: (0,0), (0,1) and (1,0).
(0,0) means that the record doesn't exist in this table at all, so I need to add it.
(1,0) means that the Sales Order ID (Soid) matches another Soid in the table, so it already exists.
(0,1) means that the Soid doesn't exist in this table, but I found a record that has the Soid as its source. To me that means the record that had it as a source has been upgraded from a potential to a sale, which in turn means I have to update the record and Zoho.
This worked out to much less work for me, because now I don't have to search for won and lost records; I only have to search for lost records. Less code with the same results is always a good thing :)
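The failing line in the question reads filter[0] immediately after changing RowFilter. A DataView whose filter matches no rows has a Count of 0, and indexing it then throws the out-of-range error. A small guard makes that case explicit. This is a sketch only; the ViewLookup class and FindFirst name are hypothetical and not part of the poster's code.

```csharp
using System.Data;

public static class ViewLookup
{
    // Returns the first row matching the filter expression, or null when nothing matches.
    public static DataRow FindFirst(DataTable table, string rowFilter)
    {
        // A private DataView leaves table.DefaultView.RowFilter untouched,
        // so other code that shares the default view is not affected.
        var view = new DataView(table) { RowFilter = rowFilter };
        return view.Count > 0 ? view[0].Row : null;
    }
}
```

A caller can then log or skip a pending record whose Soid is not found instead of crashing partway through the loop.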

need help with comparison of datatables and finding out the difference

DataTable 1 :-
Caption |ID
------------
Caption1|1
Caption2|2
Caption3|3
DataTable 2 :-
Name |ID
------------
Name1|1
Name2|2
I want to compare the two DataTables above and fetch the value "Caption3" so I can display the message "No name for 'Caption3' exists!" on screen.
I have tried merging as follows, but dtTemp just ends up containing DataTable 2 as it is:
datatable1.Merge(datatable2);
DataTable dtTemp = datatable2.GetChanges();
I also tried the logic below, which is meant to remove the rows whose IDs occur in both tables so that only the rows without a matching ID remain. This didn't work either. :(
if (datatable2.Rows.Count != datatable1.Rows.Count)
{
    if (datatable2.Rows.Count != 0)
    {
        for (int k = 0; k < datatable2.Rows.Count; k++)
        {
            for (int j = 0; j < datatable1.Rows.Count; j++)
            {
                if (datatable1.Rows[j]["ID"].ToString() == datatable1.Rows[k]["ID"].ToString())
                {
                    datatable1.Rows.Remove(datatable1.Rows[j]);
                    datatable1.AcceptChanges();
                }
                // string test = datatable1.Rows[0]["ID"].ToString();
            }
        }
    }
}
How do I fetch those captions whose corresponding names do not exist? Please help, thanks.
Note: The rows in both DataTables will vary based on some logic. What I want is to fetch each caption from datatable1 whose KCID doesn't exist in datatable2.
Edit:
How else do I loop through datatable1's rows, check which IDs from datatable1 don't exist in datatable2, and then print those captions on my page?
@CodeInChaos: I have not worked with LINQ to Objects at all, so I am not able to understand your code :/
Is there any other way to loop through that DataTable and fetch each caption whose corresponding name doesn't exist in datatable2?
Someone please help me out with this. I am clueless about how else to loop through the DataTable rows if not like above.

Note that this code uses LINQ to Objects and thus needs the whole tables loaded into your application. There might be a better solution that works on the database server. But unlike your code, it is linear in the size of the tables.
HashSet<int> ids2 = new HashSet<int>(Table2.Select(e => e.ID));
var entriesOnlyInTable1 = Table1.Where(e => !ids2.Contains(e.ID));
IEnumerable<string> captionsOnlyInTable1 = entriesOnlyInTable1.Select(e => e.Caption);
Without LINQ:
HashSet<int> ids2 = new HashSet<int>();
foreach (var e in Table2)
{
    ids2.Add(e.ID);
}
List<string> captionsOnlyInTable1 = new List<string>();
foreach (var e in Table1)
{
    if (!ids2.Contains(e.ID))
        captionsOnlyInTable1.Add(e.Caption);
}
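The snippets above are written against typed entities (e.ID, e.Caption), while the question works with plain DataTable rows. A sketch of the same hash-set idea applied directly to DataTables follows. It assumes the ID columns are typed as int and uses the column names from the question; the TableDiff class and CaptionsWithoutName method are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class TableDiff
{
    // Returns the captions whose ID has no matching row in the names table.
    public static List<string> CaptionsWithoutName(DataTable captions, DataTable names)
    {
        // Collect the IDs that do have a name; HashSet lookups are O(1) on average.
        var knownIds = new HashSet<int>(
            names.Rows.Cast<DataRow>().Select(r => (int)r["ID"]));

        return captions.Rows.Cast<DataRow>()
            .Where(r => !knownIds.Contains((int)r["ID"]))
            .Select(r => (string)r["Caption"])
            .ToList();
    }
}
```

Each returned caption can then be formatted into the message the question wants to display.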
