How do I merge adjacent time spans with the same data value? - c#

I have a comma-separated values (CSV) file in this format:
26/09/2015,GROUP_1,0,0,0,0,0,0,0,0,0,0,12345.006,12345.006,27469.005,27469.005,27983.005,27983.005,28081.005,0,0,0,28105.005,28105.005,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Every number represents the work hours over an interval of 15 minutes, over the range 8:00 am - 8:00 pm. The first start time is 08:00:00 and the last start time is 19:45:00; there are 49 "columns" of data.
0,0,0,0,0,0,0,0,0,0,12345.006,12345.006,27469.005,27469.005,27983.005,27983.005,28081.005,0,0,0,28105.005,28105.005,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
The date will be the date when the "event" happened and it's the date on the data. But I need to get the values that are the same and specify a time range. For example, those first two non-zero values are the same:
12345.006,12345.006
These start at 10:30 and 10:45; I need to merge these and report 12345 hours for the time span 10:30 - 11:00 am.
I read the file; I have those values as an array, and the problem I'm having is how to "group" the same values into the appropriate time ranges.
DateTime startDate = new DateTime(2015,08,05);
DateTime finisDahte = new DateTime(2015,08,05);
int column = 0;
for (int i = 0; i < data.Length; i++)
{
//timerange start with every 15 minutes by column
if (column >= 2)
{
if (data[i] != "0")
{
//Getting rid of decimals, they are not neccesary and that's how the file have it, I dont know why
if (data[i].Contains('.'))
{
data[i] = data[i].Substring(0, data[i].LastIndexOf('.'));
}
//we check if there is a next index to compare the same value
if ((i + 1) <= totalElementos)
{
var nextElem = data[i + 1];
if (nextElem != "0")
{
nextElem.Substring(0, nextElem.LastIndexOf('.'));
}
else
{
//the is no next element... something here
}
//CRUCIAL PART: if the current index it's the same as the next one, it means they share the time range
if (data[i] == nextElem)
{
//the same index as the next one
//I need to identify when it's the first time I'm comparing a value with the next one, so I can set a start date
//I need to sum the total amount of time ranges for every repetition they have and save when the value start and when the value is different (so it's a new value)
}
else
{
//it's not the same index, so technically the finishdate will be set here?
}
}
else
{
//there is not more indexes, so finishdate will be here
}
}
}
//column++;
}
Hope I could explain. Thanks

Create a class like the one in the code below. I changed the date to US format for testing. The code reads from a string using StringReader; when reading from a file, use StreamReader instead.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
List<DataSample> samples = new List<DataSample>();
string data = "9/26/2015,GROUP_1,0,0,0,0,0,0,0,0,0,0,12345.006,12345.006,27469.005,27469.005,27983.005,27983.005,28081.005," +
"0,0,0,28105.005,28105.005,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0\n";
StringReader reader = new StringReader(data);
string inputline = "";
while ((inputline = reader.ReadLine()) != null)
{
string[] dataArray = inputline.Split(new char[] { ',' });
DateTime startDate = DateTime.Parse(dataArray[0]);
startDate = startDate.AddHours(8);
DateTime timeCounter = startDate;
string groupName = dataArray[1];
for (int i = 2; i < dataArray.Length; i++)
{
if (dataArray[i] != "0")
{
DataSample newSample = new DataSample();
samples.Add(newSample);
newSample.name = groupName;
newSample.time = timeCounter;
newSample.value = double.Parse(dataArray[i]);
}
timeCounter = timeCounter.AddMinutes(15);
}
}
var groupByValue = samples.AsEnumerable()
.GroupBy(x => x.value)
.ToList();
foreach (var group in groupByValue)
{
Console.WriteLine("Value : {0}, Times : {1}", group.Key.ToString(), string.Join(",",group.Select(x => x.time.ToString())));
}
Console.ReadLine();
}
}
public class DataSample
{
public string name { get; set; }
public DateTime time { get; set; }
public double value { get; set; }
}
}

This is not a coding service; you need to go a little farther. You've done a good job of outlining your algorithm; now, you should put in a few print statements to track the operation of your code. Do the loops and if statements give you the control flow you expected? A good way to do this is to put your comments into print statements, such as in your last inside comment:
print "there is not more indexes, so finishdate will be here"
Also print the loop index, values you found, etc.
Once you have corrected any flow problems there, start filling in the comment-only blocks with useful code, a few lines or one block at a time.
When you hit a specific problem, post your code and the actual output. That's where StackOverflow is designed to help you.
In the meantime, I'd like to suggest a change to your outer loop. Let it be driven by a while loop, so you can freely advance your index as needed. Right now, you're keeping two variables for almost the same purpose: i and column. Instead, use just one, something like:
column = 0
while (column < data.Length) {
    // Find all of the indices with the same consecutive value
    finish_index = column;
    while (finish_index + 1 < data.Length && data[finish_index + 1] == data[column])
        finish_index++;
    // You now have the range of work slots to merge.
    printf "Time slots %d - %d have %d work hours", column, finish_index, int(data[column])
    column = finish_index + 1
}
You will still have to convert column numbers to times: 15 minutes * column + 8:00 am. I've also left out a few intermediate niceties, but I think that you already have them in your comments.
Does this get you moving?
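The while-loop outline above, together with the column-to-time conversion, can be sketched in C# roughly as follows. This is a minimal sketch, not the asker's final code: `WorkSpan`, `SpanMerger.Merge`, and the `slotMinutes` parameter are hypothetical names, and it assumes the slot values have already been split off from the date and group columns.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical result type: one merged run of identical, non-zero slots.
public sealed class WorkSpan
{
    public WorkSpan(DateTime start, DateTime end, double value)
    {
        Start = start;
        End = end;
        Value = value;
    }

    public DateTime Start { get; }
    public DateTime End { get; }
    public double Value { get; }
}

public static class SpanMerger
{
    // Assumption: slot 0 starts at dayStart and every slot lasts slotMinutes.
    public static List<WorkSpan> Merge(IReadOnlyList<string> slots, DateTime dayStart, int slotMinutes = 15)
    {
        var spans = new List<WorkSpan>();
        int i = 0;
        while (i < slots.Count)
        {
            // Extend the run while the next slot holds exactly the same text.
            int j = i + 1;
            while (j < slots.Count && slots[j] == slots[i])
            {
                j++;
            }

            double value = double.Parse(slots[i], CultureInfo.InvariantCulture);
            if (value != 0)
            {
                // The run covers slots i..j-1, so it ends where slot j would start.
                spans.Add(new WorkSpan(
                    dayStart.AddMinutes(i * slotMinutes),
                    dayStart.AddMinutes(j * slotMinutes),
                    value));
            }

            i = j; // Jump straight to the first slot of the next run.
        }
        return spans;
    }
}
```

For the sample line, the slots would be the fields after the date and group name, and `dayStart` would be the parsed date plus eight hours. Unlike grouping by value, this only merges slots that are adjacent, so a value that reappears later in the day produces a second span.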

Related

C#: Increment invoice number based on set cycles

I am creating incrementing invoice numbers, like so: AABBBB1122.
'A' and 'B' are bound to identifiers in my code.
But the digits I need to be month and year, respectively.
For example: 0821 (august, 2021).
I don't want to connect it to a calendar in any way.
If possible I would like to define a starting date, and increment from there.
That is: 0821 would have to be incremented to 0921, 1021, 1121, 1221 -
before the year is incremented as well; 0122.
How can I do that?
What I've got so far:
string AA {
    get { return this.IdentifierA.Substring(0, 2); }
    set { SetAndNotify(ref this.AA, value); }
}
string BB {
    get { return this.IdentifierB.Substring(0, 4); }
    set { SetAndNotify(ref this.BB, value); }
}
string InvoiceNumber {
    get { return String.Concat(AA + BB + /* what goes here? */).ToUpper(); }
    set { SetAndNotify(ref this.InvoiceNumber, value); }
}
Sounds like a peculiar way to do invoice numbers. You are saying you don't want it based on the current date, but just to increment in MMYY style?
Well, given a typical auto-increment int KEY, which goes up by 1 for each invoice, use:
(((KEY - 1) % 12 + 1).ToString("00") + ((KEY - 1) / 12).ToString("00"))
Start KEY at 12*21+8 to start with 0821. The -1/+1 shift keeps December as month 12 instead of wrapping to 00.
But based on the invoice requirement, I think what you must surely be asking for is:
(DateTime.Now.Month.ToString("00") + (DateTime.Now.Year % 100).ToString("00"))
This would be:
return string.Format("{0}{1}{2}", AA, BB, DateTime.Now.ToString("MMyy")).ToUpper();
or, with string interpolation:
return $"{AA}{BB}{DateTime.Now.ToString("MMyy")}".ToUpper();
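The counter-based idea from earlier in this answer can be sketched as a small helper. This is only an illustration under stated assumptions: `InvoicePeriod`, `ToCounter`, and `Format` are hypothetical names, and the counter is assumed to advance once per month-step with a one-based month.

```csharp
using System;

public static class InvoicePeriod
{
    // Maps a month (1-12) and a two-digit year to a running counter.
    public static int ToCounter(int month, int twoDigitYear)
    {
        return twoDigitYear * 12 + month;
    }

    // Formats a counter back into "MMyy".
    public static string Format(int counter)
    {
        // Shift by one so that December stays month 12 instead of wrapping to 0.
        int month = (counter - 1) % 12 + 1;
        int year = ((counter - 1) / 12) % 100;
        return month.ToString("00") + year.ToString("00");
    }
}
```

Starting from `InvoicePeriod.ToCounter(8, 21)` and adding one each time yields the sequence the question describes: 0821, 0921, 1021, 1121, 1221, 0122.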
If I understood your question correctly, the objective is to parse the date from the string and generate the next id. You can then use the following logic.
string id = "AAAA1221";
// Extract data.
int i = Convert.ToInt32(id.Substring(id.Length - 4, 4));
int year = i % 100;
int month = i / 100;
// Perform increment logic.
if (month == 12)
{
year++;
month = 1;
}
else
{
month++;
}
// Reformat key
string newId = $"{id.Substring(0, id.Length - 4)}{month:D2}{year:D2}";
In the case you don't want to use the previous id, you can just start the same logic with your previous value in the 'i' variable.

Apply last non empty string to empty string?

A file is read in. The code looks for lines that have a number that begins with an S; the lines that do not have an S are kept as well. The results are saved to an array. I am then populating an existing gridview with the same number of lines.
As a placeholder I have set the blank lines to ***. This is where I'm stuck: I need the empty strings to be populated with the last non-empty string.
So for example if the readout is:
1
2
3
Empty
Empty
Empty
4
Empty
6
I'd want it displayed as:
1
2
3
3
3
3
4
4
6
I can't figure out how to do that. I've been searching all day for examples but can only find ways of grabbing either the first or the last number of my array. Here is my code.
var sLines = File.ReadAllLines(cboPartProgram.Text)
.Where(s => !s.StartsWith("'"))
.Select(s => new
{
SValue = Regex.Match(s, "(?<=S)[\\d.]*").Value,
})
.ToArray();
string LastSValue = "";
string Value = "";
for (int i = 0; i < sLines.Count(); i++)
{
if (sLines[i].SValue == "")
{
LastSValue = "***";
Value = LastSValue;
}
else
{
Value = (sLines[i].SValue);
}
}
Ok I think I got it.
for (int i = 0; i < sLines.Length; i++)
{
if (sLines[i].SValue == "" && i > 0)
{
foreach (var empt in sLines[i].SValue)
{
LastSValue = sLines[i - 1].SValue;
Value = LastSValue;
}
}
else
{
Value = (sLines[i].SValue);
}
}
On a side note, when I copy my code I use the code option above to format it, but I notice someone always has to correct my spacing. It's copied straight from the IDE, but there are always spaces on each line that I guess shouldn't be there. Is there a different way I should do it?
UPDATE
If I should ask this as a new question let me know, but it's so dependent on this that I thought I should keep it here.
The code I posted above does what I needed it to. I've been trying to edit it so that if there is NO previous number (for example, if line 1 has no number but the rest do), it applies the string "NA", and otherwise still does what the code above does for the rest of the lines.
I guess maybe the best way would be to just take the results from the above code and, if there are any empty spaces left, apply "NA", but I can't figure it out.
In your example, you just need to take the value of the previous row to fill the current one. Something like the following:
for (int i = 0; i < sLines.Length; i++)
{
    // Note: this needs SValue to be settable; properties of anonymous types are read-only.
    if (sLines[i].SValue == "" && i > 0)
    {
        sLines[i].SValue = sLines[i - 1].SValue;
    }
}
Your example has one more issue, but for now I'll focus only on gathering the "last non-empty" string.
If you look at your example, you can spot a few things that could help you find a solution: the for loop, and the reference to the original list, which stays intact.
For my example I'll use LINQ because it will be much easier.
First of all, I'll copy everything from before the for loop (if that makes sense :D):
var sLines = File.ReadAllLines(cboPartProgram.Text)
.Where(s => !s.StartsWith("'"))
.Select(s => new
{
SValue = Regex.Match(s, "(?<=S)[\\d.]*").Value,
})
.ToArray();
string LastSValue = "";
string Value = "";
I keep this part as is, because it's okay and will work for now.
The for loop is where I'll make modifications:
for (int i = 0; i < sLines.Count(); i++)
{
// `i` is representing current "index" of processed "word"
// we can use this to find last "valid" element
// var notEmpty = sLines.Take(i).LastOrDefault(word => !string.IsNullOrEmpty(word.SValue));
// but since you want to assign this to `Value`, and the string at index `i` may itself be non-empty,
// we can do it in one line:
Value = string.IsNullOrEmpty(sLines[i].SValue) ? sLines.Take(i).LastOrDefault(word => !string.IsNullOrEmpty(word.SValue)).SValue : sLines[i].SValue;
// instead of your previous logic :
//if (sLines[i].SValue == "")
//{
// LastSValue = "***";
// Value = LastSValue;
//}
//else
//{
// Value = (sLines[i].SValue);
//}
}
Another problem which I think you'll face is that the first value (judging by the input) can also be empty, which will throw an exception in my example. This kind of solution cannot handle that case, because there is no previous value at all.
From what I understand, if you want to store the result in Value and do something else with it inside the loop (instead of changing it in the array), what you probably want is this:
for (int i = 0; i < sLines.Count(); i++)
{
if (sLines[i].SValue == "")
{
Value = LastSValue;
}
else
{
Value = (sLines[i].SValue);
LastSValue = Value;
}
// use Value
}
I would also suggest using sLines.Length instead of Count(). Count() is a LINQ extension method meant for sequences whose length isn't known in advance, where it has to literally count the elements one by one. In this case it would probably be optimized, but if you know you're dealing with an array, it's a good idea to ask for the length directly.
EDIT:
To get "NA" if there's no previous number, just initialize LastSValue to this value before the loop:
string LastSValue = "NA";
That way, if Value is empty and no LastSValue was set before, it will still be "NA".
EDIT2:
A solution similar to the one from @Cubi, to change it in place:
for (int i = 0; i < sLines.Length; i++)
{
if (sLines[i].SValue == "")
sLines[i].SValue = i > 0 ? sLines[i-1].SValue : "NA";
}
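Putting the "remember the last non-empty value" approach and the "NA" fallback together, a minimal sketch could look like the following. `ForwardFill.Fill` and its `fallback` parameter are hypothetical names; the sketch works on a plain list of strings and returns a new array, which sidesteps the fact that anonymous-type properties cannot be assigned.

```csharp
using System;
using System.Collections.Generic;

public static class ForwardFill
{
    // Replaces every null or empty entry with the last non-empty entry seen so far.
    // Entries before the first non-empty value receive the fallback.
    public static string[] Fill(IReadOnlyList<string> values, string fallback = "NA")
    {
        var result = new string[values.Count];
        string last = fallback;
        for (int i = 0; i < values.Count; i++)
        {
            if (!string.IsNullOrEmpty(values[i]))
            {
                last = values[i]; // Remember the most recent real value.
            }
            result[i] = last;
        }
        return result;
    }
}
```

With the anonymous-type array from the question, the input could be produced with something like `sLines.Select(l => l.SValue).ToArray()`.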

Int.Parse(String.Split()) returns "Input string was not in a correct format" error

I am trying to perform a LINQ query on an array to filter out results based on a user's query. I am having a problem parsing two ints from a single string.
In my database, TimeLevels are stored as strings in the format [mintime]-[maxtime] minutes, for example 0-5 Minutes. My users have a slider with which they can select a min and max time range, and this is stored as an int array with two values. I'm trying to compare the [mintime] with the first value, and the [maxtime] with the second, to find database entries which fit the user's time range.
Here is my C# code from the controller which is supposed to perform that filtering:
RefinedResults = InitialResults.Where(
    x => int.Parse(x.TimeLevel.Split('-')[0]) >= data.TimeRange[0] &&
         int.Parse(x.TimeLevel.Split('-')[1]) <= data.TimeRange[1]).ToArray();
My thinking was that it would firstly split the 0-5 Minutes string at the - resulting in two strings, 0 and 5 Minutes, then parse the ints from those, resulting in just 0 and 5.
But as soon as it gets to Int.Parse, it throws the error in the title.
Some of the x.TimeLevel database records are stored as "30-40+ Minutes". Is there any way to extract just the int?
You could use regular expressions to match the integer parts of the string for you, like this:
RefinedResults = InitialResults
    .Where(x => {
        var m = Regex.Match(x.TimeLevel, @"^(\d+)-(\d+)");
        return m.Success
            && int.Parse(m.Groups[1].Value) >= data.TimeRange[0]
            && int.Parse(m.Groups[2].Value) <= data.TimeRange[1];
    }).ToArray();
This approach requires the string to start with a pair of dash-separated decimal numbers. It ignores anything after the second number, ensuring that only sequences of digits are passed to int.Parse.
The reason your code doesn't work is that "0-5 Minutes".Split('-') returns [0] = "0" and [1] = "5 Minutes", and the latter is not parseable as an int.
You can use the regular expression "\d+" to split up groups of digits and ignore non-digits. This should work:
var refinedResults =
(
    from result in InitialResults
    let numbers = Regex.Matches(result.TimeLevel, @"\d+")
    where ((int.Parse(numbers[0].Value) >= data.TimeRange[0]) && (int.Parse(numbers[1].Value) <= data.TimeRange[1]))
    select result
).ToArray();
Here's a complete compilable console app which demonstrates it working. I've used dummy classes to represent your actual classes.
using System;
using System.Linq;
using System.Text.RegularExpressions;
namespace ConsoleApplication2
{
public class SampleTime
{
public SampleTime(string timeLevel)
{
TimeLevel = timeLevel;
}
public readonly string TimeLevel;
}
public class Data
{
public int[] TimeRange = new int[2];
}
class Program
{
private static void Main(string[] args)
{
var initialResults = new []
{
new SampleTime("0-5 Minutes"),
new SampleTime("4-5 Minutes"), // Should be selected below.
new SampleTime("1-8 Minutes"),
new SampleTime("4-6 Minutes"), // Should be selected below.
new SampleTime("4-7 Minutes"),
new SampleTime("5-6 Minutes"), // Should be selected below.
new SampleTime("20-30 Minutes")
};
// Find all ranges between 4 and 6 inclusive.
Data data = new Data();
data.TimeRange[0] = 4;
data.TimeRange[1] = 6;
// The output of this should be (as commented in the array initialisation above):
//
// 4-5 Minutes
// 4-6 Minutes
// 5-6 Minutes
// Here's the significant code:
var refinedResults =
(
from result in initialResults
let numbers = Regex.Matches(result.TimeLevel, @"\d+")
where ((int.Parse(numbers[0].Value) >= data.TimeRange[0]) && (int.Parse(numbers[1].Value) <= data.TimeRange[1]))
select result
).ToArray();
foreach (var result in refinedResults)
{
Console.WriteLine(result.TimeLevel);
}
}
}
}
The error happens because of the " Minutes" part of the string.
You can truncate the " Minutes" part before splitting, like this:
x.TimeLevel.Remove(x.TimeLevel.IndexOf(" "))
Then you can split.
The problem is that you are splitting by - and not also by space which is the separator of the minutes part. So you could use Split(' ', '-') instead:
InitialResults
.Where(x => int.Parse(x.TimeLevel.Split('-')[0]) >= data.TimeRange[0]
&& int.Parse(x.TimeLevel.Split(' ','-')[1]) <= data.TimeRange[1])
.ToArray();
As an aside, don't store three pieces of information in one column in the database. That's just a source of nasty errors and bad performance. It's also more difficult to filter in the database, which should be the preferred way, and to maintain database consistency.
Regarding your comment that the format can be 0-40+ Minutes: then you could use...
InitialResults
.Select(x => new {
TimeLevel = x.TimeLevel,
MinMaxPart = x.TimeLevel.Split(' ')[0]
})
.Select(x => new {
TimeLevel = x.TimeLevel,
Min = int.Parse(x.MinMaxPart.Split('-')[0].Trim('+')),
Max = int.Parse(x.MinMaxPart.Split('-')[1].Trim('+'))
})
.Where(x => x.Min >= data.TimeRange[0] && x.Max <= data.TimeRange[1])
.Select(x => x.TimeLevel)
.ToArray();
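The parsing step that all of these answers share can also be pulled into one helper that never throws on unexpected input. The following is a sketch only: `TimeLevelParser.TryParse` is a hypothetical name, and it assumes every valid TimeLevel starts with two dash-separated integers, as in "0-5 Minutes" or "30-40+ Minutes".

```csharp
using System;
using System.Text.RegularExpressions;

public static class TimeLevelParser
{
    // Two runs of digits separated by a dash at the start of the string.
    private static readonly Regex RangePattern =
        new Regex(@"^\s*(\d+)\s*-\s*(\d+)", RegexOptions.Compiled);

    // Returns false instead of throwing when the text does not match.
    public static bool TryParse(string timeLevel, out int min, out int max)
    {
        min = 0;
        max = 0;
        if (timeLevel == null)
        {
            return false;
        }

        Match m = RangePattern.Match(timeLevel);
        return m.Success
            && int.TryParse(m.Groups[1].Value, out min)
            && int.TryParse(m.Groups[2].Value, out max);
    }
}
```

Inside the Where clause this would read roughly as `TimeLevelParser.TryParse(x.TimeLevel, out var min, out var max) && min >= data.TimeRange[0] && max <= data.TimeRange[1]`, so rows with malformed text are skipped rather than aborting the whole query.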

Have multiple timeout(datetime) values per row in C# DataTable

I have a DataTable with multiple TimeStamp (DateTime) columns per row. I want to create a timeout value so when the TimeStamp passes DateTime.Now-timeoutValue, it will be nulled. And when all TimeStamp values are nulled, the row is deleted.
It's currently implemented with timers and loops. It's starting to get very laggy with many entries, is there a more automated efficient way? Expressions or something? Here are snips of my code:
public ReadsList(object _readers)
{
_readers = List of things that add to datatable
dataTable = new DataTable();
Timeout = 5;
aTimer = new System.Timers.Timer(5000);
aTimer.Elapsed += new ElapsedEventHandler(UpdateReads);
aTimer.Enabled = true;
}
public void Add(object add)
{
//Checks if object exists, update TimeStamp if so, else, add new row
}
private void UpdateReads(object source, ElapsedEventArgs e)
{
//Clean DataTable
foreach (DataRow row in dataTable.Rows.Cast<DataRow>().ToList())
{
int p = 0;
foreach (var i in _readers)
{
p += i.Value;
for (int b = 1; b <= i.Value; b++)
{
if (row[(i.Key + ":" + b)] != DBNull.Value)
{
if (Timeout == 0)
Timeout = 99999;
if (DateTime.Parse(row[(i.Key + ":" + b)].ToString()) <
DateTime.UtcNow.AddSeconds(-1*Timeout))
{
row[(i.Key + ":" + b)] = DBNull.Value;
}
}
else
{
p -= 1;
}
}
}
//Remove Row if empty
if (p == 0)
{
row.Delete();
//readCount -= 1;
}
}
dataTable.AcceptChanges();
OnChanged(EventArgs.Empty);
}
Here are a couple of ideas for minor improvements which may add up to a significant improvement:
You're building the column key (i.Key + ":" + b) more than once. Build it once within your inner foreach and stick it in a variable.
You are reading the column (row[(i.Key + ":" + b)]) more than once. Read it once and stick it in a variable so that you can use it multiple times without having to incur the hash table lookup each time.
You are adjusting the timeout (if (Timeout == 0) Timeout = 99999;) more than once. Adjust it once at the beginning of the method.
You are calculating the timeout DateTime (DateTime.UtcNow.AddSeconds(-1*Timeout)) more than once. Calculate it once at the beginning of the method.
You are always looking up column values by string. If you can store the column ordinals somewhere and use those instead, you'll get better performance. Just make sure you look up the column ordinals once at the beginning of the method, not inside either of the foreaches.
You are parsing strings into DateTimes. If you can store DateTimes in the DataTable, you wouldn't have to parse each time.
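Taken together, the suggestions above might look like the sketch below. It is not the asker's actual class: `ReadsCleaner.ExpireOldReads` and its parameters are hypothetical, and it assumes the timestamp columns are declared as `typeof(DateTime)` so that no string parsing is needed.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class ReadsCleaner
{
    // Nulls every timestamp older than cutoffUtc and removes rows that have
    // no timestamp left. Returns the number of rows removed.
    public static int ExpireOldReads(DataTable table, IReadOnlyList<string> timestampColumns, DateTime cutoffUtc)
    {
        // Resolve the column ordinals once, outside both loops.
        var ordinals = new int[timestampColumns.Count];
        for (int c = 0; c < ordinals.Length; c++)
        {
            ordinals[c] = table.Columns[timestampColumns[c]].Ordinal;
        }

        int removed = 0;
        // Walk backwards so that removing a row does not shift unvisited rows.
        for (int r = table.Rows.Count - 1; r >= 0; r--)
        {
            DataRow row = table.Rows[r];
            int live = 0;
            foreach (int ordinal in ordinals)
            {
                object cell = row[ordinal]; // Read the cell once.
                if (cell is DBNull)
                {
                    continue;
                }
                if ((DateTime)cell < cutoffUtc)
                {
                    row[ordinal] = DBNull.Value; // Timestamp has expired.
                }
                else
                {
                    live++;
                }
            }

            if (live == 0)
            {
                table.Rows.RemoveAt(r);
                removed++;
            }
        }
        return removed;
    }
}
```

The caller would compute the cutoff once per timer tick, for example `DateTime.UtcNow.AddSeconds(-Timeout)`, and pass in the list of "key:index" column names it already knows about.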
There are a few things you can do here to increase the speed. First off, DataTables are meant to pull data from a database, and are not really high-end collections. In general, generic Lists are 4x faster than DataTables, and use significantly less memory. Also, your biggest time cost is coming from the DateTime.Parse right in the middle of your third loop, and from performing the DateTime calculation for your expiration time right in the middle of the loop. It doesn't appear that the expiration time is based upon the record's original value, so you should definitely generate that once before the loop starts.
I would recommend creating a data type for your record format, which would allow you to store the records' dates as DateTime objects, basically paying the conversion cost once when you first initialize the list, rather than parsing every time through. Using a List to store the data, you could then simply do something like:
var cutOffTime = DateTime.UtcNow.AddSeconds(-99999); // Compute once instead of on every iteration.
// Iterate backwards so that RemoveAt does not shift the items still to be visited.
for (var i = AllRecords.Count - 1; i >= 0; i--)
{
    var rec = AllRecords[i];
    if (rec.TimeThingy1 < cutOffTime && rec.TimeThingy2 < cutOffTime && rec.TimeThingy3 < cutOffTime)
    {
        AllRecords.RemoveAt(i); // Alternatively, AllRecords.RemoveAll(...) removes every match in one pass.
    }
}

Recursion Help required with C#

In our Windows application, we have a StartDate and an EndDate. On the Execute button's click event, we need to call a third-party web service with our search string plus a date range (for example 01/01/2010 to 12/31/2010). Our search criteria can return thousands of records, but the web service is limited to returning only 10K records per transaction.
This requires us to break down our date range. So basically we need the following:
For (X dateRange if RecordCount > 10000) then
X dateRange/2, which will be 01/01/2010 to 06/01/2010 in our case; check the condition again and do this recursively until we get a date range block where RecordCount is < 10000.
Then start with the next date. For example, if we get 9999 records for 01/01/2010 to 03/30/2010, then we need to get records for the next block starting 04/01/2010.
Is this possible with recursion?
RecursionFunction(dtStart, dtEnd)
{
if (WebService.RecordCount > 9999)
{
TimeSpan timeSpan = dtEnd.Subtract(dtStart);
DateTime mStart = dtStart;
DateTime mEnd = dtStart.AddDays(timeSpan.Days / 2);
RecursionFunction(dtStart,dtEnd);
}
else
{
Get Records here
}
}
But with the above code, the recursion will go through the following blocks:
01/01/2010, 12/31/2010 > 10000
01/01/2010, 07/03/2010 > 10000
01/01/2010, 04/02/2010 < 10000
So after it finishes getting records, the recursion will start again with block 01/01/2010, 07/03/2010, which we don't need. We need to start the next recursion with 04/03/2010, 12/31/2010.
Thanks in advance for help.
It looks like you are trying to split the input range until it is small enough to handle. Try calling it for both ranges:
RecursionFunction(mStart, mEnd);
RecursionFunction(mEnd.AddDays(1), dtEnd);
The first step is to change the RecursionFunction call (at line 8 of your example) to:
RecursionFunction(mStart, mEnd);
But then you'll also need to call it again with the other half of the date range:
RecursionFunction(mEnd.AddDays(1), dtEnd);
Also, you need to handle the results (presumably combining the two answers):
var set1 = RecursionFunction(...);
var set2 = RecursionFunction(...);
return set1.Concat(set2);
This is like divide and conquer. You need to get results from the left and the right of the split and combine them and return that value. So you can keep getting smaller until you have enough data you can deal with and just return that. Then keep joining the result sets together.
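public IList<Data> GetRecords(DateTime start, DateTime end)
{
    var recordCount = WebService.RecordCount(start, end);
    // Small enough (or a single day that cannot be split further): fetch directly.
    if (recordCount < 10000 || start.Date >= end.Date) return WebService.GetRecords(start, end);
    // Split on whole days so the two halves neither overlap nor leave a gap.
    var halfDays = (end.Date - start.Date).Days / 2;
    var mid = start.AddDays(halfDays);
    var left = GetRecords(start, mid);
    var right = GetRecords(mid.AddDays(1), end);
    return left.Concat(right).ToList(); // Concat and ToList need System.Linq
}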
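To try the divide-and-conquer idea without the real service, the two web service calls can be passed in as delegates. The following is a sketch under stated assumptions: `RangeSplitter`, `countFor`, and `fetch` are hypothetical stand-ins for the third-party API, and date ranges are treated as inclusive whole days.

```csharp
using System;
using System.Collections.Generic;

public static class RangeSplitter
{
    // Fetches all records between start and end (inclusive, whole days),
    // halving the range whenever the service reports limit or more records.
    public static List<T> Fetch<T>(
        DateTime start,
        DateTime end,
        int limit,
        Func<DateTime, DateTime, int> countFor,
        Func<DateTime, DateTime, IEnumerable<T>> fetch)
    {
        var results = new List<T>();
        Collect(start.Date, end.Date, limit, countFor, fetch, results);
        return results;
    }

    private static void Collect<T>(
        DateTime start,
        DateTime end,
        int limit,
        Func<DateTime, DateTime, int> countFor,
        Func<DateTime, DateTime, IEnumerable<T>> fetch,
        List<T> results)
    {
        // A single day cannot be split further, so it is fetched as is.
        if (start >= end || countFor(start, end) < limit)
        {
            results.AddRange(fetch(start, end));
            return;
        }

        // Left half ends on mid; right half starts the day after, so nothing overlaps.
        DateTime mid = start.AddDays((end - start).Days / 2);
        Collect(start, mid, limit, countFor, fetch, results);
        Collect(mid.AddDays(1), end, limit, countFor, fetch, results);
    }
}
```

Because the left half is always collected before the right half, the combined list keeps the chronological order of the blocks, and each date is requested in exactly one final block. A single day that still exceeds the limit would need a finer split (hours, for example), which this sketch does not attempt.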
This is how I would do it
static List<string> RecursiveGet(DateTime StartDate, DateTime EndDate, List<string> Output)
{
if (Webservice.RecordCount > 9999)
{
TimeSpan T = EndDate.Subtract(StartDate);
T = new TimeSpan((long)(T.Ticks / 2));
DateTime MidDate = StartDate.Add(T);
// RecursiveGet appends to Output itself, so just recurse; wrapping the call in Output.AddRange would duplicate the list.
RecursiveGet(StartDate, MidDate, Output);
RecursiveGet(MidDate.AddMilliseconds(1), EndDate, Output);
}
else
{
//Get Records here, return them in array
Output.Add("Test");
}
return Output;
}
static List<string> GetRecords(DateTime StartDate, DateTime EndDate)
{
return RecursiveGet (StartDate, EndDate, new List<string>());
}
Note: I couldn't test it.
It works by dividing the date range in half, then searching each half, and if one is still bigger than 9999, doing it again.
An easy way would be a form of pagination. If you're using JSON or XML, you can include the total number of results and return just a set number of results per call (return the offset too). This way you can loop, check whether you're on the last page, and break out after you get the last page of results.
Don't forget to put checks in for the case where a particular transaction fails, though. It's not an ideal solution for such a large dataset, but it is a workaround.
It sounds much easier to just reuse the last date for the data you actually got back in a while-loop than to home in with recursion like this.
Then start with Next date, for example, if we get 9999 records for 01/01/2010 to 03/30/2010 then we need to get records for next block starting 04/01/2010
March has 31 days.
Pseudo-C# code
var dtStart = DateTime.Parse("2010-01-01");
var dtEnd = DateTime.Parse("2010-12-31");
var totalRecords = new List<RecordType>();
var records = WebService.Get(dtStart, dtEnd);
totalRecords.AddRange(records);
while (dtStart < dtEnd && records.Count > 9999)
{
    dtStart = records.Last().Date;
    records = WebService.Get(dtStart, dtEnd);
    totalRecords.AddRange(records);
}
To ease the load on the service you could calculate the timespan for the previous run and only get that many days for the next run in the while-loop.
How you should handle the inevitable duplicates depends on the data in the records.
I just realized I presumed you had a date in the returned data. If not, then disregard this answer.
