C# Task Parallel Library first time slow

I'm trying to perform a number of HTTP GET requests in parallel, one per task. If I issue the GETs through Internet Explorer they return almost instantly, but when I fire them off in code through tasks, the first run takes a good few seconds to return; running a second time, they return as quickly as I would expect. So either I'm doing something that blocks, or for some reason the threads are not starting? This is my first time trying to use the TPL.
Here's the basic lookup class:
public class Lookup
{
    public string Name { get; set; }
    public string URL { get; set; }

    public Lookup(string Name, string URL)
    {
        this.Name = Name;
        this.URL = URL;
    }

    public LookupReturn DoLookup()
    {
        LookupReturn d = new LookupReturn();
        d.Name = this.Name;
        d.URL = this.URL;
        WebRequest wrGETURL = WebRequest.Create(this.URL);
        Stream objStream = wrGETURL.GetResponse().GetResponseStream();
        StreamReader objReader = new StreamReader(objStream);
        string sLine = objReader.ReadToEnd();
        d.Result = sLine;
        return d;
    }
}
And the return type is simply:
public class LookupReturn
{
    public string Name { get; set; }
    public string Result { get; set; }
    public string URL { get; set; }
}
So here is my attempt to run this in parallel. I am testing from a WinForms GUI, but eventually it will run in a WCF service.
public partial class Form1 : Form
{
    private List<Lookup> Lookups = new List<Lookup>();

    private async void btnRunLookups_Click(object sender, EventArgs e)
    {
        Lookups.Add(new Lookup("Name1", "http://geturl1"));
        Lookups.Add(new Lookup("Name2", "http://geturl2"));
        // More lookups…

        int workerThreads, complete;
        ThreadPool.GetMinThreads(out workerThreads, out complete);
        ThreadPool.SetMinThreads(100, complete);

        btnRunLookups.Visible = false;
        List<Task<LookupReturn>> lookupTasks = new List<Task<LookupReturn>>();
        foreach (Lookup dl in Lookups)
        {
            lbLookups.Items.Add("Starting task for " + dl.URL);
            Task<LookupReturn> t = new Task<LookupReturn>(() => dl.DoLookup());
            lookupTasks.Add(t);
            t.Start();
        }

        //await Task.WhenAny(
        //    Task.WhenAll(lookupTasks.ToArray<Task<LookupReturn>>()),
        //    Task.Delay(3000)
        //    );

        // This takes a good few seconds the first time
        await Task.WhenAll(lookupTasks.ToArray<Task<LookupReturn>>());

        // Now I need to see which ones completed and returned a result
        foreach (var x in lookupTasks)
        {
            if (x.IsCompleted)
            {
                lbLookups.Items.Add(x.Result);
            }
            else
            {
                // lbLookups.Items.Add("Not finished " + x.Result.Name);
            }
        }
        btnRunLookups.Visible = true;
    }
}

Other people have noted that HttpWebRequest can take a long time on its first request because it's looking for proxy information. Try setting its Proxy property to null.
wrGETURL = WebRequest.Create(this.URL);
wrGETURL.Proxy = null;

Most likely the problem is that the program is doing some first-time setup stuff (DNS resolution, proxy detection, etc.) on the first call to GetResponse. The proxy detection, in particular, can take a good long while.
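If no proxy is actually needed, the usual mitigations are to switch off proxy discovery entirely and to raise the default per-host connection limit, which on .NET Framework is only 2 and will also throttle parallel GETs against the same host. A minimal sketch, run once at startup (the limit of 100 is an assumption, chosen to match the SetMinThreads call above):

using System.Net;

// Skip automatic proxy discovery for all requests (assumes no proxy is required).
WebRequest.DefaultWebProxy = null;

// Allow more than the default 2 concurrent connections per host,
// otherwise the parallel GETs queue up behind each other.
ServicePointManager.DefaultConnectionLimit = 100;

Alternatively, set Proxy = null per request as shown above, which skips the proxy lookup for that request only.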

Related

Memory Leak in ConcurrentQueue<T>?

Hi, I have a simple class in a .NET Core project (SDK 3.1.409) for communication with devices.
public class WriteCommand
{
    // Commands is an enumeration.
    public Commands LaserCommand { get; }
    public List<byte> Parameter { get; }
    public List<byte> Data { get; }

    public WriteCommand(Commands laserCommand, byte[] parameter = null)
    {
        Data = BuildSendData(laserCommand, parameter);
        // Guard against the default null parameter; new List<byte>(null) throws.
        Parameter = new List<byte>(parameter ?? Array.Empty<byte>());
        LaserCommand = laserCommand;
    }

    private List<byte> BuildSendData(Commands command, byte[] paramBytes)
    {
        var parameter = paramBytes ?? Array.Empty<byte>();
        int numberOfBytes = parameter.Length + Constants.ADD_TO_PARAMETER; // Defined by protocol
        List<byte> sendData = new List<byte>();
        sendData.Add(Constants.PACKET_START_BYTE);
        sendData.Add((byte)numberOfBytes);
        sendData.Add(Constants.COMMAND_START_BYTE);
        sendData.Add((byte)command);
        foreach (var param in parameter)
        {
            sendData.Add(param);
        }
        sendData.Add(Constants.PACKET_END_BYTE);
        byte checksum = new CheckSumCalculator().CalculateCheckSum(sendData);
        sendData.Add(checksum);
        return sendData;
    }
}
I use this class to add commands to a ConcurrentQueue in one task, like this:
public void AddCommand()
{
    commandsQueue.Enqueue(new WriteCommand(Commands.SetRs232BaudRate));
}
And in another task I take the commands out of the ConcurrentQueue:
public void SendAndReceiveMessages()
{
    while (!commandsQueue.IsEmpty)
    {
        if (commandsQueue.TryDequeue(out WriteCommand writeCommand))
        {
            // Do something
        }
    }
}
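As an aside, SendAndReceiveMessages as written returns as soon as the queue happens to be empty, so callers tend to invoke it in a tight loop. If the consumer is meant to run continuously, System.Threading.Channels (available on .NET Core 3.0+) waits without spinning and keeps only the in-flight commands alive. A minimal sketch, assuming the same WriteCommand and Commands types:

using System.Threading.Channels;
using System.Threading.Tasks;

public class CommandPump
{
    // Unbounded channel: TryWrite always succeeds, ReadAllAsync waits when empty.
    private readonly Channel<WriteCommand> channel = Channel.CreateUnbounded<WriteCommand>();

    public void AddCommand()
    {
        channel.Writer.TryWrite(new WriteCommand(Commands.SetRs232BaudRate));
    }

    public async Task SendAndReceiveMessagesAsync()
    {
        // Completes only when the writer is completed; no busy-waiting.
        await foreach (WriteCommand writeCommand in channel.Reader.ReadAllAsync())
        {
            // Do something
        }
    }
}

This does not by itself explain the profiler's retained ConcurrentQueueSegment+Slot entries, but it takes the queue in question out of the picture.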
In my program I have 6 devices to communicate with, on an interval of one second. Each device has its own communication class.
When the program has run for a while (more than 2 days) I see an increase in the memory needed.
I checked this with a memory profiler and see a memory leak:
WriteCommand
ConcurrentQueueSegment+Slot
This is the only article I found on it.
You can find the example code here
Has anyone come across this problem?
Greetings Mike

How to use C# Task libraries to download and process a sequence of data

I've been struggling with a design to maximize the parallelism of a task using C# and the Task libraries. Although I have some grasp of various parallel-processing concepts (and have read multiple StackOverflow questions on the topic), I'm having a hard time assembling everything in a coherent manner that solves my problem.
My problem has these properties/rules:
I would like to download a time-stamped series of data over HTTP in "segments", across a number of HTTP servers/connections.
Depending on the specific HTTP service, each one provides its segments of data in a different size. For example, one connection may provide an hourly segment per request (e.g. "http://server1/getdata?year=2020&month=1&day=1&hour=1"), while a different connection might provide the data in monthly segments (e.g. "http://server2/getdata?year=2020&month=1"). It is not possible to get hourly data from a monthly connection or vice versa.
If any one server fails or is busy with more than x connections, I would like to retry on a different server.
Once a segment of data has been downloaded, it needs to be processed into a data-set result. This processing should be parallelized as much as possible.
When the chronologically first segment in the series arrives, I would like to start processing it immediately, and then process each subsequent segment in chronological order (i.e. I do not want to wait for the entire series to finish downloading before responding to the caller); the sketch after this list illustrates this rule in isolation.
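For illustration, that last rule on its own can be expressed quite compactly: start every download up front, then await the tasks in chronological order, so each result is handed out as soon as its predecessors are done while later downloads continue in the background. A minimal sketch; DownloadSegmentAsync and the tuple-based segment are hypothetical stand-ins for the real types:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

static class ChronologicalStreaming
{
    // Stand-in for the real per-segment HTTP download + processing.
    static async Task<object> DownloadSegmentAsync((DateTime Begin, DateTime End) segment)
    {
        await Task.Delay(100); // simulate I/O
        return $"{segment.Begin:u} - {segment.End:u}";
    }

    // Yields results strictly in chronological order without waiting for the
    // whole series: tasks are awaited in the order the segments were created,
    // but all of them are already running.
    public static async IAsyncEnumerable<object> StreamResults(
        IReadOnlyList<(DateTime Begin, DateTime End)> segments)
    {
        List<Task<object>> tasks = segments.Select(DownloadSegmentAsync).ToList();
        foreach (var task in tasks)
            yield return await task;
    }
}

(IAsyncEnumerable needs C# 8; on older compilers the same idea works by returning the task list and awaiting it in order.)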
Below is one of my attempts to solve this. For the sake of clarity, I have only included the stubs of some code.
public IEnumerable<object> RetrieveData(DateTime begin, DateTime end)
{
    // Break the period up into the smallest segments allowed.
    // In this case, we will create one segment for each hour between begin and end dates.
    var segments = new DataSegments(begin, end, IntervalTypeEnum.Hourly);
    var cancelTokenSource = new CancellationTokenSource();
    var cancelToken = cancelTokenSource.Token;
    var tasks = new List<Task>();
    // Start a number of tasks which are responsible for downloading segments
    // until all segments are complete.
    for (int i = 0; i < 3; i++)
    {
        var task = new Task(() =>
        {
            // Keep downloading segments until there are none left.
            while (!segments.IsComplete && !cancelToken.IsCancellationRequested)
            {
                string errorMsg = string.Empty;
                // Gets a list of connections available for downloading data.
                var connections = DataConnectionManager.GetConnectionQueue();
                // Cycle through all the available connections until we successfully
                // download a chunk.
                Retry:
                try
                {
                    var connection = connections.Dequeue();
                    if (connection is MonthlyDataConnection)
                    {
                        List<DataSegment> list = segments.GetNext(SegmentType.Monthly);
                        DownloadAndProcessMonthlySegment(connection, list, cancelToken);
                    }
                    else if (connection is HourlyDataConnection)
                    {
                        List<DataSegment> list = segments.GetNext(SegmentType.Hourly);
                        foreach (var segment in list)
                        {
                            DownloadAndProcessHourlySegment(connection, segment, cancelToken);
                        }
                    }
                }
                catch
                {
                    goto Retry;
                }
            }
        });
        task.Start();
        tasks.Add(task);
    }
    foreach (var segment in segments)
    {
        segment.Wait(cancelToken);
        if (segment.Data != null && !cancelToken.IsCancellationRequested)
        {
            yield return segment.Data;
        }
    }
    Task.WaitAll(tasks.ToArray());
}

void DownloadAndProcessMonthlySegment(connection, segments, cancelToken)
{
    // Download from http connection, throw exception if WebException.
    // Process data if http download successful.
    // Mark all segments as complete/ready.
}

void DownloadAndProcessHourlySegment(connection, segment, cancelToken)
{
    // Download from http connection, throw exception if WebException.
    // Process data if http download successful.
    // Mark segment as complete/ready.
}

public enum SegmentType
{
    NextAvailable,
    Hourly,
    Monthly
}

// Represents a series of data segments that need to be downloaded.
// In this code example, it will have hourly segments that span the specified
// begin and end dates.
public class DataSegments : IEnumerable<DataSegment>
{
    // Returns a list of segments that haven't been downloaded yet.
    // Depending on the "SegmentType", it will return just one hourly segment
    // or an entire month of hourly segments (SegmentType.Monthly).
    public List<DataSegment> GetNext(SegmentType type = SegmentType.NextAvailable);
}

// Represents a segment of data that needs to be retrieved from the web
// and processed into "Data".
public class DataSegment
{
    DateTime BeginDate { get; set; }
    DateTime EndDate { get; set; }
    // The processed data-set result.
    object Data { get; set; }
}
The code works by using a set of tasks that operate like threads, each looping until the list of segments has been downloaded and processed. Depending on the connection type (monthly or hourly), a task downloads and processes the data accordingly (while ensuring no other task attempts to download the same range of data).
Although the code does (mostly) work, I feel it isn't the most optimal or elegant solution. One shortcoming, for example, is that a task can be held waiting on an HTTP request when it could instead be processing data. Another is that the connection and error handling are not ideal; for example, there is no handling for the scenario where more than x connections have been established to a server.
Would someone have a better solution, or ideas to improve this code while properly maximizing parallelism?
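One commonly suggested shape for exactly this download-then-process staging is a TPL Dataflow pipeline: a TransformBlock performs the throttled downloads and, because blocks preserve ordering by default (EnsureOrdered), hands segments to the processing block in the order they were posted, i.e. chronologically. A hedged sketch against the DataSegment stub above; DownloadAsync and Process are stand-ins, and it assumes Data is settable:

using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow; // NuGet: System.Threading.Tasks.Dataflow

public static class PipelineSketch
{
    // Hypothetical stand-ins for the real download and CPU-bound processing.
    static Task<object> DownloadAsync(DataSegment segment) => Task.FromResult((object)"data");
    static void Process(DataSegment segment) { /* build the data-set result */ }

    public static async Task RunAsync(DataSegment[] segments)
    {
        // Up to 3 downloads in flight; output order matches input order.
        var download = new TransformBlock<DataSegment, DataSegment>(
            async s => { s.Data = await DownloadAsync(s); return s; },
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 3 });

        // Process on all cores, independently of the downloaders.
        var process = new ActionBlock<DataSegment>(
            s => Process(s),
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = Environment.ProcessorCount });

        download.LinkTo(process, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var segment in segments)
            download.Post(segment);
        download.Complete();
        await process.Completion;
    }
}

This keeps downloaders from being held up by processing work, since the two stages run on separate blocks with their own degrees of parallelism.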
EDIT:
As requested by @Enigmativity, below is a full console app example that can be compiled.
Limitations of the solution:
The number of running tasks is hard-coded.
Each task is designed more like a Thread with a continuous loop rather than a discrete operation.
Processing of segments is not parallelized as much as it could be.
No exception handling.
class Program
{
    static Random random = new Random((int)DateTime.Now.Ticks);

    static void Main(string[] args)
    {
        Connections.Instance.Enqueue(new Connection(IntervalTypeEnum.Hourly));
        Connections.Instance.Enqueue(new Connection(IntervalTypeEnum.Daily));
        var begin = new DateTime(2020, 1, 1);
        var end = new DateTime(2020, 1, 5);
        foreach (var download in Download(begin, end))
        {
            Console.WriteLine($"Final result: {download}");
        }
        Console.WriteLine("Press any key...");
        Console.ReadKey();
    }

    public static IEnumerable<string> Download(DateTime begin, DateTime end)
    {
        var segments = new DataSegments(begin, end, IntervalTypeEnum.Hourly);
        var cancelTokenSource = new CancellationTokenSource();
        var cancelToken = cancelTokenSource.Token;
        var tasks = new List<Task>();
        for (int i = 0; i < 3; i++)
        {
            var task = new Task(() =>
            {
                while (!segments.IsComplete && !cancelToken.IsCancellationRequested)
                {
                    string errorMsg = string.Empty;
                    var connection = Connections.GetNextAvailable();
                    var list = segments.GetNext(connection.IntervalType);
                    foreach (var segment in list)
                    {
                        GetSegment(connection, segment, cancelToken);
                    }
                }
            });
            task.Start();
            tasks.Add(task);
        }
        foreach (var segment in segments)
        {
            segment.Wait(cancelToken);
            if (segment.Data != null && !cancelToken.IsCancellationRequested)
            {
                Console.WriteLine($"Yielding data: {segment.Data}");
                yield return segment.Data;
            }
        }
        Task.WaitAll(tasks.ToArray());
    }

    static void GetSegment(Connection conn, DataSegment segment, CancellationToken token)
    {
        conn.WaitOne();
        var result = conn.Download(segment.Begin, segment.End);
        segment.Data = result;
        ProcessSegment(segment, token);
        conn.Release();
    }

    static void ProcessSegment(DataSegment segment, CancellationToken token)
    {
        Console.WriteLine($"Processing segment data: {segment.Data}");
        for (DateTime d = segment.Begin; d < segment.End; d = d.AddHours(1))
        {
            for (int i = 0; i < 100; i++)
            {
            }
            // Doing stuff..
        }
        segment.Status = DownloadStatusEnum.Done;
    }
}
public class Connection
{
    static Random random = new Random((int)DateTime.Now.Ticks);

    public IntervalTypeEnum IntervalType { get; set; }
    private SemaphoreSlim semaphore = new SemaphoreSlim(2);

    public Connection(IntervalTypeEnum type)
    {
        IntervalType = type;
    }

    public void WaitOne()
    {
        semaphore.Wait();
    }

    public bool IsBusy
    {
        get
        {
            return semaphore.CurrentCount == 0;
        }
    }

    public string Download(DateTime begin, DateTime end)
    {
        var data = $"{begin.ToString("yyyyMMdd hh:mm")} - {end.ToString("yyyyMMdd hh:mm")}";
        Console.WriteLine($"Downloading {data}");
        Thread.Sleep(random.Next(1000));
        return data;
    }

    public void Release()
    {
        semaphore.Release();
    }
}
public class Connections : Queue<Connection>
{
    private static Connections instance = null;

    public static Connections Instance
    {
        get
        {
            if (instance == null)
                instance = new Connections();
            return instance;
        }
    }

    public static Connection GetNextAvailable()
    {
        Connection retVal = null;
        foreach (var connection in Instance)
        {
            if (retVal == null) retVal = connection;
            if (!connection.IsBusy)
            {
                retVal = connection;
                break;
            }
        }
        return retVal;
    }
}
public enum DownloadStatusEnum
{
    NeedsProcessing,
    InProgress,
    Done
}
public class DataSegment
{
    public EventHandler OnStatusUpdate;
    ManualResetEvent resetEvent = new ManualResetEvent(false);

    public DataSegment(DateTime begin, DateTime end)
    {
        Begin = begin;
        End = end;
        Status = DownloadStatusEnum.NeedsProcessing;
        Data = null;
    }

    public DateTime Begin { get; set; }
    public DateTime End { get; set; }

    private DownloadStatusEnum _status = DownloadStatusEnum.NeedsProcessing;
    public DownloadStatusEnum Status
    {
        get
        {
            return _status;
        }
        set
        {
            _status = value;
            Update();
        }
    }

    public string Data { get; set; }

    void Update()
    {
        // If the task is finished, then trigger anyone waiting..
        if (Status == DownloadStatusEnum.Done) resetEvent.Set();
        this.OnStatusUpdate?.Invoke(this, null);
    }

    public void Wait(CancellationToken token)
    {
        WaitHandle.WaitAny(new[] { token.WaitHandle, resetEvent });
    }
}
public enum ChunkType
{
    NextAvailable,
    Monthly
}

public enum IntervalTypeEnum
{
    Hourly = 0,
    Daily = 1,
}
public class DataSegments : IEnumerable<DataSegment>
{
    protected List<DataSegment> chunkList = new List<DataSegment>();
    protected HashSet<DataSegment> unprocessedList = new HashSet<DataSegment>();
    protected HashSet<DataSegment> inProgressList = new HashSet<DataSegment>();
    protected HashSet<DataSegment> completedList = new HashSet<DataSegment>();

    public DataSegments(DateTime begin, DateTime end, IntervalTypeEnum intervalType)
    {
        BeginDate = begin;
        EndDate = end;
        IntervalType = intervalType;
        DateTime requestDate = BeginDate;
        DateTime endDate = new DateTime(EndDate.Year, EndDate.Month, EndDate.Day, EndDate.Hour,
            EndDate.Minute, EndDate.Second);
        DateTime finalRequestDate = EndDate;
        DateTime beginPeriod = BeginDate;
        DateTime endPeriod = DateTime.MinValue;
        if (IntervalType == IntervalTypeEnum.Hourly)
        {
            beginPeriod = new DateTime(beginPeriod.Year, beginPeriod.Month, beginPeriod.Day, beginPeriod.Hour, 0, 0);
            endPeriod = beginPeriod.AddHours(1);
            requestDate = new DateTime(requestDate.Year, requestDate.Month, requestDate.Day, requestDate.Hour, 0, 0);
            finalRequestDate = endDate.AddHours(1);
        }
        else if (IntervalType == IntervalTypeEnum.Daily)
        {
            beginPeriod = new DateTime(beginPeriod.Year, beginPeriod.Month, beginPeriod.Day, 0, 0, 0);
            endPeriod = beginPeriod.AddDays(1);
            requestDate = new DateTime(requestDate.Year, requestDate.Month, beginPeriod.Day, 0, 0, 0);
            // Calculate the last request date as the end day of the month
            finalRequestDate = new DateTime(endDate.Year, endDate.Month, beginPeriod.Day, 23, 0, 0);
        }
        while (endPeriod <= finalRequestDate)
        {
            var chunk = new DataSegment(beginPeriod < BeginDate ? BeginDate : beginPeriod,
                endPeriod > EndDate ? EndDate : endPeriod.AddTicks(-1));
            chunk.OnStatusUpdate += OnStatusUpdated;
            chunkList.Add(chunk);
            unprocessedList.Add(chunk);
            if (IntervalType == IntervalTypeEnum.Hourly)
            {
                beginPeriod = beginPeriod.AddHours(1);
                endPeriod = beginPeriod.AddHours(1);
            }
            else if (IntervalType == IntervalTypeEnum.Daily)
            {
                beginPeriod = beginPeriod.AddMonths(1);
                endPeriod = beginPeriod.AddMonths(1);
            }
        }
    }

    void OnStatusUpdated(object sender, EventArgs args)
    {
        if (sender is DataSegment)
        {
            var dc = (DataSegment)sender;
            if (dc.Status == DownloadStatusEnum.NeedsProcessing)
            {
                lock (unprocessedList)
                {
                    unprocessedList.Add(dc);
                    inProgressList.Remove(dc);
                    completedList.Remove(dc);
                }
            }
            else if (dc.Status == DownloadStatusEnum.InProgress)
            {
                lock (unprocessedList)
                {
                    unprocessedList.Remove(dc);
                    inProgressList.Add(dc);
                    completedList.Remove(dc);
                }
            }
            else if (dc.Status == DownloadStatusEnum.Done)
            {
                lock (unprocessedList)
                {
                    unprocessedList.Remove(dc);
                    inProgressList.Remove(dc);
                    completedList.Add(dc);
                }
            }
        }
    }

    public IntervalTypeEnum IntervalType { get; set; }
    public DateTime BeginDate { get; set; }
    public DateTime EndDate { get; set; }

    public int UnprocessedCount
    {
        get
        {
            lock (chunkList)
            {
                return unprocessedList.Count;
            }
        }
    }

    /// <summary>
    /// Determines whether all segments have been downloaded and processed.
    /// </summary>
    public bool IsComplete
    {
        get
        {
            return chunkList.Count == completedList.Count;
        }
    }

    public List<DataSegment> GetNext(IntervalTypeEnum type)
    {
        List<DataSegment> retVal = new List<DataSegment>();
        lock (unprocessedList)
        {
            DataSegment firstSegment = null;
            bool adding = false;
            int watermark = -1;
            foreach (var chunk in unprocessedList)
            {
                // Grab the first available chunk. If we don't find anything else
                // that suits, we will just return this.
                if (firstSegment == null) firstSegment = chunk;
                if (type == IntervalTypeEnum.Hourly)
                {
                    Console.WriteLine("Reserving HOURLY segment for download");
                    break;
                }
                else if (type == IntervalTypeEnum.Daily)
                {
                    // If we are at the start of a day, then add chunks to our
                    // list until we progress to the next day. We note the
                    // current day so we know when we have moved to the next.
                    if (!adding)
                    {
                        adding = true;
                        watermark = chunk.Begin.Day;
                        retVal.Add(chunk);
                    }
                    else if (adding && chunk.Begin.Day != watermark)
                    {
                        Console.WriteLine("Reserving DAILY segment for download");
                        break;
                    }
                    else
                    {
                        retVal.Add(chunk);
                    }
                }
            }
            // If we didn't find any matching chunk, return the first one.
            if (retVal.Count == 0 && firstSegment != null) retVal.Add(firstSegment);
        } // lock

        // Mark all the chunks as in progress.
        foreach (var chunk in retVal)
        {
            chunk.Status = DownloadStatusEnum.InProgress;
        }
        return retVal;
    }

    public IEnumerator<DataSegment> GetEnumerator()
    {
        return chunkList.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

MongoDB as a lock with WriteConcern Majority and ReadConcern Linearizable

I have one place in my application where I want to use Mongo (3.6) as a lock across multiple threads (on different servers). Essentially: if one thread has started the work, other threads should see that through Mongo and not start the same work in parallel.
From the documentation I learned
Combined with "majority" write concern, "linearizable" read concern enables multiple threads to perform reads and writes on a single document as if a single thread performed these operations in real time;
This sounded good to me: I insert a certain document when a thread starts the work, and other threads check whether such a document already exists and don't start if so. But it does not work in my case.
I prepared two tests: a non-parallel one that successfully blocks the second thread, and a parallel one that fails, leaving me with two of these RebuildLog documents.
using System;
using System.Threading.Tasks;
using FluentAssertions;
using Xunit;
using MongoDB.Bson.Serialization.Attributes;
using MongoDB.Driver;

namespace FindOneAndUpdateTests
{
    public class FindOneAndUpdateTests
    {
        private static IMongoDatabase GetDatabase()
        {
            var dbName = "test";
            var client = new MongoClient("mongodb://localhost:45022");
            return client.GetDatabase(dbName);
        }

        private IMongoCollection<RebuildLog> GetCollection()
        {
            return GetDatabase().GetCollection<RebuildLog>("RebuildLog");
        }

        [Fact]
        public async Task FindOneAndUpdate_NotParallel_Test()
        {
            var dlpId = Guid.NewGuid();
            var first = await FindOneAndUpdateMethod(dlpId);
            var second = await FindOneAndUpdateMethod(dlpId);
            first.Should().BeFalse();
            second.Should().BeTrue();
        }

        [Fact]
        public async Task FindOneAndUpdate_Parallel_Test()
        {
            var dlpId = Guid.NewGuid();
            var taskFirst = FindOneAndUpdateMethod(dlpId);
            var taskSecond = FindOneAndUpdateMethod(dlpId);
            var first = await taskFirst;
            var second = await taskSecond;
            first.Should().BeFalse();
            second.Should().BeTrue();
        }

        private async Task<bool> FindOneAndUpdateMethod(Guid dlpId)
        {
            var mongoCollection = GetCollection();
            var filterBuilder = Builders<RebuildLog>.Filter;
            var filter = filterBuilder.Where(w => w.DlpId == dlpId);
            var creator = Builders<RebuildLog>.Update
                .SetOnInsert(w => w.DlpId, dlpId)
                .SetOnInsert(w => w.ChangeDate, DateTime.UtcNow)
                .SetOnInsert(w => w.BuildDate, DateTime.UtcNow)
                .SetOnInsert(w => w.Id, Guid.NewGuid());
            var options = new FindOneAndUpdateOptions<RebuildLog>
            {
                IsUpsert = true,
                ReturnDocument = ReturnDocument.Before
            };
            var result = await mongoCollection
                .WithWriteConcern(WriteConcern.WMajority)
                .WithReadConcern(ReadConcern.Linearizable)
                .FindOneAndUpdateAsync(filter, creator, options);
            return result != null;
        }
    }

    [BsonIgnoreExtraElements]
    public class RebuildLog
    {
        public RebuildLog()
        {
            Id = Guid.NewGuid();
        }

        public Guid Id { get; set; }
        public DateTime ChangeDate { get; set; }
        public string ChangeUser { get; set; }
        public Guid DlpId { get; set; }
        public string Portal { get; set; }
        public DateTime? BuildDate { get; set; }
    }
}
My suspicion is that my handcrafted atomic get-or-insert (the FindOneAndUpdate with IsUpsert) breaks the "on a single document" constraint from the documentation. Any idea how to fix this, or is it just not possible?
Interesting. Maybe you have no unique index on DlpId? That would explain why Mongo decides that sequential execution of these operations is not necessary: in your case there is no write-then-read pattern (as pointed out in "Client Sessions and Causal Consistency Guarantees"); it is update-or-create executed twice, concurrently.
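If the missing unique index is indeed the cause, adding one on DlpId makes the server itself reject the second concurrent insert: the losing upsert fails with a duplicate key error (E11000), which the caller can interpret as "another thread already holds the lock". A minimal sketch with the .NET driver, to be run once at startup:

var indexModel = new CreateIndexModel<RebuildLog>(
    Builders<RebuildLog>.IndexKeys.Ascending(x => x.DlpId),
    new CreateIndexOptions { Unique = true });

GetCollection().Indexes.CreateOne(indexModel);

// With the index in place, wrap FindOneAndUpdateAsync in a try/catch:
// a duplicate key exception from the losing upsert means the lock is taken.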
What about something like this instead?
public class SyncDocument
{
    // ...
    [BsonElement("locked"), BsonDefaultValue(false)]
    public bool Locked { get; set; }
}
In client code:
var filter = Builders<SyncDocument>.Filter.Eq(d => d.Locked, false);
var update = Builders<SyncDocument>.Update.Set(d => d.Locked, true);
var result = collection.UpdateOne(filter, update);
if (result.ModifiedCount == 1)
{
    Console.WriteLine("Lock acquired");
}
A document with the Locked field should be created before application startup (if that is applicable for your task).
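To release the lock after the work completes, the holder flips the flag back; a sketch matching the field above:

// Release: allows the next worker's UpdateOne (filter: Locked == false) to succeed.
var releaseFilter = Builders<SyncDocument>.Filter.Eq(d => d.Locked, true);
var releaseUpdate = Builders<SyncDocument>.Update.Set(d => d.Locked, false);
collection.UpdateOne(releaseFilter, releaseUpdate);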

How to reduce the execution time? ProfileBase's GetPropertyValue takes 40ms to execute

I have the following code, in which I create a list of a custom class from a MembershipUser array.
Here is the custom class whose list is created:
public class userandGroup : IComparable
{
    public string id { get; set; }
    public string Name { get; set; }
    public string DisplayName { get; set; }
    public string type { get; set; }

    // Sort by display name; IComparable.CompareTo must return int.
    public int CompareTo(object obj)
    {
        if (obj is userandGroup other)
            return this.DisplayName.CompareTo(other.DisplayName);
        throw new ArgumentException("Object is not a userandGroup", nameof(obj));
    }
}
Here is the code that populates userlist:
MembershipUserCollection tempuserlist = GetProvider("DefaultProfileProvider", applicationName).GetAllUsers(currentPage - 1, pageSize, out totalUsers);
MembershipUser[] userlist = new MembershipUser[totalUsers];
tempuserlist.CopyTo(userlist, 0);
Here is the code that generates the list of userandGroup (the custom class):
foreach (MembershipUser usr in userlist)
{
    userandGroup usrgp = new userandGroup();
    usrgp.id = ((Guid)usr.ProviderUserKey).ToString();
    usrgp.Name = usr.UserName;
    ProfileBase profile = ProfileBase.Create(usr.UserName);
    profile.Initialize(usr.UserName, true);
    // The following line takes approximately 40ms per iteration.
    usrgp.DisplayName = profile.GetPropertyValue("FirstName").ToString() + " " + profile.GetPropertyValue("LastName").ToString();
    usrgp.type = "user";
    lst.Add(usrgp);
}
As noted in the comment, the line
usrgp.DisplayName = profile.GetPropertyValue("FirstName").ToString() + " " + profile.GetPropertyValue("LastName").ToString();
takes about 40ms per iteration. I have 40 users at the moment, so the loop takes approximately 1600ms to execute. If the number of users grows, the loop will take a horrendous amount of time to complete.
How can I reduce the execution time of this line, or is there another way to get the user's first and last name from ProfileBase?
As per @TyCobb's suggestion, I used a Parallel.ForEach loop and updated the code as follows.
Object obj = new Object();
Parallel.ForEach(userlist, (usr) =>
{
    userandGroup usrgp = new userandGroup();
    usrgp.id = ((Guid)usr.ProviderUserKey).ToString();
    usrgp.Name = usr.UserName;
    ProfileBase profile = ProfileBase.Create(usr.UserName);
    profile.Initialize(usr.UserName, true);
    usrgp.type = "user";
    usrgp.DisplayName = profile.GetPropertyValue("FirstName").ToString() + " " + profile.GetPropertyValue("LastName").ToString();
    lock (obj)
    {
        lst.Add(usrgp);
    }
});
Although this improved performance somewhat, it is still not optimal. The entire loop now completes in just under one second.
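Two further notes, hedged: the 40ms per user is most likely a database round-trip inside GetPropertyValue (the SQL profile provider issues a query per profile), so parallelism only hides the cost rather than removing it; and ProfileBase.Create(userName) already initializes the profile, so the extra Initialize call may be redundant. If the parallel route is kept, PLINQ builds the list without the shared lock; a sketch:

using System;
using System.Linq;
using System.Web.Profile; // ProfileBase

// Project users to userandGroup in parallel; PLINQ collects the per-thread
// results itself, so no lock around lst.Add is needed.
var lst = userlist
    .AsParallel()
    .Select(usr =>
    {
        var profile = ProfileBase.Create(usr.UserName);
        return new userandGroup
        {
            id = ((Guid)usr.ProviderUserKey).ToString(),
            Name = usr.UserName,
            type = "user",
            DisplayName = profile.GetPropertyValue("FirstName") + " " + profile.GetPropertyValue("LastName")
        };
    })
    .ToList();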

How to Loop calls to Pagination URL in C# HttpClient to download all Pages from JSON results

My first question, so please be kind... :)
I'm using the C# HttpClient to invoke the Jobs API endpoint.
Here's the endpoint: Jobs API Endpoint (doesn't require key, you can click it)
This gives me JSON like so.
{
    "count": 1117,
    "firstDocument": 1,
    "lastDocument": 50,
    "nextUrl": "\/api\/rest\/jobsearch\/v1\/simple.json?areacode=&country=&state=&skill=ruby&city=&text=&ip=&diceid=&page=2",
    "resultItemList": [
        {
            "detailUrl": "http:\/\/www.dice.com\/job\/result\/90887031\/918715?src=19",
            "jobTitle": "Sr Security Engineer",
            "company": "Accelon Inc",
            "location": "San Francisco, CA",
            "date": "2017-03-30"
        },
        {
            "detailUrl": "http:\/\/www.dice.com\/job\/result\/cybercod\/BB7-13647094?src=19",
            "jobTitle": "Platform Engineer - Ruby on Rails, AWS",
            "company": "CyberCoders",
            "location": "New York, NY",
            "date": "2017-04-16"
        }
    ]
}
I've pasted a complete JSON snippet so you can use it in your answer; the full results are too long to include here.
Here are the C# classes.
using Newtonsoft.Json;
using System.Collections.Generic;

namespace MyNameSpace
{
    public class DiceApiJobWrapper
    {
        public int count { get; set; }
        public int firstDocument { get; set; }
        public int lastDocument { get; set; }
        public string nextUrl { get; set; }

        [JsonProperty("resultItemList")]
        public List<DiceApiJob> DiceApiJobs { get; set; }
    }

    public class DiceApiJob
    {
        public string detailUrl { get; set; }
        public string jobTitle { get; set; }
        public string company { get; set; }
        public string location { get; set; }
        public string date { get; set; }
    }
}
When I invoke the URL using HttpClient and deserialize using JSON.NET, I do get the data back properly.
Here's the code I am calling from my console app's Main method (hence the static method; I think this could be refactored better?)
private static List<DiceApiJob> GetDiceJobs()
{
    HttpClient httpClient = new HttpClient();
    var jobs = new List<DiceApiJob>();
    var task = httpClient.GetAsync("http://service.dice.com/api/rest/jobsearch/v1/simple.json?skill=ruby")
        .ContinueWith((taskwithresponse) =>
        {
            var response = taskwithresponse.Result;
            var jsonString = response.Content.ReadAsStringAsync();
            jsonString.Wait();
            var result = JsonConvert.DeserializeObject<DiceApiJobWrapper>(jsonString.Result);
            if (result != null)
            {
                if (result.DiceApiJobs.Any())
                    jobs = result.DiceApiJobs.ToList();
                if (result.nextUrl != null)
                {
                    // do this GetDiceJobs again in a loop? How?? Any other efficient elegant way??
                }
            }
        });
    task.Wait();
    return jobs;
}
But now, how do I check whether there are more jobs, using the nextUrl field? I know I can check whether it is null; if it is not, there are more jobs to pull down.
(Screenshot: results from my debugging and stepping through.)
How do I do this recursively, without hanging, and with some delays so I don't exceed the API limits? I think I have to use the TPL (Task Parallel Library), but I am quite baffled.
Thank you!
~Sean
If you are concerned about your app's response time and would like to return some results before you have actually retrieved all the pages of data from the API, you can run your process in a loop and also give it a callback method to execute as it gets each page of data from the API.
Here is a sample:
public class Program
{
    public static void Main(string[] args)
    {
        var jobs = GetDiceJobsAsync(Program.ResultCallBack).Result;
        Console.WriteLine($"\nAll {jobs.Count} jobs displayed");
        Console.ReadLine();
    }

    private static async Task<List<DiceApiJob>> GetDiceJobsAsync(Action<DiceApiJobWrapper> callBack = null)
    {
        var jobs = new List<DiceApiJob>();
        HttpClient httpClient = new HttpClient();
        httpClient.BaseAddress = new Uri("http://service.dice.com");
        var nextUrl = "/api/rest/jobsearch/v1/simple.json?skill=ruby";
        do
        {
            // Await the response directly. (Wrapping this in ContinueWith with an
            // async lambda yields a Task<Task>, which lets the loop race ahead of
            // the page actually being processed.)
            var response = await httpClient.GetAsync(nextUrl);
            if (response.IsSuccessStatusCode)
            {
                string jsonString = await response.Content.ReadAsStringAsync();
                var result = JsonConvert.DeserializeObject<DiceApiJobWrapper>(jsonString);
                if (result != null)
                {
                    // Build the full list to return later after the loop.
                    if (result.DiceApiJobs.Any())
                        jobs.AddRange(result.DiceApiJobs);

                    // Run the callback method, passing the current page of data from the API.
                    callBack?.Invoke(result);

                    // Get the URL for the next page.
                    nextUrl = result.nextUrl ?? string.Empty;
                }
                else
                {
                    nextUrl = string.Empty;
                }
            }
            else
            {
                // End the loop if we get an error response.
                nextUrl = string.Empty;
            }
        } while (!string.IsNullOrEmpty(nextUrl));
        return jobs;
    }

    private static void ResultCallBack(DiceApiJobWrapper jobSearchResult)
    {
        if (jobSearchResult != null && jobSearchResult.count > 0)
        {
            Console.WriteLine($"\nDisplaying jobs {jobSearchResult.firstDocument} to {jobSearchResult.lastDocument}");
            foreach (var job in jobSearchResult.DiceApiJobs)
            {
                Console.WriteLine(job.jobTitle);
                Console.WriteLine(job.company);
            }
        }
    }
}
Note that the above sample allows the callback method to access each page of data as it is received by the GetDiceJobsAsync method; in this case, the console displays each page as it becomes available. If you do not want the callback option, you can simply pass nothing to GetDiceJobsAsync.
GetDiceJobsAsync also returns all the jobs when it completes, so you can instead act on the whole list at the end of GetDiceJobsAsync.
As for reaching API limits, you can insert a small delay within the loop, right before it repeats. When I tried it, I did not encounter the API limiting my requests, so I did not include it in the sample; a sketch follows.
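If you do need the delay, it is one line at the bottom of the do/while; the 500ms figure is an assumption to tune against the API's real limits:

do
{
    // ... fetch and process one page, setting nextUrl, as above ...

    if (!string.IsNullOrEmpty(nextUrl))
        await Task.Delay(500); // hypothetical pause between page requests
} while (!string.IsNullOrEmpty(nextUrl));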
