I am implementing an autocomplete control. Every time the user types a new character into an input, a query fires. To test this, I created a large database where the average query takes about 5 seconds to execute.
Because the query takes 5 seconds, I execute it on a new thread:
// execute the following lambda expression when the user enters characters into the textbox
textBox1.TextChanged += (x, eventArgs) =>
{
    // execute the query on a separate thread
    new Thread(new ThreadStart(() =>
    {
        ObservableCollection = MyEntities.Entities.Products.Where(p => p.Name.Contains(inputText));
    })).Start();
};
ObservableCollection and inputText are properties bound to my textbox and list.
The problem is that if the user types two characters, my program will be running two threads at the same time. How can I abort the query?
Things that I am thinking about:
Create a bool variable IsQueryRunning and set it to true when the query starts and to false when the thread ends. If a new query is executed while IsQueryRunning = true, I can set ObservableCollection = null and cause an exception, which I would then resolve with a try/catch block. I don't think that technique is the best approach...
Edit:
Setting the property Collection = null sometimes causes an exception and sometimes it does not...
I would suggest a different approach, if it is possible for you to change yours.
Instead of querying on each keystroke, I would only run a database query after, for example, 3 characters, and keep the result in a collection in memory.
After that, run the subsequent queries against the in-memory collection only. That spares you the follow-up database accesses, which are always much slower, and you should get a considerable performance gain.
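A minimal sketch of that idea, reusing the ObservableCollection/inputText bindings from the question (the _cache and _cachedPrefix fields and the OnInputTextChanged handler are hypothetical names for illustration):

private List<Product> _cache;      // results of the last database query
private string _cachedPrefix;      // the 3-character prefix the cache was built for

private void OnInputTextChanged(string inputText)
{
    if (inputText.Length < 3)
    {
        _cache = null;             // too short: no query yet, invalidate the cache
        return;
    }

    var prefix = inputText.Substring(0, 3);
    if (_cache == null || _cachedPrefix != prefix)
    {
        // one database round trip per 3-character prefix
        _cache = MyEntities.Entities.Products
            .Where(p => p.Name.Contains(prefix))
            .ToList();
        _cachedPrefix = prefix;
    }

    // every further keystroke filters the cached list in memory
    ObservableCollection = _cache.Where(p => p.Name.Contains(inputText));
}

If you still want to abort a long-running query mid-flight, the example below shows one way to do it with a shared abort flag: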
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

class Program
{
    public class Person
    {
        public string Name;
        public int Age;
    }

    public static void ExecuteQueryAsync(IEnumerable<Person> collectionToQuery, Action<List<Person>> onQueryTerminated, out Action stopExecutionOfQuery)
    {
        var abort = false;
        stopExecutionOfQuery = () =>
        {
            abort = true;
        };
        Task.Factory.StartNew(() =>
        {
            try
            {
                var query = collectionToQuery.Where(x =>
                {
                    // bail out of the enumeration as soon as an abort is requested
                    if (abort)
                        throw new OperationCanceledException("Query aborted");
                    // query logic:
                    return x.Age < 25;
                });
                onQueryTerminated(query.ToList());
            }
            catch
            {
                // aborted (or failed): report "no result" to the caller
                onQueryTerminated(null);
            }
        });
    }

    static void Main(string[] args)
    {
        Random random = new Random();
        Person[] people = new Person[1000000];
        // populate the array
        for (var i = 0; i < people.Length; i++)
            people[i] = new Person() { Age = random.Next(0, 100) };

        Action abortQuery;
        ExecuteQueryAsync(people, OnQueryDone, out abortQuery);

        // if after some time the user wants to stop the query:
        abortQuery();

        Console.Read();
    }

    static void OnQueryDone(List<Person> results)
    {
        if (results == null)
            Console.WriteLine("Query was canceled by the user");
        else
            Console.WriteLine("Query yielded " + results.Count + " results");
    }
}
I had to execute the query like this:
public IEnumerable<T> ExecuteQuery<T>(IEnumerable<T> query)
{
    foreach (var t in query)
    {
        if (counterOfSimultaneousQueries > 1)
        {
            // if more than one query is running, break out of the older one;
            // breaking out decrements counterOfSimultaneousQueries
            break;
        }
        yield return t;
    }
}
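For completeness, a sketch of how that counter could be maintained safely inside the same class; Interlocked avoids lost updates when several query threads finish at once (note that an iterator body only runs once enumeration actually begins):

private static int counterOfSimultaneousQueries;

public IEnumerable<T> ExecuteQuery<T>(IEnumerable<T> query)
{
    // runs on the first MoveNext(), i.e. when enumeration starts
    Interlocked.Increment(ref counterOfSimultaneousQueries);
    try
    {
        foreach (var t in query)
        {
            // a newer query has started: abandon this one
            if (Volatile.Read(ref counterOfSimultaneousQueries) > 1)
                break;
            yield return t;
        }
    }
    finally
    {
        // runs when enumeration completes, breaks, or is disposed
        Interlocked.Decrement(ref counterOfSimultaneousQueries);
    }
}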
I need help solving a problem with a shared function inside Parallel.ForEach. I get the error shown below; how can I change the function so it is safe to use across threads?
public IEnumerable<LibraryReferenceModel> LoadLibrary(IEnumerable<Item> items)
{
    var allLibReferences = new List<LibraryReferenceModel>();
    var baseData = LoadBaseLibData();
    Parallel.ForEach(baseData, data =>
    {
        var item = items.ToList().FindAll(c => c.Name == data.Name);
        CreateLibraryReference(allLibReferences, item, data.Name); // problem: calling this function from Parallel.ForEach
    });
    return allLibReferences;
}

private static void CreateLibraryReference(ICollection<LibraryReferenceModel> allLibReferences,
    IReadOnlyCollection<Item> item, string libraryName)
{
    allLibReferences.Add(item.Count == 0
        ? new LibraryReferenceModel
        {
            LibName = libraryName,
            HasReference = false,
            References = item.Count
        }
        : new LibraryReferenceModel
        {
            LibName = libraryName,
            HasReference = true,
            References = item.Count
        });
}
I get this exception (the index is outside the bounds of the array):
Thank you
As you've found, since multiple threads are attempting to add items to the shared allLibReferences collection, you'll see erratic thread-safety issues like the error you've described.
This is why it's really important to make your code thread safe before you consider parallelising it. One of the best techniques is to rely on immutable constructs, i.e. never try to change (mutate) the value of a shared variable during parallel code.
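As an aside, the quickest (though least instructive) fix would be to swap the shared List for a thread-safe collection such as ConcurrentBag<T>. A sketch, which keeps the rest of the original code and gives up output ordering:

using System.Collections.Concurrent;

public IEnumerable<LibraryReferenceModel> LoadLibrary(IEnumerable<Item> items)
{
    var allLibReferences = new ConcurrentBag<LibraryReferenceModel>();
    var itemList = items.ToList(); // materialise once, not per iteration
    var baseData = LoadBaseLibData();
    Parallel.ForEach(baseData, data =>
    {
        var matched = itemList.FindAll(c => c.Name == data.Name);
        // ConcurrentBag.Add is thread safe, so concurrent writers no longer race
        allLibReferences.Add(new LibraryReferenceModel
        {
            LibName = data.Name,
            HasReference = matched.Count > 0,
            References = matched.Count
        });
    });
    return allLibReferences;
}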
So I would change the way the code works: instead of sharing a collection, we project the items needed immutably, which can be safely parallelised (I've used .AsParallel, as it's simpler), and then collate the results and return them.
Furthermore, since the whole point of parallelism is to make code run as quickly as possible, you'll also want to remove inefficiencies such as materialising the same items in a list during each iteration (items.ToList()), and you'll also want to avoid O(N) scans inside the loop if possible - I've replaced .FindAll(c => c.Name == data.Name) with a pre-calculated dictionary.
Putting that all together, you'll wind up with something like this:
public IEnumerable<LibraryReferenceModel> LoadLibrary(IEnumerable<Item> items)
{
    var keyedItems = items.GroupBy(i => i.Name)
                          .ToDictionary(grp => grp.Key, grp => grp.ToList());
    var baseData = LoadBaseLibData();

    var allLibReferences = baseData
        .AsParallel()
        .SelectMany(data =>
        {
            if (keyedItems.TryGetValue(data.Name, out var matchedItems))
            {
                // one projected reference for the matched group
                return new[] { ProjectLibraryReference(matchedItems, data.Name) };
            }

            // no matches found
            return new[]
            {
                new LibraryReferenceModel
                {
                    LibName = data.Name,
                    HasReference = false,
                    References = 0
                }
            };
        })
        .ToList();

    return allLibReferences;
}

private static LibraryReferenceModel ProjectLibraryReference(IReadOnlyCollection<Item> item,
    string libraryName)
{
    return new LibraryReferenceModel
    {
        LibName = libraryName,
        HasReference = item.Count > 0,
        References = item.Count
    };
}
I've assumed that multiple items can have the same name, hence we group before creating the dictionary, and then flatten the projected results with .SelectMany at the end.
I am trying to get a count of my locations inside a polygon. Here is my stored procedure:
function count(poly) {
    var collection = getContext().getCollection();
    var query = {
        query: 'SELECT f.id FROM f WHERE ST_WITHIN(f.location, @poly)',
        parameters: [{ name: '@poly', value: poly }]
    };
    var isAccepted = collection.queryDocuments(
        collection.getSelfLink(),
        query,
        function (err, docs, options) {
            if (err) throw err;
            if (!docs || !docs.length) getContext().getResponse().setBody('no docs found');
            else getContext().getResponse().setBody(docs.length);
        });
    if (!isAccepted) throw new Error('The query was not accepted by the server.');
}
When I execute the same query in the query explorer I get results, but through the stored procedure it returns "no docs found". It does return results for simpler queries, but even there the maximum count returned is always 100. Not sure what I am doing wrong.
Thanks in advance.
P.S.: I tried using ST_DISTANCE for these coordinates. It did return a count of 100 (the max value), but it is not working at all for ST_WITHIN.
Edit:
It was not working, so I tried the approach described in the official example for counting results. And voila!! It worked. So I moved on to the next step: counting the locations in all the polygons I had in one call, since doing it locally meant too many round trips to get the count for each polygon. But calling the same function from a loop doesn't return anything. I have already tested each query in the array in DocumentDB Studio and it does return results. Please help!!! The code for the new procedure is:
function countABC(filterQueryArray) {
    var results = [];
    for (i = 0; i < filterQueryArray.length; i++) {
        countnew(filterQueryArray[i].QueryString, "");
    }
    getContext().getResponse().setBody(results);

    function countnew(filterQuery, continuationToken) {
        var collection = getContext().getCollection();
        var maxResult = 50000;
        var result = 0;

        tryQuery(continuationToken);

        function tryQuery(nextContinuationToken) {
            var responseOptions = {
                continuation: nextContinuationToken,
                pageSize: maxResult
            };
            if (result >= maxResult || !query(responseOptions)) {
                setBody(nextContinuationToken);
            }
        }

        function query(responseOptions) {
            return (filterQuery && filterQuery.length) ?
                collection.queryDocuments(collection.getSelfLink(), filterQuery, responseOptions, onReadDocuments) :
                collection.readDocuments(collection.getSelfLink(), responseOptions, onReadDocuments);
        }

        function onReadDocuments(err, docFeed, responseOptions) {
            if (err) {
                throw 'Error while reading document: ' + err;
            }
            result += docFeed.length;
            if (responseOptions.continuation) {
                tryQuery(responseOptions.continuation);
            } else {
                setBody(null);
            }
        }

        function setBody(continuationToken) {
            var body = {
                count: result,
                continuationToken: continuationToken
            };
            results.push(body);
        }
    }
}
With the new sproc, it's not helpful to set the result after the loop, because at that point no queries have been executed yet (the results array will be empty). The idea is that all CRUD/query calls are queued and executed after the script that queued them (in this case the main script) has finished.
Setting the result/body needs to be done from the callback. This is partially done already, but there is an issue: on every call of countnew, the "result" variable is reset to 0. Essentially, "var result = 0" needs to be done in the main script.
Also, it's not recommended to run a loop like that "for" loop when it issues CRUD/query calls without waiting for the previous call to finish (due to their async nature); otherwise checking isAccepted is not reliable. What's recommended is to serialize the loop, something like this:
var result = 0;
step();

function step() {
    if (filterQueryArray.length == 0) setBody(null);
    else {
        var query = filterQueryArray.shift();
        // process the current query; at the end (from its callback), call step() again
    }
}
Does this make sense?
In Azure table storage, how can I query for a set of entities that match specific row keys in a partition?
I'm using Azure table storage and need to retrieve a set of entities that match a set of row keys within the partition.
Basically, if this were SQL, it might look something like this:
SELECT TOP 1 SomeKey
FROM TableName WHERE SomeKey IN (1, 2, 3, 4, 5);
I figured that, to save on costs and avoid a bunch of individual table retrieve operations, I could just do it using a table batch operation. For some reason I'm getting an exception that says:
"A batch transaction with a retrieve operation cannot contain any other operations"
Here is my code:
public async Task<IList<MyDomainEntity>> GetDomainEntitiesAsync(int someId, IList<Guid> entityIds)
{
    try
    {
        var client = _storageAccount.CreateCloudTableClient();
        var table = client.GetTableReference("SomeTable");
        var batchOperation = new TableBatchOperation();
        var counter = 0;
        var myDomainEntities = new List<MyDomainEntity>();
        foreach (var id in entityIds)
        {
            if (counter < 100)
            {
                batchOperation.Add(TableOperation.Retrieve<MyDomainEntityTableEntity>(someId.ToString(CultureInfo.InvariantCulture), id.ToString()));
                ++counter;
            }
            else
            {
                var batchResults = await table.ExecuteBatchAsync(batchOperation);
                var batchResultEntities = batchResults.Select(o => ((MyDomainEntityTableEntity)o.Result).ToMyDomainEntity()).ToList();
                myDomainEntities.AddRange(batchResultEntities);
                batchOperation.Clear();
                counter = 0;
            }
        }
        return myDomainEntities;
    }
    catch (Exception ex)
    {
        _logger.Error(ex);
        throw;
    }
}
How can I achieve what I'm after without manually looping through the set of row keys and doing an individual Retrieve table operation for each one? I don't want to incur the cost associated with doing this since I could have hundreds of row keys that I want to filter on.
I made a helper method to do it in a single request per partition.
Use it like this:
var items = table.RetrieveMany<MyDomainEntity>(partitionKey, nameof(TableEntity.RowKey),
    rowKeysList, columnsToSelect);
Here are the helper methods:
public static List<T> RetrieveMany<T>(this CloudTable table, string partitionKey,
    string propertyName, IEnumerable<string> valuesRange,
    List<string> columnsToSelect = null)
    where T : TableEntity, new()
{
    var entities = table.ExecuteQuery(new TableQuery<T>()
        .Where(TableQuery.CombineFilters(
            TableQuery.GenerateFilterCondition(
                nameof(TableEntity.PartitionKey),
                QueryComparisons.Equal,
                partitionKey),
            TableOperators.And,
            GenerateIsInRangeFilter(
                propertyName,
                valuesRange)
        ))
        .Select(columnsToSelect))
        .ToList();

    return entities;
}

public static string GenerateIsInRangeFilter(string propertyName,
    IEnumerable<string> valuesRange)
{
    // NotNull is a custom guard extension (throws if valuesRange is null)
    string finalFilter = valuesRange.NotNull(nameof(valuesRange))
        .Distinct()
        .Aggregate((string)null, (filterSeed, value) =>
        {
            string equalsFilter = TableQuery.GenerateFilterCondition(
                propertyName,
                QueryComparisons.Equal,
                value);

            return filterSeed == null ?
                equalsFilter :
                TableQuery.CombineFilters(filterSeed,
                    TableOperators.Or,
                    equalsFilter);
        });

    return finalFilter ?? "";
}
I have tested it with fewer than 100 values in rowKeysList; however, even if it throws an exception for larger lists, we can always split the request into parts.
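If splitting is needed, a sketch (assuming .NET 6+ for Enumerable.Chunk; the chunk size of 100 is an arbitrary choice to keep the generated filter string short, since this filter-based approach is not itself subject to the batch limit):

public static List<T> RetrieveManyChunked<T>(this CloudTable table, string partitionKey,
    string propertyName, IEnumerable<string> valuesRange,
    List<string> columnsToSelect = null)
    where T : TableEntity, new()
{
    // issue one RetrieveMany request per chunk and merge the results
    return valuesRange
        .Chunk(100)
        .SelectMany(chunk => table.RetrieveMany<T>(
            partitionKey, propertyName, chunk, columnsToSelect))
        .ToList();
}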
With hundreds of row keys, that rules out using $filter with a list of row keys (which would result in a partial partition scan anyway).
Regarding the error you're getting: it seems the batch contains both queries and other types of operations (which isn't permitted), although I don't see why you'd get that error from your code snippet.
Your only other option is to execute individual queries. You can do these asynchronously, though, so you wouldn't have to wait for each one to return. Table storage supports upwards of 2,000 transactions per second on a given partition, so it's a viable solution.
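A sketch of that approach, reusing the types from the question (MyDomainEntityTableEntity and ToMyDomainEntity are the asker's own types): issue every point query up front, then await them all together.

var retrieveTasks = entityIds
    .Select(id => table.ExecuteAsync(TableOperation.Retrieve<MyDomainEntityTableEntity>(
        someId.ToString(CultureInfo.InvariantCulture), id.ToString())))
    .ToList();

// all queries are now in flight concurrently; collect the results
var tableResults = await Task.WhenAll(retrieveTasks);
var myDomainEntities = tableResults
    .Where(r => r.Result != null) // skip row keys that were not found
    .Select(r => ((MyDomainEntityTableEntity)r.Result).ToMyDomainEntity())
    .ToList();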
Not sure how I missed this in the first place, but here is a snippet from the MSDN documentation for the TableBatchOperation type:
A batch operation may contain up to 100 individual table operations, with the requirement that each operation entity must have same partition key. A batch with a retrieve operation cannot contain any other operations. Note that the total payload of a batch operation is limited to 4MB.
I ended up executing individual retrieve operations asynchronously as suggested by David Makogon.
I made my own ghetto link-table. I know it's not that efficient (maybe it's fine), but I only make this request if the data is not cached locally, which only happens when switching devices. Anyway, this seems to work. Checking the lengths of the two arrays lets me defer context.done() until all entities have been retrieved.
var query = new azure.TableQuery()
    .top(1000)
    .where('PartitionKey eq ?', 'link-' + req.query.email.toLowerCase());

tableSvc.queryEntities('linkUserMarker', query, null, function(error, result, response) {
    if (!error && result) {
        var markers = [];
        result.entries.forEach(function(e) {
            tableSvc.retrieveEntity('markerTable', e.markerPartition._, e.RowKey._.toString(), function(error, marker, response) {
                markers.push(marker);
                if (markers.length == result.entries.length) {
                    context.res = {
                        status: 200,
                        body: {
                            status: 'error',
                            markers: markers
                        }
                    };
                    context.done();
                }
            });
        });
    } else {
        notFound(error);
    }
});
I saw your post while looking for a solution; in my case I needed to look up multiple ids at the same time.
Because there is no Contains LINQ support (https://learn.microsoft.com/en-us/rest/api/storageservices/query-operators-supported-for-the-table-service), I just made a massive or-equals chain.
It seems to be working for me so far; hope it helps someone.
public async Task<ResponseModel<ICollection<TAppModel>>> ExecuteAsync(
    ICollection<Guid> ids,
    CancellationToken cancellationToken = default
)
{
    if (!ids.Any())
        throw new ArgumentOutOfRangeException();

    // https://learn.microsoft.com/en-us/rest/api/storageservices/query-operators-supported-for-the-table-service
    // Contains is not supported, so make a massive or-equals statement...lol
    var item = Expression.Parameter(typeof(TTableModel), typeof(TTableModel).FullName);
    var expressions = ids
        .Select(
            id => Expression.Equal(
                Expression.Constant(id.ToString()),
                // reuse the same parameter instance; a fresh Expression.Parameter
                // here would leave the lambda with an unbound parameter
                Expression.MakeMemberAccess(
                    item,
                    typeof(TTableModel).GetProperty(nameof(ITableEntity.RowKey)))
            )
        )
        .ToList();

    var builderExpression = expressions.First();
    builderExpression = expressions
        .Skip(1)
        .Aggregate(
            builderExpression,
            Expression.OrElse // short-circuiting logical OR
        );

    var finalExpression = Expression.Lambda<Func<TTableModel, bool>>(builderExpression, item);

    var result = await _azureTableService.FindAsync(
        finalExpression,
        cancellationToken
    );

    return new(
        result.Data?.Select(_ => _mapper.Map<TAppModel>(_)).ToList(),
        result.Succeeded,
        result.User,
        result.Messages.ToArray()
    );
}
public async Task<ResponseModel<ICollection<TTableEntity>>> FindAsync(
    Expression<Func<TTableEntity, bool>> filter,
    CancellationToken ct = default
)
{
    try
    {
        var queryResultsFilter = _tableClient.QueryAsync<TTableEntity>(
            FilterExpressionTree(filter),
            cancellationToken: ct
        );

        var items = new List<TTableEntity>();
        await foreach (TTableEntity qEntity in queryResultsFilter)
            items.Add(qEntity);

        return new ResponseModel<ICollection<TTableEntity>>(items);
    }
    catch (Exception exception)
    {
        _logger.Error(
            nameof(FindAsync),
            exception,
            exception.Message
        );
        // OBFUSCATE
        // TODO: pass an error id
        throw new Exception();
    }
}
I am trying to grab logs from Windows. To make it faster, I first find the days that have logs, and then for that range of days I open one thread per day to load them quickly. In the function work1 the error "Index was outside the bounds of the array" appears. If I do the job in only one thread it works fine, but it is very, very slow.
I tried to use the information from
"Index was outside the bounds of the array while trying to start multiple threads"
but it does not work.
I think the problem is with the IEnumerable: it seems it is not yet loaded when the loop starts.
Sorry for my English, I am from Uzbekistan.
var result = from EventLogEntry elog in aLog.Entries
             orderby elog.TimeGenerated
             select elog.TimeGenerated;

DateTime OLDentry = result.First();
DateTime NEWentry = result.Last();

DTN.Add(result.First());
foreach (var dtn in result) {
    if (dtn.Year != DTN.Last().Year ||
        dtn.Month != DTN.Last().Month ||
        dtn.Day != DTN.Last().Day) {
        DTN.Add(dtn);
    }
}

List<Thread> t = new List<Thread>();
int i = 0;
foreach (DateTime day in DTN) {
    DateTime yad = day;
    var test = from EventLogEntry elog in aLog.Entries
               where (elog.TimeGenerated.Year == day.Year) &&
                     (elog.TimeGenerated.Month == day.Month) &&
                     (elog.TimeGenerated.Day == day.Day)
               select elog;
    var tt2 = test.ToArray();
    t.Add(new Thread(() => work1(tt2)));
    t[i].Start();
    i++;
}

static void work1(IEnumerable<EventLogEntry> b) {
    var z = b;
    for (int i = 0; i < z.Count(); i++) {
        Console.Write(z + "\n");
    }
}
Replace var tt2 = test; with var tt2 = test.ToArray();
The error comes from a mistake you make numerous times in your code: you enumerate the data countless times. Calling .Count() enumerates the data yet again, and in this case the overlapping enumerations end up conflicting with cached values inside the EventLogEntry enumerator.
LINQ does not return a data set; it returns a query. A variable of type IEnumerable<T> may return different values every time you call Count(), First() or Last(). Calling .ToArray() makes C# retrieve the results and store them in an array.
You should generally just enumerate an IEnumerable<T> once.
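A small illustration of the difference; the Where query is re-evaluated on every enumeration, while ToArray() takes a fixed snapshot:

var numbers = new List<int> { 1, 2, 3 };
var query = numbers.Where(n => n > 1); // a query definition, not a result set

numbers.Add(4);
Console.WriteLine(query.Count());      // 3 - includes the 4 added after the query was defined

var snapshot = numbers.Where(n => n > 1).ToArray();
numbers.Add(5);
Console.WriteLine(snapshot.Length);    // still 3 - the snapshot is unaffected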
Let's say I have two List<string> collections. These are populated from the results of reading a text file.
List owner contains:
cross
jhill
bbroms
List assignee contains:
Chris Cross
Jack Hill
Bryan Broms
During the read from a SQL source (the SQL statement contains a join), I would perform:
if(sqlReader["projects.owner"] == "something in owner list" || sqlReader["assign.assignee"] == "something in assignee list")
{
// add this projects information to the primary results LIST
list_by_owner.Add(sqlReader["projects.owner"],sqlReader["projects.project_date_created"],sqlReader["projects.project_name"],sqlReader["projects.project_status"]);
// if the assignee is not null, add also to the secondary results LIST
// logic to determine if assign.assignee is null goes here
list_by_assignee.Add(sqlReader["assign.assignee"],sqlReader["projects.owner"],sqlReader["projects.project_date_created"],sqlReader["projects.project_name"],sqlReader["projects.project_status"]);
}
I do not want to end up using nested foreach.
A FOR loop would probably suffice. Someone had mentioned Zip to me, but I wasn't sure whether that would be the preferable route in my situation.
One loop to iterate through both lists (assuming both have the same count):
for (int i = 0; i < alpha.Count; i++)
{
    var itemAlpha = alpha[i]; // <= your object from list alpha
    var itemBeta = beta[i];   // <= your object from list beta
    // write your code here
}
From what you describe, you don't need to iterate at all.
This is what you need:
http://msdn.microsoft.com/en-us/library/bhkz42b3.aspx
Usage:
if (listAlpha.Contains(resultA) || listBeta.Contains(resultA))
{
    // do your operation
}
List iteration happens implicitly inside the Contains method. And that's 2n comparisons per lookup, vs n*n for nested iteration.
You would be better off with sequential iteration over each list, one after the other, if you need to go that route at all.
This data might be better represented as a List<KeyValuePair<string, string>>, which would pair the two lists' values together in a single list.
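A sketch of that pairing, using the owner/assignee lists from the question (and assuming, as the question implies, that the lists line up index by index):

var pairs = new List<KeyValuePair<string, string>>(owner.Count);
for (int i = 0; i < owner.Count; i++)
    pairs.Add(new KeyValuePair<string, string>(owner[i], assignee[i]));

foreach (var pair in pairs)
    Console.WriteLine(pair.Key + " -> " + pair.Value); // e.g. "cross -> Chris Cross"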
There are several options for this. The least "painful" would be a plain old for loop:
for (var index = 0; index < alpha.Count; index++)
{
    var alphaItem = alpha[index];
    var betaItem = beta[index];
    // Do something.
}
Another interesting approach is using the indexed LINQ methods (but remember they are evaluated lazily; you have to consume the resulting enumerable), for example:
alpha.Select((alphaItem, index) =>
{
    var betaItem = beta[index];
    // Do something, then return a value so Select has something to yield.
    return alphaItem;
})
Or you can enumerate both collections by using the enumerators directly:
using (var alphaEnumerator = alpha.GetEnumerator())
using (var betaEnumerator = beta.GetEnumerator())
{
    while (alphaEnumerator.MoveNext() && betaEnumerator.MoveNext())
    {
        var alphaItem = alphaEnumerator.Current;
        var betaItem = betaEnumerator.Current;
        // Do something
    }
}
Zip (if you need pairs) or Concat (if you need a combined list) are possible options for iterating two lists at the same time.
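For example (a sketch using the alpha/beta lists from the other answers):

// Zip walks both lists together and stops at the end of the shorter one.
var pairs = alpha.Zip(beta, (a, b) => new { Alpha = a, Beta = b });
foreach (var pair in pairs)
    Console.WriteLine(pair.Alpha + " / " + pair.Beta);

// Concat appends one sequence after the other into a single stream.
foreach (var item in alpha.Concat(beta))
    Console.WriteLine(item);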
I like doing something like this to enumerate over parallel lists:
int alphaCount = alpha.Count;
int betaCount = beta.Count;
int i = 0;

while (i < alphaCount && i < betaCount)
{
    var a = alpha[i];
    var b = beta[i];
    // handle matched alpha/beta pairs
    ++i;
}
while (i < alphaCount)
{
    var a = alpha[i];
    // handle unmatched alphas
    ++i;
}
while (i < betaCount)
{
    var b = beta[i];
    // handle unmatched betas
    ++i;
}