DynamoDB query doesn't return all attributes - c#

I have a table in DynamoDB.
Table name test-vod
Primary partition key guid (String)
Primary sort key -
With additional attributes as you can see below.
The goal is to query the table by srcVideo, a column that is not part of the primary key. To accomplish that we created a local secondary index.
We query it with the code below, using the low-level API from the DynamoDB SDK NuGet package (I am open to other options instead of the low-level API).
var queryRequest = new QueryRequest
{
    TableName = $"{_environmentName}-vod",
    IndexName = "srcVideo-index",
    ScanIndexForward = true,
    KeyConditionExpression = "srcVideo = :v_srcVideo",
    ExpressionAttributeValues = new Dictionary<string, AttributeValue>()
    {
        {":v_srcVideo", new AttributeValue {S = inputMediaKey}}
    }
};
var response = await _client.QueryAsync(queryRequest, cancellationToken);
// Does not exist
var hlsUrl = response.Items
    .SelectMany(p => p)
    .SingleOrDefault(p => p.Key.Equals("hlsUrl"));
I want to retrieve 3 attributes (fields) from the response: hlsUrl, dashUrl and workflowsStatus, but all 3 are missing. The response contains a Dictionary with 27 keys, which are only 27 of the 35 available columns.
I have tried using ProjectionExpression and other query combinations with no success.

You don't show the CREATE TABLE you've used.
It sounds like your index wasn't created with the Projection setting you really want.
The default is KEYS_ONLY. It sounds like you want ALL, or maybe INCLUDE with just the selected attributes. See GlobalSecondaryIndex - Projection.
Local secondary indexes work the same way.
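One way to check this before changing anything is to ask DynamoDB how the index is projected. The following is only a sketch: the class and method names (IndexProjectionCheck, FindProjection, DescribeProjectionAsync) are made up for illustration, and it assumes the AWS SDK for .NET (AWSSDK.DynamoDBv2) with an already configured client.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

// Hypothetical helper, not part of the SDK.
public static class IndexProjectionCheck
{
    // Returns the projection of the named index, or null if the table has no such index.
    // Both global and local secondary indexes are searched.
    public static Projection FindProjection(TableDescription table, string indexName)
    {
        var global = (table.GlobalSecondaryIndexes ?? new List<GlobalSecondaryIndexDescription>())
            .Where(i => i.IndexName == indexName)
            .Select(i => i.Projection);
        var local = (table.LocalSecondaryIndexes ?? new List<LocalSecondaryIndexDescription>())
            .Where(i => i.IndexName == indexName)
            .Select(i => i.Projection);
        return global.Concat(local).FirstOrDefault();
    }

    // Describes the table and summarizes the index projection as text,
    // for example the projection type followed by any INCLUDE attributes.
    public static async Task<string> DescribeProjectionAsync(
        IAmazonDynamoDB client, string tableName, string indexName, CancellationToken ct = default)
    {
        var response = await client.DescribeTableAsync(tableName, ct);
        var projection = FindProjection(response.Table, indexName);
        if (projection == null)
        {
            return "index not found";
        }
        var included = projection.NonKeyAttributes ?? new List<string>();
        return $"{projection.ProjectionType}: {string.Join(", ", included)}";
    }
}
```

If this reports KEYS_ONLY, or an INCLUDE list without hlsUrl and dashUrl, the index itself is the likely cause. As far as I know the projection of an existing index cannot be changed in place, so the index would have to be recreated with the projection you need.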

Interestingly, I made it work with the code below: even if the key/value is not shown in the dictionary when inspecting it in the debugger, you can still retrieve it.
var queryRequest = new QueryRequest
{
    TableName = tableName,
    IndexName = "srcVideo-index",
    ScanIndexForward = true,
    KeyConditionExpression = "srcVideo = :v_srcVideo",
    ExpressionAttributeValues = new Dictionary<string, AttributeValue>()
    {
        {":v_srcVideo", new AttributeValue {S = inputMediaKey}}
    }
};
var response = await _client.QueryAsync(queryRequest, cancellationToken);
if (response.Items.AnyAndNotNull())
{
    var dictionary = response.Items.First().ToDictionary(p => p.Key, x => x.Value.S);
    return Result.Ok(new VodDataInfo(
        dictionary["srcBucket"],
        dictionary["srcVideo"],
        dictionary["destBucket"],
        dictionary.ContainsKey("dashUrl")
            ? dictionary["dashUrl"]
            : default,
        dictionary.ContainsKey("hlsUrl")
            ? dictionary["hlsUrl"]
            : default,
        dictionary["workflowStatus"]));
}

Related

How to implement bulk update in MongoDb using multiple filters using c#?

I have a collection EmployeeDetails with 4 fields, and a filter on the first three. I want to update ($set) if matching data is found, or else insert ($setOnInsert with upsert), but in bulk.
EmpName:
EmpCompany:
EmpDesignation:
EmpSalary:
I would like to update EmpSalary on the basis of the other fields. Also, the filter data will be sent in bulk. Is it possible to do so without a foreach loop?
I have tried the following code:
foreach (var filterData in filterDataArrayList)
{
    var loadData = Builders<EmployeeModel>.Update
        .SetOnInsert(x => x.EmpSalary, Salary)
        .SetOnInsert(x => x.EmpName, Name)
        .SetOnInsert(x => x.EmpCompany, Company)
        .SetOnInsert(x => x.EmpDesignation, Designation);
    var insertResult = await collection.UpdateOneAsync(
        x => x.EmpName == filterData.Name && x.EmpCompany == filterData.Company && x.EmpDesignation == filterData.Designation,
        loadData,
        new UpdateOptions() { IsUpsert = true });
    // Nothing was inserted, so a matching document already exists: update its salary.
    if (insertResult.UpsertedId == null && insertResult.MatchedCount == 1)
    {
        var updateData = Builders<EmployeeModel>.Update
            .Set(x => x.EmpSalary, Salary);
        var updateResult = await collection.UpdateOneAsync(
            x => x.EmpName == filterData.Name && x.EmpCompany == filterData.Company && x.EmpDesignation == filterData.Designation,
            updateData);
    }
}
This code works fine. I want to eliminate foreach loop for filter data. Is that possible?
Try this:
var client = new MongoClient();
var db = client.GetDatabase("d");
var coll = db.GetCollection<BsonDocument>("c");
// IsUpsert is set per model, so each UpdateOneModel needs its own flag.
coll.BulkWrite(new[]
{
    new UpdateOneModel<BsonDocument>(
        "{ whatever1 : 1 }",
        new UpdateDefinitionBuilder<BsonDocument>()
            .SetOnInsert("field1", 1)
            .SetOnInsert("field2", 2))
    {
        IsUpsert = true
    },
    new UpdateOneModel<BsonDocument>(
        "{ whatever2 : 1 }",
        new UpdateDefinitionBuilder<BsonDocument>()
            .SetOnInsert("field21", 1)
            .SetOnInsert("field22", 2))
    {
        IsUpsert = true
    }
});
This example just shows how it can be done; you can also use a typed or more complex approach, as in your example.
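A typed variant of the same idea might look like the sketch below. It assumes the official MongoDB.Driver package; EmployeeModel is reconstructed from the field names in the question (the Id property and the double type for EmpSalary are guesses), and EmployeeBulkUpsert is a made-up name. Each row becomes one upsert: when the three filter fields match, only EmpSalary is set; when nothing matches, the server inserts a document built from the equality filter plus the $set, so the separate $setOnInsert pass should not be needed.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

// Assumed shape of the model; only the four Emp* fields come from the question.
public class EmployeeModel
{
    public ObjectId Id { get; set; }
    public string EmpName { get; set; }
    public string EmpCompany { get; set; }
    public string EmpDesignation { get; set; }
    public double EmpSalary { get; set; }
}

// Hypothetical helper, not part of the driver.
public static class EmployeeBulkUpsert
{
    // Builds one upsert per row: match on name, company and designation, set the salary.
    public static List<WriteModel<EmployeeModel>> BuildUpserts(IEnumerable<EmployeeModel> rows)
    {
        var filter = Builders<EmployeeModel>.Filter;
        var update = Builders<EmployeeModel>.Update;
        return rows
            .Select(row => (WriteModel<EmployeeModel>)new UpdateOneModel<EmployeeModel>(
                filter.Eq(x => x.EmpName, row.EmpName)
                    & filter.Eq(x => x.EmpCompany, row.EmpCompany)
                    & filter.Eq(x => x.EmpDesignation, row.EmpDesignation),
                update.Set(x => x.EmpSalary, row.EmpSalary))
            {
                IsUpsert = true
            })
            .ToList();
    }

    // Sends all upserts to the server in a single bulk request.
    // BulkWriteAsync rejects an empty request list, so callers should skip empty input.
    public static Task<BulkWriteResult<EmployeeModel>> UpsertAsync(
        IMongoCollection<EmployeeModel> collection, IEnumerable<EmployeeModel> rows)
    {
        return collection.BulkWriteAsync(
            BuildUpserts(rows),
            new BulkWriteOptions { IsOrdered = false });
    }
}
```

The list of models is still built by iterating in memory, but the database is contacted once per batch rather than once or twice per row, which is usually the part of the loop that matters.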

How to convert SQLQuery to SortedList through EF6

I have an Entity Framework 6 class called Materials, which is reflected in my database as a table with the same name. Using a parent parameter, I need to return a sorted list of materials from a SQL Query, so that I can later check that edits the user makes do not affect the order. My SQL is a stored procedure that looks like this:
CREATE PROC [dbo].[GET_SortedMaterials](@FinishedGoodCode VARCHAR(50))
AS
SELECT
    ROW_NUMBER() OVER (ORDER BY Component.Percentage_of_Parent DESC, Material.Material) AS _sortField
    ,Material.*
FROM
    Components AS Component
    INNER JOIN Materials AS Material ON Component.Child_Material = Material.Material
WHERE
    Component.Parent_Code = @FinishedGoodCode
ORDER BY
    Component.Percentage_of_Parent DESC
    ,Material.Material
As you can see, the orderby field is not included in the Material. For this reason, I felt I could not return just a set of Material objects and still keep the sorting - I have performed the ordering in SQL and added the _sortField (I think that field may be a bad idea).
My C# code to read the SQL looks like this:
public async Task<SortedList<int, Materials>> GET_SortedMaterials(IProgress<Report> progress, string finishedGoodCode)
{
    try
    {
        var report = new Report { Message = "Retrieving Sorted Materials", NewLine = true, StatusCode = Enums.StatusCode.Working };
        progress.Report(report);
        using (var context = new DBContext())
        {
            var ingredientList = await context.Database.SqlQuery<(int _sortField, Materials mat)>("[app].[GET_Customers]").ToListAsync();
            var sorted = new SortedList<int, Raw_Materials>();
            foreach (var (_sortField, mat) in ingredientList.OrderBy(x => x._sortField))
            {
                sorted.Add(_sortField, mat);
            }
            return sorted;
        }
    }
    catch (Exception ex)
    {
        [EXCLUDED CODE]
    }
}
When the code executes, I get the correct number of rows returned, but I do not get a sorted list where the Key corresponds to the _sortField value and the Value to the Material value. I have tried various versions of basically the same code and I cannot get it to return a list of materials with information about their sorting. Instead, the conversion to the EF class fails entirely and I only get null values back.
Any advice about how to return a sorted list from SQL and maintain the sorting in C#, when the sort field is not in the return values would be very gratefully received.
Use
var ingredientList = context.Database.SqlQuery<Materials>("[app].[GET_Customers]").Select((mat, _sortField) => (_sortField, mat)).ToDictionary(x => x._sortField, x => x.mat);
or, if you want to load asynchronously, use
var ingredientList = (await context.Database.SqlQuery<Materials>("[app].[GET_Customers]").ToListAsync()).Select((mat, _sortField) => (_sortField, mat)).ToDictionary(x => x._sortField, x => x.mat);
Full code:
public async Task<SortedList<int, Materials>> GET_SortedMaterials(IProgress<Report> progress, string finishedGoodCode)
{
    try
    {
        var report = new Report { Message = "Retrieving Sorted Materials", NewLine = true, StatusCode = Enums.StatusCode.Working };
        progress.Report(report);
        using (var context = new DBContext())
        {
            // Await the query instead of blocking on .Result; the row index preserves the SQL ordering.
            var ingredientList = (await context.Database.SqlQuery<Materials>("[app].[GET_Customers]").ToListAsync())
                .Select((mat, _sortField) => (_sortField, mat))
                .ToDictionary(x => x._sortField, x => x.mat);
            var sorted = new SortedList<int, Materials>();
            foreach (var item in ingredientList.OrderBy(x => x.Key))
            {
                sorted.Add(item.Key, item.Value);
            }
            return sorted;
        }
    }
    catch (Exception ex)
    {
        [EXCLUDED CODE]
    }
}
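The core of this approach is independent of Entity Framework: rows that already arrive in the desired order are keyed by their position. A minimal sketch of that step, using only the base class library, could be the following (OrderedRows and ToSortedList are made-up names, and the 1-based keys are a choice made here to mirror ROW_NUMBER() in the stored procedure):

```csharp
using System.Collections.Generic;

// Hypothetical helper for illustration.
public static class OrderedRows
{
    // Keys each row by its 1-based position in the incoming sequence,
    // so the SortedList enumerates in the same order the query returned.
    public static SortedList<int, T> ToSortedList<T>(IEnumerable<T> rowsInQueryOrder)
    {
        var sorted = new SortedList<int, T>();
        var position = 1;
        foreach (var row in rowsInQueryOrder)
        {
            sorted.Add(position++, row);
        }
        return sorted;
    }
}
```

With a helper like this, the query result can be passed in directly once it has been materialized with ToListAsync, and the _sortField column is not needed on the C# side at all.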

Dynamo DB query not returning expected results when using nested collections

I have an AWS DynamoDB Table with the following structure:
I am trying to get back all the items that have at least one RequestItem with the Id 3401.
Here is what I've tried so far (c# code):
IAmazonDynamoDB client = new AmazonDynamoDBClient(
    new BasicAWSCredentials(configuration["AccessKey"], configuration["SecretKey"]),
    RegionEndpoint.USEast1);
var request = new ScanRequest
{
    TableName = "dynamo-table-name",
    ExpressionAttributeNames = new Dictionary<string, string>
    {
        {"#requestItems", "RequestItems"},
        {"#requestId", "Id"}
    },
    ExpressionAttributeValues = new Dictionary<string, AttributeValue>
    {
        {":val", new AttributeValue {N = "3401"}}
    },
    FilterExpression = "contains(#requestItems.#requestId, :val)"
};
var response = await client.ScanAsync(request);
I tried some variations on FilterExpression (using a simple "=" instead of "contains"), but I still don't get back any results. The query runs without errors, but the result is an empty list.
However, the same code works for properties which are not collections (e.g. Contact.EmailAddress)
What am I missing?
[EDIT]
I tried another solution that was suggested:
var request = new ScanRequest
{
    TableName = "dynamo-table-name",
    ExpressionAttributeNames = new Dictionary<string, string>
    {
        {"#requestItems", "RequestItems"}
    },
    ExpressionAttributeValues = new Dictionary<string, AttributeValue>
    {
        {
            ":val",
            new AttributeValue
            {
                L = new List<AttributeValue>
                {
                    {
                        new AttributeValue
                        {
                            M = new Dictionary<string, AttributeValue>
                                {{"Id", new AttributeValue {N = "3401"}}}
                        }
                    }
                }
            }
        }
    },
    FilterExpression = "contains(#requestItems, :val)"
};
var response = await client.ScanAsync(request);
but I still do not receive results.
You cannot really do the query you want with DynamoDB. The only thing you could do, if you know the maximum number of items that could be in RequestItems, is to chain together a lot of equality checks with OR: (RequestItems[0].Id = :val) OR (RequestItems[1].Id = :val) OR (RequestItems[2].Id = :val) .... That doesn't seem like a good idea though, unless you know in advance that RequestItems will always contain a small, bounded number of items.
contains does not work the way you want it to. contains(path, operand) is true when the value found at path is a string that has operand as a substring, or a set or list that has operand as one of its elements. For a list of maps the operand would have to equal an entire map element, so you cannot match on Id alone.
I'm afraid your only option, given your data schema, is to fetch all the items and filter them in your code.
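Fetching everything and filtering in code could be sketched as follows. This is an illustration under assumptions rather than a definitive implementation: RequestItemScan, HasRequestItem and FindAsync are made-up names, RequestItems is assumed to be a list of maps whose Id is stored as a number, and the AWS SDK for .NET (AWSSDK.DynamoDBv2) is assumed.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

// Hypothetical helper, not part of the SDK.
public static class RequestItemScan
{
    // True when the item's RequestItems list contains a map whose Id equals the given number.
    // DynamoDB numbers travel as strings, so the comparison is on the string form.
    public static bool HasRequestItem(Dictionary<string, AttributeValue> item, string id)
    {
        return item.TryGetValue("RequestItems", out var requestItems)
            && requestItems.L != null
            && requestItems.L.Any(element =>
                element.M != null
                && element.M.TryGetValue("Id", out var value)
                && value.N == id);
    }

    // Scans the whole table page by page and keeps only the matching items.
    public static async Task<List<Dictionary<string, AttributeValue>>> FindAsync(
        IAmazonDynamoDB client, string tableName, string id, CancellationToken ct = default)
    {
        var matches = new List<Dictionary<string, AttributeValue>>();
        Dictionary<string, AttributeValue> startKey = null;
        do
        {
            var response = await client.ScanAsync(
                new ScanRequest { TableName = tableName, ExclusiveStartKey = startKey }, ct);
            var page = response.Items ?? new List<Dictionary<string, AttributeValue>>();
            matches.AddRange(page.Where(item => HasRequestItem(item, id)));
            // An empty or missing LastEvaluatedKey means the last page has been read.
            startKey = response.LastEvaluatedKey;
        } while (startKey != null && startKey.Count > 0);
        return matches;
    }
}
```

A call such as RequestItemScan.FindAsync(client, "dynamo-table-name", "3401") would then return the matching items. Every item in the table is still read, so this only makes sense for tables small enough that a full scan is acceptable.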
I apologise if this is not authoritative.
I suspect that DynamoDB cannot do that.
Moreover, the idea behind DynamoDB is that it should not do that.
DynamoDB does not support arbitrary function evaluation over data.
DynamoDB is a key-value store (kind of), not a database. The "dynamo" way is to query all the rows (items) you may need and analyse the columns (keys) client-side. Note that it costs exactly the same (for DynamoDB; there is a small difference in traffic), because AWS charges you for something like "database disk reads". It is also just as cumbersome or easy; for example, you still have to deal with pagination.

How to avoid posting duplicates into elasticsearch using Nest .NET 6.x?

When data from a device goes into Elasticsearch there are duplicates. I would like to avoid these duplicates. I'm using an IElasticClient object, .NET and NEST to put the data.
I searched for a method like ElasticClient.SetDocumentId(), but can't find one.
_doc doc = (_doc)obj;
HashObject hashObject = new HashObject { DataRecordId = doc.DataRecordId, TimeStamp = doc.Timestamp };
// hashId should be the document ID.
int hashId = hashObject.GetHashCode();
ElasticClient.IndexDocumentAsync(doc);
I would like to update the existing data set inside Elasticsearch instead of adding the same object again.
Assuming the following setup
var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
var settings = new ConnectionSettings(pool)
    .DefaultIndex("example")
    .DefaultTypeName("_doc");
var client = new ElasticClient(settings);
public class HashObject
{
    public int DataRecordId { get; set; }
    public DateTime TimeStamp { get; set; }
}
If you want to set the Id for a document explicitly on the request, you can do so with
Fluent syntax
var indexResponse = client.Index(new HashObject(), i => i.Id("your_id"));
Object initializer syntax
var indexRequest = new IndexRequest<HashObject>(new HashObject(), id: "your_id");
var indexResponse = client.Index(indexRequest);
both result in a request
PUT http://localhost:9200/example/_doc/your_id
{
    "dataRecordId": 0,
    "timeStamp": "0001-01-01T00:00:00"
}
As Rob pointed out in the question comments, NEST has a convention whereby it can infer the Id from the document itself, by looking for a property on the CLR POCO named Id. If it finds one, it will use that as the Id for the document. This does mean that an Id value ends up being stored in _source (and indexed, but you can disable this in the mappings), but it is useful because the Id value is automatically associated with the document and used when needed.
If HashObject is updated to have an Id value, now we can just do
Fluent syntax
var indexResponse = client.IndexDocument(new HashObject { Id = 1 });
Object initializer syntax
var indexRequest = new IndexRequest<HashObject>(new HashObject { Id = 1});
var indexResponse = client.Index(indexRequest);
which will send the request
PUT http://localhost:9200/example/_doc/1
{
    "id": 1,
    "dataRecordId": 0,
    "timeStamp": "0001-01-01T00:00:00"
}
If your documents do not have an id field in the _source, you'll need to handle the _id values from the hits metadata from each hit yourself. For example
var searchResponse = client.Search<HashObject>(s => s
    .MatchAll()
);
foreach (var hit in searchResponse.Hits)
{
    var id = hit.Id;
    var document = hit.Source;
    // do something with them
}
Thank you very much Russ for this detailed and easy to understand description! :-)
The HashObject was meant to be just a helper to get a unique ID from my real _doc object. I have now added an Id property to my _doc class; the rest is shown in my code below. I no longer get duplicates in Elasticsearch.
public void Create(object obj)
{
    _doc doc = (_doc)obj;
    string idAsString = doc.DataRecordId.ToString() + doc.Timestamp.ToString();
    int hashId = idAsString.GetHashCode();
    doc.Id = hashId;
    ElasticClient.IndexDocumentAsync(doc);
}
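One caveat with this approach: string.GetHashCode() is not guaranteed to be stable. On .NET Core and later it is randomized per process, so the same record can get a different Id after a restart, and a 32-bit hash can also collide. A possible alternative, sketched here with only the base class library, derives the Id from a SHA-256 digest of the same two fields. DocumentIds is a made-up name, and DataRecordId and Timestamp are assumed to be an int and a DateTime.

```csharp
using System;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper for illustration.
public static class DocumentIds
{
    // Builds a deterministic document id from the record id and timestamp.
    // The same inputs produce the same id in every process and on every machine.
    public static string Create(int dataRecordId, DateTime timestamp)
    {
        // Culture-invariant, round-trippable formatting keeps the key stable across locales.
        var key = dataRecordId.ToString(CultureInfo.InvariantCulture)
            + "|"
            + timestamp.ToString("O", CultureInfo.InvariantCulture);
        using (var sha = SHA256.Create())
        {
            var hash = sha.ComputeHash(Encoding.UTF8.GetBytes(key));
            // Hex-encode the 32-byte digest into a 64-character lowercase string.
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

The resulting string could be passed explicitly when indexing, for example client.Index(doc, i => i.Id(DocumentIds.Create(doc.DataRecordId, doc.Timestamp))), which uses the fluent form shown in the answer above.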

How to work around NotMapped properties in queries?

I have a method that looks like this:
private static IEnumerable<OrganizationViewModel> GetOrganizations()
{
    var db = new GroveDbContext();
    var results = db.Organizations.Select(org => new OrganizationViewModel
    {
        Id = org.OrgID,
        Name = org.OrgName,
        SiteCount = org.Sites.Count(),
        DbSecureFileCount = 0,
        DbFileCount = 0
    });
    return results;
}
This returns results pretty promptly.
However, you'll notice the OrganizationViewModel has two properties which are being set to "0". There are properties in the Organization model which I added via a partial class and decorated with [NotMapped]: UnsecureFileCount and SecureFileCount.
If I change those 0s to something useful...
DbSecureFileCount = org.SecureFileCount,
DbFileCount = org.UnsecureFileCount
... I get the "Only initializers, entity members, and entity navigation properties are supported" exception. I find this a little confusing because I don't feel I'm asking the database about them, I'm only setting properties of the view model.
However, since EF isn't listening to my argument I tried a different approach:
private static IEnumerable<OrganizationViewModel> GetOrganizations()
{
    var db = new GroveDbContext();
    var results = new List<OrganizationViewModel>();
    foreach (var org in db.Organizations)
    {
        results.Add(new OrganizationViewModel
        {
            Id = org.OrgID,
            Name = org.OrgName,
            DbSecureFileCount = org.SecureFileCount,
            DbFileCount = org.UnsecureFileCount,
            SiteCount = org.Sites.Count()
        });
    }
    return results;
}
Technically this gives me the correct results without an exception but it takes forever. (By "forever" I mean more than 60 seconds whereas the first version delivers results in under a second.)
Is there a way to optimize the second approach? Or is there a way to get the first approach to work?
Another option would be to load the values back as an anonymous type and then loop through those to load your view model (n+1 is most likely the reason for the slowness).
For example:
var results = db.Organizations.Select(org => new
{
    Id = org.OrgID,
    Name = org.OrgName,
    DbSecureFileCount = org.SecureFileCount,
    DbFileCount = org.UnsecureFileCount,
    SiteCount = org.Sites.Count()
}).ToList();
var viewmodels = results.Select(x => new OrganizationViewModel
{
    Id = x.Id,
    Name = x.Name,
    DbSecureFileCount = x.DbSecureFileCount,
    DbFileCount = x.DbFileCount,
    SiteCount = x.SiteCount
});
Sorry about the formatting; I'm typing on a phone.
You are basically lazy loading each object at each iteration of the loop, causing n+1 queries.
What you should do is bring in the entire collection into memory, and use it from there.
Sample code:
// Load() returns void; it fills the set's Local collection, which is then iterated in memory.
db.Organizations.Load();
foreach (var org in db.Organizations.Local)
{
    //Here you are free to do whatever you want
}
