DynamoDB .NET SDK Get only Specific Properties - c#

I have an application whose data I'm trying to store in Amazon DynamoDB, and I'm trying to figure out the best way to structure the tables. A quick description of the app:
It needs to be able to load a large number of elements from the DB based on a search of a small number of properties and display those limited properties to the user. Then the user can browse and select a few elements that they want to look closer at and it needs to show the rest of the properties for those items.
My thought is that, for speed and memory purposes, it needs to load a 'summarized' version of the objects for the initial step, then load the full object when the user asks to look into something fully. I can do this easily (and have done so) in my C# code. However, here is what I'm wondering:
If I have a C# object and I use the DynamoDB Object Persistence SDK to map, say, 5 properties to a DynamoDB table that has, say, 30 properties, will the SDK request only the properties that are on the object? Or will it request all of them and then throw out the 25 that aren't mapped to the object?
If it only takes the needed properties, then I think I can store everything in one table, map both the summarized objects and the full objects to the same table, and just pull the properties needed. If it takes everything, then I'm worried it will consume a lot of throughput of which I don't need 75%, plus slow down the transfer due to the extra data. If that's the case, I think it may be worth creating a GSI that just has the summarized properties...
Anyway sorry for the long description, any input from those more familiar with DynamoDB than I am would be appreciated :)

Related

How to build dynamic integration adapter?

We have a scenario where we have multiple sources of data coming in from various external systems through API calls, SQL tables and physical files, and we now have to map it against a number of transaction templates. I want to build an integration adapter and UI where I can choose any entity data class and map its fields to a class or action that will be used to create a transaction in our financial system.
I want to have an object type or class that can be modified dynamically, set up links between these objects and possibly create a set of rules that defines the interaction between these objects. I have seen some versions of this type of software that use a drag-and-drop UI to do the mappings, so that would be the ideal end goal.
I'm coming from a C# .NET background, so I need some advice or tips on where to start and what to look at.
I am currently doing something similar. I wrote some code to turn data from our legacy system into JSON objects written out to flat files (1 file per data record in a table), and then wrote some code to cleanup that data and push it into Acumatica via the REST API.
I like flat-file JSON objects because they can easily be hashed, and the hashes can be used to compare them to new data that comes in. Only data whose hash has changed needs to be merged or overwritten and then pushed into the target system. The file names are usually primary key values from whatever table you're working with. Our legacy system has a hierarchical (non-tabular, non-SQL-like) data structure, so I may get more mileage out of this approach than you would with a well-normalized SQL database.
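The hash comparison described above can be sketched roughly as follows. This is a minimal illustration, not the answerer's actual code: the class and method names (ChangeDetector, HashJson, FindChanged) and the sample records are hypothetical, and it assumes the JSON is always serialized the same way (same property order and whitespace), since otherwise identical data would hash differently.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public static class ChangeDetector
{
    // Returns the SHA-256 hash of a JSON string as lowercase hex.
    public static string HashJson(string json)
    {
        using (var sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(json));
            var sb = new StringBuilder(digest.Length * 2);
            foreach (byte b in digest) sb.Append(b.ToString("x2"));
            return sb.ToString();
        }
    }

    // Returns the keys whose incoming JSON is new or hashes differently
    // from the hash recorded for that key (keys play the role of file names).
    public static List<string> FindChanged(
        IDictionary<string, string> knownHashes,
        IDictionary<string, string> incomingJson)
    {
        var changed = new List<string>();
        foreach (var pair in incomingJson)
        {
            string newHash = HashJson(pair.Value);
            if (!knownHashes.TryGetValue(pair.Key, out string oldHash) || oldHash != newHash)
                changed.Add(pair.Key);
        }
        changed.Sort(StringComparer.Ordinal);
        return changed;
    }

    public static void Main()
    {
        // Hypothetical records keyed by primary key value.
        var known = new Dictionary<string, string>
        {
            ["1001"] = HashJson("{\"name\":\"Acme\",\"city\":\"Oslo\"}"),
            ["1002"] = HashJson("{\"name\":\"Globex\",\"city\":\"Rome\"}")
        };
        var incoming = new Dictionary<string, string>
        {
            ["1001"] = "{\"name\":\"Acme\",\"city\":\"Oslo\"}",   // unchanged
            ["1002"] = "{\"name\":\"Globex\",\"city\":\"Milan\"}", // changed
            ["1003"] = "{\"name\":\"Initech\",\"city\":\"Austin\"}" // new
        };

        foreach (string key in FindChanged(known, incoming))
            Console.WriteLine("needs push: " + key);
    }
}
```

Only the keys returned by FindChanged would then need to be merged and pushed to the target system.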
There are also products like Alteryx that are built for doing data pipelines the way you have proposed.
I would caution you to be practical when building these types of things. For us, for example, we have a limited set of data that needs to come over, so we don't need the perfect abstraction for every data type. We inevitably had to do cleanup on legacy/3rd party data as well, and those problems aren't always so easy to abstract. I had previously built a system using closures for function passing in order to write custom cleanup routines for any abstract data problem I might encounter (sort of sounds like what you're talking about), but realized in the end that just writing simpler code that deals with specific data problems was cleaner and simpler to maintain. In the end, there is probably only a limited amount of data that needs to be synced.

String search in Access Database

I am new to this site. I have a question about data structures. Here's the project:
I have a MS Access Database, with approx 50 tables. Each table has an index field (sequential auto number) and 10-12 Memo type fields. These fields can contain small or large amounts of text. In all, the DB contains between 20,000 - 40,000 individual strings (Memo field entries).
I am looking for a way to search for a string in all of these tables (using C# / ASP.NET). I do not have a lot of exposure to Access, C# or ASP.NET, but I am thinking that there may be a data structure that is more suitable than any other for this. I am conscious that reading that amount of data into any data structure would be a memory hog, which is why I am asking the question. So the question relates specifically to suitable data structures (arrays, linked lists, etc.) that might be the most appropriate. I will try to figure the rest out later.
Thanks..
What you need to do first is connect to your database in order to get the necessary data. This page can help: http://msdn.microsoft.com/en-us/library/bb655884(v=vs.90).aspx
Then you'll be able to select the data you need from your database, for instance the strings, with SqlDataSource. For more information on this, look here: http://msdn.microsoft.com/en-us/library/w1kdt8w2(v=vs.90).aspx
Finally, once you are connected to your database, put the data in a data structure like a List or an ArrayList. Don't put it in a set, because a set cannot hold repeated data; if the same string occurs more than once, you'll end up with false (missing) data.
Since this is quite important to know, I would strongly recommend that you go to MSDN and look up "An Extensive Examination of Data Structures Using C# 2.0". It will give you a better knowledge of data structures, so next time you'll know what you need.
Hope it helps!
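To illustrate the list-versus-set point above, here is a small sketch that works on in-memory data only (the database access itself is left out). The MemoSearch and Hit names and the sample rows are made up for the example; the idea is that a list keeps every occurrence together with the table and row it came from, while a set of the raw strings silently drops repeats.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MemoSearch
{
    // One memo value plus where it came from (hypothetical shape).
    public sealed class Hit
    {
        public string Table { get; set; }
        public int RowId { get; set; }
        public string Text { get; set; }
    }

    // Case-insensitive substring search over all loaded memo values.
    public static List<Hit> Search(IEnumerable<Hit> rows, string term)
    {
        return rows
            .Where(r => r.Text != null &&
                        r.Text.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0)
            .ToList();
    }

    public static void Main()
    {
        var rows = new List<Hit>
        {
            new Hit { Table = "Notes",  RowId = 1, Text = "Check the pump seal" },
            new Hit { Table = "Notes",  RowId = 2, Text = "Pump seal replaced" },
            new Hit { Table = "Orders", RowId = 7, Text = "Check the pump seal" } // same text, other table
        };

        List<Hit> hits = Search(rows, "pump seal");
        foreach (Hit h in hits)
            Console.WriteLine(h.Table + "#" + h.RowId + ": " + h.Text);

        // A set of the raw strings would lose one of the duplicate entries.
        var distinctTexts = new HashSet<string>(rows.Select(r => r.Text));
        Console.WriteLine("list entries: " + rows.Count + ", set entries: " + distinctTexts.Count);
    }
}
```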

Serializing complex EF model over JSON

I have done a lot of searching and experimenting and have been unable to find a workable resolution to this problem.
Environment/Tools
Visual Studio 2013
C#
Three tier web application:
Database tier: SQL Server 2012
Middle tier: Entity Framework 6.* using Database First, Web API 2.*
Presentation tier: MVC 5 w/Razor, Bootstrap, jQuery, etc.
Background
I am building a web application for a client that requires a strict three-tier architecture. Specifically, the presentation layer must perform all data access through a web service. The presentation layer cannot access a database directly. The application allows a small group of paid staff members to manage people, waiting lists, and the resources they are waiting for. Based on the requirements the data model/database design is entirely centered around the people (User table).
Problem
When the presentation layer requests something, say a Resource, it is related to at least one User, which in turn is related to some other table, say Roles, which are related to many more Users, which are related to many more Roles and other things. The point being that, when I query for just about anything EF wants to bring in almost the entire database.
Normally this would be okay because of EF's default lazy-load behavior, but when serializing just about any object to JSON for returning to the presentation layer, the Newtonsoft.Json serializer hangs for a long time then blows a stack error.
What I Have Tried
Here is what I have attempted so far:
Set Newtonsoft's JSON serializer ReferenceLoopHandling setting to Ignore. No luck. This is not a cyclic graph issue; it is just the sheer volume of data that gets brought in (there are over 20,000 Users).
Clear/reset unneeded collections and set reference properties to null. This showed some promise, but I could not get around Entity Framework's desire to track everything.
Just setting nav properties to null/clear causes those changes to be saved back to the database on the next .SaveChanges() (NOTE: This is an assumption here, but seemed pretty sound. If anyone knows different, please speak up).
Detaching the entities causes EF to automatically clear ALL collections and set ALL reference properties to null, whether I wanted it to or not.
Using .AsNoTracking() on everything threw some exception about not allowing non-tracked entities to have navigation properties (I don't recall the exact details).
Use AutoMapper to make copies of the object graph, only including related objects I specify. This approach is basically working, but in the process of (I believe) performing the auto-mapping, all of the navigation properties are accessed, causing EF to query and resolve them. In one case this leads to almost 300,000 database calls during a single request to the web service.
What I am Looking For
In short, has anyone had to tackle this problem before and come up with a working and performant solution?
Lacking that, any pointers for at least where to look for how to handle this would be greatly appreciated.
Additional Note: It occurred to me as I wrote this that I could possibly combine the second and third items above. In other words, set/clear nav properties, then automap the graph to new objects, then detach everything so it won't get saved (or perhaps wrap it in a transaction and roll it back at the end). However, if there is a more elegant solution I would rather use that.
Thanks,
Dave
It is true that doing what you are asking for is very difficult and it's an architectural trap I see a lot of projects get stuck in.
Even if this problem were solvable, you'd basically end up with a data layer that just wraps the database and destroys performance because you can't leverage SQL properly.
Instead, consider building your data access service in such a way that it returns meaningful objects containing meaningful data; that is, only the data required to perform a specific task outlined in the requirements documentation. It is true that a post is related to an account, which has many achievements, etc. But usually all I want is the text and the name of the poster. And I don't want it for one post; I want it for each post in a page. So write data services and methods that do things relevant to your application.
To clarify, it's the difference between returning a Page object containing a list of Posts which contain only a poster name and message and returning entire EF objects containing large amounts of irrelevant data such as IDs, auditing data such as creation time.
Consider the Twitter API. If it were implemented as above, performance would be abysmal with the amount of traffic Twitter gets. And most of the information returned (costing CPU time, disk activity, DB connections as they're held open longer, network bandwidth) would be completely irrelevant to what developers want to do.
Instead, the API exposes what would be useful to a developer looking to make a Twitter app. Get me the posts by this user. Get me the bio for this user. This is probably implemented as very nicely tuned SQL queries for someone as big as Twitter, but for a smaller client, EF is great as long as you don't attempt to defeat its performance features.
This additionally makes testing much easier as the smaller, more relevant data objects are far easier to mock.
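A rough sketch of the Page-of-Posts idea, using in-memory objects so it runs without a database. All type names here (User, Post, PostSummary, Page, PostService) are hypothetical. The assumption is that the same Select projection, written against an EF IQueryable instead of a list, would let EF fetch only the projected columns; that part is not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Entity-like classes with navigation properties in both directions.
public sealed class User
{
    public int Id { get; set; }
    public string Name { get; set; }
    public List<Post> Posts { get; set; } = new List<Post>();
}

public sealed class Post
{
    public int Id { get; set; }
    public string Message { get; set; }
    public DateTime CreatedUtc { get; set; }
    public User Author { get; set; }
}

// Flat objects tailored to one task: showing a page of posts.
public sealed class PostSummary
{
    public string PosterName { get; set; }
    public string Message { get; set; }
}

public sealed class Page
{
    public int Number { get; set; }
    public List<PostSummary> Posts { get; set; }
}

public static class PostService
{
    // Projects each post to just the poster name and message for one page.
    public static Page GetPage(IEnumerable<Post> posts, int pageNumber, int pageSize)
    {
        List<PostSummary> items = posts
            .OrderBy(p => p.Id)
            .Skip((pageNumber - 1) * pageSize)
            .Take(pageSize)
            .Select(p => new PostSummary { PosterName = p.Author.Name, Message = p.Message })
            .ToList();
        return new Page { Number = pageNumber, Posts = items };
    }

    public static void Main()
    {
        var alice = new User { Id = 1, Name = "alice" };
        for (int i = 1; i <= 5; i++)
        {
            var post = new Post { Id = i, Message = "post " + i, CreatedUtc = DateTime.UtcNow, Author = alice };
            alice.Posts.Add(post); // cycle: User -> Posts -> Author -> User
        }

        Page page = GetPage(alice.Posts, 2, 2);
        foreach (PostSummary s in page.Posts)
            Console.WriteLine(s.PosterName + ": " + s.Message);
    }
}
```

Because PostSummary has no navigation properties, serializing a Page cannot walk back into the entity graph.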
For three-tier applications, especially if you are going to expose your entities "raw" in services, I would recommend that you disable Lazy Load and Proxy generation in EF. Your alternative would be to use DTOs instead of entities, so that the web services return a model object tailored to the service instead of the entity (as suggested by jameswilddev).
Either way will work, and has a variety of trade-offs.
If you are using EF in a multi-tier environment, I would highly recommend Julia Lerman's DbContext book (I have no affiliation): http://www.amazon.com/Programming-Entity-Framework-Julia-Lerman-ebook/dp/B007ECU7IC
There is a chapter in the book dedicated to working with DbContext in multi-tier environments (you will see the same recommendations about Lazy Load and Proxy). It also talks about how to manage inserts and updates in a multi-tier environment.
I had a project like this, and it was a stressful one. I also needed to load a large amount of data, process it from different angles, and pass it to a complex dashboard of charts and tables.
My optimizations were:
1. Instead of using EF to load data, I called old-school stored procedures (and, for further optimization, grouped the data to reduce the result set as much as possible for the charts, e.g. one query returns a table from which the datasets for multiple charts can be extracted).
2. More importantly, instead of Newtonsoft's JSON serializer I used fastJSON, whose performance was noticeably better (it is really fast but not compatible with complex objects; a simple example would be view models that contain lists of other models, and so on).
It is better to read up on the pros and cons of fastJSON first:
https://www.codeproject.com/Articles/159450/fastJSON
3. Relational database design is the prime suspect for this problem, so it might be good to keep the tables that hold raw data for processing (most probably for analytics) in a denormalized schema, which improves query performance.
Also be wary of using the model classes generated by the EF designer (database first) for reading or selecting data, especially when you want to serialize them. (Sometimes I think about splitting the same schema into two sets of near-identical classes/models, one for writing and one for reading, so that the write models keep the virtual collections that come from foreign keys and the read models omit them. I am not sure about this.)
NOTE: With very large data sets it is better to go further and set up an In-Memory OLTP table for the tables that contain facts or raw data; however, in that case the table acts as a non-relational table, like NoSQL.
NOTE: In MS SQL Server, for example, you can use SQL CLR, which lets you write routines in C#, VB, etc. and call them from T-SQL; in other words, handle data processing at the database level.
4. For interactive views that need to load data, I think it is better to consider which information should be processed on the server side and which can be handled on the client side (sometimes it is better to query data from the client side; however, you should keep in mind that data on the client side can be accessed by the user). It depends on the situation.
5. For large raw data tables in a view, using datatables.min.js is a good idea, and everyone also suggests server-side paging for tables.
6. For importing and exporting data from big files, I think OLE DB is the best choice.
However, I still doubt that these are exact solutions. If anybody has practical solutions, please mention them ;)
I have fiddled with a similar problem using EF model first, and found the following solution satisfying for "One to Many" relations:
Include "Foreign key properties" in the sub-entities and use this for later look-up.
Define the get/set modifiers of any "Navigation Properties" (sub-collections) in your EF entity to private.
This will give you an object not exposing the sub-collections, and you will only get the main properties serialized. This workaround will require some restructuring of your LINQ queries, asking directly from your table of SubItems with the foreign key property as your filtering option like this:
var myFitnessClubs = context.FitnessClubs
?.Where(f => f.FitnessClubChainID == myFitnessClubChain.ID);
Note 1:
You may of course choose to implement this solution only partly, affecting just the sub-collections that you definitely do not want to serialize.
Note 2:
For "Many to Many" relations, at least one of the entities needs to have a public representation of the collection, since the relation cannot be retrieved using a single ID property.

Designing app to load, edit, and save hierarchical data

I am writing a GUI that will be integrated with SAP Business One. I'm having a difficult time determining how to load, edit, and save the data in the best way possible (reliable, fast, easy).
I have two tables which are called Structures and StructureRows (not great names). A structure can contain other structures, generics, and specifics. The structure rows hold all of these items and have a type associated with them. The generics are placeholders for specifics and the specifics are actual items in inventory.
A job will contain job metadata as well as n structures. On the screen where you edit the job, you can add structures and delete structures as well as edit the rows underneath them. For example, if you added Structure 1 to Job 1 and Structure 1 contains Generic 1, the user would be able to swap Generic 1 for a Specific.
I understand how to store the data, but I don't know what the best method to load and save the data is...
I see a few different options:
When someone adds a structure to a job, load the structure, and then recursively load any structures beneath it (the generics and specifics will already be loaded). I would put this all into an Object Model such as List<Structure>, and each Structure object would have a List<Generic> and a List<Specific>. When I save the changes back to the database, I would have to manually loop through the data and persist the changes.
Somehow load the data into a view in SQL and then group and order the datatable/dataset on the client side. Bind the data to a GridView so changes are automatically reflected in the dataset. When you go to save, SQL / ADO.NET could handle this automatically? This seems like the ideal solution, but I don't know how to actually implement it...
The part that throws me off is being able to add a structure to a structure. If it wasn't for this, I would select the Specifics and Generics from the StructureRows table, and group them in the GUI based on the Structure they belong to. I would have them in a DataTable and bind that to the GridView so any changes were persisted automatically to the DataTable and then I could turn around and push them to SQL very easily...
Is loading and saving the data manually via an object model the only option I have? If not, how would you do it? I'm not sure if I'm just making it more complicated than it needs to be or if this is actually difficult to do with C#, ADO.NET and MS SQL.
The HierarchyID datatype was introduced in SQLServer 2008 to handle this kind of thing. Haven't done it myself, but here's a place to start that gives a fair example of how to use it.
That being said, if you aren't wedded to your current tables, and don't need to query out the individual elements (in other words, you are always dealing with the job as a whole), I'd be tempted to store the data for each job as XML. (If you were doing a web-based app, you could also go with JSON.) It preserves the hierarchy and there are many tools in .NET for working with XML. There's also a built-in TreeView class for WinForms, and doubtless other third-party controls available.
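As a rough sketch of the XML option, the following uses LINQ to XML (System.Xml.Linq) to hold a job with nested structures, swap a generic for a specific, and round-trip the whole job through a string such as one might store in a single column. The element and attribute names (Job, Structure, Generic, Specific, name, itemCode) and the JobXml class are assumptions made up for the example, not part of the original schema.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class JobXml
{
    // Builds a sample job: Structure 1 contains Generic 1 and a nested Structure 2.
    public static XElement BuildJob()
    {
        return new XElement("Job", new XAttribute("id", 1),
            new XElement("Structure", new XAttribute("name", "Structure 1"),
                new XElement("Generic", new XAttribute("name", "Generic 1")),
                new XElement("Structure", new XAttribute("name", "Structure 2"),
                    new XElement("Specific", new XAttribute("itemCode", "A-100")))));
    }

    // Replaces the first Generic with the given name, at any depth, by a Specific.
    public static bool SwapGeneric(XElement job, string genericName, string itemCode)
    {
        XElement generic = job.Descendants("Generic")
            .FirstOrDefault(g => (string)g.Attribute("name") == genericName);
        if (generic == null) return false;
        generic.ReplaceWith(new XElement("Specific", new XAttribute("itemCode", itemCode)));
        return true;
    }

    public static void Main()
    {
        XElement job = BuildJob();
        SwapGeneric(job, "Generic 1", "B-200");

        // The whole job is saved and loaded as one string.
        string stored = job.ToString();
        XElement loaded = XElement.Parse(stored);

        Console.WriteLine(stored);
        Console.WriteLine("specifics: " + loaded.Descendants("Specific").Count());
    }
}
```

The nesting depth is arbitrary here, which is the part that is awkward to express with a flat DataTable.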

Best (free) way to store data? How about updates to the file system?

I have an idea for how to solve this problem, but I wanted to know if there's something easier and more extensible to my problem.
The program I'm working on has two basic forms of data: images, and the information associated with those images. The information associated with the images has been previously stored in a JET database of extreme simplicity (four tables) which turned out to be both slow and incomplete in the stored fields. We're moving to a new implementation of data storage. Given the simplicity of the data structures involved, I was thinking that a database was overkill.
Each image will have information of its own (capture parameters), will be part of a group of images which are interrelated (taken in the same thirty minute period, say), and then part of a larger group altogether (taken of the same person). Right now, I'm storing people in a dictionary with a unique identifier. Each person then has a List of the different groups of pictures, and each picture group has a List of pictures. All of these classes are serializable, and I'm just serializing and deserializing the dictionary. Fairly straightforward stuff. Images are stored separately, so that the dictionary doesn't become astronomical in size.
The problem is: what happens when I need to add new information fields? Is there an easy way to setup these data structures to account for potential future revisions? In the past, the way I'd handle this in C was to create a serializable struct with lots of empty bytes (at least a k) for future extensibility, with one of the bytes in the struct indicating the version. Then, when the program read the struct, it would know which deserialization to use based on a massive switch statement (and old versions could read new data, because extraneous data would just go into fields which are ignored).
Does such a scheme exist in C#? Like, if I have a class that's a group of String and Int objects, and then I add another String object to the struct, how can I deserialize an object from disk, and then add the string to it? Do I need to resign myself to having multiple versions of the data classes, and a factory which takes a deserialization stream and handles deserialization based on some version information stored in a base class? Or is a class like Dictionary ideal for storing this kind of information, as it will deserialize all the fields on disk automatically, and if there are new fields added in, I can just catch exceptions and substitute in blank Strings and Ints for those values?
If I go with the dictionary approach, is there a speed hit associated with file read/writes as well as parameter retrieval times? I figure that if there's just fields in a class, then field retrieval is instant, but in a dictionary, there's some small overhead associated with that class.
Thanks!
SQLite is what you want. It's a fast, embeddable, single-file database that has bindings to most languages.
With regards to extensibility, you can store your models with default attributes, and then have a separate table for attribute extensions for future changes.
A year or two down the road, if the code is still in use, you'll be happy that 1)Other developers won't have to learn a customized code structure to maintain the code, 2) You can export, view, modify the data with standard database tools (there's an ODBC driver for sqlite files and various query tools), and 3) you'll be able to scale up to a database with minimal code changes.
Just a wee word of warning: SQLite, Protocol Buffers, mmap et al. are all very good, but you should prototype and test each implementation and make sure that you're not going to hit the same perf issues or different bottlenecks.
The simplest option may be just to upsize to SQL Server (Express) (you may be surprised at the perf gain) and fix whatever's missing from the present database design. Then, if perf is still an issue, start investigating these other technologies.
My brain is fried at the moment, so I'm not sure I can advise for or against a database, but if you're looking for version-agnostic serialization, you'd be a fool to not at least check into Protocol Buffers.
Here's a quick list of implementations I know about for C#/.NET:
protobuf-net
Proto#
jskeet's dotnet-protobufs
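To show what version-tolerant serialization looks like in practice without pulling in one of the libraries above, here is a sketch that uses System.Text.Json instead of Protocol Buffers (a swapped-in technique, chosen only because it ships with .NET Core 3.0 and later). The PictureV1 and PictureV2 classes are hypothetical. With default options, fields missing from old data keep their initializer values, and fields unknown to an old class are ignored.

```csharp
using System;
using System.Text.Json;

// Original shape of the stored data.
public sealed class PictureV1
{
    public string FileName { get; set; }
    public int ExposureMs { get; set; }
}

// Later shape with one added field and a default for it.
public sealed class PictureV2
{
    public string FileName { get; set; }
    public int ExposureMs { get; set; }
    public string Lens { get; set; } = "";
}

public static class VersionDemo
{
    // Reads data written by the old version into the new class.
    public static PictureV2 Upgrade(string oldJson)
    {
        return JsonSerializer.Deserialize<PictureV2>(oldJson);
    }

    // Reads data written by the new version into the old class.
    public static PictureV1 Downgrade(string newJson)
    {
        return JsonSerializer.Deserialize<PictureV1>(newJson);
    }

    public static void Main()
    {
        string oldJson = JsonSerializer.Serialize(
            new PictureV1 { FileName = "a.png", ExposureMs = 20 });
        PictureV2 upgraded = Upgrade(oldJson);
        // Lens was not in the old data, so it keeps its default (empty string).
        Console.WriteLine(upgraded.FileName + " lens='" + upgraded.Lens + "'");

        string newJson = JsonSerializer.Serialize(
            new PictureV2 { FileName = "b.png", ExposureMs = 35, Lens = "50mm" });
        PictureV1 downgraded = Downgrade(newJson);
        // The old class has no Lens property, so that field is skipped.
        Console.WriteLine(downgraded.FileName + " exposure=" + downgraded.ExposureMs);
    }
}
```

No version switch or padding bytes are needed for purely additive changes like this one; renamed or retyped fields would still need explicit handling.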
There's a database schema, for which I can't remember the name, that can handle this sort of situation. You basically have two tables. One table stores the variable name, and the other stores the variable value. If you want to group the variables, then add a third table that will have a one to many relationship with the variable name table. This setup has the advantage of letting you keep adding different variables without having to keep changing your database schema. Saved my bacon quite a few times when dealing with departments that change their mind frequently (like Marketing).
The only drawback is that the variable value table will need to store the actual value as a string column (varchar or nvarchar actually). Then you have to deal with the hassle of converting the values back to their native representations. I currently maintain something like this. The variable table currently has around 800 million rows. It's still fairly fast, as I can still retrieve certain variations of values in under one second.
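The conversion hassle mentioned above might look something like this on the C# side. It is a sketch over in-memory rows rather than real tables, and the VariableRow and VariableStore names, the column names, and the sample values are all invented for the example. It assumes values were written using the invariant culture, which matters for decimals and dates.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// One row of the name/value layout: every value is stored as a string.
public sealed class VariableRow
{
    public int OwnerId { get; set; }
    public string Name { get; set; }
    public string Value { get; set; }
}

public static class VariableStore
{
    // Looks up a variable and converts its string value back to type T.
    // Returns the fallback when the variable was never stored.
    public static T Get<T>(IEnumerable<VariableRow> rows, int ownerId, string name, T fallback)
    {
        VariableRow row = rows.FirstOrDefault(r => r.OwnerId == ownerId && r.Name == name);
        if (row == null) return fallback;
        return (T)Convert.ChangeType(row.Value, typeof(T), CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        var rows = new List<VariableRow>
        {
            new VariableRow { OwnerId = 1, Name = "ExposureMs", Value = "20" },
            new VariableRow { OwnerId = 1, Name = "Gain",       Value = "1.5" },
            new VariableRow { OwnerId = 1, Name = "FlashUsed",  Value = "true" }
        };

        int exposure = Get(rows, 1, "ExposureMs", 0);
        double gain = Get(rows, 1, "Gain", 1.0);
        bool flash = Get(rows, 1, "FlashUsed", false);
        string lens = Get(rows, 1, "Lens", "unknown"); // added later, not stored yet

        Console.WriteLine(exposure + " " + gain.ToString(CultureInfo.InvariantCulture) + " " + flash + " " + lens);
    }
}
```

New variables can be added without a schema change, at the cost of losing type checking in the database.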
I'm no C# programmer but I like the mmap() call and saw there is a project doing such a thing for C#.
See Mmap
Structured files perform very well when tailored for a specific application, but they are difficult to manage and the code is hardly reusable. A better solution is a virtual-memory-like implementation:
Up to 4 gigabytes of information can be managed.
Space can be optimized to real data size.
All the data can be viewed as a single array and accessed with read/write operations.
No need to design a storage structure; just use the data and store it.
Can be cached.
Is highly reusable.
So go with SQLite for the following reasons:
1. You don't need to read/write the entire database from disk every time
2. It is much easier to add to, even if you don't leave enough placeholders at the beginning
3. It is easier to search based on anything you want
4. It is easier to change the data in ways beyond what the application was designed for
Problems with the Dictionary approach:
1. Unless you made a smart dictionary, you need to read/write the entire database every time (and unless you carefully design the data structure, it will be very hard to maintain backwards compatibility)
   a) If you did not leave enough placeholders, bye bye
2. It appears as if you'd have to linear search through all the photos in order to search on one of the Capture Attributes
3. Can a picture be in more than one group? Can a picture be under more than one person? Can two people be in the same group? With dictionaries these things can get hairy....
With a database table, if you get a new attribute you can just say Alter Table Picture Add Attribute DataType. Then as long as you don't make a rule saying the attribute has to have a value, you can still load and save older versions. At the same time the newer versions can use the new attributes.
Also you don't need to save the picture in the database. You could just store the path to the picture in the database. Then when the app needs the picture, just load it from a disk file. This keeps the database size smaller. Also the extra seek time to get the disk file will most likely be insignificant compared to the time to load the image.
Probably your table should be
Picture(PictureID, GroupID?, File Path, Capture Parameter 1, Capture Parameter 2, etc..)
If you want more flexibility you could make a table
CaptureParameter(PictureID, ParameterName, ParameterValue) ... I would advise against this because it is a lot less efficient than just putting them in one table (not to mention the queries to retrieve/search the Capture Parameters would be more complicated).
Person(PersonID, Any Person Attributes like Name/Etc.)
Group(GroupID, Group Name, PersonID?)
PersonGroup?(PersonID, GroupID)
PictureGroup?(GroupID, PictureID)
