This is a bit of a difficult question to ask, but any feedback at all is welcome.
I'll start with the background: I am a university student studying software engineering. Last year we covered C#, and I got myself a job at a software house coding prototype software in C# (their main language is C++ using Qt). After I produced the prototype it was given to some clients, who have all passed back positive feedback.
Now I am looking at the app and thinking I could use it as a showcase with my CV, especially as the clients who used the software have said they will sign something to reference it.
So if I am going to do that, I had better get it right and do the best I possibly can. I have started to look at where I can improve it, and one of the areas I think I can is the way it handles the database connections and the data along with them.
The app itself runs alongside a MySQL server, and there are six different schemas it gets its data from.
I have written a class (I called it databaseHandler) which holds the MySqlConnection (one question was whether the connection should remain open the whole time the app is running, or be opened, used to fire a query, then closed, and so on). Inside this class I have written a method which takes some arguments and builds its query string, then does the whole mysqlDataReader = cmd.ExecuteReader() routine and returns the reader back to wherever it was called from.
After speaking to a friend, he mentioned it might be nice if the method returned the raw data and not the reader, thereby keeping all the database "stuff" away from the main app.
After playing around I found a couple of tutorials on putting the reader data into arrays and ArrayLists and passing those back, and I also had a go at passing back an ArrayList of Hashtables. These approaches obviously mean that the developer must know the column names in order to find the correct data.
Then I stumbled across a page about creating a class whose properties match the column names and building a list of those objects which you can pull your data from:
http://zensoftware.org/archives/248 is the link
This made me think: in order to use this approach, would I need to create six classes with properties matching the columns of my tables (a couple of tables have up to 10-15 columns)? Or is there a better way for me to handle my data?
I am not really clued up on these things, but if pointed in the right direction I am a very fast learner. :)
Again, I thank you for any input whatsoever.
Vade
You have a lot of ideas that are very close; these are pretty common problems, and it's good that you are actively thinking about how to handle them!
On the question of leaving the connection open for the whole program versus only having it open during the actual query: the common (and proper) way is to have the connection open only for as long as you need it, so
MySqlConnection cn = new MySqlConnection(yourConnectionString);
cn.Open();
// Execute your queries here
cn.Close();
This is better since you don't risk leaving connections open or having transaction issues tying up databases and resources.
As for having just the data returned rather than the actual data reader: this is a good idea, but by just returning the data as an ArrayList (or whatever) you lose some of the structure of the data.
A good way to do this would be either to have your class take the data reader and populate its own data, OR to have the data layer return an instance of your class after reading the data.
I believe that it would be an excellent approach if your data access class returned a custom class populated with data from the database. That would be object-oriented. Instead of, say, returning a DataSet or an array containing customer information, you would create a Customer class with properties. Then, when you retrieve the data from the database, you populate an instance of the Customer class with the data, and return it to the calling code.
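For illustration, a minimal sketch of that idea (the customer table, column names, and class names here are all hypothetical; MySql.Data.MySqlClient is assumed since the app talks to MySQL):

using MySql.Data.MySqlClient;

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
}

public class CustomerRepository
{
    private readonly string connectionString;

    public CustomerRepository(string connectionString)
    {
        this.connectionString = connectionString;
    }

    // The calling code gets a Customer back and never sees a reader
    public Customer GetCustomer(int id)
    {
        using (var cn = new MySqlConnection(connectionString))
        using (var cmd = new MySqlCommand(
            "SELECT id, name, email FROM customer WHERE id = @id", cn))
        {
            cmd.Parameters.AddWithValue("@id", id);
            cn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                if (!reader.Read()) return null;
                return new Customer
                {
                    Id = reader.GetInt32(0),
                    Name = reader.GetString(1),
                    Email = reader.GetString(2)
                };
            }
        }
    }
}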
A lot of the newer Microsoft technologies are focusing on making this task easier. Quite often, there are many more than 6 classes needed, and writing all that code can seem like drudgery. I would suggest that, if you are interested in learning about those newer approaches, and possibly adapting them to your own code, you can check out Linq to SQL and Entity Framework.
one question was about if the connection should remain open the whole
time the app is running, or open it fire a query then close it etc
You want to keep the connection open as little as possible, so you should open it on each data request and close it as soon as you are done. You should also dispose of it, but if your database code is inside a C# using statement that happens automatically.
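For example, a minimal sketch of that pattern (the connection string and query are placeholders):

using (var cn = new MySqlConnection(yourConnectionString))
using (var cmd = new MySqlCommand("SELECT ...", cn))
{
    cn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            // map each row here
        }
    }
} // the connection is closed and disposed here, even if an exception was thrown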
As far as the larger question of how to return the data to your application, you are on the right track. You typically want to hide the raw database from the rest of your application, and mapping the raw data to other intermediate classes is the correct thing to do.
Now, how you do this mapping is a very large topic. Ideally you don't want to create classes that map one-to-one to your tables/columns, but rather provide your app a more app-friendly representation of the data (e.g. business objects rather than database tables). For example, if your employee data is split into two or three tables for normalization purposes, you can hide this complexity and present the information as a single Employee class that binds the data from the other tables together.
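As a rough sketch of that idea (every table, column and class name here is hypothetical), the app sees one class while the data layer stitches the normalized tables together:

public class Employee
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Department { get; set; }   // comes from a second table
    public string OfficePhone { get; set; }  // comes from a third table
}

// One query in the data layer does the joining; the app never sees it:
// SELECT e.id, e.name, d.name, c.phone
// FROM employee e
// JOIN department d ON d.id = e.department_id
// JOIN contact c ON c.employee_id = e.id
// WHERE e.id = @id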
Abstracting away your data access code using objects is known as Object/Relational mapping. It's actually a much more complex task than it appears at first sight. There are several libraries, even in the framework itself, that already do very well what you're trying to implement.
If your needs are very simple, look into typed DataSets. They let you create the table classes in a designer and also generate objects that will do the loading and saving for you (given certain limitations)
If your needs are less simple, but still pretty simple, I recommend you take a look at Linq To SQL to see if it covers your needs, as it does table-class mapping in a very straightforward way and uses a more modern usage pattern than DataSets.
There are also more complex ORMs that allow you to define more complex mappings, like Entity Framework or NHibernate, but very often their complexity is not necessary.
Details like connection lifetime will then depend on your specific needs. Sometimes it's best to keep the connection open, if you have a lot of queries caused by user interaction, as is usually the case with a desktop app. Other times it's best to keep connections as short-lived as possible to avoid congestion, as in the case of web apps.
Whichever technology you choose will likely end up guiding you onto a good set of practices for it, and the best you can do is try things out and see which works best for you.
I have done a lot of searching and experimenting and have been unable to find a workable resolution to this problem.
Environment/Tools
Visual Studio 2013
C#
Three tier web application:
Database tier: SQL Server 2012
Middle tier: Entity Framework 6.* using Database First, Web API 2.*
Presentation tier: MVC 5 w/Razor, Bootstrap, jQuery, etc.
Background
I am building a web application for a client that requires a strict three-tier architecture. Specifically, the presentation layer must perform all data access through a web service. The presentation layer cannot access a database directly. The application allows a small group of paid staff members to manage people, waiting lists, and the resources they are waiting for. Based on the requirements the data model/database design is entirely centered around the people (User table).
Problem
When the presentation layer requests something, say a Resource, it is related to at least one User, which in turn is related to some other table, say Roles, which are related to many more Users, which are related to many more Roles and other things. The point being that when I query for just about anything, EF wants to bring in almost the entire database.
Normally this would be okay because of EF's default lazy-load behavior, but when serializing just about any object to JSON for returning to the presentation layer, the Newtonsoft.Json serializer hangs for a long time and then fails with a stack overflow.
What I Have Tried
Here is what I have attempted so far:
Set Newtonsoft's JSON serializer ReferenceLoopHandling setting to Ignore (a sketch of the setting appears after this list). No luck. This is not a cyclic-graph issue; it is just the sheer volume of data that gets brought in (there are over 20,000 Users).
Clear/reset unneeded collections and set reference properties to null. This showed some promise, but I could not get around Entity Framework's desire to track everything.
Just setting nav properties to null/clearing them causes those changes to be saved back to the database on the next .SaveChanges() (NOTE: this is an assumption, but it seemed pretty sound. If anyone knows different, please speak up).
Detaching the entities causes EF to automatically clear ALL collections and set ALL reference properties to null, whether I wanted it to or not.
Using .AsNoTracking() on everything threw some exception about not allowing non-tracked entities to have navigation properties (I don't recall the exact details).
Use AutoMapper to make copies of the object graph, only including related objects I specify. This approach basically works, but in the process of (I believe) performing the auto-mapping, all of the navigation properties are accessed, causing EF to query and resolve them. In one case this led to almost 300,000 database calls during a single request to the web service.
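For reference, the serializer setting from the first item looks like this (a minimal sketch; someEntity is a placeholder for whatever is being returned):

using Newtonsoft.Json;

var settings = new JsonSerializerSettings
{
    ReferenceLoopHandling = ReferenceLoopHandling.Ignore
};
string json = JsonConvert.SerializeObject(someEntity, settings);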
What I am Looking For
In short, has anyone had to tackle this problem before and come up with a working and performant solution?
Lacking that, any pointers for at least where to look for how to handle this would be greatly appreciated.
Additional Note: It occurred to me as I wrote this that I could possibly combine the second and third items above. In other words, set/clear nav properties, then automap the graph to new objects, then detach everything so it won't get saved (or perhaps wrap it in a transaction and roll it back at the end). However, if there is a more elegant solution I would rather use that.
Thanks,
Dave
It is true that doing what you are asking for is very difficult, and it's an architectural trap I see a lot of projects get stuck in.
Even if this problem were solvable, you'd basically end up with a data layer that just wraps the database and destroys performance, because you can't leverage SQL properly.
Instead, consider building your data access service in such a way that it returns meaningful objects containing meaningful data; that is, only the data required to perform a specific task outlined in the requirements documentation. It is true that a post is related to an account, which has many achievements, etc., etc. But usually all I want is the text and the name of the poster. And I don't want it for one post; I want it for each post in a page. Instead, write data services and methods which do things that are relevant to your application.
To clarify, it's the difference between returning a Page object containing a list of Posts which contain only a poster name and message, and returning entire EF objects containing large amounts of irrelevant data such as IDs and auditing data such as creation time.
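A rough sketch of the first shape (all names here are invented for illustration; MyDbContext stands in for your EF context):

using System.Collections.Generic;
using System.Linq;

public class PostSummary
{
    public string PosterName { get; set; }
    public string Message { get; set; }
}

public class PageOfPosts
{
    public int PageNumber { get; set; }
    public List<PostSummary> Posts { get; set; }
}

public class PostService
{
    // Projecting straight into the DTO means EF translates the Select into SQL
    // and only pulls the two columns the page actually needs
    public PageOfPosts GetPage(MyDbContext db, int pageNumber, int pageSize)
    {
        var posts = db.Posts
            .OrderByDescending(p => p.CreatedAt)
            .Skip((pageNumber - 1) * pageSize)
            .Take(pageSize)
            .Select(p => new PostSummary
            {
                PosterName = p.Account.Name,
                Message = p.Text
            })
            .ToList();
        return new PageOfPosts { PageNumber = pageNumber, Posts = posts };
    }
}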
Consider the Twitter API. If it were implemented as above, performance would be abysmal with the amount of traffic Twitter gets. And most of the information returned (costing CPU time, disk activity, DB connections as they're held open longer, network bandwidth) would be completely irrelevant to what developers want to do.
Instead, the API exposes what would be useful to a developer looking to make a Twitter app. Get me the posts by this user. Get me the bio for this user. This is probably implemented as very nicely tuned SQL queries for someone as big as Twitter, but for a smaller client, EF is great as long as you don't attempt to defeat its performance features.
This additionally makes testing much easier as the smaller, more relevant data objects are far easier to mock.
For three-tier applications, especially if you are going to expose your entities "raw" in services, I would recommend that you disable Lazy Loading and Proxy generation in EF. Your alternative is to use DTOs instead of entities, so that the web services return a model object tailored to the service instead of the entity (as suggested by jameswilddev).
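In EF6 these are two flags on DbContext.Configuration. A minimal sketch, setting them per instance (MyEntities stands in for whatever your generated Database First context is called):

using (var db = new MyEntities())
{
    // Don't load related entities lazily while the serializer walks the graph
    db.Configuration.LazyLoadingEnabled = false;
    // Hand out plain entity instances instead of change-tracking proxies
    db.Configuration.ProxyCreationEnabled = false;
    // ... query and serialize as usual ...
}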
Either way will work, and has a variety of trade-offs.
If you are using EF in a multi-tier environment, I would highly recommend Julia Lerman's DbContext book (I have no affiliation): http://www.amazon.com/Programming-Entity-Framework-Julia-Lerman-ebook/dp/B007ECU7IC
There is a chapter in the book dedicated to working with DbContext in multi-tier environments (you will see the same recommendations about Lazy Load and Proxy). It also talks about how to manage inserts and updates in a multi-tier environment.
I had such a project, and it was a stressful one... I also needed to load a large amount of data, process it from different angles, and pass it to a complex dashboard for charts and tables.
My optimizations were:
1. Instead of using EF to load data, I called old-school stored procedures (and, for more optimization, grouped things to reduce the result tables as much as possible for the charts; e.g. a query returns one table from which the datasets for multiple charts can be extracted).
2. More important: instead of Newtonsoft's JSON I used fastJSON, whose performance is worth mentioning (it is really fast, but not compatible with complex objects; a simple example would be view models that have lists of models inside them, and so on).
It is better to read the pros and cons of fastJSON first:
https://www.codeproject.com/Articles/159450/fastJSON
3. In the relational database design, which is the prime suspect in this problem, it might be good to keep the tables that hold raw data to be processed (most probably for analytics) in a denormalized schema, which saves performance when querying the data.
Also beware of using the model classes the EF designer generates from the database for reading or selecting data, especially when you want to serialize them. (Sometimes I think about separating the same schema model into two sets of otherwise identical classes/models for writing and reading data, such that the write models keep the virtual collections that come from foreign keys and the read models ignore them... I am not sure about this.)
NOTE: in the case of very, very huge data it is better to go deeper and set up an In-Memory OLTP table for the particular table that contains the facts or raw data; in that case your table acts like a non-relational, NoSQL-style table.
NOTE: for example, in MSSQL you can use the benefits of SQL CLR, which lets you write routines in C#, VB, etc. and call them from T-SQL; in other words, you can handle data processing at the database level.
4. For an interactive view which needs to load data, I think it is better to consider which information should be processed server-side and which can be handled client-side (sometimes it is better to query data from the client side, though you should consider that data on the client side can be accessed by the user); however, it is situation-dependent.
5. In the case of a large raw-data table in a view, using datatables.min.js is a good idea, and everyone suggests using server-side paging on tables.
6. In the case of importing and exporting data from big files, OLE DB is the best choice, I think.
However, I still doubt these are exact solutions. If anybody has practical solutions, please mention them. ;)
I have fiddled with a similar problem using EF Model First, and found the following solution satisfying for "One to Many" relations:
Include "foreign key properties" in the sub-entities and use these for later look-ups.
Set the get/set modifiers of any "navigation properties" (sub-collections) in your EF entity to private.
This will give you an object that does not expose the sub-collections, so you will only get the main properties serialized. This workaround requires some restructuring of your LINQ queries, querying directly against your table of sub-items with the foreign key property as your filtering option, like this:
var myFitnessClubs = context.FitnessClubs
    ?.Where(f => f.FitnessClubChainID == myFitnessClubChain.ID);
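For illustration, entities shaped this way might look like the following (the FitnessClub names echo the query above; everything else is hypothetical):

using System.Collections.Generic;

public class FitnessClubChain
{
    public int ID { get; set; }
    public string Name { get; set; }

    // Private navigation property: not serialized and not exposed to callers
    private ICollection<FitnessClub> FitnessClubs { get; set; }
}

public class FitnessClub
{
    public int ID { get; set; }
    public string Name { get; set; }

    // Public foreign key property used for look-ups instead of the navigation
    public int FitnessClubChainID { get; set; }
}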
Note 1:
You may of course choose to implement this solution only partly, affecting just the sub-collections that you strongly do not want to serialize.
Note 2:
For "Many to Many" relations, at least one of the entities needs to have a public representation of the collection. Since the relation cannot be retrieved using a single ID property.
I'm trying to open a view from a Lotus Notes database using the NotesSQL ODBC driver and the COM API. When I use Access to 'link to external data' using NotesSQL, over 380 'tables' are listed.
The following C# code was written with the COM API to list the views; only 230 objects are returned. It appears that the actual tables that contain the data are excluded from the Views list.
var _someDatabase = _lotesNotesSession.GetDatabase("MainServer/Server/Company/US", @"Groups\Location\someNotesDatabase", false);
var _views = _someDatabase.Views;
foreach (var view in _views)
{
    Console.WriteLine(view.Name);
}
What is the proper way to connect to the tables that actually contain the data?
Revised based on your comments.
Okay... So you are not using NotesSQL. You are using the Notes COM API. If you want to get a list of all the Views in a Notes Database, you use the _someDatabase.Views property, as you have done. But to read data out of a Notes database, you need to read NotesDocument objects, which contain NotesItem objects.
There are a number of ways to read NotesDocument objects. E.g.,
Using the NotesDatabase.AllDocuments property which gives you a NotesDocumentCollection object containing all the NotesDocuments
Using the NotesDatabase.Search method, which gives you a NotesDocumentCollection object containing only the NotesDocument objects that meet a search criterion.
Looping through the collection of NotesDocuments associated with a particular view.
I'll briefly cover the last option, because it will underscore a point that I want to make.
Views contain NotesDocument objects in what we call their collection. The same is true of Folders. You've got the views (as an array of NotesView objects) from your call to _someDatabase.Views, so for any of those views, you can do something like this:
NotesDocument thisDoc = view.GetFirstDocument();
while (thisDoc != null)
{
    processNotesDocument(thisDoc);
    thisDoc = view.GetNextDocument(thisDoc);
}
And then you could write a processNotesDocument function that uses Notes COM API calls to read the data (i.e. the NotesItem objects) from the NotesDocument. You can look at the documentation for the NotesDocument class to figure out how to do that.
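As a very rough sketch, assuming the Domino COM interop types and a made-up item name ("Subject"); the exact return type of GetItemValue varies with the item, so treat this as illustrative only:

void processNotesDocument(Domino.NotesDocument doc)
{
    // GetItemValue returns the named item's values as an array
    var values = (object[])doc.GetItemValue("Subject");
    if (values != null && values.Length > 0)
    {
        Console.WriteLine(values[0]);
    }
}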
The thing is, before you can go and do that, you'd have to answer the question: which view (or views) do I want to do this in? Or you'd have to choose one of the other methods that I listed above for accessing the NotesDocument objects.
I don't mean to be disrespectful, but I think you're going to need to know a bit more about the data you are dealing with, and about Lotus Notes database design, before you can address that. I'm inferring this from the fact that your question doesn't show that you know the basic unit of information in Lotus Notes is called a "document", and Notes programming has a big API; it's just not something one can usually navigate without some idea of where you're going in the first place.
Here's a link that will take you to the reference info for the NotesDocument class. It should also bring up navigation for the documentation for all the other classes in the API. But what I think you really need is a tutorial on the basic concepts of Lotus Notes programming, and unfortunately all the good stuff is old and not oriented toward C#. The good news is that the classes and the concepts are all pretty much the same for the API in C# as they are in the LotusScript language that is native to Notes, and not too different from the classes and concepts in the API for Java; even the fairly old stuff is still going to be good on the basics that you need. Here's a link to another StackOverflow question that can help point you in the direction of some material that can help you.
=================================================================================
Original answer from this point on...
First: Is the C# code running under the same Notes ID that Access is? If not, then security restrictions within the Notes database design might be responsible for the different results. Or at least partially responsible.
Second: I think you know this (because you used quotes around 'tables'), but there are no tables. The NotesSQL driver only makes it look like there are tables, and it uses both Views and Forms for that. A Notes database with 160 Forms would be a bit unusual, but not entirely unheard of. That might be what you are seeing. Bear in mind, though, that NotesSQL queries against Views will be more efficient than queries against Forms, because the former go against the indexes that are pre-built on the server, whereas the latter have to search through the entire database. If you need to query for fields that are not included in any existing View, then you can certainly use a query against a Form. You can get the list of Forms via _localDatabase.Forms. But the better way would be to use Domino Designer to add the Views that you need, and then do your NotesSQL query against the Views.
Third: But I'm a little confused by why you wrote the above code, because it's not using NotesSQL. It's using the Notes COM API (presumably through the interop DLL) since this is C#. But since you're doing that, then why bother with NotesSQL? You're already using an API that is much better suited for doing operations on Notes/Domino data, and might very well give you a way to get your data more efficiently than NotesSQL can -- without having to add any new Views to the database, though that could depend on your understanding of the existing Views, data relationships, etc.
Finally: Unrelated to your actual question, but just some friendly advice: you should really change the name of your variable _localDatabase. Anyone with Notes/Domino experience would likely interpret "local" to refer to a database that is being accessed without going through a Domino server -- which is what I thought at first. But with a closer look at your code, I see that you are actually opening a database on a server named MainServer/Server/Company/US.
I have been developing many applications and have been confused about using DataSets.
To date I don't use DataSets; I work in my applications directly against my database, using queries and procedures that run on the database engine.
But I would like to know what the good practice is:
using a DataSet?
or
working directly on the database?
Please try to give me specific cases of when to use a DataSet, along with the operations involved (Insert/Update).
Can we set read/write locks on a DataSet with respect to our database?
You should either embrace stored procedures or make your database dumb. That means you have no logic whatsoever in your DB, only CRUD operations. If you go with the dumb-database model, DataSets are bad. You are better off working with real objects so you can add business logic to them. This approach is more complicated than just operating directly on your database with stored procs, but you can manage complexity better as your system grows. If you have a large system with lots of little rules, stored procedures become very difficult to manage.
In ye olde times before MVC was a mere twinkle in Haack's eye, it was jolly handy to have DataSet handle sorting, multiple relations and caching and whatnot.
Us real developers didn't care about such trivia as locks on the database. No, we had conflict resolution strategies that generally just stamped all over the most recent edits. User friendliness? < Pshaw >.
But in these days of decent generic collections, a plethora of ORMs and an awareness of separation of concerns they really don't have much place any more. It would be fair to say that whenever I've seen a DataSet recently I've replaced it. And not missed it.
As a rule of thumb, I would put logic that concerns data consistency, integrity, etc. as close to that data as possible, i.e. in the database. Also, if I am having to fetch my data in a way that is interdependent (i.e. fetch from tables A, B and C where the relationship between the contributions of A, B and C is known at request time), then it makes sense to save on call-out overhead and do it in one go, via a database object such as a function or procedure (as already pointed out by OMGPonies). For logic that is a level or two removed, it makes sense to have it where dealing with it "procedurally" is a bit more intuitive, such as in a DataSet. Having said all that, rules of thumb are sometimes what their acronym implies... ROT!
In past .NET projects I've often done data imports/transformations (e.g. for bank transaction data files) in the database (one call-out, all logic encapsulated in a procedure and transaction-protected), but have "parsed" items from that same data in a second stage, in my .NET code, using DataTables and the like (although these days I would most likely skip the DataSet stage and work on the data from a higher level of abstraction, using class objects).
I have seen DataSets used well in exactly one application, and that is across 7 years of development on quite a few different applications (at least double figures).
There are so many best practices around these days that point towards developing with objects rather than DataSets for enterprise development. Objects, along with an ORM like NHibernate or Entity Framework, can be very powerful and take a lot of the grunt work out of creating CRUD stored procedures. This is the way I favour developing applications, as I can separate business logic nicely into a domain layer.
That is not to say that DataSets don't have their place; I am sure in certain circumstances they may be a better fit than objects, but for me I would need to be very sure of this before going with them.
I have also wondered about this, having not needed DataSets in my source code for months.
Actually, if your objects are O/R-mapped and use serialization and generics, you would never need DataSets.
But DataSets have a great use in generating reports.
This is because reports have no specific structure that can or should be O/R-mapped.
I only use DataSets in tandem with reporting tools.
I'm having a really hard time figuring out a good search term for my problem: separation of GUI and database interaction in Visual Studio 2008 using LINQ to SQL.
According to my teacher in a C# class I'm taking, it's not correct to let the GUI depend on a specific way of getting data.
The way my project is currently set up, we have an MSSQL database where everything is stored.
The solution is split into 4 separate projects: UserGUI, AdminGUI, Logic and Db.
Now, using LINQ to populate list boxes and similar things, I use something like this:
From the Windows Form in the UserGUI project:
// The activeReservationBindingSource has Db.ActiveReservation as its value
private void refreshReservation()
{
    activeReservationBindingSource.DataSource = logic.getActiveReservationsQry();
}
To the Logic project:
public IQueryable getActiveReservationsQry()
{
    return dbOperations.getActiveReservationsQry(this.currentMemberId);
}
To the database project:
public IQueryable getActiveReservationsQry(int memberId)
{
    var qry =
        from active in db.ActiveReservations
        where active.memberId == memberId
        orderby active.reservationId
        select active;
    return qry;
}
This makes sense to me, seeing as I can send items from list boxes all the way to the database project and there easily update/insert things into the MSSQL database. The problem is that it would be pretty hard to migrate from an MSSQL database to, let's say, an Access version.
What should I be reading up on to understand how to do this correctly? Is creating my own classes with the same values as the ones Visual Studio generates for me when I create the dbml file the way to go? Should I then, in the Logic project, populate for example a List that I pass to the GUI? To me it seems like double work, but perhaps it's the correct way to go?
Be advised that we have not read anything about design patterns or business logic, which seems to be a pretty big subject that I'm looking forward to exploring outside the frame of the course at a later time.
I was also thinking that IQueryable inherits from IEnumerable and perhaps that was the way to go, but I have failed to find any information that made sense to me on how to actually accomplish this.
The GUI also knows about the data sources, which I think is a bad thing but can't figure out how to get rid of.
Please understand that I tried to figure this out with my teacher for half an hour today at the only tutoring available for this project, and then spent most of the day trying to find similar answers on Google, SO and from classmates, without any result.
There's a post here that I answered where the question was a bit similar to yours. I think it's worth a look.
Regards
One keyword to read up on: Model-View-Controller. That's kind of the idea you're after. Your "View" is the GUI. The "Model" is the data layer, and the Controller is the code that takes data from the DB and hands it to the GUI (and vice-versa.)
Check out the repository pattern. There are several implementations you can find by googling "Linq repository."
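A minimal sketch of that pattern against the code in your question (the interface and class names are invented; ActiveReservation and the data context come from your dbml). The GUI and Logic projects then depend only on the interface, so swapping MSSQL for, say, Access means writing one new implementing class:

using System.Collections.Generic;
using System.Linq;

public interface IReservationRepository
{
    IList<ActiveReservation> GetActiveReservations(int memberId);
}

// LINQ to SQL implementation; an Access-backed one would implement the same interface
public class SqlReservationRepository : IReservationRepository
{
    private readonly YourDataContext db = new YourDataContext();

    public IList<ActiveReservation> GetActiveReservations(int memberId)
    {
        return (from active in db.ActiveReservations
                where active.memberId == memberId
                orderby active.reservationId
                select active).ToList();
    }
}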
You may want to check out the MVC Storefront series here: http://www.asp.net/learn/mvc-videos/#MVCStorefrontStarterKit. In the series, Rob Conery builds a LINQ IQueryable repository that returns custom-made objects instead of the LINQ objects.
Maybe you can take a look at Data Abstract.
According to my teacher in a c# class im taking it's not correct to let the GUI be dependant on a specific way of getting data.
Honestly, your teacher is an idiot. This is one of the stupidest statements, from a database perspective, I've ever read. Of course you want to be dependent on a specific way of getting data if that is the most performant way to get the data (which is almost always database-specific, and means not using LINQ to SQL for complex queries, but that is another whole issue). Users care about performance, not database independence.
Very few real-world applications truly need to be database-independent. Yes, a few kinds of boxed software sold commercially are (although I think this is usually a mistake, and one reason why every such commercial product I've ever used is badly designed and horribly slow).
And since every database implements SQL differently, even using ANSI SQL is not completely database-independent. Access in particular is nowhere near the ANSI standard. There is no way to write code which will work correctly on every single possible database.
I am currently working on a web application that requires certain requests by users to be persisted. I have three choices:
Serialize each request object and store it as an XML text file.
Serialize the request object and store the XML text in a DB using a CLOB.
Store the requests in separate tables in the DB.
In my opinion I would go for option 2 (storing the serialized objects' XML text in the DB), because it would be so much easier to read from one column and then deserialize the objects to do some processing on them. I am using C# and ASP.NET MVC to write this application. I am fairly new to software development and would appreciate any help I can get.
Short answer: If option 2 fits your needs well, use it. There's nothing wrong with storing your data in the database.
The answer for this really depends on the details. What kind of data are you storing? How do you need to query it? How often will you need to query it?
Generally, I would say it's not a good idea to go with either 1 or 2. The problem with option 2 is that it will be much harder to query for specific fields. If you're going to do a LIKE query and have it search a really long string, it's going to be an expensive operation and you'll likely run into perf issues later on.
If you really want to stay away from having to write code to read multiple columns to load your data, look into using an ORM like Linq to SQL. That will help load database tables into objects for you.
I have designed a number of systems where storing 'some' objects as serialized XML in the DB proved the better choice. I have also learned lessons where storing objects in the DB as XML ended up causing more headaches down the road. So I came up with some questions that you have to answer yes to in order to be comfortable doing it:
Does the object need to be portable?
Is the data in the object encapsulated, i.e. not part of something else and not made up of something else?
In the future, can the answer to number 2 change?
In SQL Server you can always create a table view using XQuery, but I would only recommend you do this if a) it's too late to change your mind, or b) you don't have that many objects to manage.
Serializing and storing objects in XML has some real benefits, especially for extensibility and agile development.
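For instance, a minimal sketch of the serialize step (the Request class and its properties are hypothetical):

using System.IO;
using System.Xml.Serialization;

public class Request
{
    public int UserId { get; set; }
    public string Action { get; set; }
}

public static class RequestXml
{
    public static string ToXml(Request request)
    {
        var serializer = new XmlSerializer(typeof(Request));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, request);
            return writer.ToString();   // store this string in the CLOB column
        }
    }
}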
If the number of these objects is large and each object isn't very big, I think using the database is a good idea.
Whether you store it in a separate table or in the original table depends on how you would use the CLOB data with the original table.
Go with option 2 if you will always need the CLOB data when you access the original table.
Otherwise go with option 3 to improve performance.
You also need to think about security and n-tier architecture. Storing serialized data in a database means your data will be on another server, which is ideal if the data needs to be secure, but brings network latency; storing the data in the filesystem gives you quicker I/O access but very limited searching ability.
I have a situation like this, and I use the database. It also gets backed up properly with the rest of the related data.