problem with huge data - c#

I have a WCF service which reads data from an XML file. The data in the XML changes every minute.
The XML is very big: it has about 16k records, and parsing it takes about 7 seconds, which is definitely too long.
Right now it works this way:
ASP.NET calls WCF
WCF parses the XML
ASP.NET waits for the WCF callback
WCF gives the data back to ASP.NET
Of course there is caching for 1 minute, but after that WCF must load the data again.
Is there any way to refresh the data without stalling the site? Something like... I don't know, double buffering? Something that serves the old data while there is no new data yet? Maybe you know a better solution?
best regards
EDIT:
The statement which takes the longest time:
XDocument doc = XDocument.Load(XmlReader.Create(uri)); // takes 7 sec.
The parsing itself takes 70 ms, which is fine, so that is not the problem. Is there a better solution that doesn't block the website? :)
EDIT2:
OK, I have found a better solution. I simply download the XML to the HDD and read the data from it, while another process downloads the new version of the XML and replaces the old one. Thanks for the engagement.
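A minimal sketch of that "double buffering" idea, assuming the reload runs on a timer off the request path; the class and member names here are made up for illustration:

    using System;
    using System.Threading;
    using System.Xml.Linq;

    // Serve the last good snapshot while a background timer loads the new one.
    public class XmlSnapshotCache
    {
        private volatile XDocument _current;   // last good snapshot
        private readonly string _uri;
        private readonly Timer _timer;

        public XmlSnapshotCache(string uri)
        {
            _uri = uri;
            _current = XDocument.Load(_uri);   // initial, blocking load
            _timer = new Timer(_ => Refresh(), null,
                               TimeSpan.FromMinutes(1), TimeSpan.FromMinutes(1));
        }

        private void Refresh()
        {
            try
            {
                var fresh = XDocument.Load(_uri);  // the slow ~7 s load, off the request path
                _current = fresh;                  // atomic reference swap
            }
            catch
            {
                // keep serving the old snapshot if the download fails
            }
        }

        // Callers never wait on a reload; worst case they get data up to a minute old.
        public XDocument Current
        {
            get { return _current; }
        }
    }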

You seem to have an XML-to-object tool that creates an object model from the XML.
What usually takes most of the time is not the parsing but creating all these objects to represent the data.
So you might want to extract only the part of the XML data you actually need, which will be faster, rather than systematically creating a big object tree just to use part of it.
You could use XPath, for example, to extract only the pieces you need from the XML file.
In the past I have used a nice XML parsing tool that focuses on performance: vtd-xml (see http://vtd-xml.sourceforge.net/).
It supports XPath and other XML technologies.
There is a C# version. I have used the Java version, but I am sure the C# version has the same qualities.
LINQ to XML is also a nice tool and it might do the trick for you.
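For instance, a rough sketch of extracting just the needed fields with LINQ to XML plus XPath, rather than building a full object model ("record" and "price" are placeholder names; substitute your real elements):

    using System;
    using System.Xml.Linq;
    using System.Xml.XPath;   // brings in the XPathSelectElements extension

    var doc = XDocument.Load("data.xml");

    // Pull out only the fields you actually need; no big object tree is built.
    foreach (var record in doc.XPathSelectElements("//record[@active='true']"))
    {
        var price = (decimal)record.Element("price");
        // ... use the value directly
    }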

It all depends on your database design. If you design the database in a way that lets you recognize which data has already been queried, then each new query can return only the records that changed between the last query time and the current time.
For example, you could add a rowstamp to each record and update it on every add/edit/delete action; then you can easily implement the logic described above (see the sketch below).
Also, if you don't want the first call to take long (when the initial data has to be collected), think about storing that data locally.
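A rough sketch of that rowstamp/delta idea, assuming a SQL Server rowversion column named RowStamp; the connection string and the GetLastSeenStamp helper are hypothetical:

    using System;
    using System.Data.SqlClient;

    byte[] lastStamp = GetLastSeenStamp();   // hypothetical: highest stamp seen on the previous poll

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(
        "SELECT Id, Name, RowStamp FROM Records WHERE RowStamp > @last ORDER BY RowStamp",
        conn))
    {
        cmd.Parameters.AddWithValue("@last", lastStamp);
        conn.Open();
        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                // apply only this delta to the locally cached data
            }
        }
    }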
Use something other than XML (like JSON). If you have a lot of XML overhead, try replacing long element names with something shorter (like single-character element names); see the sketch after the links below.
Take a look at this:
What is the easiest way to add compression to WCF in Silverlight?
Create JSON from C# using JSON Library
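As a rough sketch of the JSON route, using Json.NET as one option; Record and records are placeholders for whatever your XML rows map to:

    using System.Collections.Generic;
    using Newtonsoft.Json;

    public class Record
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }

    var records = new List<Record> { new Record { Id = 1, Name = "example" } };

    // The same payload as JSON is usually much smaller than verbose XML.
    string json = JsonConvert.SerializeObject(records);

    // ...and on the receiving side:
    var roundTripped = JsonConvert.DeserializeObject<List<Record>>(json);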

If you take a few stackshots, they might tell you that the biggest "bottleneck" is not parsing but data structure allocation, initialization, and subsequent garbage collection. If so, a way around it is to have a pool of pre-allocated row objects and re-use them (sketched below).
Also, if each item is appended to the list, you might find it spending a large fraction of its time doing the append. It might be faster to simply push each new row onto the front and then reverse the whole list at the end.
(But don't implement these things unless you prove they are problems by stackshots. Until then, they are just guesses.)
It's been my experience that the real cost of XML is not the parsing, but the data structure manipulation.
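If stackshots do point at allocation, a rough sketch of the row-pool idea might look like this (Row and its Reset method are placeholders for your real row type):

    using System.Collections.Concurrent;

    public class Row
    {
        public string Value;
        public void Reset() { Value = null; }   // clear fields before re-use
    }

    public class RowPool
    {
        private readonly ConcurrentBag<Row> _pool = new ConcurrentBag<Row>();

        // Hand out a recycled row if one is available, otherwise allocate.
        public Row Rent()
        {
            Row row;
            return _pool.TryTake(out row) ? row : new Row();
        }

        // Return a row to the pool instead of letting the GC collect it.
        public void Return(Row row)
        {
            row.Reset();
            _pool.Add(row);
        }
    }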

Is it better to have SQL Server parse a large multi-document XML or send each document separately

I need to request the XML from PubMed like
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=27087788,28322247,26158412&retmode=xml
The example has 3 IDs, but a request can contain as many as 200 at a time. The request is made by a .NET web service. I am looking for the most efficient way to process the XML files. I know that terms like "best" or "efficient" are very subjective and depend on many things, but:
Is it better to send the entire string to the SQL Server database (if that is even possible given its length or nesting levels) and let it parse the documents and save them, or is it better to parse the documents in the web service using an XmlTextReader or an XML document object and send each document separately? Each document needs to be saved as a separate record.
Thanks for your information.
My first thought was: why SQL Server? Why send all this data around? Do the parsing in C#!
But, on second thought: if I understand this correctly, you want to read many different XML files and store them in your database.
Now I'd rather ask: when do you need to retrieve data from these XMLs, and do you need to store the extracted data in relational tables? Would it be a possible approach for you to store all these XMLs as-is in XML-typed columns and read them on demand?
You can pass your XML as a C# string (which is Unicode) and insert it directly into an XML-typed column. To avoid any hassle you should cut away the first lines (the <?xml?> declaration and the DOCTYPE) and start at <PubmedArticleSet>.
The rest should be easily transferred to and stored in SQL Server.
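A minimal sketch of that insert, assuming a table like CREATE TABLE PubmedDocs (Id INT IDENTITY, Doc XML) and an existing connection string; xmlString is the response body cut down to start at <PubmedArticleSet>:

    using System.Data;
    using System.Data.SqlClient;

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(
        "INSERT INTO PubmedDocs (Doc) VALUES (@doc)", conn))
    {
        // Pass the C# string (Unicode) into an XML-typed parameter;
        // the declaration/DOCTYPE were stripped to avoid encoding clashes.
        cmd.Parameters.Add("@doc", SqlDbType.Xml).Value = xmlString;
        conn.Open();
        cmd.ExecuteNonQuery();
    }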
If you need help on how to read this back out, just come back with another, more pointed question.
As for your "which is faster" question, you might read this.

Notification Of Changed XML Values

I guess I'll just flat-out explain my situation. I have a desktop application that reads and writes to an XML file. This same XML file is read and written by an ASP site. Both sides need to be notified when a value changes. This is relatively trivial on the desktop side, as I just re-read the XML and apply the values; however, it gets more complicated on the web side.
The website needs to get the updated information from the XML immediately. The problem is that I can't figure out a proper way to store these values and, in turn, handle notification of updated/changed/new/deleted values. Sending the entire XML file is out of the question.
Getting the data to the page isn't the question; I have that all wired up. The question is how I should store this data so that I can handle incremental updates and also be notified of changed values.
I have an extremely clunky solution and I absolutely hate it. I was hoping someone could point me in the right direction as to what type of container to store this data in; I'm relatively inexperienced in this area of C#/ASP.
Thanks for taking the time to read my novel.
What do you mean by the website getting the update "immediately"? Code there would normally only execute on the next request (whenever that comes). So on every request you could (but should not!) just read the latest copy of the file.
In general architecture terms, if you need two components/applications co-ordinating like this, a message queue is the natural abstraction. Right now you're treating the XML file like shared memory for doing interprocess communication.
If it's going to remain a kludgy solution, I suggest replacing the XML with a DB table. It's easier to poll, co-ordinate, and receive updates from.
Well, I kind of found a solution.
I keep a copy of the "current" XML deserialized via an XmlSerializer into a proper type. When I read the new XML, I deserialize it the same way, and to compare the two I just serialize each to JSON and do a standard string compare.
This works for me because my XML structure never changes, just the values inside, so I'm able to check which "sections" have changed by comparing their serialized JSON.
Kind of weird, but it works for me.
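For what it's worth, a rough sketch of that compare-by-JSON trick; Config is a placeholder for the real type, and Json.NET is used here as one option:

    using System.IO;
    using System.Xml.Serialization;
    using Newtonsoft.Json;

    public class Config   // stand-in for whatever the XML deserializes into
    {
        public string Mode;
        public int Limit;
    }

    static Config LoadConfig(string path)
    {
        var serializer = new XmlSerializer(typeof(Config));
        using (var stream = File.OpenRead(path))
            return (Config)serializer.Deserialize(stream);
    }

    // currentConfig holds the previously loaded instance.
    var oldSnapshot = JsonConvert.SerializeObject(currentConfig);
    var newConfig   = LoadConfig("settings.xml");
    var newSnapshot = JsonConvert.SerializeObject(newConfig);

    if (oldSnapshot != newSnapshot)
    {
        currentConfig = newConfig;   // values changed: raise the notification here
    }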

Best way to store small amount of data

I'm new to Windows apps and I would like to know the best way to save a small amount of data, like one value a day.
I'm going for a text file because it's easy, but I know I could use MS Access.
Do you have other options? Faster or better?
Since you are already considering using an MS Access database, I would recommend SQLite instead. Here's a quote from their site (SQLite Home Page):
SQLite is a software library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine.
It is really very easy to use: no installation is required, you simply need to reference a DLL.
If you just need to read it yourself, use a plain text file.
If you need to read the values back into the application, then serialize to an XML or binary file by making your user data serializable, possibly by keeping a List of values in your object.
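A minimal sketch of that, assuming one value per day; the type and file names are made up for the example:

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Xml.Serialization;

    public class DailyValue
    {
        public DateTime Date;
        public decimal Value;
    }

    var values = new List<DailyValue>
    {
        new DailyValue { Date = DateTime.Today, Value = 42.5m }
    };

    var serializer = new XmlSerializer(typeof(List<DailyValue>));

    // write the whole list out
    using (var stream = File.Create("values.xml"))
        serializer.Serialize(stream, values);

    // read it back into the application
    List<DailyValue> loaded;
    using (var stream = File.OpenRead("values.xml"))
        loaded = (List<DailyValue>)serializer.Deserialize(stream);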
How do you want to use the data? Do you just want to look at it once in a while? Do you plan to analyze it in a spreadsheet? Based on what you've said so far, I would just use a text file, one value per line. Even if you later want to do more with it, it's easy to import into spreadsheets, etc. If the daily data is a little more complicated (maybe a couple of different values each day), you might consider something like YAML.
Why stray from the path? XML gives you the ability to expand on it later without having to rethink everything.
It mainly depends on the complexity of the data you want to store. If it's just a DateTime or some other simple built-in type, you would easily be able to recreate it as a strongly typed object. But if it's more complicated, I would suggest you create a serializable class (a link on how to create such a class is here) and then use binary or SOAP serialization, depending on your size, security, and other such needs. I suggest this because it is best to be able to recreate strongly typed objects from a flat file rather than just trying to parse whatever is in the flat file.
Please let me know in case you need more clarity.
Thanks,
Sai Pavan

Is it a good idea to store serialized objects in a Database instead of multiple xml text files?

I am currently working on a web application that requires certain requests by users to be persisted. I have three choices:
Serialize each request object and store it as an XML text file.
Serialize the request object and store the XML text in a DB using a CLOB.
Store the requests in separate tables in the DB.
In my opinion I would go for option 2 (storing the serialized objects' XML text in the DB), because it would be so much easier to read from one column and then deserialize the objects to do some processing on them. I am using C# and ASP.NET MVC to write this application. I am fairly new to software development and would appreciate any help I can get.
Short answer: If option 2 fits your needs well, use it. There's nothing wrong with storing your data in the database.
The answer for this really depends on the details. What kind of data are storing? How do you need to query it? How often will you need to query it?
Generally, I would say it's not a good idea to do either 1 or 2. The problem with option 2 is that it will be much harder to query for specific fields. If you do a LIKE query over a really long string, it's going to be an expensive operation and you'll likely run into performance issues later on.
If you really want to avoid writing code that reads multiple columns to load your data, look into an ORM like LINQ to SQL. It will load database tables into objects for you.
I have designed a number of systems where storing 'some' object as serialized XML in the DB proved the better choice. I have also learned lessons where storing objects in the DB as XML ended up causing more headaches down the road. So I came up with some questions that you have to answer yes to in order to be comfortable doing it:
Does the object need to be portable?
Is the data in the object encapsulated, i.e. not part of something else and not made up of something else?
Could number 2 change in the future?
In SQL Server you can always create a table view over the XML using XQuery, but I would only recommend doing this if (a) it's too late to change your mind or (b) you don't have that many objects to manage.
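A rough sketch of querying into the stored XML with XQuery from C#, so you can still filter without deserializing everything; the table, column, and element names are hypothetical:

    using System.Data.SqlClient;

    const string sql = @"
        SELECT Id,
               RequestXml.value('(/Request/CustomerId)[1]', 'int') AS CustomerId
        FROM   Requests
        WHERE  RequestXml.exist('/Request[Status=""Pending""]') = 1";

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(sql, conn))
    {
        conn.Open();
        using (var reader = cmd.ExecuteReader())
            while (reader.Read())
            {
                // process only the matching requests
            }
    }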
Serializing and storing objects as XML has some real benefits, especially for extensibility and agile development.
If there are many of these objects and each one isn't very large, I think using the database is a good idea.
Whether to store the XML in a separate table or in the original table depends on how you would use the CLOB data together with the original table.
Go with option 2 if you will always need the CLOB data when you access the original table.
Otherwise go with option 3 to improve performance.
You also need to think about security and n-tier architecture. Storing serialized data in a database means your data will be on another server, which is ideal if the data needs to be secure, but it adds network latency; storing the data in the filesystem gives you quicker I/O access but very limited searching ability.
I have a situation like this, and I use the database. It also gets backed up properly with the rest of the related data.

Best way to store urls locally

I am creating an RSS reader as a hobby project, and I am at the point where the user can add his own URLs.
I was thinking of two things.
A plain-text file where each URL is a single line
SQLite, where I can have unique IDs and descriptions alongside each URL
Is the SQLite idea too much overhead, or is there a better way to do things like this?
What about an OPML file? It's XML, so if you need to store more data than the OPML specification supplies, you can always add your own namespace.
Additionally, importing and exporting from other RSS readers is all done via OPML, and there is often library support for it. If you're interested in having users switch to your reader, then you have to support OPML. Thanks to jamesh for bringing that point up.
Why not XML?
If you're dealing with RSS anyway, you may as well. :)
Do you plan to store just URLs, or do you plan to add data like last_fetch_time and so on?
If it's just a simple URL list that your program reads line by line to download data, store it in a file, or even better as a serialized object written to a file.
If you plan to extend it, adding comments/time of last fetch, etc., I'd go for SQLite; it's not that much overhead.
If it's a single user application that only has one instance, SQLite might be overkill.
You've got a few options as I see it:
SQLite / database layer. Increases the dependencies your code needs to run, but allows concurrent access.
Roll your own text parser. Complexity increases as you want to save more data, and you're re-inventing the wheel. Fewer dependencies, though, and initially, while your data is simple, it's trivial for a novice user of your application to edit.
Use XML. It's well-formed, well-defined, and text-editable. It could be overkill for storing just a URL, though.
Use something like pickle to serialize your objects and save them to disk. Changes to your data structure mean "upgrading" the pickle files. Not very intuitive for a novice user to edit, but extremely easy to implement.
I'd go with the XML text file option. You can use the XSD tool built into Visual Studio to create a DataTable from the XML data, and it easily serializes back into the file when needed.
The other consideration is that you'll probably want the end user to be able to categorize their RSS feeds and potentially search/sort them, and having that kind of DataTable structure will help with this.
You get easy file storage and access and the benefit of a "database" structure, but without quite the overhead of SQLite.
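A minimal sketch of that DataTable approach; the column names are made up for the example:

    using System.Data;

    var feeds = new DataTable("Feed");
    feeds.Columns.Add("Url", typeof(string));
    feeds.Columns.Add("Description", typeof(string));
    feeds.Columns.Add("Category", typeof(string));

    feeds.Rows.Add("https://example.com/rss.xml", "Example feed", "News");

    // persist to disk, schema included so it reads back with typed columns
    feeds.WriteXml("feeds.xml", XmlWriteMode.WriteSchema);

    // later: load it back
    var loaded = new DataTable();
    loaded.ReadXml("feeds.xml");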
