I've been working on a VB application which parses multiple XML files and creates an Excel file from them.
The main problem is that I am simply reading each line of each XML file and writing it to the Excel file when a specific node is found. I would like to know whether there is any way to store the data from each element so I can use it once everything (all the XML files) has been parsed.
I was thinking about databases, but I think that is excessive and unnecessary. Maybe you can give me some ideas to make this work.
System.Data.DataSet can be used as an "in memory database".
You can use a DataSet to store information in memory - a DataSet can contain multiple DataTables and you can add columns to those at runtime, even if there are already rows in the DataTable. So even if you don't know the XML node names ahead of time, you can add them as columns as they appear.
You can also use DataViews to filter the data inside the DataSet.
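A minimal sketch of that idea in C# (the node name, value and filter are made up; the same types work the same way from VB.NET):
using System.Data;

// Minimal sketch: one DataTable per XML file, columns added as node names are encountered.
var ds = new DataSet("ParsedXml");
var table = new DataTable("Items");
ds.Tables.Add(table);

string nodeName = "Price";                         // a node name just read from the XML
if (!table.Columns.Contains(nodeName))
    table.Columns.Add(nodeName, typeof(string));   // existing rows simply get DBNull for the new column

var row = table.NewRow();
row[nodeName] = "19.99";
table.Rows.Add(row);

// A DataView filters the rows without copying them.
var view = new DataView(table) { RowFilter = "Price = '19.99'" };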
My typical way of pre-parsing XML is to create a two-column DataTable with the XPATH address of each node and its value. You can then do a second pass that matches XPATH addresses to your objects/dataset.
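Roughly, that pre-parse pass could look like this (a sketch using LINQ to XML; the column names and path format are just my own convention):
using System.Data;
using System.Linq;
using System.Xml.Linq;

// Sketch: flatten one XML file into (XPath, Value) rows for a second matching pass.
static DataTable Flatten(string path)
{
    var table = new DataTable("XmlNodes");
    table.Columns.Add("XPath", typeof(string));
    table.Columns.Add("Value", typeof(string));

    var doc = XDocument.Load(path);
    foreach (var leaf in doc.Descendants().Where(e => !e.HasElements))
    {
        // Build a simple /Root/Child/Leaf style address for each leaf element.
        var address = "/" + string.Join("/",
            leaf.AncestorsAndSelf().Reverse().Select(e => e.Name.LocalName));
        table.Rows.Add(address, leaf.Value);
    }
    return table;
}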
Currently, we generate a shipping manifest file using VBA. I was able to take care of the grunt work of harmonizing and getting the dataset I need converted to C#; my last step is to generate the XML file in C# as well.
The way it works currently, VBA opens a recordset called tblXMLcodes, which has four fields: Tag, Carrier, Section, and orderct. Tag holds each tag I need, Carrier is always the same, and Section is either Item or Package since I have two data sets (a package query and an item query). The orderct field is just numbered 1-16 for each section.
Then another recordset is opened from the item query and the package query, and it gets looped through, appending to a .txt file.
How can I easily take my two datasets (both queries) and loop through them to generate an XML file?
Are there any NuGet packages?
Any help is appreciated. My current code works, so I won't post it, but if someone wants to see it I will gladly share. It is VERY slow.
Follow this code:
DataSet ds = new DataSet();
ds.Tables.Add(dt1); // Table 1
ds.Tables.Add(dt2); // Table 2...
...
string dsXml = ds.GetXml();
...
using (StreamWriter fs = new StreamWriter(xmlFile)) // XML file path
{
    ds.WriteXml(fs);
}
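One detail worth noting: WriteXml names the root element after the DataSet, each row element after its DataTable, and the child elements after the columns. So if the tables are named after your two sections, the output tags largely take care of themselves, e.g. (names other than Item/Package are placeholders):
DataTable dt1 = new DataTable("Item");    // rows are written as <Item>...</Item>
DataTable dt2 = new DataTable("Package"); // rows are written as <Package>...</Package>
DataSet ds = new DataSet("Manifest");     // "Manifest" is a placeholder root element name
ds.Tables.Add(dt1);
ds.Tables.Add(dt2);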
My app will build an item list and grab the necessary data (e.g. prices, customer item codes) from an Excel file.
This reference Excel file has 650 lines and 7 columns.
The app will read 10-12 item rows in one run.
Would it be wiser to read the line items one by one?
Or should I first read all the line items in the Excel file into a list/array and search from there?
Thank you
It's good to start by designing the classes that best represent the data regardless of where it comes from. Pretend that there is no Excel, SQL, etc.
If your data is always going to be relatively small (650 rows), then I would just read the whole thing into whatever data structure you create (your own classes). Then you can query those for whatever data you want, like
var itemsIWant = allMyData.Where(item => item.Value == "something");
The reason is that it enables you to separate the query (selecting individual items) from the storage (whatever file or source the data comes from). If you replace Excel with something else, you won't have to rewrite other code. If you read it line by line, then the code that selects items based on criteria is mingled with your Excel-reading code.
Keeping things separate enables you to more easily test parts of your code in isolation. You can confirm that one component correctly reads what's in Excel and converts it to your data. You can confirm that another component correctly executes a query to return the data you want (and it doesn't care where that data came from).
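As a rough sketch of that separation (Item, IItemSource and ExcelItemSource are placeholder names, and the actual Excel reading is left out):
using System;
using System.Collections.Generic;
using System.Linq;

// Placeholder types to illustrate the separation; the Excel reading itself is omitted.
public class Item
{
    public string Code { get; set; }
    public decimal Price { get; set; }
}

public interface IItemSource
{
    IReadOnlyList<Item> LoadAll();
}

public class ExcelItemSource : IItemSource
{
    private readonly string _path;
    public ExcelItemSource(string path) { _path = path; }

    public IReadOnlyList<Item> LoadAll()
    {
        // Read the ~650 rows with whatever Excel library you use and map them to Item objects.
        throw new NotImplementedException();
    }
}

// Usage: the querying code only sees IItemSource, so swapping Excel out touches one class.
// IItemSource source = new ExcelItemSource("reference.xlsx");
// var matches = source.LoadAll().Where(i => i.Code == "ABC123").ToList();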
With regard to optimization - you're going to be opening the file from disk and no matter what you'll have to read every row. That's where all the overhead is. Whether you read the whole thing at once and then query or check each row one at a time won't be a significant factor.
I have a string that is written out like an XML file. An example would look like this:
string = <Employees><EmployeeId>1</EmployeeId>< ... ></Employees>
I am saving this in a table because I wanted to audit changes, but I didn't want multiple tables for different audits, since it would record changes to things other than employees. So using an XML-style string in the database seemed like a good suggestion.
Now to the real business. I want to check that there were actually changes to the employee, because someone could go into the edit page, change nothing, and click save. As it stands, that data would be written to the DB and just clutter it up with unchanged records.
I'd like to check whether the XML-style string that is about to be saved already exists in the database: first match on the EmployeeId (since that won't change), and then compare the whole string against the stored one to see if there is any difference. I would have just compared the first n characters, but the id can be anywhere from 1 to 3 digits long.
Also, since it is styled as XML, is there an easy way to read it like an XML file and do the check that way?
Storing arbitrary data in a column is a form of denormalization. You can't really do much with it at a database level. However, SQL Server does have an XML column type. Entity Framework doesn't support mapping to/from an XML column, so it will simply treat your XML as a standard string. With this column type, though, you can write actual SQL queries against your XML using XPath expressions.
Your best bet, then, is to type your column as XML, and then write a stored procedure that performs the query you need. You can then utilize this stored procedure with Entity Framework.
For more information on the XML column type see: https://msdn.microsoft.com/en-us/library/ms190798(SQL.90).aspx
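For illustration, here is a sketch of the kind of XPath query the XML column type makes possible. The table and column names (dbo.EmployeeAudit, AuditXml, AuditId) are assumptions, and in practice the query would sit inside the stored procedure rather than inline:
using System.Data.SqlClient;
using System.Xml.Linq;

// Sketch: fetch the stored XML for this employee via an XPath query, then compare structurally.
static bool HasChanged(string connectionString, int employeeId, string newXml)
{
    const string sql =
        @"SELECT TOP 1 CAST(AuditXml AS nvarchar(max))
          FROM dbo.EmployeeAudit
          WHERE AuditXml.value('(/Employees/EmployeeId)[1]', 'int') = @EmployeeId
          ORDER BY AuditId DESC";

    using (var cn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(sql, cn))
    {
        cmd.Parameters.AddWithValue("@EmployeeId", employeeId);
        cn.Open();
        var existing = cmd.ExecuteScalar() as string;

        // DeepEquals compares the parsed trees, so formatting-only differences between
        // elements don't register as changes the way a raw string comparison would.
        return existing == null ||
               !XNode.DeepEquals(XDocument.Parse(existing), XDocument.Parse(newXml));
    }
}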
I am working with a vendor-supplied application that stores XML data as a byte array in a SQL database table. I have found that if the XML data is "too long" (presumably past some predetermined length in the vendor's black-box code), the XML is truncated and a second record, containing the remainder of the XML data, is created.
My task is to take these "linked" records and merge them into one valid XML string. These linked records can be broken off anywhere: in the middle of an element, a node, etc. There is no rhyme or reason to where the XML string is broken.
Loading the invalid XML data into an XElement causes an error: "Tag has no closing tag".
I've also tried using an XmlReader and reading through each node, based on this article as well as this MSDN article. They also result in the missing-tag error above.
Is there a way to take these partial XML strings and merge them? Or am I simply stuck?
The vendor application we use does perform this merge, but that code is hidden from me.
Thank you
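A minimal sketch of what the merge would amount to if the fragments really are plain byte-for-byte splits of one serialized UTF-8 string (an unconfirmed assumption, and the method name is made up):
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Linq;

// fragments: the linked records' byte arrays, retrieved in the order they were written.
static XDocument MergeFragments(IEnumerable<byte[]> fragments)
{
    // If the vendor simply truncates the serialized string, concatenating the raw bytes in
    // order should reconstruct the original document before any XML parsing is attempted.
    byte[] all = fragments.SelectMany(part => part).ToArray();
    string xml = Encoding.UTF8.GetString(all);
    return XDocument.Parse(xml);
}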
There are things that I don't like in DataTable.WriteXml() and ReadXml():
- No way to configure the number of rows or columns to load/save (I need this when processing huge amounts of data).
- No way to get rid of the XML element named after the dataset (DocumentElement by default).
- No hooks for custom processing of nodes on read/write.
Is there any good serializer for DataRows and DataTables? I can write my own, but it seems strange that one doesn't already exist (I tried to find one).
Thanks!
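For concreteness, a rough sketch of the kind of hand-rolled writer meant above, with a row limit, no DocumentElement wrapper, and a spot for per-node processing (all names are placeholders):
using System.Data;
using System.Linq;
using System.Xml;

// Sketch: write a DataTable as XML with a row limit and without the DocumentElement wrapper.
static void WriteTable(DataTable table, string path, int maxRows)
{
    var settings = new XmlWriterSettings { Indent = true };
    using (var writer = XmlWriter.Create(path, settings))
    {
        writer.WriteStartElement(table.TableName);   // root is the table name, not DocumentElement
        foreach (DataRow row in table.Rows.Cast<DataRow>().Take(maxRows))
        {
            writer.WriteStartElement("Row");
            foreach (DataColumn col in table.Columns)
            {
                // Per-node hook: transform, rename or skip values here before they are written.
                writer.WriteElementString(col.ColumnName, row[col].ToString());
            }
            writer.WriteEndElement();
        }
        writer.WriteEndElement();
    }
}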