I am creating a generic XML-to-SQL Server data driver (in C#) that will take an XML file as input and produce one or more data tables containing the information.
So far I have an input XML file and an XSLT; the XSLT creates a new XML document containing only the information needed from the original XML.
My problem lies in not knowing how to define a mapping from the XML elements to the columns in certain tables.
For example, say I have this extract of XML:
<Bug name="MillenniumBug">
    <Severity value="1" />
</Bug>
I would like to create two tables, a Bugs table and a Severity table; I need the bug name in the Bugs table and the severity value in the Severity table.
A point in the right direction as to how I can specify this mapping would be really appreciated.
Thanks
You have already lined up your ruleset when you set up elements as tables and attributes as columns in the table. Setting up the mapping, and even "table creation logic", is fairly easy using those rules. The harder part with your "driver" is going to be determination of types (is this number a byte, an int, or a long?) and possibly creating an inferred schema to validate the input. Determining nullability of columns will also be an issue you have to examine.
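A minimal sketch of those rules (the naive type inference and the names are mine for illustration only, not production logic): each distinct element name becomes a DataTable, and each attribute becomes a column whose type is inferred from the values seen.

using System;
using System.Collections.Generic;
using System.Data;
using System.Xml.Linq;

static class XmlTableMapper
{
    // Elements of one name become one DataTable; each attribute becomes a column.
    public static DataTable TableFromElements(IEnumerable<XElement> elements, string tableName)
    {
        var source = new List<XElement>(elements);
        var table = new DataTable(tableName);

        // First pass: discover columns and infer a type from the first value seen
        // (naive: assumes the first value is representative of the whole column).
        foreach (var element in source)
            foreach (var attribute in element.Attributes())
                if (!table.Columns.Contains(attribute.Name.LocalName))
                    table.Columns.Add(attribute.Name.LocalName, InferType(attribute.Value));

        // Second pass: one row per element; DataTable converts the strings for us.
        foreach (var element in source)
        {
            var row = table.NewRow();
            foreach (var attribute in element.Attributes())
                row[attribute.Name.LocalName] = attribute.Value;
            table.Rows.Add(row);
        }
        return table;
    }

    private static Type InferType(string value)
    {
        if (int.TryParse(value, out _)) return typeof(int);
        if (long.TryParse(value, out _)) return typeof(long);
        if (decimal.TryParse(value, out _)) return typeof(decimal);
        return typeof(string);
    }
}

// Usage: var bugs = XmlTableMapper.TableFromElements(doc.Descendants("Bug"), "Bug");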
Without better understanding the business driver for this project, I can't give you much more than that.
I have a string that is written out like an XML file. An example would look like this:
string = <Employees><EmployeeId>1</EmployeeId>< ... ></Employees>
I am saving this in a table because I wanted to audit changes, but I didn't want to have multiple tables for different audits. This is because it would record changes to things other than employees. So using an XML style string in the database seemed like a good suggestion.
Now to the real business. I want to check to make sure that there were actually changes to the employee because one could go into the edit page, change nothing, and click save. As of right now, the data would write to the DB and just clutter it up with non-changed data.
I'd like to be able to check whether the XML-styled string that is about to be saved is already in the database: first match on <employees><employeeid>###</employeeid> (since the employee id won't change), and then see if the whole string equals the stored one. Basically, check the employeeId first, then compare the string as a whole to see if there is any difference. I would have just checked the first n characters, but the id number could be anywhere from 1 to 3 digits long.
Also, because it is styled as XML, is there an easy way to parse it as an XML document and compare it that way?
Storing arbitrary data in a column is a form of denormalization. You can't really do much with it at a database level. However, SQL Server does have an XML column type. Entity Framework doesn't support mapping to/from an XML column, so it will simply treat your XML as a standard string. With this column type, though, you can write actual SQL queries against your XML using XPath expressions.
Your best bet, then, is to type your column as XML, and then write a stored procedure that performs the query you need. You can then utilize this stored procedure with Entity Framework.
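For illustration, here is a rough sketch of the kind of lookup an XML column enables, using plain ADO.NET rather than the stored-procedure-plus-EF route described above; the EmployeeAudit table and AuditXml column names are assumptions, not your schema.

using System;
using System.Data;
using System.Data.SqlClient;

static class AuditLookup
{
    // Returns true if an identical audit document is already stored for this employee.
    public static bool AuditAlreadyStored(string connectionString, string employeeId, string newAuditXml)
    {
        const string sql = @"
            SELECT AuditXml
            FROM   EmployeeAudit
            WHERE  AuditXml.exist('/Employees/EmployeeId[. = sql:variable(""@employeeId"")]') = 1;";

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            command.Parameters.Add("@employeeId", SqlDbType.NVarChar, 16).Value = employeeId;
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Crude change check: compare the stored XML with the candidate string.
                    if (string.Equals((string)reader["AuditXml"], newAuditXml, StringComparison.Ordinal))
                        return true;
                }
            }
        }
        return false;
    }
}

One caveat: the xml type does not preserve the stored text verbatim (insignificant whitespace is dropped), so a character-by-character comparison can report false differences; loading both sides into an XDocument and using XNode.DeepEquals is a safer comparison.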
For more information on the XML column type see: https://msdn.microsoft.com/en-us/library/ms190798(SQL.90).aspx
I'm new to both UniData and UniObjects, so if I ask something obvious I apologize.
I'm trying to write a tool that will let me export contacts from our ERP (Manage2000) that runs on UniData (v. 6.1) and can then import them into AD/Exchange.
The primary issue I'm having is that I don't know which fields (columns?) in the table (file?) are for what. I know that that there is a dictionary that has this information in it but I'm not sure how to get what I want out of it.
I found that there is a LIST.METADATA command in the current UniData documentation from Rocket, but it seems that either the version of UniData we are using is so old that it doesn't have this command, or it was removed from the VOC file for some unknown reason.
Does anyone know how or have any tips to pull out the structure of a table so that I can know which fields are for what data?
Thanks in advance!
At TCL:
LIST DICT contact.master
Please note that the database file name (EX: contact.master) is case sensitive. I don't have a UniData instance at the moment to provide an example output. However, it should be similar to Universe's output:
Field......... Type & Field........ Conversion.. Column......... Output Depth &
Name.......... Field. Definition... Code........ Heading........ Format Assoc..
Number
AMOUNT.WEBB A 1 MR22 Amt WEBB 10R M
PANDAS.COST A 3 MD2Z Pandass Cost 10R M
CREDIT.EXP.DT A 6 D4/ Cred Exp Date 10R M
For the example above, you can generally tell the "data type" of the field by looking at the conversion code. "D4/" is the conversion code for a date. "MD2Z" is a numeric conversion code, which we can guess is for monetary amounts. I'm glossing over the power of conversion codes, so please make sure to reference Rocket's documentation for these codes to truly understand what these fields would output. If you don't have the documentation in front of you, you can also reference this site:
http://www.koretech.com/kr_help/KU2/30/en/KMK_Prog_Conversions.htm
If you wanted to use UniObjects and C# to retrieve the field names in a file, you could use the following code:
// activeSession is an open UniSession from UniObjects.NET
UniCommand fieldSelectCommand = activeSession.CreateUniCommand();
fieldSelectCommand.Command = "SELECT DICT contact.master";
fieldSelectCommand.Execute();

// Select list 0 now holds the dictionary record IDs, i.e. the field names
UniSelectList resultList = activeSession.CreateUniSelectList(0);
String[] allFieldNames = resultList.ReadListAsStringArray();
Having answered your question, I would also like to make a recommendation that you check out Rocket's U2 Toolkit for .NET if you're mostly going to be selecting data from the database instead of reading and manipulating individual records:
http://www.rocketsoftware.com/products/rocket-u2-toolkit-net
Not only does it present an ADO.NET way of accessing the database, it also has a better-performing version of the UniObjects library under the U2.Data.Client.UO namespace.
The dictionary, in my opinion, is a recommendation of how the schema should behave; however, there are cases where it's not 100% accurate. You could run "LIST CONTACT.MASTER TOXML TO MYFILE.XML", which would create an XML file that you could parse.
See https://u2devzone.rocketsoftware.com/accelerate/articles/u2-xml/u2-xml#section-0 for more information.
I am working on optimizing some code I have been assigned from a previous employee's code base. Beyond the fact that the code is pretty well "spaghettified" I did run into an issue where I'm not sure how to optimize properly.
The below snippet is not an exact replication, but should detail the question fairly well.
He is taking one DataTable from an Excel spreadsheet and placing rows into a consistently formatted DataTable which later updates the database. This seems logical to me; however, the way he is copying data seems convoluted, and it is a royal pain to modify, maintain, or add new formats.
Here is what I'm seeing:
private void VendorFormatOne()
{
    // dtSubmit is declared with its column schema elsewhere
    for (int i = 0; i < dtFromExcelFile.Rows.Count; i++)
    {
        dtSubmit.Rows.Add(i);
        dtSubmit.Rows[i]["reference_no"] = dtFromExcelFile.Rows[i]["VENDOR REF"];
        dtSubmit.Rows[i]["customer_name"] = dtFromExcelFile.Rows[i]["END USER ID"];
        // etc etc etc
    }
}
To me this is complete overkill for mapping columns to a different schema, but I can't think of a way to do this more gracefully. In the actual solution, there are about 20 of these methods, all using different formats from dtFromExcelFile, and the column list is much longer. The column schema of dtSubmit remains the same across the board.
I am looking for a way to avoid having to manually map these columns every time the company needs to load a new file from a vendor. Is there a way to do this more efficiently? I'm sure I'm overlooking something here, but did not find any relevant answers on SO or elsewhere.
This might be overkill, but you could define an XML file that describes which Excel column maps to which database field, then input that along with each new Excel file. You'd want to whip up a class or two for parsing and consuming that file, and perhaps another class for validating the Excel file against the XML file.
Depending on the size of your organization, this may give you the added bonus of being able to offload that tedious mapping to someone less skilled. However, it is quite a bit of setup work and if this happens only sparingly, you might not get a significant return on investment for creating so much infrastructure.
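As a rough sketch of that idea (the file format, the ApplyMapping name, and the column names are made up for illustration): a per-vendor XML file lists source-to-destination column pairs, and one generic routine applies whichever mapping file is supplied.

// Example mapping file for one vendor (format is an assumption, not an existing standard):
//
//   <mappings vendor="VendorOne">
//     <map from="VENDOR REF"  to="reference_no" />
//     <map from="END USER ID" to="customer_name" />
//   </mappings>

using System.Data;
using System.Linq;
using System.Xml.Linq;

static class VendorMapping
{
    public static void ApplyMapping(DataTable source, DataTable destination, string mappingFilePath)
    {
        var maps = XDocument.Load(mappingFilePath)
                            .Descendants("map")
                            .Select(m => new
                            {
                                From = (string)m.Attribute("from"),
                                To = (string)m.Attribute("to")
                            })
                            .ToList();

        foreach (DataRow sourceRow in source.Rows)
        {
            DataRow destinationRow = destination.NewRow();
            foreach (var map in maps)
                destinationRow[map.To] = sourceRow[map.From];
            destination.Rows.Add(destinationRow);
        }
    }
}

// Usage: VendorMapping.ApplyMapping(dtFromExcelFile, dtSubmit, "VendorOne.mapping.xml");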
Alternatively, if you're using MS SQL Server, this is basically what SSIS is built for, though in my experience, most programmers find SSIS quite tedious.
I had originally intended this just as a comment but ran out of space. It's in reply to Micah's answer and your first comment therein.
The biggest problem here is the amount of XML mapping would equal that of the manual mapping in code
Consider building a small tool that, given an Excel file with two columns, produces the XML mapping file. Now you can offload the mapping work to the vendor, or an intern, or indeed anyone who has a copy of the requirement doc for a particular vendor project.
Since the file is then loaded at runtime in your import app or whatever, you can change the mappings without having to redeploy the app.
Having used exactly this kind of system many, many times in the past, I can tell you this: you will be very glad you took the time to do it - especially the first time you get a call right after deployment along the lines of "oops, we need to add a new column to the data we've given you, and we realised that we've misspelled the 19th column, by the way."
About the only thing that can perhaps go wrong is data type conversions, but you can build that into the mapping file (type from/to) and generalise your import routine to perform the conversions for you.
Just my 2c.
A while ago I ran into a similar problem where I had over 400 columns from 30-odd tables to be mapped to about 60 in the actual table in the database. I had the same dilemma of whether to go with a schema or write something custom.
There was so much duplication that I ended up writing a simple helper class with a couple of overridden methods that basically took in a column name from the import table and spit out the database column name. Also, for column names, I built a separate class of the format:
public static class ColumnName
{
    public const string FirstName = "FirstName";
    public const string LastName = "LastName";
    ...
}
Same thing goes for TableNames as well.
This made it much simpler to maintain table names and column names. Also, this handled duplicate columns across different tables really well avoiding duplicate code.
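A minimal sketch of the kind of helper described above (the import column names and the dictionary-based lookup are assumptions; the original used a couple of methods rather than necessarily a dictionary):

using System.Collections.Generic;

public static class ColumnMapper
{
    // Import-table column name -> database column name (entries here are invented examples).
    private static readonly Dictionary<string, string> Map = new Dictionary<string, string>
    {
        { "FIRST_NM", ColumnName.FirstName },
        { "LAST_NM",  ColumnName.LastName },
        // ... remaining import columns
    };

    public static string ToDatabaseColumn(string importColumn)
    {
        return Map.TryGetValue(importColumn, out string databaseColumn)
            ? databaseColumn
            : importColumn;   // fall back to the original name if no mapping exists
    }
}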
I have a requirement to generate an XML file. This is easy-peasy in C#. The problem (aside from a slow database query [a separate problem]) is that the output file easily reaches 2GB. On top of that, the output XML is not in a format that can easily be produced in SQL. Each parent element aggregates values from its children and maintains a sequential unique identifier that spans the file.
Example:
<level1Element>
    <recordIdentifier>1</recordIdentifier>
    <aggregateOfLevel2Children>11</aggregateOfLevel2Children>
    <level2Children>
        <level2Element>
            <recordIdentifier>2</recordIdentifier>
            <aggregateOfLevel3Children>92929</aggregateOfLevel3Children>
            <level3Children>
                <level3Element>
                    <recordIdentifier>3</recordIdentifier>
                    <level3Data>a</level3Data>
                </level3Element>
                <level3Element>
                    <recordIdentifier>4</recordIdentifier>
                    <level3Data>b</level3Data>
                </level3Element>
            </level3Children>
        </level2Element>
        <level2Element>
            <recordIdentifier>5</recordIdentifier>
            <aggregateOfLevel3Children>92929</aggregateOfLevel3Children>
            <level3Children>
                <level3Element>
                    <recordIdentifier>6</recordIdentifier>
                    <level3Data>h</level3Data>
                </level3Element>
                <level3Element>
                    <recordIdentifier>7</recordIdentifier>
                    <level3Data>e</level3Data>
                </level3Element>
            </level3Children>
        </level2Element>
    </level2Children>
</level1Element>
The schema in use actually goes up five levels. For the sake of brevity, I'm including only 3. I do not control this schema, nor can I request changes to it.
It's a simple, even trivial, matter to aggregate all of this data in objects and serialize it out to XML based on this schema. But when dealing with such large amounts of data, out-of-memory exceptions occur with that strategy.
The strategy that is working for me is this: I populate a collection of entities through an ObjectContext that hits a view in a SQL Server database (a most ineffectively indexed database at that). I group this collection and iterate through it, then group the next level and iterate through that, until I get to the highest-level element. I then organize the data into objects that reflect the schema (effectively just mapping) and set the sequential recordIdentifier (I've considered doing this in SQL, but the number of nested joins or CTEs would be ridiculous, given that the identifier spans from the header elements into the child elements). I write a higher-level element (say, a level2Element) with its children to the output file. Once I'm done writing at this level, I move to the parent group and insert the header with the aggregated data and its identifier.
Does anyone have any thoughts concerning a better way output such a large XML file?
As far as I understand your question, your problem is not limited storage space (i.e. HDD); the difficulty is maintaining a large XDocument object in memory (i.e. RAM). To deal with this, avoid building such a huge object. For each recordIdentifier element you can call .ToString() and get a string, then simply append these strings to a file. Write the declaration and the root tag into the file yourself and you're done.
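A minimal sketch of that idea (the level2Elements parameter stands in for the grouping/mapping logic described in the question, and the level-1 header and aggregate handling are omitted for brevity):

using System.Collections.Generic;
using System.IO;
using System.Xml.Linq;

static class StreamingXmlWriter
{
    public static void WriteOutput(string path, IEnumerable<XElement> level2Elements)
    {
        using (var writer = new StreamWriter(path))
        {
            writer.WriteLine("<?xml version=\"1.0\" encoding=\"utf-8\"?>");
            writer.WriteLine("<level1Element>");
            writer.WriteLine("<level2Children>");

            // Only one level2Element (with its children) is held in memory at a time.
            foreach (XElement level2Element in level2Elements)
                writer.WriteLine(level2Element.ToString());

            writer.WriteLine("</level2Children>");
            writer.WriteLine("</level1Element>");
        }
    }
}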
I'm creating a data-entry application where users are allowed to create the entry schema.
My first version of this just created a single table per entry schema, with each entry spanning one or more columns (for complex types) of the appropriate data type. This allowed for "fast" querying (on small datasets, as I didn't index all columns) and simple synchronization where the data entry was distributed across several databases.
I'm not quite happy with this solution though; the only positive thing is the simplicity...
I can only store a fixed number of columns.
I need to create indexes on all columns.
I need to recreate the table on schema changes.
Some of my key design criterias are:
Very fast querying (Using a simple domain specific query language)
Writes don't have to be fast
Many concurrent users
Schemas will change often
Schemas might contain many thousand columns
The data entries might be distributed and need synchronization.
Preferably MySQL and SQLite - databases like DB2 and Oracle are out of the question.
Using .Net/Mono
I've been thinking of a couple of possible designs, but none of them seems like a good choice.
Solution 1: Union like table containing a Type column and one nullable column per type.
This avoids joins, but will definitely use a lot of space.
Solution 2: Key/value store. All values are stored as string and converted when needed.
This also uses a lot of space, and of course, I hate having to convert everything to strings.
Solution 3: Use an xml database or store values as xml.
Without any experience of this, I would think it is quite slow (at least for the relational model, unless there is some very good XPath support).
I also would like to avoid an xml database as other parts of the application fits better as a relational model, and being able to join the data is helpful.
I can't help thinking that someone has solved (some of) this already, but I'm unable to find anything. I'm not quite sure what to search for either...
I know market research firms do something like this for their questionnaires, but there are few open-source implementations, and the ones I've found don't quite fit the bill.
PSPP has much of the logic I'm thinking of: primitive column types, many columns, many rows, fast querying, and merging. Too bad it doesn't work against a database. And of course, I don't need 99% of the provided functionality, while a lot of what I do need isn't included.
I'm not sure this is the right place to ask such a design related question, but I hope someone here has some tips, know of any existing work, or can point me to a better place to ask such a question.
Thanks in advance!
Have you considered the most trivial solution: having one table for each of your data types and storing the schema of your dataset in the database as well? The simplest version:
DATASET table (the virtual "table")
    ID - primary key
    Name - name for the dataset/table

COLUMNSCHEMA table (specifies the columns for one "dataset")
    DATASETID - int (reference to the DATASET table)
    COLID - smallint (unique number of the column)
    Name - varchar
    DataType - ("varchar", "int", whatever)

Row table
    DATASETID
    ID - unique id for the "row"

ColumnData table (one for each datatype)
    ROWID - int (reference to the Row table)
    COLID - smallint
    DATA - (varchar/int/whatever)
To query a dataset (a virtual table), you must then dynamically construct a SQL statement using the schema information in the COLUMNSCHEMA table.
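A hedged sketch of what that dynamic construction could look like in C# (the per-type table naming ColumnData_int / ColumnData_varchar and the shape of the column list are assumptions layered on the layout above; the names are interpolated straight into the SQL, so sanitize them in real code):

using System.Collections.Generic;
using System.Text;

static class DatasetQueryBuilder
{
    // columns: (ColId, Name, DataType) rows read from COLUMNSCHEMA for one dataset.
    public static string BuildDatasetQuery(int datasetId, IEnumerable<(int ColId, string Name, string DataType)> columns)
    {
        var sql = new StringBuilder();
        sql.AppendLine("SELECT r.ID");

        foreach (var column in columns)
        {
            // One correlated subquery per column, against the ColumnData table
            // that matches the column's data type (the naming is an assumption).
            string dataTable = "ColumnData_" + column.DataType;   // e.g. ColumnData_int
            sql.AppendLine(
                $"     , (SELECT d.DATA FROM {dataTable} d" +
                $" WHERE d.ROWID = r.ID AND d.COLID = {column.ColId}) AS \"{column.Name}\"");
        }

        sql.AppendLine("FROM Row r");
        sql.AppendLine($"WHERE r.DATASETID = {datasetId}");
        return sql.ToString();
    }
}

Identifier quoting differs between the two targets (backticks in MySQL, double quotes in SQLite), so adjust that detail to whichever engine you end up on.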