how to load a hashtable from a simple xml file using xmltextreader - c#

using xmltextreader, how would I load a hashtable.
XML:
<base><user name="john">2342343</user><user name="mark">239099393</user></base>
This was asked before but it was using some funky linq that I am not fully comfortable with just yet.

Well, the LINQ to XML solution is really easy, so I suggest we try to make you comfortable with that instead of creating a more complex solution. Here's the code, with plenty of explanation...
// Load the whole document into memory, as an element
XElement root = XElement.Load(xmlReader);
// Get a sequence of users
IEnumerable<XElement> users = root.Elements("user");
// Convert this sequence to a dictionary...
Dictionary<string, string> userMap = users.ToDictionary(
element => element.Attribute("name").Value, // Key selector
element => element.Value); // Value selector
Of course you could do this all in one go - and I'd probably combine the second and third statements. But that's about as conceptually simple as it's likely to get. It would become more complicated if you wanted to put error handling around the possibility that a user element might not have a name, admittedly. (This code will throw a NullReferenceException in that case.)
Note that this assumes you want the name as the key and id as value. If you want the hashtable the other way round, just switch the order of the lambda expressions.

Related

Optimize XML Reading

I have an event that occurs over 50 000 times. Each time this event occurs, I need to go look into an xml file and grab some data. This xml file has over 20 000 possible values.
Before needing to read my xml, the 50 000 event's other work was very fast. As soon as I implemented the xml read, it slowed down the whole operation very significantly.
To read the XML, I am doing this:
XElement xelement = XElement.Load(SomePath);
IEnumerable<XElement> elements = xelement.Elements();
foreach (var element in elements)
{
if (element.Element("SomeNode") == value)
{
valueToSet = element.Element("SomeOtherNode").Value
break;
}
}
What can I do to optimize this?
It sounds to me like you should load the file once, preprocess it to a Dictionary<string, String>, and then just perform a lookup in that dictionary each time the event occurs. You haven't given us enough context to say exactly where you'd cache the dictionary, but that's unlikely to be particularly hard.
The loading and converting to a dictionary is really simple:
var dictionary = XElement.Load(SomePath)
.Elements()
.ToDictionary(x => x.Element("SomeNode").Value,
x => x.Element("SomeOtherNode").Value);
That's assuming that all of the secondary elements contains both a SomeNode element and a SomeOtherNode element... if that's not the case, you'd need to do some filtering first.
If you always need to load the (refreshed?) file each time, you may want to consider the XStreamingElement so that you don't have to load the entire document upfront (as XElement.load does). With XElement.Load, you load the entire XML into memory and then iterate over that. With the XStreaminigElement, you only bring the amount of the document you need into memory. In addition, to then get the first match, use .FirstOrDefault.
If you don't need to constantly refresh the document, consider a caching mechanism like the one #jonskeet offered.

What is the fastest way to search a List<T> across multiple properties?

I have a process I've inherited that I'm converting to C# from another language. Numerous steps in the process loop through what can be a lot of records (100K-200K) to do calculations. As part of those processes it generally does a lookup into another list to retrieve some values. I would normally move this kind of thing into a SQL statement (and we have where we've been able to) but in these cases there isn't really an easy way to do that. In some places we've attempted to convert the code to a stored procedure and decided it wasn't working nearly as well as we had hoped.
Effectively, the code does this:
var match = cost.Where(r => r.ryp.StartsWith(record.form.TrimEnd()) &&
r.year == record.year &&
r.period == record.period).FirstOrDefault();
cost is a local List type. If I was doing a search on only one field I'd probably just move this into a Dictionary. The records aren't always unique either.
Obviously, this is REALLY slow.
I ran across the open source library I4O which can build indexes, however it fails for me in various queries (and I don't really have the time to attempt to debug the source code). It also doesn't work with .StartsWith or .Contains (StartsWith is much more important since a lot of the original queries take advantage of the fact that doing a search for "A" would find a match in "ABC").
Are there any other projects (open source or commercial) that do this sort of thing?
EDIT:
I did some searching based on the feedback and found Power Collections which supports dictionaries that have keys that aren't unique.
I tested ToLookup() which worked great - it's still not quite as fast as the original code, but it's at least acceptable. It's down from 45 seconds to 3-4 seconds. I'll take a look at the Trie structure for the other look ups.
Thanks.
Looping through a list of 100K-200K items doesn't take very long. Finding matching items within the list by using nested loops (n^2) does take long. I infer this is what you're doing (since you have assignment to a local match variable).
If you want to quickly match items together, use .ToLookup.
var lookup = cost.ToLookup(r => new {r.year, r.period, form = r.ryp});
foreach(var group in lookup)
{
// do something with items in group.
}
Your startswith criteria is troublesome for key-based matching. One way to approach that problem is to ignore it when generating keys.
var lookup = cost.ToLookup(r => new {r.year, r.period });
var key = new {record.year, record.period};
string lookForThis = record.form.TrimEnd();
var match = lookup[key].FirstOrDefault(r => r.ryp.StartsWith(lookForThis))
Ideally, you would create the lookup once and reuse it for many queries. Even if you didn't... even if you created the lookup each time, it will still be faster than n^2.
Certainly you can do better than this. Let's start by considering that dictionaries are not useful only when you want to query one field; you can very easily have a dictionary where the key is an immutable value that aggregates many fields. So for this particular query, an immediate improvement would be to create a key type:
// should be immutable, GetHashCode and Equals should be implemented, etc etc
struct Key
{
public int year;
public int period;
}
and then package your data into an IDictionary<Key, ICollection<T>> or similar where T is the type of your current list. This way you can cut down heavily on the number of rows considered in each iteration.
The next step would be to use not an ICollection<T> as the value type but a trie (this looks promising), which is a data structure tailored to finding strings that have a specified prefix.
Finally, a free micro-optimization would be to take the TrimEnd out of the loop.
Now certainly all of this only applies to the specific example given and may need to be revisited due to other specifics of your situation, but in any case you should be able to extract practical gain from this or something similar.

Performance issue with Linq to object taking more time

I have an xml code which is getting updated based on the object value. The 'foreach' loop here is taking almost 12-15 minutes to fetch a 200 kb xml file. Please suggest how I can improve the performance. (the xml file consists of a four leveled tag in which the child (4th level) tags are each 10 in number)
Code:
IEnumerable<XElement> elements = xmlDoc.Descendants();
foreach (DataSource Data in DataLst)
{
XElement xmlElem = (from xmlData in elements
where Data.Name == xmlData.Name.LocalName //Name
&& Data.Store == xmlData.Element(XName.Get("Store", "")).Value
&& Data.Section == xmlData.Element(XName.Get("Section", "")).Value
select xmlData.Element(XName.Get("Val", ""))).Single();
xmlElem.ReplaceWith(new XElement(XName.Get("Val", ""), Data.Value));
}
It looks like you have an O(n)×O(m) issue here, for n = size of DataList and m = size of the xml. To make this O(n)+O(m), you should index the data; for example:
var lookup = elements.ToLookup(
x => new {
Name = x.Name.LocalName,
Store = x.Element(XName.Get("Store", "")).Value,
Section = x.Element(XName.Get("Section", "")).Value},
x => x.Element(XName.Get("Val", ""))
);
foreach (DataSource Data in DataLst)
{
XElement xmlElem = lookup[
new {Data.Name, Data.Store, Data.Section}].Single();
xmlElem.ReplaceWith(new XElement(XName.Get("Val", ""), Data.Value));
}
(untested - to show general approach only)
i think better approach would be to Deserialize XML to C# Classes and then use LINQ on that, should be fast.
"Well thanks everyone for your precious time and effort"
Problem answer: Actually the object 'DataLst' was of the type IEnumerable<> which was taking time in obtaining the values but after I changed it to the List<> type the performance improved drastically (now running in 20 seconds)
If it really takes this long to run this, then maybe do something like this:
Don't iterate both - only iterate the XML-File and load the Data from your DataLst (make a SQL-command or simple linq-statement to load the data based on Name/Store/Section), make a simple struct/class for your key with this data (Name/Store/Section) - don't forget to implement equals, and GetHashCode
iterate through your XML-Elements and use the dictionary to find the values to replace
This way you will only iterate the XML-File once not once for every data in your DataSource.
It's not clear why it's taking that long - that's a very long time. How many elements are in DataLst? I would rewrite the query for simplicity to start with though:
IEnumerable<XElement> elements = xmlDoc.Descendants();
foreach (DataSource data in DataLst)
{
XElement valElement = (from element in xmlDoc.Descendants(data.Name)
where data.Store == element.Element("Store").Value
&& data.Section == element.Element("Section").Value
select element.Element("Val")).Single();
valElement.ReplaceWith(new XElement("Val"), data.Value));
}
(I'm assuming none of your elements actually have namespaces, by the way.)
Next up: consider replacing the contents of valElement instead of replacing the element itself. Change it to:
valElement.ReplaceAll(data.Value);
Now, this has all been trying to keep to the simplicity of avoiding precomputation etc... because it sounds like it shouldn't take this long. However, you may need to build lookups as Marc and Carsten suggested.
Try by replacing Single() call in the LINQ with First().
At the risk of flaming, have you considered writing this in XQuery instead? There's a good chance that a decent XQuery processor would have a join optimiser that handles this query efficiently.

Converting Linq to XSLT

Is there a way to convert LINQ queries into XSLT? the same way LINQ can be converted to SQL?
I mean if i have a solid well defined XML(Conforms to an XSD) is there a way to compile the stuff under System.Linq.Expressions into XSLT in regard to that XML?
Thank you.
To Dimitries request I'll try to elaborate a little... Basically I have some data in one place (it's basically to chunks of xml serializable data), and I need to process them, I need to combine and process them.
Both the incoming original data , and the output result data, are XML serializable, and conform to a well defined XSD.
I want to generate the processing logic dynamically - someplace else. And allow my user to change and play around with the processing. I can represent the processing it self easily with Expression trees. Expression trees are similar to parse trees and can capture program code. This is the way linq to SQL works it converts expression trees to SQL queries.
Since all the income data and the output are both a well defined XML I can do the transformation easily with XSLT, but I'm not familiar with XSLT enough to write a dynamic XSLT generator. So I thought I could build the transformations in C# and convert them into XSLT.... again its not general purpose C#, but probably specific queries over a well defined data provider.
For the example sake:
(Not real code)
var schemas = XSchemaSet.Load("a","b");
var doc = XDocument.Load("File",schemas);
var result = from node in doc.Nodes
where node.Name == "Cats" || node.Name == "Dogs"
select (new Node(){Name = "Animal Owner", Value = node.Owner)
var newDoc = new XDocument().AddNodes(result);
newDoc.Validate(schemas);
Basically I want something that will function like that... I can write that in a single linq query if I use IQueryable.Aggregate
Yes, you can implement your own query provider, which uses XSLT internally, if you can figure out how to query with XSLT.
Why not just use Linq2XML?
It operates on XDocument and XElement
or alternatively if you want people to define transforms using xlst why not just execute an xslt against the document in code?
I dont see how having a Linq2XSLT would help solve a problem without recreating it in a different form

Setting attributes in an XML document

I'm writing one of my first C# programs. Here's what I'm trying to do:
Open an XML document
Navigate to a part of the XML tree and select all child elements of type <myType>
For each <myType> element, change an attribute (so <myType id="oldValue"> would become <myType id="newValue">
Write this modified XML document to a file.
I found the XmlDocument.SelectNodes method, which takes an XPath expression as its argument. However, it returns an XmlNodeList. I read a little bit about the difference between an XML node and an XML element, and this seems to explain why there is no XmlNode.SetAttribute method. But is there a way I can use my XPath expression to retrieve a list of XmlElement objects, so that I can loop through this list and set the id attributes for each?
(If there's some other easier way, please do let me know.)
Simply - it doesn't know if you are reading an element or attribute. Quite possibly, all you need is a cast here:
foreach(XmlElement el in doc.SelectNodes(...)) {
el.SetAttribute(...);
}
The SelectNodes returns an XmlNodeList, but the above treats each as an XmlElement.
I am a big fan of System.Xml.Linq.XDocument and the features it provides.
XDocument xDoc = XDocument.Load("FILENAME.xml");
// assuming you types is the parent and mytype is a bunch of nodes underneath
IEnumerable<XElement> elements = xdoc.Element("types").Elements("myType");
foreach (XElement type in elements)
{
// option 1
type.Attribute("id").Value = NEWVALUE;
// option 2
type.SetAttributeValue("id", NEWVALUE);
}
Option 1 or 2 works but I prefer 2 because if the attribute doesn't exist this'll create it.
I'm sitting at my Mac so no .NET for me...
However, I think that you can cast an XmlNode to an XmlElement via an explicit cast.
You should be able to cast the XmlElement to an XmlNode then and get it's children Nodes using something like XmlNode.ChildNodes.

Categories

Resources