Does the XElement Parent property wrap a weak or a strong reference?
My code currently uses XmlElement, which holds a strong reference (ParentNode), and I'm considering the benefits of replacing it with XDocument.
Thanks.
You won't save any memory be switching from XmlDocument to XDocument. All references are strong. If you have a refernce to any element, you force the whole document to remain in memory.
The choice between XmlDocument and XDocument is about horrible vs. nice API, not about memory.
If you need to work with only small part of the original XML, and must discard the rest, consider making a clone of the elements you are interested in.
See - http://msdn.microsoft.com/en-us/library/bb297950(v=vs.110).aspx
public XElement(XElement other)
This constructor makes a deep copy of an element.
Related
I'm attempting to build a tree, where each node can have an unspecified amount of children nodes. The tree is to have over a million nodes in practice.
I've managed to contruct the tree, however I'm experiencing memory errors due to a full heap when I fill the tree with a few thousand nodes. The reason for this is because I'm attempting to store each node's children in a Dictionary data structure (or any data structure for that matter). Thus, at run-time I've got thousands of such data structures being created since each node can have an unspecified amount of children, and each node's children are to be stored in this data structure.
Is there another way of doing this? I cannot simply use a variable to store a reference of the children, as there can be an unspecified amount of children for each node. THus, it is not like a binary tree where I could have 2 variables keeping track of the left child and right child respectively.
Please no suggestions for another method of doing this. I've got my reasons for needing to create this tree, and unfortunately I cannot do otherwise.
Thanks!
How many of your nodes will be "leaf" nodes? Perhaps only create the data structure to store children when you first have a child, otherwise keeping a null reference.
Unless you need to look up the children as a map, I'd use a List<T> (initialized with an appropriate capacity) instead of a Dictionary<,> for the children. It sounds like you may have more requirements than you've explained though, which makes it hard to say.
I'm surprised you're failing after only a few thousand nodes though - you should be able to create a pretty large number of objects before having problems.
I'd also suggest that if you think you'll end up using a lot of memory, make sure you're on a 64-bit machine and make sure your application itself is set to be 64-bit. (That may just be a thin wrapper over a class library, which is fine so long as the class library is set to be 64-bit or AnyCPU.)
I have a function that is very small, but is called so many times that my profiler marks it as time consuming. It is the following one:
private static XmlElement SerializeElement(XmlDocument doc, String nodeName, String nodeValue)
{
XmlElement newElement = doc.CreateElement(nodeName);
newElement.InnerXml = nodeValue;
return newElement;
}
The second line (where it enters the nodeValue) is the one takes some time.
The thing is, I don't think it can be optimized code-wise, I'm still open to suggestions on that part though.
However, I remember reading or hearing somewhere that you could tell the compiler to flag this function, so that it is loaded in memory when the program starts and it runs faster.
Is this just my imagination or such a flag exists?
Thanks,
FB.
There are ways you can cause it to be jitted early, but it's not the jit time that's hurting you here.
If you're having performance problems related to Xml serialization, you might consider using XmlWriter rather than XmlDocument, which is fairly heavy. Also, most automatic serialization systems (including the built-in .NET XML Serialization) will emit code dynamically to perform the serialization, which can then be cached and re-used. Most of this has to do with avoiding the overhead of reflection, however, rather than the overhead of the actual XML writing/parsing.
I dont think this can be solved using any kind of catching or inlining. And I believe its your imagination. Mainly the part about performance. What you have in mind is pre-JIT-ing your code. This technique will remove the wait time for JITer when your function is first called. But this is only first time this function is called. It has no performance effect for subsequent calls.
As documentation states, setting InnterXml parses set string as XML. And parsing XML string can be expensive operation, especialy if set xml in string format is complex. And documentation even has this line:
InnerXml is not an efficient way to modify the DOM. There may be performance issues when replacing complex nodes. It is more efficient to construct nodes and use methods such as InsertBefore, InsertAfter, AppendChild, and RemoveChild to modify the Xml document.
So, if you are creating complex XML structure this way it would be wise to do it by hand.
Ok, I have a set of very large, identical, trees cached in memory (to be populated with non-identical data [they contain information about stuff inside each node]).
I want to copy a single instance of the tree, and populate each copy with a seperate set of data.
However, at the moment, the cached 'blank' copy of the tree is not being copied, but simply referenced and filled with every single set of data.
How can I force the method that gets the cached blank tree to return a copy of the object, instead of a reference?
An alternative to Clone() - serialize it in the memory binary stream and then deserialize as a new instance.
EDIT
Also, if you will consider serialization, and if performance is you primary concern, please also take into account the following performance test Manual Serialization 200% + Faster than BinaryFormatter.
There are several ways, but I recommend implementing ICloneable on the tree object, and then call Clone() to create a deep copy.
I would suggest to look closely at your tree classes, and if you are going to be enforcing copy semantics, then use struct instead of class. Else use ICloneable interface to provide Clone() method, as chris166 suggested.
With such a large tree, having multiple copies of it will incur a lot of memory overhead. Why not just organise the data at each node (with a Dictionary, for example) so that it holds all the different data (as you're getting at the moment), but organised in a way which is convenient to your actual need?
I'm building a gui component that has a tree-based data model (e.g. folder structure in the file system). so the gui component basically has a collection of trees, which are just Node objects that have a key, reference to a piece of the gui component (so you can assign values to the Node object and it in turn updates the gui), and a collection of Node children.
one thing I'd like to do is be able to set "styles" that apply to each level of nodes (e.g. all top-level nodes are bold, all level-2 nodes are italic, etc). so I added this to the gui component object. to add nodes, you call AddChild on a Node object. I would like to apply the style here, since upon adding the node I know what level the node is.
problem is, the style info is only in the containing object (the gui object), so the Node doesn't know about it. I could add a "pointer" within each Node to the gui object, but that seems somehow wrong...or I could hide the Nodes and make the user only be able to add nodes through the gui object, e.g. gui.AddNode(Node new_node, Node parent), which seems inelegant.
is there a nicer design for this that I'm missing, or are the couple of ways I mentioned not really that bad?
Adding a ParentNode property to each node is "not really that bad". In fact, it's rather common. Apparently you didn't add that property because you didn't need it originally. Now you need it, so you have good reason to add it.
Alternates include:
Writing a function to find the parent of a child, which is processor intensive.
Adding a separate class of some sort which will cache parent-child relationships, which is a total waste of effort and memory.
Essentially, adding that one pointer into an existing class is a choice to use memory to cache the parent value instead of using processor time to find it. That appears to be a good choice in this situation.
It seems to me that the only thing you need is a Level property on the nodes, and use that when rendering a Node through the GUI object.
But it matters whether your Tree elements are Presentation agnostic like XmlNode or GUI oriented like Windows.Forms.TreeNode. The latter has a TreeView property and there is nothing wrong with that.
I see no reason why you should not have a reference to the GUI object in the node. A node cannot exist outside the GUI object, and it is useful to be able to easily find the GUI object a node is contained in.
You may not want to tie the formatting to the level the node is at if your leaf nodes may be at different levels.
Aloha,
I have a 8MB XML file that I wish to deserialize.
I'm using this code:
public static T Deserialize<T>(string xml)
{
TextReader reader = new StringReader(xml);
Type type = typeof(T);
XmlSerializer serializer = new XmlSerializer(type);
T obj = (T)serializer.Deserialize(reader);
return obj;
}
This code runs in about a minute, which seems rather slow to me. I've tried to use sgen.exe to precompile the serialization dll, but this didn't change the performance.
What other options do I have to improve performance?
[edit] I need the object that is created by the deserialization to perform (basic) transformations on. The XML is received from an external webservice.
The XmlSerializer uses reflection and is therefore not the best choice if performance is an issue.
You could build up a DOM of your XML document using the XmlDocument or XDocument classes and work with that, or, even faster use an XmlReader. The XmlReader however requires you to write any object mapping - if needed - yourself.
What approach is the best depends stronly on what you want to do with the XML data. Do you simply need to extract certain values or do you have to work and edit the whole document object model?
Yes it does use reflection, but performance is a gray area. When talking an 8mb file... yes it will be much slower. But if dealing with a small file it will not be.
I would NOT saying reading the file vial XmlReader or XPath would be easier or really any faster. What is easier then telling something to turn your xml to an object or your object to XML...? not much.
Now if you need fine grain control then maybe you need to do it by hand.
Personally the choice is like this. I am willing to give up a bit of speed to save a TON of ugly nasty code.
Like everything else in software development there are trade offs.
You can try implementing IXmlSerializable in your "T" class write custom logic to process the XML.