I have a list of XML messages, specifically DataContract messages, that I record to a file, and I am trying to deserialize them from the file one by one. I do not want to read the whole file into memory at once because I expect it to be very large.
I have a working implementation of this: I read bytes from a FileStream, use a regular expression to find the end of each element, and then pass that element to DataContractSerializer to get the actual object.
But I was told I should be using higher-level code for this task, and it seems like that should be possible. I have the following code that I think should work, but it doesn't.
FileStream readStream = File.OpenRead(filename);
DataContractSerializer ds = new DataContractSerializer(typeof(MessageType));
MessageType msg;
while ((msg = (MessageType)ds.ReadObject(readStream)) != null)
{
Console.WriteLine("Test " + msg.Property1);
}
The above code is fed with an input file containing something along the following lines:
<MessageType>....</MessageType>
<MessageType>....</MessageType>
<MessageType>....</MessageType>
It appears that I can read and deserialize the first element correctly, but after that it fails with:
System.Runtime.Serialization.SerializationException was unhandled
Message=There was an error deserializing the object of type MessageType. The data at the root level is invalid. Line 1, position 1.
Source=System.Runtime.Serialization
I have read somewhere that this is due to the way DataContractSerializer works with '\0' padding at the end, but I couldn't figure out how to fix the problem when reading from a stream without finding the end of each MessageType tag some other way. Is there another serialization class that I should be using, or perhaps a way around this problem?
Thanks!
When you're deserializing the data from the file, WCF uses by default a reader which can only consume proper XML documents. The document which you're reading isn't - it contains multiple root elements, so it's effectively a fragment. You can change the reader the serializer is using by using another overload of ReadObject, as shown in the example below, to one which accepts fragments (by using the XmlReaderSettings object). Or you can have some sort of wrapping element around the <MessageType> elements, and you'd read until the reader were positioned at the end element for the wrapper.
public class StackOverflow_7760551
{
[DataContract]
public class Person
{
[DataMember]
public string Name { get; set; }
[DataMember]
public int Age { get; set; }
public override string ToString()
{
return string.Format("Person[Name={0},Age={1}]", this.Name, this.Age);
}
}
public static void Test()
{
const string fileName = "test.xml";
using (FileStream fs = File.Create(fileName))
{
Person[] people = new Person[]
{
new Person { Name = "John", Age = 33 },
new Person { Name = "Jane", Age = 28 },
new Person { Name = "Jack", Age = 23 }
};
foreach (Person p in people)
{
XmlWriterSettings ws = new XmlWriterSettings
{
Indent = true,
IndentChars = " ",
OmitXmlDeclaration = true,
Encoding = new UTF8Encoding(false),
CloseOutput = false,
};
using (XmlWriter w = XmlWriter.Create(fs, ws))
{
DataContractSerializer dcs = new DataContractSerializer(typeof(Person));
dcs.WriteObject(w, p);
}
}
}
Console.WriteLine(File.ReadAllText(fileName));
using (FileStream fs = File.OpenRead(fileName))
{
XmlReaderSettings rs = new XmlReaderSettings
{
ConformanceLevel = ConformanceLevel.Fragment,
};
XmlReader r = XmlReader.Create(fs, rs);
while (!r.EOF)
{
Person p = new DataContractSerializer(typeof(Person)).ReadObject(r) as Person;
Console.WriteLine(p);
}
}
File.Delete(fileName);
}
}
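The wrapper-element alternative mentioned in this answer could look roughly like the sketch below. It assumes the file has the shape <Messages><Message>...</Message>...</Messages>; the wrapper name "Messages", the Message type, and the WrappedMessages helper are hypothetical names chosen for illustration, not part of the original question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;
using System.Xml;

// Hypothetical message type; Namespace = "" keeps the element names unqualified.
[DataContract(Namespace = "")]
public class Message
{
    [DataMember]
    public string Text { get; set; }
}

public static class WrappedMessages
{
    // Lazily yields one Message at a time from <Messages>...</Messages>,
    // so the whole file never has to be held in memory.
    public static IEnumerable<Message> Read(Stream input)
    {
        var serializer = new DataContractSerializer(typeof(Message));
        using (XmlReader reader = XmlReader.Create(input))
        {
            // Step into the (assumed) wrapper element.
            reader.ReadStartElement("Messages");

            // MoveToContent skips whitespace between messages; the loop ends
            // when the reader reaches the wrapper's end element.
            while (reader.MoveToContent() == XmlNodeType.Element)
            {
                yield return (Message)serializer.ReadObject(reader);
            }
        }
    }
}
```

Because the file is then a single well-formed document, the default conformance level works and no fragment settings are needed. The trade-off is that the writer has to emit the closing wrapper tag when recording ends.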
Maybe your file contains a BOM (byte order mark).
That is common for UTF-8 encoded files.
XmlSerializer xml = new XmlSerializer(typeof(MessageType));
XmlDocument xdoc = new XmlDocument();
xdoc.Load(stream);
foreach(XmlElement elm in xdoc.GetElementsByTagName("MessageType"))
{
MessageType mt = (MessageType)xml.Deserialize(new StringReader(elm.OuterXml));
}
So I have code that writes my data to a JSON file with the Newtonsoft library. The problem is that the JSON gets overwritten every time instead of being added after the previous data.
Here is my code
StringBuilder sb = new StringBuilder();
StringWriter sw = new StringWriter(sb);
using (JsonWriter writer = new JsonTextWriter(sw))
{
writer.Formatting = Formatting.Indented;
writer.WriteStartArray();
writer.WriteStartObject();
writer.WritePropertyName("Temperature");
writer.WriteValue(temperature);
writer.WritePropertyName("Score");
writer.WriteValue(score);
writer.WritePropertyName("TrackId");
writer.WriteValue(trackId);
/*
writer.WriteStartObject();
writer.WritePropertyName("CPU");
writer.WriteValue("Intel");
writer.WritePropertyName("PSU");
writer.WriteValue("500W");
writer.WritePropertyName("Drives");
writer.WriteStartArray();
writer.WriteValue("DVD read/writer");
writer.WriteComment("(broken)");
writer.WriteValue("500 gigabyte hard drive");
writer.WriteValue("200 gigabyte hard drive");
writer.WriteEnd();
*/
writer.WriteEndObject();
writer.WriteEnd();
}
System.IO.File.WriteAllText(@"C:/Users/Kimeru/Documents/Dermalog Noah WPF/data.json", sb.ToString());
This is the result I want to achieve:
[
{
"Temperature": "24.6",
"Score": "37",
"TrackId": 3
}
,
{
"Temperature": "16.8",
"Score": "38",
"TrackId": 4
}
]
I'm pretty new to the .NET coding world so I'm trying my best to explain.
I think a better solution would be to:
Read json and convert to list of class object that represents the json objects
Add, modify or remove objects from the list
Serialize the list of class objects to json
Write the new json to the file
I made a little example:
public class TrackData
{
public double Temperature { get; set; }
public double Score { get; set; }
public int TrackId { get; set; }
}
public void WriteJson(string filePath, List<TrackData> trackDataList)
{
string json = JsonConvert.SerializeObject(trackDataList);
using (StreamWriter sw = new StreamWriter(filePath))
{
sw.Write(json);
}
}
public List<TrackData> ReadJson(string filePath)
{
using (StreamReader sr = new StreamReader(filePath))
{
string json = sr.ReadToEnd();
return JsonConvert.DeserializeObject<List<TrackData>>(json);
}
}
Now you can use the methods and class this way:
List<TrackData> myTrackDataList = ReadJson("Filepath");
TrackData newTrackData = new TrackData();
newTrackData.Score = 38;
newTrackData.Temperature = 22;
newTrackData.TrackId = 5;
myTrackDataList.Add(newTrackData);
WriteJson("FilePath", myTrackDataList);
You use System.IO.File.WriteAllText(), which overwrites the existing file.
Use System.IO.File.AppendAllText() instead to add your text to the end of the file.
I don't think it's a good idea to append to a JSON file like this.
Save each object into a new JSON file so you can read it later.
Path.GetRandomFileName() should give you a unique file name (Path.GetTempFileName() returns a full path and also creates the file, so it cannot be embedded in another path).
System.IO.File.WriteAllText($@"C:/Users/Kimeru/Documents/Dermalog Noah WPF/{Path.GetRandomFileName()}_data.json", sb.ToString());
There are other ways to get unique file name generated for you
How to Generate unique file names in C#
I have a pseudo XML file with 5 small xmls in it like so:
What I am trying to achieve is separate and create a new file for each of these XMLs using MemoryStream with this code:
int flag = 0;
byte[] arr = Encoding.ASCII.GetBytes(File.ReadAllText(@"C:\Users\Aleksa\Desktop\testTxt.xml"));
for (int i = 0; i <= 5; i++)
{
MemoryStream mem = new MemoryStream(arr);
mem.Position = flag;
StreamReader rdr = new StreamReader(mem);
string st = rdr.ReadToEnd();
if (st.IndexOf("<TestNode") != -1 && (st.IndexOf("</TestNode>") != -1 || st.IndexOf("/>") != -1))
{
int curr = st.IndexOf("<TestNode");
int end = st.IndexOf("\r");
string toWrite = st.Substring(st.IndexOf("<TestNode"), end);
File.WriteAllText(@"C:\Users\Aleksa\Desktop\" + i.ToString() + ".xml", toWrite);
flag += end;
}
Console.WriteLine(st);
}
The first XML from the image gets separated and is okay, the rest are empty files, while debugging I noticed that even though I set the position to be the end variable it still streams from the top, also all iterations after the first have the end variable equal to zero!
I have tried changing the IndexOf parameter to </TestNode> + 11 which does the same as the code above except the rest of the files aren't empty but are not complete, leaving me with <TestNode a. How can I fix the logic here and split my stream of XML document(s) apart?
Your input stream consists of XML document fragments -- i.e. a series of XML root elements concatenated together.
You can read such a stream by using an XmlReader created with XmlReaderSettings.ConformanceLevel == ConformanceLevel.Fragment. From the docs:
Fragment
Ensures that the XML data conforms to the rules for a well-formed XML 1.0 document fragment.
This setting accepts XML data with multiple root elements, or text nodes at the top-level.
The following extension methods can be used for this task:
public static class XmlReaderExtensions
{
public static IEnumerable<XmlReader> ReadRoots(this XmlReader reader)
{
while (reader.Read())
{
if (reader.NodeType == XmlNodeType.Element)
{
using (var subReader = reader.ReadSubtree())
yield return subReader;
}
}
}
public static void SplitDocumentFragments(Stream stream, Func<int, string> makeFileName, Action<string, IXmlLineInfo> onFileWriting, Action<string, IXmlLineInfo> onFileWritten)
{
using (var textReader = new StreamReader(stream, Encoding.UTF8, true, 4096, true))
{
SplitDocumentFragments(textReader, makeFileName, onFileWriting, onFileWritten);
}
}
public static void SplitDocumentFragments(TextReader textReader, Func<int, string> makeFileName, Action<string, IXmlLineInfo> onFileWriting, Action<string, IXmlLineInfo> onFileWritten)
{
if (textReader == null || makeFileName == null)
throw new ArgumentNullException();
var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment, CloseInput = false };
using (var xmlReader = XmlReader.Create(textReader, settings))
{
var lineInfo = xmlReader as IXmlLineInfo;
var index = 0;
foreach (var reader in xmlReader.ReadRoots())
{
var outputName = makeFileName(index);
reader.MoveToContent();
if (onFileWriting != null)
onFileWriting(outputName, lineInfo);
using(var writer = XmlWriter.Create(outputName))
{
writer.WriteNode(reader, true);
}
index++;
if (onFileWritten != null)
onFileWritten(outputName, lineInfo);
}
}
}
}
Then you would use it as follows:
var fileName = @"C:\Users\Aleksa\Desktop\testTxt.xml";
var outputPath = ""; // The directory in which to create your XML files.
using (var stream = File.OpenRead(fileName))
{
XmlReaderExtensions.SplitDocumentFragments(stream,
index => Path.Combine(outputPath, index.ToString() + ".xml"),
(name, lineInfo) =>
{
Console.WriteLine("Writing {0}, starting line info: LineNumber = {1}, LinePosition = {2}...",
name, lineInfo?.LineNumber, lineInfo?.LinePosition);
},
(name, lineInfo) =>
{
Console.WriteLine(" Done. Result: ");
Console.Write(" ");
Console.WriteLine(File.ReadAllText(name));
});
}
And the output will look something like:
Writing 0.xml, starting line info: LineNumber = 1, LinePosition = 2...
Done. Result:
<?xml version="1.0" encoding="utf-8"?><TestNode active="1" lastName="l"><Foo /> </TestNode>
Writing 1.xml, starting line info: LineNumber = 2, LinePosition = 2...
Done. Result:
<?xml version="1.0" encoding="utf-8"?><TestNode active="2" lastName="l" />
Writing 2.xml, starting line info: LineNumber = 3, LinePosition = 2...
Done. Result:
<?xml version="1.0" encoding="utf-8"?><TestNode active="3" lastName="l"><Foo /> </TestNode>
... (others omitted).
Notes:
The method ReadRoots() reads through all the root elements of the XML fragment stream and, for each one, returns a nested reader restricted to just that specific root, by using XmlReader.ReadSubtree():
Returns a new XmlReader instance that can be used to read the current node, and all its descendants.
...
When the new XML reader has been closed, the original reader is positioned on the EndElement node of the sub-tree.
This allows callers of the method to parse each root individually without worrying about reading past the end of the root and into the next one. Then the contents of each root node can be copied to an output XmlWriter using XmlWriter.WriteNode(XmlReader, true).
You can track approximate position in the file using the IXmlLineInfo interface which is implemented by XmlReader subclasses that parse text streams. If your document fragment stream is truncated for some reason, this can help identify where the error occurs.
See: getting the current position from an XmlReader and C# how can I debug a deserialization exception? for details.
If you are parsing a string st containing your XML fragments rather than reading directly from a file, you can pass a StringReader to SplitDocumentFragments():
using (var textReader = new StringReader(st))
{
XmlReaderExtensions.SplitDocumentFragments(textReader,
// Remainder as before
Do not read an XML stream using Encoding.ASCII; any character outside the 7-bit ASCII range is replaced with '?', so non-English text is lost. Instead, use Encoding.UTF8 and/or detect the encoding from the BOM or XML declaration.
Demo fiddle here.
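To illustrate the encoding note, here is a minimal sketch contrasting an ASCII round trip with letting XmlReader work on the raw stream, where it detects the encoding from the BOM or XML declaration (defaulting to UTF-8). The EncodingSketch class and its method names are hypothetical and only serve the illustration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class EncodingSketch
{
    // Lossy: Encoding.ASCII substitutes '?' for every character it cannot represent.
    public static string RoundTripAscii(string text)
    {
        return Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(text));
    }

    // Hands the raw bytes to XmlReader so that it chooses the encoding itself,
    // then returns the named attribute of the first root element.
    public static string ReadRootAttribute(Stream stream, string attributeName)
    {
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
        using (XmlReader reader = XmlReader.Create(stream, settings))
        {
            reader.MoveToContent();
            return reader.GetAttribute(attributeName);
        }
    }
}
```

Passing the Stream (rather than a string that was already decoded with the wrong encoding) is what keeps the original characters intact.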
I am trying to deserialize a local file in my project which is a json file. However I am getting this error with the current code:
"Unexpected character encountered while parsing value: G. Path '', line 0, position 0"
C# code
string filepath = Application.StartupPath + @"\city.list.json";
for(int i = 0; i< 40; i++)
{
foreach (string x in File.ReadLines(filepath))
{
if(x.Contains("id") || x.Contains("name"))
{
var data = JsonConvert.DeserializeObject<City.values>(filepath);
//City city = JsonConvert.DeserializeObject<City>(File.ReadAllText(filepath));
//cityList.Add(data.name, data.id);
}
else
{
}
}
}
class City
{
[JsonObject(MemberSerialization = MemberSerialization.OptIn)]
public class values
{
[JsonProperty(PropertyName = "id")]
public string id { get; set; }
[JsonProperty(PropertyName = "name")]
public string name { get; set; }
}
}
Json file I am trying to deserialize from. This is just a quick sample taken out from the file. It is quite large ^^
[
{
"id":707860,
"name":"Hurzuf",
"country":"UA",
"coord":{
"lon":34.283333,
"lat":44.549999
}
},
{
"id":519188,
"name":"Novinki",
"country":"RU",
"coord":{
"lon":37.666668,
"lat":55.683334
}
},
You seem to have some misconceptions about how JSON deserialization works. First, you shouldn't be iterating through the lines of the JSON file, and DeserializeObject() expects JSON text, not a file path. As pointed out in the comments, your JSON file is very large (~1.8 million lines), so your best bet is to deserialize from a stream by passing a JsonReader to JsonSerializer.Deserialize(); see the Json.NET performance tips:
List<City.values> cities = new List<City.values>();
string filepath = Application.StartupPath + @"\city.list.json";
using (StreamReader sr = new StreamReader(filepath))
using (JsonReader reader = new JsonTextReader(sr))
{
JsonSerializer serializer = new JsonSerializer();
// Read the JSON from a stream.
// The file size matters less because only a small piece is read from the file at a time.
cities = serializer.Deserialize<List<City.values>>(reader);
}
I draw your attention to this line:
cities = serializer.Deserialize<List<City.values>>(reader);
Here we leverage Json.NET to deserialize. The key difference from your code is that your JSON is a collection of objects, which means you need to deserialize into a collection of your City.values objects; in this case I used a List<T>.
Now we have a variable cities that is a collection of the City.values objects contained in your JSON.
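If even the resulting list is too large to hold comfortably, the same reader can be advanced manually so that only one array element is materialized at a time. This is a sketch under stated assumptions: CityEntry is a hypothetical stand-in for City.values, and CityStream.ReadCities is an illustrative helper, not part of Json.NET.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

// Hypothetical stand-in for the City.values class from the question.
public class CityEntry
{
    [JsonProperty("id")]
    public string Id { get; set; }

    [JsonProperty("name")]
    public string Name { get; set; }
}

public static class CityStream
{
    // Yields the objects of a top-level JSON array one by one.
    public static IEnumerable<CityEntry> ReadCities(TextReader input)
    {
        var serializer = new JsonSerializer();
        // Disposing the JsonTextReader also closes the underlying TextReader.
        using (var reader = new JsonTextReader(input))
        {
            while (reader.Read())
            {
                // Each StartObject at this point begins one array element;
                // Deserialize consumes it up to its matching EndObject,
                // including nested objects such as "coord".
                if (reader.TokenType == JsonToken.StartObject)
                {
                    yield return serializer.Deserialize<CityEntry>(reader);
                }
            }
        }
    }
}
```

A caller would wrap a StreamReader over the file and iterate with foreach, adding each entry to a dictionary or processing it immediately.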
Using the bog-standard System.Xml.Serialization.XmlSerializer, I am serializing an object whose class inherits from another. Inspecting the resulting XML, the root node is being given the attributes "p1:type" and "xmlns:p1":
<ApiSubmission ApiVersion="1" CustId="100104" p1:type="OrderConfirmationApiSubmission"
xmlns:p1="http://www.w3.org/2001/XMLSchema-instance">
...
</ApiSubmission>
Is there a nice way to remove these attributes?
So I came upon this same issue ~5 years after this question was originally asked and was disappointed that no one had answered. After searching around I cobbled something together that allows me to strip out the type attribute in a derived class.
internal static string SerializeObject(object objectToSerialize, bool OmitXmlDeclaration = true, System.Type type = null, bool OmitType = false, bool RemoveAllNamespaces = true)
{
XmlSerializer x;
string output;
if (type != null)
{
x = new XmlSerializer(type);
}
else
{
x = new XmlSerializer(objectToSerialize.GetType());
}
XmlWriterSettings settings = new XmlWriterSettings() { Indent = false, OmitXmlDeclaration = OmitXmlDeclaration, NamespaceHandling = NamespaceHandling.OmitDuplicates };
using (StringWriter swriter = new StringWriter())
using (XmlWriter xmlwriter = XmlWriter.Create(swriter, settings))
{
x.Serialize(xmlwriter, objectToSerialize);
output = swriter.ToString();
}
if (RemoveAllNamespaces || OmitType)
{
XDocument doc = XDocument.Parse(output);
if (RemoveAllNamespaces)
{
foreach (var element in doc.Root.DescendantsAndSelf())
{
element.Name = element.Name.LocalName;
element.ReplaceAttributes(GetAttributesWithoutNamespace(element));
}
}
if (OmitType)
{
foreach (var node in doc.Descendants().Where(e => e.Attribute("type") != null))
{
node.Attribute("type").Remove();
}
}
output = doc.ToString();
}
return output;
}
I use this together with an [XmlInclude] attribute for the derived class on the base class, and call it with OmitType and RemoveAllNamespaces set to true. Essentially the derived class is then treated as if it were the base class.
I haven't been able to find a question related to my specific problem.
What I am trying to do is take a list of Xml nodes, and directly deserialize them to a List without having to create a class with attributes.
So the xml (myconfig.xml) would look something like this...
<collection>
<item>item1</item>
<item>item2</item>
<item>item3</item>
<item>etc...</item>
</collection>
In the end I would like a list of items as strings.
The code would look like this.
XmlSerializer serializer = new XmlSerializer( typeof( List<string> ) );
using (XmlReader reader = XmlReader.Create( "myconfig.xml" ))
{
List<string> itemCollection = (List<string>)serializer.Deserialize( reader );
}
I'm not 100% confident that this is possible, but I'm guessing it should be. Any help would be greatly appreciated.
Are you married to the idea of using a serializer? If not, you can try Linq-to-XML. (.NET 3.5, C# 3 [and higher])
Based on your provided XML file format, this is the simple code.
// add 'using System.Xml.Linq' to your code file
string file = #"C:\Temp\myconfig.xml";
XDocument document = XDocument.Load(file);
List<string> list = (from item in document.Root.Elements("item")
select item.Value)
.ToList();
Ok, interestingly enough I may have found half the answer by serializing an existing List.
The result I got is as follows...
This following code:
List<string> things = new List<string> { "thing1", "thing2" };
XmlSerializer serializer = new XmlSerializer(typeof(List<string>), overrides);
using (TextWriter textWriter = new StreamWriter("things.xml"))
{
serializer.Serialize(textWriter, things);
}
Outputs a result of:
<?xml version="1.0" encoding="utf-8"?>
<ArrayOfString xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<string>thing1</string>
<string>thing2</string>
</ArrayOfString>
I can override the root node by passing an XmlAttributeOverrides instance to the second parameter of the XmlSerializer constructor. It is created like this:
XmlAttributeOverrides overrides = new XmlAttributeOverrides();
XmlAttributes attributes = new XmlAttributes { XmlRoot = new XmlRootAttribute("collection") };
overrides.Add( typeof(List<string>), attributes );
This will change "ArrayOfString" to "collection". I still have not figured out how to control the name of the string element.
To customize List element names using XmlSerializer, you have to wrap the list.
[XmlRoot(Namespace="", ElementName="collection")]
public class ConfigWrapper
{
[XmlElement("item")]
public List<string> Items{ get; set;}
}
Usage:
var itemsList = new List<string>{"item1", "item2", "item3"};
var cfgIn = new ConfigWrapper{ Items = itemsList };
var xs = new XmlSerializer(typeof(ConfigWrapper));
string fileContent = null;
using (var sw = new StringWriter())
{
xs.Serialize(sw, cfgIn);
fileContent = sw.ToString();
Console.WriteLine (fileContent);
}
ConfigWrapper cfgOut = null;
using (var sr = new StringReader(fileContent))
{
cfgOut = xs.Deserialize(sr) as ConfigWrapper;
// cfgOut.Dump(); //view in LinqPad
if(cfgOut != null)
// yields 'item2'
Console.WriteLine (cfgOut.Items[1]);
}
Output:
// fileContent:
<?xml version="1.0" encoding="utf-16"?>
<collection xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<item>item1</item>
<item>item2</item>
<item>item3</item>
</collection>
If you don't want to wrap the list, the DataContractSerializer will let you give the elements custom names if you subclass List<string> and mark the subclass with [CollectionDataContract]:
[CollectionDataContract(Name = "collection", ItemName = "item", Namespace = "")]
public class ConfigWrapper : List<string>
{
public ConfigWrapper() : base() { }
public ConfigWrapper(IEnumerable<string> items) : base(items) { }
public ConfigWrapper(int capacity) : base(capacity) { }
}
Usage And Output:
var cfgIn = new ConfigWrapper{ "item1", "item2", "item3" };
var ds = new DataContractSerializer(typeof(ConfigWrapper));
string fileContent = null;
using (var ms = new MemoryStream())
{
ds.WriteObject(ms, cfgIn);
fileContent = Encoding.UTF8.GetString(ms.ToArray());
Console.WriteLine (fileContent);
}
// yields: <collection xmlns:i="http://www.w3.org/2001/XMLSchema-instance"><item>item1</item><item>item2</item><item>item3</item></collection>
ConfigWrapper cfgOut = null;
using (var sr = new StringReader(fileContent))
{
using(var xr = XmlReader.Create(sr))
{
cfgOut = ds.ReadObject(xr) as ConfigWrapper;
// cfgOut.Dump(); //view in LinqPad
if(cfgOut != null)
// yields 'item2'
Console.WriteLine (cfgOut[1]);
}
}