Get the entire CSV line on parsing error
With CsvHelper we use:
MissingFieldFound:
Gets or sets the function that is called when a missing field is found. The default
function will throw a CsvHelper.MissingFieldException. You can supply your own
function to do other things like logging the issue instead of throwing an exception.
Arguments: headerNames, index, context
BadDataFound:
Gets or sets the function that is called when bad field data is found. A field
has bad data if it contains a quote and the field is not quoted (escaped). You
can supply your own function to do other things like logging the issue instead
of throwing an exception. Arguments: context
In the following MCVE, only MissingFieldFound captures the complete line, while BadDataFound does not.
static void Main()
{
    using (var stream = new MemoryStream())
    using (var writer = new StreamWriter(stream))
    using (var reader = new StreamReader(stream))
    using (var csv = new CsvReader(reader))
    {
        writer.WriteLine("FirstName,LastName");
        writer.WriteLine("\"Jon\"hn\"\",\"Doe\"");
        writer.WriteLine("\"JaneDoe\"");
        writer.WriteLine("\"Jane\",\"Doe\"");
        writer.Flush();
        stream.Position = 0;

        var good = new List<Test>();
        var bad = new List<string>();
        var isRecordBad = false;

        csv.Configuration.BadDataFound = context =>
        {
            isRecordBad = true;
            bad.Add(context.RawRecord);
        };

        csv.Configuration.MissingFieldFound = (headerNames, index, context) =>
        {
            isRecordBad = true;
            bad.Add(context.RawRecord);
        };

        while (csv.Read())
        {
            var record = csv.GetRecord<Test>();
            if (!isRecordBad)
            {
                good.Add(record);
            }
            isRecordBad = false;
        }

        good.Dump();
        bad.Dump();
    }
}

public class Test
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}
I would like the result to be:
"Jon"hn"","Doe"
"JaneDoe"
Instead of :
"Jon"hn"",
"JaneDoe"
For long CSV files with a lot of columns, the rest of the line often has valuable information.
You can get the line with this:
csv.Parser.Context.RawRecord;
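Here is a minimal sketch of how that call could be combined with the MCVE above. It assumes a CsvHelper version where csv.Parser.Context.RawRecord is available, as in the line above, and that the raw record is complete once csv.Read() has returned (inside the callbacks themselves, the record may still be partial):

csv.Configuration.BadDataFound = context => isRecordBad = true;
csv.Configuration.MissingFieldFound = (headerNames, index, context) => isRecordBad = true;

while (csv.Read())
{
    var record = csv.GetRecord<Test>();
    if (isRecordBad)
    {
        // The parser has consumed the full row at this point, so the raw
        // record should contain the complete line rather than a prefix.
        bad.Add(csv.Parser.Context.RawRecord);
    }
    else
    {
        good.Add(record);
    }
    isRecordBad = false;
}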
Related
I have a process whereby we have written a class to import a large (ish) CSV into our app using CsvHelper (https://joshclose.github.io/CsvHelper).
I would like to compare the header to the Map to ensure the header's integrity. We get the CSV file from a 3rd party and I want to ensure it doesn't change over time and thought the best way to do this would be to compare it against the map.
We have a class set up as so (trimmed):
public class VisitExport
{
    public int? Count { get; set; }
    public string CustomerName { get; set; }
    public string CustomerAddress { get; set; }
}
And its corresponding map (also trimmed):
public class VisitMap : ClassMap<VisitExport>
{
    public VisitMap()
    {
        Map(m => m.Count).Name("Count");
        Map(m => m.CustomerName).Name("Customer Name");
        Map(m => m.CustomerAddress).Name("Customer Address");
    }
}
This is the code I have for reading the CSV file and it works great. I have a try/catch in place for the error, but if it fails specifically because of a header mismatch, I'd like to handle that case on its own.
private void fileLoadedLink_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)
{
    try
    {
        var filePath = string.Empty;
        data = new List<VisitExport>();

        using (OpenFileDialog openFileDialog = new OpenFileDialog())
        {
            openFileDialog.InitialDirectory = new KnownFolder(KnownFolderType.Downloads).Path;
            openFileDialog.Filter = "csv files (*.csv)|*.csv";
            openFileDialog.FilterIndex = 2;
            openFileDialog.RestoreDirectory = true;

            if (openFileDialog.ShowDialog() == DialogResult.OK)
            {
                filePath = openFileDialog.FileName;
                var fileStream = openFileDialog.OpenFile();
                var culture = CultureInfo.GetCultureInfo("en-GB");

                using (StreamReader reader = new StreamReader(fileStream))
                using (var readCsv = new CsvReader(reader, culture))
                {
                    var map = new VisitMap();
                    readCsv.Context.RegisterClassMap(map);
                    var fileContent = readCsv.GetRecords<VisitExport>();
                    data = fileContent.ToList();
                    fileLoadedLink.Text = filePath;
                    viewModel.IsFileLoaded = true;
                }
            }
        }
    }
    catch (CsvHelperException ex)
    {
        Console.WriteLine(ex.InnerException != null ? ex.InnerException.Message : ex.Message);
        fileLoadedLink.Text = "Error loading file.";
        viewModel.IsFileLoaded = false;
    }
}
Is there a way of comparing the Csv header vs my map?
There are two basic cases for CSV files with headers: missing CSV columns, and extra CSV columns. The first is already detected by CsvHelper while the detection of the second is not implemented out of the box and requires subclassing of CsvReader.
(As CsvHelper maps CSV columns to model properties by name, permuting the order of the columns in the CSV file would not be considered a breaking change.)
Note that this only applies to CSV files that actually contain headers. Since you are not setting CsvConfiguration.HasHeaderRecord = false I assume that this applies to your use case.
Details about each of the two cases follow.
Missing CSV columns.
Currently CsvHelper already throws an exception by default in such situations. When unmapped data model properties are found, CsvConfiguration.HeaderValidated is invoked. By default this is set to ConfigurationFunctions.HeaderValidated whose current behavior is to throw a HeaderValidationException if there are any unmapped model properties. You can replace or extend HeaderValidated with logic of your own if you prefer:
var culture = CultureInfo.GetCultureInfo("en-GB");
var config = new CsvConfiguration(culture)
{
    HeaderValidated = (args) =>
    {
        // Add additional logic as required here
        ConfigurationFunctions.HeaderValidated(args);
    },
};
using (var readCsv = new CsvReader(reader, config))
{
    // Remainder unchanged
Demo fiddle #1 here.
Extra CSV columns.
Currently CsvHelper does not inform the application when this happens. See Throw if csv contains unexpected columns #1032 which confirms that this is not implemented out of the box.
In a GitHub comment, user leopignataro suggests a workaround, which is to subclass CsvReader and add the necessary validation logic oneself. However the version shown in the comment doesn't seem to handle duplicated column names or embedded references. The following subclass of CsvReader should do this correctly. It is based on the logic in CsvReader.ValidateHeader(ClassMap map, List<InvalidHeader> invalidHeaders). It recursively walks the incoming ClassMap, attempts to find a CSV header corresponding to each member or constructor parameter, and flags the index of each one that is mapped. Afterwards, if there are any unmapped headers, the supplied Action<CsvContext, List<string>> OnUnmappedCsvHeaders is invoked to notify the application of the problem and throw some exception if desired:
public class ValidatingCsvReader : CsvReader
{
    public ValidatingCsvReader(TextReader reader, CultureInfo culture, bool leaveOpen = false) : this(new CsvParser(reader, culture, leaveOpen)) { }
    public ValidatingCsvReader(TextReader reader, CsvConfiguration configuration) : this(new CsvParser(reader, configuration)) { }
    public ValidatingCsvReader(IParser parser) : base(parser) { }

    public Action<CsvContext, List<string>> OnUnmappedCsvHeaders { get; set; }

    public override void ValidateHeader(Type type)
    {
        base.ValidateHeader(type);
        var headerRecord = HeaderRecord;
        var mapped = new BitArray(headerRecord.Length);
        var map = Context.Maps[type];
        FlagMappedHeaders(map, mapped);
        var unmappedHeaders = Enumerable.Range(0, headerRecord.Length).Where(i => !mapped[i]).Select(i => headerRecord[i]).ToList();
        if (unmappedHeaders.Count > 0)
        {
            OnUnmappedCsvHeaders?.Invoke(Context, unmappedHeaders);
        }
    }

    protected virtual void FlagMappedHeaders(ClassMap map, BitArray mapped)
    {
        // Logic adapted from https://github.com/JoshClose/CsvHelper/blob/0d753ff09294b425e4bc5ab346145702eeeb1b6f/src/CsvHelper/CsvReader.cs#L157
        // By https://github.com/JoshClose
        foreach (var parameter in map.ParameterMaps)
        {
            if (parameter.Data.Ignore)
                continue;
            if (parameter.Data.IsConstantSet)
                // If ConvertUsing and Constant don't require a header.
                continue;
            if (parameter.Data.IsIndexSet && !parameter.Data.IsNameSet)
                // If there is only an index set, we don't want to validate the header name.
                continue;
            if (parameter.ConstructorTypeMap != null)
            {
                FlagMappedHeaders(parameter.ConstructorTypeMap, mapped);
            }
            else if (parameter.ReferenceMap != null)
            {
                FlagMappedHeaders(parameter.ReferenceMap.Data.Mapping, mapped);
            }
            else
            {
                var index = GetFieldIndex(parameter.Data.Names.ToArray(), parameter.Data.NameIndex, true);
                if (index >= 0)
                    mapped.Set(index, true);
            }
        }

        foreach (var memberMap in map.MemberMaps)
        {
            if (memberMap.Data.Ignore || !CanRead(memberMap))
                continue;
            if (memberMap.Data.ReadingConvertExpression != null || memberMap.Data.IsConstantSet)
                // If ConvertUsing and Constant don't require a header.
                continue;
            if (memberMap.Data.IsIndexSet && !memberMap.Data.IsNameSet)
                // If there is only an index set, we don't want to validate the header name.
                continue;
            var index = GetFieldIndex(memberMap.Data.Names.ToArray(), memberMap.Data.NameIndex, true);
            if (index >= 0)
                mapped.Set(index, true);
        }

        foreach (var referenceMap in map.ReferenceMaps)
        {
            if (!CanRead(referenceMap))
                continue;
            FlagMappedHeaders(referenceMap.Data.Mapping, mapped);
        }
    }
}
And then in your code, handle the OnUnmappedCsvHeaders callback however you would like, such as by throwing a CsvHelperException or some other custom exception:
using (var readCsv = new ValidatingCsvReader(reader, culture)
{
    OnUnmappedCsvHeaders = (context, headers) => throw new CsvHelperException(context, string.Format("Unmapped CSV headers: \"{0}\"", string.Join(",", headers))),
})
Demo fiddles:
#2 (your model).
#3 (with external references).
#4 (duplicate names).
#5 (using the auto-generated map).
This could use additional testing, e.g. for data models with parameterized constructors and additional, mutable properties.
How about catching HeaderValidationException before catching CsvHelperException?
catch (HeaderValidationException ex)
{
    var message = ex.Message.Split('\n')[0];
    var currentHeader = ex.Context.Reader.HeaderRecord;
    message += $"{Environment.NewLine}Header: \"{string.Join(",", currentHeader)}\"";
    Console.WriteLine(message);
    fileLoadedLink.Text = "Error loading file.";
    viewModel.IsFileLoaded = false;
}
catch (CsvHelperException ex)
{
    Console.WriteLine(ex.InnerException != null ? ex.InnerException.Message : ex.Message);
    fileLoadedLink.Text = "Error loading file.";
    viewModel.IsFileLoaded = false;
}
Below is the code I am using to read a stream source of CSV files, but I get the error "No header record found". The library version is 15.0 and I am already using .ToList() as suggested in some solutions, but the error persists. Below is the method along with the TableField class and the ReadStream method.
Also note that I can get the desired result if I pass the source as a MemoryStream, but it fails if I pass it as a Stream, and I need to avoid writing to memory each time.
public async Task<Stream> DownloadBlob(string containerName, string fileName, string connectionString)
{
    // MemoryStream memoryStream = new MemoryStream();
    if (string.IsNullOrEmpty(connectionString))
    {
        connectionString = @"UseDevelopmentStorage=true";
        containerName = "testblobs";
    }

    Microsoft.Azure.Storage.CloudStorageAccount storageAccount = Microsoft.Azure.Storage.CloudStorageAccount.Parse(connectionString);
    CloudBlobClient serviceClient = storageAccount.CreateCloudBlobClient();
    CloudBlobContainer container = serviceClient.GetContainerReference(containerName);
    CloudBlockBlob blob = container.GetBlockBlobReference(fileName);

    if (!blob.Exists())
    {
        throw new Exception($"Blob Not found");
    }

    return await blob.OpenReadAsync();
}
public class TableField
{
    public string Name { get; set; }
    public string Type { get; set; }
    public Type DataType
    {
        get
        {
            switch (Type.ToUpper())
            {
                case "STRING":
                    return typeof(string);
                case "INT":
                    return typeof(int);
                case "BOOL":
                case "BOOLEAN":
                    return typeof(bool);
                case "FLOAT":
                case "SINGLE":
                case "DOUBLE":
                    return typeof(double);
                case "DATETIME":
                    return typeof(DateTime);
                default:
                    throw new NotSupportedException($"CSVColumn data type '{Type}' not supported");
            }
        }
    }
}
private IEnumerable<Dictionary<string, EntityProperty>> ReadCSV(Stream source, IEnumerable<TableField> cols)
{
    using (TextReader reader = new StreamReader(source, Encoding.UTF8))
    {
        var cache = new TypeConverterCache();
        cache.AddConverter<float>(new CSVSingleConverter());
        cache.AddConverter<double>(new CSVDoubleConverter());

        var csv = new CsvReader(reader,
            new CsvHelper.Configuration.CsvConfiguration(global::System.Globalization.CultureInfo.InvariantCulture)
            {
                Delimiter = ";",
                HasHeaderRecord = true,
                CultureInfo = global::System.Globalization.CultureInfo.InvariantCulture,
                TypeConverterCache = cache
            });

        csv.Read();
        csv.ReadHeader();

        var map = (
            from col in cols
            from src in col.Sources()
            let index = csv.GetFieldIndex(src, isTryGet: true)
            where index != -1
            select new { col.Name, Index = index, Type = col.DataType }).ToList();

        while (csv.Read())
        {
            yield return map.ToDictionary(
                col => col.Name,
                col => EntityProperty.CreateEntityPropertyFromObject(csv.GetField(col.Type, col.Index)));
        }
    }
}
StreamReading code:
public async Task<Stream> ReadStream(string containerName, string digestFileName, string fileName, string connectionString)
{
    string data = string.Empty;
    string fileExtension = Path.GetExtension(fileName);
    var contents = await DownloadBlob(containerName, digestFileName, connectionString);
    return contents;
}
Sample CSV to be read:
PartitionKey;Time;RowKey;State;RPM;Distance;RespirationConfidence;HeartBPM
te123;2020-11-06T13:33:37.593Z;10;1;8;20946;26;815
te123;2020-11-06T13:33:37.593Z;4;2;79944;8;36635;6
te123;2020-11-06T13:33:37.593Z;3;3;80042;9;8774;5
te123;2020-11-06T13:33:37.593Z;1;4;0;06642;6925;37
te123;2020-11-06T13:33:37.593Z;6;5;04740;74753;94628;21
te123;2020-11-06T13:33:37.593Z;7;6;6;2;14;629
te123;2020-11-06T13:33:37.593Z;9;7;126;86296;9157;05
te123;2020-11-06T13:33:37.593Z;5;8;5;3;7775;08
te123;2020-11-06T13:33:37.593Z;2;9;44363;65;70;229
te123;2020-11-06T13:33:37.593Z;8;10;02;24666;2;2
I have tried to reproduce the problem with version 15.0 of the library, but failed because of the classes CSVSingleConverter and CSVDoubleConverter (which are not shown). With the standard classes of CsvHelper, however, reading the header works:
using System;
using System.IO;
using System.Text;
using CsvHelper;
using CsvHelper.TypeConversion;

namespace ConsoleApp2
{
    class Program
    {
        static void Main(string[] args)
        {
            using (Stream stream = new FileStream(@"e:\demo.csv", FileMode.Open, FileAccess.Read))
            {
                ReadCSV(stream);
            }
        }

        private static void ReadCSV(Stream source)
        {
            using (TextReader reader = new StreamReader(source, Encoding.UTF8))
            {
                var cache = new TypeConverterCache();
                cache.AddConverter<float>(new SingleConverter());
                cache.AddConverter<double>(new DoubleConverter());

                var csv = new CsvReader(reader,
                    new CsvHelper.Configuration.CsvConfiguration(global::System.Globalization.CultureInfo.InvariantCulture)
                    {
                        Delimiter = ";",
                        HasHeaderRecord = true,
                        CultureInfo = global::System.Globalization.CultureInfo.InvariantCulture,
                        TypeConverterCache = cache
                    });

                csv.Read();
                csv.ReadHeader();

                foreach (string headerRow in csv.Context.HeaderRecord)
                {
                    Console.WriteLine(headerRow);
                }
            }
        }
    }
}
I've changed the lines ...
cache.AddConverter<float>(new CSVSingleConverter());
cache.AddConverter<double>(new CSVDoubleConverter());
... to ...
cache.AddConverter<float>(new SingleConverter());
cache.AddConverter<double>(new DoubleConverter());
I put the CSV data into a UTF-8 text file. Output at the console is:
PartitionKey
Time
RowKey
State
RPM
Distance
RespirationConfidence
HeartBPM
EDIT 2020-12-24:
Put the whole source text online, not just part of it.
Related to my answer to your other question (it has more detail; you can read it there), I didn't encounter any problem connecting CsvHelper to a blob-storage-sourced stream.
This was the code used (I took the CSV data you posted, added it to a file, upped it to blob):
public partial class Form1 : Form
{
    public Form1()
    {
        InitializeComponent();
    }

    private async void button1_Click(object sender, EventArgs e)
    {
        var cstr = "YOUR CONNSTR HERE";
        var bbc = new BlockBlobClient(cstr, "temp", "ankit.csv");
        var s = await bbc.OpenReadAsync(new BlobOpenReadOptions(true) { BufferSize = 16384 });
        var sr = new StreamReader(s);
        var csv = new CsvHelper.CsvReader(sr, new CsvConfiguration(CultureInfo.CurrentCulture) { HasHeaderRecord = true, Delimiter = ";" });

        // try by read/getrecord
        while (await csv.ReadAsync())
        {
            var rec = csv.GetRecord<X>();
            Console.WriteLine(rec.PartitionKey);
        }

        var x = new X();

        // try by await foreach
        await foreach (var r in csv.EnumerateRecordsAsync(x))
        {
            Console.WriteLine(r.PartitionKey);
        }
    }
}

class X
{
    public string PartitionKey { get; set; }
}
Try setting the source stream back to the start.
private IEnumerable<Dictionary<string, EntityProperty>> ReadCSV(Stream source, IEnumerable<TableField> cols)
{
    source.Position = 0;
You also can't use yield return there. It delays execution of the code until you access the IEnumerable<Dictionary<string, EntityProperty>> returned from the ReadCSV method. The problem is at that point you have already closed the using statement with the TextReader that CsvHelper needs to read your data, so you get a NullReferenceException.
You either need to remove the yield return
var result = new List<Dictionary<string, EntityProperty>>();
while (csv.Read())
{
    // Add to result
}
return result;
Or pass the TextReader to your method. Any enumeration of the IEnumerable<Dictionary<string, EntityProperty>> must occur before leaving the using statement which will dispose of the TextReader needed by the CsvReader.
IEnumerable<Dictionary<string, EntityProperty>> result;
using (TextReader reader = new StreamReader(source, Encoding.UTF8))
{
    // Calling ToList() will enumerate your yield statement
    result = ReadCSV(reader, cols).ToList();
}
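For illustration, here is a minimal sketch of the second option applied to the ReadCSV method from the question; the TableField, Sources() and converter types are assumed to exist exactly as posted, and only the signature and the removed inner using block change:

private IEnumerable<Dictionary<string, EntityProperty>> ReadCSV(TextReader reader, IEnumerable<TableField> cols)
{
    var cache = new TypeConverterCache();
    cache.AddConverter<float>(new CSVSingleConverter());
    cache.AddConverter<double>(new CSVDoubleConverter());

    var csv = new CsvReader(reader,
        new CsvHelper.Configuration.CsvConfiguration(global::System.Globalization.CultureInfo.InvariantCulture)
        {
            Delimiter = ";",
            HasHeaderRecord = true,
            CultureInfo = global::System.Globalization.CultureInfo.InvariantCulture,
            TypeConverterCache = cache
        });

    csv.Read();
    csv.ReadHeader();

    // ... build the column map and yield the rows exactly as in the original method,
    // just without the inner using (TextReader ...) block: the caller now owns the
    // reader's lifetime, so enumeration (e.g. via ToList()) happens while it is open.
}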
I was getting the same error 'No header found...' and this was after several hundred successful reads of the same file. I added the delimiter=","
reader = csv.reader(filename, delimiter=",")
and that solved the problem. I think csv.reader will attempt to determine the delimiter if it is not specified, and fails after a while, maybe a memory leak? The comma is the default, but if the reader has to programmatically determine it, it is more likely to fail.
I have a csv file with the following data:
500000,0.005,6000
690000,0.003,5200
I need to add each line as a separate array. So 500000, 0.005, 6000 would be array1. How would I do this?
Currently my code adds each column into one element.
For example data[0] is showing 500000
690000
static void ReadFromFile(string filePath)
{
    try
    {
        // Create an instance of StreamReader to read from a file.
        // The using statement also closes the StreamReader.
        using (StreamReader sr = new StreamReader(filePath))
        {
            string line;
            // Read and display lines from the file until the end of
            // the file is reached.
            while ((line = sr.ReadLine()) != null)
            {
                string[] data = line.Split(',');
                Console.WriteLine(data[0] + " " + data[1]);
            }
        }
    }
    catch (Exception e)
    {
        // Let the user know what went wrong.
        Console.WriteLine("The file could not be read:");
        Console.WriteLine(e.Message);
    }
}
Using the limited data set you've provided...
const string test = @"500000,0.005,6000
690000,0.003,5200";

var result = test.Split('\n')
    .Select(x => x.Split(',')
        .Select(y => Convert.ToDecimal(y))
        .ToArray()
    )
    .ToArray();

foreach (var element in result)
{
    Console.WriteLine($"{element[0]}, {element[1]}, {element[2]}");
}
Can it be done without LINQ? Yes, but it's messy...
const string test = @"500000,0.005,6000
690000,0.003,5200";

List<decimal[]> resultList = new List<decimal[]>();
string[] lines = test.Split('\n');

foreach (var line in lines)
{
    List<decimal> decimalValueList = new List<decimal>();
    string[] splitValuesByComma = line.Split(',');
    foreach (string value in splitValuesByComma)
    {
        decimal convertedValue = Convert.ToDecimal(value);
        decimalValueList.Add(convertedValue);
    }
    decimal[] decimalValueArray = decimalValueList.ToArray();
    resultList.Add(decimalValueArray);
}

decimal[][] resultArray = resultList.ToArray();
That will give the exact same output as what I've done with the first example
If you can use a List<string[]>, you do not have to worry about the array length.
In the following example, the variable lines will be a list of arrays, like:
["500000", "0.005", "6000"]
["690000", "0.003", "5200"]
static void ReadFromFile(string filePath)
{
    try
    {
        // Create an instance of StreamReader to read from a file.
        // The using statement also closes the StreamReader.
        using (StreamReader sr = new StreamReader(filePath))
        {
            List<string[]> lines = new List<string[]>();
            string line;
            // Read and display lines from the file until the end of
            // the file is reached.
            while ((line = sr.ReadLine()) != null)
            {
                string[] splittedLine = line.Split(',');
                lines.Add(splittedLine);
            }
        }
    }
    catch (Exception e)
    {
        // Let the user know what went wrong.
        Console.WriteLine("The file could not be read:");
        Console.WriteLine(e.Message);
    }
}
While others have shown split-based methods, I will take a more structured, explicitly specified approach.
You have some CSV values in a file. Give the object stored in the CSV a name, name every column, and give each column a type.
Define the default values of those fields. Define what happens for missing columns and malformed fields. Is there a header?
Now that you know what you have, define what you want. This time again: object name -> property -> type.
Believe it or not, simply defining your input and output solves your issue.
Use CsvHelper to simplify your code.
CSV File Definition:
public class CsvItem_WithARealName
{
    public int data1;
    public decimal data2;
    public int goodVariableNames;
}

public class CsvItemMapper : ClassMap<CsvItem_WithARealName>
{
    public CsvItemMapper()
    {
        // Mapping based on index, because the file has no header.
        Map(m => m.data1).Index(0);
        Map(m => m.data2).Index(1);
        Map(m => m.goodVariableNames).Index(2);
    }
}
A CSV reader method: point it at a document and it will give you the CSV items.
Here we have some configuration: no header, and InvariantCulture for decimal conversion.
private IEnumerable<CsvItem_WithARealName> GetCsvItems(string filePath)
{
    using (var fileReader = File.OpenText(filePath))
    using (var csvReader = new CsvHelper.CsvReader(fileReader))
    {
        csvReader.Configuration.CultureInfo = CultureInfo.InvariantCulture;
        csvReader.Configuration.HasHeaderRecord = false;
        csvReader.Configuration.RegisterClassMap<CsvItemMapper>();

        while (csvReader.Read())
        {
            var record = csvReader.GetRecord<CsvItem_WithARealName>();
            yield return record;
        }
    }
}
Usage:
var filename = "csvExemple.txt";
var items = GetCsvItems(filename);
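For completeness, the returned sequence can be consumed like any other IEnumerable; this is purely illustrative and uses the field names from CsvItem_WithARealName above:

foreach (var item in GetCsvItems("csvExemple.txt"))
{
    // Each record corresponds to one CSV line.
    Console.WriteLine($"{item.data1} ; {item.data2} ; {item.goodVariableNames}");
}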
There is an error in XML document (8, 20). Inner 1: Unexpected XML declaration. The XML declaration must be the first node in the document, and no white space characters are allowed to appear before it.
OK, I understand this error.
How I get it, however, is what perplexes me.
I create the document with Microsoft's Serialize tool. Then, I turn around and attempt to read it back, again, using Microsoft's Deserialize tool.
I am not in control of writing the XML file in the correct format, as far as I can see.
Here is the single routine I use to read and write.
private string xmlPath = System.Web.Hosting.HostingEnvironment.MapPath(WebConfigurationManager.AppSettings["DATA_XML"]);
private object objLock = new Object();

public string ErrorMessage { get; set; }

public StoredMsgs Operation(string from, string message, FileAccess access) {
    StoredMsgs list = null;
    lock (objLock) {
        ErrorMessage = null;
        try {
            if (!File.Exists(xmlPath)) {
                var root = new XmlRootAttribute(rootName);
                var serializer = new XmlSerializer(typeof(StoredMsgs), root);
                if (String.IsNullOrEmpty(message)) {
                    from = "Code Window";
                    message = "Created File";
                }
                var item = new StoredMsg() {
                    From = from,
                    Date = DateTime.Now.ToString("s"),
                    Message = message
                };
                using (var stream = File.Create(xmlPath)) {
                    list = new StoredMsgs();
                    list.Add(item);
                    serializer.Serialize(stream, list);
                }
            } else {
                var root = new XmlRootAttribute("MessageHistory");
                var serializer = new XmlSerializer(typeof(StoredMsgs), root);
                var item = new StoredMsg() {
                    From = from,
                    Date = DateTime.Now.ToString("s"),
                    Message = message
                };
                using (var stream = File.Open(xmlPath, FileMode.Open, FileAccess.ReadWrite)) {
                    list = (StoredMsgs)serializer.Deserialize(stream);
                    if ((access == FileAccess.ReadWrite) || (access == FileAccess.Write)) {
                        list.Add(item);
                        serializer.Serialize(stream, list);
                    }
                }
            }
        } catch (Exception error) {
            var sb = new StringBuilder();
            int index = 0;
            sb.AppendLine(String.Format("Top Level Error: <b>{0}</b>", error.Message));
            var err = error.InnerException;
            while (err != null) {
                index++;
                sb.AppendLine(String.Format("\tInner {0}: {1}", index, err.Message));
                err = err.InnerException;
            }
            ErrorMessage = sb.ToString();
        }
    }
    return list;
}
Is something wrong with my routine? If Microsoft writes the file, it seems to me that it should be able to read it back.
It should be generic enough for anyone to use.
Here is my StoredMsg class:
[Serializable()]
[XmlType("StoredMessage")]
public class StoredMessage {
    public StoredMessage() {
    }
    [XmlElement("From")]
    public string From { get; set; }
    [XmlElement("Date")]
    public string Date { get; set; }
    [XmlElement("Message")]
    public string Message { get; set; }
}

[Serializable()]
[XmlRoot("MessageHistory")]
public class MessageHistory : List<StoredMessage> {
}
The file it generates doesn't look to me like it has any issues.
I saw the solution here:
Error: The XML declaration must be the first node in the document
But, in that case, it seems someone already had an XML document they wanted to read. They just had to fix it.
I have an XML document created my Microsoft, so it should be read back in by Microsoft.
The problem is that you are adding to the file. You deserialize, then re-serialize to the same stream without rewinding and resizing to zero. This gives you multiple root elements:
<?xml version="1.0"?>
<StoredMessage>
</StoredMessage>
<?xml version="1.0"?>
<StoredMessage>
</StoredMessage>
Multiple root elements, and multiple XML declarations, are invalid according to the XML standard, thus the .NET XML parser throws an exception in this situation by default.
For possible solutions, see XML Error: There are multiple root elements, which suggests you either:
Enclose your list of StoredMessage elements in some synthetic outer element, e.g. StoredMessageList.
This would require you to load the list of messages from the file, add the new message, and then truncate the file and re-serialize the entire list when adding a single item. Thus the performance may be worse than in your current approach, but the XML will be valid.
When deserializing a file containing concatenated root elements, create an XML reader using XmlReaderSettings.ConformanceLevel = ConformanceLevel.Fragment and iteratively walk through the concatenated root node(s) and deserialize each one individually as shown, e.g., here. Using ConformanceLevel.Fragment allows the reader to parse streams with multiple root elements (although multiple XML declarations will still cause an error to be thrown).
Later, when adding a new element to the end of the file using XmlSerializer, seek to the end of the file and serialize using an XML writer returned from XmlWriter.Create(TextWriter, XmlWriterSettings)
with XmlWriterSettings.OmitXmlDeclaration = true. This prevents output of multiple XML declarations as explained here.
For option #2, your Operation would look something like the following:
private string xmlPath = System.Web.Hosting.HostingEnvironment.MapPath(WebConfigurationManager.AppSettings["DATA_XML"]);
private object objLock = new Object();

public string ErrorMessage { get; set; }

const string rootName = "MessageHistory";
static readonly XmlSerializer serializer = new XmlSerializer(typeof(StoredMessage), new XmlRootAttribute(rootName));

public MessageHistory Operation(string from, string message, FileAccess access)
{
    var list = new MessageHistory();
    lock (objLock)
    {
        ErrorMessage = null;
        try
        {
            using (var file = File.Open(xmlPath, FileMode.OpenOrCreate))
            {
                list.AddRange(XmlSerializerHelper.ReadObjects<StoredMessage>(file, false, serializer));
                if (list.Count == 0 && String.IsNullOrEmpty(message))
                {
                    from = "Code Window";
                    message = "Created File";
                }
                var item = new StoredMessage()
                {
                    From = from,
                    Date = DateTime.Now.ToString("s"),
                    Message = message
                };
                if ((access == FileAccess.ReadWrite) || (access == FileAccess.Write))
                {
                    file.Seek(0, SeekOrigin.End);
                    var writerSettings = new XmlWriterSettings
                    {
                        OmitXmlDeclaration = true,
                        Indent = true, // Optional; remove if compact XML is desired.
                    };
                    using (var textWriter = new StreamWriter(file))
                    {
                        if (list.Count > 0)
                            textWriter.WriteLine();
                        using (var xmlWriter = XmlWriter.Create(textWriter, writerSettings))
                        {
                            serializer.Serialize(xmlWriter, item);
                        }
                    }
                }
                list.Add(item);
            }
        }
        catch (Exception error)
        {
            var sb = new StringBuilder();
            int index = 0;
            sb.AppendLine(String.Format("Top Level Error: <b>{0}</b>", error.Message));
            var err = error.InnerException;
            while (err != null)
            {
                index++;
                sb.AppendLine(String.Format("\tInner {0}: {1}", index, err.Message));
                err = err.InnerException;
            }
            ErrorMessage = sb.ToString();
        }
    }
    return list;
}
Using the following extension method adapted from Read nodes of a xml file in C#:
public partial class XmlSerializerHelper
{
    public static List<T> ReadObjects<T>(Stream stream, bool closeInput = true, XmlSerializer serializer = null)
    {
        var list = new List<T>();
        serializer = serializer ?? new XmlSerializer(typeof(T));
        var settings = new XmlReaderSettings
        {
            ConformanceLevel = ConformanceLevel.Fragment,
            CloseInput = closeInput,
        };
        using (var xmlTextReader = XmlReader.Create(stream, settings))
        {
            while (xmlTextReader.Read())
            {
                // Skip whitespace
                if (xmlTextReader.NodeType == XmlNodeType.Element)
                {
                    using (var subReader = xmlTextReader.ReadSubtree())
                    {
                        var logEvent = (T)serializer.Deserialize(subReader);
                        list.Add(logEvent);
                    }
                }
            }
        }
        return list;
    }
}
Note that if you are going to create an XmlSerializer using a custom XmlRootAttribute, you must cache the serializer to avoid a memory leak.
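A minimal sketch of such a cache follows; the class name and dictionary key are illustrative rather than part of the answer's fiddle, and the point is simply that each distinct type/root-name pair constructs its XmlSerializer only once:

using System;
using System.Collections.Generic;
using System.Xml.Serialization;

static class XmlSerializerCache
{
    static readonly Dictionary<(Type, string), XmlSerializer> cache = new Dictionary<(Type, string), XmlSerializer>();
    static readonly object sync = new object();

    public static XmlSerializer Get(Type type, string rootName)
    {
        lock (sync)
        {
            if (!cache.TryGetValue((type, rootName), out var serializer))
            {
                // Constructing with a custom XmlRootAttribute generates a new
                // dynamic assembly each time, so reuse the instance.
                serializer = new XmlSerializer(type, new XmlRootAttribute(rootName));
                cache[(type, rootName)] = serializer;
            }
            return serializer;
        }
    }
}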
Sample fiddle.
Is there a way to use the SQL Server 2012 Microsoft.SqlServer.Dac Namespace to determine if a database has an identical schema to that described by a DacPackage object? I've looked at the API docs for DacPackage as well as DacServices, but not having any luck; am I missing something?
Yes there is, I have been using the following technique since 2012 without issue.
Calculate a fingerprint of the dacpac.
Store that fingerprint in the target database.
The .dacpac is just a zip file containing goodies like metadata, and
model information.
Here's a screen-grab of what you will find in the .dacpac:
The file model.xml has XML structured like the following
<DataSchemaModel>
    <Header>
        ... developer specific stuff is in here
    </Header>
    <Model>
        ... database model definition is in here
    </Model>
</DataSchemaModel>
What we need to do is extract the contents from <Model>...</Model>
and treat this as the fingerprint of the schema.
"But wait!" you say. "Origin.xml has the following nodes:"
<Checksums>
<Checksum Uri="/model.xml">EB1B87793DB57B3BB5D4D9826D5566B42FA956EDF711BB96F713D06BA3D309DE</Checksum>
</Checksums>
In my experience, this <Checksum> node changes regardless of a schema change in the model.
So let's get to it.
Calculate the fingerprint of the dacpac.
using System;
using System.IO;
using System.IO.Packaging;
using System.Security.Cryptography;
using System.Text;
using System.Xml;

static string DacPacFingerprint(byte[] dacPacBytes)
{
    using (var ms = new MemoryStream(dacPacBytes))
    using (var package = ZipPackage.Open(ms))
    {
        var modelFile = package.GetPart(new Uri("/model.xml", UriKind.Relative));
        using (var streamReader = new System.IO.StreamReader(modelFile.GetStream()))
        {
            var xmlDoc = new XmlDocument() { InnerXml = streamReader.ReadToEnd() };
            foreach (XmlNode childNode in xmlDoc.DocumentElement.ChildNodes)
            {
                if (childNode.Name == "Header")
                {
                    // skip the Header node as described
                    xmlDoc.DocumentElement.RemoveChild(childNode);
                    break;
                }
            }
            using (var crypto = new SHA512CryptoServiceProvider())
            {
                byte[] retVal = crypto.ComputeHash(Encoding.UTF8.GetBytes(xmlDoc.InnerXml));
                return BitConverter.ToString(retVal).Replace("-", ""); // hex string
            }
        }
    }
}
With this fingerprint now available, pseudo code for applying a dacpac can be:
void main()
{
    var dacpacBytes = File.ReadAllBytes("<path-to-dacpac>");
    var dacpacFingerPrint = DacPacFingerprint(dacpacBytes); // see above
    var databaseFingerPrint = Database.GetFingerprint();    // however you choose to do this
    if (databaseFingerPrint != dacpacFingerPrint)
    {
        DeployDacpac(...);                                  // however you choose to do this
        Database.SetFingerprint(dacpacFingerPrint);         // however you choose to do this
    }
}
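Database.GetFingerprint and Database.SetFingerprint are deliberately left open above. Purely as an illustration (the table name and schema are hypothetical, not part of the answer), the fingerprint could be kept in a small single-row table in the target database:

using System.Data.SqlClient;

static string GetFingerprint(string connectionString)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(@"
        IF OBJECT_ID(N'dbo.DacpacFingerprint') IS NULL
            SELECT CAST(NULL AS NVARCHAR(256))
        ELSE
            SELECT TOP 1 Fingerprint FROM dbo.DacpacFingerprint", conn))
    {
        conn.Open();
        return cmd.ExecuteScalar() as string; // null means "never deployed"
    }
}

static void SetFingerprint(string connectionString, string fingerprint)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(@"
        IF OBJECT_ID(N'dbo.DacpacFingerprint') IS NULL
            CREATE TABLE dbo.DacpacFingerprint (Fingerprint NVARCHAR(256) NOT NULL);
        DELETE FROM dbo.DacpacFingerprint;
        INSERT INTO dbo.DacpacFingerprint (Fingerprint) VALUES (@fp);", conn))
    {
        cmd.Parameters.AddWithValue("@fp", fingerprint);
        conn.Open();
        cmd.ExecuteNonQuery();
    }
}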
Here's what I've come up with, but I'm not really crazy about it. If anyone can point out any bugs, edge cases, or better approaches, I'd be much obliged.
...
DacServices dacSvc = new DacServices(connectionString);
string deployScript = dacSvc.GenerateDeployScript(myDacpac, @"aDb", deployOptions);
if (DatabaseEqualsDacPackage(deployScript))
{
    Console.WriteLine("The database and the DacPackage are equal");
}
...
bool DatabaseEqualsDacPackage(string deployScript)
{
    string equalStr = string.Format("GO{0}USE [$(DatabaseName)];{0}{0}{0}GO{0}PRINT N'Update complete.'{0}GO", Environment.NewLine);
    return deployScript.Contains(equalStr);
}
...
What I really don't like about this approach is that it's entirely dependent upon the format of the generated deployment script, and therefore extremely brittle. Questions, comments and suggestions very welcome.
@Aaron Hudon's answer does not account for post-deployment script changes. Sometimes you just add a new entry to a type table without changing the model. In our case we want this to count as a new dacpac. Here is my modification of his code to account for that:
private static string DacPacFingerprint(string path)
{
    using (var stream = File.OpenRead(path))
    using (var package = Package.Open(stream))
    {
        var extractors = new IDacPacDataExtractor[] { new ModelExtractor(), new PostScriptExtractor() };
        string content = string.Join("_", extractors.Select(e =>
        {
            var modelFile = package.GetPart(new Uri($"/{e.Filename}", UriKind.Relative));
            using (var streamReader = new StreamReader(modelFile.GetStream()))
            {
                return e.ExtractData(streamReader);
            }
        }));
        using (var crypto = new MD5CryptoServiceProvider())
        {
            byte[] retVal = crypto.ComputeHash(Encoding.UTF8.GetBytes(content));
            return BitConverter.ToString(retVal).Replace("-", ""); // hex string
        }
    }
}

private class ModelExtractor : IDacPacDataExtractor
{
    public string Filename { get; } = "model.xml";

    public string ExtractData(StreamReader streamReader)
    {
        var xmlDoc = new XmlDocument() { InnerXml = streamReader.ReadToEnd() };
        foreach (XmlNode childNode in xmlDoc.DocumentElement.ChildNodes)
        {
            if (childNode.Name == "Header")
            {
                // skip the Header node as described
                xmlDoc.DocumentElement.RemoveChild(childNode);
                break;
            }
        }
        return xmlDoc.InnerXml;
    }
}

private class PostScriptExtractor : IDacPacDataExtractor
{
    public string Filename { get; } = "postdeploy.sql";

    public string ExtractData(StreamReader stream)
    {
        return stream.ReadToEnd();
    }
}

private interface IDacPacDataExtractor
{
    string Filename { get; }
    string ExtractData(StreamReader stream);
}