Write an XML file from a DataTable, ignoring columns that contain only white space (C#)

I am trying to export the data from a DataTable into an XML file. This part works, but when a record has no data in a column, or only white space, the column is still written to the XML file with xml:space="preserve".
I want to ignore such columns and leave them out of the XML file when they contain no data.
Example of the XML file it is producing now:
I want the customer street 3 and 4 nodes to be omitted when they have no values.
Here is my code:
using System.IO;
using System.Linq;
using System.Text;
using System.Data;
using System.Xml.Linq;
using System;
using System.Collections.Generic;
using PdfSharp.Pdf.IO;
using PdfSharp.Pdf;
using PdfSharp.Drawing;
using System.Xml;
namespace InvoicePrintProgram
{
    class XMLGenerator
    {
        // Method that generates XML files
        public void Start(String XmlFilepath, string XMlFileName, DataTable DT, int PageCountOut, int SequenceCountOut, int[] PrefIndex/*, int[] SequenceIndex, IEnumerable<String> chunk, int IndexCount out int IndexCountOut*/)
        {
            // Creates the XML file from the DataTable using the WriteXml method
            FileStream streamWrite = new FileStream(XmlFilepath, System.IO.FileMode.Create);
            System.Xml.XmlWriterSettings settings = new System.Xml.XmlWriterSettings();
            settings.Indent = true;
            //settings.Encoding = System.Text.Encoding.GetEncoding("ISO-8859-1")
            settings.Encoding = System.Text.Encoding.UTF8;
            settings.CloseOutput = true;
            settings.CheckCharacters = true;
            settings.NewLineChars = "\r\n";
            DT.WriteXml(streamWrite, XmlWriteMode.IgnoreSchema);

You can create a list of strings from that DataTable and use something like this:
List<string> xml_string = new List<string>();
xml_string.Add("<?xml version=\"1.0\" encoding=\"UTF-8\" ?>");
xml_string.Add("anything you need as header");
foreach (string current_string in your_object_of_DataTable)
{
    // Skip values that are null, empty, or only white space.
    if (!string.IsNullOrWhiteSpace(current_string))
    {
        // Element names cannot contain spaces; escape the value so it is valid XML text.
        xml_string.Add("<YourTag>" + System.Security.SecurityElement.Escape(current_string) + "</YourTag>");
    }
}
try
{
    using (System.IO.StreamWriter file = new System.IO.StreamWriter(@"D:\xml_file_name.xml"))
    {
        foreach (string line in xml_string)
        {
            file.WriteLine(line);
        }
    }
    // Report success only when no exception was thrown.
    MessageBox.Show("File exported.");
}
catch (Exception ex)
{
    MessageBox.Show(ex.ToString());
}
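Another way to leave out the empty columns, sketched here under some assumptions, is to build the XML with LINQ to XML and skip every cell that is DBNull, empty, or only white space. The class name BlankSkippingXmlExporter, the method ToXml, and the rootName and rowName parameters are made up for illustration; they are not part of the question's code.

```csharp
using System;
using System.Data;
using System.Xml;
using System.Xml.Linq;

public static class BlankSkippingXmlExporter
{
    // Hypothetical helper: one element per row, one child element per non-blank cell.
    public static XDocument ToXml(DataTable table, string rootName, string rowName)
    {
        var root = new XElement(rootName);
        foreach (DataRow row in table.Rows)
        {
            var rowElement = new XElement(rowName);
            foreach (DataColumn column in table.Columns)
            {
                // DBNull, empty strings and whitespace-only strings are all skipped.
                string text = row.IsNull(column) ? null : Convert.ToString(row[column]);
                if (string.IsNullOrWhiteSpace(text))
                {
                    continue;
                }
                // EncodeLocalName escapes characters (such as spaces) that are
                // not allowed in XML element names.
                string elementName = XmlConvert.EncodeLocalName(column.ColumnName);
                rowElement.Add(new XElement(elementName, text));
            }
            root.Add(rowElement);
        }
        return new XDocument(new XDeclaration("1.0", "utf-8", null), root);
    }
}
```

With the question's variables, BlankSkippingXmlExporter.ToXml(DT, "NewDataSet", "Table").Save(XmlFilepath) would write the file. Values are converted with Convert.ToString, so dates and numbers follow the current culture rather than the XML formats that DataTable.WriteXml uses.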

Related

How do I pull all values from PDF?

I have a working solution that opens a PDF file and grabs the text. Unfortunately the values that I need are in form fields. I've tried a few ways to get the values but I can only get what appears to be the form name. The key values are correct, but the value received is wrong.
Key                    ValueReturned
Company Name           iText.Forms.Fields.PdfTextFormField
Phone Number           iText.Forms.Fields.PdfTextFormField
Business Contact Data  iText.Forms.Fields.PdfTextFormField
Name                   iText.Forms.Fields.PdfTextFormField
The values in the form fields are not being returned. Is there a better way to do this?
using System;
using System.Collections.Generic;
using iText.Forms;
using iText.Forms.Fields;
using iText.Kernel.Pdf;
namespace ConsoleApplication1 {
    class Class1 {
        public string pdfthree(string pdfPath) {
            PdfReader reader = new PdfReader(pdfPath);
            PdfDocument document = new PdfDocument(reader);
            PdfAcroForm acroForm = PdfAcroForm.GetAcroForm(document, false);
            IDictionary<string, PdfFormField> Map = new Dictionary<string, PdfFormField>();
            Map = acroForm.GetFormFields();
            acroForm.GetField("Name");
            string output = "";
            foreach (String fldName in Map.Keys) {
                output += fldName + ": " + Map[fldName].ToString() + "\n";
            }
            System.IO.File.WriteAllText(pdfPath, output);
            document.Close();
            reader.Close();
            return output;
        }
    }
}

Instead of calling PdfFormField#ToString(), you should call PdfFormField#GetValueAsString() to get the value of the field.
Full code:
using System;
using System.Collections.Generic;
using iText.Forms;
using iText.Forms.Fields;
using iText.Kernel.Pdf;
namespace ConsoleApplication1 {
    class Class1 {
        public string pdfthree(string pdfPath) {
            PdfReader reader = new PdfReader(pdfPath);
            PdfDocument document = new PdfDocument(reader);
            PdfAcroForm acroForm = PdfAcroForm.GetAcroForm(document, false);
            IDictionary<string, PdfFormField> Map = new Dictionary<string, PdfFormField>();
            Map = acroForm.GetFormFields();
            acroForm.GetField("Name");
            string output = "";
            foreach (String fldName in Map.Keys) {
                // GetValueAsString() returns the field's value; ToString() only gave the type name.
                output += fldName + ": " + Map[fldName].GetValueAsString() + "\n";
            }
            // Note: as in the question, this writes the text to pdfPath itself.
            System.IO.File.WriteAllText(pdfPath, output);
            document.Close();
            reader.Close();
            return output;
        }
    }
}

string.Split() not working for the CSV file read

I am reading an Excel file (a CSV file) using my controller and storing all of its lines in an array. When I print the array, I can see its contents. But when I iterate over the lines and split each one by comma, I get nothing back, so I can't store the values I read.
Here is a sample of my output along with the code:
VAWC Neptune flat file is my file, and 206 is the number of lines in it. I then print each line along with its length. When the line is split by commas, I see output only for the first one; everything else is empty.
As the loop keeps reading the other lines, the split array does not appear.
Here is the section of code I am using:
// files contains a single file named VAWC Neptune flat file - new meters for inventory.csv
public ActionResult ReadFile(IEnumerable<HttpPostedFileBase> files)
{
    var fileName = Path.GetFileName(files.First().FileName);
    var destinationPath = Path.Combine(Server.MapPath("~/App_Data"), fileName);
    files.First().SaveAs(destinationPath);
    try
    {
        string[] read = System.IO.File.ReadAllLines(destinationPath);
        System.Diagnostics.Debug.WriteLine(read.Length);
        for (int i = 0; i < read.Length; i++)
        {
            System.Diagnostics.Debug.WriteLine(read[i]);
            List<string> s = read[i].Replace(Environment.NewLine,"").Split(new char[] { ',' }, StringSplitOptions.None).ToList<string>();
            System.Diagnostics.Debug.WriteLine("Length of words in line:" + s.Capacity);
            for (int j = 0; j < s.Capacity; j++)
            {
                System.Diagnostics.Debug.WriteLine("Data:s[" + j + "]" + s[j]);
            }
        }
    }
I have tried so many possible ways but nothing has worked.

From Excel to CSV, right. This is how I would do it.
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using Excel = Microsoft.Office.Interop.Excel;
namespace WindowsFormsApplication1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }
        private void button1_Click(object sender, EventArgs e)
        {
            Excel.Application xlApp = new Excel.Application();
            Excel.Workbook xlWorkBook = xlApp.Workbooks.Open(@"C:\Users\Ryan\Desktop\Coding\DOT.NET\Samples C#\Excel Workbook - Save Each Sheet as a CSV File\Book1.xlsx");
            xlApp.Visible = true;
            foreach (Excel.Worksheet sht in xlWorkBook.Worksheets)
            {
                sht.Select();
                xlWorkBook.SaveAs(string.Format("{0}{1}.csv", @"C:\Users\Ryan\Desktop\Coding\DOT.NET\Samples C#\Excel Workbook - Save Each Sheet as a CSV File to CSV\", sht.Name), Excel.XlFileFormat.xlCSV, Excel.XlSaveAsAccessMode.xlNoChange);
            }
            xlWorkBook.Close(false);
        }
    }
}
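Returning to the loop in the question: a minimal sketch of splitting each line on commas and walking the result with Count, which is the number of elements actually in a List<T>, whereas Capacity is the size of its internal buffer. The class and method names are hypothetical, and the sketch assumes a simple CSV with no quoted fields that contain commas; it is not a diagnosis of the original file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CsvLineSplitter
{
    // Hypothetical helper: splits every line into trimmed fields.
    // Assumes plain comma-separated text without quoted fields.
    public static List<List<string>> SplitLines(IEnumerable<string> lines)
    {
        var rows = new List<List<string>>();
        foreach (string line in lines)
        {
            List<string> fields = line
                .TrimEnd('\r', '\n')          // drop a stray line ending, if any
                .Split(',')                   // empty fields are kept as ""
                .Select(field => field.Trim())
                .ToList();
            rows.Add(fields);
        }
        return rows;
    }

    // Prints each field, indexing with Count rather than Capacity.
    public static void Dump(List<List<string>> rows)
    {
        for (int i = 0; i < rows.Count; i++)
        {
            for (int j = 0; j < rows[i].Count; j++)
            {
                Console.WriteLine("Data:s[" + j + "] " + rows[i][j]);
            }
        }
    }
}
```

In the controller this could be called as CsvLineSplitter.SplitLines(System.IO.File.ReadAllLines(destinationPath)). For files with quoted fields, a dedicated CSV parser is the safer choice.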

Type of Properties method in Stanford Core NLP

I am building a front end to parse some text files using Stanford Core NLP in C#. I open a file selection dialog and select some text files. Then the following method works from there on.
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using System.IO;
using System.Text.RegularExpressions;
using java.util;
using java.io;
using edu.stanford.nlp.pipeline;
namespace Parser_SVO
{
    public partial class Form1 : Form
    {
        public static List<string> textFiles = new List<string>();
        public Form1()
        {
            InitializeComponent();
        }
        private void button1_Click(object sender, EventArgs e)
        {
            OpenFileDialog openFileDialog1 = new OpenFileDialog();
            openFileDialog1.ShowReadOnly = true;
            openFileDialog1.Filter = "Text Files|*.txt";
            if (openFileDialog1.ShowDialog() == System.Windows.Forms.DialogResult.OK)
            {
                textFiles.AddRange(openFileDialog1.FileNames);
            }
            parseText();
        }
        public static void parseText()
        {
            label2.Text = "Stanford Parser....";
            // Path to the folder with models extracted from `stanford-corenlp-3.7.0-models.jar`
            string jarRoot = "";
            string prettyPrint = "";
            if (textFiles.Count != 0)
            {
                jarRoot = Path.GetDirectoryName(textFiles[0]) + @"\Models\";
                prettyPrint = Path.GetDirectoryName(textFiles[0]);
                Directory.CreateDirectory(prettyPrint + @"\PrettyPrint\");
                prettyPrint = prettyPrint + @"\PrettyPrint\";
            }
            // Annotation pipeline configuration
            var props = Properties();
            props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
            props.setProperty("ner.useSUTime", "0");
            // We should change current directory, so StanfordCoreNLP could find all the model files automatically
            var curDir = Environment.CurrentDirectory;
            Directory.SetCurrentDirectory(jarRoot);
            var pipeline = new StanfordCoreNLP(props);
            Directory.SetCurrentDirectory(curDir);
            foreach (string file in textFiles)
            {
                label3.Text = file;
                // Text for processing
                var text = System.IO.File.ReadAllText(file);
                // Annotation
                var annotation = new Annotation(text);
                pipeline.annotate(annotation);
                // Result - Pretty Print
                string output = prettyPrint + Path.GetFileName(file);
                using (var stream = new ByteArrayOutputStream())
                {
                    pipeline.prettyPrint(annotation, new PrintWriter(stream));
                    System.IO.File.AppendAllText(output, stream.toString()+Environment.NewLine);
                    stream.close();
                }
            }
        }
    }
}
I have modified the example from the official Stanford CoreNLP .NET port.
Since I am using Windows Forms instead of a console application, this line of code is causing a problem: var props = Properties();. I am not sure how to find the namespace of this method so that I can give a complete namespace.class.method path to disambiguate it.
Another minor problem is that I want to update the label text, as in label2.Text = "Stanford Parser....";, but Visual Studio says "An object reference is required" even though I am in the same class (Forms1.cs). Your help will be greatly appreciated.

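Properties is the class java.util.Properties, so write it fully qualified: new java.util.Properties().
To access Windows Forms controls such as a text box or label, simply remove static from the method declaration. Controls are instance members of the form, so a static method cannot reach them without an object reference.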

How do I populate an array with foreach?

How do I get the data that I currently write to the console to go into the array as well as to the console?
At the moment it is only displayed on the console (I have not yet added the code that fills the array).
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Diagnostics;
using System.IO;
using System.Text.RegularExpressions;
namespace TBParser
{
    class Program
    {
        static void Main(string[] args)
        {
            String[] arr = new String[100];
            string[] lines = System.IO.File.ReadAllLines(@"C:\ShpereCompare3.txt");
            Console.WriteLine("Contents of Text File: ");
            foreach (string line in lines)
            {
                Console.WriteLine("\r\t" + line);
            }
            System.IO.File.WriteAllLines(@"C:\Test.txt",lines);
            Console.WriteLine("Press any key to Exit");
            Console.ReadKey();
        }
    }
}
If my lines of text are
hello
my
name
is
Simon
then should the first 5 slots of the array each contain one line?

The line:
string[] lines = System.IO.File.ReadAllLines(@"C:\ShpereCompare3.txt");
is already creating an array, each element of which contains one line.
There is no need to populate a new array with this same information via a foreach.

If you want to copy the lines from the text file into another array, then you can do this:
String[] arr = new String[lines.Length];
Array.Copy(lines, arr, lines.Length);
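If the target array has to keep a fixed size, such as the 100 slots declared in the question, a sketch along these lines copies only as many lines as fit and leaves the remaining slots null. The class and method names are hypothetical.

```csharp
using System;

public static class LineCopy
{
    // Hypothetical helper: copies the lines into a new array of the given size.
    // Lines beyond the capacity are dropped; unused slots stay null.
    public static string[] CopyIntoFixed(string[] lines, int capacity)
    {
        var arr = new string[capacity];
        Array.Copy(lines, arr, Math.Min(lines.Length, capacity));
        return arr;
    }
}
```

For the five example lines and a capacity of 100, slots 0 to 4 hold the lines and slots 5 to 99 are null.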

Found a workaround. The path I was going down was too complicated. Thanks for all your input.
Fixed code here:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Diagnostics;
using System.IO;
using System.Text.RegularExpressions;
namespace TBParser
{
    class Program
    {
        static void Main(string[] args)
        {
            string fileName = @"C:shpereCompare3.txt";
            List<string> Names = new List<string>();
            List<string> Value = new List<string>();
            using (StreamReader fileReader = new StreamReader(fileName))
            {
                string fileLine;
                while (!fileReader.EndOfStream)
                {
                    fileLine = fileReader.ReadLine();
                    if (fileLine.StartsWith("Name"))
                    {
                        Names.Add(fileLine.Substring(21));
                    }
                    if (fileLine.StartsWith("Center"))
                    {
                        string[] fileSplit = fileLine.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
                        Value.Add(fileSplit[1]);
                    }
                }
                string outputString = "";
                for (int i = 0;i < Names.Count; i++)
                {
                    outputString += Names[i] + " = " + Value[i] + "\r\n";
                }
                System.IO.File.WriteAllText(@"C:Test.txt", outputString);
            }
        }
    }
}

C# DataSet To Xml, Node Rename

DataSet ds = GetExcelToXml("test.xls");
string filename = @"C:\test.xml";
FileStream myFileStream = new FileStream(filename, FileMode.Create);
XmlTextWriter myXmlWriter = new XmlTextWriter(myFileStream, Encoding.Default);
ds.WriteXml(myXmlWriter);
myXmlWriter.Close();
Output XML:
<NewDataSet>
  <Table>
    <UserName>bla1</UserName>
    <Mail>bla1@bla2.com</Mail>
    <Address>World</Address>
  </Table>
</NewDataSet>
I need these XML node names instead:
<ROWS>
  <ROW>
    <UserName>bla1</UserName>
    <Mail>bla1@bla2.com</Mail>
    <Address>World</Address>
  </ROW>
</ROWS>
How can I do this?

Try this,
ds.DataSetName = "ROWS";
ds.Tables[0].TableName = "ROW";
ds.WriteXml(myXmlWriter);
myXmlWriter.Close();
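A self-contained sketch of the same renaming, writing to a string instead of a file so the result is easy to inspect. The class name DataSetRenameDemo and the helper WriteRenamed are assumptions for illustration.

```csharp
using System.Data;
using System.IO;

public static class DataSetRenameDemo
{
    // Hypothetical helper: renames the root and the first table,
    // then returns the XML that WriteXml produces.
    public static string WriteRenamed(DataSet ds, string rootName, string rowName)
    {
        ds.DataSetName = rootName;          // becomes the root element name
        ds.Tables[0].TableName = rowName;   // becomes the per-row element name
        using (var writer = new StringWriter())
        {
            ds.WriteXml(writer);
            return writer.ToString();
        }
    }
}
```

Calling DataSetRenameDemo.WriteRenamed(ds, "ROWS", "ROW") on the DataSet from the question should give a ROWS root with one ROW element per record.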

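// Write the DataSet to a string first, then rename the elements by text replacement.
StringWriter sw = new StringWriter();
ds.WriteXml(sw);
string xml = sw.ToString()
    .Replace("<NewDataSet", "<ROWS")
    .Replace("</NewDataSet>", "</ROWS>")
    .Replace("<Table", "<ROW")
    .Replace("</Table>", "</ROW>");
// Caution: "<Table" also matches any element whose name merely starts with "Table".
File.WriteAllText(filename, xml);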

Here is a sample C# application which will read the input XML and then copy different XML/Data tables to other files using the Table name:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
using System.Xml;
using System.IO;
namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            DataSet dsXml = new DataSet();
            dsXml.ReadXml("mydata.xml");
            for (int i = 0; i < dsXml.Tables.Count; i++)
            {
                Console.WriteLine("Table Name: " + dsXml.Tables[i].TableName);
                DataSet newDataSet = new DataSet();
                newDataSet.Tables.Add(dsXml.Tables[i].Copy());
                FileStream myFileStream = new FileStream(dsXml.Tables[i].TableName + ".xml", FileMode.Create);
                XmlTextWriter myXmlWriter = new XmlTextWriter(myFileStream, Encoding.Default);
                newDataSet.WriteXml(myXmlWriter);
                myXmlWriter.Close();
            }
        }
    }
}

In case anyone comes here looking for the opposite problem, where a typed DataSet table has been renamed since the XML files were written:
// for xml files created prior to rename of Sample table to SampleS,
// rename the Sample table, read xml,
// then rename table back to current SampleS
if (ds.SampleS.Count == 0)
{
    ds = new AnalysisDSX();
    ds.Tables["SampleS"].TableName = "Sample";
    ds.ReadXml(xmlFilePath);
    ds.Tables["Sample"].TableName = "SampleS";
}
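The same rename, read, rename-back idea can be sketched for a plain DataSet, with a finally block so the current table name is restored even if reading fails. LegacyXmlLoader and ReadWithOldTableName are hypothetical names, and the sketch assumes the DataSet already has its schema, so the XML is read with XmlReadMode.IgnoreSchema.

```csharp
using System.Data;
using System.IO;

public static class LegacyXmlLoader
{
    // Hypothetical helper: reads XML that still uses oldName
    // for the table that is now called currentName.
    public static void ReadWithOldTableName(DataSet ds, string currentName, string oldName, TextReader xml)
    {
        DataTable table = ds.Tables[currentName];
        table.TableName = oldName;          // match the element name used in the old files
        try
        {
            // IgnoreSchema loads the data into the schema the DataSet already has.
            ds.ReadXml(xml, XmlReadMode.IgnoreSchema);
        }
        finally
        {
            table.TableName = currentName;  // restore the current name
        }
    }
}
```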
