Fast way to get Word range as xml in C#?

Fast way to get Word range as xml in C#? - c#

my code:
// use MS-Word Interop
var rangeList = doc.Sentences.ofType<Range>().ToList();//About 4000 Sentences
foreach (var range in rangeList)
{
string sentXml=range.get_XML(false);//get all sentences xml,it's very slow ,about 18 min
//ConvertToFlowDocument(sentxml);
}
but it is very slow.
how do i convert range.WordOpenXML to openxml elements or etc...
yes,i also use range.Copy() to Convert to rtf string,it is also slow.

Sure, you can use Range.XML.
https://learn.microsoft.com/en-us/dotnet/api/microsoft.office.interop.word.range.xml?view=word-pia
string sentXml = string.Empty;
//Use MS-W Word Interop
foreach (var range in doc.Sentences)
//You don't need to create a list variable because Sentences implements IEnumerable
string sentXml += range.XML; //Get XML for all sentences
//ConvertToFlowDocument(sentxml);
}
I'm not sure how you are going about it , but it would be helpful if you showed what the get_Xml method does. Based on your comments, it sounds like my method may be faster. You can open up a StreamReader, and then insert the XML inside.
WordprocessingDocument MainWordDocument;
using (StreamReader sr = new StreamReader(m_MainWordDocument.MainDocumentPart.GetStream()))
{
string docText = sr.ReadToEnd();
//Then, you can replace any text that you want
//string textToReplace("");
//docText = docText.Replace("what you want to replace", "some xml string that you extracted");
//Or you can just add it onto the end.
string docText = docText + sentXml;
//Open a StreamWriter and write overwrite the file
using (StreamWriter writer = new StreamWriter(filePath, true))
{
writer.Write(docText);
}
}
I don't think this is the most efficient way of doing things and don't know why someone would you use both WordInterop and OpenXML together, and this question is quite old, but this answers your question. If you think that there is a good use case for using the two together, I'd be interested to hear about it.

Related

How to read a portion of a line in a text file?

So, I have a text file with thousands of lines formatted similarly to this:
123456:0.8525000:1590882780:91011
These files are almost always a different length, and I only need to read the first two parts of the line, being 123456:0.8525000.
I know that I can split each line using C#, but I'm unsure how to only read the first 2 parts. Anyone have any idea on how to do this? Sorry if my question doesn't make sense, I can restate it if needed.

The Split function returns a string[], an array of strings.
Just take the 2 first elements of the result of Split (with : as the separator).
var read = "123456:0.8525000:1590882780:91011";
var values = read.Split(":");
Console.WriteLine(values[0]); // 123456
Console.WriteLine(values[1]); // 0.8525000
.NET Fiddle
Don't forget that elements of values are string and not yet int or double values. See How to convert string to integer in C# for how to convert from string to number type.

There are TONS of ways to doing this but I am going to suggest some options that involving read the full line as its much easier to work with / understand and that your lines are of varying length. I did add a suggestion on using StreamReader on a file at the end in addendum but you may need to figure out serious work arounds on skipping lines you don't want, restarting a char iterating loop on new lines etc.
I first demonstrate the latest and greatest IAsyncEnumerable found in NetCore 3.x followed by a similar string-based approach. By sharing an Int example that is a slightly advanced and that will also be asynchronous, I hope to also help others and demonstrate a fairly modern approach in 2020. Streaming out only the data you need will be a huge benefit in keeping it fast and a low memory footprint.
public static async IAsyncEnumerable<int> StreamFileOutAsIntsAsync(string filePathName)
{
if (string.IsNullOrWhiteSpace(filePathName)) throw new ArgumentNullException(nameof(filePathName));
if (!File.Exists(filePathName)) throw new ArgumentException($"{filePathName} is not a valid file path.");
using var streamReader = File.OpenText(filePathName);
string currentLine;
while ((currentLine = await streamReader.ReadLineAsync().ConfigureAwait(false)) != null)
{
if (int.TryParse(currentLine.AsSpan(), out var output))
{
yield return output;
}
}
}
This streams every int out of a file, checking that file exists and that the filename path is not null or blank etc.
Streaming maybe too much for a beginner so I don't know your level.
You may want to start with just turning the file into a list of strings.
Modifying my previous example above to something less complex but split your strings for you. I recommend learning about streaming so you don't have every piece of string in memory while you work on it... or maybe you want them all. I am not here to judge.
Once you get your string line out from a file you can do whatever else needs to be done.
public static async Task<List<string>> GetStringsFromFileAsync(string filePathName)
{
if (string.IsNullOrWhiteSpace(filePathName)) throw new ArgumentNullException(nameof(filePathName));
if (!File.Exists(filePathName)) throw new ArgumentException($"{filePathName} is not a valid file path.");
using var streamReader = File.OpenText(filePathName);
string currentLine;
var strings = new List<string>();
while ((currentLine = await streamReader.ReadLineAsync().ConfigureAwait(false)) != null)
{
var lineAsArray = currentLine.Split(new string[] { ":" }, StringSplitOptions.RemoveEmptyEntries);
// Simple Data Validation
if (lineAsArray.Length == 4)
{
strings.Add($"{lineAsArray[0]}:{lineAsArray[1]}");
strings.Add($"{lineAsArray[2]}:{lineAsArray[3]}");
}
}
return strings;
}
The meat of the code is really simple, open the file for reading!
using var streamReader = File.OpenText(filePathName);
and then loop through that file...
while ((currentLine = await streamReader.ReadLineAsync()) != null)
{
var lineAsArray = currentLine.Split(new string[] { ":" }, StringSplitOptions.RemoveEmptyEntries);
// Simple Data Validation
if (lineAsArray.Length == 4)
{
// Do whatever you need to do with the first bits of information.
// In this case, we add them all to a list for return.
strings.Add($"{lineAsArray[0]}:{lineAsArray[1]}");
strings.Add($"{lineAsArray[2]}:{lineAsArray[3]}");
}
}
What this demonstrates is that, for every line that I read out that is not null, break into four parts (based on the ":") character removing all empty entries.
We then use a C# feature called String Interpolation ($"") to put the first two back together with ":" as a string. Then the second two. Or whatever you need to do with reading each part of the line.
That's really all there is to it! Hope it helps.
Addendum: If you really need to read parts of file, please use a StreamReader.Read and Peek()
using (var sr = new StreamReader(path))
{
while (sr.Peek() >= 0)
{
Console.Write((char)sr.Read());
}
}
Reading each character

Some bare bones code:
string fileName = #"c:\some folder\path\file.txt";
using (StreamReader sr = new StreamReader(fileName))
{
while (!sr.EndOfStream)
{
String[] values = sr.ReadLine().Split(":".ToCharArray());
if (values.Length >= 2)
{
// ... do something with values[0] and values[1] ...
Console.WriteLine(values[0] + ", " + values[1]);
}
}
}

Importing Data from a csv file

I have a csv file.
When I try to read that file using filestream readtoend(), I get inverted commas and \r at many places that breaks my number of rows in each column.
Is there a way to remove inverted commas and \r.
I tried to replace
FileStream obj = new FileStream();
string a = obj.ReadToEnd();
a.Replace("\"","");
a.Replace("\r\"","");
When I visualize a all \r and inverted commas are removed.
But when I read the file again from beginning using ReadLine() they appear again?

First of all, a String is immutable. You might think this is not important for your question, but actualy it's important whenever you are developing.
If I look at your code snippet, I'm pretty sure you have no knowledge of immutable objects so I advice you to make sure you fully understand the concept.
More information regarding immutable objects can be found: http://en.wikipedia.org/wiki/Immutable_object
Basicly, it means one can never modify a string object. Strings will always point to a new object whenever we change the value.
That's why the Replace method returns a value, which's documentation can be found here: https://msdn.microsoft.com/en-us/library/system.string.replace%28v=vs.110%29.aspx and states clearly that it Returns a new string in which all occurrences of a specified string in the current instance are replaced with another specified string.
In your example, you aren't using the return value of the Replace function.
Could you show us that the string values are actuably being replaced from your a variable? Because I do not believe this is going to be the case. When you visualize a string, carriage returns (\r) are not visual and replaced by an actual carriage return. If you debug and take alook at the actual string value, you should still see the \n.
Take the following code snippet:
var someString = "Hello / world";
someString.Replace("/", "");
Console.Log(someString);
You might think that the console will show "Hello world". However, on this fiddle you can see that it still logs "Hello / World": https://dotnetfiddle.net/cp59i3
What you have to do to correctly use String.Replace can be seen in this fiddle: https://dotnetfiddle.net/XCGtOu
Basicly, you want to log the return value of the Replace function:
var a = "Some / Value";
var b = a.Replace("/", "");
Console.WriteLine(b);
Also, as mentioned by others in the comment section at ur post, you are not replacing the contents of the file, but the string variable in your memory.
If you want to save the new string, make sure to use the Write method of the FileStream (or any other way to write to a file), an explanation can be found here: How to Find And Replace Text In A File With C#
Apart from all what I have been saying throughout this answer, you should not replace both inverted comma's and carriage returns in a file in most cases, they are there for a reason. Unless you do have a specific reason.

At last I succeeded. Thanks to everybody. Here is the code I did.
FileStream obj = new FileStream();
using(StreamReader csvr = new StreamReader(obj))
{
string a = obj.ReadToEnd();
a = a.Replace("\"","");
a = a.Replace("\r\"","");
obj.Dispose();
}
using(StreamWriter Wr = new StreamWriter(TempPath))
{
Wr.Write(a);
}
using(StreamReader Sr = new StreamReader(Tempath))
{
Sr.ReadLine();
}
I Created a temp path on the system. After this things were easy to enter into database.

Try something like this
StreamReader sReader = new StreamReader("filename");
string a = sReader.ReadToEnd();
a.Replace("\"", "");
a.Replace("\r\"", "");
StringReader reader = new StringReader(a);
string inputLine = "";
while ((inputLine = reader.ReadLine()) != null)
{
}

Read and extract from file

I have a huge file with ~3 mill rows. Every line contains record like this:
1|2|3|4|5|6|7|8|9
Exactly 8 separators like '|' on every line. I am looking for a way to read this file then extract last '9' number only from every line and store it into another file.
edit:
Ok here is what i done already.
using (StreamReader sr = new StreamReader(filepath))
using (StreamWriter sw = new StreamWriter(filepath1))
{
string line = null;
while ((line = sr.ReadLine()) != null)
sw.WriteLine(line.Split('|')[8]);
}
File.WriteAllLines("filepath", File.ReadAllLines(filepath).Where(l => !string.IsNullOrWhiteSpace(l)));
Read file, extract last digits then write in new file and clear blank lines. Last digit is 10-15 symbols and I want to extract first 6. I continue to read and try some and when I'm done or have some question I'll edit again.
Thanks
Edit 2:
Ok, here I take first 8 digits from the number:
sw.WriteLine(line.Substring(0, Math.Min(line.Length, 8)));
Edit 3:
I have no idea how can I match now every numbers that left in file. I want to match them and to see witch number how many times is in the file.
Any help?

I am looking for a way to read this file then extract last [..] number only from every line and store it into another file.
What part exactly are you having trouble with? In psuedo code, this is what you want:
fileReader = OpenFile("input")
fileWriter = OpenFile("output")
while !fileReader.EndOfFile
line = fileReader.ReadLine
records[] = line.Split('|')
value = records[8]
fileWriter.WriteLine(value)
do
So start implementing it and feel free to ask a question on any specific line you're having trouble with. Each line of code I posted contains enough pointers to figure out the C# code or the terms to do a web search for it.

You don't say where you are stuck. Break the problem down:
Write and run minimal C# program
Read lines from file
Break up one line
write result line to a file
Are you stuck on any one of those? Then ask a specific question about that. This decomposition technique is key to many programming tasks, and indeed complex tasks in general.
You might find the string split capability useful.

Because it's a huge file you must read it line by line!
public IEnumerable ReadFileIterator(String filePath)
{
using (StreamReader streamReader = new StreamReader(filePath, Encoding.Default))
{
String line;
while ((line = streamReader.ReadLine()) != null)
{
yield return line;
}
yield break;
}
}
public void WriteToFile(String inputFilePath, String outputFilePath)
{
using (StreamWriter streamWriter = new StreamWriter(outputFilePath, true, Encoding.Default))
{
foreach (String line in ReadFileIterator(inputFilePath))
{
String[] subStrings = line.Split('|');
streamWriter.WriteLine(subStrings[8]);
}
streamWriter.Flush();
streamWriter.Close();
}
}

using (StreamReader sr = new StreamReader("input"))
using (StreamWriter sw = new StreamWriter("output"))
{
string line = null;
while ((line=sr.ReadLine())!=null)
sw.WriteLine(line.Split('|')[8]);
}

Some pointer to start from: StreamReader.Readline() and String.Split(). There are examples on both pages.

With LINQ you could do a thing like the following to filter the numbers:
var numbers = from l in File.ReadLines(fileName)
let p = l.Split('|')
select p[8];
and then write them into a new file like that:
File.WriteAllText(newFileName, String.Join("\r\n", numbers));

Use String.Split() to get the line inside an array and get the last element and store it into another file. Repeat the process for each line.

Try this...
// Read the file and display it line by line.
System.IO.StreamReader file =
new System.IO.StreamReader("c:\\test.txt");
while((line = file.ReadLine()) != null)
{
string[] words = s.Split('|');
string value = words [8]
Console.WriteLine (value);
}
file.Close();

C# CSV file to array/list

I want to read 4-5 CSV files in some array in C#
I know that this question is been asked and I have gone through them...
But my use of CSVs is too much simpler for that...
I have csv fiels with columns of following data types....
string , string
These strings are without ',' so no tension...
That's it. And they aren't much big. Only about 20 records in each.
I just want to read them into array of C#....
Is there any very very simple and direct way to do that?

To read the file, use
TextReader reader = File.OpenText(filename);
To read a line:
string line = reader.ReadLine()
then
string[] tokens = line.Split(',');
to separate them.
By using a loop around the two last example lines, you could add each array of tokens into a list, if that's what you need.

This one includes the quotes & commas in fields. (assumes you're doing a line at a time)
using Microsoft.VisualBasic.FileIO; //For TextFieldParser
// blah blah blah
StringReader csv_reader = new StringReader(csv_line);
TextFieldParser csv_parser = new TextFieldParser(csv_reader);
csv_parser.SetDelimiters(",");
csv_parser.HasFieldsEnclosedInQuotes = true;
string[] csv_array = csv_parser.ReadFields();

Here is a simple way to get a CSV content to an array of strings. The CSV file can have double quotes, carriage return line feeds and the delimiter is a comma.
Here are the libraries that you need:
System.IO;
System.Collection.Generic;
System.IO is for FileStream and StreamReader class to access your file. Both classes implement the IDisposable interface, so you can use the using statements to close your streams. (example below)
System.Collection.Generic namespace is for collections, such as IList,List, and ArrayList, etc... In this example, we'll use the List class, because Lists are better than Arrays in my honest opinion. However, before I return our outbound variable, i'll call the .ToArray() member method to return the array.
There are many ways to get content from your file, I personally prefer to use a while(condition) loop to iterate over the contents. In the condition clause, use !lReader.EndOfStream. While not end of stream, continue iterating over the file.
public string[] GetCsvContent(string iFileName)
{
List<string> oCsvContent = new List<string>();
using (FileStream lFileStream =
new FileStream(iFilename, FileMode.Open, FileAccess.Read))
{
StringBuilder lFileContent = new StringBuilder();
using (StreamReader lReader = new StreamReader(lFileStream))
{
// flag if a double quote is found
bool lContainsDoubleQuotes = false;
// a string for the csv value
string lCsvValue = "";
// loop through the file until you read the end
while (!lReader.EndOfStream)
{
// stores each line in a variable
string lCsvLine = lReader.ReadLine();
// for each character in the line...
foreach (char lLetter in lCsvLine)
{
// check if the character is a double quote
if (lLetter == '"')
{
if (!lContainsDoubleQuotes)
{
lContainsDoubleQuotes = true;
}
else
{
lContainsDoubleQuotes = false;
}
}
// if we come across a comma
// AND it's not within a double quote..
if (lLetter == ',' && !lContainsDoubleQuotes)
{
// add our string to the array
oCsvContent.Add(lCsvValue);
// null out our string
lCsvValue = "";
}
else
{
// add the character to our string
lCsvValue += lLetter;
}
}
}
}
}
return oCsvContent.ToArray();
}
Hope this helps! Very easy and very quick.
Cheers!

Most efficient way of removing lines that contain more than one string from a file?

I want to find the most efficient way of removing string 1 and string 2 when reading a file (host file) and remove the entire lines that contains string 1 or string 2.
Currently I have, and is obviously sluggish. What better methods are there?
using(StreamReader sr = File.OpenText(path)){
while ((stringToRemove = sr.ReadLine()) != null)
{
if (!stringToRemove.Contains("string1"))
{
if (!stringToRemove.Contains("string2"))
{
emptyreplace += stringToRemove + Environment.NewLine;
}
}
}
sr.Close();
File.WriteAllText(path, emptyreplace);
hostFileConfigured = false;
UInt32 result = DnsFlushResolverCache();
MessageBox.Show(removeSuccess, windowOffline);
}

The primary problem that you have is that you are constantly using large regular strings and appending data onto the end. This is re-creating the strings each time and consumes a lot of time and particularly memory. By using string.Join it will avoid the (very large number of) intermediate string values being created.
You can also shorten the code to get the lines of text by using File.ReadLines instead of using the stream directly. It's not really any better or worse, just prettier.
var lines = File.ReadLines(path)
.Where(line => !line.Contains("string1") && !line.Contains("string2"));
File.WriteAllText(path, string.Join(Environment.NewLine, lines));
Another option would be to stream the writing of the output as well. Since there is no good library method for writing out a IEnumerable<string> without eagerly evaluating the input, we'll have to write our own (which is simple enough):
public static void WriteLines(string path, IEnumerable<string> lines)
{
using (var stream = File.CreateText(path))
{
foreach (var line in lines)
stream.WriteLine(line);
}
}
Also note that if we're streaming our output then we'll need a temporary file, since we don't want to be reading and writing to the same file at the same time.
//same code as before
var lines = File.ReadLines(path)
.Where(line => !line.Contains("string1") && !line.Contains("string2"));
//get a temp file path that won't conflict with any other files
string tempPath = Path.GetTempFileName();
//use the method from above to write the lines to the temp file
WriteLines(tempPath, lines);
//rename the temp file to the real file we want to replace,
//both deleting the temp file and the old file at the same time
File.Move(tempPath, path);
The primary advantage of this option, as opposed to the first, is that it will consume far less memory. In fact, it only ever needs to hold line of the file in memory at a time, rather than the whole file. It does take up a bit of extra space on disk (temporarily) though.

The first thing that shines to me, is wrong (not efficient) use of string type variable inside a while loop (emptyreplace), use StrinBuilder type and it will be much memory efficient.
For example:
StringBuilder emptyreplace = new StringBuilder();
using(StreamReader sr = File.OpenText(path)){
while ((stringToRemove = sr.ReadLine()) != null)
{
if (!stringToRemove.Contains("string1"))
{
if (!stringToRemove.Contains("string2"))
{
//USE StringBuilder.Append, and NOT string concatenation
emptyreplace.AppendLine(stringToRemove + Environment.NewLine);
}
}
}
...
}
The rest seems good enough.

There are a number of ways to improve this:
Compile the array of words you're searching for into a regex (eg, word1|word2; beware of special characters) so that you'll only need to loop over the string once. (this would also allow you to use \b to only match words)
Write each line through a StreamWriter to a new file so that you don't need to store the whole thing in memory while building it. (after you finish, delete the original file & rename the new one)

Is your host file really that big that you need to bother with reading it line by line? Why not simply do this?
var lines = File.ReadAllLines(path);
var lines = lines.Where(x => !badWords.Any(y => x.Contains(y))).ToArray();
File.WriteAllLines(path, lines);

Two suggestions:
Create an array of strings to detect (I'll call them stopWords) and use Linq's Any extension method.
Rather than building the file up and writing it all at once, write each line to an output file one at a time while your reading the source file, and replace the source file once your done.
The resulting code:
string[] stopWords = new string[]
{
"string1",
"string2"
}
using(StreamReader sr = File.OpenText(srcPath))
using(StreamWriter sw = new StreamWriter(outPath))
{
while ((stringToRemove = sr.ReadLine()) != null)
{
if (!stopWords.Any(s => stringToRemove.Contains(s))
{
sw.WriteLine(stringToRemove);
}
}
}
File.Move(outPath, srcPath);

Update: I just realized that you are actually talking about the "hosts file". Assuming you mean %windir%\system32\drivers\etc\hosts, it is very unlikely that this file has a truly significant size (like more than a couple of KBs). So personally, I would go with the most readable approach. Like, for example, the one by #servy.
In the end you will have to read every line and write every line, that does not match your criteria. So, you will always have the basic IO overhead that you cannot avoid. Depending on the actual (average) size of your files that might overshadow every other optimization technique you use in your code to actually filter the lines.
Having that said, you can however be a little less wasteful on the memory side of things, by not collecting all output lines in a buffer, but directly writing them to the output file as you have read them (again, this might be pointless if you files are not very big).
using (var reader = new StreamReader(inputfile))
{
using (var writer = new StreamWriter(outputfile))
{
string line;
while ((line = reader.ReadLine()) != null)
{
if (line.IndexOf("string1") == -1 && line.IndexOf("string2") == -1)
{
writer.WriteLine(line);
}
}
}
}
File.Move(outputFile, inputFile);

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Fast way to get Word range as xml in C#? - c#

Related

How to read a portion of a line in a text file?

Importing Data from a csv file

Read and extract from file

C# CSV file to array/list

Most efficient way of removing lines that contain more than one string from a file?

Categories

Resources