How to extract text from multiple files

I have upwards of 200 files that I need to extract a certain sequence of lines from, and write the results in a new csv file. I am just learning C#, but have experience with other languages far in the past. I have tried looking up all the individual steps, along with Regex, which I don't understand, but I don't know how to stitch it all together.
Sample text:
--> SAT1_988_Connection_Verify
EA0683010A01030F15A40202004E2000
E0068300
E40683010278053A
>
(S45, 10:38:35 AM)
Algorithm Steps
1) I need to point the program at a directory with the files.
2) I need the program to search through each file in the directory.
3) I need to find the lines that start with "E40", of which there could be multiple or none. Additionally, this line varies in length.
4) I need to grab that line, as well as the two before it, which are highlighted in the nested block quote above.
5) There is always a blank line after the target line.
6) I need to write those three lines, separated by commas, to a text document.
My code so far:
using System;
using System.Collections.Generic;
using System.IO;
namespace ConsoleApplication2
{
    class Program
    {
        static void Main()
        {
            string path = @"C:\ETT\Test.txt";
            string[] readText = File.ReadAllLines(path);
            foreach (string s in readText)
            {
            }
        }
        public static string getBetween(string[] strSource, string strKey)
        {
            int Start, End;
            if (strSource.Contains(strKey))
            {
                Start = Array.IndexOf(strSource, strKey) -2;
                End = Array.IndexOf(strSource, strKey) + 1;
                return strSource.Substring(Start, End - Start);
            }
            else
            {
                return "";
            }
        }
    }
}

There are many ways of doing this. However, just to help you (and because you added a comparatively detailed amount of information for a first post), you need to look up the following topics:
Directory.EnumerateFiles Method
Returns an enumerable collection of file names that match a search
pattern in a specified path.
File.ReadAllLines Method
Opens a text file, reads all lines of the file into a string array,
and then closes the file.
Enumerable.Where<TSource> Method (IEnumerable, Func)
Filters a sequence of values based on a predicate.
String.StartsWith Method
Determines whether the beginning of this string instance matches a
specified string.
https://joshclose.github.io/CsvHelper/
A library for reading and writing CSV files. Extremely fast, flexible,
and easy to use. Supports reading and writing of custom class objects.
CSV helper implements RFC 4180. By default, it's very conservative in
its writing, but very liberal in its reading. There is a large set of
configuration that can be done to change how reading and writing
behaves, giving you the ability read/write non-standard files also.
The only tricky part will be getting the two lines before each match.
List<T>.IndexOf Method (T)
Searches for the specified object and returns the zero-based index of
the first occurrence within the entire List.
From that index, you can use List[Index-1] and List[Index-2] to get the preceding lines.
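As a rough illustration of how these pieces could fit together, here is a minimal sketch. The class name E40Extractor, the method names, and the "*.txt" search pattern are assumptions made for the example, not part of the question. It walks the lines by index instead of calling IndexOf, because IndexOf only returns the first exact match, while the question needs every line that starts with "E40". It also joins the fields by hand; CsvHelper would be the safer choice if a field could contain a comma.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class E40Extractor
{
    // Returns one comma-separated row per line that starts with "E40":
    // the two lines before it, followed by the line itself.
    public static IEnumerable<string> ExtractRows(IReadOnlyList<string> lines)
    {
        // Start at index 2 so that both preceding lines always exist.
        for (int i = 2; i < lines.Count; i++)
        {
            if (lines[i].StartsWith("E40", StringComparison.Ordinal))
            {
                yield return string.Join(",", lines[i - 2], lines[i - 1], lines[i]);
            }
        }
    }

    // Hypothetical wrapper: scans every *.txt file in inputDir and writes
    // all rows to outputCsv (which should not itself match *.txt in inputDir).
    public static void Run(string inputDir, string outputCsv)
    {
        var rows = Directory.EnumerateFiles(inputDir, "*.txt")
            .SelectMany(file => ExtractRows(File.ReadAllLines(file)));
        File.WriteAllLines(outputCsv, rows);
    }
}
```

A call such as E40Extractor.Run(@"C:\ETT", @"C:\ETT\result.csv") would then process the whole directory; both paths are placeholders.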
Good luck.

Related

Delete specific row from File

I have this little project in C# where I am manipulating files. Now my task is to delete specific rows from files.
For example my file looks like this:
1-this is the first line
2-this is the second line
3-this is the third line
4-this is the fourth line
Now how can I keep only the first two rows and delete only the last two rows?
Note- this is how I read the file from my local machine:
string[] lines = File.ReadAllLines(@"C:\Users\admin\Desktop\COMMANDS.dat");
I have tried something like this, but I think it's not so "efficient":
string text = File.ReadAllText(@"C:\Users\admin\Desktop\COMMANDS.dat");
text = text.Replace(lines[2], "");
text = text.Replace(lines[3], "");
File.WriteAllText(@"C:\Users\admin\Desktop\COMMANDS.dat", text);
So this actually does the job: it replaces each of those lines with an empty string. But when I look at the file it still has 4 lines, 2 of them real strings and the other two just empty lines, and I don't want that. Can I do this another way?

Try removing each line together with the newline that precedes it:
string[] lines = File.ReadAllLines(@"C:\Users\admin\Desktop\COMMANDS.dat");
string text = File.ReadAllText(@"C:\Users\admin\Desktop\COMMANDS.dat");
// Assumes the file uses Environment.NewLine and that these are not the first line of the file.
text = text.Replace(Environment.NewLine + lines[2], "");
text = text.Replace(Environment.NewLine + lines[3], "");
File.WriteAllText(@"C:\Users\admin\Desktop\COMMANDS.dat", text);
If my answer is useful, please mark it as accepted, and upvote it.

async Task Example()
{
    var inputLines = await File.ReadAllLinesAsync("path/to/file.txt");
    var outputLines = inputLines.Where((l, i) => i < 2);
    await File.WriteAllLinesAsync("target/file.txt", outputLines);
}
What it does
Read data but not as one string but as a collection of lines
Create a new collection containing only the lines you want in your output
Write the filtered lines
Notes:
This example is not optimized for memory usage: it reads all lines at once, so it will fail for very large files (e.g. multiple GB). See the existing answers for a memory-optimized version. It's totally fine to do it this way if you know you have just a few thousand lines, and it's faster.
Try not to "modify" strings. This always creates a copy and needs a lot of memory.
In this "Linq style" (functional) approach, we should treat data as immutable. That means we have one variable that represents the input file and one variable that represents the result. We use declarative Linq to describe what the output should look like: "output is input where the line index is < 2" instead of "if xy remove line" in an imperative style.
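For files too large to hold in memory, a streaming variant is possible. The sketch below is only one way to do it, and the class name LineTrimmer, the method name, and the ".tmp" suffix are assumptions made for the example. File.ReadLines reads lazily, and Take stops after the requested number of lines, so the rest of the file is never loaded.

```csharp
using System.IO;
using System.Linq;

public static class LineTrimmer
{
    // Keeps only the first `keep` lines of the file at `path`.
    public static void KeepFirstLines(string path, int keep)
    {
        // Hypothetical temp-file name next to the original.
        string tempPath = path + ".tmp";

        // File.ReadLines is lazy; Take stops reading after `keep` lines.
        File.WriteAllLines(tempPath, File.ReadLines(path).Take(keep));

        // Swap the files only after the shortened copy has been written.
        File.Delete(path);
        File.Move(tempPath, path);
    }
}
```

Writing to a separate file first avoids reading and overwriting the same file at the same time.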

Reading and manipulating text files variable suggestion

I'm new to C# and am working on a fun maybe useful program for my education.
I've got data that's stored in files, one file for each entry, in the format below. I believe the values are in a particular order in the file as well.
There are around 100 values in each file. My program will basically alter a few of these
values and write them back to the file.
I'm trying to figure out how I should store these values. I know how to read the text file.
I thought about reading each line and storing it in an array. Does anyone have any other suggestions? Would this be a good use case for a class?
D:"value1"=00000800
D:"value2"=00000001
S:"value3"=full

Glad you have picked up C#. I hope you find learning it rewarding.
One of the methods I prefer when I want to modify a file in C# is to call File.ReadAllLines first and then File.WriteAllLines. For these two static methods, you will need using System.IO.
To parse the texts, you might need String.Split.
Here's an example:
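using System;
using System.IO;
class Test
{
    public static void Main()
    {
        var filepath = @"myfile.txt";
        // Read all lines.
        var allLines = File.ReadAllLines(filepath);
        // Modify your text here. A for loop is used because the array
        // elements have to be replaced, which foreach does not allow.
        for (int i = 0; i < allLines.Length; i++)
        {
            // Splitting D:"value1"=00000800 on ':', '"' and '=' gives
            // { "D", "", "value1", "", "00000800" }.
            var components = allLines[i].Split(new char[] { ':', '"', '=' });
            // Skip lines that are not in the expected format.
            if (components.Length != 5) continue;
            // Change X:"value_i"=Y to X:"value_i"=5 and store the rebuilt line.
            allLines[i] = components[0] + ":\"" + components[2] + "\"=5";
        }
        // Write all lines.
        File.WriteAllLines(filepath, allLines);
    }
}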
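On the question of whether a class is a good fit: one possible shape is sketched below. The type name Setting and its property names are hypothetical, and the parser assumes every line looks like the three sample lines (a prefix, a colon, a quoted name without '=' in it, an equals sign, and a value).

```csharp
using System;

// Hypothetical type representing one line such as D:"value1"=00000800.
public sealed class Setting
{
    public string Kind { get; set; }   // "D" or "S" in the sample
    public string Name { get; set; }   // value1, value2, ...
    public string Value { get; set; }  // text after the '='

    public static Setting Parse(string line)
    {
        int colon = line.IndexOf(':');
        int eq = line.IndexOf('=', colon + 1);
        if (colon < 0 || eq < 0)
        {
            throw new FormatException("Unexpected line: " + line);
        }
        return new Setting
        {
            Kind = line.Substring(0, colon),
            // Take the text between ':' and '=' and drop the surrounding quotes.
            Name = line.Substring(colon + 1, eq - colon - 1).Trim('"'),
            Value = line.Substring(eq + 1)
        };
    }

    // Rebuilds the line in the original file format.
    public override string ToString()
    {
        return Kind + ":\"" + Name + "\"=" + Value;
    }
}
```

With this, each line from File.ReadAllLines can be turned into a Setting, changed through its Value property, and written back with ToString, which keeps the original order of the entries.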

Removing text above real content of CSV file

I have a CSV whose author, annoyingly enough, has decided to 'introduce' the file before the contents themselves. So in all, I have a CSV that looks like:
This file was created by XXXXYY and represents the crossover between YY and QQQ.
Additional information can be found through the website GG, blah blah blah...
Jacob, Hybrid
Dan, Pure
Lianne, Hybrid
Jack, Hatchback
So the problem here is that I want to get rid of the first few lines before the 'real content' of the CSV file begins. I'm looking for robustness here, so using StreamReader and removing all content before the 4th line, for example, is not ideal (plus the length of the text can vary).
Is there a way in which one can read only what matters and write a new CSV into a directory path?
Regards,
genesis
(edit - I'm looking for C sharp code)

The solution depends on the files you have to parse. You need to look for a reliable pattern that distinguishes data from comment.
In your example, there are some possibilities that might be the same in other files:
There are 4 lines of text. But you say this isn't consistent across files.
The text lines may not contain the same number of commas as the data table. But that is unlikely to be reliable for all files.
There is a blank/whitespace-only line between the text and the data.
The data appears to be in the form word-comma-word. If this is true, it should be easy to identify non-data lines (any line which doesn't contain exactly one comma, or has multiple words, etc.).
You may be able to use a combination of these heuristics to more reliably detect the data.
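A combination of two of these heuristics could be sketched as follows. The class name CsvPreamble and the method name are assumptions made for the example, and the sketch relies on the introduction being separated from the data by a blank line: if there is no blank line at all, it returns nothing.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CsvPreamble
{
    // Drops everything before the first blank line, then keeps only
    // lines that contain exactly `expectedCommas` commas.
    public static List<string> DataLines(IEnumerable<string> lines, int expectedCommas)
    {
        return lines
            .SkipWhile(l => !string.IsNullOrWhiteSpace(l)) // skip the introduction
            .Where(l => l.Count(c => c == ',') == expectedCommas)
            .ToList();
    }
}
```

Counting commas alone would not be enough for the sample file, because the second introduction line also contains exactly one comma; skipping up to the blank line first removes it. The result can be passed to File.WriteAllLines to produce the cleaned CSV.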

You could scan by line (looking for the \r\n) and ignore lines that don't have a comma count that matches your csv.
You should be able to read the file into a string pretty easily unless it is really massive.
e.g.
var csv = "some test\r\nsome more text\r\na,b,c\r\nd,e,f\r\n";
// Split on the two-character line break; a char literal cannot hold "\r\n".
var lines = csv.Split(new[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries);
// Keep lines with exactly two commas (needs using System.Linq).
var csvLines = lines.Where(l => l.Count(c => c == ',') == 2);
// now csvLines contains only the lines you are after

List<string> info = new List<string>();
int counter = 0;
// Open the file to read from.
info = System.IO.File.ReadAllLines(path).ToList();
// Find the lines up until (& including) the empty one
foreach (string s in info)
{
    counter++;
    if (string.IsNullOrEmpty(s))
        break; //exit from the loop
}
// Remove the lines including the blank one.
info.RemoveRange(0, counter);
Something like this should work, you should probably put some tests in to make sure counter is not > length and other tests to handle errors.
You could adapt this code so that it just finds the empty line number using linq or something, but I don't like the overhead of linq (Yeah ironic considering I'm using c#).
Regards,
Slipoch

Simple csv comma delimited html table gen

I know the question has been asked before, but I wasn't quite satisfied with the answer (considering it didn't explain what was going on).
My specific question is: How do I open a csv/text file that's comma delimited, with rows separated by returns, and put it into an HTML table using C#?
I understand how to do this in PHP, but I have just started learning ASP.Net/C#. If anyone has some free resources for C# and/or is able to provide me with a snippet of code with some explanation of what's going on, I would appreciate it.
I have this code, but I'm not sure how I would use it because A) I don't know how C# arrays work and B) I'm not sure how to open files in ASP.Net C#:
var lines = File.ReadAllLines(args[0]);
using (var outfs = File.AppendText(args[1]))
{
    outfs.Write("<html><body><table>");
    foreach (var line in lines)
        outfs.Write("<tr><td>" + string.Join("</td><td>", line.Split(',')) + "</td></tr>");
    outfs.Write("</table></body></html>");
}
I apologize for my glaring inexperience here.

The code sample you posted does exactly what you're asking:
Opens a file.
Writes a string of HTML for a table with the contents of the file from step 1 to another file.
Let's break it down:
var lines =File.ReadAllLines(args[0]);
This opens the file specified in args[0] and reads all the lines into a string array, one line per element. See File.ReadAllLines Method (String).
using (var outfs = File.AppendText(args[1]))
{
File.AppendText Method creates a StreamWriter to append text to an existing file (or creates it if it doesn't exist). The filename (and path, possibly) are in args[1]. The using statement puts the StreamWriter into what is called a using block, to ensure the stream is correctly disposed once the using block is left. See using Statement (C# Reference) for more information.
outfs.Write("<html><body><table>");
outfs.Write calls the Write method of the StreamWriter (StreamWriter.Write Method (String)). In your code snippet the text goes to the StreamWriter's buffer first, so it may not reach the file until the buffer fills up or you exit the using block. Exiting the using block flushes the buffer and writes any remaining text to the file.
foreach (var line in lines)
This command starts a loop through all the elements in the string array lines, starting with the first (element 0) index. See foreach, in (C# Reference) for more information if you need it.
outfs.Write("<tr><td>" + string.Join("</td><td>", line.Split(',')) + "</td></tr>");
String.Join is the key part here, where most of the work is done. String.Join Method (String, String[]) has the technical details, but essentially what is happening here is that the second argument (line.Split(',')) is passing in an array of strings, and the strings in that array are then being concatenated together with the first argument (</td><td>) as the separator, and the table row is being opened and closed.
For example, if the line is "1,2,3,4,5,6", the Split gives you a 6 element array. This array is then concatenated with </td><td> as the separator by String.Join, so you have "1</td><td>2</td><td>3</td><td>4</td><td>5</td><td>6". "<tr><td>" is added to the front and "</td></tr>" is added to the end, and the final line is "<tr><td>1</td><td>2</td><td>3</td><td>4</td><td>5</td><td>6</td></tr>".
outfs.Write("</table></body></html>");
}
This writes the end of the HTML to the buffer, which is then flushed and written to the specified text file.
A couple of things to note. args[0] and args[1] are used to hold command line arguments (i.e., MakeMyTable.exe InFile.txt OutFile.txt), which aren't (in my experience) applicable to ASP.NET applications. You'll need to either code the files (and paths) necessary, or allow the user to specify the input file and/or output file. The ASP.NET application will need to be running under an account that has permission to access those files as well.
If you have quoted values in the CSV file, you'll need to handle those (this is very common when dealing with monetary amounts, for example), as splitting on the , may cause an incorrect split. I recommend taking a look at TextFieldParser, as it can handle quoted fields quite easily.
Unless you're sure that each line in the file has the same number of fields, you run the risk of having poorly formed HTML in your table and no guarantees on how it will render.
Additionally, it would be advisable to test that the file you're opening exists. There's probably more, but these are the basics (and may already be beyond the scope of Stack Overflow).
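One further point not covered above is HTML-encoding the field values, so that a field containing < or & does not break the markup. The sketch below is a minimal illustration; the class name HtmlTable and the method name are assumptions made for the example. It still splits on plain commas, so quoted CSV fields would need TextFieldParser or a similar parser instead.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text;

public static class HtmlTable
{
    // Builds an HTML table with one row per input line and one cell per
    // comma-separated field. Does not handle quoted CSV fields.
    public static string FromCsvLines(IEnumerable<string> lines)
    {
        var sb = new StringBuilder("<table>");
        foreach (string line in lines)
        {
            sb.Append("<tr>");
            foreach (string field in line.Split(','))
            {
                // Encode the value so characters such as < and & display literally.
                sb.Append("<td>").Append(WebUtility.HtmlEncode(field)).Append("</td>");
            }
            sb.Append("</tr>");
        }
        return sb.Append("</table>").ToString();
    }
}
```

The returned string could be written to a file, or sent to the page in an ASP.NET application, in place of the outfs.Write calls.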

Hopefully this will help point you in the right direction:
string html = "<table>";
foreach (string line in lines)
{
    // Replace returns a new string, so its result has to be used.
    html += "<tr><td>" + line.Replace(",", "</td><td>") + "</td></tr>";
}
Response.Write(html + "</table>");
Good luck with your learning!

How should I use the following code in VS?

We recently received a bunch of files with tab-delimiters.
We were having difficulties importing them in sql server database.
The vendor who sent the files also sent the code below for us to use in converting the files from tab to comma delimiters.
How do I use this file in visual studio.
I have used Visual Studio several times before, but I have not used it with just a single file such as this.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
namespace TabToComma
{
    class Program
    {
        static void Main(string[] args)
        {
            StreamReader sr;
            StreamWriter sw;
            sr = new StreamReader(@"c:\input.txt");
            sw = new StreamWriter(@"c:\output.txt");
            string nextline;
            string replacedline;
            while (sr.Peek() >= 0)
            {
                nextline = sr.ReadLine();
                replacedline = nextline.Replace('\t',','); // replace each tab in line with a comma
                sw.WriteLine(replacedline);
            }
            sr.Close();
            sw.Close();
        }
    }
}
Alternatively, if someone knows how I can accomplish same thing using vbscript please point me in the right direction.
Thanks a lot in advance

Create a console app, and replace contents of generated program.cs with the text above. And then, hit RUN :)

You need to create a new Console application and then paste this code into the example file created as part of the solution. Then change the "c:\input.txt" to be the file you want to convert and then hit run.

Also, here's a replacement for the content of Main() that might make your life easier, as long as the files are small enough to read into memory:
foreach (string f in args) {
    System.IO.File.WriteAllText(f, System.IO.File.ReadAllText(f).Replace('\t', ','));
}
Compile and drag and drop all your files onto the resulting executable. They'll be converted automatically.
You can even grab the compiled executable from here: http://dl.dropbox.com/u/2463964/TabsToCommas.exe if you're having trouble compiling it.

OK, it was nice playing in the answers with all kinds of methods for replacing characters in a string. But unfortunately, reality is not as easy as that. How do you handle data with commas in it, for example? Telephone bill{tab}USD{tab}1,234.00 becomes Telephone bill,USD,1,234.00. An extra column is inserted and the data gets corrupted, because the database registers that your telephone bill was only one dollar. Luckily, the problem is not the other way around, because even The Scripting Guy doesn't have a watertight solution for that.
What your vendor should have delivered is a line-by-line reader, where every line is split on the tab character into an array of values. Each value in the array is then checked for one or more commas and, if it has any, wrapped in double quotes. After that, the array is assembled into a string with a join on the comma to make it a 'real' CSV file.
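The conversion described above might look roughly like this. The class name TabToCsv and the method names are assumptions made for the example. Beyond what the paragraph describes, the sketch also quotes values that contain a double quote or a line break and doubles embedded quotes, which is how RFC 4180 defines CSV escaping.

```csharp
using System;
using System.Linq;

public static class TabToCsv
{
    // Converts one tab-delimited line into one comma-delimited CSV line.
    public static string ConvertLine(string tabLine)
    {
        var fields = tabLine.Split('\t').Select(f => Quote(f));
        return string.Join(",", fields);
    }

    // Wraps the value in double quotes when it contains a comma, a double
    // quote or a line break, doubling any embedded double quotes.
    private static string Quote(string field)
    {
        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        return needsQuotes
            ? "\"" + field.Replace("\"", "\"\"") + "\""
            : field;
    }
}
```

In the vendor's loop, sw.WriteLine(TabToCsv.ConvertLine(nextline)) would replace the plain Replace call, so the amount 1,234.00 stays in a single column.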
But why go through all the hassle if you can tackle the problem at the source; why not flag your data as tab delimited in SQL?
BULK INSERT TableYouWantToImportTo
FROM 'c:\input.txt'
WITH
(
    FIELDTERMINATOR = '\t',
    ROWTERMINATOR = '\n'
)
GO
