Search specific string and return whole line - c#

What I would like to do is find all instances of a string in a text file, then add the full lines containing the said string to an array.
For example:
eng GB English
lir LR Liberian Creole English
mao NZ Maori
Searching eng, for example, must add the first two lines to the array, including of course the many more instances of 'eng' in the file.
How can this be done, using a text file input and C#?

You can use a TextReader to read each line and search it; if a line contains what you want, add it to a list of strings:
List<string> found = new List<string>();
string line;
using(StreamReader file = new StreamReader("c:\\test.txt"))
{
while((line = file.ReadLine()) != null)
{
if(line.Contains("eng"))
{
found.Add(line);
}
}
}
Or you can use yield return to return an enumerable.

One line:
using System.IO;
using System.Linq;
var result = File.ReadAllLines(@"c:\temp").Where(s => s.Contains("eng"));
Or, if you want a more memory efficient solution, you can roll an extension method. You can use FileInfo, FileStream, etc. as the base handler:
public static IEnumerable<string> ReadAndFilter(this FileInfo info, Predicate<string> condition)
{
string line;
using (var reader = new StreamReader(info.FullName))
{
while ((line = reader.ReadLine()) != null)
{
if (condition(line))
{
yield return line;
}
}
}
}
Usage:
var result = new FileInfo(path).ReadAndFilter(s => s.Contains("eng"));

You can try the following code; I tried it and it worked:
string searchKeyword = "eng";
string fileName = "Some file name here";
string[] textLines = File.ReadAllLines(fileName);
List<string> results = new List<string>();
foreach (string line in textLines)
{
if (line.Contains(searchKeyword))
{
results.Add(line);
}
}

The File object contains a static ReadLines method that returns line-by-line, in contrast with ReadAllLines which returns an array and thus needs to load the complete file in memory.
So, by using File.ReadLines and LINQ an efficient and short solution could be written as:
var found = File.ReadLines(path).Where(line => line.Contains("eng")).ToArray();
As for the original question, it could be optimized further by replacing line.Contains with line.StartsWith, as it seems the required term appears at the beginning of each line.
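One caveat about the sample data: the second line ("lir LR Liberian Creole English") only contains the term inside "English", with a capital E, so neither a prefix test nor a case-sensitive Contains("eng") would pick it up. Below is a minimal sketch of a case-insensitive search, assuming that is the behaviour the question wants. The class and method names (LineSearch, FindLines) are hypothetical and not part of any answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LineSearch
{
    // Returns every line that contains the term, ignoring case.
    // Accepts any sequence of lines, so File.ReadLines(path) can be streamed in.
    public static string[] FindLines(IEnumerable<string> lines, string term)
    {
        return lines
            .Where(line => line.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0)
            .ToArray();
    }
}
```

Usage would look something like string[] found = LineSearch.FindLines(File.ReadLines(path), "eng"); where path is whatever file you are searching.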

Related

C# Regex Pattern to remove comma inside double quote delimited string

I can't be the first person to have this issue but hours of searching Stack revealed nothing close to an answer. I have an SSIS script that works over a directory of csv files. This script folds, bends and mutilates these files; performs queries, data cleansing, persists some data and finally outputs a small set to csv file that is ingested by another system.
One of the files has a free text field that contains the value: "20,000 BONUS POINTS". This one field, in a file of 10k rows, one of dozens of similar files, is the problem that I can't seem to solve.
Be advised: I'm weak on both C# and Regex.
Sample csv set:
4121,6383,0,,,TRUE
4122,6384,0,"20,000 BONUS POINTS",,TRUE
4123,6385,,,,
4124,6386,0,,,TRUE
4125,6387,0,,,TRUE
4126,6388,0,,,TRUE
4127,6389,0,,,TRUE
4128,6390,0,,,TRUE
I found plenty of information on how to parse this using a variety of Regex patterns but what I've noticed is the StreamReader.ReadLine() method wraps the complete line with double quotes:
"4121,6383,0,,,TRUE"
such that the output of the regex Replace method:
s = Regex.Replace(line, @"[^\""]([^\""])*[^\""]",
m => m.Value.Replace(",", ""));
looks like this:
412163830TRUE
and the target line that actually contains a double quote delimited string ends up looking like:
"412263840\"20000 BONUS POINTS\"TRUE"
My entire method (for your reading pleasure) is this:
string fileDirectory = "C:\\tmp\\Unzip\\";
string fullPath = "C:\\tmp\\Unzip\\test.csv";
string line = "";
//int count=0;
List<string> list = new List<string>();
try
{
//MessageBox.Show("inside Try Block");
string s = null;
StreamReader infile = new StreamReader(fullPath);
StreamWriter outfile = new StreamWriter(Path.Combine(fileDirectory, "output.csv"));
while ((line = infile.ReadLine()) != null)
{
//line.Substring(0,1).Substring(line.Length-1, 1);
System.Console.WriteLine(line);
Console.WriteLine(line);
line =
s = Regex.Replace(line, @"[^\""]([^\""])*[^\""]",
m => m.Value.Replace(",", ""));
System.Console.WriteLine(s);
list.Add(s);
}
foreach (string item in list)
{
outfile.WriteLine(item);
};
infile.Close();
outfile.Close();
//System.Console.WriteLine("There were {0} lines.", count);
}
catch (Exception e)
{
Console.WriteLine(e.Message);
}
//another addition for TFS consumption
}
Thanks for reading, and if you have a useful answer, bless you and your progeny for generations to come!
mfc
EDIT: The requirement is a valid csv file output. In the case of the test data, it would look like this:
4121,6383,0,,,TRUE
4122,6384,0,"20000 BONUS POINTS",,TRUE
4123,6385,,,,
4124,6386,0,,,TRUE
4125,6387,0,,,TRUE
4126,6388,0,,,TRUE
4127,6389,0,,,TRUE
4128,6390,0,,,TRUE
I recommend using a CSV reader lib like others have suggested.
Install-Package LumenWorksCsvReader
https://github.com/phatcher/CsvReader#getting-started
However, if you just want to try something fast and dirty, give this a try.
If I understand correctly, you need to remove commas between double quotes within each line of a CSV file. This should do that.
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
string pattern = @"([""'])(?:(?=(\\?))\2.)*?\1";
List<string> lines = new List<string>();
lines.Add("4121,6383,0,,,TRUE");
lines.Add("4122,6384,0,\"20,000 BONUS POINTS\",,TRUE");
lines.Add("4123,6385,,,,");
lines.Add("4124,6386,0,,,TRUE");
lines.Add("4125,6387,0,,,TRUE");
lines.Add("4126,6388,0,,,TRUE");
lines.Add("4127,6389,0,,,TRUE");
lines.Add("4128,6390,0,,,TRUE");
StringBuilder sb = new StringBuilder();
foreach (var line in lines)
{
sb.Append(Regex.Replace(line, pattern, m => m.Value.Replace(",", ""))+"\n");
}
Console.WriteLine(sb.ToString());
}
}
OUTPUT
4121,6383,0,,,TRUE
4122,6384,0,"20000 BONUS POINTS",,TRUE
4123,6385,,,,
4124,6386,0,,,TRUE
4125,6387,0,,,TRUE
4126,6388,0,,,TRUE
4127,6389,0,,,TRUE
4128,6390,0,,,TRUE
https://dotnetfiddle.net/flmWG3
I haven't tried with numerous lines, but this would be my first approach:
namespace ConsoleTestApplication
{
class Program
{
static void Main(string[] args)
{
var before = "4122,6384,0,\"20,000 BONUS POINTS\",,TRUE";
var pattern = @"""[^""]*""";
var after = Regex.Replace(before, pattern, match => match.Value.Replace(",", ""));
Console.WriteLine(after);
}
}
}
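For comparison, here is a rough sketch of the same clean-up without a regex: walk the line once and track whether the current character sits inside double quotes. It assumes a quoted field never spans more than one line, and the names CsvCleaner and StripQuotedCommas are made up for illustration.

```csharp
using System.Text;

public static class CsvCleaner
{
    // Removes commas that appear between double quotes;
    // every other character is copied through unchanged.
    public static string StripQuotedCommas(string line)
    {
        var sb = new StringBuilder(line.Length);
        bool inQuotes = false;
        foreach (char c in line)
        {
            if (c == '"')
            {
                inQuotes = !inQuotes; // entering or leaving a quoted field
            }
            if (c == ',' && inQuotes)
            {
                continue; // drop commas inside a quoted field
            }
            sb.Append(c);
        }
        return sb.ToString();
    }
}
```

Calling CsvCleaner.StripQuotedCommas on each line returned by ReadLine() and writing the result out should leave unquoted lines untouched.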

Writing List<String> contents to text file after deleting string

I'm trying to get the contents of a text file, delete a line, and write the remaining lines back to the file. I'm using StreamReader to get the text, importing it into a List, removing the string, then rewriting using StreamWriter. My problem arises somewhere around the removing or writing of the string. Instead of writing the existing, non-deleted contents back to the text file, all the text is replaced with:
System.Collections.Generic.List`1[System.String]
My code for this function is as follows:
{
for (int i = deleteDevice.Count - 1; i >= 0; i--)
{
string split = "";
//deleteDevice[i].Split(',').ToString();
List<string> parts = split.Split(',').ToList();
if (parts.Contains(deviceList.SelectedItem.ToString()))
{
deleteDevice.Remove(i.ToString());
}
}
if (deleteDevice.Count != 0) //Error Handling
{
writer.WriteLine(deleteDevice);
}
}
deviceList.Items.Remove(deviceList.SelectedItem);
}
I would just like the script to write back any string that isn't deleted (If there is any), without replacing it. Any help is appreciated, Cheers
You can read all the info from the text file into a list and then remove from the list and rewrite that to the text file.
I would change the list 'deleteDevice' to store a string array instead and use the code below to determine which item to remove.
List<int> toRemove = new List<int>();
int i = 0;
/*build a list of indexes to remove*/
foreach (string[] x in deleteDevice)
{
if (x[0].Contains(deviceList.SelectedItem.ToString()))
{
toRemove.Add(i);
}
i++;
}
/*Remove items from list, highest index first so earlier removals don't shift the later indexes*/
for (int k = toRemove.Count - 1; k >= 0; k--)
deleteDevice.RemoveAt(toRemove[k]);
/*write to text file*/
using (StreamWriter writer = new StreamWriter("Devices.txt"))
{
if (deleteDevice.Count != 0) //Error Handling
{
foreach (string[] s in deleteDevice)
{
StringBuilder sb = new StringBuilder();
for (int fds = 0; fds < s.Length; fds++ )
{
sb.Append(s[fds] + ",");
}
string line = sb.ToString();
writer.WriteLine(line.Substring(0, line.Length - 1));
}
}
}
This isn't the best solution but should work for your needs. There's probably a much easier way of doing this.
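One possibly easier route, sketched here under the assumption that deleteDevice stays a List<string> of comma-separated lines: List<T>.RemoveAll takes a predicate and does the index bookkeeping itself. The DeviceList class and RemoveDevice method are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeviceList
{
    // Removes every comma-separated line that has the device name
    // as one of its fields. Returns the number of lines removed.
    public static int RemoveDevice(List<string> lines, string device)
    {
        return lines.RemoveAll(line => line.Split(',').Contains(device));
    }
}
```

After the call, the remaining lines can be written back in one go with File.WriteAllLines("Devices.txt", deleteDevice).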
The problem is in the following line:
writer.WriteLine(deleteDevice);
You're writing deleteDevice (I assume this is of type List<string>). List<string>.ToString() returns the type name of the list, because List<T> does not override ToString(). What you want is
foreach(String s in deleteDevice)
{
writer.WriteLine(s);
}
Problems
deleteDevice is of type List<string>, and because List<T> doesn't override ToString(), the default behaviour of List<string>.ToString() is to return the name of the type.
Hence your line writer.WriteLine(deleteDevice); writes the string System.Collections.Generic.List`1[System.String].
Other than that, there are many things wrong with your code...
For example, you do this:
string split = "";
and then on the line afterwards you do this:
List<string> parts = split.Split(',').ToList();
But because split is "", this will always return an empty list.
Solution
To simplify the code, you could first write a helper method that will remove from a file all the lines that match a specified predicate:
public void RemoveUnwantedLines(string filename, Predicate<string> unwanted)
{
var lines = File.ReadAllLines(filename);
File.WriteAllLines(filename, lines.Where(line => !unwanted(line)));
}
Then you can write the predicate something like this (this might not be quite right; I don't really know exactly what your code is doing because it's not compilable and omits some of the types):
string filename = "My Filename";
string deviceToRemove= deviceList.SelectedItem.ToString();
Predicate<string> unwanted = line =>
line.Split(new [] {','})
.Contains(deviceToRemove);
RemoveUnwantedLines(filename, unwanted);

C# CSV file to array/list

I want to read 4-5 CSV files into an array in C#.
I know this question has been asked before, and I have gone through the answers...
But my use of CSVs is much simpler than that...
I have CSV files with columns of the following data types:
string , string
These strings contain no ',' so there's no trouble there...
That's it. And they aren't very big. Only about 20 records in each.
I just want to read them into a C# array...
Is there any very simple and direct way to do that?
To read the file, use
TextReader reader = File.OpenText(filename);
To read a line:
string line = reader.ReadLine();
then
string[] tokens = line.Split(',');
to separate them.
By using a loop around the two last example lines, you could add each array of tokens into a list, if that's what you need.
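The loop described above might look roughly like this. It is only a sketch: SimpleCsv and ReadRows are hypothetical names, and it relies on the question's statement that no field contains a comma.

```csharp
using System.Collections.Generic;
using System.IO;

public static class SimpleCsv
{
    // Reads comma-separated lines from any TextReader
    // and returns one string[] of tokens per non-empty line.
    public static List<string[]> ReadRows(TextReader reader)
    {
        var rows = new List<string[]>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0)
            {
                continue; // skip blank lines
            }
            rows.Add(line.Split(','));
        }
        return rows;
    }
}
```

With a file, that would be used as using (TextReader reader = File.OpenText(filename)) { var rows = SimpleCsv.ReadRows(reader); } and rows.ToArray() gives an array if one is really needed.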
This one includes the quotes & commas in fields. (assumes you're doing a line at a time)
using Microsoft.VisualBasic.FileIO; //For TextFieldParser
// blah blah blah
StringReader csv_reader = new StringReader(csv_line);
TextFieldParser csv_parser = new TextFieldParser(csv_reader);
csv_parser.SetDelimiters(",");
csv_parser.HasFieldsEnclosedInQuotes = true;
string[] csv_array = csv_parser.ReadFields();
Here is a simple way to get a CSV content to an array of strings. The CSV file can have double quotes, carriage return line feeds and the delimiter is a comma.
Here are the libraries that you need:
System.IO;
System.Collections.Generic;
System.IO is for FileStream and StreamReader class to access your file. Both classes implement the IDisposable interface, so you can use the using statements to close your streams. (example below)
The System.Collections.Generic namespace is for generic collections such as IList<T> and List<T>. In this example, we'll use the List class, because Lists are better than Arrays in my honest opinion. However, before I return our outbound variable, I'll call the .ToArray() member method to return the array.
There are many ways to get content from your file, I personally prefer to use a while(condition) loop to iterate over the contents. In the condition clause, use !lReader.EndOfStream. While not end of stream, continue iterating over the file.
public string[] GetCsvContent(string iFileName)
{
List<string> oCsvContent = new List<string>();
using (FileStream lFileStream =
new FileStream(iFileName, FileMode.Open, FileAccess.Read))
{
StringBuilder lFileContent = new StringBuilder();
using (StreamReader lReader = new StreamReader(lFileStream))
{
// flag if a double quote is found
bool lContainsDoubleQuotes = false;
// a string for the csv value
string lCsvValue = "";
// loop through the file until you read the end
while (!lReader.EndOfStream)
{
// stores each line in a variable
string lCsvLine = lReader.ReadLine();
// for each character in the line...
foreach (char lLetter in lCsvLine)
{
// check if the character is a double quote
if (lLetter == '"')
{
if (!lContainsDoubleQuotes)
{
lContainsDoubleQuotes = true;
}
else
{
lContainsDoubleQuotes = false;
}
}
// if we come across a comma
// AND it's not within a double quote..
if (lLetter == ',' && !lContainsDoubleQuotes)
{
// add our string to the array
oCsvContent.Add(lCsvValue);
// null out our string
lCsvValue = "";
}
else
{
// add the character to our string
lCsvValue += lLetter;
}
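}
// end of line: store the last value, unless we are still inside a quoted field
if (!lContainsDoubleQuotes)
{
oCsvContent.Add(lCsvValue);
lCsvValue = "";
}
else
{
// keep the line break that belongs to the quoted value
lCsvValue += Environment.NewLine;
}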
}
}
}
return oCsvContent.ToArray();
}
Hope this helps! Very easy and very quick.
Cheers!

(C#) How to read all files in a folder to find specific lines?

I'm very new to C# so please have some extra patience. What I am looking to do is read all files in a folder, to find a specific line (which can occur more than once in the same file) and get that output to show onscreen.
If anyone could point me in the direction to which methods I need to use it would be great.
Thanks!
Start with
const string lineToFind = "blah-blah";
var fileNames = Directory.GetFiles(@"C:\path\here");
foreach (var fileName in fileNames)
{
int line = 1;
using (var reader = new StreamReader(fileName))
{
// read file line by line
string lineRead;
while ((lineRead = reader.ReadLine()) != null)
{
if (lineRead == lineToFind)
{
Console.WriteLine("File {0}, line: {1}", fileName, line);
}
line++;
}
}
}
As Nick pointed out below, you can run the search in parallel using the Task Parallel Library; just replace 'foreach' with Parallel.ForEach(fileNames, fileName => {..});
Directory.GetFiles: http://msdn.microsoft.com/en-us/library/07wt70x2
StreamReader: http://msdn.microsoft.com/en-us/library/f2ke0fzy.aspx
What output do you want to get on the screen?
If you want to find the first file with the given line, you can use this short code:
var firstMatchFilePath = Directory.GetFiles(@"C:\Temp", "*.txt")
.FirstOrDefault(fn => File.ReadLines(fn)
.Any(l => l == lineToFind));
if (firstMatchFilePath != null)
MessageBox.Show(firstMatchFilePath);
I've used Directory.GetFiles with a search pattern to find all text files in a directory. I've used the LINQ extension methods FirstOrDefault and Any to find the first file with a given line.
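If every occurrence is wanted rather than just the first matching file, the same building blocks can be combined in an iterator. This is a sketch only; FolderSearch and FindMatches are hypothetical names, and the "file:line" output format is an arbitrary choice.

```csharp
using System.Collections.Generic;
using System.IO;

public static class FolderSearch
{
    // Lazily yields "fileName:lineNumber" for every line
    // that is exactly equal to lineToFind.
    public static IEnumerable<string> FindMatches(string folder, string pattern, string lineToFind)
    {
        foreach (string fileName in Directory.GetFiles(folder, pattern))
        {
            int lineNumber = 0;
            // File.ReadLines streams the file instead of loading it whole
            foreach (string line in File.ReadLines(fileName))
            {
                lineNumber++;
                if (line == lineToFind)
                {
                    yield return fileName + ":" + lineNumber;
                }
            }
        }
    }
}
```

Each result can then be printed with Console.WriteLine or collected into a list.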

Searching strings in txt file

I have a .txt file with a list of 174 different strings. Each string has a unique identifier.
For example:
123|this data is variable|
456|this data is variable|
789|so is this|
etc..
I wish to write a program in C# that will read the .txt file and display only one of the 174 strings if I specify the ID of the string I want. This is because all the data in the file is variable, so only the ID can be used to pull the string. So instead of ending up with the example above, I get just one line.
eg just
123|this data is variable|
I seem to be able to write a program that will pull just the ID from the .txt file and not the entire string, or a program that merely reads the whole file and displays it. But I have yet to write one that does exactly what I need. HELP!
Well, the actual string I get out of the txt file has no '|'; they were just in the example. An example of the real string would be: 0111111(0010101) where the data in the brackets is variable. The brackets don't exist in the real string either.
namespace String_reader
{
class Program
{
static void Main(string[] args)
{
String filepath = @"C:\my file name here";
string line;
if(File.Exists(filepath))
{
StreamReader file = null;
try
{
file = new StreamReader(filepath);
while ((line = file.ReadLine()) !=null)
{
string regMatch = "ID number here"; //this is where it all falls apart.
Regex.IsMatch (line, regMatch);
Console.WriteLine (line);// When program is run it just displays the whole .txt file
}
}
finally{
if (file !=null)
file.Close();
}
}
Console.ReadLine();
}
}
}
Use a Regex. Something along the lines of Regex.Match("|"+inputString+"|",@"\|[ ]*\d+\|(.+?)\|").Groups[1].Value
Oh, I almost forgot; you'll need to substitute the \d+ with the actual ID you want. As written, it will just get you the first one.
The "|" before and after the input string makes sure both the index and the value are enclosed in a | for all elements, including the first and last. There are ways of doing it without that, but IMHO they just make the regex more complicated and less readable.
Assuming you have path and id.
Console.WriteLine(File.ReadAllLines(path).Where(l => l.StartsWith(id + "|")).FirstOrDefault());
Use File.ReadAllLines to get a string array of lines, then split each line on the |
You could use the Split method:
FileInfo info = new FileInfo("filename.txt");
// split into lines rather than on spaces, since the text itself contains spaces
String[] lines = info.OpenText().ReadToEnd().Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
foreach(String line in lines)
{
int id = Convert.ToInt32(line.Split('|')[0]);
string text = line.Split('|')[1];
}
Read the data into a string
Split the string on "|"
Read the items 2 by 2: key:value,key:value,...
Add them to a dictionary
Now you can easily find your string with dictionary[key].
First load the whole file into a string,
then try this:
string s = "123|this data is variable| 456|this data is also variable| 789|so is this|";
int index = s.IndexOf("123", 0);
string temp = s.Substring(index,s.Length-index);
string[] splitStr = temp.Split('|');
Console.WriteLine(splitStr[1]);
hope this is what you are looking for.
private static IEnumerable<string> ReadLines(string fspec)
{
using (var reader = new StreamReader(new FileStream(fspec, FileMode.Open, FileAccess.Read, FileShare.Read)))
{
while (!reader.EndOfStream)
yield return reader.ReadLine();
}
}
var dict = ReadLines("input.txt")
.Select(s =>
{
var split = s.Split("|".ToArray(), 2);
return new {Id = Int32.Parse(split[0]), Text = split[1]};
})
.ToDictionary(kv => kv.Id, kv => kv.Text);
Please note that with .NET 4.0 you don't need the ReadLines function, because there is File.ReadLines.
You can now work with that as any dictionary:
Console.WriteLine(dict[12]);
Console.WriteLine(dict[999]);
No error handling here, please add your own
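As one way of adding that error handling, here is a sketch that skips malformed lines with int.TryParse. It assumes the "ID|text|" layout from the question's example; IdLookup and Build are hypothetical names.

```csharp
using System.Collections.Generic;

public static class IdLookup
{
    // Builds an ID -> text map from lines shaped like "123|some text|".
    // Lines without a numeric ID before the first '|' are skipped.
    public static Dictionary<int, string> Build(IEnumerable<string> lines)
    {
        var dict = new Dictionary<int, string>();
        foreach (string line in lines)
        {
            string[] parts = line.Split(new[] { '|' }, 2);
            int id;
            if (parts.Length == 2 && int.TryParse(parts[0], out id))
            {
                // a later duplicate ID overwrites an earlier one
                dict[id] = parts[1].TrimEnd('|');
            }
        }
        return dict;
    }
}
```

Looking a value up with dict.TryGetValue(id, out text) instead of dict[id] avoids a KeyNotFoundException when the ID is not in the file.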
You can use the Split method to divide the entire text into parts separated by '|'. Then all even elements will correspond to numbers and all odd elements to strings.
StreamReader sr = new StreamReader(filename);
string text = sr.ReadToEnd();
string[] data = text.Split('|');
Then convert the data elements to numbers and strings, i.e. a List<int> IDs and a List<string> Strs. Find the index of the given ID with idx = IDs.IndexOf(ID); the corresponding string will be Strs[idx].
List<int> IDs = new List<int>();
List<string> Strs = new List<string>();
for (int i = 0; i < data.Length - 1; i += 2)
{
IDs.Add(int.Parse(data[i]));
Strs.Add(data[i + 1]);
}
int idx = IDs.IndexOf(ID); // we get ID from input; -1 means not found
string answer = idx >= 0 ? Strs[idx] : null;
