Extracting a specific part of a text file in C#

I usually add strings from a text file to a list or array line by line, but I am now using "#" lines as separators in the text file. How can I read the two strings "softpedia.com" and "download.com" into a list, using the two "#" lines as boundaries? Bear in mind that there might be more or fewer strings in between the two hashes.
For example:
# Internal Hostnames
softpedia.com
download.com
# External Hostnames
Expected output:
softpedia.com
download.com

class Program
{
static void Main()
{
using (var reader = File.OpenText("test.txt"))
{
foreach (var line in Parse(reader))
{
Console.WriteLine(line);
}
}
}
public static IEnumerable<string> Parse(StreamReader reader)
{
string line;
bool first = false;
while ((line = reader.ReadLine()) != null)
{
if (!line.StartsWith("#"))
{
if (first)
{
yield return line;
}
}
else if (!first)
{
first = true;
}
else
{
yield break;
}
}
}
}
And if you just want them in a list:
using (var reader = File.OpenText("test.txt"))
{
List<string> hostnames = Parse(reader).ToList();
}

Read it into a buffer and let regex do the work.
string input = @"
# Internal Hostnames
softpedia.com
download.com
# External Hostnames
";
string pattern = @"^(?!#)(?<Text>[^\r\s]+)(?:\s?)";
Regex.Matches(input, pattern, RegexOptions.Multiline)
.OfType<Match>()
.Select (mt => mt.Groups["Text"].Value)
.ToList()
.ForEach( site => Console.WriteLine (site));
/* Outputs
softpedia.com
download.com
*/

It sounds like you want to read all of the lines in between a set of # start lines. If so, try the following:
List<string> ReadLines(string filePath) {
var list = new List<string>();
var foundStart = false;
foreach (var line in File.ReadAllLines(filePath)) {
if (line.Length > 0 && line[0] == '#') {
if (foundStart) {
return list;
}
foundStart = true;
} else if (foundStart) {
list.Add(line);
}
}
return list;
}
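For comparison, the same "lines between the first two # markers" idea can also be written with LINQ. This is only a sketch: the class and method names (HashSection, Between) are made up for illustration, and it assumes a marker is any line that starts with "#".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HashSection
{
    // Returns the lines after the first '#' line, up to (not including) the next '#' line.
    public static List<string> Between(IEnumerable<string> lines)
    {
        return lines
            .SkipWhile(l => !IsMarker(l))   // drop everything before the first marker
            .Skip(1)                        // drop the first marker itself
            .TakeWhile(l => !IsMarker(l))   // keep lines until the next marker
            .ToList();
    }

    // Assumption: a marker is any line whose first character is '#'.
    private static bool IsMarker(string line)
    {
        return line.StartsWith("#", StringComparison.Ordinal);
    }
}
```

It could be called as HashSection.Between(File.ReadLines("test.txt")), which reads the file lazily and stops at the second marker.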

Related

How To read Block of text in a text file

I have a file in which I have to read the text between StartScriptExpression$ and FinishScriptExpression$, and also the text between StartUpdateDescription$ and FinishUpdateDescription$.
The problem is that I want to rewrite the code in a cleaner way.
My Code:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
namespace Vesrion
{
class Program
{
static void Main(string[] args)
{
string path = @"C:\Users\Development\Desktop\Read\Test.txt";
using (var reader = new StreamReader(path))
{
var textInBetween = new List<string>();
var ListOFDescription = new List<string>();
string NewString = "";
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
//Reads First line,
switch (line)
{
case "StartScriptExpression$":
continue;
case "FinishScriptExpression$":
if (line.Contains("FinishScriptExpression$"))
{
line = "";
}
string Something = string.Join("", textInBetween);
textInBetween = line.Split(',').ToList();
string[] lines = Something.Split(
new string[] { Environment.NewLine },
StringSplitOptions.None);
foreach (var S in lines)
{
ListOFDescription.Add(S);
Console.WriteLine(S);
}
NewString += ListOFDescription;
break;
case "StartUpdateDescription$":
//Console.WriteLine(Environment.NewLine);
continue;
case "FinishUpdateDescription$":
// Console.WriteLine(Environment.NewLine);
continue;
default:
textInBetween.Add(line);
//Console.WriteLine(line);
break;
}
}
}
}
}
}
The text between the start and finish script expression markers must end up in a list (or array) of strings.
The text between StartUpdateDescription$ and FinishUpdateDescription$ must end up in a single string.
One way to do it is using regular expression https://dotnetfiddle.net/pxBAMv
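Since the answer only links to a fiddle, here is a rough sketch of what a regex-based approach could look like. It is not the code behind that link: the names BlockReader and Between are hypothetical, and it assumes each marker sits on its own line with at least one line of text between a start marker and its finish marker.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class BlockReader
{
    // Returns the text found between each start/finish marker pair.
    public static List<string> Between(string text, string start, string finish)
    {
        // Regex.Escape is needed because the markers end in '$', a regex metacharacter.
        string pattern = Regex.Escape(start) + @"\r?\n(.*?)\r?\n" + Regex.Escape(finish);

        // Singleline lets '.' match newlines, so one block may span several lines.
        return Regex.Matches(text, pattern, RegexOptions.Singleline)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();
    }
}
```

Each returned string is one whole block, which fits the "single string" requirement for the update description. For the script expression, a block can be split further into lines to get the list of strings.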

Add two lines from csv file to array(s)

I have a csv file with the following data:
500000,0.005,6000
690000,0.003,5200
I need to add each line as a separate array, so 500000, 0.005, 6000 would be array1. How would I do this?
Currently my code adds each column into one element.
For example, data[0] is showing:
500000
690000
static void ReadFromFile(string filePath)
{
try
{
// Create an instance of StreamReader to read from a file.
// The using statement also closes the StreamReader.
using (StreamReader sr = new StreamReader(filePath))
{
string line;
// Read and display lines from the file until the end of
// the file is reached.
while ((line = sr.ReadLine()) != null)
{
string[] data = line.Split(',');
Console.WriteLine(data[0] + " " + data[1]);
}
}
}
catch (Exception e)
{
// Let the user know what went wrong.
Console.WriteLine("The file could not be read:");
Console.WriteLine(e.Message);
}
}
Using the limited data set you've provided...
const string test = @"500000,0.005,6000
690000,0.003,5200";
var result = test.Split('\n')
.Select(x=> x.Split(',')
.Select(y => Convert.ToDecimal(y))
.ToArray()
)
.ToArray();
foreach (var element in result)
{
Console.WriteLine($"{element[0]}, {element[1]}, {element[2]}");
}
Can it be done without LINQ? Yes, but it's messy...
const string test = @"500000,0.005,6000
690000,0.003,5200";
List<decimal[]> resultList = new List<decimal[]>();
string[] lines = test.Split('\n');
foreach (var line in lines)
{
List<decimal> decimalValueList = new List<decimal>();
string[] splitValuesByComma = line.Split(',');
foreach (string value in splitValuesByComma)
{
decimal convertedValue = Convert.ToDecimal(value);
decimalValueList.Add(convertedValue);
}
decimal[] decimalValueArray = decimalValueList.ToArray();
resultList.Add(decimalValueArray);
}
decimal[][] resultArray = resultList.ToArray();
That gives exactly the same result as the first example.
If you can use a List<string[]>, you do not have to worry about the array length.
In the following example, the variable lines will be a list of arrays, like:
["500000", "0.005", "6000"]
["690000", "0.003", "5200"]
static void ReadFromFile(string filePath)
{
try
{
// Create an instance of StreamReader to read from a file.
// The using statement also closes the StreamReader.
using (StreamReader sr = new StreamReader(filePath))
{
List<string[]> lines = new List<string[]>();
string line;
// Read and display lines from the file until the end of
// the file is reached.
while ((line = sr.ReadLine()) != null)
{
string[] splittedLine = line.Split(',');
lines.Add(splittedLine);
}
}
}
catch (Exception e)
{
// Let the user know what went wrong.
Console.WriteLine("The file could not be read:");
Console.WriteLine(e.Message);
}
}
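The ideas from the answers above (one array per line, converted to decimal) can be combined into a single helper. This is a minimal sketch under a few assumptions: the names CsvRows and Parse are invented here, blank lines are skipped, and numbers use "." as the decimal separator, which is why CultureInfo.InvariantCulture is passed explicitly instead of relying on the machine's culture.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class CsvRows
{
    // Converts each non-blank "a,b,c" line into its own decimal[] row.
    public static decimal[][] Parse(IEnumerable<string> lines)
    {
        return lines
            .Where(l => !string.IsNullOrWhiteSpace(l))
            .Select(l => l.Split(',')
                // InvariantCulture makes "0.005" parse the same way on every machine.
                .Select(f => decimal.Parse(f.Trim(), CultureInfo.InvariantCulture))
                .ToArray())
            .ToArray();
    }
}
```

With a file on disk this could be used as CsvRows.Parse(File.ReadLines(filePath)). A malformed field makes decimal.Parse throw a FormatException, so real input may call for decimal.TryParse instead.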
While the other answers use Split, I will suggest a more structured, "by the book" approach.
You have some CSV values in a file. Find a name for the object stored in each CSV row, name every column, and give each one a type.
Define the default values of those fields. Define what happens for a missing column or a malformed field. Is there a header?
Now that you know what you have, define what you want. Again: object name -> property -> type.
Believe it or not, simply defining your input and output solves your issue.
Use CsvHelper to simplify your code.
CSV File Definition:
public class CsvItem_WithARealName
{
public int data1;
public decimal data2;
public int goodVariableNames;
}
public class CsvItemMapper : ClassMap<CsvItem_WithARealName>
{
public CsvItemMapper()
{ //mapping based on index. cause file has no header.
Map(m => m.data1).Index(0);
Map(m => m.data2).Index(1);
Map(m => m.goodVariableNames).Index(2);
}
}
A CSV reader method: point it at a document and it will give you the CSV items.
Here we have some configuration: no header, and InvariantCulture for decimal conversion.
private IEnumerable<CsvItem_WithARealName> GetCsvItems(string filePath)
{
using (var fileReader = File.OpenText(filePath))
using (var csvReader = new CsvHelper.CsvReader(fileReader))
{
csvReader.Configuration.CultureInfo = CultureInfo.InvariantCulture;
csvReader.Configuration.HasHeaderRecord = false;
csvReader.Configuration.RegisterClassMap<CsvItemMapper>();
while (csvReader.Read())
{
var record = csvReader.GetRecord<CsvItem_WithARealName>();
yield return record;
}
}
}
Usage :
var filename = "csvExemple.txt";
var items = GetCsvItems(filename);

How to search text based on line number in string

I have a function which searches for a text in a string and returns the line that contains the specific substring. Here is the function:
private static string getLine(string text,string text2Search)
{
string currentLine;
using (var reader = new StringReader(text))
{
while ((currentLine= reader.ReadLine()) != null)
{
if (currentLine.Contains(text2Search,StringComparison.OrdinalIgnoreCase))
{
break;
}
}
}
return currentLine;
}
Now I need to start searching only after a particular line, say line 10. That is, the search for the specific text should start after the first 10 lines. How can I add this to my current function?
Please help me.
You can use File.ReadLines method with Skip:
var line = File.ReadLines("path").Skip(10)
.SkipWhile(l => !l.Contains(text2Search,StringComparison.OrdinalIgnoreCase))
.First();
You can introduce a counter into your current code as so:
private static string getLine(string text,string text2Search)
{
string currentLine;
int endPoint = 10;
using (var reader = new StringReader(text))
{
int lineCount = 0;
while ((currentLine= reader.ReadLine()) != null)
{
if (lineCount++ >= endPoint &&
currentLine.Contains(text2Search,StringComparison.OrdinalIgnoreCase))
{
return currentLine;
}
}
}
return string.Empty;
}
Alternatively, use your current code to add all lines to a list, to which you can then apply Selman's answer.
In .NET Framework, String.Contains has no overload taking a StringComparison (one was added in .NET Core 2.1), so use IndexOf instead:
var match = text.Split(new[] { "\r\n", "\n", "\r" }, StringSplitOptions.None)
.Skip(10)
.FirstOrDefault(line => line.IndexOf(text2Search, StringComparison.OrdinalIgnoreCase) >= 0);
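Because the question works on a string rather than a file, one option is to wrap the StringReader loop in an iterator and then apply Skip to it. The following is a sketch only: LineSearch, Lines and FindAfter are hypothetical names, and the number of lines to skip is passed in as a parameter instead of being hard-coded to 10.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LineSearch
{
    // Lazily yields the lines of a string; StringReader handles \n, \r and \r\n endings.
    public static IEnumerable<string> Lines(string text)
    {
        using (var reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }

    // Returns the first line after the first `skip` lines that contains `search`
    // (case-insensitive), or null when there is no such line.
    public static string FindAfter(string text, string search, int skip)
    {
        return Lines(text)
            .Skip(skip)
            .FirstOrDefault(l => l.IndexOf(search, StringComparison.OrdinalIgnoreCase) >= 0);
    }
}
```

IndexOf with StringComparison.OrdinalIgnoreCase is used here so the sketch also compiles on .NET Framework.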

C#: read text file separated by additional newline character

I have some sql commands that are separated by an additional newline character:
ALTER TABLE XXX
ALTER COLUMN xxx real

ALTER TABLE YYY
ALTER COLUMN yyy real

ALTER TABLE ZZZ
ALTER COLUMN zzz real
I've tried reading the file by using an array of character separators such as the following,
new char[] { '\n', '\r'}
inside this method:
private static List<string> ReadFile(string FileName, char[] seps)
{
if (!File.Exists(FileName))
{
Console.WriteLine("File not found");
return null;
}
using (StreamReader sr = new StreamReader(FileName, Encoding.Default))
{
string content = sr.ReadToEnd();
return content.Split(seps, StringSplitOptions.RemoveEmptyEntries).ToList();
}
}
However, this doesn't seem to be working. I would like to have each command represented by a separate string. How can I do this?
Why not use File.ReadAllLines()?
private static List<string> ReadFile(string FileName)
{
if (!File.Exists(FileName))
{
Console.WriteLine("File not found");
return null;
}
var lines = File.ReadAllLines(FileName);
return lines.ToList();
}
This will automatically read and split your file by newlines.
If you want to filter out empty lines, do this:
var nonEmpty = ReadFile(path).Where(x => !string.IsNullOrEmpty(x)).ToList();
Side note, I would change your if statement to throw an exception if the file cannot be found.
if (!File.Exists(FileName))
{
throw new FileNotFoundException("Can't find file");
}
You can filter the examples. When I read them in, the empty lines had a length 1 and its char value said 131 for some reason. So I just filtered by length > 1
void Main()
{
var results = ReadFile(@"C:\temp\sql.txt", new char[]{'\n'});
Console.WriteLine(results.Count);
foreach (var result in results)
{
Console.WriteLine(result);
}
}
private static List<string> ReadFile(string FileName, char[] seps)
{
if (!File.Exists(FileName))
{
Console.WriteLine("File not found");
return null;
}
using (StreamReader sr = new StreamReader(FileName, Encoding.Default))
{
string content = sr.ReadToEnd();
return content.Split(seps, StringSplitOptions.RemoveEmptyEntries).Where (c => c.Length > 1).ToList();
}
}
Try This:
private static List<string> ReadFile(string FileName)
{
List<string> commands = new List<string>();
StringBuilder command = new StringBuilder();
if (!File.Exists(FileName))
{
Console.WriteLine("File not found");
return null;
}
foreach (var line in File.ReadLines(FileName))
{
if (!String.IsNullOrEmpty(line))
{
command.Append(line + "\n");
}
else
{
commands.Add(command.ToString());
command.Clear();
}
}
commands.Add(command.ToString());
return commands;
}
If you are sure you'll always have \r\n line endings, you can use:
var commands = content.Split(new []{"\r\n\r\n"}, StringSplitOptions.RemoveEmptyEntries);
Otherwise, try using regex:
var commands = Regex.Split(content, @"\r?\n\r?\n");
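A slightly more forgiving variant of the regex idea is sketched below. The names SqlCommands and Split are invented for illustration, and it assumes that commands are separated by one or more blank lines, which may contain stray spaces or tabs, and that leading or trailing whitespace around a command can be trimmed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SqlCommands
{
    // Splits file content into commands separated by one or more blank lines.
    public static List<string> Split(string content)
    {
        // Two or more consecutive line breaks (each optionally followed by
        // spaces or tabs) mark the gap between two commands.
        return Regex.Split(content, @"(?:\r?\n[ \t]*){2,}")
            .Select(c => c.Trim())
            .Where(c => c.Length > 0)
            .ToList();
    }
}
```

It could be called as SqlCommands.Split(File.ReadAllText(fileName)). Line breaks inside a command are kept as they appear in the file.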
Thank you everyone for your answers. I ended up going with this helper method:
private static List<string> GetCommands(string location)
{
List<string> ret = new List<string>();
List<string> tmp = ReadFile(location, new string[] { "\r\n\r\n"});
for (int i = 0; i < tmp.Count; i++)
{
string rem = tmp[i].Replace("\r", "");
ret.Add(rem);
}
return ret;
}
As an aside, the equivalent is so much easier in Python. For example, what I'm trying to do can be expressed in these three lines:
with open('commands.txt', 'r') as f:
    content = f.read()
commands = [ command for command in content.split('\n\n') ]

yield pattern, state machine flow

I have the following file, and I am using an iterator block to parse certain recurring nodes/parts within it. I initially used a regex to parse the entire file, but when certain fields were not present in a node, it would not match, so I am trying to use the yield pattern instead. The file format is shown below, followed by the code I am using. All I want from the file are the REPLICATE nodes, each as an individual part, so I can fetch fields within it using a key string and store them in a collection of objects. I can start parsing where the first replicate occurs, but I am unable to end where the replicate node ends.
File Format:
X_HEADER
{
DATA_MANAGEMENT_FIELD_2 NA
DATA_MANAGEMENT_FIELD_3 NA
DATA_MANAGEMENT_FIELD_4 NA
SYSTEM_SOFTWARE_VERSION NA
}
Y_HEADER
{
DATA_MANAGEMENT_FIELD_2 NA
DATA_MANAGEMENT_FIELD_3 NA
DATA_MANAGEMENT_FIELD_4 NA
SYSTEM_SOFTWARE_VERSION NA
}
COMPLETION
{
NUMBER 877
VERSION 4
CALIBRATION_VERSION 1
CONFIGURATION_ID 877
}
REPLICATE
{
REPLICATE_ID 1985
ASSAY_NUMBER 656
ASSAY_VERSION 4
ASSAY_STATUS Research
DILUTION_ID 1
}
REPLICATE
{
REPLICATE_ID 1985
ASSAY_NUMBER 656
ASSAY_VERSION 4
ASSAY_STATUS Research
}
Code:
static IEnumerable<IDictionary<string, string>> ReadParts(string path)
{
using (var reader = File.OpenText(path))
{
var current = new Dictionary<string, string>();
string line;
while ((line = reader.ReadLine()) != null)
{
if (string.IsNullOrWhiteSpace(line)) continue;
if (line.StartsWith("REPLICATE"))
{
yield return current;
current = new Dictionary<string, string>();
}
else
{
var parts = line.Split('\t');
}
if (current.Count > 0) yield return current;
}
}
}
public static void parseFile(string fileName)
{
foreach (var part in ReadParts(fileName))
{
//part["fIELD1"] will retrieve certain values from the REPLICATE part here
}
}
Well, it sounds like you just need to "close" a section when you get a closing brace, and only yield return at that point. For example:
static IEnumerable<IDictionary<string, string>> ReadParts(string path)
{
using (var reader = File.OpenText(path))
{
string currentName = null;
IDictionary<string, string> currentMap = null;
string line;
while ((line = reader.ReadLine()) != null)
{
if (string.IsNullOrWhiteSpace(line))
{
continue;
}
if (line == "{")
{
if (currentName == null || currentMap != null)
{
throw new BadDataException("Open brace at wrong place");
}
currentMap = new Dictionary<string, string>();
}
else if (line == "}")
{
if (currentName == null || currentMap == null)
{
throw new BadDataException("Closing brace at wrong place");
}
// Isolate the "REPLICATE-only" requirement to a single
// line - if you ever need other bits, you can change this.
if (currentName == "REPLICATE")
{
yield return currentMap;
}
currentName = null;
currentMap = null;
}
else if (!line.StartsWith("\t"))
{
if (currentName != null || currentMap != null)
{
throw new BadDataException("Section name at wrong place");
}
currentName = line;
}
else
{
if (currentName == null || currentMap == null)
{
throw new BadDataException("Name/value pair at wrong place");
}
var parts = line.Substring(1).Split('\t');
if (parts.Length != 2)
{
throw new BadDataException("Invalid name/value pair");
}
currentMap[parts[0]] = parts[1];
}
}
}
}
Now that's a pretty ghastly function, to be honest. I suspect I'd put this in its own class instead (possibly a nested one) to store the state, and make each handler its own method. Heck, this is actually a situation where the state pattern could make sense :)
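To give a rough idea of the "explicit state" direction, here is a small sketch that tracks the parser state in an enum instead of in null checks. It is not a full state pattern and not the answer's own code. The names SectionParser, State and Read are hypothetical, FormatException stands in for the BadDataException used above, and keys are assumed to be separated from values by a space or a tab.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SectionParser
{
    private enum State { ExpectName, ExpectOpen, InBody }

    // Yields one dictionary per section whose name equals `wanted`.
    public static IEnumerable<Dictionary<string, string>> Read(TextReader reader, string wanted)
    {
        var state = State.ExpectName;
        string name = null;
        Dictionary<string, string> map = null;
        string raw;
        while ((raw = reader.ReadLine()) != null)
        {
            string line = raw.Trim();
            if (line.Length == 0) continue;   // ignore blank lines

            switch (state)
            {
                case State.ExpectName:
                    name = line;
                    state = State.ExpectOpen;
                    break;
                case State.ExpectOpen:
                    if (line != "{") throw new FormatException("Expected '{' after " + name);
                    map = new Dictionary<string, string>();
                    state = State.InBody;
                    break;
                case State.InBody:
                    if (line == "}")
                    {
                        // Only sections with the wanted name are handed to the caller.
                        if (name == wanted) yield return map;
                        state = State.ExpectName;
                    }
                    else
                    {
                        // Key is everything before the first space or tab; the value may be empty.
                        int i = line.IndexOfAny(new[] { ' ', '\t' });
                        string key = i < 0 ? line : line.Substring(0, i);
                        map[key] = i < 0 ? "" : line.Substring(i + 1).Trim();
                    }
                    break;
            }
        }
        if (state != State.ExpectName) throw new FormatException("Unexpected end of input");
    }
}
```

A call such as SectionParser.Read(File.OpenText(path), "REPLICATE") would yield one dictionary per REPLICATE block. The caller remains responsible for disposing the reader.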
private IEnumerable<IDictionary<string, string>> ParseFile(System.IO.TextReader reader)
{
string token = reader.ReadLine();
while (token != null)
{
bool isReplicate = token.StartsWith("REPLICATE");
token = reader.ReadLine(); //consume this token to either skip it or parse it
if (isReplicate)
{
yield return ParseBlock(ref token, reader);
}
}
}
private IDictionary<string, string> ParseBlock(ref string token, System.IO.TextReader reader)
{
if (token != "{")
{
throw new Exception("Missing opening brace.");
}
token = reader.ReadLine();
var result = ParseValues(ref token, reader);
if (token != "}")
{
throw new Exception("Missing closing brace.");
}
token = reader.ReadLine();
return result;
}
private IDictionary<string, string> ParseValues(ref string token, System.IO.TextReader reader)
{
IDictionary<string, string> result = new Dictionary<string, string>();
while (token != "}" && token != null)
{
var args = token.Split('\t');
if (args.Length < 2)
{
throw new Exception();
}
result.Add(args[0], args[1]);
token = reader.ReadLine();
}
return result;
}
If you add a yield return current; after your while loop is over, you will get the final dictionary.
I believe it would be better to check for '}' as the end of the current block, and then put the yield return there. Although you can't use a regex to parse the entire file, you can use a regex to search for the key-value pairs within the lines. The following iterator code should work. It will only return dictionaries for REPLICATE blocks.
// Check for lines that are a key-value pair, separated by whitespace.
// Note that value is optional
static string partPattern = @"^(?<Key>\w*)(\s+(?<Value>.*))?$";
static IEnumerable<IDictionary<string, string>> ReadParts(string path)
{
using (var reader = File.OpenText(path))
{
string line;
while ((line = reader.ReadLine()) != null)
{
// Ignore lines that just contain whitespace
if (string.IsNullOrWhiteSpace(line)) continue;
// This is a new replicate block, start a new dictionary
if (line.Trim().CompareTo("REPLICATE") == 0)
{
yield return parseReplicateBlock(reader);
}
}
}
}
private static IDictionary<string, string> parseReplicateBlock(StreamReader reader)
{
// Make sure we have an opening brace
VerifyOpening(reader);
string line;
var currentDictionary = new Dictionary<string, string>();
while ((line = reader.ReadLine()) != null)
{
// Ignore lines that just contain whitespace
if (string.IsNullOrWhiteSpace(line)) continue;
line = line.Trim();
// Since our regex used groupings (?<Key> and ?<Value>),
// we can do a match and check to see if our groupings
// found anything. If they did, extract the key and value.
Match m = Regex.Match(line, partPattern);
if (m.Groups["Key"].Length > 0)
{
currentDictionary.Add(m.Groups["Key"].Value, m.Groups["Value"].Value);
}
else if (line.CompareTo("}") == 0)
{
return currentDictionary;
}
}
// We exited the loop before we found a closing brace, throw an exception
throw new ApplicationException("Missing closing brace");
}
private static void VerifyOpening(StreamReader reader)
{
string line;
while ((line = reader.ReadLine()) != null)
{
// Ignore lines that just contain whitespace
if (string.IsNullOrWhiteSpace(line)) continue;
if (line.Trim().CompareTo("{") == 0)
{
return;
}
else
{
throw new ApplicationException("Missing opening brace");
}
}
throw new ApplicationException("Missing opening brace");
}
Update: I made sure that the regex string includes cases where there is no value. In addition, the group indexes were all changed to use the group name to avoid any issues if the regex string is modified.
