Reading multiple lines of text if it starts with a specific token

:58A:/C/81000098099CL
CBNINGLA
:72:/CODTYPTR/012
/CLEARING/0003
/SGI/DBLNNGLA
I am trying to read fields :58A: and :72: of the SWIFT message above, but I am having a small issue. My code only reads the first line of :58A:, like this: C/81000098099CL. I want it to keep reading the following lines until it reaches :72:. In short, the output for :58A: should be C/81000098099CL CBNINGLA.
The same goes for :72:, because the messages always arrive formatted this way. This is my code:
if (line.StartsWith(":58A:"))
{
    string[] narr = line.Split('/');
    inflow202.BENEFICIARY_INSTITUTION = narr[2];
}
if (line.StartsWith(":72:"))
{
    inflow202.RECEIVER_INFORMATION = line.Substring(5);
}

You can replace every line break that is not followed by : with a space (or an empty string), and then process the text line by line.
string output = Regex.Replace(text, @"\r?\n(?!:)", " ");
string[] lines = output.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
foreach (string line in lines)
{
    if (line.StartsWith(":58A:"))
    {
    }
    else if (line.StartsWith(":72:"))
    {
    }
}
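The regex approach above can be wrapped into a small helper that returns every field by tag. This is a minimal sketch, not the original poster's code: the class name SwiftFieldReader and the method ParseFields are hypothetical, and it assumes that every field starts on its own line with :TAG: and that a continuation line never begins with a colon.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SwiftFieldReader
{
    // Hypothetical helper: maps each tag (e.g. "58A") to its value,
    // with continuation lines joined by single spaces.
    public static Dictionary<string, string> ParseFields(string text)
    {
        // Fold every line break that is NOT followed by ':' into a space.
        string folded = Regex.Replace(text, @"\r?\n(?!:)", " ");

        var fields = new Dictionary<string, string>();
        string[] lines = folded.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string line in lines)
        {
            if (!line.StartsWith(":")) continue;   // not a field line
            int end = line.IndexOf(':', 1);        // colon that closes the tag
            if (end < 0) continue;                 // malformed line, skip it

            string tag = line.Substring(1, end - 1);
            fields[tag] = line.Substring(end + 1).Trim();
        }
        return fields;
    }
}
```

With the sample message from the question, ParseFields(message)["58A"] yields /C/81000098099CL CBNINGLA. If the same tag occurs twice, the later value overwrites the earlier one, which may or may not be what you want.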

If the message always comes in this form and : never occurs in the text except around the field tags, consider splitting the whole text on : first. Position 0 will be empty, every odd position will hold a tag (58A, 72), and every even position will hold the content up to the next :. This works provided you can read the whole input into a single string first. For example, given a string message, you can do something like:
var splitted = message.Split(':');
for (int i = 1; i < splitted.Length - 1; i += 2)
{
    if (splitted[i] == "58A")
    {
        // do what you need to do; the text you need is stored in splitted[i + 1]
    }
    ...
}
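For completeness, here is how the colon-splitting idea might look as a self-contained helper. It is only a sketch under the same assumption as the answer (no colon ever appears inside a field value); ColonSplitReader and ParseFields are made-up names for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ColonSplitReader
{
    // Hypothetical helper: splits on ':' so that odd positions hold tags
    // and the following even positions hold the (possibly multi-line) values.
    public static Dictionary<string, string> ParseFields(string message)
    {
        string[] parts = message.Split(':');
        var fields = new Dictionary<string, string>();

        // Stop while parts[i + 1] still exists.
        for (int i = 1; i + 1 < parts.Length; i += 2)
        {
            // Join the value's physical lines with single spaces.
            string[] valueLines = parts[i + 1].Split(
                new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
            fields[parts[i]] = string.Join(" ", valueLines);
        }
        return fields;
    }
}
```

This version breaks as soon as a value contains a colon (for example a time such as 12:30), so the regex variant is the safer choice when that cannot be ruled out.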

Related

How to read text from file from one value to other in c#

I'm new to C# and still learning the language. I am trying to write an app that reads text, and I only need specific lines from it. The text looks like this:
[HEADING]
Some value
[HEADING]
Some other value
[HEADING]
Some other text
and continuation of this text in new line
[HEADING]
Last text
I am trying to write a method that reads the text and splits it into a string[] like this:
string[0] = Some value
string[1] = Some other value
string[2] = Some other text and continuation of this text in new line
string[3] = Last text
So I want to read from each [HEADING] line up to the next empty line. I thought I should use ReadAllLines and check line by line, starting at a [HEADING] line and ending at an empty line. I tried this code:
string s = "mystring";
int start = s.IndexOf("[HEADING]");
int end = s.IndexOf("\n", start);
string result = s.Substring(start, end - start);
but it takes a single substring from the whole text instead of looping from the first [HEADING] to the next empty line, then from the second, and so on.
Can someone help me with this?

You could try to split the string by "[HEADING]" to get the strings between these lines. Then you could join each string into a single line and trim the whitespace around the strings:
string content = @"[HEADING]
Some value
[HEADING]
Some other value
[HEADING]
Some other text
and continuation of this text in new line
[HEADING]
Last text";
var segments = content.Split(new[] { "[HEADING]" }, StringSplitOptions.RemoveEmptyEntries) // Split into multiple strings
    .Select(p => p.Replace("\r\n", " ").Replace("\r", " ").Replace("\n", " ").Trim()) // Join each string into a single line
    .ToArray();
Result:
segments[0] = "Some value"
segments[1] = "Some other value"
segments[2] = "Some other text and continuation of this text in new line"
segments[3] = "Last text"

Here's a solution which avoids the substring/index checking, which could potentially be fraught with errors.
There are answers such as this one that use LINQ, but for a newcomer to the language, a basic loop is a reasonable place to start. Note that this is not necessarily the most efficient solution.
This foreach loop will handle your case, and some of the "dirty" cases.
var segments = new List<string>();
bool headingChanged = false;
foreach (var line in File.ReadAllLines("somefilename.txt"))
{
    // skip blank lines
    if (string.IsNullOrWhiteSpace(line)) continue;
    // detect a heading
    if (line.Contains("[HEADING]"))
    {
        headingChanged = true;
        continue;
    }
    if (headingChanged)
    {
        // the first line after a heading starts a new segment
        segments.Add(line);
        // this keeps us working on the same segment if there
        // are more lines to be added to the segment
        headingChanged = false;
    }
    else if (segments.Count > 0)
    {
        // continuation line: append it to the current segment
        // (List<T> exposes Count, not Length)
        segments[segments.Count - 1] += " ";
        segments[segments.Count - 1] += line;
        // you could replace the above two lines with string interpolation...
        // segments[segments.Count - 1] = $"{segments[segments.Count - 1]} {line}";
    }
}
In the above loop, ReadAllLines removes the need to check for \r and \n yourself. Contains will detect [HEADING] no matter where it appears in the line.
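The same loop can also be packaged as a helper that takes the lines as input, which makes it usable without a file on disk. The sketch below is an illustration rather than part of the answer: HeadingSegments, Collect and Flush are hypothetical names, and it assumes that a segment ends at the next [HEADING] line, at an empty line, or at the end of the input.

```csharp
using System;
using System.Collections.Generic;

public static class HeadingSegments
{
    // Hypothetical helper: returns one joined string per [HEADING] block.
    public static List<string> Collect(IEnumerable<string> lines)
    {
        var results = new List<string>();
        var current = new List<string>();
        bool collecting = false;

        foreach (string raw in lines)
        {
            string line = raw.Trim();
            if (line == "[HEADING]")
            {
                Flush(current, results);   // close the previous segment, if any
                collecting = true;
            }
            else if (line.Length == 0)
            {
                Flush(current, results);   // an empty line also ends a segment
                collecting = false;
            }
            else if (collecting)
            {
                current.Add(line);
            }
        }

        Flush(current, results);           // the last segment has no terminator
        return results;
    }

    // Joins the collected lines with spaces and resets the buffer.
    private static void Flush(List<string> current, List<string> results)
    {
        if (current.Count == 0) return;
        results.Add(string.Join(" ", current));
        current.Clear();
    }
}
```

Calling HeadingSegments.Collect(File.ReadAllLines(path)) on the sample text from the question gives four entries, the third being "Some other text and continuation of this text in new line".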

You don't need substring, you can just compare the value s == "[HEADING]".
Here's an easy to understand example:
var lines = System.IO.File.ReadAllLines(myFilePath);
var resultLines = new List<String>();
var collectedText = new List<String>();
foreach (var line in lines)
{
    if (line == "[HEADING]")
    {
        collectedText = new List<String>();
    }
    else if (line != "")
    {
        collectedText.Add(line);
    }
    else //if (line == "")
    {
        var joinedText = String.Join(" ", collectedText);
        resultLines.Add(joinedText);
    }
}
return resultLines.ToArray();
The loop does this:
we go line by line
"start collecting" (create a list) when we encounter a "[HEADING]" line
"collect" (add to the list) each line that is not empty
"finish collecting" (join and add to the results list) when the line is empty

c# showdialog return whole line

Greets.
I'm calling a Window with .ShowDialog() and returning some lines from a textbox.
The lines are returned to a List<>, but each character in the textbox ends up at its own index within the List<>.
I essentially want to add each entire line from the textbox at its own index in the List<>.
EXAMPLE:
I enter the below in the textbox that was called from the ShowDialog();
123456
87564
125
How do I add each line from the textbox at its own index in the list?
This is what I have now. (There is no code in the textbox window that I enter these values into.) (I realize I spelled it as imput...) When I debug and review the pos List<>, each character has its own index.
private void GetPOs()
{
    MultiLineImput getPOList = new MultiLineImput();
    getPOList.ShowDialog();
    foreach (char po in getPOList.listOfPOs.Text)
    {
        pos.Add(po.ToString());
    }
    if (pos.Count > 0)
    {
        string a = String.Join("", pos);
        MessageBox.Show(a, "POs to Process");
    }
    else
    {
        if (!getPOList.wasCanceled.Equals(1))
        {
            MessageBox.Show("No values were passed", "Warning");
        }
    }
}

You're iterating over the characters of the Text property, so each character is converted to a string and added to the list separately.
I'm not sure what you mean by adding the entire "line". If the text contains only one line, you can rewrite this loop
foreach (char po in getPOList.listOfPOs.Text)
{
    pos.Add(po.ToString());
}
to simply
pos.Add(getPOList.listOfPOs.Text);
If you meant to split a single space-separated line into the entries "123456", "87564", "125", you can do it this way:
foreach (string po in getPOList.listOfPOs.Text.Split(' '))
{
    pos.Add(po);
}
If your textbox does support multiline input, you can split by Environment.NewLine, like this:
foreach (string po in getPOList.listOfPOs.Text.Split(new[] { Environment.NewLine }, StringSplitOptions.None))
{
    pos.Add(po);
}

If you iterate over a string, the iterator will pull one character at a time. It has no idea what a line break is.
I suggest you break the string up by line breaks, then iterate over the result, like so:
MultiLineInput getPOList = new MultiLineInput();
getPOList.ShowDialog();
var wholeText = getPOList.listOfPOs.Text;
var lines = wholeText.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.None);
// iterate over the split lines, not over the characters of Text
foreach (string po in lines)
{
    pos.Add(po);
}
//Etc.....
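The splitting step can be isolated from the dialog code so that it is easy to check on its own. The following is a rough sketch; LineSplitter and SplitLines are hypothetical names, and dropping blank lines and surrounding whitespace is an assumption about what the list of POs should contain.

```csharp
using System;
using System.Collections.Generic;

public static class LineSplitter
{
    // Hypothetical helper: one list entry per non-empty line of the input.
    public static List<string> SplitLines(string text)
    {
        var result = new List<string>();
        // "\r\n" is listed first so a Windows line break counts as one separator.
        string[] parts = text.Split(
            new[] { "\r\n", "\n", "\r" }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string part in parts)
        {
            string trimmed = part.Trim();
            if (trimmed.Length > 0) result.Add(trimmed);   // skip whitespace-only lines
        }
        return result;
    }
}
```

In the dialog code this would be used as pos.AddRange(LineSplitter.SplitLines(getPOList.listOfPOs.Text)); note that the confirmation message should then join with Environment.NewLine rather than "" to keep the entries readable.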

how to replace a character in a string and save into text file in c#

I'm trying to replace the pipe symbol (|) with a new line (\n) in my text file (test1.txt). When I save the result to another text file (test2.txt), the result does not appear in that file, although I can see it in my console window. Can anyone please help with this?
string lines = File.ReadAllText(@"C:\NetProject\Nag Assignments\hi.txt");
//string input = "abcd|efghijk|lmnopqrstuvwxyz";
lines = lines.Replace('|', '\n');
File.WriteAllText(@"C:\NetProject\Nag Assignments\hi2.txt", lines);
Console.WriteLine(lines);

You can try this one:
lines = lines.Replace("|", Environment.NewLine);
According to the documentation, Environment.NewLine is "\r\n" on non-Unix platforms and "\n" on Unix platforms.

Seems like you want multiple things here. (both original question and subsequent comments)
One is to separate the lines and be able to reference them separately:
string[] separatedLines = lines.Split('|');
The other is to join them back together with a different separator:
string rejoinedLines = string.Join(Environment.NewLine, separatedLines);
You then have access to the individual lines from the separatedLines variable above such as separatedLines[0] and you can also write the rejoinedLines variable back to the other file like you wanted.
EDIT: For example, the following code:
string lines = "a|bc|def";
string[] separatedLines = lines.Split('|');
string rejoinedLines = string.Join(Environment.NewLine, separatedLines);
for (int i = 0; i < separatedLines.Length; i++)
{
Console.WriteLine("Line {0}: {1}", i + 1, separatedLines[i]);
}
Gives output of:
Line 1: a
Line 2: bc
Line 3: def

Instead of:
lines = lines.Replace('|', '\n');
Try:
lines = lines.Replace("|","\r\n");

string[] space = lines.Split('|');
This stores every substring in the space array.
On Windows, the line break should be \r\n (carriage return followed by line feed). File.WriteAllText writes the string exactly as given and does not translate \n into \r\n, so a file containing only \n may appear as a single line in some Windows editors.

how to split a text in to paragraph with a particular string

I have a long text file. I read the file and store its content in a string.
Now I want to split this text. Below is an image that shows what I want.
In the image, "This is common text" is a string that appears in every paragraph.
The green squares show the parts that I want in a string array.
How do I do that? I have tried a regular expression for this, but it isn't working.
Please help.

Try using Regex.Split() with this pattern:
(.*This is common text.*)
That said, preferring a regex over plain string functions usually adds some performance overhead.
You could use something like this instead (untested, but it should give you the idea):
string[] lines = System.IO.File.ReadAllLines("FilePath");
List<string> lst = new List<string>();
List<string> lstgroup = new List<string>();
int i = 0;
foreach (string line in lines)
{
    // a line containing the common text marks the start of a new paragraph
    if (line.ToLower().Contains("this is common text"))
    {
        if (i > 0)
        {
            lst.AddRange(lstgroup.ToArray());
            // Print elements here
            lstgroup.Clear();
        }
        else { i++; }
        continue;
    }
    else
    {
        lstgroup.Add(line);
    }
}
i = 0;
// Print elements here too

I am not sure what you want to split on but you could use
string[] stringArray = Regex.Split(yourString, regex);
If you want a more concrete example, you will have to (as others mentioned) give us more information regarding what the text looks like, rather than just "common text".
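Since the image from the question is not available, the exact layout is unknown. As an illustration only, the sketch below assumes that the common text sits on a line of its own and that everything between two such lines forms one paragraph. ParagraphSplitter and SplitOnMarker are hypothetical names; Regex.Split, Regex.Escape and the RegexOptions values are the standard .NET ones.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ParagraphSplitter
{
    // Hypothetical helper: splits text at every line that contains the marker.
    public static string[] SplitOnMarker(string text, string marker)
    {
        // Match the whole line that contains the marker. The pattern has no
        // capture group, so the marker lines are not part of the result.
        string pattern = "^.*" + Regex.Escape(marker) + ".*$";

        return Regex.Split(text, pattern, RegexOptions.Multiline | RegexOptions.IgnoreCase)
            .Select(p => p.Trim())          // drop surrounding line breaks
            .Where(p => p.Length > 0)       // drop empty pieces
            .ToArray();
    }
}
```

For example, SplitOnMarker(content, "This is common text") returns one array element per paragraph, with line breaks inside a paragraph left intact.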

Need to pick up line terminators with StreamReader.ReadLine()

I wrote a C# program to read an Excel .xls/.xlsx file and output to CSV and Unicode text. I wrote a separate program to remove blank records. This is accomplished by reading each line with StreamReader.ReadLine(), and then going character by character through the string and not writing the line to output if it contains all commas (for the CSV) or all tabs (for the Unicode text).
The problem occurs when the Excel file contains embedded newlines (\x0A) inside the cells. I changed my XLS to CSV converter to find these new lines (since it goes cell by cell) and write them as \x0A, and normal lines just use StreamWriter.WriteLine().
The problem occurs in the separate program to remove blank records. When I read in with StreamReader.ReadLine(), by definition it only returns the string with the line, not the terminator. Since the embedded newlines show up as two separate lines, I can't tell which is a full record and which is an embedded newline for when I write them to the final file.
I'm not even sure I can read in the \x0A because everything on the input registers as '\n'. I could go character by character, but this destroys my logic to remove blank lines.

I would recommend that you change your architecture to work more like a parser in a compiler.
You want to create a lexer that returns a sequence of tokens, and then a parser that reads the sequence of tokens and does stuff with them.
In your case the tokens would be:
Column data
Comma
End of Line
You would treat '\n' ('\x0a') by itself as an embedded new line, and therefore include it as part of a column data token. A '\r\n' would constitute an End of Line token.
This has the advantages of:
Doing only 1 pass over the data
Only storing at most one line's worth of data
Reusing as much memory as possible (for the string builder and the list)
It's easy to change should your requirements change
Here's a sample of what the Lexer would look like:
Disclaimer: I haven't even compiled, let alone tested, this code, so you'll need to clean it up and make sure it works.
enum TokenType
{
ColumnData,
Comma,
LineTerminator
}
class Token
{
public TokenType Type { get; private set;}
public string Data { get; private set;}
public Token(TokenType type)
{
Type = type;
}
public Token(TokenType type, string data)
{
Type = type;
Data = data;
}
}
private IEnumerable<Token> GetTokens(TextReader s)
{
var builder = new StringBuilder();
while (s.Peek() >= 0)
{
var c = (char)s.Read();
switch (c)
{
case ',':
{
if (builder.Length > 0)
{
yield return new Token(TokenType.ColumnData, ExtractText(builder));
}
yield return new Token(TokenType.Comma);
break;
}
case '\r':
{
var next = s.Peek();
if (next == '\n')
{
s.Read();
}
if (builder.Length > 0)
{
yield return new Token(TokenType.ColumnData, ExtractText(builder));
}
yield return new Token(TokenType.LineTerminator);
break;
}
default:
builder.Append(c);
break;
}
}
s.Read();
if (builder.Length > 0)
{
yield return new Token(TokenType.ColumnData, ExtractText(builder));
}
}
private string ExtractText(StringBuilder b)
{
var ret = b.ToString();
b.Remove(0, b.Length);
return ret;
}
Your "parser" code would then look like this:
public void ConvertXLS(TextReader s)
{
var columnData = new List<string>();
bool lastWasColumnData = false;
bool seenAnyData = false;
foreach (var token in GetTokens(s))
{
switch (token.Type)
{
case TokenType.ColumnData:
{
seenAnyData = true;
if (lastWasColumnData)
{
//TODO: do some error reporting
}
else
{
lastWasColumnData = true;
columnData.Add(token.Data);
}
break;
}
case TokenType.Comma:
{
if (!lastWasColumnData)
{
columnData.Add(null);
}
lastWasColumnData = false;
break;
}
case TokenType.LineTerminator:
{
if (seenAnyData)
{
// pass the collected columns, not the bool flag
OutputLine(columnData);
}
seenAnyData = false;
lastWasColumnData = false;
columnData.Clear();
break;
}
}
}
if (seenAnyData)
{
OutputLine(columnData);
}
}

You can't change StreamReader to return the line terminators, and you can't change what it uses for line termination.
I'm not entirely clear about the problem in terms of what escaping you're doing, particularly in terms of "and write them as \x0A". A sample of the file would probably help.
It sounds like you may need to work character by character, or possibly load the whole file first and do a global replace, e.g.
x.Replace("\r\n", "\u0000") // Or some other unused character
.Replace("\n", "\\x0A") // Or whatever escaping you need
.Replace("\u0000", "\r\n") // Replace the real line breaks
I'm sure you could do that with a regex and it would probably be more efficient, but I find the long way easier to understand :) It's a bit of a hack having to do a global replace though - hopefully with more information we'll come up with a better solution.

Essentially, a hard-return in Excel (shift+enter or alt+enter, I can't remember) puts a newline that is equivalent to \x0A in the default encoding I use to write my CSV. When I write to CSV, I use StreamWriter.WriteLine(), which outputs the line plus a newline (which I believe is \r\n).
The CSV is fine and comes out exactly how Excel would save it, the problem is when I read it into the blank record remover, I'm using ReadLine() which will treat a record with an embedded newline as a CRLF.
Here's an example of the file after I convert to CSV...
Reference,Name of Individual or Entity,Type,Name Type,Date of Birth,Place of Birth,Citizenship,Address,Additional Information,Listing Information,Control Date,Committees
1050,"Aziz Salih al-Numan
",Individual,Primary Name,1941 or 1945,An Nasiriyah,Iraqi,,Ba’th Party Regional Command Chairman; Former Governor of Karbala and An Najaf Former Minister of Agriculture and Agrarian Reform (1986-1987),Resolution 1483 (2003),6/27/2003,1518 (Iraq)
1050a,???? ???? ???????,Individual,Original script,1941 or 1945,An Nasiriyah,Iraqi,,Ba’th Party Regional Command Chairman; Former Governor of Karbala and An Najaf Former Minister of Agriculture and Agrarian Reform (1986-1987),Resolution 1483 (2003),6/27/2003,1518 (Iraq)
As you can see, the first record has an embedded new-line after al-Numan. When I use ReadLine(), I get '1050,"Aziz Salih al-Numan' and when I write that out, WriteLine() ends that line with a CRLF. I lose the original line terminator. When I use ReadLine() again, I get the line starting with '1050a'.
I could read the entire file in and replace them, but then I'd have to replace them back afterwards. Basically, what I want to do is get the line terminator to determine whether it's \x0A or a CRLF, and then if it's \x0A, I'll use Write() and insert that terminator.
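Because ReadLine() discards the terminator, one option is a small character-based reader that returns each line together with the terminator that ended it. This is a minimal sketch and not code from the thread: TerminatorAwareReader and ReadLines are hypothetical names, and it relies only on TextReader.Read() and TextReader.Peek().

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class TerminatorAwareReader
{
    // Hypothetical helper: yields (line text, terminator) pairs, where the
    // terminator is "\r\n", "\n", "\r", or "" for a final unterminated line.
    public static IEnumerable<KeyValuePair<string, string>> ReadLines(TextReader reader)
    {
        var builder = new StringBuilder();
        int c;
        while ((c = reader.Read()) >= 0)
        {
            if (c == '\r')
            {
                string terminator = "\r";
                if (reader.Peek() == '\n')
                {
                    reader.Read();              // consume the '\n' of a CRLF pair
                    terminator = "\r\n";
                }
                yield return new KeyValuePair<string, string>(builder.ToString(), terminator);
                builder.Clear();
            }
            else if (c == '\n')
            {
                // a bare line feed: the embedded newline case
                yield return new KeyValuePair<string, string>(builder.ToString(), "\n");
                builder.Clear();
            }
            else
            {
                builder.Append((char)c);
            }
        }

        if (builder.Length > 0)
        {
            yield return new KeyValuePair<string, string>(builder.ToString(), "");
        }
    }
}
```

The blank-record check can then run on pair.Key as before, while the writer uses Write(pair.Key + pair.Value) so that a bare \n stays a bare \n and a CRLF stays a CRLF.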

I know I'm a little late to the game here, but I was having the same problem and my solution was a lot simpler than most given.
If you are able to determine the column count which should be easy to do since the first line is usually the column titles, you can check your column count against the expected column count. If the column count doesn't equal the expected column count, you simply concatenate the current line with the previous unmatched lines. For example:
string sep = "\",\"";
int columnCount = 0;
while ((currentLine = sr.ReadLine()) != null)
{
    if (lineCount == 0)
    {
        // the header row defines the expected number of columns
        lineData = currentLine.Split(new string[] { sep }, StringSplitOptions.None);
        columnCount = lineData.Length;
        ++lineCount;
        continue;
    }
    // prepend any incomplete record carried over from earlier lines
    string thisLine = lastLine + currentLine;
    lineData = thisLine.Split(new string[] { sep }, StringSplitOptions.None);
    if (lineData.Length < columnCount)
    {
        // too few columns: the record continues on the next line
        lastLine += currentLine;
        continue;
    }
    else
    {
        lastLine = null;
    }
    ......
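The column-count idea can be sketched as a complete helper. This is an illustration under simplifying assumptions, not the poster's full code: RecordMerger and Merge are hypothetical names, the first line is taken as the header, fields are split on a single separator character, and a separator inside a quoted field or a line break inside the last column would defeat the count.

```csharp
using System;
using System.Collections.Generic;

public static class RecordMerger
{
    // Hypothetical helper: glues physical lines back together until each
    // record has at least as many columns as the header line.
    public static List<string> Merge(IEnumerable<string> lines, char separator)
    {
        var records = new List<string>();
        int expected = -1;          // column count, taken from the first line
        string pending = null;      // incomplete record carried between lines

        foreach (string line in lines)
        {
            if (expected < 0)
            {
                expected = line.Split(separator).Length;
                records.Add(line);
                continue;
            }

            // Re-insert the line feed that ReadLine-style reading removed.
            pending = pending == null ? line : pending + "\n" + line;

            if (pending.Split(separator).Length >= expected)
            {
                records.Add(pending);
                pending = null;
            }
        }

        if (pending != null) records.Add(pending);   // trailing incomplete record
        return records;
    }
}
```

Unlike the snippet above, this version puts a \n back between the merged lines so the embedded newline survives; drop that if you would rather flatten the field.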

Thank you so much! With your code and some others I came up with the following solution. I have added a link at the bottom to some code I wrote that uses some of the logic from this page. I figured I'd give credit where credit is due. Thanks!
Below is an explanation of what I needed:
I wrote this because I have some very large '|'-delimited files that contain \r\n inside some of the columns, and I needed to use \r\n as the end-of-line delimiter. I was trying to import the files using SSIS packages, but corrupted data in the files made that impossible. The file was over 5 GB, so it was too large to open and fix manually. After reading many forums to understand how streams work, I came up with a solution that reads each character in the file and emits a line based on the definitions I added. It is meant for use in a command-line application, complete with help :). I hope this helps other people; I haven't found a solution quite like it anywhere else, although the ideas were inspired by this forum and others.
https://stackoverflow.com/a/12640862/1582188
