How to split complex text file in C#? [duplicate]

This question already has answers here:
Reading CSV files using C#
(12 answers)
Closed last month.
I have a text file like the one below, and I want to get only the numbers in the Rank column:
SKYRain LND(4) VA(x) ZZ(x) NUM(n) Rank ll ListOfNames
------- ------ ----- ----- ------ ---- -- -----------
1002 75 283680 185836 1,111.50 19268 1 Jack
4308 1100 175896 195404 751.70 6384 1 Sara
3070 252 1044788 884160 682.94 18924 1 Robert
3187 206 852280 97932 535.83 16472 1 Harry
I just want the numbers from the Rank column, as shown below:
19268
6384
18924
16472
Is there a way?

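You have a fixed-width text file, so you can simply use Substring():
using System;
public class Program{
    public static void Main(){
        string aLineOfYourTextFile = " 1002 75 283680 185836 1,111.50 19268 1 Jack ";
        // 48 and 5 are the start offset and width of the Rank column in the real
        // fixed-width file; adjust them to your file's actual column positions.
        Console.WriteLine(aLineOfYourTextFile.Substring(48,5));
    }
}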
You can also use Split():
using System;
public class Program{
    public static void Main(){
        string aLineOfYourTextFile = " 1002 75 283680 185836 1,111.50 19268 1 Jack ";
        // RemoveEmptyEntries collapses runs of spaces, so Rank is always entry 5 (zero-based)
        var columns = aLineOfYourTextFile.Split(new[]{" "}, StringSplitOptions.RemoveEmptyEntries);
        Console.WriteLine(columns[5]);
    }
}

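Read and skip the first 2 lines (the column titles and the dashes).
Read the next line, split it on " " (dropping empty entries), select the 6th entry (index 5), and convert it to int.
Repeat the above until EOF.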
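The steps above could be sketched roughly as follows. This is only a minimal sketch: the class and method names (RankReader, ReadRanks) are made up for illustration, and it assumes the Rank value is always the 6th whitespace-separated entry on a data line.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class RankReader
{
    // Hypothetical helper: returns the 6th whitespace-separated column
    // (index 5) of every data line as an int.
    public static List<int> ReadRanks(TextReader reader)
    {
        var ranks = new List<int>();

        // Skip the two header lines (column titles and dashes).
        reader.ReadLine();
        reader.ReadLine();

        string line;
        while ((line = reader.ReadLine()) != null)
        {
            var columns = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

            // Ignore short or malformed lines instead of throwing.
            if (columns.Length > 5 && int.TryParse(columns[5], out int rank))
            {
                ranks.Add(rank);
            }
        }
        return ranks;
    }
}
```

To read from a file, pass a StreamReader, e.g. `using (var reader = new StreamReader(path)) { var ranks = RankReader.ReadRanks(reader); }`, where `path` is whatever your file is called.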

Related

Find multiple values and strings within another string in C#

So I have this string with 4 lines:
id   score  ping  guid        name             lastmsg  address                qport   rate
---  -----  ----  ----------  ---------------  -------  ---------------------  ------  -----
1    11     45    176387877   Player 1         3250     101.102.103.104:555    3647    25000
2    23     61    425716719   Player 2         3250     105.106.107.108:555    5978    25000
How can I 'extract' all of these values? For example, I want to save "id", "score", "ping", "guid", "name", etc.
I have played around with a "GetBetween" function I found here. I also tried to learn the string.Split function. But I don't think I'm getting close to what I want to achieve, and I don't really understand splitting a string quite yet.
I basically need to remove all of the " " empty spaces between the values. The problem is that the value length may change, e.g. for "name".
Can someone give me an example of how I could extract the values?
Thanks in advance!

Regex.Split is your friend, and this works well enough.
void Main()
{
    // Fun fact: the @ in front of the string makes it a verbatim literal, so you
    // literally get the new lines
    var input =
@"id   score  ping  guid        name             lastmsg  address                qport   rate
---  -----  ----  ----------  ---------------  -------  ---------------------  ------  -----
1    11     45    176387877   Player 1         3250     101.102.103.104:555    3647    25000
2    23     61    425716719   Player 2         3250     105.106.107.108:555    5978    25000";
    // Gets you each line
    var lines = input.Split('\n');
    // Skip 2 because I doubt you care about the column title
    // or the row with the dashes
    foreach (var line in lines.Skip(2))
    {
        // For each line, Regex.Split will return an array with each entry.
        // Set a breakpoint with the debugger and inspect to see what I mean.
        // Splits using regex - assumes at least 2 spaces between items,
        // so the space in 'Player 1' is handled. It's a fickle solution though.
        // Trim the line before Regex.Split to avoid extra data in the split
        var r = Regex.Split(line.Trim(), @"\s{2,}");
    }
}

You can do this with Regex and named groups.
Sample Input
var str = @"id   score  ping  guid        name             lastmsg  address                qport   rate
---  -----  ----  ----------  ---------------  -------  ---------------------  ------  -----
1    11     45    176387877   Player 1         3250     101.102.103.104:555    3647    25000
2    23     61    425716719   Player 2         3250     105.106.107.108:555    5978    25000";
Regex Definition
var regex = new Regex(@"^(?<id>[\d]+)(\s{2,})(?<score>[\d]+)(\s{2,})(?<ping>[\d]+)(\s{1,})(?<guid>[\d]+)(\s{2,})(?<name>([\w]+\s[\w]+))(\s{2,})(?<lastmsg>[\d]+)(\s{2,})(?<ip>[\d.:]+)(\s{2,})(?<port>[\d]+)(\s{2,})(?<rate>[\d]+)$",RegexOptions.Compiled);
Parsing Code
var lines = str.Split(new []{Environment.NewLine},StringSplitOptions.RemoveEmptyEntries);
foreach(var line in lines)
{
var match = regex.Match(line.Trim());
if(!match.Success) continue;
Console.WriteLine($"ID = {match.Groups["id"].Value}");
Console.WriteLine($"Score = {match.Groups["score"].Value}");
Console.WriteLine($"Ping = {match.Groups["ping"].Value}");
Console.WriteLine($"Guid = {match.Groups["guid"].Value}");
Console.WriteLine($"Name = {match.Groups["name"].Value}");
Console.WriteLine($"Last Msg = {match.Groups["lastmsg"].Value}");
Console.WriteLine($"Port = {match.Groups["port"].Value}");
Console.WriteLine($"Rate = {match.Groups["rate"].Value}");
}
Output
ID = 1
Score = 11
Ping = 45
Guid = 176387877
Name = Player 1
Last Msg = 3250
Port = 3647
Rate = 25000
ID = 2
Score = 23
Ping = 61
Guid = 425716719
Name = Player 2
Last Msg = 3250
Port = 5978
Rate = 25000

Split long string for each colon ":" and get index of the line by position

I'm struggling to understand how to use the Split method to get the text I want.
I receive a long registration string from the user. I'm trying to split it at each colon (:) and, for each colon found, get all the text up to the \n at the end of that line.
The string I receive from the user is formatted like this example:
"Username: Jony \n
Fname: Dep\n
Address: Los Angeles\n
Age: 28\n
Date: 11/01:2001\n"
This is my approach so far. I haven't figured out how it works, and I couldn't find a question similar to mine.
str = the long string
List<string> names = str.ToString().Split(':').ToList<string>();
names.Reverse();
var result = names[0].ToString();
var result1 = names[1].ToString();
Console.WriteLine(result.Remove('\n').Replace(" ",string.Empty));
Console.WriteLine(result1.Remove('\n').Replace(" ",string.Empty));

Benchmarks
----------------------------------------------------------------------------
Mode : Release (64Bit)
Test Framework : .NET Framework 4.7.1 (CLR 4.0.30319.42000)
----------------------------------------------------------------------------
Operating System : Microsoft Windows 10 Pro
Version : 10.0.17134
----------------------------------------------------------------------------
CPU Name : Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
Description : Intel64 Family 6 Model 58 Stepping 9
Cores (Threads) : 4 (8) : Architecture : x64
Clock Speed : 3901 MHz : Bus Speed : 100 MHz
L2Cache : 1 MB : L3Cache : 8 MB
----------------------------------------------------------------------------
Results
--- Random characters -------------------------------------------------
| Value | Average | Fastest | Cycles | Garbage | Test | Gain |
--- Scale 1 -------------------------------------------- Time 1.152 ---
| split | 4.975 µs | 4.091 µs | 20.486 K | 0.000 B | N/A | 71.62 % |
| regex | 17.530 µs | 14.029 µs | 65.707 K | 0.000 B | N/A | 0.00 % |
-----------------------------------------------------------------------
Original Answer
You could use a regex, or you could simply use Split:
var input = "Username: Jony\n Fname: Dep\nAddress: Los Angeles\nAge: 28\nDate: 11/01:2001\n";
var results = input.Split(new []{'\n'}, StringSplitOptions.RemoveEmptyEntries)
.Select(x => x.Split(':')[1].Trim());
foreach (var result in results)
Console.WriteLine(result);
Output
Jony
Dep
Los Angeles
28
11/01
Note : This has no error checking, so if your string doesn't contain a Colon, it will break
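If the missing error checking is a concern, one possible variant is sketched below. It is not the code from the answer above: the names KeyValueLines and ExtractValues are invented here, and it assumes you want everything after the first colon on each line (so the date comes back as 11/01:2001 rather than 11/01) and that lines without a colon should simply be skipped.

```csharp
using System;
using System.Collections.Generic;

public static class KeyValueLines
{
    // Hypothetical helper: returns the text after the FIRST colon on each line.
    public static List<string> ExtractValues(string input)
    {
        var values = new List<string>();
        var lines = input.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (var line in lines)
        {
            int colon = line.IndexOf(':');
            if (colon < 0)
            {
                continue; // no colon on this line: skip it instead of throwing
            }

            // Substring keeps any later colons (e.g. in "11/01:2001") intact.
            values.Add(line.Substring(colon + 1).Trim());
        }
        return values;
    }
}
```

Using IndexOf plus Substring instead of Split(':')[1] is what keeps the second colon in the Date line from truncating the value.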
Additional Resources
String.Split Method
Returns a string array that contains the substrings in this instance
that are delimited by elements of a specified string or Unicode
character array.
StringSplitOptions Enum
Specifies whether applicable Split method overloads include or omit
empty substrings from the return value.
String.Trim Method
Returns a new string in which all leading and trailing occurrences of
a set of specified characters from the current String object are
removed.
Enumerable.Select Method
Projects each element of a sequence into a new form.

You can use a regex to find the matches after the colon and up to the newline character:
(?<=:)\s*[^\n]*
The regex uses a lookbehind to ensure there's a colon in front of the match, then matches everything that is not a newline, i.e. the rest of the line.
Use it like this:
string searchText = "Username: Jony\n" +
    "Fname: Dep\n" +
    "Address: Los Angeles\n" +
    "Age: 28\n" +
    "Date: 11/01:2001\n";
// Verbatim string (@) so that \s and \n reach the regex engine unchanged
Regex myRegex = new Regex(@"(?<=:)\s*[^\n]*");
foreach (Match match in myRegex.Matches(searchText))
{
    // match.Value still includes the whitespace after the colon; Trim() it if needed
    DoSomething(match.Value);
}

C# string.format add a "-" value?

I have a string.Format issue.
I'm trying to pass my invoice ID as an argument to my program, and the 6th argument always ends up prefixed with "-" no matter what I do (we must use the ¿ separator because of an old program).
public static void OpenIdInvoice(string wdlName, string IdInvoice, Form sender){
MessageBox.Show(string.Format("¿{0}",IdInvoice));
proc.Arguments = string.Format("{0}¿{1}¿{2}¿{3}¿{4}¿{5}",
session.SessionId.ToString(),
Session.GetCurrentDatabaseName(),
session.Librairie,
wdlName,
"",
IdInvoice
);
System.Windows.Forms.MessageBox.Show(proc.Arguments);
In the end, "-" is always added to my formatted result, but only before my IdInvoice (so Id 10 ends up as -10 in my Arguments).
Now the fun part: I hardcoded some strings, and
if I pass -1 instead of an Id, I get --1 as a result, and if I write "banana", I get "-banana".
I know I could just build the string some other way, but I'm curious as to why it happens.
EDIT:
Here is a copy/paste of my code:
var proc = new System.Diagnostics.ProcessStartInfo("Achat.exe");
System.Windows.Forms.MessageBox.Show(string.Format("¿{0}",args));
proc.Arguments = string.Format(@"{0}¿{1}¿{2}¿{3}¿{4}¿{5}¿{6}",
"12346", //session.SessionId.ToString(),
"fake DB",//Session.GetCurrentDatabaseName().ToString(),
"false", //session.Librairie.ToString(),
"myScreenName", //wdl.ToString(),
"123456",
"Banana",
"123456"
//args.ToString(),
);
System.Windows.Forms.MessageBox.Show(proc.Arguments);
System.Windows.Forms.MessageBox.Show(args);
And this is a copy/paste of my text visualizer result:
12346¿fake DB¿false¿myScreenName¿123456¿Banana¿123456

You literally have an extra character before "{5}" called a soft hyphen (character code 173). It's one of those weird characters that isn't always displayed. If you place your cursor after the "{" in "{5}", press the left arrow, and then press backspace, it will actually delete it. Alternatively, you can use an editor like Notepad++ that will display it. I was able to find it by running the following code:
// The soft hyphen pasted from the question is written as \u00AD here so that it stays visible
var t = "{0}¿{1}¿{2}¿{3}¿{4}¿\u00AD{5}";
foreach (var c in t)
{
Console.WriteLine((int)c + " " + c);
}
which printed out
123 {
48 0
125 }
191 ¿
123 {
49 1
125 }
191 ¿
123 {
50 2
125 }
191 ¿
123 {
51 3
125 }
191 ¿
123 {
52 4
125 }
191 ¿
173 -
123 {
53 5
125 }
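If you'd rather not eyeball a character dump, a small check along these lines can flag anything unexpected in a format string. This is just a sketch: the class and method names are invented, and it assumes that "expected" means printable ASCII plus whatever extra characters you explicitly allow (here the ¿ separator).

```csharp
using System;
using System.Collections.Generic;

public static class HiddenCharFinder
{
    // Hypothetical helper: reports every character that is neither printable
    // ASCII (32..126) nor listed in allowedExtra, with its index and code point.
    public static List<string> FindUnexpectedChars(string s, string allowedExtra)
    {
        var found = new List<string>();
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            bool printableAscii = c >= 32 && c <= 126;
            if (!printableAscii && allowedExtra.IndexOf(c) < 0)
            {
                found.Add($"index {i}: U+{(int)c:X4}");
            }
        }
        return found;
    }
}
```

Calling `HiddenCharFinder.FindUnexpectedChars("{4}¿\u00AD{5}", "¿")` reports the soft hyphen as `index 4: U+00AD`, while a clean format string yields an empty list.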

Prepend string and Suffix string to record using CSVHelper

I need to export entities to a CSV file using CsvHelper. I got a trial version working, but I would have to write every field manually. What I want is to write each record prefixed with either an 'H' or a 'D' and to end every line with a single space. My demo models:
PersonId  FirstName  LastName  DateOfBirth
1         Randy      Smith     1968-08-31
2         Zachary    Smith     2002-01-10
3         Angie      Smith     1969-11-20
4         Khelzie    Smith     1996-07-27
AutoId  Year  Make        Model                    OwnerId
1       2000  Toyota      4Runner                  1
2       1995  Ford        Mustang                  1
3       2014  Chevrolet   Corvette Stingray Coupe  2
4       2014  Volkswagen  Beetle Coupe             4
5       1980  Ford        F-150                    2
6       1968  Chevrolet   Camaro                   3
7       2000  Tonka       Truck                    3
8       1993  Honda       Accord                   4
Into a CSV File Like this:
H 1 Randy Smith 8/31/1968
D 1 2000 Toyota 4Runner
D 2 1995 Ford Mustang
H 2 Zachary Smith 1/10/2002
D 3 2014 Chevy Corevett
D 5 1980 Ford F-150
H 3 Angie Smith 11/20/1969
D 6 1968 Chevrolet Camaro
D 7 2000 Tonka Truck
H 4 Khelzie Smith 7/27/1996
D 4 2014 Volkswagen Beetle Coupe
This is the Code I finally got to work:
StreamWriter textWriter = File.CreateText(fileName);
var csv = new CsvWriter(textWriter);
csv.Configuration.Delimiter = delimiter;
csv.Configuration.QuoteNoFields = true;
// This will skip those people who don't own a vehicle
foreach (Person person in people.Where(person => person.Vehicles.Count > 0))
{
// The letter 'H' must prefix every Header line
csv.WriteField((@"H " + person.PersonId));
csv.WriteField(person.FirstName);
csv.WriteField(person.LastName);
// Headers lines must end with a single space.
csv.WriteField((person.DateOfBirth.ToShortDateString() + " "));
csv.NextRecord();
foreach (Automobile auto in person.Vehicles)
{
// The letter 'D' must prefix every Detail line
csv.WriteField((@"D " + auto.AutoId));
csv.WriteField(auto.Year);
csv.WriteField(auto.Make);
// Details lines must end with a single space.
csv.WriteField((auto.Model + " "));
csv.NextRecord();
}
}
The real tables have ~70 fields apiece.

Just for those that have as thick a skull as mine, here is a solution:
foreach (TransactionHeader header in headers)
{
csv.WriteField("H");
csv.WriteRecord(header);
csv.WriteField(" ");
csv.NextRecord();
foreach (TransactionDetail detail in header.TransactionDetail)
{
csv.WriteField("D");
csv.WriteRecord(detail);
csv.WriteField(" ");
csv.NextRecord();
}
}
Thanks to everyone who saw this as pretty obvious and patiently waited for me to bash my head down on my desk enough times and then figure this out myself.

Best way to find which cell of a string array contains text

I have a block of text that I'm taking from a GEDCOM file.
The text is flat and basically broken into "nodes".
I am splitting each node on the \r char, thus subdividing it into its parts (the number of "lines" can vary).
I know index 0 will always be the ID, but after that anything can be anywhere, so I want to test each cell of the array to see whether it contains the correct tag for me to process.
Here is an example of what two nodes would look like:
0 @ind23815@ INDI <<<<<<<<<<<<<<<<<<< Start of node 1
1 NAME Lawrence /Hucstepe/
2 DISPLAY Lawrence Hucstepe
2 GIVN Lawrence
2 SURN Hucstepe
1 POSITION -850,-210
2 BOUNDARY_RECT (-887,-177),(-813,-257)
1 SEX M
1 BIRT
2 DATE 1521
1 DEAT Y
2 DATE 1559
1 NOTE * Born: Abt 1521, Kent, England
2 CONT * Marriage: Jane Pope 17 Aug 1546, Kent, England
2 CONT * Died: Bef 1559, Kent, England
2 CONT
1 FAMS @fam08318@
0 @ind23816@ INDI <<<<<<<<<<<<<<<<<<<<<<< Start of Node 2
1 NAME Jane /Pope/
2 DISPLAY Jane Pope
2 GIVN Jane
2 SURN Pope
1 POSITION -750,-210
2 BOUNDARY_RECT (-787,-177),(-713,-257)
1 SEX F
1 BIRT
2 DATE 1525
1 DEAT Y
2 DATE 1609
1 NOTE * Born: Abt 1525, Tenterden, Kent, England
2 CONT * Marriage: Lawrence Hucstepe 17 Aug 1546, Kent, England
2 CONT * Died: 23 Oct 1609
2 CONT
1 FAMS @fam08318@
0 @ind23817@ INDI <<<<<<<<<<< start of Node 3
So when I'm done I have an array that looks like this:
address, string
0 = "1 NAME Lawrence /Hucstepe/"
1 = "2 DISPLAY Lawrence Hucstepe"
2 = "2 GIVN Lawrence"
3 = "2 SURN Hucstepe"
4 = "1 POSITION -850,-210"
5 = "2 BOUNDARY_RECT (-887,-177),(-813,-257)"
6 = "1 SEX M"
7 = "1 BIRT "
8 = "1 FAMS @fam08318@"
So my question is: what is the best way to search the above array to see which cell has the SEX tag, the NAME tag, or the FAMS tag?
This is the code I have:
private int FindIndexinArray(string[] Arr, string search)
{
int Val = -1;
for (int i = 0; i < Arr.Length; i++)
{
if (Arr[i].Contains(search))
{
Val = i;
}
}
return Val;
}
But it seems inefficient because I end up calling it twice to make sure it doesn't return -1,
like so:
if (FindIndexinArray(SubNode, "1 BIRT ") != -1)
{
// add birthday to Struct
I.BirthDay = SubNode[FindIndexinArray(SubNode, "1 BIRT ") + 1].Replace("2 DATE ", "").Trim();
}
Sorry for the long post, but hopefully you guys will have some expert advice.

You can use the static FindAll method of the Array class.
Note that it returns the matching strings themselves rather than their indexes, if that works for you:
string[] test = { "Sex", "Love", "Rock and Roll", "Drugs", "Computer"};
Array.FindAll(test, item => item.Contains("Sex") || item.Contains("Drugs") || item.Contains("Computer"));
The => indicates a lambda expression, which is basically an anonymous method defined inline.
You can also do this if the lambda gives you the creeps:
//Declare a method
private bool HasTag(string s)
{
return s.Contains("Sex") || s.Contains("Drugs") || s.Contains("Computer");
}
string[] test = { "Sex", "Love", "Rock and Roll", "Drugs", "Computer"};
Array.FindAll(test, HasTag);

What about a simple regular expression?
^(\d)\s=\s\"\d\s(SEX|BIRT|FAMS){1}.*$
First group captures the address, second group the tag.
Also, it might be quicker to dump all array items into a string and do your regex on the whole lot at once.
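As a rough illustration of the "one regex over the whole lot" idea, here is a sketch. Note that it is an adaptation: instead of the pattern above (which targets the `0 = "1 SEX M"` display form), it assumes the raw node lines are joined with \n and matched in multiline mode. The names GedcomTags and FindTags are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GedcomTags
{
    // Group 1: level number, group 2: tag, group 3: rest of the line.
    // RegexOptions.Multiline makes ^ and $ match at every line break.
    private static readonly Regex TagLine =
        new Regex(@"^(\d+) (SEX|BIRT|FAMS)\b(.*)$", RegexOptions.Multiline);

    // Hypothetical helper: returns (tag, value) pairs for the lines that match.
    public static List<KeyValuePair<string, string>> FindTags(string[] lines)
    {
        // Join with \n only, so no stray \r ends up inside a captured value.
        string text = string.Join("\n", lines);

        var found = new List<KeyValuePair<string, string>>();
        foreach (Match m in TagLine.Matches(text))
        {
            found.Add(new KeyValuePair<string, string>(
                m.Groups[2].Value, m.Groups[3].Value.Trim()));
        }
        return found;
    }
}
```

Whether this is actually quicker than looping over the array is something you would have to measure; for a handful of lines per node the difference is likely negligible.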

"But it seems inefficient because i end up calling it twice to make sure it doesnt return a -1"
Copy the returned value to a variable before you test it, so the method is only called once.
int IndexResults = FindIndexinArray(SubNode, "1 BIRT ");
if (IndexResults != -1)
{
// add birthday to Struct
// the "2 DATE" line follows the "1 BIRT" line, hence the + 1
I.BirthDay = SubNode[IndexResults + 1].Replace("2 DATE ", "").Trim();
}

The for loop in the FindIndexinArray method should break once it finds a match if you are only interested in the first match.
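One way to get first-match behaviour without writing the loop yourself is Array.FindIndex, which stops at the first element that satisfies the predicate and returns -1 if none does. The sketch below is only an illustration: NodeSearch and FindFirstIndex are invented names, and it uses StartsWith rather than the Contains from the question, on the assumption that the tag always begins the line.

```csharp
using System;

public static class NodeSearch
{
    // Hypothetical helper: index of the first line that starts with the
    // given tag, or -1 if no line does.
    public static int FindFirstIndex(string[] lines, string tag)
    {
        return Array.FindIndex(
            lines,
            line => line.StartsWith(tag, StringComparison.Ordinal));
    }
}
```

Store the result in a local variable, test it against -1 once, and then index into the array with it.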
