Best way to find which cell of a string array contains text - C#

I have a block of text that I'm taking from a GEDCOM file.
The text is flat and basically broken into "nodes".
I am splitting each node on the \r char, thus subdividing it into its parts (the number of "lines" can vary).
I know the 0 address will always be the ID, but after that everything can be anywhere, so I want to test each cell of the array to see if it contains the correct tag for me to process.
An example of what two nodes would look like:
0 #ind23815# INDI <<<<<<<<<<<<<<<<<<< Start of node 1
1 NAME Lawrence /Hucstepe/
2 DISPLAY Lawrence Hucstepe
2 GIVN Lawrence
2 SURN Hucstepe
1 POSITION -850,-210
2 BOUNDARY_RECT (-887,-177),(-813,-257)
1 SEX M
1 BIRT
2 DATE 1521
1 DEAT Y
2 DATE 1559
1 NOTE * Born: Abt 1521, Kent, England
2 CONT * Marriage: Jane Pope 17 Aug 1546, Kent, England
2 CONT * Died: Bef 1559, Kent, England
2 CONT
1 FAMS #fam08318#
0 #ind23816# INDI <<<<<<<<<<<<<<<<<<<<<<< Start of Node 2
1 NAME Jane /Pope/
2 DISPLAY Jane Pope
2 GIVN Jane
2 SURN Pope
1 POSITION -750,-210
2 BOUNDARY_RECT (-787,-177),(-713,-257)
1 SEX F
1 BIRT
2 DATE 1525
1 DEAT Y
2 DATE 1609
1 NOTE * Born: Abt 1525, Tenterden, Kent, England
2 CONT * Marriage: Lawrence Hucstepe 17 Aug 1546, Kent, England
2 CONT * Died: 23 Oct 1609
2 CONT
1 FAMS #fam08318#
0 #ind23817# INDI <<<<<<<<<<< start of Node 3
So when I'm done I have an array that looks like this:
address , string
0 = "1 NAME Lawrence /Hucstepe/"
1 = "2 DISPLAY Lawrence Hucstepe"
2 = "2 GIVN Lawrence"
3 = "2 SURN Hucstepe"
4 = "1 POSITION -850,-210"
5 = "2 BOUNDARY_RECT (-887,-177),(-813,-257)"
6 = "1 SEX M"
7 = "1 BIRT "
8 = "1 FAMS #fam08318#"
So my question is: what is the best way to search the above array to see which cell has the SEX tag, the NAME tag, or the FAMS tag?
This is the code I have:
private int FindIndexinArray(string[] Arr, string search)
{
int Val = -1;
for (int i = 0; i < Arr.Length; i++)
{
if (Arr[i].Contains(search))
{
Val = i;
}
}
return Val;
}
But it seems inefficient, because I end up calling it twice to make sure it doesn't return -1,
like so:
if (FindIndexinArray(SubNode, "1 BIRT ") != -1)
{
// add birthday to Struct
I.BirthDay = SubNode[FindIndexinArray(SubNode, "1 BIRT ") + 1].Replace("2 DATE ", "").Trim();
}
Sorry this is a longer post, but hopefully you guys will have some expert advice.

You can use the static FindAll method of the Array class.
Note that it returns the matching strings themselves rather than their indexes, if that works for you.
string[] test = { "Sex", "Love", "Rock and Roll", "Drugs", "Computer"};
Array.FindAll(test, item => item.Contains("Sex") || item.Contains("Drugs") || item.Contains("Computer"));
The => indicates a lambda expression: basically an anonymous method, defined inline without a name.
You can also do this if the lambda gives you the creeps.
//Declare a method
private bool HasTag(string s)
{
return s.Contains("Sex") || s.Contains("Drugs") || s.Contains("Computer");
}
string[] test = { "Sex", "Love", "Rock and Roll", "Drugs", "Computer"};
Array.FindAll(test, HasTag);

What about a simple regular expression?
^(\d)\s=\s\"\d\s(SEX|BIRT|FAMS){1}.*$
First group captures the address, second group the tag.
Also, it might be quicker to dump all array items into a string and do your regex on the whole lot at once.
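The pattern above targets the "address = string" listing from the question. As a rough sketch of the same idea applied to the raw lines instead, the following splits one line into level, tag, and value. The class name GedcomLine and the method TryParse are hypothetical, and the pattern assumes every line has the shape "level tag [value]". For "0 ..." lines the ID (for example #ind23815#) ends up in the tag position, so those would need separate handling.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GedcomLine
{
    // Assumed line shape: "<level> <tag> [<value>]", e.g. "1 NAME Lawrence /Hucstepe/".
    private static readonly Regex LinePattern = new Regex(@"^([0-9]+)\s+(\S+)\s*(.*)$");

    // Hypothetical helper: returns false if the line does not match the assumed shape.
    public static bool TryParse(string line, out int level, out string tag, out string value)
    {
        level = 0;
        tag = "";
        value = "";

        Match m = LinePattern.Match(line.Trim());
        if (!m.Success)
        {
            return false;
        }

        level = int.Parse(m.Groups[1].Value);
        tag = m.Groups[2].Value;
        value = m.Groups[3].Value;
        return true;
    }
}
```

With the tag isolated like this, a check such as tag == "SEX" avoids accidental matches inside NOTE text, which a plain Contains call could hit.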

"But it seems inefficient because i end up calling it twice to make sure it doesnt return a -1"
Copy the returned value to a variable before you test it, to avoid calling the method twice.
int indexResult = FindIndexinArray(SubNode, "1 BIRT ");
if (indexResult != -1)
{
// add birthday to Struct; the date is on the line after the BIRT tag
I.BirthDay = SubNode[indexResult + 1].Replace("2 DATE ", "").Trim();
}

The for loop in FindIndexinArray should break as soon as it finds a match if you are only interested in the first match; as written, it returns the last match.
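The two suggestions above (store the index once, stop at the first match) can be sketched together using the built-in Array.FindIndex, which returns the index of the first match or -1. The names GedcomSearch and FindTagIndex are hypothetical. StartsWith replaces Contains here on the assumption that every line begins with its level number and tag.

```csharp
using System;

public static class GedcomSearch
{
    // Returns the index of the first line that starts with the given tag, or -1 if none does.
    public static int FindTagIndex(string[] lines, string tag)
    {
        return Array.FindIndex(lines, line => line.StartsWith(tag, StringComparison.Ordinal));
    }
}

public static class Program
{
    public static void Main()
    {
        string[] subNode = { "1 NAME Lawrence /Hucstepe/", "1 SEX M", "1 BIRT ", "2 DATE 1521" };

        // Look the tag up once and reuse the result.
        int birthIndex = GedcomSearch.FindTagIndex(subNode, "1 BIRT");

        // Also guard against BIRT being the last line of the node.
        if (birthIndex != -1 && birthIndex + 1 < subNode.Length)
        {
            string birthDay = subNode[birthIndex + 1].Replace("2 DATE ", "").Trim();
            Console.WriteLine(birthDay);
        }
    }
}
```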

Related

Replacing value before space?

I want to replace only the value before space
for example:
1. 1 3
2. 23 5
3. 650 300
4. 1350 19
would be:
1. 2 3
2. 55 5
3. 950 300
4. 5602 19
I only need to change the value before space... after space should remain same. Every value is in a separate row. Before space value can be 1 to 4 digits and after space value can be 1 to 3 digits.
string num = "650 3";
string afterspace = num.Substring(0, 4);
Console.WriteLine(afterspace);
string beforespace = num.Substring(4);
Console.WriteLine(beforespace);
If this is a space-separated string, you can try the approach below.
var arr = str.Split(' ');
arr[0] = newValue;//here you can use the index and new value to assign the new value.
str = string.Join(" ",arr);
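As an alternative sketch, the first space can be located with IndexOf so that everything after it is kept exactly as it was. The class SpaceSplit and the method ReplaceBeforeSpace are hypothetical names. Returning only the new value when the line has no space is an assumption, not something the question specifies.

```csharp
using System;

public static class SpaceSplit
{
    // Replaces everything before the first space with newValue; the rest is left unchanged.
    public static string ReplaceBeforeSpace(string line, string newValue)
    {
        int space = line.IndexOf(' ');

        // Assumption: with no space present, the whole line is the value to replace.
        return space < 0 ? newValue : newValue + line.Substring(space);
    }
}
```

For example, SpaceSplit.ReplaceBeforeSpace("650 300", "950") yields "950 300".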

Find multiple values and strings within another string in C#

So I have this string with 4 lines:
id score ping guid name lastmsg address qport rate
--- ----- ---- ---------- --------------- ------- --------------------- ------ -----
1 11 45 176387877 Player 1 3250 101.102.103.104:555 3647 25000
2 23 61 425716719 Player 2 3250 105.106.107.108:555 5978 25000
How can I 'extract' all of these values? Like, I want to save "id", "score", "ping", "guid", "name", etc.
I have played around with a "GetBetween" function I found here. I also tried to learn the string.Split function. But I don't think I'm getting close to what I want to achieve, and I don't really understand splitting a string quite yet.
I basically need to remove all of the " " empty spaces between the values, problem is, the value length may change, e.g "name".
Can someone give me an example how I could extract the values?
Thanks in advance!
Regex.Split is your friend, and this works well enough.
void Main()
{
// fun fact, the @ in front of the string makes it a verbatim literal, so you
// literally get the new lines
var input =
@"id score ping guid name lastmsg address qport rate
-- - -------------------------------------------------------------------------
1 11 45 176387877 Player 1 3250 101.102.103.104:555 3647 25000
2 23 61 425716719 Player 2 3250 105.106.107.108:555 5978 25000";
//Gets you each line
var lines = input.Split('\n');
// Skip 2 because I doubt you care about the column title
// or the row with the dashes
foreach (var line in lines.Skip(2))
{
// For each line, Regex split will return an array with each entry
// Set a breakpoint with the debugger and inspect to see what I mean.
// Splits using regex - assumes at least 2 spaces between items
// so the single space in 'Player 1' is handled; it's a fickle solution though
// Trim the line before Regex.Split to avoid extra data in the split
var r = Regex.Split(line.Trim(), @"\s{2,}");
}
}
You can do this with Regex and named groups.
Sample Input
var str = @"id score ping guid name lastmsg address qport rate
--- ----- ---- ---------- --------------- ------- --------------------- ------ -----
1 11 45 176387877 Player 1 3250 101.102.103.104:555 3647 25000
2 23 61 425716719 Player 2 3250 105.106.107.108:555 5978 25000";
Regex Definition
var regex = new Regex(@"^(?<id>[\d]+)(\s{2,})(?<score>[\d]+)(\s{2,})(?<ping>[\d]+)(\s{1,})(?<guid>[\d]+)(\s{2,})(?<name>([\w]+\s[\w]+))(\s{2,})(?<lastmsg>[\d]+)(\s{2,})(?<ip>[\d.:]+)(\s{2,})(?<port>[\d]+)(\s{2,})(?<rate>[\d]+)$",RegexOptions.Compiled);
Parsing Code
var lines = str.Split(new []{Environment.NewLine},StringSplitOptions.RemoveEmptyEntries);
foreach(var line in lines)
{
var match = regex.Match(line.Trim());
if(!match.Success) continue;
Console.WriteLine($"ID = {match.Groups["id"].Value}");
Console.WriteLine($"Score = {match.Groups["score"].Value}");
Console.WriteLine($"Ping = {match.Groups["ping"].Value}");
Console.WriteLine($"Guid = {match.Groups["guid"].Value}");
Console.WriteLine($"Name = {match.Groups["name"].Value}");
Console.WriteLine($"Last Msg = {match.Groups["lastmsg"].Value}");
Console.WriteLine($"Port = {match.Groups["port"].Value}");
Console.WriteLine($"Rate = {match.Groups["rate"].Value}");
}
Output
ID = 1
Score = 11
Ping = 45
Guid = 176387877
Name = Player 1
Last Msg = 3250
Port = 3647
Rate = 25000
ID = 2
Score = 23
Ping = 61
Guid = 425716719
Name = Player 2
Last Msg = 3250
Port = 5978
Rate = 25000

To Count Occurrences of all sub strings in string C#

Question: I have a long string and need to count the occurrences of every substring within it, then print a list of the substrings and their counts (where the count is > 1) in decreasing order of count.
Example:
String = "abcdabcd"
Result:
Substrings Count
abcd 2
abc 2
bcd 2
ab 2
bc 2
cd 2
a 2
b 2
c 2
d 2
Problem: my string can be 5000 characters long, and I am not able to find an efficient way to achieve this (efficiency is very important for the application).
Is there an existing algorithm for this, or would multithreading help? Please help.
Example using: Find a common string within a list of strings
void Main()
{
"abcdabcd".getAllSubstrings()
.AsParallel()
.GroupBy(x => x)
.Select(g => new {g.Key, count=g.Count()})
.Dump(); // Dump() is a LINQPad extension method; outside LINQPad, print the results instead
}
// Define other methods and classes here
public static class Ext
{
public static IEnumerable<string> getAllSubstrings(this string word)
{
return from charIndex1 in Enumerable.Range(0, word.Length)
from charIndex2 in Enumerable.Range(0, word.Length - charIndex1 + 1)
where charIndex2 > 0
select word.Substring(charIndex1, charIndex2);
}
}
Produces:
a 2
dabc 1
abcdabc 1
b 2
abc 2
dabcd 1
bc 2
bcda 1
abcd 2
ab 2
bcdab 1
cdabc 1
abcda 1
d 2
bcdabc 1
dab 1
bcd 2
abcdab 1
c 2
bcdabcd 1
abcdabcd 1
cd 2
da 1
cdab 1
cda 1
cdabcd 1
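The output above includes substrings that occur only once and is not sorted, while the question asks for counts greater than 1 in decreasing order. A minimal sketch of that filtering and ordering follows, using a Dictionary instead of GroupBy. The names SubstringCounter and CountRepeated are hypothetical, and ordering ties by longer substring first is an assumption taken from the question's example. This still enumerates every substring, so for a 5000-character input it may be slow and memory-hungry. A suffix-based structure such as a suffix array or suffix automaton is the usual direction when that becomes a problem.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SubstringCounter
{
    // Counts every substring of text and returns those occurring more than once,
    // ordered by count (descending), then by length (descending, an assumption).
    public static List<KeyValuePair<string, int>> CountRepeated(string text)
    {
        var counts = new Dictionary<string, int>();

        for (int start = 0; start < text.Length; start++)
        {
            for (int length = 1; length <= text.Length - start; length++)
            {
                string sub = text.Substring(start, length);
                int current;
                counts.TryGetValue(sub, out current); // current stays 0 if sub is new
                counts[sub] = current + 1;
            }
        }

        return counts.Where(pair => pair.Value > 1)
                     .OrderByDescending(pair => pair.Value)
                     .ThenByDescending(pair => pair.Key.Length)
                     .ToList();
    }
}
```

Calling SubstringCounter.CountRepeated("abcdabcd") returns the ten repeated substrings from the question's example, each with a count of 2.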

Prepend string and suffix string to record using CsvHelper

I need to export entities to a CSV file using CsvHelper. I got a trial version working, but I would have to write every field manually. What I want is to write each record prefixed with either an 'H' or a 'D' and end every line with a single space. My demo models:
PersonId FirstName LastName DateOfBirth
1 Randy Smith 1968-08-31
2 Zachary Smith 2002-01-10
3 Angie Smith 1969-11-20
4 Khelzie Smith 1996-07-27
AutoId Year Make Model OwnerId
1 2000 Toyota 4Runner 1
2 1995 Ford Mustang 1
3 2014 Chevrolet Corvette Stingray Coupe 2
4 2014 Volkswagen Beetle Coupe 4
5 1980 Ford F-150 2
6 1968 Chevrolet Camaro 3
7 2000 Tonka Truck 3
8 1993 Honda Accord 4
Into a CSV File Like this:
H 1 Randy Smith 8/31/1968
D 1 2000 Toyota 4Runner
D 2 1995 Ford Mustang
H 2 Zachary Smith 1/10/2002
D 3 2014 Chevy Corevett
D 5 1980 Ford F-150
H 3 Angie Smith 11/20/1969
D 6 1968 Chevrolet Camaro
D 7 2000 Tonka Truck
H 4 Khelzie Smith 7/27/1996
D 4 2014 Volkswagen Beetle Coupe
This is the Code I finally got to work:
StreamWriter textWriter = File.CreateText(fileName);
var csv = new CsvWriter(textWriter);
csv.Configuration.Delimiter = delimiter;
csv.Configuration.QuoteNoFields = true;
// This will skip those people who don't own a vehicle
foreach (Person person in people.Where(person => person.Vehicles.Count > 0))
{
// The letter 'H' must prefix every Header line
csv.WriteField((@"H " + person.PersonId));
csv.WriteField(person.FirstName);
csv.WriteField(person.LastName);
// Headers lines must end with a single space.
csv.WriteField((person.DateOfBirth.ToShortDateString() + " "));
csv.NextRecord();
foreach (Automobile auto in person.Vehicles)
{
// The letter 'D' must prefix every Detail line
csv.WriteField((@"D " + auto.AutoId));
csv.WriteField(auto.Year);
csv.WriteField(auto.Make);
// Details lines must end with a single space.
csv.WriteField((auto.Model + " "));
csv.NextRecord();
}
}
The real tables have ~70 fields apiece.
Just for those that have as thick a skull as mine, here is a solution:
foreach (TransactionHeader header in headers)
{
csv.WriteField("H");
csv.WriteRecord(header);
csv.WriteField(" ");
csv.NextRecord();
foreach (TransactionDetail detail in header.TransactionDetail)
{
csv.WriteField("D");
csv.WriteRecord(detail);
csv.WriteField(" ");
csv.NextRecord();
}
}
Thanks to everyone who saw this as pretty obvious and patiently waited for me to bash my head down on my desk enough times and then figure this out myself.

Regarding finding middle element of a string

The snippet below takes a string as input. I am trying to get the middle 2 characters of the string when its length is even.
string input = "confir";
string op = "";
op = input.Substring((input.Length - 1) / 2,input.Length/2 -1);//logic
Console.WriteLine(op);//display the output
Output for above snippet is nf.
When the input is changed to confirme, the output should be fi and not fir.
How do I generalize this? What is the error in the logic?
The second argument of String.Substring is the length of the substring, not an end index as in Java. So if you need a substring of two characters, pass 2:
string input = "confirme";
string op = input.Substring((input.Length - 1) / 2, 2);
BTW, you should handle the case where the string is shorter than 2 characters:
string op = input.Substring((input.Length - 1) / 2, Math.Min(input.Length, 2));
Tests:
input | op |
---------------------
"" | "" |
"c" | "c" |
"co" | "co" |
"con" | "on" |
"conf" | "on" |
"confir" | "nf" |
"confirme" | "fi" |
string input = "confir";
if(input.Length % 2 == 0)
Console.WriteLine(input.Substring((input.Length / 2)-1, 2));
That should give you the expected result. First check whether the string length is even; then, instead of (input.Length - 1) / 2, divide the length by 2 and subtract 1, and take two characters starting at (input.Length / 2) - 1.
Problem: you are passing a value derived from the total length of the string (input.Length / 2 - 1) as the second argument to Substring().
Solution: the second argument of Substring() is the number of characters to extract, starting at the position given by the first argument.
From MSDN : Substring(Int32, Int32)
Retrieves a substring from this instance. The substring starts at a
specified character position and has a specified length.
Replace This:
op = input.Substring((input.Length - 1) / 2,input.Length/2 -1);//logic
With This:
op = input.Substring((input.Length - 1) / 2,2);//logic
Suggestion: you should check for an empty string and for an even number of characters in the string.
Complete Code:
string input = "confirme";
string op = "";
if (input.Length > 0 && input.Length % 2 == 0)
{
op = input.Substring((input.Length - 1) / 2, 2);//logic
Console.WriteLine(op);//display the output
}
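The answers above can be folded into one small helper, sketched below. The names MiddleHelper and Middle are hypothetical. Returning the single middle character for odd-length input is an assumption, since the question only defines the even case.

```csharp
using System;

public static class MiddleHelper
{
    // Returns the middle two characters for even-length input.
    // Assumption: for odd-length input, returns the single middle character.
    public static string Middle(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return "";
        }

        int count = input.Length % 2 == 0 ? 2 : 1;
        int start = (input.Length - 1) / 2; // integer division rounds down
        return input.Substring(start, count);
    }
}
```

MiddleHelper.Middle("confir") gives "nf" and MiddleHelper.Middle("confirme") gives "fi", matching the expected results in the question.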
