Split a string into 2 arrays based on 2 delimiters - C#

I want to split a string into 2 arrays: one with the text that's delimited by vbTab (\t in C#) and another with the text that's delimited by a newline (vbLf, which is \n in C#).
By searching I found this (Stack Overflow question 1254577):
string input = "abc][rfd][5][,][.";
string[] parts1 = input.Split(new string[] { "][" }, StringSplitOptions.None);
string[] parts2 = Regex.Split(input, @"\]\[");
But my string would be something like this:
aaa\tbbb\tccc\tddd\teee\nAccount\tType\tCurrency\tBalance\t123,456.78\nDate\tDetails\tAmount\n03NOV13\tTransfer\t9,999,999.00-\n02NOV13\tCheque\t125.00\nDebit Card Cash\t200.00
So in the code above, input becomes:
string input = "aa\tbbb\tccc\tddd\teee\nAccount\tType\tPersonal Current Account\tCurrency\tGBP\tBalance\t123,456.78\nDate\tDetails\tAmount\n03NOV13\tTransfer\t9,999,999.00-\n02NOV13\tCheque\t125.00\nDebit Card Cash\t200.00\n30OCT13\tLoan Repayment\t1,234.56-\n\tType\t30-Day Notice Savings Account\tCurrency\tGBP\tBalance\t983,456.78\nDate\tDetails\tAmount\n03NOV13\tRepaid\t250\n";
But how do I create one string array with everything up to the first newline and another array that holds everything after it?
Then the second one will have to be split again into several string arrays so I can write out a mini-statement with the account details, then showing the transactions for each account.
I want to be able to take the original string and produce such a statement on A5 paper.

You can use a LINQ query (this needs using System.Linq;):
var cells = from row in input.Split('\n')
select row.Split('\t');
You can get just the first row using First() and the remaining rows using Skip(). For example:
foreach (string s in cells.First())
{
Console.WriteLine("First: " + s);
}
Or
foreach (string[] row in cells.Skip(1))
{
Console.WriteLine(String.Join(",", row));
}
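A minimal self-contained sketch of the LINQ approach, assuming the input contains real tab and newline characters. The class and method names (StatementParser, ToCells, SplitHeader) are hypothetical, chosen for illustration; StringSplitOptions.RemoveEmptyEntries is used so the trailing \n does not produce an empty last row.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration: splits statement text into rows ('\n')
// and cells ('\t'). Assumes the string holds real newline and tab characters.
public static class StatementParser
{
    // Every non-empty row, each split into its cells.
    public static string[][] ToCells(string input)
    {
        return input
            .Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries) // drops the empty row after a trailing '\n'
            .Select(row => row.Split('\t'))
            .ToArray();
    }

    // The first row (First) and all remaining rows (Skip(1)).
    // Throws InvalidOperationException if the input has no rows at all.
    public static (string[] Header, string[][] Rest) SplitHeader(string input)
    {
        string[][] cells = ToCells(input);
        return (cells.First(), cells.Skip(1).ToArray());
    }
}
```

The header array can then be printed as the statement heading, and each array in Rest processed as one line of account details or transactions.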

The code below should do what you requested. It treats the \t and \n in the data as literal two-character sequences (a backslash followed by a letter, as you get when such text is read from a file), which is why the input is a verbatim @"..." string and the code searches for @"\n" and splits on the regex @"\\t". This resulted in part1 having 5 entries and part2 having 26 entries.
using System.Text.RegularExpressions;
string input = @"aa\tbbb\tccc\tddd\teee\nAccount\tType\tPersonal Current Account\tCurrency\tGBP\tBalance\t123,456.78\nDate\tDetails\tAmount\n03NOV13\tTransfer\t9,999,999.00-\n02NOV13\tCheque\t125.00\nDebit Card Cash\t200.00\n30OCT13\tLoan Repayment\t1,234.56-\n\tType\t30-Day Notice Savings Account\tCurrency\tGBP\tBalance\t983,456.78\nDate\tDetails\tAmount\n03NOV13\tRepaid\t250\n";
// Substring starting at 0 and ending where the first literal \n begins
string input1 = input.Substring(0, input.IndexOf(@"\n"));
/* Substring starting just after the first literal \n
(which is 2 characters long) through to the end */
string input2 = input.Substring(input.IndexOf(@"\n") + 2);
// @"\\t" is a regex matching a literal backslash followed by 't'
string[] part1 = Regex.Split(input1, @"\\t");
string[] part2 = Regex.Split(input2, @"\\t");
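If the text holds real tab and newline characters rather than literal backslash sequences (for example, the non-verbatim literal in the question), a rough equivalent uses IndexOf('\n') and skips a single character. This is a sketch; FirstLineSplitter and SplitAtFirstNewline are hypothetical names.

```csharp
using System;

// Hypothetical helper: the same first-line/rest split as the answer above, but for
// strings that contain real '\n' and '\t' characters (one character each).
public static class FirstLineSplitter
{
    public static (string[] FirstLine, string[] Rest) SplitAtFirstNewline(string input)
    {
        int newline = input.IndexOf('\n');
        if (newline < 0)
        {
            // No newline: the whole string is the first line.
            return (input.Split('\t'), Array.Empty<string>());
        }

        string firstLine = input.Substring(0, newline);
        string rest = input.Substring(newline + 1); // skip the single '\n' character

        // As in the answer above, the remainder is split on tabs only,
        // so any newlines stay inside the resulting entries.
        return (firstLine.Split('\t'), rest.Split('\t'));
    }
}
```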


Split text into two sentences in C#

I want to divide a text into two sentences. The text contains whitespace characters.
For example:
Original sentence: 100 10 20 13
The result:
First sentence: 100 10 20
Second sentence: 13
I tried Split, but the result was:
first: 100
second: 10
third: 20
fourth: 13
How can I do that?
You want everything before the last space, and then the rest? You can use String.LastIndexOf and Substring:
string text = "100 10 20 13";
string firstPart = text;
string lastPart = string.Empty; // initialized so it is definitely assigned even when there is no space
int lastSpaceIndex = text.LastIndexOf(' ');
if(lastSpaceIndex >= 0)
{
firstPart = text.Substring(0, lastSpaceIndex);
lastPart = text.Substring(lastSpaceIndex).TrimStart();
}
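The LastIndexOf approach can be wrapped in a small helper that also covers the no-space case. SplitAtLastSpace is a hypothetical name, and returning an empty last part when there is no space is an assumption about what you want.

```csharp
using System;

public static class TextSplitter
{
    // Splits text at its last space. If there is no space, the whole text is
    // returned as the first part and the last part is empty.
    public static (string First, string Last) SplitAtLastSpace(string text)
    {
        int lastSpaceIndex = text.LastIndexOf(' ');
        if (lastSpaceIndex < 0)
        {
            return (text, string.Empty);
        }

        // TrimEnd removes any extra spaces left before the last word.
        string first = text.Substring(0, lastSpaceIndex).TrimEnd();
        string last = text.Substring(lastSpaceIndex + 1); // skip the space itself
        return (first, last);
    }
}
```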
You can use LINQ for this (Take and Last need using System.Linq;):
// original is the input string, e.g. "100 10 20 13"
// This splits on spaces
var split = original.Split(' ');
// This takes all split parts except for the last one
var firstWords = split.Take(split.Length - 1);
// And rejoins them into a new string
string first = String.Join(" ", firstWords);
// This gets the last one
var last = split.Last();
Note: This assumes that you want the first result to be every word except the last, and the second result to be only the last word. If you have different requirements, please clarify your question.

C# Replace Everything before the first Space

I need to remove everything in a string before the first occurrence of a space.
Every string starts with a number followed by a space.
I want to remove the number and the space, leaving the rest of the string intact.
For Example:
22 The cats of India
4 Royal Highness
562 Eating Potatoes
42 Biscuits in the 2nd fridge
2564 Niagara Falls at 2 PM
I just need:
The cats of India
Royal Highness
Eating Potatoes
Biscuits in the 2nd fridge
Niagara Falls at 2 PM
Basically remove every number before the first space, including the first space.
I tried this:
foreach (string line in lines)
{
string newline = line.Trim().Remove(0, line.IndexOf(' ') + 1);
}
This works for numbers below 10. After it hits 2 digits, it doesn't work properly.
How should I change my code?
If you want to make sure you only match digits at the beginning of the string, you can use the following regex:
^\d+\p{Zs}
Declare it like this:
public static readonly Regex rx = new Regex(@"^\d+\p{Zs}", RegexOptions.Compiled);
The ^\d+\p{Zs} regex means: one or more digits at the start of the string, followed by exactly one space character (\p{Zs} is the Unicode "space separator" category, so a tab does not match).
And then use it like this:
string newline = rx.Replace(line, string.Empty);
EDIT: To make sure the line has no leading whitespace, we can add .Trim() to strip it first:
Regex rx = new Regex(@"^\d+\p{Zs}", RegexOptions.Compiled);
string newline = rx.Replace(line.Trim(), string.Empty);
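Put together, the regex answer might look like the following sketch. NumberPrefixRemover and Strip are hypothetical names; the pattern and the Trim() call are the ones from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NumberPrefixRemover
{
    // ^      start of the string
    // \d+    one or more digits
    // \p{Zs} exactly one space-separator character
    private static readonly Regex LeadingNumber =
        new Regex(@"^\d+\p{Zs}", RegexOptions.Compiled);

    public static string Strip(string line)
    {
        // Trim first so leading whitespace does not stop ^\d+ from matching.
        return LeadingNumber.Replace(line.Trim(), string.Empty);
    }
}
```

Because the pattern is anchored with ^, digits later in the line ("2nd fridge", "2 PM") are left alone.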
I know you already found a solution to your issue, but I am going to explain why your code didn't work in the first place.
Your data has extra spaces, which is why you are trimming it with line.Trim(). But the real problem lies in the following statement:
string newline = line.Trim().Remove(0, line.IndexOf(' ') + 1);
You are making an assumption about the order of operations, as if Trim() changed line in place. Strings are immutable: when Trim() completes, it returns a whole new string, and that new string is what Remove() operates on. But IndexOf() is called on the original, untrimmed line, so the index it returns can be wrong for the trimmed string.
So the correct line of code would be the following:
foreach (string line in lines)
{
// trim the line first
var temp = line.Trim();
// now perform all operation on the new temporary string
string newline = temp.Remove(0, temp.IndexOf(' ') + 1);
// debugging purpose
Console.WriteLine(newline);
}
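To see the difference, here is a sketch that runs both versions on a line with one leading space (an assumed example of the extra spaces in the data). TrimIndexDemo, Buggy and Fixed are hypothetical names.

```csharp
using System;

public static class TrimIndexDemo
{
    // Original version: IndexOf runs on the untrimmed line,
    // but Remove runs on the trimmed copy.
    public static string Buggy(string line)
    {
        return line.Trim().Remove(0, line.IndexOf(' ') + 1);
    }

    // Corrected version: trim once, then do every operation on the trimmed copy.
    public static string Fixed(string line)
    {
        string temp = line.Trim();
        return temp.Remove(0, temp.IndexOf(' ') + 1);
    }
}
```

For " 22 The cats of India", the untrimmed line has its first space at index 0, so Buggy removes only one character and returns "2 The cats of India", while Fixed finds the space at index 2 of the trimmed string and returns "The cats of India".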
Another solution:
var lines = new string[]
{
"22 The cats of India",
"4 Royal Highness",
"562 Eating Potatoes",
"42 Biscuits in the 2nd fridge",
"2564 Niagara Falls at 2 PM"
};
foreach (var line in lines)
{
var newLine = string.Join(" ", line.Split(' ').Skip(1));
}
Use a regex like so:
string newline = Regex.Replace(line, @"^\s*\d+\s*", "");
This will remove numbers only, not other text before the first space.
This is what you are looking for
foreach (string line in lines)
{
string newline = line.Replace(line.Split(new Char[]{' '})[0] + ' ',string.Empty);
}
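UPDATE: String.Replace replaces every occurrence, so the version above would also remove the second "2 " from a line such as "2 Niagara Falls at 2 PM". This version removes only the first occurrence:
string search = line.Split(new char[] { ' ' })[0];
int pos = line.IndexOf(search);
// TrimStart drops the space that followed the number
string newline = line.Substring(0, pos) + line.Substring(pos + search.Length).TrimStart();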
FULL CODE
using System;
public class Program
{
public static void Main()
{
var lines = new string[]
{
"22 The cats of India",
"4 Royal Highness",
"562 Eating Potatoes",
"42 Biscuits in the 2nd fridge",
"2 Niagara Falls at 2 PM"
};
foreach(string line in lines){
string search=line.Split(new Char[]{' '})[0];
int pos=line.IndexOf(search);
string newline = line.Substring(0, pos) + line.Substring(pos + search.Length).TrimStart(); // TrimStart drops the space after the number
Console.WriteLine(newline);
}
}
}

getting string and numbers

I got a string
string newString = "[17, Appliance]";
How can I put the 17 and Appliance into two separate variables while ignoring the comma and the [ and ]?
I tried looping through it, but the loop doesn't stop when it reaches the comma, and it also separated 1 and 7 instead of reading it as 17.
For example, you could use this:
newString.Split(new[] {'[', ']', ' ', ','}, StringSplitOptions.RemoveEmptyEntries);
This is another option, even though I wouldn't go with it, especially if you might have more than one [something, anothersomething] in the string.
But there you go:
string newString = "assuming you might [17, Appliance] have it like this";
int first = newString.IndexOf('[') + 1; // index of the first character after '['
int last = newString.IndexOf(']'); // index of ']', so the substring stops just before it
var parts = newString.Substring(first, last - first).Split(','); // an array of 2
var int_bit = parts.First().Trim(); // you could also use parts[0]; First/Last need using System.Linq;
var string_bit = parts.Last().Trim(); // or parts[1]
This may not be the most performant method, but I'd go with it for ease of understanding.
string newString = "[17, Appliance]";
newString = newString.Replace("[", "").Replace("]",""); // Remove the square brackets
string[] results = newString.Split(new string[] { ", " }, StringSplitOptions.RemoveEmptyEntries); // Split the string
// If your string is always going to contain one number and one string:
int num1 = int.Parse(results[0]);
string string1 = results[1];
You'd want to include some validation to ensure your first element is indeed a number (use int.TryParse), and that there are indeed two elements returned after you split the string.
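Following that advice, a sketch of a validating parser might look like this. PairParser and TryParsePair are hypothetical names; splitting on the first comma only (via the count argument of String.Split) is a design choice so that a name such as "Kitchen Appliance", or one containing a comma, is kept whole.

```csharp
using System;

public static class PairParser
{
    // Parses text such as "[17, Appliance]" into a number and a name.
    // Returns false instead of throwing when the text has an unexpected shape.
    public static bool TryParsePair(string text, out int number, out string name)
    {
        number = 0;
        name = null;

        // Remove surrounding whitespace and the square brackets.
        string inner = text.Trim().TrimStart('[').TrimEnd(']');

        // Split on the first comma only: at most 2 parts.
        string[] parts = inner.Split(new[] { ',' }, 2);
        if (parts.Length != 2)
        {
            return false;
        }

        name = parts[1].Trim();
        return int.TryParse(parts[0].Trim(), out number) && name.Length > 0;
    }
}
```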

How do I know which delimiter was used when delimiting a string on multiple delimiters? (C#)

I read strings from a file and they come in various styles:
item0 item1 item2
item0,item1,item2
item0_item1_item2
I split them like this:
string[] split_line = lines[i].Split(new char[] { ' ', ',', '_' });
I change an item (column) and then I stitch the strings back together using StringBuilder.
But now, when putting the string back together, I have to use the right delimiter.
Is it possible to know which delimiter was used when splitting the string?
UPDATE
The caller passes me the first item, so that I only change that line.
Not unless you split on one delimiter at a time and keep track of which one worked.
Otherwise, you could create a regular expression to capture the item and the delimiter, and go from there.
Instead of passing in an array of characters, you can use a Regex to split the string. The advantage of doing this is that you can capture the splitting character: Regex.Split inserts any captured groups between the elements of the array, like so:
string[] space = Regex.Split("123 456 789", @"([,_ ])");
// Results in { "123", " ", "456", " ", "789" }
string[] comma = Regex.Split("123,456,789", @"([,_ ])");
// Results in { "123", ",", "456", ",", "789" }
string[] underscore = Regex.Split("123_456_789", @"([,_ ])");
// Results in { "123", "_", "456", "_", "789" }
Then you can edit all items in the array with something like
for (int x = 0; x < space.Length; x += 2)
space[x] = space[x] + "x";
Console.WriteLine(String.Join("", space));
// Will print: 123x 456x 789x
One thing to be wary of when dealing with multiple separators is lines that contain spaces, commas and underscores at the same time, e.g.
37,hello world,238_3
This code will preserve all the distinct separators but your results might not be expected. e.g. the output of the above would be:
37x,hellox worldx,238x_3x
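Building on the Regex.Split idea, a helper that replaces a single item while keeping every original delimiter could look like this sketch. DelimitedLine and ReplaceItem are hypothetical names; itemIndex counts items only, so item n sits at index 2n of the split array.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DelimitedLine
{
    // The capture group makes Regex.Split keep each delimiter in the result,
    // so items sit at even indices and delimiters at odd indices.
    private static readonly Regex Delimiter = new Regex(@"([,_ ])");

    // Replaces the item at itemIndex (0-based, counting items only) and
    // rebuilds the line with the original delimiters.
    public static string ReplaceItem(string line, int itemIndex, string newValue)
    {
        string[] pieces = Delimiter.Split(line);
        int pieceIndex = itemIndex * 2;
        if (itemIndex < 0 || pieceIndex >= pieces.Length)
        {
            throw new ArgumentOutOfRangeException(nameof(itemIndex));
        }

        pieces[pieceIndex] = newValue;
        return string.Concat(pieces); // delimiters are already in the array
    }
}
```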
As I mentioned, the caller passes me the first item, so I tried something like this:
// find the right row
if (lines[i].ToLower().StartsWith(rowID))
{
// we have to know which delim was used to split the string since this will be
// used when stitching back the string together.
for (int delim = 0; delim < delims.Length; delim++)
{
// we split the line into an array and then use the array index as our column index
split_line = lines[i].Trim().Split(delims[delim]);
// we found the right delim
if (split_line.Length > 1)
{
delim_used = delims[delim];
break;
}
}
}
Basically, I try each delimiter on the line and check the length of the resulting array. If it is greater than 1, that delimiter worked; otherwise I skip to the next one. I am relying on this documented behavior of String.Split: "If this instance does not contain any of the characters in separator, the returned array consists of a single element that contains this instance."
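The same detection loop can be pulled out into a helper. This is a sketch with hypothetical names (DelimiterDetector, Detect); on a line containing several different delimiters, whichever comes first in the delimiters array wins.

```csharp
using System;

public static class DelimiterDetector
{
    // Returns the first delimiter that actually splits the line, or null if
    // none of them occurs. Relies on String.Split returning a single element
    // when the separator is not found.
    public static char? Detect(string line, char[] delimiters)
    {
        string trimmed = line.Trim();
        foreach (char delimiter in delimiters)
        {
            if (trimmed.Split(delimiter).Length > 1)
            {
                return delimiter;
            }
        }
        return null;
    }
}
```

Checking trimmed.IndexOf(delimiter) >= 0 would give the same result without allocating an array; the Split version is kept here to mirror the code above.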

Extract the last word from a string using C#

My string is like this:
string input = "STRIP, HR 3/16 X 1 1/2 X 1 5/8 + API";
Here actually I want to extract the last word, 'API', and return.
What would be the C# code to do the above extraction?
Well, the naive implementation would be to simply split on each space and take the last element.
Splitting is done using an instance method on the String object, and the last of the elements can either be retrieved using array indexing, or using the Last LINQ operator.
End result:
string lastWord = input.Split(' ').Last();
If you don't have LINQ, I would do it in two operations:
string[] parts = input.Split(' ');
string lastWord = parts[parts.Length - 1];
While this would work for this string, it might not work for a slightly different string, so either you'll have to figure out how to change the code accordingly, or post all the rules.
string input = ".... ,API";
Here, the comma would be part of the "word".
Also, if the first method of obtaining the word is correct, that is, everything after the last space, and your string adheres to the following rules:
Will always contain at least one space
Does not end with one or more spaces (if it does, you can trim it first)
Then you can use this code, which allocates fewer objects on the heap for the GC to worry about later:
string lastWord = input.Substring(input.LastIndexOf(' ') + 1);
However, if you need to consider commas, semicolons, and whatnot, the first method using splitting is the best; there are fewer things to keep track of.
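Combining the Substring approach with the trimming rule above gives a small helper. LastWordExtractor is a hypothetical name. If the string contains no space, LastIndexOf returns -1, so the + 1 makes Substring return the whole trimmed string instead of throwing.

```csharp
using System;

public static class LastWordExtractor
{
    // Everything after the last space, ignoring trailing spaces.
    // Allocates no array; returns the whole trimmed string when there is no space.
    public static string LastWord(string input)
    {
        string trimmed = input.TrimEnd();
        return trimmed.Substring(trimmed.LastIndexOf(' ') + 1);
    }
}
```

As noted above, punctuation stays attached: for ".... ,API" this returns ",API".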
First:
using System.Linq; // System.Core.dll
then
string last = input.Split(' ').LastOrDefault();
// or
string last = input.Trim().Split(' ').LastOrDefault();
// or
string last = input.Trim().Split(' ').LastOrDefault().Trim();
var last = input.Substring(input.LastIndexOf(' ')).TrimStart();
This method doesn't allocate an entire array of strings as the others do.
string workingInput = input.Trim();
string last = workingInput.Substring(workingInput.LastIndexOf(' ')).Trim();
Note that this throws an ArgumentOutOfRangeException if the string contains no spaces, because LastIndexOf then returns -1. I think splitting is unnecessarily intensive just for one word :)
static class Extensions
{
private static readonly char[] DefaultDelimiters = new char[] { ' ', '.' };
// Extension methods must be declared static
public static string LastWord(this string stringValue)
{
return LastWord(stringValue, DefaultDelimiters);
}
public static string LastWord(this string stringValue, char[] delimiters)
{
int index = stringValue.LastIndexOfAny(delimiters);
if (index > -1)
return stringValue.Substring(index + 1); // + 1 skips the delimiter itself
else
return null;
}
}
class Application
{
public void DoWork()
{
string sentence = "STRIP, HR 3/16 X 1 1/2 X 1 5/8 + API";
string lastWord = sentence.LastWord();
}
}
var lastWord = input.Split(new char[] {' '}, StringSplitOptions.RemoveEmptyEntries).Last();
string input = "STRIP, HR 3/16 X 1 1/2 X 1 5/8 + API";
var a = input.Split(' ');
Console.WriteLine(a[a.Length-1]);
