Split big text string into small ones - c#

I'm reading text file into one string and then spliting the whole file into array of strings with this code:
string[] split_text = Regex.Split(whole_text, #"\W+");
But when I do that every word is alone on one index and I don't want that.
I want biger string on one index let say about 10 words on one index in array and then 10 words on second index and so on.
So if I read file of 90 words I want to have size of array 9 and on every index 10 words.

You can use Batch method:
string[] split_text = Regex.Split(whole_text, #"\W+")
.Batch(10)
.Select(x => string.Concat(x))
.ToArray();

ok there is full sollution :
class Program
{
static void Main(string[] args)
{
List<string> result = new List<string>();
string text = "Now im checking first ten chars from sentence and some random chars : asdasdasdasdasdasdasdasd";
try
{
for (int i = 0; i < text.Length; i = i + 10)
{
string res = text.Substring(i,10);
result.Add(res);
Console.WriteLine(res);
}
}
catch (Exception)
{
}
}
}
I recommend use List<string> instead of array of strings.

Related

Split string into multiple smaller strings

I have a multiline textbox that contains 10 digit mobile numbers separated by comma. I need to achieve string in group of at least 100 mobile numbers.
100 mobile numbers will be separated by 99 comma in total. What i am trying to code is to split the strings containing commas less than 100
public static IEnumerable<string> SplitByLength(this string str, int maxLength)
{
for (int index = 0; index < str.Length; index += maxLength) {
yield return str.Substring(index, Math.Min(maxLength, str.Length - index));
}
}
By using above code, I can achieve 100 numbers as 100 numbers will have 10*100(for mobile number)+99(for comma) text length. But the problem here is user may enter wrong mobile number like 9 digits or even 11 digits.
Can anyone guide me on how can I achieve this.
Thank you in advance.
You could use this extension method to put them into max-100 number groups:
public static IEnumerable<string[]> SplitByLength(this string str, string[] splitBy, StringSplitOptions options, int maxLength = int.MaxValue)
{
var allTokens = str.Split(splitBy, options);
for (int index = 0; index < allTokens.Length; index += maxLength)
{
int length = Math.Min(maxLength, allTokens.Length - index);
string[] part = new string[length];
Array.Copy(allTokens, index, part, 0, length);
yield return part;
}
}
Sample:
string text = string.Join(",", Enumerable.Range(0, 1111).Select(i => "123456789"));
var phoneNumbersIn100Groups = text.SplitByLength(new[] { "," }, StringSplitOptions.None, 100);
foreach (string[] part in phoneNumbersIn100Groups)
{
Assert.IsTrue(part.Length <= 100);
Console.WriteLine(String.Join("|", part));
}
You have a few options,
Put some kind of mask on the input data to prevent the user entering invalid data. In your UI you could then flag the error and prompt the user to reenter correct information. If you go down this route then something like this string[] nums = numbers.Split(','); will be fine.
Alternatively, you could use regex.split or regex.match and match on the pattern. Something like this should work assuming your numbers are in a string with a leading comma or space
Regex regex = new Regex("(\s|,)\d{10},)";
string[] nums = regex.Split(numbers);
This be can resolved with a simple Linq code
public static IEnumerable<string> SplitByLength(this string input, int groupSize)
{
// First split the input to the comma.
// this will give us an array of all single numbers
string[] numbers = input.Split(',');
// Now loop over this array in groupSize blocks
for (int index = 0; index < numbers.Length; index+=groupSize)
{
// Skip numbers from the starting position and
// take the following groupSize numbers,
// join them in a string comma separated and return
yield return string.Join(",", numbers.Skip(index).Take(groupSize));
}
}

Split string into 2D array

I have .txt file with 10 954 lines in the format of 6,4:10
I need to load that .txt file with StreamReader and split every line at ':' into 2D array.
To look like this
[6,4 10]
[5,2 15]
[9,3 20]
So i can later on count each column and place it in a particular category.
This is the furthest i got so far
StreamReader ulaz = new StreamReader("C:\\Users\\Core\\Desktop\\podaciC.txt");
string[] polje = new string[10954];
while (!ulaz.EndOfStream)
{
for (int i = 0; i < polje.GetLength(0); i++)
{
polje[i] = ulaz.ReadLine();
}
}
ulaz.Close();
foreach (string s in polje)
{
Console.WriteLine(s);
}
Console.ReadKey();
Here is documentation for the ReadLine() method of the StreamReader: https://msdn.microsoft.com/da-dk/library/system.io.streamreader.readline(v=vs.110).aspx
Here is documentation for the Split() method of the String: https://msdn.microsoft.com/en-us/library/system.string.split(v=vs.110).aspx
You can solve this using those methods.
I would provide code, but that wouldn't be ethical since you haven't indicated that you've tried to solve this problem. I can help you solve your homework, but I won't solve it for you.
No need to fight with arrays and StreamReader. Easy to understand example:
using System.Collections.Generic;
using System.IO;
namespace ConsoleApplication2
{
class Program
{
static void Main (string[] args)
{
var filePath = "C:\\Users\\Core\\Desktop\\podaciC.txt";
var result = new List<KeyValuePair<decimal, int>> (11000); // create list with initial capacity
var lines = File.ReadLines (filePath); // allows to read file line by line (without loading entire file into memory)
foreach (var line in lines)
{
string[] splittedLine = line.Split (':'); // split line into array of two elements
decimal key = decimal.Parse (splittedLine[0]); // parse first element
int value = int.Parse (splittedLine[1]); // parse second element
var kvp = new KeyValuePair<decimal, int> (key, value); // create tuple
result.Add (kvp);
}
}
}
}
I suggest you to create custom class like LineParser or something which will handle errors for invalid lines etc.
My example file uses these strings in it one string on each line:
6,4:10
6,4:10
6,4:10
6,4:10
Use Linq to get all lines and split them based on :
Get all lines and every line will be represented by a string all lines make up one array of strings
then select these result and split the string by and : on every index of the string array (lines).
Convert the the IEnumerable of string[] to array
string[][] array = File.ReadAllLines("C:\\t\\My File2.txt")
.Select(x => x.Split(':'))
.ToArray();
Out putting values to the console:
int counter = 0;
foreach (var rows in array)
{
foreach (var number in rows)
{
Console.Write("Number row {0} = {1}\t", counter, number);
}
Console.WriteLine();
counter++;
}
How the array looks:
string[4][]
{
string[2] { [0] => 6,4 [1] => 10 }
string[2] { [0] => 6,4 [1] => 10 }
string[2] { [0] => 6,4 [1] => 10 }
string[2] { [0] => 6,4 [1] => 10 }
}
Console Output
Number row 0 = 6,4 Number row 0 = 10
Number row 1 = 6,4 Number row 1 = 10
Number row 2 = 6,4 Number row 2 = 10
Number row 3 = 6,4 Number row 3 = 10

Splitting string into an array on 16th char [duplicate]

This question already has answers here:
Splitting a string into chunks of a certain size
(39 answers)
Split string after certain character count
(4 answers)
Closed 8 years ago.
I have a text file with various 16 char strings both appended to one another and on separate lines. I've done this
FileInfo f = new FileInfo("d:\\test.txt");
string FilePath = ("d:\\test.txt");
string FileText = new System.IO.StreamReader(FilePath).ReadToEnd().Replace("\r\n", "");
CharCount = FileText.Length;
To remove all of the new lines and create one massively appended string. I need now to split this massive string into an array. I need to split it up on the consecutive 16th char until the end. Can anyone guide me in the right direction? I've taken a look at various methods in String such as Split and in StreamReader but am confused as to what the best way to go about it would be. I'm sure it's simple but I can't figure it out.
Thank you.
Adapting the answer from here:
You could try something like so:
string longstr = "thisisaverylongstringveryveryveryveryverythisisaverylongstringveryveryveryveryvery";
IEnumerable<string> splitString = Regex.Split(longstr, "(.{16})").Where(s => s != String.Empty);
foreach (string str in splitString)
{
System.Console.WriteLine(str);
}
Yields:
thisisaverylongs
tringveryveryver
yveryverythisisa
verylongstringve
ryveryveryveryve
ry
One possible solution could look like this (extracted as extension method and made dynamic, in case different token size is needed and no hard-coded dependencies):
public static class ProjectExtensions
{
public static String[] Chunkify(this String input, int chunkSize = 16)
{
// result
var output = new List<String>();
// temp helper
var chunk = String.Empty;
long counter = 0;
// tokenize to 16 chars
input.ToCharArray().ToList().ForEach(ch =>
{
counter++;
chunk += ch;
if ((counter % chunkSize) == 0)
{
output.Add(chunk);
chunk = String.Empty;
}
});
// add the rest
output.Add(chunk);
return output.ToArray();
}
}
The standard usage (16 chars) looks like this:
// 3 inputs x 16 characters and 1 x 10 characters
var input = #"1234567890ABCDEF1234567890ABCDEF1234567890ABCDEF1234567890";
foreach (var chunk in input.Chunkify())
{
Console.WriteLine(chunk);
}
The output is:
1234567890ABCDEF
1234567890ABCDEF
1234567890ABCDEF
1234567890
Usage with different token size:
foreach (var chunk in input.Chunkify(13))
{
Console.WriteLine(chunk);
}
and the corresponding output:
1234567890ABC
DEF1234567890
ABCDEF1234567
890ABCDEF1234
567890
It is not a fancy solution (and could propably be optimised for speed), but it works and it is easy to understand and implement.
Create a list to hold your tokens. Then get subsequent substrings of length 16 and add them to the list.
List<string> tokens = new List<string>();
for(int i=0; i+16<=FileText.Length; i+=16) {
tokens.Add(FileText.Substring(i,16));
}
As mentioned in the comments, this ignores the last token if it has less than 16 characters. If you want it anyway you can write:
List<string> tokens = new List<string>();
for(int i=0; i<FileText.Length; i+=16) {
int len = Math.Min(16, FileText.Length-i));
tokens.Add(FileText.Substring(i,len);
}
Please try this method. I haven't tried it , but used it once.
int CharCount = FileText.Length;
int arrayhold = (CharCount/16)+2;
int count=0;
string[] array = new string[arrayhold];
for(int i=0; i<FileText.Length; i+=16)
{
int currentleft = FileText.Length-(16*count);
if(currentleft>16)
{
array[count]=FileText.Substring(i,16);
}
if(currentleft<16)
{
array[count]=FileText.Substring(i,currentleft);
}
count++;
}
This is the new code and provide a TRUE leftovers handling. Tested in ideone
Hope it works
Try this one:
var testArray = "sdfsdfjshdfalkjsdfhalsdkfjhalsdkfjhasldkjfhasldkfjhasdflkjhasdlfkjhasdlfkjhasdlfkjhasldfkjhalsjfdkhklahjsf";
var i = 0;
var query = from s in testArray
let num = i++
group s by num / 16 into g
select new {Value = new string(g.ToArray())};
var result = query.Select(x => x.Value).ToList();
result is List containing all the 16 char strings.

SubString editing

I've tried a few different methods and none of them work correctly so I'm just looking for someone to straight out show me how to do it . I want my application to read in a file based on an OpenFileDialog.
When the file is read in I want to go through it and and run this function which uses Linq to insert the data into my DB.
objSqlCommands.sqlCommandInsertorUpdate
However I want to go through the string , counting the number of ","'s found . when the number reaches four I want to only take the characters encountered until the next "," and do this until the end of the file .. can someone show me how to do this ?
Based on the answers given here my code now looks like this
string fileText = File.ReadAllText(ofd.FileName).Replace(Environment.NewLine, ",");
int counter = 0;
int idx = 0;
List<string> foo = new List<string>();
foreach (char c in fileText.ToArray())
{
idx++;
if (c == ',')
{
counter++;
}
if (counter == 4)
{
string x = fileText.Substring(idx);
foo.Add(fileText.Substring(idx, x.IndexOf(',')));
counter = 0;
}
}
foreach (string s in foo)
{
objSqlCommands.sqlCommandInsertorUpdate("INSERT", s);//laClient[0]);
}
However I am getting an "length cannot be less than 0" error on the foo.add function call , any ideas ?
A Somewhat hacky example. You would pass this the entire text from your file as a single string.
string str = "1,2,3,4,i am some text,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20";
int counter = 0;
int idx = 0;
List<string> foo = new List<string>();
foreach (char c in str.ToArray())
{
idx++;
if (c == ',')
{
counter++;
}
if (counter == 4)
{
string x = str.Substring(idx);
foo.Add(str.Substring(idx, x.IndexOf(',')));
counter = 0;
}
}
foreach(string s in foo)
{
Console.WriteLine(s);
}
Console.Read();
Prints:
i am some text
9
13
17
As Raidri indicates in his answer, String.Split is definitely your friend. To catch every fifth word, you could try something like this (not tested):
string fileText = File.ReadAllText(OpenDialog.FileName).Replace(Environment.NewLine, ",");
string words[] = fileText.Split(',');
List<string> everFifthWord = new List<string>();
for (int i = 4; i <= words.Length - 1, i + 5)
{
everyFifthWord.Add(words[i]);
}
The above code reads the selected file from the OpenFileDialog, then replaces every newline with a ",". Then it splits the string on ",", and starting with the fifth word takes every fifth word in the string and adds it to the list.
File.ReadAllText reads a text file to a string and Split turns that string into an array seperated at the commas:
File.ReadAllText(OpenDialog.FileName).Split(',')[4]
If you have more than one line use:
File.ReadAllLines(OpenDialog.FileName).Select(l => l.Split(',')[4])
This gives an IEnumerable<string> where each string contains the wanted part from one line of the file
It's not clear to me if you're after every fifth piece of text between the commas or if there are multiple lines and you want only the fifth piece of text on each line. So I've done both.
Every fifth piece of text:
var text = "1,2,3,4,i am some text,6,7,8,9"
+ ",10,11,12,13,14,15,16,17,18,19,20";
var everyFifth =
text
.Split(',')
.Where((x, n) => n % 5 == 4);
Only the fifth piece of text on each line:
var lines = new []
{
"1,2,3,4,i am some text,6,7",
"8,9,10,11,12,13,14,15",
"16,17,18,19,20",
};
var fifthOnEachLine =
lines
.Select(x => x.Split(',')[4]);

Getting parts of a string and combine them in C#?

I have a string like this: C:\Projects\test\whatever\files\media\10\00\00\80\test.jpg
Now, what I want to do is to dynamically combine the last 4 numbers, in this case its 10000080 as result. My idea was ti split this and combine them in some way, is there an easier way? I cant rely on the array index, because the path can be longer or shorter as well.
Is there a nice way to do that?
Thanks :)
A compact way using string.Join and Regex.Split.
string text = #"C:\Projects\test\whatever\files\media\10\00\00\80\test.jpg";
string newString = string.Join(null, Regex.Split(text, #"[^\d]")); //10000080
Use String.Split
String toSplit = "C:\Projects\test\whatever\files\media\10\00\00\80\test.jpg";
String[] parts = toSplit.Split(new String[] { #"\" });
String result = String.Empty;
for (int i = 5, i > 1; i--)
{
result += parts[parts.Length - i];
}
// Gives the result 10000080
You can rely on array index if the last part always is the filename.
since the last part is always
array_name[array_name.length - 1]
the 4 parts before that can be found by
array_name[array_name.length - 2]
array_name[array_name.length - 3]
etc
If you always want to combine the last four numbers, split the string (use \ as the separator), start counting from the last part and take 4 numbers, or the 4 almost last parts.
If you want to take all the digits, just scan the string from start to finish and copy just the digits to a new string.
string input = "C:\Projects\test\whatever\files\media\10\00\00\80\test.jpg";
string[] parts = toSplit.Split(new char[] {'\\'});
IEnumerable<string> reversed = parts.Reverse();
IEnumerable<string> selected = reversed.Skip(1).Take(4).Reverse();
string result = string.Concat(selected);
The idea is to extract the parts, reverse them to keep only the last 4 (excluding the file name) and re reversing to rollback to the initial order, then concat.
Using LINQ:
string path = #"C:\Projects\test\whatever\files\media\10\00\00\80\test.jpg";
var parts = Path.GetDirectoryName(path).Split('\\');
string numbersPart = parts.Skip(parts.Count() - 4)
.Aggregate((acc, next) => acc + next);
Result: "10000080"
var r = new Regex(#"[^\d+]");
var match = r
.Split(#"C:\Projects\test\whatever\files\media\10\00\00\80\test.jpg")
.Aggregate((i, j) => i + j);
return match.ToString();
to find the number you can use regex:
(([0-9]{2})\\){4}
use concat all inner Group ([0-9]{2}) to get your searched number.
This will always find your searched number in any position in the given string.
Sample Code:
static class TestClass {
static void Main(string[] args) {
string[] tests = { #"C:\Projects\test\whatever\files\media\10\00\00\80\test.jpg",
#"C:\Projects\test\whatever\files\media\10\00\00\80\some\foldertest.jpg",
#"C:\10\00\00\80\test.jpg",
#"C:\10\00\00\80\test.jpg"};
foreach (string test in tests) {
int number = ExtractNumber(test);
Console.WriteLine(number);
}
Console.ReadLine();
}
static int ExtractNumber(string path) {
Match match = Regex.Match(path, #"(([0-9]{2})\\){4}");
if (!match.Success) {
throw new Exception("The string does not contain the defined Number");
}
//get second group that is where the number is
Group #group = match.Groups[2];
//now concat all captures
StringBuilder builder = new StringBuilder();
foreach (var capture in #group.Captures) {
builder.Append(capture);
}
//pares it as string and off we go!
return int.Parse(builder.ToString());
}
}

Categories

Resources