split text in a text file - c#

I have a text file containing sentences of various lengths. When I click button1 on my form, I want to read the file and extract the words that are enclosed between a starting and an ending ' character and that contain a # or @ symbol. I also want to know which line each word is on, and write the words to an output text file.
For example, say the text is:
abc'123'#def'456''#ghi'
abc'123'#def'#456''#ghi'123456'
Output:
1st sentence #ghi
2nd sentence #456 #ghi
PS: #def is not between a starting and an ending ' character, so it is not in the output.
I tried the Split function but couldn't get it to work and ended up with a mess. :( How can I do this? I would be grateful for any help.
Thanks.

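Here your input string is s, and str is the quoted substring that should start with # or @:
int start = s.IndexOf('\'');
int end = s.IndexOf('\'', start + 1);
if (start >= 0 && end > start)
{
    // Substring takes a start index and a length, not an end index.
    string str = s.Substring(start + 1, end - start - 1);
    if (str.Length > 0 && (str[0] == '#' || str[0] == '@'))
    {
        // proceed
    }
}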

For this particular example, here is sample code that works:
string sen1 = "abc'123'#def'456''#ghi'";
string sen2 = "abc'123'#def'#456''#ghi'123456'";
string[] NewSen = Regex.Split(sen1, "''");
string YourFirstOP = NewSen[1].TrimEnd('\''); // gets #ghi
NewSen = Regex.Split(sen2, "''");
string[] A1 = Regex.Split(NewSen[0], "'");
string[] A2 = Regex.Split(NewSen[1], "'");
string YourSecondOP = A1[A1.Length - 1] + " " + A2[A2.Length - 3]; // gets #456 #ghi
But that only covers this example.
Hope this helps.

Try this,
string testString = @"abc'123'#def'456''#ghi'abc'123'#def'#456''#ghi'123456'";
List<string> output = new List<string>();
int startIndex = 0;
int endIndex = 0;
while (startIndex >= 0 && endIndex >= 0)
{
startIndex = testString.IndexOf("'", endIndex + 1);
endIndex = testString.IndexOf("'", startIndex + 1);
if (startIndex >= 0 && endIndex >= 0)
{
string str = testString.Substring(startIndex + 1, (endIndex - startIndex) - 1);
int indexOfSpecialChar = str.IndexOf("#");
if (indexOfSpecialChar < 0)
{
indexOfSpecialChar = str.IndexOf("@");
}
if (indexOfSpecialChar >= 0)
{
output.Add(str.Substring(indexOfSpecialChar));
}
}
}

string[] Mass = s.Split('\'');
// Odd-numbered parts sit between a pair of ' characters.
for (int i = 1; i < Mass.Length - 1; i += 2)
{
    if (Mass[i].Contains("#") || Mass[i].Contains("@"))
    {
        // proceed
    }
}
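To connect the answers above with the rest of the question (line numbers and a written result), here is a minimal sketch. It reuses the Split-on-quote idea from the last answer. The class name, the method names, the markers parameter, and the "Line N:" report format are all made up for illustration, so adjust them to your needs.

```csharp
using System;
using System.Collections.Generic;

public static class QuotedTagExtractor
{
    // Returns the segments of one line that are enclosed in a pair of ' characters
    // and contain at least one of the marker characters (hypothetical parameter).
    public static List<string> ExtractTagged(string line, char[] markers)
    {
        var found = new List<string>();
        string[] parts = line.Split('\'');
        // parts[1], parts[3], ... lie between an opening and a closing quote.
        // The final part is skipped because no closing quote follows it.
        for (int i = 1; i < parts.Length - 1; i += 2)
        {
            if (parts[i].IndexOfAny(markers) >= 0)
            {
                found.Add(parts[i]);
            }
        }
        return found;
    }

    // Builds one report line per input line that has at least one hit,
    // prefixed with the 1-based line number.
    public static List<string> BuildReport(IEnumerable<string> lines, char[] markers)
    {
        var report = new List<string>();
        int lineNumber = 0;
        foreach (string line in lines)
        {
            lineNumber++;
            List<string> hits = ExtractTagged(line, markers);
            if (hits.Count > 0)
            {
                report.Add("Line " + lineNumber + ": " + string.Join(" ", hits));
            }
        }
        return report;
    }
}
```

In the button1 click handler, the input could come from File.ReadLines(inputPath) and the report could be saved with File.WriteAllLines(outputPath, report), where both paths are placeholders for your own files.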

Related

How to parse below string in C#?

Could someone please help me parse the sample string below? I'm having difficulty splitting the data, and a carriage return needs to be added at the end of every event.
Sample string (a batch of events):
L,030216,182748,00,FF,I,00,030216,182749,00,FF,I,00,030216,182750,00,FF,I,00
Expected output:
L,030216,182748,00,FF,I,00 - 1st Event
L,030216,182749,00,FF,I,00 - 2nd Event
L,030216,182750,00,FF,I,00 - 3rd Event
Seems like an easy problem. Something as easy as this should do it:
string line = "L,030216,182748,00,FF,I,00,030216,182749,00,FF,I,00,030216,182750,00,FF,I,00";
string[] array = line.Split(',');
StringBuilder sb = new StringBuilder();
for(int i=0; i<array.Length-1;i+=6)
{
sb.AppendLine(string.Format("{0},{1} - {2} event",array[0],string.Join(",",array.Skip(i+1).Take(6)), "number"));
}
output (sb.ToString()):
L,030216,182748,00,FF,I,00 - number event
L,030216,182749,00,FF,I,00 - number event
L,030216,182750,00,FF,I,00 - number event
All you have to do is work on the function that increments the ordinals (1st, 2nd, etc), but that's easy to get.
This should do the trick, provided there are no more L's inside your string and each event ends at the sixth comma counted from the start of the remaining batch.
using System;

class Program
{
    static void Main(string[] args)
    {
        String batchOfevents = "L,030216,182748,00,FF,I,00,030216,182749,00,FF,I,00,030216,182750,00,FF,I,00,030216,182751,00,FF,I,00,030216,182752,00,FF,I,00,030216,182753,00,FF,I,00";
        // Take out the leading "L," so that every event ends at the sixth comma.
        batchOfevents = batchOfevents.Substring(2);
        String output = "";
        int counter = 0;
        int index;
        while ((index = GetNthIndex(batchOfevents, ',', 6)) != -1)
        {
            counter++;
            output += "L," + batchOfevents.Substring(0, index) + " - " + counter + Suffix(counter) + " event\n";
            batchOfevents = batchOfevents.Substring(index + 1);
        }
        // Whatever is left is the last event.
        counter++;
        output += "L," + batchOfevents + " - " + counter + Suffix(counter) + " event\n";
        Console.WriteLine(output);
    }

    // Ordinal suffix: 1st, 2nd, 3rd, 4th, ..., 11th, 12th, 13th, 21st, ...
    static string Suffix(int n)
    {
        if (n % 100 >= 11 && n % 100 <= 13) return "th";
        switch (n % 10)
        {
            case 1: return "st";
            case 2: return "nd";
            case 3: return "rd";
            default: return "th";
        }
    }

    // Returns the index of the n-th occurrence of t in s, or -1 if there are fewer than n.
    public static int GetNthIndex(string s, char t, int n)
    {
        int count = 0;
        for (int i = 0; i < s.Length; i++)
        {
            if (s[i] == t)
            {
                count++;
                if (count == n)
                {
                    return i;
                }
            }
        }
        return -1;
    }
}
Now the output will be in the format you asked for, and the original string has been decomposed.
NOTE: the GetNthIndex method was taken from this old post.
If you want to split the string into multiple strings, you need a set of rules that can be implemented. In your case I would start by splitting the complete string at the commas and then go through the elements in a loop. Each element in the loop is appended to a StringBuilder. If your rule set says you need a new line, just add it via yourBuilder.Append("\r\n") or use AppendLine.
EDIT
Using this method, you can also easily add extra text, such as the leading L or a trailing label like "3rd Event".
1. Look for the start index of 00,FF,I,00 in the entire string.
2. Extract a substring that starts at 0 and ends at that index plus 10, which is the length of the string searched for in step 1.
3. Loop again, each time starting from the index where you left off in step 2.
4. Add a newline character each time.
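The steps above can be sketched roughly as follows. This assumes every event really ends with the same marker (00,FF,I,00 in the sample) and that events are separated by a single comma. MarkerSplitter, SplitByEndMarker, and the prefix parameter are illustrative names, not part of any library.

```csharp
using System;
using System.Collections.Generic;

public static class MarkerSplitter
{
    // Cuts the batch after each occurrence of endMarker and makes sure
    // every resulting event starts with the given prefix (for example "L,").
    public static List<string> SplitByEndMarker(string batch, string endMarker, string prefix)
    {
        var events = new List<string>();
        int start = 0;
        while (start < batch.Length)
        {
            // Step 1: find the marker, searching from where the last event ended.
            int markerIndex = batch.IndexOf(endMarker, start, StringComparison.Ordinal);
            if (markerIndex < 0)
            {
                break; // no complete event left
            }
            // Step 2: the event runs up to the end of the marker.
            int end = markerIndex + endMarker.Length;
            string evt = batch.Substring(start, end - start);
            if (!evt.StartsWith(prefix, StringComparison.Ordinal))
            {
                evt = prefix + evt;
            }
            events.Add(evt);
            // Step 3: continue after the comma that separates two events.
            start = end + 1;
        }
        return events;
    }
}
```

Step 4 is then a matter of string.Join(Environment.NewLine, events) or appending each event with StringBuilder.AppendLine.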
Try the following:
string stream = "L,030216,182748,00,FF,I,00, 030216,182749,00,FF,I,00, 030216,182750,00,FF,I,00";
string[] lines = SplitLines(stream, "L", "I", ",");
Here the SplitLines function detects variable-length events within an arbitrarily formatted stream:
string stream = "A;030216;182748 ;00;FF;AA;01; 030216;182749;AA;02";
string[] lines = SplitLines(stream, "A", "AA", ";");
The split rules are:
- all elements of the input stream are separated by a separator (, for example)
- each event is bounded by special markers (L and I, for example)
- the end marker is the second-to-last element of each event
static string[] SplitLines(string stream, string startSeq, string endLine, string separator) {
    string[] elements = stream.Split(new string[] { separator }, StringSplitOptions.RemoveEmptyEntries);
    int pos = 0;
    List<string> line = new List<string>();
    List<string> lines = new List<string>();
    State state = State.SeqStart;
    while(pos < elements.Length) {
        string current = elements[pos].Trim();
        switch(state) {
            case State.SeqStart:
                // Skip elements until the start marker is found. Without the else
                // branch, a stream that does not begin with the marker loops forever.
                if(current == startSeq)
                    state = State.LineStart;
                else
                    pos++;
                continue;
            case State.LineStart:
                if(++pos < elements.Length) {
                    line.Add(startSeq);
                    state = State.Line;
                }
                continue;
            case State.Line:
                if(current == endLine)
                    state = State.LineEnd;
                else
                    line.Add(current);
                pos++;
                continue;
            case State.LineEnd:
                // current is the element right after the end marker; it closes the event.
                line.Add(endLine);
                line.Add(current);
                lines.Add(string.Join(separator, line));
                line.Clear();
                state = State.LineStart;
                continue;
        }
    }
    return lines.ToArray();
}
enum State { SeqStart, LineStart, Line, LineEnd };

Unable to collect substring from a string

I am extracting a substring from a string that comes from a Word file. But I get an index-out-of-range error even though the starting and ending indexes of the substring are less than the length of the string.
for(int i=0;i<y.Length-1;i++)
{
if (Regex.IsMatch(y[i], @"^[A]"))
{
NumberOfWords= y[i].Split(' ').Length;
if (NumberOfWords > 5)
{
int le = y[i].Length;
int indA = y[i].IndexOf("A");
int indB = y[i].IndexOf("B");
int indC = y[i].IndexOf("C");
int indD = y[i].IndexOf("D");
//if (indD > 1 && indC > 1)
// breakop2 = breakop2 + '\n' + '\n' + y[i].Substring(indC, indD);
if (indC > 1 && indB > 1)
breakop1 = breakop1 + '\n' + y[i].Substring(indB, indC);
if (indB > 1)
sr = y[i].Substring(indA, indB);
else
sr = y[i];
breakop = breakop +'\n'+'\n'+ sr;
Acount++;
//textBox1.Text = s[i];
check1 = check1 + '\n' + '\n' + y[i];
//i++;
}
}
}
String.Substring(int, int) doesn't take a start index and an end index (as it does in Java); it takes a start index and a length. So perhaps you want:
sr = y[i].Substring(indA, indB - indA);
But you should also check that indB is greater than indA. (You need to work out how you want this to behave if B comes before A, basically.)
You'd also need to apply the same behaviour for the Substring(indB, indC).
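As a small illustration of that point, the helper below converts a start index and an end index into the start/length pair that Substring expects, and returns an empty string when the indexes are out of order. SubstringHelper and Between are hypothetical names, and returning an empty string for invalid input is only one possible choice.

```csharp
using System;

public static class SubstringHelper
{
    // Returns the text from startIndex (inclusive) to endIndex (exclusive).
    // Substring itself takes (startIndex, length), so the length is endIndex - startIndex.
    public static string Between(string text, int startIndex, int endIndex)
    {
        if (startIndex < 0 || endIndex <= startIndex || endIndex > text.Length)
        {
            return string.Empty; // e.g. a marker is missing or B comes before A
        }
        return text.Substring(startIndex, endIndex - startIndex);
    }
}
```

With this, the calls in the question would become Between(y[i], indA, indB) and Between(y[i], indB, indC).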
The String.Substring method takes a starting index and a length. You are passing in two indices.

Finding a string expression in a sentence

I have a three-word expression, "Shut The Door", and I want to find it in a sentence. Since the words are separated by spaces, what would be the best solution?
If you have the string:
string sample = "If you know what's good for you, you'll shut the door!";
And you want to find where it is in a sentence, you can use the IndexOf method.
int index = sample.IndexOf("shut the door");
// index will be 40
A result other than -1 means the string has been located; -1 means it does not exist in the string. Please note that the search for "shut the door" is case-sensitive.
Use the built-in Regex.Match method for matching strings.
string text = "One car red car blue car";
string pat = @"(\w+)\s+(car)";
// Compile the regular expression.
Regex r = new Regex(pat, RegexOptions.IgnoreCase);
// Match the regular expression pattern against a text string.
Match m = r.Match(text);
int matchCount = 0;
while (m.Success)
{
Console.WriteLine("Match"+ (++matchCount));
for (int i = 1; i <= 2; i++)
{
Group g = m.Groups[i];
Console.WriteLine("Group"+i+"='" + g + "'");
CaptureCollection cc = g.Captures;
for (int j = 0; j < cc.Count; j++)
{
Capture c = cc[j];
System.Console.WriteLine("Capture"+j+"='" + c + "', Position="+c.Index);
}
}
m = m.NextMatch();
}
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.match(v=vs.71).aspx
http://support.microsoft.com/kb/308252
if (string1.IndexOf(string2) >= 0)
...
The spaces are nothing special; they are just characters, so you can find a string like this the same way you would find any other string in your sentence, for example using IndexOf if you need the position, or just Contains if you only need to know whether it exists.
E.g.
string sentence = "foo bar baz";
string phrase = "bar baz";
Console.WriteLine(sentence.Contains(phrase)); // True
Here is some C# code that finds substrings between a start string and an end string. You can use it as a base and modify it (for example, remove the need for an end string) to just find your string.
There are two versions: one finds only the first instance of a substring; the other returns a dictionary of every substring found, keyed by its starting position.
public Dictionary<int, string> GetSubstringDic(string start, string end, string source, bool includeStartEnd, bool caseInsensitive)
{
int startIndex = -1;
int endIndex = -1;
int length = -1;
int sourceLength = source.Length;
Dictionary<int, string> result = new Dictionary<int, string>();
try
{
//if just want to find string, case insensitive
if (caseInsensitive)
{
source = source.ToLower();
start = start.ToLower();
end = end.ToLower();
}
//does start string exist
startIndex = source.IndexOf(start);
if (startIndex != -1)
{
//start to check for each instance of matches for the length of the source string
while (startIndex < sourceLength && startIndex > -1)
{
//does end string exist?
endIndex = source.IndexOf(end, startIndex + 1);
if (endIndex != -1)
{
//if we want to get length of string including the start and end strings
if (includeStartEnd)
{
//make sure to include the end string
length = (endIndex + end.Length) - startIndex;
}
else
{
//change start index to not include the start string
startIndex = startIndex + start.Length;
length = endIndex - startIndex;
}
//add to dictionary
result.Add(startIndex, source.Substring(startIndex, length));
//move start position up
startIndex = source.IndexOf(start, endIndex + 1);
}
else
{
//no end so break out of while;
break;
}
}
}
}
catch (Exception ex)
{
//Notify of Error
result = new Dictionary<int, string>();
StringBuilder g_Error = new StringBuilder();
g_Error.AppendLine("GetSubstringDic: " + ex.Message.ToString());
g_Error.AppendLine(ex.StackTrace.ToString());
}
return result;
}
public string GetSubstring(string start, string end, string source, bool includeStartEnd, bool caseInsensitive)
{
int startIndex = -1;
int endIndex = -1;
int length = -1;
int sourceLength = source.Length;
string result = string.Empty;
try
{
if (caseInsensitive)
{
source = source.ToLower();
start = start.ToLower();
end = end.ToLower();
}
startIndex = source.IndexOf(start);
if (startIndex != -1)
{
endIndex = source.IndexOf(end, startIndex + 1);
if (endIndex != -1)
{
if (includeStartEnd)
{
length = (endIndex + end.Length) - startIndex;
}
else
{
startIndex = startIndex + start.Length;
length = endIndex - startIndex;
}
result = source.Substring(startIndex, length);
}
}
}
catch (Exception ex)
{
//Notify of Error
result = string.Empty;
StringBuilder g_Error = new StringBuilder();
g_Error.AppendLine("GetSubstring: " + ex.Message.ToString());
g_Error.AppendLine(ex.StackTrace.ToString());
}
return result;
}
You may want to make sure the check ignores the case of both phrases.
string theSentence = "I really want you to shut the door.";
string thePhrase = "Shut The Door";
bool phraseIsPresent = theSentence.ToUpper().Contains(thePhrase.ToUpper());
int phraseStartsAt = theSentence.IndexOf(
thePhrase,
StringComparison.InvariantCultureIgnoreCase);
Console.WriteLine("Is the phrase present? " + phraseIsPresent);
Console.WriteLine("The phrase starts at character: " + phraseStartsAt);
This outputs:
Is the phrase present? True
The phrase starts at character: 21

Find string in Haystack and display that particular paragraph where the string was found

I have an SQL result set retrieved by searching the database with the LIKE keyword. I want to display the result on a page without showing the whole text, just the paragraph where the match was found, and maybe even put the matched word in bold. Does anyone have an idea of how best to implement this?
Get the text into a string.
Split on your paragraph character (line break?) - text.Split('\n')
Iterate over each paragraph
Get the index(es) of your keyword - text.IndexOf("keyword")
Then perform some logic to cut a number of characters at the start and end
Insert a bold tag with, for example, a string replace - text = text.Replace("keyword", "<b>keyword</b>")
[Edit - added code sample]
public List<string> HighLightedParagraphs(string word, string text)
{
    int charBeforeAndAfter = 100;
    List<string> matchParagraphs = new List<string>();
    // Escape the word so that regex metacharacters in it are matched literally.
    Regex wordMatch = new Regex(@"\b" + Regex.Escape(word) + @"\b", RegexOptions.IgnoreCase);
    foreach (string paragraph in text.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries))
    {
        int startIdx = -1;
        int length = -1;
        foreach (Match match in wordMatch.Matches(paragraph))
        {
            int wordIdx = match.Index;
            // Skip matches that already fall inside the previous extract.
            if (wordIdx >= startIdx && wordIdx <= startIdx + length) continue;
            startIdx = wordIdx > charBeforeAndAfter ? wordIdx - charBeforeAndAfter : 0;
            // Substring takes a length, so measure from startIdx to the end of the trailing context.
            int endIdx = Math.Min(wordIdx + match.Length + charBeforeAndAfter, paragraph.Length);
            length = endIdx - startIdx;
            // $0 keeps the original casing of each highlighted match.
            string extract = wordMatch.Replace(paragraph.Substring(startIdx, length), "<b>$0</b>");
            matchParagraphs.Add("..." + extract + "...");
        }
    }
    return matchParagraphs;
}

how to find a string pattern and print it from my text file using C#

I have a text file containing the string "abcdef".
I want to search for the string "abc" in my text file and print the two characters that follow it; here that is "de".
How can I accomplish this?
Which class and function should I use?
Try this:
string s = "abcde";
int index = s.IndexOf("abc");
if (index > -1 && index < s.Length - 4)
Console.WriteLine(s.Substring(index + 3, 2));
Update: tanascius noted a bug. I fixed it.
Read your file line by line and use something like:
string line = "";
if (line.Contains("abc")) {
    // do
}
Or you could use regular expressions.
Match match = Regex.Match(line, "REGEXPRESSION_HERE");
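For the regular-expression route, one possible pattern is the search text followed by a capture group for the characters after it. The sketch below is only an illustration: PatternFinder and FollowingChars are invented names, and "." does not match a line break, so a match at the very end of a line is skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PatternFinder
{
    // Returns the `count` characters that follow each occurrence of `pattern`.
    // Occurrences with fewer than `count` characters after them are ignored.
    public static List<string> FollowingChars(string text, string pattern, int count)
    {
        // Regex.Escape makes the search text literal; group 1 captures what follows it.
        var regex = new Regex(Regex.Escape(pattern) + "(.{" + count + "})");
        var found = new List<string>();
        foreach (Match m in regex.Matches(text))
        {
            found.Add(m.Groups[1].Value);
        }
        return found;
    }
}
```

Calling FollowingChars(line, "abc", 2) for each line read from the file gives the two characters after every "abc" on that line.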
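In order to print all instances you can use the following code:
int index = 0;
while ((index = s.IndexOf("abc", index)) != -1)
{
    // Print only when two characters actually follow the match.
    if (index + 5 <= s.Length)
        Console.WriteLine(s.Substring(index + 3, 2));
    index += 3; // move past this match, otherwise the loop never ends
}
Matches with fewer than two characters after them are skipped.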
I think this is a clearer example:
// Find the full path of our document
System.IO.FileInfo ExecutableFileInfo = new System.IO.FileInfo(System.Reflection.Assembly.GetEntryAssembly().Location);
string path = System.IO.Path.Combine(ExecutableFileInfo.DirectoryName, "MyTextFile.txt");
// Read the content of the file
string content = String.Empty;
using (System.IO.StreamReader reader = new System.IO.StreamReader(path))
{
content = reader.ReadToEnd();
}
// Find the pattern "abc"
int index = -1; //First char index in the file is 0
index = content.IndexOf("abc");
// Outputs the next two characters
// [!] We need to validate if we are at the end of the text
if ((index >= 0) && (index < content.Length - 4))
{
Console.WriteLine(content.Substring(index + 3, 2));
}
Note that this only works for the first occurrence. I don't know whether you want to show all the occurrences.
