c# Split sentence


Is it possible to split these combined words into two?
For example, "Firstname" into
"First"
"Name"
I have a bunch of properties, e.g. FirstName, LastName, etc., and I need to display them on my page. That's why I need to split each property name, so it can be displayed in a more readable way.

Your aim is fuzzy.
If property names always start each word with an uppercase letter, you can find the positions of all uppercase letters in the word and divide it at those positions.
If uppercase letters are not guaranteed, the best way would be to create a transform table. The table would define pairs of the original property name and the resulting text. This way you will have a simple map for the transformation.
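A minimal sketch of such a transform table, assuming the display texts are maintained by hand in a Dictionary. The class name PropertyLabels, the method ToLabel and the sample entries are hypothetical; names without an entry fall back to the raw property name.

```csharp
using System;
using System.Collections.Generic;

public static class PropertyLabels
{
    // Hypothetical hand-written map: property name -> display text.
    // StringComparer.Ordinal keeps lookups case-sensitive, like C# identifiers.
    private static readonly Dictionary<string, string> Labels =
        new Dictionary<string, string>(StringComparer.Ordinal)
        {
            { "Firstname", "First Name" },
            { "LastName", "Last Name" },
            { "UIFilter", "UI Filter" },
        };

    // Returns the mapped label, or the property name itself when no entry exists.
    public static string ToLabel(string propertyName)
    {
        string label;
        return Labels.TryGetValue(propertyName, out label) ? label : propertyName;
    }
}
```

The table works regardless of how the names are cased, at the cost of having to be updated whenever a property is added.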

Edit: the OP specified that he needs to split property names.
If you follow the CamelCase naming convention for properties (strictly, PascalCase: "FirstName" instead of "Firstname"), you can split the words at the uppercase characters quite easily.
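using System;
using System.Text;

string[] SplitByCaps(string input)
{
    StringBuilder output = new StringBuilder();
    for (int i = 0; i < input.Length; i++)
    {
        char c = input[i];
        // Start a new word before every uppercase letter except the first character.
        if (i > 0 && Char.IsUpper(c))
            output.Append(' ');
        output.Append(c);
    }
    // Note: spaces already present in the input also become split points.
    return output.ToString().Split(' ');
}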
Original answer:
I would say, for practical purposes, it's not possible to do this for any arbitrary string.
Of course it is possible to write a program to do this, but whatever your actual needs are, that program would be overkill. There might also be libraries that already do this, but they would be so heavy that you wouldn't want to take a dependency on them.
Any program that could achieve this would have to know all the words in the English language (let's not even consider multilingual solutions). You would also need an intelligent lexical parser, because a word might be splittable in more than one way.
I suggest you look into some other way to solve your particular problem.
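To illustrate why a dictionary alone is not enough, here is a toy dictionary-based segmentation (the classic "word break" dynamic program). The class WordBreak, the method Segment and the tiny word lists used with it are hypothetical; a real solution would need a full dictionary plus some way to choose between competing splits, which is exactly the lexical-parser problem described above.

```csharp
using System;
using System.Collections.Generic;

public static class WordBreak
{
    // Returns one way to split `input` into words from `words`, or null if none exists.
    // Case sensitivity depends on the comparer the set was created with.
    public static List<string> Segment(string input, ISet<string> words)
    {
        int n = input.Length;
        // prev[i] = start index of the last word in a valid split of input[0..i), or -1 if none.
        int[] prev = new int[n + 1];
        for (int i = 1; i <= n; i++) prev[i] = -1;
        prev[0] = 0;

        for (int end = 1; end <= n; end++)
        {
            for (int start = 0; start < end; start++)
            {
                bool prefixOk = start == 0 || prev[start] >= 0;
                if (prefixOk && words.Contains(input.Substring(start, end - start)))
                {
                    prev[end] = start;
                    break; // keep the first split found (smallest start index)
                }
            }
        }
        if (n == 0 || prev[n] < 0) return null;

        // Walk the back-pointers from the end to rebuild the pieces in order.
        var parts = new List<string>();
        for (int end = n; end > 0; end = prev[end])
            parts.Insert(0, input.Substring(prev[end], end - prev[end]));
        return parts;
    }
}
```

With only "first" and "name" in a case-insensitive set, "Firstname" becomes "First" + "name". Add "firs" and "tname" and the same input comes back as "Firs" + "tname": both splits are valid, and nothing in the algorithm knows which one a human means.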

Unless you have a dictionary of all 'single' words, the only solution I can think of is to split on uppercase letters:
FirstName -> First Name
The problem remains for acronyms: UIFilter should become UI Filter, but splitting before every capital gives U I Filter.
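One way to handle acronyms such as UIFilter is to split with a regular expression that only breaks before an uppercase letter when it follows a lowercase letter, or when it starts a capitalized word after a run of capitals. This is a sketch; the pattern and the class name PascalCaseSplitter are assumptions, not taken from the answers, and digits or underscores are not treated specially.

```csharp
using System.Text.RegularExpressions;

public static class PascalCaseSplitter
{
    // Zero-width split points:
    //   (?<=[a-z])(?=[A-Z])        lowercase followed by uppercase: "First|Name"
    //   (?<=[A-Z])(?=[A-Z][a-z])   end of an acronym before a capitalized word: "UI|Filter"
    private static readonly Regex Boundary =
        new Regex(@"(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])");

    public static string[] SplitWords(string name)
    {
        return Boundary.Split(name);
    }

    // Convenience for display: "UIFilter" -> "UI Filter".
    public static string ToDisplayText(string name)
    {
        return string.Join(" ", SplitWords(name));
    }
}
```

"FirstName" gives "First Name", "UIFilter" gives "UI Filter", and a name without an inner capital such as "Firstname" is returned unchanged, so the dictionary problem from the original answer still applies to it.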

You can use Substring to get the first 5 characters of the string, and then take the rest of the string (i.e. the original with the first five characters removed) as the second part.
string str = "Firstname";
string firstPart = str.Substring(0, 5); // "First"
string secondPart = str.Substring(5);   // "name"
If you want to make this generic for any word, you need some definite criteria on which to divide the word into parts. Without definite criteria, it is not possible to split the string the way you expect.

Related

break multiple string into each

I'm just curious: is there a way to break up multiple strings in a GridView cell and store them or display them one by one?
Earlier, when I used MessageBox.Show, it displayed all the names or numbers at once, like
abdullah ali ashonie; adefitri; candry. What I want is to display them one by one: abdullah ali ashonie, then adefitri, then candry. And how do I store them?
Sorry for my bad English; I'm not quite sure you understand what I want.

The simple way is String.Split():
var parts = GridView1.Rows[0].Cells[0].Text.Split(";".ToCharArray());
Just be warned: String.Split() has all kinds of pitfalls and gotchas. If you can't put meaningful constraints on the possible values (be absolutely certain you won't find things like newlines or other semicolon (;) characters inside individual names, quoted text, etc.), you should really look into a dedicated delimited-text parser. There are at least three built into the .NET Framework (TextFieldParser is one option), and plenty more on NuGet.
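If the cell values really are that simple, a small helper can split, trim, and store the names in a List<string> so they can be shown one at a time. The class CellNames and method SplitNames are hypothetical; the sketch does not handle quoting or semicolons inside a name, which is the case the warning above is about.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CellNames
{
    // Splits a cell value such as "abdullah ali ashonie; adefitri; candry"
    // into individual names, trimming the space after each ';' and
    // dropping empty entries (e.g. from a trailing ';' or "; ;").
    public static List<string> SplitNames(string cellText)
    {
        if (string.IsNullOrEmpty(cellText)) return new List<string>();
        return cellText
            .Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(name => name.Trim())
            .Where(name => name.Length > 0)
            .ToList();
    }
}
```

Each entry of the returned list can then be passed to MessageBox.Show inside a foreach loop, and the list itself is the stored form.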

Look at String.Split
Returns a string array that contains the substrings in this instance that are delimited by elements of a specified string or Unicode character array.
For example:
string text = "abdullah ali ashonie; adefitri; candry";
string[] names = text.Split(';');
foreach (string name in names)
{
System.Console.WriteLine(name.Trim()); // Trim removes the space that follows each ';'
}
Outputs:
abdullah ali ashonie
adefitri
candry
There is some more information here too

I'm not 100% sure I completely understand what you're trying to do, but this is a basic string split example:
string input = "abdullah ali ashonie; adefitri; candry";
string[] pieces = input.Split(';');
foreach (var s in pieces) {
Console.WriteLine(s.Trim());
}
Fiddle here.

Reading in a text file more 'intelligently'

I have a text file containing an alphabetically ordered list of variables, each with its variable number next to it, formatted something like this:
aabcdef 208
abcdefghijk 1191
bcdefga 7
cdefgab 12
defgab 100
efgabcd 999
fgabc 86
gabcdef 9
h 11
ijk 80
...
...
I would like to read each entry as a string and keep its designated ID number, e.g. read "aabcdef" and store it in an array at index 208.
The two issues I'm running into are:
1. I've never read from a file in C#. Is there a way to read, say, from the start of a line up to the whitespace as a string, and then the next part as an int until the end of the line?
2. Given the nature and size of these files, I do not know the highest ID value in each file (not all numbers are used, so some files could contain a number like 3000 but actually list only 200 variables). So how could I store these variables flexibly when I don't know how big the array/list/stack/etc. would need to be?

Basically, you need a Dictionary instead of an array or list. You can read all lines with the File.ReadLines method, then split each of them on spaces and tabs (\t), like this:
var values = File.ReadLines("path")
.Select(line => line.Split(new [] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries))
.ToDictionary(parts => int.Parse(parts[1]), parts => parts[0]);
Then values[208] will give you aabcdef. It looks like an array, doesn't it? :)
Also make sure there are no duplicate numbers: Dictionary keys must be unique, otherwise ToDictionary throws an exception.
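If the file might contain blank lines, stray text, or repeated IDs, a loop with int.TryParse avoids the exceptions that int.Parse and ToDictionary would throw in those cases. This is a sketch: the class IdFileParser and its methods are hypothetical, and keeping the first entry for a duplicate ID is an assumption you may want to replace with logging or throwing.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

public static class IdFileParser
{
    // Parses lines like "aabcdef 208" into a map from ID to name.
    // Blank or malformed lines are skipped; if an ID repeats, the first
    // occurrence is kept instead of throwing (an assumption; adjust as needed).
    public static Dictionary<int, string> Parse(IEnumerable<string> lines)
    {
        var result = new Dictionary<int, string>();
        char[] separators = { ' ', '\t' };
        foreach (string line in lines)
        {
            string[] parts = line.Split(separators, StringSplitOptions.RemoveEmptyEntries);
            int id;
            if (parts.Length != 2 ||
                !int.TryParse(parts[1], NumberStyles.Integer, CultureInfo.InvariantCulture, out id))
                continue; // not in "name number" form
            if (!result.ContainsKey(id))
                result.Add(id, parts[0]);
        }
        return result;
    }

    // Convenience overload for reading straight from disk.
    public static Dictionary<int, string> ParseFile(string path)
    {
        return Parse(File.ReadLines(path));
    }
}
```

Because a Dictionary grows as entries are added, there is no need to know the highest ID in advance; use TryGetValue when an ID may be missing.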

I've been thinking about how to improve the other answers, and I've found this alternative solution based on Regex, which makes searching the whole string (whether it comes from a file or not) safer.
Note that you can alter the regular expression to include other separators. The sample expression detects spaces and tabs.
At the end of the day, I found that MatchCollection returns a safer result, since you always know that group 2 is an integer and group 1 is text, because the regular expression does a lot of checking for you!
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

StringBuilder builder = new StringBuilder();
builder.AppendLine("djdodjodo\t\t3893983");
builder.AppendLine("dddfddffd\t\t233");
builder.AppendLine("djdodjodo\t\t39838");
builder.AppendLine("djdodjodo\t\t12");
builder.AppendLine("djdodjodo\t\t444");
builder.AppendLine("djdodjodo\t\t5683");
builder.Append("djdodjodo\t\t33");
// Replace this line with a call to File.ReadAllText to read a file!
string text = builder.ToString();
MatchCollection matches = Regex.Matches(text, @"([^\s^\t]+)(?:[\s\t])+([0-9]+)", RegexOptions.IgnoreCase | RegexOptions.Multiline);
// Here's the magic: we cast the matches to IEnumerable<Match> and convert them into a dictionary!
// Because the regex only matched digits, int.Parse won't fail on the format
// (although a very long run of digits could still overflow an int).
IDictionary<int, string> lines = matches.Cast<Match>()
    .ToDictionary(match => int.Parse(match.Groups[2].Value), match => match.Groups[1].Value);
// Now you can access your lines by key (the number):
string value = lines[33]; // "djdodjodo"
Update:
As we discussed in chat, this solution wasn't working for an actual use case you showed me, but it isn't the approach that fails, it's your particular input: the keys look like "[something].[something]" (for example: address.Name).
I've changed the regular expression to ([\w\.]+)[\s\t]+([0-9]+) so it covers keys containing a dot.
It's all about refining the regular expression to fit your requirements! ;)
Update 2:
Since you told me that keys can contain any character, I've changed the regular expression to ([^\s^\t]+)(?:[\s\t])+([0-9]+).
Now the key is anything except whitespace (strictly, the ^ inside the character class also excludes a literal ^ character, and \t is redundant because \s already includes tabs).
Update 3:
I also see you're stuck on .NET 3.0, and ToDictionary was introduced in .NET 3.5. To get the same result in .NET 3.0, replace ToDictionary(...) with:
Dictionary<int, string> lines = new Dictionary<int, string>();
foreach(Match match in matches)
{
lines.Add(int.Parse(match.Groups[2].Value), match.Groups[1].Value);
}
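Because \s also matches line breaks, the Update 2 expression can in principle pair a key with a number on the following line. A stricter variant, sketched here under the assumption that each entry sits on its own line, anchors every match to a single line with RegexOptions.Multiline and allows only spaces or tabs between key and number. The class KeyNumberRegex and its Parse method are hypothetical.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class KeyNumberRegex
{
    // ^ and $ are per line because of RegexOptions.Multiline.
    // (\S+)      the key: one or more non-whitespace characters (dots allowed)
    // [ \t]+     spaces or tabs only, so a match never crosses a line break
    // ([0-9]+)   the number: ASCII digits
    // [ \t]*\r?  optional trailing blanks and the '\r' of a Windows line ending
    private static readonly Regex Entry =
        new Regex(@"^(\S+)[ \t]+([0-9]+)[ \t]*\r?$", RegexOptions.Multiline);

    public static Dictionary<int, string> Parse(string text)
    {
        var result = new Dictionary<int, string>();
        foreach (Match match in Entry.Matches(text))
        {
            int id;
            // TryParse guards against digit runs too long for an int.
            if (int.TryParse(match.Groups[2].Value, NumberStyles.None,
                             CultureInfo.InvariantCulture, out id))
            {
                result[id] = match.Groups[1].Value; // a repeated ID overwrites the earlier key
            }
        }
        return result;
    }
}
```

Lines that do not have the "key number" shape, such as a key containing a space, are simply skipped instead of being merged with a neighbouring line.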

validate excel worksheet name

I'm getting the error below when setting the worksheet name dynamically. Does anyone have a regexp to validate the name before setting it?
The name that you type does not exceed 31 characters. The name does
not contain any of the following characters: : \ / ? * [ or ]
You did not leave the name blank.

You can use this method to check whether the sheet name is valid:
private bool IsSheetNameValid(string sheetName)
{
if (string.IsNullOrEmpty(sheetName))
{
return false;
}
if (sheetName.Length > 31)
{
return false;
}
char[] invalidChars = new char[] {':', '\\', '/', '?', '*', '[', ']'};
if (invalidChars.Any(sheetName.Contains))
{
return false;
}
return true;
}

To validate the worksheet name against those invalid characters using Regex, you can use something like this:
string wsName = @"worksheetName"; // verbatim string, so backslashes are taken literally
// Check null/empty first: Regex.Match throws on a null input.
bool nameIsValid = !string.IsNullOrEmpty(wsName)
    && wsName.Length <= 31
    && !Regex.Match(wsName, @"[:\\/\?\*\[\]]").Success; // none of : \ / ? * [ ]
This also checks whether the worksheet name is null or empty, or longer than 31 characters. Those two checks aren't done via Regex, for simplicity and to avoid over-engineering the problem.

Let's match the start of the string, then between 1 and 31 characters that aren't on the forbidden list, then the end of the string. Requiring at least one means we refuse empty strings:
^[^:\/\\\?\*\[\]]{1,31}$
There's at least one nuance this regex will miss: it will accept a name consisting only of spaces, tabs and newlines, which will be a problem if that counts as blank (as it probably does).
If you take the length check out of the regex, you can get the blankness check with something like:
^[^:\/\\\?\*\[\]]*[^\s:\/\\\?\*\[\]][^:\/\\\?\*\[\]]*$
How does that work? If we call the forbidden-character class above WORKSHEET, it reads:
^[^WORKSHEET]*[^\sWORKSHEET][^WORKSHEET]*$
So we match zero or more non-forbidden characters, then a character that is neither forbidden nor whitespace, then zero or more non-forbidden characters. The key is that we demand at least one non-whitespace character in the middle section.
But we've lost the length check. It's hard to do both the length check and the blankness check in one expression. In order to count, we have to phrase things in terms of matching n times, and the things being matched have to be known to be of length 1. But in order to allow whitespace to be placed freely (as long as it's not all whitespace), we need a part of the match that is not necessarily of length 1.
Well, that's not quite true. At this point it starts to become a really bad idea, but nevertheless: onwards, into the breach! (For educational purposes only.)
Instead of using * for the possibly-blank sections, we can specify how many characters we expect in each, and list all the ways the three sections can add up to 31. The middle section is always exactly 1, so how many ways are there for the other two to add up to 30? There are 31 of them: 0+30, 1+29, 2+28, ... 30+0:
^[^WORKSHEET]{0}[^\sWORKSHEET][^WORKSHEET]{30}$
|^[^WORKSHEET]{1}[^\sWORKSHEET][^WORKSHEET]{29}$
|^[^WORKSHEET]{2}[^\sWORKSHEET][^WORKSHEET]{28}$
....
|^[^WORKSHEET]{30}[^\sWORKSHEET][^WORKSHEET]{0}$
Obviously, if this were a good idea, you'd write a program to generate that expression rather than writing it all out by hand (and getting something wrong). It also only covers names of exactly 31 characters; allowing 1 to 31 would need the same alternation for every total length. But I don't think I need to tell you it's not a good idea. It is, however, the only answer I have to your question.
While admittedly not actually answering your question, I think @HatSoft has the right approach, encoding the conditions directly and clearly. After all, I'm now satisfied that an answer to your question as asked is not actually a helpful thing.
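As an alternative the answer above does not cover, .NET regular expressions support lookahead assertions, which let a single pattern enforce both the length limit and the not-all-blank rule: the lookahead checks for a non-whitespace character without consuming input, so the counted part can stay at 1 to 31 single characters. This is a sketch; the class name SheetNameRegex is hypothetical, and it only encodes the three rules from the error message.

```csharp
using System.Text.RegularExpressions;

public static class SheetNameRegex
{
    // \A and \z anchor to the whole string ($ would also match before a final '\n').
    // (?=\s*\S)             lookahead: at least one non-whitespace character somewhere
    // [^:\\/?*\[\]]{1,31}   1 to 31 characters, none of : \ / ? * [ ]
    private static readonly Regex Valid =
        new Regex(@"\A(?=\s*\S)[^:\\/?*\[\]]{1,31}\z");

    public static bool IsValid(string name)
    {
        return name != null && Valid.IsMatch(name);
    }
}
```

Whether this is clearer than the plain method with separate checks is a matter of taste; the lookahead mainly shows that the combined check is possible without enumerating every length.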

You might also want to check for the name History, as it is a reserved sheet name in Excel.

Something like that?
public string validate(string name)
{
foreach (char c in Path.GetInvalidFileNameChars())
name = name.Replace(c.ToString(), "");
if (name.Length > 31)
name = name.Substring(0, 31);
return name;
}
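Path.GetInvalidFileNameChars() describes the file system rather than Excel, and the set differs by platform; on Windows, for example, it does not contain '[' or ']'. A sketch that strips exactly the characters listed in the Excel error message instead; the class SheetNameSanitizer, its Sanitize method and the "Sheet" fallback for names that end up blank are hypothetical choices.

```csharp
using System.Text;

public static class SheetNameSanitizer
{
    private const string Forbidden = ":\\/?*[]"; // the characters named in Excel's error message
    private const int MaxLength = 31;

    public static string Sanitize(string name, string fallback = "Sheet")
    {
        var sb = new StringBuilder();
        foreach (char c in name ?? string.Empty)
        {
            if (Forbidden.IndexOf(c) < 0)
                sb.Append(c);
        }

        string result = sb.ToString();
        if (result.Length > MaxLength)
            result = result.Substring(0, MaxLength);

        // An empty or all-whitespace result would still be rejected, so use a fallback.
        return string.IsNullOrWhiteSpace(result) ? fallback : result;
    }
}
```

Unlike a validator, this silently changes the name, so two different inputs can end up with the same sheet name; callers that add several sheets may still need to make the results unique.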

Splitting a string in C#

Let's say I have this string:
"param1,r:1234,p:myparameters=1,2,3"
...and I would like to split it into:
param1
r:1234
p:myparameters=1,2,3
I've used the Split function, and of course it splits at every comma. Is there a way to do this using regex, or will I have to write my own split function?

Personally, I would try something like this:
,(?=[^,]+:.*?)
Basically, use a positive lookahead to find a comma followed by a "key-value" pair (defined by a key, a colon, and more information [data], including other commas). This should disqualify the commas between the numbers, too.
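In .NET this pattern can be passed to Regex.Split. The sketch below drops the trailing .*?, which does not change what the lookahead accepts; the class name KeyValueSplitter is hypothetical, and the rule breaks down if a value contains a comma followed by text with a colon (e.g. a time, as in "=1,12:30").

```csharp
using System.Text.RegularExpressions;

public static class KeyValueSplitter
{
    // Split only on commas that are followed by "something:" before the next comma,
    // i.e. commas that start a new key:value item. Commas inside "=1,2,3" are kept
    // because no ':' appears before the following comma (or the end of the string).
    private static readonly Regex ItemSeparator = new Regex(@",(?=[^,]+:)");

    public static string[] Split(string input)
    {
        return ItemSeparator.Split(input);
    }
}
```

For "param1,r:1234,p:myparameters=1,2,3" this yields param1, r:1234 and p:myparameters=1,2,3.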

You can use ; to separate the values, which makes them easy to work with.
Since you use , both as the separator and inside the values, it is difficult to split.
You have
string str = "param1,r:1234,p:myparameters=1,2,3";
I recommend using
string str = "param1;r:1234;p:myparameters=1,2,3";
which can be split like this:
var strArray = str.Split(';');
strArray[0]; // contains param1
strArray[1]; // r:1234
strArray[2]; // p:myparameters=1,2,3

I'm not sure how you would write a split that knew which commas to split on there, honestly.
Unless it's a fixed number each time, in which case you can just use the String.Split overload that takes an int specifying the maximum number of substrings to return.
If you're going to have comma-delimited data that doesn't always have a fixed number of items and may contain literal commas in the data itself, those values really should be quoted. If you can control the input in any way, you should encourage that, and use an actual CSV parser instead of String.Split.
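For the sample input, the count overload mentioned above gives the desired result, assuming there are always exactly three items and only the last one may contain commas (the question does not confirm this). SplitIntoThree is a hypothetical wrapper.

```csharp
using System;

public static class FixedCountSplit
{
    // With count = 3, String.Split stops after two separators, so every comma
    // after the second one stays inside the last element.
    public static string[] SplitIntoThree(string input)
    {
        return input.Split(new[] { ',' }, 3);
    }
}
```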

That depends. You can't parse it with regex (or anything else) unless you can identify a consistent rule separating one group from another. Based on your sample, I can't clearly identify such a rule (though I have some guesses). How does the system know that p:myparameters=1,2,3 is a single item? For example, if there were another item after it, what would be the difference between that and the 1,2,3? Figure that out and you'll be pretty close to a solution.
If you're able to change the format of the input string, why not decide on a consistent delimiter between your groups? ; would be a good choice. Use an input like param1;r:1234;p:myparameters=1,2,3 and there will be no ambiguity where the groups are, plus you can just split on ; and you won't need regex.

The simplest approach would be changing your delimiter from "," to something like "|". Then you can split on "|" no problem. However if you can't change the delimiting character then maybe you could encode the sections in a fashion similar to CSV.
CSV files have the same issue; the standard there is to put double quotes ("") around columns.
For example, your string would be "param1","r:1234","p:myparameters=1,2,3".
Then you could use Microsoft.VisualBasic.FileIO.TextFieldParser to split/parse it. You can use it from C# even though it's in the VisualBasic namespace.
TextFieldParser
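A minimal sketch of that approach for a single line, assuming the fields are quoted as shown. On .NET Framework the project needs a reference to Microsoft.VisualBasic.dll; recent .NET (Core) versions include the type without an extra reference. The class QuotedCsv and method ParseLine are hypothetical.

```csharp
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class QuotedCsv
{
    // Parses one line of comma-separated fields, honouring double quotes,
    // so commas inside a quoted field are not treated as separators.
    public static string[] ParseLine(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;
            return parser.ReadFields();
        }
    }
}
```

For a whole file, the same parser can be constructed from a file path and ReadFields called in a loop until EndOfData is true.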

Do you mean this: string[] str = System.Text.RegularExpressions.Regex.Split("param1,r:1234,p:myparameters=1,2,3", @",");

How to display word differences using c#?

I would like to show the differences between two blocks of text. Rather than comparing lines of text or individual characters, I would like to just compare words separated by specified characters ('\n', ' ', '\t' for example). My main reasoning for this is that the block of text that I'll be comparing generally doesn't have many line breaks in it and letter comparisons can be hard to follow.
I've come across the following O(ND) logic in C# for comparing lines and characters, but I'm sort of at a loss for how to modify it to compare words.
In addition, I would like to keep track of the separators between words and make sure they're included with the diff. So if space is replaced by a hard return, I would like that to come up as a diff.
I'm using Asp.net to display the entire block of text including the deleted original text and added new text (both will be highlighted to show that they were deleted/added). A solution that works with those technologies would be appreciated.
Any advice on how to accomplish this would be appreciated.
Thanks!

DiffPlex, a diff library originally published on CodePlex and now hosted on GitHub, lets you do word, character, and line diffs. It was originally licensed under the Microsoft Public License (Ms-PL).
https://github.com/mmanela/diffplex

Other than a few general optimizations, if you need to include the separators in the comparison, you are essentially doing a character-by-character comparison with breaks. Though you could use the O(ND) algorithm you linked, you would have to make about as many changes to it as you would writing your own.
The main problem with difference comparison is finding the continuation (e.g. if I delete a single word but leave the rest the same).
If you want to use their code, start with the example and do not write out the deleted characters; if there are replaced characters in the same place, do not output that result. You then need to compute the longest continuous run of "changed" words, highlight that string and output it.
Sorry, that's not much of an answer, but for this problem the answer is basically writing and tuning the function yourself.

Well String.Split with '\n', ' ' and '\t' as the split characters will return you an array of words in your block of text.
You could then compare each array for differences. A simple 1:1 comparison would tell you if any word had been changed. Comparing:
hello world how are you
and:
hello there how are you
would tell you that world changed to there.
What it wouldn't tell you was if words had been inserted or removed and you would still need to parse the text blocks character by character to see if any of the separator characters had been changed.
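A minimal sketch of a word-level diff that keeps the separators as tokens and does detect insertions and removals. It swaps in a plain longest-common-subsequence table instead of the O(ND) algorithm from the question (simpler, but memory grows with the product of the two token counts). The names WordDiff, DiffOp, Tokenize and Diff are hypothetical; only space, tab and '\n' are treated as separators.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public enum DiffOp { Equal, Deleted, Inserted }

public static class WordDiff
{
    // Splits into words and single separator characters; the capturing group makes
    // Regex.Split keep the separators, so a space turned into '\n' shows up as a change.
    public static List<string> Tokenize(string text)
    {
        return Regex.Split(text, @"([ \t\n])").Where(t => t.Length > 0).ToList();
    }

    // Longest-common-subsequence diff over tokens: O(N*M) time and memory.
    public static List<KeyValuePair<DiffOp, string>> Diff(string oldText, string newText)
    {
        List<string> a = Tokenize(oldText), b = Tokenize(newText);

        // lcs[i, j] = length of the LCS of a[i..] and b[j..].
        int[,] lcs = new int[a.Count + 1, b.Count + 1];
        for (int i = a.Count - 1; i >= 0; i--)
            for (int j = b.Count - 1; j >= 0; j--)
                lcs[i, j] = a[i] == b[j]
                    ? lcs[i + 1, j + 1] + 1
                    : System.Math.Max(lcs[i + 1, j], lcs[i, j + 1]);

        // Walk the table from the start, emitting equal, deleted and inserted tokens.
        var result = new List<KeyValuePair<DiffOp, string>>();
        int x = 0, y = 0;
        while (x < a.Count && y < b.Count)
        {
            if (a[x] == b[y])
            {
                result.Add(new KeyValuePair<DiffOp, string>(DiffOp.Equal, a[x]));
                x++; y++;
            }
            else if (lcs[x + 1, y] >= lcs[x, y + 1])
            {
                result.Add(new KeyValuePair<DiffOp, string>(DiffOp.Deleted, a[x++]));
            }
            else
            {
                result.Add(new KeyValuePair<DiffOp, string>(DiffOp.Inserted, b[y++]));
            }
        }
        while (x < a.Count) result.Add(new KeyValuePair<DiffOp, string>(DiffOp.Deleted, a[x++]));
        while (y < b.Count) result.Add(new KeyValuePair<DiffOp, string>(DiffOp.Inserted, b[y++]));
        return result;
    }
}
```

In ASP.NET, the resulting list could, for example, be rendered by HTML-encoding each token and wrapping Deleted and Inserted tokens in differently styled elements.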

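using System.Linq;

string string1 = "hello world how are you";
string string2 = "hello there how are you";
var first = string1.Split(' ');
var second = string2.Split(' ');
// Use the longer array as "primary" (the second one when the lengths are equal).
var primary = first.Length > second.Length ? first : second;
var secondary = primary == second ? first : second;
// Except is a set operation: words in primary that are absent from secondary
// ("there" here); it ignores word order and repeated words.
var difference = primary.Except(secondary).ToArray();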
