How to make these 2 methods more Efficient [closed] - c#

Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
Hi guys, this is my first question ever on SO, so please go easy on me.
I am playing with Lambda/LINQ while building myself a few utility methods.
The first method takes a string like
"AdnanRazaBhatti"
and breaks it up like
"Adnan Raza Bhatti"
The second method takes a string like the first method and also takes
out String[] brokenResults
It returns the broken-up string like the first method and also fills the brokenResults array as follows.
"Adnan" "Raza" "Bhatti"
Questions:
A. Can you please suggest how to make these methods more efficient?
B. When I try to use StringBuilder, the compiler tells me that extension methods like Where and Select do not exist for the StringBuilder class. Why is that? The indexer does work on StringBuilder to get characters, e.g. StringBuilder s = new StringBuilder("Dang"); char c = s[0]; Here c will be 'D'.
Code
Method 1:
public static string SplitCapital( string source )
{
string result = "";
int i = 0;
//Separate all the Capital Letter
var charUpper = source.Where( x => char.IsUpper( x ) ).ToArray<char>( );
//If there is only one Capital letter then it is already atomic.
if ( charUpper.Count( ) > 1 ) {
var strLower = source.Split( charUpper );
foreach ( string s in strLower )
if ( i < strLower.Count( ) - 1 && !String.IsNullOrEmpty( s ) )
result += charUpper.ElementAt( i++ ) + s + " ";
return result;
}
return source;
}
Method 2:
public static string SplitCapital( string source, out string[] brokenResults )
{
string result = "";
int i = 0;
var strUpper = source.Where( x => char.IsUpper( x ) ).ToArray<char>( );
if ( strUpper.Count( ) > 1 ) {
var strLower = source.Split( strUpper );
brokenResults = (
from s in strLower
where i < strLower.Count( ) - 1 && !String.IsNullOrEmpty( s )
select result = strUpper.ElementAt( i++ ) + s + " " ).ToArray( );
result = "";
foreach ( string s in brokenResults )
result += s;
return result;
}
else { brokenResults = new string[] { source }; }
return source;
}
Note:
I am planning to use these utility methods to break up the table column names I get from my database.
For example, if the column name is "BooksId" I will break it up programmatically into "Books Id" using one of these methods. I know there are other ways of renaming the column names, like in the design window or with [dataset].[tableName].HeadersRow.Cells[0].Text = "Books Id", but I am also planning to use this method somewhere else in the future.
Thanks

You can use the following extension method to split your string on capital letters:
public static string Wordify(this string camelCaseWord)
{
/* CamelCaseWord will become Camel Case Word,
if the word is all upper, just return it*/
if (!Regex.IsMatch(camelCaseWord, "[a-z]"))
return camelCaseWord;
return string.Join(" ", Regex.Split(camelCaseWord, @"(?<!^)(?=[A-Z])"));
}
To split a string in a string array, you can use this:
public static string[] SplitOnVal(this string text,string value)
{
return text.Split(new[] { value }, StringSplitOptions.None);
}
If we take your example for consideration, the code will be as follows:
string strTest = "AdnanRazaBhatti";
var capitalCase = strTest.Wordify(); //Adnan Raza Bhatti
var brokenResults = capitalCase.SplitOnVal(" "); // split on the space into an array

Check this code
public static string SeperateCamelCase(this string value)
{
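// TrimStart removes the space that would otherwise be inserted before a leading capital
return Regex.Replace(value, "((?<=[a-z])[A-Z]|[A-Z](?=[a-z]))", " $1").TrimStart();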
}
Hope this answer helps you. If you find solution kindly mark my answer and point it up.

Looks to me like regular expressions are the way to go.
I think [A-Z][a-z]+ might be a good one to start with.
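The [A-Z][a-z]+ suggestion could be tried out roughly as follows. This is only a sketch: the class and method names (CapitalSplitter, SplitWords) are made up for illustration, and the pattern silently drops anything that is not a capital followed by lowercase letters (digits, runs of capitals such as "ID"), so it may need widening for real column names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CapitalSplitter
{
    // Hypothetical helper: returns every run matching [A-Z][a-z]+,
    // i.e. one capital letter followed by one or more lowercase letters.
    public static string[] SplitWords(string source)
    {
        return Regex.Matches(source, "[A-Z][a-z]+")
                    .Cast<Match>()          // MatchCollection is non-generic on older frameworks
                    .Select(m => m.Value)
                    .ToArray();
    }
}
```

Joining the result with string.Join(" ", CapitalSplitter.SplitWords("AdnanRazaBhatti")) gives back the spaced string, so one method can serve both of the original overloads.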

Updated version. String builder was used to reduce memory utilization.
string SplitCapital(string str)
{
//Search all capital letters and store indexes
var indexes = str
.Select((c, i) => new { c = c, i = i }) // Select information about char and position
.Where(c => Char.IsUpper(c.c)) // Get only capital chars
.Select(cl => cl.i); // Get indexes of capital chars
// If no indexes were found, or if the index count equals the source string length, return the source string
if (!indexes.Any() || indexes.Count() == str.Length)
{
return str;
}
// Create string builder from the source string
var sb = new StringBuilder(str);
// Reverse indexes and remove 0 if necessary
foreach (var index in indexes.Reverse().Where(i => i != 0))
{
// Insert spaces before capital letter
sb.Insert(index, ' ');
}
return sb.ToString();
}
string SplitCapital(string str, out string[] parts)
{
var splitted = SplitCapital(str);
parts = splitted.Split(new[] { ' ' }, StringSplitOptions.None);
return splitted;
}
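On question B: Where and Select are extension methods on IEnumerable<T>, and StringBuilder does not implement IEnumerable<char>. It only exposes an indexer and a Length, which is why s[0] works but s.Where(...) does not compile. One possible workaround, sketched below with made-up names (StringBuilderLinqDemo, UpperCaseLetters), is to enumerate the indexes and read each character through the indexer. Calling ToString() first and querying the string is the simpler alternative.

```csharp
using System;
using System.Linq;
using System.Text;

public static class StringBuilderLinqDemo
{
    // Hypothetical helper: collects the capital letters held in a StringBuilder.
    public static string UpperCaseLetters(StringBuilder sb)
    {
        // Enumerable.Range gives an IEnumerable<int> of positions;
        // the indexer turns each position into a char that LINQ can filter.
        var upper = Enumerable.Range(0, sb.Length)
                              .Select(i => sb[i])
                              .Where(c => char.IsUpper(c));
        return new string(upper.ToArray());
    }
}
```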

Related

Don't split by escaped string - C# [duplicate]

This question already has answers here:
How can I Split(',') a string while ignore commas in between quotes?
(3 answers)
C# Regex Split - commas outside quotes
(7 answers)
Closed 5 years ago.
I need to split a CSV file by comma, except where a column is enclosed in quote marks. However, what I have here does not achieve that: commas inside quoted columns are still being split into separate array items.
public List<string> GetData(string dataFile, int row)
{
try
{
var lines = File.ReadAllLines(dataFile).Select(a => a.Split(';'));
var csv = from line in lines select (from piece in line select piece.Split(',')).ToList();
var foo = csv.ToList();
var result = foo[row][0].ToList();
return result;
}
catch
{
return null;
}
}
private const string QUOTE = "\"";
private const string ESCAPED_QUOTE = "\"\"";
private static char[] CHARACTERS_THAT_MUST_BE_QUOTED = { ',', '"', '\n' };
public static string Escape(string s)
{
if (s.Contains(QUOTE))
s = s.Replace(QUOTE, ESCAPED_QUOTE);
if (s.IndexOfAny(CHARACTERS_THAT_MUST_BE_QUOTED) > -1)
s = QUOTE + s + QUOTE;
return s;
}
I am not sure where I can use my escape function in this case.
Example:
Degree,Graduate,08-Dec-17,Level 1,"Advanced, Maths"
The string Advanced, Maths is being split into two different array items, which I don't want.
You could use regex, LINQ, or just loop through each character and use Booleans to track what the current behaviour should be. This question actually got me thinking, as I'd previously just looped through and acted on each character. Here is a LINQ way of breaking an entire CSV document up, assuming the end of a line is marked with ';':
private static void Main(string[] args)
{
string example = "\"Hello World, My name is Gumpy!\",20,male;My sister's name is Amy,29,female";
var result1 = example.Split(';')
.Select(s => s.Split('"')) // This leaves anything inside quotation marks at the odd indexes
.Select(sl => sl.Select((ss, index) => index % 2 == 0 ? ss.Split(',') : new string[] { ss })) // even indexes are unquoted, so split those by comma
.Select(sl => sl.SelectMany(sc => sc));
Console.WriteLine("Press any key to continue.");
Console.ReadKey();
}
Not sure how this performs, but you can solve that with Linq.Aggregate like this:
using System;
using System.Linq;
using System.Collections.Generic;
public class Program
{
public static IEnumerable<string> SplitIt(
char[] splitters,
string text,
StringSplitOptions opt = StringSplitOptions.None)
{
bool inside = false;
var result = text.Aggregate(new List<string>(), (acc, c) =>
{
// This checks each char of the given text and accumulates it
// in the (initially empty) string list.
// A splitting char starts a new item in the list, unless we are
// inside quotes. inside starts as false
// and is flipped every time a " is hit.
// At the end we return either everything that was parsed or only the
// items that are neither null nor "", depending on the given opt
if (!acc.Any()) // nothing in yet
{
if (c != '"' && (!splitters.Contains(c) || inside))
acc.Add("" + c);
else if (c == '"')
inside = !inside;
else if (!inside && splitters.Contains(c)) // ",bla"
acc.Add(null);
return acc;
}
if (c != '"' && (!splitters.Contains(c) || inside))
acc[acc.Count - 1] = (acc[acc.Count - 1] ?? "") + c;
else if (c == '"')
inside = !inside;
else if (!inside && splitters.Contains(c)) // ",bla"
acc.Add(null);
return acc;
}
);
if (opt == StringSplitOptions.RemoveEmptyEntries)
return result.Where(r => !string.IsNullOrEmpty(r));
return result;
}
public static void Main()
{
var s = ",,Degree,Graduate,08-Dec-17,Level 1,\"Advanced, Maths\",,";
var spl = SplitIt(new[]{','}, s);
var spl2 = SplitIt(new[]{','}, s, StringSplitOptions.RemoveEmptyEntries);
Console.WriteLine(string.Join("|", spl));
Console.WriteLine(string.Join("|", spl2));
}
}
Output:
|Degree|Graduate|08-Dec-17|Level 1|Advanced, Maths||
Degree|Graduate|08-Dec-17|Level 1|Advanced, Maths
The function gets comma separated fields within a string, excluding commas embedded in a quoted field
The assumptions
It should return empty fields ,,
There are no quotes within a quote field (as per the example)
The method
It uses a for loop with i marking the start of the current field
It scans for the next comma or quote, and if it finds a quote it scans for the next comma after the closing quote to create the field
It needed to be efficient, otherwise we would use regex or Linq
The OP didn't want to use a CSV library
Note: There is no error checking, and scanning each character would be faster; this was just easy to understand
Code
public List<string> GetFields(string line)
{
var list = new List<string>();
for (var i = 0; i < line.Length; i++)
{
var firstQuote = line.IndexOf('"', i);
var firstComma = line.IndexOf(',', i);
if (firstComma >= 0)
{
// the first comma comes before the first quote (or there is no quote), so it's just a standard field
if (firstComma < firstQuote || firstQuote == -1)
{
list.Add(line.Substring(i, firstComma - i));
i = firstComma;
continue;
}
// We have found a quote, so look for the next comma after the closing quote
var nextQuote = line.IndexOf('"', firstQuote + 1);
var nextComma = line.IndexOf(',', nextQuote + 1);
// if we found a comma, then we have found the end of this field
if (nextComma >= 0)
{
list.Add(line.Substring(i, nextComma - i));
i = nextComma;
continue;
}
}
list.Add(line.Substring(i)); // if we are here there are no more fields
break;
}
return list;
}
Tests 1
Degree,Graduate,08-Dec-17,Level 1,"Advanced, Maths",another
Degree
Graduate
08-Dec-17
Level 1
"Advanced, Maths"
another
Tests 2
,Degree,Graduate,08-Dec-17,\"asdasd\",Level 1,\"Advanced, Maths\",another
<Empty Line>
Degree
Graduate
08-Dec-17
"asdasd"
Level 1
"Advanced, Maths"
another
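For comparison, here is a sketch of the "scan each character" variant mentioned in the note above. It is not the code from any of the answers. It assumes fields follow the same convention as the asker's Escape function (a quoted field wraps its content in double quotes and writes an embedded quote as two quotes), it strips the surrounding quotes from the result, and it does no error checking for an unterminated quote. The names CsvLine and Split are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text;

public static class CsvLine
{
    // Hypothetical helper: splits one CSV line on commas that are outside quotes.
    public static List<string> Split(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c != '"')
                {
                    current.Append(c);          // ordinary char inside a quoted field
                }
                else if (i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"');        // doubled quote = one literal quote
                    i++;                        // skip the second quote of the pair
                }
                else
                {
                    inQuotes = false;           // closing quote
                }
            }
            else if (c == '"')
            {
                inQuotes = true;                // opening quote
            }
            else if (c == ',')
            {
                fields.Add(current.ToString()); // field boundary
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString());         // last field (may be empty)
        return fields;
    }
}
```

Unlike GetFields above, this returns Advanced, Maths without the surrounding quotes and keeps a trailing empty field.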

How to remove all but the first occurrence of a character from a string?

I have a string of text and want to ensure that it contains at most one single occurrence of a specific character (,). Therefore I want to keep the first one, but simply remove all further occurrences of that character.
How could I do this the most elegant way using C#?
This works, but not the most elegant for sure :-)
string a = "12,34,56,789";
int pos = 1 + a.IndexOf(',');
return a.Substring(0, pos) + a.Substring(pos).Replace(",", string.Empty);
You could use a counter variable and a StringBuilder to create the new string efficiently:
var sb = new StringBuilder(text.Length);
int maxCount = 1;
int currentCount = 0;
char specialChar = ',';
foreach(char c in text)
if(c != specialChar || ++currentCount <= maxCount)
sb.Append(c);
text = sb.ToString();
This approach is not the shortest but it's efficient and you can specify the char-count to keep.
Here's a more "elegant" way using LINQ:
int commasFound = 0; int maxCommas = 1;
text = new string(text.Where(c => c != ',' || ++commasFound <= maxCommas).ToArray());
I don't like it because it requires to modify a variable from a query, so it's causing a side-effect.
Regular expressions are elegant, right?
Regex.Replace("Eats, shoots, and leaves.", @"(?<=,.*),", "");
This replaces every comma, as long as there is a comma before it, with nothing.
(Actually, it's probably not elegant - it may only be one line of code, but it may also be O(n^2)...)
If you don't deal with large strings and you reaaaaaaly like Linq oneliners:
public static string KeepFirstOccurence (this string @string, char @char)
{
var index = @string.IndexOf(@char);
return String.Concat(String.Concat(@string.TakeWhile(x => @string.IndexOf(x) < index + 1)), String.Concat(@string.SkipWhile(x=>@string.IndexOf(x) < index)).Replace(@char.ToString(), ""));
}
You could write a function like the following one, which splits the string into two sections at the first match (via String.Substring()) and removes matches only from the second section (using String.Replace()):
public static string RemoveAllButFirst(string s, string stuffToRemove)
{
// Check if the stuff to replace exists and if not, return the original string
var locationOfStuff = s.IndexOf(stuffToRemove);
if (locationOfStuff < 0)
{
return s;
}
// Calculate where to pull the first string from and then replace the rest of the string
var splitLocation = locationOfStuff + stuffToRemove.Length;
return s.Substring(0, splitLocation) + (s.Substring(splitLocation)).Replace(stuffToRemove,"");
}
You could simply call it by using :
var output = RemoveAllButFirst(input,",");
A prettier approach might actually involve building an extension method that handled this a bit more cleanly :
public static class StringExtensions
{
public static string RemoveAllButFirst(this string s, string stuffToRemove)
{
// Check if the stuff to replace exists and if not, return the
// original string
var locationOfStuff = s.IndexOf(stuffToRemove);
if (locationOfStuff < 0)
{
return s;
}
// Calculate where to pull the first string from and then replace the rest of the string
var splitLocation = locationOfStuff + stuffToRemove.Length;
return s.Substring(0, splitLocation) + (s.Substring(splitLocation)).Replace(stuffToRemove,"");
}
}
which would be called via :
var output = input.RemoveAllButFirst(",");
You can see a working example of it here.
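static string KeepFirstOccurance(this string str, char c)
{
int charposition = str.IndexOf(c);
// Nothing to remove if the character does not occur at all
if (charposition < 0) return str;
// Keep everything up to and including the first occurrence,
// then remove the character from the remainder
return str.Substring(0, charposition + 1) +
str.Substring(charposition + 1)
.Replace(c.ToString(), string.Empty);
}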
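Pretty short with Linq: split the string into chars, keep the distinct set and join back to a string. Note that Distinct() drops every repeated character, not only the comma, so this is only correct when no other character occurs more than once.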
text = string.Join("", text.Select(c => c).Distinct());

Enabling partial name input to return values in c#

I have a C# web service that takes a String input and checks it against a text document full of Strings.
It works as follows: say I enter "Australia", the service will return "Australia". However, if I enter Aus (or aus, since I have made it case insensitive) it should also return "Australia".
On the other hand, if I enter "tra", it shouldn't return Australia, only Strings whose first 3 characters are "tra". (If the input was Ch, it should return China, Chad, etc.)
Currently my code looks like
public String countryCode(String input)
{
StringBuilder strings = new StringBuilder("", 10000);
String text = System.IO.File.ReadAllText(Server.MapPath("countryCodes.txt"));
String[] countries = Regex.Split(text, "#");
int v;
for (v = 0; v < countries.Length; v++)
{
if (countries[v].ToUpper().Contains(input) || countries[v].ToLower().Contains(input))
{
bool c = countries[v].ToUpper().Contains(input);
bool b = countries[v].ToLower().Contains(input);
if (b == true || c == true)
{
strings.Append(countries[v] + " ");
}
else
{
strings.Append("Country not found");
break;
}
}
}
String str = strings.ToString();
return str;
}
This is a start, but I am really having trouble comparing the indexes of strings.
My question is: how can I construct something that checks countries[v][0] against input[0], and if they are the same checks [1] against [1], and so on, until they differ or input.Length is exceeded, and then returns the appropriate values?
Comment for clarifications if needed
Regards
I think your loop can be reduced to:
var valids = new List<String>();
foreach(String c in countries)
if(c.ToUpper().StartsWith(input.ToUpper()))
valids.Add(c);
return (valids.Any()) ? String.Join(",",valids) : "No Matches";
or LINQ:
var valids = countries.Where(c => c.ToUpper().StartsWith(input.ToUpper())).ToList();
return (valids.Any()) ? String.Join(",",valids) : "No Matches";
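A variation on the same idea, as a sketch: String.StartsWith has an overload that takes a StringComparison, so the prefix check can be made case insensitive without calling ToUpper() on every entry. The class and method names (CountryLookup, FindByPrefix) are made up for illustration, and the country list is passed in rather than read from countryCodes.txt.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CountryLookup
{
    // Hypothetical helper: returns the countries whose name starts with the input,
    // ignoring case, joined by commas, or "No Matches" if there are none.
    public static string FindByPrefix(IEnumerable<string> countries, string input)
    {
        var matches = countries
            .Where(c => c.StartsWith(input, StringComparison.OrdinalIgnoreCase))
            .ToList();

        return matches.Count > 0 ? string.Join(",", matches) : "No Matches";
    }
}
```

StartsWith already does the character-by-character comparison the question describes, so there is no need to compare countries[v][0] with input[0] by hand.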

Find and replace text in a string using C#

Anyone know how I would find & replace text in a string? Basically I have two strings:
string firstS = "/9j/4AAQSkZJRgABAQEAYABgAAD/2wBDABQODxIPDRQSERIXFhQYHzMhHxwcHz8tLyUzSkFOTUlBSEZSXHZkUldvWEZIZoxob3p9hIWET2ORm4+AmnaBhH//2wBDARYXFx8bHzwhITx/VEhUf39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f3//";
string secondS = "abcdefg2wBDABQODxIPDRQSERIXFh/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/abcdefg";
I want to search firstS to see if it contains any sequence of characters that's in secondS and then replace it. Each match needs to be replaced with the number of replaced characters in square brackets:
[NUMBER-OF-CHARACTERS-REPLACED]
For example, because firstS and secondS both contain "2wBDABQODxIPDRQSERIXFh" and "/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/" they would need to be replaced. So then firstS becomes:
string firstS = "/9j/4AAQSkZJRgABAQEAYABgAAD/[22]QYHzMhHxwcHz8tLyUzSkFOTUlBSEZSXHZkUldvWEZIZoxob3p9hIWET2ORm4+AmnaBhH//2wBDARYXFx8bHzwhITx/VEhUf39[61]f3//";
Hope that makes sense. I think I could do this with Regex, but I don't like the inefficiency of it. Does anyone know of another, faster way?
Does anyone know of another, faster way?
Yes, this problem actually has a proper name. It is called the Longest Common Substring, and it has a reasonably fast solution.
Here is an implementation on ideone. It finds and replaces all common substrings of ten characters or longer.
// This comes straight from Wikipedia article linked above:
private static string FindLcs(string s, string t) {
var L = new int[s.Length, t.Length];
var z = 0;
var ret = new StringBuilder();
for (var i = 0 ; i != s.Length ; i++) {
for (var j = 0 ; j != t.Length ; j++) {
if (s[i] == t[j]) {
if (i == 0 || j == 0) {
L[i,j] = 1;
} else {
L[i,j] = L[i-1,j-1] + 1;
}
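if (L[i,j] > z) {
z = L[i,j];
// Keep only the first substring of the new maximum length; appending
// every tie would concatenate unrelated substrings into one result
ret = new StringBuilder(s.Substring(i-z+1, z));
}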
} else {
L[i,j]=0;
}
}
}
return ret.ToString();
}
// With the LCS in hand, building the answer is easy
public static string CutLcs(string s, string t) {
for (;;) {
var lcs = FindLcs(s, t);
if (lcs.Length < 10) break;
s = s.Replace(lcs, string.Format("[{0}]", lcs.Length));
}
return s;
}
You need to be very careful to distinguish between "longest common substring" and "longest common subsequence".
For Substring: http://en.wikipedia.org/wiki/Longest_common_substring_problem
For SubSequence: http://en.wikipedia.org/wiki/Longest_common_subsequence_problem
I would also suggest watching a few videos on YouTube on these two topics:
http://www.youtube.com/results?search_query=longest+common+substring&oq=longest+common+substring&gs_l=youtube.3..0.3834.10362.0.10546.28.17.2.9.9.2.225.1425.11j3j3.17.0...0.0...1ac.lSrzx8rr1kQ
http://www.youtube.com/results?search_query=longest+common+subsequence&oq=longest+common+s&gs_l=youtube.3.0.0l6.2968.7905.0.9132.20.14.2.4.4.0.224.2038.5j2j7.14.0...0.0...1ac.4CYZ1x50zpc
You can find C# implementations of longest common subsequence here:
http://www.alexandre-gomes.com/?p=177
http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Longest_common_subsequence
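To make the substring/subsequence distinction concrete, here is a small sketch that computes only the lengths of both. A common substring must be contiguous in both strings, while a common subsequence keeps the order but may skip characters. The class and method names are made up for illustration, and these are the textbook dynamic-programming recurrences, not code taken from the linked pages.

```csharp
using System;

public static class LcsDemo
{
    // Longest common substring: matching characters must be contiguous,
    // so a mismatch resets the running length to 0 (the array default).
    public static int SubstringLength(string s, string t)
    {
        var len = new int[s.Length + 1, t.Length + 1];
        int best = 0;
        for (int i = 1; i <= s.Length; i++)
            for (int j = 1; j <= t.Length; j++)
                if (s[i - 1] == t[j - 1])
                {
                    len[i, j] = len[i - 1, j - 1] + 1;
                    best = Math.Max(best, len[i, j]);
                }
        return best;
    }

    // Longest common subsequence: order is preserved but gaps are allowed,
    // so a mismatch carries over the best length found so far.
    public static int SubsequenceLength(string s, string t)
    {
        var len = new int[s.Length + 1, t.Length + 1];
        for (int i = 1; i <= s.Length; i++)
            for (int j = 1; j <= t.Length; j++)
                len[i, j] = s[i - 1] == t[j - 1]
                    ? len[i - 1, j - 1] + 1
                    : Math.Max(len[i - 1, j], len[i, j - 1]);
        return len[s.Length, t.Length];
    }
}
```

For "abcde" and "abxde" the longest common substring has length 2 ("ab" or "de"), while the longest common subsequence has length 4 ("abde"). The question above needs the substring version.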
I had a similar issue, but for word occurrences, so I hope this can help. I used a SortedDictionary (which is a binary search tree internally).
/* Application counts the number of occurrences of each word in a string
and stores them in a generic sorted dictionary. */
using System;
using System.Text.RegularExpressions;
using System.Collections.Generic;
public class SortedDictionaryTest
{
public static void Main( string[] args )
{
// create sorted dictionary
SortedDictionary< string, int > dictionary = CollectWords();
// display sorted dictionary content
DisplayDictionary( dictionary );
}
// create sorted dictionary
private static SortedDictionary< string, int > CollectWords()
{
// create a new sorted dictionary
SortedDictionary< string, int > dictionary =
new SortedDictionary< string, int >();
Console.WriteLine( "Enter a string: " ); // prompt for user input
string input = Console.ReadLine();
// split input text into tokens
string[] words = Regex.Split( input, @"\s+" );
// processing input words
foreach ( var word in words )
{
string wordKey = word.ToLower(); // get word in lowercase
// if the dictionary contains the word
if ( dictionary.ContainsKey( wordKey ) )
{
++dictionary[ wordKey ];
}
else
// add new word with a count of 1 to the dictionary
dictionary.Add( wordKey, 1 );
}
return dictionary;
}
// display dictionary content
private static void DisplayDictionary< K, V >(
SortedDictionary< K, V > dictionary )
{
Console.WriteLine( "\nSorted dictionary contains:\n{0,-12}{1,-12}",
"Key:", "Value:" );
/* generate output for each key in the sorted dictionary
by iterating through the Keys property with a foreach statement*/
foreach ( K key in dictionary.Keys )
Console.WriteLine( "{0,-12}{1,-12}", key, dictionary[ key ] );
Console.WriteLine( "\nsize: {0}", dictionary.Count );
}
}
This is probably dog slow, but if you're willing to incur some technical debt and need something now for prototyping, you could use LINQ.
string firstS = "123abc";
string secondS = "456cdeabc123";
int minLength = 3;
var result =
from subStrCount in Enumerable.Range(0, firstS.Length)
where firstS.Length - subStrCount >= minLength
let subStr = firstS.Substring(subStrCount, minLength)
where secondS.Contains(subStr)
select secondS.Replace(subStr, "[" + subStr.Length + "]");
Results in
456cdeabc[3]
456cde[3]123

C# fix sentence

I need to take a sentence that is all on one line with no spaces, where each new word starts with a capital letter, e.g. "StopAndSmellTheRoses", and convert it to "Stop and smell the roses". This is the function I have, but I keep getting an argument out of range error on the Insert method. Thanks in advance for any help.
private void FixSentence()
{
// String to hold our sentence in trim at same time
string sentence = txtSentence.Text.Trim();
// loop through the string
for (int i = 0; i < sentence.Length; i++)
{
if (char.IsUpper(sentence, i) & sentence[i] != 0)
{
// Change to lowercase
char.ToLower(sentence[i]);
// Insert space behind the character
// This is where I get my error
sentence = sentence.Insert(i-1, " ");
}
}
// Show our Fixed Sentence
lblFixed.Text = "";
lblFixed.Text = "Fixed: " + sentence;
}
The best way to build up a String in this manner is to use a StringBuilder instance.
var sentence = txtSentence.Text.Trim();
var builder = new StringBuilder();
foreach (var cur in sentence) {
if (Char.IsUpper(cur) && builder.Length != 0) {
builder.Append(' ');
}
builder.Append(cur);
}
// Show our Fixed Sentence
lblFixed.Text = "";
lblFixed.Text = "Fixed: " + builder.ToString();
Using the Insert method creates a new string instance every time resulting in a lot of needlessly allocated values. The StringBuilder though won't actually allocate a String until you call the ToString method.
You can't modify the sentence variable in the loop that is going through it.
Instead, you need to have a second string variable that you append all of the found words.
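A sketch that combines the points above and also handles the lowercasing the question asks for. Two things go wrong in the original loop: Insert(i - 1, " ") is called with -1 when the first character is a capital (the check sentence[i] != 0 compares the character to 0, not the index), and char.ToLower returns a new char that is then discarded, because strings are immutable. The class and method names (SentenceFixer, Fix) are hypothetical.

```csharp
using System.Text;

public static class SentenceFixer
{
    // Hypothetical helper: "StopAndSmellTheRoses" -> "Stop and smell the roses".
    public static string Fix(string sentence)
    {
        var builder = new StringBuilder(sentence.Length * 2);
        foreach (char c in sentence)
        {
            if (char.IsUpper(c) && builder.Length != 0)
            {
                // A capital that is not the first character starts a new word:
                // add the space, then the lowercased letter.
                builder.Append(' ');
                builder.Append(char.ToLower(c));
            }
            else
            {
                builder.Append(c);
            }
        }
        return builder.ToString();
    }
}
```

In the form code this would be used as lblFixed.Text = "Fixed: " + SentenceFixer.Fix(txtSentence.Text.Trim());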
Here is the answer
var finalstr = Regex.Replace(
"StopAndSmellTheRoses",
"(?<=[a-z])(?<x>[A-Z])|(?<=.)(?<x>[A-Z])(?=[a-z])|(?<=[^0-9])(?<x>[0-9])(?=.)",
me => " " + me.Value.ToLower()
);
will output
Stop and smell the roses
Another version:
public static class StringExtensions
{
public static string FixSentence(this string instance)
{
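if (string.IsNullOrEmpty(instance)) return instance;
// Insert a space before every capital that is not at the start of the string.
// (Splitting on the capitals themselves would discard those letters.)
string result = Regex.Replace(instance, "(?<!^)(?=[A-Z])", " ");
// Keep the first letter upper case and lower-case the rest
return char.ToUpper(result[0]) + result.Substring(1).ToLower();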
}
}
