parse string value from a URL using c#

parse string value from a URL using c# - c#

I have a string "http://site1/site2/site3". I would like to get the value of "site2" out of the string. What is the best algorythm in C# to get the value. (no regex because it needs to be fast). I also need to make sure it doesn't throw any errors (just returns null).
I am thinking something like this:
currentURL = currentURL.ToLower().Replace("http://", "");
int idx1 = currentURL.IndexOf("/");
int idx2 = currentURL.IndexOf("/", idx1);
string secondlevelSite = currentURL.Substring(idx1, idx2 - idx1);

Assuming currentURL is a string
string result = new Uri(currentURL).Segments[1]
result = result.Substring(0, result.Length - 1);
Substring is needed because Segments[1] returns "site2/" instead of "site2"

Your example should be fast enough. If we really want to be nitpicky, then don't do the initial replace, because that will be at least an O(n) operation. Do a
int idx1 = currentURL.IndexOf("/", 8 /* or something */);
instead.
Thus you have two O(n) index look-ups that you optimized in the best possible way, and two O(1) operations with maybe a memcopy in the .NET's Substring(...) implementation... you can't go much faster with managed code.

currentURL = currentURL.ToLower().Replace("http://", "");
var arrayOfString = String.spilt(currentUrl.spit('/');

My assumption is you only need the second level. if there's no second level then it'll just return empty value.
string secondLevel = string.Empty;
try
{
string currentURL = "http://stackoverflow.com/questionsdgsgfgsgsfgdsggsg/3358184/parse-string-value-from-a-url-using-c".Replace("http://", string.Empty);
int secondLevelStartIndex = currentURL.IndexOf("/", currentURL.IndexOf("/", 0)) + 1;
secondLevel = currentURL.Substring(secondLevelStartIndex, (currentURL.IndexOf("/", secondLevelStartIndex) - secondLevelStartIndex));
}
catch
{
secondLevel = string.Empty;
}

Related

Replacing characters in a string with another string

So what I am trying to do is as follows :
example of a string is A4PC
I am trying to replace for example any occurance of "A" with "[A4]" so I would get and similar any occurance of "4" with "[A4]"
"[A4][A4]PC"
I tried doing a normal Replace on the string but found out I got
"[A[A4]]PC"
string badWordAllVariants =
restriction.Value.Replace("A", "[A4]").Replace("4", "[A4]")
since I have two A's in a row causing an issue.
So I was thinking it would be better rather than the replace on the string I need to do it on a character per character basis and then build up a string again.
Is there anyway in Linq or so to do something like this ?

You don't need any LINQ here - String.Replace works just fine:
string input = "AAPC";
string result = input.Replace("A", "[A4]"); // "[A4][A4]PC"
UPDATE: For your updated requirements I suggest to use regular expression replace
string input = "A4PC";
var result = Regex.Replace(input, "A|4", "[A4]"); // "[A4][A4]PC"

This works well for me:
string x = "AAPC";
string replace = x.Replace("A", "[A4]");
EDIT:
Based on the updated question, the issue is the second replacement. In order to replace multiple strings you will want to do this sequentially:
var original = "AAPC";
// add arbitrary room to allow for more new characters
StringBuilder resultString = new StringBuilder(original.Length + 10);
foreach (char currentChar in original.ToCharArray())
{
if (currentChar == 'A') resultString.Append("[A4]");
else if (currentChar == '4') resultString.Append("[A4]");
else resultString.Append(currentChar);
}
string result = resultString.ToString();
You can run this routine with any replacements you want to make (in this case the letters 'A' and '4' and it should work. If you would want to replace strings the code would be similar in structure but you would need to "look ahead" and probably use a for loop. Hopefully this helps!
By the way - you want to use a string builder here and not strings because strings are static which means space gets allocated every time you loop. (Not good!)

I think this should do the trick
string str = "AA4PC";
string result = Regex.Replace(str, #"(?<Before>[^A4]?)(?<Value>A|4)(?<After>[^A4]?)", (m) =>
{
string before = m.Groups["Before"].Value;
string after = m.Groups["After"].Value;
string value = m.Groups["Value"].Value;
if (before != "[" || after != "]")
{
return "[A4]";
}
return m.ToString();
});
It is going to replace A and 4 that hasn't been replaced yet for [A4].

Unable to Split the string accordingly

I know this question would have been asked infinite number of times, but I'm kinda stuck.
I have a string something like
"Doc1;Doc2;Doc3;12"
it can be something like
"Doc1;Doc2;Doc3;Doc4;Doc5;56"
Its like few pieces of strings separated by semicolon, followed by a number or id.
I need to extract the number/id and the strings separately.
To be exact, I can have 2 strings: one having "Doc1;Doc2;Doc3" or "Doc1;Doc2;Doc3;Doc4" and the other having just the number/id as "12" or "34" or "45" etc.
And yeah I am using C# 3.5
I understand its a pretty easy and witty question, but this guy is stuck.
Assistance required from experts.
Regards
Anurag

string.LastIndexOf and string.Substring are the keys to what you're trying to do.
var str = "Doc1;Doc2;Doc3;12";
var ind = str.LastIndexOf(';');
var str1 = str.Substring(0, ind);
var str2 = str.Substring(ind+1);

One way:
string[] tokens = str.Split(';');
var docs = tokens.Where(s => s.StartsWith("Doc", StringComparison.OrdinalIgnoreCase));
var numbers = tokens.Where(s => s.All(Char.IsDigit));

String docs = s.Substring(0, s.LastIndexOf(';'));
String number = s.Substring(s.LastIndexOf(';') + 1);

One possible approach would be this:
var ids = new List<string>();
var nums = new List<string>();
foreach (var s in input.Split(';'))
{
int val;
if (!int.TryParse(s, out val)) { ids.Add(s); }
else { nums.Add(s); }
}
where input is something like Doc1;Doc2;Doc3;Doc4;Doc5;56. Now, ids will house all of the Doc1 like values and nums will house all of the 56 like values.

you can use StringTokenizer functionality.
http://www.c-sharpcorner.com/UploadFile/pseabury/JavaLikeStringTokenizer11232005015829AM/JavaLikeStringTokenizer.aspx
split string using ";"
StringTokenizer st = new StringTokenizer(src1,";");
collect final String. that will be your ID.

You may try one of two options: (assuming your input string is in string str;
Approach 1
Get LastIndexOf(';')
Split the string based on the index. This will give you string and int part.
Split the string part and process it
Process the int part
Approach 2
Split the string on ;
Run a for loop - for (int i = 0; i < str.length - 2; i++) - this is the string part
Process str[length - 1] separately - this is the int part
Please take this as a starting point as there could be other approaches to implement a solution for this

string actual = "Doc1;Doc2;Doc3;12";
int lstindex = actual.LastIndexOf(';');
string strvalue = actual.Substring(0, lstindex);
string id = actual.Substring(lstindex + 1);

How to use replace only the first occurence of www

In my code behind in C# I have the following code. How do I change the replace so that only
the first occurance of www is replaced?
For example if the User enters www.testwww.com then I should be saving it as testwww.com.
Currently as per the below code it saves as www.com (guess due to substr code).
Please help. Thanks in advance.
private string FilterUrl(string url)
{
string lowerCaseUrl = url.ToLower();
lowerCaseUrl = lowerCaseUrl.Replace("http://", string.Empty).Replace("https://", string.Empty).Replace("ftp://", string.Empty);
lowerCaseUrl = lowerCaseUrl.Replace("www.", string.Empty);
string lCaseUrl = url.Substring(url.Length - lowerCaseUrl.Length, lowerCaseUrl.Length);
return lCaseUrl;
}

As Ally suggested. You are much better off using System.Uri. This also replaces the leading www as you wish.
private string FilterUrl(string url)
{
Uri uri = new UriBuilder(url).Uri; // defaults to http:// if missing
return Regex.Replace(uri.Host, "^www.", "") + uri.PathAndQuery;
}
Edit: The trailing slash is because of the PathAndQuery property. If there was no path you are left with the slash only. Just add another regex replace or string replace. Here's the regex way.
return Regex.Replace(uri.Host, "^www.", "") + Regex.Replace(uri.PathAndQuery, "/$", "");

I would suggest using indexOf(string) to find the first occurrence.
Edit: okay someone beat me to it ;)

You could use IndexOf like Felipe suggested OR do it the low tech way..
lowerCaseUrl = lowerCaseUrl.Replace("http://", string.Empty).Replace("https://", string.Empty).Replace("ftp://", string.Empty).Replace("http://www.", string.Empty).Replace("https://www.", string.Empty)
Would be interested to know what you're trying to achieve.

Came up with a cool static method, also works for replacing the first x occurrences:
public static string ReplaceOnce(this string s, string replace, string with)
{
return s.ReplaceCount(replace, with);
}
public static string ReplaceCount(this string s, string replace, string with, int howManytimes = 1)
{
if (howManytimes < 0) throw InvalidOperationException("can not replace a string less than zero times");
int count = 0;
while (s.Contains(replace) && count < howManytimes)
{
int position = s.IndexOf(replace);
s = s.Remove(position, replace.Length);
s = s.Insert(position, with);
count++;
}
return s;
}
The ReplaceOnce isn't necessary, just a simplifier. Call it like this:
string url = "http://www.stackoverflow.com/questions/www/www";
var urlR1 - url.ReplaceOnce("www", "xxx");
// urlR1 = "http://xxx.stackoverflow.com/questions/www/www";
var urlR2 - url.ReplaceCount("www", "xxx", 2);
// urlR2 = "http://xxx.stackoverflow.com/questions/xxx/www";
NOTE: this is case-sensitive as it is written

The Replace method will change all content of the string. You have to locate the piece you want to remove using IndexOf method, and remove using Remove method of string. Try something like this:
//include the namespace
using System.Globalization;
private string FilterUrl(string url)
{
// ccreate a Comparer object.
CompareInfo myCompare = CultureInfo.InvariantCulture.CompareInfo;
// find the 'www.' on the url parameter ignoring the case.
int position = myCompare.IndexOf(url, "www.", CompareOptions.IgnoreCase);
// check if exists 'www.' on the string.
if (position > -1)
{
if (position > 0)
url = url.Remove(position - 1, 5);
else
url = url.Remove(position, 5);
}
//if you want to remove http://, https://, ftp://.. keep this line
url = url.Replace("http://", string.Empty).Replace("https://", string.Empty).Replace("ftp://", string.Empty);
return url;
}
Edits
There was a part in your code that is removing a piece of string. If you just want to remove the 'www.' and 'http://', 'https://', 'ftp://', take a look the this code.
This code also ignore the case when it compares the url parameter and what you have been findind, on case, 'www.'.

Why isn't this C# code working? It should return the string between two other strings, but always returns an empty string

The title explains it all. It seems simple enough, so I must be overlooking something stupid. Here's what I've got.
private string getBetween(string strSource, string strStart, string strEnd)
{
int start, end;
if (strSource.Contains(strStart) && strSource.Contains(strEnd))
{
start = strSource.IndexOf(strStart, 0) + strStart.Length;
end = strSource.IndexOf(strEnd, start);
return strSource.Substring(start, end - start);
}
else
{
return "";
}
}
Thanks, guys.

Your code doesn't make sure that start and end are in order.
static string SubString(string source, string prefix, string suffix)
{
int start = source.IndexOf(prefix); // get position of prefix
if (start == -1)
return String.Empty;
int subStart = start + prefix.Length; // get position of substring
int end = source.IndexOf(suffix, subStart); // make sure suffix also exists
if (end == -1)
return String.Empty;
int subLength = end - subStart; // calculate length of substring
if (subLength == 0)
return String.Empty;
return source.Substring(subStart, subLength); // return substring
}

As couple of peoples said the problem that you code is working on very specific input, it's all because of this start and end IndexOf magic =) But when you try to update you code to work correct on more inputs you will get into problem that your code become very long with many indexes, comparsions, substrings, conditions and so on. To avoid this i like to recommend you use regular expressions with theirs help you can express what you need on special language.
Here is the sample which solves your problem with regular expressions:
public static string getBetween(string source, string before, string after)
{
var regExp = new Regex(string.Format("{0}(?<needle>[^{0}{1}]+){1}",before,after));
var matches = regExp.Matches(source).Cast<Match>(). //here we use LINQ to
OrderBy(m => m.Groups["needle"].Value.Length). //find shortest string
Select(m => m.Groups["needle"].Value); //you can use foreach loop instead
return matches.FirstOrDefault();
}
All tricky part is {0}(?<needle>[^{0}{1}]+){1} where 0 - before string and 1 - after string. This expression means that we nned to find string that lies beetween 0 and 1, and also don't contains 0 and 1.
Hope this helps.

I get the correct answer if I try any of these:
var a = getBetween("ABC", "A", "C");
var b = getBetween("TACOBBURRITO", "TACO", "BURRITO");
var c = getBetween("TACOBACONBURRITO", "TACO", "BURRITO");
The problem is likely with your input argument validation, as this fails:
var a = getBetween("ABC", "C", "A");
var a = getBetween("ABC", "C", "C");
You can improve your validation of the issue by writing some test cases like these as a separate fixture (xUnit, or main loop in throw away app).

Retrieving string from a source string which is between 2 strings

It might be very simple but would like to know that,is there any alternative to find a string between a source string which by passing it start and end string
the following is achievable by this code ,but this there any better code than this as i think this will slow the system if used in many conditions.
string strSource = "The LoadUserProfile call failed with the following error: ";
string strResult = string.Empty;
string strStart = "loaduserProfile";
string strEnd = "error";
int startindex = strSource.IndexOf(strStart, StringComparison.OrdinalIgnoreCase);
int endindex = strSource.LastIndexOf(strEnd, StringComparison.OrdinalIgnoreCase);
startindex = startindex + strStart.Length;
int endindex = endindex - startindex;
strResult = strSource.Substring(startindex, endindex);
Thanks
D.Mahesh

Use regex and find the group value, but not sure if it will be faster or slower.
Here is an example code to implement this using Regex (no VS, so excuse if there is syntax error)
string pattern = Regex.Escape(strStart) + "(?<middle>[\s\S]*)" + Regex.Escape(strEnd);
Match match = Regex.Match(strSource, pattern);
if (match.Success)
{
// read the group value matches the name "middle"
......
}

Your code is pretty spot-on string manipulation. I don't think it can be made faster algorithmically. You can also do this using a regular expression, but I don't believe it will end up being faster in that case as well.
If you don't need case insensitivity, changing StringComparison.OrdinalIgnoreCase to StringComparison.Ordinal should provide some speedup.
Otherwise, you probably have to look elsewhere for speed improvements.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

parse string value from a URL using c# - c#

Assuming currentURL is a string string result = new Uri(currentURL).Segments[1] result = result.Substring(0, result.Length - 1); Substring is needed because Segments[1] returns "site2/" instead of "site2"

currentURL = currentURL.ToLower().Replace("http://", ""); var arrayOfString = String.spilt(currentUrl.spit('/');

Related

Replacing characters in a string with another string

Unable to Split the string accordingly

How to use replace only the first occurence of www

Why isn't this C# code working? It should return the string between two other strings, but always returns an empty string

Retrieving string from a source string which is between 2 strings

Categories

Resources