How to decode "\u0026" in a URL? - c#

I want to decode URL A to B:
A) http:\/\/example.com\/xyz?params=id%2Cexpire\u0026abc=123
B) http://example.com/xyz?params=id,expire&abc=123
This is a sample URL, and I am looking for a general solution, not A.Replace("\/", "/")...
Currently I use HttpUtility.UrlDecode(A, Encoding.UTF8), and I have tried other encodings, but I cannot generate URL B.

You only need this function
System.Text.RegularExpressions.Regex.Unescape(str);
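Note that Regex.Unescape only deals with the backslash escapes ("\/" and "\u0026"); the percent-encoded "%2C" is left alone. A minimal sketch that chains it with Uri.UnescapeDataString is shown below. The class and method names (EscapedUrl, Decode) are made up for illustration, and it assumes the input contains no backslash sequences that Regex.Unescape would reject or read as regex-specific escapes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapedUrl
{
    // Hypothetical helper: undo the backslash escapes first, then the percent-encoding.
    public static string Decode(string escaped)
    {
        // Regex.Unescape turns "\/" into "/" and "\u0026" into "&".
        string unescaped = Regex.Unescape(escaped);

        // Uri.UnescapeDataString turns "%2C" into ",".
        return Uri.UnescapeDataString(unescaped);
    }
}
```

Calling EscapedUrl.Decode on URL A from the question should give URL B. Percent-decoding the whole URL is what the question asks for, but it also decodes any encoded "&" or "=" inside parameter values, so it may not be appropriate if the result is parsed again afterwards.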

This is a basic example I was able to come up with:
static void Sample()
{
var str = @"http:\/\/example.com\/xyz?params=id%2Cexpire\u0026abc=123";
str = str.Replace("\\/", "/");
str = HttpUtility.UrlDecode(str);
// \uXXXX escapes use four hex digits, so match hex digits rather than \d
str = Regex.Replace(str, @"\\u(?<code>[0-9A-Fa-f]{4})", CharMatch);
Console.Out.WriteLine("value = {0}", str);
}
private static string CharMatch(Match match)
{
var code = match.Groups["code"].Value;
int value = Convert.ToInt32(code, 16);
return ((char) value).ToString();
}
This probably misses a lot, depending on the kinds of URLs you are going to get. It has no error checking and does not handle escaped literals (for example, \\u0026 should become \u0026). I'd recommend writing a few unit tests around this with various inputs to get started.

Related

Remove parts of string

I have the following string
string a = @"\\server\MainDirectory\SubDirectoryA\SubdirectoryB\SubdirectoryC\Test.jpg";
I'm trying to remove part of the string so in the end I want to be left with
string a = @"\\server\MainDirectory\SubDirectoryA\SubdirectoryB";
So currently I'm doing
string b = a.Remove(a.LastIndexOf('\\'));
string c = b.Remove(b.LastIndexOf('\\'));
Console.WriteLine(c);
which gives me the correct result. I was wondering if there is a better way of doing this, because I have to do it in a fair few places.
Note: the length of SubdirectoryC is unknown, as it is made up of the numbers/letters a user inputs.
There is Path.GetDirectoryName
string a = @"\\server\MainDirectory\SubDirectoryA\SubdirectoryB\SubdirectoryC\Test.jpg";
string b = Path.GetDirectoryName(Path.GetDirectoryName(a));
As explained on MSDN, it also works if you pass a directory:
....passing the returned path back into the GetDirectoryName method will
result in the truncation of one folder level per subsequent call on
the result string
Of course, this is only safe if the path has at least two directory levels.
Heyho,
if you just want to get rid of the last part.
You can use :
var parentDirectory = Directory.GetParent(Path.GetDirectoryName(path));
https://msdn.microsoft.com/de-de/library/system.io.directory.getparent(v=vs.110).aspx
An alternative answer using Linq:
var b = string.Join("\\", a.Split(new string[] { "\\" }, StringSplitOptions.None)
.Reverse().Skip(2).Reverse());
Some alternatives
string a = @"\\server\MainDirectory\SubDirectoryA\SubdirectoryB\SubdirectoryC\Test.jpg";
var b = Path.GetFullPath(a + @"\..\..");
var c = a.Remove(a.LastIndexOf('\\', a.LastIndexOf('\\') - 1));
but I do find this kind of string extension generally useful:
static string beforeLast(this string str, string delimiter)
{
int i = str.LastIndexOf(delimiter);
if (i < 0) return str;
return str.Remove(i);
}
For such repeated tasks, a good solution is often to write an extension method, e.g.
public static class Extensions
{
public static string ChopPath(this string path)
{
// chopping code here
}
}
Which you then can use anywhere you need it:
var chopped = a.ChopPath();
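As a rough illustration of what the "chopping code" could look like, here is a sketch built on the LastIndexOf approach from the question. The levels and separator parameters are assumptions added for this example, and the method works purely on the string: it knows nothing about UNC roots or drive letters, so chopping too many levels will happily cut into "\\server".

```csharp
using System;

public static class PathExtensions
{
    // Hypothetical implementation: removes the last `levels` segments of a
    // separator-delimited path by walking backwards over the separators.
    public static string ChopPath(this string path, int levels = 2, char separator = '\\')
    {
        if (path == null) throw new ArgumentNullException(nameof(path));

        int end = path.Length;
        for (int i = 0; i < levels && end > 0; i++)
        {
            int index = path.LastIndexOf(separator, end - 1);
            if (index < 0) break; // no separator left, stop chopping
            end = index;
        }
        return path.Substring(0, end);
    }
}
```

With the path from the question, a.ChopPath() should leave the path ending in SubdirectoryB, the same result as the two chained Remove calls.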

How to convert a string containing escape characters to a string

I have a string that is returned to me which contains escape characters.
Here is a sample string
"test\40gmail.com"
As you can see, it contains escape characters. I need it to be converted to its real value, which is
"test@gmail.com"
How can I do this?
If you are looking to replace all escaped character codes, not only the code for @, you can use this snippet of code to do the conversion:
public static string UnescapeCodes(string src) {
var rx = new Regex("\\\\([0-9A-Fa-f]+)");
var res = new StringBuilder();
var pos = 0;
foreach (Match m in rx.Matches(src)) {
res.Append(src.Substring(pos, m.Index - pos));
pos = m.Index + m.Length;
res.Append((char)Convert.ToInt32(m.Groups[1].ToString(), 16));
}
res.Append(src.Substring(pos));
return res.ToString();
}
The code relies on a regular expression to find all sequences of hex digits, converting them to int, and casting the resultant value to a char.
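One thing to watch for: because the pattern above accepts any number of hex digits, an input such as "test\40aol.com" would be read as the code 40a rather than 40 followed by "aol". If the escapes are always exactly two hex digits, as in the sample, a variant along these lines may be safer. This is only a sketch; the names HexEscapes and Unescape are made up here, and it is not a complete implementation of the JID escaping rules.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HexEscapes
{
    // A backslash followed by exactly two hex digits.
    private static readonly Regex TwoHexDigits = new Regex(@"\\([0-9A-Fa-f]{2})");

    // Hypothetical helper: replaces each \HH sequence with the character
    // whose code is the hex value HH.
    public static string Unescape(string src)
    {
        return TwoHexDigits.Replace(
            src,
            m => ((char)Convert.ToInt32(m.Groups[1].Value, 16)).ToString());
    }
}
```

For example, HexEscapes.Unescape(@"test\40aol.com") should give "test@aol.com".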
string test = @"test\40gmail.com";
test = test.Replace(@"\40", "@");
If you want a more general approach ...
HTML Decode
The sample string provided ("test\40gmail.com") is JID escaped. It is not malformed, and HttpUtility/WebUtility will not correctly handle this escaping scheme.
You can certainly do it with string or regex functions, as suggested in the answers from dasblinkenlight and C.Barlow. This is probably the cleanest way to achieve the desired result. I'm not aware of any .NET libraries for decoding JID escaping, and a brief search hasn't turned up much. Here is a link to some source which may be useful, though.
I just wrote this piece of code and it seems to work beautifully... It requires that the escape sequence is in hex, and it is valid for values 0x00 to 0xFF.
// Example
str = remEscChars(@"Test\x0D"); // str = "Test\r"
Here is the code.
private string remEscChars(string str)
{
int pos;
// Replace each \xHH sequence with the character whose code is the hex value HH.
while ((pos = str.IndexOf(@"\x")) >= 0)
{
string subStr = str.Substring(pos + 2, 2);
string escStr = Convert.ToString(Convert.ToChar(Convert.ToInt32(subStr, 16)));
str = str.Replace(@"\x" + subStr, escStr);
}
return str;
}
.NET provides the static methods Regex.Unescape and Regex.Escape to perform this task and back again. Regex.Unescape will do what you need.
https://learn.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regex.unescape

Splitting a string at every character

Let's say I have a string "hello world". I would like to end up with " dehllloorw". As I couldn't find any ready-made solution, I thought I could split the string into a character array, sort it, and convert it back to a string.
In Perl I can do s//, but in .NET I'd have to do a .Split(), and there's no overload with no parameters... If I do .Split(null) it seems to split by whitespace, and .Split('') won't compile.
How do I do this (I hate to run a loop!)?
Array.Sort("hello world".ToCharArray());
Below is a quick demo console app
class Program
{
static void Main(string[] args)
{
var array = "hello world".ToCharArray();
Array.Sort(array);
Console.WriteLine(new String(array));
Console.ReadLine();
}
}
The characters in a string can be used directly, since the string class exposes them as an enumerable sequence. Combine that with LINQ / OrderBy and you have a one-liner to create the ordered output string:
string myString = "hello world";
string output = new string(myString.OrderBy(x => x).ToArray()); // " dehllloorw"
You could always do this:
private static string SortStringCharacters(string value)
{
if (value == null)
return null;
// List<T>.Sort() returns void, so sort a char array in place instead
char[] chars = value.ToCharArray();
Array.Sort(chars);
return new string(chars);
}

C#/.NET: Reformatting a very long string

I need to read a string, character by character, and build a new string as the output.
What's the best approach to do this in C#?
Use a StringBuilder? Use some writer/stream?
Note that there will be no I/O operations--this is strictly an in-memory transformation.
If the size of the string cannot be determined at compile time and it may also be relatively large, you should use a StringBuilder for concatenation as it acts like a mutable string.
var input = SomeLongString;
// may as well initialize the capacity as well
// as the length will be 1 to 1 with the unprocessed input.
var sb = new StringBuilder( input.Length );
foreach( char c in input )
{
sb.Append( Process( c ) );
}
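For completeness, a self-contained version of the loop above might look like the following sketch. Process is not defined in the answer, so the upper-casing used here is only a placeholder for whatever per-character logic is needed, and the class and method names are made up for the example.

```csharp
using System.Text;

public static class Reformatter
{
    // Placeholder for the real per-character transformation (an assumption).
    private static char Process(char c) => char.ToUpperInvariant(c);

    public static string Transform(string input)
    {
        // One output character per input character, so the final length is known up front.
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            sb.Append(Process(c));
        }
        return sb.ToString();
    }
}
```

If the transformation can emit more or fewer characters than it reads, input.Length is still a reasonable starting capacity; the builder grows as needed.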
If it's just one string, you can use a collection to hold your characters and then create the string with the constructor that takes a char[]:
IEnumerable<char> myChars = ...;
string result = new string(myChars.ToArray());
Using LINQ and a method ProcessChar(char c) that transforms each character to its output value, this can be a simple query. The string class has no constructor that takes an IEnumerable<char>, so the query result is converted to a char[] with ToArray() first:
string result = new string(sourceString.Select(c => ProcessChar(c)).ToArray());
ToArray() buffers the characters before the string is built, so for most inputs the cost is comparable to a StringBuilder, and the code is much more readable in my opinion.
Stringbuilder is usually a pretty good bet. I've written lots of javascript in webpages using it.
A StringBuilder is a good idea for building your new string, because you can efficiently append new values to it. As for reading the characters from the input string, a StringReader would be a sufficient choice.
void Main()
{
string myLongString = "lf;kajsd;lfkjal;dfkja;lkdfja;lkdjf;alkjdfa";
var transformedTString = string.Join(string.Empty, myLongString.ToCharArray().Where(x => x != ';'));
transformedTString.Dump();
}
If you have more complicated logic, you can move your validation to a separate predicate method:
void Main()
{
string myLongString = "lf;kajsd;lfkjal;dfkja;lkdfja;lkdjf;alkjdfa";
var transformedTString = string.Join(string.Empty, myLongString.ToCharArray().Where(MyPredicate));
transformedTString.Dump();
}
public bool MyPredicate(char c)
{
return c != ';';
}
What's the difference between the input string and the output string? I mean, why do you have to read it char by char?
I use this method for reading a string:
string str = "some stuff";
string newStr = ToNewString(str);
string ToNewString(string arg)
{
string r = string.Empty;
foreach (char c in arg)
r += DoWork(c);
return r;
}
char DoWork(char arg)
{
// What do you want to do here?
}

What should I name this Extension method?

I have written an extension method for string manipulation. I'm unsure what to name it, since it will become part of the base library that the front-end developers in the team will use. Here's the profile of the class member.
Info: Utility extension method for String types. Overloads of this method may do the same for characters other than space [whichever is supplied as the argument].
Purpose: Trims all intermediate (in-between) runs of spaces down to a single space.
Ex:
string Input = "Hello    Token1  Token2     Token3 World!  ";
string Output = Input.TrimSpacesInBetween();
//Output will be: "Hello Token1 Token2 Token3 World!"
I have read [in fact I'm reading] the Framework Design guidelines but this seems to be bothering me.
Some options I think..
TrimIntermediate();
TrimInbetween();
Here's the code on Request:
It's recursive..
public static class StringExtensions
{
public static string Collapse(this string str)
{
return str.Collapse(' ');
}
public static string Collapse(this string str, char delimeter)
{
char[] delimeterts = new char[1];
delimeterts[0] = delimeter;
str = str.Trim(delimeterts);
int indexOfFirstDelimeter = str.IndexOf(delimeter);
int indexTracker = indexOfFirstDelimeter + 1;
while (str[indexTracker] == delimeter)
indexTracker++;
str = str.Remove(indexOfFirstDelimeter + 1, indexTracker - indexOfFirstDelimeter - 1);
string prevStr = str.Substring(0, indexOfFirstDelimeter + 1);
string nextPart = str.Substring(indexOfFirstDelimeter + 1);
if (indexOfFirstDelimeter != -1)
nextPart = str.Substring(indexOfFirstDelimeter + 1).Collapse(delimeter);
string retStr = prevStr + nextPart;
return retStr;
}
}
What about CollapseSpaces?
CollapseSpaces is good for just spaces, but to allow for the overloads you might want CollapseDelimiters or CollapseWhitespace if it's really just going to be for various whitespace characters.
Not really an answer, more a comment on your posted code...
You could make the method a lot shorter and more understandable by using a regular expression. (My guess is that it would probably perform better than the recursive string manipulations too, but you would need to benchmark to find out for sure.)
public static class StringExtensions
{
public static string Collapse(this string str)
{
return str.Collapse(' ');
}
public static string Collapse(this string str, char delimiter)
{
str = str.Trim(delimiter);
string delim = delimiter.ToString();
return Regex.Replace(str, Regex.Escape(delim) + "{2,}", delim);
}
}
In ruby I believe they call this squeeze
NormalizeWhitespace ?
This way is more clear that there will be a usable value left after processing.
As others have stated earlier, 'Collapse' sounds somewhat rigorous and might even mean that it can return an empty string.
Try this, it works for me and seems to be a lot less complicated than a recursive solution...
public static class StringExtensions
{
public static string NormalizeWhitespace(this string input, char delim)
{
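// Regex.Escape keeps delimiters such as '.', '^' or ']' from being read as regex syntax
return System.Text.RegularExpressions.Regex.Replace(input.Trim(delim), System.Text.RegularExpressions.Regex.Escape(delim.ToString()) + "{2,}", delim.ToString());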
}
}
It can be called as such:
Console.WriteLine(input.NormalizeWhitespace(' '));
CollapseExtraWhitespace
PaulaIsBrilliant of course!
How is makeCompact?
