Converting email subject from "?UTF-8?..." to string? - c#

I'm using these techniques to convert =?utf-8?B?...?= to a readable string:
How convert email subject from “?UTF-8?…?=” to readable string?
string encode / decode
It works for simple input, but some of my input contains several consecutive =?utf-8?B?...?= blocks, for example:
"=?utf-8?B?2KfbjNmGINuM2qkg2YXYqtmGINiz2KfYr9mHINin2LPYqg==?= =?utf-8?B?2KfbjNmGINuM2qkg2YXYqtmGINiz2KfYr9mHINin2LPYqg==?= =?utf-8?B?2YbYr9is?="
I know the part between =?UTF-8?B? and ?= is a Base64-encoded string, but in this case I have no idea how to extract each part.

You can use a regex to extract each string between =?UTF-8?B? and ?=, then Base64-decode it. Here's an example:
string input = "=?utf-8?B?2KfbjNmGINuM2qkg2YXYqtmGINiz2KfYr9mHINin2LPYqg==?= =?utf-8?B?2KfbjNmGINuM2qkg2YXYqtmGINiz2KfYr9mHINin2LPYqg==?= =?utf-8?B?2YbYr9is?=";
// IgnoreCase so that "=?UTF-8?B?" and "=?utf-8?b?" are matched as well
Regex regex = new Regex(string.Format("{0}(.*?){1}", Regex.Escape("=?utf-8?B?"), Regex.Escape("?=")), RegexOptions.IgnoreCase);
var matches = regex.Matches(input);
foreach (Match match in matches)
{
    Console.WriteLine(
        Encoding.UTF8.GetString(Convert.FromBase64String(match.Groups[1].Value))
    );
}
This will print:
این یک متن ساده است
این یک متن ساده است
ندج
Don't forget to include these using statements:
using System.Text.RegularExpressions;
using System.Text;

Try with something like:
string str = "=?utf-8?B?2KfbjNmGINuM2qkg2YXYqtmGINiz2KfYr9mHINin2LPYqg==?= =?utf-8?B?2KfbjNmGINuM2qkg2YXYqtmGINiz2KfYr9mHINin2LPYqg==?= =?utf-8?B?2YbYr9is?=";
const string utf8b = "=?utf-8?B?";
var parts = str.Split(new[] { "?=" }, StringSplitOptions.None);
foreach (var part in parts)
{
    string str2 = part.Trim();
    if (str2.StartsWith(utf8b, StringComparison.OrdinalIgnoreCase))
    {
        str2 = str2.Substring(utf8b.Length);
        byte[] bytes = Convert.FromBase64String(str2);
        string final = Encoding.UTF8.GetString(bytes);
        Console.WriteLine(final);
    }
    else if (str2 == string.Empty)
    {
        // Nothing to do here
    }
    else
    {
        Console.WriteLine("Not recognized {0}", str2);
    }
}
Note that technically RFC 1342 (since superseded by RFC 2047) is a little more complex: instead of utf-8 you could have any encoding, and instead of B you could have Q (a variant of Quoted-Printable).
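To make that note concrete, here is a minimal sketch of a decoder that reads the charset and the B/Q flag from each encoded word instead of hard-coding utf-8 and B. The class and method names (EncodedWordDecoder, Decode, DecodeQ) are made up for this illustration and are not from either answer. It assumes well-formed input, and it assumes the runtime knows the charset: on .NET Core, charsets such as windows-1252 need the code pages encoding provider to be registered first.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class EncodedWordDecoder
{
    // One encoded word: =?charset?B-or-Q?payload?=
    private static readonly Regex Word =
        new Regex(@"=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=");

    // Whitespace that only separates two encoded words is not part of the text.
    private static readonly Regex Gap = new Regex(@"(?<=\?=)\s+(?==\?)");

    public static string Decode(string header)
    {
        string joined = Gap.Replace(header, string.Empty);
        return Word.Replace(joined, m =>
        {
            // Throws ArgumentException if the charset name is unknown.
            Encoding charset = Encoding.GetEncoding(m.Groups[1].Value);
            string payload = m.Groups[3].Value;
            bool isBase64 = m.Groups[2].Value.ToUpperInvariant() == "B";
            byte[] bytes = isBase64 ? Convert.FromBase64String(payload) : DecodeQ(payload);
            return charset.GetString(bytes);
        });
    }

    // Q encoding: '_' stands for a space, "=XX" for the byte with hex value XX.
    private static byte[] DecodeQ(string payload)
    {
        var bytes = new List<byte>();
        for (int i = 0; i < payload.Length; i++)
        {
            char c = payload[i];
            if (c == '_')
            {
                bytes.Add(0x20);
            }
            else if (c == '=' && i + 2 < payload.Length)
            {
                bytes.Add(Convert.ToByte(payload.Substring(i + 1, 2), 16));
                i += 2; // skip the two hex digits just consumed
            }
            else
            {
                bytes.Add((byte)c);
            }
        }
        return bytes.ToArray();
    }
}
```

Calling EncodedWordDecoder.Decode on the subject from the question returns the three decoded parts joined into one string, because the spaces between adjacent encoded words are dropped before decoding. Text outside encoded words is left untouched.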

Related

String replace in specific positioned in c#

I have the string Classic-T50 (Black+Grey).
When I send it in a query string, the next page shows
Classic-T50 (Black Grey)
so I want to put a + in place of the space, but only in the part of the string inside the brackets ():
Classic-T50 (Black+Grey)
I have tried string.Replace(" ","+"), but it produces
Classic-T50+(Black+Grey)
whereas I want Classic-T50 (Black+Grey).
Please help.
You can use a regular expression for replacing all spaces inside brackets:
var pattern = @"\s(?![^)]*\()";
var data = "Classic-T50 (Black Grey)";
var replacement = "+";
var regex = new Regex(pattern);
var transformedData = regex.Replace(data, replacement); // Classic-T50 (Black+Grey)
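This approach works when the bracketed part is the last thing in the string, because the pattern replaces every whitespace character that has no opening bracket somewhere after it. E.g., the string Caption ( A B C D ) will transform to Caption (+A+B+C+D+). Spaces in any text after the closing bracket would be replaced as well.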
Additional links:
Regex explanation: https://regex101.com/r/aN8fV2/1
MSDN: Regex.Replace Method (String, String)
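If text can follow the closing bracket, a different approach (not the one from the answer above) is to match each bracketed group as a whole and replace spaces only inside the match. The sketch below assumes brackets are not nested; the names BracketSpaces and PlusInsideBrackets are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BracketSpaces
{
    public static string PlusInsideBrackets(string input)
    {
        // \([^)]*\) matches one "(...)" group; the evaluator rewrites only that group.
        return Regex.Replace(input, @"\([^)]*\)", m => m.Value.Replace(' ', '+'));
    }
}
```

With this version, spaces before the opening bracket and after the closing bracket are left alone, and several bracketed groups in one string are each handled separately.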
What is the format of the strings that you want to modify? Will this code work?
void Main()
{
    var str = "Classic-T50 (Black Grey)";
    Console.WriteLine(FormatWithPlus(str));
}
public string FormatWithPlus(string str)
{
    var open = str.IndexOf('(');
    if (open < 0) return str; // no bracket: nothing to replace
    // Keep everything before '(' as-is; replace spaces from '(' onwards.
    return str.Substring(0, open) + str.Substring(open).Replace(' ', '+');
}
Use a StringBuilder and convert back to a string. StringBuilders differ from strings in that they're mutable and not interned. A StringBuilder stores its data in a character buffer, so you can replace characters much as you would in an array:
void Main()
{
    var input = "Classic-T50 (Black Grey)";
    StringBuilder inputsb = new StringBuilder(input);
    var openParens = input.IndexOf('(');
    var closeParens = input.IndexOf(')');
    var count = closeParens - openParens;
    //Console.WriteLine(input);
    //inputsb[18] = '+';
    inputsb.Replace(' ', '+', openParens, count);
    Console.WriteLine(inputsb.ToString());
}
See StringBuilder.Replace Method (Char, Char, Int32, Int32)
Try:
String.Replace("k ","k+")
Use this code:
public class Program
{
    private static void Main(string[] args)
    {
        const string abc = "Classic-T50 (Black Grey)";
        var a = abc.Replace("k ", "k+");
        Console.WriteLine(a);
        Console.ReadKey();
    }
}
You can UrlEncode it to preserve the special character:
Server.UrlEncode(mystring)
Or replace + sign with "%252b" and then encode it:
string myTitle = mystring.Trim().Replace("+", "%252b");
Response.Redirect("~/default.aspx?title=" + Server.UrlEncode(myTitle));
Remember to decode it when you retrieve it:
Server.UrlDecode(Request.QueryString["title"]);
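The reason the + disappears in the first place is that a literal + in a query string is read back as a space. As a sketch of the encoding idea outside ASP.NET's Server object, the example below uses Uri.EscapeDataString and Uri.UnescapeDataString from the base class library; the wrapper class QueryValue is a name made up for this illustration.

```csharp
using System;

public static class QueryValue
{
    // Percent-encodes the value so that '+' travels as %2B and ' ' as %20.
    public static string Encode(string value)
    {
        return Uri.EscapeDataString(value);
    }

    // Reverses Encode; a %2B comes back as a literal '+'.
    public static string Decode(string encoded)
    {
        return Uri.UnescapeDataString(encoded);
    }
}
```

Building the link as "~/default.aspx?title=" + QueryValue.Encode(title) keeps the + intact, so no replacement is needed on the receiving page.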

Using C# regex to get any words after one or more "\" characters

Edit: It turns out my code below does work, just not when entering the values from the debugger. The single backslash example failed when entered from the debugger because the single backslash was treated as an escape character rather than a backslash.
I'm trying to get a user alias from a string that may be any one of the following inputs:
alias
domain\alias
domain\\alias
My C# regex pattern looks like this:
string pattern = @"(.*\\)(.*)";
And I'm doing this in code:
string alias = Regex.Replace(input, pattern, "$2", RegexOptions.None);
This returns:
alias
domain\alias
alias
Note that it does not work for #2 (with a single '\'). What's the solution to make this work?
Here is the exact code of the method (sorry if this doesn't format nicely):
private string[] CreateEmailArrayFromString(string p)
{
    string[] address = new string[] { string.Empty };
    if (p != null)
    {
        address = p.Split(new char[] { ';', ',', ' ' }, StringSplitOptions.RemoveEmptyEntries);
        for (int i = 0; i < address.Length; i++)
        {
            if (address[i].Contains("@") == false)
            {
                string pattern = @"(.*\\)(.*)";
                address[i] = Regex.Replace(address[i], pattern, "$2", RegexOptions.None);
                address[i] = string.Concat(address[i].Trim(), "@mycompany.com");
            }
        }
    }
    return address;
}
This code does work:
string pattern = @"(.*\\)(.*)";
string alias = Regex.Replace(input, pattern, "$2", RegexOptions.None);
It probably won't work if the input is coming from the debugger. That is what was happening in my case.
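A quick way to check the pattern without going through the debugger is to feed it verbatim string literals, where a backslash is always a literal backslash. The helper below is only a sketch for that check; AliasParser and GetAlias are names invented here.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AliasParser
{
    public static string GetAlias(string input)
    {
        // Group 1 greedily takes everything up to the last backslash,
        // group 2 is whatever follows it. Input without a backslash does
        // not match at all and is returned unchanged.
        return Regex.Replace(input, @"(.*\\)(.*)", "$2");
    }
}
```

GetAlias(@"alias"), GetAlias(@"domain\alias") and GetAlias(@"domain\\alias") all return alias. By contrast, the regular literal "domain\alias" contains no backslash at all, because \a is the escape sequence for the alert character, which is the same kind of confusion described above.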
This is a LINQ solution that works for all three cases:
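// Requires: using System.Linq;
string str = @"domain\\alias";
string str2 = @"domain\alias";
string str3 = @"alias";
// Take the characters after the last backslash; the same expression works for str2 and str3.
string res = new string(str.Reverse().TakeWhile(c => c != '\\').Reverse().ToArray());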

How to replace email address with name in text using regular expressions in C#?

How can I replace the email addresses in a paragraph (held in a string) with names?
For example, xx@yahoo.com.my becomes xx, whatever the domain ending (.com, .ae, ...).
Input = "contact abc@yahoo.com or defg@hotmail.eu for more details"
Output = "contact Abc or Defg for more details"
Since you're asking for a Regex, I'm going to give you one.
// The quantifier sits inside each group so that the group captures the whole name/domain.
Regex regex = new Regex(@"([.a-zA-Z0-9]+)@([.a-zA-Z0-9]+)");
foreach (Match match in regex.Matches(inputString))
{
    // match.Value == "xx@yahoo.com.my"
    string name = match.Groups[1].Value; // "xx"
    string domain = match.Groups[2].Value; // "yahoo.com.my"
}
int end = myString.IndexOf('@');
string name = myString.Substring(0, end);
Try it like this.
You can read about the Substring function here:
http://www.dotnetperls.com/substring
string input = "contact abc@yahoo.com or defg@hotmail.eu for more details";
string pattern = @"(\S*)@\S*\.\S*";
string result = Regex.Replace(input, pattern, "$1");
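The requested output also capitalizes each name (Abc, Defg), which a plain "$1" substitution does not do. One possible way to get that, sketched here with a match evaluator, is shown below. The pattern is a deliberately simple assumption about what an address looks like, not a full email validator, and the names EmailNames and ReplaceWithNames are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailNames
{
    // local-part @ domain with at least one dot; a simplified, assumed address shape.
    private static readonly Regex Email =
        new Regex(@"([A-Za-z0-9._-]+)@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+");

    public static string ReplaceWithNames(string text)
    {
        return Email.Replace(text, m =>
        {
            string local = m.Groups[1].Value;
            // Upper-case the first character and keep the rest as written.
            return char.ToUpperInvariant(local[0]) + local.Substring(1);
        });
    }
}
```

Because the domain part must end in letters, digits or a hyphen, a full stop that ends the sentence right after an address stays in the text.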
public static string ReplaceEmail(string emailBody) {
    string scrubbedemailBody = emailBody;
    Regex regex = new Regex(@"(\.|[a-z]|[A-Z]|[0-9])*@(\.|[a-z]|[A-Z]|[0-9])*");
    scrubbedemailBody = regex.Replace(scrubbedemailBody, match => {
        return new string(' ', match.Length);
    });
    return scrubbedemailBody;
}

Using C# to edit text within a binary file

I have a binary file (i.e., it contains bytes with values between 0x00 and 0xFF). There are also ASCII strings in the file (e.g., "Hello World") that I want to find and edit using Regex. I then need to write out the edited file so that it's exactly the same as the old one but with my ASCII edits having been performed. How?
byte[] inbytes = File.ReadAllBytes(wfile);
string instring = utf8.GetString(inbytes);
// use Regex to find/replace some text within instring
byte[] outbytes = utf8.GetBytes(instring);
File.WriteAllBytes(outfile, outbytes);
Even if I don't do any edits, the output file is different from the input file. What's going on, and how can I do what I want?
EDIT: Ok, I'm trying to use the offered suggestion and am having trouble understanding how to actually implement it. Here's my sample code:
string infile = @"C:\temp\in.dat";
string outfile = @"C:\temp\out.dat";
Regex re = new Regex(@"H[a-z]+ W[a-z]+"); // looking for "Hello World"
byte[] inbytes = File.ReadAllBytes(infile);
string instring = new SoapHexBinary(inbytes).ToString();
Match match = re.Match(instring);
if (match.Success)
{
    // do work on 'instring'
}
File.WriteAllBytes(outfile, SoapHexBinary.Parse(instring).Value);
Obviously, I know I'll not get a match doing it that way, but if I convert my Regex to a string (or whatever), then I can't use Match, etc. Any ideas? Thanks!
Not all binary strings are valid UTF-8 strings. When you try to interpret the binary as a UTF-8 string, the bytes that can't be thus interpreted are probably getting mangled. Basically, if the whole file is not encoded text, then interpreting it as encoded text will not yield sensible results.
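One way to see the mangling, and a possible way around it that is not proposed in this thread, is to decode with ISO-8859-1 (Latin-1) instead of UTF-8. Latin-1 maps each byte value 0x00 to 0xFF to the character with the same code point, so decoding and re-encoding gives back the original bytes. The sketch below assumes the replacement text itself contains only Latin-1 characters; BinaryText, ReplaceText and SurvivesRoundTrip are names made up for the illustration.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public static class BinaryText
{
    // ISO-8859-1: one byte per character, every byte value is valid.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    public static byte[] ReplaceText(byte[] data, string pattern, string replacement)
    {
        string text = Latin1.GetString(data);               // lossless: 1 byte -> 1 char
        string edited = Regex.Replace(text, pattern, replacement);
        return Latin1.GetBytes(edited);                     // lossless for chars <= U+00FF
    }

    // True if decoding and re-encoding with 'encoding' reproduces the bytes exactly.
    public static bool SurvivesRoundTrip(byte[] data, Encoding encoding)
    {
        return encoding.GetBytes(encoding.GetString(data)).SequenceEqual(data);
    }
}
```

For a buffer containing a byte such as 0xFF, SurvivesRoundTrip returns false with Encoding.UTF8 and true with the Latin-1 encoding, which matches the behaviour described in the question. Note that the output can differ in length from the input when the replacement is shorter or longer than the matched text.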
An alternative to working on the raw binary is to convert it to a hex string, work on that (Regex can be used here), and then save it back:
byte[] buf = File.ReadAllBytes(file);
var str = new SoapHexBinary(buf).ToString();
//str=89504E470D0A1A0A0000000D49484452000000C8000000C808030000009A865EAC00000300504C544......
//Do your work
File.WriteAllBytes(file,SoapHexBinary.Parse(str).Value);
PS: SoapHexBinary is in the System.Runtime.Remoting.Metadata.W3cXsd2001 namespace.
I got it! Check out the code:
string infile = @"C:\temp\in.dat";
string outfile = @"C:\temp\out.dat";
Regex re = new Regex(@"H[a-z]+ W[a-z]+"); // looking for "Hello World"
string repl = @"Hi there";
Encoding ascii = Encoding.ASCII;
byte[] inbytes = File.ReadAllBytes(infile);
string instr = ascii.GetString(inbytes);
Match match = re.Match(instr);
int beg = 0;
bool replaced = false;
List<byte> newbytes = new List<byte>();
while (match.Success)
{
    replaced = true;
    for (int i = beg; i < match.Index; i++)
        newbytes.Add(inbytes[i]);
    foreach (char c in repl)
        newbytes.Add(Convert.ToByte(c));
    Match nmatch = match.NextMatch();
    int end = (nmatch.Success) ? nmatch.Index : inbytes.Length;
    for (int i = match.Index + match.Length; i < end; i++)
        newbytes.Add(inbytes[i]);
    beg = end;
    match = nmatch;
}
if (replaced)
{
    var newarr = newbytes.ToArray();
    File.WriteAllBytes(outfile, newarr);
}
else
{
    File.WriteAllBytes(outfile, inbytes);
}

How can I extract part of a string separated by spaces?

If I have string like:
String^ str ="hhB2LWq50a+9HZiNLKuwdQ==.pdf aaaaaaaa bbbbbbbbb cccccdddddeee ffffffgggghhh";
and I want to extract the first part of it which is
hhB2LWq50a+9HZiNLKuwdQ==.pdf
How can I do that in C++/CLI or C#?
You can use the String.Split() method:
string str ="hhB2LWq50a+9HZiNLKuwdQ==.pdf aaaaaaaa bbbbbbbbb cccccdddddeee";
string[] parts = str.Split(' ');
if (parts != null)
{
    string firstPart = parts[0];
}
Or using LINQ First():
using System.Linq;
string firstPart = str.Split(' ').First();
Use string.IndexOf to find the first space, then string.Substring to copy:
string str ="hhB2LWq50a+9HZiNLKuwdQ==.pdf aaaaaaaa bbbbbbbbb cccccdddddeee";
int spacePos = str.IndexOf(' ');
if (spacePos == -1)
    return null;
else
    return str.Substring(0, spacePos);
This assumes that the string doesn't have any leading spaces. If it can have leading spaces, you should probably call Trim on it first.
In C# it's easy:
string tem = "test test";
string[] s = tem.Split(' ');
Console.WriteLine(s[0]);
Console.ReadLine();
You can also use a regular expression to parse your string and extract the desired text.
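As a sketch of that regular-expression route, the pattern \S+ matches the first run of non-whitespace characters, which also copes with leading spaces. The names FirstToken and Get are invented for the example, and returning an empty string when nothing is found is an arbitrary choice.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FirstToken
{
    public static string Get(string input)
    {
        // Regex.Match returns the first (leftmost) run of non-whitespace characters.
        Match m = Regex.Match(input, @"\S+");
        return m.Success ? m.Value : string.Empty;
    }
}
```

For the string in the question, FirstToken.Get returns hhB2LWq50a+9HZiNLKuwdQ==.pdf.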
