Parsing the output of FtpWebRequest - c#

I can see that there are 2 types of responses:
Windows
Unix
Examples
"08-25-12 06:52AM 139874418 3.03.06P13.12NB.rar"
"-r-xr-xr-x 1 owner group 1 Jun 3 1999 NotCurrentYear.txt"
I need to parse these lines, and I used the following logic:
AnalyzedFolder folderToBeAnalyzed = new AnalyzedFolder();
folderToBeAnalyzed.Name = folder;
Job.AnalyzedFolders.Add(folderToBeAnalyzed);
FtpWebRequest request = (FtpWebRequest)WebRequest.Create(textBoxFTPSite.Text + folder);
request.Method = WebRequestMethods.Ftp.ListDirectoryDetails;
request.Credentials = new NetworkCredential(textBoxFTPUserName.Text, textBoxFTPPassword.Text);
FtpWebResponse response = (FtpWebResponse)request.GetResponse();
Stream responseStream = response.GetResponseStream();
StreamReader reader = new StreamReader(responseStream);
string[] outputlines = reader.ReadToEnd().Split(new string[] { Environment.NewLine }, StringSplitOptions.None);
foreach (string info in outputlines)
{
var tokens = info.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
string name;
string type;
string size;
DateTime dateModified;
string lsLine;
if (tokens.Length == 4) //WINDOWS
{
name = tokens[3];
if (tokens[2] == "<DIR>")
{
type = "D";
size = "";
}
else
{
type = "F";
size = tokens[2];
}
dateModified = DateTime.ParseExact(tokens[0] + " " + tokens[1], "MM-dd-yy h:mmtt", CultureInfo.InvariantCulture);
lsLine = info;
FTPFolderEntity entity = new FTPFolderEntity() { FolderName = folder, Name = name, Type = type, Size = size, DateModified = dateModified, LSLine = lsLine };
folderToBeAnalyzed.Entities.Add(entity);
}
else //UNIX
{
}
}
The problem is that the logic fails for a file name that contains spaces, such as:
"11-15-12 10:02PM 324 Copy (10) of 1040.txt.zip"
I suspect I will run into other problems like this one. Can anyone suggest a better way to parse these lines?

You can use a regular expression here to collapse runs of whitespace:
string info = "11-15-12 10:02PM 324 Copy (10) of 1040.txt.zip";
string result = Regex.Replace(info, @"\s\s+", " ");
After that, result is:
// result = "11-15-12 10:02PM 324 Copy (10) of 1040.txt.zip";
Added: if you want to limit the tokens, assuming the first is always the date, the second the time, the third the size, and the rest the file name:
var tokens = Regex.Split(info, @"\s+");
var newTokens = new string[]
{
tokens[0],
tokens[1],
tokens[2],
// Rejoin everything after the size into the file name, however many words it has.
string.Join(" ", tokens, 3, tokens.Length - 3)
};

You could do a split using a Regex.
var tokens = Regex.Split(info, @"\s+");
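Splitting on whitespace will always struggle with names that contain spaces. As a rough sketch of an alternative, a single regular expression with named groups can capture the date, the time, the size (or <DIR>), and then the rest of the line as the name. The class and method names below are hypothetical, and the pattern assumes the Windows-style line format shown in the question; Unix-style lines would need their own pattern.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of FtpWebRequest.
public static class FtpListParser
{
    // date, time, then "<DIR>" or a size, then everything else as the name.
    private static readonly Regex DosLine = new Regex(
        @"^(?<date>\d{2}-\d{2}-\d{2})\s+(?<time>\d{1,2}:\d{2}[AP]M)\s+(?<size><DIR>|\d+)\s+(?<name>.+?)\s*$");

    public static bool TryParseDosLine(string line, out DateTime modified,
        out bool isDirectory, out long size, out string name)
    {
        modified = default(DateTime);
        isDirectory = false;
        size = 0;
        name = null;

        Match m = DosLine.Match(line ?? string.Empty);
        if (!m.Success)
            return false; // not a Windows-style line (for example a Unix-style one)

        string stamp = m.Groups["date"].Value + " " + m.Groups["time"].Value;
        string[] formats = { "MM-dd-yy hh:mmtt", "MM-dd-yy h:mmtt" };
        if (!DateTime.TryParseExact(stamp, formats, CultureInfo.InvariantCulture,
                DateTimeStyles.None, out modified))
            return false;

        isDirectory = m.Groups["size"].Value == "<DIR>";
        if (!isDirectory && !long.TryParse(m.Groups["size"].Value, NumberStyles.None,
                CultureInfo.InvariantCulture, out size))
            return false;

        // The name group keeps any spaces inside the file name.
        name = m.Groups["name"].Value;
        return true;
    }
}
```

Inside the foreach loop from the question, each line could be passed to FtpListParser.TryParseDosLine, falling back to a Unix-style parser when it returns false.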

Download websites web pages in parallel

I have the following code. It works fine when I loop through the items one by one with foreach, but when I change it to use Parallel.ForEach I get errors. I'm trying to figure out how to correct this. FYI, the dParms list contains unique IDs.
The error I'm getting is:
Error writing to path..... ErrorMessage:Item has already been added.
Key in dictionary: 'IIS_WasUrlRewritten' Key being added:
'IIS_WasUrlRewritten',
private void GeneratePages(RequestStatus response, string localDir, List<Parameters> dParms, int total, DateTime generatedTime)
{
int current = 0;
Parallel.ForEach(dParms, curJob =>
{
try
{
DownloadPage(localDir, curJob, generatedTime);
}
catch (Exception ex)
{
response.Status = false;
response.Message = ex.Message;
}
finally
{
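// Use the value returned by Interlocked.Increment; reading "current" separately is racy.
int done = Interlocked.Increment(ref current);
if (done % 10 == 0)
{
// to do: send progress to UI
}
}
});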
}
private string DownloadPage(string localDir, Parameters p, DateTime generatedTime)
{
string strExtension = "html";
string url = string.Empty;
url = this.Url.Action("MyAction", "Home", new { area = "", @id = p.MyId, @generatedTime = generatedTime }, this.Request.Url.Scheme);
var document = new HtmlWeb().Load(url);
string strFileName = url.Substring(url.LastIndexOf("/") + 1);
strFileName = strFileName.Substring(0, strFileName.IndexOf("generatedTime") - 1);
string strDiskFileName = strFileName.Replace(".aspx?", "");
strDiskFileName = strDiskFileName.Replace("?", "");
strDiskFileName = strDiskFileName.Replace(".aspx", "");
strDiskFileName = strDiskFileName.Replace("&", "");
strDiskFileName = strDiskFileName.Replace("=", "");
strDiskFileName = strDiskFileName.Replace("%20", "");
strDiskFileName += "." + strExtension;
document.Save(localDir + strDiskFileName);
return url;
}

Reading a specific value from a GitHub text file

I would like to read the value assigned to a particular key from a text file on the Internet.
In the output "content" I get the complete content of the text file.
But I only want v7.7.3 from the line: version = "v7.7.3".
How can I filter for the version with the StreamReader?
This is the LastVersion.txt file:
[general]
version = "v7.7.3"
messagenew = "Works with June 2018 Update!\n Plus new Smart Farm strategy\n New Siege Machines\n For more information, go to \n https://mybot.run \n Always free and open source."
messageold = "A new version of MyBot (v7.7.3) is available!\nPlease download the latest from:\nhttps://mybot.run"
Updated: this is my current code.
public string myBotNewVersionURL = "https://raw.githubusercontent.com/MyBotRun/MyBot/master/LastVersion.txt";
public string myBotDownloadURL = null;
public string userDownloadFolder = @"C:\Users\XXX\Download\";
public string newMyBotVersion = null;
public string currentMyBotVersion = null;
public string currentMyBotFileName = null;
public string currentMyBotPath = null;
public void Btn_checkUpdate_Click(object sender, EventArgs e)
{
OpenFileDialog openCurrentMyBot = new OpenFileDialog();
openCurrentMyBot.Title = "Choose MyBot.run.exe";
openCurrentMyBot.Filter = "Application file|*.exe";
openCurrentMyBot.InitialDirectory = userDownloadFolder;
if (openCurrentMyBot.ShowDialog() == DialogResult.OK)
{
MyBot_set.SetValue("mybot_path", Path.GetDirectoryName(openCurrentMyBot.FileName));
MyBot_set.SetValue("mybot_exe", Path.GetFullPath(openCurrentMyBot.FileName));
string latestMyBotPath = Path.GetFullPath(openCurrentMyBot.FileName);
var latestMyBotVersionInfo = FileVersionInfo.GetVersionInfo(latestMyBotPath);
currentMyBotVersion = "v" + latestMyBotVersionInfo.FileVersion;
MyBot_set.SetValue("mybot_version", currentMyBotVersion);
WebClient myBotNewVersionClient = new WebClient();
Stream stream = myBotNewVersionClient.OpenRead(myBotNewVersionURL);
StreamReader reader = new StreamReader(stream);
String content = reader.ReadToEnd();
var sb = new StringBuilder(content.Length);
foreach (char i in content)
{
if (i == '\n')
{
sb.Append(Environment.NewLine);
}
else if (i != '\r' && i != '\t')
sb.Append(i);
}
content = sb.ToString();
var vals = content.Split(
new[] { Environment.NewLine },
StringSplitOptions.None
)
.SkipWhile(line => !line.StartsWith("[general]"))
.Skip(1)
.Take(1)
.Select(line => new
{
Key = line.Substring(0, line.IndexOf('=')),
Value = line.Substring(line.IndexOf('=') + 1).Replace("\"", "").Replace(" ", "")
});
newMyBotVersion = vals.FirstOrDefault().Value;
}
}
Reading from a local file:
var vals = File.ReadLines("..\\..\\test.ini")
.SkipWhile(line => !line.StartsWith("[general]"))
.Skip(1)
.Take(1)
.Select(line => new
{
Key = line.Substring(0, line.IndexOf('=')),
Value = line.Substring(line.IndexOf('=') + 1)
});
Console.WriteLine("Key : " + vals.FirstOrDefault().Key +
" Value : " + vals.FirstOrDefault().Value);
Updated:
File.ReadLines does not work with a URL, so to read from GitHub, download the content first:
string myBotNewVersionURL = "https://raw.githubusercontent.com/MyBotRun/MyBot/master/LastVersion.txt";
WebClient myBotNewVersionClient = new WebClient();
Stream stream = myBotNewVersionClient.OpenRead(myBotNewVersionURL);
StreamReader reader = new StreamReader(stream);
String content = reader.ReadToEnd();
var sb = new StringBuilder(content.Length);
foreach (char i in content)
{
if (i == '\n')
{
sb.Append(Environment.NewLine);
}
else if (i != '\r' && i != '\t')
sb.Append(i);
}
content = sb.ToString();
var vals = content.Split(
new[] { Environment.NewLine },
StringSplitOptions.None
)
.SkipWhile(line => !line.StartsWith("[general]"))
.Skip(1)
.Take(1)
.Select(line => new
{
Key = line.Substring(0, line.IndexOf('=')),
Value = line.Substring(line.IndexOf('=') + 1)
});
Console.WriteLine("Key : " + vals.FirstOrDefault().Key + " Value : " + vals.FirstOrDefault().Value);
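If only the version value is needed, a regular expression applied to the downloaded text may be simpler than skipping and taking lines. The following is a minimal sketch; the class and method names are hypothetical, and it assumes the file contains a single line of the form version = "..." and ignores section headers such as [general].

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration.
public static class VersionReader
{
    // Matches a line such as: version = "v7.7.3"
    private static readonly Regex VersionLine = new Regex(
        @"^\s*version\s*=\s*""(?<value>[^""]*)""",
        RegexOptions.Multiline | RegexOptions.IgnoreCase);

    // Returns the quoted value of the first "version" line, or null if there is none.
    public static string ExtractVersion(string content)
    {
        Match m = VersionLine.Match(content ?? string.Empty);
        return m.Success ? m.Groups["value"].Value : null;
    }
}
```

With the content string read from the StreamReader, VersionReader.ExtractVersion(content) returns v7.7.3 for the LastVersion.txt shown in the question.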

Checking whether a unique data is present in a json file

I need to check whether a word is present in a JSON file as a whole word. So if I'm searching for "root", the check should return false even though the word "byroots" contains "root".
Here's my code:
using (StreamReader r = new StreamReader("filename.json"))
{
string json1 = r.ReadToEnd();
if (json1.Contains("root"))
{
filename = path + @"" + branch + "-" + testsuite.Title + ".json";
}
}
I've also tried this condition:
if (json1.IndexOf(testsuite.Title, StringComparison.OrdinalIgnoreCase) >= 0)
But I'm getting the same results.
Here's the JSON data:
{
"LV": {
"build_number": "20180517.1",
"blah_blah": "blah",
"name": "byroots",
}
}
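You should use a Regex with word boundaries (\b), so that "root" matches only as a whole word and not inside "byroots":
var pattern = @"\broot\b";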
Regex rgx = new Regex(pattern);
using (StreamReader r = new StreamReader("filename.json"))
{
string json1 = r.ReadToEnd();
if (rgx.IsMatch(json1))
{
filename = path + @"" + branch + "-" + testsuite.Title + ".json";
}
}
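When the search term comes from a variable such as testsuite.Title, it is safer to escape it before building the pattern. A minimal sketch follows, with a hypothetical helper name; it assumes the term starts and ends with a letter, digit, or underscore, because \b only marks the edge between a word character and a non-word character.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration.
public static class WordSearch
{
    // True only if "word" occurs in "text" as a whole word, ignoring case.
    public static bool ContainsWholeWord(string text, string word)
    {
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        return Regex.IsMatch(text, pattern, RegexOptions.IgnoreCase);
    }
}
```

For the JSON in the question, WordSearch.ContainsWholeWord(json1, "root") is false, because "root" appears only inside "byroots".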

C# WebClient.DownloadFile to specific path

Hi, I'm trying to download a PNG file and save it to a custom location. I tried adding the location as an extra argument, but wc.DownloadFile does not accept three arguments. Does anyone have a suggestion? (rookie programmer)
If I change wc.DownloadFile to wc.DownloadFileAsync, I get an error on y[2].
string lookat = args[0];
string[] exploded = lookat.Split('/');
WebClient wc = new WebClient();
wc.Proxy = new WebProxy();
string content = wc.DownloadString(args[0]);
Regex rx = new Regex("data-id=\"(.*)\">");
MatchCollection matches = rx.Matches(content);
string uri = "http://" + exploded[2] + "/v2/photo/=";
string id = matches[0].ToString().Replace("\"", "").Replace(">", "").Replace("data-id=", "");
content = wc.DownloadString(uri + id);
string[] res = content.Split(new string[] { "filetobedownloaded_" }, StringSplitOptions.None);
foreach (string s in res)
{
if (s.Contains(".png"))
{
string[] y = s.Replace("\\", "").Split('"');
wc.DownloadFile(y[2], "filetobedownloaded_" + y[0].Replace("_png", ".jpg"));
}
}
DownloadFileAsync accepts a Uri rather than a string, so you should convert your download link to a Uri. The target folder goes into the file name argument itself, like this:
wc.DownloadFileAsync(new Uri(y[2]), "C:\\" + "filetobedownloaded_" + y[0].Replace("_png", ".jpg"));
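The second argument of DownloadFile and DownloadFileAsync is the full local file name, so a custom location only needs to be part of that string. As a small sketch, Path.Combine can build it without hand-written separators; the helper name and the folder used in the usage note are hypothetical.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration.
public static class DownloadPaths
{
    // Ensures the folder exists and returns folder + file name as one path.
    public static string BuildTargetPath(string folder, string fileName)
    {
        Directory.CreateDirectory(folder); // does nothing if the folder already exists
        // GetFileName drops any directory part that slipped into the name.
        return Path.Combine(folder, Path.GetFileName(fileName));
    }
}
```

In the loop above, this could be used as wc.DownloadFile(y[2], DownloadPaths.BuildTargetPath(@"C:\Downloads", "filetobedownloaded_" + y[0].Replace("_png", ".jpg"))), where C:\Downloads is only an example folder.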

Union of million line urls in 2 files

Files A and B each contain millions of URLs.
1. Go through the URLs in file A one by one.
2. Extract the host, e.g. subdomain.com from http://subdomain.com/path/file.
3. If subdomain.com exists in file B, save the URL to file C.
What is the quickest way to produce file C with C#?
Thanks.
When I use ReadLine instead, it does not make much difference.
// stat
DateTime start = DateTime.Now;
int totalcount = 0;
int n1;
if (!int.TryParse(num1.Text, out n1))
n1 = 0;
// memory
dZLinklist = new Dictionary<string, string>();
// read file
string fileName = openFileDialog1.FileName; // get file name
textBox1.Text = fileName;
StreamReader sr = new StreamReader(textBox1.Text);
string fullfile = File.ReadAllText(@textBox1.Text);
string[] sArray = fullfile.Split( '\n');
//IEnumerable<string> sArray = tool.GetSplit(fullfile, '\n');
//string sLine = "";
//while (sLine != null)
foreach ( string sLine in sArray)
{
totalcount++;
//sLine = sr.ReadLine();
if (sLine != null)
{
//string reg = "http[s]*://.*?/";
//Regex R = new Regex(reg, RegexOptions.Compiled);
//Match m = R.Match(sLine);
//if(m.Success)
int length = sLine.IndexOf(' ', n1); // default http://
if(length > 0)
{
//string urls = sLine.Substring(0, length);
dZLinklist[sLine.Substring(0,length)] = sLine;
}
}
}
TimeSpan time = DateTime.Now - start;
int count = dZLinklist.Count;
double sec = Math.Round(time.TotalSeconds,2);
label1.Text = "(" + totalcount + ")" + count.ToString() + " / " + sec + " = " + (Math.Round(count / sec,2)).ToString();
sr.Close();
I would use Microsoft Log Parser (MS LogParser) to process files this large. Are you limited to implementing it only in the way you described?
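If the work has to stay in C#, one option is to load the hosts from file B into a HashSet and stream file A against it, so each lookup takes roughly constant time. This is only a sketch: the class and method names are hypothetical, and it assumes each line holds a single absolute URL.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration.
public static class UrlFilter
{
    // Returns the host of a URL line, or null if the line is not an absolute URL.
    public static string HostOf(string line)
    {
        Uri uri;
        if (Uri.TryCreate(line.Trim(), UriKind.Absolute, out uri) && uri.Host.Length > 0)
            return uri.Host;
        return null;
    }

    // Yields the lines of A whose host also appears somewhere in B.
    public static IEnumerable<string> LinesWithKnownHost(
        IEnumerable<string> linesA, IEnumerable<string> linesB)
    {
        var hostsB = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (string line in linesB)
        {
            string host = HostOf(line);
            if (host != null)
                hostsB.Add(host);
        }

        foreach (string line in linesA)
        {
            string host = HostOf(line);
            if (host != null && hostsB.Contains(host))
                yield return line;
        }
    }
}
```

With File.ReadLines as the input, neither file has to be held in memory as one string: File.WriteAllLines("C.txt", UrlFilter.LinesWithKnownHost(File.ReadLines("A.txt"), File.ReadLines("B.txt"))), where the three file names are placeholders.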
