I have a script that scrapes a website, but the while loop does not work. The script downloads a page and then checks whether it is an HTML page or an image.
If the downloaded item is an HTML file, it saves it and increments both i and the URL number.
Problem: The URL does not change, even though I think it should with this code.
int i = 0;
while (i < 5)
{
using var client = new WebClient();
client.Headers.Add("User-Agent", "C# console program");
int urlnumb = 1;
string url = "http://localhost:7211/database/resource/pk/" + urlnumb;
string content = client.DownloadString(url);
string htmldefiner = "html";
if (content.Contains(htmldefiner))
{
string savedirectory = @"C:/Temp/" + i + ".html";
System.IO.File.WriteAllText(savedirectory, content);
i++;
urlnumb++;
File.WriteAllText(@"C:/Temp/" + i + ".txt", url);
}
else
{
urlnumb++;
}
}
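A likely cause, judging only from the code shown: `int urlnumb = 1;` is declared inside the loop body, so it is reset to 1 on every iteration and the `urlnumb++` calls have no lasting effect. The sketch below moves the counter outside the loop. `CrawlSketch`, `CollectHtmlUrls` and the `download` delegate are hypothetical names; the delegate stands in for `WebClient.DownloadString` so the loop can be tried without a server.

```csharp
using System;
using System.Collections.Generic;

public static class CrawlSketch
{
    // Hypothetical helper: 'download' stands in for client.DownloadString(url).
    public static List<string> CollectHtmlUrls(Func<string, string> download, string baseUrl, int wanted)
    {
        var saved = new List<string>();
        int urlNumber = 1;                 // declared once, outside the loop
        while (saved.Count < wanted)
        {
            string url = baseUrl + urlNumber;
            string content = download(url);
            if (content.Contains("html"))
            {
                saved.Add(url);            // the original script writes the file here
            }
            urlNumber++;                   // advance whether or not the page was HTML
        }
        return saved;
    }
}
```

In the original script the same change amounts to moving `int urlnumb = 1;` above `while (i < 5)`.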
I have pulled some information from the internet using HtmlAgilityPack. No problem.
I then pass the InnerHtml through a method I took from Stack Overflow (this removes markup and turns it into plain text).
I then call a method that returns a bool to determine whether the new output is the same as a txtInput on the form. It returns false even though they are the same.
I know nothing about Unicode, UTF-8, character bytes, etc., though I'm assuming the underlying bytes are different even though the strings appear the same. How can I get around this problem?
This is the string in the input box, the same one it pulls from HtmlAgilityPack:
"When I Grow Up (feat. Lauren Ward & Bailey Ryon)"
These are the two outputs side by side.
As you can see from the pictures, at face value they look exactly the same, yet it returns false. How can I fix this?
Here is my code:
This checks whether the values match and always returns false.
private bool CheckText(string node)
{
string value = HtmlToPlainText(txtSong.Text);
if (value == node)
return true;
else
return false;
}
This is the method that actually pulls the data. If it matches, it opens the page; if it doesn't, it retries.
private void pullTable(int pageNum, string keyWord, int resultStart)
{
int countCheck = 0;
while (countCheck == 0)
{
System.Threading.Thread.Sleep(3000);
HtmlWeb web = new HtmlWeb();
string amazon = "https://www.amazon.co.uk/s/ref=nb_sb_noss_2?url=search-alias%3Ddigital-music&page=" + pageNum + "";
if (txtSong.Text != "")
{
string temp = txtSong.Text.Replace("(", "%28");
temp = temp.Replace(")", "%29"); // ")" percent-encodes to %29 (%26 is "&")
amazon = amazon + "&field-keywords=" + temp;
}
if (txtArtist.Text != "")
{
string temp = txtArtist.Text.Replace("(", "%28");
temp = temp.Replace(")", "%29"); // ")" percent-encodes to %29 (%26 is "&")
amazon = amazon + "&field-author=" + temp;
}
if (radioArtistAZ.Checked)
amazon = amazon + "&sort=artist-album-asc-rank";
else if (radioArtistZA.Checked)
amazon = amazon + "&sort=artist-album-desc-rank";
else if (radioSongAZ.Checked)
amazon = amazon + "&sort=title-asc-rank";
else if (radioSongZA.Checked)
amazon = amazon + "&sort=title-desc-rank";
{
}
var doc = web.Load(amazon);
System.Threading.Thread.Sleep(200);
var nodes = doc.DocumentNode.SelectNodes("//body");
try
{
nodes = doc.DocumentNode.SelectNodes("//tr[starts-with(@id, 'result_')]/td[2]/div/a");
}
catch (Exception)
{
}
try
{
for (int i = 0; i < 50; i++)
{
// string tempValue = nodes[i].InnerHtml.Replace("&amp;", "&");
var plainText = HtmlToPlainText(nodes[i].InnerText);
if (CheckText(plainText))
{
AppendTextBox("Opening on page " + pageNum);
System.Diagnostics.Process.Start(amazon);
found = 1;
countCheck = 1;
return;
}
else
{
}
}
countCheck = 1;
AppendTextBox("Not found on page " + pageNum);
}
catch (Exception)
{
AppendTextBox("error on page " + pageNum);
System.Threading.Thread.Sleep(1500);
}
}
}
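Strings that look identical on screen often differ in characters that render the same, for example a non-breaking space (U+00A0) versus a normal space, or an undecoded entity such as `&amp;`. Whether that is what happens here is an assumption; the sketch below shows one way to check and to compare more tolerantly. `TextCompareSketch`, `Canonicalize`, `SameText` and `DumpCodePoints` are hypothetical names; `WebUtility.HtmlDecode` and `string.Normalize` are standard .NET calls.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Text;

public static class TextCompareSketch
{
    // Decode HTML entities, fold compatibility characters (NFKC turns
    // U+00A0 into a plain space), and trim surrounding whitespace.
    public static string Canonicalize(string s)
    {
        if (s == null) return string.Empty;
        string decoded = WebUtility.HtmlDecode(s);
        string normalized = decoded.Normalize(NormalizationForm.FormKC);
        return normalized.Trim();
    }

    // Ordinal comparison of the canonical forms.
    public static bool SameText(string a, string b)
    {
        return string.Equals(Canonicalize(a), Canonicalize(b), StringComparison.Ordinal);
    }

    // Diagnostic: list each UTF-16 code unit in hex to spot invisible differences.
    public static string DumpCodePoints(string s)
    {
        return string.Join(" ", s.Select(c => ((int)c).ToString("X4")));
    }
}
```

Calling `DumpCodePoints` on both `value` and `node` inside `CheckText` would show exactly where the two strings diverge, which is more reliable than comparing screenshots.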
I have developed an ASP.NET C# web page that allows users to download or view server logs.
Once the server and log date are selected, they have the option to either open the log in Notepad++ or view part of it in a textbox.
Part of the requirement is to show only the last 50 lines of the log in the textbox. This is the only part I'm not sure of. Can anyone point me in the right direction?
At the moment I build up the path and then set the Text property of the textbox as follows:
_PathFrom = @"\\" + ddlServer.SelectedItem.Value + @"\Logs\" + AppOrSession.SelectedItem.Value + @"\" + ddlKernel.SelectedItem.Value + @"\" + txtLogName.Text;
WebClient MyClient = new WebClient();
_Log = MyClient.DownloadString(_PathFrom);
txtLog.Text = _Log;
thanks
Use this method to pick the last 50 lines from the file and display them on the front end:
public static IList<string> GetLog(string logname, string numrows)
{
int lineCnt = 1;
List<string> lines = new List<string>();
int maxLines;
if (!int.TryParse(numrows, out maxLines))
{
maxLines = 50;
}
string logFile = HttpContext.Current.Server.MapPath("~/" + logname);
BackwardReader br = new BackwardReader(logFile);
while (!br.SOF)
{
string line = br.Readline();
lines.Add(line + System.Environment.NewLine);
if (lineCnt == maxLines) break;
lineCnt++;
}
lines.Reverse();
return lines;
}
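`BackwardReader` in the method above does not appear to be a .NET Framework class, so it has to come from elsewhere. If reading the file forwards is acceptable, a bounded queue gives the same result with only built-in types. This is a minimal sketch; `LogTailSketch` and `LastLines` are hypothetical names, and it assumes the account running the page can read the path directly.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LogTailSketch
{
    // Returns at most maxLines lines from the end of the file, in file order.
    public static List<string> LastLines(string path, int maxLines)
    {
        if (maxLines <= 0) return new List<string>();

        var tail = new Queue<string>(maxLines);
        foreach (string line in File.ReadLines(path))   // streams the file line by line
        {
            if (tail.Count == maxLines)
            {
                tail.Dequeue();                         // drop the oldest kept line
            }
            tail.Enqueue(line);
        }
        return new List<string>(tail);
    }
}
```

With the names from the question, usage could look like `txtLog.Text = string.Join(Environment.NewLine, LogTailSketch.LastLines(_PathFrom, 50));`. The whole file is still read once, so for very large logs a reader that seeks from the end would be faster.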
I am not sure whether this is a server issue or a coding problem. My strong guess is that it is a server issue, because it works fine on my local system.
I have upgraded an ASP.NET 1.1 website to 4.0. In the application, a file is created from the values the user enters on the forms. The file is saved in the attachments folder in the application. After the file is created, an email is sent to the administrator with the file attached.
On my local system the email is sent just fine. As the application was built in 1.1, CDO is used to send the emails. When I publish the application on the server, attaching the file fails and the following error is displayed:
The process cannot access the file 'E:\HostingSpaces\testuser\testapplication.mydomain.com\wwwroot\eTest\Attachment\4orsysil3dulr1iv1thvpade\ef_Comp.exp' because it is being used by another process.
I have given read, write, and delete access to the attachments folder. If there were a problem in the code, it should also affect the application on the local system. I have checked that every StreamWriter is closed everywhere it is used.
If this is a server error, what could be the reason?
Edit:
The code is very old, written a long time ago, and it was working just fine. What I have done is change the email sending code and specify SSL and a new port for sending the email. Other than that, it was not giving an error before.
Below is the function used to generate the files when the user submits the forms. Multiple files are generated for each form.
private bool GenerateFile()
{
string strSupportDocFile=string.Empty;
string strBespokeFile=string.Empty;
EFormDetails objEFormDetails=new EFormDetails();
DataRow drEForm=objEFormDetails.ResultRow;
string strDirPath = Server.MapPath(@"Attachment/" + Session.SessionID);
try
{
if (!Directory.Exists(strDirPath))
{
Directory.CreateDirectory(strDirPath);
}
StreamWriter ef_File;
StringBuilder strFile=new StringBuilder();
ef_File = new StreamWriter(Server.MapPath(@"Attachment/" + Session.SessionID + @"/" + ConstantsData.EF_COMP_FILENAME));
if(objEFormDetails._EF_COMP != string.Empty)
{
strFile.Append(objEFormDetails._EF_COMP);
ef_File.WriteLine(strFile.ToString());
ef_File.Close();
sbOnFloppyComp=strFile;
}
ef_File = null;
strFile=null;
StringBuilder ef_Cost=new StringBuilder();
strFile=new StringBuilder();
ef_File = new StreamWriter(Server.MapPath(@"Attachment/" + Session.SessionID + @"/" + ConstantsData.EF_COST_FILENAME));
if(objEFormDetails._EF_COST != string.Empty)
{
strFile.Append(objEFormDetails._EF_COST);
ef_File.WriteLine(strFile.ToString());
ef_File.Close();
sbOnFloppyCost=strFile;
}
GetMemberData();
GetOtherDirectorsData();
if(base.IsGuestUser())
{
string strEmailBody;
strEmailBody=GenerateBody();
string strpackage = strEmailBody;
GetPackageDetails(strpackage);
}
return true;
}
catch(Exception ex)
{
lblError.Text=ex.Message.ToString();
return false;
}
}
GetMemberData() function:
private void GetMemberData()
{
EFormDetails objEFormDetails = new EFormDetails();
DataRow drEForm = objEFormDetails.ResultRow;
if (drEForm != null)
{
string strDirPath = Server.MapPath(@"Attachment/" + Session.SessionID);
eFormation.Business.EFDIR efdir = new eFormation.Business.EFDIR();
eFormationResult objResult;
objResult = efdir.LoadEFDIRData(Convert.ToInt64(drEForm[EFORMData.ID_FIELD]), Convert.ToString(drEForm[EFORMData.COMPANYNAME_FIELD]));// + " " + Convert.ToString(drEForm[EFORMData.LIMITED_FIELD])==DBNull.Value ? string.Empty : drEForm[EFORMData.LIMITED_FIELD])));//give efromid adn comapany name
if (!Directory.Exists(strDirPath))
{
Directory.CreateDirectory(strDirPath);
}
StreamWriter swMember;
StringBuilder sb = new StringBuilder();
swMember = new StreamWriter(Server.MapPath(@"Attachment/" + Session.SessionID + @"/" + ConstantsData.EFDIR_MEMBER_FILENAME));
for (int i = 0; i < objResult.ResultData.Tables[0].Rows.Count; i++)
{
sb.Append(objResult.ResultTable.Rows[i][0].ToString());
sb.Append(Environment.NewLine);
}
sbOnFloppyMember = sb;
swMember.WriteLine(sb.ToString());
swMember.Close();
sb = null;
swMember = null;
}
}
GetOtherDirectorsData() function:
private void GetOtherDirectorsData()
{
EFormDetails objEFormDetails = new EFormDetails();
DataRow drEForm = objEFormDetails.ResultRow;
if (drEForm != null)
{
string strDirPath = Server.MapPath(@"Attachment/" + Session.SessionID);
eFormationResult objResult;
eFormation.Business.EFODIR objefodir = new eFormation.Business.EFODIR();
objResult = objefodir.LoadEFODIRData(Convert.ToInt64(drEForm[EFORMData.ID_FIELD]));//change
if (!Directory.Exists(strDirPath))
{
Directory.CreateDirectory(strDirPath);
}
StreamWriter swMember;
StringBuilder sb = new StringBuilder();
swMember = new StreamWriter(Server.MapPath(@"Attachment/" + Session.SessionID + @"/" + ConstantsData.EFODIR_OTHERDIRECTOR_FILENAME));
for (int i = 0; i < objResult.ResultData.Tables[0].Rows.Count; i++)
{
sb.Append(objResult.ResultTable.Rows[i][0].ToString());
sb.Append(Environment.NewLine);
}
sbOnFloppyOtherDirectors = sb;
swMember.WriteLine(sb.ToString());
swMember.Close();
sb = null;
swMember = null;
}
}
GetPackageDetails() function:
private void GetPackageDetails(String strBodyContent)
{
StreamWriter ef_File;
StringBuilder strFile = new StringBuilder();
ef_File = new StreamWriter(Server.MapPath(@"Attachment/" + Session.SessionID + @"/" + ConstantsData.EF_PACKAGE_FILENAME));
ef_File.WriteLine(strBodyContent);
ef_File.Close();
sbOnFloppyComp = strFile;
}
All of the methods above are used to create the files.
The created files are then added as attachments to the email:
MailAttachment attachment = new MailAttachment(Server.MapPath(@"Attachment/" + Session.SessionID + @"/" + ConstantsData.EF_COMP_FILENAME));
mEmailMessage.Attachments.Add(attachment);
MailAttachment attachment = new MailAttachment(Server.MapPath(@"Attachment/" + Session.SessionID + @"/" + ConstantsData.EFDIR_MEMBER_FILENAME));
mEmailMessage.Attachments.Add(attachment);
MailAttachment attachment = new MailAttachment(Server.MapPath(@"Attachment/" + Session.SessionID + @"/" + ConstantsData.EF_COST_FILENAME));
mEmailMessage.Attachments.Add(attachment);
See any error?
I have just added these two lines because the email server has now changed to Office 365.
mEmailMessage.Fields.Add("http://schemas.microsoft.com/cdo/configuration/smtpserverport", System.Configuration.ConfigurationSettings.AppSettings["SmtpPort"]);
mEmailMessage.Fields.Add("http://schemas.microsoft.com/cdo/configuration/smtpusessl", true);
Make sure you dispose of your attachments and your mail message. Otherwise a lock on the file can linger.
mail.Attachments.Dispose();
mail.Dispose();
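A second place a lock can linger, going by the code in the question: in GenerateFile the first two StreamWriter instances are closed only inside the `if` blocks, so when `_EF_COMP` or `_EF_COST` is empty (or an exception is thrown before `Close()`), the writer stays open and keeps the file locked. Whether that is what happens on the server is an assumption. The sketch below shows the `using` pattern that closes the writer on every path; `FileWriteSketch` and `WriteIfNotEmpty` are hypothetical names.

```csharp
using System;
using System.IO;

public static class FileWriteSketch
{
    // Creates the file and writes 'content' when it is non-empty.
    // The using block disposes the writer even if nothing is written
    // or WriteLine throws, so no handle is left open on the file.
    public static void WriteIfNotEmpty(string path, string content)
    {
        using (var writer = new StreamWriter(path))
        {
            if (!string.IsNullOrEmpty(content))
            {
                writer.WriteLine(content);
            }
        }
    }
}
```

Each `new StreamWriter(...)` in GenerateFile, GetMemberData, GetOtherDirectorsData and GetPackageDetails could be wrapped the same way.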
Working in C# with the EWS Managed API, we're having trouble efficiently retrieving the images stored as inline attachments.
The goal is to show an email with inline images as a fully formed HTML page in a panel. The code we currently use:
string sHTMLCOntent = item.Body;
FileAttachment[] attachments = null;
if (item.Attachments.Count != 0)
{
attachments = new FileAttachment[item.Attachments.Count];
for (int i = 0; i < item.Attachments.Count; i++)
{
string sType = item.Attachments[i].ContentType.ToLower();
if (sType.Contains("image"))
{
attachments[i] = (FileAttachment)item.Attachments[i];
string sID = attachments[i].ContentId;
sType = sType.Replace("image/", "");
string sFilename = sID + "." + sType;
string sPathPlusFilename = Directory.GetCurrentDirectory() + "\\" + sFilename;
attachments[i].Load(sFilename);
string oldString = "cid:" + sID;
sHTMLCOntent = sHTMLCOntent.Replace(oldString, sPathPlusFilename);
}
}
}
(sourced: http://social.technet.microsoft.com/Forums/en-US/exchangesvrdevelopment/thread/ad10283a-ea04-4b15-b20a-40cbd9c95b57)
This is not very efficient, though, and it slows down the responsiveness of our web app. Does anyone have a better solution for this problem? We are using Exchange 2007 SP1, so the IsInline property won't work, as it is Exchange 2010 only.
I would build an index of your "cid:"s first:
private const string CidPattern = "cid:";
private static HashSet<int> BuildCidIndex(string html)
{
var index = new HashSet<int>();
var pos = html.IndexOf(CidPattern, 0);
while (pos >= 0) // IndexOf returns -1 when nothing is found; 0 is a valid match position
{
var start = pos + CidPattern.Length;
index.Add(start);
pos = html.IndexOf(CidPattern, start);
}
return index;
}
Then you need a replace function that replaces the cids based on your index:
private static void AdjustIndex(HashSet<int> index, int oldPos, int byHowMuch)
{
var oldIndex = new List<int>(index);
index.Clear();
foreach (var pos in oldIndex)
{
if (pos < oldPos)
index.Add(pos);
else
index.Add(pos + byHowMuch);
}
}
private static bool ReplaceCid(HashSet<int> index, ref string html, string cid, string path)
{
var posToRemove = -1;
foreach (var pos in index)
{
if (pos + cid.Length <= html.Length && html.Substring(pos, cid.Length) == cid)
{
var sb = new StringBuilder();
sb.Append(html.Substring(0, pos-CidPattern.Length));
sb.Append(path);
sb.Append(html.Substring(pos + cid.Length));
html = sb.ToString();
posToRemove = pos;
break;
}
}
if (posToRemove < 0)
return false;
index.Remove(posToRemove);
AdjustIndex(index, posToRemove, path.Length - (CidPattern.Length + cid.Length));
return true;
}
Now you can check your attachments:
FileAttachment[] attachments = null;
var index = BuildCidIndex(sHTMLCOntent);
if (index.Count > 0 && item.Attachments.Count > 0)
{
var basePath = Directory.GetCurrentDirectory();
attachments = new FileAttachment[item.Attachments.Count];
for (var i = 0; i < item.Attachments.Count; ++i)
{
var type = item.Attachments[i].ContentType.ToLower();
if (!type.StartsWith("image/")) continue;
type = type.Replace("image/", "");
var attachment = (FileAttachment)item.Attachments[i];
var cid = attachment.ContentId;
var filename = cid + "." + type;
var path = Path.Combine(basePath, filename);
if(ReplaceCid(index, ref sHTMLCOntent, cid, path))
{
// only load images when they have been found
attachment.Load(path);
attachments[i] = attachment;
}
}
}
In addition: instead of calling attachment.Load right away and passing the path to the image directly, you could link to another script, pass it the cid as a parameter, and have that script fetch the image from Exchange. Loading the image from Exchange then no longer blocks the HTML cid replacement, and the page could load faster because the HTML can be sent to the browser sooner.
PS: Code is not tested, just so you get the idea!
EDIT
Added the missing AdjustIndex function.
EDIT 2
Fixed small bug in AdjustIndex
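As an alternative to maintaining the position index by hand, the same substitution can be done in a single `Regex.Replace` pass with a lookup table. This is a sketch, not the approach from the answer above: `CidRewriteSketch` and `ReplaceCids` are hypothetical names, and it assumes a cid never contains quotes, whitespace or `>`.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CidRewriteSketch
{
    // Matches "cid:" followed by the id, up to a quote, whitespace or '>'.
    private static readonly Regex CidRegex =
        new Regex("cid:([^\"'\\s>]+)", RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Replaces every known cid reference with its path and records which
    // cids were actually present, so only those attachments need loading.
    public static string ReplaceCids(string html, IDictionary<string, string> pathsByCid, ISet<string> usedCids)
    {
        return CidRegex.Replace(html, match =>
        {
            string cid = match.Groups[1].Value;
            string path;
            if (pathsByCid.TryGetValue(cid, out path))
            {
                usedCids.Add(cid);
                return path;
            }
            return match.Value;   // unknown cid: leave the reference untouched
        });
    }
}
```

The caller would fill `pathsByCid` from the image attachments' `ContentId` values first, then call `Load` only for attachments whose cid ended up in `usedCids`.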
I am using http://lite.facebook.com and I want to get some data from my account.
I am using HttpWebRequest for this.
I am able to log in to Facebook with my credentials using a web request, and I get the profile URL from the home page HTML.
Now, when I try to get the list of all friends, it kicks me out to the login page.
For login I am using this code:
string HtmlData = httpHelper.getHtmlfromUrl(new Uri(FacebookUrls.Lite_MainpageUrl));
lstInput = globussRegex.GetInputControlsNameAndValueInPage(HtmlData);
foreach (string str in lstInput)
{
if (str.Contains("lsd"))
{
int FirstPoint = str.IndexOf("name=\"lsd\"");
if (FirstPoint > 0)
{
TempHtmlData = str.Substring(FirstPoint).Replace("name=\"lsd\"","").Replace("value","");
}
int SecondPoint = TempHtmlData.IndexOf("/>");
if (SecondPoint > 0)
{
Value = TempHtmlData.Substring(0, SecondPoint).Replace("=", "").Replace("\\", "").Replace("\"", "").Replace(" ", "");
}
}
}
string LoginData = "form_present=1&lsd=" + Value + "&email=" + UserName + "&password=" + Password + "";
string ResponseData = httpHelper.postFormData(new Uri(FacebookUrls.Lite_LoginUrl), LoginData);
int FirstProfileTag = ResponseData.IndexOf("/p/");
int SecondProfileTag = ResponseData.IndexOf("\">Profile");
if (FirstProfileTag > 0 && SecondProfileTag > 0)
{
string TempProfileUrl = ResponseData.Substring(FirstProfileTag, SecondProfileTag - FirstProfileTag);
string ProfileUrl = FacebookUrls.Lite_ProfileUrl + TempProfileUrl;
GetUserProfileData(ProfileUrl);
}
And for getting the profile URL and friend list URL I am doing this:
string HtmlData = httpHelper.getHtmlfromUrl(new Uri(ProfileUrl));
string FriendUrl = "http://lite.facebook.com" + "/p/Pankaj-Mishra/1187787295/friends/";
string HtmlData1 = httpHelper.getHtmlfromUrl(new Uri(FriendUrl));
I get a perfect result when I request ProfileUrl,
but when I request FriendUrl I am logged out. How can I solve this problem?
Please help me.
Stop scraping the HTML and use their API.