Continuing from part 2 of the ongoing missing data saga, my Eve Online alliance had managed to loose over 3 months of data, following a hard drive failure in our alliance’s web server. Although the most recent data could be recovered using Eve’s API, and data feeds from other alliances, much of the data was unrecoverable. In part 2 we had successfully copied historic killmails from other alliance’s killboards, and stored them locally in a database.
Although the collection process outlined in part 2 is still running, it is estimated that we’ll end up with approximately 70,000 raw killmails which mention our alliance (from the predicted total of 6million). Looking at the killmails themselves, many are duplicated. This is because both the victim, and the pilot who landed the final blow, receive a copy of the killmail; and where more than one alliance is attacking, each alliance will copy the killmail to their own killboard. Of the 70,000 killmails, approximately 25,000 are duplicates, and can be ignored.
De-duping SQL tables is a bit of a nightmare prior to MSSQL 2005. You have to produce a subquery in order to find the latest record for each group, and then delete all others. It’s fairly prone to error if you’re not sure what you’re doing. Fortunately, MSSQL 2005 solves this problem with the introduction of the RANK() keyword. RANK() will give an integer column of where a particular record ranks within a group, for example:
SELECT kID, iID, killmail, posted,uploaded, RANK() OVER (PARTITION BY SUBSTRING(killmail, 0, 50) ORDER BY kID, iID) AS rnk FROM ekKillmail
A sixth column, rnk, will rank the records in order of kID and iID. We only group on the first 50 characters of the killmail, as this speeds things up slightly. The timestamp, victim, and lost ship are covered in these first 50 characters, and provide a unique enough record identifier.
In order to get the first record for each group, we push the above query into a subselect:
SELECT kID, iID, killmail, uploaded FROM (SELECT kID, iID, killmail, posted,uploaded, RANK() OVER (PARTITION BY SUBSTRING(killmail, 0, 50) ORDER BY kID, iID) AS rnk FROM ekKillmail) AS k WHERE k.rnk > 1
This instantly de-duplicates our killmails, giving us about 50,000 killmails to process. I should mention that there’s an excellent guide to the RANK() keywork on Rafeal Salas’ blog.
We can reduce this 50,000 figure further still by filtering records on date. As mentioned at the start of this story, only 3 months of records were lost, whereas we’re requesting killmails from other alliances’ killboards for as long as they’ve been active. Looking at the data we’ve gathered, some killmails are as early as 2005. Although each killmail begins with a timestamp of when the kill happened, this is stored as text, and is very difficult to filter on. This is a relatively easy thing to fix, as we can simply cast the first 17 characters (which represent the date/time), into a datetime data type.
UPDATE ekKillmail SET posted = CAST(SUBSTRING(killmail, 0, 17) AS DATETIME) WHERE posted IS NULL
Meaning we can expand our killmail query to then filter on the date and time of the kill (posted column):
SELECT kID, iID, killmail, uploaded FROM (SELECT kID, iID, killmail, posted,uploaded, RANK() OVER (PARTITION BY SUBSTRING(killmail, 0, 50) ORDER BY kID, iID) AS rnk FROM ekKillmail) AS k WHERE k.rnk > 1 AND k.posted > '2008-09-24 00:00' AND k.posted < '2009-01-06 00:00'
This reduces the number of killmails we have to upload back to our alliance killboard to a much more manageable 5,000. We now need to tackle the uploading process.
Uploading killmails manually to our alliance’s killboard is a fairly rare occurrence, the website itself uses Eve Online’s API to make frequent queries on each pilot’s kills, as well as pull kills from other alliances’ killboards. However as we learnt in part 2, these data feeds only go back a few days, or shorter. Prior to the Eve Online API, pilots had to copy and paste killmails from their Eve Online client, into a web form on the alliance’s website. This form is still available, however it is restricted to authenticated users – preventing random internet users from posting fictitious killmails.
Authentication is based upon VBulletin, which uses cookies and sessions to keep track of the user. Firefox’s Web Developer add-on will allow you to inspect cookies within a given session, so its then just a case of copying them into our C# page request/post.
static void writeKills() { //Declarations SqlConnection DBConn; SqlDataAdapter DBAdapt; DataTable DBTable; int i; //Connect to DB DBConn = new SqlConnection("Data Source=SERVER\\SQLEXPRESS;Initial Catalog=eveKills;User Id=sa;Password=secret;MultipleActiveResultSets=True;"); DBConn.Open(); //Get killmails DBTable = new DataTable(); DBAdapt = new SqlDataAdapter("SELECT kID, iID, killmail, uploaded FROM (SELECT kID, iID, killmail, posted,uploaded, RANK() OVER (PARTITION BY SUBSTRING(killmail, 0, 50) ORDER BY kID, iID) AS rnk FROM ekKillmail) AS k WHERE k.rnk = 1 AND k.posted > '2008-09-24 00:00' AND k.posted < '2009-01-06 00:00' AND uploaded IS NULL", DBConn); DBAdapt.Fill(DBTable); //Loop through killmails for (i = 0; i < DBTable.Rows.Count; i++) { //Process killmail postKill(DBTable.Rows[i]); } //Close DB Connection DBConn.Close(); } static void postKill(DataRow drKill) { //Declarations HttpWebRequest HRequest; HttpWebResponse HResponse; StreamReader RResponse; Stream Request; SqlConnection DBConn = null; SqlCommand DBComm; string sResp = null; string sRequ = null; //Create HTTP Request HRequest = (HttpWebRequest)WebRequest.Create("http://www.iron-alliance.com/kb/post.php"); //Emulate an authenticated session HRequest.CookieContainer = new CookieContainer(); HRequest.CookieContainer.Add(new Cookie("bbuserid", "secret", "/", ".iron-alliance.com")); HRequest.CookieContainer.Add(new Cookie("bbsessionhash", "secret", "/", ".iron-alliance.com")); HRequest.CookieContainer.Add(new Cookie("bbpassword", "secret", "/", ".iron-alliance.com")); HRequest.CookieContainer.Add(new Cookie("bblastvisit", "secret", "/", ".iron-alliance.com")); HRequest.CookieContainer.Add(new Cookie("bblastactivity", "0", "/", ".iron-alliance.com")); HRequest.CookieContainer.Add(new Cookie("vbulletin", "secret", "/", "www.iron-alliance.com")); //This is a POST, so change header accordingly HRequest.Method = "POST"; HRequest.ContentType = "application/x-www-form-urlencoded"; //POST Request Payload sRequ = "killmail=" + drKill[2].ToString() + "&Post=Post"; HRequest.ContentLength = sRequ.Length; //Make POST Request = HRequest.GetRequestStream(); Request.Write(Encoding.ASCII.GetBytes(sRequ), 0, sRequ.Length); Request.Close(); //Read Response HResponse = (HttpWebResponse)HRequest.GetResponse(); RResponse = new StreamReader(HResponse.GetResponseStream()); sResp = RResponse.ReadToEnd(); //Check response for keywords if (sResp.IndexOf("Success!") > -1) { //Kill posted successfully sResp = "Success!"; } else if (sResp.IndexOf("already posted") > -1) { //Already posted sResp = "Duplicate!"; } else if (sResp.IndexOf("problem!") > -1) { //Problem sResp = "Problem! " + sResp.Substring(19000, 1024); //Write the killmail to a text file, so that we can deal with it later File.WriteAllText(drKill[0].ToString() + "_" + drKill[1].ToString() + ".txt", drKill[2].ToString()); } else { //Other (not successful) //Write the killmail to a text file, so that we can deal with it later File.WriteAllText(drKill[0].ToString() + "_" + drKill[1].ToString() + ".txt", drKill[2].ToString()); } //Connect to DB DBConn = new SqlConnection("Data Source=SERVER\\SQLEXPRESS;Initial Catalog=eveKills;User Id=sa;Password=secret;MultipleActiveResultSets=True;"); DBConn.Open(); //Update killmail DBComm = new SqlCommand("UPDATE ekKillmail SET uploaded = GETDATE(), response = @P1 WHERE kID = @P2 AND iID = @P3", DBConn); DBComm.Parameters.AddWithValue("@P1", sResp); DBComm.Parameters.AddWithValue("@P2", drKill[0]); DBComm.Parameters.AddWithValue("@P3", drKill[1]); DBComm.ExecuteNonQuery(); //Close DB DBConn.Close(); }
Finally, we read back the response of the killmail posting. The Eve Dev killboard gives fairly good feedback on how a killmail posting went, so we can search for the phrase “Successful”, “already exists”, or “problem” to determine the status of the posting. These messages correspond to a successful posting, a duplicate killmail being found, and a failed posting. In our code, we’ve caught the failed postings, and written the killmail out to a simple text file, so that a human can determine the cause of the problem. We’ve also written the status back to the database so that we don’t attempt to resubmit problematic killmails.
That’s it! Overall, about 5,000 killmails were recovered, with about 400 failing to submit to the alliance killboard. It took approximately 72 hours to mine all the original killmails, partly due to throttling on the remote web servers (at its peak, I was downloading killmails at a rate of 300kb/s), and approximately 2 hours to filter and submit killmails to our killboard.
[url=http://pl.tinypic.com/player.php?s=6&v=w1tr91]Flat Rate Movers[/url]
Apartment Mover
New York Mortgage Companies
Picking Professional Movers
San Diego Movers
Shipping Company
[url=http://www.livevideo.com/video/33E132F6A18849F9A69FDF1A110D43F9/manila-call-center-call-917-48.aspx]Manila call center[/url]
NJ Moving Piano
[url=http://www.livevideo.com/video/9D8FD3C46BAA4681847FA052D9533C2B/car-service-nyc-phone-1917-484.aspx]air port pick up[/url]
[url=http://www.youtube.com/watch?v=-dvYRJ9Bg98]Staten Island Dentist[/url]
Comment by toliExits — July 21, 2010 @ 2:53 pm
Niksshew 2 I join told all above.
_____________
levetra
natual
1
Comment by ortaNits — August 5, 2010 @ 2:55 am