projectwhite

Filed under: Personal — Tags: , — ben on January 8, 2009 at 9:12 pm

I believe that it’s already been posted on a number of other blogs, but these photos of frozen bubbles is awesome.

Filed under: Personal — Tags: , , , , , , , , , — ben on January 6, 2009 at 9:23 pm

Continuing from part 2 of the ongoing missing data saga, my Eve Online alliance had managed to loose over 3 months of data, following a hard drive failure in our alliance’s web server. Although the most recent data could be recovered using Eve’s API, and data feeds from other alliances, much of the data was unrecoverable. In part 2 we had successfully copied historic killmails from other alliance’s killboards, and stored them locally in a database.

Although the collection process outlined in part 2 is still running, it is estimated that we’ll end up with approximately 70,000 raw killmails which mention our alliance (from the predicted total of 6million). Looking at the killmails themselves, many are duplicated. This is because both the victim, and the pilot who landed the final blow, receive a copy of the killmail; and where more than one alliance is attacking, each alliance will copy the killmail to their own killboard. Of the 70,000 killmails, approximately 25,000 are duplicates, and can be ignored.

De-duping SQL tables is a bit of a nightmare prior to MSSQL 2005. You have to produce a subquery in order to find the latest record for each group, and then delete all others. It’s fairly prone to error if you’re not sure what you’re doing. Fortunately, MSSQL 2005 solves this problem with the introduction of the RANK() keyword. RANK() will give an integer column of where a particular record ranks within a group, for example:

SELECT kID, iID, killmail, posted,uploaded, RANK() OVER (PARTITION BY SUBSTRING(killmail, 0, 50) ORDER BY kID, iID) AS rnk FROM ekKillmail

A sixth column, rnk, will rank the records in order of kID and iID. We only group on the first 50 characters of the killmail, as this speeds things up slightly. The timestamp, victim, and lost ship are covered in these first 50 characters, and provide a unique enough record identifier.
In order to get the first record for each group, we push the above query into a subselect:

SELECT kID, iID, killmail, uploaded FROM
 (SELECT kID, iID, killmail, posted,uploaded, RANK() OVER (PARTITION BY SUBSTRING(killmail, 0, 50) ORDER BY kID, iID) AS rnk FROM ekKillmail) AS k
 WHERE k.rnk > 1

This instantly de-duplicates our killmails, giving us about 50,000 killmails to process. I should mention that there’s an excellent guide to the RANK() keywork on Rafeal Salas’ blog.

We can reduce this 50,000 figure further still by filtering records on date. As mentioned at the start of this story, only 3 months of records were lost, whereas we’re requesting killmails from other alliances’ killboards for as long as they’ve been active. Looking at the data we’ve gathered, some killmails are as early as 2005. Although each killmail begins with a timestamp of when the kill happened, this is stored as text, and is very difficult to filter on. This is a relatively easy thing to fix, as we can simply cast the first 17 characters (which represent the date/time), into a datetime data type.

UPDATE ekKillmail SET posted = CAST(SUBSTRING(killmail, 0, 17) AS DATETIME) WHERE posted IS NULL

Meaning we can expand our killmail query to then filter on the date and time of the kill (posted column):

SELECT kID, iID, killmail, uploaded FROM
 (SELECT kID, iID, killmail, posted,uploaded, RANK() OVER (PARTITION BY SUBSTRING(killmail, 0, 50) ORDER BY kID, iID) AS rnk FROM ekKillmail) AS k
 WHERE k.rnk > 1
 AND k.posted > '2008-09-24 00:00'
 AND k.posted < '2009-01-06 00:00'

This reduces the number of killmails we have to upload back to our alliance killboard to a much more manageable 5,000. We now need to tackle the uploading process.

Uploading killmails manually to our alliance’s killboard is a fairly rare occurrence, the website itself uses Eve Online’s API to make frequent queries on each pilot’s kills, as well as pull kills from other alliances’ killboards. However as we learnt in part 2, these data feeds only go back a few days, or shorter. Prior to the Eve Online API, pilots had to copy and paste killmails from their Eve Online client, into a web form on the alliance’s website. This form is still available, however it is restricted to authenticated users – preventing random internet users from posting fictitious killmails.

Authentication is based upon VBulletin, which uses cookies and sessions to keep track of the user. Firefox’s Web Developer add-on will allow you to inspect cookies within a given session, so its then just a case of copying them into our C# page request/post.

        static void writeKills()
        {
            //Declarations
            SqlConnection DBConn;
            SqlDataAdapter DBAdapt;
            DataTable DBTable;
            int i;

            //Connect to DB
            DBConn = new SqlConnection("Data Source=SERVER\\SQLEXPRESS;Initial Catalog=eveKills;User Id=sa;Password=secret;MultipleActiveResultSets=True;");
            DBConn.Open();

            //Get killmails
            DBTable = new DataTable();
            DBAdapt = new SqlDataAdapter("SELECT kID, iID, killmail, uploaded FROM (SELECT kID, iID, killmail, posted,uploaded, RANK() OVER (PARTITION BY SUBSTRING(killmail, 0, 50) ORDER BY kID, iID) AS rnk FROM ekKillmail) AS k WHERE k.rnk = 1 AND k.posted > '2008-09-24 00:00' AND k.posted < '2009-01-06 00:00' AND uploaded IS NULL", DBConn);
            DBAdapt.Fill(DBTable);

            //Loop through killmails
            for (i = 0; i < DBTable.Rows.Count; i++)
            {
                //Process killmail
                postKill(DBTable.Rows[i]);

            }
            //Close DB Connection
            DBConn.Close();
        }

        static void postKill(DataRow drKill)
        {
            //Declarations
            HttpWebRequest HRequest;
            HttpWebResponse HResponse;
            StreamReader RResponse;
            Stream Request;
            SqlConnection DBConn = null;
            SqlCommand DBComm;
            string sResp = null;
            string sRequ = null;

            //Create HTTP Request
            HRequest = (HttpWebRequest)WebRequest.Create("http://www.iron-alliance.com/kb/post.php");
            //Emulate an authenticated session
            HRequest.CookieContainer = new CookieContainer();
            HRequest.CookieContainer.Add(new Cookie("bbuserid", "secret", "/", ".iron-alliance.com"));
            HRequest.CookieContainer.Add(new Cookie("bbsessionhash", "secret", "/", ".iron-alliance.com"));
            HRequest.CookieContainer.Add(new Cookie("bbpassword", "secret", "/", ".iron-alliance.com"));
            HRequest.CookieContainer.Add(new Cookie("bblastvisit", "secret", "/", ".iron-alliance.com"));
            HRequest.CookieContainer.Add(new Cookie("bblastactivity", "0", "/", ".iron-alliance.com"));
            HRequest.CookieContainer.Add(new Cookie("vbulletin", "secret", "/", "www.iron-alliance.com"));
            //This is a POST, so change header accordingly
            HRequest.Method = "POST";
            HRequest.ContentType = "application/x-www-form-urlencoded";
            //POST Request Payload
            sRequ = "killmail=" + drKill[2].ToString() + "&Post=Post";
            HRequest.ContentLength = sRequ.Length;
            //Make POST            
            Request = HRequest.GetRequestStream();
            Request.Write(Encoding.ASCII.GetBytes(sRequ), 0, sRequ.Length);
            Request.Close();

            //Read Response
            HResponse = (HttpWebResponse)HRequest.GetResponse();
            RResponse = new StreamReader(HResponse.GetResponseStream());
            sResp = RResponse.ReadToEnd();

            //Check response for keywords
            if (sResp.IndexOf("Success!") > -1)
            {
                //Kill posted successfully
                sResp = "Success!";
            }

            else if (sResp.IndexOf("already posted") > -1)
            {
                //Already posted
                sResp = "Duplicate!";
            }
            else if (sResp.IndexOf("problem!") > -1)
            {
                //Problem
                sResp = "Problem! " + sResp.Substring(19000, 1024);
                //Write the killmail to a text file, so that we can deal with it later
                File.WriteAllText(drKill[0].ToString() + "_" + drKill[1].ToString() + ".txt", drKill[2].ToString());
            }
            else
            {
                //Other (not successful)
                //Write the killmail to a text file, so that we can deal with it later
                File.WriteAllText(drKill[0].ToString() + "_" + drKill[1].ToString() + ".txt", drKill[2].ToString());
            }

            //Connect to DB
            DBConn = new SqlConnection("Data Source=SERVER\\SQLEXPRESS;Initial Catalog=eveKills;User Id=sa;Password=secret;MultipleActiveResultSets=True;");
            DBConn.Open();
            //Update killmail
            DBComm = new SqlCommand("UPDATE ekKillmail SET uploaded = GETDATE(), response = @P1 WHERE kID = @P2 AND iID = @P3", DBConn);
            DBComm.Parameters.AddWithValue("@P1", sResp);
            DBComm.Parameters.AddWithValue("@P2", drKill[0]);
            DBComm.Parameters.AddWithValue("@P3", drKill[1]);
            DBComm.ExecuteNonQuery();
            //Close DB
            DBConn.Close();
        }

Finally, we read back the response of the killmail posting. The Eve Dev killboard gives fairly good feedback on how a killmail posting went, so we can search for the phrase “Successful”,  “already exists”, or “problem” to determine the status of the posting. These messages correspond to a successful posting, a duplicate killmail being found, and a failed posting. In our code, we’ve caught the failed postings, and written the killmail out to a simple text file, so that a human can determine the cause of the problem. We’ve also written the status back to the database so that we don’t attempt to resubmit problematic killmails.

That’s it! Overall, about 5,000 killmails were recovered, with about 400 failing to submit to the alliance killboard. It took approximately 72 hours to mine all the original killmails, partly due to throttling on the remote web servers (at its peak, I was downloading killmails at a rate of 300kb/s), and approximately 2 hours to filter and submit killmails to our killboard.

Filed under: Personal — Tags: , , , , , , , , , — ben on January 6, 2009 at 7:15 pm

Following on from part 1 of this story, my Eve Online alliance had managed to loose over 3 months of data, following a hard drive failure in our alliance’s web server. Although the most recent data could be recovered using Eve’s API, and data feeds from other alliances, much of the data was unrecoverable.

Fortunately, much of this data was in existence, just not available via machine-readable means. Many killboards provide the original killmail for reference, and for pilots who’ve failed to report a kill – something rare in these days of API usage, but common with less-organised alliances. A means to crawl these older killmails, and parse them into our own killboard was required.

The Eve Dev killboard is arguably the most popular style of killboard, due to its open source nature and mature development. It’s also fairly easy to scrape data from, as each killmail is given a page similar to http://www.northern-coalition.com/index.php?a=kill_detail&kll_id=330124, with the original killmail being pulled into a div via AJAX from http://www.northern-coalition.com/index.php?a=kill_mail&kll_id=330124. The format of this URL is fairly predictable across all users of the Eve Dev killboard, so one can guess it for many other killboards, and the ID given to each killmail is sequential for that board (in this case the latest is 330124).

All that is required in order to find out missing killmails, is to find other alliances using the Eve Dev killboard, and have some relation to our alliance. With Google, this is surprisingly easy, just search for our alliance name and a unique part of the killboard URL. Once found, these killboards were then fed into a MSSQL table, with the base URL and the highest killmail ID available, resulting in something similar to the below:

Name BaseURL MinID MaxID
Morsus Mihi http://kb.eve-42.com/index.php?a=kill_mail&kll_id= 0 200000
RAZOR http://www.eve-razor.com/killboard/?a=kill_mail&kll_id= 0 200000
Northern Coalition http://www.northern-coalition.com/index.php?a=kill_mail&kll_id= 0 400000

The original killmail pages themselves have some additional HTML formatting, but as this is standardised across all Eve Dev killboards, we can ignore it with some fairly simple pattern matching.

        static void getData()
        {
            //Declarations
            SqlConnection DBConn;
            SqlDataAdapter DBAdapt;
            DataTable DBTable;
            int i;

            //Connect to database
            DBConn = new SqlConnection("Data Source=SERVER\\SQLEXPRESS;Initial Catalog=eveKills;User Id=sa;Password=secret;MultipleActiveResultSets=True;");
            DBConn.Open();
            //Get a list of killboards
            DBTable = new DataTable();
            DBAdapt = new SqlDataAdapter("SELECT * FROM ekKillboard WHERE kProcess = 1", DBConn);
            DBAdapt.Fill(DBTable);
            //Loop killboards and add to thread pool
            for (i = 0; i < DBTable.Rows.Count; i++)
            {
                ThreadPool.QueueUserWorkItem(new WaitCallback(doKillboard), DBTable.Rows[i]);
            }
            //Continue main thread, push killmails into IRON killboard
            writeKills();

            //Loop main thread, post an update every 5seconds
            while (true)
            {
                System.Threading.Thread.Sleep(5000);
                Console.WriteLine("...5s....................................................................");
            };
            //Close DB
            DBConn.Close();
        }

The above code requests a list of killboards from our database, and starts a thread to process each one. As we can’t anticipate how many killboards we need to process, the threads are added to a threadpool (there’s a good guide to thread pooling at Mark Kaelin’s site). Each thread is provided with a datarow containing the BaseURL, MinID, and MaxID from earlier. We don’t need to worry about thread safety as each thread is only consuming its own local variables. The only shared resource is the MSSQL server, which can deal with multiple concurrent clients, however we do have to ensure that MARS is enabled in our connection string, otherwise the SQL Server complains that it has too many connections open.

Each doKillboard() process attempts to read killmails for the given killboard, starting with the MinID and increasing to the MaxID. If a valid page is returned, and if it’s of interest to our alliance, then it’s inserted into the database. Whilst storing all killmails would be interesting from a datamining perspective, we’re requesting a potential 6million killmails, more than enough to exceed the 4GB limit on a SQL Server Express database (my first run capped out at 1.2million killmails).

        static void doKillboard(object objData)
        {
            //Cast object back to DataRow
            DataRow drData = (DataRow)objData;
            //Declarations
            int i = 0;
            int iCF = -1;
            int iCT = -1;
            int iIRON = -1;
            string sURL = (string)drData["kURL"];
            string sKill = null;
            string sData = null;
            string sMIn = "<textarea class=\"killmail\" name=\"killmail\" cols=\"60\" rows=\"30\" readonly=\"readonly\">";
            string sMOut = "</textarea>";
            HttpWebRequest HRequest;
            HttpWebResponse HResponse;
            StreamReader RResponse;
            SqlConnection DBConn = null;
            SqlCommand DBComm;
            //Update console
            Console.WriteLine(drData["kName"].ToString().PadRight(20) + "- Processing...");
            //Loop from MinID to MaxID
            for (i = ((int)drData["kMinID"] + 1); i < (int)drData["kMaxID"]; i++)
            {
                //Sleep here, as loop is broken later on
                System.Threading.Thread.Sleep(5);

                //Reset markers
                iCF = -1;
                iCT = -1;
                bIRON = false;
                sKill = null;
                sData = "";
                //Try collecting the killmail
                try
                {
                    //Again, update console
                    sData += drData["kName"].ToString().PadRight(20) + "- ";
                    sData += i.ToString().PadLeft(8, '0') + "/" + drData["kMaxID"].ToString().PadLeft(8, '0');
                    //Make HTTP request
                    HRequest = (HttpWebRequest)WebRequest.Create(sURL + i);
                    HResponse = (HttpWebResponse)HRequest.GetResponse();
                    RResponse = new StreamReader(HResponse.GetResponseStream());
                    //Read HTTP response
                    sKill = RResponse.ReadToEnd();

                    //Grab <textarea> tags (defined above)
                    iCF = sKill.IndexOf(sMIn);
                    iCT = sKill.IndexOf(sMOut);

                    //Abort here if killmail request failed
                    if (sKill.IndexOf("<b>Fatal error</b>") > -1) continue;

                    //Abort here if killmail has invalid <textarea> tags
                    if (iCF == -1 || iCT == -1) continue;

                    //At this point we assume that we've read a valid killmail

                    //Connect to database
                    DBConn = new SqlConnection("Data Source=SERVER\\SQLEXPRESS;Initial Catalog=eveKills;User Id=sa;Password=secret;MultipleActiveResultSets=True;");
                    DBConn.Open();

                    //Let the database know we procssed this killmail, and that the MinID can increase
                    DBComm = new SqlCommand("UPDATE ekKillboard SET kMinID = @P1 WHERE kID = @P2", DBConn);
                    DBComm.Parameters.AddWithValue("@P1", i);
                    DBComm.Parameters.AddWithValue("@P2", (int)drData["kID"]);
                    DBComm.ExecuteNonQuery();

                    //Abort here if killmail not IRON - saves DB Diskspace (SQLServer Express can only hold 4GB pr DB)
                    iIRON = sKill.IndexOf("Imperial Republic Of the North") ;
                    if (iIRON != -1)
                    {
                        //Truncate killmail, we only want content between <textarea> tags
                        sKill = sKill.Substring(iCF + sMIn.Length, (iCT - (iCF + sMIn.Length)));
                        sKill = sKill.Trim();

                        //Add killmail to database
                        DBComm = new SqlCommand("INSERT INTO ekKillmail (iID, kID, killmail) VALUES (@P1, @P2, @P3)", DBConn);
                        DBComm.Parameters.AddWithValue("@P1", i);
                        DBComm.Parameters.AddWithValue("@P2", (int)drData["kID"]);
                        DBComm.Parameters.AddWithValue("@P3", sKill);
                        DBComm.ExecuteNonQuery();
                        sData += " - Pass";
                    }
                    //Close DB
                    DBConn.Close();
                }
                catch (Exception ex)
                {
                    DBConn.Dispose();
                    //???
                    sData += " - Fail";
                }
                //Update console
                Console.WriteLine(sData);
            }
        }

An important note here. The above code tries to request killmails as fast as it can, there’s only a 5ms delay at the start of each request, and any delay that crops up in the HTTP request/response and SQL Update/Insert. In some cases I’ve been requesting pages at a rate of about 10-per-second, and this is bordering on a Denial-of-Service attack. A handful of killboard servers seem to have throttled the number of requests-per-minute though, as I see a short-term peak gradually decrease to a fixed rate.

In part 3 of this story, we’ll look at filtering the killmails further, and submitting back to our own killboard.

Filed under: Personal — Tags: , , , , , , , , , — ben on January 6, 2009 at 6:15 pm

A few days ago, around the time of my previous post, I was in Eve Online when the cry went out that there was something amiss with our alliance’s website (or guild, for those of you who play World of Warcraft). The web server was throwing up a mix of Server Error 500 pages, and half-rendered pages complaining about a missing database.

Whilst I don’t administer the server or the website, word soon got around that the database was indeed missing. The drive which held the database and its backups had become corrupted and was unrecoverable, fortunately the web server itself was sitting on a different drive. When the webserver came back online, the only recoverable backup was over 3 months old, causing a bit of a timewarp with forum posts and a whole chunk of game data missing.

To tangent for a bit, Eve Online is a single-shard space-based MMOG, very similar to Braben and Bell’s Elite. Just like Elite, pilots gain kudos and credit for defeating their enemies, and this competitiveness has driven a lot of community based projects. There are many websites which will take the computer-generated kill receipts (known as killmails), and provide a search-able database (called a killboard) for players to keep track of their progress. For reference, an example killmail is below:

2009.01.05 23:56

Victim: phree4u
Corp: Shadow-Tech. Industry's
Alliance: None
Faction: NONE
Destroyed: Caracal
System: 5ZXX-K
Security: 0.0
Damage Taken: 10967

Involved parties:

Name: Morrganna (laid the final blow)
Security: 5.0
Corp: VentureCorp
Alliance: Imperial Republic Of the North
Faction: NONE
Ship: Cerberus
Weapon: Caldari Navy Scourge Heavy Missile
Damage Done: 10967

Name: BigggSexy
Security: 3.3
Corp: g guild
Alliance: Imperial Republic Of the North
Faction: NONE
Ship: Falcon
Weapon: ECM - Spatial Destabilizer II
Damage Done: 0

Destroyed items:

Bloodclaw Light Missile, Qty: 144
Large F-S9 Regolith Shield Induction
10MN Afterburner II
Ballistic Deflection Field II
Heat Dissipation Field II
Bloodclaw Light Missile, Qty: 230 (Cargo)
Anti-Kinetic Screen Reinforcer I
Anti-Thermal Screen Reinforcer I
Anti-EM Screen Reinforcer I

Dropped items:

Bloodclaw Light Missile, Qty: 216
'Arbalest' Assault Missile Launcher, Qty: 5
Large F-S9 Regolith Shield Induction
Ballistic Control System II, Qty: 2

Websites will take these killmails and break them down, assigning points relating to the value of the ship destroyed, and the contribution of each assailant. Depending on the complexity of the killboard, these points may relate to an in-game rank, allowing more skilled players access to higher level alliance functions. So as you can understand, many players were suitably upset when they faced the potential of loosing the last 3 months worth of work.

Going further into the tangent, the developers of Eve Online have been increasingly generous to 3rd-party developers. They have always made their game data available to developers who want to use it (within reason), and have recently made a REST API available, allowing developers to directly access in-game statistics.

Using this API, in addition to the data feeds made available with most popular killboards, the alliance was able to recover its most recent killmails. However, like most API’s and data feeds, this data was incomplete – only covering the last 14 days. A way to crawl further back was required.

Where I cone into this, and the crawling process itself, is covered in part 2.

Final note. The title of this post was inspired by the most excellent Bill and Ted.

Filed under: Personal — ben on January 4, 2009 at 10:41 pm

Ok, yes, the website has been unusually quiet. I’ve really not got my head around updating this frequently. Hopefully that should change soon, as I am working on some things in the background which should come to fruition soon.

RSSTwitter: bensteeples

13 queries. 0.699 seconds. Powered by WordPress