projectwhite

Filed under: Personal — Tags: , , , , , , , , , — ben on January 6, 2009 at 7:15 pm

Following on from part 1 of this story, my Eve Online alliance had managed to loose over 3 months of data, following a hard drive failure in our alliance’s web server. Although the most recent data could be recovered using Eve’s API, and data feeds from other alliances, much of the data was unrecoverable.

Fortunately, much of this data was in existence, just not available via machine-readable means. Many killboards provide the original killmail for reference, and for pilots who’ve failed to report a kill – something rare in these days of API usage, but common with less-organised alliances. A means to crawl these older killmails, and parse them into our own killboard was required.

The Eve Dev killboard is arguably the most popular style of killboard, due to its open source nature and mature development. It’s also fairly easy to scrape data from, as each killmail is given a page similar to http://www.northern-coalition.com/index.php?a=kill_detail&kll_id=330124, with the original killmail being pulled into a div via AJAX from http://www.northern-coalition.com/index.php?a=kill_mail&kll_id=330124. The format of this URL is fairly predictable across all users of the Eve Dev killboard, so one can guess it for many other killboards, and the ID given to each killmail is sequential for that board (in this case the latest is 330124).

All that is required in order to find out missing killmails, is to find other alliances using the Eve Dev killboard, and have some relation to our alliance. With Google, this is surprisingly easy, just search for our alliance name and a unique part of the killboard URL. Once found, these killboards were then fed into a MSSQL table, with the base URL and the highest killmail ID available, resulting in something similar to the below:

Name BaseURL MinID MaxID
Morsus Mihi http://kb.eve-42.com/index.php?a=kill_mail&kll_id= 0 200000
RAZOR http://www.eve-razor.com/killboard/?a=kill_mail&kll_id= 0 200000
Northern Coalition http://www.northern-coalition.com/index.php?a=kill_mail&kll_id= 0 400000

The original killmail pages themselves have some additional HTML formatting, but as this is standardised across all Eve Dev killboards, we can ignore it with some fairly simple pattern matching.

        static void getData()
        {
            //Declarations
            SqlConnection DBConn;
            SqlDataAdapter DBAdapt;
            DataTable DBTable;
            int i;

            //Connect to database
            DBConn = new SqlConnection("Data Source=SERVER\\SQLEXPRESS;Initial Catalog=eveKills;User Id=sa;Password=secret;MultipleActiveResultSets=True;");
            DBConn.Open();
            //Get a list of killboards
            DBTable = new DataTable();
            DBAdapt = new SqlDataAdapter("SELECT * FROM ekKillboard WHERE kProcess = 1", DBConn);
            DBAdapt.Fill(DBTable);
            //Loop killboards and add to thread pool
            for (i = 0; i < DBTable.Rows.Count; i++)
            {
                ThreadPool.QueueUserWorkItem(new WaitCallback(doKillboard), DBTable.Rows[i]);
            }
            //Continue main thread, push killmails into IRON killboard
            writeKills();

            //Loop main thread, post an update every 5seconds
            while (true)
            {
                System.Threading.Thread.Sleep(5000);
                Console.WriteLine("...5s....................................................................");
            };
            //Close DB
            DBConn.Close();
        }

The above code requests a list of killboards from our database, and starts a thread to process each one. As we can’t anticipate how many killboards we need to process, the threads are added to a threadpool (there’s a good guide to thread pooling at Mark Kaelin’s site). Each thread is provided with a datarow containing the BaseURL, MinID, and MaxID from earlier. We don’t need to worry about thread safety as each thread is only consuming its own local variables. The only shared resource is the MSSQL server, which can deal with multiple concurrent clients, however we do have to ensure that MARS is enabled in our connection string, otherwise the SQL Server complains that it has too many connections open.

Each doKillboard() process attempts to read killmails for the given killboard, starting with the MinID and increasing to the MaxID. If a valid page is returned, and if it’s of interest to our alliance, then it’s inserted into the database. Whilst storing all killmails would be interesting from a datamining perspective, we’re requesting a potential 6million killmails, more than enough to exceed the 4GB limit on a SQL Server Express database (my first run capped out at 1.2million killmails).

        static void doKillboard(object objData)
        {
            //Cast object back to DataRow
            DataRow drData = (DataRow)objData;
            //Declarations
            int i = 0;
            int iCF = -1;
            int iCT = -1;
            int iIRON = -1;
            string sURL = (string)drData["kURL"];
            string sKill = null;
            string sData = null;
            string sMIn = "<textarea class=\"killmail\" name=\"killmail\" cols=\"60\" rows=\"30\" readonly=\"readonly\">";
            string sMOut = "</textarea>";
            HttpWebRequest HRequest;
            HttpWebResponse HResponse;
            StreamReader RResponse;
            SqlConnection DBConn = null;
            SqlCommand DBComm;
            //Update console
            Console.WriteLine(drData["kName"].ToString().PadRight(20) + "- Processing...");
            //Loop from MinID to MaxID
            for (i = ((int)drData["kMinID"] + 1); i < (int)drData["kMaxID"]; i++)
            {
                //Sleep here, as loop is broken later on
                System.Threading.Thread.Sleep(5);

                //Reset markers
                iCF = -1;
                iCT = -1;
                bIRON = false;
                sKill = null;
                sData = "";
                //Try collecting the killmail
                try
                {
                    //Again, update console
                    sData += drData["kName"].ToString().PadRight(20) + "- ";
                    sData += i.ToString().PadLeft(8, '0') + "/" + drData["kMaxID"].ToString().PadLeft(8, '0');
                    //Make HTTP request
                    HRequest = (HttpWebRequest)WebRequest.Create(sURL + i);
                    HResponse = (HttpWebResponse)HRequest.GetResponse();
                    RResponse = new StreamReader(HResponse.GetResponseStream());
                    //Read HTTP response
                    sKill = RResponse.ReadToEnd();

                    //Grab <textarea> tags (defined above)
                    iCF = sKill.IndexOf(sMIn);
                    iCT = sKill.IndexOf(sMOut);

                    //Abort here if killmail request failed
                    if (sKill.IndexOf("<b>Fatal error</b>") > -1) continue;

                    //Abort here if killmail has invalid <textarea> tags
                    if (iCF == -1 || iCT == -1) continue;

                    //At this point we assume that we've read a valid killmail

                    //Connect to database
                    DBConn = new SqlConnection("Data Source=SERVER\\SQLEXPRESS;Initial Catalog=eveKills;User Id=sa;Password=secret;MultipleActiveResultSets=True;");
                    DBConn.Open();

                    //Let the database know we procssed this killmail, and that the MinID can increase
                    DBComm = new SqlCommand("UPDATE ekKillboard SET kMinID = @P1 WHERE kID = @P2", DBConn);
                    DBComm.Parameters.AddWithValue("@P1", i);
                    DBComm.Parameters.AddWithValue("@P2", (int)drData["kID"]);
                    DBComm.ExecuteNonQuery();

                    //Abort here if killmail not IRON - saves DB Diskspace (SQLServer Express can only hold 4GB pr DB)
                    iIRON = sKill.IndexOf("Imperial Republic Of the North") ;
                    if (iIRON != -1)
                    {
                        //Truncate killmail, we only want content between <textarea> tags
                        sKill = sKill.Substring(iCF + sMIn.Length, (iCT - (iCF + sMIn.Length)));
                        sKill = sKill.Trim();

                        //Add killmail to database
                        DBComm = new SqlCommand("INSERT INTO ekKillmail (iID, kID, killmail) VALUES (@P1, @P2, @P3)", DBConn);
                        DBComm.Parameters.AddWithValue("@P1", i);
                        DBComm.Parameters.AddWithValue("@P2", (int)drData["kID"]);
                        DBComm.Parameters.AddWithValue("@P3", sKill);
                        DBComm.ExecuteNonQuery();
                        sData += " - Pass";
                    }
                    //Close DB
                    DBConn.Close();
                }
                catch (Exception ex)
                {
                    DBConn.Dispose();
                    //???
                    sData += " - Fail";
                }
                //Update console
                Console.WriteLine(sData);
            }
        }

An important note here. The above code tries to request killmails as fast as it can, there’s only a 5ms delay at the start of each request, and any delay that crops up in the HTTP request/response and SQL Update/Insert. In some cases I’ve been requesting pages at a rate of about 10-per-second, and this is bordering on a Denial-of-Service attack. A handful of killboard servers seem to have throttled the number of requests-per-minute though, as I see a short-term peak gradually decrease to a fixed rate.

In part 3 of this story, we’ll look at filtering the killmails further, and submitting back to our own killboard.

No comments yet.

RSS feed for comments on this post. TrackBack URL

RSSTwitter: bensteeples

18 queries. 0.551 seconds. Powered by WordPress