Wednesday, January 14, 2015

Static Data Export as a Giant Google Sheet!

I've loaded the entire EVE Static Data Export (SDE) into a single Google Sheet.  I've publicly shared the sheet for Proteus here, and Rhea here.  You can use these links in your own sheets, or you can make your own copy of each sheet.  I've documented how I built these sheets towards the bottom of this post.

Why, you may ask, would one want to do this?  The main reason is to perform SDE cross references when you don't have convenient access to a database.  For example, suppose you want to look up type info for assets.  If you only need one particular table (e.g. invTypes), then it's probably easiest to import that one table.  In the more general case, however, you may need to cross reference multiple tables (e.g. industry planning).  This is where a single sheet with everything in it comes in handy.  If you don't like reading, you can stop now and watch this video instead, where I show how to make a cross reference from one Google sheet to another (the sheet used in the demo is available here).  Otherwise, read on to see how it works.

You can build sheet to sheet cross references in Google Sheets using a combination of IMPORTRANGE and the surprisingly awesome QUERY function.  A formula containing IMPORTRANGE simply copies a range of cells from another sheet, where the source sheet is specified by a special URL.  When used by itself, IMPORTRANGE stores the copied cells into the local sheet.  What the QUERY function brings into the mix is the ability to use a SQL like query to only pull in the bits you care about.  That is, if you wrap IMPORTRANGE with QUERY, you'll only copy the parts of the imported range that match your query.

Here's an example.  Suppose you have a sheet that looks like this:

A
B
1
TypeID
Type Name
219551

Then we can fill in the "Type Name" column with the following formula:
=QUERY( IMPORTRANGE(TABLEREF, "invTypes!A1:C65000"), "select Col3 where Col1 = " & A2 & " limit 1", FALSE)
where "TABLEREF" is a URL referring to the SDE Google Sheet.  The IMPORTRANGE function pulls in the given range from a sheet named "invTypes" (this is a sheet in the workbook of TABLEREF).  The QUERY statement selects the third column of the imported range when the first column matches our type ID.  We're using a very simple query here, but the QUERY function allows many more SQL-like abstractions (like "avg"), you can read all about it here.  One thing the documentation doesn't make explicit is that column names from an imported sheet are always "Col1", "Col2", ..., etc., regardless of how they're actually named in the source sheet.  If you're using QUERY within the same sheet, then you use the regular columns names for the sheet you're querying.

It's important to get the TABLEREF correct, otherwise the import won't work, and it's non-obvious how to get the correct URL here.  The short version is: if you're using a link that has the form: https://drive.google.com/...  then you're doing it wrong.  The URL you want will look something like: https://docs.google.com/spreadsheets/d/...  (this is the form shared at the top of this post).  The easiest way to find this link is to actually open the sheet in Drive.  The correct URL is the link in your browser address bar when you're looking at the actual page.  To make it easier to figure out the correct URL, each of the sheets I exported above installs a menu option which shows the proper URL.  You can access this function by loading the SDE Sheet, then selecting "SDE Tools" → "Get Import URL".

Still confused?  Take a look at this video for a demo using an SDE sheet.

How It's Made


It's a multi-step process to construct the whole sheet, but the process always starts from the SDE export that Steve Ronuken provides here.  Among the many formats provided, Steve provides each table as an excel file.  The TL;DR is that I import all of these individual files as sheets in a giant Google Sheets workbook.  There are a few intermediate steps along the way to deal with various sizing limits.  Here's the process (this is on a Windows machine since I need Excel and VisualBasic for one of the steps):

  1. First, download all the excel files.  I use a simple bash oneliner running in cygwin:

    for i in `curl -s https://www.fuzzwork.co.uk/dump/proteus-1.0-109795/ | egrep -o '[a-zA-Z]+\.xls\.bz2' | sort -u` ; do curl -o ${i} https://www.fuzzwork.co.uk/dump/proteus-1.0-109975/${i} ; done

    The important bit is the path for the release you want to build (in this case, Proteus).  When this step completes, you'll have bzip'd copies of every excel file.  Run "bzip2 -d *.bz2" to uncompress everything.
  2. Ideally, I'd now compile all the sheets together into a giant Excel workbook and import into Google Sheets.  But that doesn't work because the resulting Excel file is too big for Google Sheets.  So instead, I have to do things in three batches (two might work, I didn't try).  Each batch consists of building a consolidated Excel workbook, and proceeds as follows:
    1. Divide the downloaded Excel files into three batches.  Here's how I divided things:
      batch 1 : agtAgents through dgmTypeEffects
      batch 2 : eveIcons through invTypes
      batch 3: mapConstellationJumps through warCombatZoneSystems
    2. Assemble the files in each batch into a consolidated Excel file.  You can do this following the guidance here.  My version of their Visual Basic script looks as follows:

      Sub simpleXlsMerger()
      Dim bookList As Workbook
      Dim mergeObj As Object, dirObj As Object, filesObj As Object, everyObj As Object
      Dim worksheetName As String
      Application.ScreenUpdating = False
      Set mergeObj = CreateObject("Scripting.FileSystemObject")

      'change folder path of excel files here
      Set dirObj = mergeObj.Getfolder("C:\<path to your excel files>\<current batch directory>")
      Set filesObj = dirObj.Files
      For Each everyObj In filesObj
      Set bookList = Workbooks.Open(everyObj)
      worksheetName = mergeObj.GetBaseName(everyObj)
      'sheet names are limited to 31 characters in length
      worksheetName = Left(worksheetName, 31)

      'copy the range to a new worksheet
      Range("A1:AG" & Range("A65536").End(xlUp).Row).Copy
      Set ws = ThisWorkbook.Sheets.Add(, ThisWorkbook.Sheets(ThisWorkbook.Sheets.Count))
      ws.Name = worksheetName
      ws.Activate

      'Do not change the following column. It's not the same column as above
      Range("A65536").End(xlUp).Offset(0, 0).PasteSpecial
      Application.CutCopyMode = False
      bookList.Close
      Next
      End Sub

      One important bit is that sheet names have a size limit in Excel, but some of the SDE files exceed this limit.  This happens for two of the RAM tables which I fix later.

      When each batch has been created, I remove the one empty sheet that was created for each batch, and save out the merged Excel file.  When you're done, you'll have three large Excel files.
  3. Create a new Google Sheets file.  Import each of the batch Excel files you created above as follows:
    1. Select File → Import and use Upload to pull in the next batch file.  Use the "insert new sheets" option when it asks how you want to import the file.
    2. You'll have one empty sheet in addition to all the sheets you imported.  Delete it if you're a purist.
    3. You'll have two shortened sheet names: ramAssemblyLineTypeDetailPerCategory and ramAssemblyLineTypeDetailPerGroup.  Rename those if you're a purist.
  4. (Optional): if you want to add the Google App Script (GAS) that provides the IMPORTRANGE URL, or the cell count then:
    1. Select Tools → Script Editor... on your sheet.  This pulls up the GAS editor.
    2. Replace the default script with the following:

      function onOpen() {
        SpreadsheetApp.getUi().createMenu('SDE Tools')
        .addItem('Get Import URL', 'getImportURL')
        .addItem('Count cells...', 'countCells')
        .addToUi();
      };

      function getImportURL() {
        var url = SpreadsheetApp.getActiveSpreadsheet().getUrl();
        showInfo("IMPORTRANGE URL: " + url);
      }

      function countCells() {
        var sheets = SpreadsheetApp.getActiveSpreadsheet().getSheets();
        var cellCount = 0;
        for (var i = 0; i < sheets.length; i++) {
          cellCount += sheets[i].getLastRow() * sheets[i].getLastColumn();
        }
        showInfo("Total cell count: " + cellCount);
      }

      function showInfo(msg) {
        var ui = SpreadsheetApp.getUi();
        ui.alert(msg);
      }

      Save with Ctrl-S.  You'll either have run the "onOpen" method at least once, or reload your sheet to get the menu to appear.
    3.  
That's it!  You're done!  You'll have to make the file shareable if you want to allow others to use it.





19 comments:

  1. There are lots of information about latest technology and how to get trained in them, like Big Data Course in Chennai have spread around the web, but this is a unique one according to me. The strategy you have updated here will make me to get trained in future technologies. By the way you are running a great blog. Thanks for sharing this.

    ReplyDelete
  2. A table is the basic unit of data storage in an oracle database. The table of a database hold all of the user accesible data. Table data is stored in rows and columns. But what is all about the clusters and how to handle it using oracle database system? Expecting a right answer from you. By the way you are maintaining a great blog. Thanks for sharing this in here.
    Oracle Training in Chennai | Oracle Course in Chennai | Oracle Training Center in Chennai

    ReplyDelete
  3. I am reading your post from the beginning, it was so interesting to read & I feel thanks to you for posting such a good blog, keep updates regularly.
    Regards,
    sas training in Chennai|sas course in Chennai|sas training institute in Chennai

    ReplyDelete
  4. Thanks Admin for sharing such a useful post, I hope it’s useful to many individuals for developing their skill to get good career.
    Regards,
    Informatica training in chennai|Informatica training center in Chennai|Informatica training chennai

    ReplyDelete
  5. Excellant post!!!. The strategy you have posted on this technology helped me to get into the next level and had lot of information in it.
    SAS Training in Chennai | SAS Course in Chennai

    ReplyDelete
  6. Well Said, you have furnished the right information that will be useful to anyone at all time. Thanks for sharing your Ideas.
    Python Training in Chennai | Python Course in Chennai

    ReplyDelete
  7. Nice Post! It is really interesting to read from the beginning & I would like to share your blog to my circles, keep your blog as updated.
    Regards,
    JAVA Training in Chennai|JAVA Course in Chennai|PHP Training in Chennai

    ReplyDelete

  8. The strategy you posted was nice. The people who want to shift their career to the IT sector then it is the right option to go with the ethical hacking course.
    Ethical hacking course in Chennai | Ethical hacking training in chennai

    ReplyDelete
  9. Thanks for sharing this informative content which provided me the required information about the latest technology.
    Salesforce training in Chennai | Salesforce CRM training in Chennai

    ReplyDelete
  10. In coming years, cloud computing is going to rule the world. The cloud based CRM tool provider like Salesforce have massive demand in the market. Thus talking salesforce training in Chennai from reputed Salesforce training institutes in Chennai will ensure bright career prospects for aspiring professionals.

    ReplyDelete
  11. Thank you so much I am also looking for tool similar like you have shared here. This is a great way to make people understand tools. I am using google sheet add ons to explore more features of google sheet and it helps me alot.

    ReplyDelete
  12. It is amazing and wonderful to visit your site.Thanks for sharing this information,this is useful to me...
    Android Training in Chennai
    Ios Training in Chennai

    ReplyDelete
  13. This article offers some points to consider when choosing data storage solutions designed for growing small to medium sized businesses.Self Storage

    ReplyDelete
  14. I simply wanted to write down a quick word to say thanks to you for those wonderful tips and hints you are showing on this site.
    sas training in bangalore

    ReplyDelete
  15. My partner and I stumbled over here different website and thought I might as well check things out. I like what I see so now I’m following you. Look forward to checking out your web page repeatedly.
    big-data-hadoop-training-institute-in-bangalore

    ReplyDelete
  16. I believe there are many more pleasurable opportunities ahead for individuals that looked at your site.

    Best Hadoop Training in Chennai

    ReplyDelete
  17. Thanks a lot very much for the high your blog post quality and results-oriented help. I won’t think twice to endorse to anybody who wants and needs support about this area.
    datascience training in chennai

    ReplyDelete