Cross System Data View Using Python and Smartsheet

Published on 14 March 2014

Smartsheet Data Tracker is a command line Python application that updates an existing sheet with data from multiple external sources (essentially, a one-way sync). It uses JSON config files for connection settings as well as the data mappings between Smartsheet and those external sources.

The application uses a lookup value from a sheet to search for matching data in the external sources, returning data related to the matches found.  This simple diagram illustrates a basic CSV-to-Smartsheet example:

[Diagram: Data Tracker Mappings Outline]

Connecting disconnected systems

Here is the problem Data Tracker aims to solve. Nearly every organization uses multiple software systems to accomplish various aspects of work, and coordinating data between those systems quickly becomes untenable. A number of our customers have reached out to us recently wondering whether we can help them tame this problem - by using Smartsheet.

It turns out the Smartsheet interface is great for coordinating work not only between users, but also between multiple systems. It lets you add up to 100 columns to a sheet to track all sorts of business-specific information like type, status, cost, owner, etc. So, why not use this to create an at-a-glance view of what's going on across multiple external systems?

Many of these vital systems - such as ERP systems, databases, or employee directories - sit behind firewalls and are not accessible via the Internet. This constraint led us toward a solution that runs on a machine inside your organization's network, rather than as an application on the public Internet.

With this in mind, we picked a language that could easily be called from the command line, was modular by nature to support an extensible architecture able to handle a variety of connector types, and provided a straightforward way to add additional connectors as needed. Python felt like a good fit for a number of reasons, not the least of which was its popularity as a systems scripting language.

Data Tracker solves this problem with two major functions. First, the application connects to a variety of different sources to retrieve data. Then, it maps that data to the relevant columns in the sheet. These two functions - connecting to sources and mapping the data - are configured by a pair of JSON files, `sources.json` and `mapping.json` respectively. Both files are found in the settings directory and must be configured by the customer.

Sources

Every external system requires an entry in `sources.json`. Each source entry contains the settings required to connect and pull data from that source. For instance, a MySQL source may look like this:

{
    "sourceId": "productDB",
    "connectorClassName": "MySQLCon",
    "dbServer": "localhost",
    "dbUser": "root",
    "dbPassword": "root",
    "dbName": "dvDB",
    "lookupQuery": "SELECT sku,name,description,price,quantity FROM product WHERE sku = %s",
    "isStrict": false
}

Two settings are required for each source: sourceId, an arbitrary name you give the source, which must be unique; and connectorClassName, the name of the Python class used to connect and retrieve data for this source. You will need to use one of the connector classes shipped with Data Tracker, or create your own. The rest of the settings vary and are typically unique to the source; each source can have any number of settings.
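For reference, here is what a directory-service source along the lines of the `openldap` source used in the mapping example below might look like. This is only an illustrative sketch: the class name OpenLDAPCon and the LDAP-specific setting names are assumptions made for this example; only sourceId and connectorClassName are attributes Data Tracker itself requires:

{
    "sourceId": "openldap",
    "connectorClassName": "OpenLDAPCon",
    "ldapServer": "ldap://directory.example.com",
    "baseDN": "ou=people,dc=example,dc=com",
    "searchFilter": "mail",
    "isStrict": false
}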

Mappings

The magic of Data Tracker is in the mappings. Each mapping pairs a sheet with a list of sources, and each source has a sourceId, a lookupMapping, and a list of outputMappings. The lookupMapping specifies which column in the sheet is used to look up a record in the source; the outputMappings specify which field(s) in that record will be used to update the cell(s) in a row. Each mapping entry contains a sourceKey, which is where the data comes from, and a sheetColumn setting, which is where in the sheet the data is updated:

{
    "sheetId": 0000000000000000,
    "sources": [
        {
            "sourceId": "openldap",
            "lookupMapping": {"sourceKey": "mail", "sheetColumn": "Assignee Email"},
            "outputMappings": [
                {"sourceKey": "roomNumber", "sheetColumn": "Assignee Location"},
                {"sourceKey": "telephoneNumber", "sheetColumn": "Assignee Phone"}
            ]
        }
    ]
}

This sourceKey setup is pretty versatile because it lets us support both attribute names (as in an attribute returned by a directory service or an API) and array indexes (to navigate any array-based source, such as a CSV file).
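To illustrate the array-index form, a mapping against a CSV-backed source might reference columns by position instead of by name. The sourceId, column positions, and sheet column names below are hypothetical:

{
    "sourceId": "employeeCSV",
    "lookupMapping": {"sourceKey": 0, "sheetColumn": "Employee ID"},
    "outputMappings": [
        {"sourceKey": 2, "sheetColumn": "Department"},
        {"sourceKey": 3, "sheetColumn": "Office Phone"}
    ]
}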

Extending Data Tracker

Additional connectors can be created to support any data source, public or private. To add a new connector, create a connector Python class in the `connectors` directory.  The connector class should follow the same structure as the other connector classes. Namely, the new class should include the following function signatures:

def __init__(self, sourceConfig):  # validates the sourceConfig
def findSourceMatch(self, lookupVal, lookupKey):  # queries source and returns matchingRecord
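As a rough sketch of what a new connector could look like, here is a minimal class that looks records up in a CSV file by column index. The class name, its settings, and its return behavior are assumptions made for illustration; the connector classes shipped with Data Tracker remain the authoritative reference for the exact contract:

import csv


class SimpleCSVCon:
    """Illustrative connector that looks up rows in a CSV file by column index."""

    def __init__(self, sourceConfig):
        # validate that the settings this connector needs are present
        for required in ("sourceId", "connectorClassName", "fileName"):
            if required not in sourceConfig:
                raise ValueError("Missing source setting: {0}".format(required))
        self.sourceConfig = sourceConfig

    def findSourceMatch(self, lookupVal, lookupKey):
        # query the source: scan the CSV file for the first row whose
        # lookupKey column equals lookupVal, and return it as the matching record
        matchingRecord = []
        with open(self.sourceConfig["fileName"]) as csvFile:
            for row in csv.reader(csvFile):
                if len(row) > lookupKey and row[lookupKey] == lookupVal:
                    matchingRecord = row
                    break
        return matchingRecord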

To use a new source in the Data Tracker application, a matching sourceConfig entry in the `sources.json` file will need to be created for the class. Each sourceConfig node must have a unique sourceId attribute, as well as a connectorClassName attribute that is set to the name of the new connector class.
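Continuing the illustrative connector sketched above, its hypothetical entry in `sources.json` might look like this (the file name and sourceId are placeholders):

{
    "sourceId": "employeeCSV",
    "connectorClassName": "SimpleCSVCon",
    "fileName": "employees.csv",
    "isStrict": false
}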

For further instructions on how to install, configure, and extend Data Tracker, please refer to the project documentation.
