As soon as I joined Alvin as a Data Engineer I was given the task of building an integration with Mode, a collaborative data platform which we’re seeing get some rapid adoption. It was the first time I worked with core software engineering and it has been a really nice journey so far with lots of learning, accountability and building stuff.

As a former civil engineer, building things is something I have always enjoyed. Now instead of construction, I am helping building a really amazing data product with an awesome and talented team at Alvin 🥳

It is not news that the data space has been getting a lot of attention (and tools!) in recent years. At Alvin, we are building a product that helps improve the quality and usability of data, offering features such as impact analysis (what happens if I delete this column?), data lineage, problem tracing and usage analytics, all within a lightweight data catalog.

To achieve this, we are integrating with the most prominent data tools. We currently have integrations with data stores such as RedShift, BigQuery and Snowflake, BI tools such as Looker and Tableau, data transformation frameworks such as DBT, and data orchestration platforms such as Airflow.

To prioritize our integrations, we always listen to the data community, and this time they shouted: Mode!

How did we integrated with Mode

As described on the Mode website : ”Mode is a modern analytics and BI solution that combines SQL, Python, R and visual analysis to answer questions faster”

Mode has 3 metadata sources which are:

- Core API (API reference in their developer website)

- Discovery API (batch API, only for enterprise plans)

- Discovery Database (provided via Secure Data Sharing by Snowflake).

To make sure the integration works, regardless of Mode plan (at the time of writing at least!), I decided to use the Core API.

We used Amundsen databuilder extractors as starting point to create our own metadata extractors, and we took inspiration from their ETL framework to load that data into our backend.

A job with a task which comprises of the extractor, transformers and loader.

Although Amundsen has some ready to use Mode extractors they do not fit our use case, so I created extractors based on their RestApiExtractor that utilises RestApiQuery to extract data.

The RestApiQuery basically works by chaining a sequence of API calls (chaining RestApiQuery objects) as we hardly get any of the data we want by making the request to a single endpoint.

To tackle this, RestApiQuery response payload is parsed using JSON expressions. This seems a bit like magic but it basically just means that we can extract values from certain paths in the response object easily. A simple example is if you have an object like {"name":"alvin"} you ask for “name” and you get “alvin” back. The extracted fields can then be used to construct the next API call/endpoint for the next RestApiQuery object — for instance supplying parameters for pagination.

For instance, let’s say we want to extract reports metadata that exists inside spaces. This would require us to retrieve data from https://app.mode.com/api/spaces/<space_token>/reports/. So, prior to getting the reports, we would have to first get the list of the available spaces in the workspace/organisation to plug the space_token in this URL.

That’s what the ModeDashboardExtractor does, which is an extractor from Amundsen. So, first it gets the spaces query building a ModePaginatedRestApiQuery object which is just the RestApiQuery with some pagination logic added to it.

ModeDashboardUtils

From the API response payload it parses the JSON to get the token (which is like the space id), name and description and builds a python dictionary object for each record in spaces.

Spaces endpoint response payload

These records are then passed to the parent RestApiQuery which is the reports query.

ModeDashboardExtractor

And then we have the the space_token (which they call dashboard_group_id) available to call the reports endpoint. For now, we don’t need any more API calls to get the data we want so with this final RestApiQuery object we build the RestAPIExtractor to then start extracting the records from Mode.

ModeDashboardExtractor

Besides building the extractors, I also built our own transformers and mappers to create our data entities model objects to load to our database (we have our customised Loader class for that!).

One thing to note here is that Mode does not provide granular usage data (i.e. individual report views) through their API, but just the accumulated view count. This isn’t much good to us as we need to know when the views took place. For example, a report could have 500 views, but if they all took place more than12 months ago, then it doesn’t have much business value anymore. I solved this by calculating the daily view count in our backend.

The main objective of this integration was to get data from Mode reports and charts, including the queries from reports so we can generate the lineage, which in turn powers our impact analysis feature.

✅ Milestone achieved: our plug and play Mode integration is up and running!

How does it look in Alvin?

Let’s check out a couple of examples of what all that metadata powers in the Alvin UI:

  • Upstream lineage: Find out which tables and columns are being used to in your reports.

  • Downstream lineage: View all reports that use a certain table or column in your data warehouse.
Lineage list for a Redshift column
Downstream column level lineage

  • Impact analysis (I love this one!): Find out which reports/charts will be affected by your changes (e.g. dropping a table or column).
Impact analysis when dropping a table

  • Powerful search: Discover your reports and charts and check usage over time. See what is popular and investigate those that have gone stale.

Wrapping Up

So that wraps up how I got on integrating Mode with Alvin. If you want to try Alvin, request access here and a crew member will reach out to set up a demo and walk you through the onboarding :)

So that’s one down, many, many more to go 😅 Feel free to let me know what you think our next integration should be in the comments or via Linkedin. If you want to know more about Alvin and our mission, check out our other posts here. Otherwise, stay tuned for my next one!

Alvin is helping data teams of all sizes meet the challenges of data quality, reliability, cost and governance.

Click get started to unleash the power of highly accurate lineage and usage data, or check out open roles if you want to join the mission.
Get started for freeOpen roles