Data Infrastructure Engineering

We can help you design and build the core infrastructure and pipeline capabilities necessary to keep data flowing within your organization.

Introduction

Data infrastructure engineering is the practice of assembling the distributed components, systems, and processes needed to derive value from data. Its focus is enabling data scientists and analysts to do their jobs.

Value can be derived from data in two ways:

  1. unique data products, such as content recommendation services, to be used by other product teams
  2. analytics capabilities that generate insights into customer experiences, which can then be used to improve those experiences

Services Offered

We understand that your business needs are unique, and we tailor our services to meet them. Some projects we’ve worked on:

  • scaling an extract-transform-load (ETL) pipeline to handle new volumes of data
  • adding new aggregation steps to a statistics database to make new classes of questions answerable
  • promoting the correct use of data and analytics within the organization
  • introducing tools for ad hoc analysis

Technologies

We’re comfortable with the following technologies, with a preference for open-source software:

  • Elasticsearch for real-time search and analytics
  • Amazon Redshift for batch analytics and data warehousing
  • Kafka (or Kinesis) for the messaging glue
  • Ansible for configuration management

Feel free to contact us for further information.