Data Engineer

Bangalore · Full-time

The thrill of working at a start-up that is beginning to scale massively is something else.


Simpl (getsimpl.com) was founded in 2015 by Nitya Sharma, an investment banker from Wall Street, and Chaitra Chidanand, a tech executive from the Valley, who teamed up with a very clear mission: to make money simple, so that people can live well and do amazing things.


Simpl is the payment platform for the mobile-first world. We’re backed by some of the best names in fintech globally (investors behind Visa, Square, and TransferWise), and Joe Saunders, former Chairman and CEO of Visa, sits on our board.


Everyone at Simpl is an internal entrepreneur, given ample bandwidth and resources to create the next breakthrough towards the long-term vision of “making money Simpl”.


Our first product is a payment platform that lets people buy instantly, anywhere online, and pay later. In the background, Simpl uses big data for credit underwriting and risk and fraud modelling, all without any paperwork, enabling banks and non-bank financial companies to access a whole new consumer market.


Responsibilities


You will focus on ensuring data correctness and accessibility, and on building scalable systems to access and process that data. Another major responsibility is helping AI/ML engineers write better code. You will also build scalable, high-performance, data-intensive services.

Examples:

- We have data pipelines processing aggregate and statistical data. Should we store this in Redshift, in flat files in S3, or somewhere else?

- How should we structure our data pipelines?

- We need to track various data points to identify our customers across locations and devices, and determine that two seemingly disparate users are actually the same. How can we do this efficiently and effectively?
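A common approach to the identity-resolution problem in the last example is union-find (disjoint sets) over observed identifiers: each event links two identifiers, and a "user" is a connected component. A minimal sketch, with hypothetical identifier names chosen for illustration:

```python
class UnionFind:
    """Disjoint-set forest with path compression and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        # Create a singleton set the first time an identifier is seen.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        # Locate the root, then compress the path for near-O(1) lookups.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


# Each observed event links two identifiers seen in the same session.
uf = UnionFind()
uf.union("cookie:abc", "device:ios-123")
uf.union("device:ios-123", "email:a@example.com")
uf.union("cookie:xyz", "device:android-9")

# Two seemingly disparate identifiers resolve to the same user.
same = uf.find("cookie:abc") == uf.find("email:a@example.com")
```

At scale this is typically run as a batch job (e.g. iterative joins in Spark), but the underlying data structure is the same.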


Your job is to understand what we’re trying to build, make informed choices about these trade-offs, and then get us going.


Example interview questions:

- Consider the query `SELECT * FROM foo INNER JOIN bar ON foo.x = bar.x WHERE foo.primary_key = ?`. What happens if you run this in Postgres? How does that differ if you run it in Redshift or SparkSQL?
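To make this question concrete, here is the query running against toy tables in Python's built-in sqlite3 (a stand-in only; the plans in Postgres, Redshift, and SparkSQL will differ). The key observation a good answer builds on: the filter on `foo.primary_key` lets a row-store like Postgres fetch a single `foo` row via an index lookup and then join just that row against `bar`, whereas a distributed columnar engine may scan, distribute, or shuffle instead.

```python
import sqlite3

# sqlite3 is used purely to demonstrate the query's semantics;
# table contents are made-up illustration data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (primary_key INTEGER PRIMARY KEY, x INTEGER);
    CREATE TABLE bar (x INTEGER, payload TEXT);
    INSERT INTO foo VALUES (1, 10), (2, 20);
    INSERT INTO bar VALUES (10, 'a'), (10, 'b'), (20, 'c');
""")

# One foo row (primary_key = 1, x = 10) joins against every bar row
# with the same x, so the result can still contain multiple rows.
rows = conn.execute("""
    SELECT * FROM foo INNER JOIN bar ON foo.x = bar.x
    WHERE foo.primary_key = ?
""", (1,)).fetchall()
```

Running `EXPLAIN` (Postgres) or `EXPLAIN QUERY PLAN` (sqlite) on the same statement is the usual way to see how the engine actually orders the lookup and the join.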

- Suppose we store a table in flat CSV files on S3. What kinds of jobs is this good for, and bad for? How is Parquet or BerkeleyDB different?
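A toy illustration of the core trade-off, using only Python's csv module: CSV is row-oriented, so even a single-column aggregate must parse every field of every row. (The data below is made up for the example.)

```python
import csv
import io

# Build a small in-memory CSV "file" with illustration data.
raw = io.StringIO()
writer = csv.writer(raw)
writer.writerow(["user_id", "amount", "country"])
writer.writerows([[1, 120, "IN"], [2, 80, "IN"], [3, 45, "SG"]])

# Summing one column still requires a full scan that parses
# every column of every row.
raw.seek(0)
reader = csv.reader(raw)
header = next(reader)
amount_idx = header.index("amount")
total = sum(int(row[amount_idx]) for row in reader)

# A columnar format like Parquet lays each column out contiguously
# with per-column metadata, so the same sum can read only the
# "amount" bytes, compress better, and skip row groups via
# predicate pushdown. BerkeleyDB, by contrast, is a key-value store
# suited to point lookups rather than scans.
```

This is why flat CSV is fine for full-table batch jobs and cheap interchange, but poor for selective analytical queries.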

- What data structures are good for storing a graph, assuming the common query is finding a connected component?
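One reasonable sketch of an answer: store the graph as an adjacency list (here a dict mapping each node to a set of neighbours, with made-up nodes), which makes a connected-component query a plain breadth-first search in O(V + E).

```python
from collections import deque

# Adjacency list: node -> set of neighbours (illustration data).
graph = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b"},
    "d": {"e"}, "e": {"d"},
}

def connected_component(graph, start):
    """Breadth-first search from `start`; returns its component."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen
```

If the workload is many repeated component queries over a mostly-static graph, precomputing components with union-find is usually the better fit than re-running BFS.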

On these questions, we’re primarily interested in computer science fundamentals. A good answer might be “a B-Tree, with keys structured as ...”. A bad answer might be “use Neo4J, I don’t know how it works but it’s fast”.


Apply for this opening at http://getsimpl.recruiterbox.com/jobs/fk014l9?apply=true