Are you interested in the opportunity to work remotely?
For over 10 years, this company has pioneered a remote-working strategy. While the head office is based in Ireland, it currently has a globally distributed R&D team working from over 30 countries.
The team is passionate about web crawling, web scraping, and data science, and the company is now a leader in turning web content into useful data.
You'll join a team that is constantly improving its cloud-based web crawling platform, off-the-shelf datasets, and turnkey web scraping services.
THE ROLE:
As a Senior Engineer, your primary goal will be to develop and grow a new web crawling and extraction SaaS.
The new SaaS provides an API for automated e-commerce and article extraction from web pages using Machine Learning.
This is a distributed application written in Java, Scala, and Python, whose components communicate via Apache Kafka and HTTP; it is orchestrated using Kubernetes.
You will design and implement distributed systems: building a large-scale web crawling platform, integrating Deep Learning-based web data extraction components, working on queueing algorithms and large datasets, and creating a development platform for other company departments.
As this SaaS is still in the early stages of development, you will have a large impact on the system design!
DAY TO DAY:
- Work on the core platform: develop and troubleshoot the Kafka-based distributed application, and write and modify components implemented in Java, Scala, and Python.
- Work on new features, from design through implementation. You will be responsible for the complete lifecycle of your features and code.
- Solve distributed-systems problems such as scalability, transparency, failure handling, security, and multi-tenancy.
WHAT WE LOVE TO SEE:
- 3+ years of experience building large-scale data processing systems or high-load services.
- A strong background in algorithms and data structures.
- A strong track record in at least two of these technologies: Java, Scala, Python.
- 3+ years of experience with at least one of them.
- Experience working with Linux and Docker.
BONUS POINTS FOR (or your opportunity to learn!):
- Kubernetes experience.
- Apache Kafka experience.
- Experience building event-driven architectures.
- An understanding of web browser internals.
- Good knowledge of at least one RDBMS.
- Knowledge of today's cloud provider offerings: GCP, Amazon AWS, etc.
- Web data extraction experience: web crawling, web scraping.
- Experience with web data processing tasks: finding similar items, mining data streams, link analysis, etc.