Reverse Ads is a digital advertising platform that offers an alternative to traditional search ads. It provides cookie-less tracking solutions based on client-side, first-party data, in compliance with all major data protection regulations, such as GDPR, PDPA, CASL, and CCPA.
The Client initially approached Nobel Link with a request to build a machine learning algorithm that, for a given set of keywords, finds the most relevant users based on the websites they have previously visited.
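One common way to frame this task is to represent each visited page and the campaign keywords as vectors, describe each user by the combined vectors of the pages they visited, and rank users by cosine similarity to the keyword vector. This is an illustrative sketch only, not the algorithm Nobel Link built: the bag-of-words `embed` function is a hypothetical stand-in for a real text-embedding model, and `rank_users`, `cosine`, and the sample data are likewise hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def embed(text: str) -> Counter:
    # Hypothetical stand-in for a real text-embedding model:
    # a bag-of-words vector of lower-cased token counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse vectors; 0.0 if either is empty.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_users(keywords: List[str], visits: Dict[str, List[str]],
               pages: Dict[str, str], top_k: int) -> List[str]:
    """Return the top_k user IDs whose browsing best matches the keywords."""
    query = embed(" ".join(keywords))
    scores: Dict[str, float] = {}
    for user_id, urls in visits.items():
        profile: Counter = Counter()
        for url in urls:
            if url in pages:  # pages not yet downloaded contribute nothing
                profile += embed(pages[url])
        # Summing page vectors gives the same cosine score as averaging them.
        scores[user_id] = cosine(query, profile)
    # Highest score first; ties broken by user ID for a stable order.
    return sorted(scores, key=lambda u: (-scores[u], u))[:top_k]


if __name__ == "__main__":
    pages = {
        "a.com": "running shoes marathon training",
        "b.com": "chocolate cake recipe",
        "c.com": "trail running shoes review",
    }
    visits = {"u1": ["a.com", "c.com"], "u2": ["b.com"], "u3": ["c.com", "b.com"]}
    print(rank_users(["running", "shoes"], visits, pages, top_k=2))  # ['u1', 'u3']
```

Swapping `embed` for a real embedding model changes only the vector representation; the ranking step stays the same.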

Scalable system to process data, download web pages with access restrictions, and build groups of relevant users for campaigns

The Nobel Link team was challenged with developing a robust NLP semantic analysis solution that could download and analyze millions of web pages visited by users. The solution had to quickly identify the user segments most relevant to the keywords specified for advertising campaigns and deliver them to the Client’s advertising platform.
Building a targeted advertising system for efficient ad campaign optimization and semantic text analysis required solving many technical challenges, including:

  • Scalable processing of large volumes of user data arriving hourly. The estimated monthly volume is ~50 TB, or about a billion events.
  • Identifying the highest-priority web pages to download. Since downloading is slow and expensive, our engineers had to keep the number of downloaded pages to a minimum.
  • Downloading web pages. This required building a reliable system that could download pages from different regions and with different access restrictions.
  • Updating segments in real time. Because of the data volume, a relational database could not keep up within the required time.
  • Calculating embeddings for downloaded pages. The main task was to balance computation speed against embedding quality.
  • Maintenance costs. The Client wanted a system with limited monthly maintenance costs.
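The page-prioritization challenge can be illustrated with a simple heuristic, assumed here purely for illustration since the source does not say how pages were actually ranked: skip pages that are already downloaded, rank the rest by the number of distinct users who visited them, and stop at a fixed download budget. The function name `prioritize_pages` and the `budget` parameter are hypothetical.

```python
import heapq
from typing import Dict, Iterable, List, Set, Tuple


def prioritize_pages(visits: Iterable[Tuple[str, str]],
                     downloaded: Set[str], budget: int) -> List[str]:
    """Pick up to `budget` not-yet-downloaded URLs, most distinct visitors first."""
    visitors: Dict[str, Set[str]] = {}
    for user_id, url in visits:
        if url not in downloaded:  # never pay to download the same page twice
            visitors.setdefault(url, set()).add(user_id)
    # nlargest selects the top `budget` URLs without fully sorting every
    # candidate; ties keep first-seen order.
    return heapq.nlargest(budget, visitors, key=lambda url: len(visitors[url]))
```

Counting distinct users rather than raw visits keeps a single heavy user from pushing their own pages to the front of the download queue.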

Stand-alone system for user data analysis, web page prioritization & semantic analysis

The task was to build a stand-alone system using the AWS cloud environment and several open-source products, communicating with the Client’s advertising platform via an API.
We thoroughly analyzed the Client’s needs and requirements and split the project into three phases:

  • Proof of Concept (PoC) – verify that machine learning can be used for targeted advertising
  • Minimum Viable Product (MVP) – build the end-to-end data pipeline, from processing user data to building user segments
  • Scalable system – make the solution scalable and connect it to the Client’s advertising platform via an API

Here are some technical details of the project:

  • Input user data and output user segments were stored in public and private AWS S3 buckets
  • To process big data faster than it arrives, we used caching to filter records of user visits
  • To make the system more resilient, we loosely coupled its components using AWS SQS message queuing
  • To solve the problem of slow segment updates, we stored segments in Amazon Elastic Block Store (EBS) instead of RDS
  • We made several parts of the system scalable: processing historical user data, downloading web pages, and calculating embeddings
  • To speed up the calculation of embeddings on EC2 instances, we wrote our own inference code in SageMaker and used the AWS auto-scaling option
  • The Client API was implemented with AWS Lambda functions
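The source does not describe the caching scheme in detail. As an assumption only, one way caching can reduce the volume of visit records is a bounded least-recently-used (LRU) cache of recently seen (user, URL) pairs, so repeated visits are dropped before they reach the more expensive pipeline stages. `Visit`, `LRUSet`, and `filter_visits` are hypothetical names, and this sketch is not the project's actual implementation.

```python
from collections import OrderedDict
from typing import Hashable, Iterable, Iterator, NamedTuple


class Visit(NamedTuple):
    user_id: str
    url: str


class LRUSet:
    """A bounded set that evicts the least recently seen key when full."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._keys: OrderedDict = OrderedDict()

    def check_and_add(self, key: Hashable) -> bool:
        """Return True if key was already cached; cache it either way."""
        if key in self._keys:
            self._keys.move_to_end(key)  # mark as most recently seen
            return True
        self._keys[key] = None
        if len(self._keys) > self.capacity:
            self._keys.popitem(last=False)  # drop the least recently seen key
        return False


def filter_visits(visits: Iterable[Visit], cache: LRUSet) -> Iterator[Visit]:
    """Yield only visits whose (user, URL) pair is not already cached."""
    for visit in visits:
        if not cache.check_and_add((visit.user_id, visit.url)):
            yield visit
```

The trade-off is memory versus accuracy: a small cache lets some duplicates through after eviction, while exact deduplication would require unbounded state.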

The Proof of Concept was completed in 2 months by a team of three: a Project Manager, a Data Scientist, and an AWS Data Engineer.
The MVP stage was completed in 3 months by a team with the same three roles. The scalable system stage was completed in 2 months by a team of four: a Project Manager, a Data Scientist, a Software Engineer, and an AWS Data Engineer.

Increased ad campaign performance

Our team built a system that lets the Client build relevant user segments for its advertising campaigns and monitor campaign performance, making those campaigns more effective. The system is scalable: the Client can increase or decrease the number of workers in different parts of the system as needed.
A real-life test conducted by the Client showed that using this system reduced advertising costs by 54%.
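Adding or removing workers can be illustrated with a backlog-per-worker rule, similar to the approach the Amazon EC2 Auto Scaling documentation describes for scaling on an Amazon SQS queue: estimate how many messages one worker can clear within an acceptable latency, divide the queue length by that number, and clamp the result to a configured range. This is a sketch under assumptions; the source does not say which scaling policy the system uses, and the function and parameter names are hypothetical.

```python
import math


def target_backlog_per_worker(acceptable_latency_s: float,
                              avg_processing_s: float) -> int:
    """Messages one worker can clear within the acceptable latency."""
    if avg_processing_s <= 0:
        raise ValueError("avg_processing_s must be positive")
    return max(1, int(acceptable_latency_s // avg_processing_s))


def desired_workers(queue_length: int, backlog_per_worker: int,
                    min_workers: int, max_workers: int) -> int:
    """Workers needed to drain the queue backlog, clamped to [min, max]."""
    needed = math.ceil(queue_length / backlog_per_worker)
    return max(min_workers, min(max_workers, needed))
```

For example, with a 60-second acceptable latency and 0.5 seconds of processing per message, each worker can hold a backlog of 120 messages, so a queue of 1,000 messages calls for 9 workers.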