Tomas Montvilas, the Chief Commercial Officer at Oxylabs, shares his thoughts on how eCommerce businesses can harness web scraping, AI and ML to enhance their business.
Tomas will join Big Data & AI World on 8-9 March at ExCeL London to further explore how using real-time public data can deliver competitive advantages in eCommerce.
Q: The challenges of managing, processing, and analysing an enormous amount of internal company data will be a recurring topic at Big Data and AI World 2023. Employing external data adds yet another layer of complexity to these processes. Why is it important for companies to utilise external data, and how can they extract value from it?
Many companies rely on CRMs, software logs, and other traditional internal sources, or on more advanced internal data such as first- and zero-party data gathered directly from consumers. However, a growing number of businesses are trying to harness the power of big data scattered all over the internet, as it offers an alternative way to augment decision-making, optimise commercial processes, and outperform competitors.
The COVID-19 pandemic accelerated the digitalisation of various industries and the proliferation of data as a service (DaaS) companies.
Previously, extracting public web data was a luxury of big companies with highly specialised teams dedicated to crawling and scraping tasks. Today, external data is way more accessible as a number of DaaS providers offer data gathering infrastructure or the data itself.
In one of my articles, I called it a ‘revolution’, and it is nothing less than that. Data is the new oil in the global market – the alternative data industry is already worth almost $3 billion, and its value is growing. Big data has been a hyped topic for many years now, but it was web scraping that actually made it possible to dig out and utilise that data.
Though many industries are still only scratching the surface of it, web scraping already hosts a lot of use cases, including market intelligence, cybersecurity, ad verification, contextual advertising, SEO monitoring, extracting alternative financial data for investment decisions, and gathering information for academic or investigative research. Non-traditional external data can benefit almost any digital business trying to scale its operations or do some growth hacking.
Q: In your presentation, you are going to focus on how eCommerce businesses can gain a competitive advantage using real-time public data. Why is it exceptional, and how do these companies utilise web scraping?
Pure eCommerce, retail, and digital marketing intelligence companies were the early adopters of web scraping technology and, today, are the most active and experienced in this field. In these industries, web scraping is already a mainstream activity.
For eCommerce companies, public web data is vital to understand market trends, consumer behaviour, and competitor strategies. Customer sentiment, real-time pricing, product listings, categories, keywords, stock fluctuations – all this data can be collected automatically through web scraping.
Price intelligence and benchmarking is the most prevalent use case; it can range from tracking specific products’ pricing trends to implementing dynamic pricing according to supply limitations or real-time competitor price matching. Employing web scraping can help solve a common dynamic pricing issue – “race-to-the-bottom” price matching – by providing more context and insights.
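To make the "race-to-the-bottom" point concrete, here is a minimal sketch of a price-matching rule that uses extra scraped context. It rests on assumptions that are not from the interview: the `Offer` type, the `suggest_price` function, and the choices to ignore out-of-stock competitor offers and to enforce a cost-based floor are all hypothetical illustrations, not a description of how any particular retailer or Oxylabs product works.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """A competitor offer as it might come out of a scraper (hypothetical shape)."""
    price: float
    in_stock: bool


def suggest_price(current_price: float, unit_cost: float,
                  min_markup: float, offers: list[Offer]) -> float:
    """Match the cheapest in-stock competitor, but never go below a cost floor."""
    # Floor: unit cost plus the smallest markup we are willing to accept.
    floor = round(unit_cost * (1 + min_markup), 2)

    # Only competitors who can actually ship the product are worth matching.
    live_prices = [offer.price for offer in offers if offer.in_stock]
    if not live_prices:
        # Nobody to match: keep the current price, subject to the floor.
        return max(current_price, floor)

    return max(min(live_prices), floor)
```

With a unit cost of 20.00 and a 10% minimum markup, an in-stock competitor at 21.00 would not drag the suggested price below 22.00, and an out-of-stock competitor at 15.00 would be ignored altogether.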
Another popular application is optimising product offerings by scraping product-related information (such as existing stock) and identifying assortment gaps. Web scraping can also provide eCommerce companies with data on competitor performance, such as detailed product and service descriptions, shipping policies, inventory information, brand mentions, and keyword rankings.
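Identifying assortment gaps can be as simple as a per-category set difference once both catalogues are keyed by a shared product identifier. The sketch below assumes such an identifier exists (for example a barcode number) and that the scraped data has already been grouped by category; the function name and data layout are hypothetical.

```python
def find_assortment_gaps(own: dict[str, set[str]],
                         competitor: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return, per category, the products a competitor lists that we do not."""
    gaps: dict[str, set[str]] = {}
    for category, products in competitor.items():
        # Products in the competitor's category that are missing from ours.
        missing = products - own.get(category, set())
        if missing:
            gaps[category] = missing
    return gaps


own_catalogue = {"laptops": {"A", "B"}}
competitor_catalogue = {"laptops": {"A", "C"}, "tablets": {"T1"}}
gaps = find_assortment_gaps(own_catalogue, competitor_catalogue)
```

In practice, the hard part is usually matching products across sites, since identifiers and naming rarely line up cleanly; the set difference itself is the easy final step.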
Q: How are AI and ML contributing to the eCommerce sector?
AI and its subset, machine learning (ML), are probably the single main driver behind the developments in data science. Scraping is no exception, as it has a lot of complex parts that can be made significantly easier with ML. Oxylabs has already integrated ML in its newest product – Web Unblocker – and in a recent Scraper APIs feature, the Adaptive Parser.
ML-powered proxy solutions can imitate human browsing, making it much easier to bypass even the most advanced anti-botting systems. ML models can also help with the parsing of extracted data. Parsing is really time-consuming as it requires writing custom parsers (sometimes a few for the same page) for every eCommerce site one is scraping.
Adaptive ML models can handle this part and transform otherwise messy, hard-to-read HTML into structured, human-readable JSON.
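To illustrate why hand-written parsers are a burden, here is a minimal sketch of a custom parser for a single, made-up page layout, using only Python's standard library. The class names `product-title` and `product-price` are hypothetical and would differ on every site, which is exactly why such parsers must be written and maintained per site. This is not how the Adaptive Parser works; it shows the manual approach that ML-based parsing aims to replace.

```python
import json
from html.parser import HTMLParser


class ProductParser(HTMLParser):
    """Collect text from elements whose class attribute maps to a known field."""

    # Site-specific mapping from CSS class to output field (hypothetical names).
    FIELDS = {"product-title": "title", "product-price": "price"}

    def __init__(self) -> None:
        super().__init__()
        self.record: dict[str, str] = {}
        self._current: str | None = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        self._current = self.FIELDS.get(css_class)

    def handle_data(self, data):
        if self._current and data.strip():
            self.record[self._current] = data.strip()

    def handle_endtag(self, tag):
        self._current = None


def parse_product(html: str) -> str:
    """Turn one product snippet into a JSON string."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return json.dumps(parser.record, sort_keys=True)


page = ('<div><h1 class="product-title">Desk Lamp</h1>'
        '<span class="product-price">19.99</span></div>')
result = parse_product(page)
```

If the site renames a class or restructures the page, the `FIELDS` mapping silently stops matching and the output comes back empty, so each site needs its own parser and ongoing upkeep.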
It is worth noting that, for eCommerce companies, AI does not only ease the extraction part but also assists with data analysis. Advancements in deep learning algorithms, NLP, and semantic contextualisation open completely new possibilities in sentiment analysis, brand monitoring, and promotional targeting.
Oxylabs will continue to invest heavily in AI and ML-related research and development efforts, as this is one of the properties that distinguish us from our competitors. I believe that AI will play a massive role in the coming years by making web scraping not just more efficient but more accessible to everyone.
Q: What are the main data scraping challenges today, and how do you see the future of Oxylabs and the broader industry fitting into this landscape?
The web scraping industry has only started to show its potential.
Besides today’s popular use cases that I have discussed – pricing intelligence, market research, travel fare aggregation, and consumer sentiment tracking – it has other applications with huge social value. Examples include cybersecurity testing, detecting brand counterfeiting and illegal or malicious content, and obtaining publicly available information for investigative journalism. It can go as far as tracking worrying lobbying cases or monitoring extremist groups online.
Since data scraping is a relatively young industry, its most pressing challenge is legal ambiguity. Proxies and bots have a shaky reputation because, in the past, they were misused by irresponsible people. To bring web scraping out of the shadows, Oxylabs is pushing for industry-wide ethical standards and a code of conduct. I believe complete public legitimacy will be achieved through developing partnerships with government institutions and academia.
Educating the market on ethical web scraping practices will remain one of our main goals in the near future. That is why Oxylabs is actively working on various pro bono initiatives. For example, we have developed an AI-driven scraper that identifies harmful and illegal content on the internet, aiding Lithuanian government institutions in their work.
Data scraping is not just something businesses can exploit to grow their profits; it can contribute to the common good.
Another web scraping challenge is accessibility. Today, web scraping is mainly employed by developers and data scientists, and it is still too technically complex for broader usage. Still, I believe this challenge will soon be left in the past, as AI and ML should enable a faster proliferation of low-code and no-code tools for big data gathering.
Without a doubt, technological innovation will continue to be at the forefront of Oxylabs. We are working closely with our AI and ML Advisory Board to advance the proxy industry through state-of-the-art ML and deep-learning technologies that enable effective, intentional, and insightful use of the immense volumes of public web data.
Oxylabs helps businesses of various sizes gather publicly available web data on a large scale. It can be anything from competitor real-time pricing to consumer reviews, and such external data is extremely useful for generating competitive business insights.
Today, Oxylabs counts medium-sized companies, startups, and Fortune Global 500 companies among its clients. A typical Oxylabs customer is a large international company that must collect and analyse big data to enhance its operations and commercial offerings.