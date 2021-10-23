What is web scraping and how does it happen?

cpadmin, October 23, 2021

By using the internet and social networks in our daily lives, we leave a lot of our information publicly available online without realizing it. It is easy to forget that in the early 2000s it seemed insane to use your real name in a forum, while in 2021 it is common to find a user's full name and home city in an Instagram bio.


While this normalization of data sharing has completely changed society, transforming the internet into something useful and effective, it has also increased people’s exposure. Many criminals take advantage of this exposure to carry out scams, using a method called web scraping.

Web scraping, also known as web data extraction, is the name given to the process of collecting structured data from the web in an automated way. In general, this method is used by people, businesses and, worryingly, criminals who want to use the vast amount of publicly available web data to make smarter decisions or to commit crimes.


How data scraping is performed

(Image: Reproduction/Apify)

The basic scraping process is, in fact, performed daily by most of the world’s population whenever someone copies information from a website and pastes it somewhere else. The difference is that web scraping does this on a massive scale and with intelligent automation, extracting millions of data points from web pages.

Web scraping is performed using two tools: the web crawler and the web scraper.

The crawler, popularly called a “spider”, is an automated program that browses the internet looking for and indexing content. After that, the scraper, which is a specialized tool, goes through the content indexed by the “spider” and quickly and accurately extracts the data that match the locators defined for the web page, such as CSS selectors and regular expressions (regex), among others.
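The division of labor between the two tools can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `PAGES` dictionary stands in for a tiny website held in memory (a real crawler would fetch pages over HTTP), the `example.test` addresses and the profile data are made up, and the names `crawl`, `scrape` and `run` are hypothetical. The locators here are regular expressions. Production scrapers typically use a proper HTML parser with CSS selectors instead.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "website" (URL -> HTML). A real crawler would
# download these pages over the network instead.
PAGES = {
    "https://example.test/": (
        '<a href="/users/ana">Ana</a> <a href="/users/bob">Bob</a>'
    ),
    "https://example.test/users/ana": (
        '<h1 class="name">Ana Souza</h1><p class="city">Curitiba</p>'
        '<a href="/">home</a>'
    ),
    "https://example.test/users/bob": (
        '<h1 class="name">Bob Lima</h1><p class="city">Recife</p>'
        "<p>bob@example.test</p>"
    ),
}


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start_url, pages):
    """Crawler: breadth-first walk over links, visiting each page once."""
    seen, queue, visited = {start_url}, [start_url], []
    while queue:
        url = queue.pop(0)
        html = pages.get(url)
        if html is None:  # link points outside the known pages
            continue
        visited.append(url)
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            target = urljoin(url, href)  # resolve relative links
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited


# Locators: regular expressions describing where each field lives.
NAME_RE = re.compile(r'<h1 class="name">(.*?)</h1>')
CITY_RE = re.compile(r'<p class="city">(.*?)</p>')
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def scrape(html):
    """Scraper: pull structured fields out of one page, or None if no profile."""
    name = NAME_RE.search(html)
    if name is None:
        return None
    city = CITY_RE.search(html)
    email = EMAIL_RE.search(html)
    return {
        "name": name.group(1),
        "city": city.group(1) if city else None,
        "email": email.group(0) if email else None,
    }


def run(start_url="https://example.test/", pages=PAGES):
    """Crawl from start_url, then scrape every page the crawler found."""
    records = []
    for url in crawl(start_url, pages):
        record = scrape(pages[url])
        if record is not None:
            records.append(record)
    return records


if __name__ == "__main__":
    for row in run():
        print(row)
```

Even in this toy version, the crawler only needs a starting address to discover every profile page, and the scraper turns loosely formatted pages into uniform records. That is why publicly visible profile fields are so easy to collect in bulk.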

The dangers of data scraping

For companies, data scraping serves as a tool to, for example, better target advertising campaigns. Information taken from websites or from responses to online polls makes it possible to discover the interests or occupations of many people, opening the door to more effective marketing. In the hands of criminals, however, this same information can be used to cause harm or simply sold for profit.

But the ease of data scraping is worrying, as explained by Cecilia Pastorino, researcher at ESET Latin America:

“Data scraping is a technique for extracting information from websites in bulk and through automated scripts. This technique is used for indexing websites or analyzing data from different pages and has become very popular in some digital marketing actions such as improving web positioning or obtaining metrics. [This popularity] makes many of the scraping tools available on the Internet and very easy to use.”

Post on a cybercriminal forum announcing the data. (Image: Reproduction/Privacy Affairs)

A recent example was reported on October 4th, the day Facebook services suffered a “blackout” and were unavailable for more than 6 hours. On that day, a post was found in a well-known forum for the sale of stolen data, claiming to offer information such as the name, e-mail, location, gender and telephone number of more than 1.5 billion users of the social network created by Mark Zuckerberg. To date there is no confirmation of whether this leak was real, as the sales topic was deleted from the forum in question on October 6th.

If it is real, however, the availability of information such as phone number and location would already compromise the overall security of affected users, and data such as name and e-mail can be used to more easily target victims of data hijacking (ransomware), phishing, pharming, and social engineering scams.

The best way to prevent personal data from being scraped, besides not making it available on the internet at all, is to avoid leaving social network profiles, whether on Facebook, Instagram or Twitter, fully public. It is also advisable to avoid polls offered by unknown companies or developers, as there is a high chance they are part of criminal schemes.

Source: GoCache
