Good morning!
Welcome to The Daily Grind for Tuesday, August 5.
Today, Perplexity was caught skirting web-crawling rules, and we read a page from Karen Hao’s excellent new book that goes inside OpenAI.
Finally, we’ll reflect a bit on what makes life worth living.
Let’s get into it:
Yesterday, internet security platform Cloudflare reported they caught Perplexity sending secret web crawlers to websites that explicitly blocked AI crawlers.
Perplexity, an AI search engine, was reportedly circumventing robots.txt instructions (the “code of conduct” for bots on websites) and Web Application Firewall (WAF) rules with a series of tricks to scrape content from sites without their permission.
We received complaints from customers who had both disallowed Perplexity crawling activity in their robots.txt files and also created WAF rules to specifically block both of Perplexity’s declared crawlers: PerplexityBot and Perplexity-User. These customers told us that Perplexity was still able to access their content even when they saw its bots successfully blocked.
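For a concrete picture of what those customers set up, here is a minimal Python sketch of how a well-behaved crawler is supposed to consult robots.txt before fetching a page. The robots.txt contents, the `is_allowed` helper, and the example.com URL are illustrative assumptions, not taken from any real site. The two bot names are the declared crawlers Cloudflare mentions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows Perplexity's two declared crawlers
# and allows every other bot. Illustrative only, not copied from a real site.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots.txt permits this user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/article"  # placeholder URL
    for agent in ("PerplexityBot", "Perplexity-User", "SomeOtherBot"):
        print(agent, is_allowed(agent, url))
```

Running it prints False for both Perplexity crawlers and True for the third bot. The catch is that robots.txt is purely voluntary: nothing in the file stops a crawler that never runs a check like this, which is why these customers added firewall rules as well.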
Cloudflare ran a series of tests to determine how Perplexity’s secret bots were accessing sites.
First, Perplexity bots would simply ignore robots.txt instructions—as long as they weren’t blocked by the firewall, they would scrape the site.
If Perplexity bots were blocked, they would switch to a secret bot that posed as an ordinary Google Chrome browser running on macOS, and use it to try to get around the firewall.
If that was unsuccessful, the secret bot would then switch IP addresses or ASNs (autonomous system numbers, which identify the networks that IP addresses belong to) to hide its crawling activity and try to get around the firewall again.
Finally, when the secret bot was blocked as well, Perplexity returned less specific and less accurate answers, which suggested it had not been able to access the site.
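To see why those tricks work against simple rules, here is a toy Python sketch of a firewall that blocks by user-agent name and by IP address. It is not how Cloudflare’s WAF works. The `Request` class, the `naive_firewall_allows` function, the user-agent strings, and the IP addresses (taken from ranges reserved for documentation) are all made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical block rules: the declared crawler names plus one known IP.
BLOCKED_AGENT_TOKENS = ("perplexitybot", "perplexity-user")
BLOCKED_IPS = {"203.0.113.10"}  # documentation-range address, illustrative


@dataclass(frozen=True)
class Request:
    user_agent: str
    ip: str


def naive_firewall_allows(req: Request) -> bool:
    """Toy rule set: block on a declared bot name or a listed IP address."""
    ua = req.user_agent.lower()
    if any(token in ua for token in BLOCKED_AGENT_TOKENS):
        return False
    return req.ip not in BLOCKED_IPS


# Illustrative browser-style string; a real one varies by version.
CHROME_ON_MAC = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)

declared = Request("PerplexityBot/1.0", "203.0.113.10")   # honest crawler
disguised = Request(CHROME_ON_MAC, "203.0.113.10")        # new name, same IP
rotated = Request(CHROME_ON_MAC, "198.51.100.7")          # new name, new IP

if __name__ == "__main__":
    for label, req in (("declared", declared),
                       ("disguised", disguised),
                       ("rotated", rotated)):
        print(label, naive_firewall_allows(req))
```

The declared bot and the disguised bot on a known IP are both blocked, but the disguised bot on a fresh IP gets through. That gap is why fingerprinting a crawler by its behavior, rather than by its name or address, matters.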
TechCrunch reached out to Perplexity for comment:
Perplexity spokesperson Jesse Dwyer dismissed Cloudflare’s blog post as a “sales pitch,” adding in an email to TechCrunch that the screenshots in the post “show that no content was accessed.” In a follow-up email, Dwyer claimed the bot named in the Cloudflare blog “isn’t even ours.”
But this is not the first time Perplexity has been accused of illegal content scraping. Last year, news outlets like The Verge claimed that Perplexity was plagiarizing their content.
Not all AI providers are bad actors. Cloudflare singles out OpenAI as a company that follows internet security best practices, such as obeying robots.txt instructions and clearly stating the purpose of each of its crawlers.
Cloudflare has taken a stance against AI web crawlers on behalf of content publishers. Last month, the company announced Content Independence Day, where they declared “no AI web crawling without compensation.”
Efforts to curb free AI web-crawling include:
Changing the default to block AI crawlers unless they pay creators for their content
Building a marketplace for content creators to get paid for high-value content per crawl
De-listing bad actors like Perplexity and fingerprinting their secret bots so they can be detected in the future
Other service providers are fighting for publishers too, like the startup TollBit, which raised $24 million last year to create a pay-per-crawl compensation system.
But those systems will be futile if AI tools circumvent the rules with secret crawlers. In the AI gold rush, some people will do anything to get ahead.
Here are more stories to explore today:
TechCrunch: A top designer was banned from Dribbble. Now he’s building his own competitor
NYT: Clay, a Sales Tool for the A.I. Era, Raises $100 Million
Axios: Patreon crosses $10 billion creator payout milestone
Worth Watching: Patreon’s Jack Conte: Death of the Follow and the Future of Creativity (SXSW 2024)
Power. Wealth. Betrayal. Anything is possible in the age of AI, and companies like Perplexity are willing to skirt the rules to win.
OpenAI is the rare good guy in Cloudflare’s research, but it’s no Garden of Eden. Karen Hao’s deeply reported book, Empire of AI, goes inside the AI centacorn ($100 billion+ startup), drawing on over 300 interviews and mounds of documents and correspondence.
Hao is also an excellent storyteller. The prologue throws the reader right into the drama:
On Friday, November 17, 2023, around noon Pacific time, Sam Altman, CEO of OpenAI, Silicon Valley’s golden boy, avatar of the generative AI revolution, logged on to a Google Meet to see four of his five board members staring at him.
From his video square, board member Ilya Sutskever, OpenAI’s chief scientist, was brief: Altman was being fired. The announcement would go out momentarily.
Altman was in his room at a luxury hotel in Las Vegas to attend the city’s first Formula One race in a generation, a star-studded affair with guests from Rihanna to David Beckham. The trip was a short reprieve in the middle of the punishing travel schedule he had maintained ever since the company released ChatGPT about a year earlier. For a moment, he was too stunned to speak. He looked away as he sought to regain his composure. As the conversation continued, he tried in his characteristic way to smooth things over.
“How can I help?” he asked.
The board told him to support the interim chief executive they had selected, Mira Murati, who had been serving as his chief technology officer. Altman, still confused and wondering whether this was a bad dream, acquiesced.
Minutes later, Sutskever sent another Google Meet link to Greg Brockman, OpenAI’s president and a close ally to Altman who had been the only board member missing from the previous meeting. Sutskever told Brockman he would no longer be on the board but would retain his role at the company.
The public announcement went up soon thereafter. “Mr. Altman’s departure follows a deliberative review process by the board, which concluded that he was not consistently candid in his communications with the board, hindering its ability to exercise its responsibilities. The board no longer has confidence in his ability to continue leading OpenAI.”
Empire of AI was recommended to me by Danny Goodman. It’s worth a read if you want to better understand the current AI gold rush.
It’s easy to get caught up in the rush of busyness, work, family, and life. I sometimes find myself thinking, “I just have to get through this hectic time and then everything will be better.”
The truth is, that future state of calm is never going to come. Instead, we need to embrace and appreciate the moment as it is.
Imagine looking in a mirror, but instead of seeing your current self, you see the 80-year-old version of you. Your hair is gray and thin, your face is wrinkly, your nose and ears are bigger (since they never stop growing!), and you are sitting in a wheelchair.
From that perspective, think back to your life now—your present day: What do you cherish most from this time in your life?
What will you look back on with a wistful smile, wishing you could relive for just one more moment?
Once you know the answer, be sure to pay extra special attention to that thing today. Because, as they say, these are the good ol’ days, and you don’t want to miss them.
That’s it for today’s Daily Grind! Thank you as always for reading.
Please rate this newsletter so I can continue making it better for you:
Talk to you tomorrow!
Cheers,
Ben