Trending Spy News

Google confirms it’s training Bard on scraped web data, too

A trippy graphic displaying a collection of items like paintbrushes, books, phone messages, and a notepad to represent generative AI. A large pair of eyes and hands can be seen at the center of the image.
Whatever content is publicly available on the web, Google has given itself permission to use it to train AI. | Illustration by Haein Jeong / The Verge

On Monday, Gizmodo spotted that the search giant updated its privacy policy to disclose that its various AI services, such as Bard and Cloud AI, may be trained on public data that the company has scraped from the web.

“Our privacy policy has long been transparent that Google uses publicly available information from the open web to train language models for services like Google Translate,” said Google spokesperson Christa Muldoon to The Verge. “This latest update simply clarifies that newer services like Bard are also included. We incorporate privacy principles and safeguards into the development of our AI technologies, in line with our AI Principles.”

A screenshot taken from Google’s privacy policy
Image: Google
These are the most recent changes to Google’s privacy policy. The company is now openly admitting to where your data is being used at least…

Following the update on July 1st, 2023, Google’s privacy policy now says that “Google uses information to improve our services and to develop new products, features, and technologies that benefit our users and the public” and that the company may “use publicly available information to help train Google’s AI models and build products and features like Google Translate, Bard, and Cloud AI capabilities.”

You can see from the policy’s revision history that the update provides some additional clarity as to the services that will be trained using the collected data. For example, the document now says that the information may be used for “AI Models” rather than “language models,” granting Google more freedom to train and build systems besides LLMs on your public data. And even that note is buried under an embedded link for “publicly accessible sources” underneath the policy’s “Your Local Information” tab that you have to click to open the relevant section.

The updated policy specifies that “publicly available information” is used to train Google’s AI products but doesn’t say how (or if) the company will prevent copyrighted materials from being included in that data pool. Many publicly accessible websites have policies in place that ban data collection or web scraping for the purpose of training large language models and other AI toolsets. It’ll be interesting to see how this approach plays out with various global regulations like GDPR that protect people against their data being misused without their express permission, too.
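One common way a site expresses a scraping policy is a robots.txt file. The sketch below shows how a well-behaved crawler could check such a file before fetching a page, using Python's standard `urllib.robotparser`. The crawler name `ExampleAIBot`, the sample rules, and the URL are hypothetical and used only for illustration; robots.txt is a voluntary convention, and nothing in the article says how Google's crawlers treat it for AI training.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one (made-up) AI crawler, allows everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"
    print(may_fetch(ROBOTS_TXT, "ExampleAIBot", page))
    print(may_fetch(ROBOTS_TXT, "SomeOtherBot", page))
```

A check like this only works if the crawler chooses to honor it, which is part of why site owners have also turned to terms of service and technical limits.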

A combination of these laws and increased market competition has made makers of popular generative AI systems like OpenAI’s GPT-4 extremely cagey about where they got the data used to train them and whether or not it includes social media posts or copyrighted works by human artists and authors.

The matter of whether or not the fair use doctrine extends to this kind of application currently sits in a legal gray area. The uncertainty has sparked various lawsuits and pushed lawmakers in some nations to introduce stricter laws that are better equipped to regulate how AI companies collect and use their training data. It also raises questions regarding how this data is being processed to ensure it doesn’t contribute to dangerous failures within AI systems, with the people tasked with sorting through these vast pools of training data often subjected to long hours and extreme working conditions.

Gannett, the largest newspaper publisher in the United States, is suing Google and its parent company, Alphabet, claiming that advancements in AI technology have helped the search giant to hold a monopoly over the digital ad market. Products like Google’s AI search beta have also been dubbed “plagiarism engines” and criticized for starving websites of traffic.

Meanwhile, Twitter and Reddit — two social platforms that contain vast amounts of public information — have recently taken drastic measures to try and prevent other companies from freely harvesting their data. The API changes and limitations placed on the platforms have been met with backlash by their respective communities, as anti-scraping changes have negatively affected the core Twitter and Reddit user experiences.

What Are the Best Counter-Surveillance Tactics?

Have you ever considered how effective your current counter-surveillance tactics truly are? As you navigate the complexities of safeguarding your privacy and security, it becomes essential to reassess and adapt your strategies. By exploring the intersection of physical security, digital privacy, and behavioral awareness, you may uncover new ways to protect yourself from potential surveillance threats. Read on for practical insights and actionable tips that could take your counter-surveillance efforts to the next level.

Understanding Surveillance Risks

To effectively counter surveillance, you must first grasp the potential risks associated with being monitored. Surveillance poses various threats to your privacy and freedom. One risk is the collection of your personal information, which can be used for targeted advertising or even manipulation.

Another risk is the possibility of your activities being tracked, leading to potential profiling or discrimination. Additionally, surveillance can infringe upon your right to freedom of expression and association, as it may deter you from expressing dissenting opinions or attending certain events.

Understanding these risks is vital in developing a comprehensive counter-surveillance strategy. By recognizing the potential consequences of being monitored, you can take proactive steps to protect yourself and your information. This may involve using encryption tools to secure your communications, being mindful of the data you share online, and being cautious of your physical surroundings to avoid being surreptitiously monitored.

In essence, by being aware of the risks posed by surveillance, you can better safeguard your privacy and freedom in an increasingly monitored world.

Physical Security Measures

Understanding the risks of surveillance provides a foundation for implementing effective physical security measures to protect your privacy and freedom. When addressing physical security, consider securing your home or workspace with robust locks, reinforced doors, and security cameras to deter potential intruders. Implementing access controls, such as biometric scanners or smart locks, can further strengthen your security by limiting unauthorized entry. Additionally, installing window bars or shatter-resistant films can fortify your space against forced entry attempts.

Maintaining situational awareness is vital; be vigilant of your surroundings and identify any anomalies that could indicate surveillance or potential threats. Conduct regular sweeps for hidden cameras or listening devices in your environment. Utilize secure storage options for sensitive documents and electronic devices to prevent unauthorized access.

Digital Privacy Tools

Boost your digital privacy defenses by leveraging a variety of advanced tools and techniques. Start by using a reputable Virtual Private Network (VPN) to fortify your internet connection, making it harder for snoopers to track your online activities.

Secure messaging apps like Signal or Wickr provide end-to-end encryption, ensuring your conversations remain private.

For secure browsing, consider using the Tor Browser, which anonymizes your web traffic by routing it through a series of servers. Password managers like LastPass or Dashlane help you create and store complex passwords securely. Enable two-factor authentication whenever possible to add an extra layer of security to your accounts.
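To make "complex passwords" concrete, here is a minimal sketch of how a random password can be generated with Python's standard `secrets` module, which is designed for security-sensitive randomness. It is not how LastPass or Dashlane work internally; the function name `generate_password` and the default length of 20 are assumptions chosen for illustration.

```python
import secrets
import string


def generate_password(length: int = 20) -> str:
    """Generate a random password containing lowercase, uppercase, digit, and symbol characters."""
    if length < 4:
        raise ValueError("length must be at least 4 to include every character class")
    alphabet = string.ascii_letters + string.digits + string.punctuation
    while True:
        # secrets.choice draws from the operating system's secure random source.
        candidate = "".join(secrets.choice(alphabet) for _ in range(length))
        # Retry until the candidate contains at least one character of each class.
        if (
            any(c.islower() for c in candidate)
            and any(c.isupper() for c in candidate)
            and any(c.isdigit() for c in candidate)
            and any(c in string.punctuation for c in candidate)
        ):
            return candidate


if __name__ == "__main__":
    print(generate_password())
```

A password manager remains the practical choice for day-to-day use, since it also stores a unique password for each account.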

Regularly update your software and operating systems to patch any vulnerabilities that hackers could exploit. Use encrypted email services such as ProtonMail or Tutanota for sensitive communications.

Behavioral Awareness Techniques

Strengthening your digital security with advanced tools is essential; now, shift your focus to mastering Behavioral Awareness Techniques to improve your overall surveillance defense strategy.

Behavioral Awareness Techniques involve being mindful of your surroundings, recognizing patterns, and understanding social cues to detect potential surveillance. Start by varying your daily routines to make it harder for any observer to predict your movements. Pay attention to individuals exhibiting unusual behavior or appearing out of place. Trust your instincts and be cautious of unsolicited approaches or attempts to gather information about you.

Practice good operational security by limiting the information you share publicly and being discreet about sensitive details. Maintain situational awareness in public spaces, including monitoring for physical surveillance, such as someone following you.

Frequently Asked Questions

How Can I Detect Hidden Surveillance Cameras in My Surroundings?

Want to detect hidden surveillance cameras around you? Look for unusual items, check for blinking lights, and use a camera detector. Sweep your surroundings methodically, paying attention to potential hiding spots. Stay vigilant for your privacy.

Are There Any Specific Tactics to Counter Tracking Devices on Vehicles?

To counter tracking devices on vehicles, regularly inspect the underside of the car, seek professional bug sweeping services periodically, and use radio frequency detectors to locate hidden trackers. Avoid GPS signal jammers: operating them is illegal in the United States and many other countries.

What Steps Can I Take to Protect My Sensitive Information During Travel?

To protect your sensitive information during travel, safeguard your devices with strong passwords, encryption, and regular updates. Avoid public Wi-Fi and use a virtual private network (VPN). Be cautious of shoulder surfers and consider using RFID-blocking wallets for cards.

Is There a Way to Identify if My Phone Is Being Monitored Remotely?

Signs that your phone may be remotely monitored include unusual battery drain, overheating, unexplained noises, and sudden spikes in data usage. None of these is proof on its own, but stay alert to any irregularities to safeguard your privacy.
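If you record the daily data usage figures your phone reports, a sudden spike can be flagged with a simple comparison against recent days. The sketch below is one illustrative approach, not a detection tool: the function name `find_usage_spikes`, the 7-day window, the 3x threshold, and the sample figures are all assumptions.

```python
from statistics import median


def find_usage_spikes(daily_mb, window=7, factor=3.0):
    """Return indexes of days whose usage exceeds factor times the median of the prior window."""
    spikes = []
    for i in range(window, len(daily_mb)):
        # The median of the preceding days is robust to a single earlier outlier.
        baseline = median(daily_mb[i - window:i])
        if baseline > 0 and daily_mb[i] > factor * baseline:
            spikes.append(i)
    return spikes


if __name__ == "__main__":
    # Made-up daily usage in megabytes; the eighth day is far above the rest.
    usage = [100, 120, 90, 110, 105, 95, 115, 900, 100]
    print(find_usage_spikes(usage))
```

A flagged day is only a prompt to look closer, for example at which apps used the data, since updates and video streaming cause legitimate spikes as well.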

How Do I Secure My Home Against Advanced Surveillance Methods Like Drones?

To secure your home against advanced surveillance methods like drones, consider installing privacy fences, using window films that block infrared, and planting tall trees or shrubs. Regularly check for any unfamiliar devices or signs of surveillance.

FTC investigating OpenAI on ChatGPT data collection and publication of false information

OpenAI CEO Samuel Altman Testifies To Senate Committee On Rules For Artificial Intelligence
Photo by Win McNamee / Getty Images

The Federal Trade Commission (FTC) is investigating ChatGPT creator OpenAI over possible consumer harm through its data collection and the publication of false information.

First reported by The Washington Post, the FTC sent a 20-page letter to the company this week. The letter requests documents related to developing and training its large language models, as well as data security.

The FTC wants to get detailed information on how OpenAI vets information used in training for its models and how it prevents false claims from being shown to ChatGPT users. It also wants to learn more about how APIs connect to its systems and how data is protected when accessed by third parties.

The FTC declined to comment. OpenAI did not immediately respond to requests for comment.

This is the first major US investigation into OpenAI, which burst into the public consciousness over the past year with the release of ChatGPT. The popularity of ChatGPT and the large language models that power it kicked off an AI arms race prompting competitors like Google and Meta to release their own models.

The FTC has signaled increased regulatory oversight of AI before. In 2021, the agency warned companies against using biased algorithms. Industry watchdog Center for AI and Digital Policy also called on the FTC to stop OpenAI from launching new GPT models in March.

Large language models can put out factually inaccurate information. OpenAI warns ChatGPT users that it can occasionally generate incorrect facts, and Google’s chatbot Bard’s first public demo did not inspire confidence in its accuracy. And based on personal experience, both have spit out incredibly flattering, though completely invented, facts about myself. Other people have gotten in trouble for using ChatGPT. A lawyer was sanctioned for submitting fake cases created by ChatGPT, and a Georgia radio host sued the company for results that claimed he was accused of embezzlement.

US lawmakers showed great interest in AI, both in understanding the technology and possibly looking into enacting regulations around it. The Biden administration released a plan to provide a responsible framework for AI development, including a $140 million investment to launch research centers. Supreme Court Justice Neil Gorsuch also discussed chatbots’ potential legal liability earlier this year.

It is in this environment that AI leaders like OpenAI CEO Sam Altman have made the rounds in Washington. Altman lobbied Congress to create regulations around AI.

OpenAI will use Associated Press news stories to train its models

An illustration of a cartoon brain with a computer chip imposed on top.
Illustration by Alex Castro / The Verge

OpenAI will train its AI models on The Associated Press’ news stories for the next two years, thanks to an agreement first reported by Axios. The deal between the two companies will give OpenAI access to some of the content in AP’s archive as far back as 1985.

As part of the agreement, AP will gain access to OpenAI’s “technology and product expertise,” although it’s not clear exactly what that entails. AP has long been exploring AI features and began generating reports about company earnings in 2014. It later leveraged the technology to automate stories about Minor League Baseball and college sports.

AP joins OpenAI’s growing list of partners. On Tuesday, the AI company announced a six-year deal with Shutterstock that will let OpenAI license images, videos, music, and metadata to train its text-to-image model, DALL-E. BuzzFeed also says it will use AI tools provided by OpenAI to “enhance” and “personalize” its content. OpenAI is also working with Microsoft on a number of AI-powered products as part of Microsoft’s partnership and “multibillion dollar investment” into the company.

“The AP continues to be an industry leader in the use of AI; their feedback — along with access to their high-quality, factual text archive — will help to improve the capabilities and usefulness of OpenAI’s systems,” Brad Lightcap, OpenAI’s chief operating officer, says in a statement.

Earlier this year, AP announced AI-powered projects that will publish Spanish-language news alerts and document public safety incidents in a Minnesota newspaper. The outlet also launched an AI search tool that’s supposed to make it easier for news partners to find photos and videos in its library based on “descriptive language.”

AP’s partnership with OpenAI seems like a natural next step, but there are still a lot of crucial details missing about how the outlet will use the technology. AP makes it clear it “does not use it in its news stories.”
