
Trending Spy News

Google confirms it’s training Bard on scraped web data, too


A trippy graphic displaying a collection of items like paintbrushes, books, phone messages, and a notepad to represent generative AI. A large pair of eyes and hands can be seen at the center of the image.
Whatever content is publicly available on the web, Google has given itself permission to use it to train AI. | Illustration by Haein Jeong / The Verge

On Monday, Gizmodo spotted that the search giant updated its privacy policy to disclose that its various AI services, such as Bard and Cloud AI, may be trained on public data that the company has scraped from the web.

“Our privacy policy has long been transparent that Google uses publicly available information from the open web to train language models for services like Google Translate,” said Google spokesperson Christa Muldoon to The Verge. “This latest update simply clarifies that newer services like Bard are also included. We incorporate privacy principles and safeguards into the development of our AI technologies, in line with our AI Principles.”

A screenshot taken from Google’s privacy policy
Image: Google
These are the most recent changes to Google’s privacy policy. The company is now openly admitting to where your data is being used at least…

Following the update on July 1st, 2023, Google’s privacy policy now says that “Google uses information to improve our services and to develop new products, features, and technologies that benefit our users and the public” and that the company may “use publicly available information to help train Google’s AI models and build products and features like Google Translate, Bard, and Cloud AI capabilities.”

You can see from the policy’s revision history that the update provides some additional clarity as to the services that will be trained using the collected data. For example, the document now says that the information may be used for “AI Models” rather than “language models,” granting Google more freedom to train and build systems besides LLMs on your public data. And even that note is buried under an embedded link for “publicly accessible sources” underneath the policy’s “Your Local Information” tab that you have to click to open the relevant section.

The updated policy specifies that “publicly available information” is used to train Google’s AI products but doesn’t say how (or if) the company will prevent copyrighted materials from being included in that data pool. Many publicly accessible websites have policies in place that ban data collection or web scraping for the purpose of training large language models and other AI toolsets. It’ll be interesting to see how this approach plays out with various global regulations like GDPR that protect people against their data being misused without their express permission, too.

A combination of these laws and increased market competition has made makers of popular generative AI systems like OpenAI’s GPT-4 extremely cagey about where they got the data used to train them and whether or not it includes social media posts or copyrighted works by human artists and authors.

The matter of whether or not the fair use doctrine extends to this kind of application currently sits in a legal gray area. The uncertainty has sparked various lawsuits and pushed lawmakers in some nations to introduce stricter laws that are better equipped to regulate how AI companies collect and use their training data. It also raises questions regarding how this data is being processed to ensure it doesn’t contribute to dangerous failures within AI systems, with the people tasked with sorting through these vast pools of training data often subjected to long hours and extreme working conditions.

Gannett, the largest newspaper publisher in the United States, is suing Google and its parent company, Alphabet, claiming that advancements in AI technology have helped the search giant to hold a monopoly over the digital ad market. Products like Google’s AI search beta have also been dubbed “plagiarism engines” and criticized for starving websites of traffic.

Meanwhile, Twitter and Reddit — two social platforms that contain vast amounts of public information — have recently taken drastic measures to try and prevent other companies from freely harvesting their data. The API changes and limitations placed on the platforms have been met with backlash by their respective communities, as anti-scraping changes have negatively affected the core Twitter and Reddit user experiences.


FTC investigating OpenAI on ChatGPT data collection and publication of false information


OpenAI CEO Samuel Altman Testifies To Senate Committee On Rules For Artificial Intelligence
Photo by Win McNamee / Getty Images

The Federal Trade Commission (FTC) is investigating ChatGPT creator OpenAI over possible consumer harm through its data collection and the publication of false information.

First reported by The Washington Post, the FTC sent a 20-page letter to the company this week. The letter requests documents related to developing and training its large language models, as well as data security.

The FTC wants to get detailed information on how OpenAI vets information used in training for its models and how it prevents false claims from being shown to ChatGPT users. It also wants to learn more about how APIs connect to its systems and how data is protected when accessed by third parties.

The FTC declined to comment. OpenAI did not immediately respond to requests for comment.

This is the first major US investigation into OpenAI, which burst into the public consciousness over the past year with the release of ChatGPT. The popularity of ChatGPT and the large language models that power it kicked off an AI arms race prompting competitors like Google and Meta to release their own models.

The FTC has signaled increased regulatory oversight of AI before. In 2021, the agency warned companies against using biased algorithms. Industry watchdog Center for AI and Digital Policy also called on the FTC to stop OpenAI from launching new GPT models in March.

Large language models can put out factually inaccurate information. OpenAI warns ChatGPT users that it can occasionally generate incorrect facts, and Google’s chatbot Bard’s first public demo did not inspire confidence in its accuracy. And based on personal experience, both have spit out incredibly flattering, though completely invented, facts about me. Other people have gotten in trouble for using ChatGPT. A lawyer was sanctioned for submitting fake cases created by ChatGPT, and a Georgia radio host sued the company for results that claimed he was accused of embezzlement.

US lawmakers have shown great interest in AI, both in understanding the technology and in possibly enacting regulations around it. The Biden administration released a plan to provide a responsible framework for AI development, including a $140 million investment to launch research centers. Supreme Court Justice Neil Gorsuch also discussed chatbots’ potential legal liability earlier this year.

It is in this environment that AI leaders like OpenAI CEO Sam Altman have made the rounds in Washington. Altman lobbied Congress to create regulations around AI.


OpenAI will use Associated Press news stories to train its models


An illustration of a cartoon brain with a computer chip imposed on top.
Illustration by Alex Castro / The Verge

OpenAI will train its AI models on The Associated Press’ news stories for the next two years, thanks to an agreement first reported by Axios. The deal between the two companies will give OpenAI access to some of the content in AP’s archive as far back as 1985.

As part of the agreement, AP will gain access to OpenAI’s “technology and product expertise,” although it’s not clear exactly what that entails. AP has long been exploring AI features and began generating reports about company earnings in 2014. It later leveraged the technology to automate stories about Minor League Baseball and college sports.

AP joins OpenAI’s growing list of partners. On Tuesday, the AI company announced a six-year deal with Shutterstock that will let OpenAI license images, videos, music, and metadata to train its text-to-image model, DALL-E. BuzzFeed also says it will use AI tools provided by OpenAI to “enhance” and “personalize” its content. OpenAI is also working with Microsoft on a number of AI-powered products as part of Microsoft’s partnership and “multibillion dollar investment” into the company.

“The AP continues to be an industry leader in the use of AI; their feedback — along with access to their high-quality, factual text archive — will help to improve the capabilities and usefulness of OpenAI’s systems,” Brad Lightcap, OpenAI’s chief operating officer, says in a statement.

Earlier this year, AP announced AI-powered projects that will publish Spanish-language news alerts and document public safety incidents in a Minnesota newspaper. The outlet also launched an AI search tool that’s supposed to make it easier for news partners to find photos and videos in its library based on “descriptive language.”

AP’s partnership with OpenAI seems like a natural next step, but there are still a lot of crucial details missing about how the outlet will use the technology. AP makes it clear it “does not use it in its news stories.”


Congress is trying to stop discriminatory algorithms again


A person with their hand hovering over the Like button on Facebook.
Photo by Amelia Holowaty Krales / The Verge

US policymakers hope to require online platforms to disclose information about their algorithms and allow the government to intervene if these are found to discriminate based on criteria like race or gender.

Sen. Edward Markey (D-MA) and Rep. Doris Matsui (D-CA) reintroduced the Algorithmic Justice and Online Platform Transparency Act, which aims to ban the use of discriminatory or “harmful” automated decision-making. It would also establish safety standards, require platforms to provide a plain language explanation of algorithms used by websites, publish annual reports on content moderation practices, and create a governmental task force to investigate discriminatory algorithmic processes.

The bill applies to “online platforms” or any commercial, public-facing website or app that “provides a community forum for user-generated content.” This can include social media sites, content aggregation services, or media and file-sharing sites.

Markey and Matsui introduced a previous version of the bill in 2021. It moved to the Subcommittee on Consumer Protection and Commerce but died in committee.

Data-based decision-making, including social media recommendation algorithms or machine learning systems, often lives in proverbial black boxes. This opacity sometimes exists because of intellectual property concerns or a system’s complexity.

But lawmakers and regulators worry this could obscure biased decision-making with a huge impact on people’s lives, well beyond the reach of the online platforms the bill covers. Insurance companies, including those working with Medicaid patients, already use algorithms to grant or deny patient coverage. Agencies such as the FTC signaled in 2021 that they may pursue legal action against biased algorithms.

Calls to make algorithms more transparent have grown over the years. After several scandals in 2018, including the Cambridge Analytica debacle, AI research group AI Now found governments and companies don’t have a way to punish organizations that produce discriminatory systems. In a rare move, Facebook and Instagram announced the formation of a group to study potential racial bias in their algorithms.

“Congress must hold Big Tech accountable for its black-box algorithms that perpetuate discrimination, inequality, and racism in our society – all to make a quick buck,” Markey said in a statement.

Most proposed regulations around AI and algorithms include a push to create more transparency. The European Union’s proposed AI Act, in its final stages of negotiation, also noted the importance of transparency and accountability.
