Lasso Security investigation uncovers major HuggingFace API token exposure



Summary

An investigation by cybersecurity startup Lasso Security reveals that more than 1,500 HuggingFace API tokens are exposed, including those from Meta.

A recent investigation into HuggingFace, a major platform for developers, has revealed that more than 1,500 API tokens are exposed. According to Lasso Security, a start-up specialising in cybersecurity for language models and other generative AI models, this leaves millions of Meta Llama, Bloom and Pythia users vulnerable to potential attacks.

HuggingFace is an important resource for developers working on AI projects such as language models. The platform offers an extensive library of AI models and datasets, including Meta’s widely used Llama models.

The HuggingFace API allows developers and organisations to integrate models and read, create, modify, and delete repositories or files within them using API tokens.
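For context, here is a minimal sketch of what that token-based access looks like with the official huggingface_hub Python client; the token value, repository name, and file path below are placeholders for illustration, not values from the investigation:

```python
from huggingface_hub import HfApi

# Placeholder token and repository, for illustration only.
api = HfApi(token="hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")

# Read: list the files in a model repository.
files = api.list_repo_files(repo_id="some-org/some-model")
print(files)

# Create/modify: upload a file to the repository.
api.upload_file(
    path_or_fileobj=b"example contents",
    path_in_repo="notes.txt",
    repo_id="some-org/some-model",
)

# Delete: remove that file again.
api.delete_file(path_in_repo="notes.txt", repo_id="some-org/some-model")
```

A token with write permissions on a repository is enough to perform all of these operations, which is why a leaked token is effectively a leaked set of keys to the account's models and datasets.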


Lasso Security gains full access to Meta repositories

The team searched GitHub and HuggingFace repositories for exposed API tokens using the platforms' built-in search functions. In line with best practices published by API providers such as OpenAI, tokens should never be stored directly in code for precisely this reason.
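As a rough illustration of how such exposed tokens can be spotted in source code (this is not Lasso Security's tooling), the sketch below scans a local source tree for strings matching the "hf_" prefix that HuggingFace user access tokens use; the exact token length in the pattern is an assumption:

```python
import re
from pathlib import Path

# Approximate pattern for HuggingFace user access tokens: "hf_" followed by
# an alphanumeric string. The minimum length here is an assumption.
TOKEN_PATTERN = re.compile(r"hf_[A-Za-z0-9]{30,}")

def scan_for_tokens(root: str) -> list[tuple[str, str]]:
    """Walk a source tree and report files containing token-like strings."""
    hits = []
    for path in Path(root).rglob("*.py"):
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        for match in TOKEN_PATTERN.finditer(text):
            hits.append((str(path), match.group(0)))
    return hits

if __name__ == "__main__":
    for file, token in scan_for_tokens("."):
        # The remedy is to remove the literal token and load it at runtime
        # from an environment variable or a secrets manager instead.
        print(f"possible exposed token in {file}: {token[:10]}...")
```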

The Lasso Security team found 1,681 tokens in their search and were able to uncover accounts from major organizations including Meta, Microsoft, Google, and VMware. The data also gave the team full access to the widely used Meta Llama, Bloom, Pythia, and HuggingFace repositories. Exposing such a large number of API tokens poses significant risks to organizations and their users, the team said.
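A hedged illustration of how the reach of a leaked token can be checked with the public huggingface_hub client (again, not Lasso Security's actual method; the token value is a placeholder): the whoami endpoint reports whether a token is still live and which account and organisations it maps to.

```python
from huggingface_hub import HfApi
from huggingface_hub.utils import HfHubHTTPError

def describe_token(token: str) -> None:
    """Report whether a token is still live and which account it maps to."""
    api = HfApi(token=token)
    try:
        identity = api.whoami()
    except HfHubHTTPError:
        print("token is invalid or has been revoked")
        return
    print("token belongs to:", identity.get("name"))
    print("account type:", identity.get("type"))
    print("organisations:", [org.get("name") for org in identity.get("orgs", [])])

describe_token("hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")  # placeholder value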

Lasso lists some key dangers associated with exposed API tokens:

1. Supply chain vulnerabilities: If potential attackers gained full access to accounts such as Meta Llama2, BigScience Workshop and EleutherAI, they could manipulate existing models and potentially turn them into malicious entities, the team says. This could affect millions of users who rely on these basic models for their applications.

2. Training data poisoning: With write access to 14 datasets that see tens to hundreds of thousands of downloads per month, attackers could manipulate trusted datasets, compromising the integrity of the AI models built on them, with far-reaching consequences.
