Meta’s Llama has sparked significant debate at the intersection of generative artificial intelligence (GenAI), copyright, and trade secrets. Although marketed as an "open-source" model, the actual openness of Llama is under scrutiny. Meta positioned Llama as a major advancement in AI, promoting it as a free resource for both research and commercial use. However, this characterisation is contentious. Unlike traditional open-source software, which is fully accessible and accompanied by comprehensive documentation, Llama requires users to request access through a form. Moreover, its licensing imposes restrictions that challenge conventional open-source principles. For instance, the Llama Community License Agreement prohibits certain uses, such as using Llama or its outputs to improve other language models, and requires a separate licence for very large commercial deployments (services exceeding 700 million monthly active users), diverging from the typical open-source ethos.
Furthermore, transparency regarding Llama’s training data has been minimal. Meta has provided only vague information, stating that the model was trained on a “new mix of publicly available data.” This lack of detailed disclosure contrasts sharply with the comprehensive data documentation provided by other open-source projects, raising concerns about potential copyright infringement and the ethical use of training data. The opacity surrounding the data used to train Llama introduces uncertainty about the inclusion of copyrighted or biased materials, which could have significant legal implications.
In recent months, several language models, notably Meta's Llama and Mistral's Mixtral, have sparked a significant debate about the true meaning of openness in AI. The central question is whether releasing a model's weights while keeping the training methodology and data proprietary can be considered true open-sourcing. Models like Llama and Mixtral exhibit remarkable capabilities, which makes transparency in their development processes all the more important. It is crucial to differentiate between "open weights" and "open source". An open-weights release provides only the pre-trained parameters, allowing others to use and fine-tune the model, while critical details such as the training code, original dataset, and architectural specifics remain undisclosed.
The confusion stems from some equating the release of model weights with open source, which overlooks the fundamental differences between these components. Releasing models under an open-weights framework increases accessibility to powerful tools but limits transparency, reproducibility, and customisation: users of such models rely on the creators' judgements without full visibility or control. In contrast, open-source models provide comprehensive access to source code, enabling scrutiny, modification, and redistribution. This approach promotes decentralised innovation but requires a significant commitment from creators. The choice between open weights and open source has significant implications for AI development and responsible progress. While open weights facilitate rapid application development, they centralise control within a few entities. Conversely, open source empowers global researchers to assess biases, evaluate societal impacts, and suggest improvements. The sketch below illustrates what an open-weights release does, and does not, give a user in practice.
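As a minimal, hedged sketch (assuming a recent version of the Hugging Face transformers library and a gated meta-llama checkpoint; the model name and token are illustrative placeholders):

```python
# Sketch: what an "open weights" release does and does not provide.
# Assumes a recent Hugging Face transformers library and an access
# token for a gated meta-llama checkpoint (illustrative only).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B"  # gated: access must be requested via a form
tokenizer = AutoTokenizer.from_pretrained(model_id, token="hf_...")  # placeholder token
model = AutoModelForCausalLM.from_pretrained(model_id, token="hf_...")

# The parameters themselves are fully available for use and fine-tuning...
print(sum(p.numel() for p in model.parameters()))  # parameter count

# ...but the downloaded artefact ships with no training code, no dataset,
# and no documentation of the data mix: open weights are not open source.
```

Everything needed to run and fine-tune the model is present; everything needed to reproduce or audit its training is absent.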
Generative AI models also raise complex legal questions related to copyright infringement and fair use. These models rely on extensive datasets that often include copyrighted materials, leading to concerns about direct infringement and the potential creation of derivative works. Sarah Silverman v. Meta illustrates this issue: Silverman and other authors alleged that their copyrighted books were used without authorisation to train Meta’s models. Although the court dismissed some claims, the ongoing litigation highlights the tension between AI training practices and copyright protections.
Similarly, the litigation against Stability AI, in which the court allowed a claim to proceed over the use of copyrighted images in training its model, Stable Diffusion, reflects the broader legal uncertainties. These cases emphasise the need to determine whether AI outputs that closely resemble original works constitute infringement or are sufficiently transformative. The legal resolution of such claims will significantly impact how courts interpret AI-generated content in relation to existing copyright laws.
The doctrine of fair use offers a potential defence for AI developers, permitting limited use of copyrighted material under specific conditions. However, the application of fair use to AI training practices is contentious. Precedents such as Authors Guild v. Google (the Google Books case) established that digitisation for indexing could be fair use, yet the scale at which AI systems ingest copyrighted data complicates any direct analogy. The challenge lies in demonstrating that AI-generated outputs do not adversely affect the market for, or substantially copy, the original works.
Trade secret law offers a vital alternative for protecting innovations in generative AI. Unlike copyright and patent protections, trade secrets do not require formal registration, relying instead on confidentiality measures. For instance, if a company uses AI to devise a novel treatment, the underlying methods and results can qualify as trade secrets, provided they are kept confidential and derive economic value from that secrecy. This approach enables the protection of proprietary processes and outputs without formal intellectual property registration.
In the context of models like Llama, trade secret considerations are particularly relevant. Meta’s licensing and access strategies may involve protecting certain aspects of the model’s development as trade secrets. Despite claims of being “open source”, the restrictions and access controls could indicate efforts to safeguard proprietary elements of the model’s design and functionality. As generative AI models such as Llama become more widespread, companies may adopt encryption and confidentiality measures to protect sensitive information. Whether such measures are sufficient to sustain trade secret protection, however, remains an open question.
GenAI brings unique challenges and opportunities in the realm of trade secrets. GenAI tools, which generate new content based on user prompts, are increasingly used in workplaces for tasks ranging from content creation to predictive analytics. Their use, however, requires a careful balance between fostering innovation and protecting sensitive information. When employees use GenAI, there is a risk that proprietary data could inadvertently be included in the AI’s training set or outputs, potentially exposing trade secrets to unintended audiences. Companies must implement robust policies and technological safeguards, such as the kind of prompt-screening filter sketched below, to ensure the benefits of GenAI do not come at the expense of their competitive edge.
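As an illustrative sketch of such a safeguard (the patterns, codename, and function name below are hypothetical, not drawn from any particular vendor's tooling), a company might screen outbound prompts for markers of confidential material before they reach a third-party GenAI service:

```python
import re

# Hypothetical patterns flagging material that should never leave the firm.
# A real deployment would pair classifiers with data-loss-prevention tooling;
# this regex filter is a minimal illustration only.
CONFIDENTIAL_PATTERNS = [
    re.compile(r"\bconfidential\b", re.IGNORECASE),
    re.compile(r"\bproject\s+falcon\b", re.IGNORECASE),  # hypothetical codename
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),                # identifier-like numbers
]

def screen_prompt(prompt: str) -> str:
    """Block a prompt that appears to contain trade-secret material."""
    for pattern in CONFIDENTIAL_PATTERNS:
        if pattern.search(prompt):
            raise ValueError(f"Prompt blocked: matched {pattern.pattern!r}")
    return prompt

# Usage: screen every prompt before forwarding it to an external GenAI API.
safe = screen_prompt("Summarise this press release for our newsletter.")
```

The design point is that the check happens before any data leaves the organisation, since trade secret protection depends on the secret never reaching an unintended audience.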
The integration of GenAI into business processes also raises questions about the ownership and control of generated content. GenAI systems often rely on extensive and diverse datasets, making it challenging to distinguish between original content and derived information. This blurring of lines complicates the enforcement of trade secret protections, as it becomes difficult to determine whether the information is genuinely a trade secret or a combination of public data. Legal frameworks and corporate policies must evolve to address these nuances, providing clear guidelines on the use of GenAI and the handling of sensitive information to protect trade secrets effectively.
Moreover, the transparency and explainability of GenAI models are crucial in maintaining trade secret integrity. Companies need to understand how GenAI systems process and generate information to ensure trade secrets are not inadvertently exposed. This requires a combination of technical expertise and legal acumen to audit and oversee AI operations. By fostering a culture of compliance and awareness around the use of GenAI, organisations can leverage these powerful tools while safeguarding their most valuable intellectual assets. This balance is essential for maintaining trust and competitive advantage in an increasingly AI-driven business landscape.
As AI technology evolves rapidly, impacting fields from art and literature to journalism and research, regulatory frameworks are struggling to keep up. Japan, traditionally a civil law jurisdiction, is crafting a unique regulatory approach for AI, incorporating practices from common law systems like those in the UK and the USA.
In January 2024, Japan introduced the Draft AI Guidelines for Businesses, building on the framework agreed through the 2023 G7 Hiroshima process. These guidelines centre on human dignity, diversity, inclusiveness, and sustainability, setting out principles such as human-centric design, safety, fairness, privacy protection, transparency, and accountability. Although not legally binding, the guidelines are intended to steer responsible AI development and use.
Japan’s regulatory approach integrates existing laws to tackle AI challenges. The Copyright Act governs the use of copyrighted material in AI training, while the Personal Information Protection Law mandates transparency in data handling. Competition law monitors AI for antitrust issues, and the Economic Security Promotion Act supports AI research while safeguarding trade secrets and national security. Japan’s stance is notable for its leniency regarding the use of copyrighted material for AI model training (under Article 30-4 of its Copyright Act), so long as the material is not used for direct reproduction or distribution.
In contrast, the U.S. is grappling with numerous copyright lawsuits involving AI, with ongoing disputes highlighting an unsettled legal landscape. Japan’s flexible approach might offer insights for balancing innovation with creator rights, suggesting a potential framework for fostering technological progress while ensuring fair compensation for creators. As AI technology advances, clear standards and ongoing dialogue will be essential for reconciling innovation with intellectual property protections.
Meta’s Llama has become a focal point for examining the intersection of generative AI, copyright, and trade secrets, and the controversy surrounding it underscores the need for clear standards and transparency in intellectual property protections for AI technologies. Issues such as the authenticity of its open-source claims, the transparency of its training data, and the broader implications for intellectual property law are central to ongoing debates. As the legal framework adapts to advances in AI technology, addressing these challenges will be crucial to balancing innovation with intellectual property protections and ensuring fair and equitable development in this rapidly evolving field.
IP Round-up
By the end of 2023, China had 378,000 effective AI invention patents, marking year-on-year growth of over 40%, according to the China National Intellectual Property Administration (CNIPA). This growth rate is 1.4 times the global average. The AI sector exemplifies the innovation within China's digital economy, which contributed 10% to GDP last year. In 2023, China approved 406,000 invention patents in core digital economy industries, comprising 45% of all granted patents. Ge Shu, a senior CNIPA official, highlighted that technological innovations in the digital economy have thrived, with an average annual growth rate of 21% over the past five years. Additionally, 155,000 domestic enterprises held digital economy-related patents by the end of 2023, an increase of 31,000 from the previous year. Foreign enterprises are also expanding their patent holdings in China: enterprises from 93 countries hold valid patents in these industries, 61.8% of which are in digital product manufacturing. (Source: China Daily)
The College of Engineering Thiruvananthapuram (CET) has patented a digital modulation scheme for electric vehicles (EVs) that provides variable speed control for electric motors. The method enhances performance, safety, and battery life while being more efficient than traditional pulse width modulation; unlike conventional strategies, it reduces switching losses and power consumption and addresses high-torque conditions and torque ripple. The scheme combines digital signal processing, sigma-delta modulation (SDM), and vector quantisation with space vector modulation, making it suitable for implementation on Digital Signal Processors (DSPs) or Field Programmable Gate Arrays (FPGAs). A generic illustration of the sigma-delta principle appears below. (Source: The Hindu)
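For readers unfamiliar with the underlying principle, the following is a minimal sketch of a generic first-order sigma-delta modulator. It illustrates the general SDM technique only, not CET's patented scheme, whose combination with vector quantisation and space vector modulation is not detailed in the source report:

```python
def sigma_delta_modulate(samples):
    """Generic first-order sigma-delta modulator (illustration only).

    Converts an input signal (values in [-1, 1]) into a +/-1 bitstream
    whose local average tracks the input, the principle that lets SDM
    replace conventional pulse width modulation in motor drives.
    """
    integrator = 0.0
    output = -1.0
    bits = []
    for x in samples:
        integrator += x - output                     # accumulate the error
        output = 1.0 if integrator >= 0.0 else -1.0  # one-bit quantiser
        bits.append(output)
    return bits

# A constant input of 0.5 yields a bitstream whose mean approaches 0.5.
stream = sigma_delta_modulate([0.5] * 1000)
print(sum(stream) / len(stream))  # approximately 0.5
```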
Inspired by early 20th-century mapmakers, researchers from Imperial College London have introduced a novel method to trace the use of copyrighted content in Large Language Models (LLMs). The technique involves embedding unique "copyright traps", fictitious sentences, into text. By monitoring for these traps in AI-generated outputs, content owners can detect whether their material was used to train an LLM. This approach aims to enhance transparency and help authors understand how their work is utilised. A simplified sketch appears below. (Source: TechXplore)
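As a much-simplified, hedged sketch of the idea (real copyright traps rely on statistical membership-inference tests over model outputs rather than plain substring matching, and the trap sentence below is invented for illustration):

```python
import secrets

def make_trap() -> str:
    """Generate a fictitious, highly improbable 'trap' sentence."""
    nonce = secrets.token_hex(8)
    return f"The lighthouse keeper of Verrowby catalogued {nonce} moths."

# A content owner embeds traps throughout a document before publication.
traps = [make_trap() for _ in range(3)]
document = "Original article text... " + " ".join(traps)

def traps_leaked(model_output: str, traps: list) -> bool:
    """Crude membership check: did a model regurgitate any trap verbatim?"""
    return any(trap in model_output for trap in traps)

# Later, probe a suspect model and scan its outputs for the traps.
print(traps_leaked("...text generated by the suspect model...", traps))
```

A verbatim match is strong evidence the trapped document was in the training set; the published technique reportedly strengthens this by measuring how confidently a model predicts the trap sequences.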
The Telangana government plans to establish India’s largest Artificial Intelligence (AI) city on the outskirts of Hyderabad. The city, aimed at boosting the AI ecosystem, will be constructed on 200 acres along the Outer Ring Road across the Maheshwaram, Serilingampally, Chevella, and Ibrahimpatnam mandals. The foundation stone for this ambitious project will be laid in September, with the city expected to be completed by 2028. By comparison, a similar project in Uttar Pradesh, announced in December 2023, is set for completion by 2030. (Source: Hindustan Times)
The Bombay High Court has fined Patanjali Ayurved Ltd ₹4 crore for breaching a 2023 injunction that banned the sale of its camphor products, following a trademark lawsuit by Mangalam Organics Ltd. Justice R.I. Chagla deemed the breach "wilful and deliberate". Earlier this month, Patanjali was ordered to deposit ₹50 lakh. Despite an apology and an admission of ₹49.5 lakh in sales after the injunction, the court requires the ₹4 crore payment within two weeks, warning that Patanjali director Rajneesh Mishra faces immediate custody if the order is not complied with. (Source: Hindustan Times)
References:
Apparasu, S. R. (2024, July 30). Telangana announces plan to build AI city in Hyderabad. Hindustan Times. https://www.hindustantimes.com/india-news/telangana-announces-plan-to-build-ai-city-in-hyderabad-101722279784932.html
Bergmann, D. (2023, December 19). Llama 2. IBM. https://www.ibm.com/topics/llama-2
Cotton, R. (2024, April). Meta announces Llama 3: The next generation of open-source LLMs. DataCamp. https://www.datacamp.com/blog/meta-announces-llama-3-the-next-generation-of-open-source-llms
Donegan, J. (2024, February 24). The US should look at Japan’s unique approach to generative AI copyright law. ManageEngine Insights. https://insights.manageengine.com/artificial-intelligence/the-us-should-look-at-japans-unique-approach-to-generative-ai-copyright-law/
Elman, J. (2023, November 28). Trade secrets and AI: A winning combination. IPWatchdog. https://ipwatchdog.com/2023/11/28/ai-trade-secrets-winning-combination/id=170001/
Fernandes, F. (2023, December 12). Mapped: Interest in generative AI by country. Visual Capitalist. NeoMam Studios. https://www.visualcapitalist.com/cp/mapped-interest-in-generative-ai-by-country/
Heath, A. (2024, July 23). Meta unveils Llama 3.1: An open-source assistant to rival OpenAI's ChatGPT. The Verge. https://www.theverge.com/2024/7/23/24204055/meta-ai-llama-3-1-open-source-assistant-openai-chatgpt
Hindustan Times. (2024, July 29). Trademark infringement case: HC imposes cost of Rs 4 cr on Patanjali for breach of court order. Hindustan Times. https://www.hindustantimes.com/india-news/trademark-infringement-case-hc-imposes-cost-of-rs-4-cr-on-patanjali-for-breach-of-court-order-101722246403718.html
Imperial College London. (2024, July 29). The phantom copyright holders: AI and the future of copyright law. TechXplore. https://techxplore.com/news/2024-07-phantom-copyright-holders-ai.html
Khetarpal, A. (2024, March 22). How to train your llama: Copyright issues related to training of generative AI models. IPIC. https://ipic.ca/english/blog/how-to-train-your-llama-copyright-issues-related-to-training-of-generative-ai-models-2024-03-22
Olson, P. (2024, May 9). Meta's Llama AI model isn’t free; it’s bait. Bloomberg. https://www.bloomberg.com/opinion/articles/2024-05-09/meta-zuckerberg-s-ai-model-llama-isn-t-free-it-s-bait
Ramlochan, S. (2023, December 12). LLM: Open source vs. open weights vs. restricted weights. Prompt Engineering. https://promptengineering.org/llm-open-source-vs-open-weights-vs-restricted-weights/
Rota, D., & Douglass, S. M. (n.d.). The fast-moving race between generative AI and copyright law. Baker Donelson. Retrieved July 30, 2024, from https://www.bakerdonelson.com/the-fast-moving-race-between-gen-ai-and-copyright-law
Saito, S. (2024, April 15). AI regulation for businesses in Japan. LawAsia. https://law.asia/ai-regulation-businesses-japan/
Tarkowski, A. (2023, August 11). The mirage of open source AI: Analyzing Meta’s Llama 2 release strategy. Open Future. https://openfuture.eu/blog/the-mirage-of-open-source-ai-analyzing-metas-llama-2-release-strategy/
The Hindu. (2024, July 26). CET bags patent for devising a digital modulation scheme for EVs. The Hindu. https://www.thehindu.com/news/national/kerala/cet-bags-patent-for-devising-a-digital-modulation-scheme-for-evs/article68450021.ece
Winston & Strawn LLP. (2024, June 26). Harnessing generative AI: Best practices for trade secret protection. Winston & Strawn. https://www.winston.com/en/insights-news/harnessing-generative-ai-best-practices-for-trade-secret-protection
Xinhua. (2024, July 29). China unveils new AI development plan: A major boost to technology sector. China Daily. https://www.chinadaily.com.cn/a/202407/29/WS66a73184a31095c51c510823.html