LLaMA Code Example and Large Language Model Overview

In AI, LLM stands for Large Language Model. These are advanced types of machine learning models designed to process and generate human-like text based on vast amounts of text data. LLMs are trained on a variety of language tasks, including text completion, translation, summarization, and even coding. Popular examples of LLMs include OpenAI’s GPT series, Google’s BERT, and Meta’s LLaMA.

Their “large” nature comes from having billions to trillions of parameters (the internal adjustable elements that help the model learn patterns in data), enabling them to handle complex language tasks with high accuracy.
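To make "parameters" concrete, here is a minimal sketch (assuming the transformers and torch packages are installed) that loads a small, openly available model, GPT-2, and counts its trainable parameters. Larger LLMs have the same kind of structure, just with far more parameters:

from transformers import AutoModelForCausalLM

# Load a small, openly available model for illustration (GPT-2, roughly 124 million parameters)
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Count every trainable parameter in the network
num_params = sum(p.numel() for p in model.parameters())
print(f"GPT-2 has about {num_params:,} parameters")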

A note about Google: there is a common misconception that BERT is now Gemini. In fact, it is Bard that was renamed Gemini, not BERT.

BERT and Gemini are distinct models in Google’s AI landscape rather than one being a rebranding of the other. BERT (Bidirectional Encoder Representations from Transformers) is an influential language model from Google introduced in 2018, known for its ability to understand the context of words in a sentence through bidirectional training. BERT has been widely applied in natural language processing tasks, especially in Google Search.

Gemini, however, is a newer, multimodal language model series that Google launched in 2023, and it powers the company's updated AI chatbot, formerly known as Bard. Gemini is built to handle diverse input formats (text, audio, images, and video) and has been optimized for complex tasks such as logical reasoning, contextual understanding, and multimodal data processing. The Gemini series includes several versions, such as Gemini Pro and Gemini Ultra (Gemini Advanced), with additional models launched throughout 2024 for various applications and devices. This evolution reflects Google's broader AI ambitions beyond what BERT was initially designed to achieve.

If you’d like to try an LLM as a developer, here is how to install and run Meta’s LLaMA:

Code Example for LLaMA

Install Dependencies:

pip install transformers torch
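
Depending on the checkpoint you choose, you may also need access to the weights: the official meta-llama repositories on Hugging Face are gated, so you typically have to accept the license on the model page and authenticate with your Hugging Face account before downloading. The usual way to log in from the command line is:

huggingface-cli login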

Then you can run it:

from transformers import LlamaTokenizer, LlamaForCausalLM
import torch

# Load the tokenizer and model
model_name = "meta-llama/LLaMA-7B"  # Replace with the model name you're using
tokenizer = LlamaTokenizer.from_pretrained(model_name)
model = LlamaForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16).to("cuda")

# Define the input prompt
input_prompt = "Once upon a time in a futuristic city, there was an AI that could"
inputs = tokenizer(input_prompt, return_tensors="pt").to("cuda")

# Generate text
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_length=50,        # Adjust max_length based on desired output length
        do_sample=True,
        top_k=50,
        top_p=0.95,
        temperature=0.7
    )

# Decode and print the output
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
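
If you prefer a more compact interface, the same generation can be done through the transformers pipeline helper. This is a minimal sketch under the same assumptions as above (substitute whatever model name you actually have access to):

from transformers import pipeline
import torch

# The pipeline helper wraps tokenization, generation, and decoding in one call
generator = pipeline(
    "text-generation",
    model="meta-llama/LLaMA-7B",  # Replace with the model name you're using
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device=0 if torch.cuda.is_available() else -1,  # GPU if available, else CPU
)

result = generator(
    "Once upon a time in a futuristic city, there was an AI that could",
    max_length=50,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    temperature=0.7,
)
print(result[0]["generated_text"])

The sampling settings (top_k, top_p, temperature) work the same way in both versions; lower the temperature for more predictable output, or raise it for more varied text.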