
WRI-101, Prof. Frangipane, Fall 2025: Generative AI_2

MIT Exhibition "AI: Mind the Gap"

"GPT-3 [Generative Pre-trained Transformer] is an AI that can generate text by predicting the words that should follow a given word or phrase."

Keeping up to date with the latest ....

NYT: "What Makes A.I. Chatbots Go Wrong?" by Cade Metz, March 29, 2023

NYT - AI chatbots pinpoint statistical patterns

Why not train all generative AI bots using "good" writing from reliable sources?

List of articles on copyright infringement in the training of AI language models

SAMPLE CHAT: Gemini explains calculus. TAKEAWAY: Human language holds meaning; generative AI holds numerical vectors.

[SD's prompt / question]

Is your knowledge base cataloged or indexed, so that words and phrases appropriate to calculus, such as "derivative" and "integral," are available to be retrieved and used to compose your answer? Is your large language model divided into subsets, based on the perceived topic? 

[Google Gemini's response]

That's an excellent set of questions that touches upon the internal workings of large language models like me!

Is your knowledge base cataloged or indexed?

Yes, absolutely, though not in a way that directly maps to human-readable "catalogs" or "indexes" in the traditional sense.

Think of it less like a library with books arranged by subject and more like a vast, multi-dimensional web of interconnected concepts and relationships. When I'm trained, I don't just memorize isolated facts. Instead, I learn the statistical relationships between words, phrases, ideas, and contexts.

  • Embeddings: Every word, phrase, and even concept is represented internally as a numerical vector (an "embedding") in a high-dimensional space. Words and phrases that are semantically similar (e.g., "derivative," "rate of change," "slope of tangent") are located closer together in this space. This proximity allows for efficient retrieval and association.
  • Semantic Networks (Implicit): While not explicitly built as a symbolic "semantic network" by humans, the training process effectively creates an implicit semantic network. When you mention "calculus," concepts like "derivative," "integral," "limit," "instantaneous change," "accumulation," "slope," "area under curve," and their associated contexts (physics, economics, engineering) are all highly activated and brought into consideration.
  • Pattern Recognition: I learn patterns in how language is used. For example, I learn that "derivative" is frequently followed by phrases like "of a function," "with respect to," and "represents the instantaneous rate of change." This allows me to generate coherent and contextually appropriate text.
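To illustrate the "Embeddings" bullet above, here is a minimal sketch (not part of the chat, and not Gemini's actual internals) using made-up three-dimensional vectors; real embeddings have hundreds or thousands of learned dimensions, and the numbers below are invented purely to show how cosine similarity places related terms close together.

```python
# Toy embeddings: each term gets a made-up 3-dimensional vector. In a real
# model these vectors are learned during training and are much longer.
import math

embeddings = {
    "derivative":     [0.90, 0.80, 0.10],
    "rate of change": [0.85, 0.75, 0.15],
    "banana":         [0.05, 0.10, 0.90],
}

def cosine_similarity(a, b):
    """1.0 means the vectors point the same way; values near 0 mean unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# "derivative" and "rate of change" sit close together in this toy space...
print(cosine_similarity(embeddings["derivative"], embeddings["rate of change"]))  # ~0.999
# ...while "derivative" and "banana" do not.
print(cosine_similarity(embeddings["derivative"], embeddings["banana"]))          # ~0.20
```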

 

SD's takeaways:

  • Humans learn words to convey and understand meaning.

  • Generative AI assigns numerical vectors to words, enabling statistical pattern recognition.

Gemini's explanation of calculus was not nearly as helpful as the articles written by Professor Steven Strogatz for The New York Times, cited below.

What's Not to Like about Generative AI? (Or, what is at risk?)

Upcoming Trends: Increase in the Digital Divide

Large Language Models (LLMs) and the Digital Divide: Free vs. Subscription Models

Free

  • Access to training data and source material: scrapes text off of the Internet.
  • Use cases: AI-generated writing (example: ChatGPT)
      • Dramatically increases the speed of producing written text.
      • Risk of inaccuracies and 'hallucinations.'

Subscription

  • Access to training data and source material: paid / premium versions available to individuals and/or institutions; premium versions may claim to be based on newer, larger, and cleaner training datasets.
  • Use cases: for users who wish to have access to
      • the latest models
      • faster response times
      • higher usage limits
      • potential for integration with other functions and systems.

 

AI tools incorporated into other subscription services / products

  • Article databases: articles are contained within an enclosed environment, under copyright protection, and secured by vendor database constraints and contracts.

Generative AI, search engines, and databases:
"But AI for discovery is only as good as the dataset. This is the sweet spot for library vendors. Web of Science, Scopus, and other large scholarly databases have benefited from injecting a little AI into the discovery layer. These AI layers, while expensive, are more complex than their free counterparts. They assist researchers in rewriting prompts, analyzing the suitability of a resource, and identifying gaps in the literature." 

"Answering your AI questions: Search engines, PDF readers, and special collections." 
Rachel Hendrick
June 11, 2025

AI summaries of journal article(s) 

  • Increases the speed of gathering and synthesizing (vetted) information.
  • Example: JSTOR

 

Energy, Environmental, & Financial Costs

 

Five proposed OpenAI data centers may use as much electricity as the entire state of Massachusetts.