
Information Contrastive Learning (I-Con)

A periodic table for machine learning


Researchers from MIT, Microsoft, and Google have introduced a “periodic table of machine learning” that unifies many different machine learning techniques under a single framework. Their framework, called Information Contrastive Learning (I-Con), shows that a wide variety of methods, including classification, regression, large language modeling, clustering, dimensionality reduction, and spectral graph theory, can all be viewed in a more general context.

I-Con reframes many popular techniques as variations of a shared mathematical idea: learning relationships between data points. Much as the periodic table in chemistry organized the known elements and predicted undiscovered ones, the researchers’ table doesn’t just clarify existing machine learning methods; it also forecasts new ones. In fact, one such prediction has already led to a state-of-the-art algorithm that can classify images without any human-labeled data.

[Figure: the I-Con machine learning periodic table]

Understanding I-Con: The clustering gala

At the heart of their framework is a simple idea: most algorithms can be rewritten in terms of learning the relationships between different data points. I-Con shows that although different algorithms preserve different kinds of relationships, the core mathematics behind these approaches is exactly the same.

To get an intuitive understanding of the mathematics behind I-Con, imagine you’re at a grand ballroom party. You know a few people, but most of the guests are strangers. Suddenly, you hear the host tapping a champagne glass, signaling everyone to find a table for dinner. You quickly scan the room, looking for your friends to sit with. Depending on how social or introverted the guests are, you might be able to figure out the friend groups just by looking at who is seated at each table.

This scene gives a simple way to understand what “clustering” looks like in I-Con. Here, each guest represents a datapoint, and their friends are nearby datapoints. Each table is a “cluster,” and the guests are happiest when sitting with their friends, forming tight, compact groups that reflect the data’s structure. It’s not always possible to seat every friend together, and better seating arrangements keep more friends together. In this view, clustering amounts to approximating the complex network of friend connections with the simpler connections people form by sitting at the same table.
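To make the analogy concrete, here is a toy sketch in Python (illustrative code, not the paper’s; the random friendship matrix and random-search “planner” are assumptions made for the example). A seating chart is scored by how many friendships it keeps at the same table, and we search for a good one:

```python
import numpy as np

# Toy sketch of the "clustering gala": guests are datapoints, friendships
# are connections, and a seating chart is a clustering. Better seatings
# keep more friendships at the same table.

rng = np.random.default_rng(0)
n_guests, n_tables = 12, 3

# Symmetric 0/1 friendship matrix: friends[i, j] = 1 if i and j are friends.
friends = np.triu(rng.integers(0, 2, size=(n_guests, n_guests)), k=1)
friends = friends + friends.T

def friendships_kept(seating):
    """Count friendships whose two guests share a table."""
    same_table = seating[:, None] == seating[None, :]
    return int((friends * same_table).sum()) // 2

# Crude random search over seatings, just to make the objective tangible.
best_seating, best_score = None, -1
for _ in range(2000):
    seating = rng.integers(0, n_tables, size=n_guests)
    score = friendships_kept(seating)
    if score > best_score:
        best_seating, best_score = seating, score

print("best seating:", best_seating, "friendships kept:", best_score)
```

Real clustering algorithms optimize an analogous objective far more cleverly than random search, but the quantity being optimized is the same kind of thing: agreement between a network of connections and a grouping.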

Unifying machine learning methods with I-Con

In a broader sense, many machine learning algorithms are like this party scene, with slight variations in how guests (datapoints) find their friends and arrange themselves at tables (clusters). At a real party, there are many different ways people can connect: some bonds are strong, others tenuous. Sometimes a connection is formed over common interests, other times by a shared hometown. By changing how datapoints connect with their neighbors, one can form many different machine learning algorithms.

In the team’s recent paper, which will appear at the 2025 International Conference on Learning Representations, they show that by changing the algorithm’s notion of which datapoints are neighbors they can recreate over 20 different common machine learning algorithms. More concretely, I-Con aims to approximate the connections present in the underlying data with a simplified representation, like trying to approximate complex social networks with a concrete seating arrangement in the party example. Crucially, in I-Con a “connection” is a flexible idea: it can mean visual similarity, shared class labels, cluster membership, and many other types of relationships. Furthermore, relationships don’t have to be absolute—they can have degrees of confidence, just like in the room.
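In symbols, this shared recipe can be sketched as follows (an illustrative formulation: the Gaussian-kernel and shared-cluster forms below are example choices, not the only ones). Each datapoint i gets a neighbor distribution p(j | i) from the data, and the algorithm’s output defines an approximating distribution q(j | i):

```latex
% For each datapoint i, two distributions over possible neighbors j:
%   p(j|i): connections in the raw data, e.g. a Gaussian kernel on
%           distances, as in SNE-style methods
%   q(j|i): connections realized by the algorithm's output, e.g. the
%           probability that i and j land in the same cluster
\[
  p(j \mid i) \propto \exp\!\left( -\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2} \right),
  \qquad
  q(j \mid i) \propto \sum_{c} \Pr(i \in c)\,\Pr(j \in c)
\]
```

The table below lists, for several methods, what plays the role of p (input connectivity) and what plays the role of q (output connectivity).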

| Algorithm | Input Data Connectivity | Output Data Connectivity |
| --- | --- | --- |
| The gala example | Friendship | Sharing a table |
| Clustering (k-means) | Physical proximity | Sharing a cluster |
| Dimensionality reduction (SNE, t-SNE, PCA) | High-dimensional physical proximity | Low-dimensional physical proximity |
| Self-supervised representation learning | Did the two datapoints arise from the same process? | Physical proximity |
| Graph clustering (spectral clustering) | Are two nodes connected by an edge? | Sharing a cluster |
| Classification (cross-entropy) | Is a datapoint associated with a particular class? | Physical proximity |
| Large language modeling | Does this token complete this text? | Physical proximity |
[Figure: I-Con overview diagram showing Spatial, Discrete, Cluster, and Graph representations]

This simple yet fundamental idea unifies a broad spectrum of techniques. Dimensionality reduction tools like SNE, t-SNE, or PCA define neighborhoods based on the physical proximity of datapoints. Supervised classification methods group data by labels. Clustering algorithms focus on shared group membership. What once seemed like distinct algorithms, invented in isolation and sometimes more than a century apart, now appear as variations within a single, coherent framework. More formally, each method aims to minimize how much its approximate connections deviate from the real data’s connections, using a measure called the Kullback–Leibler (KL) divergence.
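As a minimal sketch of that shared objective (illustrative helper names, not the authors’ code), here is one KL loss evaluated with two different choices of q: a clustering-style q and an SNE-style embedding q:

```python
import numpy as np

def icon_loss(p, q, eps=1e-12):
    """Average KL divergence between the data's neighbor distributions p[i]
    and the learned approximations q[i]; rows of p and q sum to 1."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)))

def neighbor_dist(x, sigma=1.0):
    """SNE-style p(j|i): Gaussian kernel on pairwise distances, self excluded."""
    d2 = np.sum((x[:, None] - x[None, :]) ** 2, axis=-1)
    logits = -d2 / (2 * sigma**2)
    np.fill_diagonal(logits, -np.inf)       # a point is not its own neighbor
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cluster_dist(assign):
    """Clustering-style q(j|i): probability that i and j share a cluster,
    given soft assignments assign[i, c], normalized over j != i."""
    co = assign @ assign.T
    np.fill_diagonal(co, 0.0)
    return co / co.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
x = rng.normal(size=(10, 5))                 # 10 datapoints in 5 dimensions
p = neighbor_dist(x)                         # connections in the data
soft = rng.dirichlet(np.ones(3), size=10)    # random soft clustering, 3 clusters
print("clustering loss:", icon_loss(p, cluster_dist(soft)))

y = rng.normal(size=(10, 2))                 # a random 2-D embedding
print("embedding loss:", icon_loss(p, neighbor_dist(y)))  # SNE-style target
```

Only the two distributions change from row to row of the table; the loss itself stays fixed.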

[Figure: I-Con overview diagram showing the flow of learned representations and distributions]

Organizing algorithms into a periodic table

Just like in chemistry, organizing the known elements opens the door to discovering new ones. As the team studied the I-Con framework, patterns began to emerge. In particular, they noticed that the same “connection” types appeared over and over in the algorithms they studied. The team then built a table enumerating the main ways points can connect in a real dataset, and the main ways an algorithm can approximate those connections. Each machine learning method they studied fit neatly into a square. Most surprisingly, after filling in the methods they knew, many “gaps” in the table remained. These gaps pointed to techniques that did not yet exist, but plausibly could.

Filling gaps in the periodic table

One such method combined recent advances in debiased contrastive representation learning with clustering to create a state-of-the-art algorithm for recognizing images without a single human label. By pairing the data connections used in debiased contrastive learning with the approximate connections used in clustering, the team built a new method for clustering images. This method, which filled a previously empty square of the table, classified images from the ImageNet-1K dataset 8% better than prior approaches.

To understand debiasing intuitively, we can return to our clustering gala. In this context, debiasing adds a little bit of friendship between every pair of guests. This small amount of friendship not only improves the overall vibe of the party, but also makes it a bit easier to create pleasant seating arrangements. Though the technique was originally developed to improve representation learning, the team found they could apply it to anything in the I-Con framework, including clustering.
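As a sketch of this idea (alpha is an illustrative mixing weight, not the paper’s exact parameterization), debiasing blends each datapoint’s neighbor distribution with a small uniform connection to everyone:

```python
import numpy as np

def debias(p, alpha=0.05):
    """Blend each row of the neighbor distribution p(j|i) with a uniform
    distribution over all points: a little friendship between every guest.
    alpha is an illustrative mixing weight; rows of p must sum to 1."""
    uniform = np.full_like(p, 1.0 / p.shape[1])
    return (1.0 - alpha) * p + alpha * uniform
```

Because the debiased p is still just a neighbor distribution, it can be dropped into the same KL objective sketched earlier, which is why a trick invented for representation learning transfers directly to clustering.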

A framework for discovery

What makes I-Con powerful isn’t just that it explains existing algorithms—it gives researchers a toolkit to design new ones. Once different methods are expressed in the same conceptual language, it becomes easier to experiment: redefine neighborhoods, adjust uncertainty, combine strategies. Each variation corresponds to a new entry in the periodic table.

“It’s not just a metaphor,” says MIT Master’s student and first author Shaden Alshammari. “We’re starting to see machine learning as a system with structure that is a space we can explore, rather than just guess our way through.”

In this light, I-Con reframes machine learning as a kind of design science. It not only organizes what we already know, but also reveals what’s missing—and how we might build it.

Looking forward

As artificial intelligence continues to expand its reach, frameworks like I-Con offer a way to bring order to the chaos. They help researchers see the hidden structure beneath the surface—and give them the tools to innovate with purpose, not just intuition.

For those outside the AI world, it’s a reminder that even in fields as complex as machine learning, simple patterns may still be waiting to be found. Much as the periodic table once brought coherence to chemistry, I-Con offers a hopeful step toward understanding the deeper structure of learning: not by solving intelligence, but by revealing that, at its heart, learning might just be the art of mapping relationships.

And there’s still plenty of space left on the table.
