Which Modality should I use - Text, Motif, or Image?: Understanding Graphs with Large Language Models

Abstract

Large language models (LLMs) are revolutionizing various fields by leveraging large text corpora for context-aware intelligence. Because of their limited context windows, however, encoding an entire graph for an LLM is fundamentally constrained. This paper explores how to better integrate graph data with LLMs and presents an approach that combines several encoding modalities (text, image, and motif) with prompting methods that approximate a graph's global connectivity, improving LLMs' effectiveness on complex graph structures. The study also introduces GRAPHTMI, a new benchmark for evaluating LLMs on graph structure analysis, focusing on factors such as homophily, motif presence, and graph difficulty. Key findings reveal that the image modality, supported by advanced vision-language models such as GPT-4V, is more effective than text at managing token limits while retaining critical information. The research also examines how different factors influence each encoding modality's performance. This study highlights current limitations and charts future directions for LLMs in graph understanding and reasoning tasks.


Main Contributions

  • We conduct a comprehensive breadth-first analysis of graph-structure prompting in different modalities, including text, image, and motif, using large language and vision-language models for node classification tasks.
  • We also conduct a depth-first analysis of how different factors influence the performance of each encoding modality.
  • We introduce GRAPHTMI, a novel graph benchmark featuring a hierarchy of graphs, associated prompts, and encoding modalities, designed to further the community's understanding of graph-structure effects when using LLMs.
Image Encoding:
[Figure: the graph rendered as an image for the vision-language model]
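As a rough illustration of how such an image could be produced (a minimal sketch, not the paper's exact rendering pipeline), the code below draws an adjacency dictionary with networkx and matplotlib and highlights the query node so the picture can be passed to a vision-language model such as GPT-4V. The function name, colors, and layout choices are illustrative assumptions.

```python
import networkx as nx
import matplotlib.pyplot as plt

def render_graph_image(adjacency, labels, target_node, out_path="graph.png"):
    """Draw the graph; the query node is highlighted in a different color."""
    G = nx.Graph(adjacency)                    # adjacency: dict of node -> list of neighbors
    pos = nx.spring_layout(G, seed=0)          # deterministic layout for reproducibility
    colors = ["tomato" if n == target_node else "lightblue" for n in G.nodes]
    nx.draw(G, pos, with_labels=True, node_color=colors, node_size=600, font_size=8)
    for n, (x, y) in pos.items():              # annotate known class labels below each node
        if n in labels:
            plt.text(x, y - 0.08, f"label={labels[n]}", fontsize=7, ha="center")
    plt.savefig(out_path, dpi=200, bbox_inches="tight")
    plt.close()
    return out_path
```

A call such as `render_graph_image({'710': ['2212', '51'], '2212': ['710'], '51': ['710']}, {'2212': 3, '51': 3}, '710')` would write a small PNG of that ego-network with node 710 highlighted.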

Text Encoding:
Adjacency list: {'710': ['2212', '2213', '51', '1392'], '2212': ['710', '1392', '2216'], '1392': ['2212', '710', '51'], '2216': ['2212', '2214', '2215', '457', '421', '1724'], '2213': ['710', '51', '421'], '51': ['710', '1392', '2213', '2214', '2215', '457'], '2214': ['2216', '51', '1201'], '2215': ['2216', '51', '457', '1724'], '457': ['2216', '51', '1201', '2215'], '421': ['2216', '2213'], '1724': ['2216', '2215'], '1201': ['2214', '457']}

Node to Label Mapping: Node 2212: Label 3| Node 710: Label 3| Node 1392: Label 3| Node 2216: Label 3| Node 2213: Label 3| Node 51: Label 3| Node 421: Label 3| Node 2214: Label 3| Node 1201: Label 3| Node 2215: Label 3| Node 457: Label 3| Node 1724: Label 2|
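A minimal sketch of how a graph could be serialized into this text modality, assuming a networkx graph whose nodes carry a `label` attribute; the helper name `text_encoding` is illustrative and not the paper's API.

```python
import networkx as nx

def text_encoding(G: nx.Graph) -> str:
    """Serialize a graph into the adjacency-list + node-to-label prompt format above."""
    adjacency = {str(n): [str(m) for m in G.neighbors(n)] for n in G.nodes}
    label_map = "| ".join(
        f"Node {n}: Label {G.nodes[n]['label']}"
        for n in G.nodes
        if "label" in G.nodes[n]               # only nodes with a known label are listed
    )
    return (
        f"Adjacency list: {adjacency}\n\n"
        f"Node to Label Mapping: {label_map}|"
    )
```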

Motif Encoding:
No of star motifs: 0
No of triangle motifs: 6
Triangle motifs attached to ? node: [[710, 1392, 51], [710, 2213, 51], [51, 2215, 457]]
Star motifs connected to ? node: [[2213, 421, 710, 51], [2214, 2216, 1201, 51]]
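The motif modality can be approximated as in the sketch below. The triangle extraction follows the standard definition (3-cliques containing the query node), while the "star" extraction, a hub of degree at least three together with its neighbors, is an assumption made here for illustration; the paper's exact motif definitions may differ.

```python
from itertools import combinations
import networkx as nx

def motif_encoding(G: nx.Graph, target_node) -> str:
    """Build a motif-based prompt for the query node (assumed motif definitions)."""
    # Triangles: pairs of the query node's neighbors that are themselves connected.
    triangles = [
        sorted([target_node, u, v])
        for u, v in combinations(G.neighbors(target_node), 2)
        if G.has_edge(u, v)
    ]
    # Stars: a hub of degree >= 3 plus its neighbors, where the hub is the query
    # node or adjacent to it (an assumption, not the paper's specification).
    stars = [
        [hub] + list(G.neighbors(hub))
        for hub in G.nodes
        if G.degree(hub) >= 3 and (hub == target_node or G.has_edge(hub, target_node))
    ]
    return (
        f"No of triangle motifs: {len(triangles)}\n"
        f"Triangle motifs attached to ? node: {triangles}\n"
        f"No of star motifs: {len(stars)}\n"
        f"Star motifs connected to ? node: {stars}"
    )
```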

Node classification on a Graph using Text, Motif, and Image Modalities

Normal and Anomalous Representations

Key Takeaways


Overall Results across all Modalities

Specific Results from Text, Image, and Motif Modalities