Which Modality should I use - Text, Motif, or Image?: Understanding Graphs with Large Language Models

Abstract

Large language models (LLMs) are revolutionizing various fields by leveraging large text corpora for context-aware intelligence. Because of their limited context windows, however, encoding an entire graph for an LLM is fundamentally constrained. This paper explores how to better integrate graph data with LLMs and presents an approach that combines several encoding modalities (text, image, and motif) with prompting methods that approximate a graph's global connectivity, improving LLMs' effectiveness on complex graph structures. The study also introduces GRAPHTMI, a new benchmark for evaluating LLMs on graph structure analysis, focusing on factors such as homophily, motif presence, and graph difficulty. Key findings reveal that the image modality, supported by advanced vision-language models such as GPT-4V, is more effective than text at managing token limits while retaining critical information. The research also examines how different factors influence each encoding modality's performance. This study highlights current limitations and charts future directions for LLMs in graph understanding and reasoning tasks.


Main Contributions

  • We conduct a comprehensive breadth-first analysis of graph-structure prompting in different modalities, including text, image, and motif, using large language and vision-language models for node classification tasks.
  • We also conduct a depth-first analysis of how different factors influence the performance of each encoding modality.
  • We introduce GRAPHTMI, a novel graph benchmark featuring a hierarchy of graphs, associated prompts, and encoding modalities, designed to further the community's understanding of graph-structure effects when using LLMs.
Image Encoding:
[Figure: the graph rendered as an image for the vision-language model]
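As a rough illustration of how such an image could be produced (a minimal sketch, not the paper's exact rendering pipeline), the code below draws an adjacency dictionary with networkx and matplotlib and highlights the query node so the picture can be passed to a vision-language model such as GPT-4V. The function name, colors, and layout choices are illustrative assumptions.

```python
import networkx as nx
import matplotlib.pyplot as plt

def render_graph_image(adjacency, labels, target_node, out_path="graph.png"):
    """Draw the graph; the query node is highlighted in a different color."""
    G = nx.Graph(adjacency)                    # adjacency: dict of node -> list of neighbors
    pos = nx.spring_layout(G, seed=0)          # deterministic layout for reproducibility
    colors = ["tomato" if n == target_node else "lightblue" for n in G.nodes]
    nx.draw(G, pos, with_labels=True, node_color=colors, node_size=600, font_size=8)
    for n, (x, y) in pos.items():              # annotate known class labels below each node
        if n in labels:
            plt.text(x, y - 0.08, f"label={labels[n]}", fontsize=7, ha="center")
    plt.savefig(out_path, dpi=200, bbox_inches="tight")
    plt.close()
    return out_path
```

A call such as `render_graph_image({'710': ['2212', '51'], '2212': ['710'], '51': ['710']}, {'2212': 3, '51': 3}, '710')` would write a small PNG of that ego-network with node 710 highlighted.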

Text Encoding:
Adjacency list: {'710': ['2212', '2213', '51', '1392'], '2212': ['710', '1392', '2216'], '1392': ['2212', '710', '51'], '2216': ['2212', '2214', '2215', '457', '421', '1724'], '2213': ['710', '51', '421'], '51': ['710', '1392', '2213', '2214', '2215', '457'], '2214': ['2216', '51', '1201'], '2215': ['2216', '51', '457', '1724'], '457': ['2216', '51', '1201', '2215'], '421': ['2216', '2213'], '1724': ['2216', '2215'], '1201': ['2214', '457']}

Node to Label Mapping: Node 2212: Label 3| Node 710: Label 3| Node 1392: Label 3| Node 2216: Label 3| Node 2213: Label 3| Node 51: Label 3| Node 421: Label 3| Node 2214: Label 3| Node 1201: Label 3| Node 2215: Label 3| Node 457: Label 3| Node 1724: Label 2|
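A minimal sketch of how a graph could be serialized into this text modality, assuming a networkx graph whose nodes carry a `label` attribute; the helper name `text_encoding` is illustrative and not the paper's API.

```python
import networkx as nx

def text_encoding(G: nx.Graph) -> str:
    """Serialize a graph into the adjacency-list + node-to-label prompt format above."""
    adjacency = {str(n): [str(m) for m in G.neighbors(n)] for n in G.nodes}
    label_map = "| ".join(
        f"Node {n}: Label {G.nodes[n]['label']}"
        for n in G.nodes
        if "label" in G.nodes[n]               # only nodes with a known label are listed
    )
    return (
        f"Adjacency list: {adjacency}\n\n"
        f"Node to Label Mapping: {label_map}|"
    )
```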

Motif Encoding:
No of star motifs: 0
No of triangle motifs: 6
Triangle motifs attached to ? node: [[710, 1392, 51], [710, 2213, 51], [51, 2215, 457]]
Star motifs connected to ? node: [[2213, 421, 710, 51], [2214, 2216, 1201, 51]]
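The motif modality can be approximated as in the sketch below. The triangle extraction follows the standard definition (3-cliques containing the query node), while the "star" extraction, a hub of degree at least three together with its neighbors, is an assumption made here for illustration; the paper's exact motif definitions may differ.

```python
from itertools import combinations
import networkx as nx

def motif_encoding(G: nx.Graph, target_node) -> str:
    """Build a motif-based prompt for the query node (assumed motif definitions)."""
    # Triangles: pairs of the query node's neighbors that are themselves connected.
    triangles = [
        sorted([target_node, u, v])
        for u, v in combinations(G.neighbors(target_node), 2)
        if G.has_edge(u, v)
    ]
    # Stars: a hub of degree >= 3 plus its neighbors, where the hub is the query
    # node or adjacent to it (an assumption, not the paper's specification).
    stars = [
        [hub] + list(G.neighbors(hub))
        for hub in G.nodes
        if G.degree(hub) >= 3 and (hub == target_node or G.has_edge(hub, target_node))
    ]
    return (
        f"No of triangle motifs: {len(triangles)}\n"
        f"Triangle motifs attached to ? node: {triangles}\n"
        f"No of star motifs: {len(stars)}\n"
        f"Star motifs connected to ? node: {stars}"
    )
```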

Node classification on a Graph using Text, Motif, and Image Modalities

Normal and Anomalous Representations

Key Takeaways


Overall Results across all Modalities

Specific Results from Text, Image, and Motif Modalities