Graphs are a widely used way to represent relationships in data visually. They are particularly useful when data is too complex or too abundant to be described adequately in text, and when a graph takes up less space than the equivalent description. Do not use a graph for a small amount of data that can be conveyed succinctly in a sentence, and do not repeat in the text the data already shown in a graph. Use graphs when the data displays a significant trend or reveals a relationship between variables. If the data shows no notable evidence of a trend, a graph may not be the most suitable choice.


## Principles for Creating Clear and Readable Graphs

Although many computer programs can generate graphs, authors must still adhere to some basic principles. First and foremost, a graph must be clear and readable. This depends not only on the font size and symbols used but also on the type of graph itself.

Depending on the format, a graph may have several parts: a figure number, a caption (not a title), a headnote, a data field, axes and scales, symbols, legends, and a credit or source line. At minimum, a graph should include a caption, axes and scales, symbols, and a data field.

In most cases, the vertical axis (ordinate, Y axis) represents the dependent variable and the horizontal axis (abscissa, X axis) represents the independent variable. Consequently, when time is one of the variables, it is normally plotted on the X axis.

Plotting symbols must be distinct and legible, with good contrast between the figure in the foreground and the background. Open and closed circles offer the best contrast and are more effective than a combination of open circles and squares.

Every graph needs a clear and descriptive legend. Like the title of the paper itself, each legend should convey as much information as possible about what the graph illustrates, without summarizing or interpreting the results or experimental details. Avoid restating the axis labels in the legend, such as “temperature vs. time.”

Finally, select the graph type that suits the data to be presented. Use line diagrams or scattergrams when both the independent and dependent variables are numeric, and bar graphs when only the dependent variable is numeric. For proportions, use bar graphs or pie charts. The following sections briefly describe each graph type.
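Before turning to the individual types, the selection rule above can be sketched as a small lookup. This Python sketch is illustrative only: the function name `suggest_graph_type` and its parameters are hypothetical, and it encodes nothing beyond the rule of thumb stated in the text.

```python
def suggest_graph_type(x_is_numeric: bool, y_is_numeric: bool,
                       shows_proportions: bool = False) -> str:
    """Return the graph type suggested by the rule of thumb in the text.

    x_is_numeric: the independent variable is numeric.
    y_is_numeric: the dependent variable is numeric.
    shows_proportions: the data are parts of a whole.
    """
    if shows_proportions:
        return "bar graph or pie chart"
    if x_is_numeric and y_is_numeric:
        return "line diagram or scattergram"
    if y_is_numeric:
        # Only the dependent variable is numeric (e.g. groups on the X axis).
        return "bar graph"
    # The text gives no recommendation for a non-numeric dependent variable.
    return "no recommendation"


print(suggest_graph_type(x_is_numeric=True, y_is_numeric=True))
print(suggest_graph_type(x_is_numeric=False, y_is_numeric=True))
```

A lookup like this is only a starting point; the final choice still depends on whether the data shows a trend worth plotting at all.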

## Scattergrams

A scattergram is ideal for showing the relationship between two variables and determining if their values change consistently. For example, it can be used to analyze the relationship between the concentration levels of two different proteins.

## Line Graphs

Like scattergrams, line graphs show the relationship between two variables. In a line graph, however, the X values represent a continuous variable, such as time, temperature, or pressure, and the plotted series of related values shows how Y changes as a function of X. As with other graphs, the dependent variable is typically placed on the vertical Y axis and the independent variable on the horizontal X axis. For instance, survival plots produced with the Kaplan-Meier method show the proportion of individuals remaining free of, or experiencing, a specific outcome over time.
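To make the survival-plot example concrete, here is a minimal Python sketch of the Kaplan-Meier product-limit estimate, which yields the Y values of such a plot. The function name and the follow-up data are invented for illustration. The sketch assumes that subjects censored at a given time still count as at risk at that time, and it omits confidence intervals.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each time where an outcome occurred.

    times:  follow-up time of each subject.
    events: True if the outcome occurred at that time, False if the
            subject was censored (left the study without the outcome).
    """
    at_risk = len(times)   # subjects still under observation
    survival = 1.0         # estimated proportion free of the outcome
    curve = []
    for t in sorted(set(times)):
        occurred = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)  # outcomes + censored
        if occurred:
            # Multiply by the proportion of at-risk subjects who pass t.
            survival *= 1 - occurred / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up data: five subjects, two of them censored.
times = [1, 2, 2, 3, 4]
events = [True, True, False, True, False]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

With this invented data the estimate steps down to 0.8, 0.6, and 0.3 at times 1, 2, and 3. On a graph, these points are joined as a step function rather than by sloping lines, because the estimate changes only when an outcome occurs.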

## Bar Graphs

Bar graphs can have either horizontal or vertical bars, with the length of each bar representing its value. They are used to compare the value of a single variable across different groups. For example, a bar graph can compare the mean protein concentration in a cohort of patients with that in a control group.
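The patient-versus-control comparison can be sketched in a few lines of Python. The concentrations are invented for illustration, `group_means` and `text_bar_chart` are hypothetical helper names, and the text rendering is only a stand-in for a plotting program. The sketch assumes all means are positive.

```python
from statistics import mean

# Hypothetical protein concentrations (arbitrary units), invented for illustration.
groups = {
    "patients": [4.0, 5.0, 6.0],
    "controls": [2.0, 3.0, 4.0],
}


def group_means(data):
    """Return the mean of each group; these become the bar lengths."""
    return {name: mean(values) for name, values in data.items()}


def text_bar_chart(means, width=30):
    """Render one horizontal bar per group, scaled to the largest mean.

    Assumes every mean is positive.
    """
    largest = max(means.values())
    lines = []
    for name, value in means.items():
        bar = "#" * round(value / largest * width)
        lines.append(f"{name:<10} {bar} {value:.1f}")
    return lines


for line in text_bar_chart(group_means(groups)):
    print(line)
```

Every bar starts from the same zero baseline, which is what allows bar lengths to be compared directly.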

## Histograms

A histogram, also known as a frequency distribution graph, is a specialized type of bar graph with no gaps between the columns. It represents data from the measurement of a continuous variable. In a histogram, individual data points are grouped into classes to show the frequency of data in each class. The frequency is represented by the area of the column, so when all classes have the same width, the column height is proportional to the frequency. Histograms are particularly helpful for showing how the values of a measured variable are distributed. For instance, they can be used to examine whether a variable, such as the protein level among individuals in a population, follows a normal distribution.
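The grouping step can be sketched as follows. This is a minimal Python illustration with invented measurements and a hypothetical function name. It assumes equal-width classes that include their lower edge and exclude their upper edge, except for the last class, which includes both.

```python
def histogram_counts(values, bin_start, bin_width, n_bins):
    """Count how many values fall into each equal-width class.

    Class i covers [bin_start + i * bin_width, bin_start + (i + 1) * bin_width).
    The last class also includes its upper edge. Values outside the
    overall range are ignored.
    """
    counts = [0] * n_bins
    upper = bin_start + n_bins * bin_width
    for v in values:
        if v < bin_start or v > upper:
            continue  # outside the plotted range
        index = min(int((v - bin_start) // bin_width), n_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical measurements, invented for illustration.
measurements = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
print(histogram_counts(measurements, bin_start=0, bin_width=2, n_bins=5))
# prints [1, 5, 3, 0, 1]
```

The empty fourth class is kept in the result: a histogram shows a gap in the distribution as a column of zero height, not by dropping the class.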

## Pie Charts

Pie charts display classes or groups of data in relation to the entire dataset. The whole pie represents all the data, while each slice or segment represents a different class or group within the dataset. The slices should differ noticeably from one another. It is generally recommended to limit the number of categories to between 3 and 10.
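Since each slice is a share of the whole, its size follows directly from the class counts. The Python sketch below computes the fraction and the angle of each slice. The function name and the counts are hypothetical.

```python
def pie_slices(counts):
    """Return {class: (fraction of the whole, slice angle in degrees)}."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("a pie chart needs a positive total")
    return {
        name: (count / total, 360 * count / total)
        for name, count in counts.items()
    }


# Hypothetical class counts, invented for illustration.
slices = pie_slices({"A": 50, "B": 30, "C": 20})
for name, (fraction, angle) in slices.items():
    print(f"{name}: {fraction:.0%} of the data, {angle:.0f} degrees")
```

The angles always sum to 360 degrees, which is why a pie chart only makes sense when the classes together account for the entire dataset.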

## Box Plots

Box plots can be either horizontal or vertical and are used to present a statistical summary of one or more variables. They display important statistical measures: the minimum, lower quartile, median, upper quartile, and maximum. Box plots can also identify outlier data points. The spacing between the different parts of the box indicates the degree of dispersion and whether the data distribution is symmetrical or skewed.
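The statistics behind a box plot can be computed with Python's standard `statistics` module. This is a sketch under two assumptions that the text does not specify: quartiles are computed with the module's `inclusive` method (plotting programs use several slightly different conventions), and outliers are flagged with the common rule of 1.5 times the interquartile range. The function names and the data are invented.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def outliers(values, k=1.5):
    """Return values more than k interquartile ranges outside the box.

    k = 1.5 is a common convention, assumed here rather than taken
    from the text.
    """
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical measurements, invented for illustration.
data = [2, 4, 4, 5, 6, 7, 8, 9, 30]
print(five_number_summary(data))  # prints (2, 4.0, 6.0, 8.0, 30)
print(outliers(data))             # prints [30]
```

When outliers are drawn as separate points, the whiskers usually end at the most extreme values inside the fences instead of at the overall minimum and maximum.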

## Common Errors to Avoid

Several common errors should be avoided when creating graphs. Do not duplicate information from the text in the graphs, or vice versa. Include proper legends, and choose the graph type that accurately represents the data. Plot graphs to scale, and label data consistently and clearly. Manipulating data, for example by exaggerating or interrupting it to achieve a desired effect, is misleading and should be avoided.

Another error is drawing a line that suggests an unsubstantiated extrapolation between or beyond the data points. Connecting discrete data points with a continuous line can mislead readers into assuming that values exist between the plotted points when there is no data to support those interpolated values. A better way to display separate values is a bar chart, in which each column represents the average value obtained from each group.

If an extremely large range must be covered and cannot practically be shown on a continuous scale, a discontinuity in the scale and data field can be marked with paired diagonal lines (—//—) to represent the missing extent of the range.
