Background & Summary

Soil contamination by toxic elements is a widespread phenomenon from the urban parks in Beijing1 to the Donetsk region of Ukraine2 to the Deûle river in northern France3. Produce crops worldwide have been found to contain toxic elements absorbed from contaminated soil or water used for irrigation4. Cleaning up these contaminated lands can be cost-prohibitive. To overcome the cost concerns, a possible solution is phytoremediation – using plants to remediate pollution5,6. In this process, the plants will extract the toxic element from the soil and sequester it in the plant tissue (i.e., bioconcentration). Given the changing climate, there is a growing interest in using non-edible commodity crops capable of producing significant biomass to remediate pollutants in semi-arid environments7.

Certain plants have been extensively studied for different applications to remove pollutants. Sunflower (Helianthus annuus L.) has been used to clean up toxic elements such as copper8, cadmium9, and radioactive cesium and strontium at the Chernobyl reactor site10. Castor bean (Ricinus communis L.), a poisonous plant in the family Euphorbiaceae, has been used to remediate lead11 and chromium12, and has also been assessed as a value-added crop for biofuel production13. Bamboo (Bambusa vulgaris L., Phyllostachys edulis L., and P. praecox C.D.Chu.& C.S.Chao) also has numerous applications for phytoremediation, from removing toxic elements14 to reducing the excessive nutrient content of pig slurry15. Hemp (Cannabis sativa L.) has commonly been used outside of the U.S. to clean up soils contaminated with toxic elements16 and has often been considered for projects that combine phytoremediation19 with value-added uses of plant parts, such as bioenergy17 or textiles18. With the passage of the 2018 Farm Bill20 and the legalization of hemp cultivation in the U.S., hemp has emerged as a crop of interest not only for commercial uses but also for phytoattenuation (i.e., combining remediation with a value-added crop application such as fiber or seed oil for bioenergy17,18).

Despite the wealth of knowledge and interest in phytoremediation (with over 57,000 search results in PubMed), there is a lack of standardized methods to compare the phytoremediation potential of a variety of crops. The inconsistency in experimental design and data reporting makes it harder to compare phytoremediation efficiency indicators for different plants, an issue inherent to data integration and meta-analysis21. Here, we develop a harmonized set of variables to extract knowledge from peer-reviewed literature on hemp, sunflower, castor bean, and bamboo. Building on previous efforts22,23, this study aimed to provide proof of concept for a standardized comparison across different plant species and components for toxic element remediation using bioconcentration factors (BCFs), the ratio of an element's concentration in an organism to its concentration in the surrounding environment. This approach can facilitate the identification of the most suitable crop option based on project goals and site conditions, thereby optimizing phytoremediation to improve environmental quality.

Summary of the data

In this study, we collected 587 citations for phytoremediation articles on four plants: sunflower, hemp, castor bean, and bamboo. We found relevant data in 238 articles, with the majority of articles including sunflower data (126 articles), followed by castor bean (63 articles), hemp (46 articles), and bamboo (13 articles). Nine articles contained data for more than one of the four plants. A total of 6,679 observations (defined as unique calculations of BCFs; see the METHODS section) were collected, consisting of 3,732 sunflower observations, 1,263 castor bean observations, 1,483 hemp observations, and 201 bamboo observations. Of these, 2,408 observations specified the plant variety. The sunflower studies included a total of 45 different varieties, the hemp studies included 43 different varieties, and the castor bean studies included five different varieties. Bamboo studies used three different species (Bambusa vulgaris, Phyllostachys edulis, and P. praecox), and one of the species, B. vulgaris, was represented by three different varieties in the dataset studies. For all four plants, no single variety was used in more than five studies.

The dataset contains data on the experimental conditions for each study. Soil types were classified as: naturally found (natural); naturally found and cleaned; naturally polluted; commercially available; artificially spiked; and amended with enhancements (Table 1). The enhancements, which were described in the “treatment” variable, included interventions used to increase the ability of the plant to acquire the elements from the soil, such as the addition of chelators. Most of the research studies were conducted in soil polluted with toxic metals (60%). For a plant to acquire an element from the soil, the element must be bioavailable, which means that it must be in a form that is soluble and can diffuse into the plant’s roots. However, most of the studies we analyzed did not determine the fraction of each element that was bioavailable in polluted soils. In these cases, the studies only reported the total amount of the element present in the soil. The percentage of studies using polluted soils ranged from 54% for castor bean to 85% for bamboo. We compared the BCF values of five elements in polluted soils and spiked soils (where the pollutants were introduced in a bioavailable form) in the USAGE NOTES section.

Table 1 Harmonized variables for data extraction from 238 phytoremediation studies of sunflower, hemp, castor bean, and bamboo.

Data availability for each plant component and element

A total of 11 plant components and 27 elements were reported in the analyzed studies (Figs. 1, 2). Stem had the most observations (2,273), followed by root (1,851), leaf (911), and whole plant (831). Observations were available for all 11 plant components of sunflower. For elements, cadmium had the most observations (1,429), followed by lead (1,224), zinc (1,048), and copper (831). The first-row transition metals (i.e., from chromium to zinc) were also heavily studied. The BCF values for most elements were less than 1, indicating that element concentrations were generally higher in the soil than in the plant (Fig. 3). Selenium exhibited the highest BCF values, with a median value of 2.85. Following selenium, the order of median BCF values was potassium (2.35), thallium (1.02), molybdenum (1.00), and manganese (0.915). It should be noted that some of the elements, such as potassium, iron, and zinc, are essential plant nutrients. Sunflower had the most observations for most of the elements reported.
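The ranking of elements by median BCF can be illustrated with a short sketch. This is not the authors' code (their analysis was done in R); it is a minimal Python example, and the function name `rank_by_median` and the sample values are hypothetical, chosen only to show the grouping and ranking step.

```python
from collections import defaultdict
from statistics import median


def rank_by_median(records):
    """Group (element, bcf) pairs by element and rank elements by median BCF.

    Returns a list of (element, median_bcf) tuples in descending order.
    """
    grouped = defaultdict(list)
    for element, bcf in records:
        grouped[element].append(bcf)
    medians = {element: median(values) for element, values in grouped.items()}
    return sorted(medians.items(), key=lambda item: item[1], reverse=True)


# Hypothetical observations for illustration only; not values from the dataset.
records = [
    ("Se", 3.0), ("Se", 2.0), ("Se", 4.0),
    ("Cd", 0.5), ("Cd", 1.5),
    ("Pb", 0.1), ("Pb", 0.3), ("Pb", 0.2),
]
ranking = rank_by_median(records)
for element, med in ranking:
    print(element, med)
```

The median is used instead of the mean because BCF values are strongly right-skewed, so a few very large values would otherwise dominate the ranking.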

Fig. 1

Number of bioconcentration factor (BCF) observations for bamboo, castor bean, hemp, and sunflower, organized by plant species and further categorized by specific plant components (e.g., roots, stems, leaves, and seeds). Sunflower has the highest number of BCF observations (3,732) in 238 analyzed studies, followed by hemp (1,483), castor bean (1,263), and bamboo (201). Among the plant components, stem had the most observations (2,273), followed by root (1,851), leaf (911), and whole plant (831).

Fig. 2

Number of bioconcentration factor observations for bamboo, castor bean, hemp, and sunflower, sorted by elements. A total of 27 elements were identified in 238 analyzed bioconcentration studies in sunflower, hemp, castor bean, and bamboo. Cadmium has the highest number of observations (1,429), followed by lead (1,224) and zinc (1,048). The highest number of cadmium BCF observations was in sunflower, followed by castor bean, hemp, and bamboo.

Fig. 3

The distribution of bioconcentration factor (BCF) observations by 27 elements, ranked by the median BCF of each element. A total of 6,679 BCF values obtained in 238 analyzed studies are shown on a log10 scale. The median BCF values of the elements are ranked in descending order and the boxes indicate the interquartile ranges. Selenium has the highest median BCF value (2.85) and thorium has the lowest (0.005).

Methods

Literature search and data collection

We searched for all peer-reviewed studies that used any of the four commodity crops suitable to remediate elemental pollution in semi-arid environments (Fig. 4). The literature search was conducted using Scopus, Google Scholar, and the Arizona State University Library platforms between August 1, 2022, and March 15, 2023. We employed a predefined set of search terms to retrieve citations for pertinent research articles. These search terms are detailed in Table 2. We then accessed all the resulting articles and manually excluded articles that we could not use for our dataset. The exclusion criteria were (i) not having relevant data, such as studies conducted in aquatic environments; (ii) not containing sufficient data to calculate BCFs; or (iii) not being available in English.

Fig. 4

Flowchart of data extraction from literature. Four commodity crops suitable for phytoremediation in semi-arid environments were included in this analysis. Relevant peer-reviewed studies were collected through Google Scholar, ASU Library Search, and Scopus. Papers meeting any of the following exclusion criteria—lack of data access, non-English language, or absence of relevant data (i.e., papers without obtainable BCF values)—were excluded from the analysis. Data were extracted for 22 harmonized variables from the selected studies. Among these, some studies directly reported BCF values, while others required BCF values to be calculated based on provided information.

Table 2 Search terms to identify phytoremediation studies of sunflower, hemp, castor bean, and bamboo.

All the remaining articles containing accessible and relevant data were manually data-mined for the information listed in Table 1. These harmonized variables ensured that we systematically gathered relevant and useful information. In cases where the article itself did not report the BCF value, we calculated individual BCF values by dividing the element’s concentration in the plant matter by the reported concentration in the soil.
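The BCF calculation described above is a simple ratio. The following Python sketch shows it under stated assumptions: the function name and the example concentrations are hypothetical, and both concentrations are assumed to be expressed in the same units (e.g., mg/kg dry weight).

```python
def bioconcentration_factor(plant_conc: float, soil_conc: float) -> float:
    """Return the BCF: element concentration in plant tissue divided by
    the element concentration in the soil (same units for both)."""
    if soil_conc <= 0:
        raise ValueError("soil concentration must be positive")
    if plant_conc < 0:
        raise ValueError("plant concentration cannot be negative")
    return plant_conc / soil_conc


# Hypothetical example values, not taken from the dataset.
root_bcf = bioconcentration_factor(plant_conc=12.0, soil_conc=40.0)
print(root_bcf)  # 0.3
```

A BCF below 1, as in this example, means the element is less concentrated in the plant tissue than in the soil.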

Database software and visualization

All information was collected and organized using Microsoft Excel (Microsoft Corporation, Redmond, WA, USA). Each data entry in the database represents a single bioconcentration factor calculation (i.e., summary statistic). Data were exported from the spreadsheet and stored in a single CSV file. Data were analyzed using the R programming language version 4.3.2 (R Core Team, Vienna, Austria) with the tidyverse package version 2.0.0 for data manipulation, including ggplot2 for data visualization24.

Data Records

The datasets and code associated with this manuscript are archived on Zenodo (Ha et al.25) and are accessible at https://doi.org/10.5281/zenodo.13363473. The main dataset can be found under the file name “phytoremediation_database.csv”. Table 1 defines the fields found in the dataset file, which includes up to 22 different types of information that were collected, if available, for each study.

Technical Validation

All data were manually extracted from the peer-reviewed articles using standardized criteria according to Table 1. Where the terminology used was ambiguous, the data entry of each harmonized variable was discussed and entered into the dataset only after a consensus was reached. Data included in the dataset were then manually double-checked for accuracy. Data were also inspected to ensure that all entries for each variable had information consistent with the other variables at each data point. Post-processing code was used to ensure that the citations listed corresponded to those used in the dataset (Ha et al.25). The processing code also ensured that the data entered were of the correct type for all variables.
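The kind of post-processing check described above can be sketched as follows. This is an illustrative Python example, not the authors' processing code (which is archived with the dataset); the field names `citation` and `bcf` and the sample records are hypothetical.

```python
def validate_records(records, known_citations):
    """Return a list of (row_index, problem) tuples for records that fail
    basic checks: citation present in the reference list, BCF numeric
    and non-negative."""
    problems = []
    for index, record in enumerate(records):
        if record.get("citation") not in known_citations:
            problems.append((index, "unknown citation"))
        try:
            bcf = float(record.get("bcf"))
        except (TypeError, ValueError):
            problems.append((index, "bcf is not numeric"))
            continue
        if bcf < 0:
            problems.append((index, "bcf is negative"))
    return problems


# Hypothetical records and citation keys for illustration only.
known_citations = {"study_a", "study_b"}
records = [
    {"citation": "study_a", "bcf": "0.42"},
    {"citation": "study_c", "bcf": "1.10"},
    {"citation": "study_b", "bcf": "n/a"},
]
problems = validate_records(records, known_citations)
print(problems)
```

Collecting all problems in one pass, rather than stopping at the first failure, makes it easier to review and correct the spreadsheet in bulk.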

Dataset consistency with previous studies

The high number of phytoremediation studies collected on sunflower and hemp in the dataset is consistent with the fact that those crops are well-established options for conventional phytoremediation. Both plants have been studied extensively, and ample data are available in other studies26,27,28. Castor bean and bamboo are less-used options for phytoremediation29.

Comparing bioconcentration factor between plant species and components

Besides uptake of an element, the tissue and location to which the plant moves the element (i.e., translocation) is an important part of how the plant removes the element from the soil. Differences in elemental levels between different plant components are reported in Fig. 5. On average, the reported BCF values were higher in the roots than in the leaf and stem (p < 0.05; Fig. 5 and data repository25). These results suggest that movement from the roots to the stems and leaves is a significant impediment to concentrating the element in the above-ground biomass. Sunflower was the only plant that had a median BCF value for leaf samples as high as the median BCF for the roots, making this plant the best suited for phytoextraction30,31,32. This may explain the popularity of sunflower in bioremediation experiments.

Fig. 5

The distribution of bioconcentration factor (BCF) values on a log10 scale for root, stem, and leaf of bamboo, castor bean, hemp, and sunflower. The red line indicates the median BCF value. The medians for root are the highest in all four plants.

Usage Notes

The code in the Zenodo repository can be used to reproduce the statistical summaries, tests, and visualizations presented here. For instance, we conducted a set of statistical tests to illustrate potential insights that this database could support. The first step was to subset the data to include only three selected crops (castor bean, hemp, and sunflower), three plant components (stem, root, and leaf), five elements (Cd, Cu, Cr, As, and Pb), and two soil types (polluted and spiked). Because the BCF observations were bounded at zero and strongly right-skewed, we performed the following analyses on BCF values transformed using the Box-Cox transformation33, which is defined as \(y' = \frac{y^{\lambda} - 1}{\lambda}\), and we estimated the value of λ as −0.95 using the boxcox function in the R MASS package34. We used analysis of variance (ANOVA) to test the hypothesis that BCF values vary as a function of plant component (stem, leaf, or root), element (Cd, Cu, Cr, As, or Pb), and soil type (polluted vs. spiked; see Table 3). We assessed the normality of residuals using a Shapiro-Wilk test and visual inspection of the quantile-quantile plot.
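The Box-Cox transformation with the reported λ = −0.95 can be sketched as follows. This is a minimal Python illustration and not the authors' R code; the function name `box_cox` and the sample BCF values are hypothetical. The transformation is only defined for strictly positive values, so the sketch rejects zeros.

```python
import math

LAMBDA = -0.95  # value of lambda reported in the text


def box_cox(y: float, lam: float = LAMBDA) -> float:
    """Apply the Box-Cox transformation y' = (y**lam - 1) / lam.

    For lam == 0 the transformation is defined as the natural logarithm.
    """
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


# Hypothetical BCF values for illustration only.
bcf_values = [0.05, 0.5, 1.0, 2.85, 24.45]
transformed = [box_cox(value) for value in bcf_values]
for value, result in zip(bcf_values, transformed):
    print(f"{value:>6} -> {result:.3f}")
```

With a negative λ, the transformed values are bounded above by −1/λ (about 1.05 here), which compresses the long right tail of the BCF distribution while preserving the order of the observations.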

Table 3 Comparison of mean bioconcentration factors (BCFs) of root, stem, and leaf observations in spiked and polluted soils by species and element, restricted to comparisons where the minimum number of observations (n) was ≥10.

Toxic element bioconcentration in naturally polluted and spiked soils

In the roots, stems, and leaves of sunflower and hemp, we observed that when the soil was deliberately spiked with toxic elements such as arsenic, cadmium, chromium, copper, and lead, the BCF values were elevated (p < 0.05) (Table 3). However, when the soil was naturally polluted, the BCF values were typically notably lower. In the sunflower observations, the mean cadmium BCF was roughly three times as large in spiked soils as in polluted soils (5.85 vs. 1.89), while the mean copper BCF was over six-fold higher (2.57 vs. 0.42) and the mean chromium BCF was over 80-fold higher (24.45 vs. 0.29). In the spiked experiments, the chemical was typically dissolved in water and introduced into the soil, rendering the chemical accessible to the plant. This variation in the data was expected because, in naturally polluted soils, the element may be bound in an inaccessible form, such as a silicate or within the confines of a pebble, preventing the plant from accessing the element. Even within polluted and spiked soils, it is likely that the BCF values are also confounded by the heterogeneous methods used across studies. The database includes a number of covariates related to experimental design, and users are cautioned to account for such heterogeneity when interpreting results and to consult the original publications when encountering discrepancies.
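The fold differences quoted above for sunflower follow directly from the reported mean BCF values. The short Python sketch below reproduces that arithmetic; the variable names are illustrative, and the numbers are the means stated in the text.

```python
# Mean sunflower BCF values from the text: (spiked soil, polluted soil).
mean_bcf = {
    "Cd": (5.85, 1.89),
    "Cu": (2.57, 0.42),
    "Cr": (24.45, 0.29),
}

# Fold difference = mean BCF in spiked soil / mean BCF in polluted soil.
fold_difference = {
    element: spiked / polluted
    for element, (spiked, polluted) in mean_bcf.items()
}

for element, ratio in fold_difference.items():
    print(f"{element}: {ratio:.1f}-fold higher in spiked soil")
# Cd: 3.1-fold higher in spiked soil
# Cu: 6.1-fold higher in spiked soil
# Cr: 84.3-fold higher in spiked soil
```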

Despite the Box-Cox transformation, the residuals of the ANOVA model were not normally distributed. The Shapiro-Wilk test was significant and the Q-Q plot showed deviations from normality in the tails of the distribution. However, due to the large sample size (2,803 of the 6,679 observations were used in the statistical analysis) and the fact that the results were significant and consistent under log10 and rank transformations, we assume that the impact of these deviations is minimal.