Abstract
This dataset was compiled between August 1, 2022, and March 15, 2023, through a comprehensive literature review of 587 studies on the uptake of elements from the soil by plants (i.e., phytoremediation). As a proof of concept, we compiled research results on four commodity crops suitable for phytoremediation in semi-arid environments, namely sunflower, hemp, castor bean, and bamboo. Two hundred thirty-eight studies had data on soil types, elemental pollution, and plant components for calculating bioconcentration factors. Using a harmonized set of variables, we extracted data from these studies to create a database to organize results for interpretation and enable consistent and further literature analysis. This approach can help industry experts and environmental researchers select crops for their intended extraction applications, as well as provide insights into the bioaccumulation of toxic elements in plants.
Background & Summary
Soil contamination by toxic elements is a widespread phenomenon from the urban parks in Beijing1 to the Donetsk region of Ukraine2 to the Deûle river in northern France3. Produce crops worldwide have been found to contain toxic elements absorbed from contaminated soil or water used for irrigation4. Cleaning up these contaminated lands can be cost-prohibitive. To overcome the cost concerns, a possible solution is phytoremediation – using plants to remediate pollution5,6. In this process, the plants will extract the toxic element from the soil and sequester it in the plant tissue (i.e., bioconcentration). Given the changing climate, there is a growing interest in using non-edible commodity crops capable of producing significant biomass to remediate pollutants in semi-arid environments7.
Certain plants have been extensively studied for different applications to remove pollutants. Sunflower (Helianthus annuus L.) has been used to clean up toxic elements such as copper8, cadmium9, and radioactive cesium and strontium at the Chernobyl reactor site10. Castor bean (Ricinus communis L.), a poisonous plant in the family Euphorbiaceae, has been used to remediate lead11 and chromium12, and has also been assessed as a value-added crop for biofuel production13. Bamboo (Bambusa vulgaris L., Phyllostachys edulis L., and P. praecox C.D.Chu & C.S.Chao) also has numerous applications for phytoremediation, from removing toxic elements14 to reducing the excessive nutrient content of pig slurry15. Hemp (Cannabis sativa L.) has commonly been used outside of the U.S. to clean up soils contaminated with toxic elements16 and has often been considered for projects that combine phytoremediation19 with value-added uses of the plant, such as bioenergy17 or textiles18. With the passage of the 2018 Farm Bill20 and the legalization of hemp cultivation in the U.S., hemp has emerged as a crop of interest not only for commercial uses but also for phytoattenuation (i.e., combining remediation with a value-added crop application such as fiber or seed oil for bioenergy17,18).
Despite the wealth of knowledge and interest in phytoremediation (with over 57,000 returns in PubMed), there is a lack of standardized methods to compare the phytoremediation potential of a variety of crops. Inconsistency in experimental design and data reporting makes it harder to compare phytoremediation efficiency indicators across plants, an issue inherent to data integration and meta-analysis21. Here, we develop a harmonized set of variables to extract knowledge from peer-reviewed literature on hemp, sunflower, castor bean, and bamboo. Building on previous efforts22,23, this study aimed to provide proof of concept for a standardized comparison across different plant species and components for toxic element remediation using bioconcentration factors (BCFs), the ratio of an element’s concentration in an organism to its concentration in the surrounding environment. This approach can facilitate the identification of the most suitable crop option based on project goals and site conditions, thereby optimizing phytoremediation to improve environmental quality.
Summary of the data
In this study, we collected 587 citations for phytoremediation articles on four plants: sunflower, hemp, castor bean, and bamboo. We found relevant data in 238 articles, with the majority of articles including sunflower data (126 articles), followed by castor bean (63 articles), hemp (46 articles), and bamboo (13 articles). Nine articles contained data for more than one of the four plants. A total of 6,679 observations (defined as unique calculations of BCFs; see the METHODS section) were collected, consisting of 3,732 sunflower observations, 1,263 castor bean observations, 1,483 hemp observations, and 201 bamboo observations. 2,408 observations specified plant varieties within each of the four plants. The sunflower studies included a total of 45 different varieties. The hemp studies included 43 different varieties and the castor bean studies had a total of five different varieties. Bamboo studies used three different species (Bambusa vulgaris, Phyllostachys edulis, and P. praecox), and one of the species, B. vulgaris, had three different varieties mentioned in the dataset studies. For all four plants, no single variety was used in more than five studies.
The dataset contains data on the experimental conditions for each study. Soil types were classified as: naturally found (natural); naturally found and cleaned; naturally polluted; commercially available; artificially spiked; and amended with enhancements (Table 1). The enhancements, which were described in the “treatment” variable, included interventions used to increase the ability of the plant to acquire the elements from the soil, such as the addition of chelators. Most of the research studies were conducted in soil polluted with toxic metals (60%). For a plant to acquire an element from the soil, the element must be bioavailable, which means it must be in a form that is soluble and can diffuse into the plant’s roots. However, most of the studies we analyzed did not determine the fraction of each element that was bioavailable in polluted soils. In these cases, the studies only reported the total amount of the element present in the soil. The percentage of studies using polluted soils ranged from 54% for castor bean to 85% for bamboo. We compared the BCF values of five elements in polluted soils and spiked soils (where the pollutants were introduced in a bioavailable form) in the USAGE NOTES section.
Data availability for each plant component and element
A total of 11 plant components and 27 elements were reported in the analyzed studies (Figs. 1, 2). Stem had the most observations (2,273), followed by root (1,851), leaf (911), and whole plant (831). Observations were available for all 11 plant components of sunflower. For elements, cadmium had the most observations (1,429), followed by lead (1,224), zinc (1,048), and copper (831). The first-row transition metals (i.e., from chromium to zinc) were also heavily studied. The BCF values for most elements were less than 1, indicating that the soil almost always had higher concentrations than the plant (Fig. 3). Selenium exhibited the highest BCF values, with a median value of 2.85. Following selenium, the order of median BCF values was potassium (2.35), thallium (1.02), molybdenum (1.00), and manganese (0.915). It should be noted that some of the elements, such as potassium, iron, and zinc, are essential plant nutrients. Sunflower had the most observations for most of the elements reported.
Fig. 1. Number of bioconcentration factor (BCF) observations for bamboo, castor bean, hemp, and sunflower, organized by plant species and further categorized by specific plant components (e.g., roots, stems, leaves, and seeds). Sunflower has the highest number of BCF observations (3,732) in 238 analyzed studies, followed by hemp (1,483), castor bean (1,263), and bamboo (201). Among the plant components, stem had the most observations (2,273), followed by root (1,851), leaf (911), and whole plant (831).
Fig. 2. Number of bioconcentration factor observations for bamboo, castor bean, hemp, and sunflower, sorted by elements. A total of 27 elements were identified in 238 analyzed bioconcentration studies in sunflower, hemp, castor bean, and bamboo. Cadmium has the highest number of observations (1,429), followed by lead (1,224) and zinc (1,048). The highest number of cadmium BCF observations was in sunflower, followed by castor bean, hemp, and bamboo.
Fig. 3. The distribution of bioconcentration factor (BCF) observations by 27 elements, ranked by the median BCF of each element. A total of 6,679 BCF values obtained in 238 analyzed studies are shown on a log10 scale. The elements are ranked in descending order of median BCF, and the boxes indicate the interquartile ranges. Selenium has the highest median BCF value (2.85) and thorium has the lowest (0.005).
Methods
Literature search and data collection
We searched for all peer-reviewed studies that used any of the four commodity crops suitable for remediating elemental pollution in semi-arid environments (Fig. 4). The literature search was conducted using Scopus, Google Scholar, and the Arizona State University Library platforms between August 1, 2022, and March 15, 2023. We employed a predefined set of search terms, detailed in Table 2, to retrieve citations for pertinent research articles. We then accessed all the resulting articles and manually excluded articles that we could not use for our dataset. The exclusion criteria were (i) not having relevant data, such as studies conducted in aquatic environments; (ii) not containing sufficient data to calculate BCFs; or (iii) not being available in English.
Fig. 4. Flowchart of data extraction from literature. Four commodity crops suitable for phytoremediation in semi-arid environments were included in this analysis. Relevant peer-reviewed studies were collected through Google Scholar, ASU Library Search, and Scopus. Papers meeting any of the following exclusion criteria (lack of data access, non-English language, or absence of relevant data, i.e., papers without obtainable BCF values) were excluded from the analysis. Data were extracted for 22 harmonized variables from the selected studies. Among these, some studies directly reported BCF values, while others required BCF values to be calculated based on provided information.
All the remaining articles containing accessible and relevant data were manually data-mined for the information listed in Table 1. These harmonized variables ensured that we systematically gathered relevant and useful information. In cases where the article itself did not report the BCF value, we calculated individual BCF values by dividing the element’s concentration in the plant matter by the reported concentration in the soil.
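The BCF calculation described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' code (their analysis was done in R). The function name `bioconcentration_factor` and the example concentrations are hypothetical, and both concentrations are assumed to be expressed in the same units (for example, mg/kg dry weight).

```python
def bioconcentration_factor(plant_conc: float, soil_conc: float) -> float:
    """Return BCF = concentration in plant tissue / concentration in soil.

    Both values must use the same units (assumed here: mg/kg dry weight).
    """
    if plant_conc < 0:
        raise ValueError("plant concentration cannot be negative")
    if soil_conc <= 0:
        raise ValueError("soil concentration must be positive")
    return plant_conc / soil_conc


# Hypothetical example: 12 mg/kg Cd in roots, 30 mg/kg Cd in soil.
bcf = bioconcentration_factor(12.0, 30.0)
print(f"BCF = {bcf:.2f}")  # prints "BCF = 0.40"
```

A value below 1, as in this example, means the element is more concentrated in the soil than in the plant tissue.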
Database software and visualization
All information was collected and organized in Microsoft Excel (Microsoft Corporation, Redmond, WA, USA). Each entry in the database represents a single bioconcentration factor calculation (i.e., summary statistic). Data were exported from the spreadsheet and stored in a single CSV file. Data were analyzed using the R programming language version 4.3.2 (R Core Team, Vienna, Austria) with the tidyverse package version 2.0.0 for data manipulation, including ggplot2 for data visualization24.
Data Records
The datasets and code associated with this manuscript are archived on Zenodo (Ha et al.25) and are accessible at https://doi.org/10.5281/zenodo.13363473. The main dataset can be found under the file name “phytoremediation_database.csv”. Table 1 defines the fields found in the dataset file, which includes up to 22 different types of information that were collected, if available, for each study.
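As a rough illustration of how a CSV file of this kind might be read and filtered, the Python sketch below parses a small in-memory stand-in and keeps only rows matching chosen categories. The column names (`plant`, `component`, `element`, `soil_type`, `bcf`) and all values are hypothetical placeholders, not the real field names or data; the actual fields are those defined in Table 1 and the Zenodo archive.

```python
import csv
import io

# Toy stand-in for phytoremediation_database.csv. The column names and
# values below are hypothetical; consult Table 1 for the real field names.
SAMPLE_CSV = """plant,component,element,soil_type,bcf
sunflower,root,Cd,spiked,1.20
sunflower,leaf,Cd,polluted,0.40
hemp,stem,Pb,polluted,0.05
bamboo,root,Zn,natural,0.30
"""


def load_rows(handle):
    """Read rows from an open CSV file and convert the BCF column to float."""
    rows = []
    for row in csv.DictReader(handle):
        row["bcf"] = float(row["bcf"])
        rows.append(row)
    return rows


def subset(rows, **allowed):
    """Keep rows whose value in each named column is in the allowed set."""
    return [
        row for row in rows
        if all(row[col] in values for col, values in allowed.items())
    ]


rows = load_rows(io.StringIO(SAMPLE_CSV))
selected = subset(
    rows,
    plant={"castor bean", "hemp", "sunflower"},
    component={"stem", "root", "leaf"},
    element={"Cd", "Cu", "Cr", "As", "Pb"},
    soil_type={"polluted", "spiked"},
)
print(len(selected), "of", len(rows), "rows kept")  # prints "3 of 4 rows kept"
```

To work with the downloaded file instead, the `io.StringIO(SAMPLE_CSV)` argument could be replaced by a handle from `open("phytoremediation_database.csv", newline="", encoding="utf-8")`, with the column names adjusted to match the real fields.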
Technical Validation
All data were manually extracted from the peer-reviewed articles using standardized criteria according to Table 1. Where the terminology used was ambiguous, the entry for each harmonized variable was discussed and entered into the dataset only after a consensus was reached. Data included in the dataset were then manually double-checked for accuracy and inspected to ensure that all entries for each variable were consistent with the other variables at each data point. Post-processing code was used to ensure that the citations listed corresponded to those used in the dataset (Ha et al.25) and that the data entered were of the correct type for all variables.
Dataset consistency with previous studies
The high number of phytoremediation studies collected on sunflower and hemp in the dataset is consistent with the fact that those crops are well-established options for conventional phytoremediation. Both plants are well represented, with ample data available, in other studies26,27,28. Castor bean and bamboo are less commonly used options for phytoremediation29.
Comparing bioconcentration factor between plant species and components
Besides the uptake of an element, the tissue to which the plant moves the element (i.e., translocation) is an important part of how the plant removes the element from the soil. Differences in elemental levels between plant components are reported in Fig. 5. On average, the reported BCF values were higher in the roots than in the leaves and stems (p < 0.05; Fig. 5 and data repository25). These results suggest that movement from the roots to the stems and leaves is a significant impediment to concentrating the element in the above-ground biomass. Sunflower was the only plant whose median BCF for leaf samples was as high as its median BCF for roots, making it the best suited of the four plants for phytoextraction30,31,32. This helps explain the popularity of sunflower in bioremediation experiments.
Usage Notes
The code in the Zenodo repository can be used to reproduce the statistical summaries, tests, and visualizations presented here. For instance, we conducted a set of statistical tests to illustrate potential insights that this database could support. The first step was to subset the data to include only three selected crops (castor bean, hemp, and sunflower), three plant components (stem, root, and leaf), five elements (Cd, Cu, Cr, As, and Pb), and two soil types (polluted and spiked). Because the BCF observations were bounded at zero and strongly right-skewed, we performed the following analyses on BCF values transformed using the Box-Cox transformation33, which is defined as \(y^{\prime} = \frac{y^{\lambda} - 1}{\lambda}\), and we estimated the value of λ as −0.95 using the boxcox function in the R MASS package34. We used analysis of variance (ANOVA) to test the hypothesis that BCF values vary as a function of plant component (stem, leaf, or root), element (Cd, Cu, Cr, As, or Pb), and soil type (polluted vs. spiked; see Table 3). We assessed the normality of residuals using a Shapiro-Wilk test and visual inspection of the quantile-quantile plot.
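To make the transformation concrete, the following Python sketch applies the Box-Cox formula with the reported λ of −0.95 to a few values. It is an illustrative re-implementation, not the R code from the repository. The function name `box_cox` and the input values are hypothetical, and the sketch assumes strictly positive inputs, since the text does not state how any zero BCF values were handled.

```python
import math


def box_cox(y: float, lam: float) -> float:
    """Box-Cox transform of a strictly positive value y for a given lambda."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)  # limiting case as lambda approaches 0
    return (y ** lam - 1.0) / lam


LAMBDA = -0.95  # value reported in the text

# Hypothetical BCF values spanning several orders of magnitude.
for bcf in (0.01, 0.1, 1.0, 10.0):
    print(bcf, round(box_cox(bcf, LAMBDA), 3))
```

With a negative λ, the transform stays monotonically increasing but compresses large values toward the upper limit −1/λ (about 1.05 here), which is how it pulls in the long right tail of the BCF distribution.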
Toxic element bioconcentration in naturally polluted and spiked soils
In the roots, stems, and leaves of sunflower and hemp, we observed that BCF values were elevated when the soil was deliberately spiked with toxic elements such as arsenic, cadmium, chromium, copper, and lead (p < 0.05) (Table 3). In contrast, when the soil was naturally polluted, the BCF values were typically notably lower. In the sunflower observations, the mean cadmium BCF was roughly three times as large in spiked soils as in polluted soils (5.85 vs. 1.89), while the copper BCF was over six times as large (2.57 vs. 0.42) and the chromium BCF was over 80 times as large (24.45 vs. 0.29). In the spiked experiments, the chemical was typically dissolved in water and introduced into the soil, rendering it accessible to the plant. This variation in the data was expected because, in naturally polluted soils, the element may be bound in an inaccessible form, such as a silicate or within the confines of a pebble, preventing the plant from accessing it. Even within polluted and spiked soils, the BCF values are likely also confounded by the heterogeneous methods used across studies. The database includes a number of covariates related to experimental design, and users are cautioned to account for such heterogeneity when interpreting results and to consult the original publications when encountering discrepancies.
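The fold differences quoted above follow directly from the reported sunflower values, as this short Python check shows. It uses only the three pairs of numbers given in the text; the variable names are illustrative.

```python
# Sunflower BCF values quoted in the text: (spiked soil, polluted soil).
reported_means = {
    "Cd": (5.85, 1.89),
    "Cu": (2.57, 0.42),
    "Cr": (24.45, 0.29),
}

# Ratio of the spiked-soil value to the polluted-soil value per element.
fold_change = {
    element: spiked / polluted
    for element, (spiked, polluted) in reported_means.items()
}

for element, ratio in fold_change.items():
    print(f"{element}: {ratio:.1f} times as large in spiked soil")
```

The ratios come out at about 3.1 for cadmium, 6.1 for copper, and 84.3 for chromium.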
Despite the Box-Cox transformation, the residuals of the ANOVA were not normally distributed: the Shapiro-Wilk test was significant and the Q-Q plot showed deviations from normality in the tails of the distribution. However, given the large sample size (2,803 of the 6,679 observations were used in the statistical analysis) and the fact that the results remained significant and consistent under log10 and rank transformations, we assume the impact of these deviations is minimal.
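The two alternative transformations mentioned above can be sketched in Python as follows. This is a hedged illustration of what log10 and rank transforms do to skewed values, not the authors' R implementation. The function names and sample values are hypothetical, and giving tied values their average rank is an assumed convention.

```python
import math


def log10_transform(values):
    """Return log10 of each strictly positive value."""
    return [math.log10(v) for v in values]


def rank_transform(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1  # mean of 1-based positions in the block
        for position in range(start, end + 1):
            ranks[order[position]] = average
        start = end + 1
    return ranks


# Hypothetical right-skewed BCF values with one tie.
bcf_values = [0.01, 0.1, 0.1, 1.0, 100.0]
print(log10_transform(bcf_values))
print(rank_transform(bcf_values))  # prints [1.0, 2.5, 2.5, 4.0, 5.0]
```

Both transforms preserve the ordering of the observations while reducing the influence of extreme values, which is why they serve as a robustness check on results obtained with the Box-Cox scale.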
Code availability
The code used to process these data is archived on Zenodo (Ha et al.25) and is accessible at https://doi.org/10.5281/zenodo.13363473.
References
1. Chen, T. et al. Assessment of heavy metal pollution in surface soils of urban parks in Beijing, China. Chemosphere (Oxford) 60(4), 542–551, https://doi.org/10.1016/j.chemosphere.2004.12.072 (2005).
2. Sergeeva, A., Zinicovscaia, I., Vergel, K., Yushin, N. & Urošević, M. A. The Effect of Heavy Industry on Air Pollution Studied by Active Moss Biomonitoring in Donetsk Region (Ukraine). Archives of Environmental Contamination and Toxicology 80(3), 546–557, https://doi.org/10.1007/s00244-021-00834-2 (2021).
3. Louriño-Cabana, B. et al. Potential risks of metal toxicity in contaminated sediments of Deûle river in Northern France. Journal of Hazardous Materials 186(2–3), 2129–2137, https://doi.org/10.1016/j.jhazmat.2010.12.124 (2011).
4. Sharma, A. & Nagpal, A. K. Contamination of vegetables with heavy metals across the globe: hampering food security goal. Journal of Food Science and Technology 57(2), 391–403, https://doi.org/10.1007/s13197-019-04053-5 (2020).
5. Padmavathiamma, P. K. & Li, L. Y. Phytoremediation Technology: Hyper-accumulation Metals in Plants. Water, Air, and Soil Pollution 184(1–4), 105–126, https://doi.org/10.1007/s11270-007-9401-5 (2007).
6. Rascio, N. & Navari-Izzo, F. Heavy metal hyperaccumulating plants: How and why do they do it? And what makes them so interesting? Plant Science (Limerick) 180(2), 169–181, https://doi.org/10.1016/j.plantsci.2010.08.016 (2011).
7. Surucu, A., Marif, A., Majid, S., Farooq, S. & Tahir, N. Effect of different water sources and water availability regimes on heavy metal accumulation in two sunflower species. Carpathian Journal of Earth and Environmental Sciences 15, 289–300, https://doi.org/10.26471/cjees/2020/015/129 (2020).
8. Mahardika, G., Rinanti, A. & Fachrul, M. F. Phytoremediation of heavy metal copper (Cu2+) by sunflower (Helianthus annuus l.). IOP Conference Series. Earth and Environmental Science 106(1), 012120, https://doi.org/10.1088/1755-1315/106/1/012120 (2018).
9. Alaboudi, K. A., Ahmed, B. & Brodie, G. Phytoremediation of Pb and Cd contaminated soils by using sunflower (Helianthus annuus) plant. Annals of Agricultural Science 63(1), 123–127, https://doi.org/10.1016/j.aoas.2018.05.007 (2018).
10. Adler, T. Botanical Cleanup Crews. Science News (Washington) 150(3), 42–43, https://doi.org/10.2307/3980349 (1996).
11. Bamagoos, A. A. et al. Alleviating lead-induced phytotoxicity and enhancing the phytoremediation of castor bean. International Journal of Phytoremediation 24(9) (2021).
12. Ali, S. et al. Microbe-citric acid assisted phytoremediation of chromium by castor bean (Ricinus communis L.). Chemosphere (Oxford) 296, 134065, https://doi.org/10.1016/j.chemosphere.2022.134065 (2022).
13. Olivares, A. R., Carrillo-González, R., González-Chávez, M. D. C. A. & Hernández, R. M. S. Potential of castor bean (Ricinus communis L.) for phytoremediation of mine tailings and oil production. Journal of Environmental Management 114, 316–323 (2013).
14. Bian, F., Zhong, Z., Zhang, X., Yang, C. & Gai, X. Bamboo – An untapped plant resource for the phytoremediation of heavy metal contaminated soils. Chemosphere (Oxford) 246, 125750, https://doi.org/10.1016/j.chemosphere.2019.125750 (2020).
15. Piouceau, J. et al. Bamboo plantations for phytoremediation of pig slurry: Plant response and nutrient uptake. Plants (Basel) 9(4), 522, https://doi.org/10.3390/plants9040522 (2020).
16. Placido, D. F. & Lee, C. C. Potential of Industrial Hemp for Phytoremediation of Heavy Metals. Plants (Basel) 11(5), 595, https://doi.org/10.3390/plants11050595 (2022).
17. Todde, G., Carboni, G., Marras, S., Caria, M. & Sirca, C. Industrial hemp (Cannabis sativa L.) for phytoremediation: Energy and environmental life cycle assessment of using contaminated biomass as an energy resource. Sustainable Energy Technologies and Assessments 52, 102081, https://doi.org/10.1016/j.seta.2022.102081 (2022).
18. De Vos, B., Souza, M. F., Michels, E. & Meers, E. Industrial hemp (Cannabis sativa L.) in a phytoattenuation strategy: Remediation potential of a Cd, Pb and Zn contaminated soil and valorization potential of the fibers for textile production. Industrial Crops and Products 178, 114592, https://doi.org/10.1016/j.indcrop.2022.114592 (2022).
19. Guo, Y., Wen, L., Zhao, X., Xing, C. & Huang, R. Industrial hemp (Cannabis sativa L.) can utilize and remediate soil strongly contaminated with Cu, As, Cd, and Pb by Phytoattenuation. Chemosphere, 142199 (2024).
20. U.S. Government. Agriculture Improvement Act of 2018. Washington: U.S. Government Publishing Office; n.d. Public Law 115–334 (2018).
21. Koricheva, J., Gurevitch, J. & Mengersen, K. (Eds.) Handbook of meta-analysis in ecology and evolution. Princeton University Press (2013).
22. Famulari, S. & Witz, K. A user-friendly phytoremediation database: creating the searchable database, the users, and the broader implications. International Journal of Phytoremediation 17(8), 737–744, https://doi.org/10.1080/15226514.2014.987369 (2015).
23. Reeves, R. D. et al. A global database for plants that hyperaccumulate metal and metalloid trace elements. The New Phytologist 218(2), 407–411, https://doi.org/10.1111/nph.14907 (2018).
24. Wickham, H. et al. Welcome to the Tidyverse. Journal of Open Source Software 4(43), 1686 (2019).
25. Ha, H. et al. Remediating Toxic Elements with Sunflower, Hemp, Castor Bean, and Bamboo: An Open Dataset of Harmonized Variables. Zenodo, https://doi.org/10.5281/zenodo.13363473 (2024).
26. Eben, P., Mohri, M., Pauleit, S., Duthweiler, S. & Helmreich, B. Phytoextraction potential of herbaceous plant species and the influence of environmental factors – A meta-analytical approach. Ecological Engineering 199, https://doi.org/10.1016/j.ecoleng.2023.107169 (2024).
27. Mark, T. et al. Economic viability of industrial hemp in the United States: a review of state pilot programs (2020).
28. Rizwan, M. et al. Phytomanagement of heavy metals in contaminated soils using sunflower: a review. Critical Reviews in Environmental Science and Technology 46(18), 1498–1528 (2016).
29. Liang, Z., Kovács, G. P., Gyuricza, C. & Neményi, A. Potential use of bamboo in the phytoremediation in of heavy metals: A review. Acta Agraria Debreceniensis (1), 91–97 (2022).
30. Niu, Z., Li, X. & Mahamood, M. Accumulation potential cadmium and Lead by sunflower (Helianthus annuus L.) under citric and glutaric acid-assisted phytoextraction. International Journal of Environmental Research and Public Health 20(5), 4107 (2023).
31. Shah, N. et al. EDTA and IAA ameliorates phytoextraction potential and growth of sunflower by mitigating Cu-induced morphological and biochemical injuries. Life 13(3), 759 (2023).
32. Zhao, X., Joo, J. C., Du, D., Li, G. & Kim, J. Y. Modelling heavy-metal phytoextraction capacities of Helianthus annuus L. and Brassica napus L. Chemosphere 337, 139341 (2023).
33. Box, G. E. P. & Cox, D. R. An analysis of transformations (with discussion). Journal of the Royal Statistical Society B 26, 211–252 (1964).
34. Venables, W. N. & Ripley, B. D. Modern Applied Statistics with S. Fourth Edition. Springer, New York. ISBN 0-387-95457-0 (2002).
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Ha, H., Sweat, K.G., Conrow, K.D. et al. Remediating toxic elements with sunflower, hemp, castor bean, & bamboo: an open dataset of harmonized variables.
Sci Data 12, 905 (2025). https://doi.org/10.1038/s41597-025-05239-7
Received: 20 November 2024
Accepted: 20 May 2025
Published: 29 May 2025
DOI: https://doi.org/10.1038/s41597-025-05239-7