The HAND Database: a gateway to understanding the role of HIV in HIV-associated neurocognitive disorders

Background Despite an augmented research effort and scale-up of highly active antiretroviral therapy, a high prevalence of HIV-1-associated neurocognitive disorders (HAND) persists in the HIV-infected population. Nearly 50 % of all HIV-1-infected individuals suffer from a neurocognitive disorder due to neural and synaptodendritic damage. However, challenges in HAND research, including limited availability of brain tissue from HIV patients, variation in HAND study protocols, and virus genotyping inconsistency and errors, have resulted in studies with insufficient power to delineate the molecular mechanisms underlying HAND pathogenesis. There is therefore a great need for a reliable and centralized resource specific to HAND research, particularly for epidemiological study and surveillance in resource-limited countries where severe forms of HAND persist.
Description To address this need, we present the HAND Database, a resource containing well-curated and up-to-date HAND virus information and associated clinical and epidemiological data. This database provides information on 5,783 non-redundant HIV-1 sequences from global HAND research published to date, representing a total of 163 unique individuals that have been assessed for HAND. A user-friendly interface allows for flexible searching, filtering, browsing, and downloading of data. The most comprehensive database of its kind, the HAND Database not only bolsters current HAND research by increasing sampling power and reducing study biases caused by protocol variation and genotyping inconsistency, but also allows for comparison between HAND studies across different dimensions. Development of the HAND Database has also revealed significant knowledge gaps in HIV-driven neuropathology. These gaps include inadequate sequencing of viral genes beyond env, lack of HAND viral data from HIV epidemiologically important regions including Asian and Sub-Saharan African countries, and biased sampling toward the male gender, all factors that impede efforts toward providing an improved quality of life to HIV-infected individuals, and toward elimination of viruses in the brain.
Conclusion Our aim with the HAND Database is to provide researchers in both the HIV and neuroscience fields with a comprehensive and rigorous data source for better understanding virus compartmentalization and for designing improved strategies against HAND viruses. We also expect this resource, which will be updated on a regular basis, to be useful as a reliable reference for further HAND epidemiology studies. The HAND Database is freely available and accessible online at http://www.handdatabase.org.


Background
Human immunodeficiency virus (HIV)-associated neurocognitive disorder (HAND) occurs due to damage to neurons and synapses by viral protein products, and due to a chemokine/cytokine imbalance in the brain, a proinflammatory response to HIV infection of macrophages and microglia [1][2][3]. HIV entry into the brain is an early event following infection [4], and the blood-brain barrier greatly limits entry of antiretroviral therapy into the brain. Our ability to control viral levels within, and viral damage to, the HIV-infected brain therefore remains highly limited. While the introduction of highly active antiretroviral therapy (HAART) brought about a decrease in the incidence of the most severe form of HAND, i.e., HIV-associated dementia, the prevalence of milder forms has continued to increase [5][6][7]. In the recent HIV Anti-Retroviral Therapy Effects Research Study, nearly 50 % of all HIV-1-infected individuals exhibited some form of HAND, including deficits in motor function, verbal fluency, learning, memory, and attention [8]. Individuals with HAND experience difficulty performing day-to-day tasks, are less likely to adhere to medical treatments and other HIV-1 prevention practices, and ultimately suffer from around a threefold increased risk of death as compared to HIV-1-infected individuals without neurocognitive impairment [9]. In addition, in resource-limited countries, the most severe forms of HAND continue to devastate the mental health of HIV-infected individuals [9].
Delineating the molecular mechanisms underpinning HAND development is critical to providing HIV-infected individuals an elevated quality of life, as well as to clearance of the virus repertoire in the brain. Research in this area, however, has been largely limited by availability of samples from both the brain and from HAND-assessed individuals. In addition, a need to understand HAND progression across an HIV-infected individual's lifespan, coupled with difficulty in obtaining brain samples, has made cerebrospinal fluid (CSF) sampling a surrogate endpoint for assessing HAND development [10]. Both the small sample sizes of individual studies and indirect inference from CSF have made it difficult to fully assess the complex interaction between viruses and the brain in the HAND setting. Additionally, variations in study methodologies and result interpretations have further confounded HAND studies, leading to conflicting findings in the field. To address these issues, there is a great need for a reliable HIV sequence resource, of adequate sample size, for HAND research.
Toward this effort, we developed a centralized HAND Database based on all HAND studies published to date. This resource database is freely accessible at: http://www.handdatabase.org. The HAND Database serves as the most comprehensive database in its field, and contains well-curated HAND virus information, epidemiology sampling data, patient clinical status, and therapy treatment information. All information was cross-validated using multiple resources, including the literature, GenBank entries, and author contact. Furthermore, all viral sequences have undergone stringent quality control examination, including genotyping validation, in order to minimize genotyping errors frequently seen in HIV subtype-based studies [11].
The only other published HIV database related to brain tissue, The HIV Brain Sequence Database [12], contains HIV env sequences from brain tissue, as well as from other tissues in patients with brain samples. In contrast, our database contains HAND-specific information with regards to virus sequences (genome coverage beyond env), epidemiology sampling information, clinical data, and treatment status, all factors important to the study of HAND pathogenesis. Unprecedented in its comprehensiveness of curated HAND HIV information, our HAND Database serves as a centralized gateway to study the role of HIV in the HAND setting.

Data sources
An extensive literature review was conducted to develop a comprehensive set of HAND-related research articles, from which we then extracted sequence data from HAND-assessed individuals. This literature search resulted in the use of data from 41 published studies. Publicly available HIV-1 sequence data were collected from GenBank (last accessed 3/2013) and the LANL HIV sequence database (last accessed 2/2014) [13,14]. Sampling and clinical information for HIV-1-infected individuals was collected from the relevant literature, the two aforementioned databases, and through communication with publication authors.

Sequence and clinical data filtering
All collected sequence data were validated through a series of quality control steps. We first employed the LANL quality control pipeline to check for potential problematic viruses with sequencing errors [13]. Amplification contamination was detected using BLASTn (v. 2.2.26) [15]. In addition, data regarding epidemiology sampling, clinical status, and treatment status were cross-referenced whenever available in more than one of the resources listed above.
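The cross-referencing step described above can be illustrated with a small sketch. The paper does not describe how agreement between sources was recorded, so the function name, the source labels, and the status strings below are all hypothetical; the sketch only shows one plausible way to flag an annotation as confirmed, single-source, conflicting, or missing.

```python
from typing import Dict, Optional, Tuple


def cross_reference(values_by_source: Dict[str, Optional[str]]) -> Tuple[Optional[str], str]:
    """Compare one annotation as reported by several sources.

    Hypothetical helper: returns (agreed value or None, status label).
    """
    # Ignore sources that do not report the annotation at all.
    reported = {src: val for src, val in values_by_source.items() if val is not None}
    distinct = set(reported.values())
    if not distinct:
        return None, "missing"
    if len(distinct) > 1:
        # Sources disagree; a curator would need to resolve this by hand.
        return None, "conflict"
    status = "confirmed" if len(reported) > 1 else "single-source"
    return next(iter(distinct)), status


# Illustrative values only, not taken from the database.
tissue = cross_reference({"literature": "brain", "genbank": "brain", "lanl": None})
year = cross_reference({"literature": "1998", "genbank": "1999", "lanl": None})
print(tissue)  # ('brain', 'confirmed')
print(year)    # (None, 'conflict')
```

Returning a status alongside the value keeps disagreements visible instead of silently preferring one source.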

Genotyping analysis
Genotyping of HIV sequence data is frequently inconsistent and error-prone [11]. Therefore, all filtered HIV sequences were re-genotyped. Here we applied the jumping profile Hidden Markov Model genotyping program (jpHMM), whose genotyping accuracy has been established [16][17][18]. In brief, following a hypermutation analysis [13], sequences greater than 300 nucleotides in length and with a hypermutation p-value of 0.05 or greater were subject to genotyping.
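The eligibility rule for genotyping stated above (length greater than 300 nucleotides and hypermutation p-value of 0.05 or greater) can be written as a simple filter. This is a minimal sketch: the record class, field names, and accession strings are hypothetical, and the hypermutation p-value is assumed to have been computed beforehand by a separate tool.

```python
from dataclasses import dataclass

MIN_LENGTH_NT = 300     # sequence must be strictly longer than this (from the text)
MIN_HYPERMUT_P = 0.05   # hypermutation p-value must be at least this (from the text)


@dataclass
class SequenceRecord:
    accession: str      # hypothetical field names
    sequence: str
    hypermut_p: float   # assumed to be precomputed by a hypermutation analysis


def eligible_for_genotyping(rec: SequenceRecord) -> bool:
    """Apply the two thresholds described in the text."""
    return len(rec.sequence) > MIN_LENGTH_NT and rec.hypermut_p >= MIN_HYPERMUT_P


# Made-up records chosen to sit on either side of each threshold.
records = [
    SequenceRecord("EX0001", "A" * 301, 0.80),
    SequenceRecord("EX0002", "A" * 300, 0.80),  # exactly 300 nt: too short
    SequenceRecord("EX0003", "A" * 500, 0.01),  # p < 0.05: excluded
    SequenceRecord("EX0004", "A" * 500, 0.05),  # p == 0.05: retained
]
eligible = [r.accession for r in records if eligible_for_genotyping(r)]
print(eligible)  # ['EX0001', 'EX0004']
```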

Database schema
The HAND Database was constructed using the relational database management system MySQL (v.5.6.17). MySQL was chosen for its ease of use, its high reliability, and its free availability. HIV-1 sequence and clinical data were compiled into one flat file, with annotations divided into three major categories: sequence and sequence descriptor data, HIV-1 patient descriptor data, and sample descriptor data (Table 1). Sequence data included the HIV-1 nucleotide sequence, sequence accession number, sequence genotype information, and sequence length. Sample (epidemiology) data included the geographical location and year at time of sampling, as well as tissue sampled. Patient data at time of sampling included patient age, risk factor, health status, CD4 count, viral load, HIV treatment information (treatment status, and when applicable, treatment type and duration), and patient HAND information (HAND status, the presence or absence of HAND, and when applicable, HAND type).
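To make the three annotation categories concrete, the sketch below lays them out as three linked tables. The actual table definitions are not given in the text, so every table and column name here is hypothetical, the split into three tables is an assumption (the source describes a single flat file), and Python's built-in sqlite3 module stands in for MySQL so that the example runs without a server.

```python
import sqlite3

# Hypothetical schema: one table per annotation category named in the text.
SCHEMA = """
CREATE TABLE patient (
    patient_id       TEXT PRIMARY KEY,
    age              INTEGER,
    risk_factor      TEXT,
    health_status    TEXT,
    cd4_count        INTEGER,
    viral_load       REAL,
    treatment_status TEXT,
    treatment_type   TEXT,
    hand_status      TEXT,
    hand_type        TEXT
);
CREATE TABLE sample (
    sample_id     INTEGER PRIMARY KEY,
    patient_id    TEXT NOT NULL REFERENCES patient(patient_id),
    location      TEXT,
    sampling_year INTEGER,
    tissue        TEXT
);
CREATE TABLE viral_sequence (
    accession   TEXT PRIMARY KEY,
    sample_id   INTEGER NOT NULL REFERENCES sample(sample_id),
    genotype    TEXT,
    length_nt   INTEGER,
    nucleotides TEXT
);
"""

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL server
conn.executescript(SCHEMA)

# One made-up record per table, for illustration only.
conn.execute(
    "INSERT INTO patient (patient_id, hand_status, hand_type) VALUES (?, ?, ?)",
    ("P1", "HAND", "HAD"),
)
conn.execute(
    "INSERT INTO sample (sample_id, patient_id, location, sampling_year, tissue) "
    "VALUES (?, ?, ?, ?, ?)",
    (1, "P1", "Exampleland", 1999, "brain"),
)
conn.execute(
    "INSERT INTO viral_sequence VALUES (?, ?, ?, ?, ?)",
    ("EX0001", 1, "B", 6, "ATGCAT"),
)

# Join the three categories back into a single flat view of each sequence.
rows = conn.execute(
    """
    SELECT v.accession, s.tissue, p.hand_status
    FROM viral_sequence AS v
    JOIN sample AS s ON s.sample_id = v.sample_id
    JOIN patient AS p ON p.patient_id = s.patient_id
    """
).fetchall()
print(rows)  # [('EX0001', 'brain', 'HAND')]
```

Separating patients, samples, and sequences avoids repeating patient annotations for every sequence derived from the same individual, while the final join reproduces the flat layout described in the text.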

Database access and web query interface
The HAND Database was developed into a publicly available, web-accessible resource. The database website provides a home page with background information on HAND, as well as a help page to assist with database navigation (Fig. 1). The database itself allows for easy querying and downloading of user-defined data subsets. Researchers can perform a simple search using a keyword, or employ multiple column filters for a custom-made data subset. Selected entries can subsequently be downloaded into a variety of formats at the user's discretion. Additional features include sorting by annotation of interest, as well as an option for viewing the complete record for any given entry.
The sequence genotyping annotation provides genotyping and recombination information in the following format: Subtype as reported in the original source material (Subtype as reported by jpHMM genotyping) and recombination information. Additional symbols used in this annotation include: "#" = not tested due to insufficient length of sequence, "*" = p < 0.05 for recombination test, "(No)" = p > 0.05 for recombination test.
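The keyword search, column filtering, and download features described in this section can be mimicked offline on a small set of records. The sketch below is not the website's implementation; the field names, example entries, and the choice of CSV as the export format are assumptions made for illustration.

```python
import csv
import io

# Made-up entries with hypothetical column names.
entries = [
    {"accession": "EX0001", "tissue": "brain", "hand_status": "HAND", "gene": "env"},
    {"accession": "EX0002", "tissue": "CSF", "hand_status": "non-HAND", "gene": "env"},
    {"accession": "EX0003", "tissue": "brain", "hand_status": "HAND", "gene": "nef"},
]


def keyword_search(rows, keyword):
    """Keep rows in which any field contains the keyword (case-insensitive)."""
    kw = keyword.lower()
    return [r for r in rows if any(kw in str(v).lower() for v in r.values())]


def column_filter(rows, **criteria):
    """Keep rows whose named columns exactly match all given values."""
    return [r for r in rows if all(r.get(col) == val for col, val in criteria.items())]


def to_csv(rows):
    """Serialize the selected rows as CSV text, header first."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Keyword search first, then narrow the result with a column filter.
subset = column_filter(keyword_search(entries, "brain"), gene="env")
print([r["accession"] for r in subset])  # ['EX0001']
print(to_csv(subset))
```

Note that a substring keyword such as "hand" would also match "non-HAND", which is one reason exact column filters are useful alongside free-text search.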

Database content
The HAND Database currently contains 5,783 HIV-1 sequences, representing a total of 163 unique individuals assessed for HAND status. For the 87 individuals with age information available, ages ranged from 19 to 63 years, with the largest proportion of individuals between 30 and 49 years of age (69 %) (Fig. 2). Gender information was available for 64 individuals, the majority of whom were male (77 %). HAND status, the absence or presence of HAND, was obtained for almost all database individuals (96 %), and indicated a close split between ... the rest had received one or more forms of HIV monotherapy (54 %) (Fig. 4). Geographical region sampling information was available for 156 patients, with the top three sampling regions being North America (60 %), Europe (25 %), and Asia (6.4 %) (Fig. 5). Samples were derived from 20 different tissue types, with the top three sampling tissues being brain (47 %), lymph node (14 %), and CSF (7 %).
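The counts reported above can be turned into annotation coverage figures with a few lines of arithmetic. The sketch below uses only the totals stated in the text (5,783 sequences, 163 individuals, and 87, 64, and 156 individuals with age, gender, and geographical region information, respectively); the variable names are illustrative, and the mean says nothing about how unevenly sequences are distributed across individuals.

```python
TOTAL_SEQUENCES = 5783
TOTAL_INDIVIDUALS = 163

# Number of individuals for whom each annotation was available (from the text).
individuals_with = {"age": 87, "gender": 64, "geographical region": 156}

# Average number of sequences contributed per individual.
seqs_per_individual = TOTAL_SEQUENCES / TOTAL_INDIVIDUALS

# Percentage of all individuals carrying each annotation.
coverage = {field: 100 * n / TOTAL_INDIVIDUALS for field, n in individuals_with.items()}

print(f"Mean sequences per individual: {seqs_per_individual:.1f}")  # 35.5
for field, pct in coverage.items():
    print(f"{field}: {pct:.1f} % of individuals annotated")
# age: 53.4 %, gender: 39.3 %, geographical region: 95.7 %
```

Read this way, gender is the least complete of the three annotations, which matters when interpreting the reported male majority.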
Five HIV-1 genes were represented in our database, gag, pol, env, tat, and nef, with the majority of sequence coverage in the env gene (Fig. 6). This result was expected due to the known role of env in macrophage tropism, viral replication, and activation of pro-inflammatory responses toward neuronal injury [19][20][21]. Of all archived sequences, 79 % of sequences that underwent genotyping validation were of the pure B subtype, and all non-recombinant sequences were confirmed as having been correctly reported in the literature. Sixteen sequences were found to have undergone recombination events not reported in either the source literature or databases.
Fig. 3 Distribution of HAND Database Entries By HAND Status And HAND Type. The top chart shows HAND status distribution across all database individuals, and the bottom chart shows HAND type distribution across database individuals for whom this information was available. The majority of individuals with HAND had HIV-associated dementia (HAD), followed by HIV-encephalitis (HIVE), AIDS dementia complex (ADC), and minor cognitive-motor disorder (MCMD). HAND type designations were obtained from the literature, and for some individuals, more than one HAND type had been assigned.
Fig. 4 Distribution of HAND Database Entries By HIV Therapy Status And HIV Therapy Type. The top chart shows HIV therapy status distribution across all database individuals, and the bottom chart shows HIV therapy type distribution across database individuals for whom this information was available. Nearly half of all treated individuals had received HAART. Therapy type designations are as reported in the literature, and for some individuals, more than one HIV therapy type had been assigned.

Discussion
Despite increased HAND research and treatment efforts, the persistent prevalence of HAND continues to pose a great challenge to the HIV research and patient communities. Investigation in this area is limited by small sample sizes, primarily due to difficulty in obtaining tissue samples, and by variation in study protocols and result interpretation. Furthermore, errors and inconsistency in HIV genotyping compound the complexity in delineating viral mechanisms toward neuropathology. The HAND Database described here serves to narrow these research gaps and addresses the need for a reliable and centralized HAND data source for advanced research purposes.
The HAND Database contains up-to-date and well-curated HAND virus and patient information. All sequence data have been subject to stringent quality control examination and re-genotyping, thereby laying a solid foundation for elucidating the viral mechanisms that drive neuropathology under various epidemiological settings. In creating this resource we noted a number of sequencing and sampling biases that currently limit research in the area, and have developed a set of potential research directions that may greatly benefit the HAND research community. First, although prior studies have indicated a role for multiple HIV proteins, including Nef, Vpr, and Tat [22][23][24][25][26][27][28][29], in HAND development, the majority of research in the area has focused on the gp120 envelope glycoprotein. This sequencing bias is largely due to interest in Env for its role in conferring viral tropism for microglia and macrophage cells [30][31][32][33], its role in non-neuronal cell replication [34], and its potential as an HIV therapeutic target [35]. A shortage of sequence data beyond the env gene, however, limits our ability to perform data-driven HAND research on the complete viral genome, and therefore an increase in sequencing efforts in other areas of the genome would provide insight into the role of regulatory and accessory proteins in HAND pathogenesis. Second, there is a distinct lack of sequence data from HIV epidemiologically important regions including many Asian and Sub-Saharan African countries (Fig. 5). Limited access to HAART contributes to an increased vulnerability of HIV-infected individuals in these geographical regions to the most severe forms of HAND.
Recent studies indicate HIV-associated dementia (HAD) affects over 25 % of HIV-infected individuals in several Sub-Saharan African countries [36][37][38]. In addition, research on treatment-naïve HIV-1-infected individuals in Thailand has greatly contributed to our understanding of HAND pathogenesis [39]. Finally, we noted a bias toward sequencing of male individuals. Research beyond the HIV field has implicated gender as playing a role in the genetic processes leading to neurocognitive deficiencies [40,41]. A lack of information on females with HAND, however, currently proves an obstacle in determining potential gender differences in HAND pathogenesis.

Conclusions
Developing a better understanding of mechanisms underlying the development of neurocognitive disorders is crucial toward providing the HIV patient community with a higher quality of life, and toward prevention of enhanced transmission. Through consolidation and validation of data from multiple data sources, here we have developed the HAND Database, a single, intuitive platform from which researchers can launch their high-throughput HAND sequencing projects. The HAND Database contains up-to-date and curated HAND virus and HIV-infected individual information, providing a solid foundation toward the elucidation of viral mechanisms driving this neuropathology. In particular, we anticipate this database will be of great use in increasing HAND research efforts in resource-limited countries. We plan to continue expanding the HAND Database as new HAND viral sequence data become publicly available.

Availability and requirements
All records are freely available and accessible at www.handdatabase.org.