Friday 14 March 2008

Cheminformatics, Chemogenomics, Chemometrics

Cheminformatics
.
Cheminformatics (also known as chemoinformatics and chemical informatics) is the use of computer and informational techniques, applied to a range of problems in the field of chemistry. These in silico techniques are used in pharmaceutical companies in the process of drug discovery. These methods can also be used in chemical and allied industries in various other forms.
The term Chemoinformatics was defined by F.K. Brown in 1998:
Chemoinformatics is the mixing of those information resources to transform data into information and information into knowledge for the intended purpose of making better decisions faster in the area of drug lead identification and optimization.
Since then, both spellings have been used. Cheminformatics has become the established form in some communities, while European academia settled on Chemoinformatics in 2006.
Cheminformatics combines the scientific working fields of chemistry and computer science, for example in the area of chemical graph theory and in mining the chemical space. The chemical space is estimated to contain at least 10^60 molecules. Cheminformatics can also be applied to data analysis in various industries, such as paper and pulp, dyes and allied industries.
The primary application of cheminformatics is in the storage of information relating to compounds. The efficient search of such stored information draws on topics that computer science treats under data mining and machine learning.
Related research topics include the in silico representation of chemical structures, which uses specialized formats such as the XML-based Chemical Markup Language, or SMILES. These representations are often used for storage in large chemical databases. While some formats are suited for visual representations in 2 or 3 dimensions, others are more suited for studying physical interactions, modeling and docking studies.
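To make the idea of a line notation concrete, here is a minimal Python sketch that counts the heavy atoms in a simple SMILES string. It is only an illustration: the function name `count_heavy_atoms` is made up for this example, it handles only the unbracketed "organic subset" of SMILES atoms, and it rejects bracket atoms such as `[Na+]`. Real work would normally use a full cheminformatics toolkit rather than a regular expression.

```python
import re
from collections import Counter

# Organic-subset atom symbols. Two-letter symbols come first so that
# "Cl" is not read as a carbon followed by a stray "l".
# Lowercase letters are aromatic atoms (e.g. the carbons of benzene).
ATOM_PATTERN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def count_heavy_atoms(smiles):
    """Count non-hydrogen atoms in a simple SMILES string (hypothetical helper).

    Assumption: no bracket atoms; those need a real SMILES parser.
    """
    if "[" in smiles:
        raise ValueError("bracket atoms are outside the scope of this sketch")
    # capitalize() maps aromatic "c" to "C" and leaves "Cl"/"Br" unchanged.
    return Counter(symbol.capitalize() for symbol in ATOM_PATTERN.findall(smiles))


if __name__ == "__main__":
    for name, smiles in [("ethanol", "CCO"),
                         ("acetic acid", "CC(=O)O"),
                         ("benzene", "c1ccccc1"),
                         ("chloroform", "ClC(Cl)Cl")]:
        print(name, smiles, dict(count_heavy_atoms(smiles)))
```

Bonds, branches and ring closures (`=`, parentheses, digits) are simply skipped here, which is why this only counts atoms and says nothing about structure.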
.
Chemogenomics:
.
Chemogenomics can be defined as the study of the genomic response to chemical compounds. The goal is the rapid identification of novel drugs and drug targets, embracing multiple early-phase drug discovery technologies that range from target identification and validation, through compound design and chemical synthesis, to biological testing and ADME profiling.
.
Chemometrics:
.
Chemometrics is the application of mathematical or statistical methods to chemical data. The International Chemometrics Society (ICS) offers the following definition:
Chemometrics is the science of relating measurements made on a chemical system or process to the state of the system via application of mathematical or statistical methods.
Chemometric research spans a wide area of different methods which can be applied in chemistry. There are techniques for collecting good data (optimization of experimental parameters, design of experiments, calibration, signal processing) and for getting information from these data (statistics, pattern recognition, modeling, structure-property-relationship estimations).
Chemometrics tries to build a bridge between the methods and their application in chemistry.

In spectroscopy, chemometrics is most often applied to calibration. Calibration is achieved by using the spectra as multivariate descriptors to predict the concentrations of constituents of interest, using statistical approaches such as multiple linear regression, principal components analysis and partial least squares. Other popular chemometric techniques include approaches for ab initio prediction of the number of components, noise reduction and multivariate curve resolution.
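As a rough illustration of calibration, the sketch below fits a straight line by ordinary least squares and then inverts it to predict an unknown concentration. This is a deliberately simplified single-wavelength case, not the multivariate methods named above; the function names and the numbers are invented for the example.

```python
def fit_calibration_line(concentrations, responses):
    """Ordinary least squares fit of response = slope * concentration + intercept."""
    n = len(concentrations)
    mean_c = sum(concentrations) / n
    mean_r = sum(responses) / n
    # Sum of squared deviations of the concentrations, and the cross term.
    s_cc = sum((c - mean_c) ** 2 for c in concentrations)
    s_cr = sum((c - mean_c) * (r - mean_r)
               for c, r in zip(concentrations, responses))
    slope = s_cr / s_cc
    intercept = mean_r - slope * mean_c
    return slope, intercept


def predict_concentration(response, slope, intercept):
    """Invert the fitted line to estimate the concentration of an unknown sample."""
    return (response - intercept) / slope


if __name__ == "__main__":
    # Made-up standards: known concentrations and their measured absorbances.
    standards = [0.0, 1.0, 2.0, 3.0, 4.0]
    absorbances = [0.1, 0.6, 1.1, 1.6, 2.1]
    slope, intercept = fit_calibration_line(standards, absorbances)
    print("slope:", slope, "intercept:", intercept)
    print("estimated concentration:", predict_concentration(1.35, slope, intercept))
```

With full spectra, each sample would contribute many absorbance values instead of one, which is where multiple linear regression and partial least squares come in.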

Text Source: Wikipedia (License: GNU)

Biotechnology Part-2

Current Research:
.
In January 2008, Christopher S. Chen made an exciting discovery that could potentially alter the future of medicine. He found that cell signaling that is normally biochemically regulated could be simulated with magnetic nanoparticles attached to a cell surface. The discovery of Donald Ingber, Robert Mannix, and Sanjay Kumar, who found that a nanobead can be attached to a monovalent ligand, and that these compounds can bind to Mast cells without triggering the clustering response, inspired Chen’s research. Usually, when a multivalent ligand attaches to the cell’s receptors, the signal pathway is activated. However, these nanobeads only initiated cell signaling when a magnetic field was applied to the area, thereby causing the nanobeads to cluster. It is important to note that this clustering triggered the cellular response, not merely the force applied to the cell due to the receptor binding. This experiment was carried out several times with time-varying activation cycles. However, there is no reason to suggest that the response time could not be reduced to seconds or even milliseconds. This low response time has exciting applications in the medical field. Currently it takes minutes or hours for a pharmaceutical to affect its environment, and when it does so, the changes are irreversible. With the current research in mind, though, a future of millisecond response times and reversible effects is possible. Imagine being able to treat various allergic responses, colds, and other such ailments almost instantaneously. This future has not yet arrived, however, and further research and testing must be done in this area, but this is an important step in the right direction.
.
Agriculture
.
Improve yield from crops
Using the techniques of modern biotechnology, one or two genes may be transferred to a highly developed crop variety to impart a new character that would increase its yield. However, while increasing crop yield is the most obvious application of modern biotechnology in agriculture, it is also the most difficult one. Current genetic engineering techniques work best for effects that are controlled by a single gene. Many of the genetic characteristics associated with yield (e.g., enhanced growth) are controlled by a large number of genes, each of which has a minimal effect on the overall yield. There is, therefore, much scientific work to be done in this area.
.
Reduced vulnerability of crops to environmental stresses
Crops containing genes that will enable them to withstand biotic and abiotic stresses may be developed. For example, drought and excessively salty soil are two important limiting factors in crop productivity. Biotechnologists are studying plants that can cope with these extreme conditions in the hope of finding the genes that enable them to do so and eventually transferring these genes to the more desirable crops. One of the latest developments is the identification of a plant gene, At-DBF2, from thale cress, a tiny weed that is often used for plant research because it is very easy to grow and its genetic code is well mapped out. When this gene was inserted into tomato and tobacco cells, the cells were able to withstand environmental stresses like salt, drought, cold and heat far better than ordinary cells. If these preliminary results prove successful in larger trials, then At-DBF2 genes could help in engineering crops that can better withstand harsh environments. Researchers have also created transgenic rice plants that are resistant to rice yellow mottle virus (RYMV). In Africa, this virus destroys the majority of the rice crops and makes the surviving plants more susceptible to fungal infections.
.
Increased nutritional qualities of food crops
Proteins in foods may be modified to increase their nutritional qualities. Proteins in legumes and cereals may be transformed to provide the amino acids needed by human beings for a balanced diet. A good example is the work of Professors Ingo Potrykus and Peter Beyer on the so-called Golden Rice.
.
Reduced dependence on fertilizers, pesticides and other agrochemicals
Most of the current commercial applications of modern biotechnology in agriculture are on reducing the dependence of farmers on agrochemicals. For example, Bacillus thuringiensis (Bt) is a soil bacterium that produces a protein with insecticidal qualities. Traditionally, a fermentation process has been used to produce an insecticidal spray from these bacteria. In this form, the Bt toxin occurs as an inactive protoxin, which requires digestion by an insect to be effective. There are several Bt toxins and each one is specific to certain target insects. Crop plants have now been engineered to contain and express the genes for Bt toxin, which they produce in its active form. When a susceptible insect ingests the transgenic crop cultivar expressing the Bt protein, it stops feeding and soon thereafter dies as a result of the Bt toxin binding to its gut wall. Bt corn is now commercially available in a number of countries to control corn borer (a lepidopteran insect), which is otherwise controlled by spraying (a more difficult process).
Crops have also been genetically engineered to acquire tolerance to broad-spectrum herbicides. The lack of cost-effective herbicides with broad-spectrum activity and no crop injury was a consistent limitation in crop weed management. Multiple applications of numerous herbicides were routinely used to control a wide range of weed species detrimental to agronomic crops. Weed management tended to rely on preemergence applications (that is, herbicides were sprayed in response to expected weed infestations rather than in response to actual weeds present). Mechanical cultivation and hand weeding were often necessary to control weeds not controlled by herbicide applications. The introduction of herbicide-tolerant crops has the potential of reducing the number of herbicide active ingredients used for weed management, reducing the number of herbicide applications made during a season, and increasing yield due to improved weed management and less crop injury. Transgenic crops that express tolerance to glyphosate, glufosinate and bromoxynil have been developed. These herbicides can now be sprayed on transgenic crops without inflicting damage on the crops while killing nearby weeds.
.
Biological Engineering:
Biotechnological engineering or biological engineering is a branch of engineering that focuses on biotechnologies and biological science. It includes different disciplines such as biochemical engineering, biomedical engineering, bio-process engineering, biosystem engineering and so on. Because the field is so new, there is still no settled definition of a bioengineer. In general, however, it is an integrated approach combining fundamental biological sciences and traditional engineering principles.
Bioengineers are often employed to scale up bio processes from the laboratory scale to the manufacturing scale. Moreover, as with most engineers, they often deal with management, economic and legal issues. Since patents and regulation (e.g. FDA regulation in the U.S.) are very important issues for biotech enterprises, bioengineers are often required to have knowledge related to these issues.
The increasing number of biotech enterprises is likely to create a need for bioengineers in the years to come. Many universities throughout the world are now providing programs in bioengineering and biotechnology (as independent programs or specialty programs within more established engineering fields).
.
Bioremediation and Biodegradation:
Biotechnology is being used to engineer and adapt organisms, especially microorganisms, in an effort to find sustainable ways to clean up contaminated environments. The elimination of a wide range of pollutants and wastes from the environment is an absolute requirement to promote a sustainable development of our society with low environmental impact. Biological processes play a major role in the removal of contaminants, and biotechnology is taking advantage of the astonishing catabolic versatility of microorganisms to degrade or convert such compounds. New methodological breakthroughs in sequencing, genomics, proteomics, bioinformatics and imaging are producing vast amounts of information. In the field of Environmental Microbiology, genome-based global studies open a new era, providing unprecedented in silico views of metabolic and regulatory networks, as well as clues to the evolution of degradation pathways and to the molecular adaptation strategies to changing environmental conditions. Functional genomic and metagenomic approaches are increasing our understanding of the relative importance of different pathways and regulatory networks to carbon flux in particular environments and for particular compounds, and they will certainly accelerate the development of bioremediation technologies and biotransformation processes.
Marine environments are especially vulnerable, since oil spills in coastal regions and the open sea are poorly containable and mitigation is difficult. In addition to pollution through human activities, millions of tons of petroleum enter the marine environment every year from natural seepages. Despite its toxicity, a considerable fraction of petroleum oil entering marine systems is eliminated by the hydrocarbon-degrading activities of microbial communities, in particular by a remarkable recently discovered group of specialists, the so-called hydrocarbonoclastic bacteria (HCB).
Text Source: Wikipedia (License: GNU)

Tuesday 11 March 2008

Biotechnology Part 1

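Biotechnology is technology based on biology, especially when used in agriculture, food science, and medicine. The United Nations Convention on Biological Diversity defines biotechnology as "any technological application that uses biological systems, living organisms, or derivatives thereof, to make or modify products or processes for specific use."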
.
Biotechnology is often used to refer to the genetic engineering technology of the 21st century; however, the term encompasses a wider range and history of procedures for modifying biological organisms according to the needs of humanity, going back to the initial modifications of native plants into improved food crops through artificial selection and hybridization. Bioengineering is the science upon which all biotechnological applications are based. With the development of new approaches and modern techniques, traditional biotechnology industries are also acquiring new horizons, enabling them to improve the quality of their products and increase the productivity of their systems.
Before 1971, the term biotechnology was primarily used in the food processing and agriculture industries. From the 1970s, it began to be used by the Western scientific establishment to refer to laboratory-based techniques being developed in biological research, such as recombinant DNA or tissue culture-based processes, or horizontal gene transfer in living plants, using vectors such as the bacterium Agrobacterium to transfer DNA into a host organism. In fact, the term should be used in a much broader sense to describe the whole range of methods, both ancient and modern, used to manipulate organic materials to meet the demands of food production. So the term could be defined as "The application of indigenous and/or scientific knowledge to the management of (parts of) microorganisms, or of cells and tissues of higher organisms, so that these supply goods and services of use to the food industry and its consumers."
Biotechnology combines disciplines like genetics, molecular biology, biochemistry, embryology and cell biology, which are in turn linked to practical disciplines like chemical engineering, information technology, and robotics. Patho-biotechnology describes the exploitation of pathogens or pathogen derived compounds for beneficial effect.
.
History:
.
The most practical use of biotechnology, which is still present today, is the cultivation of plants to produce food suitable to humans. Agriculture has been theorized to have become the dominant way of producing food since the Neolithic Revolution. The processes and methods of agriculture have been refined by other mechanical and biological sciences since its inception. Through early biotechnology, farmers were able to select the best-suited and highest-yield crops to produce enough food to support a growing population. Other uses of biotechnology were required as crops and fields became increasingly large and difficult to maintain. Specific organisms and organism byproducts were used to fertilize, restore nitrogen, and control pests. Throughout the history of agriculture, farmers have inadvertently altered the genetics of their crops by introducing them to new environments and breeding them with other plants, one of the first forms of biotechnology. Cultures such as those in Mesopotamia, Egypt, and Iran developed the process of brewing beer. It is still done by the same basic method of using malted grains (containing enzymes) to convert starch from grains into sugar and then adding specific yeasts to produce beer. In this process the carbohydrates in the grains are broken down into alcohols such as ethanol. Later, other cultures developed the process of lactic acid fermentation, which allowed the fermentation and preservation of other forms of food. Fermentation was also used in this time period to produce leavened bread. Although the process of fermentation was not fully understood until Louis Pasteur’s work in 1857, it is still the first use of biotechnology to convert a food source into another form.
Combinations of plants and other organisms were used as medications in many early civilizations. As early as 200 BC, people began to use disabled or minute amounts of infectious agents to immunize themselves against infections. These and similar processes have been refined in modern medicine and have led to many developments such as antibiotics, vaccines, and other methods of fighting sickness.
In the early twentieth century scientists gained a greater understanding of microbiology and explored ways of manufacturing specific products. In 1917, Chaim Weizmann first used a pure microbiological culture in an industrial process: the fermentation of corn starch by Clostridium acetobutylicum to produce acetone, which the United Kingdom desperately needed to manufacture explosives during World War I.
The field of modern biotechnology is thought to have largely begun on June 16, 1980, when the United States Supreme Court ruled in the case of Diamond v. Chakrabarty that a genetically modified microorganism could be patented. Indian-born Ananda Chakrabarty, working for General Electric, had developed a bacterium (derived from the Pseudomonas genus) capable of breaking down crude oil, which he proposed to use in treating oil spills. A university in Florida is now studying ways to prevent tooth decay: researchers altered Streptococcus mutans, a bacterium found on teeth, so that it could not produce lactic acid.
.
Applications:
.
Biotechnology has applications in four major industrial areas: health care (medical), crop production and agriculture, non-food (industrial) uses of crops and other products (e.g. biodegradable plastics, vegetable oil, biofuels), and environmental uses.
For example, one application of biotechnology is the directed use of organisms for the manufacture of organic products (examples include beer and milk products). Another example is using naturally present bacteria by the mining industry in bioleaching. Biotechnology is also used to recycle, treat waste, clean up sites contaminated by industrial activities (bioremediation), and also to produce biological weapons.
A series of derived terms have been coined to identify several branches of biotechnology, for example:
.
Red biotechnology:

Red biotechnology is applied to medical processes. Some examples are the designing of organisms to produce antibiotics, and the engineering of genetic cures through genomic manipulation.
.
Green biotechnology:

Green biotechnology is biotechnology applied to agricultural processes. An example would be the selection and domestication of plants via micropropagation. Another example is the designing of transgenic plants to grow under specific environmental conditions or in the presence (or absence) of certain agricultural chemicals. One hope is that green biotechnology might produce more environmentally friendly solutions than traditional industrial agriculture. An example of this is the engineering of a plant to express a pesticide, thereby eliminating the need for external application of pesticides. An example of this would be Bt corn. Whether or not green biotechnology products such as this are ultimately more environmentally friendly is a topic of considerable debate.
.
White biotechnology:

White biotechnology, also known as industrial biotechnology, is biotechnology applied to industrial processes. An example is the designing of an organism to produce a useful chemical. Another example is the use of enzymes as industrial catalysts to either produce valuable chemicals or destroy hazardous or polluting chemicals (examples using oxidoreductases are given in Feng Xu (2005) “Applications of oxidoreductases: Recent progress” Ind. Biotechnol. 1, 38-50). White biotechnology tends to consume fewer resources than the traditional processes used to produce industrial goods.
.
Blue biotechnology:

Blue biotechnology is a term that has been used to describe the marine and aquatic applications of biotechnology, but its use is relatively rare.
.
Medicine:
In medicine, modern biotechnology finds promising applications in such areas as:
1- Pharmacogenomics:
Pharmacogenomics is the study of how the genetic inheritance of an individual affects his/her body’s response to drugs. It is a coined word derived from the words “pharmacology” and “genomics”. It is hence the study of the relationship between pharmaceuticals and genetics. The vision of pharmacogenomics is to be able to design and produce drugs that are adapted to each person’s genetic makeup. Pharmacogenomics results in the following benefits:
1. Development of tailor-made medicines. Using pharmacogenomics, pharmaceutical companies can create drugs based on the proteins, enzymes and RNA molecules that are associated with specific genes and diseases. These tailor-made drugs promise not only to maximize therapeutic effects but also to decrease damage to nearby healthy cells.
2. More accurate methods of determining appropriate drug dosages. Knowing a patient’s genetics will enable doctors to determine how well his/her body can process and metabolize a medicine. This will maximize the value of the medicine and decrease the likelihood of overdose.
3. Improvements in the drug discovery and approval process. The discovery of potential therapies will be made easier using genome targets. Genes have been associated with numerous diseases and disorders. With modern biotechnology, these genes can be used as targets for the development of effective new therapies, which could significantly shorten the drug discovery process.
4. Better vaccines. Safer vaccines can be designed and produced by organisms transformed by means of genetic engineering. These vaccines will elicit the immune response without the attendant risks of infection. They will be inexpensive, stable, easy to store, and capable of being engineered to carry several strains of pathogen at once.
.
Pharmaceutical products:

Figure: Computer-generated image of insulin hexamers highlighting the threefold symmetry, the zinc ions holding it together, and the histidine residues involved in zinc binding.
Most traditional pharmaceutical drugs are relatively simple molecules that have been found primarily through trial and error to treat the symptoms of a disease or illness. Biopharmaceuticals are large biological molecules, typically proteins, and these usually (but not always, as is the case with using insulin to treat type 1 diabetes mellitus) target the underlying mechanisms and pathways of a malady. The biopharmaceutical industry is relatively young. Biopharmaceuticals can deal with targets in humans that may not be accessible with traditional medicines. A patient is typically dosed with a small molecule via a tablet, while a large molecule is typically injected.
Small molecules are manufactured by chemistry, but large molecules are created by living cells such as those found in the human body, for example bacterial cells, yeast cells, and animal or plant cells.
Modern biotechnology is often associated with the use of genetically altered microorganisms such as E. coli or yeast for the production of substances like synthetic insulin or antibiotics. It can also refer to transgenic animals or transgenic plants, such as Bt corn. Genetically altered mammalian cells, such as Chinese Hamster Ovary (CHO) cells, are also used to manufacture certain pharmaceuticals. Another promising new biotechnology application is the development of plant-made pharmaceuticals.
Biotechnology is also commonly associated with landmark breakthroughs in new medical therapies to treat hepatitis B, hepatitis C, cancers, arthritis, haemophilia, bone fractures, multiple sclerosis, and cardiovascular disorders. The biotechnology industry has also been instrumental in developing molecular diagnostic devices that can be used to define the target patient population for a given biopharmaceutical. Herceptin, for example, was the first drug approved for use with a matching diagnostic test and is used to treat breast cancer in women whose cancer cells express the protein HER2.
Modern biotechnology can be used to manufacture existing medicines relatively easily and cheaply. The first genetically engineered products were medicines designed to treat human diseases. To cite one example, in 1978 Genentech developed synthetic humanized insulin by joining its gene with a plasmid vector inserted into the bacterium Escherichia coli. Insulin, widely used for the treatment of diabetes, was previously extracted from the pancreas of cattle and/or pigs. The resulting genetically engineered bacterium enabled the production of vast quantities of synthetic human insulin at low cost.
Since then, modern biotechnology has made it possible to produce human growth hormone, clotting factors for hemophiliacs, fertility drugs, erythropoietin and other drugs more easily and cheaply. Most drugs today are based on about 500 molecular targets. Genomic knowledge of the genes involved in diseases, disease pathways, and drug-response sites is expected to lead to the discovery of thousands more new targets.
.
Gene therapy:

Figure: Gene therapy using an adenovirus vector. A new gene is inserted into an adenovirus vector, which is used to introduce the modified DNA into a human cell. If the treatment is successful, the new gene will make a functional protein.
Gene therapy may be used for treating, or even curing, genetic and acquired diseases like cancer and AIDS by using normal genes to supplement or replace defective genes or to bolster a normal function such as immunity. It can be used to target somatic (i.e., body) or germ (i.e., egg and sperm) cells. In somatic gene therapy, the genome of the recipient is changed, but this change is not passed along to the next generation. In contrast, in germline gene therapy, the egg and sperm cells of the parents are changed for the purpose of passing on the changes to their offspring.
There are basically two ways of implementing a gene therapy treatment:
1. Ex vivo, which means “outside the body” – Cells from the patient’s blood or bone marrow are removed and grown in the laboratory. They are then exposed to a virus carrying the desired gene. The virus enters the cells, and the desired gene becomes part of the DNA of the cells. The cells are allowed to grow in the laboratory before being returned to the patient by injection into a vein.
2. In vivo, which means “inside the body” – No cells are removed from the patient’s body. Instead, vectors are used to deliver the desired gene to cells in the patient’s body.
Currently, the use of gene therapy is limited. Somatic gene therapy is primarily at the experimental stage. Germline therapy is the subject of much discussion but it is not being actively investigated in larger animals and human beings.
As of June 2001, more than 500 clinical gene-therapy trials involving about 3,500 patients have been identified worldwide. Around 78% of these are in the United States, with Europe having 18%. These trials focus on various types of cancer, although other multigenic diseases are being studied as well. Recently, two children born with severe combined immunodeficiency disorder (“SCID”) were reported to have been cured after being given genetically engineered cells.
Gene therapy faces many obstacles before it can become a practical approach for treating disease. At least four of these obstacles are as follows:
1. Gene delivery tools. Genes are inserted into the body using gene carriers called vectors. The most common vectors now are viruses, which have evolved a way of encapsulating and delivering their genes to human cells in a pathogenic manner. Scientists manipulate the genome of the virus by removing the disease-causing genes and inserting the therapeutic genes. However, while viruses are effective, they can introduce problems like toxicity, immune and inflammatory responses, and gene control and targeting issues.
2. Limited knowledge of the functions of genes. Scientists currently know the functions of only a few genes. Hence, gene therapy can address only some genes that cause a particular disease. Worse, it is not known exactly whether genes have more than one function, which creates uncertainty as to whether replacing such genes is indeed desirable.
3. Multigene disorders and effect of environment. Most genetic disorders involve more than one gene. Moreover, most diseases involve the interaction of several genes and the environment. For example, many people with cancer not only inherit the disease gene for the disorder, but may have also failed to inherit specific tumor suppressor genes. Diet, exercise, smoking and other environmental factors may have also contributed to their disease.
4. High costs. Since gene therapy is relatively new and at an experimental stage, it is an expensive treatment to undertake. This explains why current studies are focused on illnesses commonly found in developed countries, where more people can afford to pay for treatment. It may take decades before developing countries can take advantage of this technology.
Human Genome Project
The Human Genome Project is an initiative of the U.S. Department of Energy (“DOE”) that aims to generate a high-quality reference sequence for the entire human genome and identify all the human genes.
The DOE and its predecessor agencies were assigned by the U.S. Congress to develop new energy resources and technologies and to pursue a deeper understanding of potential health and environmental risks posed by their production and use. In 1986, the DOE announced its Human Genome Initiative. Shortly thereafter, the DOE and National Institutes of Health developed a plan for a joint Human Genome Project (“HGP”), which officially began in 1990.
The HGP was originally planned to last 15 years. However, rapid technological advances and worldwide participation accelerated the completion date to 2003 (making it a 13-year project). Already it has enabled gene hunters to pinpoint genes associated with more than 30 disorders.
Text Source: Wikipedia (License: GNU)

Bioinformatics

Introduction
.
The terms bioinformatics and computational biology are often used interchangeably. However, bioinformatics more properly refers to the creation and advancement of algorithms, computational and statistical techniques, and theory to solve formal and practical problems arising from the management and analysis of biological data. Computational biology, on the other hand, refers to hypothesis-driven investigation of a specific biological problem using computers, carried out with experimental or simulated data, with the primary goal of discovery and the advancement of biological knowledge. Put more simply, bioinformatics is concerned with the information while computational biology is concerned with the hypotheses. A similar distinction is made by the National Institutes of Health in their working definitions of Bioinformatics and Computational Biology, where it is further emphasized that there is a tight coupling of developments and knowledge between the more hypothesis-driven research in computational biology and technique-driven research in bioinformatics. Bioinformatics is also often specified as an applied subfield of the more general discipline of biomedical informatics.
A common thread in projects in bioinformatics and computational biology is the use of mathematical tools to extract useful information from data produced by high-throughput biological techniques such as genome sequencing. A representative problem in bioinformatics is the assembly of high-quality genome sequences from fragmentary "shotgun" DNA sequencing. Other common problems include the study of gene regulation to perform expression profiling using data from microarrays or mass spectrometry.

.
Major Research Areas:
.
Sequence Analysis:
.
Since the phage Φ-X174 was sequenced in 1977, the DNA sequences of hundreds of organisms have been decoded and stored in databases. The information is analyzed to determine genes that encode polypeptides, as well as regulatory sequences. A comparison of genes within a species or between different species can show similarities between protein functions, or relations between species (the use of molecular systematics to construct phylogenetic trees). With the growing amount of data, it long ago became impractical to analyze DNA sequences manually. Today, computer programs are used to search the genomes of thousands of organisms, containing billions of nucleotides. These programs compensate for mutations (exchanged, deleted or inserted bases) in the DNA sequence, in order to identify sequences that are related, but not identical.
A variant of this sequence alignment is used in the sequencing process itself. The so-called shotgun sequencing technique (which was used, for example, by The Institute for Genomic Research to sequence the first bacterial genome, Haemophilus influenzae) does not give a sequential list of nucleotides, but instead the sequences of thousands of small DNA fragments (each about 600-800 nucleotides long). The ends of these fragments overlap and, when aligned in the right way, make up the complete genome. Shotgun sequencing yields sequence data quickly, but the task of assembling the fragments can be quite complicated for larger genomes. In the case of the Human Genome Project, it took several months of CPU time (on a circa-2000 vintage DEC Alpha computer) to assemble the fragments. Shotgun sequencing is the method of choice for virtually all genomes sequenced today, and genome assembly algorithms are a critical area of bioinformatics research.
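To make the idea of comparing related but non-identical sequences concrete, here is a minimal sketch of a global alignment score in the style of the Needleman-Wunsch dynamic programming method. The function name and the scoring values (+1 match, -1 mismatch, -1 gap) are assumptions chosen for illustration; real tools use tuned substitution matrices and gap penalties.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of sequences a and b (illustrative scoring)."""
    # prev[j] = best score for aligning the prefix of a processed so far with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b costs i gaps
        for j in range(1, len(b) + 1):
            # exchanged base (or match): consume one base from each sequence
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # deleted or inserted base: consume a base from only one sequence
            up = prev[j] + gap
            left = curr[j - 1] + gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GATTTACA"))
```

A higher score means the two sequences can be lined up with fewer substitutions and gaps, which is the sense in which such programs tolerate mutations.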
Another aspect of bioinformatics in sequence analysis is the automatic search for genes and regulatory sequences within a genome. Not all of the nucleotides within a genome are genes. Within the genome of higher organisms, large parts of the DNA do not serve any obvious purpose. This so-called junk DNA may, however, contain unrecognized functional elements. Bioinformatics helps to bridge the gap between genome and proteome projects--for example, in the use of DNA sequences for protein identification.
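As a rough illustration of an automatic gene search, the sketch below scans the forward strand of a DNA string for open reading frames: a start codon (ATG) followed in the same frame by a stop codon. The function name, the `min_codons` parameter and the restriction to one strand are simplifying assumptions; real gene finders also use the reverse strand, statistical models and splicing signals.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) tuples for ATG...stop stretches on the forward strand.

    start/end are 0-based, end-exclusive; min_codons counts codons from ATG
    up to, but not including, the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate reading frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close it and keep scanning this frame
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTGGGTAACC"))
```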

.
Genome Annotation:
.
In the context of genomics, annotation is the process of marking the genes and other biological features in a DNA sequence. The first genome annotation software system was designed in 1995 by Dr. Owen White, who was part of the team that sequenced and analyzed the first genome of a free-living organism to be decoded, the bacterium Haemophilus influenzae. Dr. White built a software system to find the genes (places in the DNA sequence that encode a protein), the transfer RNA, and other features, and to make initial assignments of function to those genes. Most current genome annotation systems work similarly, but the programs available for analysis of genomic DNA are constantly changing and improving.
.
Computational Evolutionary Biology:
.
Evolutionary biology is the study of the origin and descent of species, as well as their change over time. Informatics has assisted evolutionary biologists in several key ways. It has enabled researchers to trace the evolution of a large number of organisms by measuring changes in their DNA, rather than through physical taxonomy or physiological observations alone; more recently, to compare entire genomes, which permits the study of more complex evolutionary events, such as gene duplication, lateral gene transfer, and the prediction of factors important in bacterial speciation; to build complex computational models of populations to predict the outcome of the system over time; and to track and share information on an increasingly large number of species and organisms.

Future work endeavours to reconstruct the now more complex tree of life.
The area of research within computer science that uses genetic algorithms is sometimes confused with computational evolutionary biology, but the two areas are unrelated.
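Measuring changes in DNA between organisms, as described in this section, can be sketched with the simplest possible distance: the fraction of aligned positions that differ (often called the p-distance). The function names and the toy sequences in the usage lines are hypothetical, and the sketch assumes the sequences have already been aligned to equal length.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of positions at which two pre-aligned, equal-length sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def closest_pair(seqs):
    """Return the two names (in sorted order) whose sequences have the smallest p-distance."""
    return min(
        combinations(sorted(seqs), 2),
        key=lambda pair: p_distance(seqs[pair[0]], seqs[pair[1]]),
    )


if __name__ == "__main__":
    # Toy, made-up sequences purely for illustration
    toy = {
        "taxon_a": "ACGTACGTAC",
        "taxon_b": "ACGTACGTAT",
        "taxon_c": "ACGAACTTAT",
    }
    print(closest_pair(toy))
```

Tree-building methods typically start from a full matrix of such pairwise distances, usually after correcting them with an explicit model of substitution.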

.
Measuring Biodiversity:
.
Biodiversity of an ecosystem might be defined as the total genomic complement of a particular environment, from all of the species present, whether it is a biofilm in an abandoned mine, a drop of sea water, a scoop of soil, or the entire biosphere of the planet Earth. Databases are used to collect the species names, descriptions, distributions, genetic information, status and size of populations, habitat needs, and how each organism interacts with other species. Specialized software programs are used to find, visualize, and analyze the information, and most importantly, communicate it to other people. Computer simulations model such things as population dynamics, or calculate the cumulative genetic health of a breeding pool (in agriculture) or endangered population (in conservation). One very exciting potential of this field is that entire DNA sequences, or genomes, of endangered species can be preserved, allowing the results of Nature's genetic experiment to be remembered in silico, and possibly reused in the future, even if that species is eventually lost.
.
Analysis of Gene Expression:
.
The expression of many genes can be determined by measuring mRNA levels with multiple techniques including microarrays, expressed cDNA sequence tag (EST) sequencing, serial analysis of gene expression (SAGE) tag sequencing, massively parallel signature sequencing (MPSS), or various applications of multiplexed in-situ hybridization. All of these techniques are extremely noise-prone and/or subject to bias in the biological measurement, and a major research area in computational biology involves developing statistical tools to separate signal from noise in high-throughput gene expression studies. Such studies are often used to determine the genes implicated in a disorder: one might compare microarray data from cancerous epithelial cells to data from non-cancerous cells to determine the transcripts that are up-regulated and down-regulated in a particular population of cancer cells.
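The comparison of cancerous and non-cancerous samples can be sketched as a log2 fold-change calculation per transcript. Everything named here (the function, the threshold of one log2 unit, the pseudocount, the gene labels in the usage lines) is an assumption for illustration; a real analysis would add normalization and a statistical test to deal with the noise described above.

```python
import math


def differential_expression(tumor, normal, threshold=1.0, pseudocount=1.0):
    """Label each gene 'up', 'down' or 'unchanged' by log2 fold change of mean expression.

    tumor and normal map gene name -> list of replicate expression values.
    """
    calls = {}
    for gene in tumor:
        t_mean = sum(tumor[gene]) / len(tumor[gene])
        n_mean = sum(normal[gene]) / len(normal[gene])
        # the pseudocount avoids division by zero for unexpressed genes
        lfc = math.log2((t_mean + pseudocount) / (n_mean + pseudocount))
        if lfc >= threshold:
            calls[gene] = "up"
        elif lfc <= -threshold:
            calls[gene] = "down"
        else:
            calls[gene] = "unchanged"
    return calls


if __name__ == "__main__":
    # Made-up expression values for three hypothetical genes
    tumor = {"GENE_A": [31, 31], "GENE_B": [3, 3], "GENE_C": [10, 12]}
    normal = {"GENE_A": [7, 7], "GENE_B": [15, 15], "GENE_C": [11, 11]}
    print(differential_expression(tumor, normal))
```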
.
Analysis of Regulation:
.
Regulation is the complex orchestration of events starting with an extracellular signal such as a hormone and leading to an increase or decrease in the activity of one or more proteins. Bioinformatics techniques have been applied to explore various steps in this process. For example, promoter analysis involves the identification and study of sequence motifs in the DNA surrounding the coding region of a gene. These motifs influence the extent to which that region is transcribed into mRNA. Expression data can be used to infer gene regulation: one might compare microarray data from a wide variety of states of an organism to form hypotheses about the genes involved in each state. In a single-cell organism, one might compare stages of the cell cycle, along with various stress conditions (heat shock, starvation, etc.). One can then apply clustering algorithms to that expression data to determine which genes are co-expressed. For example, the upstream regions (promoters) of co-expressed genes can be searched for over-represented regulatory elements.
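Searching the promoters of co-expressed genes for over-represented elements can be sketched by counting short words (k-mers) and comparing how often they occur in the cluster versus a background set. The function names, the add-one smoothing and the ratio cutoff are assumptions for illustration, not an established motif-finding method, and the sketch expects a non-empty cluster.

```python
from collections import Counter


def kmer_presence(seqs, k):
    """For each k-mer, count how many sequences contain it at least once."""
    counts = Counter()
    for seq in seqs:
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts


def enriched_kmers(cluster, background, k=6, min_ratio=2.0):
    """Return (k-mer, ratio) pairs where the k-mer is over-represented in the cluster."""
    fg = kmer_presence(cluster, k)
    bg = kmer_presence(background, k)
    hits = []
    for kmer, n in fg.items():
        fg_frac = n / len(cluster)
        # add-one smoothing so k-mers absent from the background do not divide by zero
        bg_frac = (bg[kmer] + 1) / (len(background) + 1)
        ratio = fg_frac / bg_frac
        if ratio >= min_ratio:
            hits.append((kmer, ratio))
    # strongest enrichment first, ties broken alphabetically
    return sorted(hits, key=lambda h: (-h[1], h[0]))


if __name__ == "__main__":
    # Made-up promoter fragments sharing the word TGACTCA
    cluster = ["AATGACTCATT", "GGTGACTCACC", "CTGACTCAGG"]
    background = ["AAAAAAAAAAA", "CCCCCCCCCCC", "GGGGGGGGGGG", "TTTTTTTTTTT"]
    print(enriched_kmers(cluster, background, k=7))
```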
.
Analysis of Protein Expression:
.
Protein microarrays and high-throughput (HT) mass spectrometry (MS) can provide a snapshot of the proteins present in a biological sample. Bioinformatics is very much involved in making sense of protein microarray and HT MS data. The former approach faces problems similar to those of microarrays targeted at mRNA. The latter involves the problem of matching large amounts of mass data against predicted masses from protein sequence databases, and the complicated statistical analysis of samples where multiple, but incomplete, peptides from each protein are detected.
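Matching observed masses against masses predicted from sequence databases can be sketched as follows: digest each database protein in silico, compute each peptide's monoisotopic mass, and report the peptides within a tolerance of an observed mass. The cleavage rule (after K or R, not before P, no missed cleavages), the tolerance, the function names and the toy database are assumptions for illustration; the residue masses are standard monoisotopic values rounded to five decimals.

```python
# Monoisotopic amino acid residue masses in daltons
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide


def tryptic_peptides(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        last = i == len(protein) - 1
        if last or (aa in "KR" and protein[i + 1] != "P"):
            peptides.append(protein[start:i + 1])
            start = i + 1
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def match_masses(observed, database, tol=0.01):
    """Map each observed mass to the (protein, peptide) pairs within tol daltons."""
    return {
        mass: [
            (name, pep)
            for name, seq in database.items()
            for pep in tryptic_peptides(seq)
            if abs(peptide_mass(pep) - mass) <= tol
        ]
        for mass in observed
    }


if __name__ == "__main__":
    toy_db = {"prot_x": "GAKVLR", "prot_y": "SSKTTR"}  # hypothetical sequences
    print(match_masses([274.164], toy_db))
```

Because several peptides can share nearly the same mass and only some peptides of each protein are detected, real pipelines score the whole set of matches statistically rather than trusting any single hit.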
.
Analysis of Mutations In Cancer:
.
In cancer, the genomes of affected cells are rearranged in complex or even unpredictable ways. Massive sequencing efforts are used to identify previously unknown point mutations in a variety of genes in cancer. Bioinformaticians continue to produce specialized automated systems to manage the sheer volume of sequence data produced, and they create new algorithms and software to compare the sequencing results to the growing collection of human genome sequences and germline polymorphisms. New physical detection technologies are employed, such as oligonucleotide microarrays to identify chromosomal gains and losses (called comparative genomic hybridization), and single nucleotide polymorphism arrays to detect known point mutations. These detection methods simultaneously measure several hundred thousand sites throughout the genome, and when used in high-throughput to measure thousands of samples, generate terabytes of data per experiment. Again, the massive amounts and new types of data generate new opportunities for bioinformaticians.
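Comparing sequencing results to a reference and to known germline polymorphisms can be sketched as a position-by-position comparison followed by a filter. The function name, the 1-based coordinates and the toy sequences are assumptions for illustration, and the sketch assumes the tumor sequence is already aligned to the reference with no insertions or deletions.

```python
def call_point_mutations(reference, tumor, known_germline=frozenset()):
    """Return (position, ref_base, tumor_base) substitutions not listed as germline.

    Positions are 1-based; known_germline holds tuples in the same form.
    """
    if len(reference) != len(tumor):
        raise ValueError("sequences must be aligned to the same length")
    calls = []
    for pos, (ref, alt) in enumerate(zip(reference, tumor), start=1):
        if ref != alt and (pos, ref, alt) not in known_germline:
            calls.append((pos, ref, alt))
    return calls


if __name__ == "__main__":
    # Made-up sequences; position 3 is treated as a known germline polymorphism
    germline = {(3, "G", "C")}
    print(call_point_mutations("ACGTACGT", "ACCTACAT", germline))
```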
Text Source: Wikipedia License GNU