Weekly Status Updates
November 28th, 2002
Happy Thanksgiving!
November 21st, 2002

Caltech Update - Diane Trout

* current accomplishments

None, well no genex related ones

* Next week

More pymerase bugfixes & test code
Make debs for installing genex

* Current problems

Distracted by proposal for other project.


Caltech Update - Brandon King

Summary:
-Worked on non-genex related work
-Worked more on getting GenePix GPR files to load into genex

Next Week:
-Finish GPR loading (Monday Deadline)

Problems:
-Too many things to do, too little time.
-Need to get GPR files loaded into GeneX 2.x ASAP
-Still fighting with the dataloader =o( (although progress has been made =o)


UVa Update - Tom Laudeman

Accomplishments:
- created sql_lib.pl which holds the SQL queries, and has a single subroutine getq() which takes a query name and returns $sth. All new queries are in here, and I'm moving old ones in whenever I have to work on them. It is cool, and the code using it is far more legible.

- fixed code in various places to match renamed fields. We're working on having unique field names across all tables.

- created a working data export that makes an R formatted file from a single SQL query (albeit an interesting query with several subselects)

- started work on chgrp utilities for orders, studies, and files

- fixed the code to reflect usf_fk being renamed to als_fk in am_spots_mas5. We stopped using that field with usf_fk a long time ago.

- put the contact_type and species data init into a separate SQL file
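
As a rough illustration of the named-query idea behind sql_lib.pl, here is a minimal Python sketch. It is not the actual Perl code: the query names, the SQL text, and the species table layout are made up for the example, and sqlite3 stands in for Postgres so the sketch runs on its own. The cursor returned here plays the role of the $sth that getq() hands back.

```python
import sqlite3

# Hypothetical query names and SQL; the real contents of sql_lib.pl are
# not shown in this update.
QUERIES = {
    "all_species": "SELECT id, name FROM species ORDER BY name",
    "species_by_name": "SELECT id, name FROM species WHERE name = ?",
}


def getq(conn, name, params=()):
    """Look up a query by name, execute it, and return the cursor.

    Loosely mirrors the getq() described above, which takes a query
    name and returns a statement handle.
    """
    try:
        sql = QUERIES[name]
    except KeyError:
        raise KeyError(f"unknown query name: {name!r}") from None
    return conn.execute(sql, params)


def demo():
    # In-memory database with an assumed, simplified species table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE species (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO species (name) VALUES (?)",
        [("Mus musculus",), ("Homo sapiens",)],
    )
    return [row[1] for row in getq(conn, "all_species")]
```

Keeping all SQL in one table means calling code only ever mentions a query name, which is probably where the gain in legibility comes from.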

Current problems:
- I'm not sure how conditions and condition labels should be handled in exported data files

- Our install process needs more testing and tweaking

- Customers using the system have requested a dozen UI tweaks


Open Informatics Update - Jason Stewart

current accomplishments:
* modified array-design-insert.pl to accept Affy reporter files.
* moved all Mason apps into new Mason/ dir to support language-specific directories (to enable the addition of Python, for example)
* updated the Mason app framework to make it easier to develop new apps by removing any dependency on understanding or using the authentication and sessioning mechanism in Genex-2
* Added some performance tweaks for the Mason apps by pre-loading often used Perl modules at server initialization time (like XML::Xerces which takes forever to load)
* helped Caltech team with issues

problems:
* hit the wall with huge INSERTs into Postgres. Got some info that I might need to a) drop indices before insert and re-create them afterwards, or b) do it outside of a transaction. Tom Lane, from the Pg developer group, indicated that this was completely wrong and is working with me to figure it out.

Until the insert problem is addressed, getting Affy array designs into the DB is on hold, which means getting Affy data into the DB is on hold.

* Nasty, awful Apache::Session issue that screwed up a whole day for Harry and me. Nothing is wrong with Apache::Session; it is just that it locks rows in the Session table and I didn't know it.

next week:
* Affy loader
* more Mason apps for downloading tab-delimited data


UCI Update - Harry Mangalam

Out of town

November 14th, 2002

Caltech Update - Diane Trout

Current accomplishments:
* connected novosoft reader to pymerase
** Constructs database using UML model
** Constructs object model connected to database

next week:
* finish debugging object model to database
* work on debs

problems:
* just slogging through the code

As an aside, here's a guess at what needs to be done to "finish" GeneX.
Brandon came up with the tasks, I estimated how long I thought the
tasks would take.

-----

GeneX 2.x Task and Dependency Outline

Notes about time, all time comments are in [ ]
[< x] implies will take less than x time units
[x-y] implies will take between x and y time units
[???] means I don't want to guess
[x, y%] implies that I am y% confident that the task will be done in x
time units
[x, y% - m, n%] implies that I think there's a y% chance that it will be
done in x units, and an n% chance in m units, e.g.
[now, 0% - infinity, 99%]
[ x-y * T ] implies between x * T and y * T length, where
T is the length of time needed to accomplish task T
[ +x-y T ] implies x - y time units in addition to completing task T.

Note: all times are my (Diane's) estimates for how long it will take to
accomplish a task, not how long from now the task will be done.

abbreviations
wk = week
d = day
h = hour
m = month

Notes about people performing tasks
( x ) means that person x will do task
( x | y ) means that person x or y could do the task
( x & y ) means that person x and y are both needed to do the task

GeneX 2.x Project
-----------------
*ExperimentSet (Prototype Done)
*ArrayDesign (Prototype Done)
**Data-loader (Jason) [< 1 week, 75%]
**Affy File Format --> MAGE-ML (Jason) [< 1 week, 60%]
**GenePix File Format --> MAGE-ML (Brandon) [< 1 week, 90%]
**Agilent File Format --> MAGE-ML (?) [ 0.8-1.2 * GenePix task time]
**QuantArray File Format --> MAGE-ML (done)
***(Requires working data loader and >= 1 working File Format)
***Data export [0-2 weeks, 80%]
***Gene Spring Connection (Prototype) [2 d, 60% - 1 wk, 90%]
***Basic Analysis (Prototype)
****MLX connection ( Diane | Chris ) [ 3 d, 70% - 1 wk, 95% ]
****CyberT, Rcluster ( Harry ) [< 1 wk 60% ]
****OpenDX integration ( Harry ) [ ??? ]
***User scratch tables for expression values (Jason) [ 2d 50% -1wk, 80%]
****Basic Analysis (Beta/Final) ( Harry | Jason | Diane | Joe)
[ 2 wk 50% - 4 wk 90%]
*****Quality Control
*****Normalization
*****Filtering
*****Outlier detection
****Gene Spring Connection (Beta/Final) ( Joe | Diane | Harry )
[ +1d-2d to prototype ]
****Moving data from scratch table into final expression table (Jason)
[ 2d, 60% - 1 wk 90% ]
*Annotation Editors ( unknown ) [ ???? ]
**research DAS, Cartwheel, current art ( unknown ) [ 3d, 70% - 5d, 95% ]
**implement annotation functionality ( unknown ) [ ???? ]

(Requires: Possible now)
*Define Security Model (Done)
**Security Model Implementation (Jason) [ 2d, 75% - 1wk 95%]
***All applications will have to be updated [ ??? ]
---to support the finished model, doesn't prevent
---work from being done on applications.

(Requires: Possible now)
*Updates to Pymerase for GeneX support
**Python API for Genex (Prototype) (Diane & Brandon) [ 2d, 70% - 3d, 90%]
***MLX connection through API (Prototype)
***Python based GUIs & Apps for GeneX (Prototype) [ 2wk,40% - 4 wk,80% ]

(Requires: Finished Security Model Implementation)
*Security model update to Pymerase
**Python API for GeneX (Beta/Final)
***MLX connection through API (Beta/Final)
***Python based GUIs & Apps for GeneX (Beta/Final)
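
The star-prefixed outline above can be read mechanically. The following Python sketch assumes that the number of leading asterisks gives the nesting depth and that a task depends on the nearest shallower task above it; that reading is an interpretation of the outline, not something the outline states. The function name parse_outline and the dict layout are invented for the example. Unstarred lines that begin with "[" are treated as time estimates continuing the previous task, and other unstarred lines are skipped.

```python
import re

# A few lines copied from the outline above, used as sample input.
SAMPLE = """\
*ArrayDesign (Prototype Done)
**Data-loader (Jason) [< 1 week, 75%]
***User scratch tables for expression values (Jason) [ 2d 50% -1wk, 80%]
****Basic Analysis (Beta/Final) ( Harry | Jason | Diane | Joe)
[ 2 wk 50% - 4 wk 90%]
*Annotation Editors ( unknown ) [ ???? ]
"""


def parse_outline(text):
    """Return a list of dicts with 'depth', 'title', and 'parent'.

    'parent' is the list index of the nearest shallower task above the
    current one, or None for a top-level task.
    """
    tasks = []
    stack = []  # indices of the currently open ancestors, shallowest first
    for line in text.splitlines():
        match = re.match(r"(\*+)\s*(.*)", line)
        if not match:
            # An unstarred "[...]" line continues the previous task's
            # estimate; notes such as "(Requires: ...)" are ignored here.
            if tasks and line.strip().startswith("["):
                tasks[-1]["title"] += " " + line.strip()
            continue
        depth = len(match.group(1))
        # Close every open task at the same depth or deeper.
        while stack and tasks[stack[-1]]["depth"] >= depth:
            stack.pop()
        parent = stack[-1] if stack else None
        tasks.append(
            {"depth": depth, "title": match.group(2).strip(), "parent": parent}
        )
        stack.append(len(tasks) - 1)
    return tasks
```

Walking the parent links from any task back to a top-level entry then lists everything that task waits on under this reading.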


UCI Update - Harry Mangalam

Most of what I've done over the past week has been related to installs (of Debian Linux, multiple times) and of the GeneX2 database on Debian systems.

With a fresh Debian system, GeneX2 installs pretty much as described (now) in the INSTALL doc. I collected the .deb requirements (and some more Perl peculiarities) for later installs. Next week, I'll find out just how well I recorded them with a new series of installs at GMU and at the NRL (talk about installing under the gun..).

Also, I've just about finished figuring out the GeneSpring <-> GeneX 2 interaction with both JDBC <-> ODBC <-> Postgresql (on Windows clients) and JDBC <-> GeneX (on Linux clients). The former I posted, the latter is being written up.

What remains is the actual mapping of GeneSpring data columns to GeneX data columns, for which I may need a bit more time to learn how each is laid out, or (if the financing is OK) Mark Wilkinson may finish it off.

However, GeneSpring IS talking to GeneX; from sniffing the TCP packets going back and forth, I can see that the only reason data isn't showing up in GeneSpring is that there is a data mismatch (as well as a lack of data).

I'm off for much of next week to visit Virginia so I'll probably not make the Thursday deadline, but I'll post an update when I get back.


Caltech Update - Brandon King

Summary:
*Created GenePix-GPR-1.0 FeatureExtractionSoftware xml file
*Found bugs in array-design.html and data-loader.html
*Worked on task/dependency outline with Diane
*Had strategy meeting with Barbara Wold, Jose Luis Riechmann, Eric Mjolsness, and Jennifer Weller
*Created tasklist, with estimated development times and dependencies (see Diane's status update)
**Talked about Caltech Deadlines
***Two-week deadline (now one-week):
****Expression data imported and exported from GeneX 2.x
***Christmas deadline:
****Allow Biologists to easily get data into the database
*****Basic Analysis (Normalization, Quality Control, etc.)
***GeneSpring Connection
***Annotation of experiments
***I think there is more, but I am forgetting what it is... I'll get back to the list on this.
***Talked about having a more standardized format for these summaries.
****Something like this where it can be easily skimmed for important updates... something like:
*****Summary Section
*****Next Week
*****Problems
*****Future (optional)
*****other notes / sections (optional)

Next Week:
*Update GeneX 2.x installation with Jason's fixes
*Test GenePix upload
*Create Agilent FeatureExtractionSoftware and/or supply data for testing

Problems:
*ArrayDesign and Data-loader were broken; Jason says they are fixed now.

Future:
*Discussion of GUIs
*GUI Widget Auto-generation with Pymerase

New or Changing Priorities:
*2 New Deadlines
**Two week (see Summary)
**Christmas (see Summary)

November 7th, 2002

Caltech Update - Diane Trout

Current accomplishments:
* Built a number of packages to help install genex
* Helped Brandon with some of the bugs he's run into with installing
genex.
* A whole lot of meetings

Next week:
* Work on connecting novosoft xmi reader to pymerase (high priority)
(for both an internal project as well as Python-Mage)
* Fix issues that Brandon found in pymerase (high)
* Work on connecting pymerase to GeneX's new security model (mid)
* Build debs for Bio::Mage & GeneX (low)

Problems:
* Haven't had time to work on debs or prepping a download site to be
hosted on sourceforge.


Caltech Update - Brandon King

Summary:
* Installed GeneX 2.x with help from Diane and Jason
* Started uploading QuantArray test data to GeneX 2.x
* No form to upload ExperimentSets (looks like this is now fixed), used SQL as a short-term solution
* Successfully generated and uploaded ArrayDesigns
* Tested uploading an already existing FeatureExtractionSoftware; the existing entry was detected and nothing was uploaded. =o)
* Tried to upload QuantArray expression data, ran into bugs, submitted bug reports
* Worked on testing and updating pymerase to help prepare for GeneX 2.x and MAGE support.

Next Week:
* Finish Uploading QuantArray expression data
* Create GenePix FeatureExtractionSoftware XML file
* Create ArrayDesign from GenePix data
* Upload GenePix expression data
* Help Diane update Pymerase

Problems:
* Usual bugs found in development projects.

Future:
* Look into ways of helping out with the GeneX project more efficiently

New or Changing Priorities:
* No changes, yet.

Random Ideas:
* Maybe use a format similar to this for weekly updates, where Summary, Next Week, and Problems are the most important sections to include.
* Go to lunch =o)


UCI Update - Harry Mangalam

...And thanks to Brandon and Diane for the biff to the butt for getting this started.

Over the last 2 months, I've been mostly working on analysis add-ons and tweaks to GeneX and related gene expression paths, with the emphasis on OpenDX:

Open Data Explorer (DX) is an advanced visualization system developed by IBM over the last 15 years. It was initially a commercial product aimed at high-end visualization markets, using advanced single- and multi-CPU Unix workstations. In 1999, as part of their Deep Computing Initiative, IBM released the source code to DX as Open Source. Over the years since, it has been ported to Linux and Windows, and continues to improve. It is particularly notable because complex visualizations can be programmed visually by dragging and dropping icons representing data transformations onto a canvas and 'wiring' them together by connecting input and output tabs with mouse clicks. One problem with DX is that while it is an exceptional visualization environment, it was developed prior to the current emphasis on accessing data from relational databases, relying instead on the large data file formats that were (and still are) standard in these fields, primarily HDF and netCDF. This is a problem particularly for the gene expression field, where data is increasingly stored in relational databases to support complex queries. [see http://www.opendx.org]

Mostly this has been in pursuit of a Perl-OpenDX link so that a module can be added to a visual DX net allowing arbitrary Perl code to be run on the input. There are various problems - DX prefers to operate on particular types of data fields, not the simple string input that Perl favors - but there are many possibilities, high among them using Perl's DBI to suck data from RDBs and format it for DX to do automated analysis and visualization on.

Status: I've gotten some DX-Perl modules built, but am still a ways away from a solid generic module.

A close second priority is the integration of the R statistical language in the same way - compile it as a shared lib and have it communicate to DX via sockets in the same way that I have Perl doing. R is actually a better fit as it has an idea about data objects and already has support for many of the interconversions of data types that DX supports.

A logical 3rd candidate would be Python, as it theoretically integrates more easily with C apps than Perl and there is already a Python project for interacting with DX (tho it seems to be more for controlling DX from Python than for rolling Python into DX.)

The other project is updating the CGI analysis scripts that were included in the GeneX 1.0 release and converting them to command-line scripts. These include the CyberT significance testing, the Eisen/Sherlock clustering code (now incorporating Swaine Lin Chen's slcview heatmap generation), and Karen's Rcluster codes.

These updates should be making their way back into the CVS tree very soon. Currently they are designed to work with a tree of QuantArray-formatted replicates, doing significance testing on each of the subgroups and then a summary significance test across all the subgroups (this last summary test is specific to the experiment, so it may not be appropriate for all experimental designs).

I'm also just about finished with a larger page describing the software in more detail and will post the location as soon as it's ready, probably next week.

My last noise is a suggestion to start putting together ideas for an external GUI app that does nothing but query the GeneX DB and make the output available as:
- tab-delimited spreadsheet-like files
- MAGE-ML
- N-dimensional files such as netCDF/HDF/XDF
- maybe a few others

My suggestion would be to use Python and Qt so that it runs cross-platform, using either the Qt designer (or BlackAdder if it EVER sees the light of day). The Python bindings seem to be more stable and coherent than for Perl.
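
Leaving the GUI aside, the query-and-export core of such an app might look something like the Python sketch below. It covers only the tab-delimited case. The function name export_tab_delimited, the measurements table, and its columns are all invented for the example, and sqlite3 stands in for the GeneX Postgres database so that the sketch is self-contained.

```python
import csv
import io
import sqlite3


def export_tab_delimited(conn, sql, out, params=()):
    """Run one SQL query and write the result to `out` as tab-delimited text.

    The first line holds the column names reported by the database.
    Returns the number of data rows written.
    """
    cur = conn.execute(sql, params)
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([col[0] for col in cur.description])
    count = 0
    for row in cur:
        writer.writerow(row)
        count += 1
    return count


def demo():
    # Assumed, simplified table; not the GeneX schema.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE measurements (spot TEXT, value REAL)")
    conn.executemany(
        "INSERT INTO measurements VALUES (?, ?)",
        [("A2", 0.25), ("A1", 1.5)],
    )
    out = io.StringIO()
    export_tab_delimited(
        conn, "SELECT spot, value FROM measurements ORDER BY spot", out
    )
    return out.getvalue()
```

A file written this way, with a header row and tab separators, can be opened in a spreadsheet or read into R with read.delim().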


UVa Update - Tom Laudeman

Following Harry's good example, here's what we're up to:

We're showing GeneX to end users this morning. We gave it to the Microarray Center people last week, but they've had a lull in orders.

Jodi/Teela has finished the Analysis Tree schema. I think it has 8 tables. This schema allows for linking analyses together, and for the analysis module plugins.

Teela has been working on the analysis tree backend.

I've got the trees drawing correctly, nodes delete, add, and rename. Trees render correctly no matter how complex. I created this before the db was ready, so I've been integrating with the Analysis Tree schema.

The tree drawing is way cool. We can set up accounts on reed6.med for people who want to see it.


Open Informatics Update - Jason Stewart

Here's the status for me and my sub-contractors, Mark Wilkinson, and Hyojoo Kang.

Mark:
Added sysadmin tools for doing user/group maintenance and created Mason front ends for them.

Hyojoo:
Created a Java GUI for creating QT Dimensions that uses the MAGE-Java API and thus can export MAGE-ML. The MAGE-ML QT Dimension is needed to configure the data loader.

Jason:

Planning
========
I've been getting ready to switch to Postgres 7.3

Service
=======
Helped Brandon get Genex-2.0a1 running at Caltech

Genex-2
=====

New Tools
--
* created user tools for adding Groups and ExperimentSets and created a Mason front end for them.

* Install - now uses a MANIFEST file. This file documents all local files to be installed and where they will be installed. This is a substitutable file (created from MANIFEST.in), so it is possible to use the Config.pm values.
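
To make the MANIFEST.in idea concrete, here is a small Python sketch of a substitutable manifest. The real file format and the Config.pm value names are not given in this update, so everything below is assumed for illustration: the "$NAME" placeholder syntax, the "local file, then destination" line layout, and the BIN_DIR and HTML_DIR settings.

```python
from string import Template

# Hypothetical settings standing in for values that would come from Config.pm.
CONFIG = {
    "BIN_DIR": "/usr/local/genex/bin",
    "HTML_DIR": "/var/www/genex",
}

# Hypothetical MANIFEST.in: one "<local file> <install destination>" per line.
MANIFEST_IN = """\
# local file       destination
bin/example.pl     $BIN_DIR/example.pl
html/index.html    $HTML_DIR/index.html
"""


def render_manifest(template_text, config):
    """Fill in the placeholders; an undefined placeholder raises KeyError."""
    return Template(template_text).substitute(config)


def parse_manifest(text):
    """Return (local file, destination) pairs, skipping blanks and comments."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        source, destination = line.split(None, 1)
        pairs.append((source, destination))
    return pairs
```

An installer could then loop over parse_manifest(render_manifest(MANIFEST_IN, CONFIG)) and copy each file to its destination, with the rendered MANIFEST doubling as a record of what was installed where.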

Schema changes
--
* ExperimentSet - removed creation_date column. This information can be found by using the Audit trail.

Modifications to Perl API
--
* XMLUtils.pm - changed the internals of xml2sql() to use the new SQL writing functions in Connect.pm. Added creation of rules to all security views for INSERT/UPDATE/DELETE. Removed permission granting code to make it more generic. Added support for a single master sequence to be used by all tables.

* create_genex_db.pl - modified to handle granting of permissions.

Goals for next week
===================
Finish rules code - need to fix problem with GroupSec table having ro/rw_groupname fkeys to itself.

Ensure that ArrayDesign importer can handle Affy U133 files.

Add ability to data loader to import .CEL data and MAS5 data.
