Hi Paul,
I totally agree with you. I had attached my file in the first post. Anyway, here is the attachment: Attachment 31795
OK, I will concentrate only on the string, as you have said.
Regards,
Naveen.
If what you attached is a data file, you can't just write a program to parse what is in that attachment. No way.
Things like this:
Quote:
\* Metabolism *\
\* Objective function *\
Maximize
-3.442882e-006 ADix -3.442882e-006 ADPix -3.442882e-006 AHCYSix -3.442882e-006 AMPix -3.442882e-006 cmnm5s2UMPix -3.442882e-006 CMPix -3.442882e-006 DR5Pix -3.442882e-006 FMETix -3.442882e-006 FORix -3.442882e-006 GDPix -3.442882e-006 GmMPix
-3.442882e-006 GMPix -3.442882e-006 GNix -3.442882e-006 Hix -3.442882e-006 k2CMPix -3.442882e-006 LIPOYLLYSix -3.442882e-006 m1GMPix -3.442882e-006 m2GMPix -3.442882e-006 m62AMPix -3.442882e-006 m7GMPix -3.442882e-006 NH3ix -3.442882e-006 NMNix
-3.442882e-006 PIix -3.442882e-006 PPIix -3.442882e-006 pSERix -3.442882e-006 PSIURIMPix -3.442882e-006 pTHRix -3.442882e-006 pTYRix -3.442882e-006 s4UMPix -3.442882e-006 SNGLYPix -3.442882e-006 THFix -3.442882e-006 THYix -3.442882e-006 UmMPix
-3.442882e-006 UMPix -3.442882e-006 URAix +0.2777778 Growth
\* Constraints *\
You have to know:
1) What is a comment line
2) What is a data line
3) Which data you're reading
4) Depending on the data, how to interpret each component of the data
...
etc.
In other words, you need to take that file, look at it, and write down, with pencil and paper, the rules that constitute a valid data file, including how to skip over comments, etc. That in itself is a large enough assignment, and it has nothing to do with linear programming.
Regards,
Paul McKenzie
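To make the first two rules in the post above concrete, here is a minimal C++ sketch of what "knowing what a comment line is" could look like once it is written down. The rules in it are assumptions read off the quoted excerpt (a comment is a whole line of the form \* ... *\, and a section begins with a keyword alone on a line), not a published specification of the format. The names LineKind, trim, and classifyLine are hypothetical, and the keywords "Minimize" and "Bounds" are guesses; only "Maximize" appears in the excerpt.

```cpp
#include <string>

// Kinds of line this sketch distinguishes. The categories are assumptions
// drawn from the quoted excerpt, not from a format specification.
enum class LineKind { Blank, Comment, SectionKeyword, Data };

// Strip leading and trailing whitespace.
std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const auto first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

LineKind classifyLine(const std::string& raw) {
    const std::string line = trim(raw);
    if (line.empty()) return LineKind::Blank;

    // Assumed rule 1: a comment is a whole line that starts with \* and ends with *\ .
    if (line.size() >= 4 &&
        line.compare(0, 2, "\\*") == 0 &&
        line.compare(line.size() - 2, 2, "*\\") == 0) {
        return LineKind::Comment;
    }

    // Assumed rule 2: a section starts with one of these keywords alone on a line.
    if (line == "Maximize" || line == "Minimize" || line == "Bounds") {
        return LineKind::SectionKeyword;
    }

    // Everything else is treated as data; interpreting it is a separate step.
    return LineKind::Data;
}
```

Even this small function forces decisions the file itself does not answer, such as whether a comment may share a line with data, which is the reason the rules have to be settled before any code is written.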
Hi Paul,
What I have attached is indeed a data file. You can also download the file from here: https://simtk.org/project/xml/downlo...l?group_id=714
At this link, you will find different formats of the FBA metabolic model sub-model. I converted all the file formats into text files. Only one format made some sense, and it was the file that I attached above. It is from a Stanford systems biology lab. It models all the metabolites in a cell. If you have any other idea for solving this problem, please do share it with me.
Regards,
Naveen.
If it is not possible with C++, then I will try to use Python for text processing. But ultimately, I need to parse this file in C++ because the linear programming solver is coded in C++.
There is no "other idea". I stated exactly what you need to do to parse this file.
Again, if you expected a magical function to appear out of nowhere to parse such a file with this information, then you are sadly mistaken. Unless you find an already written parser that parses this particular file, the "solution" is exactly what I have stated in previous posts.
Regards,
Paul McKenzie
Again, the problem isn't "text processing", and it doesn't matter what language you use. You will still run into the same issue: you need to know the syntax or file-layout rules of such a file before writing a single line of code. Do you think Python, C++, Java, or any language will know what is a comment? What is a variable? What is a coefficient?
Regards,
Paul McKenzie
Hi Paul,
OK, if you take a close look at the file, we can easily delete the comments and get a structure like the one I posted in the first post:
Maximize:
obj: 3e-06 A - 3e-06 B + 2.7e-01 F
constraints:
RXN1: -1 A + 1 B -1 C + 1 D -1 E -1 F = 0
RXN2: -1 A + 1 B -1 C + 1 D -1 E -1 F = 0
RXN3: -1 A + 1 B -1 C + 1 D -1 E -1 F = 0
RXN4: -1 A + 1 B -1 C + 1 D -1 E -1 F = 0
... many constraints like this
Bounds:
A >= 0
B <= 100
C >= 0
.....
...........many bounds like this.
I can do some more refinement, such as manually deleting the leading label up to the colon (":"), and then the format looks exactly like this:
Maximize:
obj: 3e-06 A - 3e-06 B + 2.7e-01 F
constraints:
-1 A + 1 B -1 C + 1 D -1 E -1 F = 0
-1 A + 1 B -1 C + 1 D -1 E -1 F = 0
-1 A + 1 B -1 C + 1 D -1 E -1 F = 0
-1 A + 1 B -1 C + 1 D -1 E -1 F = 0
... many constraints like this
Bounds:
A >= 0
B <= 100
C >= 0
.....
...........many bounds like this.
I think that in this way we have a text file which is similar to the constraints specified in the linear programming solver. But, as you have mentioned, there is no magic function to parse it directly, so I need to write my own function to parse this string. The logic is: split the string into 3 parts (the LHS, the RHS, and the symbol in between) and then convert them into data structures like this:
constant = K (RHS)
ratio = symbol in between RHS and LHS
QsimplexVariable = (ci,Xi) (LHS)
But it is not very easy, and it will need some more steps. That is why I am seeking your expert advice.
Regards,
Naveen.
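As a rough illustration of the split described above (LHS terms, the relation symbol, and the RHS constant), here is a C++ sketch for the simplified one-constraint-per-line format shown in this post. It assumes whitespace-separated tokens, a relation that is one of "<=", ">=" or "=", and an optional "RXN1:"-style label that is skipped. The types Term and Constraint and the function parseConstraint are hypothetical and are not the solver's own types. It does not handle the original file, where the layout rules are still unknown.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

struct Term {
    double coefficient;    // c_i
    std::string variable;  // X_i
};

struct Constraint {
    std::vector<Term> lhs;
    std::string relation;  // "<=", ">=" or "="
    double rhs = 0.0;      // the constant K
};

// True for tokens such as "1", "-1", "3e-06" or "+2.7e-01".
bool looksNumeric(const std::string& token) {
    assert(!token.empty());  // tokens come from operator>>, so never empty
    const std::size_t i = (token[0] == '+' || token[0] == '-') ? 1 : 0;
    return i < token.size() &&
           (std::isdigit(static_cast<unsigned char>(token[i])) || token[i] == '.');
}

Constraint parseConstraint(const std::string& line) {
    const std::size_t relPos = line.find_first_of("<>=");
    if (relPos == std::string::npos)
        throw std::runtime_error("no relation symbol in: " + line);
    std::size_t relLen = 1;
    if (line[relPos] != '=') {  // '<' or '>' must be followed by '='
        if (relPos + 1 >= line.size() || line[relPos + 1] != '=')
            throw std::runtime_error("unsupported relation in: " + line);
        relLen = 2;
    }

    Constraint c;
    c.relation = line.substr(relPos, relLen);
    // std::stod throws std::invalid_argument if no number follows the relation.
    c.rhs = std::stod(line.substr(relPos + relLen));

    // Skip an optional label such as "RXN1:" in front of the terms.
    const std::size_t colon = line.find(':');
    const std::size_t start =
        (colon == std::string::npos || colon > relPos) ? 0 : colon + 1;

    std::istringstream in(line.substr(start, relPos - start));
    std::string token;
    double sign = 1.0;
    double coefficient = 1.0;
    while (in >> token) {
        if (token == "+") {
            sign = 1.0;
        } else if (token == "-") {
            sign = -1.0;
        } else if (looksNumeric(token)) {
            coefficient = std::stod(token);
        } else {  // anything else is taken to be a variable name
            c.lhs.push_back({sign * coefficient, token});
            sign = 1.0;
            coefficient = 1.0;
        }
    }
    return c;
}
```

Calling parseConstraint("RXN1: -1 A + 1 B -1 C + 1 D -1 E -1 F = 0") would yield six terms, the relation "=", and the constant 0, and a bound such as "B <= 100" goes through the same path with an implied coefficient of 1. What the sketch quietly accepts (for example trailing text after the constant) is exactly the kind of rule that would need to be pinned down first.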
I will also keep trying
Again, you're thinking in terms of what you see, and believing that because it looks obvious to do "by hand", it becomes an easy job to write a program to do it, and all it takes is simple code.
I will try and give you an example: What is the answer to this?
Code:
1 + (3 * 4) - 2
Of course, by using order of operations and parens, we can do this easily "by hand" and state the answer is 11. Now write a computer program that takes that string and computes the answer. Get the point now?
If you asked me "how to solve this problem", the solution would be to write a program that parses each of those tokens and uses an operator / operand stack that maintains precedence, which more than likely converts the infix expression to postfix. Does that sound obvious to you?
That is exactly the situation you're in now. It isn't just writing a parser. You need to write the grammar rules formally as to the layout of a file. Do you know what grammar rules are? Are you familiar with a recursive descent parser?
Regards,
Paul McKenzie
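For reference, the operator / operand stack approach mentioned above can be sketched in C++ roughly as follows. This is one possible arrangement, not the only one: it evaluates directly with two stacks instead of emitting postfix first, and it assumes non-negative integer literals, the four binary operators, and parentheses (no unary minus). The function names evaluate, applyTop, and precedence are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <stack>
#include <stdexcept>
#include <string>

int precedence(char op) { return (op == '*' || op == '/') ? 2 : 1; }

// Pop one operator and two operands, apply the operator, push the result.
void applyTop(std::stack<long>& values, std::stack<char>& ops) {
    assert(!ops.empty());  // callers only invoke this with a pending operator
    if (values.size() < 2) throw std::runtime_error("malformed expression");
    const char op = ops.top(); ops.pop();
    const long rhs = values.top(); values.pop();
    const long lhs = values.top(); values.pop();
    switch (op) {
        case '+': values.push(lhs + rhs); break;
        case '-': values.push(lhs - rhs); break;
        case '*': values.push(lhs * rhs); break;
        case '/':
            if (rhs == 0) throw std::runtime_error("division by zero");
            values.push(lhs / rhs);
            break;
        default:  // a '(' that was never closed ends up here
            throw std::runtime_error("malformed expression");
    }
}

long evaluate(const std::string& text) {
    std::stack<long> values;  // operand stack
    std::stack<char> ops;     // operator stack
    std::size_t i = 0;
    while (i < text.size()) {
        const char c = text[i];
        if (std::isspace(static_cast<unsigned char>(c))) {
            ++i;
        } else if (std::isdigit(static_cast<unsigned char>(c))) {
            long n = 0;
            while (i < text.size() &&
                   std::isdigit(static_cast<unsigned char>(text[i]))) {
                n = n * 10 + (text[i++] - '0');
            }
            values.push(n);
        } else if (c == '(') {
            ops.push(c);
            ++i;
        } else if (c == ')') {
            while (!ops.empty() && ops.top() != '(') applyTop(values, ops);
            if (ops.empty()) throw std::runtime_error("unbalanced ')'");
            ops.pop();  // discard the matching '('
            ++i;
        } else if (c == '+' || c == '-' || c == '*' || c == '/') {
            // Apply pending operators of equal or higher precedence first.
            while (!ops.empty() && ops.top() != '(' &&
                   precedence(ops.top()) >= precedence(c)) {
                applyTop(values, ops);
            }
            ops.push(c);
            ++i;
        } else {
            throw std::runtime_error("unexpected character");
        }
    }
    while (!ops.empty()) applyTop(values, ops);
    if (values.size() != 1) throw std::runtime_error("malformed expression");
    return values.top();
}
```

With this sketch, evaluate("1 + (3 * 4) - 2") returns 11. Note how much machinery is needed for something that is trivial by hand, which is the point being made.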
If the data obtained from the given web site when converted into a text file produces a file in the format you require, have you considered that others might have had the same parsing requirement and that a c++ parser for this format has already been written? Have you thought about contacting the site and asking them about this? It may save you a whole heap of work as even writing the simplest one symbol look-ahead recursive descent parser is not trivial - and producing a correct syntax diagram (which has to be done before coding is even thought about) can be a complex task.
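To give a feel for what a one-symbol look-ahead recursive descent parser involves, here is a small C++ sketch for the arithmetic example from earlier in the thread, not for the data file. It assumes this grammar: expr := term (('+' | '-') term)*, term := factor (('*' | '/') factor)*, factor := NUMBER | '(' expr ')'. The class name ExprParser is hypothetical, and only non-negative integer literals are handled.

```cpp
#include <cctype>
#include <stdexcept>
#include <string>

class ExprParser {
public:
    explicit ExprParser(const std::string& text) : text_(text) {}

    // Parse the whole input and return its value.
    long parse() {
        const long value = expr();
        if (peek() != '\0') throw std::runtime_error("trailing input");
        return value;
    }

private:
    // One-symbol look-ahead: next non-blank character, '\0' at end of input.
    char peek() {
        while (pos_ < text_.size() &&
               std::isspace(static_cast<unsigned char>(text_[pos_]))) {
            ++pos_;
        }
        return pos_ < text_.size() ? text_[pos_] : '\0';
    }

    // expr := term (('+' | '-') term)*
    long expr() {
        long value = term();
        while (peek() == '+' || peek() == '-') {
            const char op = text_[pos_++];
            const long rhs = term();
            value = (op == '+') ? value + rhs : value - rhs;
        }
        return value;
    }

    // term := factor (('*' | '/') factor)*
    long term() {
        long value = factor();
        while (peek() == '*' || peek() == '/') {
            const char op = text_[pos_++];
            const long rhs = factor();
            if (op == '/' && rhs == 0) throw std::runtime_error("division by zero");
            value = (op == '*') ? value * rhs : value / rhs;
        }
        return value;
    }

    // factor := NUMBER | '(' expr ')'
    long factor() {
        if (peek() == '(') {
            ++pos_;
            const long value = expr();
            if (peek() != ')') throw std::runtime_error("expected ')'");
            ++pos_;
            return value;
        }
        if (!std::isdigit(static_cast<unsigned char>(peek())))
            throw std::runtime_error("expected a number");
        long n = 0;
        while (pos_ < text_.size() &&
               std::isdigit(static_cast<unsigned char>(text_[pos_]))) {
            n = n * 10 + (text_[pos_++] - '0');
        }
        return n;
    }

    std::string text_;
    std::size_t pos_ = 0;
};
```

Each grammar rule maps to one member function, which is why the grammar (or syntax diagram) has to exist before the code does. ExprParser("1 + (3 * 4) - 2").parse() returns 11.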
As 2kaud suggested, a parser of some sort was probably already written, if the file is one that is used by other programs/computer languages.
For example, no one writes an XML parser (unless for some odd reason), because so many are already available.
If a parser has been written for this file, it more than likely provides "hooks", so that for each token, component, etc. that the parser encounters in the file, the parser calls your "hook function" to store the data found in any way you see fit. An example would be the comments: if the parser encounters a comment, it would call your "Comment Hook" function to handle the comment (which could mean simply ignoring it and continuing to parse).
Regards,
Paul McKenzie
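The "hook" arrangement described above might look something like the following C++ sketch. Nothing here is the interface of any real parser for this file. ParserHooks, runParser, and collectDataLines are hypothetical names, and the line rules (a comment is a whole line of the form \* ... *\, and "Maximize", "Minimize" and "Bounds" are section keywords) are assumptions based on the excerpt quoted earlier in the thread.

```cpp
#include <functional>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical callback set; a hook left empty means "ignore this event".
struct ParserHooks {
    std::function<void(const std::string&)> onComment;
    std::function<void(const std::string&)> onSection;
    std::function<void(const std::string&)> onData;
};

// Call a hook only if the caller supplied one.
void fire(const std::function<void(const std::string&)>& hook,
          const std::string& line) {
    if (hook) hook(line);
}

// Read the stream line by line and report each non-blank line to a hook.
void runParser(std::istream& in, const ParserHooks& hooks) {
    std::string raw;
    while (std::getline(in, raw)) {
        const auto first = raw.find_first_not_of(" \t\r");
        if (first == std::string::npos) continue;  // skip blank lines
        const auto last = raw.find_last_not_of(" \t\r");
        const std::string line = raw.substr(first, last - first + 1);

        if (line.size() >= 4 &&
            line.compare(0, 2, "\\*") == 0 &&
            line.compare(line.size() - 2, 2, "*\\") == 0) {
            fire(hooks.onComment, line);   // assumed comment syntax
        } else if (line == "Maximize" || line == "Minimize" || line == "Bounds") {
            fire(hooks.onSection, line);   // assumed section keywords
        } else {
            fire(hooks.onData, line);
        }
    }
}

// Example client: keep the data lines, ignore comments and section keywords.
std::vector<std::string> collectDataLines(const std::string& text) {
    std::vector<std::string> data;
    ParserHooks hooks;
    hooks.onData = [&data](const std::string& line) { data.push_back(line); };
    std::istringstream in(text);
    runParser(in, hooks);
    return data;
}
```

In collectDataLines the comment hook is simply left unset, which corresponds to the "ignore it and continue parsing" case. A client that wanted to keep the comments would assign onComment instead.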
Hi Paul and 2kaud,
I have learnt about formal grammar rules in Theory of Computation / Formal Languages. I guess we have a prebuilt lexer and parser to do this at my place, but it will take time. I will be back at work in a few hours. I will then decide on the next step and whether I should use the prebuilt lexer and parser.
Thanks Paul for explaining to me patiently.
Regards,
Naveen.
Hello Paul,
It seems the file was created using this MATLAB code: https://github.com/CovertLab/WholeCe.../FbaLPWriter.m
Regards,
Naveen.
So the project you need to take on is a reader (or parser) for this file. From looking at the site, I don't even know if they considered that programs may want to read the data that's written. Maybe you're trying to use the file in ways it was never intended to be used.
To be honest, maybe turning that file into XML and then using a C++ XML parser would be the better way to go. You would still need to do the translation and to decide coherently on the XML tags and what data goes into them. But how to set up the tags is up to you.
Regards,
Paul McKenzie