Why do you use XML?

Jon Skeet [C# MVP]

I suppose. I did recently misinterpret an ASCII file for Unicode and
had all sorts of trouble. Thank goodness StreamReader is smarter than
I am.

Right. And then there's the issue when someone sends you a file and
you have to *guess* what encoding it's in. That's only a problem in
XML if they've actually encoded it improperly (i.e. the declaration
doesn't match reality).
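
A minimal sketch of that point, using Python's standard library (the sample text is made up): the XML parser reads the declared encoding itself, while the same bytes as a bare text file leave the reader guessing.

```python
import xml.etree.ElementTree as ET

# Made-up document, encoded as ISO-8859-1 and saying so in its declaration.
doc = '<?xml version="1.0" encoding="ISO-8859-1"?><word>café</word>'.encode("iso-8859-1")

# The parser reads the declaration and decodes the bytes itself.
word_from_xml = ET.fromstring(doc).text

# The same text as an undeclared flat file: the reader has to guess.
raw = "café".encode("iso-8859-1")
try:
    raw.decode("utf-8")          # a plausible but wrong guess
    wrong_guess_failed = False
except UnicodeDecodeError:
    wrong_guess_failed = True
```

Here the wrong guess at least fails loudly; other wrong guesses decode without error and silently produce garbage.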
Indeed you are right. The standard method for handling embedded
spaces (or even commas in CSVs) is to use quotes or some other
character. Again, Regex handles this very well.

Except that in my experience, there are subtly different rules for
different flavours of CSV. Then you've got the issue of escaping
newlines etc.
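
As a rough illustration (Python standard library, made-up data, and only one of the many CSV flavours: the Excel-style dialect that `csv.reader` uses by default), here is what happens to a field with an embedded comma, a doubled quote and a line break:

```python
import csv
import io

# Made-up data: a header row, then one record whose fields contain
# a comma, a doubled quote and a line break.
text = 'name,comment\r\n"Smith, John","said ""hi""\nthen left"\r\n'

# Naive approach: one record per line, split on commas.
naive = [line.split(",") for line in text.splitlines()]

# csv.reader with its default dialect handles the quoting rules.
parsed = list(csv.reader(io.StringIO(text, newline="")))
```

The naive version sees three "records" and splits the name in two; `csv.reader` returns the header plus a single two-field record. A file written with different escaping rules (backslash escapes, say) would need a different dialect configuration again.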
It is just the same as not knowing which XML format you are using.

No, it's not. XML has a standard. Where's the official definition for
CSV?
XML can represent the same data in many/unlimited? different ways. If code
is written correctly, the amount of code aware of the data source is
minimal.

I believe you're confusing syntactic format with semantic format. Yes,
you need to understand the semantics - but the syntax is well defined.
You can validate an XML document on its own (or with its DTD) with no
effort. That's not true of CSV - you have to know whether or not to
ignore the first line, how various things are being escaped etc.
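
A small sketch of the syntax check, assuming Python's built-in ElementTree parser and made-up documents. Note this only checks well-formedness; ElementTree does not validate against a DTD or schema.

```python
import xml.etree.ElementTree as ET

def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the document's syntax."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

good = "<order><item qty='2'>lens</item></order>"
bad = "<order><item qty='2'>lens</order>"   # <item> is never closed
```

No knowledge of what an "order" means is needed to reject the second document.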
Unless you are working with an extremely developer-unfriendly
format, I say it is just as easy to pull data out no matter
what. The only benefit to XML is that someone already did most of the
parsing work for you, via XPath for example. However, writing a custom
parser can be as simple as a regular expression, so why force your input
into XML when SSV or CSV is readily available?

As I keep saying, it depends on *exactly* which flavour of CSV you're
using. Or perhaps it's actually fixed record. Or some horrible mixture
of the two. Maybe there's a header line or maybe there isn't. You have
to change the code to cope.

An XML parser doesn't try to understand the data, but it *will*
validate the syntax and let you get at the data at a higher level, for
free.
They don't "have" to be fixed-length records. There just needs to be
an agreement that there is one record per line. The additional benefit
is that you aren't forced to send a 20MB file if all your user wants
are the child records, which is a common scenario.

So if you know that the record you want is on line 300,354 how do you
get to that quickly, if the lines aren't all the same number of bytes?
You have to read every line in until you get there.
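
To sketch the difference (Python, in-memory files, and a hypothetical 16-byte record size): with fixed-length records the position is simple arithmetic, with variable-length lines it is a scan.

```python
import io

RECORD_SIZE = 16  # hypothetical: every record is exactly 16 bytes, newline included

def read_fixed(f, index):
    """Jump straight to record `index` (0-based) in a fixed-length file."""
    f.seek(index * RECORD_SIZE)
    return f.read(RECORD_SIZE).rstrip(b" \n")

def read_variable(f, index):
    """Variable-length lines: read and discard every line before the one wanted."""
    f.seek(0)
    for i, line in enumerate(f):
        if i == index:
            return line.rstrip(b"\n")
    raise IndexError(index)

# Made-up sample files: 1000 records each.
fixed = io.BytesIO(b"".join((b"rec%d" % i).ljust(15) + b"\n" for i in range(1000)))
variable = io.BytesIO(b"".join(b"rec%d\n" % i for i in range(1000)))
```

Both functions return the same record, but `read_variable` has to touch every byte that comes before it.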

Now, XML doesn't try to solve that - but databases do. If you want a
database, use a database IMO.
I'm not arguing with you. I am just pointing out that it is a matter
of the environment, and so far what you have said doesn't entice me to
rely on XML. I still don't see how XML can benefit me *here*.

If you've got a format which already exists and you've already got
code for, there's probably not a lot of point in changing. But you do
seem overly resistant to XML just because it doesn't help in one
particular already-solved situation.
In my environment, we work primarily with CSV and SSV. Relationships
are typically formed on the database or are fixed. Again, XML repeats
the data definition every time. I would call this the biggest waste
generator when using XML. It's just not that practical in my
environment.

If everything you've got already uses CSV, then moving to a different
format is likely to be a pain. That's not the fault of XML at all
though.

Repeating the data definition is a mixed blessing - added space
(although as others have said, it compresses really well) but easier
validation.
You make it sound like your parser "knows" what to do with what it
extracts.

No, I don't. I make it sound like the parser knows how to do the
*parsing* part, so you don't need to worry about the escaping level.
That's all it *should* do. That's the syntactic part, not the semantic
part.
What does it extract, where does it go? How does it
magically make your code know what to do? At some point you the
developer must know what the data *means*. You need to know where to
find the data. You need to know where to put it. That requires knowing
the syntax *and* the semantics. Unless you have found a way to
overcome this need for developer intervention? Did ya? 'Cause that
would be something I would like to have in my possession!

The developer doesn't need to know enough to write an XML parser -
they just need to know how to deal with the higher level
representation (the DOM or whatever API you're using to read it in).
They don't need to know how to unescape character entities etc - that
will be done for you by the XML parser. There's a clear distinction
between understanding the syntactic format and understanding what the
data's meant to mean.
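
For instance (a sketch with Python's ElementTree and a made-up value), the library escapes on the way out and unescapes on the way in, so the calling code only ever sees the real text:

```python
import xml.etree.ElementTree as ET

# Made-up value containing characters that are special in XML.
value = 'Fish & Chips <"large">, 5 > 3'

item = ET.Element("item", {"note": value})
item.text = value
serialized = ET.tostring(item, encoding="unicode")   # escaping done by the library

round_tripped = ET.fromstring(serialized)            # unescaping done by the parser
```

What the value *means* is still up to the developer; how it is escaped is not.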
But if you think about it, all of these applications of XML are due to
someone deciding to jump on the XML train. They didn't have to use
XML. What about the "in-between". Anyone can send a format over a
network. Does XML provide a benefit that couldn't have been achieved
otherwise?

It means I don't need to write something to do all the escaping, and
the other party doesn't need to write a parser - because they're
already there. The standard is agreed by everyone.
or do these applications use XML because it makes their
code seem more up-to-date? or do they use XML because of the pre-built
tools available?

Well, if you include parsers built into things like Java and .NET
framework libraries as "tools" then yes - and that's a very good
reason!
I would venture that someone could create a CSV
standard document format and the tools to work with it that could
rival any XML-based platform.

Only for two dimensional data though - less flexible. You'd then also
need something equivalent to xsd/dtd for extra validation - to
describe that a particular column could only contain numbers, for
example.

By the time you'd added all of that kind of thing - and the ability to
embed one document within another, etc, I think you'd find all the
attractive simplicity of CSV had disappeared.
Create the tools and you will have
people who use your format. Especially if you can convince them that
it is "the way" to do data transfer.

Right, you go ahead. You tell everyone to drop XML and adopt your tool
chain instead. You make it an international standard, and make it
available for just about every development platform in existence. Let
us know how you get on :)
When was the last time you had to create your own XML format and use
it in your own code? Are we just working with tools that make our
lives easier, and it is just coincidental that they used XML? I can't
say. I feel as though there is some reason for all the hype. I just
wish I knew how to utilize it.

Yes, we're working with tools to make life easier. XML happens to be a
data format which meets all my needs most of the time (which CSV
wouldn't, by the way - simply because it only deals with two
dimensional data, when I often want hierarchies in a single document)
and which is well supported by tools. That makes it very useful, IMO!
Not perfect for every occasion, of course, but still not something to
be dismissed as just hype.

Jon
 
christery

Stop this, I've got an idea... data is binary... and it's complete..
0100101000100100010100101011000101010100101010101001010001010100101001001010111010010
Invented a long time ago... (don't know what I wrote in the binary
string) but guess what?

Morse code... wait... that's just what we are doing... trying to get
information instead of data across... to make the data/information as
clear as possible...

data->information if the sender and the receiver has the same
understanding of what it means.

But in XML, if a tagged item is missing, the rest of the data will
still be read (if you code it that way), and extra data will be
ignored. Solve that in CSV if you can: insert a new value as field
number 2 in a row of 20 and have it still parse correctly...

//CY
 
Scott Roberts

Indeed you are right. The standard method for handling embedding
Except that in my experience, there are subtly different rules for
different flavours of CSV. Then you've got the issue of escaping
newlines etc.

Indeed. Sometimes you wrap all strings in quotes, sometimes you only wrap
strings in quotes if they contain a comma. Our CSV output routines have a
bajillion configuration options to accommodate all of the different systems
we output to in this "standard" CSV format.


That's a pretty huge benefit.

I'm currently writing import routines for our application. We get mostly
fixed-width files. Parsing these files in code is not technically
challenging, but when there are 300+ fields it can be tedious. Then, of
course, there are the times when something has gone wrong and I have to
inspect the raw data. This requires that I have the file format on hand so
that I can find the offset to the field I need. Oh, and does the format give
the offset from 0 or 1? Well, that depends on the format.
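
A minimal sketch of the offset problem, in Python with an entirely hypothetical three-field layout (real specs run to hundreds of fields):

```python
# Hypothetical layout as a spec might give it: (field name, start, length),
# with start counted from 1.
LAYOUT_ONE_BASED = [("account", 1, 6), ("name", 7, 10), ("reading", 17, 5)]

def parse_record(line, layout, base=1):
    """Slice one fixed-width record; `base` says whether the spec counts from 0 or 1."""
    record = {}
    for name, start, length in layout:
        begin = start - base
        record[name] = line[begin:begin + length].strip()
    return record

# Made-up record: 6 + 10 + 5 characters.
line = "000042J Smith   00317"
```

Calling `parse_record(line, LAYOUT_ONE_BASED)` gives the intended fields. Passing `base=0` for the same spec shifts every field by one character without raising any error, which is exactly the kind of mistake you only find by inspecting the raw data.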
As I keep saying, it depends on *exactly* which flavour of CSV you're
using. Or perhaps it's actually fixed record. Or some horrible mixture
of the two. Maybe there's a header line or maybe there isn't. You have
to change the code to cope.

Or add many, many config settings.

The title of your thread is "Why do you use XML?". I don't think that anyone
is trying to convince you to use XML. Quite the opposite, almost everyone
has said that if the shoe doesn't fit don't wear it.

We get some pretty big fixed-width files (tens of millions of records).
There's no doubt that if they were XML they would be much larger. But even
so, bandwidth and disk space are cheap. I wouldn't mind at all if they were
XML.

There are no doubt people in this world that say "Just use XML!" as if that
solves all of the world's problems and ends the discussion. However, I've
not seen such a reply in this thread. I initially had the same reaction you
are having - XML is just a pretty face on an old technology (flat files).
And in some respects it is. But it really does offer some useful features
under certain circumstances.

Ask yourself this: flat-files have been around since the dawn of computers,
so why aren't there a plethora of tools focused on reading/writing CSV and
fixed-length files? Oh, there are tools out there, to be sure. But why don't
they dominate the market like XML tools? They had a 50 year head-start and
are still being ousted by XML in many areas.

IMO, human readability and pre-built tools.

EDI tried to do that very thing. I guess it was moderately successful among
larger companies, but after taking one look at it I ran away crying.
Only for two dimensional data though - less flexible. You'd then also
need something equivalent to xsd/dtd for extra validation - to
describe that a particular column could only contain numbers, for
example.

By the time you'd added all of that kind of thing - and the ability to
embed one document within another, etc, I think you'd find all the
attractive simplicity of CSV had disappeared.

And how! I have 6 volumes of EDI specs for submitting simple orders. No
thanks!

Again, CSV was "the way" to do data transfer for decades. The fact that it
is being replaced in some circumstances is really all the proof you need
that it was inadequate. Either the format itself was never standardized or
the few tools that were actually created to help read/write them were not
sufficient - or both. Regardless, the proof is in the pudding.
 
jehugaleahsa

Ah, I see. I think I understand you now. I was wrong to think a CSV
could allow a developer to jump to a record. I have fixed record files
on the brain. It would be doggedly slow to find the 1 millionth record
in a CSV.

And I understand what you mean about predefined syntax now. You are
perfectly correct. Sorry I didn't see it earlier.

And I suppose if you have the tools to work with a predefined syntax,
you are more likely to use that tool. I mean, it is no different than
any other language then. Make a compiler and people will use it.

You also made a good point that I was trying to apply XML to a problem
with an already well-defined solution. Additionally, XML does provide
a means to check the validity of a file in a standard way. I don't
think CSV or SSV or Fixed-Record can handle that. I just wish I could
use XML in my solution.

I still wish I knew when to use XML and when to avoid it. Some people
here have given some tips. After reading about design patterns, did
you find yourself trying to use patterns for everything? did you
usually find that you were using it incorrectly and it had the
opposite effect? That is the way I feel about XML. I feel like someone
found something really cool that they know how and when to use. So I
start trying to use it in as many situations as possible. Of course, I
have no experience with the tool so I just use it inappropriately.
Perhaps in 10 years this post will seem a waste of time.
 
jehugaleahsa

I honestly agree with everything you and Jon Skeet are saying. I am
just playing devil's advocate (probably out of bitterness). What
confuses me is that 4 years ago my company adjusted this meter reading
device to accept XML. Now, this year, they have suddenly decided to
move to a CSV format. The problem with grunts like me is that we don't
see the reasons why we are writing code. All we see is the request and
are told to get it resolved ASAP. People like myself know about tools
like XML and want to incorporate them into our designs. However,
sometimes what seems "right" isn't an option because some big-wig
makes a quick decision without involving the grunts. I think a lot of
companies forget or are completely unaware that involving the measly
computer scientist is part of the CS's job! I am here to give my
professional opinion! I feel a lot of companies reduce developers to
code monkeys, and that's not right.

In any case, I feel like my confidence in XML is restored. When the
entire development community defends a tool, it must be useful. No
argument there. I suppose I will spend the next few years succeeding
and failing to use XML correctly until I get it right. But like I said
in a post long ago, this was the first time in 2 years where using XML
seemed like a viable solution. Maybe I shouldn't expect to create my own
XML, but use the XML that is written for me. I mean, I already work
with serialization, SOAP and configuration files. Additionally I might
call the folks asking for the CSV and ask them why they changed their
minds suddenly. I suppose letting the grunt be a little more involved
could solve some of these issues up-front.

Thanks to everyone.
 
Anthony Jones

Scott Roberts said:
EDI tried to do that very thing. I guess it was moderately successful among
larger companies, but after taking one look at it I ran away crying.


And how! I have 6 volumes of EDI specs for submitting simple orders. No
thanks!

The very mention of EDI just sent shivers down my spine. I remember sitting
on a committee here in the UK trying to define EDIFACT Orders and Invoices for
the Ophthalmic lens trade. What a nightmare. Had we had a base XML format
to work with we could simply have invented our own namespace for our
specialist requirements and included elements of that namespace in the
documents as needed.

Thinking about it (unless I've been lazy and missed it elsewhere in this
thread), one of XML's key strengths is that it is an eXtensible Markup
Language with the emphasis on extensible. XML reduces the cost of
discovering post-production (or even before) that you needed to include
extra data. It enables the extension of an existing XML format without
breaking existing code. These are things much harder to achieve with CSV.
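
A rough sketch of that in Python (the element names and data are invented for illustration, and namespaces are left out to keep it short): code that looks elements up by name keeps working when a new element appears, while a positional CSV reader does not.

```python
import xml.etree.ElementTree as ET

def read_order(xml_text):
    """'Existing code': looks up the two elements it knows about by name."""
    order = ET.fromstring(xml_text)
    return order.findtext("sku"), int(order.findtext("qty"))

original = "<order><sku>LENS-1</sku><qty>2</qty></order>"
# Later revision of the format: an extra element is added in the middle.
extended = "<order><sku>LENS-1</sku><tint>grey</tint><qty>2</qty></order>"

def read_order_csv(line):
    """Positional CSV reader written against the original two-column layout."""
    sku, qty = line.split(",")
    return sku, int(qty)
```

`read_order` returns the same result for both documents. `read_order_csv("LENS-1,grey,2")` raises a `ValueError`, and with less defensive code it could just as easily read the wrong column.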
 
christery

Drop XML! It's clear your boss is right!!!

Why not use CTX? CSV or RFC4180 is old school... look at
http://www.creativyst.com/Doc/Std/ctx/ctx.htm

Using what everyone else uses is not putting your company on the
(b)leading edge. Make up your own standard which can only be read with
a 200 line regex op... to produce 1/0 and in the end gets binary
encoded to reverse UTF-32.

Just kidding, or am I?

//CY
 
Cor Ligthert[MVP]

Hi,

To add a little bit,

It seems a typical habit of some C-type programming language users to worry
about space.
(As I see in other dotnet newsgroups)

Most modern desktop computers have more than 100 GB drives on board.
Most modern desktop computers have at least 0.5 GB memory.
Most data network connections in the western world are faster than 1Gb
Most mobile devices have more memory than what was available on a Commodore
64 (to name one)

Why bother about space? An ini file was in the past relatively much larger than
any XML config file today on whatever device.

And then all the arguments as written before in this message thread.

Cor
 
Arne Vajhøj

I have a simple request to anyone interested. We at work are aware of
the (so-called) importance of XML. We have laboriously attempted to
find practical uses of XML. However, in our environment, working with
XML seems like a complete waste of time. For instance, we pull data
from a database and send it to a database.

Recently I attempted to pull a space separated value file into an XML
format and used XSLT to convert it to a comma separated value file.
Well, XSLT doesn't work naturally for anything but XML. It was a waste
of time and it would have been easier to just send the SSV to a CSV
directly with some minimal coding.

Since we work primarily with databases and nothing in-between, I am
really starting to wonder if knowing anything about XML is really all
that useful to my life right now. That is why I would like for
everyone reading to tell me your opinions and practical experiences
with XML. Do you use XML as regularly as hype seems to say you should?
Do you use it in an environment that rarely goes outside of database
interactivity? Do you think XML is all that important to a developer
who can generally write code to perform the same task, potentially
with less effort?

I personally take the view that XML is a rarely needed tool, outside
of the occasional configuration file. And even then, all these tools
like DTDs, XPath, XPointer, XSLT, etc., seem even less used. I feel
like there is this enormous push for XML but see it as only useful in
environments where data is being passed out and about. Am I completely
missing the benefits of XML or is it just my environment?

I can see how it is useful to save data out so it can be used later,
especially when a database is not involved. And XML is easier to read
than some pre-XML formats, but it also requires an enormous amount
more space. Thinking about a SSV file that contains over 4MB of data -
converting that document to XML usually triples (at least) the amount
of memory needed. And usually, here, software expects a non-XML
format. For me, it is like putting an extra level of code that not
only consumes an enormous amount of runtime, memory and development
time, but it also ends up not having any benefit at all in the end. I
suppose the world would be a better place if we all started writing
our applications to accept and output XML. I really need a way to
determine, from a developer's point of view, whether XML is a good
decision or just something that sounds good.

XML is widely used.

But as many others have said, XML is not a good solution for
everything.

The strengths of XML are:
* standardized
* structured
* tools & libs support

In most cases the extra space is not a problem.

My understanding of your context is that you have app X
that exports SSV files and app Y that imports CSV files
and you need to convert from SSV to CSV.

In that context XML really does not provide anything.

You should go back to XML when one or more of the following
happen:
- app X and Y start to use XML
- you need to move non-rectangular data
- you need to deliver data to an external entity that you
don't know anything about

Arne
 
John B

Hello, everyone:
...I really need a way to
determine, from a developer's point of view, whether XML is a good
decision or just something that sounds good.

What do you think?

From my pov, I would love to be working with XML instead of fixed width
line separated files.
I currently exchange 6+ files with an outside company that are all fixed
width.
It is nigh on impossible to remember the format so I am continually
having to look at the spec to work out where fields begin/end when
debugging.
Having a nice xml schema to work with would be a blessing, considering
we already compress these files using gzip as the largest contain 1M+
records.

JB
 
Chris Shepherd

I still wish I knew when to use XML and when to avoid it. Some people
here have given some tips. After reading about design patterns, did
you find yourself trying to use patterns for everything? did you
usually find that you were using it incorrectly and it had the
opposite effect? That is the way I feel about XML. I feel like someone
found something really cool that they know how and when to use. So I
start trying to use it in as many situations as possible. Of course, I
have no experience with the tool so I just use it inappropriately.
Perhaps in 10 years this post will seem a waste of time.

Two things:
1. Holy crow, trim if you're top-posting!

2. I wouldn't worry so much about whether you're using it in the right places or
not. For me, the biggest key to understanding when/where XML was a practical
solution was simply by using it in my own pet projects. Once you get familiar
with its capabilities you find it will be incredibly easy to spot where it's a
godsend and where it's overkill (and yes, in some cases it is overkill).

Your point about design patterns is dead on too. I've seen a lot of people read
about the factory pattern and then spend all their time trying to figure out how
they can write everything they're doing with it. That's a bad thing probably,
but it eventually just becomes a part of the learning process. A lot of the time
learning what doesn't work is just as important as learning what does.

Chris.
 
Anthony Jones

Cor Ligthert said:
Hi,

To add a little bit,

It seems a typical habit of some C-type programming language users to worry
about space.
(As I see in other dotnet newsgroups)

Most modern desktop computers have more than 100 GB drives on board.
Most modern desktop computers have at least 0.5 GB memory.
Most data network connections in the western world are faster than 1Gb
Most mobile devices have more memory than what was available on a Commodore
64 (to name one)

Why bother about space? An ini file was in the past relatively much larger than
any XML config file today on whatever device.

As a general comment on concern over size, that's fine when looking at the
client, but when looking at a server the size of XML compared to more
compact formats may have an impact on scalability. Personally, I find the
flexibility of XML nearly always outweighs scalability factors.
 
