How to import data

R

Ray K

I have a set of data representing the average price of houses for sale
in the San Francisco area on each day from early 2006 to the present.
(That's 1800+ dates.) The format of the data is as follows:

YYYY*MM*DD*PPPPPP*YYYY*MM*DD*PPPPPP*YYYY*MM*DD*PPPPPP* and so on.

YYYY is the year, MM is the month, DD is the day, and PPPPPPP is the
price. For clarity, the * above is actually just an ordinary space that
follows the year, month, day, and price. The only Enter symbol is at the
end of the data set.

My ultimate goal is to compute and plot a moving average (or trendline)
over a 1-, 2-, or 3-month period. The data is presently saved as a .txt
file.

There seems to be two ways for organizing the data in Excel: In column 1
have the complete date and in column 2 have the price; or in column 1
have the year, in column 2 the month, in column 3 the day, and in column
4 the price.

Questions:
1. Which is the better way to organize the imported data? If with just
two columns, how do I format the date column, since the
Format/Cells/Number/Category Date doesn't have a Type with the year
first. Perhaps a better approach is just to assign the number 1 to the
first date, the number 2 to the next date, the number 3 to the next
date, etc., so I'm working with relative, not absolute, dates.

2. How do I do the importing, especially so each new date starts in
column 1 of a new row? (The problem is that there is just a single space
following the price, same as follows the year, month and day; there is
no special delimiter or pairs of spaces to indicate the start of a new
date and thus a new row.)

Adding trendlines or moving averages seems pretty straight forward,
using the Help screen and typing Averages in the Index search box.

Thanks for you suggestions.

Ray
 
N

norie

Ray

You say there is no delimiter?

What about the space?

Also isn't the data pretty uniform, year, month, price, year, month,
price...

Where is the data coming from?

Is it definitely just one long line of text?

I would forget about the format of the dates - that's not going to
affect anything.

The important thing is the date value, and you can probably use the
month and year for that.

Post some sample data and indicate what result you would want from it.

PS If you do want to format a date with year first try a custom format
like yyyy-mm.
 
R

Ray K

Ron said:
1. If, as you write, all of the data is on one line, you will be
getting close to the 32,767 character limit per cell available in
Excel.

No cell would contain more than the six characters that represent prices
ranging from 100000 to 999999 dollars. Actually, by allowing leading
zeros six characters could represent house prices from $1 to $999,999. I
should have mentioned that none of the P characters was "wasted" on
dollar signs or commas.

If I use column 1 for the complete date, it would have 10 characters:
four for year, two for month, two for day, and two for a separator like
a / or a hyphen. If I spread the date over three columns, then the year
cell would be only four characters and the adjacent month and day cells
would have only two characters.
You would exceed that limit with >1927 dates, so you do have some
leeway.

2. With a fixed field length record, as you show, and less than
32,767 characters, there should not be a significant problem in
copying the data into a cell.

Not what I was trying to do, but as I learned from your message, that
was what I had to do.
Although I am curious as to how San Francisco average house prices can
be represented with six digits.

See above.

3. I would use one column for the date, and the adjacent column for
the price.

Assume your text is in F1 and you want your data in columns A and B

A2: (the dates)
=--SUBSTITUTE(LEFT(MID($F$1,(ROWS($1:1)-1)*18+1,17),10)," ","-")

B2: (the dollars)
=--MID(MID($F$1,(ROWS($1:1)-1)*18+1,17),12,6)

Select A2:B2 and fill down as far as required
Format column A as dates; column B as currency.

Ron, your formulas worked perfectly. Many thanks. Now let's see if I can
plot a moving average or trend line on my own, using guidance in Excel Help.

Ray

P.S. I should have posted sample data in my original message. It's of
academic interest now, but here are the first 10 dates and prices:

2006 04 02 685855 2006 04 03 630165 2006 04 04 679400 2006 04 05 608920
2006 04 06 651383 2006 04 07 632552 2006 04 08 640144 2006 04 09 652228
2006 04 10 603246 2006 04 11 594371
 
R

Ray K

norie said:
Ray

You say there is no delimiter?

What about the space?

Also isn't the data pretty uniform, year, month, price, year, month,
price...

Where is the data coming from?

Is it definitely just one long line of text?

I would forget about the format of the dates - that's not going to
affect anything.

The important thing is the date value, and you can probably use the
month and year for that.

Post some sample data and indicate what result you would want from it.

PS If you do want to format a date with year first try a custom format
like yyyy-mm.

Please refer to my response to Ron.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top