The SPORTS PAGE ARCHIVE

Serving the Tri-Cities from 1976 to 1984


SPI-t010
rev 5/18/2022 15:28


SPI-0800.cfm rev 05/16/2022 13:00       SPORTS PAGE DOCUMENTATION

 
RUN THE SYSTEM

  To run the system: http://www.mmcctech.com/sportspage
 
SCRIPTS

Scripts are found on the server in:
  /d/Websites/bcra-mlscom/mmcctech/SportsPage
 
MySQL DATABASE TABLES
Tables are stored in the MyBayCity.com database.

Sample query (returns every type P page record, in issue and page order):
<cfquery name="SPIq_Stories" datasource="MyBayCity">
      <!--- Type P = one record per newspaper page --->
      SELECT * FROM mbc_tsportspage_idx
      WHERE SPI_Type = 'P'
      ORDER BY SPI_IssueDate, SPI_Issue_Page
</cfquery>

 
DATA FILES
Lowest level: a TXT file containing the HTML rendering of one scanned newspaper page, for example (yyyy-mm is the folder):
            yyyy-mm/12-06-1976_THE BAY COUNTY SPORTS PAGE01.txt
OJ had all of the newspaper pages scanned as image files, one file per page.

Those image files were run through an optical character recognition (OCR) engine to produce an HTML file for each page.
      (The conversion of the scanned page-image files produced passable HTML code.
      We think that pictures were identified and saved separately as image files.
      The full page-image files are available, but are not used in the web system. It's just too much volume!)


Steve renamed each HTML file with a .txt extension.

Those TXT files are on a thumb drive in folders by year and month. (Some folders have multiple months.)

The folders were uploaded to the web server, and their files were run through a conversion script that wrote them to the database, one record per page.

The file name format is significant:
      It must start with the issue date in the format mm-dd-yyyy.
      It must include the page number in the format PAGEnn.
      For example:     12-06-1976_THE BAY COUNTY SPORTS PAGE01.txt
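A minimal sketch of how a conversion script might pull the issue date and page number out of a page file name using the two rules above. It is written in Python for illustration only (the live scripts are ColdFusion), and `parse_page_filename` is a hypothetical name. It assumes the page number is always exactly two digits right before `.txt`; names that break the rules, such as the `-BkMrk.txt` bookmark file in the raw data examples, return `None`.

```python
import re
from datetime import date

# Hypothetical helper; the real conversion script is a ColdFusion template.
# Rule 1: the name starts with the issue date as mm-dd-yyyy.
# Rule 2: the name contains the page number as PAGEnn (two digits).
PAGE_FILE = re.compile(r"^(\d{2})-(\d{2})-(\d{4})_.*PAGE(\d{2})\.txt$")


def parse_page_filename(name):
    """Return (issue_date, page_number), or None if the name breaks the rules."""
    match = PAGE_FILE.match(name)
    if match is None:
        return None  # e.g. the "-BkMrk.txt" bookmark file has no PAGEnn
    month, day, year, page = (int(g) for g in match.groups())
    return date(year, month, day), page


print(parse_page_filename("12-06-1976_THE BAY COUNTY SPORTS PAGE01.txt"))
# (datetime.date(1976, 12, 6), 1)
print(parse_page_filename("08-02-1976_THE BAY COUNTY SPORTS PAGE-BkMrk.txt"))
# None
```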

These files are REMOVED from the web server once the conversion is complete.


Contact Sheets: One file per 16- or 24-page issue.
Every issue has a single thumbnail image showing all of its 16 or 24 pages. That thumbnail will be uploaded to the server.

The link attached to an individual page will point to the thumbnail/contact sheet of the issue the page appears in.

When a user does a SEARCH, the script will return a table of ALL pages on which the search terms appear. Those results are then used to show a single copy of the thumbnail/contact sheet for EACH issue in the results, at the bottom of the page.

Each thumbnail/contact sheet's "link" will point to the web location where that issue is for sale, typically as an Amazon Kindle book.

You can view the contact sheets from several places; for example, use VIEW from the "Page List" script (SPI-0100.cfm).

Old style file name:  1976_07_19_Contact_Sheet.jpg
New style file name:  CS_1976_08_02.jpg
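Because two naming styles are in use, any script that finds a contact sheet by issue date has to accept both. A minimal Python sketch, assuming the date fields are always zero-padded as in the two examples; `contact_sheet_date` and `new_style_name` are hypothetical names, and building new-style names is only an illustration of the pattern, not a documented step.

```python
import re
from datetime import date

# Hypothetical patterns for the two contact-sheet naming styles.
OLD_STYLE = re.compile(r"^(\d{4})_(\d{2})_(\d{2})_Contact_Sheet\.jpg$")  # 1976_07_19_Contact_Sheet.jpg
NEW_STYLE = re.compile(r"^CS_(\d{4})_(\d{2})_(\d{2})\.jpg$")             # CS_1976_08_02.jpg


def contact_sheet_date(name):
    """Return the issue date encoded in a contact-sheet file name, or None."""
    for pattern in (OLD_STYLE, NEW_STYLE):
        match = pattern.match(name)
        if match:
            year, month, day = (int(g) for g in match.groups())
            return date(year, month, day)
    return None


def new_style_name(issue_date):
    """Build the new-style file name (CS_yyyy_mm_dd.jpg) for an issue date."""
    return issue_date.strftime("CS_%Y_%m_%d.jpg")


print(contact_sheet_date("1976_07_19_Contact_Sheet.jpg"))  # 1976-07-19
print(contact_sheet_date("CS_1976_08_02.jpg"))             # 1976-08-02
print(new_style_name(date(1976, 7, 19)))                   # CS_1976_07_19.jpg
```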

Contact sheets are LEFT on the web server after the conversion is complete.
They are shown to the user in many places.
We think they will represent what is sold to customers.



Raw data FOLDER and FILE EXAMPLES: (These will all be deleted once the conversion is done.)
 
Two folders that could contain raw data files:

  /d/Websites/bcra-mlscom/mmcctech/SportsPage/1976-08-09       1976 Month 8 and 9 (Aug - Sept)

  /d/Websites/bcra-mlscom/mmcctech/SportsPage/1976-10-11       1976 Month 10 and 11 (Oct - Nov)


A folder containing some raw data files:

  /d/Websites/bcra-mlscom/mmcctech/SportsPage/1976-12           1976 Month 12 (Dec)
      08-02-1976_THE BAY COUNTY SPORTS PAGE-BkMrk.txt
      12-06-1976_THE BAY COUNTY SPORTS PAGE01.txt
      12-06-1976_THE BAY COUNTY SPORTS PAGE02.txt
          THROUGH
      12-06-1976_THE BAY COUNTY SPORTS PAGE24.txt
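The folder names above follow a simple pattern: a four-digit year followed by one or more two-digit months. A minimal Python sketch of decoding that pattern, assuming every raw data folder follows it exactly; `folder_months` is a hypothetical name, not part of the existing scripts.

```python
import re

# Hypothetical pattern: a four-digit year, then one or more "-mm" groups.
FOLDER = re.compile(r"^(\d{4})((?:-\d{2})+)$")


def folder_months(folder_name):
    """Return (year, [months]) for a raw-data folder name, or None."""
    match = FOLDER.match(folder_name)
    if match is None:
        return None
    year = int(match.group(1))
    # "-08-09" splits into ["", "08", "09"]; skip the empty first piece.
    months = [int(m) for m in match.group(2).split("-")[1:]]
    return year, months


print(folder_months("1976-08-09"))  # (1976, [8, 9])
print(folder_months("1976-12"))     # (1976, [12])
```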
 

 
DATABASE
The primary data used by the Sports Page system is a single table in the MyBayCity.com database.

This single table contains TWO types of records:
      Type P     One record for every PAGE.
      Type T     One record for every THUMBNAIL / CONTACT PAGE on the server.

Each record contains
      Unique ID
      Issue and page number
      Type (P or T)
      Text name of the issue
      LINK
          Type P link points to the Thumbnail / contact page image file, and the matching T record.
          Type T link points to the URL where the issue is sold.
      Data which is searched.




DataSource: MyBayCity

Table: mbc_tsportspage_idx
Lowest level of data: one entry for each page from every issue (type P), plus one entry for each issue's contact sheet (type T).
Fields
      SPI_ProgramID   Auto-increment unique identifier for each record.
      SPI_Type        One-byte text field indicating the record type:
                          P   Single page, with a link to the issue's contact sheet.
                          T   A single issue description (normally 16 pages), with a link to where it is sold.
      SPI_Issue       50-byte text field holding the name of the page.
      SPI_Link        For type P, the link to the thumbnail image for the entire issue.
                      Each page from the same issue points to the same ISSUE THUMBNAIL page.
                      Because the link is stored with the page, any page COULD point somewhere else.
                      (For type T, the link points to where the issue is sold.)
      SPI_Data        "Text" block containing a summary of the page.
                      It is generated from the page scan after the scan is read by an OCR-like processor.
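To make the two record types concrete, here is a minimal sketch of the table using Python's built-in `sqlite3` module as a stand-in for the MySQL database. The column names come from the field list above; the column types, the `CHECK` constraint, and all sample values (issue names, link targets, page text) are hypothetical and for illustration only.

```python
import sqlite3

# SQLite stands in for the MySQL table; column types are approximations
# of the field descriptions (one-byte type, 50-byte issue name).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE mbc_tsportspage_idx (
           SPI_ProgramID INTEGER PRIMARY KEY AUTOINCREMENT,
           SPI_Type      CHAR(1)     NOT NULL CHECK (SPI_Type IN ('P', 'T')),
           SPI_Issue     VARCHAR(50) NOT NULL,
           SPI_Link      TEXT,
           SPI_Data      TEXT
       )"""
)

# Hypothetical sample rows: one contact sheet (T) and two of its pages (P).
rows = [
    ("T", "12-06-1976 issue", "https://example.com/where-sold", "Contact sheet"),
    ("P", "12-06-1976 page 01", "CS_1976_12_06.jpg", "page one text"),
    ("P", "12-06-1976 page 02", "CS_1976_12_06.jpg", "page two text"),
]
conn.executemany(
    "INSERT INTO mbc_tsportspage_idx (SPI_Type, SPI_Issue, SPI_Link, SPI_Data) "
    "VALUES (?, ?, ?, ?)",
    rows,
)

# Every P record of an issue links to the same contact sheet.
pages = conn.execute(
    "SELECT SPI_Issue, SPI_Link FROM mbc_tsportspage_idx "
    "WHERE SPI_Type = 'P' ORDER BY SPI_ProgramID"
).fetchall()
for issue, link in pages:
    print(issue, "->", link)
```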

WHAT'S FOR SALE
It's not clear exactly what will be sold.
We THINK that each issue (normally 16 pages) will be published as a Kindle book.

A single image (JPG) "contact sheet" containing thumbnail sized images of each of the 16 pages in that issue will be stored on the server. A description of that image will be found in the database as record type "T".

That contact sheet image entry will include a link pointing to wherever that issue is sold.
(For testing, they all point to Kent's "James Milton" book.)

SO... let's say that the name "Steve" appears on four individual pages. Two of those pages are in the 7/5/76 issue and two are in the 7/19/76 issue.

After the search, the system will show a table of the four references, listing the page and issue each one appears in.

Following that table of references, the system will show the contact sheet image for each of the two issues.

Each contact sheet image will be a link that goes to the internet address where the "book" for that issue can be purchased.

The customer will click the link of the issue/book they want and make the purchase.
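The key step in this example is collapsing many page hits into one contact sheet per issue. A minimal Python sketch, assuming the search is a simple case-insensitive substring match (the real script presumably queries SPI_Data in the database). The `PageRecord` class, the `search` function, the page numbers, the page text, and the contact-sheet file names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PageRecord:
    """Hypothetical stand-in for one type P record."""
    issue: str          # issue date, as in the example above
    page: int           # page number within the issue
    contact_sheet: str  # SPI_Link: the issue's contact sheet image
    text: str           # SPI_Data: the searchable page text


def search(pages, term):
    """Return (matching pages, contact sheets to show, one per issue)."""
    needle = term.lower()
    hits = [p for p in pages if needle in p.text.lower()]
    # dict.fromkeys keeps first-seen order and drops repeats, so a contact
    # sheet is listed once even when several of its pages match.
    sheets = list(dict.fromkeys(p.contact_sheet for p in hits))
    return hits, sheets


# The "Steve" example: four matching pages spread over two issues.
sample = [
    PageRecord("7/5/76", 3, "CS_1976_07_05.jpg", "Steve scored twice"),
    PageRecord("7/5/76", 9, "CS_1976_07_05.jpg", "Coach Steve said"),
    PageRecord("7/5/76", 10, "CS_1976_07_05.jpg", "Box scores"),
    PageRecord("7/19/76", 2, "CS_1976_07_19.jpg", "steve at bat"),
    PageRecord("7/19/76", 14, "CS_1976_07_19.jpg", "Thanks to Steve"),
]
hits, sheets = search(sample, "Steve")
for p in hits:
    print(p.issue, "page", p.page)
print(sheets)  # ['CS_1976_07_05.jpg', 'CS_1976_07_19.jpg']
```

Keying the de-duplication on each page's own contact-sheet link, rather than on the issue name, fits the field description: since the link is stored per page, a page that pointed somewhere else would still get its own entry.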

METHODS
 
    How can I read a simple text file, processing each line of the file?
ColdFusion makes it easy to read a file using the <cfloop> tag.
By using the file attribute, you can tell <cfloop> to iterate over each line of a file.
This sample reads in a text file and displays each line:


<cfset myfile = expandPath("./dump.txt")>

      <cfloop index="line" file="#myfile#">
          <cfoutput>
              The current line is #line#
          </cfoutput>
      </cfloop>

This question was written by Hal Helms
It was last updated on July 1, 2008.
Found at:
      https://www.coldfusioncookbook.com/entries/How-can-I-read-a-simple-text-file-processing-each-line-of-the-file.html
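For comparison with the ColdFusion sample, the same line-by-line loop in Python (a language swap for illustration only). The `read_lines` name and the UTF-8 default are assumptions; the OCR output files may need a different encoding.

```python
import os
import tempfile


def read_lines(path, encoding="utf-8"):
    """Yield each line of a text file with its trailing newline removed."""
    # The with block closes the file once iteration finishes
    # (or when the generator is closed).
    with open(path, encoding=encoding) as handle:
        for line in handle:
            yield line.rstrip("\n")


# Usage: write a small sample file, then echo it line by line.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as sample:
    sample.write("first line\nsecond line\n")
for line in read_lines(sample.name):
    print("The current line is", line)
os.remove(sample.name)
```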
 








 
 



