Issue
Fixed it
For an educational project I am trying to store the scraped data on a MS SQL Database. First of all I'd like every unique item to be placed in products_tb
. Once the unique product is inserted, SQL must generate an unique ID for said item, being productgroupid
. The products_tb
table will only yield the product information which will never change, such as productid, category, name and description
. In a second table, which I will create after I get this working, I will store the following data: productgroupid, price, timestamp
. The reason for this is that these might change every now and then. With the productgroupid
I can always group all the data at any given time and create graphs and so on.
The problem is that I cannot get my pipelines.py
to work.. I however did manage to insert data into my SQL database using the commented chunk of code:
# self.cursor.execute("INSERT INTO products_tb(productid, category, name, description, price, timestamp) VALUES (%s, %s, %s, %s, %s, %s)",
# (item['productid'], item['category'], item['name'], item['description'], item['price'], item['timestamp']))
It seems to be working with the following code
pipelines.py
import pymssql
class KrcPipeline(object):
def __init__(self):
self.conn = pymssql.connect(host='DESKTOP-P1TF28R', user='sa', password='123', database='kaercher')
self.cursor = self.conn.cursor()
def process_item(self, item, spider):
# self.cursor.execute("INSERT INTO products_tb(productid, category, name, description, price, timestamp) VALUES (%s, %s, %s, %s, %s, %s)",
# (item['productid'], item['category'], item['name'], item['description'], item['price'], item['timestamp']))
sql_statement = f'''
BEGIN
IF NOT EXISTS (SELECT * FROM [kaercher].[dbo].[products_tb]
WHERE productid = {item['productid']})
BEGIN
INSERT INTO [kaercher].[dbo].[products_tb] (productid, category, name, description)
OUTPUT (Inserted.productgroupid)
VALUES ({item['productid']}, '{item['category']}', '{item['name']}', '{item['description']}')
END
ELSE
BEGIN
SELECT productgroupid FROM [kaercher].[dbo].[products_tb]
WHERE productid = {item['productid']}
END
END
'''
self.cursor.execute(sql_statement)
self.conn.commit()
return item
items.py
import scrapy
class KrcItem(scrapy.Item):
productid=scrapy.Field()
name=scrapy.Field()
description=scrapy.Field()
price=scrapy.Field()
producttype=scrapy.Field()
timestamp=scrapy.Field()
category=scrapy.Field()
pass
Solution
EDIT:
Another small error I missed. "IF NOT EXIST" needs to changed to "IF NOT EXISTS".
import pymssql
class KrcPipeline(object):
def __init__(self):
self.conn = pymssql.connect(host='DESKTOP-P1TF28R', user='sa', password='123', database='kaercher')
self.cursor = self.conn.cursor()
def process_item(self, item, spider):
# self.cursor.execute("INSERT INTO products_tb(productid, category, name, description, price, timestamp) VALUES (%s, %s, %s, %s, %s, %s)",
# (item['productid'], item['category'], item['name'], item['description'], item['price'], item['timestamp']))
sql_statement = f'''
BEGIN
IF NOT EXISTS (SELECT * FROM [kaercher].[dbo].[products_tb]
WHERE productid = {item['productid']})
BEGIN
INSERT INTO [kaercher].[dbo].[products_tb] (productid, category, name, description)
OUTPUT (Inserted.productgroupid)
VALUES ({item['productid']}, {item['category']}, {item['name']}, {item['description']})
END
ELSE
BEGIN
SELECT productgroupid FROM [kaercher].[dbo].[products_tb]
WHERE productid = {item['productid']}
END
END
'''
self.cursor.execute(sql_statement)
self.conn.commit()
return item
ORIGINAL:
You're not calling the values in the item dictionary correctly when defining sql_statement. Try this:
import pymssql
class KrcPipeline(object):
def __init__(self):
self.conn = pymssql.connect(host='DESKTOP-P1TF28R', user='sa', password='123', database='kaercher')
self.cursor = self.conn.cursor()
def process_item(self, item, spider):
# self.cursor.execute("INSERT INTO products_tb(productid, category, name, description, price, timestamp) VALUES (%s, %s, %s, %s, %s, %s)",
# (item['productid'], item['category'], item['name'], item['description'], item['price'], item['timestamp']))
sql_statement = f'''
BEGIN
IF NOT EXIST (SELECT * FROM [kaercher].[dbo].[products_tb]
WHERE productid = {item['productid']})
BEGIN
INSERT INTO [kaercher].[dbo].[products_tb] (productid, category, name, description)
OUTPUT (Inserted.productgroupid)
VALUES ({item['productid']}, {item['category']}, {item['name']}, {item['description']})
END
ELSE
BEGIN
SELECT productgroupid FROM [kaercher].[dbo].[products_tb]
WHERE productid = {item['productid']}
END
END
'''
self.cursor.execute(sql_statement)
self.conn.commit()
return item
Answered By - samredai
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.